Sunday Oct 01, 2023
arXiv Preprint - Vision Transformers Need Registers
In this episode we discuss Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. The paper identifies artifacts in the feature maps of Vision Transformers (ViTs) that appear mainly in low-information background areas of images, and proposes a fix. Adding extra tokens called "registers" to the input sequence yields smoother feature maps and attention maps for downstream visual processing. The fix works for both supervised and self-supervised ViT models and sets a new state of the art for self-supervised visual models on dense visual prediction tasks. Registers also enable object discovery methods with larger models.
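For listeners who want to picture the mechanism, here is a minimal sketch in plain Python of what "adding registers to the input sequence" could look like. It is not the authors' implementation: the toy attention layer uses identity projections, the tokens are random vectors rather than learned embeddings, and the names (`forward_with_registers`, `N_REGISTERS`, the placement of registers at the end of the sequence) are assumptions made for illustration. The one idea it tries to show is that register tokens take part in attention alongside the class and patch tokens and are then dropped from the output.

```python
import math
import random


def attention(tokens):
    """Toy single-head self-attention with identity projections.

    A stand-in for a real ViT block (assumption: no learned weights,
    no MLP, no normalization), just enough to let tokens exchange
    information.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of all tokens (values = tokens here).
        out.append(
            [sum(w / total * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    return out


def forward_with_registers(cls_token, patch_tokens, register_tokens, depth=2):
    """Run the toy encoder with extra register tokens in the sequence.

    Registers attend and are attended to like any other token, but are
    discarded at the end, so the outputs keep the usual shape.
    """
    n_patches = len(patch_tokens)
    # Hypothetical ordering: [CLS], patches, then registers.
    sequence = [cls_token] + patch_tokens + register_tokens
    for _ in range(depth):
        sequence = attention(sequence)
    cls_out = sequence[0]
    patch_out = sequence[1:1 + n_patches]  # register outputs are dropped
    return cls_out, patch_out


random.seed(0)
DIM, N_PATCHES, N_REGISTERS = 8, 16, 4  # illustrative sizes only


def rand_vec():
    return [random.gauss(0, 1) for _ in range(DIM)]


cls_token = rand_vec()
patches = [rand_vec() for _ in range(N_PATCHES)]
registers = [rand_vec() for _ in range(N_REGISTERS)]  # learned in a real model

cls_out, patch_out = forward_with_registers(cls_token, patches, registers)
print(len(patch_out), len(cls_out))  # prints "16 8"
```

The output has one vector per patch regardless of how many registers are used, which is why downstream code that consumes the feature map would not need to change in a setup like this.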