Machine Learning in Image Processing
Week 10 - Attention
One of the most important recent developments in machine learning is the invention of attention mechanisms. Previously, neural networks could only process information within receptive fields of limited size, whereas attention lets a model focus on important details regardless of their position in the input. This powerful technique forms the basis of the transformer architecture, which is the fundamental building block of models such as ChatGPT. In this seminar, we will demonstrate the basic operations inside an attention module and show how they can be used to build models for image captioning.
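The core operation inside an attention module is scaled dot-product attention: queries are compared against keys, the similarities are normalized with a softmax, and the resulting weights mix the values. A minimal NumPy sketch (function and variable names are illustrative, not from a particular library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights

# Toy example: 3 queries attend over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)  # (3, 8) (3, 4)
```

Note that the output for each query is a weighted average of the values, so the mechanism can pull in information from any position, near or far.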
Goals:
- Examine a single attention module in detail.
- Gain experience with the various techniques used alongside attention, such as positional encoding.
- Implement a vision transformer and use it to perform image classification.
- Demonstrate image captioning using a sample implementation of the paper "Show, Attend and Tell".
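One of the techniques listed above is positional encoding: attention by itself is permutation-invariant, so information about token (or image patch) positions must be injected separately. A minimal sketch of the sinusoidal encoding used in the original transformer paper (the function name is my own):

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(n_positions)[:, None]          # (n_positions, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model // 2)
    angles = pos / (10000 ** (2 * i / d_model))    # (n_positions, d_model // 2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sines
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosines
    return pe

pe = sinusoidal_positional_encoding(50, 16)
print(pe.shape)  # (50, 16)
```

The encoding is simply added to the input embeddings before the first attention layer, giving each position a unique, smoothly varying signature.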