Cross-modal retrieval and Web search basics 3. 5. 2023
Lecture
Lecture and discussion by Nicola Messina
Title: Transformer Networks for Cross-modal Retrieval
Abstract: Cross-modal retrieval tasks, particularly text-to-image and
text-to-video retrieval, have recently received a substantial boost from
advances in image and text representations learned by deep neural
networks. The core of this innovation
resides in the Transformer architecture, which lays the foundation for
processing all kinds of multimedia data (images, videos, text) in a common,
elegant framework. This presentation will introduce the core ideas behind
the Transformer and its use in cross-modal retrieval tasks, keeping both
efficiency and effectiveness in mind. It will give in-depth insights into
the Transformer-based feature representation and discuss how to perform
efficient k-nearest neighbor searches on large databases. Finally, this
presentation will show engaging real-world application scenarios and
current research directions.
Readings
Seminar
Additional notes by the speaker:
Downloading a Sketch Engine corpus: Go to Manage Corpus and choose Download (.txt/.vert). This works only for your own corpora; the precompiled corpora are not publicly available (but can be requested).
Second term project
Below, you can find the homework vaults for submitting the second term project.
Below, you can find the peer assessment applications to review the second term projects of your colleagues.