Metric learning, product quantization, approximate search
Jan Sedmidubský
Masaryk University
sedmidubsky@mail.muni.cz

Outline
• Metric learning
• Vector/Product quantization
• Approximate similarity search (e.g., using FAISS)
[Figure: an index over the embeddings (e.g., 64x compression)]

Metric learning
• Metric learning goal – representing data objects, such as images, text, or anything else, with numerical vectors
  • Vectors = embeddings or embedding vectors
  • A function f transforms a given object (e.g., an image x) into an n-dimensional vector f(x) ∈ ℝⁿ
• Former approach – the individual features of the vector representation had to be specified manually
• Current approach – the vector representation is learned automatically
[Figure: a neural network mapping x to f(x) ∈ ℝⁿ, e.g., (0.2, 0.9, …, 0.7)]

Metric learning
• Desired properties:
  • Similar data objects → vectors that are close together
  • Dissimilar data objects → vectors that are far apart
• Quantification of similarity/closeness:
  • Requires a distance measure in the underlying vector space
  • Commonly used measure – the Euclidean distance function (L2 norm):
    dist(f(x), f(y)) = ‖f(x) − f(y)‖₂
• Metric learning process – pulling together the embeddings of similar objects and pushing apart those of dissimilar objects
• What exactly is meant by similar and dissimilar objects?

Metric learning
• Examples of similar and dissimilar objects under identity-based similarity:
  • Face recognition
  • Retail-product recognition
• Object identities (products, persons) lead to supervised clustering of the learned embeddings → why not just use a classifier?
  • Extreme classification – a very large number of classes (e.g., tens of thousands) with highly unbalanced training data
  • The Stanford Online Products dataset (scraped from eBay) contains 120K images for 23K product classes
  • An output layer of a deep neural network with 23K nodes and about 4 images per class → the result will be unsatisfactory

Metric learning
• Embedding vectors should be:
  • As close together as they can be for the images in each class
  • As far as they can be from the embeddings of the other classes
• Example – recognition of facial expressions
  [Roy et al.: Contrastive Learning of View-invariant Representations for Facial Expressions Recognition, ACM TOMM 2023]
• The basic ideas in metric learning revolve around:
  • Pairwise contrastive loss
    [Hadsell et al.: Dimensionality Reduction by Learning an Invariant Mapping, CVPR 2006]
  • Triplet loss
    [Schroff et al.: FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015]

Pairwise Contrastive (PC) loss
• Training a neural network in batches (batch – a list of training samples)
• Goal – extract positive and negative pairs of training samples from a batch
  • Positive pairs – carry the same class label
  • Negative pairs – carry different class labels
[Figure: a batch with three classes (x, o, *), highlighting one positive pair and one negative pair]
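To make the pair-extraction step concrete, here is a minimal sketch (not from the slides) that enumerates the positive and negative index pairs of a labelled batch with plain Python loops; the names (extract_pairs, labels) are illustrative. A vectorized, GPU-friendly formulation is shown later in the Coding issues part.

```python
import torch

def extract_pairs(labels: torch.Tensor):
    """Enumerate index pairs (i, j), i < j, split into positive
    (same label) and negative (different label) pairs."""
    positive, negative = [], []
    B = labels.shape[0]
    for i in range(B):
        for j in range(i + 1, B):
            if labels[i] == labels[j]:
                positive.append((i, j))
            else:
                negative.append((i, j))
    return positive, negative

# Toy batch: 6 samples from three classes
labels = torch.tensor([0, 0, 1, 1, 2, 2])
pos, neg = extract_pairs(labels)
print(pos)  # [(0, 1), (2, 3), (4, 5)]
print(neg)  # the remaining 12 cross-class pairs
```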
PC loss – idea of calculation
• The loss (cost or objective) function L measures the discrepancy between the predicted output of the model and the actual target values
  • Purpose – to give the network feedback on how well it is performing, so that it can adjust its parameters (weights and biases) to improve over time
  • During the training process, the loss should gradually decrease (ideally down to 0)
• PC loss calculation:
  • A sum of the values calculated separately from the positive pairs and the negative pairs
  • Contribution to the loss by positive pairs (L_p) + contribution to the loss by negative pairs (L_n)

PC loss – positive pairs
• Contribution to the loss by positive pairs
  • Positive pair (x_1^i, x_2^i) – pairwise distances should be as small as possible
• Positive loss – sum over all positive pairs in the batch:
    L_p = Σ_i dist(f(x_1^i), f(x_2^i))²
  • i – indexes all the positive pairs in the batch
  • The distance is squared because the square is differentiable everywhere

PC loss – negative pairs
• Contribution to the loss by negative pairs
  • Negative pair (x_1^j, x_2^j) – pairwise distances should be as large as possible
  • j – indexes all the negative pairs in the batch
• But pushing apart already very dissimilar items amounts to wasting the learning effort
  • Two well-separated samples in a negative pair should not even participate in learning
  • Threshold on maximal dissimilarity quantified by a margin m:
    • If dist(x_1^j, x_2^j) > m, the contribution to the loss should be 0
    • If dist(x_1^j, x_2^j) ≤ m, the contribution is m − dist(x_1^j, x_2^j)
• Negative loss – sum over all negative pairs in the batch:
    L_n = Σ_j max(0, m − dist(f(x_1^j), f(x_2^j)))²

PC loss
• Overall loss for all pairs in the batch, obtained by combining L_p and L_n
• Binary variable y ∈ {0, 1}:
  • y = 0 → positive pair
  • y = 1 → negative pair
    L = Σ_i [ (1 − y_i) · dist(f(x_1^i), f(x_2^i))² + y_i · max(0, m − dist(f(x_1^i), f(x_2^i)))² ]
  • i – goes over all the pairs in the batch
[Batch example: pair 3 is positive (y_3 = 0), pair 78 is negative (y_78 = 1)]
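A minimal PyTorch sketch of the PC loss above, assuming emb1 and emb2 hold the two embeddings of each pair and y marks negative pairs with 1 (all names and the margin value are illustrative, not prescribed by the slides):

```python
import torch

def pc_loss(emb1: torch.Tensor, emb2: torch.Tensor,
            y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pairwise contrastive loss.
    emb1, emb2: (P, M) embeddings of the two members of each pair
    y: (P,) 0 for positive pairs, 1 for negative pairs
    """
    dist = torch.norm(emb1 - emb2, p=2, dim=1)               # Euclidean distance per pair
    pos_term = (1.0 - y) * dist.pow(2)                       # pull positive pairs together
    neg_term = y * torch.clamp(margin - dist, min=0).pow(2)  # push negatives beyond the margin
    return (pos_term + neg_term).sum()

# Toy usage
emb1 = torch.randn(4, 8, requires_grad=True)
emb2 = torch.randn(4, 8, requires_grad=True)
y = torch.tensor([0., 1., 0., 1.])
loss = pc_loss(emb1, emb2, y)
loss.backward()
```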
Triplet loss
• Creating triplets (Anchor, Positive, Negative) from a batch
  • (Anchor, Positive) – carry the same class label
  • (Anchor, Negative) – carry different class labels
• Different mining strategies with different computational properties:
  • Negative-hard mining
  • Negative semi-hard mining

Triplet loss – creating triplets
• For every pair of samples having the same class label:
  • One is selected as Anchor, the other as Positive: (Anchor, Positive)
• For every (Anchor, Positive) pair:
  • Negative objects are identified – objects with a different class than Anchor/Positive
    • n1 (hard negative) – must be pushed further out
    • n2 (semi-hard negative)
    • n3 (easy negative)

Triplet loss – determining negatives
• Criteria for dividing the set of negatives:
  • Hard-negative mining (n ~ n1): negatives closer to the anchor than the positive
    dist(a, n) < dist(a, p)
  • Semi-hard negative mining (n ~ n2): negatives falling within the margin δ
    dist(a, p) < dist(a, n) < dist(a, p) + δ
  • Easy negatives (n ~ n3): negatives lying beyond the margin
    dist(a, p) + δ < dist(a, n)
• Suitability of negatives for training:
  • Only easy negatives – insignificant role in learning, if any at all
  • Only hard negatives – the network:
    • Converges to a local minimum at best, or
    • Collapses to a state in which all the embeddings are zero
  • The most suitable negatives for training are the semi-hard ones

Triplet loss – determining negatives
• Semi-hard negative mining (n2)
  • Negatives sufficiently close to the anchor
• The margin δ has to be set carefully:
  • Appropriate value – the network correctly distinguishes between positive/negative samples
  • Too large or too small a value – the network optimization process may get stuck in a local minimum
• To set the margin properly, all embeddings are normalized to unit length
  • The margin is typically set to δ = 0.2

Triplet loss – calculation
• List of triplets (x_i^a, x_i^p, x_i^n) of a batch of cardinality N:
    L = Σ_{i=1}^{N} max(0, dist²(f(x_i^a), f(x_i^p)) − dist²(f(x_i^a), f(x_i^n)) + δ)
• No contributions from the negatives that are outside the margin (max returns 0)
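A minimal sketch of the triplet loss above, assuming the triplets have already been mined and the embeddings are L2-normalized; the function and tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, delta: float = 0.2):
    """Triplet loss with squared Euclidean distances.
    anchor, positive, negative: (N, M) L2-normalized embeddings."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # dist^2(a, p)
    d_an = (anchor - negative).pow(2).sum(dim=1)   # dist^2(a, n)
    return torch.clamp(d_ap - d_an + delta, min=0).sum()

# Toy usage with unit-length embeddings
a = F.normalize(torch.randn(16, 128), dim=1)
p = F.normalize(torch.randn(16, 128), dim=1)
n = F.normalize(torch.randn(16, 128), dim=1)
print(triplet_loss(a, p, n).item())
```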
Results on CIFAR-10
• Training on the CIFAR-10 dataset
  • Contrastive loss learning: Precision@1 = 74%
  • Triplet loss learning: Precision@1 = 84%

Coding issues
• Goal – determine pairs/triplets → calculate pairwise distances
• Variables:
  • Batch of size B
  • Dimensionality of embeddings: M
  • Embeddings-data array X of shape B × M
  • Labels of embeddings: B × 1
• Easy solution – iterative processing (for-loops) to determine the distances
  • The cost of iterative processing is simply too high
• GPU solution – matrix multiplications (GPUs are not friends with for-loops)
  • A thousand-fold speedup when you eliminate the loops that would otherwise be needed for the distance estimation

Coding issues
• Easy solution – iterative processing
[Code snippet shown on the slide]

Coding issues
• GPU solution – matrix multiplications
  • Implementation based on tensors
  • With iterative processing: if the batch size is 128, this results in 8,192 fetches from the GPU memory (128²/2)
  • Solution using a single GPU fetch → >8,000x speedup
• 1) Determining pairs using the structure with labels (of dimensionality B):
[Code snippet shown on the slide]

Coding issues
• GPU solution – matrix multiplications
• 2) Calculating the distance matrix:
  • Squared Euclidean distance between vectors x and y: ‖x − y‖² = ‖x‖² − 2·x·yᵀ + ‖y‖²
    • The squared norms of the two vectors
    • The value of the dot product between the two vectors
  • Calculating the dot product of every pair of embedding vectors in X:
    dot_products = X @ X.T
  • The squared vector norms of the embedding vectors lie on the diagonal:
    squared_norms_embedding_vecs = torch.diagonal(dot_products)
  • Dot products between pairs of embedding vectors are in the off-diagonal elements
  • Calculating the B × B matrix of (squared) Euclidean distances (B = 6):
    distance_matrix = squared_norms_embedding_vecs.view(1, 6) − 2.0 * dot_products + squared_norms_embedding_vecs.view(6, 1)
  • Note: tensor operators are overloaded (broadcasting) to add or subtract tensors of different shapes; a runnable sketch of both steps follows below
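A runnable PyTorch sketch of the two GPU-side steps above – pair masks from the labels and the squared-distance matrix via broadcasting. The variable names follow the slides where possible; the rest (same_label, positive_mask, …) are illustrative assumptions:

```python
import torch

B, M = 6, 128
X = torch.randn(B, M)                      # batch of embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2])  # class label per embedding

# 1) Determining pairs: (B, B) boolean masks, self-pairs excluded
same_label = labels.unsqueeze(0) == labels.unsqueeze(1)
not_self = ~torch.eye(B, dtype=torch.bool)
positive_mask = same_label & not_self
negative_mask = ~same_label

# 2) Distance matrix: ||x - y||^2 = ||x||^2 - 2*x.y + ||y||^2
dot_products = X @ X.T
squared_norms_embedding_vecs = torch.diagonal(dot_products)
distance_matrix = (squared_norms_embedding_vecs.view(1, B)
                   - 2.0 * dot_products
                   + squared_norms_embedding_vecs.view(B, 1))
distance_matrix = distance_matrix.clamp(min=0)   # guard against tiny negative round-off

# Example: mean squared distance within positive and within negative pairs
print(distance_matrix[positive_mask].mean(), distance_matrix[negative_mask].mean())
```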
Image embeddings – what we know
• Training a neural network based on PC/triplet loss learning
• Transforming the query and each database image into an embedding
  • Mapping function f() extracting the embedding f(x) for an object x
• Given two images x and y, their closeness can be quantified by the Euclidean distance:
    dist(f(x), f(y)) = ‖f(x) − f(y)‖₂
[Figure: a neural network mapping x to f(x), e.g., (0.2, 0.9, …, 0.7)]

Search over image embeddings
• Query image ~ query embedding ~ query
• k-nearest neighbor (k-NN) query
  • Finding the k database images that are the most similar to the query image
  • Similarity between the query and database images based on the Euclidean/cosine distance between their embeddings
[Figure: search pipeline – image database → image representation → index; query → query representation → similarity → ranked images, sorted by decreasing similarity]

Search over image embeddings
• Many applications of image search, e.g.:
  • Photo organization – grouping images into albums automatically by recognizing similar faces, locations, or objects
  • Fashion and e-commerce – outfit pairing in online stores
    • Recommending matching items (e.g., shoes, bags, or accessories) by comparing product images based on style, color, and texture
    • Visual similarity in product search – enabling users to upload an image of a product to find similar items in the store
  • Cultural heritage and art preservation
    • Artifact identification – comparing images of newly found artifacts with existing ones to determine origin or classification
    • Style similarity matching – finding paintings with similar styles for study or curation
  • Duplicate image detection – removing similar or exact copies of an image in large databases to ensure unique content

Search over large image databases
• Brute-force approach:
  • Comparing the query embedding against each database embedding
  • O(N) complexity (N is the size of the dataset) – not scalable
• How to search when the database of embeddings is very large?
• Solution – Approximate Nearest Neighbor (ANN) algorithms
• Many algorithms exist, but the leading ones are:
  • Locality Sensitive Hashing (LSH) – PA212 course
  • Product Quantization (PQ)

Product Quantization (PQ)
• PQ – an extension of the very old Vector Quantization (VQ) idea
• VQ idea:
  • Compressing a high-dimensional image vector (embedding) into a single codeword, typically an integer value
• Pre-processing phase:
  • Create a mapping from all codewords to the original image vectors within the database
  • Each codeword points to the list of all the database images with that codeword
  • This mapping structure is referred to as the lookup table or inverted index
• Search phase:
  • Transform the query into a single codeword
  • Use the lookup table to get the candidate image vectors having the same codeword as the query
    • It is also possible to consider candidate vectors whose codeword is "relevant" (not strictly the same)
  • Compute the distance between the original query vector and all candidate vectors
  • Return the k nearest candidate vectors with respect to the query vector
[Figure: a vector (0.2, 0.9, …, 0.7) quantized to codeword 4; an inverted index with codewords 1–4 pointing to lists of DB vectors]

VQ
• Compression idea:
  • Partitioning a vector space into Voronoi cells with respect to a set of points
  • The points are typically called the centroids (pivots) of the cells in which they reside
• Voronoi diagram – partitioning the D-dimensional space of embeddings into cells {c1, c2, …, cn} determined by pivots {p1, p2, …, pn}; each cell ci gets its ID ~ codeword
• All the vectors in cell ci are closer to the pivot pi than to any other pivot
[Figure: an example Voronoi diagram for four pivots {p1, p2, p3, p4} in the 2D plane, cells labeled 1–4]

VQ
• Pre-processing phase:
  • Creating a codebook of n = 2^B codewords
    • The n centroids (~cells) are typically the K centroids generated by applying the K-means algorithm to (a sample of) the database of vectors
    • Any vector in the underlying vector space can be quantized to one of the K cluster centers
    • Vectors that fall in the same cluster are mapped to the same codeword
    • Clusters can be populated differently, depending on the data distribution
    • Each codeword is represented by an integer value → B-bit representation
  • Managing a lookup table with 2^B codewords
    • Transforming each database image vector into a codeword
    • Each codeword in the lookup table is associated with the list of database images (or paths to these images) having that codeword
• The dimensionality of the data can be reduced dramatically – example scenario:
  • Each image represented by a 512-D embedding vector of floats → 512 · 4 = 2,048 bytes
  • B = 16 → 512-D vectors quantized to 2^16 = 65,536 codewords
  • Each image then represented by a 16-bit code (2 bytes) → 1,024x compression
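A minimal sketch of the VQ pre-processing and search phases under toy sizes. It uses FAISS only for its k-means implementation (a plain scikit-learn KMeans would work the same way); all names and parameter values are illustrative assumptions, not from the slides:

```python
import numpy as np
import faiss

d, n_db, B = 64, 20_000, 8              # toy sizes; B bits -> 2^B codewords
n_codewords = 2 ** B

rng = np.random.default_rng(0)
xb = rng.standard_normal((n_db, d)).astype('float32')   # database embeddings

# Pre-processing: build the codebook with k-means and quantize the database
kmeans = faiss.Kmeans(d, n_codewords, niter=20, seed=123)
kmeans.train(xb)
_, codes = kmeans.index.search(xb, 1)    # nearest centroid id per vector
codes = codes.ravel()                    # one B-bit codeword per database vector

# Inverted index (lookup table): codeword -> ids of database vectors in that cell
inverted_index = {c: np.where(codes == c)[0] for c in range(n_codewords)}

# Search: quantize the query, then compare it only against candidates in its cell
xq = rng.standard_normal((1, d)).astype('float32')
_, q_code = kmeans.index.search(xq, 1)
candidates = inverted_index[int(q_code[0, 0])]
dists = np.linalg.norm(xb[candidates] - xq, axis=1)
topk = candidates[np.argsort(dists)[:5]]
print(topk)   # ids of the 5 nearest candidates within the query's cell
```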
VQ
• Search phase:
  • Before the query is transformed into a codeword, a ranking of the pivots is created
    • The distance between the query vector and each pivot vector must be calculated
  • The query gets the codeword corresponding to the nearest (best-ranked) pivot
    • Other "query-relevant" codewords can also be considered, e.g., the 2nd or 3rd best-ranked ones
  • The database vectors associated with the same codeword as the query (or with any of the "query-relevant" codewords) become candidates
  • The distance between the query vector and each candidate is evaluated
    • Each candidate vector must be loaded, e.g., from secondary storage
  • The k most similar candidates (those with the smallest distance) are returned as the query result
[Figure: illustration of query evaluation – the query vector falls into cell 4, so the candidates are the database vectors associated with codeword 4]

VQ
• Limitations of VQ:
  • If the codebook is too large (e.g., B = 64 → 2^64 clusters would be needed), it is impossible for K-means to detect such a huge number of clusters
  • Returning the k most query-relevant database vectors requires loading a large set of candidate vectors and calculating their distances with respect to the query

PQ
• Product quantization (PQ) idea:
  • The original database vector is split into several segments (sub-vectors)
  • Each sub-vector undergoes vector quantization independently
  • Quantized vector – the concatenation of the codewords of the individual segments
• Pre-processing phase:
  • Create a sub-quantizer – a codebook for each of the segments – separately (i.e., #codebooks = #segments)
    • A clustering operation is applied to each set of sub-vectors
  • Create a mapping from all codewords of each codebook to the original image vectors within the database
• Search phase:
  • The original database vectors need not be loaded (e.g., from secondary storage)
  • The distance between the query and a database vector is efficiently approximated
    • Based on pre-computed distances between the query segments and the centroids in each codebook
[Figure: a vector split into segments with codewords 2, 8, 9, 16]

PQ
• Pre-processing phase:
  • Creating m segments → m codebooks, each of n = 2^B codewords
    • Each codeword is again represented by an integer value → B-bit representation
    • The codewords of all segments are concatenated → (m · B)-bit representation
  • Managing a lookup table with 2^B codewords for each segment (m lookup tables)
• Example scenario:
  • An image as a 512-D vector of floats → 512 · 4 = 2,048 bytes
  • m = 32 → 32 segments (codebooks)
  • B = 8 → 16-D sub-vectors quantized to 2^8 = 256 codewords
  • Each image then stored as a (32 · 8)-bit code (32 bytes) → 64x compression
[Figure: a vector split into m = 4 segments with codewords 2, 8, 9, 16; each segment has its own Voronoi diagram with n = 4 (B = 2) pivots – codewords 1–4 for segment 1, 5–8 for segment 2, 9–12 for segment 3, 13–16 for segment 4]

PQ
• Search phase:
  • 1) Precompute the distances between each query sub-vector and each centroid in the corresponding codebook → m · 2^B distances in total
    • The distances are kept in a query distance matrix QDM of size m × 2^B
    • E.g., 32 · 256 = 8,192 distances in the previous example scenario, which is cheap to compute
• Illustration of the distance precomputation for m = 4, B = 2 (QDM of size m × 2^B = 4 × 4); rows correspond to segments, columns to the codewords of the corresponding codebook:
    QDM = | 0.4  0.2  0.6  0.3 |
          | 0.6  0.4  0.4  0.1 |
          | 0.2  0.4  0.6  0.7 |
          | 0.4  0.3  0.2  0.1 |
[Figure: the query vector split into m = 4 segments with codewords 2, 8, 9, 16, shown against the per-segment Voronoi diagrams from above]

PQ
• Search phase:
  • 2) Identify the nearest cluster(s) in the same way as in the VQ approach
  • 3) Approximate the distance of each candidate vector in the nearest cluster(s):
    • For the i-th segment and the associated codeword c_i of the candidate vector, approximate the sub-distance between the i-th query segment and the i-th candidate segment by the precomputed sub-distance QDM[i, c_i]
    • Sum the sub-distances over all the segments: Σ_{i=0}^{m−1} QDM[i, c_i]
• Illustration of approximating the distance for the query with segment codewords (2, 8, 9, 16):
  • Candidate (2, 8, 9, 15): Σ_{i=0}^{3} QDM[i, c_i] = 0.2 + 0.1 + 0.2 + 0.2 = 0.7
  • Candidate (2, 8, 12, 15): 0.2 + 0.1 + 0.7 + 0.2 = 1.2
  • Candidate (1, 8, 9, 16): 0.4 + 0.1 + 0.2 + 0.1 = 0.8
  • (A runnable sketch of this approximation follows below)
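A minimal NumPy sketch of the PQ distance approximation described above. The per-segment codebooks are assumed to have been trained already (here they are just random placeholders), and all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, B, sub_d = 4, 2, 16                 # segments, bits per segment, sub-vector dimension
n_codewords = 2 ** B
D = m * sub_d                          # full vector dimensionality

# Assumed to be already trained (one k-means per segment): shape (m, 2^B, sub_d)
codebooks = rng.standard_normal((m, n_codewords, sub_d)).astype('float32')

def encode(x):
    """Quantize a vector into m codeword ids (one per segment)."""
    segs = x.reshape(m, sub_d)
    return np.array([np.argmin(((codebooks[i] - segs[i]) ** 2).sum(axis=1))
                     for i in range(m)])

def approx_distances(query, codes):
    """Approximate squared distances between a query and PQ-encoded vectors."""
    q_segs = query.reshape(m, sub_d)
    # 1) QDM: squared distance from each query segment to each centroid of its codebook
    qdm = ((codebooks - q_segs[:, None, :]) ** 2).sum(axis=2)   # shape (m, 2^B)
    # 3) Sum the looked-up sub-distances over all segments
    return sum(qdm[i, codes[:, i]] for i in range(m))

# Toy usage: encode a small database, then rank it for one query
xb = rng.standard_normal((1000, D)).astype('float32')
codes = np.vstack([encode(v) for v in xb])          # (1000, m) codeword ids
xq = rng.standard_normal(D).astype('float32')
approx = approx_distances(xq, codes)
print(np.argsort(approx)[:5])                       # ids of the 5 nearest candidates
```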
PQ
• Summary:
  • For the distance approximations, the candidate vectors are not accessed
    • Accessing the candidate vectors can be a bottleneck in VQ, especially when the vectors are stored in secondary storage
  • ANN search can be carried out efficiently in high-dimensional vector spaces even when the database has billions of vectors
  • The approximated distances need not be perfect

Similarity search using FAISS
• FAISS – Facebook AI Similarity Search
  • A library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors
  • Useful for large-scale similarity search problems, which are common in various machine learning and information retrieval tasks
  • Designed to work on either the GPU or the CPU and provides significant performance improvements compared to other nearest neighbor search algorithms
  • One of the best implementations of the Product Quantization approach to similarity search
  • Implemented in C++ with Python bindings

FAISS
• Several techniques to achieve efficient similarity search:
  • Quantization – compresses the embeddings, which significantly reduces memory usage and accelerates distance computations
    • Supports Product Quantization (PQ)
  • Indexing – FAISS provides multiple index types for different use cases and trade-offs between search speed and search quality
    • Flat index – a brute-force index that computes exact distances between query vectors and indexed vectors
    • IVF (Inverted File) index – a partitioned index that divides the vector space into Voronoi cells
    • HNSW (Hierarchical Navigable Small World) – a graph-based index that builds a hierarchical graph structure, enabling efficient nearest neighbor search with logarithmic complexity

FAISS
• Example of the IVF (Inverted File) index:
  • The nprobe parameter specifies the number of nearest cluster(s) to be visited
  • Clusters are ranked by the distance between the cluster centroid and the query vector
[Figure: index selection guidelines]
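A minimal usage sketch combining the IVF partitioning with PQ compression via faiss.IndexIVFPQ; the data is random and all sizes and parameter values (nlist, m, nbits, nprobe) are illustrative choices, not recommendations from the slides:

```python
import numpy as np
import faiss

d, n_db, n_queries = 128, 20_000, 5
rng = np.random.default_rng(0)
xb = rng.standard_normal((n_db, d)).astype('float32')      # database embeddings
xq = rng.standard_normal((n_queries, d)).astype('float32')  # query embeddings

nlist = 256          # number of Voronoi cells (IVF partitions)
m, nbits = 16, 8     # PQ: 16 segments, 8 bits each -> 16 bytes per vector

quantizer = faiss.IndexFlatL2(d)                  # coarse quantizer for the IVF part
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)      # learn the coarse centroids and the per-segment PQ codebooks
index.add(xb)        # encode and store the database vectors

index.nprobe = 8     # number of nearest cells visited at query time
D, I = index.search(xq, 10)   # approximate 10-NN for each query
print(I[0])          # ids of the 10 approximate nearest neighbors of the first query
```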
FAISS
• Useful references
  • Tutorials:
    • https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
    • https://github.com/facebookresearch/faiss/wiki/
  • Research papers:
    • Johnson et al.: Billion-scale similarity search with GPUs, 2017: https://arxiv.org/abs/1702.08734
    • Douze et al.: The FAISS library, 2024: https://arxiv.org/abs/2401.08281

Sources
• Avi Kak and Charles Bouman: Metric Learning with Deep Neural Networks. Purdue University, 2024
• https://www.pinecone.io/learn/series/faiss/faiss-tutorial/