Mikuláš Bankovič
Researcher

Geographic information retrieval
(POI - Point Of Interest)
● searching for POIs
● suggesting POIs
● routing between 2 POIs

POI - Point Of Interest
● street, municipality, country
● mountains, rivers, canyons, beaches
● stores, merchandise, restaurants, etc.
● public transport, special events, vaccination centres, etc.

General Problems
● Ambiguous POIs
● Category + locality
● Personalization
● Diversity in SERP (search engine result page)

POI characteristics
● Name - string
● Address - string
● Coordinates - (float, float)
● Geometry - area, diagonal
● Popularity - float
● Importance - float
● POI type - category
● Aliases - List[string]

Previous suggester ranking
● hand-crafted equation with a lot of magic constants
● equation components:
  ○ popularity
  ○ importance
  ○ simple text score (based on matches with name, address or category)
  ○ distance
  ○ zoom
  ○ category relevance
  ○ category distance coefficient

Nowadays - Learning to Rank
● Pointwise - predict a real number for each POI independently
● Pairwise - classify a pair of POIs: better/worse
● Listwise - optimize the list as a whole

CatBoost
● https://catboost.ai/
● hand-crafted signals (features)
● not so easy to beat the equation on metrics

Datasets
● 15 000 annotations
● millions of logs in history (label 1 if the user clicked on the POI, 0 otherwise)

Suggester signals
● Query context
  ○ language
  ○ zoom
  ○ time signals
● POI properties
  ○ area
  ○ diagonal
  ○ importance
  ○ popularity
  ○ POI type
● POI and query relationship
  ○ distance
  ○ simple text score
  ○ prefix text score
  ○ previous ranking value

Evaluation
Metric: TOP1

                     Random Queries   Frequent Queries   LongTail Queries
Original equation    77%              98%                82%
CatBoost model       81%              97%                86%

● Differently filtered logs: longtail, frequent, categories
● AB tests - small improvement in some parts, globally not visible

Personalization (Baidu Maps)
● word2vec signals for CatBoost
● neural net signals for CatBoost
● end-to-end neural net
● all-in-one CatBoost

W2V
● train and learn an embedding for each POI (only from its ID)
● each user has a matrix U, built from embeddings of POIs in their history
● for each ranked POI candidate, multiply its embedding by the user matrix U to get a personalized embedding

User matrix U
● the most common POI embedding
● the second most common POI embedding
● the third most common POI embedding
● the freshest POI embedding
● the second freshest POI embedding

Training
● skip-gram
● user_id -> poi1_id, poi2_id, poi3_id
● fastText, word2vec

Neural network signals
Create an embedding (Bi-LSTM, sentence CNN, transformer) from:
● id
● name
● address
● category
Skip-gram training with negative sampling.

End-To-End
● Embeddings for name, address, location, category type
● Embeddings for the personalized query
● Connected through a triplet loss function - a similar idea of pushing the clicked POI embedding closer to the personalized query prefix

Contact
Mikuláš Bankovič
Research
E-mail: mikulas.bankovic@firma.seznam.cz
Tel.: +420735507244

Mapy
● Learning-to-rank
● Personalized embeddings for POIs - WIP
● End-to-end neural network - Future
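
Appendix sketch: W2V personalization signals
The W2V slides describe multiplying a candidate POI's embedding by a per-user matrix U built from the user's POI history. The following is only a minimal sketch of that idea, not the production implementation. It assumes that the rows of U are the three most common and the two freshest POI embeddings from the history, that the product is a plain matrix-vector product, and that the resulting values would be handed to the ranker as extra signals. The function names, the toy 2-dimensional embeddings, and the POI ids are all hypothetical.

```python
from collections import Counter


def build_user_matrix(history, embeddings, n_common=3, n_fresh=2):
    """Stack POI embeddings from a user's history into the matrix U.

    history: POI ids the user clicked, oldest first (assumed format).
    embeddings: dict mapping POI id -> embedding vector (list of floats).
    Row order (an assumption): most common POIs first, then freshest POIs.
    """
    common = [poi for poi, _ in Counter(history).most_common(n_common)]

    fresh = []
    for poi in reversed(history):  # walk from the newest click backwards
        if poi not in fresh:
            fresh.append(poi)
        if len(fresh) == n_fresh:
            break

    return [embeddings[poi] for poi in common + fresh]


def personalized_signals(user_matrix, candidate):
    """Matrix-vector product U * candidate: one dot product per row of U."""
    return [sum(u * c for u, c in zip(row, candidate)) for row in user_matrix]


# Toy data: in practice the embeddings would come from skip-gram training.
toy_embeddings = {
    "cafe": [1.0, 0.0],
    "pub": [0.0, 1.0],
    "park": [1.0, 1.0],
}
toy_history = ["cafe", "pub", "cafe", "park", "cafe", "pub"]

U = build_user_matrix(toy_history, toy_embeddings)
signals = personalized_signals(U, candidate=[0.5, 2.0])
print(signals)
```

Each value in `signals` measures how well the candidate aligns with one slot of the user's history, so a gradient-boosted ranker can weigh frequent and recent behaviour separately instead of receiving a single similarity score.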