Mikuláš Bankovič
Researcher
Geographic information retrieval
POI - Point Of Interest
● searching POI
● suggesting POI
● routing between 2 POIs
● street, municipality, country
● mountains, rivers, canyons, beaches
● stores, merchandise, restaurants, etc.
● public transport, special events, vaccination centres, etc.
General Problems
● Ambiguous POIs
● Category + locality
● Personalization
● Diversity in SERP (search engine results page)
POI characteristics
● Name - string
● Address - string
● Coordinates - (float, float)
● Geometry - area, diagonal
● Popularity - float
● Importance - float
● POI type - category
● Aliases - List[string]
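A minimal sketch of how these characteristics could be held in one record. The class name `Poi`, the field names, the coordinate order, and the sample values are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Poi:
    """Hypothetical container mirroring the POI characteristics listed above."""
    name: str
    address: str
    coordinates: Tuple[float, float]  # assumed order: (latitude, longitude)
    area: float                       # geometry: area (units are an assumption)
    diagonal: float                   # geometry: diagonal (units are an assumption)
    popularity: float
    importance: float
    poi_type: str                     # category label
    aliases: List[str] = field(default_factory=list)


# Made-up example values, for illustration only.
example = Poi(
    name="Example Cafe",
    address="Example Street 1, Example Town",
    coordinates=(50.0, 14.0),
    area=120.0,
    diagonal=16.0,
    popularity=0.4,
    importance=0.2,
    poi_type="restaurant",
    aliases=["Cafe Example"],
)
```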
Previous suggester ranking
● hand-crafted equation and a lot of magic constants
● equation components:
○ popularity
○ importance
○ simple text score (based on matches with name, address or category)
○ distance
○ zoom
○ category relevance
○ category distance coefficient
Nowadays - Learning to Rank
● Pointwise - predict real number for each POI independently
● Pairwise - classify two POIs: better/worse
● Listwise - optimize list as a whole
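The three families differ mainly in the loss they optimize. The sketch below shows one common loss per family for a single query; the specific choices (squared error, logistic pair loss, softmax cross-entropy) and all function names are assumptions for illustration, not necessarily what the suggester uses.

```python
import math
from typing import List


def pointwise_loss(scores: List[float], labels: List[float]) -> float:
    """Mean squared error: each POI is scored independently of the others."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores: List[float], labels: List[float]) -> float:
    """Logistic loss averaged over pairs (i, j) where POI i has the higher label."""
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0


def listwise_loss(scores: List[float], labels: List[float]) -> float:
    """Softmax cross-entropy between normalized labels and the score distribution."""
    total = sum(labels)
    if total == 0:
        return 0.0
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum((y / total) * (s - log_z) for s, y in zip(scores, labels))


# One query, three candidate POIs, the first one was clicked.
labels = [1.0, 0.0, 0.0]
good_scores = [2.0, 0.0, -1.0]   # clicked POI ranked first
bad_scores = [-1.0, 0.0, 2.0]    # clicked POI ranked last
```

All three losses are lower for `good_scores` than for `bad_scores`; they differ in whether the gradient sees one POI, a pair, or the whole list at once.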
CatBoost
● https://catboost.ai/
● hand-crafted signals (features)
● not easy to beat the hand-crafted equation on metrics
Datasets
● 15 000 annotations
● millions of historical log entries (label 1 if the user clicked on the POI,
0 otherwise)
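A minimal sketch of the click labeling described above. The `LogEntry` record and its fields are hypothetical; the real log format is not shown in these slides.

```python
from typing import List, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    """Hypothetical log record: one suggest request and the click that followed, if any."""
    query: str
    shown_poi_ids: List[str]
    clicked_poi_id: Optional[str]


def label_log(entry: LogEntry) -> List[Tuple[str, str, int]]:
    """Return (query, poi_id, label) rows: label 1 for the clicked POI, 0 otherwise."""
    return [
        (entry.query, poi_id, int(poi_id == entry.clicked_poi_id))
        for poi_id in entry.shown_poi_ids
    ]


# Made-up request: three POIs were shown, the second one was clicked.
rows = label_log(LogEntry("pra", ["poi_1", "poi_2", "poi_3"], "poi_2"))
```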
Suggester signals
● Query context
○ language
○ zoom
○ time signals
● POI properties
○ area
○ diagonal
○ importance
○ popularity
○ POI type
● POI and query relationship
○ distance
○ simple text score
○ prefix text score
○ previous ranking value
Evaluation
Metric: TOP1        Random queries   Frequent queries   Long-tail queries
Original equation   77%              98%                82%
CatBoost model      81%              97%                86%
● Differently filtered logs: long-tail, frequent, categories
● A/B tests: small improvement in some segments, not visible
globally
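A minimal sketch of a TOP1-style metric. The slides do not define TOP1, so the definition here is an assumption: the share of queries whose first-ranked POI is the relevant (clicked or annotated) one. Function and variable names are illustrative.

```python
from typing import Dict, List


def top1_accuracy(rankings: Dict[str, List[str]], relevant: Dict[str, str]) -> float:
    """Share of queries whose first-ranked POI equals the relevant POI (assumed definition)."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for query, ranked in rankings.items()
        if ranked and ranked[0] == relevant.get(query)
    )
    return hits / len(rankings)


# Made-up rankings for four queries; three of them put the relevant POI first.
rankings = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e", "f"], "q4": ["g"]}
relevant = {"q1": "a", "q2": "d", "q3": "e", "q4": "g"}
score = top1_accuracy(rankings, relevant)
```

Running the same function on separately filtered query sets (random, frequent, long-tail) would give one number per column.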
Personalization
Personalization (Baidu Maps)
● word2vec signals for CatBoost
● neural net signals for CatBoost
● end-to-end neural net
● all-in-one CatBoost
W2V
● learn an embedding for each POI (from its ID only)
● each user has a matrix U, made of embeddings of POIs from the
user's history
● for each POI candidate being ranked, multiply its embedding
by the user matrix U to get a personalized embedding
Embeddings in the user matrix U:
● the most common POI embedding
● the second most common POI embedding
● the third most common POI embedding
● the freshest POI embedding
● the second freshest POI embedding
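A minimal sketch of the multiplication step, assuming the user matrix U stores one history POI embedding per row and the candidate embedding has the same dimension. The function name and the toy 2-dimensional vectors are made up for illustration.

```python
from typing import List

Vector = List[float]


def personalize(user_matrix: List[Vector], candidate: Vector) -> Vector:
    """Multiply the user matrix U (assumed: one history embedding per row) by a candidate.

    Each output component is the dot product of one history row with the candidate.
    """
    return [sum(u * c for u, c in zip(row, candidate)) for row in user_matrix]


# Toy user matrix with three history embeddings (made-up values).
U = [
    [1.0, 0.0],   # e.g. the most common POI embedding
    [0.0, 1.0],   # e.g. the second most common POI embedding
    [0.5, 0.5],   # e.g. the freshest POI embedding
]
candidate = [1.0, 0.0]
signals = personalize(U, candidate)
```

Under this assumption the resulting components could be passed to the ranker as additional personalized signals, one per row of U.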
Training
● skipgram
● user_id -> poi1_id, poi2_id, poi3_id
● fasttext, word2vec
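A minimal sketch of how skip-gram training pairs can be derived from one user's POI history, treating the history as a "sentence" of POI IDs. The window size and function name are assumptions; tools such as fasttext or word2vec generate these pairs internally.

```python
from typing import List, Tuple


def skipgram_pairs(history: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) POI ID pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(history):
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, history[j]))
    return pairs


# One user's history, in the order the POIs were visited (IDs from the slide).
pairs = skipgram_pairs(["poi1_id", "poi2_id", "poi3_id"], window=1)
```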
Neural network signals
Create embedding (Bi-LSTM, sentence CNN, transformer):
● id
● name
● address
● category
Skip-gram training with negative sampling.
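A minimal sketch of the skip-gram negative-sampling loss for a single (center, context) pair. It uses the standard formulation, with plain float lists standing in for the encoder outputs; the function names and toy vectors are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_loss(center: Vector, context: Vector, negatives: List[Vector]) -> float:
    """-log s(c.o) - sum_k log s(-c.n_k), where s is the sigmoid.

    Pulls the observed context towards the center and pushes sampled negatives away.
    """
    loss = -log_sigmoid(dot(center, context))
    for negative in negatives:
        loss -= log_sigmoid(-dot(center, negative))
    return loss


# Toy vectors: in the first case the context agrees with the center and the
# negative points away; in the second case the roles are swapped.
aligned = sgns_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
misaligned = sgns_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```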
End-To-End
● Embeddings for name, address, location, category type
● Embeddings for personalized query
● Connected through a triplet loss function: the same idea of
pushing the clicked POI embedding closer to the personalized
query prefix
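A minimal sketch of a triplet loss with the personalized query embedding as anchor, the clicked POI as positive, and a non-clicked POI as negative. Euclidean distance, the margin value, and all names are assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (made-up values).
query = [0.0, 0.0]          # personalized query prefix embedding (anchor)
clicked_poi = [0.0, 1.0]    # distance 1 from the query
other_poi = [3.0, 4.0]      # distance 5 from the query

satisfied = triplet_loss(query, clicked_poi, other_poi)   # clicked POI already closer
violated = triplet_loss(query, other_poi, clicked_poi)    # roles swapped
```

The loss is zero once the clicked POI is closer to the query than the other POI by at least the margin, so training only moves embeddings that still violate that ordering.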
Mikuláš Bankovič
Research
E-mail: mikulas.bankovic@firma.seznam.cz
Tel.: +420735507244
Contact
Mapy
● Learning-to-rank
● Personalized embeddings for POIs - WIP
● End-to-end neural network - Future