Hiking is the Best Hobby for Research
Rating Inference for Custom Trips from Enriched GPS Traces
Lasaris Seminar, November 23, 2023
Mouzhi Ge
Deggendorf Institute of Technology, Germany
mouzhi.ge@th-deg.de

Agenda
• Motivation
• Definition and background
• Problem statement and scope
• Similarity-based trip rating inference framework
• ML-based trip rating inference framework
• Experimental settings and results
• Key take-aways

The real motivation (hobby-driven research)

Research motivation
• GPS-enabled devices allow us to pinpoint our location and generate large amounts of data that trace our movements along trips.
• Custom trips are designed to cater to travelers' specific desires and preferences, offering a personalized tourism experience.
• Since a custom trip is usually new to the system, no rating can be shown to the user. As a result, rating inference for custom trips has emerged as an important feature in tourism applications and location-based services.
• This paper aims to determine which representation of the trip data, when fed to machine learning algorithms, achieves higher accuracy for rating inference.
• Apart from trip recommendations, the rating inference in this paper can be considered a second opinion on custom trips defined by users.

Custom trip
• Closeness to POIs
• Closeness to places where users take pictures

Enriched GPS traces along the trip
• Trip location
• Trip elevation

Multi-criteria ratings for trips
• Multi-criteria ratings consider different factors simultaneously. For example, a hiking route can be rated on various attributes, such as Condition, Difficulty, Technique, Quality of Experience, and Landscape.
[Figure: example ratings for Condition, Difficulty, Technique, Quality of Experience, and Landscape]

Problem statement and scope
• The user designs a custom trip.
• The trip contains enriched GPS traces.
• The trip can be rated on several different criteria.
• We want to infer (predict) the rating of each criterion for this custom trip.

Our similarity-based solution (proposed in 2019)
Theodoros Chondrogiannis, Mouzhi Ge: Inferring ratings for custom trips from rich GPS traces, LocalRec at 27th ACM SIGSPATIAL, Chicago, Illinois, USA, November 5, 2019.

[Figures: Hiking routes; Hiking routes with overlaps]

Recap and intuition
• Users design their own routes
• Applications
‣ hiking trails
‣ running/training routes
• Problem: what is the rating of such a route?
• Idea: consider the ratings of overlapping routes to infer a rating for the new route

Trip rating inference: similarity-based framework

Map matching
• Map all rated trips to segments of the underlying spatial network (preprocessing)
• Map the unrated trip to segments of the underlying spatial network (query processing)
GPS trace ⟨(x1, y1), …, (xn, yn)⟩ → list of edges ⟨e1, …, em⟩

Overlapping trip retrieval
• Overlap: how much of the query trip overlaps with an already rated trip
• Inverted index
‣ Edge e → list of trips that contain e
‣ Retrieval cost is linear in the size of the query trip

$$Ol(t_i, t_j) = \frac{\sum_{\forall e \in p(t_i) \cap p(t_j)} \ell(e)}{\ell(p(t_j))}$$

Rating inference (step 1)
• The rating of an edge e depends on
‣ the rating of each trip that crosses e
‣ the overlap of each trip that crosses e with t_q
• Edge rating inference:

$$R_t(e) = \sum_{\forall t_i \in T_e} r_i \cdot \frac{Ol(t_i, t_q)}{\sum_{\forall t_j \in T_e} Ol(t_j, t_q)}, \qquad T_e = \{\, t_i \mid t_i \in D \wedge e \in p(t_i) \,\}$$

‣ T_e is the set of trips that cross e

Rating inference (step 2)
• The rating of the trip is given by the weighted sum of the ratings of its edges
‣ E_q is the set of edges that have been rated in the previous step
• Note: our approach considers only segments that overlap with at least one existing trip.
$$r_q = \sum_{\forall e \in p(t_q)} R_t(e) \cdot \frac{\ell(e)}{\ell(E_q)}$$

[Figure: Outdooractive dataset]

Evaluation setup
• Hiking trails from Outdooractive
• Five attributes, each rated in [1, 6]: Condition, Difficulty, Landscape, Quality, Technique

Network    Nodes      Edges      Trips (all)   Trips (hiking)
Swabia     491213     630094     544           353
Austria    2484861    3033885    516           260
NE Italy   1467754    1884450    696           419
Bavaria    3045179    3928652    1346          754

The average overlap of each unrated trip with already rated trips was 48.6% for Swabia, 14.1% for Austria, 22.4% for NE Italy, and 15.5% for Bavaria.

[Figures: Experimental results (MAE); Experimental results (Accuracy)]

Our machine learning-based solution (proposed in 2023)
Theodoros Chondrogiannis, Mouzhi Ge: Rating Inference for Custom Trips from Enriched GPS Traces using Random Forests, LocalRec at 31st ACM SIGSPATIAL, Hamburg, Germany, November 13, 2023.

Trip rating inference: ML-based framework
We use machine learning for rating inference. However, the focus of this work is not model selection but feature engineering and the selection of encodings, since enriched GPS traces are complex.

Location encoder
• We first impose an 𝑛 × 𝑛 grid over the space defined by the minimum bounding rectangle of all traces.
• One-hot encoding
• 𝑍-order curve to assign IDs to the grid cells
• For the set of cell IDs the trip crosses, a vector that contains basic statistics, i.e., min, max, mean, and median values
• Histogram of 𝑛 buckets
[Figure: example grid with cells indexed 0 to 3 along each axis]

Altitude encoder
• A vector ⟨total ascent, total descent, minimum altitude, maximum altitude, standard deviation of the elevation profile⟩

POI distance encoder
• A combination of two vectors:
‣ Vector 1: distances to all POIs
‣ Vector 2: distances to a predefined set of POIs (the 𝑘 nearest POIs)

Geo-tagged images encoder
• A bit vector whose size equals the number of images in the image set 𝐼
• We set the bit associated with an image to 1 if the minimum distance between the trace and the image location is below a predefined threshold, e.g., 20 meters.
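Returning to the similarity-based framework, its retrieval and two-step inference can be sketched in a few lines. This is a minimal sketch, not the authors' implementation, and it makes several assumptions: trips are already map-matched to lists of distinct edge IDs (p(t)), edge lengths ℓ(e) are given in a dict, and each rated trip carries a single rating r_i for the criterion being inferred. All function names are hypothetical. Edges of the query that no rated trip crosses have no R_t(e) and are skipped, following the note on the step 2 slide, so the final sum effectively runs over E_q.

```python
from collections import defaultdict


def build_inverted_index(rated_trips):
    """Preprocessing: edge ID -> set of IDs of rated trips whose path contains it."""
    index = defaultdict(set)
    for trip_id, edges in rated_trips.items():
        for e in edges:
            index[e].add(trip_id)
    return index


def overlap(trip_edges, query_edges, length):
    """Ol(t_i, t_q): length of the edges shared with the query over the query's length."""
    shared = set(trip_edges) & set(query_edges)
    query_length = sum(length[e] for e in set(query_edges))
    return sum(length[e] for e in shared) / query_length


def infer_rating(query_edges, index, rated_trips, ratings, length):
    """Two-step rating inference for a map-matched query trip; None if nothing overlaps."""
    query = set(query_edges)
    # Retrieval: one index lookup per query edge collects every overlapping rated trip.
    candidates = set()
    for e in query:
        candidates |= index.get(e, set())
    ol = {t: overlap(rated_trips[t], query, length) for t in candidates}

    # Step 1: R_t(e) = sum over T_e of r_i * Ol(t_i, t_q) / sum over T_e of Ol(t_j, t_q).
    edge_rating = {}
    for e in query:
        crossing = index.get(e, set())  # T_e: rated trips that cross e
        if not crossing:
            continue  # no rated trip covers e, so e is not in E_q and is skipped
        norm = sum(ol[t] for t in crossing)
        edge_rating[e] = sum(ratings[t] * ol[t] for t in crossing) / norm

    if not edge_rating:
        return None
    # Step 2: length-weighted sum of the edge ratings, normalized by l(E_q).
    rated_length = sum(length[e] for e in edge_rating)
    return sum(r * length[e] for e, r in edge_rating.items()) / rated_length


if __name__ == "__main__":
    lengths = {"a": 100, "b": 100, "c": 200, "d": 100}  # edge lengths in meters
    trips = {"T1": ["a", "b"], "T2": ["b", "c"], "T3": ["d"]}
    ratings = {"T1": 5, "T2": 3, "T3": 1}
    index = build_inverted_index(trips)
    print(round(infer_rating(["a", "b", "c"], index, trips, ratings, lengths), 2))  # 3.7
```

In this toy example, T1 covers half of the query and T2 three quarters, so edge b, which both trips cross, gets a rating of (5 · 0.5 + 3 · 0.75) / 1.25 = 3.8, and the length-weighted trip rating is 1480 / 400 = 3.7. Trip T3 shares no edge with the query and is never retrieved.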
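The four encoders above can be sketched as plain feature functions. This is a minimal illustration under assumptions the slides do not state: a trace is a list of (x, y, altitude) points in a planar coordinate system in meters, 𝑛 is a power of two (so Z-order IDs fill 0 to n·n − 1), the histogram puts the Z-order IDs of the crossed cells into 𝑛 equal-width buckets, "𝑘 nearest POIs" is read as the 𝑘 smallest trace-to-POI distances, and distances are measured to the trace's sample points rather than to the polyline between them. All function names are hypothetical.

```python
import math
import statistics

# Assumed (hypothetical) trace format: list of (x, y, altitude) in planar meters.


def z_order(col, row):
    """Z-order (Morton) ID of a grid cell: interleave the bits of col and row."""
    z = 0
    for bit in range(16):
        z |= ((col >> bit) & 1) << (2 * bit)
        z |= ((row >> bit) & 1) << (2 * bit + 1)
    return z


def location_encodings(trace, mbr, n):
    """One-hot, statistics, and histogram encodings over an n x n grid on the MBR."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    min_x, min_y, max_x, max_y = mbr
    cells = set()
    for x, y, _ in trace:
        # Cells containing a trace point; segments between points are not rasterized.
        col = min(int((x - min_x) / (max_x - min_x) * n), n - 1)
        row = min(int((y - min_y) / (max_y - min_y) * n), n - 1)
        cells.add((col, row))
    ids = sorted(z_order(col, row) for col, row in cells)

    one_hot = [0] * (n * n)  # one bit per grid cell, indexed by Z-order ID
    for cell_id in ids:
        one_hot[cell_id] = 1
    stats = [min(ids), max(ids), statistics.mean(ids), statistics.median(ids)]
    histogram = [0] * n  # n equal-width buckets over the ID range 0..n*n-1
    for cell_id in ids:
        histogram[cell_id // n] += 1
    return one_hot, stats, histogram


def altitude_encoding(trace):
    """[total ascent, total descent, min altitude, max altitude, std of the profile]."""
    alts = [a for _, _, a in trace]
    diffs = [b - a for a, b in zip(alts, alts[1:])]
    ascent = sum(d for d in diffs if d > 0)
    descent = -sum(d for d in diffs if d < 0)
    return [ascent, descent, min(alts), max(alts), statistics.pstdev(alts)]


def distance_to_trace(px, py, trace):
    """Minimum planar distance from a point to the trace's sample points."""
    return min(math.hypot(px - x, py - y) for x, y, _ in trace)


def poi_encoding(trace, pois, k):
    """Distances to all POIs, followed by the k smallest of those distances."""
    dists = [distance_to_trace(px, py, trace) for px, py in pois]
    return dists + sorted(dists)[:k]


def image_encoding(trace, images, threshold_m=20.0):
    """Bit per image: 1 if the image lies within threshold_m meters of the trace."""
    return [1 if distance_to_trace(ix, iy, trace) < threshold_m else 0
            for ix, iy in images]


if __name__ == "__main__":
    trace = [(10, 10, 500), (110, 10, 520), (110, 110, 510), (390, 390, 530)]
    print(location_encodings(trace, (0, 0, 400, 400), 4)[1:])
    print(altitude_encoding(trace)[:4], image_encoding(trace, [(12, 12), (200, 300)]))
```

With a 4 × 4 grid, the example trace touches cells with Z-order IDs 0, 1, 3, and 15, so the statistics vector is [0, 15, 4.75, 2.0] and the histogram is [3, 0, 0, 1]. The resulting vectors would then be concatenated (or used selectively) as features for a classifier; which combination to use per rating criterion is exactly the encoding-selection question the talk studies.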
Datasets
• Trip data obtained from Outdooractive: www.outdooractive.com
• Elevation data for trips from Copernicus: www.copernicus.eu
• 181,185 POIs from www.kaggle.com/datasets/ehallmar/points-of-interest-poi-database
• 50,000 geotagged images from www.kaggle.com/datasets/habedi/large-dataset-of-geotagged-images

[Figure: Encoding methods overview]

ML model and evaluation metric
• Random forest
‣ Our previous experiments demonstrated that Random Forest performs best in several similar rating inference scenarios, so we used a Random Forest classifier in this work.
• MAE, widely used for evaluating rating predictions, especially in recommender systems research

[Figure: Experimental results]

Take-aways
• "One size fits all" encodings may lower the quality of multi-criteria rating inference.
• Different encodings could be used dynamically to infer different rating criteria.
• Trip-oriented ratings focus on the intrinsic features of the trip. Thus, encoding trip profiles can yield higher-quality rating inference.
• User-oriented ratings focus on how users feel about the trip and on user satisfaction.

Summary and future research
• Scope of this research: encoding selection, not model selection.
• The model may consider more contextual factors. For example, the context of a trip may include group dynamics, previous experiences, and cultural factors.
• Users often want to know how an inference is made, which in turn makes them more confident in their trip decisions. Therefore, developing transparent and explainable models may increase user trust and satisfaction.
• Including user feedback to enhance user engagement is also critical. User feedback can be used to improve model training and to continuously improve trip recommendations.

Thank you and questions

Contact details
• Prof. Dr. habil. Mouzhi Ge
Head of Data Science and Intelligent Systems Research Group
European Campus Rottal-Inn, Deggendorf Institute of Technology
Max-Breiherr-Straße 32, 84347 Pfarrkirchen, Germany
• Email: mouzhi.ge@th-deg.de