Problem Statement and Metrics

Let’s dive into the problem statement and metrics required for the Airbnb rental search ranking application.

We'll cover the following

Airbnb rental search ranking
1. Problem statement
2. Metrics design and requirements

Airbnb rental search ranking

1. Problem statement

Airbnb users search for available homes at a particular location. The system should rank the available stays in the search results so that the homes users are most likely to book appear at the top.

A naive approach would be to hand-craft a scoring function, for example, a score based on text similarity between the query and each listing. This wouldn’t work well because similarity doesn’t guarantee a booking.
A better approach is to sort results by the likelihood of booking. We can build a supervised ML model to predict booking likelihood. This is a binary classification model, i.e., it classifies each search result as booking or not-booking.
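To make the ranking step concrete, here is a minimal sketch of sorting search results by a predicted booking probability. The `Listing` type, the `rank_by_booking_likelihood` helper, and the `toy_scorer` function are hypothetical names introduced only for illustration; in practice the scorer would be a trained binary classifier, which the lesson does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Listing:
    listing_id: str
    # Hypothetical numeric features for a (query, listing) pair.
    features: List[float]


def rank_by_booking_likelihood(
    listings: List[Listing],
    predict_proba: Callable[[List[float]], float],
) -> List[Listing]:
    """Return listings sorted by predicted booking probability, highest first."""
    return sorted(listings, key=lambda item: predict_proba(item.features), reverse=True)


def toy_scorer(features: List[float]) -> float:
    """Stand-in for a trained classifier (assumption): mean feature value clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(features) / len(features)))


candidates = [
    Listing("a", [0.2, 0.4]),
    Listing("b", [0.9, 0.7]),
    Listing("c", [0.5, 0.5]),
]
ranked_ids = [item.listing_id for item in rank_by_booking_likelihood(candidates, toy_scorer)]
print(ranked_ids)
```

Passing the scorer in as a function keeps the sorting logic independent of whichever model ends up producing the probabilities.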

2. Metrics design and requirements

Metrics

Offline metrics

Discounted Cumulative Gain: $DCG_{p} = \sum_{i=1}^{p} \frac{rel_{i}}{\log_{2}(i+1)}$
- where $rel_{i}$ stands for the relevance of the result at position $i$.

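Normalized Discounted Cumulative Gain:

$nDCG_{p} = \frac{DCG_{p}}{IDCG_{p}}$

IDCG is the ideal discounted cumulative gain, i.e., the DCG of the same results sorted by decreasing relevance:

$IDCG_{p} = \sum_{i=1}^{|REL_{p}|} \frac{rel_{i}}{\log_{2}(i+1)}$
- where $REL_{p}$ is the list of results ordered by relevance, up to position $p$, and $rel_{i}$ here is the relevance of the $i$-th result in that ideal ordering. The gain term matches the DCG definition; with binary booking labels it equals the alternative gain $2^{rel_{i}} - 1$.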
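As a worked illustration of these formulas, the sketch below computes DCG and nDCG for one ranked result list. It uses the linear gain $rel_{i}$ from the DCG definition, which for binary booking labels coincides with $2^{rel_{i}} - 1$. The function names and the example labels are assumptions made for illustration, not part of the lesson.

```python
import math
from typing import Sequence


def dcg_at_p(relevances: Sequence[float], p: int) -> float:
    """DCG_p: sum of rel_i / log2(i + 1) over the first p positions (i starts at 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:p], start=1))


def ndcg_at_p(relevances: Sequence[float], p: int) -> float:
    """nDCG_p = DCG_p / IDCG_p, where IDCG_p is the DCG of the ideal (sorted) ordering."""
    ideal_order = sorted(relevances, reverse=True)
    idcg = dcg_at_p(ideal_order, p)
    # Convention chosen here (assumption): return 0.0 when no result is relevant.
    return dcg_at_p(relevances, p) / idcg if idcg > 0 else 0.0


# Illustrative labels: 1 = booked, 0 = not booked, in the order the ranker returned them.
ranked_labels = [0, 1, 0, 1]
score = ndcg_at_p(ranked_labels, p=4)
print(f"nDCG@4 = {score:.4f}")
```

A ranking that places both booked listings first would score 1.0, so the gap between the printed value and 1.0 reflects how far the booked listings sit from the top.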

Online metrics

Conversion rate and revenue lift: Conversion rate measures the number of bookings relative to the number of search results shown in a user session.

$conversion\_rate = \frac{number\_of\_bookings}{ number\_of\_search\_results}$
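The conversion rate formula can be sketched in a few lines. The comparison between a control and a treatment ranker, the `relative_lift` helper, and all of the counts below are assumptions added for illustration; the lesson does not define how lift is computed.

```python
def conversion_rate(number_of_bookings: int, number_of_search_results: int) -> float:
    """Bookings divided by search results shown; 0.0 when nothing was shown."""
    if number_of_search_results == 0:
        return 0.0
    return number_of_bookings / number_of_search_results


def relative_lift(treatment: float, control: float) -> float:
    """Relative change of treatment over control (one possible definition of lift)."""
    if control == 0:
        raise ValueError("control value must be non-zero to compute a relative lift")
    return (treatment - control) / control


# Made-up counts for two ranker variants in an online experiment.
control_rate = conversion_rate(number_of_bookings=30, number_of_search_results=1000)
treatment_rate = conversion_rate(number_of_bookings=36, number_of_search_results=1000)
lift = relative_lift(treatment_rate, control_rate)
print(f"control={control_rate:.3f} treatment={treatment_rate:.3f} lift={lift:.1%}")
```

The same `relative_lift` helper could be applied to revenue per session instead of conversion rate, if that is the quantity being tracked.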

Requirements

Training

Imbalanced data and clear-cut session: An average user might do extensive research before deciding on a booking. As a result, non-booking labels far outnumber booking labels.
Train/validation data split: Split data by time to mimic production traffic. For example, we can select one specific date as the split point, then use a few weeks of data before that date as training data and a few days of data after that date as validation data.
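The two training requirements above can be sketched as follows: a split around a chosen date, plus a quick check of how imbalanced the training labels are. The row layout (`date`, `booked`), the window sizes, and the decision to put the split date itself into validation are all assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical record: {"date": date, "booked": 0 or 1}


def split_by_time(
    rows: List[Row],
    split_date: date,
    train_weeks: int = 4,
    validation_days: int = 3,
) -> Tuple[List[Row], List[Row]]:
    """Train on the weeks before split_date, validate on the days from split_date on."""
    train_start = split_date - timedelta(weeks=train_weeks)
    validation_end = split_date + timedelta(days=validation_days)
    train = [r for r in rows if train_start <= r["date"] < split_date]
    validation = [r for r in rows if split_date <= r["date"] < validation_end]
    return train, validation


def negative_to_positive_ratio(rows: List[Row]) -> float:
    """Non-booking rows per booking row; one common input for class weighting."""
    positives = sum(1 for r in rows if r["booked"] == 1)
    negatives = len(rows) - positives
    return negatives / positives if positives else float("inf")


rows = [
    {"date": date(2023, 12, 1), "booked": 0},   # too old, dropped
    {"date": date(2024, 1, 1), "booked": 0},
    {"date": date(2024, 1, 15), "booked": 0},
    {"date": date(2024, 1, 20), "booked": 1},
    {"date": date(2024, 1, 28), "booked": 0},
    {"date": date(2024, 1, 29), "booked": 0},   # split date, goes to validation
    {"date": date(2024, 1, 31), "booked": 1},
    {"date": date(2024, 2, 5), "booked": 0},    # after the validation window, dropped
]
train, validation = split_by_time(rows, split_date=date(2024, 1, 29))
ratio = negative_to_positive_ratio(train)
print(len(train), len(validation), ratio)
```

Because every training row is strictly earlier than every validation row, the model is evaluated on data it could not have seen, which is what mimics production traffic. The ratio could feed a class weight or a negative downsampling rate, depending on the model chosen.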

Inference

Serving: Low latency (50 ms to 100 ms) for search ranking.
Under-predicting for new listings: Brand new listings might not have enough data for the model to estimate likelihood. As a result, the model might end up under-predicting for new listings.

Summary

Type	Desired goals
Metrics	Achieve a high normalized discounted cumulative gain (nDCG)
Training	Ability to handle imbalanced data
	Split training data and validation data by time
Inference	Latency from 50 ms to 100 ms
	Ability to avoid under-predicting for new listings
