Problem Statement and Metrics

Let’s dive into the problem statement and metrics required for the Airbnb rental search ranking application.

Airbnb rental search ranking

1. Problem statement

Airbnb users search for available homes at a particular location. The system should rank the available homes in the search results so that the homes most likely to be booked appear at the top.

[Figure: Search ranking system for Airbnb]
  • The naive approach would be to craft a custom score ranking function. For example, a score based on text similarity given a query. This wouldn’t work well because similarity doesn’t guarantee a booking.

  • The better approach would be to sort results by their likelihood of being booked. We can build a supervised ML model to predict booking likelihood. This is a binary classification model, i.e., it classifies each result as booking or not-booking.
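The contrast between the two approaches can be sketched in a few lines. This is only an illustration: the feature names, weights, and bias below are hypothetical placeholders, and a real model would learn its parameters from booking logs rather than use hand-set values.

```python
import math

# Hypothetical weights and bias (assumptions for illustration only);
# a trained binary classifier would supply these.
WEIGHTS = {"text_similarity": 0.5, "review_score": 1.2, "price_ratio": -0.8}
BIAS = -1.0

def booking_probability(features):
    """Logistic score in (0, 1), read as the estimated probability of a booking."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_listings(listings):
    """Sort listings by descending predicted booking probability."""
    return sorted(
        listings,
        key=lambda listing: booking_probability(listing["features"]),
        reverse=True,
    )

listings = [
    # Listing A matches the query text best but is expensive and poorly reviewed.
    {"id": "A", "features": {"text_similarity": 0.9, "review_score": 0.2, "price_ratio": 1.5}},
    # Listing B matches the text less well but looks more bookable.
    {"id": "B", "features": {"text_similarity": 0.4, "review_score": 0.9, "price_ratio": 0.8}},
]
ranked_ids = [listing["id"] for listing in rank_listings(listings)]
```

Under these assumed weights, listing B ranks above listing A even though A has the higher text similarity, which is the behavior a pure similarity score cannot produce.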

2. Metrics design and requirements

Metrics

Offline metrics

  • Discounted Cumulative Gain: $DCG_{p} = \sum_{i=1}^{p} \frac{rel_{i}}{\log_{2}(i+1)}$

    • where $rel_{i}$ stands for the relevance of the result at position $i$.
  • Normalized Discounted Cumulative Gain:

$$nDCG_{p} = \frac{DCG_{p}}{IDCG_{p}}$$

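  • IDCG is the ideal discounted cumulative gain, i.e., the DCG of the best possible ordering, where $REL_{p}$ is the list of results sorted by relevance up to position $p$:

$$IDCG_{p} = \sum_{i=1}^{|REL_{p}|} \frac{2^{rel_{i}} - 1}{\log_{2}(i+1)}$$

    • With binary relevance (booked = 1, not booked = 0), $2^{rel_{i}} - 1 = rel_{i}$, so this gain agrees with the linear gain used in $DCG_{p}$.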
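As a worked example of the offline metric, the following minimal sketch computes DCG and nDCG for a single search session. It assumes binary relevance labels (1 = booked, 0 = not booked), for which the linear gain and the exponential gain are equal; the function names and the sample labels are illustrative, not taken from any particular library.

```python
import math

def dcg(relevances, p=None):
    """DCG_p with linear gain: sum of rel_i / log2(i + 1) over the top p positions."""
    top = relevances if p is None else relevances[:p]
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(top, start=1))

def ndcg(relevances, p=None):
    """nDCG_p: DCG of the shown order divided by DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), p)
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0

# Labels in the order the ranker displayed the listings (assumed sample data).
shown = [0, 1, 0, 1]
score = ndcg(shown)  # roughly 0.65: booked listings sat at positions 2 and 4
```

A perfect ranking such as `[1, 1, 0, 0]` scores 1.0, and a session with no bookings is assigned 0.0 here by convention, since the ideal DCG is zero.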

Online metrics

  • Conversion rate and revenue lift: Conversion rate measures the number of bookings per number of search results in a user session.

$$conversion\_rate = \frac{number\_of\_bookings}{number\_of\_search\_results}$$
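The ratio above is simple enough to compute directly. The sketch below aggregates it over a handful of sessions; the session records and field names are made-up sample data, and returning 0.0 for an empty denominator is an assumed convention.

```python
def conversion_rate(number_of_bookings, number_of_search_results):
    """Bookings per search result; 0.0 when no results were shown (assumed convention)."""
    if number_of_search_results == 0:
        return 0.0
    return number_of_bookings / number_of_search_results

# Hypothetical per-session counts for illustration.
sessions = [
    {"bookings": 1, "search_results": 40},
    {"bookings": 0, "search_results": 60},
]
overall = conversion_rate(
    sum(s["bookings"] for s in sessions),
    sum(s["search_results"] for s in sessions),
)
```

In an A/B test, the same function would be evaluated separately for the control and treatment groups and the two rates compared.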

Requirements

Training

  • Imbalanced data and clear-cut session: An average user might do extensive research before deciding on a booking. As a result, non-booking labels far outnumber booking labels.

  • Train/validation data split: Split data by time to mimic production traffic. For example, we can select one specific date as the split point, then use a few weeks of data before that date as training data and a few days of data after that date as validation data.
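Both training requirements can be sketched together. The window sizes (four weeks of training data, seven days of validation data), the row layout, and the idea of weighting the rare booking class by the negative-to-positive ratio are assumptions chosen for illustration; the source only asks that the imbalance be handled and that the split be by time.

```python
from datetime import date, timedelta

def split_by_time(rows, split_date, train_weeks=4, valid_days=7):
    """Train on the weeks before split_date, validate on the days from split_date on."""
    train_start = split_date - timedelta(weeks=train_weeks)
    valid_end = split_date + timedelta(days=valid_days)
    train = [r for r in rows if train_start <= r["date"] < split_date]
    valid = [r for r in rows if split_date <= r["date"] < valid_end]
    return train, valid

def positive_class_weight(rows):
    """Ratio of non-bookings to bookings, usable as a weight on the rare booking class."""
    positives = sum(r["booked"] for r in rows)
    negatives = len(rows) - positives
    return negatives / positives if positives else 1.0

# Hypothetical search-log rows: one label per (search, listing) impression.
rows = [
    {"date": date(2024, 5, 20), "booked": 0},
    {"date": date(2024, 5, 25), "booked": 0},
    {"date": date(2024, 5, 28), "booked": 1},
    {"date": date(2024, 5, 30), "booked": 0},
    {"date": date(2024, 6, 2), "booked": 0},
    {"date": date(2024, 6, 3), "booked": 1},
    {"date": date(2024, 6, 20), "booked": 0},  # outside the validation window
]
train, valid = split_by_time(rows, split_date=date(2024, 6, 1))
weight = positive_class_weight(train)
```

Because every training row predates every validation row, the validation score reflects how the model would behave on traffic it has not yet seen, which a random split would not guarantee.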

Inference

  • Serving: Low latency (50 ms to 100 ms) for search ranking.

  • Under-predicting for new listings: Brand new listings might not have enough data for the model to estimate likelihood. As a result, the model might end up under-predicting for new listings.
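To see why sparse history leads to under-prediction, consider a raw booking-rate feature: a listing with zero bookings from three views gets a rate of 0, regardless of its quality. One possible mitigation, which is an assumption here and not something the source prescribes, is to shrink the rate toward a global prior. The prior rate and prior strength below are hypothetical values.

```python
def smoothed_booking_rate(bookings, views, prior_rate=0.02, prior_strength=50):
    """Booking rate shrunk toward prior_rate, as if prior_strength extra views were seen.

    prior_rate and prior_strength are illustrative assumptions, not tuned values.
    """
    return (bookings + prior_rate * prior_strength) / (views + prior_strength)

raw_new = 0 / 3                                  # raw rate for a new listing: 0.0
new = smoothed_booking_rate(0, 3)                # stays close to the 2% prior
established = smoothed_booking_rate(30, 1000)    # dominated by its own history
```

With few views the estimate stays near the prior, and as views accumulate the listing's own history takes over.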

Summary

| Type      | Desired goals                                            |
|-----------|----------------------------------------------------------|
| Metrics   | Achieve a high Normalized Discounted Cumulative Gain     |
| Training  | Ability to handle imbalanced data                        |
|           | Split training data and validation data by time          |
| Inference | Latency from 50 ms to 100 ms                             |
|           | Ability to avoid under-predicting for new listings       |