Hotel Booking Cancellation Prediction

Oct 2024 - Dec 2024
React
Python
TypeScript
Pandas
Tailwind CSS

 

I developed a machine learning classifier to predict hotel booking cancellations using a real-world dataset from two prominent hotels in Portugal: a resort hotel and a city hotel. The dataset contains over 100,000 booking records from July 2015 to August 2017, encompassing more than 30 features extracted directly from the hotels' databases.

Project Motivation

In the hospitality industry, predicting cancellations is crucial due to the perishable nature of hotel room inventory. Unsold rooms represent lost revenue that cannot be recovered. High cancellation rates can significantly impact a hotel's revenue and operational planning. By accurately predicting cancellations, hotels can implement proactive strategies to mitigate financial losses, such as overbooking policies or targeted marketing efforts.

Data Challenges

  • Data Quality Issues: The dataset, being sourced directly from operational SQL databases, included inconsistencies, missing values, and other quality issues typical of real-world data.

  • Feature Complexity: With over 30 features, including both numerical and categorical variables, proper feature selection and engineering were vital to enhance model performance.

  • Imbalanced Classes: The dataset had an imbalance between bookings that were canceled and those that weren't, requiring careful handling during model training.

Methodology

Data Preprocessing

  • Data Cleaning: Handled missing values, corrected inconsistencies, and removed duplicates to ensure data integrity.

  • Feature Engineering: Created new features and transformed existing ones to better capture patterns related to cancellations.

  • Encoding Categorical Variables: Applied one-hot encoding to categorical features like country, distribution_channel, and deposit_type.

  • Normalization: Scaled numerical features to a common range so that algorithms sensitive to feature scale are not dominated by large-valued columns.
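
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the tiny DataFrame and its values are made up, the median and "Unknown" fill strategies are assumptions, and only the column names (lead_time, adr, country, distribution_channel, deposit_type, is_canceled) follow the dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration; rows 2 and 3 are exact duplicates.
bookings = pd.DataFrame({
    "lead_time": [10, 200, 45, 45, 300],
    "adr": [80.0, None, 60.0, 60.0, 95.0],
    "country": ["PRT", "GBR", "PRT", "PRT", None],
    "distribution_channel": ["Direct", "TA/TO", "TA/TO", "TA/TO", "Corporate"],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit", "No Deposit", "No Deposit"],
    "is_canceled": [0, 1, 0, 0, 1],
})

numeric_cols = ["lead_time", "adr"]
categorical_cols = ["country", "distribution_channel", "deposit_type"]

# Data cleaning: drop exact duplicates, then fill missing values.
# (Median and "Unknown" are assumed strategies, chosen for simplicity.)
clean = bookings.drop_duplicates().reset_index(drop=True)
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
clean[categorical_cols] = clean[categorical_cols].fillna("Unknown")

# Scale numeric columns and one-hot encode categorical ones in a single step.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

X = preprocessor.fit_transform(clean.drop(columns="is_canceled"))
y = clean["is_canceled"].to_numpy()
print(X.shape)
```

In a real workflow the preprocessor would be fitted on the training split only and then applied to the test split, so that scaling statistics and category lists do not leak information from held-out data.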

Exploratory Data Analysis (EDA)

  • Pattern Discovery: Analyzed booking patterns, seasonality effects, and customer behaviors.

  • Correlation Analysis: Identified correlations between features and the target variable to inform feature selection.

  • Visualization: Used graphs and charts to visualize distributions and relationships within the data.
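
As a rough illustration of the correlation and pattern analysis described above, the sketch below computes feature-to-target correlations and the cancellation rate per lead-time bucket with Pandas. The rows are invented for the example, and the bucket boundaries are an arbitrary assumption rather than the ones used in the project.

```python
import pandas as pd

# Made-up rows for illustration only.
bookings = pd.DataFrame({
    "lead_time": [5, 12, 40, 90, 150, 220, 300, 365],
    "previous_cancellations": [0, 0, 0, 1, 0, 1, 0, 2],
    "is_canceled": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Pearson correlation of each numeric feature with the target.
target_corr = (
    bookings.drop(columns="is_canceled")
    .corrwith(bookings["is_canceled"])
    .sort_values(ascending=False)
)

# Cancellation rate per lead-time bucket (bucket edges are hypothetical).
bookings["lead_time_bucket"] = pd.cut(
    bookings["lead_time"],
    bins=[0, 30, 120, 400],
    labels=["0-30", "31-120", "121+"],
)
rate_by_bucket = bookings.groupby("lead_time_bucket", observed=True)["is_canceled"].mean()

print(target_corr)
print(rate_by_bucket)
```

The same grouped rates can be passed straight to a plotting call such as rate_by_bucket.plot(kind="bar") when Matplotlib is available.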

Model Development

  • Algorithm Selection: Evaluated several algorithms including Decision Trees, Random Forests, and Gradient Boosting Machines.

  • Handling Imbalanced Data: Used techniques like SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance.

  • Hyperparameter Tuning: Optimized model parameters using grid search and cross-validation for better generalization.
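
To make the oversampling and tuning steps more concrete, here is a simplified sketch. The smote_oversample function is a hand-written, binary-only approximation of SMOTE (the imbalanced-learn library is the usual way to get a full implementation), the data is synthetic, and the parameter grid is an assumption. Oversampling is applied inside each training fold so that synthetic rows never end up in the validation data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid, StratifiedKFold


def smote_oversample(X, y, minority_label=1, k=3, seed=0):
    """Simplified SMOTE: add synthetic minority rows until classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise distances between minority rows; a row is never its own neighbor.
    # (Quadratic in the minority count, so only suitable for small examples.)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its k nearest neighbors, then interpolate.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority_label)])


# Synthetic, imbalanced toy data (80 negatives, 20 positives).
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, (80, 4)), data_rng.normal(1.5, 1.0, (20, 4))])
y = np.array([0] * 80 + [1] * 20)

# Hypothetical grid; the project's actual search space is not stated.
param_grid = ParameterGrid({"n_estimators": [25, 50], "max_depth": [3, None]})
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

results = {}
for params in param_grid:
    fold_scores = []
    for train_idx, val_idx in cv.split(X, y):
        # Oversample the training fold only; validate on untouched data.
        X_train, y_train = smote_oversample(X[train_idx], y[train_idx])
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X_train, y_train)
        fold_scores.append(f1_score(y[val_idx], model.predict(X[val_idx])))
    results[tuple(sorted(params.items()))] = float(np.mean(fold_scores))

best_params = dict(max(results, key=results.get))
print(best_params, results)
```

Scoring on F1 rather than accuracy is a choice made for this sketch, since accuracy can look good on imbalanced data even when the minority class is predicted poorly.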

Model Evaluation

  • Performance Metrics: Assessed models using accuracy, precision, recall, F1-score, and ROC-AUC score.

  • Validation: Performed k-fold cross-validation to ensure model robustness.

  • Model Selection: Chose the model that provided the best balance between performance and interpretability.
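
For reference, the metrics listed above can be computed from first principles as in the sketch below. The labels and scores are invented for the example and are not results from this project; in practice the equivalent functions in scikit-learn's metrics module would normally be used.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = canceled)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision and recall are reported alongside accuracy because, with imbalanced classes, a model can reach a high accuracy while still missing many cancellations.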

Interpretation with SHAP Values

  • Feature Importance: Utilized SHAP (SHapley Additive exPlanations) values to understand the impact of each feature on the predictions.

  • Aggregation: Aggregated SHAP values for one-hot encoded features back to their original categorical features for clearer interpretation.

  • Visualization: Generated plots to illustrate the most influential features affecting booking cancellations.
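
The aggregation step can be illustrated with a short sketch. Because SHAP values are additive, the contributions of all one-hot columns belonging to one categorical feature can be summed per sample. The SHAP matrix below is a hand-written placeholder rather than real model output, and matching columns by a "feature_" name prefix is an assumption about how the encoded columns were named.

```python
import numpy as np

# Encoded column names; the prefix convention "<feature>_<category>" is assumed.
encoded_names = [
    "lead_time",
    "deposit_type_No Deposit",
    "deposit_type_Non Refund",
    "country_PRT",
    "country_GBR",
]
categorical_features = ["deposit_type", "country"]

# Placeholder SHAP values (2 samples x 5 encoded columns), not real output.
shap_values = np.array([
    [0.30, -0.10, 0.05, 0.02, -0.01],
    [-0.20, 0.15, -0.05, 0.00, 0.04],
])


def original_feature(encoded_name, categorical):
    """Map an encoded column name back to the feature it came from."""
    for feature in categorical:
        if encoded_name.startswith(feature + "_"):
            return feature
    return encoded_name


def aggregate_shap(values, names, categorical):
    """Sum SHAP columns that belong to the same original feature."""
    groups = {}
    for col, name in enumerate(names):
        groups.setdefault(original_feature(name, categorical), []).append(col)
    aggregated = np.column_stack([values[:, cols].sum(axis=1) for cols in groups.values()])
    return list(groups), aggregated


feature_names, aggregated = aggregate_shap(shap_values, encoded_names, categorical_features)

# Global importance: mean absolute aggregated SHAP value per original feature.
importance = dict(zip(feature_names, np.abs(aggregated).mean(axis=0)))
print(importance)
```

Prefix matching breaks down if one feature name is itself a prefix of another, so keeping an explicit column-to-feature mapping from the encoder is the safer option in a real pipeline.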

Results

  • Accuracy Achieved: The final model reached 77% accuracy on the held-out test set.

  • Key Influencing Factors:

    • Lead Time: Longer lead times were associated with higher cancellation rates.

    • Deposit Type: Bookings with no deposit were more likely to be canceled.

    • Previous Cancellations: Customers with prior cancellations had a higher likelihood of canceling again.

    • Distribution Channel: Bookings made through online travel agencies showed different cancellation patterns than direct bookings.

Backend and Frontend

  • API Development: Created a RESTful API using Flask to serve the model predictions.

  • Web Interface: Built a user-friendly frontend using React where users can input booking details and receive predictions along with explanations.
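
A prediction endpoint of this kind might look roughly like the sketch below. The /predict route, the required field names, and the response shape are all assumptions made for illustration, and predict_cancellation is a hypothetical stub that returns a fixed value in place of the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical subset of booking fields the endpoint expects.
REQUIRED_FIELDS = ("lead_time", "deposit_type", "previous_cancellations")


def predict_cancellation(booking):
    """Hypothetical stand-in for the trained model.

    A real service would preprocess the booking and call the model's
    probability method here; this stub returns a fixed dummy value.
    """
    return 0.5


@app.route("/predict", methods=["POST"])
def predict():
    booking = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in booking]
    if missing:
        # Reject incomplete requests with a clear error payload.
        return jsonify(error="missing fields", fields=missing), 400
    probability = predict_cancellation(booking)
    return jsonify(
        cancellation_probability=probability,
        will_cancel=probability >= 0.5,
    )


if __name__ == "__main__":
    app.run(debug=True)
```

The React frontend would then send the booking form as a JSON POST request to this route and render the returned probability.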

 

Impact

  • Operational Efficiency: Hotels can forecast cancellations and adjust their inventory management accordingly.

  • Revenue Optimization: By predicting cancellations, hotels can implement overbooking strategies to minimize revenue loss.

  • Customer Satisfaction: Understanding cancellation patterns helps in designing better policies and improving customer experience.

Conclusion

This project showcases the practical application of machine learning in addressing real-world business challenges. By accurately predicting hotel booking cancellations, the model aids hotels in making informed decisions, optimizing revenue, and enhancing customer satisfaction.