Posted on 10-Apr-2022
Ninth Montreal Industrial Problem Solving Workshop: Spilling problem
Ismael Assani, Poclaire Kenmogne, Jiliang Li, Gabriel Lemeyre, ThiThanh Hue Nguyen, Frédérique Robin, Pierre-Loïk Rothé
August 26, 2019
Under the supervision of François Bellavance (HEC) & Olivier G. Leblanc (Air Canada)
Ninth, IPSW AIR CANADA August 26, 2019 1 / 18
Outline
1 Context
2 Dataset building
3 Approaches
   Machine Learning
   Survival models
   Kalman filtering approach
Context
The objective is to predict whether a flight will spill.
↪ The definition of a spilling flight is open to interpretation.
One proposal: a flight is considered to spill when
“occupation rate 3 days before departure ≥ 0.95”.
The occupation rate is defined by

$$\text{occupation rate} = \frac{\text{number of bookings}}{\text{actual airplane capacity}}.$$
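The proposed spill definition can be written as a labeling rule. A minimal Python sketch, assuming the number of bookings 3 days before departure and the actual capacity are known for each flight; the function names are hypothetical, not taken from the workshop code:

```python
def occupation_rate(bookings: int, capacity: int) -> float:
    """Occupation rate = number of bookings / actual airplane capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return bookings / capacity


def is_spill(bookings_3_days_before: int, capacity: int, threshold: float = 0.95) -> bool:
    """Label a flight as spilling when its occupation rate 3 days before
    departure reaches the threshold (0.95 in the proposed definition)."""
    return occupation_rate(bookings_3_days_before, capacity) >= threshold


print(is_spill(95, 100))   # True: 0.95 >= 0.95
print(is_spill(120, 130))  # False: about 0.923
```

The threshold is a parameter so that other spill definitions can be tested without changing the rule.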
Dataset Building and Features - 1
10 Origins and Destinations: AAA, BBB, CCC, ..., JJJ, KKK, LLL
Figure: Flights from 20 routes studied between 10 Origins and Destinations
Simplification: data are aggregated by a unique flight index (TOD).
↪ This yields longitudinal data (a time series) for each flight over two years.
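Aggregating by flight index turns raw booking records into one time series per flight. A minimal sketch, assuming each record is a hypothetical `(flight_index, days_before_departure, occupation_rate)` tuple; the actual record layout of the Air Canada data is not given on the slide:

```python
from collections import defaultdict


def build_series(records):
    """Group booking records into one time series per flight index.

    Each record is a hypothetical (flight_index, days_before_departure, occupation_rate)
    tuple. The result maps flight_index -> list of (days_before_departure, rate),
    ordered from the earliest observation (most days before departure) to departure."""
    series = defaultdict(list)
    for flight_index, days_before, rate in records:
        series[flight_index].append((days_before, rate))
    for points in series.values():
        points.sort(key=lambda point: -point[0])
    return dict(series)


records = [
    ("TOD-1", 3, 0.91), ("TOD-1", 30, 0.52), ("TOD-2", 10, 0.70), ("TOD-1", 10, 0.80),
]
print(build_series(records)["TOD-1"])  # [(30, 0.52), (10, 0.8), (3, 0.91)]
```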
Dataset Building and Features - 2
Figure: Distribution of flights by departure airport (left), destination airport (middle) and departure hour (right)
Machine Learning - Random forest - 1
Machine Learning - Random forest - 2
- Data were stratified by cabin class
- 70-30 split between training and testing data
- 5-fold cross-validation using the caret package
- Average accuracy of 93% achieved for spill detection
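The slides used the R caret package; the data partitioning it performs can be sketched in plain Python. This minimal sketch covers only the stratified 70-30 split and the 5-fold cross-validation indices; the random forest itself would come from a library (for example caret in R or scikit-learn's `RandomForestClassifier` in Python). The cabin-class codes and the seed are hypothetical:

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train, test) index lists with about train_frac of every stratum in train.

    labels[i] is the stratum of row i (here, the cabin class)."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for indices in groups.values():
        rng.shuffle(indices)
        cut = round(train_frac * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def k_fold(indices, k=5, seed=0):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield sorted(fit), sorted(folds[i])


cabins = ["Y"] * 70 + ["J"] * 30  # hypothetical cabin-class codes
train, test = stratified_split(cabins)
print(len(train), len(test))  # 70 30
for fit, validate in k_fold(train, k=5):
    print(len(fit), len(validate))  # 56 14 for each fold
```

Stratifying keeps the share of each cabin class identical in the training and testing sets, so the 93% accuracy is not driven by an uneven class mix.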
Machine Learning - Lasso, SVM, Gradient Boosting, logistic regression
- Data were stratified by flight
- Lasso is used to select features
- Average accuracy of 80% achieved for spill detection with SVM, logistic regression and gradient boosting
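Lasso selects features because its L1 penalty drives the coefficients of uninformative features to exactly zero. A minimal sketch of squared-loss Lasso fitted by cyclic coordinate descent, assuming centred (ideally standardised) features and a centred target; for a binary spill label an L1-penalised logistic regression would be a natural alternative, and the penalty value `lam` is hypothetical:

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are centred, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Feature 0 drives the target; feature 1 is irrelevant and gets an exact zero.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # [0]
```

The selected feature indices would then be passed to the SVM, logistic regression and gradient boosting models.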
Machine Learning - ROC, AUC
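The ROC curve traces the true positive rate against the false positive rate as the decision threshold on the predicted spill probability decreases, and the AUC is the area under it. A minimal sketch with hypothetical labels and scores, computing the AUC both by the trapezoid rule and as the probability that a random spilling flight outscores a random non-spilling one:

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate), one per distinct threshold.

    labels are 1 for spill and 0 otherwise; scores are predicted spill probabilities."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a ROC curve given as points sorted by false positive rate."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(labels, scores):
    """AUC as P(score of a random spill > score of a random non-spill); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_trapezoid(roc_points(labels, scores)), auc_rank(labels, scores))  # 0.75 0.75
```

Both computations agree because the trapezoid area of the empirical ROC curve equals the rank statistic.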
Survival model approach - 1
Approach: train a survival model to obtain a survival function for each unique Origin-Destination pair.
- The selected model is the Cox proportional hazards model.
- This model predicts the survival probability from given flight characteristics.
- The characteristics retained are the moment of the day, the day of the week and the week of the year.
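In the Cox proportional hazards model, the covariates of a flight scale a shared baseline hazard, so its survival curve is the baseline curve raised to the power exp(β·x). A minimal sketch, assuming the event is the flight reaching the spill threshold (so "survival" means "not yet spilled") and that the baseline survival and coefficients have already been estimated with a survival library; the covariate encoding and values below are hypothetical:

```python
import math


def cox_survival(baseline_survival, beta, x):
    """Survival curve S(t | x) = S0(t) ** exp(beta . x) under proportional hazards.

    baseline_survival: S0(t) on a time grid (non-increasing, starting at 1).
    beta: fitted coefficients; x: covariates of one flight, in the same order as beta."""
    hazard_ratio = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return [s0 ** hazard_ratio for s0 in baseline_survival]


# Hypothetical single binary covariate (1 = afternoon departure) with coefficient
# log(2): twice the baseline hazard of reaching the spill threshold.
baseline = [1.0, 0.9, 0.8, 0.7]
print(cox_survival(baseline, [math.log(2)], [1]))  # roughly [1.0, 0.81, 0.64, 0.49]
```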
Survival model approach - 2
Figure: Characteristics of 5 flights (flight1 to flight5). Panels: moment in the day (morning, afternoon), day of the week (1 to 7) and week of the year; vertical axes: increasing probability to spill.
Survival model approach - 3
Application to one flight, predicting the probability of spill 3 days before departure given that we are 30 days from departure: prediction score = 67.01%; MSE = 53.17%.
- Low predictive power, which is expected since the model does not take any other information into account.
- Can be used as an engineered feature to improve another model.
- Possible improvement: add more relevant variables that may explain spill (e.g., the price range 30 days before departure).
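The conditional probability in the application above follows from the survival function: the probability of spilling by 3 days before departure, given no spill at 30 days before departure, is 1 - S(t_target) / S(t_now). A minimal sketch, assuming the survival curve is indexed by days elapsed in a booking window of `horizon` days ending at departure; the curve below is hypothetical:

```python
def conditional_spill_probability(survival, horizon, days_now=30, days_target=3):
    """P(spill by days_target before departure | no spill at days_now before departure).

    survival[t] is P(no spill yet) after t days of a booking window of `horizon`
    days that ends at departure, so "d days before departure" is t = horizon - d."""
    t_now = horizon - days_now
    t_target = horizon - days_target
    if not 0 <= t_now <= t_target < len(survival):
        raise ValueError("time points outside the survival curve")
    if survival[t_now] == 0.0:
        raise ValueError("spill already certain at t_now")
    return 1.0 - survival[t_target] / survival[t_now]


# Hypothetical linear survival curve over a 10-day window.
curve = [1.0 - 0.05 * t for t in range(11)]
print(conditional_spill_probability(curve, horizon=10, days_now=5, days_target=2))  # about 0.2
```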
Kalman Filtering - 1
Approach: compute a forecast of the plane occupation and conclude whether the flight spills or not.
Historical data and measurements → Occupation rate forecast → Spilling forecast
Principle:
1. Infer the dynamic of the current booking: use historical data to fit a polynomial regression.
2. Modify the dynamic to fit current measurements (data-driven approach): use Kalman filtering to enrich the dynamic with current observations.
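The two steps above can be combined in a scalar Kalman filter. In this minimal sketch, which is an assumed state model and not necessarily the team's, the predicted occupation follows the day-to-day increment of the historical curve, and each observed occupation corrects the estimate; the noise variances `q` and `r` are hypothetical tuning values:

```python
def kalman_occupation_forecast(historical, measurements, q=1e-4, r=1e-3, p0=1.0):
    """Scalar Kalman filter for the occupation rate.

    historical[k]: occupation from the historical model at step k (whole horizon).
    measurements[k]: observed occupation for the steps seen so far (k < len(measurements)).
    q, r: process and measurement noise variances; p0: initial estimate variance.
    Returns the filtered estimate for observed steps and the forecast afterwards."""
    x, p = historical[0], p0  # start from the historical curve
    estimates = []
    for k in range(len(historical)):
        if k > 0:
            # Predict: move with the historical day-to-day increment.
            x += historical[k] - historical[k - 1]
            p += q
        if k < len(measurements):
            # Update: blend in the observed occupation for this step.
            gain = p / (p + r)
            x += gain * (measurements[k] - x)
            p *= 1.0 - gain
        estimates.append(x)
    return estimates


historical = [0.1 * k for k in range(6)]
observed = [0.1 * k + 0.05 for k in range(3)]  # bookings running ahead of the historical curve
forecast = kalman_occupation_forecast(historical, observed)
print(round(forecast[-1], 3))  # 0.55
```

The spilling forecast then follows by comparing the forecast occupation 3 days before departure with the 0.95 threshold. Each new day only costs one predict and one update step, which keeps the computational load low.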
Kalman Filtering - 2
Figure: One flight occupation prediction using Kalman filters and the historical model (polynomial degree: 5). Legend: Kalman filter prediction, historical model (2017), measurements (2018); occupation axis from 0 to 1.
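The historical model in the figure is a degree-5 polynomial regression. A minimal least-squares sketch using the normal equations, assuming the time axis is rescaled to [0, 1] first (raw day counts raised to the tenth power make the normal equations badly conditioned); in practice a library routine such as NumPy's `numpy.polynomial.Polynomial.fit` would be preferable. The 200-day window and the sampled 2017 curve are hypothetical:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c with c[i] multiplying x**i.

    Solves the normal equations (A^T A) c = A^T y by Gaussian elimination with
    partial pivoting. Rescale xs to roughly [0, 1] first to keep them well conditioned."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(ata[row][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - tail) / ata[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate sum(c[i] * x**i) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical 2017 curve sampled over a 200-day window, time rescaled to [0, 1].
horizon = 200
days = list(range(0, horizon + 1, 10))
u = [d / horizon for d in days]
occupation_2017 = [0.9 * t ** 2 for t in u]
model = polyfit(u, occupation_2017, degree=5)
print(round(polyval(model, 0.5), 4))  # 0.225
```

Evaluating the fitted polynomial on each day of the window gives the historical curve used by the Kalman filter's prediction step.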
Kalman Filtering - 3
Figure: One flight occupation prediction using Kalman filters and the historical model (polynomial degree: 5), with the 95% occupation threshold. Legend: Kalman filter prediction, historical model (2017), measurements (2018); occupation axis from 0 to 1.
Kalman Filtering - 4
                         Actual   Predicted
Spill occurrence rate    36%      40%
Figure: Results for a dataset of 11,307 flights
Prediction score: 73%
False negative: 12%
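The results above can be computed from binary spill labels. A minimal sketch on hypothetical toy labels (not the 11,307-flight data), assuming "prediction score" means accuracy; since the slide does not state whether "false negative" is a share of all flights or a rate among actual spills, both are returned:

```python
def spill_metrics(actual, predicted):
    """Summary metrics for binary spill predictions (1 = spill, 0 = no spill)."""
    n = len(actual)
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    positives = sum(actual)
    return {
        "prediction_score": (tp + tn) / n,           # accuracy
        "actual_spill_rate": positives / n,
        "predicted_spill_rate": sum(predicted) / n,
        "false_negative_share": fn / n,              # missed spills among all flights
        "false_negative_rate": fn / positives if positives else 0.0,  # among actual spills
    }


# Hypothetical toy labels.
actual = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(spill_metrics(actual, predicted))
```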
Perspectives:
- Improving the historical dynamic model
- Machine learning initial guess for new flights (without historical data)
- The Kalman filtering approach allows a day-to-day update of the occupation forecast with minimal computational load
Acknowledgments
Thank you for your attention!
Special thanks to Olivier, François, Caroline and Fabian . . .
and Odile for the organization