I couldn't spend much time on the competition (only made 30 submissions :(). Meanwhile, the competition metric is fairly noisy, and we also expected a shake-up/down (not planet-scale, but meaningful in some cases). So my strategy focused on protecting against a shake-down as much as I could (instead of building new features).
My strategy was building various datasets, folds, seeds, and models. I'll explain them one by one.
My base dataset is based on raddar's dataset (huge thanks to @raddar). Also, most of the pre-processing logic can be found in the
The differences are:
- using more lag features (up to 3 months)
- using not just a single dataset but multiple datasets (I just added features incrementally) for variety.
- A dataset
- B dataset = A dataset + (features)
- C dataset = B dataset + (more features)
I didn't check the exact effectiveness of using multiple datasets across models; however, it seemed to have a positive effect when ensembling in my experiments.
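As a rough sketch of the lagging idea above (the record layout, key names, and column names here are hypothetical placeholders, not the actual pipeline):

```python
def add_lag_features(records, value_key="payment", max_lag=3):
    """Add up-to-`max_lag`-statement lag features per customer.

    `records` is a list of dicts sorted by (customer_id, month).
    The "customer_id"/"payment" names are illustrative only.
    """
    out = []
    history = {}  # customer_id -> earlier values, newest last
    for rec in records:
        prev = history.setdefault(rec["customer_id"], [])
        enriched = dict(rec)
        for lag in range(1, max_lag + 1):
            # value from `lag` statements ago, None if not yet available
            enriched[f"{value_key}_lag{lag}"] = prev[-lag] if len(prev) >= lag else None
        prev.append(rec[value_key])
        out.append(enriched)
    return out
```

Each incremental dataset (A, B, C) would then just append such feature groups on top of the previous one.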
I built 6 models (3 gradient-boosted tree models, 3 NNs) to secure variety and robustness. Also, a few models (LightGBM, CatBoost) were trained with multiple seeds (1, 42, 1337) using the same training recipe. Lastly, some models were trained with 10 or 20 folds.
- LightGBM (w/ dart, w/o dart)
- 5-layer NN
- stacked bi-GRU
Here's the best CV by model (sorry for the missing LB/PB scores; I rarely submitted a single model):

| Model | CV | Note |
|---|---|---|
| XGBoost | 0.795940 | only using the given(?) cat features as |
The CV score of a single neural network model isn't good. Nevertheless, when ensembling, it works well with the tree-based models.
Inspired by the log-odds discussion, I found that a weighted ensemble of log-odds probabilities is better than a normal weighted ensemble (I tuned the weights with the Optuna library based on the OOF predictions). One difference: in my experiments, it was better to optimize the weights with log10 rather than the natural log. However, it brings only a little boost (a difference in the 4th decimal place).
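A minimal sketch of the log-odds weighted ensemble in plain Python (in practice the weights would be tuned with Optuna on the OOF predictions; fixed weights are used here only for illustration):

```python
import math

def logit(p, eps=1e-7):
    # clip to avoid infinities at exactly 0 or 1
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logodds_blend(model_probs, weights):
    """Weighted average in log-odds space, mapped back to probability.

    model_probs: one list of predicted probabilities per model.
    weights: one weight per model (normalized inside).
    """
    total = sum(weights)
    blended = []
    for probs in zip(*model_probs):
        z = sum(w * logit(p) for w, p in zip(weights, probs)) / total
        blended.append(sigmoid(z))
    return blended
```

With equal weights this is the geometric mean of the odds: blending 0.9 (odds 9) and 0.8 (odds 4) gives odds sqrt(36) = 6, i.e. 6/7 ≈ 0.857. Switching the base to log10 rescales all logits by a constant, which acts like a temperature on the blended probability.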
I ensembled about 50 models, and there's no post-processing logic.
The final score is:

| Model | CV | Public LB | Private LB |
|---|---|---|---|
On the last day of the competition, I selected my roughly 1600th-place Public LB solution (my best CV solution). Luckily, trusting the CV score wins again :) (Actually, my best CV was also my best LB, and the LB score increased as the CV score increased, so there was little difference between best CV and best LB in my case.)
After the competition, I checked the correlation among the scores (CV vs. Private LB, CV vs. Public LB). I found that the CV score was more correlated with the Private LB than with the Public LB in my case.
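Such a check only needs a Pearson correlation over the (CV, LB) score pairs of the submissions — a small stdlib sketch (the numbers in the test are made-up placeholders, not my actual scores):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Comparing `pearson(cv_scores, private_lb)` against `pearson(cv_scores, public_lb)` gives the kind of conclusion above.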
- blending various models (GBDT + NN), even if there are huge CV gaps
  - e.g. NN 0.790, LightGBM 0.798
  - (maybe) various datasets, models, and seeds bring robust predictions, I guess
- pseudo labeling (w/ hard labels)
  - hard labels with a stricter threshold could work, I guess
- deeper NN models
  - a 5-layer NN is enough
- the number of folds doesn't matter (5 folds are enough)
  - there's no significant difference between 5 folds and 20 folds
- rank weighted ensemble
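For reference, a rank weighted ensemble (which didn't help here) replaces each model's probabilities with their normalized ranks before weighted averaging — a rough sketch:

```python
def rank_average(model_probs, weights=None):
    """Blend models on normalized ranks instead of raw probabilities."""
    n_models = len(model_probs)
    n = len(model_probs[0])
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    blended = [0.0] * n
    for probs, w in zip(model_probs, weights):
        # rank of each sample within this model's predictions
        order = sorted(range(n), key=lambda i: probs[i])
        for rank, i in enumerate(order):
            # normalized rank in [0, 1]
            blended[i] += w * (rank / (n - 1) if n > 1 else 0.0) / total
    return blended
```

Rank averaging discards calibration and keeps only each model's ordering, which may be why it underperformed the log-odds blend for this metric.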
I hope this could help :) Thank you!