Lottery Prediction - Part 2/3 (Training)

Training and Prediction Modeling

When I started this project, I was a little concerned, because predicting the future from a sequence of independent draws does not sound plausible. However, remembering the numerous companies in Korea that sell lottery predictions, I thought it was acceptable to set aside the independence between draws.
In other words, I decided to assume that prior draws influence the next draw, which made this project much simpler.


Here is what I set up for training:


Dataset


   - Obtained the data by web scraping with the beautifulsoup4 package.
   - Designed X and y to be the sequence of the last 5 draws and the next draw, respectively.
   - Split the data so that the 15 most recent draws form the test set and the rest form the training set. (I tried training with a validation set, but the validation loss always exploded.)
   - The data is one-hot encoded for both training and testing.
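The dataset steps above can be sketched as follows. This is a minimal sketch, assuming the standard Korean Lotto 6/45 format (six numbers from 1 to 45, bonus number ignored), which the post does not spell out. `encode`, `make_windows` and `split_train_test` are hypothetical helper names, and the draws in the usage lines are random stand-ins, not scraped data. Each draw becomes a 45-slot vector with a 1 at every drawn number (strictly a multi-hot vector, since six slots are set).

```python
import random
from typing import List, Tuple

NUM_BALLS = 45  # assumption: Korean Lotto 6/45 (numbers 1..45, bonus ball ignored)
WINDOW = 5      # X = the last 5 draws
TEST_SIZE = 15  # the 15 most recent draws are held out for testing

Draw = List[int]

def encode(draw: Draw) -> List[int]:
    """Encode one draw as a 45-slot vector with a 1 at each drawn number."""
    vec = [0] * NUM_BALLS
    for n in draw:
        vec[n - 1] = 1
    return vec

def make_windows(draws: List[Draw]) -> Tuple[list, list]:
    """Pair every WINDOW consecutive draws (X) with the draw that follows (y)."""
    X, y = [], []
    for i in range(len(draws) - WINDOW):
        X.append([encode(d) for d in draws[i:i + WINDOW]])
        y.append(encode(draws[i + WINDOW]))
    return X, y

def split_train_test(X, y, test_size: int = TEST_SIZE):
    """Chronological split: the last test_size samples (whose targets are the
    most recent draws) form the test set; everything earlier is training data."""
    return X[:-test_size], y[:-test_size], X[-test_size:], y[-test_size:]

# Usage with 100 random stand-in draws (oldest first), not real lottery data.
random.seed(0)
draws = [sorted(random.sample(range(1, NUM_BALLS + 1), 6)) for _ in range(100)]
X, y = make_windows(draws)
X_train, y_train, X_test, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 80 15
```

Keeping the split chronological (no shuffling) is what makes the 15 held-out samples line up with the 15 most recent draws.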


Training


   - Implemented an LSTM architecture with some modifications.
   - Varied hyperparameters for optimization (e.g., hidden layer size, number of layers).
   - Trained with 160 combinations of hyperparameters, resulting in 160 models.
   - Selected the model with the highest winning probability as the best model.
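The post does not list which hyperparameters or values were searched, so the grid below is purely hypothetical, chosen only to show how 160 combinations (4 × 4 × 5 × 2) can be enumerated with `itertools.product`. In this sketch each resulting dict would be passed to the training routine to produce one model.

```python
import itertools

# Hypothetical search space: the actual hyperparameters and values are not given.
SEARCH_SPACE = {
    "hidden_size": [64, 128, 256, 512],
    "num_layers": [1, 2, 3, 4],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

def iter_configs(space):
    """Yield one dict per combination in the Cartesian product of the grid."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 160
```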


Best model selection


   - My first deployment was designed to provide a single prediction, but I found that relying on only one array was not fun and barely informative, since it just becomes a 'hit or not' problem, right? I wanted to measure how well the model does its job with a quantitative metric, and a probability gives users something to count on and look forward to (for several days at least).
   - So I decided to leave dropout enabled during prediction to add some randomness to the results. This technique is called Monte Carlo Dropout.
   - I then ran inference 100 times on the same input. Because of dropout, each run gives a different result, and I compared each result against the label in the test dataset.
   - Using the 100 results for each draw in the test set, I could evaluate the winning probability of a model: I counted the matched numbers in each prediction and checked which model had the highest expected chance of winning.
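The selection procedure above can be illustrated with a stdlib-only sketch of Monte Carlo Dropout. It is not the author's model: a toy linear layer with random weights stands in for the trained LSTM, and the input is a random 45-slot feature vector instead of an encoded 5-draw window. Dropout stays active at inference, so every pass drops a different subset of inputs and can yield a different ticket (in a framework such as PyTorch this usually means keeping the dropout modules in training mode during prediction). The helper names are hypothetical, and a "win" is assumed to mean at least 3 matched numbers, the lowest prize tier in Lotto 6/45.

```python
import random
from typing import List, Sequence, Tuple

NUM_BALLS = 45  # assumption: Korean Lotto 6/45
PICK = 6        # six numbers per ticket

def forward(features: Sequence[float], weights: List[List[float]],
            p: float, rng: random.Random) -> List[float]:
    """One stochastic pass: inverted dropout on the inputs (kept ON at inference),
    then a toy linear layer that stands in for the trained LSTM."""
    keep = 1.0 - p  # requires 0 <= p < 1
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]

def to_ticket(scores: Sequence[float]) -> List[int]:
    """Take the six highest-scoring slots and map them back to numbers 1..45."""
    top = sorted(range(NUM_BALLS), key=lambda i: scores[i], reverse=True)[:PICK]
    return sorted(i + 1 for i in top)

def mc_dropout_predictions(features, weights, p=0.3, n_samples=100, seed=0):
    """Run n_samples stochastic passes on the same input (Monte Carlo Dropout)."""
    rng = random.Random(seed)
    return [to_ticket(forward(features, weights, p, rng)) for _ in range(n_samples)]

def evaluate(predictions, actual, win_threshold=3) -> Tuple[float, float]:
    """Expected number of matched numbers, and fraction of tickets that 'win'."""
    matches = [len(set(pred) & set(actual)) for pred in predictions]
    expected_matches = sum(matches) / len(matches)
    win_rate = sum(m >= win_threshold for m in matches) / len(matches)
    return expected_matches, win_rate

def score_model(predictions_per_draw, actual_draws, win_threshold=3) -> float:
    """Average win rate of one model across every draw in the test set."""
    rates = [evaluate(preds, actual, win_threshold)[1]
             for preds, actual in zip(predictions_per_draw, actual_draws)]
    return sum(rates) / len(rates)

# Toy usage: random weights and features stand in for a trained model and an
# encoded input window; whatever this prints is NOT a real result.
rng = random.Random(42)
weights = [[rng.gauss(0.0, 1.0) for _ in range(NUM_BALLS)] for _ in range(NUM_BALLS)]
features = [rng.random() for _ in range(NUM_BALLS)]
preds = mc_dropout_predictions(features, weights)
expected_matches, win_rate = evaluate(preds, actual=[3, 11, 19, 27, 35, 43])
```

Running `score_model` once per trained model and keeping the highest value mirrors the "highest winning probability" criterion described above.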


Operation


   - I automated the whole process with a shell script that runs Python code.
   - The workflow is visualized in the previous post.
   1. After each lottery draw, the winning numbers are crawled and uploaded to the database.
   2. The dataset is built as described above.
   3. The LSTM is trained with various hyperparameter sets.
   4. The winning probability is calculated for each model.
   5. The model with the highest winning probability is deployed. The training dataset is saved so the model can respond to requests.
   6. When a prediction is requested with the button, the deployed model takes the last 5 draws in the saved data as input and runs a prediction. The result is then sent back and displayed in the web application.
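Steps 5 and 6 can be sketched as two small functions. This is a hypothetical illustration, not the author's code: `select_best` and `handle_request` are made-up names, a "model" is any callable that maps the last 5 draws to a ticket, and the scores and model in the usage lines are toy stand-ins.

```python
from typing import Callable, Dict, List

WINDOW = 5  # the model input is the last 5 draws, matching the dataset design

Draw = List[int]
Model = Callable[[List[Draw]], Draw]

def select_best(scores: Dict[str, float]) -> str:
    """Step 5: return the id of the model with the highest winning probability."""
    return max(scores, key=scores.get)

def handle_request(saved_draws: List[Draw], model: Model) -> Draw:
    """Step 6: build the input from the last WINDOW saved draws and run the model."""
    if len(saved_draws) < WINDOW:
        raise ValueError(f"need at least {WINDOW} draws, got {len(saved_draws)}")
    return model(saved_draws[-WINDOW:])

# Toy usage: made-up scores and a stand-in "model" that echoes the newest draw.
scores = {"model_a": 0.1, "model_b": 0.3, "model_c": 0.2}
models: Dict[str, Model] = {name: (lambda window: window[-1]) for name in scores}
saved_draws = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18],
               [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]]
best = select_best(scores)
print(best, handle_request(saved_draws, models[best]))
# model_b [31, 32, 33, 34, 35, 36]
```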