
This project is based on Kaggle's Don't Overfit competitions, where the training set is very small and the test set is very large. The goal is to avoid or reduce overfitting, one of the most common problems in predictive analytics, by finding the most effective ways to counter it.

Don-t-Overfit

Long ago, in the distant, fragrant mists of time, there was a competition... It was not just any competition.

It was a competition that challenged mere mortals to model a 20,000x200 matrix of continuous variables using only 250 training samples... without overfitting.

Data scientists ― including Kaggle's very own Will Cukierski ― competed by the hundreds. Legends were made. (Will took 5th place, and eventually ended up working at Kaggle!) People overfit like crazy. It was a Kaggle-y, data science-y madhouse.

So... we're doing it again.

Don't Overfit II: The Overfittening

This is the next logical step in the evolution of weird competitions. Once again we have 20,000 rows of continuous variables, and a mere handful of training samples. Once again, we challenge you not to overfit. Do your best, model without overfitting, and add, perhaps, to your own legend.

In addition to bragging rights, the winner also gets swag. Enjoy!
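The README does not describe a modelling approach, so the following is only a rough illustration of the setup described above: fit on a few labelled rows, then score on many held-out rows, and compare the two accuracies at different penalty strengths. It is a minimal sketch that assumes synthetic Gaussian data and a hand-written L2-penalised logistic regression. The helper names (`make_data`, `fit_logistic_l2`, `accuracy`), the toy sizes, and the penalty values are all hypothetical and are not taken from the competition or this repository.

```python
import math
import random


def make_data(n_rows, n_features, n_informative, rng):
    """Synthetic stand-in data: Gaussian features, binary target set by the
    sign of the sum of the first n_informative columns."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [1 if sum(row[:n_informative]) > 0 else 0 for row in X]
    return X, y


def _score(w, b, row):
    """Linear score, clipped so math.exp cannot overflow."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return max(-30.0, min(30.0, z))


def fit_logistic_l2(X, y, l2, epochs=100, lr=0.1):
    """Batch gradient descent on mean logistic loss plus (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-_score(w, b, row)))
            err = p - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        # The l2 * wj term is the gradient of the penalty; it shrinks weights.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of rows whose predicted class (score > 0) matches the label."""
    hits = sum(int((_score(w, b, row) > 0) == (t == 1)) for row, t in zip(X, y))
    return hits / len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy sizes, far smaller than the competition's, to keep this fast.
    X_train, y_train = make_data(50, 40, 5, rng)
    X_test, y_test = make_data(1000, 40, 5, rng)
    for l2 in (0.0, 0.1, 1.0):
        w, b = fit_logistic_l2(X_train, y_train, l2)
        print(f"l2={l2}: train={accuracy(w, b, X_train, y_train):.3f} "
              f"test={accuracy(w, b, X_test, y_test):.3f}")
```

The number to watch is the gap between train and test accuracy for each penalty: a large gap suggests the model has memorised the small training set rather than learned the underlying signal.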

Acknowledgments

I hereby salute the hard work that went into the original competition, created by Phil Brierley. Thank you!
