---
title: It's Generalization That Counts
---
## It's Generalization That Counts

The power of machine learning comes from not having to hard-code or explicitly
define the parameters that describe your training data and unseen data. This is
the fundamental goal of machine learning: to generalize beyond the examples a
learner was trained on.

To test a learner's generalizability, you'll want a separate test data set that
is not used in any way while training the learner. You can create one either by
splitting your full data set into a training set and a test set, or by
collecting more data. If the learner were trained on data that also appears in
the test set, its measured performance would be biased upward, making it look
better than it really is on unseen data.
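
A hold-out split like the one described above can be sketched in a few lines of
plain Python. This is only a minimal illustration under stated assumptions: the
function name `holdout_split`, the 20% test fraction, and the toy `data` list
are all hypothetical and not taken from any particular library.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists.

    Hypothetical helper for illustration; the test portion must never be
    shown to the learner during training.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)                   # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # stand-in for ten labeled examples
train, test = holdout_split(data, test_fraction=0.2, seed=42)
```

Shuffling before splitting guards against any ordering in the original data
(for example, examples sorted by class) leaking into one side of the split.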

One way to estimate how your learner will do on a test data set is to perform
what is called **cross-validation**. This randomly splits your training data
into a given number of subsets (for example, ten) and holds one subset out while
the learner trains on the rest. Once the learner has been trained, the held-out
subset is used for testing. This process of holding out a subset, training, and
testing is repeated as you rotate through the subsets, so that each subset is
used for testing exactly once. The per-subset results are then typically
averaged.
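
The rotation described above might look roughly like the following sketch. It
uses only the standard library, and everything named here is an assumption made
for illustration: `k_fold_indices`, `cross_validate`, the toy "predict the
training mean" learner, and the choice of mean squared error as the score.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold (hypothetical helper)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal subsets
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx]
        scores.append(sum(errors) / len(errors))
    return scores


# Toy learner (an assumption): always predicts the mean training target.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
fold_scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
```

Averaging `fold_scores` gives a single estimate of how the learner might do on
unseen data, while a truly separate test set is still kept aside for the final
check.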

#### More Information:

- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='https://stats.stackexchange.com/a/153058/132399' target='_blank' rel='nofollow'>"How do you use test data set after Cross-validation"</a>