---
title: Data Alone Is Not Enough
---
## Data Alone Is Not Enough

More data alone will not make your learner better. You also have to consider
the machine learning algorithm you use and the other choices you make when
training the model.

> Every learner must embody some knowledge or assumptions beyond the data it's
> given in order to generalize beyond it (Domingos, 2012).

In other words, if you pick a learner only because you've heard it performs
well, without checking that its assumptions fit your problem, collecting more
data won't necessarily help you reach your machine learning goals.

For example, say you have data that depends on time (e.g. time series data)
and you want to use a binary classifier (e.g. logistic regression). Collecting
more time series data might not be the best way to help your learner. A plain
logistic regression treats each observation as independent and ignores the
order of the observations, so it can't use the temporal structure unless you
encode it yourself (e.g. as lagged features).
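
One way to see why more time series data may not help a plain logistic
regression: its training objective, the log-loss, is an average over individual
observations, so it comes out the same (up to floating-point rounding) whether
the rows are in time order or shuffled. The sketch below is a minimal
illustration in plain Python. The toy series, the `log_loss` helper, and the
fixed weight and bias are all hypothetical, made up for this example.

```python
import math
import random


def log_loss(weight, bias, rows):
    """Mean logistic log-loss over (x, y) rows. It never looks at row order."""
    losses = []
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(weight * x + bias)))
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return math.fsum(losses) / len(losses)


rng = random.Random(0)

# Hypothetical toy series: the value drifts upward over time, and the label
# marks whether the observation comes from the second half of the series.
series = [(0.1 * t + rng.gauss(0, 1), 1 if t >= 50 else 0) for t in range(100)]

# The same observations with the time order destroyed.
shuffled = series[:]
rng.shuffle(shuffled)

# Arbitrary (assumed) parameters; any fixed weight and bias would do.
loss_in_time_order = log_loss(0.5, -2.0, series)
loss_shuffled = log_loss(0.5, -2.0, shuffled)

print(loss_in_time_order, loss_shuffled)
```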

This is not to say that more data is useless. Once you've chosen a machine
learning algorithm that suits your problem, adding more data will generally
help.
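
The contrast can be sketched with two simple learners on the same synthetic
data. Everything here is an assumption made up for illustration: the ground
truth `y = x^2` plus noise, the helper names, and the training set sizes. One
learner assumes `y` is linear in `x`, which does not match the data. The other
assumes `y` is linear in `x^2`, which does.

```python
import random


def fit_line(features, targets):
    """Closed-form least squares for targets ~ slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = cov / var_f
    return slope, mean_t - slope * mean_f


def mse(model, features, targets):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    errors = [(slope * f + intercept - t) ** 2 for f, t in zip(features, targets)]
    return sum(errors) / len(errors)


def make_data(rng, n):
    """Hypothetical ground truth: y = x^2 plus a little Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)
test_x, test_y = make_data(rng, 2000)
test_x_squared = [x * x for x in test_x]

results = {}
for n in (10, 100, 10000):
    train_x, train_y = make_data(rng, n)
    # Learner A assumes y is linear in x (wrong for this data).
    wrong = fit_line(train_x, train_y)
    # Learner B assumes y is linear in x^2 (matches how the data was generated).
    right = fit_line([x * x for x in train_x], train_y)
    results[n] = (mse(wrong, test_x, test_y), mse(right, test_x_squared, test_y))
    print(n, results[n])
```

In this setup the test error of the mismatched learner should stay well above
the noise level no matter how much data it sees, while the matched learner's
error should settle near the noise level as the training set grows.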

> Machine learning is not magic; it can't get something from nothing. What it
> does is get more from less... Learners combine knowledge with data to grow
> programs (Domingos, 2012).

#### More Information:

- <a href='https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf' target='_blank' rel='nofollow'>A Few Useful Things to Know about Machine Learning</a>
- <a href='http://www.kdnuggets.com/2015/06/machine-learning-more-data-better-algorithms.html' target='_blank' rel='nofollow'>In Machine Learning, What is Better: More Data or better Algorithms?</a>
- <a href='https://www.quora.com/In-machine-learning-is-more-data-always-better-than-better-algorithms/answer/Xavier-Amatriain?srid=Tds3' target='_blank' rel='nofollow'>In machine learning, is more data always better than better algorithms?</a>