added description and example (#22548)

I added a bit more description and an example of how to use the library.
2018-11-24 11:21:37 +01:00 · 2018-11-24 11:21:37 +01:00 · 5a4260a2e1
parent e68084401d
commit 5a4260a2e1
1 changed files with 43 additions and 1 deletions
--- a/guide/english/data-science-tools/scikit/index.md
+++ b/guide/english/data-science-tools/scikit/index.md
@@ -5,11 +5,53 @@ title: scikit-learn

 Scikit-learn is a popular open-source machine learning library for Python, built off of previous packages like numpy and scipy. There is code available to handle everything from importing data and data cleaning to model preparation and testing. 

+The library contains algorithms for both supervised and unsupervised learning that can be easily integrated into your own code. It also offers helpful tools for model selection (e.g. cross-validation or grid search) and data preparation (e.g. normalization or creation of categorical features).
+
+It is completely open source: [scikit-learn on GitHub](https://github.com/scikit-learn/scikit-learn)
+
+
 ## Installation
+
 To install scikit-learn in a conda environment: `conda install scikit-learn` <br>
 To install scikit-learn with pip: `pip install scikit-learn`

+
+## Usage Example
+
+Suppose K-means clustering is to be performed. The data has already been split into a training set and a test set, stored in the variables *train* and *test*.
+
+First the respective algorithm must be imported:
+```python
+from sklearn.cluster import KMeans
+```
+Then an instance of the estimator (here the algorithm) must be created:
+```python
+km = KMeans()
+```
+Within the parentheses you can set the algorithm's parameters, for example the number of clusters (`n_clusters`) or the maximum number of iterations (`max_iter`). Any parameter that is not specified keeps its default value.
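
As a minimal sketch of passing parameters explicitly: the values below (3 clusters, 100 iterations, 10 initializations, a fixed seed) are assumptions chosen only for illustration and do not come from the guide.

```python
from sklearn.cluster import KMeans

# Hypothetical settings, chosen only for illustration.
km = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=0)

# get_params() returns the estimator's current settings as a dict,
# including defaults for everything that was not passed explicitly.
params = km.get_params()
print(params["n_clusters"], params["max_iter"])
```

Fixing `random_state` makes the centroid initialization reproducible between runs, which helps when comparing different parameter choices.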
+
+After that the estimator must be trained:
+```python
+km.fit(train)
+```
+The training data is used for training. The scikit-learn documentation (e.g. [KMeans](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)) lists the methods available for every algorithm. For example, the cluster labels for the training data can be obtained directly, fitting and predicting in one step, with:
+
+```python
+km.fit_predict(train)
+```
+
+The trained estimator can now be used to assign cluster labels to the test data:
+
+
+```python
+km.predict(test)
+```
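
The steps above can be combined into one runnable sketch. Since the guide does not say where *train* and *test* come from, the example assumes synthetic data generated with `make_blobs` and split with `train_test_split`. The sample count, number of centers and split ratio are likewise illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real data: 300 two-dimensional points around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
train, test = train_test_split(X, test_size=0.25, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
train_labels = km.fit_predict(train)  # fit the model and label the training data
test_labels = km.predict(test)        # assign test points to the learned clusters

print(train_labels.shape, test_labels.shape)
print(km.cluster_centers_)            # one row of coordinates per cluster center
```

The cluster indices themselves are arbitrary: they identify groups, not classes, so the same data may receive different label numbers under a different seed.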
+
+In the case of supervised learning, the quality of the results can be measured against the known target values. scikit-learn offers metrics such as [r2_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html) for regression or [accuracy_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) for classification.
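
A minimal sketch of how these two metrics are called. The true and predicted values below are small hand-made lists used only for illustration. In practice they would come from a labelled test set and a fitted supervised model.

```python
from sklearn.metrics import accuracy_score, r2_score

# Classification: fraction of predictions that match the true labels.
y_true_cls = [0, 1, 1, 0]
y_pred_cls = [0, 1, 0, 0]  # three of four predictions are correct
acc = accuracy_score(y_true_cls, y_pred_cls)

# Regression: coefficient of determination; identical predictions score 1.0.
y_true_reg = [1.0, 2.0, 3.0]
y_pred_reg = [1.0, 2.0, 3.0]
r2 = r2_score(y_true_reg, y_pred_reg)

print(acc, r2)
```

Both functions take the true values first and the predictions second.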
+
+
 ## References

 Scikit-learn main page: http://scikit-learn.org/stable/
-Tutorials: http://scikit-learn.org/stable/tutorial/index.html
+Tutorials: http://scikit-learn.org/stable/tutorial/index.html
+