chore: migrate instructions to learn (#45568)

* chore: migrate instructions to learn

* chore: apply sem's review suggestions

Co-authored-by: Sem Bauke <46919888+Sembauke@users.noreply.github.com>

* chore: apply tom's review suggestion

Co-authored-by: Tom <20648924+moT01@users.noreply.github.com>

Co-authored-by: Sem Bauke <46919888+Sembauke@users.noreply.github.com>
Co-authored-by: Tom <20648924+moT01@users.noreply.github.com>
Naomi Carrigan 2022-04-12 07:20:56 -07:00 committed by GitHub
parent 905018ad35
commit f88b272f79
5 changed files with 204 additions and 21 deletions


@@ -8,16 +8,49 @@ dashedName: book-recommendation-engine-using-knn
# --description--
In this challenge, you will create a book recommendation algorithm using K-Nearest Neighbors.
You will use the Book-Crossings dataset. This dataset contains 1.1 million ratings (scale of 1-10) of 270,000 books by 90,000 users.
You will be [working on this project with Google Colaboratory](https://colab.research.google.com/github/freeCodeCamp/boilerplate-book-recommendation-engine/blob/master/fcc_book_recommendation_knn.ipynb).
After going to that link, create a copy of the notebook either in your own account or locally. Once you complete the project and it passes the test (included at that link), submit your project link below. If you are submitting a Google Colaboratory link, make sure to turn on link sharing for "anyone with the link."
We are still developing the interactive instructional content for the machine learning curriculum. For now, you can go through the video challenges in this certification. You may also have to seek out additional learning resources, similar to what you would do when working on a real-world project.
# --instructions--
In this challenge, you will create a book recommendation algorithm using **K-Nearest Neighbors**.
You will use the [Book-Crossings dataset](http://www2.informatik.uni-freiburg.de/~cziegler/BX/). This dataset contains 1.1 million ratings (scale of 1-10) of 270,000 books by 90,000 users.
After importing and cleaning the data, use `NearestNeighbors` from `sklearn.neighbors` to develop a model that shows books that are similar to a given book. The Nearest Neighbors algorithm measures the distance between instances to determine their "closeness".
Create a function named `get_recommends` that takes a book title (from the dataset) as an argument and returns a list of 5 similar books with their distances from the book argument.
This code:
```py
get_recommends("The Queen of the Damned (Vampire Chronicles (Paperback))")
```
should return:
```py
[
  'The Queen of the Damned (Vampire Chronicles (Paperback))',
  [
    ['Catch 22', 0.793983519077301],
    ['The Witching Hour (Lives of the Mayfair Witches)', 0.7448656558990479],
    ['Interview with the Vampire', 0.7345068454742432],
    ['The Tale of the Body Thief (Vampire Chronicles (Paperback))', 0.5376338362693787],
    ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.5178412199020386]
  ]
]
```
Notice that the data returned from `get_recommends()` is a list. The first element in the list is the book title passed into the function. The second element in the list is a list of five more lists. Each of the five lists contains a recommended book and the distance from the recommended book to the book passed into the function.
If you graph the dataset (optional), you will notice that most books are not rated frequently. To ensure statistical significance, remove from the dataset users with fewer than 200 ratings and books with fewer than 100 ratings.
The first three cells import libraries you may need and the data to use. The final cell is for testing. Write all your code in between those cells.
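If you are unsure where to start, here is one possible shape for a solution. It is only a sketch: `df_ratings` and its columns (`user`, `title`, `rating`) are illustrative assumptions, not names from the starter notebook.

```py
# A sketch, not the official solution. Assumes a cleaned ratings DataFrame
# `df_ratings` with (illustrative) columns 'user', 'title', and 'rating'.
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Rows are book titles, columns are users; missing ratings become 0.
ratings_pivot = df_ratings.pivot_table(
    index='title', columns='user', values='rating'
).fillna(0)

model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(csr_matrix(ratings_pivot.values))

def get_recommends(book=""):
    # The nearest neighbor is always the book itself, so ask for 6 and skip it.
    distances, indices = model.kneighbors(
        [ratings_pivot.loc[book].values], n_neighbors=6
    )
    recommends = [
        [ratings_pivot.index[i], float(dist)]
        for i, dist in zip(indices[0][1:], distances[0][1:])
    ][::-1]  # largest distance first, matching the expected output above
    return [book, recommends]
```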
# --hints--
It should pass all Python tests.


@@ -8,14 +8,98 @@ dashedName: cat-and-dog-image-classifier
# --description--
For this challenge, you will use TensorFlow 2.0 and Keras to create a convolutional neural network that correctly classifies images of cats and dogs with at least 63% accuracy.
You will be [working on this project with Google Colaboratory](https://colab.research.google.com/github/freeCodeCamp/boilerplate-cat-and-dog-image-classifier/blob/master/fcc_cat_dog.ipynb).
After going to that link, create a copy of the notebook either in your own account or locally. Once you complete the project and it passes the test (included at that link), submit your project link below. If you are submitting a Google Colaboratory link, make sure to turn on link sharing for "anyone with the link."
We are still developing the interactive instructional content for the machine learning curriculum. For now, you can go through the video challenges in this certification. You may also have to seek out additional learning resources, similar to what you would do when working on a real-world project.
# --instructions--
For this challenge, you will complete the code to classify images of dogs and cats. You will use TensorFlow 2.0 and Keras to create a convolutional neural network that correctly classifies images of cats and dogs at least 63% of the time. (Extra credit if you get it to 70% accuracy!)
Some of the code is given to you, but some of it you must fill in to complete this challenge. Read the instructions in each text cell so you will know what you have to do in each code cell.
The first code cell imports the required libraries. The second code cell downloads the data and sets key variables. The third cell is the first place you will write your own code.
The structure of the dataset files that are downloaded looks like this (You will notice that the test directory has no subdirectories and the images are not labeled):
```py
cats_and_dogs
|__ train:
    |______ cats: [cat.0.jpg, cat.1.jpg ...]
    |______ dogs: [dog.0.jpg, dog.1.jpg ...]
|__ validation:
    |______ cats: [cat.2000.jpg, cat.2001.jpg ...]
    |______ dogs: [dog.2000.jpg, dog.2001.jpg ...]
|__ test: [1.jpg, 2.jpg ...]
```
You can tweak epochs and batch size if you like, but it is not required.
The following instructions correspond to specific cell numbers, indicated with a comment at the top of the cell (such as `# 3`).
## Cell 3
Now it is your turn! Set each of the variables in this cell correctly. (They should no longer equal `None`.)
Create image generators for each of the three image data sets (train, validation, test). Use `ImageDataGenerator` to read / decode the images and convert them into floating point tensors. Use the `rescale` argument (and no other arguments for now) to rescale the tensors from values between 0 and 255 to values between 0 and 1.
For the `*_data_gen` variables, use the `flow_from_directory` method. Pass in the batch size, directory, target size (`(IMG_HEIGHT, IMG_WIDTH)`), class mode, and anything else required. `test_data_gen` will be the trickiest one. For `test_data_gen`, make sure to pass in `shuffle=False` to the `flow_from_directory` method. This will make sure the final predictions stay in the order that our test expects. For `test_data_gen` it will also be helpful to observe the directory structure.
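For reference, a possible sketch of this cell. It assumes the starter notebook's earlier cells define `train_dir`, `validation_dir`, `PATH`, `batch_size`, `IMG_HEIGHT`, and `IMG_WIDTH`; check the variable names in your copy.

```py
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)
test_image_generator = ImageDataGenerator(rescale=1./255)

train_data_gen = train_image_generator.flow_from_directory(
    train_dir, target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size, class_mode='binary')
validation_data_gen = validation_image_generator.flow_from_directory(
    validation_dir, target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size, class_mode='binary')
# The test directory has no class subfolders, so point at its parent (PATH)
# and select the folder with classes=['test']; shuffle=False keeps the order.
test_data_gen = test_image_generator.flow_from_directory(
    PATH, classes=['test'], target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size, class_mode=None, shuffle=False)
```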
After you run the code, the output should look like this:
```py
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Found 50 images belonging to 1 class.
```
## Cell 4
The `plotImages` function will be used a few times to plot images. It takes an array of images and a probabilities list, although the probabilities list is optional. This code is given to you. If you created the `train_data_gen` variable correctly, then running this cell will plot five random training images.
## Cell 5
Recreate the `train_image_generator` using `ImageDataGenerator`.
Since there is only a small number of training examples, there is a risk of overfitting. One way to fix this problem is by creating more training data from existing training examples by using random transformations.
Add 4-6 random transformations as arguments to `ImageDataGenerator`. Make sure to rescale the same as before.
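One possible set of transformations (the specific arguments and values are illustrative; any reasonable choices work):

```py
# Six example random transformations, plus the same rescaling as before.
train_image_generator = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
```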
## Cell 6
You don't have to do anything for this cell. `train_data_gen` is created just like before but with the new `train_image_generator`. Then, a single image is plotted five different times using different variations.
## Cell 7
In this cell, create a model for the neural network that outputs class probabilities. It should use the Keras Sequential model. It will probably involve a stack of Conv2D and MaxPooling2D layers and then a fully connected layer on top that is activated by a ReLU activation function.
Compile the model passing the arguments to set the optimizer and loss. Also pass in `metrics=['accuracy']` to view training and validation accuracy for each training epoch.
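A minimal sketch of such a model (the number and size of layers are illustrative, not required):

```py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Stack of convolution + pooling layers to extract image features.
    Conv2D(32, (3, 3), activation='relu',
           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    # Fully connected layer on top, activated by ReLU.
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')  # probability that an image is a dog
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```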
## Cell 8
Use the `fit` method on your `model` to train the network. Make sure to pass in arguments for `x`, `steps_per_epoch`, `epochs`, `validation_data`, and `validation_steps`.
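For example (a sketch; it assumes `total_train` and `total_val` were set in cell 2, and the epoch count is a value you can tune):

```py
history = model.fit(
    x=train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=15,
    validation_data=validation_data_gen,
    validation_steps=total_val // batch_size)
```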
## Cell 9
Run this cell to visualize the accuracy and loss of the model.
## Cell 10
Now it is time to use your model to predict whether a brand new image is a cat or a dog.
In this cell, get the probability that each test image (from `test_data_gen`) is a dog or a cat. `probabilities` should be a list of integers.
Call the `plotImages` function and pass in the test images and the probabilities corresponding to each test image.
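One possible sketch (it assumes `batch_size` is at least 50, so the single unlabeled test batch holds every test image):

```py
# Sigmoid outputs near 1 mean "dog", near 0 mean "cat"; round to integers.
preds = model.predict(test_data_gen).flatten()
probabilities = [int(round(float(p))) for p in preds]

# plotImages is the helper defined in cell 4.
plotImages(test_data_gen[0], probabilities)
```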
After you run the cell, you should see all 50 test images, each with a label showing how sure the model is (as a percentage) that the image is a cat or a dog. The accuracy will correspond to the accuracy shown in the graph above (after running the previous cell). More training images could lead to a higher accuracy.
## Cell 11
Run this final cell to see if you passed the challenge or if you need to keep trying.
# --hints--
It should pass all Python tests.


@@ -8,16 +8,30 @@ dashedName: linear-regression-health-costs-calculator
# --description--
In this challenge, you will predict healthcare costs using a regression algorithm.
You are given a dataset that contains information about different people including their healthcare costs. Use the data to predict healthcare costs based on new data.
You will be [working on this project with Google Colaboratory](https://colab.research.google.com/github/freeCodeCamp/boilerplate-linear-regression-health-costs-calculator/blob/master/fcc_predict_health_costs_with_regression.ipynb).
After going to that link, create a copy of the notebook either in your own account or locally. Once you complete the project and it passes the test (included at that link), submit your project link below. If you are submitting a Google Colaboratory link, make sure to turn on link sharing for "anyone with the link."
We are still developing the interactive instructional content for the machine learning curriculum. For now, you can go through the video challenges in this certification. You may also have to seek out additional learning resources, similar to what you would do when working on a real-world project.
# --instructions--
In this challenge, you will predict healthcare costs using a regression algorithm.
You are given a dataset that contains information about different people including their healthcare costs. Use the data to predict healthcare costs based on new data.
The first two cells of this notebook import libraries and the data.
Make sure to convert categorical data to numbers. Use 80% of the data as the `train_dataset` and 20% of the data as the `test_dataset`.
`pop` off the "expenses" column from these datasets to create new datasets called `train_labels` and `test_labels`. Use these labels when training your model.
Create a model and train it with the `train_dataset`. Run the final cell in this notebook to check your model. The final cell will use the unseen `test_dataset` to check how well the model generalizes.
To pass the challenge, `model.evaluate` must return a Mean Absolute Error of under 3500. This means it predicts healthcare costs to within $3500 of the true values, on average.
The final cell will also predict expenses using the `test_dataset` and graph the results.
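A sketch of the preparation steps described above. It assumes the starter cells load a pandas DataFrame named `dataset` and that the categorical columns are `sex`, `smoker`, and `region`; check the actual names in your notebook.

```py
import pandas as pd

# Convert categorical columns to numeric one-hot columns.
dataset = pd.get_dummies(dataset, columns=['sex', 'smoker', 'region'])

# 80/20 train/test split.
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

# Pop off the labels to train and evaluate against.
train_labels = train_dataset.pop('expenses')
test_labels = test_dataset.pop('expenses')
```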
# --hints--
It should pass all Python tests.


@@ -8,14 +8,22 @@ dashedName: neural-network-sms-text-classifier
# --description--
In this challenge, you need to create a machine learning model that will classify SMS messages as either "ham" or "spam". A "ham" message is a normal message sent by a friend. A "spam" message is an advertisement or a message sent by a company.
You will be [working on this project with Google Colaboratory](https://colab.research.google.com/github/freeCodeCamp/boilerplate-neural-network-sms-text-classifier/blob/master/fcc_sms_text_classification.ipynb).
After going to that link, create a copy of the notebook either in your own account or locally. Once you complete the project and it passes the test (included at that link), submit your project link below. If you are submitting a Google Colaboratory link, make sure to turn on link sharing for "anyone with the link."
We are still developing the interactive instructional content for the machine learning curriculum. For now, you can go through the video challenges in this certification. You may also have to seek out additional learning resources, similar to what you would do when working on a real-world project.
# --instructions--
In this challenge, you need to create a machine learning model that will classify SMS messages as either "ham" or "spam". A "ham" message is a normal message sent by a friend. A "spam" message is an advertisement or a message sent by a company.
You should create a function called `predict_message` that takes a message string as an argument and returns a list. The first element in the list should be a number between zero and one that indicates the likelihood of "ham" (0) or "spam" (1). The second element in the list should be the word "ham" or "spam", depending on which is most likely.
For this challenge, you will use the [SMS Spam Collection](http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/) dataset. The dataset has already been grouped into train data and test data.
The first two cells import the libraries and data. The final cell tests your model and function. Add your code in between these cells.
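The function might take this shape (a sketch only; it assumes your trained Keras `model` begins with a `TextVectorization` layer so it accepts raw strings):

```py
import tensorflow as tf

def predict_message(pred_text):
    # The model is assumed to output a sigmoid probability: 0 = ham, 1 = spam.
    prob = float(model.predict(tf.constant([pred_text]))[0][0])
    return [prob, 'spam' if prob > 0.5 else 'ham']
```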
# --hints--
It should pass all Python tests.


@@ -8,14 +8,58 @@ dashedName: rock-paper-scissors
# --description--
For this challenge, you will create a program to play Rock, Paper, Scissors. A program that picks at random will usually win 50% of the time. To pass this challenge, your program must play matches against four different bots, winning at least 60% of the games in each match.
You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-rock-paper-scissors).
We are still developing the interactive instructional part of the machine learning curriculum. For now, you will have to use other resources to learn how to pass this challenge.
# --instructions--
Create a function named `calculate()` in `mean_var_std.py` that uses NumPy to output the mean, variance, standard deviation, max, min, and sum of the rows, columns, and elements in a 3 x 3 matrix.
The input of the function should be a list containing nine numbers. The function should convert the list into a 3 x 3 NumPy array, and then return a dictionary containing the mean, variance, standard deviation, max, min, and sum along both axes and for the flattened matrix.
The returned dictionary should follow this format:
```py
{
  'mean': [axis1, axis2, flattened],
  'variance': [axis1, axis2, flattened],
  'standard deviation': [axis1, axis2, flattened],
  'max': [axis1, axis2, flattened],
  'min': [axis1, axis2, flattened],
  'sum': [axis1, axis2, flattened]
}
```
If a list containing fewer than nine elements is passed into the function, it should raise a `ValueError` exception with the message: "List must contain nine numbers." The values in the returned dictionary should be lists and not NumPy arrays.
For example, `calculate([0,1,2,3,4,5,6,7,8])` should return:
```py
{
  'mean': [[3.0, 4.0, 5.0], [1.0, 4.0, 7.0], 4.0],
  'variance': [[6.0, 6.0, 6.0], [0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 6.666666666666667],
  'standard deviation': [[2.449489742783178, 2.449489742783178, 2.449489742783178], [0.816496580927726, 0.816496580927726, 0.816496580927726], 2.581988897471611],
  'max': [[6, 7, 8], [2, 5, 8], 8],
  'min': [[0, 1, 2], [0, 3, 6], 0],
  'sum': [[9, 12, 15], [3, 12, 21], 36]
}
```
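Since the output format is fully specified, a straightforward implementation might look like this (a sketch; note that the spec's `axis1` corresponds to NumPy's `axis=0`, i.e. column-wise statistics):

```py
import numpy as np

def calculate(numbers):
    if len(numbers) < 9:
        raise ValueError("List must contain nine numbers.")
    matrix = np.array(numbers).reshape(3, 3)
    stats = {
        'mean': matrix.mean, 'variance': matrix.var,
        'standard deviation': matrix.std,
        'max': matrix.max, 'min': matrix.min, 'sum': matrix.sum,
    }
    # axis=0 is column-wise, axis=1 is row-wise; .tolist() and .item()
    # convert NumPy types to plain Python lists and numbers.
    return {
        name: [fn(axis=0).tolist(), fn(axis=1).tolist(), fn().item()]
        for name, fn in stats.items()
    }
```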
The unit tests for this project are in `test_module.py`.
## Development
For development, you can use `main.py` to test your `calculate()` function. Click the "run" button and `main.py` will run.
## Testing
We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
## Submitting
Copy your project's URL and submit it below.
# --hints--
It should pass all Python tests.