diff --git a/guide/english/machine-learning/deep-learning/gradient-descent/index.md b/guide/english/machine-learning/deep-learning/gradient-descent/index.md
index 862822e5631..c2dff760ad8 100644
--- a/guide/english/machine-learning/deep-learning/gradient-descent/index.md
+++ b/guide/english/machine-learning/deep-learning/gradient-descent/index.md
@@ -20,10 +20,10 @@ This is where feature scaling, also called normalization, comes in handy, to mak
 
 Machine learning problems usually requires computations over a sample size in the millions, and that could be very computationally intensive.
 
-In stochastic gradient descent you update the the parameter for the cost gradient of each example rather that the sum of the cost gradient of all the examples. You could arrive at a set of good parameters faster after only a few passes through the training examples, thus the learning is faster as well.
+In stochastic gradient descent you update the parameters using the cost gradient of each example rather than the sum of the cost gradients of all the examples. You can arrive at a good set of parameters after only a few passes through the training examples, so learning is faster as well.
 
 ### Further Reading
 
 * [A guide to Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)
 * [Gradient Descent For Machine Learning](https://machinelearningmastery.com/gradient-descent-for-machine-learning/)
-* [Difference between Batch Gradient Descent and Stochastic Gradient Descent](https://towardsdatascience.com/difference-between-batch-gradient-descent-and-stochastic-gradient-descent-1187f1291aa1)
\ No newline at end of file
+* [Difference between Batch Gradient Descent and Stochastic Gradient Descent](https://towardsdatascience.com/difference-between-batch-gradient-descent-and-stochastic-gradient-descent-1187f1291aa1)
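+
+### Code Example
+
+Below is a minimal sketch of the per-example update described above, written in plain Python for a toy linear model. The data set, learning rate, and squared-error cost here are illustrative assumptions, not fixed choices:
+
+```python
+import random
+
+# Toy data: y = 2x + 1 with a little noise (illustrative values only).
+data = [(x, 2 * x + 1 + random.uniform(-0.1, 0.1)) for x in [i / 10 for i in range(50)]]
+
+w, b = 0.0, 0.0        # parameters of the model y_hat = w * x + b
+learning_rate = 0.05
+
+for epoch in range(20):          # a few passes over the training examples
+    random.shuffle(data)         # visit the examples in a random order each pass
+    for x, y in data:
+        y_hat = w * x + b
+        error = y_hat - y        # gradient of the squared error 0.5 * error**2 w.r.t. y_hat
+        # Update the parameters using the gradient of this single example,
+        # rather than the sum of the gradients over the whole training set.
+        w -= learning_rate * error * x
+        b -= learning_rate * error
+
+print(f"learned w ~ {w:.2f}, b ~ {b:.2f}")   # should approach w = 2, b = 1
+```
+
+Each inner iteration moves the parameters using only one example, which is why a single pass over the data already performs many updates.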