---
title: Backpropagation
---
## Backpropagation
Backpropagation is a subtopic of [neural networks](../neural-networks/index.md).

**Purpose:** It is an algorithm/process with the aim of minimizing the cost function (in other words, the error) of the parameters in a neural network.

**Method:** This is done by calculating the gradients of each node in the network. These gradients measure the "error" each node contributes to the output layer, and during training they are used to adjust the parameters so that the error shrinks.

Backpropagation can be thought of as using the chain rule to compute gradients with respect to the different parameters in a neural network in order to perform iterative updates to those parameters.

Note: Backpropagation, and machine learning in general, requires significant familiarity with linear algebra and matrix manipulation. Coursework or reading on this topic is highly recommended before trying to understand the contents of this article.
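As a tiny illustration of that chain-rule view, here is a minimal sketch for a single sigmoid neuron with a squared-error cost. The variable names, input values, and learning rate are purely illustrative and are not part of the notation used in the rest of the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: a = sigmoid(w * x + b), squared-error cost J = 0.5 * (a - y)^2
x, y = 1.5, 0.0   # a single training example (illustrative values)
w, b = 0.8, 0.1   # parameters to be updated

z = w * x + b
a = sigmoid(z)

# Chain rule: dJ/dw = dJ/da * da/dz * dz/dw
dJ_da = a - y        # derivative of the cost with respect to the activation
da_dz = a * (1 - a)  # derivative of the sigmoid
dz_dw = x            # derivative of the pre-activation with respect to w

grad_w = dJ_da * da_dz * dz_dw
grad_b = dJ_da * da_dz           # dz/db = 1

# One iterative update (gradient descent with an illustrative learning rate)
w -= 0.1 * grad_w
b -= 0.1 * grad_b
```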
### Computation
The process of backpropagation can be explained in three steps.

Given the following:
- m training examples (x, y) on a neural network of L layers
- g = the sigmoid function
- Theta(l) = the transition (weight) matrix from layer l to layer l+1
- a(l) = g(z(l)); an array of the values of the nodes in layer l based on one training example
- z(l) = Theta(l-1) • a(l-1)
- Delta = a set of matrices, one per Theta(l), that accumulate the gradient contributions from each training example
- d(l) = the array of error terms ("deltas") for layer l for one training example
- D = a set of matrices, one per Theta(l), holding the final gradients
- lambda = the regularization parameter for the network

In what follows, M' denotes the transpose of a matrix M.
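To make the notation concrete, here is a small NumPy sketch of the pieces defined above: the sigmoid g, its derivative g', and a forward pass that produces a(l) and z(l) for one example. It is a minimal sketch under simplifying assumptions (bias units are left out, and names such as `forward_propagate` are illustrative).

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    """Derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)

def forward_propagate(x, Theta):
    """Return the lists a(l) and z(l) for one training example x.

    Theta is a list of transition matrices; bias units are omitted
    to keep the sketch short.
    """
    a, z = [x], [x]              # z(1) is just the input, kept as a placeholder
    for theta in Theta:
        z.append(theta @ a[-1])
        a.append(g(z[-1]))
    return a, z
```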
1. Set every entry of each Delta(l) to zero.
2. For each training example t from 1 to m, perform the following:
   - perform forward propagation on example t to compute a(l) and z(l) for each layer
   - compute d(L) = a(L) - y(t)
   - compute d(l) = (Theta(l)' • d(l+1)) .* g'(z(l)) for l from L-1 down to 2, where .* denotes element-wise multiplication and g'(z(l)) = a(l) .* (1 - a(l)) is the derivative of the sigmoid (no error term is needed for the input layer)
   - increment Delta(l) by d(l+1) • a(l)'
3. Divide the Delta matrices by m to obtain the partial derivative matrices D, adding the regularization term to the non-bias entries:
   - D(l) = 1/m • (Delta(l) + lambda • Theta(l)) for the non-bias entries (j ≠ 0)
   - D(l) = 1/m • Delta(l) for the bias entries (j = 0)
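Putting the three steps together, a minimal NumPy sketch might look like the following. It reuses the `g_prime` and `forward_propagate` helpers sketched above, omits bias units, and uses illustrative names throughout, so treat it as an outline of the procedure rather than a complete implementation.

```python
def backpropagate(X, Y, Theta, lam):
    """Return the gradient matrices D(l), one per transition matrix Theta(l).

    X and Y hold the m training examples row by row, lam is the
    regularization term lambda, and the g_prime / forward_propagate
    helpers come from the earlier sketch (bias units are omitted).
    """
    m = len(X)

    # Step 1: set every entry of each Delta(l) to zero
    Delta = [np.zeros_like(theta) for theta in Theta]

    # Step 2: accumulate gradient contributions over the m examples
    for x, y in zip(X, Y):
        a, z = forward_propagate(x, Theta)
        d = [None] * len(a)
        d[-1] = a[-1] - y                           # d(L) = a(L) - y
        for l in range(len(Theta) - 1, 0, -1):      # d(l) for l = L-1 ... 2
            d[l] = (Theta[l].T @ d[l + 1]) * g_prime(z[l])
        for l in range(len(Theta)):
            Delta[l] += np.outer(d[l + 1], a[l])    # Delta(l) += d(l+1) • a(l)'

    # Step 3: average and add the regularization term
    # (in practice the bias columns of Theta are excluded from regularization)
    D = [(Delta[l] + lam * Theta[l]) / m for l in range(len(Theta))]
    return D
```

The returned D(l) matrices would then drive the iterative updates mentioned above, e.g. a gradient-descent step of the form Theta(l) := Theta(l) - alpha • D(l).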
This article should only be understood in the greater context of neural networks and machine learning. Please read the attached references for a better understanding of the topic as a whole.
### More Information
**High-Level:**
* Siraj Raval - [Backpropagation in 5 Minutes](https://www.youtube.com/watch?v=q555kfIFUCM)
* [Backprop on Wikipedia](https://en.wikipedia.org/wiki/Backpropagation)
**In-depth:**
* CS231n Lecture 4: [Introduction to Neural Networks](https://youtu.be/d14TUNcbn1k?t=354)
* [In-depth, wiki-style article](https://brilliant.org/wiki/backpropagation/)
* [Article on computation graphs](http://colah.github.io/posts/2015-08-Backprop/)
* [A Step by Step Backpropagation Example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
* [Andrew Ng's ML Course](https://www.coursera.org/learn/machine-learning/)
If you'd like to learn how to implement a full-blown single (hidden) layer neural network in Python, whilst learning more about the math behind the algorithms used, you can register for [Andrew Ng's Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning).