
Andrew Ng's Neural Networks

What you need to remember:

• np.exp(x) works for any np.array x and applies the exponential function
to every coordinate
• the sigmoid function and its gradient (sketched below)
• image2vector is commonly used in deep learning
• np.reshape is widely used. In the future, you'll see that keeping your
matrix/vector dimensions straight will go toward eliminating a lot of bugs.
• numpy has efficient built-in functions
• broadcasting is extremely useful
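As a quick illustration, here is a minimal numpy sketch of these ideas. The names sigmoid, sigmoid_derivative, and image2vector follow the list above, but the bodies are only one possible implementation:

import numpy as np

def sigmoid(x):
    """Elementwise sigmoid; np.exp works on every coordinate of an np.array."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Gradient of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1 - s)

def image2vector(image):
    """Reshape an image array of shape (length, height, depth) into a column vector."""
    return image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)

x = np.array([1.0, 2.0, 3.0])
print(sigmoid(x))                                # the exponential is applied to every coordinate
print(sigmoid_derivative(x))
print(image2vector(np.zeros((2, 2, 3))).shape)   # (12, 1)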

What to remember:

• Vectorization is very important in deep learning. It provides computational
efficiency and clarity.
• You have reviewed the L1 and L2 loss (sketched below).
• You are familiar with many numpy functions such as np.sum, np.dot,
np.multiply, np.maximum, etc.
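For reference, a small sketch of how the L1 and L2 losses can be written with vectorized numpy calls; the example values are illustrative:

import numpy as np

def L1(yhat, y):
    """L1 loss: sum of absolute differences."""
    return np.sum(np.abs(y - yhat))

def L2(yhat, y):
    """L2 loss: sum of squared differences, written as a dot product."""
    return np.dot(y - yhat, y - yhat)

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y    = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
print(L1(yhat, y))   # 1.1
print(L2(yhat, y))   # 0.43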

What you need to remember:

Common steps for pre-processing a new dataset are:
• Figure out the dimensions and shapes of the problem (m_train, m_test, num_px,
…)
• Reshape the datasets such that each example is now a vector of size (num_px *
num_px * 3, 1)
• Standardize the data
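A minimal sketch of these pre-processing steps, assuming a hypothetical array train_x_orig of shape (m_train, num_px, num_px, 3) filled with random pixel values:

import numpy as np

# Hypothetical raw data: m_train RGB images of num_px x num_px pixels.
m_train, num_px = 209, 64
train_x_orig = np.random.randint(0, 256, size=(m_train, num_px, num_px, 3))

# Reshape: each example becomes one column of size (num_px * num_px * 3, 1).
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T
print(train_x_flatten.shape)    # (12288, 209)

# Standardize: for image data, dividing by the maximum pixel value (255) is enough.
train_x = train_x_flatten / 255.0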

What to remember: You’ve implemented several functions that:

• Initialize (w, b)
• Optimize the loss iteratively to learn the parameters (w, b), by:
– computing the cost and its gradient
– updating the parameters using gradient descent
• Use the learned (w, b) to predict the labels for a given set of examples
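A compact sketch of what these functions can look like for logistic regression. The names initialize, propagate, optimize, and predict follow the list above, but the bodies are illustrative rather than the assignment's exact code:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize(dim):
    """Initialize (w, b) with zeros; this is fine for logistic regression."""
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    """Compute the cost and its gradient for one pass over the data."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                               # activations, shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m   # cross-entropy cost
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return dw, db, cost

def optimize(w, b, X, Y, num_iterations=1000, learning_rate=0.01):
    """Update the parameters iteratively using gradient descent."""
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b

def predict(w, b, X):
    """Use the learned (w, b) to predict 0/1 labels."""
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)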

What to remember from this assignment:

  1. Preprocessing the dataset is important.
  2. You implemented each function separately: initialize(), propagate(), optimize().
    Then you built a model().
  3. Tuning the learning rate (which is an example of a "hyperparameter") can make a
    big difference to the algorithm.
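Building on the sketch above (it reuses initialize, optimize, and predict from there), a model() wrapper and a small learning-rate sweep might look like this; the synthetic data and the accuracy formula are assumptions for illustration:

import numpy as np

# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((4, 50)); Y_train = (X_train[0:1, :] > 0).astype(float)
X_test  = rng.standard_normal((4, 20)); Y_test  = (X_test[0:1, :] > 0).astype(float)

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005):
    """Glue the separately implemented pieces together into one model."""
    w, b = initialize(X_train.shape[0])
    w, b = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    test_accuracy = 100 - np.mean(np.abs(predict(w, b, X_test) - Y_test)) * 100
    return {"w": w, "b": b, "test_accuracy": test_accuracy}

# Tuning the learning rate (a "hyperparameter"): too large can diverge, too small learns slowly.
for lr in (0.01, 0.001, 0.0001):
    print(lr, model(X_train, Y_train, X_test, Y_test, learning_rate=lr)["test_accuracy"])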

What you should remember:

• The weights W[l] should be initialized randomly to break symmetry.
• It is, however, okay to initialize the biases b[l] to zeros. Symmetry is still
broken so long as W[l] is initialized randomly.
• Initializing the weights to very large random values does not work well.
• Initializing with small random values does better (sketched below).
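A minimal sketch of such an initialization; the helper name initialize_parameters_random and the scale factor 0.01 are illustrative choices:

import numpy as np

def initialize_parameters_random(layer_dims, scale=0.01):
    """Small random weights break symmetry; biases can safely start at zero."""
    rng = np.random.default_rng(1)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * scale
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters_random([3, 4, 1])
print(params["W1"].shape, params["b1"].shape)   # (4, 3) (4, 1)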

What you should remember from this assignment:

• Different initializations lead to different results
• Random initialization is used to break symmetry and make sure different
hidden units can learn different things
• Don't initialize to values that are too large
• He initialization works well for networks with ReLU activations.
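For comparison, a sketch of He initialization, which scales each weight matrix by sqrt(2 / n_prev), where n_prev is the size of the previous layer (the helper name is illustrative):

import numpy as np

def initialize_parameters_he(layer_dims):
    """He initialization: scale weights by sqrt(2 / n_prev), which suits ReLU layers."""
    rng = np.random.default_rng(2)
    parameters = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        parameters["W" + str(l)] = rng.standard_normal((n_curr, n_prev)) * np.sqrt(2.0 / n_prev)
        parameters["b" + str(l)] = np.zeros((n_curr, 1))
    return parameters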

Gradient Descent => Stochastic Gradient Descent (SGD) => Mini-Batch Gradient Descent

SGD is equivalent to mini-batch gradient descent where each mini-batch has just 1 example.

What you should remember:

• The difference between gradient descent, mini-batch gradient descent and stochastic
gradient descent is the number of examples you use to perform one update step
(illustrated in the sketch below).
• You have to tune a learning rate hyperparameter α.
• With a well-tuned mini-batch size, it usually outperforms either gradient descent
or stochastic gradient descent (particularly when the training set is large).
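To make that difference concrete, here is a toy sketch that runs one epoch with three different batch sizes; only the loop structure matters, and the tiny linear model with squared-error loss is an assumption for illustration:

import numpy as np

rng = np.random.default_rng(0)
m = 8
X = rng.standard_normal((2, m))          # 2 features, m examples (synthetic)
Y = X[0:1, :] + 0.5 * X[1:2, :]          # linear target, just for illustration
w, b, lr = np.zeros((2, 1)), 0.0, 0.1

def one_epoch(w, b, batch_size):
    """One pass over the data; batch_size controls how many examples feed each update."""
    for start in range(0, m, batch_size):
        Xb, Yb = X[:, start:start + batch_size], Y[:, start:start + batch_size]
        A = np.dot(w.T, Xb) + b                    # simple linear model
        dw = np.dot(Xb, (A - Yb).T) / Xb.shape[1]  # squared-error gradient
        db = np.mean(A - Yb)
        w, b = w - lr * dw, b - lr * db
    return w, b

w_gd, b_gd   = one_epoch(w, b, batch_size=m)   # batch gradient descent: 1 update per epoch
w_sgd, b_sgd = one_epoch(w, b, batch_size=1)   # SGD: m updates per epoch
w_mb, b_mb   = one_epoch(w, b, batch_size=4)   # mini-batch: m / 4 updates per epoch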

What you should remember:

• Shuffling and Partitioning are the two steps required to build mini-batches
• Powers of two are often chosen to be the mini-batch size, e.g., 16, 32, 64, 128.
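A sketch of those two steps as a single function (here called random_mini_batches, with an illustrative body):

import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Build mini-batches in two steps: shuffle the examples, then partition them."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]

    # Step 1: Shuffle X and Y with the same permutation so example/label pairs stay aligned.
    permutation = rng.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]

    # Step 2: Partition into batches of mini_batch_size (the last one may be smaller).
    mini_batches = []
    for start in range(0, m, mini_batch_size):
        mini_batches.append((shuffled_X[:, start:start + mini_batch_size],
                             shuffled_Y[:, start:start + mini_batch_size]))
    return mini_batches

batches = random_mini_batches(np.random.randn(5, 148), np.random.randn(1, 148), 64)
print([b[0].shape[1] for b in batches])   # [64, 64, 20]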

------------- Thank you for reading -------------
