Your machine learning team is building, and planning to operationalize, a deep learning model that recognizes and classifies images of potential security threats to important government officials, using stills taken from live security video.
However, when your team runs mini-batch training of the neural network, the training accuracy oscillates over the training epochs. What is the most probable cause of your training accuracy problem?
A. The validation error has stopped decreasing
B. The mini-batch size is too small
C. The mini-batch size is too large
D. The learning rate is very high

Answer: D
Option A is incorrect.
In deep learning training, choosing the correct number of epochs is important.
The validation error is used to determine how many epochs to run through.
When the validation error stops decreasing, you should stop running training epochs.
The point at which your validation error stops decreasing has no bearing on the oscillation of your training accuracy across epochs.
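To make this early-stopping idea concrete, here is a minimal Keras-style sketch (the framework choice and the parameter values are assumptions, not part of the question): training halts once the validation error stops improving, regardless of what the training accuracy is doing.

```python
import tensorflow as tf

# Stop training once the validation loss has not improved for 3 consecutive
# epochs; the validation error, not the training accuracy, decides how many
# epochs are actually run.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

# Hypothetical usage (train_images, val_images, etc. are placeholders):
# model.fit(train_images, train_labels, epochs=100,
#           validation_data=(val_images, val_labels),
#           callbacks=[early_stop])
```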
Option B is incorrect.
A small mini-batch size is used to help your training process avoid getting stuck at local minima, because the noisier gradient estimates push the weights out of shallow minima.
Having a small mini-batch size won't, by itself, cause the pronounced oscillation in your training epoch accuracy described here.
Option C is incorrect.
A large mini-batch size is used so that the computationally demanding matrix multiplications in your training calculations can be performed efficiently in large, parallel chunks.
Having a large mini-batch size won't cause oscillation in your training epoch accuracy.
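The noise argument behind options B and C can be illustrated with a small NumPy experiment (a sketch on assumed toy data, not part of the question): a gradient computed on a small mini-batch has much higher variance than one computed on a large mini-batch, which is why small batches help escape local minima, while large batches mainly trade memory and speed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 3x + noise. We compare the spread of the gradient
# of the squared loss when it is estimated from mini-batches of two sizes.
x = rng.normal(size=10_000)
y = 3.0 * x + rng.normal(scale=0.5, size=10_000)

def batch_gradient(w, batch_size):
    """Gradient of the mean squared error on one random mini-batch."""
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    return np.mean(2.0 * (w * xb - yb) * xb)

w = 0.0  # current weight estimate
for batch_size in (8, 512):
    grads = [batch_gradient(w, batch_size) for _ in range(1000)]
    print(f"batch_size={batch_size:4d}  gradient std = {np.std(grads):.3f}")

# Smaller batches give noisier (higher-variance) gradient estimates; this noise
# helps escape local minima but is not what produces the large epoch-to-epoch
# oscillation described in the question.
```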
Option D is CORRECT.
A very high learning rate tends to cause oscillation in your training accuracy.
A high learning rate causes your weight updates to be too large, so each update overshoots the minimum of the loss function and the optimizer oscillates around it instead of converging.
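This overshooting is easy to demonstrate with plain gradient descent on a one-dimensional quadratic loss (a minimal sketch; the function and the learning rate values are illustrative, not from the question): below a stability threshold the weight converges smoothly, above it the weight flips sign on every step and diverges.

```python
# Minimal sketch: gradient descent on f(w) = w^2, whose minimum is at w = 0.
# With a small learning rate the iterate converges smoothly; with a learning
# rate above the stability limit each step overshoots and |w| oscillates/grows.

def gradient_descent(lr, steps=10, w=5.0):
    trajectory = [w]
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w^2
        w = w - lr * grad       # standard gradient descent update
        trajectory.append(w)
    return trajectory

print("lr=0.10:", [round(v, 3) for v in gradient_descent(0.10)])  # converges toward 0
print("lr=1.05:", [round(v, 3) for v in gradient_descent(1.05)])  # oscillates and diverges
```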
Reference:
Please see the Amazon SageMaker developer guide titled DeepAR Forecasting Algorithm.
Please refer to the Machine Learning Mastery article titled How to Configure the Learning Rate When Training Deep Learning Neural Networks.
Please review the article titled Hyperparameters in Machine/Deep Learning.
Please refer to the Towards Data Science article titled Hyper-parameter Tuning Techniques in Deep Learning.
To summarize, the most probable cause of the training accuracy problem is option D - the learning rate is very high.
Mini-batch training involves splitting the training data into small batches of samples, which are used to update the weights of the neural network within each epoch. The size of each weight update is controlled by the learning rate hyperparameter. When the learning rate is set too high, every update step is too large: the optimizer repeatedly overshoots the minimum of the loss function, so the training accuracy swings up and down from epoch to epoch instead of improving steadily.
Reducing the learning rate, or decaying it over time with a learning rate schedule, typically smooths out this oscillation and lets training converge. The mini-batch size and the number of epochs can then be tuned separately for speed, memory use, and generalization.
The other options are less likely to cause the training accuracy problem. Option A - the validation error has stopped decreasing - is the usual signal for early stopping and has no bearing on oscillating training accuracy. Option B - the mini-batch size is too small - adds some noise to the gradient estimates but does not by itself produce large epoch-to-epoch swings in accuracy. Option C - the mini-batch size is too large - mainly slows each update and increases memory pressure; it reduces, rather than increases, the variance of the weight updates.
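As a hedged illustration of where these hyperparameters live (the question names no framework, so the Keras-style code, the placeholder CNN, and all values below are assumptions), the fix is applied at the optimizer's learning_rate argument, while batch_size and epochs are set in the training call.

```python
import tensorflow as tf

# Placeholder CNN for image classification; the architecture is illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# A learning rate that is too high (e.g. 1.0) tends to make training accuracy
# oscillate; dropping it by one or two orders of magnitude usually stabilizes it.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# batch_size controls the mini-batch size; epochs controls how many passes are run.
# Hypothetical usage (the training and validation arrays are placeholders):
# model.fit(train_images, train_labels, batch_size=32, epochs=20,
#           validation_data=(val_images, val_labels))
```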