- Increase the number of training iterations
- Make a more complex network – increase hidden layer size
- Initialize weights to a random small value instead of zeros
- Change the tanh activations to ReLU
Ans: (3). I would initialize the weights to small random values first. If every weight starts at the same value, every hidden neuron computes the same output and receives the same gradient update, so the neurons can never learn different features and the network is effectively no better than a linear model. The other choices are reasonable to try as well: increasing the number of hidden units lets the network learn a more complex function, but not before fixing the initialization, since there appears to be enough training data and tanh is generally an adequate non-linear activation.
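The symmetry argument above can be demonstrated with a minimal sketch (all names and shapes here are illustrative, not from the original question): a tiny one-hidden-layer tanh network where, after a gradient step from a constant initialization, every hidden unit's weight vector is still identical, while a small random initialization breaks the tie.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features (toy data)
y = rng.normal(size=(8, 1))          # toy regression targets

def one_step(W1, W2, lr=0.1):
    """One gradient step on mean-squared error for a tanh network."""
    H = np.tanh(X @ W1)              # hidden activations, shape (8, 4)
    err = H @ W2 - y                 # output residual, shape (8, 1)
    dW2 = H.T @ err / len(X)
    dH = err @ W2.T * (1 - H**2)     # backprop through tanh
    dW1 = X.T @ dH / len(X)
    return W1 - lr * dW1, W2 - lr * dW2

# Constant init: all hidden units start identical, get identical
# gradients, and therefore stay identical after the update.
W1c, W2c = one_step(np.full((3, 4), 0.5), np.full((4, 1), 0.5))
print(np.allclose(W1c[:, :1], W1c))  # True: columns of W1 all equal

# Small random init breaks the symmetry, so units can specialize.
W1r, W2r = one_step(0.01 * rng.normal(size=(3, 4)),
                    0.01 * rng.normal(size=(4, 1)))
print(np.allclose(W1r[:, :1], W1r))  # False: columns differ
```

Note that initializing to exactly zero is even worse here: with `W2 = 0` the gradient on `W1` vanishes entirely, so the hidden layer never moves at all.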