I have used a 4-layer fully connected network to learn a complex classification boundary. I use tanh activations throughout, except in the last layer, where I use a sigmoid activation for binary classification. I train for 10K iterations on 100K examples (my data points are 3-dimensional, and I initialized all my weights to 0). My network is unable to fit the training data, and the training error stays high. What should I try first?


  1. Increase the number of training iterations
  2. Make a more complex network: increase the hidden layer size
  3. Initialize weights to small random values instead of zeros
  4. Change tanh activations to ReLU



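Ans: (3). I will initialize the weights to small random values instead of zeros. When every weight starts at the same value, all units in a layer compute the same output and receive the same gradient update, so they never learn different features. With tanh, all-zero weights are even worse: tanh(0) = 0, so every hidden activation is zero, the gradients reaching the hidden weights are zero as well, and only the output bias learns, which means the network can only predict a constant. The other choices are reasonable to try, but only after fixing the initialization, since none of them breaks the symmetry on its own. Increasing the number of units in the hidden layers lets the network learn a more complex boundary, but it looks like there is enough training data, and tanh is generally a good enough non-linear activation function.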
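To see the effect concretely, here is a minimal NumPy sketch, not the original poster's code. It assumes "4 layers" means three tanh hidden layers of 16 units plus a sigmoid output, trains with plain full-batch gradient descent on binary cross-entropy, and uses hypothetical toy data (3-D points labelled by whether they fall inside a sphere). The helper names `init_params`, `forward_backward` and `train`, the layer widths, the learning rate and the `1/sqrt(fan_in)` weight scale are all illustrative choices.

```python
import numpy as np

def init_params(layer_sizes, mode, rng):
    """Build (W, b) pairs; mode is 'zeros' or 'random' (small Gaussian weights)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        if mode == "zeros":
            W = np.zeros((n_in, n_out))
        else:
            # Small random weights with std 1/sqrt(fan_in): one common choice, not the only one.
            W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params

def forward_backward(params, X, y):
    """tanh hidden layers, sigmoid output, mean binary cross-entropy loss."""
    acts = [X]  # acts[i] is the input to layer i
    for W, b in params[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W_out, b_out = params[-1]
    p = 1.0 / (1.0 + np.exp(-(acts[-1] @ W_out + b_out).ravel()))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    delta = ((p - y) / len(y))[:, None]  # dLoss / d(output pre-activation)
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (acts[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            # Back through W, then through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    return loss, grads, acts[1:]

def train(mode, steps=2000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical toy data: label 1 if the 3-D point lies inside a sphere (radius^2 = 0.8).
    X = rng.uniform(-1.0, 1.0, size=(2000, 3))
    y = (np.sum(X ** 2, axis=1) < 0.8).astype(float)
    # Assumed architecture: 3 inputs -> 16 -> 16 -> 16 (tanh) -> 1 (sigmoid).
    params = init_params([3, 16, 16, 16, 1], mode, rng)
    for _ in range(steps):
        loss, grads, hidden = forward_backward(params, X, y)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return loss, params, hidden

loss_zero, params_zero, hidden_zero = train("zeros")
loss_rand, params_rand, hidden_rand = train("random")
print(all(np.all(W == 0) for W, _ in params_zero))  # True: no weight ever moves
print(loss_zero, loss_rand)
```

With zero initialization every weight matrix stays exactly zero and all hidden activations stay zero, so the loss can only drop to that of a constant prediction; the randomly initialized network starts with distinct units and can reduce the loss further.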
