I have used a 4-layer fully connected network to learn a complex classifier boundary. I used tanh activations throughout, except in the last layer, where I used a sigmoid activation for binary classification. I train for 10K iterations with 100K examples (my data points are 3-dimensional, and I initialized my weights to 0). I see that my network is unable to fit the training data, which leads to a high training error. What is the first thing I should try?

Posted on February 14, 2019 by MLNerds

1. Increase the number of training iterations
2. Make a more complex network: increase the hidden layer size
3. Initialize weights to small random values instead of zeros
4. Change tanh activations to ReLU

Ans: (3). I will initialize the weights to small random non-zero values. With all-zero weights, every neuron in a layer computes the same output and receives the same gradient update, so the neurons cannot learn different things. With tanh the situation is even worse: every hidden activation is tanh(0) = 0, so the weight gradients vanish and the network cannot fit a complex boundary no matter how long I train. The other choices are reasonable to try (for example, increasing the number of units in the hidden layers lets the network represent a more complex function), but not until I have fixed the initialization, since there appears to be enough training data and tanh is a good enough non-linear activation function in general.
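To see the effect of the initialization concretely, here is a minimal NumPy sketch that compares the weight gradients of a small tanh network under zero initialization and under small random initialization. The hidden width of 8, the toy labels, the omission of bias terms, and the helper names `weight_gradients` and `init_weights` are all assumptions made for illustration; they are not part of the original question.

```python
import numpy as np

def weight_gradients(weights, X, y):
    """Gradients of mean binary cross-entropy w.r.t. each weight matrix
    of a tanh network with a sigmoid output (biases omitted for brevity)."""
    # Forward pass, keeping the activations of every layer
    activations = [X]
    for W in weights[:-1]:
        activations.append(np.tanh(activations[-1] @ W))
    logits = activations[-1] @ weights[-1]
    probs = 1.0 / (1.0 + np.exp(-logits))

    # Backward pass: delta is dLoss/d(pre-activation) of the current layer
    delta = (probs - y) / len(X)
    grads = []
    for i in range(len(weights) - 1, -1, -1):
        grads.append(activations[i].T @ delta)
        if i > 0:
            # Propagate through the weights and the tanh derivative
            delta = (delta @ weights[i].T) * (1.0 - activations[i] ** 2)
    return grads[::-1]

def init_weights(sizes, scale, rng):
    """Gaussian weights multiplied by `scale`; scale=0.0 gives all zeros."""
    return [scale * rng.standard_normal((m, n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 3))             # 3-dimensional inputs, as in the question
y = (X[:, :1] * X[:, 1:2] > 0).astype(float)  # toy non-linear labels (assumed)
sizes = [3, 8, 8, 8, 1]                       # 4 weight layers; width 8 is assumed

zero_grads = weight_gradients(init_weights(sizes, 0.0, rng), X, y)
rand_grads = weight_gradients(init_weights(sizes, 0.1, rng), X, y)

print("max |grad| per layer, zero init:  ", [float(np.abs(g).max()) for g in zero_grads])
print("max |grad| per layer, random init:", [float(np.abs(g).max()) for g in rand_grads])
```

With zero initialization every weight gradient is exactly zero, so gradient descent never moves the weights; in a network with biases, only the output bias would receive a non-zero gradient, which amounts to predicting a constant. With small random initialization the gradients are non-zero and the symmetry between neurons is broken.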
