MSE Vs Entropy (2)
This is the mean square error representation for a binary classification neural network, in which, our model outputs only one value; as a result, the mean square error graph has only one value which is globally optimal. While the mean square error representation for a categorical neural network , in which, our model outputs more than one value; the mean square error graph has more than one value which is locally sufficient solutions. As a result, if we use the mean square error, the problem will have no solution, and for that reason, we use the cross-entropy loss instead. Finally, here is the cross-entropy graph: