Do We Need Zero Training Loss After Achieving Zero Training Error?
Overparameterized deep networks have the capacity to memorize training data with zero \emph{training error}. Even after memorization, the \emph{training loss} continues to approach zero, making the...
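The distinction the abstract draws can be made concrete: \emph{training error} is the fraction of misclassified examples (zero once every argmax prediction is correct), while the \emph{training loss} (e.g., cross-entropy) remains strictly positive even then, and keeps shrinking as the model grows more confident. The following is a minimal illustrative sketch of that gap, not code from the paper; the toy logits and the confidence scale are assumptions for demonstration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical 3-class toy data: the logits already classify every
# example correctly, so the training error is exactly zero.
labels = np.array([0, 1, 2])
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])

# Scaling the logits mimics a model becoming more confident during
# continued training: error stays at 0%, but the loss keeps falling.
for scale in [1.0, 2.0, 4.0, 8.0]:
    p = softmax(scale * logits)
    error = np.mean(p.argmax(axis=1) != labels)        # 0-1 training error
    loss = -np.mean(np.log(p[np.arange(3), labels]))   # cross-entropy loss
    print(f"scale={scale:4.1f}  error={error:.0%}  loss={loss:.4f}")
```

Running this prints a constant 0% error alongside a monotonically decreasing loss, which is exactly the regime the abstract describes: memorization is complete, yet gradient descent on the loss still has somewhere to go.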