Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence