Stochastic gradient descent optimizer.
SGD performs a parameter update for each training example \(x\) and label
\(y\), where \(\eta\) is the learning rate and \(J\) is the objective:
\[\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x; y)\]
- Parameters:
  lr (Union[float, Scheduler, Variable]) – learning rate.
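
The update rule above can be illustrated with a minimal pure-Python sketch. This is not the optimizer's actual implementation: the helper names `sgd_step` and `squared_error_grad`, the squared-error objective, and the constant learning rate of 0.1 are all assumptions chosen for illustration.

```python
def sgd_step(theta, grad, lr):
    """Apply one SGD update, theta - lr * grad, element-wise.

    Hypothetical helper: theta and grad are equal-length lists of floats,
    lr is a constant float learning rate (eta in the formula).
    """
    return [t - lr * g for t, g in zip(theta, grad)]


def squared_error_grad(theta, x, y):
    """Gradient of the assumed objective J = 0.5 * (theta . x - y)^2 w.r.t. theta."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return [residual * xi for xi in x]


# One training example x with label y; parameters start at zero.
theta = [0.0, 0.0]
x, y = [1.0, 2.0], 3.0

# Repeat the update on the same example until the prediction fits the label.
for _ in range(100):
    theta = sgd_step(theta, squared_error_grad(theta, x, y), lr=0.1)

prediction = sum(t * xi for t, xi in zip(theta, x))
```

With this example the residual halves at every step, because lr times the squared norm of `x` is 0.5, so `prediction` approaches the label `y`. A larger learning rate can overshoot: in this setup any value above 0.4 makes the iteration diverge.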