🚀 Describe the improvement or the new tutorial
It was initially challenging for me to grasp why `requires_grad` is enabled on a separate line after `weights` is created, but passed inline for `bias`, in https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-without-torch-nn
At first glance, the code looks inconsistent:
- `weights` initialization is split into two lines.
- `bias` initialization is done in one line.
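For reference, the code in question from the tutorial (the MNIST example, with 784 inputs and 10 classes):

```python
import math
import torch

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)
```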
The Logic Gap
The tutorial currently explains what is done, but not why the distinction exists between these two specific variables.
- The Bias is created using a factory function (`torch.zeros`) with no subsequent mathematical operations. It is born as a "Leaf Node" (a source parameter).
- The Weights involve a mathematical operation (`/ math.sqrt(...)`). If we set `requires_grad=True` inside `torch.randn()`, PyTorch records the division as a computational step. The resulting `weights` variable becomes a non-leaf node (a calculated outcome), which the optimizer cannot update, as the sketch after this list demonstrates.
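A minimal sketch of the difference; the `is_leaf` checks are my addition, not part of the tutorial:

```python
import math
import torch

# Anti-pattern: the flag is set before the scaling math. The division
# creates a new tensor, so autograd treats `weights` as a computed
# (non-leaf) node rather than a source parameter.
weights = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(weights.is_leaf)  # False

# Tutorial's pattern: finish the math first, then mark the result trainable.
weights = (torch.randn(784, 10) / math.sqrt(784)).requires_grad_()
print(weights.is_leaf)  # True -- a proper trainable parameter

# `bias` needs no post-creation math, so the flag can live in the factory call.
bias = torch.zeros(10, requires_grad=True)
print(bias.is_leaf)  # True
```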
Proposed Improvement
I propose modifying the comment block to state explicitly that `requires_grad` must be deferred until after the initialization math is complete, so that the tensor is preserved as a trainable parameter (Leaf Node). For example:
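A draft of how the comment could read (wording is only a suggestion):

```python
# NOTE: requires_grad is enabled only *after* the division by math.sqrt(784).
# Setting requires_grad=True inside torch.randn() would make autograd record
# the division, leaving `weights` a non-leaf tensor that the optimizer
# cannot update. `bias` has no post-creation math, so the flag can go
# directly in the torch.zeros() factory call.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)
```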
Existing tutorials on this topic
Additional context
