arXiv (CS.AI) 2026-06-17 12:00 DOI: arXiv:2606.17816

Conservation Laws for Modern Neural Architectures

Abstract

arXiv:2606.17816v1 (announce type: cross)

Understanding gradient descent dynamics is key to explaining the success of over-parameterized models, where implicit bias manifests through conservation laws in gradient flow. While such laws are well understood for linear and ReLU networks, they remain largely unexplored for modern architectures. This work develops a unified framework to characterize conservation laws for contemporary models, including feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts architectures under diverse gating designs. Our theoretical findings are supported by experiments that validate the predicted invariants.
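For readers unfamiliar with the term, the classical linear case that the abstract calls well understood can be checked numerically. The sketch below is not the paper's framework and reproduces none of its results. It only illustrates the textbook invariant for a one-hidden-unit linear network f(x) = a * <w, x>, where a^2 - ||w||^2 stays constant under gradient flow on a squared loss. The toy data, the initial weights, the step size, and all function names are assumptions made for illustration, and small-step gradient descent stands in for continuous gradient flow.

```python
"""Toy check of a classical gradient-flow conservation law (illustrative only)."""

# Hypothetical toy data: three 2-D inputs with scalar targets.
XS = [(1.0, 0.5), (-0.5, 1.5), (2.0, -1.0)]
YS = [1.0, 2.0, -0.5]


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def loss_and_grads(a, w):
    """Mean squared-error loss of f(x) = a * <w, x> and its gradients."""
    n = len(XS)
    loss, grad_a, grad_w = 0.0, 0.0, [0.0] * len(w)
    for x, y in zip(XS, YS):
        pre = dot(w, x)      # hidden pre-activation <w, x>
        r = a * pre - y      # residual of the prediction
        loss += 0.5 * r * r / n
        grad_a += r * pre / n
        for j, xj in enumerate(x):
            grad_w[j] += r * a * xj / n
    return loss, grad_a, grad_w


def invariant(a, w):
    """Quantity conserved by gradient flow in this toy model: a^2 - ||w||^2."""
    return a * a - dot(w, w)


def run(steps=20000, lr=1e-4):
    """Run small-step gradient descent as a stand-in for gradient flow."""
    a, w = 0.5, [0.3, -0.2]  # assumed initial weights
    q0 = invariant(a, w)
    loss0, _, _ = loss_and_grads(a, w)
    for _ in range(steps):
        _, ga, gw = loss_and_grads(a, w)
        # Both updates use gradients taken at the same (old) point.
        a -= lr * ga
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
    loss1, _, _ = loss_and_grads(a, w)
    return q0, invariant(a, w), loss0, loss1


if __name__ == "__main__":
    q0, q1, loss0, loss1 = run()
    print("invariant before/after:", q0, q1)
    print("loss before/after:", loss0, loss1)
```

The first-order terms in the change of a^2 - ||w||^2 cancel exactly, so with a finite step size the quantity drifts only at second order in `lr`. Shrinking `lr` while holding `steps * lr` fixed should therefore shrink the drift while the loss still decreases.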
