arXiv (CS.LG) 2026-06-11 12:00 DOI: arXiv:2601.14792

Robustness of Mixtures of Experts to Feature Noise

Abstract

arXiv:2601.14792v2 (replacement)

Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.
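The noise-filtering claim can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: the input is split into blocks, exactly one block per sample carries signal, Gaussian noise is added to every block, and the sparse estimator is routed to the correct block by an oracle. The function name `simulate` and all parameters are hypothetical. Both estimators share the same weights, so the parameter count is equal, and both are exact when the noise is zero.

```python
import numpy as np


def simulate(num_experts=8, block_dim=4, n=5000, sigma=0.5, seed=0):
    """Compare dense vs. sparsely routed linear prediction under feature noise.

    Toy setup (an assumption, not the paper's model): each sample has one
    active block; all blocks receive i.i.d. Gaussian noise of scale sigma.
    Returns (mse_dense, mse_sparse).
    """
    rng = np.random.default_rng(seed)
    # One hypothetical weight vector per module, shared by both estimators.
    W = rng.normal(size=(num_experts, block_dim))
    # Index of the single active latent module for each sample.
    module = rng.integers(num_experts, size=n)
    signal = rng.normal(size=(n, block_dim))

    # Clean input: zeros everywhere except the active block.
    clean = np.zeros((n, num_experts, block_dim))
    clean[np.arange(n), module] = signal
    # Target depends only on the active block.
    y = np.einsum("nd,nd->n", signal, W[module])

    # Feature noise corrupts every block, active or not.
    noisy = clean + sigma * rng.normal(size=clean.shape)

    # Dense estimator: all blocks contribute, so noise from every block adds up.
    dense_pred = np.einsum("ned,ed->n", noisy, W)
    # Sparse estimator: only the routed block contributes (oracle routing assumed).
    routed = noisy[np.arange(n), module]
    sparse_pred = np.einsum("nd,nd->n", routed, W[module])

    mse_dense = float(np.mean((dense_pred - y) ** 2))
    mse_sparse = float(np.mean((sparse_pred - y) ** 2))
    return mse_dense, mse_sparse


if __name__ == "__main__":
    dense, sparse = simulate()
    print(f"dense MSE:  {dense:.3f}")
    print(f"sparse MSE: {sparse:.3f}")
```

In this toy setting the dense error variance scales with the noise summed over all blocks, while the sparse error only sees the noise in the routed block. A learned router would misroute some samples, which this sketch does not model.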
