arXiv (CS.CL) 2026-06-19 12:00 DOI: arXiv:2606.20295

Token-Operations-Oriented Inference Optimization Techniques for Large Models

Abstract

Inference optimization is a key foundation for running large model services at scale, at low cost, and with high stability. Focusing on token-oriented inference optimization, this paper proposes, for the first time, a four-layer technical architecture consisting of Multi-model Fusion, Model Optimization, Compute-Model Fusion, and Compute-Network-Model Fusion. It systematically reviews the key technologies and the current state of the industry at each of these four layers and analyzes the application value of these technologies in real-world business scenarios. The paper offers a practical technical path for reducing token production costs, improving token service efficiency, ensuring a stable token supply, and moving large model services from merely callable to operable.
