Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-22

Deep-Tissue Hemodynamic Sensing: Comparing Impedance and Photoplethysmography for Wearable Blood Pressure Estimation

The pursuit of continuous, cuffless blood pressure (BP) monitoring is constrained by the superficial sensing depth of photoplethysmography (PPG). Impedance plethysmography (IPG) offers deeper tissue penetration, but its comparative value over PPG remains unquantified at scale. In this comparative study of 261 participants (130 hypertensive, 131 non-hypertensive), we utilized a custom dual-modality wearable prototype to capture simultaneous IPG and PPG signals. Over 150,000 cardiac cycles were analyzed using an unsupervised archetype discovery pipeline to quantify beat-to-beat morphological heterogeneity. IPG resolved up to three distinct morphological modes per participant, whereas co-located PPG converged into highly conserved, uniform profiles. IPG captured specific signatures of pathological arterial remodeling and physiological habitus; ventral forearm IPG pulse amplitude exhibited a significant main effect for BP status (p = 0.024), a relationship absent in the co-located PPG signal. Furthermore, increasing body mass index (BMI) significantly attenuated the prevalence of steep-upstroke archetypes in IPG (p = 0.035), quantifying a likely damping effect of adipose tissue. Deep-tissue bioimpedance captures rich, heterogeneous hemodynamic signatures including arterial-dominant morphologies that are invisible to optical sensors. Transitioning from optical pulse wave analysis to bioimpedance-based models may offer a promising pathway for accurate wearable cardiovascular monitoring.

02.
arXiv (quant-ph) 2026-06-16

Communication Complexity of Distributed Unitary Synthesis

arXiv:2511.04250v2 Announce Type: replace Abstract: We study space-bounded communication complexity for unitary implementation in distributed quantum processors, where we restrict the number of qubits per processor to ensure practical relevance and technical non-triviality. We model distributed quantum processors using distributed quantum circuits with nonlocal two-qubit gates, defining the distributed communication complexity of a unitary as the minimum number of such nonlocal gates required for its realization, up to permutations of data qubit positions. Our contributions are twofold. First, for general $n$-qubit unitaries, we improve upon the trivial $O(4^n)$ communication bound. Considering $k$ pairwise-connected processors (each with $n/k$ data qubits and $m$ ancillas), we prove the communication complexity satisfies $O\left(\max\{4^{(1-1/k)n - m}, n\}\right)$ – for example, $O(2^n)$ when $m=0$ and $k=2$ – and establish the tightness of this upper bound. We further extend the analysis to approximation models and general network topologies. Second, for special unitaries, we show that both the Quantum Fourier Transform (QFT) and Clifford circuits admit linear upper bounds on communication complexity in the exact model, outperforming the trivial quadratic bounds applicable to these cases. In the approximation model, QFT's communication complexity reduces drastically from linear to logarithmic, while Clifford circuits retain a linear lower bound. These results offer fundamental insights for optimizing communication in distributed quantum unitary implementation, advancing the feasibility of large-scale DQC systems.

03.
arXiv (CS.CL) 2026-06-11

Benchmarking Large Language Models for Safety Data Extraction

Accurate extraction of structured information from Safety Data Sheets (SDS) remains challenging in industrial safety due to heterogeneous document formats and the limitations of traditional rule-based methods. This study benchmarks state-of-the-art Large Language Models (LLMs) for automated SDS data extraction, comparing text-based and multimodal processing pipelines. We systematically evaluate four models: Gemini 1.5 Pro, GPT-4o, Claude 3.7 Sonnet, and Llama 3.1-70B, across three prompting strategies: zero-shot, few-shot, and chain-of-thought. The evaluation framework assessed accuracy, latency, and cost across more than 50,000 extracted data fields. Results show that text-based extraction consistently outperforms multimodal processing across all metrics. Gemini 1.5 Pro combined with a Chain-of-Thought prompt achieved the highest accuracy (84%), outperforming GPT-4o (81%) and Claude 3.7 Sonnet (79%). However, no model surpassed the 90% accuracy threshold commonly required for reliable real-world deployment. These findings indicate that general-purpose LLMs are not yet robust enough for unsupervised industrial use, though performance suggests strong potential with task-specific fine-tuning. Future research should focus on domain-adapted training, model calibration, and the integration of Human-in-the-Loop verification to ensure safety-critical reliability.

04.
arXiv (CS.AI) 2026-06-16

Safe Exploration via Policy Priors

arXiv:2601.19612v3 Announce Type: replace-cross Abstract: Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g. simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to optimistically explore, yet pessimistically fall back to the conservative policy prior if needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable, outperforms the state-of-the-art and validate our theoretical guarantees in practice.

05.
arXiv (CS.AI) 2026-06-16

LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction

arXiv:2606.15314v1 Announce Type: cross Abstract: Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.

06.
arXiv (CS.AI) 2026-06-11

Certifiable Safe RLHF: Semantic Grounding and Fixed Penalty Constraint Optimization for Safer LLM Alignment

arXiv:2510.03520v2 Announce Type: replace-cross Abstract: Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of model outputs and mitigating their potential for harm is a complex and persistent challenge. Contemporary approaches frequently formalize this problem within the framework of Constrained Markov Decision Processes (CMDPs) and employ established CMDP optimization techniques. However, these methods exhibit two notable limitations. First, their reliance on reward and cost functions renders performance highly sensitive to the underlying scoring mechanism, which must capture semantic meaning rather than being triggered by superficial keywords. Second, CMDP-based training entails tuning dual-variable, a process that is both computationally expensive and does not provide any provable safety guarantee for a fixed dual variable that can be exploitable through adversarial jailbreaks. To overcome these limitations, we introduce Certifiable Safe-RLHF (CS-RLHF) that introduces a cost model trained on a large-scale corpus to assign semantically grounded safety scores. In contrast to the lagrangian-based approach, CS-RLHF adopts a rectified penalty-based formulation. This design draws on the theory of exact penalty functions in constrained optimization, wherein constraint satisfaction is enforced directly through a suitably chosen penalty term. With an appropriately scaled penalty, feasibility of the safety constraints can be guaranteed at the optimizer, eliminating the need for dual-variable updates. Empirical evaluation demonstrates that CS-RLHF outperforms state-of-the-art LLM model responses rendering at-least 5 times efficient against nominal and jail-breaking prompts

07.
arXiv (CS.AI) 2026-06-18

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

arXiv:2605.29649v2 Announce Type: replace Abstract: Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planning researchers. Recent work has shown that large language models (LLMs) can design heuristics for individual planning domains, but no LLM-generated heuristic has so far worked on arbitrary planning tasks. In this paper, we use evolutionary search to produce the first LLM-generated domain-independent heuristics that exceed the hand-engineered state of the art. We let an LLM mutate parent heuristics written in C++, store candidates in a MAP-Elites archive keyed on informedness and speed and calculate fitness scores by blending coverage with solving time. To place the evolved programs in context, we additionally benchmark a broad set of hand-engineered heuristics on their informedness-speed tradeoff, which to our knowledge has not been done before. On unseen testing domains, our best evolved heuristic solves more tasks than even the strongest baseline, with our full heuristic suite spanning the Pareto frontier of said tradeoff. We also find that seeding evolution from the trivial blind heuristic outperforms seeding from the strong FF heuristic, even when the resulting program is itself an FF variant, and that LLM reasoning effort affects how often candidates compile much more than the quality of those that do. Because the evolved programs are plain C++, they slot into existing planners as drop-in replacements and inherit the soundness and completeness guarantees of the underlying search.

08.
arXiv (quant-ph) 2026-06-19

General circuit mapping algorithm for neutral atom quantum computers

arXiv:2606.20503v1 Announce Type: new Abstract: Neutral atom quantum computers (NAQC) are emerging as a promising, scalable quantum computing platform because of their long qubit coherence, flexible qubit arrangement, and multiqubit gate capabilities. However, circuit execution often requires physically moving qubits, making compilation a critical optimization challenge. We propose a circuit independent mathematical framework built on graph-theoretic combinatorial optimization that determines the minimal number of required qubit transfers. This model captures spatial constraints specific to NAQC platforms with zone-limited gate operations and multi-qubit gates. From this framework, we encode the qubit mapping problem as a nonlinear integer program and solve it using a genetic algorithm, enabling trade-offs between minimizing the total traveled distance and the number of parallel transfer operations. Compared to the state-of-the-art scalable compiler for zoned architectures, our approach consistently finds fewer transfers. Depending on the optimization focus, our method produces shorter traveled distances or fewer parallel transfer operations. This work provides both theoretical guaranties and a practical tool for efficient, architecture-aware quantum circuit compilation. As a result, practitioners can generate hardware-aware mappings that reduce movement-induced errors and better exploit atom transfer parallelism, directly improving execution efficiency on NAQC devices.

09.
medRxiv (Medicine) 2026-06-12

Mathematical analysis of the overall survival after chemoradiotherapy of limited-stage small cell lung cancer and the effect of dose/fractionation

The purpose of this work is to analyze the 2-year overall survival (OS2y) of limited-stage small cell lung cancer (LS-SCLC) treated with chemoradiotherapy (CRT), aiming at characterizing the response of LS-SCLC, and in particular the /{beta} value and proliferation parameters. Through a systematic analysis of the literature, we collated a dataset containing 57 entries (3363 patients) of response of LS-SCLC treated with CRT. Radiotherapy schedules ranged from hyper- to hypofractionation. Four radiobiological models to describe the OS2y were investigated, with progressive levels of complexity including the effect of radiotherapy, chemotherapy, treatment year and toxicity. The Akaike Information Criterion (AIC) was used to compare models, and the profile likelihood methodology to compute confidence intervals. Model 4, which includes the effect of radiotherapy, chemotherapy, treatment year and dose-dependent toxicity, provided the best fits of the experimental data (lowest AIC value). While being the best model, model 4 still fails to provide a good prediction of the OS2y, in particular failing to predict the survival of the schedules achieving the lower/higher survivals. The radiobiological analysis of the dose-response of LS-SCLC to CRT does not allow to narrowly constrain the value of response parameters. We attribute this limitation to the large heterogeneity of this disease. Nonetheless, our analysis shows a large /{beta} value (>9 Gy, 95% CI), which implies a low fractionation effect in the radiotherapy of LS-SCLC. and an accelerated proliferation of tumor cells, {lambda}' > 1.6 Gy/day (95% CI), after a kick-off time of ~4-5 weeks, which supports the use of accelerated protocols to avoid the effect of tumor proliferation on the clinical outcome.

10.
arXiv (CS.LG) 2026-06-11

Apertus LLM Family Expansion via Distillation and Quantization

arXiv:2605.29128v2 Announce Type: replace Abstract: The wide adoption of LLMs has led to their use in great variety of applications and scenarios, such as chatbot assistants and data annotation, creating the need for the models to satisfy certain budget and hardware constraints. This has led to the trend of LLMs being released in batches consisting of similar models of various sizes for the family of models to adhere to as wide of a range of constraints as possible. In this paper, we validate distillation and quantization as a cost-effective way to expand model families to new sizes and hardware formats. Based on the open-recipe Apertus 8B LLM, we produce Apertus-v1.1 - a distilled family of models with up to 4B parameters trained on 1.7T permissive license tokens. We demonstrate cost-efficiency and strong accuracy performance of our approach for covering large ranges of hardware and systems requirements.

11.
arXiv (CS.AI) 2026-06-19

Modularity-Free Conflict-Averse Training for Generalized PINNs

arXiv:2606.20156v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) have become a powerful framework for solving PDEs by embedding physical laws into differentiable objectives. Despite their advances, training PINNs remains fragile: recent conflict-averse optimization schemes alleviate gradient interference between residual and boundary losses, but we show that their effectiveness deteriorates as model capacity increases. In this paper, we identify a capacity-induced failure mode, where overparameterized networks undergo functional modularity, self-partitioning into task-exclusive modules that suppress cross-objective interaction and hinder convergence toward Pareto-stationary points. To address this issue, we propose a novel framework, Modular-Sparsity Synchronization (ModSync), which integrates structural optimization into conflict-averse training by penalizing task-exclusive connections while preserving interaction-promoting pathways. Extensive experiments across diverse PDE benchmarks demonstrate that ModSync consistently prevents capacity-driven failures, sustains robust cross-objective coupling, and achieves state-of-the-art accuracy. Codes are available at \url{https://github.com/heejokong/ModSync}.

12.
arXiv (CS.CV) 2026-06-18

Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition

Visual Place Recognition (VPR) is a key component for localisation in Global Navigation Satellite System (GNSS)-denied environments, but its performance critically depends on selecting an image matching threshold (operating point) that balances precision and recall. Thresholds are typically hand-tuned offline for a specific environment and fixed during deployment, leading to degraded performance under environmental change. We propose a method that automatically selects the operating point of a VPR system to maximise recall at 100% precision. The method uses a small calibration traversal with known correspondences and transfers thresholds to deployment via quantile normalisation of similarity score distributions. This quantile transfer ensures that thresholds remain stable across calibration sizes and query subsets. Experiments with seven state-of-the-art VPR techniques across five benchmark datasets demonstrate that our proposed approach consistently outperforms existing baselines, enabling the underlying VPR technique to operate at 100% precision in approximately twice as many deployment scenarios (median improvement), while retrieving up to 29% more correct matches at that precision. The method eliminates manual tuning by adapting to new environments and generalising across operating conditions. Our code is available at https://github.com/DhyeyR-007/Quantile-Transfer-for-Reliable-VPR.

13.
arXiv (CS.CV) 2026-06-11

On the Study of Biometric Spoofing Detection using Deep Learning

Biometric systems are increasingly deployed in security applications; however, they remain vulnerable to spoofing attacks, in which attackers exploit counterfeit biometric data to gain unauthorized access. This research evaluates the effectiveness of state-of-the-art machine learning models, MobileNetV2, DenseNet-121, Inception-v3, and Spoof Trace Disentanglement (STD) in detecting spoofing attacks within facial recognition systems. Using the CelebA-Spoof dataset, the study evaluates model effectiveness using metrics such as accuracy, precision, recall, and F1 Score. Cross-dataset validation is carried out on the MSU-MFSD dataset to assess generalizability. The results show MobileNetV2 as the most efficient model, achieving 92% accuracy while balancing computational effectiveness, making it appropriate for real-life applications. Inception-v3 shows moderate robustness, while DenseNet-121 and STD struggle with generalization. The findings highlight the need for advances in domain adaptation and hybrid architectures to enhance biometric security systems.

14.
arXiv (CS.CV) 2026-06-12

PROBE: Probabilistic Occupancy BEV Encoding with Analytical Translation Robustness for 3D Place Recognition

We present PROBE (PRobabilistic Occupancy BEV Encoding), a learning-free LiDAR place recognition descriptor that models each BEV cell's occupancy as a Bernoulli random variable. Rather than relying on discrete point-cloud perturbations, PROBE analytically marginalizes over continuous Cartesian translations via the polar Jacobian, yielding a distance-adaptive angular uncertainty $\sigma_\theta = \sigma_t / r$ in $\mathcal{O}(R{\cdot}S)$ time. The primary parameter $\sigma_t$ represents the expected translational uncertainty in meters, a sensor-independent physical quantity that enhances cross-sensor generalization while reducing the need for extensive per-dataset tuning. Pairwise similarity combines a Bernoulli-KL Jaccard with exponential uncertainty gating and FFT-based height cosine similarity for rotation alignment. Evaluated on four datasets spanning four diverse LiDAR types, PROBE achieves the highest accuracy among handcrafted descriptors in multi-session evaluation and competitive single-session performance relative to both handcrafted and supervised baselines. The source code and supplementary materials are available at https://sites.google.com/view/probe-pr.

15.
arXiv (CS.LG) 2026-06-18

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

arXiv:2605.25929v2 Announce Type: replace-cross Abstract: The effectiveness of multi-agent LLM deliberation depends not only on the agents' individual predictions, but also on how they communicate and collaborate. We study this mechanism through the lens of Friedkin-Johnsen (FJ) opinion dynamics, a tractable model for analyzing stubbornness, influence, and opinion change in multi-agent systems that captures empirically observed deliberation patterns. We show that the FJ parameters are input-dependent, turning multi-agent deliberation into a mixture of experts. This perspective implies that multi-agent systems can outperform single agents and static ensembles when routing reflects agent competence. Since competence is latent in practice, we analyze how influence is established through observable proxies: agents' self-assessed confidence, their perceived confidence, and initial alignment with other agents' views.

16.
arXiv (CS.CV) 2026-06-16

Power Battery Detection

Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense endpoints of cathode and anode plates from industrial X-ray images for quality inspection. Manual inspection is inefficient and error-prone, while traditional vision algorithms struggle with densely packed plates, low contrast, scale variation, and imaging artifacts. To address this issue and drive more attention into this meaningful task, we present PBD5K, the first large-scale benchmark for this task, consisting of 5,000 X-ray images from nine battery types with fine-grained annotations and eight types of real-world visual interference. To support scalable and consistent labeling, we develop an intelligent annotation pipeline that combines image filtering, model-assisted pre-labeling, cross-verification, and layered quality evaluation. We formulate PBD as a point-level segmentation problem and propose MDCNeXt, a model designed to extract and integrate multi-dimensional structure clues including point, line, and count information from the plate itself. To improve discrimination between plates and suppress visual interference, MDCNeXt incorporates two state space modules. The first is a prompt-filtered module that learns contrastive relationships guided by task-specific prompts. The second is a density-aware reordering module that refines segmentation in regions with high plate density. In addition, we propose a distance-adaptive mask generation strategy to provide robust supervision under varying spatial distributions of anode and cathode positions. The source code and datasets will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/X-ray-PBD}{PBD5K}.

17.
arXiv (CS.AI) 2026-06-16

Odds Law: The Decomposition Algebra On How Intelligence Organizes Itself to Solve Difficult Problems Reliably

作者:

arXiv:2606.15712v1 Announce Type: cross Abstract: We ask a structural question: given unreliable elementary problem-solvers, what organizations of them solve hard problems reliably, and what are the limits? We develop a $decomposition~algebra$: elementary solvers are morphisms in a stochastic category, and four combinators (sequential composition, parallel ensembling, verification gating, and recursive reduction) generate the space of compound solvers. We equip this algebra with two homomorphisms, a $reliability$ valuation into the ordered monoid $([0,1],\le)$ and a $cost$ valuation into a commutative semiring, and we derive the composition laws that govern how reliability flows through structure. Our central results are (i) a $verification~odds~law$ (the result that names this report), showing that a verification gate multiplies the odds of correctness by the verifier's likelihood ratio $\Lambda$, so that $k$ conditionally independent gates yield geometric amplification; (ii) a $reliability~amplification~theorem$, giving target reliability $1-\delta$ at $O(\log 1/\delta)$ verification depth whenever $\Lambda>1$; and (iii) a $threshold~dichotomy$: above the critical parameters reliability can be driven arbitrarily close to one at logarithmic cost, while at or below them no amplification is possible. We then show that $self-organization$ is the least fixed point of a monotone improvement operator on the complete lattice of strategies, and that this fixed point equalizes marginal log-odds gain per unit cost. Finally, we prove matching limits: an information ceiling bounds per-gate amplification by a divergence quantity; shared error causes create a strictly positive voting floor, so diversity is $necessary$ for unbounded amplification. Reliability, in short, is neither free nor magical: it is bought with independent information, arranged by composition, and bounded by the verifier.

18.
medRxiv (Medicine) 2026-06-16

Diurnal variation in brain-derived tau and five other blood-based biomarkers for dementia and their association with cognitive performance

Blood-based biomarkers of dementia are a promising scalable tool for early diagnosis, tracking disease progression, and evaluating therapeutic efficacy. Utility of these biomarkers will not only be dependent on the reliability of their association with pathology but also contingent on their ability to track cognitive status. Previously, we demonstrated diurnal variation in several biomarkers (amyloid beta (A{beta}) 42 and 40, 42/40 ratio, glial fibrillary acidic protein (GFAP), neurofilament light (NfL), and phosphorylated-Tau 217 (p-Tau217)) which has implications for their reliability. Here, we extend these observations to a larger cohort, include brain-derived tau (BD-Tau), which is assumed to be produced exclusively in the brain, and report endocrine measures of circadian rhythmicity. We not only assessed whether these biomarkers vary with time of day, but also whether they associate with daytime function and whether these associations vary with cognitive domain and number of repeated assessments. Data collected in 20 PLWA (72.4{+/-}5.9 years, mean{+/-}SD) and 19 controls (68.9{+/-}9.8 years) were analysed. Participants completed 14 days of home monitoring and one laboratory assessment of sleep and daytime function: mood, daytime sleepiness, reaction time, immediate and delayed memory recall, everyday memory errors. During the 27-hour residential laboratory session, 3-hourly blood samples were collected and analysed for the six blood-based biomarkers of dementia as well as melatonin and cortisol. Rhythmicity of melatonin and cortisol did not differ between groups. P-Tau217 and GFAP (p

19.
arXiv (CS.AI) 2026-06-15

FPGA-Based Neural Network Accelerators for Space Applications: A Survey

arXiv:2504.16173v3 Announce Type: replace-cross Abstract: Space missions are becoming increasingly ambitious, necessitating high-performance onboard spacecraft computing systems. In response, field-programmable gate arrays (FPGAs) have garnered significant interest due to their flexibility, cost-effectiveness, and radiation tolerance potential. Concurrently, neural networks (NNs) are being recognized for their capability to execute space mission tasks such as autonomous operations, sensor data analysis, and data compression. This survey serves as a valuable resource for researchers aiming to implement FPGA-based NN accelerators in space applications. By analyzing existing literature, identifying trends and gaps, and proposing future research directions, this work highlights the potential of these accelerators to enhance onboard computing systems.

20.
arXiv (CS.LG) 2026-06-16

Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

arXiv:2606.16384v1 Announce Type: new Abstract: Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute attention block by block which incurs significant communication overhead. While feasible in high-speed clusters, these methods are impractical for decentralized training over low-bandwidth connections. We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving a remarkable compression rate of over 95\% with negligible overhead and no loss in convergence. Our key insight is to exploit the intrinsic low-rank structure of activation outputs by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. We demonstrate scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300Mbps, matching the wall-clock convergence speed of centralized models on 100Gbps interconnects.

21.
arXiv (CS.CL) 2026-06-18

Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications

Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to domain-specific customization requirements and high latency and inference costs in agentic workflows. We propose a unified framework for customization and efficient deployment of multi-agent systems in real-world settings. The first stage, Agentic Model Customization, combines continual pretraining, supervised fine-tuning, and preference optimization to adapt a compact model to specialized domains while retaining strong agentic capabilities. The second stage, Inference Optimization, integrates speculative decoding and FP8 quantization with targeted calibration to enable cost-efficient serving with minimal quality loss. Across enterprise workloads, our framework enables rapid domain adaptation and achieves a 4.48x speedup in throughput while maintaining performance and improving robustness on long-tail scenarios.

22.
arXiv (CS.CV) 2026-06-17

SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization

Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{RoMa}$ matching, it minimizes memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite ortho-photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc establishes a new state-of-the-art, suppressing the mean localization error to 9.37 m and providing a 7.6-fold accuracy improvement within a strict 5-m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.

23.
arXiv (CS.LG) 2026-06-11

Analytic Bijections for Smooth and Interpretable Normalizing Flows

arXiv:2601.10774v2 Announce Type: replace Abstract: A key challenge in normalizing flows is finding expressive invertible scalar bijections. Existing approaches face trade-offs: affine transformations are smooth and analytically invertible but lack expressivity; monotonic splines offer local control but are only piecewise smooth and act on bounded domains; residual flows achieve smoothness but need numerical inversion. We introduce three families of analytic bijections that are globally smooth ($C^\infty$), defined on all of $\mathbb{R}$, and analytically invertible in closed form, combining the favorable properties of prior approaches. Beyond serving as drop-in replacements in coupling flows, where they match or exceed spline performance, we develop radial flows: a novel architecture using direct parametrization that transforms the radial coordinate while preserving angular direction. Radial flows exhibit exceptional training stability, produce geometrically interpretable transformations, and on targets with radial structure can achieve comparable quality to coupling flows with $1000\times$ fewer parameters. We provide comprehensive evaluation on 1D and 2D benchmarks, and demonstrate applicability to higher-dimensional physics problems through experiments on $\phi^4$ lattice field theory, where our bijections outperform affine baselines and enable problem-specific designs that address mode collapse.

24.
arXiv (quant-ph) 2026-06-15

Universal Speed Limit in a Far-from-Equilibrium Bose Gas: Symmetry and Dynamical Decoherence

arXiv:2605.11895v2 Announce Type: replace-cross Abstract: Predicting universal transport coefficients in far-from-equilibrium quantum systems remains a fundamental challenge. A paradigmatic example is the non-thermal fixed point (NTFP) of isolated Bose gases, where coherence spreads as $\ell^2(t) = C\hbar t/m$ with a universal constant $C$. While the scaling exponent $z=2$ is well established, the amplitude $C$ has remained elusive because the underlying particle cascade $n(k)\sim k^{-4}$ leads to a divergent kinetic energy, threatening the very existence of a constant speed limit. Here we resolve this paradox and present the first analytical, parameter-free prediction of a universal amplitude $C$. A deep interplay between symmetry and dissipation is uncovered. The emergent weak U(1) symmetry at the NTFP enforces a conserved total current, forcing the low-energy phase dynamics to obey a diffusive Langevin equation with noise entering as the divergence of a stochastic current. This structure, combined with dynamical decoherence of high-momentum modes, yields a universal power-law momentum distribution $\tilde{f}(v)\sim(1+v^2)^{-3}$ (with $v=k\ell$) that naturally regularizes the ultraviolet divergence. From this, a parameter-free geometric baseline $C=3$ is obtained, independent of microscopic details. The experimental value $C=3.4(3)$ [Martirosyan et al., Nature 647, 608 (2025)] is then shown to be quantitatively consistent with universal logarithmic corrections arising from a marginally irrelevant coupling at the fixed point. A new paradigm is thus established for predicting transport coefficients in strongly correlated non-equilibrium systems: symmetry constraints determine the low-energy effective theory, dynamical decoherence provides a natural ultraviolet completion, and scaling analysis delivers testable predictions moving beyond scaling exponents to quantitative amplitude prediction.

25.
arXiv (CS.AI) 2026-06-16

SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation

arXiv:2606.16454v1 Announce Type: cross Abstract: Low-Rank Adaptation (LoRA) enables efficient adaptation of large pre-trained models to downstream tasks by parameterizing weight updates with low-rank matrices. In this paper, we investigate the limitations of the LoRA parameterization from a geometric perspective. Specifically, we show that when a full fine-tuning gradient is backpropagated to the low-rank matrices, it undergoes anisotropic scaling driven by their singular values. We argue that this phenomenon is undesirable because it distorts the full fine-tuning gradient by skewing it toward dominant singular directions while suppressing others. Our analyses demonstrate that anisotropic gradient scaling reduces the effective rank of the low-rank matrices' gradients and results in suboptimal alignment between the full fine-tuning gradient and its low-rank approximation in LoRA, thereby exacerbating the gap to full fine-tuning. To address these limitations, we propose a new low-rank parameterization, SDS-LoRA, which structurally decouples singular values from the backward pass. Our method ensures that the full fine-tuning gradient backpropagates only through the orthonormal bases of the low-rank matrices' subspaces, independent of their scales. Convergence analysis demonstrates that while LoRA's convergence rate degrades with the condition number of the low-rank matrices, SDS-LoRA remains independent of it. Experimental results across natural language and vision benchmarks show that SDS-LoRA improves loss convergence and reduces the gap to full fine-tuning, significantly enhancing adaptation performance.