Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Advanced Machine Learning and Deep Learning Techniques for Enhanced Cattle Identification and Detection: A Comprehensive Review

arXiv:2606.15655v1 Announce Type: new Abstract: The need for effective cattle identification technology is now more acutely felt than ever in maintaining biosecurity, food safety, and supply chain efficacy in livestock management. This paper presents a systematic review of recent research in cattle identification using machine learning and deep learning techniques. The present systematic review measures the effectiveness of traditional and modern cattle identification techniques using studies from major academic databases, where articles were subjected to full-text review. Among these techniques, classical Machine Learning Techniques such as K-Nearest Neighbors and Support Vector Machines have demonstrated good results in cattle identification; however, Deep Learning Techniques, such as Convolutional Neural Networks, Residual Networks, and You Only Look Once, are better in cognition, detection, and identification tasks. Feature extraction relies on common techniques like Local Binary Pattern (LBP), Speeded-Up Robust Features (SURF), and Scale-Invariant Feature Transform (SIFT), while key features commonly used in these studies include muzzle prints and coat patterns. The review highlights key hurdles involving cattle identification, such as the limited number of publicly accessible datasets, issues with data quality susceptible to environmental changes and animal mobility, and high demand for real-time processing ability. The paper aims to inform researchers, policymakers, and stakeholders about implementing scalable, humane, and effective cattle identification systems to achieve sustainable livestock management.

02.
arXiv (CS.CV) 2026-06-16

Sustainable Face Recognition on Low-Power Devices with VQ-VAE Embeddings

Face recognition has become a cornerstone of modern AI applications, yet conventional approaches often rely on computationally intensive models deployed in cloud environments, leading to increased network traffic, high energy consumption, and a heavy carbon footprint. This work introduces a sustainable, edge-deployable face recognition framework based on Vector-Quantized Variational Autoencoders (VQ-VAE), which generates compact and semantically rich latent representations of facial images. By leveraging the compression capacity and reconstruction quality of VQ-VAE embeddings on the edge and combining them with the power of pre-trained face embeddings in a knowledge distillation setup, our system achieves comparable accuracy to state-of-the-art face embedding models while significantly reducing memory and computation requirements on the edge, making it suitable for low-power edge devices. The integration of VQ-VAE compression minimizes network overhead while keeping the matching accuracy high by retaining only the most informative facial features in the latent space. As a result, the reconstructed images preserve the key identity characteristics, improving the robustness and overall performance of the face embeddings.

03.
arXiv (quant-ph) 2026-06-17

Quantum statistical functions

作者:

arXiv:2602.05821v2 Announce Type: replace Abstract: Statistical functions such as the moment-generating, characteristic, cumulant-generating, and second characteristic functions are standard tools in classical statistics and probability theory. They provide a systematic means to analyze the statistical properties of a system and find applications in diverse fields. While these functions are ubiquitous in classical theory, a quantum counterpart has remained underdeveloped because of the noncommutativity of operators. The absence of such a framework has obscured the connections between statistical quantities and the nonclassical features of quantum mechanics. Here, we construct a framework for quantum statistical functions that addresses these limitations and unifies the languages of quantum statistics. We show that the functions reproduce standard statistical quantities such as expectation values, variance, and covariance upon differentiation. By extending the framework to include pre- and post-selection, we define conditional functions that generate conditional statistical quantities, including the weak value and the weak variance. We further show that multivariable functions, defined with specific operator orderings, correspond to the Kirkwood–Dirac, Margenau–Hill, and Wigner distributions. By generalizing Bochner's theorem within the theory of compactly supported distributions, we obtain a criterion that separates classical statistics from quantum statistics, linking the failure of positive definiteness of the multivariable function to the emergence of quasiprobability. As an application, we import the classical method of moments and generalized method of moments into quantum estimation, introducing quantum estimators that exploit the proposed functions. Our framework reproduces quantum statistical quantities and incorporates the nonclassical features of quasiprobability, providing a basis for further study of quantum statistics.

04.
arXiv (CS.LG) 2026-06-17

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

arXiv:2602.11557v2 Announce Type: replace Abstract: A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradient. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we further investigate the batch-size-one steepest descent without momentum, and reveal its convergence to a fundamentally different bias via a concrete data example, which reveals a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior, and paves the way for perform deeper explorations of the training behavior of stochastic gradient steepest descent algorithms.

05.
arXiv (quant-ph) 2026-06-19

Propagating Collective Spin-valley Modes in Twisted WSe2

arXiv:2507.18770v2 Announce Type: replace-cross Abstract: The emergence of neutral collective modes is a hallmark of correlated quantum phases but is often challenging to probe experimentally. In two-dimensional flatband systems, charge responses have been intensively investigated yet neutral excitations remain largely unexplored. In particular, intervalley coherent state (IVC) features a neutral Goldstone mode due to spontaneously broken valley U(1) symmetry. While IVC state has been proposed as a unifying theme across graphene and semiconductor based systems, its defining feature, the neutral Goldstone mode, remains elusive in experiment. Here we investigate space and time resolved transport of neutral modes in twisted WSe2 moire superlattices through a novel ultrafast imaging technique. We uncover two new propagating collective modes with very different velocities, which emerge near the van Hove singularity (VHS) in both intermediate (3.5 to 4 degree) and large (around 5 degree) angle twisted WSe2. The fast-propagating mode has a large speed of about 3 km/s and is consistent with a Goldstone mode for an IVC state, while the slow-moving mode is likely a gapped amplitude mode. They can be understood as the spin-valley analogues of collective modes of a superfluid, whose propagation is imaged for the first time in a condensed matter system. Our study demonstrates a powerful new approach for probing charge-neutral modes in quantum materials and offers key insights into the interplay between charge and spin-valley physics in moire superlattices.

07.
arXiv (CS.CL) 2026-06-17

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study this phenomenon from the perspective of GRPO-style reinforcement learning (RL) post-training, framing it as a training-time credit-assignment problem rather than merely a decoding-time stopping problem. In rollouts sampled at the onset of GRPO training, we observe that successful trajectories can exhibit a slightly higher degree of overthinking than unsuccessful trajectories for the same prompts. This early imbalance provides a starting point for an undesirable feedback loop: because GRPO assigns sequence-level credit, it cannot distinguish the solution-reaching prefix from the unnecessary continuation that lengthens a successful trajectory. Both receive positive update signal, allowing the initial imbalance to grow into more severe overthinking during training. To address this issue, we introduce Dynamic Rollout Editing (DRE), a training-time intervention for successful trajectories that continue thinking after answer emergence. DRE preserves the accepted verified prefix, edits the remaining thinking, and prefers the edited trajectory within the same RL group, weakening the preference signal for unnecessary thinking without penalizing the reasoning needed to reach the answer. Experiments across diverse tasks show the effectiveness of DRE.

08.
arXiv (CS.AI) 2026-06-16

AI Contagion in Social Networks

arXiv:2606.15206v1 Announce Type: cross Abstract: We study how artificial intelligence (AI) interacts with social communication networks to shape the stability of collective knowledge. Agents exchange information through a network while receiving AI-generated content, and AI systems retrain on the aggregate social information they influence. This interaction generates two feedback forces: an AI contagion channel, through which distortions diffuse across the network, and an AI social distortion multiplier, through which retraining amplifies past errors. Despite the high dimensionality of the environment, we show that the long-run behavior of the system admits a two-dimensional representation whose spectral radius determines whether AI-mediated information systems are dynamically stable or unstable. We characterize a sharp regulatory frontier identifying the minimum filtering required for stability and show how network topology shapes systemic informational risk.

09.
arXiv (quant-ph) 2026-06-11

Compressed minimum-purity time evolution for late-time quantum dynamics

arXiv:2606.11392v1 Announce Type: cross Abstract: Unitary time evolution of initially simple quantum many-body states rapidly generates entanglement and complex correlations, which limits direct numerical simulations. The late-time dynamics of physical observables, however, typically exhibits an effective simplicity in the form of hydrodynamics or kinetic theory. This leads to the question whether microscopic equations of motion can remain accurate and tractable up to long time scales by discarding irrelevant information in a controlled manner. Here, we introduce compressed minimum-purity time evolution (CoMPuTE) as an approach to keep track of a consistent set of reduced local density matrices, closing the hierarchical equations of motion using a minimum-purity principle. In benchmark applications we demonstrate (i) accurate description of energy diffusion in the one-dimensional mixed-field Ising model, (ii) the applicability to genuinely out-of-equilibrium Floquet dynamics starting from a pure state, and (iii) the limitations of the local reduced density matrix approximation when describing transport in the XXZ chain at $\Delta=1$ that is governed by increasingly non-local integrals of motion. The CoMPuTE method enhances computational efficiency in comparison to the closely related local-information time evolution algorithm, opening a possible route towards an extension to systems in higher spatial dimensions.

10.
arXiv (CS.CV) 2026-06-17

TaFD: Threat-Aware Frequency Decoupling for Adversarial Robustness against Heterogeneous Attacks

Multi-threat robustness remains a fundamental challenge in deep learning. Although joint adversarial training (JAT) is widely adopted, it suffers from negative transfer under heterogeneous threats, particularly between $\ell_p$-bounded and semantic attacks. Through first-order gradient analysis, we formalize this as gradient incompatibility and theoretically establish the necessity of decoupled optimization. We further reveal that these conflicting threats exhibit separable spectral characteristics in the frequency domain. Motivated by this observation, we propose Threat-aware Frequency Decoupling (TaFD), a two-stage defense framework that reformulates JAT as a frequency-domain divide-and-conquer paradigm. TaFD first discovers latent threat domains via unsupervised clustering of attack spectral prototypes and trains a lightweight classifier for inference-time threat domain identification. Conditioned on the prediction, TaFD employs a Frequency-Conditional Convolution that learns threat-domain-specific spectral masks and routes each sample to the corresponding expert, enforcing structural parameter separation and alleviating optimization conflicts. We validate TaFD on three representative image-classification benchmarks (CIFAR-10, CIFAR-100, and Tiny-ImageNet) and on two representative architectures (the convolutional ResNet and the hybrid-transformer MobileViT). Extensive results demonstrate that TaFD achieves more balanced robustness against heterogeneous attacks than existing JAT and frequency-domain baselines, improving average robust accuracy by approximately 11\% over the strongest baseline while maintaining leading clean accuracy.

11.
arXiv (CS.CL) 2026-06-18

Attention as Frustrated Synchronization

A network of oscillators that synchronizes perfectly computes nothing further, so an attention architecture built from synchronization must locate its computation in structured departures from agreement. We introduce the Frustrated Synchronization Network (FSN), whose token states are phases on a torus and whose entire value pathway is one learned complex coupling kernel over harmonics and a one-step delay. Each component of the kernel is a frustration in the sense of the synchronization literature. The complex phases are static Kuramoto-Sakaguchi frustration angles, the signed harmonics are repulsive Daido components, and the delay term, which couples each token to the successors of the tokens it attends to, is algebraically identical to Kuramoto-Sakaguchi coupling whose frustration angle is the data's own transition, so next-token prediction is implemented as synchronization frustrated by the data. At matched one-million-parameter and training budgets on character-level text and code, the FSN's validation loss is below a tuned RoPE-SwiGLU transformer's at every epoch measured, and the comparison survives training the baseline to convergence: every thirty-epoch enwik8 seed finishes below the transformer's converged fifty-epoch loss of 1.611, and the FSN's completed fifty-epoch runs converge to 1.5953 +/- 0.0014. A variant with every feed-forward block replaced by mean-field coupling to learned collective modes, leaving no multilayer perceptron in the stack, tracks the transformer. On natural text the unfrustrated base layer falls behind the converged transformer at every copy depth, worst on long-range copy events; the kernel reverses the deficit at every depth of four and beyond. Headline comparisons are at the one-million-parameter scale; a scale ladder is complete through four million parameters with the advantage persisting, and remaining arms are marked as in progress.

12.
arXiv (quant-ph) 2026-06-17

Tunneling Dynamics and Time Delay in Electron Transport through Time-Dependent Barriers with Finite-Bandwidth Reservoirs

arXiv:2507.20649v2 Announce Type: replace-cross Abstract: We study a model system consisting of a tunneling barrier driven by an external harmonic field and coupled to two leads with finite bandwidth. Avoiding Floquet expansions, we derive simple expressions for the time-dependent tunneling current in the adiabatic regime. Our approach relates the barrier modulation to a measurable time delay in the steady-state periodic current. It provides a physically consistent definition of the tunneling time inside the barrier by subtracting the time delay associated with the leads from the total time delay. We find that the tunneling time always vanishes for wide/high barriers. Remarkably, the time delay persists even when the barrier becomes static, i.e., in the limit where the modulation frequency vanishes. This indicates that the time delay obtained through the introduction of an external periodic perturbation actually reflects an intrinsic property of the tunneling dynamics, rather than an effect of the external drive or of a particular system. We apply our results to the analysis of tunneling times in optical experiments and find good agreement with the experimental data.

13.
arXiv (CS.CL) 2026-06-15

Indirect Computing Model with Indirect Formal Method

作者:

This paper,from the perspective of a collaborative intelligent computing system formed by combining human-computer interface and collaborative computing programs, discusses the principles of optimized cloud computing technology supported by the combination of an indirect computing model and an indirect formal method. On the basis of systematically reviewing the influence of previous theoretical achievements Turing's computability theory,Kleene's formal theory of small strings,von Neumann's digital computer architecture and Turing's hypothesis on AI judgment on the mainstream general-purpose digital computer paradigm,the author focuses on introducing an indirect computing model and an indirect formal theory compatible with both large and small strings. Using Chinese information data as an example,the design concept of a collaborative intelligent computing system prototype is presented. The significance is that this achievement facilitates optimization of cloud computing from data centers to knowledge centers.

14.
arXiv (CS.AI) 2026-06-12

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

arXiv:2606.13602v1 Announce Type: new Abstract: We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations across CUT\&Tag/CUT\&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. Across 5,088 valid trajectories from 16 model-harness pairs, no system passed a majority of attempts: GPT-5.5 / Pi led at 45.0\% (143/318 attempts; 95\% confidence interval (CI), 36.3–53.7), followed by GPT-5.5 / OpenAI Codex at 39.9\% (127/318 attempts; 95\% CI, 31.6–48.3). Claude Opus 4.8 Max / Pi and GPT-5.4 / Pi each passed 39.0\% (124/318 attempts; 95\% CI, 30.2–47.8 and 31.0–47.0, respectively). Performance varies across assay types, and many failed runs still contain parts of the correct answer. Agents often found the right files and computed useful intermediate results, but failed when the task required deeper, assay-specific scientific judgment.

15.
arXiv (CS.CV) 2026-06-17

Remote sensing data imputation using deep learning for multispectral imagery

Remote sensing techniques have been increasingly utilised in aquatic applications in recent years. A common challenge in using optical satellite data is the presence of missing observations due to cloud cover. These data gaps can lead to missed detection of critical events, such as algal blooms, in lakes of high interest to water authorities. As a result, enhancing the completeness of optical satellite datasets is crucial for improving the monitoring and prediction of algal blooms. In this study, we compared a traditional data imputation method (i.e., linear interpolation) with deep learning models for reconstructing missing spectral bands across four lakes with historical records of algal blooms. The deep learning models adopted include CNN-based architectures (i.e., CNN, Inception Resnet, and Autoencoder) and CNN-LSTM-based architectures (i.e., CNN-LSTM, Resnet-LSTM, and Autoencoder-LSTM). Our results demonstrated that deep learning models substantially outperformed the baseline linear interpolation method in imputing spectral band values within artificially masked regions. Among these models, CNN delivered the best performance across most lakes. Furthermore, we evaluated the performance of algal bloom indices (i.e., Green/Red and NDCI) derived from the imputed imagery by comparing them with the observed data. Our results demonstrate that deep learning models are effective for imputing missing data in PlanetScope SuperDove imagery, enabling more reliable applications in water monitoring.

16.
arXiv (CS.CV) 2026-06-11

Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

Multi-modal large language models (MLLMs) depend on in-context learning (ICL) for rapid task adaptation, but their scalability is severely limited by finite context windows and the growing cost of key-value (KV) caches in long multi-modal sequences. Existing memory compression approaches typically rely on rigid token removal or sample-dependent importance estimation, which introduces bias, disrupts semantic structure, particularly for visual representations, and yields static memories that cannot adapt to new queries. We introduce TASM (Task-Aware Structured Memory), a training-free framework that addresses these limitations through task-aware, structure-preserving, and dynamically accessible memory construction. TASM employs task-vector guided compression to replace sample-specific signals with a task-level direction that captures shared relevance across demonstrations. To preserve the underlying manifold, it applies semantics-aware token merging via bipartite graph matching, aggregating tokens without destructive pruning. Finally, TASM structures memory into a hierarchy comprising a compact Core Memory and a Latent Bank, facilitating query-adaptive dynamic retrieval. Evaluations confirm TASM maintains high performance under heavy compression, effectively balancing efficiency with adaptability.

17.
arXiv (CS.AI) 2026-06-15

STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

arXiv:2604.09737v2 Announce Type: replace-cross Abstract: Structured prediction with large language models requires outputs that are label-accurate, ontology-constrained, structurally valid, and evidence-grounded under label imbalance and heterogeneous group difficulty. We present a unified framework for ontology-constrained generation. First, we introduce a modular prompt-engineering architecture combining XML-style structure, expert disambiguation rules, chain-of-thought reasoning, metadata-aware decision logic, schema contracts, and a self-validation gate. It targets recurrent in-context failures, including format drift, label ambiguity, evidence hallucination, and metadata-conditioned confusion. Second, we propose STaR-DRO, combining Tsallis mirror ascent, sparse entmax-style primal mapback, EMA-smoothed group-loss tracking, rescaled ascent signals, and bounded excess-only multipliers. Unlike conventional DRO, which relies on dense Shannon-entropy exponentiated-gradient updates, can introduce high-variance stochastic reweighting, assigns positive adversarial mass to groups that are not persistently hard, and incurs costs through simplex competition, STaR-DRO upweights only persistently hard groups without suppressing easier ones. We evaluate the framework on EPPC Miner, a clinically grounded high-stakes structured-prediction task requiring hierarchical label prediction and evidence-span extraction from patient-provider secure messages. Across 1B-70B Llama models, prompt engineering improves zero-shot extraction, yielding an average label F1 gain of +14.46 and a Span F1 gain of +17.40. Building on supervised fine-tuning, STaR-DRO further improves accuracy and robustness, increasing average label F1 by +1.08 and +2.20 while reducing mean groupwise validation cross-entropy by 21.3% and 14.8% relative to SFT and standard DRO, respectively. These results advance reliable automated communication mining for patient-centered clinical care analysis.

18.
arXiv (CS.AI) 2026-06-17

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

arXiv:2603.22372v2 Announce Type: replace-cross Abstract: Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods. Code is available at: https://github.com/seunghan96/cfa.

19.
arXiv (quant-ph) 2026-06-19

Quantum deformations of $\mathcal{U}(\mathfrak{sl}(2, \mathbb{R}))$. Part I: Fidelity and experimental benchmarking

arXiv:2606.19462v1 Announce Type: new Abstract: This work explores the effects of both the standard quantum $q$-deformation and the non-standard $h$-deformation of the Hopf algebra $\mathcal{U}(\mathfrak{sl}(2, \mathbb{R}))$ on multi-qubit systems. By constructing the states of a Hilbert space of $N$ qubits through the Clebsch-Gordan coefficients associated with the deformed algebras, we show that these states naturally coincide with the eigenstates of the Hamiltonian of the $q$- and $h$-deformed Kittel-Shore models. We compare the resulting deformed states with those typically targeted in quantum information experiments, providing a bridge between algebraic constructions and experimentally relevant quantum resources. Fidelities with respect to the undeformed states are computed to establish how the quantum correlations are affected, both for few-qubit systems (including Dicke and non-Dicke states), and in the macroscopic limit ($N \to \infty$) through closed-form formulas derived for arbitrary Dicke states. The results reveal different behaviors between the two deformations. The $q$-deformation smoothly modifies the states and maintains a residual overlap with the original configurations, while the $h$-deformation rapidly makes the states orthogonal to their undeformed counterparts. Both models demand a standard $N^{-1}$ rescaling to preserve fidelity stability in the macroscopic limit.

20.
arXiv (quant-ph) 2026-06-12

Kerr-induced nonreciprocal transparency and group delay in a hybrid cavity magnomechanical system

arXiv:2606.13412v1 Announce Type: new Abstract: We propose a scheme for realizing nonreciprocal transparency, Fano resonances, and slow/fast light in a hybrid cavity magnomechanical system containing two YIG spheres and a mechanical resonator. The nonreciprocal behavior originates from the magnon Kerr nonlinearity, which induces direction-dependent frequency shifts and modifies the interference pathways among cavity photons, magnons, and phonons. We show that the hybrid system supports multiple transparency windows arising from magnon- and magnomechanical-induced interference processes. The Kerr interaction strongly reshapes these transparency features, producing asymmetric Fano line shapes and enabling controllable nonreciprocal transmission. Furthermore, the associated dispersion exhibits pronounced directional asymmetry, leading to giant differences in the group delay for opposite propagation directions and allowing reversible switching between slow- and fast-light regimes. We investigate the roles of hybrid coupling strengths and dissipation channels and identify parameter regimes where the nonreciprocal response is maximized. These findings establish Kerr-engineered magnomechanical systems as promising platforms for integrated nonreciprocal microwave photonics and quantum information technologies.

21.
arXiv (CS.CL) 2026-06-16

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing

Modeling long-range dependencies remains a central challenge in natural language processing. Transformer architectures achieve strong performance via self-attention but scale quadratically ($O(N^2)$) with sequence length, while State Space Models (SSMs) scale linearly ($O(N)$) but suffer from a selective recall bottleneck, struggling to retrieve precise information from compressed states. This creates a fundamental tradeoff between efficiency and perplexity. To tackle these challenges, we propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) as independent parallel branches fused by a learnable mixing mechanism. Instead of forcing SSMs to approximate attention or serializing the two paradigms, PHA allows each branch to specialize: GSS captures global context, while attention performs selective retrieval, with FFN providing complementary processing. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming Hedgehog (16.70) and H3-125M (23.70). Scaling to 180M parameters yields 16.42 PPL, which gives comparable results with the pure attention baseline while delivering 24\% higher throughput and up to 40\% lower memory usage at long contexts. On OpenWebText, our 125M model achieves 19.72 PPL, outperforming standard Transformers (20.60) and GSS hybrid baselines (19.80). These results demonstrate that separating sequence modeling paradigms into parallel specialists enables Transformer-level perplexity with substantially improved efficiency for long-context language modeling.

22.
arXiv (CS.LG) 2026-06-12

Limits of spectral learning under noise

arXiv:2606.13067v1 Announce Type: new Abstract: Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.

23.
medRxiv (Medicine) 2026-06-17

Method comparisons for differentiation of Schizophrenia and Bipolar based on rs-fMRI Intrinsic and Functional Networks

Psychosis as a symptom manifests in schizophenia and bipolar disorder, two highly heterogeneous psychiatric illnesses with overlapping clinical manifestations. Resting-state functional Magnetic Resonance Imaging (rsfMRI), represents a promising tool for identifying objective biomarkers of functional brain alterations to aid differential diagnosis. In this work, we comparatively evaluate multiple rs-fMRI representations for differentiating schizophrenia and bipolar disorder using intrinsic connectivity network (ICN) temporal profiles and several functional network connectivity (FNC) approaches, including static, dynamic, and high-order connectivity analyses. The study was conducted on a cohort of 371 subjects with psychosis, while evaluation was performed using a separate held-out cohort of 315 subjects. We investigated convolutional neural network architectures applied to ICN temporal profiles, spectrograms, and scalograms, alongside classical machine learning models trained on connectivity-derived features. Across the evaluated approaches, ICN temporal profiles provided the most consistent discriminative performance, with a 1D convolutional neural network achieving the strongest overall results under the benchmark protocol. Among connectivity-based methods, static functional connectivity generally outperformed dynamic and high-order representations, suggesting that increased representational complexity did not necessarily translate into improved generalization. Although the obtained classification performance remained modest, the results highlight the challenges of robust psychosis differentiation using rs-fMRI while emphasizing the relative stability of low-order connectivity representations and temporal ICN features. These findings contribute to ongoing efforts toward reproducible and interpretable neuroimaging biomarkers for psychiatric disorders.

24.
arXiv (CS.CL) 2026-06-16

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Pretraining large language models (LLMs) typically requires centralized clusters with thousands of high-memory GPUs (e.g., H100/A100). Recent decentralized training methods reduce communication overhead by employing federated optimization; however, they still need to train the entire model on each node, remaining constrained by GPU memory limitations. In this work, we propose SParse Expert Synchronization (SPES), a memory-efficient decentralized framework for pretraining mixture-of-experts (MoE) LLMs. SPES trains only a subset of experts per node, substantially lowering the memory footprint. Each node updates its local experts and periodically synchronizes with other nodes, eliminating full-parameter transmission while ensuring efficient knowledge sharing. To mitigate limited per-expert data utilization under sparse expert updates, we introduce an expert-merging warm-up strategy, where experts exchange knowledge early in training, to rapidly establish foundational capabilities. With SPES, we train a 2B-parameter MoE LLM using 16 standalone 48GB GPUs over internet connections, which achieves competitive performance with centrally trained LLMs under similar computational budgets. We further demonstrate scalability by training a 7B model from scratch and a 9B model upcycled from a dense checkpoint, both of which match prior centralized baselines. Our code is available at https://github.com/zjr2000/SPES.

25.
arXiv (CS.LG) 2026-06-12

WHAR Arena: Benchmarking the State of the Art in Efficient Wearable Human Activity Recognition

arXiv:2606.13194v1 Announce Type: new Abstract: Deep learning has become the dominant paradigm in Wearable Human Activity Recognition (WHAR), yet progress is obscured by a comparability crisis. Results are often reported using inconsistent datasets, custom data processing, and varying evaluation protocols, making state-of-the-art claims fragile. We address this with a large-scale, open-source benchmark that integrates 30 diverse datasets under standardized processing, unified model interfaces, and a shared cross-subject evaluation protocol. Evaluating 17 representative architectures across 4760 training runs, we jointly measure predictive performance alongside on-device latency, peak memory, and model size on an Android reference device. Our results reveal that the WHAR state of the art is distributed rather than dominated by a single architecture. While CNN-HAR achieves the highest mean macro-F1, top-performing models cluster tightly, indicating contemporary architectures have converged near a predictive performance ceiling. When accounting for deployment efficiency, compact neural models, such as TinierHAR, and classical Random Forests define the practically relevant Pareto frontier, whereas larger recurrent and hybrid models incur high hardware costs without corresponding performance gains. Consequently, while predictive performance has plateaued, substantial potential for future progress remains in optimizing deployment efficiency and improving adaptation to domain shifts. We release our full framework to support transparent reuse and extension.