Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-12

Quantum optical photoelectron interferometry

arXiv:2606.13447v1 Announce Type: new Abstract: We present a general theoretical framework for multiphoton processes driven by quantum light fields, establishing a direct link between photon statistics and photoelectron observables. Our results show that the autocorrelation and cross-correlation functions, which quantify the underlying photon statistics, are directly mapped onto the resulting photoelectron spectra. Although our framework is broadly applicable, we demonstrate specifically in the example of reconstruction of attosecond beating by interference of two-photon transitions (RABBIT) the influence of the light statistical properties. In this approach, the amplitude, contrast and phase of the oscillations of the sideband signal as a function of pump-probe delay reveal the quantum nature of light. We analyze these observables across several quantum configurations, including correlated infrared and harmonic modes, as well as the uncorrelated case with non-classical harmonic statistics, thereby establishing a general framework for quantum-light RABBIT spectroscopy. We compare the analytical theory with numerical simulations for the case of classical harmonics and an infrared field in a squeezed coherent state, obtaining excellent agreement. Our results reveal how the interplay between classical and quantum correlations dictates the coherence of the photoemission process, providing a new window into the quantum-optical foundations of attosecond science.

02.
arXiv (CS.CV) 2026-06-16

CPS4: Class Prompt driven Semi-Supervised Spine Segmentation with Class-specific Consistency Constraint

Vision Language Model (VLM) has great potential to enhance the quality of pseudo labels in semi-supervised spine segmentation by leveraging textual class prompts to generate segmentation map, but no one has studied it yet. Although promising, it lacks explicit constraints to ensure consistency between spine class prompts and spine unit region, resulting in unsatisfactory performance in multi-class segmentation map generation. In this paper, we propose CPS4, the first text-guided semi-supervised spine segmentation network using class prompts to enhance the quality of spine pseudo labels. Specifically, CPS4 is implemented through two training stages. (i) Class-specific consistency constrained VLM pretraining stage: we propose token- and pixel-level attention loss to optimize the consistency between class prompts and spine units, forcing the textual class prompt to be closely coupled with the target spine unit in the semantic space. (ii) Class Prompt driven semi-supervised spine segmentation stage: using the pretrained vision-text encoder, we derive each class-specific binary segmentation map for the unlabeled spine image and integrate them into an unified multi-class segmentation map, improving the quality of the spine pseudo label generated by the semi-supervised spine segmentation network. Experimental results show that our CPS4 achieves superior spine segmentation performance with Dice of 80.44%, only using 5% labeled data on the public spine segmentation dataset, surpassing popular semi-supervised learning and VLM methods. Our code will be available.

03.
arXiv (CS.CV) 2026-06-18

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

04.
arXiv (CS.AI) 2026-06-16

SMEPilot: Characterizing and Optimizing LLM Inference with Scalable Matrix Extensions

arXiv:2606.16332v1 Announce Type: cross Abstract: Modern CPUs increasingly integrate matrix extensions, such as Arm Scalable Matrix Extension (SME), that provide high-throughput matrix execution within the CPU. For LLM inference, however, these units are not a universal replacement for conventional CPU cores: prefill, decode, attention, and KV-cache operations expose different arithmetic intensities, vector behavior, and layout requirements, while SME units and CPU cores still compete for shared memory bandwidth. This paper studies this mismatch through a roofline-based characterization of SME-enabled CPUs and uses the resulting model to guide operator-level execution choices. We present SMEPilot, an LLM inference engine that selects CPU-only, SME-only, or cooperative SME+CPU execution for each operator shape. SMEPilot partitions matrix work across SME and CPU cores at tile granularity, overlaps SME-suitable matrix stages with CPU-suitable vector stages in attention, and maintains layout state so packed tensor representations are reused rather than repeatedly rebuilt on critical paths. Across Llama-3.2-3B, Qwen3-4B, and Qwen3-30BA3B on phone, PC, and server platforms, SMEPilot improves end-to-end inference performance by up to 3.94$\times$.

06.
arXiv (CS.LG) 2026-06-11

Time-multiplexed layer reuse for physical neural networks

arXiv:2511.00044v3 Announce Type: replace Abstract: Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.

07.
arXiv (CS.LG) 2026-06-16

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

arXiv:2606.15271v1 Announce Type: cross Abstract: This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods – Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation – and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, which is honestly reported. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.

08.
arXiv (CS.LG) 2026-06-11

Learning from almost nothing: How neural networks survive heavy input corruption

arXiv:2606.11319v1 Announce Type: new Abstract: Learning from imperfect data is a central theme in machine learning, connecting practical questions of robustness to fundamental questions of learnability. Here we examine attribute noise: learning from corrupted inputs while keeping the labels intact, a setting that has received considerably less analytical attention than its label-noise counterpart. We consider two types of corruption models: additive noise and replacement noise. Through experiments with multi-layer perceptrons (MLPs) on corrupted classification datasets, we find that neural networks remain robust, maintaining well-above-chance accuracy even when inputs are >90% corrupted – far beyond human recognition. To understand this robustness, we analyze infinite-width networks in the heavy-corruption regime using a mean-field-inspired approach and derive a leading-order decision rule for the classification outcome: the network implements a prototype rule, the nearest-class-mean, assigning each test point to the class whose training-set average it most closely resembles. This leading-order decision rule is universal across a broad range of MLP architectures, holding for any depth, as well as a wide class of activation functions and noise distributions. The same centroid mechanism closely matches finite-width network behavior in our experiments and provides an interpretable and analytically tractable account of why learning can succeed even when individual training examples carry almost no signal.

09.
arXiv (math.PR) 2026-06-15

Boltzmann-Like Occupation of Nonequilibrium Steady States on Dense Networks

Authors:

arXiv:2606.14542v1 Announce Type: cross Abstract: A central problem in statistical physics is to extend the Boltzmann distribution to nonequilibrium steady states (NESS). We prove that NESS on large dense networks have Boltzmann-like occupation despite extensive entropy production. We further show that the active-matter heuristic of "low rattling" is asymptotically exact. Intuitively, these NESS spend a greater fraction of their time in states they leave more slowly. This explanation extends to the broader class of "equiaccessible" steady states, which play a role in our analysis akin to that of equilibrium in linear response.

10.
arXiv (CS.AI) 2026-06-11

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

arXiv:2606.09426v2 Announce Type: replace Abstract: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often evaluate these interfaces as separable capabilities, leaving long-horizon cross-interface orchestration under-tested. Thus, we introduce WeaveBench, a long-horizon hybrid-interface benchmark with 114 tasks across 8 real-world work domains, grounded in real user requests and publicly verifiable artifacts. Each task requires agents to combine GUI observations/actions with CLI/code operations within a single trajectory. We evaluate these tasks on a real Ubuntu desktop inside deployed CLI-agent runtimes, augmented with a minimal desktop-control plugin. We also propose a companion trajectory-aware judge that inspects deliverables, files, screenshots, logs, and action traces, while detecting shortcut behaviors such as fabricated visual evidence or hard-coded metrics. Across frontier model-runtime pairings, the best PassRate reaches only 41.2%, showing the benchmark remains far from saturated. The trajectory-aware judge further reveals that outcome-only grading substantially overestimates agent performance. Overall, WeaveBench exposes a critical gap in CUA evaluation and provides an effective testbed to measure whether agents can orchestrate GUI, CLI, and code operations across long-horizon real-world tasks.

11.
arXiv (CS.CV) 2026-06-17

Do We Really Need Diffusion? A Fast U-Net for Paired Medical Image Translation

Magnetic resonance imaging-signal fat fraction (MRI-SFF) quantifies tissue fat and serves as an established biomarker for metabolic and musculoskeletal disorders. The acquisition requires, however, specialized MRI sequences, which are not available routinely. We investigate whether SFF can be estimated from widely available T2-weighted (T2w) MRI via image-to-image translation (I2I). We further compare a lightweight 4-level U-Net to a state-of-the-art Denoising Diffusion Probabilistic Model (DDPM) using a dataset of 230 048 paired 2D images (183 517 train, 23 621 val, 22 910 test) from the German National Cohort (NAKO). Both models clearly outperform the identity baseline (Pearson correlation r = 0.769, mean absolute error MAE = 0.070 +/- 0.054), which confirms that the models learn a non-trivial cross-modal mapping. Interestingly, the lightweight U-Net outperforms the DDPM in both correlation (r = 0.975 vs. 0.962) and error (MAE = 0.014 +/- 0.015 vs. 0.019 +/- 0.019), while reducing inference time by a factor of 208 (25.2 ms vs. 5 227.2 ms per image using 50 Denoising Diffusion Implicit Model (DDIM) steps). The strong clinical performance at substantially reduced computational cost enables real-time clinical use.

12.
PLOS Medicine 2026-05-15

Spatial transcriptomic-metabolic features of tumor foci and tumor capsule in microvascular invasion with hepatocellular carcinoma: A spatial multi-omics study

Authors:

by Zhi-Hui Luo, Na Wang, Jingwei Zhao, Fei Long, Si Wu, Wei Zhong, Wei-Ming Chen, Bicheng Wang, Kun Wang, Yufeng Yuan, Jingjiao Zhou, Chunhui Yuan, Fubing Wang Background Microvascular invasion (MVI) is closely related to the recurrence and metastasis of hepatocellular carcinoma (HCC), but the underlying cellular mechanism remains largely elusive. This study aims to elucidate the regional cellular discrepancy between MVI-positive (MVI+) and MVI-negative (MVI−) HCC by integrating Spatial transcriptomics (ST) and spatial metabolomics (SM). Methods and findings ST and SM were performed on six tissue samples from four patients (including 2 MVI+, 2 MVI−, and 2 paratumor tissues), with the integration of 79 public single-cell RNA sequencing datasets of HCC. Patient identity was used as a covariate in the linear equation for regional differentially expressed gene analysis with the ST data. Clinical validation was conducted through multiplex immunofluorescence staining in 79 patients, together with external validation in the cancer genome atlas (TCGA)-liver hepatocellular carcinoma (LIHC) cohort (n = 299) and an independent microarray dataset (n = 62). For cell-type-specific metabolic profiling, spatial transcriptomic-metabolic registration was performed. The functional roles of key metabolites were further validated in vitro using inflammatory cancer-associated fibroblasts (iCAFs) derived from hepatic stellate cells (HSCs) and primary CAFs through co-culture models and various functional assays assessing cell proliferation, migration, and invasion. In the tumor lesion, a malignant STMN1+HMGN2+GPC3+ cell subtype enriched in MVI+ HCC was identified, which exhibited enhanced proliferative activity and was associated with poor prognosis. This finding was further confirmed in a local cohort of 79 patients, where multiplex immunofluorescence staining for the three genes (STMN1, HMGN2, and GPC3) showed significantly higher expression in the MVI+ group than in the MVI− group (p = 0.046). Integrated SM analysis further revealed that this cell population underwent metabolic reprogramming characterized by suppressed glycerolipid metabolism. In the tumor capsule, iCAFs-related genes were downregulated in MVI+ cases, and iCAFs were located distally from the tumor boundary. Spatial metabolite mapping showed a strong correlation between taurine and iCAFs, and functional assays demonstrated that taurine promotes HCC proliferation and migration by suppressing iCAF activity. One limitation of this study is the small sample size of spatial omics data, which hinders a more complete molecular functional analysis of the STMN1+HMGN2+GPC3+ cell subtype and iCAFs in MVI+ HCC. Larger-scale ST cohorts are required to further validate and expand the findings of this study. Conclusions This integrative spatial atlas proposes a hypothesis that there exists a highly proliferative and metabolically reprogrammed malignant cell subtype in the tumor lesion of MVI+ HCC, and that taurine in the tumor capsule modulates iCAF activity to influence tumor progression. The exploratory results provide mechanistic insights into MVI-related HCC progression and offer potential avenues for targeted therapeutic intervention of MVI+ HCC.

13.
arXiv (CS.LG) 2026-06-16

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

arXiv:2605.03573v3 Announce Type: replace-cross Abstract: Quantum machine learning increasingly relies on pure-state representations, motivating generative models that sample directly in quantum representation space rather than perturbing classical inputs and re-encoding. We introduce Stochastic Schrödinger Diffusion Models (SSDMs), a score-based generative framework that defines diffusion, scores, and reverse-time sampling intrinsically on the complex projective manifold $\mathbb{CP}^{d-1}$ under the Fubini–Study metric. SSDMs combine a Riemannian Ornstein–Uhlenbeck forward diffusion with a stochastic Schrödinger realization, and learn reverse-time dynamics driven by the Riemannian score. Our central technical contribution is a local-time learning objective that exploits the local Euclidean OU limit of intrinsic manifold diffusions in Fubini-Study normal coordinates to obtain an analytic teacher score, bypassing the intractable transition densities that limit existing Riemannian score-based models. Across synthetic, physics-inspired (TFIM, XXZ), and quantum feature-state benchmarks up to $14$ qubits, SSDMs match target pure-state ensembles by orders of magnitude on MMD and observable statistics over both ambient Euclidean and matched Riemannian score-based baselines, and improve representation-level diagnostics for downstream quantum kernel methods.

14.
PLOS Computational Biology 2026-06-16

Evolution and the ultimatum game: An agent-based model with interbirth intervals and population structure

by Jeffrey C. Schank, Matt L. Miller The ultimatum game (UG) is widely used to study mutually beneficial exchanges, fairness, and prosocial behavior across different societies. However, human behavior in UG experiments does not align with the game-theoretical prediction that proposers should offer the least positive amount and responders should accept such offers. Instead, proposers make generous offers that are greater than the minimum responders are willing to accept, resulting in generous offers with wide offer-acceptance gaps. Numerous evolutionary models of the UG have been created and studied to explain human behavior, particularly generous offers made in UG experiments. These models have recently faced criticism for lacking biological realism and not adequately explaining the data. Here, we present an agent-based model inspired by our hunter-gatherer ancestors and with a biologically more realistic selection process. We assume that (1) agents exist in group-structured and group-clustered populations, where reproduction (2) depends on resource accumulation, but (3) is limited by interbirth intervals. We ran simulations to assess whether this biologically more realistic model evolves patterns of behavior consistent with patterns in the data from meta-analyses of human behavior in the UG. For the proposed model, we show that generous offers robustly evolve, as well as the difficult-to-explain offer-acceptance gaps, only in group-structured populations with interbirth intervals. We demonstrate that these results are robust and may help explain variation in data across societies. We discuss how interbirth intervals interact with group structure to modulate offer and rejection costs, favoring the evolution of generous offers, offer-acceptance gaps, and other patterns in the data on human behavior in the UG. We also discuss why weak selection and/or high mutation rate models cannot explain all the patterns in UG experimental data. We discuss biological realism and conclude that group structure and interbirth intervals may be essential for explaining prosocial behavior across societies.

15.
arXiv (CS.AI) 2026-06-12

Exploring How Agent Voice Accents Shape Human-AI Collaboration in K-12 Group Learning

arXiv:2606.12805v1 Announce Type: cross Abstract: Collaboration is widely recognized as a cornerstone of 21st-century education, yet teachers still encounter persistent challenges in fostering productive peer interaction. LLM conversational peer agents introduce new possibilities for mediating in-person group work, raising questions about how persona design, particularly their voice characteristics, shapes learners' perceptions, trust, and interactional dynamics. While prior work has examined agent accent effects in one-to-one settings, little is known about how these effects manifest in groups. We conducted a between-subjects mixed-methods study with 33 teachers examining how a GenAI voice agent with different accents (British, Indian, and African American) influenced collaboration and agent perception. Across surveys, group interaction analyses, and artifacts, we find that accent shaped participants' mental models and the roles the agent assumed in group interaction. The British-accented agent was largely treated as a tool and engaged in detached, utility-based ways, whereas Indian- and African American-accented agents were more readily anthropomorphized and integrated as peers. These role expectations influenced trust, engagement, and reliance over time. This work advances understanding of how GenAI's sociolinguistic design features shape group dynamics in CSCL, with implications for designing culturally inclusive AI partners in group learning.

16.
arXiv (quant-ph) 2026-06-16

Linear algebra at exponential scale via tensor network dimension reduction

arXiv:2606.15350v1 Announce Type: cross Abstract: Many problems in modern scientific computing are challenging because of a curse of dimension, where their mathematical formulation involves objects whose dimension is exponential in the nominal "size" of the problem. Tensor networks can provide a compact representation for exponentially large vectors and matrices that arise in applications, but these representations do not always lead to reliable algorithms. This paper develops and analyzes techniques for randomized dimension reduction of tensor network data. These techniques support a suite of efficient algorithms for provably solving exponential-scale linear algebra problems, including trace estimation and eigenvalue approximation. The paper includes several stylized illustrations from quantum many-body physics with ambient dimension up to $2^{200}$.

17.
bioRxiv (Bioinfo) 2026-06-11

TifBERT: a self-supervised foundation model for normalization-robust bulk RNA-seq representation learning

Bulk RNA sequencing remains central to translational genomics, yet foundation-model development has largely focused on single-cell data. Existing transformer approaches for bulk RNA-seq often rely on expression discretization, numerical reconstruction, external gene embeddings, or restricted gene sets, limiting robustness across normalization schemes and cohorts. Here, we introduce TifBERT, a self-supervised framework for full-transcriptome bulk RNA-seq representation learning. TifBERT converts each unordered expression profile into a sample-specific gene sequence using term frequency-inverse document frequency (TF-IDF) ordering, prioritizing genes that are both highly expressed within a sample and selectively expressed across the cohort. It is then pretrained using masked gene modeling, predicting gene identities from transcriptomic context rather than reconstructing expression values. Pretrained on harmonized TCGA Pan-Cancer data spanning five RNA-seq normalization schemes, TifBERT learns contextual representations across approximately 10,000 genes without expression binning, landmark-gene restriction, or external biological embeddings. Across 33 TCGA cancer types, TifBERT achieved 90.83% accuracy, 0.996 macro AUC-ROC, and 0.903 MCC. It also captured pathway-level biology, achieving mean sample-wise and pathway-wise Pearson correlations of 0.754 and 0.762 across 1,387 PARADIGM pathway activities. Independent evaluation on GTEx healthy tissues showed preservation of tissue-level transcriptomic structure without retraining. In comparison with existing models, TifBERT achieves competitive subtype discrimination with substantially greater stability and produces markedly richer embedding geometry (effective rank 95.6 versus 6.3), without requiring expression discretization or in-distribution pretraining exposure. Together, TifBERT provides a scalable, normalization-independent foundation model for reusable bulk transcriptomic representation learning

18.
arXiv (CS.AI) 2026-06-18

Towards Understanding What State Space Models Learn About Code

arXiv:2602.06774v2 Announce Type: replace Abstract: State Space Models (SSMs) have emerged as an efficient alternative to the Transformer architecture. Prior work shows that, when trained under comparable conditions, SSMs can match or surpass Transformers on code understanding tasks. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models learn along with the direct comparison between SSM and Transformer models in this domain. Our analysis shows that SSMs capture syntactic and semantic structure more effectively than Transformers during pretraining but forgets certain relations during fine-tuning on some tasks. To investigate this behavior, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code model by upto +6 MRR on NLCodeSearch. This demonstrates that our analysis not only explains model behavior but also leads directly to better designs.

19.
arXiv (CS.AI) 2026-06-18

Signals of Provenance: Practices & Challenges of Navigating Indicators in AI-Generated Media for Sighted and Blind Individuals

arXiv:2505.16057v2 Announce Type: replace-cross Abstract: AI-Generated (AIG) content has become increasingly widespread by recent advances in generative models and the easy-to-use tools that have significantly lowered the technical barriers for producing highly realistic audio, images, and videos through simple natural language prompts. In response, platforms are adopting provable provenance with platforms recommending AIG to be self-disclosed and signaled to users. However, these indicators may be often missed, especially when they rely solely on visual cues and make them ineffective to users with different sensory abilities. To address the gap, we conducted semi-structured interviews (N=28) with 15 sighted and 13 BLV participants to examine their interaction with AIG content through self-disclosed AI indicators. Our findings reveal diverse mental models and practices, highlighting different strengths and weaknesses of content-based (e.g., title, description) and menu-aided (e.g., AI labels) indicators. While sighted participants leveraged visual and audio cues, BLV participants primarily relied on audio and existing assistive tools, limiting their ability to identify AIG. Across both groups, they frequently overlooked menu-aided indicators deployed by platforms and rather interacted with content-based indicators such as title and comments. We uncovered usability challenges stemming from inconsistent indicator placement, unclear metadata, and cognitive overload. These issues were especially critical for BLV individuals due to the insufficient accessibility of interface elements. We provide practical recommendations and design implications for future AIG indicators across several dimensions.

20.
arXiv (CS.LG) 2026-06-19

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

arXiv:2505.18201v2 Announce Type: replace-cross Abstract: Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

21.
arXiv (CS.AI) 2026-06-12

Real-Time Execution with Autoregressive Policies

arXiv:2606.13355v1 Announce Type: cross Abstract: Real-time execution, enabled by asynchronous inference that ensures both smooth action trajectories and fast reactivity, is critical for realistic deployments of large-scale Vision-Language-Action models. However, recent work on real-time execution primarily focuses on variants of diffusion policies, even though it is more critical for autoregressive policies given their slower rollout speed in synchronous inference. In contrast, we demonstrate that autoregressive policies can achieve real-time execution by adjusting the tokenization horizon and applying constrained decoding, thereby guaranteeing strict latency bounds that enable multi-trajectory decoding to maximize performance. Across simulated and real-world environments, we find that the autoregressive policy consistently outperforms its equivalent-level flow-matching policy counterpart while achieving significantly improved task completion speeds from synchronous inference. Coupled with the inherent advantages of autoregressive policies, such as faster convergence and better generalizability in instruction-following, these results confirm that autoregressive policies can remain a competitive policy type supporting real-time execution.

22.
arXiv (math.PR) 2026-06-18

Formation of clusters and coarsening in weakly interacting diffusions

arXiv:2510.17629v3 Announce Type: replace-cross Abstract: This paper studies the clustering behavior of weakly interacting diffusions under the influence of sufficiently localized attractive interaction potentials on the one-dimensional torus. We describe how this clustering behavior is closely related to the presence of discontinuous phase transitions in the mean-field PDE. For local attractive interactions, we employ a new variant of the strict Riesz rearrangement inequality to prove that all global minimizers of the free energy are either uniform or single-cluster states, in the sense that they are symmetrically decreasing. We analyze different timescales for the particle system and the mean-field (McKean-Vlasov) PDE, arguing that while the particle system can exhibit coarsening by both coalescence and diffusive mass exchange between clusters, the clusters in the mean-field PDE are unable to move and coarsening occurs via the mass exchange of clusters. By introducing a new model for this mass exchange, we argue that the PDE exhibits dynamical metastability. We conclude by presenting careful numerical experiments that demonstrate the validity of our model.

23.
arXiv (CS.CV) 2026-06-18

InTrain: Intrinsic Trainability for Zero-Cost Neural Architecture Search

Training-free neural architecture search promises efficient discovery of high-performance networks without costly training. However, existing zero-cost proxies rely on fragmented heuristics that fail to capture the fundamental question: what makes an architecture trainable? This paper introduces Intrinsic Trainability (InTrain), a unified theoretical proxy that formalizes trainability as an architectural invariant emerging from two synergistic components: geometric capacity and optimization resilience. We operationalize intrinsic trainability through analysis of neural information processing. Geometric capacity is quantified via the participation ratio of activation covariance eigenspectrum, capturing the effective dimensionality of representation manifolds. Optimization resilience is measured through cumulative gradient health, assessing the robustness of backpropagation across network depth. InTrain synthesizes these dimensions through a scale-invariant multiplicative coupling, which we hypothesize is essential for capturing their synergistic, non-additive relationship. Extensive experiments on standard NAS benchmarks and search spaces demonstrate that InTrain achieves ranking correlations on par with state-of-the-art ensemble-based proxies and outperforms other single-metric methods.

24.
arXiv (CS.AI) 2026-06-12

Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting

arXiv:2606.13571v1 Announce Type: cross Abstract: Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolved from impute-then-forecast pipelines to continuous-time models such as Neural ODEs and continuous-time graph networks. While these approaches improve the modeling of historical irregularity, they still rely on an implicit oracle assumption at inference time: the timestamps of future valid observations are presumed to be known in advance. This assumption limits practical relevance, since in many real systems the more fundamental question is not only what the future value will be, but also whether a valid observation will occur at all. In this paper, we propose Timeflies, a unified framework that reformulates forecasting as a joint problem of future observability inference and value estimation. To explicitly model the interaction between observation dynamics and state evolution, Timeflies adopts an observation stream and a value stream, coupled through three dedicated modules for reliability-aware embedding, observation-guided dependency modeling, and joint prediction. We further construct Shadow, a benchmark that combines natural missingness from public datasets with real-world industrial data, and introduce the Observation-Value Joint Entropy (OVJE) metric to comprehensively evaluate this coupled predictability. Extensive experiments show that Timeflies consistently outperforms existing methods, highlighting the importance of explicitly modeling future observability in time series forecasting with missing values. Code and dataset are available in https://github.com/ant-intl/Timeflies.

25.
bioRxiv (Bioinfo) 2026-06-14

Generative design of antigen-specific T-cell receptor sequences with a conditional diffusion model

T cell receptor (TCR)-based immunotherapy holds immense potential for treating cancers and infectious diseases, where highly antigen-specific TCR recognition is crucial for adaptive immunity against tumors and pathogens. Engineering or de novo generation of the complementarity-determining region 3 (CDR3) loops of TCRs using artificial intelligence offers a powerful alternative to designing reactive TCRs rather than laborious experimental screening. However, current in silico approaches are constrained by weak conditional guidance, limited flexibility, and a lack of rigorous functional validation. To address these limitations, we introduce TCRDiff, a generative diffusion framework for designing antigen-specific TCRs conditioned on peptide-MHC (pMHC) targets and germline-encoded variable genes. By leveraging pre-trained knowledge from massive T-cell repertoires and TCR-pMHC recognition data, TCRDiff generates CDR3{beta} sequences with state-of-the-art fidelity to native binding TCRs through a denoising diffusion process. Furthermore, incorporating the interface geometry features generated TCR-pMHC complexes with superior structural plausibility. As a proof of concept, we deployed TCRDiff in a systematic pipeline to design candidate TCRs for immunotherapy. In vitro activation assays validated that TCRDiff-generated TCRs specifically recognize the MAGE-A3 epitope with minimized off-target cross-reactivity. Together, TCRDiff establishes a powerful, validated computational paradigm to accelerate the development of TCR-based immunotherapies.