Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-18

Cardiac rhythm development: A wearable device index of risk for physical and mental illness in adolescence

Objective. The autonomic nervous system, which regulates cardiac rhythm, undergoes pronounced maturation across adolescence. How cardiac rhythm develops over this period, however, and whether individual differences in its development forecast mental and physical illness, remain open questions. We used three waves of Fitbit data from the Adolescent Brain Cognitive Development (ABCD) Study to characterize the developmental trajectory of the cardiac rhythm and to test whether variation in that trajectory predicts onset of psychopathology and cardiometabolic disease. Methods. 8,301 adolescents contributed 242,811 valid Fitbit wear days across Waves 2 (Mage=12), 4 (Mage=14), and 6 (Mage=16). Cosinor mixed-effects models yielded three rhythm parameters per session: mesor (24-hour mean), amplitude (diurnal swing), and acrophase (peak timing). We first characterized age- and sex-specific trajectories, cross-wave stability, and factors shaping the rhythm. We then used parallel-process latent growth models to test whether within-person changes in rhythm tracked symptom trajectories, and hierarchical logistic models to test whether rhythm parameters predicted the first clinical onset of psychopathology and of obesity and hypertension. Results. The cardiac rhythm changed substantially across adolescence: mesor decreased, amplitude flattened, and acrophase shifted later. Within-person change in the rhythm tracked change in blood pressure, BMI, and trajectories of depression and ADHD symptoms. Higher mesor predicted incident onset of all five outcomes controlling for demographics, baseline symptoms, and behavior (ORs 1.36-1.54); amplitude, acrophase, and rhythm instability conferred additional risk. Conclusions. The 24-hour cardiac rhythm is a passively measurable substrate of adolescent autonomic development that indexes transdiagnostic risk for psychiatric and cardiometabolic illness.
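Mesor, amplitude and acrophase come from fitting a cosine with a 24-hour period. The sketch below shows how the three parameters fall out of a linear regression on cos and sin terms. It assumes a single subject, evenly sampled heart rate, and an ordinary least-squares fit, which is not the mixed-effects model used in the study. The function name `cosinor_fit` and the synthetic data are illustrative only.

```python
import numpy as np

def cosinor_fit(t_hours, y, period=24.0):
    """Single-component cosinor fit by ordinary least squares.

    Model: y(t) = mesor + amplitude * cos(2*pi*t/period - phi),
    rewritten linearly as y = mesor + beta*cos(w*t) + gamma*sin(w*t),
    with beta = amplitude*cos(phi) and gamma = amplitude*sin(phi).
    """
    w = 2 * np.pi / period
    X = np.column_stack([np.ones_like(t_hours), np.cos(w * t_hours), np.sin(w * t_hours)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(beta, gamma)
    phi = np.arctan2(gamma, beta)       # phase angle in radians
    acrophase_h = (phi / w) % period    # clock time of the fitted peak, in hours
    return mesor, amplitude, acrophase_h

# Synthetic heart rate (hypothetical): mean 75 bpm, amplitude 10 bpm, peak at 15:00
t = np.arange(0, 72, 0.25)
rng = np.random.default_rng(0)
hr = 75 + 10 * np.cos(2 * np.pi * (t - 15) / 24) + rng.normal(0, 2, t.size)
print(cosinor_fit(t, hr))
```

The amplitude here is half the peak-to-trough swing, and the acrophase is reported in clock hours modulo the period.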

02.
arXiv (math.PR) 2026-06-18

Second-Order Approximation of Limit Order Books in a Single-Scale Regime

arXiv:2308.00805v3 Announce Type: replace-cross Abstract: We establish a first- and second-order approximation for an infinite dimensional limit order book model in a single (critical) scaling regime where market and limit orders arrive at a common time scale. With our choice of scaling we obtain non-degenerate first- and second-order approximations for the price and volume dynamics. While the first-order approximation is given by a coupled ODE-PDE system, the second-order approximation is described in terms of an infinite-dimensional stochastic evolution equation driven by a cylindrical Brownian motion. The driving noise processes exhibit a non-trivial correlation in terms of the model parameters. We prove that the evolution equation has a unique solution and that the sequence of standardized limit order book models converges weakly to the solution of the evolution equation. The proof uses a non-standard martingale problem. We calibrate a linearized model to market data and explain how our model can be used for deriving confidence intervals of portfolio liquidation values.

03.
arXiv (CS.LG) 2026-06-17

Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization

arXiv:2606.17215v1 Announce Type: new Abstract: A certificate that removes outliers sees the data only through its low-degree moments, and an adversary exploits exactly this, hiding corruption where the clean data already looks typical, in the blind spot no bounded-degree test resolves. That blind spot turns out to have an exact size: the Christoffel function of the clean marginal, the very quantity modern data analysis thresholds to detect outliers, here read from the adversary's side as the corruption a bounded-degree certificate cannot remove. We turn this inversion into the organizing principle of the reweighted-hinge approach to robustly learning $\gamma$-margin halfspaces under malicious noise (Shen, 2025; Zeng and Shen, 2025): the governing resource is the Sum-of-Squares degree of the outlier-removal certificate, and the resolution principle states that the maximal corruption mass which can hide at a center $c$ from a degree-$2t$ certificate is exactly the Christoffel function $\lambda_{t+1}(c)$ of the clean marginal. Three consequences follow, all against the certificate method (not information-theoretic). A margin-degree tradeoff: certifying the dense pancake to error $\epsilon$ costs SoS degree $\Omega(\log(1/\epsilon))$ or margin $\Omega(\sqrt{\log(1/\epsilon)}/\sqrt{d})$, explaining why the $\log(1/\epsilon)$ margin Shen (2025) records is forced, with a weighted-Chebyshev reduction making the threshold $2t=\Theta((|c|/s)^2)$ tight modulo one classical weighted-extremal estimate. A degree-$2$ outlier barrier: the resolution principle realized as an explicit instance on which degree $2$ is stuck at $\eta^{1/2}$ while degree $4$ escapes, locating the method's small breakdown rate in the degree, not the analysis. And a degree-$2t$ algorithm tracing the frontier $\eta^{1-1/2t}$ (recovering Shen (2025) at $t=1$), whose gain is an explicit constant, capped by the pancake density and shown unimprovable by the degree-$2$ barrier.
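The Christoffel function has a concrete variational form. λ_t(c) is the smallest value of E[p(X)²] over polynomials p of degree at most t with p(c) = 1, and it equals 1 / (v(c)ᵀ M⁻¹ v(c)) for the monomial vector v and moment matrix M.

The sketch below shows the function shrinking as c moves into the tail, so less corruption mass can hide there than near the typical region. It assumes a 1-D standard Gaussian as the clean marginal. The paper works in d dimensions and indexes the bound as λ_{t+1}(c) for a degree-2t certificate. The helper name is illustrative.

```python
import numpy as np

def empirical_christoffel(samples, c, t):
    """Empirical Christoffel function lambda_t(c) of a 1-D sample.

    lambda_t(c) = 1 / (v(c)^T M^{-1} v(c)), with v(x) = (1, x, ..., x^t)
    and M the empirical degree-t moment matrix, mean of v(x) v(x)^T.
    """
    powers = np.arange(t + 1)
    V = samples[:, None] ** powers          # shape (n, t+1)
    M = V.T @ V / len(samples)              # empirical moment matrix
    v = np.asarray(c, dtype=float) ** powers
    return 1.0 / (v @ np.linalg.solve(M, v))

# Assumed clean marginal: standard Gaussian
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
for c in (0.0, 1.0, 2.0, 3.0):
    print(c, empirical_christoffel(x, c, t=2))
```

For t = 1 the exact Gaussian value is 1 / (1 + c²). Every value stays at or below 1, because the constant polynomial p ≡ 1 is admissible.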

04.
arXiv (CS.CV) 2026-06-25

Multilingual Hematology Visual Question Answering Dataset

Vision Language Models (VLMs) have shown promising capabilities in medical image analysis by jointly understanding visual and textual information for tasks such as Visual Question Answering. However, existing hematology vision-language resources remain predominantly English-centric, limiting their applicability in multilingual healthcare environments. This challenge is relevant to South Asia generally and to Pakistan specifically, where Urdu is widely used even though healthcare information and digital medical systems depend largely on English. To investigate this gap, we conducted a survey among healthcare professionals, which revealed substantial language mismatches between clinical documentation and patient communication, emphasizing the need for multilingual healthcare technologies. To address this limitation, we introduce WBCMor VQA, a clinically validated, bilingual (English and Urdu), morphology-aware VQA benchmark for leukemia and normal white blood cell analysis. The benchmark is constructed using morphology-aware annotations from the LeukemiaAttri and WBCAtt datasets and is supported by a domain-specific Urdu hematology dictionary to ensure linguistic consistency and clinical correctness. The final benchmark contains 110K bilingual question-answer pairs serving as VQA annotations for 20K leukemic and normal single-cell images. Furthermore, we establish baseline performance by evaluating multiple open-source VLMs on the proposed benchmark. The proposed resource aims to facilitate the development of accessible and clinically relevant AI systems for multilingual healthcare environments.

05.
arXiv (CS.AI) 2026-06-18

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

arXiv:2606.19079v1 Announce Type: new Abstract: The increasing deployment of parameter-efficient fine-tuning (PEFT) has led to model ecosystems in which a single backbone is paired with many task-specialized adapters. In this setting, inference-time queries often arrive without task labels, requiring the system to automatically select the most appropriate adapter from a growing and heterogeneous adapter pool. Existing routing methods either depend on access to adapter internals, such as weight decompositions or gradient-based statistics, or require additional router training, which limits scalability and portability as new adapters are added. We introduce ARIADNE, a training-free, adapter-agnostic routing framework for dynamic adapter selection at inference time. ARIADNE represents each adapter through a set of centroids computed from embeddings of its training set, capturing the data distribution associated with that adapter. Given an unlabeled input, it selects an adapter by measuring proximity to these centroids in latent space. Because routing is performed entirely in the input embedding space, ARIADNE is compatible with arbitrary PEFT methods and requires no modification to the adapters or training procedures. Primarily evaluated with Llama 3.2 1B Instruct on 23 diverse NLP tasks, ARIADNE recovers 97.44% of the upper bound performance. Scaling to 44 tasks, it achieves 89.7% average selection accuracy, without additional training or access to adapter internals.
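Training-free routing by proximity to per-adapter centroids can be sketched in a few lines. The abstract does not say how the centroids are computed or which distance is used. This sketch assumes k-means centroids over precomputed input embeddings and Euclidean nearest-centroid routing. The adapter names and synthetic embeddings are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)
    return C

def build_index(adapter_embeddings, k=4):
    """Map adapter name -> (k, dim) centroids of its training-set embeddings."""
    return {name: kmeans(E, k) for name, E in adapter_embeddings.items()}

def route(query_emb, index):
    """Pick the adapter whose closest centroid is nearest to the query embedding."""
    best = {name: np.min(np.linalg.norm(C - query_emb, axis=1)) for name, C in index.items()}
    return min(best, key=best.get)

# Hypothetical embeddings for two adapters' training sets
rng = np.random.default_rng(1)
train = {
    "sentiment": rng.normal(loc=+3.0, size=(200, 8)),
    "summarize": rng.normal(loc=-3.0, size=(200, 8)),
}
index = build_index(train)
print(route(rng.normal(loc=+3.0, size=8), index))  # expected: sentiment
```

Adding a new adapter only requires computing its centroids and inserting them into the index. No router is retrained.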

06.
arXiv (math.PR) 2026-06-16

Riviera model with egoistical settlers

arXiv:2606.16791v1 Announce Type: cross Abstract: The Riviera model mimics a densifying settlement along the coastline. In the lattice version, houses are built sequentially in empty sites with the constraint that every newly built house has at least one empty neighboring site. The distribution of clusters of adjacent houses does not obey a closed set of evolutionary equations, but the void-cluster-void distribution does. We compute the latter and extract the cluster distribution from it. In the jammed state, when all voids have length one and the evolution ceases, the cluster distribution has a neat form and exhibits a factorial decay with the length of the cluster. To investigate finite systems, we employ a static approach directly treating jammed states. If the coastline is a finite segment, we determine the statistics of the number of empty sites in the jammed state (the average, variance, and higher cumulants). We also study a continuum version in which houses are built along the line so that each newly built house is sufficiently separated from at least one neighboring house.
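The lattice rule can be simulated directly until the jammed state is reached. The sketch below makes three assumptions. Each new house goes on a site chosen uniformly among the currently allowed sites. Only the newly built house needs an empty neighbour, as the abstract states. Sites beyond the ends of the segment do not count as empty. The function names are illustrative.

```python
import random

def riviera_jam(n, seed=None):
    """Random sequential building on a segment of n sites until jammed.

    A house may be built on an empty site only if at least one neighbouring
    site inside the segment is empty (assumption: the outside of the segment
    never counts as empty). Returns the jammed configuration as a 0/1 list.
    """
    rng = random.Random(seed)
    sites = [0] * n

    def allowed(i):
        if sites[i]:
            return False
        return (i > 0 and sites[i - 1] == 0) or (i < n - 1 and sites[i + 1] == 0)

    candidates = [i for i in range(n) if allowed(i)]
    while candidates:
        sites[rng.choice(candidates)] = 1
        candidates = [i for i in range(n) if allowed(i)]
    return sites

def cluster_lengths(sites):
    """Lengths of maximal runs of adjacent houses."""
    lengths, run = [], 0
    for s in sites + [0]:
        if s:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    return lengths

jammed = riviera_jam(1000, seed=42)
print("empty fraction:", jammed.count(0) / len(jammed))
print("mean cluster length:", sum(cluster_lengths(jammed)) / len(cluster_lengths(jammed)))
```

In the jammed output no two adjacent sites are empty, which matches the condition that every void has length one. Averaging `cluster_lengths` over many seeds gives an empirical cluster distribution to compare against the exact result.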

07.
bioRxiv (Bioinfo) 2026-06-08

DDI_single: Single-Sequence-Based Protein Domain Assembly

Domains are the basic units of protein structure and function, and appropriate inter-domain organization is critical for the cooperative execution of multiple related functions. Determining the full-length structure of multi-domain proteins is therefore a crucial step toward elucidating their functions and designing drugs that regulate them. Existing structure prediction algorithms are generally better at resolving the internal conformation of domains than at modeling the relative positions between domains. To address the challenge of accurately determining multi-domain protein conformations, we develop a single-sequence-based domain assembly algorithm called DDI_single. DDI_single extracts features directly from the amino acid sequence using the protein language model ESM-1b and accurately predicts interactions between residue pairs of different structural domains through a novel gated cross-attention module, which enables correct assembly of the domains. Given the domain definitions, DDI_single achieves more than 20% higher accuracy than the single-sequence-based structure prediction algorithm trRosettaX_single in predicting the relative distances of inter-domain residue pairs. When assembling domains with known spatial conformations, DDI_single correctly assembles 74.4% of the samples in the test set (TM-score > 0.5). When assembling domains with unknown spatial conformations, in cases where the internal spatial conformations of the domains are correctly modeled, DDI_single correctly assembles 73.9% of the samples.

08.
arXiv (quant-ph) 2026-06-16

Efficient Implementation of a Single-Qutrit Gate Set via Coherent Control

arXiv:2507.06860v2 Announce Type: replace Abstract: Qutrits offer the potential for enhanced quantum computation by exploiting an enlarged Hilbert space. However, the synthesis of high-fidelity and fast qutrit gates, particularly for single qutrits, remains an ongoing challenge, as it involves overcoming intrinsic constraints in quantum platforms. Here, we develop a novel framework for the efficient implementation of a single-qutrit gate set via coherent control, leveraging SU(3) dynamics while obviating platform-specific constraints such as those arising from the selection rule. As a proof-of-principle demonstration, we realize 35-ns qutrit Hadamard and X gates using a superconducting transmon, achieving an average fidelity of 99.5%, as verified by randomized benchmarking. We further demonstrate two paradigmatic quantum circuits, which can be naturally extended to scalable qudit algorithms for phase estimation and parity check. In addition, we propose an SU(3)-based decomposition strategy for an arbitrary single-qutrit gate and numerically demonstrate its substantial efficiency improvement over conventional SU(2)-based protocols. By addressing the challenge of efficiently implementing single-qutrit gates, our protocol paves the way for realizing high-performance qutrit processors in diverse quantum platforms.
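The target gates can be written down and checked numerically. This sketch assumes two common conventions, which may differ from the paper's:

- the qutrit Hadamard is the generalized (Fourier) gate;
- the qutrit X gate is the cyclic shift.

It also evaluates the standard closed-form average fidelity between two unitaries, F = (|Tr(U†V)|² + d) / (d(d + 1)). The reported 99.5% comes from randomized benchmarking, which this formula does not reproduce. The phase-error example is made up.

```python
import numpy as np

d = 3
omega = np.exp(2j * np.pi / d)

# Generalized (Fourier) qutrit Hadamard: H[j, k] = omega**(j*k) / sqrt(3)
H = np.array([[omega ** (j * k) for k in range(d)] for j in range(d)]) / np.sqrt(d)
# Qutrit X gate as a cyclic shift: |j> -> |j+1 mod 3>
X = np.roll(np.eye(d), 1, axis=0)
# Clock gate Z = diag(1, omega, omega**2)
Z = np.diag([omega ** j for j in range(d)])

def avg_gate_fidelity(U, V):
    """Average gate fidelity between two unitaries of the same dimension."""
    dim = U.shape[0]
    overlap = abs(np.trace(U.conj().T @ V)) ** 2
    return (overlap + dim) / (dim * (dim + 1))

# Hypothetical small phase error applied after the ideal Hadamard
V = np.diag(np.exp(1j * np.array([0.0, 0.05, -0.05]))) @ H
print(avg_gate_fidelity(H, V))
```

Under these conventions the Hadamard maps the shift X to the clock gate Z (H X H† = Z), the qutrit analogue of the qubit relation H X H = Z.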

09.
arXiv (CS.LG) 2026-06-12

Deep Learning-based Algebraic Reynolds Stress Closures for RANS Simulations of Turbulent Flows

arXiv:2605.26358v2 Announce Type: replace-cross Abstract: Turbulence is ubiquitous in engineering and science, yet direct simulation is prohibitively expensive. The Reynolds-averaged Navier-Stokes (RANS) equations provide savings exceeding ten orders of magnitude but introduce unclosed terms (the closure problem). Offline-trained machine-learning (ML) closures suffer distribution shift in predictive simulations, while ML methods that bypass the governing equations struggle to generalise from scarce high-fidelity data. We develop a physics-derived deep learning closure model for RANS, the Deep Algebraic Reynolds Stress Model (DARSM), which can be trained on small datasets and accurately generalise across Reynolds numbers, to unseen geometries, and to different flow regimes. A neural network maps flow invariants to empirical parameters in an implicit algebraic Reynolds stress equation, derived from the Reynolds stress transport equations under the weak-equilibrium assumption, imposing physics-based structure on the ML closure. End-to-end optimisation through the governing PDEs and the coupled implicit closure eliminates distribution shift, but both unrolled and implicit automatic differentiation fail on the stiff coupled solver. We derive adjoint equations that exploit the solver's implicit-explicit structure for efficient optimisation. On canonical square-duct and periodic-hill benchmarks, DARSM reduces average test velocity error over baseline RANS by $2$-$4\times$ across Reynolds numbers, geometries, and flow regimes, with peak case-level reductions of $12\times$. The model trained on attached, anisotropy-dominated flows (square duct) accurately generalises without retraining to separated flows (periodic hills), a regime change in the underlying physics. DARSM also outperforms five established ML methods: offline training, tensor-basis neural networks, field-inversion machine learning, DeepONets, and physics-informed neural networks.

10.
medRxiv (Medicine) 2026-06-22

Referral pathways, ETAT triage acuity, and inpatient outcomes among children presenting to a national tertiary paediatric emergency unit in Ghana: a prospective cohort study

Emergency referral systems in sub-Saharan Africa are fragmented, and children reaching tertiary facilities through different referral pathways often arrive in advanced clinical states. Prospective data simultaneously characterising referral patterns, triage acuity at presentation, diagnostic case mix, and inpatient mortality at a national tertiary paediatric emergency unit are lacking from West Africa. This prospective cohort study enrolled 675 consecutively presenting children aged one month to 12 years at the Paediatric Emergency Unit (PEU) of Korle Bu Teaching Hospital, Accra, Ghana, from February to December 2019. The primary outcome was all-cause inpatient mortality. Key variables collected included referral status and facility tier, Emergency Triage Assessment and Treatment (ETAT) triage category, ICD-10 diagnostic classification, Oyedeji socioeconomic classification, and time from symptom onset to PEU registration. Crude odds ratios were computed for all candidate predictors. Multivariable logistic regression was conducted using complete-case analysis (n = 613). Of 675 children, 63.0% (n = 425) were referred from another health facility; referred children had higher ETAT emergency triage category rates than self-presenting children (32.7% vs 27.6%, p < 0.001). Overall inpatient mortality was 9.9% (67/675). Mortality varied by referral source: 16.7% among secondary/regional hospital referrals, 11.0% among lower-tier facility referrals (district, municipal, CHAG, polyclinic, private, health centre, and maternity home facilities combined, n = 356), 7.6% among self-presenting children, and 7.4% among tertiary referrals. Overall, 30.8% of children were classified as ETAT emergencies on arrival, with a case fatality rate of 21.6%. The three most common diagnostic domains were respiratory conditions (17.2%), blood and haematological disorders (17.0%), and digestive presentations (16.4%).
Inpatient mortality was highest in neoplastic disease (33.3%, n = 30) and circulatory presentations (31.0%, n = 29). In the primary multivariable analysis (n = 613, 51 events; events-per-variable ratio 4.2), no referral tier was independently associated with inpatient mortality after adjustment. Referral from secondary/regional hospitals showed a borderline non-significant association (adjusted odds ratio 3.09, 95% CI 0.96 to 9.90, p = 0.058). School-going children (60-119 months) had higher odds of inpatient death than infants (adjusted odds ratio 5.56, 95% CI 1.16 to 26.53, p = 0.032), as did adolescents (adjusted odds ratio 10.01, 95% CI 2.15 to 46.69, p = 0.003). ETAT emergency category and lower socioeconomic status were not independently significant in this model. A pre-specified sensitivity analysis using the full analytic cohort (n = 674, events-per-variable ratio 6.7) with collapsed referral categories did not confirm any referral tier association; ETAT emergency category and lower SES were independently associated in the sensitivity model. All multivariable estimates should be regarded as exploratory. This prospective cohort provides simultaneous characterisation of referral patterns, ETAT triage acuity, diagnostic case mix, and inpatient mortality at a national tertiary paediatric emergency unit in West Africa. The referral-mortality gradient and high ETAT emergency category proportion document the severity of illness arriving through different referral pathways at this facility. The association between secondary/regional hospital referral and inpatient mortality is hypothesis-generating and requires replication in an adequately powered multicentre study before any service-level conclusions can be drawn.
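A crude odds ratio is computed from a 2×2 table of exposure against death. This sketch adds a Woolf (log-scale) 95% confidence interval and a two-sided Wald p-value; these are common choices, not necessarily the study's method. The counts are hypothetical and not taken from the study.

The events-per-variable figure also follows from simple arithmetic. 51 events at an EPV of 4.2 corresponds to roughly 51 / 4.2 ≈ 12 estimated parameters.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR for a 2x2 table, with a Woolf (log-scale) Wald CI and two-sided p.

    a = exposed deaths, b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors (all counts must be > 0).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    log_or = math.log(or_)
    lo, hi = math.exp(log_or - z * se), math.exp(log_or + z * se)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal p-value
    return or_, (lo, hi), p

# Hypothetical counts, for illustration only
print(crude_odds_ratio(20, 80, 10, 90))
print("EPV 4.2 with 51 events implies about", round(51 / 4.2), "parameters")
```

In this made-up table the point estimate is 2.25, yet the interval still includes 1. This is the same pattern as a borderline, non-significant association.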

11.
arXiv (CS.AI) 2026-06-12

PRISMR: Overcoming Parse Collapse in Multimodal Listwise Ranking via Parameterized Representation Internalization

arXiv:2606.12942v1 Announce Type: new Abstract: Generative listwise ranking with Large Multimodal Models (LMMs) aims to capture global list context in a single forward pass, but its effectiveness degrades in long-context multimodal scenarios. We identify a recurring failure mode, parse collapse, where the autoregressive decoder produces fluent yet incomplete rankings by silently omitting candidates and terminating early. This failure stems from limited context utilization rather than simple formatting mistakes, making prompt engineering and constrained decoding insufficient. We propose PRISMR (Parameterized Representation Internalization for Semantic Multimodal Ranking), a framework that replaces transient in-context list processing with parametric structural conditioning. PRISMR uses a lightweight hypernetwork to encode multimodal candidates in parallel and generate item-specific LoRA weights, which are synthesized into an instance-specific adapter for an LMM. This paradigm enables more robust internalization of list structure while preserving the base model. We further introduce a large-scale multimodal review-ranking benchmark for evaluation. Experiments demonstrate that PRISMR substantially reduces parse collapse, improves listwise ranking performance, and transfers effectively across domains and instruction-tuned backbones.

12.
arXiv (CS.LG) 2026-06-18

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

arXiv:2606.19329v1 Announce Type: cross Abstract: We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k sources, of which plausible multiple counterparts are found for ~7k. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.
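The purely spatial baseline that the learned matcher is contrasted with amounts to a nearest-neighbour search by angular separation. This sketch makes three assumptions:

- positions are given in degrees;
- the match radius is fixed (the 1-arcsecond value is illustrative, not from the paper);
- there is no positional-error or source-density weighting of the kind NWAY applies.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in, arcseconds out).

    Uses the haversine form, which stays accurate at arcsecond scales.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def nearest_within(source, candidates, radius_arcsec):
    """Return (index, separation) of the closest candidate inside the radius, else None."""
    best = None
    for i, (ra, dec) in enumerate(candidates):
        sep = angular_separation_arcsec(source[0], source[1], ra, dec)
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (i, sep)
    return best

# Hypothetical X-ray position and two optical candidates
src = (10.0, 20.0)
cands = [(10.0, 20.0 + 2 / 3600), (10.0, 20.0 + 0.5 / 3600)]
print(nearest_within(src, cands, radius_arcsec=1.0))
```

A rule like this cannot tell a true counterpart from a chance coincidence that happens to sit inside the radius. That gap is what the catalogue's use of magnitudes, colors, and distances addresses.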

13.
arXiv (quant-ph) 2026-06-15

Spin-orbit coupling by design in quantum state engineering of atomically defined quantum dots

arXiv:2606.14487v1 Announce Type: cross Abstract: Tuning spin-orbit coupling is essential in controlling both spin and charge in confined semiconductor nanostructures, yet it is rarely a truly controllable parameter. Here, we show control over the spin-orbit Hamiltonian in quantum dots and the resulting quantum states by tailoring the confinement potential with atomic-scale precision. Using scanning tunnelling microscopy and spectroscopy, we pattern individual Cs ions into designer quantum dot structures on the surface of indium antimonide, in which electrons from a two-dimensional electron gas are confined with chosen in-plane electric-field gradients. We then quantify the atomic level structure, both spatially resolving the orbital character of the electronic states and their magnetic-field evolution. We demonstrate that the level structure, including the induced zero-field splitting, can be tailored by the designed geometry of the local electric fields. These effects can be described using a Hamiltonian that allows consistent treatment of the confinement-induced spin-orbit coupling beyond the conventional Bychkov-Rashba description. This Hamiltonian is derived from a multiband k·p model and takes the energy dependence of the relevant physical parameters into account. Such precise control of spin-orbit coupling in semiconductor quantum dots is relevant to quantum and spintronic technologies.

14.
arXiv (quant-ph) 2026-06-25

Preparing two-mode magnonic Schrödinger cat states in a cavity-magnon-qubit system

arXiv:2606.25511v1 Announce Type: new Abstract: The cavity-magnon-qubit system has recently been demonstrated as a new platform for preparing macroscopic quantum states in magnonic systems. Here, we propose to prepare a two-mode magnonic cat state, which is also a non-Gaussian entangled state, based on this practical system involving two yttrium-iron-garnet (YIG) spheres and a superconducting qubit coupled to a common microwave cavity. By adiabatically eliminating the cavity and resonantly driving the qubit, an effective magnon-qubit conditional-displacement interaction is achieved. Further, in the magnon-magnon strong-coupling regime and with identical magnon frequencies and identical coupling strengths to the cavity, two hybridized magnon modes form; the bright mode is prepared in a cat state by a projective measurement on the qubit, while the dark mode remains in its initial vacuum state. Such a state corresponds to a two-mode cat state of the two original magnon modes, which share strong non-Gaussian entanglement. We also discuss practical dissipation and dephasing effects on the cat state. The results indicate that strong nonclassicality and non-Gaussian entanglement are present in the two-mode cat state using fully feasible parameters.
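The bright/dark-mode picture can be made explicit. Assume the standard definitions of the hybridized modes for identical couplings, which the abstract does not spell out. A coherent amplitude β in the bright mode then splits equally over the two original magnon modes. The conditional cat state and its normalization read:

```latex
\hat b=\frac{\hat m_1+\hat m_2}{\sqrt2},\qquad
\hat d=\frac{\hat m_1-\hat m_2}{\sqrt2},
\\[4pt]
|\psi_\pm\rangle=\mathcal N_\pm\big(|\beta\rangle_b\pm|-\beta\rangle_b\big)\otimes|0\rangle_d
=\mathcal N_\pm\Big(\big|\tfrac{\beta}{\sqrt2}\big\rangle_1\big|\tfrac{\beta}{\sqrt2}\big\rangle_2
\pm\big|-\tfrac{\beta}{\sqrt2}\big\rangle_1\big|-\tfrac{\beta}{\sqrt2}\big\rangle_2\Big),
\\[4pt]
\mathcal N_\pm=\Big[2\big(1\pm e^{-2|\beta|^2}\big)\Big]^{-1/2}.
```

The second equality uses the fact that the bright-mode displacement factorizes as $D_b(\beta)=D_1(\beta/\sqrt2)\,D_2(\beta/\sqrt2)$. The normalization follows from $\langle\beta|-\beta\rangle=e^{-2|\beta|^2}$. The resulting superposition of product coherent states is what makes the two magnon modes non-Gaussian entangled.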

15.
arXiv (CS.CV) 2026-06-25

Efficient Real-World Dehazing via Physics-Inspired Global-Local Decoupling

Real-world single image dehazing is highly ill-posed due to spatially and spectrally varying scattering, while practical deployment demands lightweight and low-latency models. Existing approaches either rely on fragile physical inversion under simplified assumptions or adopt heavy blind architectures unsuitable for edge deployment. To overcome these limitations, we propose PGL-Net (Physics-Inspired Global-Local Decoupling Network), a lightweight framework that incorporates physical inductive biases via operator-level emulation, avoiding explicit parameter estimation. It decouples dehazing into global distribution rectification and local structural refinement. A Physics-Inspired Affine Fusion (PAF) module performs globally conditioned alignment across hierarchical skip connections to compensate for haze-induced bias, while a compact Degradation-Aware Modulation (DAM) block adaptively restores spatially and spectrally variant details through dynamic feature modulation. Extensive experiments on multiple real-world benchmarks demonstrate that PGL-Net achieves state-of-the-art restoration quality with significantly reduced complexity. Compared with the recent SOTA SGDN, the Tiny variant (PGL-Net-T) improves PSNR by up to 2.6 dB and consistently enhances downstream object detection accuracy, while achieving over a 10× reduction in inference latency. Code is publicly available at: https://github.com/sc-30-bit/PGL-Net.
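PSNR gains are logarithmic in mean squared error. This sketch assumes images scaled to [0, 1]. Under that assumption, a 2.6 dB improvement corresponds to the MSE shrinking by a factor of 10^(-0.26) ≈ 0.55, i.e. roughly 45% lower error. The helper name `psnr` is illustrative.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

clean = np.zeros((4, 4))
print(psnr(clean, clean + 0.1))   # MSE = 0.01, so about 20 dB
print(10 ** (-2.6 / 10))          # MSE ratio implied by a 2.6 dB gain
```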

16.
medRxiv (Medicine) 2026-06-22

Paired plasma and EV-enriched plasma proteomics reveal nonredundant sepsis-associated host-response signatures in critical illness

Background: Plasma proteomics may identify host-response signatures in sepsis, but it is unclear whether extracellular vesicle (EV)-enriched plasma provides distinct or redundant information compared with plasma. We compared paired plasma and EV-enriched plasma proteomes in critically ill patients with sepsis and critically ill non-sepsis controls (CINS). Methods: In this prospective observational study, paired plasma and EV-enriched plasma samples were analyzed from 56 critically ill adults, including 40 patients with sepsis and 16 CINS patients. Protein abundance was quantified using liquid chromatography-tandem mass spectrometry. Analyses compared proteomic depth, protein overlap, global concordance between compartments, and differential protein abundance between CINS and sepsis. Exploratory Gene Ontology enrichment was performed as a supplementary analysis. Results: EV-enriched plasma expanded proteomic detection, identifying 2,476 filtered proteins compared with 506 in plasma. Only 386 proteins were detected in both compartments, while 2,090 were unique to EV-enriched plasma and 120 were unique to plasma. Among shared proteins, plasma and EV-enriched plasma showed modest global concordance across critically ill patients (Spearman ρ = 0.322, p = 9.19 × 10^-11), with similar findings in sepsis alone. Differential abundance analysis identified 11 sepsis-associated proteins in plasma and 22 in EV-enriched plasma. Only SAA1, SAA2, and IGFBP6 were significant in both compartments. Exploratory pathway analysis supported acute-phase and inflammatory enrichment in plasma sepsis-associated proteins, while EV-enriched signals were directionally plausible but did not meet prespecified FDR thresholds. Conclusion: Plasma and EV-enriched plasma proteomics capture related but nonredundant sepsis-associated host-response information in critically ill patients.
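The overlap counts are internally consistent: 386 + 2,090 = 2,476 and 386 + 120 = 506, so 2,596 distinct proteins were detected overall.

The concordance statistic is Spearman's correlation, which is the Pearson correlation of ranks, with tied values given their average rank. The pure-Python sketch below implements that definition. It makes no claim about the software or abundance transformation the authors used, and the example values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical abundances of one protein in the two compartments across patients
print(spearman([0.2, 1.5, 3.1, 2.2, 0.9], [0.1, 0.8, 2.5, 2.9, 1.1]))
```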

17.
arXiv (CS.AI) 2026-06-11

Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT

arXiv:2603.09715v2 Announce Type: replace Abstract: Visual instruction tuning is crucial for improving vision-language large models (VLLMs). However, many samples can be solved via linguistic patterns or common-sense shortcuts, without genuine cross-modal reasoning, limiting the effectiveness of multimodal learning. Prior data selection methods often rely on costly proxy model training and focus on difficulty or diversity, failing to capture a sample's true contribution to vision-language joint reasoning. In this paper, we propose CVS, a training-free data selection method based on the insight that, for high-quality multimodal samples, introducing the question should substantially alter the model's assessment of answer validity given an image. CVS leverages a frozen VLLM as an evaluator and measures the discrepancy in answer validity with and without conditioning on the question, enabling the identification of samples that require vision-language joint reasoning while filtering semantic-conflict noise. Experiments on Vision-Flan and The Cauldron show that CVS achieves solid performance across datasets. On Vision-Flan, CVS outperforms full-data training by 3.5% and 4.8% using only 10% and 15% of the data, respectively, and remains robust on the highly heterogeneous Cauldron dataset. Moreover, CVS reduces computational cost by 17.3% and 44.4% compared with COINCIDE and XMAS, respectively.
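The core of CVS is a with/without-question comparison. The abstract does not give the exact discrepancy measure, so this sketch assumes an absolute log-ratio between two probabilities. Both are the probability that a frozen evaluator judges the answer valid for the image: once with the question in the prompt, once without. The probabilities are hypothetical stand-ins for model outputs, the function names are illustrative, and the semantic-conflict filtering step is omitted.

```python
import math

def cvs_style_score(p_with_question, p_without_question, eps=1e-6):
    """Absolute log-ratio of the evaluator's answer-validity probability
    with vs. without the question (assumed scoring rule, not the paper's)."""
    return abs(math.log(p_with_question + eps) - math.log(p_without_question + eps))

def select_top_fraction(samples, fraction):
    """Keep the highest-scoring fraction of samples (at least one)."""
    scored = sorted(samples, key=lambda s: cvs_style_score(s["p_q"], s["p_no_q"]), reverse=True)
    keep = max(1, int(round(fraction * len(scored))))
    return scored[:keep]

# Hypothetical probabilities that a frozen VLLM judges the answer valid
samples = [
    {"id": "a", "p_q": 0.90, "p_no_q": 0.88},  # answer guessable without the question
    {"id": "b", "p_q": 0.85, "p_no_q": 0.10},  # the question changes the judgement a lot
    {"id": "c", "p_q": 0.60, "p_no_q": 0.55},
]
print([s["id"] for s in select_top_fraction(samples, 0.34)])
```

Sample "a" scores lowest because its answer looks equally valid with or without the question. This is the shortcut case the method aims to filter out.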

18.
arXiv (CS.AI) 2026-06-24

Event-Grounded Question Answering over Long Audio via Structured Retrieval

arXiv:2602.14612v4 Announce Type: replace-cross Abstract: Answering natural-language questions over multi-hour audio requires both event recognition and temporal grounding. Current large audio-language models perform well on short clips, but are limited by context length, query-time cost, and weak temporal localization. We present LA-RAG (Long Audio-Retrieval Augmented Generation), a structured framework that converts continuous audio into timestamped event records using an open-vocabulary Audio Grounding Model (AGM), stores them in a SQL event database, and answers queries through intent-aware retrieval followed by LLM-based generation. LA-RAG supports offline grounding mode, where long recordings are pre-indexed for low-latency QA, and inference-time grounding mode, where query-conditioned grounding is performed for shorter open-ended clips. We create 24-hour Home-IoT and Industrial-IoT audio benchmarks and augment CASTELLA, a real-world audio moment retrieval dataset, with QA pairs. In offline grounding mode, LA-RAG achieves 76.88% overall accuracy on Home-IoT and 71.10% on Industrial-IoT, with average query latencies below 0.6 seconds. In inference-time grounding mode, state-of-the-art LALMs achieve competitive event-detection accuracy on CASTELLA-QA but low temporal detection F1. We further show that LALMs augmented with our structured retrieval metadata achieve consistent temporal detection improvements, with F1 gains of 11-17% across baseline models and improved latency. These results show that explicit timestamped grounding and structured retrieval provide a practical complement to generative audio-language models for deployment-oriented long-audio QA.
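Storing timestamped events in a SQL table turns temporal questions into interval-overlap queries. This minimal sketch uses Python's built-in sqlite3. The table and column names are hypothetical, as are the events, and the hand-written query stands in for the paper's intent-aware retrieval step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,      -- open-vocabulary event label
        start_s REAL NOT NULL,    -- onset, seconds from recording start
        end_s REAL NOT NULL,      -- offset, seconds from recording start
        confidence REAL
    )
""")
conn.execute("CREATE INDEX idx_label_time ON events(label, start_s)")

# Hypothetical grounding output for a 24-hour recording
rows = [
    ("dog barking", 3600.0, 3612.5, 0.91),
    ("door knock", 7205.0, 7207.0, 0.84),
    ("dog barking", 50400.0, 50420.0, 0.88),
]
conn.executemany(
    "INSERT INTO events(label, start_s, end_s, confidence) VALUES (?, ?, ?, ?)", rows
)

def events_in_window(conn, label, t0, t1, min_conf=0.5):
    """Events with the given label that overlap the window [t0, t1] seconds."""
    return conn.execute(
        "SELECT label, start_s, end_s FROM events "
        "WHERE label = ? AND start_s < ? AND end_s > ? AND confidence >= ? "
        "ORDER BY start_s",
        (label, t1, t0, min_conf),
    ).fetchall()

# "When did the dog bark in the first two hours?"
print(events_in_window(conn, "dog barking", 0, 7200))
```

Because the question is answered by an indexed lookup rather than by re-reading the audio, query latency stays small regardless of recording length.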

19.
bioRxiv (Bioinfo) 2026-06-20

RNAStabFormer: Region-Aware Multi-Task Hybrid Learning for RNA Stability Prediction from Pulse-Chase Transcriptomics

RNA stability is a central layer of post-transcriptional gene regulation, yet large-scale stability labels derived from pulse-chase transcriptomics depend strongly on the quantification region, the time-window definition, and replicate quality control. We present RNAStabFormer, a controlled learning framework for predicting human RNA stability proxies from transcript sequence. Its core model, RAMHT, combines region-specific nucleotide Transformer encoders for CDS, and sequence, a CDS codon stream, engineered sequence-grammar features, gated fusion, and four task-specific regression heads. We construct four strict consensus labels from ENCODE BrU-seq/BruChase-seq data by crossing gene-sense and exon-sense quantification with late-chase (6 h/2 h) and total-chase (6 h/0 h) retention ratios, and we evaluate all models on fixed repeated-random and chromosome-holdout splits. Across chromosome holdouts, XGBoost remains the strongest standalone model, with median Pearson correlations of 0.504, 0.544, 0.546, and 0.778 on the four labels. RAMHT is competitive with raw-sequence deep models but does not consistently outperform engineered-feature baselines. A strict nested RAMHT–XGBoost blend nevertheless improves mean Pearson correlation over XGBoost by 0.017 for gene total-chase prediction and by 0.004 for exon late-chase prediction. Region and mechanism analyses show that CDS, local k-mer composition, and codon-sensitive signals carry most of the predictive information. RNAStabFormer therefore provides both a multi-task neural model and a leakage-controlled evaluation protocol for RNA stability prediction from pulse-chase data.
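For readers unfamiliar with the label construction and blending, a minimal Python sketch follows. It assumes abundances are already normalized per chase time point, computes the two retention ratios named in the abstract, and picks a linear blend weight between two models' predictions using a validation split only. The function names, the grid search over the weight, and the reduction of the "nested" protocol to a single validation split are simplifying assumptions, not the authors' pipeline.

```python
import math
from statistics import mean


def retention_ratios(abund):
    # abund maps chase time (hours) -> normalized abundance for one transcript.
    late = abund[6] / abund[2]    # late-chase ratio, 6 h / 2 h
    total = abund[6] / abund[0]   # total-chase ratio, 6 h / 0 h
    return late, total


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant inputs; treat as no correlation
    return cov / (sx * sy)


def fit_blend_weight(pred_a, pred_b, y_val, grid=11):
    # Choose w in [0, 1] maximizing Pearson(w*a + (1-w)*b, y) on the
    # validation split only, so the held-out test split never leaks in.
    best_w, best_r = 0.0, -2.0
    for i in range(grid):
        w = i / (grid - 1)
        blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        r = pearson(blend, y_val)
        if r > best_r:
            best_w, best_r = w, r
    return best_w


print(retention_ratios({0: 1.0, 2: 0.8, 6: 0.4}))  # (0.5, 0.4)
```

The chosen weight would then be applied unchanged to the test split (for example, a chromosome holdout) when reporting the blend's correlation.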

20.
arXiv (CS.LG) 2026-06-18

Decomposing Prediction Mechanisms for In-Context Recall

arXiv:2507.01414v2 Abstract: We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically labeled, interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted with the corresponding in-context label. A closer look at this task shows that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue applying the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into training. Surprisingly, the second capability, continuing the prediction of a resumed sequence, develops much earlier. Through out-of-distribution experiments and a mechanistic analysis of model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to perform the associative recall required to predict the start of a resumed, previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism phenomenon (which manifests as separate phase transitions) is not merely an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task and observed a similar phenomenon: a decisive gap between the emergence of first-task-token performance and that of second-task-token performance.
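A minimal Python sketch of how such traces might be generated follows. Assumptions: each system is a random 2-D matrix $A$ with $x_{t+1} = A x_t$, labels are tokens like `<s0>`, and a segment is one label followed by a few states; resuming a label continues from that system's last seen state. All names and sizes are hypothetical. In this layout, the first state after a label requires recall (function 1 in the abstract), while later states in a segment need only the previous token and the already identified system (function 2).

```python
import random


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def random_system(rng, dim=2):
    # Hypothetical: random linear deterministic system x_{t+1} = A x_t,
    # with small entries so states stay bounded.
    return [[rng.uniform(-0.7, 0.7) for _ in range(dim)] for _ in range(dim)]


def make_trace(n_systems=3, n_segments=6, seg_len=3, dim=2, seed=0):
    rng = random.Random(seed)
    systems = {f"<s{i}>": random_system(rng, dim) for i in range(n_systems)}
    # Hidden initial state per system; the first emitted state is A x0.
    last_state = {lbl: [rng.gauss(0, 1) for _ in range(dim)] for lbl in systems}
    trace = []  # flat token list: labels (str) and states (list of floats)
    for _ in range(n_segments):
        lbl = rng.choice(list(systems))
        trace.append(lbl)
        for _ in range(seg_len):
            # Resuming a label continues from that system's last seen state.
            last_state[lbl] = matvec(systems[lbl], last_state[lbl])
            trace.append(last_state[lbl])
    return trace, systems


trace, systems = make_trace(seed=1)
print(trace[:4])
```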

21.
arXiv (CS.AI) 2026-06-24

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

arXiv:2606.24551v1 Abstract: Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by the reliability of grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

22.
arXiv (CS.AI) 2026-06-16

Runtime Analysis of Cartesian Genetic Programming in Evolving Boolean Functions

arXiv:2606.15923v1 Abstract: Cartesian Genetic Programming (CGP) is a practical and popular form of Genetic Programming that uses a graph-based representation of programs. This paper presents a first runtime analysis of CGP for evolving Boolean functions using complete training sets. We prove an asymptotic bound of $O(n D^5)$ on the expected number of fitness evaluations CGP needs to construct a conjunction of $n$ inputs using at most $D \geq n-1$ binary gates and a minimal function set, even under strict survival selection. With non-strict selection, the bound improves to $O(n D^4)$. Our analysis reveals characteristics of CGP-induced search that had previously been observed only empirically. In particular, accepting equally good solutions, including those with connected gates that do not contribute to fitness, can speed up the search and consequently yield a better asymptotic time bound. In contrast to conjunctions, we also prove a negative result showing that CGP requires exponential time to evolve an exclusive disjunction. Experiments evolving conjunctions complement our theoretical findings. Using incomplete training sets further reduces the average number of fitness evaluations while maintaining a good level of generalisation.
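The setting analysed above can be simulated with a small (1+1)-style CGP loop in Python. Assumptions, not taken from the paper: the function set is {AND} only, each offspring is produced by resampling one uniformly chosen gene, and fitness counts correctly classified rows of the complete $2^n$-row truth table. The `strict` flag toggles between accepting only strict improvements and also accepting equally fit offspring, the distinction behind the $O(n D^5)$ and $O(n D^4)$ bounds. This is a sketch for intuition, with no claim to match the paper's exact operators or constants.

```python
import random
from itertools import product


def random_gene(rng, pos, n, D):
    # Genes 2j and 2j+1 are the inputs of node j: they may address primary
    # inputs 0..n-1 or earlier nodes n..n+j-1. Gene 2D is the output address.
    if pos == 2 * D:
        return rng.randrange(n + D)
    return rng.randrange(n + pos // 2)


def evaluate(genome, n, D, bits):
    values = list(bits)
    for j in range(D):  # every node is an AND gate (minimal function set)
        a, b = genome[2 * j], genome[2 * j + 1]
        values.append(values[a] & values[b])
    return values[genome[2 * D]]


def fitness(genome, n, D):
    # Complete training set: all 2^n rows, target = AND of all n inputs.
    return sum(
        evaluate(genome, n, D, bits) == int(all(bits))
        for bits in product((0, 1), repeat=n)
    )


def run_cgp(n, D, strict=False, max_evals=200_000, seed=0):
    rng = random.Random(seed)
    L = 2 * D + 1
    parent = [random_gene(rng, pos, n, D) for pos in range(L)]
    f_parent, evals = fitness(parent, n, D), 1
    while f_parent < 2 ** n and evals < max_evals:
        child = parent[:]
        pos = rng.randrange(L)
        child[pos] = random_gene(rng, pos, n, D)  # point mutation (may be a no-op)
        f_child = fitness(child, n, D)
        evals += 1
        # Strict: accept only improvements. Non-strict: also accept ties,
        # which allows neutral drift through non-contributing gates.
        if f_child > f_parent or (not strict and f_child == f_parent):
            parent, f_parent = child, f_child
    return parent, f_parent, evals


genome, f, evals = run_cgp(n=4, D=5, seed=0)
print(f, 2 ** 4, evals)
```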

23.
arXiv (CS.CV) 2026-06-18

APT: Atomic Physical Transitions for Causal Video-Language Understanding

Physical events are not understood by their names alone, but by the causal state changes that compose them. A clip-level label such as "bounce" can be correct while hiding the process that makes the event physically valid, from support loss and contact onset to rebound and settling. To make this hidden process explicit, we introduce Atomic Physical Transitions (APTs): minimal, temporally localized state changes that bind a visible cue to an active physical mechanism and to before/after dynamical regimes. An APT chain represents a video as an ordered causal transition sequence rather than a single aggregate event label: event labels tell what happened; APT chains explain why it happened. To make APTs learnable by VLMs, we construct mixed-source APT data from human annotations and simulator ground truth, covering 14 transition types across contact, gravity, friction, and rotation/stability, with 27,303 timed instances over 1,246 trials. Using this data, we find that current VLMs miss transition-level physics, with zero-shot recall of at most 14% and errors dominated by missed transitions. Direct fine-tuning on APT chains improves transition detection but causes event-level forgetting, indicating that the model learns a specialized answer format rather than a reusable physical representation. We therefore propose APT-Tune, a parameter-efficient recipe that teaches VLMs to use causal transitions without forgetting how to answer video questions. It combines image-pad-aware supervision, format-conditional co-training, and mechanism-conditioned domain-to-type decoding to make APT learning format-robust and physically grounded. With only 11M LoRA parameters on Qwen3-VL-2B, APT-Tune substantially improves APT recall while also improving event-level video transfer. These results show that APTs are not a new answer format but a human-aligned causal supervision signal for physical video understanding.
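One way to picture an APT chain as data is the Python sketch below. The field names, the transition labels and timings in the example, and the `is_valid_chain` check (time-ordered transitions whose before/after regimes line up) are hypothetical illustrations of the definition above, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class APT:
    # One atomic physical transition: a visible cue bound to an active
    # mechanism and to the dynamical regimes before and after it.
    t_start: float
    t_end: float
    transition_type: str  # hypothetical label, e.g. "contact_onset"
    mechanism: str        # e.g. "contact", "gravity", "friction", "rotation/stability"
    visual_cue: str
    regime_before: str
    regime_after: str


def is_valid_chain(chain: List[APT]) -> bool:
    # Hypothetical consistency check: transitions are time-ordered and each
    # one starts in the regime the previous one ended in.
    for prev, nxt in zip(chain, chain[1:]):
        if nxt.t_start < prev.t_end or nxt.regime_before != prev.regime_after:
            return False
    return all(a.t_start <= a.t_end for a in chain)


bounce = [
    APT(0.40, 0.45, "support_loss", "gravity", "ball leaves table edge", "resting", "free_fall"),
    APT(0.90, 0.92, "contact_onset", "contact", "ball touches floor", "free_fall", "compression"),
    APT(0.92, 0.96, "rebound", "contact", "ball lifts off", "compression", "ballistic"),
    APT(1.80, 2.40, "settling", "friction", "ball rolls to a stop", "ballistic", "resting"),
]
print(is_valid_chain(bounce))  # True
```

Where a single event label would just say "bounce", the chain records the four transitions that make the bounce physically valid.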

24.
arXiv (CS.CL) 2026-06-16

Why Tree-Style Branching Matters for Thought Advantage Estimation in GRPO

Group Relative Policy Optimization (GRPO) trains Chain-of-Thought reasoning with verifiable rewards, but estimating thought-level advantages without value functions often suffers from high variance. Although tree-style branching is used in practice to reduce variance, there is no theoretical explanation of why it works or of whether it is important or potentially necessary. We study thought-level advantage estimation in GRPO from a variance perspective under a minimal tree-style setting in which multiple continuations are sampled for each thought. Using the multivariate delta method, we reveal a sampling-dimension asymmetry: increasing the number of sampled thoughts ($K$) leaves a strictly positive floor on the estimation variance, whereas increasing the number of continuations per thought ($M$) drives the leading-order estimation variance to zero at rate $1/M$. This implies that, within the fixed-temperature GRPO-style estimator without value models studied here, accurate thought-level advantage estimation cannot be achieved by scaling thought sampling alone, which makes continuation-level branching a principled and potentially necessary mechanism rather than a heuristic. Experiments provide further empirical evidence of its effectiveness and potential necessity, demonstrating improved optimization stability, training efficiency, and final performance not only in math but also in vision domains and across different model architectures and sizes.
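The $K$-versus-$M$ asymmetry can be checked numerically with a small Monte Carlo sketch in Python. Simplifying assumptions: each of $K$ thoughts has a fixed success probability, continuation rewards are Bernoulli, and a thought's advantage is estimated as its mean reward minus the mean over all thoughts, with no standard-deviation normalization (unlike standard GRPO). This is not the paper's delta-method analysis, only an illustration of the same qualitative effect: with $M = 1$ the error variance does not shrink as $K$ grows (it approaches the single-reward variance of the thought), while increasing $M$ reduces it roughly as $1/M$.

```python
import random
from statistics import pvariance


def advantage_error_variance(K, M, trials=2000, seed=0):
    # Each of K sampled thoughts has a fixed success probability v_k
    # (thought 0 is fixed at 0.5). For each thought we draw M Bernoulli
    # continuation rewards and estimate thought 0's advantage as
    # (its mean reward) - (mean of all K thought means).
    rng = random.Random(seed)
    v = [0.5] + [rng.uniform(0.1, 0.9) for _ in range(K - 1)]
    true_adv = v[0] - sum(v) / K
    errors = []
    for _ in range(trials):
        means = [sum(rng.random() < p for _ in range(M)) / M for p in v]
        est = means[0] - sum(means) / K
        errors.append(est - true_adv)
    return pvariance(errors)


for K, M in [(4, 1), (64, 1), (4, 16), (64, 16)]:
    print(f"K={K:>2} M={M:>2} error variance={advantage_error_variance(K, M):.4f}")
```

For $K = 64, M = 1$ the variance sits near $0.5 \cdot (1 - 0.5) = 0.25$, the floor set by a single Bernoulli reward; with $M = 16$ it drops by roughly a factor of 16.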

25.
arXiv (CS.LG) 2026-06-24

Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control

arXiv:2601.02896v3 Abstract: Controlling emergent behavioral personas (e.g., sycophancy, hallucination) in Large Language Models (LLMs) is critical for AI safety, yet it remains a persistent challenge. Existing solutions face a dilemma: manual prompt engineering is intuitive but unscalable and imprecise, while automatic optimization methods are effective but operate as "black boxes" with no interpretable connection to model internals. We propose a framework that adapts gradient ascent to LLMs to enable targeted prompt discovery. Specifically, we propose two methods, RESGA and SAEGA, both of which optimize randomly initialized prompts so that their representations align more closely with an identified persona direction. We introduce fluent gradient ascent to control the fluency of the discovered persona-steering prompts. We demonstrate the effectiveness of RESGA and SAEGA across Llama 3.1, Qwen 2.5, and Gemma 3 for steering three personas: sycophancy, hallucination, and myopic reward. Crucially, on sycophancy, our automatically discovered prompts achieve a significant improvement (49.90% compared with 79.24%). By grounding prompt discovery in mechanistically meaningful features, our method offers a new paradigm for controllable and interpretable behavior modification. We release our scripts for RESGA and SAEGA in this GitHub repository: https://github.com/HarshSaini10/RESGA_SAEGA.
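The core loop, gradient ascent on a prompt so that an internal representation aligns with a persona direction, can be shown on a toy differentiable "model" in Python. Everything here is an assumption for illustration: the prompt is a continuous unit vector rather than discrete tokens, the "model" is a single layer $\tanh(W p)$, the persona direction $d$ is random, and all function names are hypothetical. RESGA and SAEGA operate on real LLM representations (and add a fluency objective), so treat this only as a sketch of the optimization loop.

```python
import math
import random


def forward(W, p):
    # Toy "model": hidden representation h = tanh(W p).
    return [math.tanh(sum(w * x for w, x in zip(row, p))) for row in W]


def score(W, p, d):
    # Alignment of the hidden representation with the persona direction d.
    return sum(di * hi for di, hi in zip(d, forward(W, p)))


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gradient_ascent(W, d, steps=500, lr=0.02, seed=0):
    rng = random.Random(seed)
    p0 = normalize([rng.gauss(0, 1) for _ in range(len(W[0]))])  # random init
    p = p0
    for _ in range(steps):
        h = forward(W, p)
        # d(score)/dp = W^T (d * (1 - h^2)), since d tanh(z)/dz = 1 - tanh(z)^2
        g_pre = [di * (1 - hi * hi) for di, hi in zip(d, h)]
        grad = [sum(W[i][j] * g_pre[i] for i in range(len(W))) for j in range(len(p))]
        # Ascent step, then project back onto the unit sphere (bounded "prompt").
        p = normalize([x + lr * gx for x, gx in zip(p, grad)])
    return p0, p


rng = random.Random(42)
W = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(8)]  # 6-dim prompt -> 8-dim hidden
d = normalize([rng.gauss(0, 1) for _ in range(8)])          # hypothetical persona direction
p0, p = gradient_ascent(W, d)
print(round(score(W, p0, d), 3), "->", round(score(W, p, d), 3))
```

In a real LLM the prompt is discrete, so the continuous update above would have to be mapped back to tokens, which is where most of the practical difficulty lies.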