Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-12

Measurement Plasticity: Sensor-Level Adaptation for Vision-Language Models

We propose Multi-View Physical-prompt (MVP) for Test-Time Adaptation (TTA), a forward-only framework that moves TTA from tokens to photons by treating the camera exposure triangle (i.e., ISO, shutter speed, and aperture) as physical prompts. At inference, MVP acquires selected multiple physical views using a source-affinity score, evaluates digitally augmented variants of each retained view and filters the lowest-entropy predictions, and aggregates predictions with hard voting. This selection-then-vote design is simple, calibration-friendly, and requires no gradients or model modifications. On ImageNet-ES and ImageNet-ES-Diverse, MVP outperforms digital-only TTA on both Auto-Exposure and a combination with conventional sensor control. MVP remains effective under reduced parameter candidates that lower capture latency, demonstrating its practicality.

02.
arXiv (math.PR) 2026-06-16

Steady-State Approximation Error of Heterogeneous Mean-Field Models

Authors:

arXiv:2606.09022v2 Announce Type: replace Abstract: This paper studies heterogeneous mean-field models in which agent parameters are sampled from a population distribution. We establish an $O(1/M)$ bound on the steady-state mean-square error between the occupancy measure of the $M$-agent system and the corresponding annealed mean-field equilibrium. The analysis extends Stein's method for homogeneous mean-field models and reveals a fundamental difference between homogeneous and heterogeneous systems. While stability of the mean-field dynamics is sufficient in the homogeneous setting, heterogeneous systems further require uniform robustness of the occupancy dynamics with respect to perturbations of the initial condition. The results are illustrated through a heterogeneous SIS epidemic model.

03.
arXiv (CS.CV) 2026-06-24

Hybrid Event Frame Sensors: Modeling, Calibration, and Simulation

Hybrid event-frame sensors integrate an Event Vision Sensor (EVS) and an Active Pixel Sensor (APS) within a single chip, combining the high dynamic range and low latency of the EVS with the rich spatial intensity information from the APS. While this tight integration offers compact and temporally precise imaging, the complex circuit architecture introduces nontrivial noise patterns that remain poorly understood and unmodeled. In this work, we present the first unified statistics-based imaging noise model that jointly describes the noise behavior of APS and EVS pixels. Our formulation explicitly incorporates photon shot noise, dark current noise, fixed-pattern noise, and quantization noise, and links EVS noise to illumination level and dark current. Based on this formulation, we further develop a calibration pipeline to estimate noise parameters from real data and provide a detailed analysis of both APS and EVS noise behaviors. Finally, we propose H-ESIM, a statistically grounded simulator that generates RAW frames and events under realistic jointly calibrated noise statistics. Experiments on two hybrid sensors validate our model across multiple imaging tasks, including video frame interpolation and deblurring, demonstrating strong transfer from simulation to real data.

04.
arXiv (CS.LG) 2026-06-19

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

arXiv:2606.20437v1 Announce Type: cross Abstract: Charged-particle tracking – reconstructing trajectories from sparse detector measurements – is a fundamental high-energy-physics inference problem and a canonical example of learning under extreme combinatorial ambiguity. At the High-Luminosity Large Hadron Collider (HL-LHC), tracking must remain accurate and efficient despite unprecedented collision densities. Graph neural networks perform strongly, but incur substantial costs from graph construction and processing, while transformer-based approaches rely on auxiliary stages that prevent end-to-end optimization. To address this, we present HEPTv2, an end-to-end point-transformer architecture that reconstructs tracks from detector hits in one trainable pipeline. HEPTv2 combines a locality-aware point encoder with a track decoder that predicts complete trajectories without graph-building, clustering, or filtering. The encoder uses locality-sensitive hashing in detector coordinate space to preserve tracking-relevant geometry while enabling efficient local attention. The decoder resolves ambiguities through sectorized decoding and direct hit-to-track prediction under joint encoder-decoder supervision, allowing the full pipeline to be optimized end-to-end. On TrackML, HEPTv2 achieves 98.6% double-majority tracking efficiency at a 0.8% fake rate, while requiring only $\sim$15~ms inference time and 0.4~GB peak memory per event on a NVIDIA A100 GPU. Latency and memory scale approximately linearly for events with up to $5\times10^5$ hits. HEPTv2 establishes a new state of the art in the accuracy-latency trade-off, improving efficiency by 4.5% over the strongest prior transformer and by 1.1–2.2% over optimized graph-based pipelines, while reducing latency by factors of 7 and 38–52, respectively. These results show end-to-end transformers can deliver the accuracy and efficiency required for real-time particle reconstruction at the HL-LHC.

05.
bioRxiv (Bioinfo) 2026-06-11

A Deep Hypergraph Learning Model for Predicting Antimicrobial Combination Effects Across Bacterial Targets

Antimicrobial resistance (AMR) creates an urgent need for efficient strategies to identify effective antibacterial combinations. Combination therapy, including antimicrobial peptides (AMPs) paired with conventional antibiotics, is a promising approach, but exhaustive experimental screening across drug pairs and bacterial targets is impractical. This study introduces a hybrid GCN-based hypergraph neural network (HGNN) for predicting antimicrobial-agent combination outcomes against bacterial targets. Each antimicrobial-agent-antimicrobial-agent-bacterium triplet is represented as a ternary hyperedge, enabling the model to learn context-dependent interaction patterns. The framework integrates SMILES-derived molecular graph embeddings for antimicrobial agents, including conventional antibiotics and AMPs, with taxonomy-derived bacterial representations. The prediction task was formulated as a three-class classification problem: synergy, antagonism, and non-interaction. The non-interaction class included experimentally verified indifferent records and synthetic presumed non-interaction triplets generated by negative sampling. Model development used drug-pair-grouped splitting, five-fold grouped cross-validation within the training/validation partition, and final evaluation on a held-out test set. On the held-out three-class test set, the selected GCN-based HGNN achieved an accuracy of 0.83, weighted F1-score of 0.84, macro F1-score of 0.80, and ROC-AUC of 0.95. Per-class evaluation showed accuracies of 0.80 for synergy, 0.92 for antagonism, and 0.85 for non-interaction. Pair-type analysis showed strong performance across AMP-AMP, AMP-conventional antibiotic, and conventional antibiotic-conventional antibiotic combinations. These findings suggest that hypergraph-based representation learning can support computational prioritization of antimicrobial combinations for experimental follow-up. Further studies will be needed to improve model interpretability and to perform prospective validation of predicted synergistic combinations.

06.
arXiv (CS.LG) 2026-06-16

A Conservation Law for Equilibrium Propagation and Coupled Learning

arXiv:2606.15444v1 Announce Type: cross Abstract: In this paper we show that the physical learning methods known as coupled learning (CL) and equilibrium propagation (EP) conserve a mass-like quantity in the trainable parameters in the continuous-time, small-nudging limit. We prove that this conservation holds in a broad range of physically relevant settings. We then show that the conservation law constrains the training dynamics in a way that makes convergence reliable in important settings for linear circuits. We conclude by discussing some practical implications of this conservation law.

07.
arXiv (CS.CV) 2026-06-24

Dual-Branch Cross-Projection Debiasing through Diffusion-based Disentanglement

Foundation models trained on biased datasets often rely on spurious correlations between target labels and non-causal attributes, resulting in poor generalization on minority groups. Bias mitigation remains challenging due to two fundamental issues. First, when group labels are unavailable, existing group-unsupervised methods typically infer spurious attributes implicitly from model behavior, making it difficult to identify spurious factors that are semantically aligned with real-world biases. Second, even with pseudo spurious supervision, most existing debiasing methods follow a single-branch design that operates within a single shared feature space, where target and spurious attributes are intrinsically entangled. To address the first challenge, we introduce Confidence-guided Bias Concept Mining (CBCM), which leverages diffusion-disentangled, semantically grounded concept representations to identify reliable spurious attributes without attribute annotations. To address the second challenge, we propose Dual-branch Cross-projection Debiasing (DCD), a prompt-tuning framework that separates target and spurious representations into two branches and explicitly removes spurious information through cross null-space projection while preserving target-relevant semantics. Extensive experiments on four benchmark datasets show that our method achieves state-of-the-art worst group accuracy among group-unsupervised approaches, while tuning at most 0.22% of the model parameters. The source code is available in the supplementary materials.

08.
arXiv (CS.LG) 2026-06-11

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

arXiv:2512.08211v2 Announce Type: replace Abstract: Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platform for such applications because they are continuously carried by users, connected to wearable sensors, and deeply integrated with daily mobile applications. However, practical LLM fine-tuning on commodity phones remains difficult. Existing fine-tuning frameworks are largely Python-based and server-oriented, making them hard to deploy inside mobile applications. We present MobileFineTuner, a mobile-native open-source framework for end-to-end LLM fine-tuning on commodity mobile phones. MobileFineTuner is implemented in C++ and provides a reusable training stack. To make fine-tuning feasible under mobile resource constraints, MobileFineTuner integrates a resource-aware training runtime with memory-efficient attention, activation checkpointing, gradient accumulation, parameter sharding, and energy-aware scheduling. We evaluate MobileFineTuner on real mobile phones using GPT-2, Gemma 3, and Qwen2.5 models across multiple fine-tuning tasks. The results show that MobileFineTuner reproduces standard Full-FT and LoRA fine-tuning behavior, substantially reduces memory pressure and improves executability on memory-constrained phones. We further demonstrate MobileFineTuner through a private campus health-agent application, where a local LLM is fine-tuned on user-specific wearable-sensing records to provide more personalized responses while keeping raw records on the phone. These results establish MobileFineTuner as a practical toolkit for studying and building on-device LLM fine-tuning applications in embedded AI and sensing systems.

09.
arXiv (CS.AI) 2026-06-18

Learning-Based Decision Making for Combustion Phasing Control in Multi-Fuel CI Engines with Latent Fuel Reactivity Estimation

arXiv:2606.18393v1 Announce Type: cross Abstract: Multi-fuel compression-ignition engines offer fuel flexibility but introduce uncertain, time-varying fuel reactivity, represented by cetane number (CN), which complicates cycle-to-cycle combustion-phasing control. This work formulates CA50 regulation under latent CN variation as a partially observable sequential decision problem and systematically evaluates controllers with increasing temporal and representational capacity, including LinUCB, history-augmented contextual bandits, observation-only DDPG, recurrent DDPG, and a proposed GRU-guided RL framework. A Gaussian-process surrogate trained on experimental multi-fuel engine data provides a controlled and reproducible evaluation environment. Results show that myopic and fixed-history bandit methods degrade under CN variation, observation-only RL suffers from latent-state aliasing, and generic recurrence is insufficient when CN evolves rapidly. The proposed framework learns a compact GRU-based representation of fuel reactivity from combustion history and conditions both actor and critic on this estimated signal rather than oracle CN. By training the policy on the same imperfect fuel-reactivity information available at deployment, the controller avoids train-deploy inconsistency in conventional online estimate-then-control pipelines. Across unseen CN trajectories, the policy achieves stable CA50 regulation with mean absolute tracking error below 0.25{\deg} CA at the training setpoint, while producing smooth, physically consistent SOI and glow-plug-power actuation. These results show that combustion control under latent, continuously evolving fuel dynamics requires more than standalone estimation or generic recurrence. By aligning fuel-reactivity inference with control policy learning, the proposed framework enables reactivity-aware decision-making using the same estimated state available during deployment.

10.
arXiv (CS.CV) 2026-06-16

Fusing Transferred Priors and Physics-based Decomposition for Underwater Image Enhancement

The underwater images are captured within diverse water-medium conditions, leading to complex degradation, including color bias, low contrast, and blur effect. Recently, learning-based methods have demonstrated their potential for underwater image enhancement (UIE). However, most of the previous work focus on the training strategy or network design to make the enhanced result aligned well with the labels in datasets, ignoring that the labels are selected from the enhanced results of previous UIE methods and these pseudo-labels are noisy. Consequently, the performance of their models is not satisfactory to a certain extent. However, collecting the true labels of the underwater images is challenging. In this work, we propose a transfer learning-based UIE that does not require underwater images to have paired noisy or true labels for learning. Instead, the UIE task is first divided into global color correction, haze removal, and background noise suppression following the underwater physics. Then multiple types of prior from other vision tasks are leveraged as cross-domain supervision in each step. In this way, a novel UIE is available via transfer learning, and the physics-aligned UIE decomposition provides theoretical soundness. Qualitative and quantitative experiments demonstrate that our proposal based on physics and priors fusion achieves SOTA performance in the UIE task and effectively boosts downstream vision tasks, significantly outperforming benchmark methods. Project repo: https://github.com/Haru2022/P2-UIE.

11.
arXiv (CS.LG) 2026-06-19

When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting

arXiv:2606.19363v1 Announce Type: new Abstract: The deployment of Time-Series Foundation Models (TSFMs) in physical sciences is hindered by a critical trade-off: while these models encode rich, universal temporal dynamics, they suffer from severe distributional misalignment when applied zero-shot to specific scientific domains, and their computational cost prohibits deployment in edge-computing sensor networks. We address a fundamental challenge: How can we extract latent structural knowledge from misaligned foundation models (FM) to train lightweight, specialized forecasters? We propose Gated Uncertainty-Aware Routing for Distillation (Guard), a novel framework that reframes multiteacher distillation as an instance-wise decision process with two adaptive mechanisms: (1) a Contextual Router that dynamically selects the most relevant teacher based on local input statistics, exploiting complementarity across diverse foundation models; and (2) an Uncertainty-Gated Temperature mechanism that acts as a "circuit-breaker," automatically attenuating distillation strength when teacher confidence diverges from domain reality. We evaluate our proposed lightweight framework on four climate-critical domains: meteorology, ecosystem carbon flux, soil moisture, and energy grids. Our method significantly reduces RMSE relative to a fixed-weight multi-teacher distillation baseline, successfully distilling knowledge from pretrained FMs (teachers) even when they exhibit suboptimal zero-shot accuracy due to distribution shift between the original and target data domains. We demonstrate that these domain-misaligned teachers can still serve as critical correctives, outperforming the globally superior FMs on 28.5% of the hardest instances. Ultimately, this enables high-precision scientific forecasting suitable for resource-constrained edge deployment. Code is available at https://github.com/RupasreeDey/GUARD-KDD2026.

12.
arXiv (math.PR) 2026-06-18

Very large cliques in a scale-free random graph

arXiv:2606.18722v1 Announce Type: new Abstract: In this short article we consider a preferential attachment random graph model with edge steps, studied by Alves, Ribeiro and Sanchis. Starting with an initial graph $\mathbb{G}_1$ formed by a vertex with a self-loop attached to it, the model evolves as follows. At every subsequent (discrete) time step, either with probability $p$ we add a vertex to the graph and connect it to exactly one of the older vertices selected with probability proportional to its degree, or with probability $1-p$ we add one edge between two existing vertices, both selected (independently) with probability proportional to their degrees. Let $\omega(\mathbb{G})$ be the clique number of a graph $\mathbb{G}$, i.e.\ the number of vertices in a largest complete subgraph of $\mathbb{G}_{}$. Alves, Ribeiro and Sanchis showed that, for any given $\varepsilon>0$, we have $\omega(\mathbb{G}_{2t})\geq t^{\frac{1-p}{2-p}(1-\varepsilon)}$ with high probability (i.e.\ with probability tending to $1$ as $t\rightarrow \infty$). Here we strengthen this bound by showing that, for any function $f:\mathbb{N}\mapsto \mathbb{N}$ that satisfies $f(t)\rightarrow \infty$ as $t\rightarrow \infty$, with high probability \[\omega(\mathbb{G}_{2t}) = \Omega\left(t^{\frac{1-p}{2-p}}\Big(\log^{\frac{1}{2-p}}(t)f(t)\Big)^{-1}\right).\]

13.
arXiv (CS.CV) 2026-06-19

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a cost-efficient hybrid framework for temporally grounded reasoning in long videos. TimeProVe first employs lightweight modules to generate action-grounded answer–evidence hypotheses and subsequently invokes an expensive VLM only for targeted verification. The core of our framework lies in the Action-based Candidate Evidence (ACE) module, which converts temporally localized actions into query-conditioned candidate answers and supporting evidence windows through lightweight LLM reasoning. We further introduce OpenTSUBench (OTB), an open-ended benchmark designed to evaluate temporally grounded reasoning in real-world Activities of Daily Living (ADL) scenarios. Experiments show that TimeProVe outperforms the strongest baseline on OTB by 7.3%, while reducing VLM calls by 75% and inference cost by 93%. Furthermore, without explicit temporal grounding training, TimeProVe achieves competitive performance on Charades-STA, and reaches state-of-the-art results when enhanced with grounding VLMs.

14.
arXiv (CS.CV) 2026-06-16

MamBOA: State-Space Architecture for Video Recognition

Fine-grained action recognition demands temporal reasoning that general-purpose architectures address through different cost-accuracy tradeoffs: 3D dense operators couple computation to the input volume, while difference-based methods approximate motion through rigid, hand-crafted subtraction of uncontextualized features - each reflecting a deliberate design choice with corresponding limitations in expressiveness or flexibility. We present MamBOA, a backbone-agnostic temporal framework built upon a novel interleaved scan structure that recasts the selective state-space recurrence (S6) as a native motion synthesizer. By interleaving consecutive feature representations extracted from a pretrained backbone into a single alternating sequence, the proposed scan structurally drives the recurrence to encode both temporal observations of each position within a shared hidden state, separated by only a single decay step - rendering the inter-frame transition an intrinsic component of the state dynamics rather than an externally computed quantity. A cascade of dedicated alignment and decoding operations then distills this joint encoding into an explicit motion representation, which a dual-path pooling mechanism adaptively aggregates by balancing attention-driven selection with uniform temporal coverage. The framework interfaces seamlessly with CNN, Transformer, and Mamba backbone families, adding only ~2.1 GFLOPs per feature pair. On Diving48, MamBOA achieves 85.02% Top-1 accuracy with an image-pretrained backbone and 86.24% with a video-pretrained backbone processing the entire video in a single forward pass - demonstrating that structurally induced state-space dynamics constitute a principled and general foundation for motion modeling.

15.
arXiv (CS.CV) 2026-06-17

ERQA-Plus: A Diagnostic Benchmark for Reasoning in Embodied AI

Generalist embodied agents require more than object recognition: they must reason about spatial relations, actions, procedures, human intentions, environmental constraints, and commonsense consequences from situated visual observations. Yet existing visual and embodied question answering benchmarks often provide limited control over the reasoning dependencies being tested, making it difficult to distinguish grounded embodied reasoning from shortcut-driven visual or linguistic pattern matching. We present ERQA-Plus, a diagnostic benchmark for reasoning in embodied AI. ERQA-Plus contains 1,766 question-answer instances grounded in 711 robot-centric images and organized according to a structured taxonomy spanning perceptual, action-centric, social-interaction, navigation-environmental, and contextual commonsense reasoning. The dataset is constructed using a multi-stage generation and validation pipeline that combines taxonomy-guided question generation, automatic quality judging, iterative revision, and human assessment to improve visual grounding, answer validity, and reasoning quality. We benchmark representative general-purpose vision-language models and embodied models, including LLaVA-NeXT-8B, Prismatic-7B, MiniCPM-V-4.5-8B, Qwen3-VL, RoboRefer-8B, and RoboBrain2.5-8B. Although the strongest model, Qwen3-VL-32B, achieves 83.4% overall accuracy and 61.4 SBERT score, category-level results reveal persistent weaknesses in spatial reasoning, procedural reasoning, event prediction, and intention inference. ERQA-Plus therefore provides a fine-grained evaluation framework for measuring not only whether embodied agents answer correctly, but also which forms of embodied reasoning they can and cannot perform reliably. The dataset is available https://huggingface.co/datasets/huggingdas/erqa-plus and the project page at https://github.com/LUNAProject22/erqa-plus.

16.
arXiv (CS.LG) 2026-06-19

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

arXiv:2606.19699v1 Announce Type: cross Abstract: Human legs exhibit high efficiency, agility, and impact absorption, with toes playing a crucial role in these capabilities. While many attempts have been made to implement human-like toes in robots, they have not fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot emulating human toes' lightweight, high-torque, robust nature. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that reflects actual actuators with coupled transmissions and accurate power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, at 1.33 m/s walking, the toe-equipped robot reduced CoT by 17.5% and heel-strike GRF by 5.0% compared with the toe-ablation configuration. On the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.

17.
arXiv (CS.CL) 2026-06-15

Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

As language models take integrated roles across many domains, the response of LLMs to user pushback becomes a critical alignment property. Yet many existing evaluations treat compliance as unidirectional, measuring whether models resist pressure but not whether they resist it selectively. We introduce Compliance Asymmetry (A = BCR/HCR), a bidirectional diagnostic that compares beneficial output change under helpful nudges with harmful change under misleading nudges. Across 9 models and 972,000 nudge-condition responses, we find that this selectivity differs in factual and moral judgments: models follow helpful nudges more than harmful ones on factual questions (A = 1.58), but follow both directions at nearly identical rates on moral questions (A = 1.04). This phenomenon persists across model families, capability levels, and nudging types. Interestingly, we also find that chain-of-thought prompting amplifies helpful and harmful compliance together, while identity-based prompting suppresses both by nearly identical margins. These results identify direction-blind moral compliance as a distinct failure mode in current LLMs and suggest that alignment should target directionally calibrated updating rather than lower compliance alone.

18.
arXiv (quant-ph) 2026-06-24

On the localization transition from MAA to AA models

arXiv:2606.24720v1 Announce Type: cross Abstract: Despite their potential similarity between the mosaic Aubry-André (MAA) and AA models, the MAA model allows mobility edges (MEs), whereas the AA model does not. Here we develop a new double quasiperiodic MAA (DMAA) model consisting of one primitive MAA with nonzero even-site potentials and the other modified one with both nonzero odd-site potentials and a tunable amplitude factor, to reveal how localization transitions evolve from MAA to AA models. Interplays and competitions among the extended, critical and localized states arising from superpositions of double quasi-periodic MAA potentials enable new twice and multiple localization-delocalization transitions besides the original single localization transition. Our numerical calculations on inverse participation ratio, normalized participation ratio, fractal dimension and real-space wavefunction distribution confirm such localization features. The continuum model simulations on the experimental polariton modes also yield consistent results and hence validate their experimental feasibility. The constructed DMAA model provides a new framework for studying the localization transition processes between two analogous quasiperiodic models and broadens the understanding of Anderson localization.

19.
arXiv (CS.CL) 2026-06-17

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

20.
arXiv (CS.AI) 2026-06-17

Dissecting model behavior through agent trajectories

arXiv:2606.17454v1 Announce Type: new Abstract: AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the `intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called `Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we $reproduce or improve on the pass@1$ performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an $analysis of 138k trajectories generated by SSA$, we look beyond the $\texttt{pass@1}$ numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.

21.
bioRxiv (Bioinfo) 2026-06-19

Sanjeevani: A manually curated anti-cancerous phytochemical database integrated with downstream analysis tools.

Background: Cancer continues to pose a massive global health burden. While plant-derived phytochemicals offer promising therapeutic leads, existing natural product databases often lack cancer specificity, dataset downloadability, and integrated screening tools. Methods: We developed Sanjeevani, an integrative web platform cataloguing 4,823 curated anticancer phytochemicals. Using a balanced dataset of 9,646 molecules, we trained Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbours classifiers using a hybrid feature representation of RDKit descriptors and 2048-bit ECFP4 fingerprints. The platform also integrates AutoDock Vina for web-based molecular docking for binding affinity, poses prediction and ADMET-AI for pharmacokinetics estimation. Results: The SVM model demonstrated the strongest predictive capability, achieving a top test accuracy of 0.966 and a ROC-AUC of 0.992. Benchmarking across five docking tools confirmed that AutoDock Vina successfully balanced computational automation with literature-consistent binding affinity replication. The final architecture provides rapid interactive 2D/3D visualizations integrated with downstream analysis tools. Conclusion: Sanjeevani provides an open-access, one-stop pipeline that bridges the gap between raw natural product data and actionable computational screening, accelerating natural product-based oncology drug discovery.

22.
arXiv (CS.CL) 2026-06-25

Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns

Neural scaling laws for transformer language models predict smooth improvements in pretraining loss with increasing parameters, but downstream capabilities such as in-context learning are known to emerge abruptly past a certain model scale. In this paper, we show that emergent capabilities arise stochastically throughout training, with larger models acquiring them earlier on average. We demonstrate that the emergence of capabilities such as pattern completion and indirect object identification corresponds to the abrupt learning of task-relevant attention patterns. To isolate this phenomenon, we train transformer models on synthetic linear map and cellular automata datasets, and we show that the difficulty of learning attention patterns depends on context length and pattern sparsity. Moreover, scaling the number of attention heads improves learning efficiency on our synthetic tasks, while increasing the head dimension yields diminishing returns past a minimum capacity. We additionally investigate architectures with alternative attention mechanisms, showing that MLP-Mixer outperforms a transformer on linear map tasks with complex attention patterns. Our findings provide a mechanistic insight into emergence, showing that downstream capabilities arise abruptly due to the intrinsic difficulty of learning sparse attention patterns in transformer models.

23.
arXiv (CS.CL) 2026-06-15

Persuasion Index: A Theory-Guided Framework for Persuasion Analysis

Identifying persuasive rhetorical cues is critical across domains, from detecting information manipulation and improving AI safety to advancing public health communication. We propose Persuasion Index (PI), a taxonomy of 15 dimensions grounded in persuasion theories from psychology and communication, and one transparent implementation using 55 sub-features built from lexicons and rule-based detectors. The taxonomy is modular: individual detectors can be replaced while preserving the theoretical structure. By evaluating PI on four public datasets varying in domain, style, and outcome measures, we show that PI provides a shared feature space for interpreting rhetorical patterns associated with persuasion-related outcomes. Linear models show that PI features carry meaningful predictive signal while remaining computationally lightweight. Dimension-level analyses reveal recurring associations between PI dimensions and persuasion outcomes across datasets, while also highlighting topic- and stance-specific variation. We release PI as an open-source package and web interface for principled and auditable analysis of human and AI-mediated communication.

24.
arXiv (CS.AI) 2026-06-12

scLLM-DSC: LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering for Single-Cell RNA Sequencing

arXiv:2606.13007v1 Announce Type: cross Abstract: Clustering is fundamental to scRNA-seq analysis, serving as a cornerstone for identifying cell populations and resolving tissue heterogeneity. However, existing methods focus on mining numerical statistical patterns, suffering from semantic agnosticism by neglecting the intrinsic biological functions encoded by genes. While Large Language Models (LLMs) offer promising semantic capabilities, their direct adaptation to cell clustering is hindered by the structural mismatch between generative pre-training objectives and discriminative downstream tasks. To bridge this gap, we propose scLLM-DSC, a novel LLM-Knowledge Enhanced Cross-Modal Deep Structural Clustering framework. Diverging from data-driven paradigms, scLLM-DSC establishes a semantically-grounded representation by synergizing two views: a Knowledge-Driven Semantic View derived from NCBI gene priors and contextualized Cell2Sentence embeddings, and a Structure-Aware Topological View extracted via a graph-guided encoder. Crucially, we introduce a cross-modal contrastive alignment mechanism to enforce consistency between biological semantics and transcriptomic features within a unified latent space. Extensive benchmarks demonstrate that scLLM-DSC significantly outperforms eleven state-of-the-art baselines in clustering accuracy.

25.
bioRxiv (Bioinfo) 2026-06-14

Structural Analysis of Prostate Cancer N-Glycans Using Graph-Based Structural Metrics

The N-linked glycans are structurally complex carbohydrate modifications that regulate protein folding, immune recognition, and cellular signaling, and their expression is extensively remodeled during cancer progression, making them promising biomarkers. In this study, prostate cancer-associated N-glycans from a range of relevant peer-reviewed studies were curated and digitized to develop a versatile computational framework that quantitatively encodes their spatial complexity across diverse biological systems. We invented two indices – the Distance & Connectivity Index (DCI) and the Position & Composition Index (PCI) – to capture the spatial information in N-glycans as layered architectures, enabling calculation of residue-level path lengths, branching structure, and compositional diversity. DCI summarizes glycan structure as both a scalar and matrix representation, while PCI does the same but also captures monosaccharide diversity, linkage heterogeneity, and cross-layer branching features. These metrics were computed with GlycoAssessor, an open-source platform that extracts information for the DCI and PCI from glycans drawn via Symbol Nomenclature for Glycans (SNFG) notation. Principal Component Analysis (PCA) was applied to evaluate whether glycans from prostate cancer tissues cluster distinctly in a disease-relevant manner. Results show that the spatial information in N-glycans: (1) increased in a multi-dimensional, non-linear manner, (2) objectively segregated structural themes, (3) could function as a potential prostate cancer biomarker that is distinct from mass-to-charge ratio and relative abundance, and (4) could objectively quantify novel subtype classifications of glycans associated with disease states and progression.