Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

DataMagic: Transforming Tabular Data into Data Insight Video

arXiv:2606.20388v1 Announce Type: cross Abstract: Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to pre-prepare visualizations rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of the DataMagic. Homepage: https://datamagic-home.github.io/

02.
medRxiv (Medicine) 2026-06-18

Expert in Ultrasound Skills: Feasibility of an IMU-video platform to describe technical profiles during focused cardiac ultrasound. Pilot study

Background: Focused cardiac ultrasound (FoCUS) is operator dependent and requires coordinated probe manipulation, image interpretation and iterative visual feedback. Existing assessment approaches often emphasize final image quality or expert rating. We developed Expert in Ultrasound Skills (EXUS) , a platform that synchronizes transducer-mounted inertial measurement unit (IMU) data with ultrasound video, and evaluated its technical feasibility during FoCUS acquisition. Methods: This observational pilot study included 6 operators performing two repetitions of a four-view FoCUS protocol, yielding 12 analytical sessions and 48 planned acquisitions. Feasibility was defined by acquisition completion, video availability, start/stop events, fused IMU-video windows, temporal coverage, complete human label entries and IMU integrity. A 100-image Likert rating task was used to summarize pairwise inter-rater agreement for still-frame image quality assessment. Results: All 48 planned acquisitions were completed with video, start/stop events, fused windows and complete human label entries. Temporal coverage was at least 90% in 47/48 acquisitions. IMU integrity endpoints exceeded the 80% threshold: 43/48 acquisitions had no extreme IMU-derived artifact, 43/48 had no active-segment IMU restart and 44/48 had no complete motion flatline. Mean pairwise exact agreement for the Likert task was 38.9%, with mean quadratic-weighted Cohen's kappa of 0.564. Post hoc profiles varied across duration, visual quality, mechanical load and motor efficiency. Conclusions: EXUS was technically feasible for synchronized IMU-video capture during FoCUS. The pilot supports multimodal acquisition data as a way to describe technical profiles and generate formative feedback hypotheses, but the post hoc indices are not validated competency measures. Keywords: focused cardiac ultrasound; point-of-care ultrasound; inertial measurement unit; medical education; deliberate practice

03.
arXiv (CS.AI) 2026-06-16

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

作者:

arXiv:2606.17005v1 Announce Type: new Abstract: Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preference stress test; and GAIA and tau-bench contribute limited agentic pilots. Together, these archives instantiate a Bayesian inference problem: under a fixed reporting convention, one constructed terminal-only example over $1{,}000$ systems is compatible with two pre-terminal histories, yielding times of $23.03$ or $75.13$ to reach within $0.05$ of the ceiling under the same terminal-tail model. In synthetic posterior comparisons, action-facing diagnostics differ across observation regimes. The candidate selection-aware frontier model fails synthetic recovery, objective-archive prediction, preference transfer, and uncertainty calibration; correspondingly, fixed audit gates reject its stronger claims. An archive-and-adjudication protocol reconstructs public evaluation histories, isolates a verified timing boundary, and falsifies unsupported frontier claims.

04.
arXiv (quant-ph) 2026-06-11

Permutation-Invariant N-body gates via Tavis-Cummings Hamiltonian

arXiv:2506.03453v3 Announce Type: replace Abstract: Global control provides a promising route to implementing multi-qubit gates without individual qubit addressing. This is especially appealing for permutation-invariant (PI) gates, whose symmetry is often broken when they are compiled into individually addressed one- and two-qubit gates. Important examples include SWAP, $\sqrt{iSWAP}$, and the n-qubit controlled-Z gate, which is equivalent, up to two single-qubit Hadamard gates, to the multi-qubit Toffoli gate. Motivated by this global-control perspective, we show that all PI unitaries on an arbitrary number of qubits can be realized using the Tavis-Cummings (TC) interaction, the multi-qubit version of the Jaynes-Cummings interaction, together with global uniform z and x fields. Here, the $n$ qubits are identically coupled to a single bosonic mode (oscillator), which is initialized in and returned to its vacuum state. A corollary is that all PI states, including GHZ and Dicke states, can be prepared using the same global control. For the case n=2 qubits, which is particularly important in quantum computing, we also find explicit pulse sequences for implementing all PI qubit unitaries that conserve angular momentum in the z direction, using only the TC interaction and global z fields. This includes controlled-Z, SWAP, and $\sqrt{iSWAP}$.

05.
arXiv (CS.AI) 2026-06-16

Imperfect Visual Verification for Code Edition : A Case Study on TikZ

arXiv:2606.15693v1 Announce Type: cross Abstract: LLMs have significantly advanced code generation, enabling the synthesis of functional programs. While recent systems achieve strong performance on many coding benchmarks, tasks involving programs such as TikZ that generate visual artifacts remain challenging, in particular on visual code customization. Unlike generation from scratch, customization requires localized, semantics-preserving edits: the model must locate relevant code, modify it according to the instruction, and preserve the remaining structure and rendering. Approaches based on post-hoc iterative refinement/correction where a verifier provides feedback to guide corrections, have shown promise. However, in the case of programs with a visual outcome such as in TikZ, where correctness is harder or likely impossible to formalize and evaluate automatically, deterministic verifiers do not exist. Hence, developers can only rely on imperfect verifiers. In this paper, we conduct an empirical study to answer:to what extent can iterative refinement remain effective when the verifier itself is unreliable?} We use TikZ as a focused case study that isolates the core difficulties of the problem (weak code structure, fine-grained visual semantics, and difficult feature localization) in a controlled and challenging setting. We define visual code customization as an iterative editing problem with an imperfect oracle, and introduce a framework for analyzing such iterative refinements. We conduct a large-scale study and evaluate multiple LLM-based and tool-augmented visual verifiers within iterative refinement pipelines, and perform extensive manual annotation of refinement trajectories to assess verifier behavior and feedback quality. Our findings show that even imperfect verifiers can determine with moderate accuracy whether visual instructions are applied to code, achieving F1-scores up to 0.815. Feedback improves iterative refinement, especially for weaker models, adding 11–20 perfect customizations for Qwen3-vl-30b-a3b-Instruct, while stronger models like Gemini-3 gain fewer improvements (+5) but benefit more from accurate verification that prevents premature acceptance. Feedback is effective only when it precisely identifies image issues, provides actionable guidance, addresses all relevant problems, and remains grounded in the original instruction.

06.
arXiv (CS.CV) 2026-06-17

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounced in pathological scar classification, where differentiating keloids from hypertrophic scars requires subtle expert knowledge and labeled images are extremely limited. We propose a novel paradigm that repositions large language models (LLMs) as knowledge-driven feature engineers rather than end-to-end classifiers. We call this framework ScaFE (Scar Feature Engineering). Our key insight is that LLMs encode rich medical knowledge that can be externalized as executable feature extraction code, enabling the transformation of high-dimensional images into low-dimensional, clinically interpretable representations. Specifically, we prompt an LLM with established scar assessment criteria to generate deterministic Python code that extracts features aligned with clinical scoring systems such as the Vancouver Scar Scale. Our approach offers three key advantages: (1) data efficiency, achieving robust performance with limited training samples by decoupling knowledge acquisition from statistical learning; (2) privacy preservation, as raw images are processed locally without exposure to external LLMs; and (3) interpretability, through explicit features grounded in clinical reasoning. Extensive experiments on scar classification demonstrate that our method consistently outperforms end-to-end deep learning baselines or using LLMs as black-box classifiers under limited data conditions, establishing a promising direction for integrating LLMs into data-efficient and clinically transparent medical AI systems.

07.
arXiv (CS.AI) 2026-06-18

Explaining Attention with Program Synthesis

arXiv:2606.19317v1 Announce Type: cross Abstract: A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matrices on a collection of randomly selected training examples. Next, we prompt a pre-trained language model with a summary of these matrices, and instruct it to generate a set of Python programs that can reproduce the associated attention patterns given only text from the input sentence. Finally, we re-rank programs according to how well our final set of programs predict behavior on held-out inputs. We demonstrate that a set of fewer than 1,000 such generated programs can reproduce the attention patterns of heads in GPT-2, TinyLlama-1.1B, and Llama-3B, achieving an average Intersection-over-Union similarity above 75% on TinyStories. Moreover, the best-fit programs can replace neural attention heads without substantially affecting model behavior: replacing 25% of attention heads with programmatic surrogates across the three models incurs only a 16% average perplexity increase, while maintaining performance on a variety of downstream question answering benchmarks. This work contributes a scalable pipeline for reverse-engineering attention heads in transformer models using human-readable, executable code, advancing a path toward symbolic transparency in neural models.

08.
arXiv (CS.CV) 2026-06-19

Linear Recurrent Unit with Semantic Modulation for Image Super-Resolution

Linear recurrent unit (LRU), designed with a principled formulation for stable linear recurrence, has demonstrated promising accuracy and robustness on long-range dependency tasks. However, its static parameterization and single-scan method limits its applicability to 2D vision tasks. In this study, we propose a LRU-based restoration network with a semantic modulating unit (SMU) to achieve a harmonious balance between performance and efficiency in single-image super-resolution. The SMU plays three key roles: LRU modulation, spatial categorization, and feature enhancement through learned prototype. Extensive experiments demonstrate that our method quantitatively and qualitatively surpasses recent state-of-the-art methods. Notably, our approach achieves superior performance with computational complexity on par with existing methods. The source code and models are available at https://github.com/MingyuChoi-run/LSM

09.
medRxiv (Medicine) 2026-06-15

ECHOCARDIOGRAPHY ABNORMALITIES IN PREECLAMPSIA WITH SEVERE FEATURES.

Purpose To determine the frequency of echocardiographic abnormalities in women with preeclampsia with severe features. To describe the spectrum and types of echocardiographic abnormalities associated with preeclampsia with severe features. Method This is a Prospective observational study conducted in Vani Vilas hospital attached to Bangalore Medical College and Research Institute, Bangalore from January 2023 to December 2025. 560 pregnant women diagnosed with severe preeclampsia(SPE) were included in the study. Chronic hypertension without superimposed preeclampsia, underlying cardiac diseases and previous history of peripartum cardiomyopathy were excluded from the study. Transthoracic echocardiography-TTE (2D ECHO) was done to evaluate cardiac structure and function. Echocardiographic abnormalities identified during the study were documented and analysed using descriptive statistical methods. Results Abnormalities in ECHO was noted in 23.03%. A unique finding was the documentation of elevated pulmonary artery systolic pressures (PASP) suggestive of Pulmonary Hypertension (PH) (PASP >35 mm HG) among 20.25% of the participants. It was also the commonest abnormality on ECHO. Mild PH was the commonest (15.71%), moderate PH was seen in 3.92% and severe PH in 0.71% of cases. Next most frequent abnormality was moderate to severe valvular regurgitation (10%), followed by left ventricular hypertrophy (5.53%). Diastolic dysfunction (DD) was seen in 3.92%, systolic dysfunction(SD) in 3.57%, chamber dilatation in 3.57% and LV global hypokinesia in 3.03% cases of SPE Conclusion Preeclampsia with severe features (SPE) is associated with 23.03% abnormalities on echocardiography. SPE is associated with systolic dysfunction, diastolic dysfunction, chamber dilatation, valvular regurgitation, left ventricular hypertrophy and pulmonary hypertension.

10.
medRxiv (Medicine) 2026-06-10

Trajectories of brain structure and function in young adult carriers of genetic frontotemporal dementia variants

Background and Objectives: Converging evidence hints at neurodevelopmental effects in genetic frontotemporal degeneration (FTD). In cross-sectional studies, for some genes, young adult FTD variant carriers show differences in brain volumes and cognition compared to familial non-carriers. However, longitudinal trajectories may more sensitively capture FTD-related neurodevelopmental vs. neurodegenerative changes than cross-sectional approaches. This study examined longitudinal trajectories of brain volumes, executive function, and plasma biomarkers in young adult carriers compared to familial non-carriers, as measures of neurodevelopmental and neurodegenerative outcomes of FTD-causing variants. Methods: This longitudinal cohort study comprised participants, aged 18-30 years, from the FTD Prevention Initiative across Europe, Canada, and the USA. Genetic groups included C9orf72 (47%), MAPT (30%), and GRN (23%). Linear mixed-effects models were computed to assess longitudinal outcomes across age between groups, controlling for sex, scanner (for brain volumes), and education (for executive function); random effects accounted for between-subject variability nested within family membership. Results: Variant carriers (n=147) and familial non-carriers (n=113) did not differ in age (mean{+/-}SD, 25.9{+/-}3.2 years), sex (53% female), or number of visits (2.1{+/-}1.7). Young adult C9orf72 repeat expansion carriers exhibited smaller thalamic volumes than non-carriers at the reference age of 26 years (b=-982.8mm3, SE=317.0, p=0.0046, f2=0.32), with relatively stable trajectories across ages 18-30 (i.e., no change over time). Trajectories of rostral anterior cingulate volumes differed in C9orf72 carriers and non-carriers across age, where carriers showed relatively stable trajectories and non-carriers showed age-appropriate declines (b=64.4mm3, SE=29.9, p=0.035, f2=0.07). For MAPT and GRN, there were little to no differences in total brain, cortical, or subcortical volumes between groups and over time. No longitudinal differences were observed between carriers and non-carriers in executive function, or plasma NfL or GFAP for any genetic group. Discussion: C9orf72 repeat expansions were linked to smaller average thalamic volumes and stable trajectories between ages 18 to 30, supporting potential neurodevelopmental origins. The modest evidence supporting an absence of difference in neurodegenerative biomarkers and executive function suggests minimal early neurodegeneration and functional preservation in young adulthood.

11.
PLOS Medicine 2026-06-04

Comparative impacts and cost-effectiveness of tuberculosis systematic screening strategies in prisons in Brazil, Colombia, and Peru: A mathematical modeling study

作者:

by Yiran E. Liu, José Victor Bortolotto Bampi, Ronan F. Arthur, Argita D. Salindri, Caroline Busatto, Pedro Avedillo Jiménez, Daniele Maria Pelissari, Fernanda Dockhorn Costa Johansen, Robert Arana-Narvaez, Alvaro Fernando Moreno Roca, Wilfredo Santos Solís Tupes, Esther Mori Jiu, Christian Alfredo Moreno Roca, Erika Albertina Abregú Contreras, Valentina Antonieta Alarcón Guizado, Julián Trujillo Trujillo, Belkys Marcelino, Mónica Alonso Gonzalez, Mayra Cecilia Córdova Ayllon, Ted Cohen, Moises A. Huaman, Jeremy D. Goldhaber-Fiebert, Julio Croda, Jason R. Andrews Background Incarceration is a leading driver of tuberculosis in Latin America. Systematic screening in prisons may reduce tuberculosis burden, but optimal strategies and cost-effectiveness remain uncertain. We examined the population-wide health impacts and cost-effectiveness of systematic screening in prisons in Brazil, Colombia, and Peru, comparing different timepoints, frequencies, and screening algorithms. Methods and findings Using dynamic transmission models calibrated to Brazil, Colombia, and Peru, we simulated annual or biannual (twice-yearly) prison-wide screening, alone or combined with entry and exit screening from 2026 to 2035. We evaluated four algorithms: (1) symptom screening, (2) chest X-ray with computer-aided detection (CXR-CAD), (3) symptoms and CXR-CAD (follow-up testing if either is positive), and (4) GeneXpert Ultra (Xpert) with pooled sputum. Individuals screening positive then received individual Xpert. We projected impacts on within-prison and population-level tuberculosis incidence in 2035, along with discounted costs (2023 US dollars) and disability-adjusted life years (DALYs). Model projections showed that combined entry, exit, and biannual screening with CXR-CAD was highly impactful and cost-effective across countries, reducing tuberculosis incidence by 61%–87% in prisons and 18%–28% population-wide. Compared to only biannual CXR-CAD (the next best strategy), the incremental cost per DALY averted of adding entry and exit screening was $2,984 (Brazil), $2,925 (Colombia), and $645 (Peru). Adding symptom screening to CXR-CAD marginally increased benefit and was only cost-effective in Peru’s higher-incidence prisons. Biannual screening alone remained cost-effective at prison incidence levels well below national averages, as well as at far lower willingness-to-pay thresholds. In settings without CXR-CAD, pooled Xpert was an impactful, cost-effective alternative. Key limitations include the model’s simplified representation of tuberculosis disease states and lack of stratification by age, gender/sex, HIV, or drug resistance. Conclusions These modeling results support immediate national-level adoption of prison-wide tuberculosis screening twice-yearly and at entry and exit, using CXR-CAD or pooled Xpert.

12.
arXiv (CS.CL) 2026-06-15

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects. Code provided at: https://github.com/kamyahari/IC-Encoding-Models.git

13.
arXiv (CS.CV) 2026-06-16

ScoutVLA: UAV-Centric Active Perception via a Dual-Expert VLA Model for Open-World Embodied Question Answering

Aerial Embodied Question Answering (EQA) requires Unmanned Aerial Vehicles (UAVs) to actively perceive the environment and answer natural language questions. Existing outdoor EQA systems usually stop once the target enters the UAV's field of view, leaving the fine-grained viewpoint adjustment needed for evidence-seeking questions largely unresolved. To address this issue, we introduce FG-EQA, a fine-grained active perception EQA benchmark with more than 40K simulated trajectories and 1K real-world trajectories. Drawing inspiration from the ``waggle dance'' of scout bees, which iteratively adjust their flight paths to verify target information, we propose ScoutVLA, an evidence-driven Vision-Language-Action model for outdoor EQA. To emulate this active exploration behavior, ScoutVLA features a decoupled dual-expert architecture: a vision-language expert infers the semantic intent to identify missing evidence, while an independent action expert employs high-DoF flow matching to generate continuous viewpoint-refinement trajectories. To balance the competing demands of continuous control and semantic reasoning, we devise a decoupled training strategy with a knowledge insulation mechanism that prevents the action gradients from erasing the model's multimodal reasoning ability. Extensive simulated experiments and a qualitative real-world field study both verify the superiority of ScoutVLA over the state-of-the-art baselines, demonstrating a 10.48$\boldsymbol{\times}$ higher average strict success rate and a 7.72$\boldsymbol{\times}$ higher average QA correctness.

14.
arXiv (CS.LG) 2026-06-11

CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting

arXiv:2511.09789v2 Announce Type: replace Abstract: Recent advances in deep forecasting models have achieved remarkable performance, yet most approaches still struggle to provide both accurate predictions and interpretable insights into temporal dynamics. This paper proposes CaReTS, a novel multi-task learning framework that combines classification and regression tasks for multi-step time series forecasting problems. The framework adopts a dual-stream architecture, where a classification branch learns the stepwise trend into the future, while a regression branch estimates the corresponding deviations from the latest observation of the target variable. The dual-stream design provides more interpretable predictions by disentangling macro-level trends from micro-level deviations in the target variable. To enable effective learning in output prediction, deviation estimation, and trend classification, we design a multi-task loss with uncertainty-aware weighting to adaptively balance the contribution of each task. Furthermore, four variants (CaReTS1–4) are instantiated under this framework to incorporate mainstream temporal modelling encoders, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and Transformers. Experiments on real-world datasets demonstrate that CaReTS outperforms state-of-the-art (SOTA) algorithms in forecasting accuracy, while achieving higher trend classification performance.

15.
arXiv (CS.AI) 2026-06-18

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

作者:

arXiv:2606.19172v1 Announce Type: new Abstract: Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves it mathematically untouched, resulting in a roughly 33,000x smaller memory footprint. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, where a per-user LoRA, a single global weight delta, admits only one. Upon retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline on a 2.5x larger model.

16.
arXiv (CS.AI) 2026-06-19

Computational Identifiability

arXiv:2606.19361v1 Announce Type: cross Abstract: Identification conditions describe the computability of a target query or parameter of interest as a function of the type and amount of information available. In causal identification, this information is often expressed in the form of a causal graph, and data are observed or collected for some subset of variables in the graph. Target queries may be for a single effect alone or for a class of effects in a given model. The derivation of an identification algorithm then defines mathematically the process by which the desired causal effect(s) can be uniquely determined, theoretically, in expectation. Identifiability in expectation, or 'theoretical identifiability,' generally assumes asymptotic properties, infinite data, or other mathematically idealized conditions. In this paper, we explore a fundamental distinction between this theoretical, idealized notion of identifiability and a proposed alternative that is computation-bound. The framework we propose - 'computational identifiability' - is to instead define a finite computational search procedure for an empirical estimator. If this process finds an estimator empirically, within a desired error tolerance, then identifiability is satisfied, conditional on the specified assumptions of the search (i.e., a prior distribution over the parameters) and conditional on the search procedure itself. Through several experiments, we demonstrate how this framework allows us to answer fine-grained, practical identification questions, such as identification with small finite samples, with ambiguous graphical criteria, with mixed observational-interventional data, and across counterfactual data and estimands. Code is available at https://github.com/lbynum/metadentify.

17.
arXiv (CS.LG) 2026-06-19

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

arXiv:2601.02322v2 Announce Type: replace-cross Abstract: A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

18.
arXiv (CS.AI) 2026-06-11

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

arXiv:2606.12240v1 Announce Type: cross Abstract: Multivariate time-series data often exhibit complex temporal dependencies, irregular sampling, and heterogeneous dynamics across multiple time scales, making accurate sequence modeling particularly challenging. Traditional recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, operate in discrete time and may struggle to effectively capture continuous and irregular temporal behaviors. Liquid Neural Networks (LNNs) address some of these limitations through continuous-time dynamics, but standard LNN architectures typically rely on a single dynamical system, limiting their ability to model heterogeneous temporal patterns. To address these challenges, we propose a Multi-Rate Mixture-of-Experts (MR-MoE) framework built on top of Liquid Neural Networks. In the proposed architecture, multiple LNN-based experts operate at distinct time scales, enabling the model to explicitly separate fast-changing dynamics from slow-evolving temporal trends. A gating network further enables adaptive expert specialization based on input conditions. In addition, we incorporate both feature-level and temporal attention mechanisms to improve robustness, interpretability, and long-range dependency modeling. Feature-level attention suppresses noisy or irrelevant variables, while temporal attention selectively focuses on informative historical states. We evaluate the proposed framework on a complex multivariate time-series prediction task and compare it against strong baselines, including LSTM, monolithic LNN, and standard MoE models. Experimental results demonstrate that the proposed MR-MoE framework consistently achieves improved AUROC and AUPRC performance while maintaining favorable computational efficiency. These results highlight the effectiveness of combining continuous-time dynamics, multi-scale expert decomposition, and adaptive attention mechanisms for time-series modeling.

19.
bioRxiv (Bioinfo) 2026-06-19

HTS-Oracle v2: Prospective AI-Guided Discovery and Experimental Validation of Small Molecule Modulators Across Multiple Targets

High-throughput screening (HTS) remains the cornerstone of early-phase small molecule discovery yet consistently underperforms against immunotherapy targets, yielding validated hit rates below 0.1%. Here we introduce HTS-Oracle v2, which features rigorous cross-validation that ensures honest performance estimates. HTS-Oracle v2 was trained and validated across four clinically significant immune checkpoint targets (CD28, ICOS, LAG-3, and TIGIT) achieving ROC-AUC values of 0.968, 0.969, 0.875, 0.928 respectively under rigorous cross-validation. For prospective experimental validation, HTS-Oracle v2 was applied to an 8,960-compound Enamine Protein Mimetic Library, selecting only 25 compounds per target for experimental testing using temperature-related intensity change (TRIC) technology, a 99.7% reduction in screening burden. HTS-Oracle v2 identified 4, 5, 4, and 6 validated binders from 25 prospectively selected compounds per target, corresponding to validated hit rates of 16%, 20%, 16%, and 24%, respectively. Notably, 67-80% of all experimentally confirmed hits across the full 8,960-compound library were captured within just 25 model-selected compounds per target. For CD28, this represents a 28-fold improvement over HTS-Oracle v1 (239x versus 8.4x), establishing HTS-Oracle v2 as an efficient platform for AI-guided prospective hit discovery across immunotherapy targets.

20.
arXiv (CS.CV) 2026-06-17

Human-in-the-Loop Atlas-Based 3D Asset Segmentation for Interactive Content Workflows

Segmenting 3D assets into meaningful regions remains challenging, especially when segmentation criteria are application-dependent and require user control. We present a human-in-the-loop pipeline for generating a segmented 2D parameterized atlas from a 3D model for interactive media, game, and XR content workflows. Our method first selects a compact set of rendered views using a greedy set cover strategy over sampled surface points, and then supports interactive segmentation of these views with SAM~2 and Label Studio. The resulting masks are back-projected onto the model's UV parameterization to produce a unified segmented atlas that supports downstream production tasks such as segment-wise material assignment, style transfer, and semantic labeling. We assess the pipeline through a demonstration-based technical evaluation on eight cultural heritage objects. The results show that the approach can generate usable segmented atlases across diverse geometries while revealing recurring sources of manual correction, particularly fine structures, cavities, and weak appearance boundaries.

21.
arXiv (CS.AI) 2026-06-11

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv:2606.11724v1 Announce Type: new Abstract: Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100\% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5, a benchmark requiring higher-order ToM reasoning.

22.
arXiv (CS.LG) 2026-06-18

Wasserstein Policy Learning for Distributional Outcomes

arXiv:2606.19117v1 Announce Type: cross Abstract: Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N-dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N-dim}(\Pi)$.

23.
arXiv (CS.LG) 2026-06-15

Curvature-Informed Potential Energy Surface for Protein-Ligand Binding Affinity Prediction

arXiv:2606.14217v1 Announce Type: new Abstract: Accurate prediction of protein-ligand binding affinity is essential for structure-based drug discovery. Recent geometric deep learning methods have achieved promising performance by representing protein-ligand complexes as three-dimensional graphs. However, most existing approaches mainly rely on static interaction geometry from a single bound conformation, while neglecting molecular flexibility and binding-induced conformational changes. To address this limitation, we propose a curvature-informed potential energy surface (CPES) graph neural network for protein-ligand binding affinity prediction, which incorporates physics-informed curvature representations to model conformational flexibility. CPES first derives curvature spectral descriptors from the Hessian of the potential energy surface evaluated at equilibrium configurations, whose eigenvalues define the local principal curvatures of the potential energy surface. It then uses spectral cross-attention to compare the unbound ligand and protein with the bound complex, thereby capturing binding-induced changes in conformational dynamics. In parallel, hierarchical protein-ligand interaction representations are learned from static structural features through geometry-aware message passing, soft clustering, and bidirectional cross-attention. Finally, CPES fuses the curvature-informed dynamic representations with static interaction representations for affinity regression. Extensive evaluations on multiple benchmark datasets demonstrate that CPES achieves improved predictive performance and offers physical interpretability.

24.
arXiv (CS.CV) 2026-06-17

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: https://fr3d-wm.github.io.

25.
arXiv (CS.LG) 2026-06-15

ORCA: A Platform for Open-Source Dexterity Research

arXiv:2606.14561v1 Announce Type: cross Abstract: Robotics manipulation research increasingly focuses on two-finger parallel grippers for their effectiveness, affordability, and ease of teleoperation. Grippers are nonetheless limited by their form factor, often requiring bimanual setups even for simple reorientation tasks. Anthropomorphic hands are a more natural platform for dexterous robot learning – closer to the human hand, and capable of learning from human video – yet they remain hard to use in learning research: even where open and accessible hand hardware exists, the software for control, simulation, teleoperation, and retargeting is scattered in one-off code bases, and largely disconnected from the robot-learning ecosystem. In this work, we introduce the \orca~learning stack, an open-source research stack for dexterity as a first-class robot learning domain. Our \orca~stack unifies low-level control, simulation, teleoperation from a range of consumer platforms, and hand retargeting, behind a single interface, and integrates natively with popular robot-learning frameworks such as \lerobot, so dexterous hand researchers can leverage the same data, training, and evaluation pipelines used for non-dexterous robot learning. We demonstrate a complete end-to-end workflow, collecting expert demonstrations of an in-hand reorientation task by teleoperation with a consumer-grade VR headset, training an autonomous policy with \lerobot, and evaluating the learned policy in a fully reproducible and observable setup. We open-source the entire stack as a shared, reproducible foundation for dexterous-manipulation research.