Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-16

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

We introduce Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. With natural language as a unified action interface, it predicts physically grounded future visual trajectories from current observations across robotic manipulation, autonomous driving, indoor navigation, and human-to-robot transfer. This unified formulation provides three promising application directions: synthetic data generation for policy training augmentation, scalable virtual environments for policy evaluation, and language-guided planning signals for downstream robot control. This is achieved through a three-part design: a) Double-Stream MMDiT with MLLM Action Encoding, where a 60-layer double-stream diffusion transformer couples frozen Qwen2.5-VL semantics with video-VAE latents through layer-wise joint attention; b) Embodied World Knowledge (EWK), an 8.6M video-text corpus (200M+ frames) with action-language mapping over 20+ embodiments and 500+ action categories; and c) General+Expert Progressive Curriculum, a two-stage training strategy that first learns general visual priors and then injects embodied specialization under a shared language interface. Extensive results show strong competitiveness: Qwen-RobotWorld ranks 1st overall on EWMBench and DreamGen Bench, and outperforms all open-source models on WorldModelBench and PBench. Additional zero-shot analyses on the RoboTwin-IF benchmark further support robust generalization and multi-view consistency.

02.
arXiv (CS.LG) 2026-06-24

Machine-Learning Emulation of Satellite Greenhouse Gas Retrievals: Stability over Time

arXiv:2606.09313v2. Retrieval algorithms are used to estimate atmospheric concentrations of greenhouse gases (GHGs), such as carbon dioxide (CO2) and methane (CH4), by solving inverse problems from high-spectral-resolution satellite radiance measurements. However, these algorithms are computationally expensive, which makes real-time estimation at scale difficult. Machine-learning models have therefore been proposed as fast emulators of retrieval algorithms. Most existing studies, however, evaluate them only on test data from the same period as the training data. We study the stability over time of such emulators using data from the Greenhouse Gases Observing SATellite (GOSAT). We show that prediction accuracy generally deteriorates when the test period moves away from the training period. We also show that including time as an input feature substantially improves XCH4 prediction for Lasso and neural-network models. Among the methods considered, a simple Lasso model performs as well as or better than more complex methods such as neural networks, and yields more stable predictions over time. We further validate the results using the Total Carbon Column Observing Network (TCCON), a ground-based observation network. On the TCCON-matched dataset, the time-augmented Lasso achieves errors against TCCON that are comparable to the disagreement between GOSAT and TCCON for both XCO2 and XCH4.

03.
arXiv (CS.AI) 2026-06-24

Skills for the future software profession: beyond agentic AI!

arXiv:2606.21894v2. As coding agents are rapidly changing software engineering, a natural question is: what are the core skills needed by future software engineers? To identify where software engineering is headed and thus what skills will be needed, we summarize the results of two round-tables with researchers and industrial practitioners, held in 2026 in New York and Singapore. One key finding is that verification and validation is increasing in importance as agents handle implementation, as highlighted by anecdotes from the events. From our observations, we identify the skills developers need in the agentic era of development, with implications for training and educating future software engineers in coming years.

04.
arXiv (CS.LG) 2026-06-12

Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling

arXiv:2606.13347v1. Diffusion models have emerged as state-of-the-art generative models for high-fidelity image synthesis, particularly in their classifier-free guided and classifier-guided forms. However, standard classifier guidance concentrates probability mass around high-density class means, leading to poor coverage of rare samples in the tails of the class-conditional distributions. Recent work on diffusion-based tail sampling mitigates this by training an additional low-density-seeking classifier with a synthetic-vs-real discriminator, at the cost of additional networks and training. In parallel, a number of samplers and distillation techniques accelerate or refine diffusion sampling, but do not explicitly address long-tail coverage. We propose a purely sampling-time, density-aware extension of classifier-guided conditional diffusion models that targets low-density regions without any additional training. Unlike most diffusion models, which apply guidance to the predicted noise, we apply it to the noisy images. Starting from a pretrained conditional diffusion model and classifier on ImageNet, we modify the guided reverse dynamics by steering trajectories toward low-confidence regions via the modified classifier gradient, and at each time step we also guide the sampling process toward the predicted real image. The first guidance helps explore low-probability samples, and the second helps keep generated samples close to the real data manifold. The proposed sampler consistently improves the recall of the ADM model at 64x64 resolution while maintaining a comparable FID; with a 256x256 ADM model, we show results visually for different combinations of the two guidance terms. We also show that standard ADM classifier guidance, combined with predicted-real-image guidance, helps generate samples of high perceptual quality with a 256x256 ADM model on ImageNet.

05.
arXiv (quant-ph) 2026-06-24

Syndrome aware mitigation of logical errors

arXiv:2512.23810v2. Broad applications of quantum computers will require error correction (EC). However, hardware roadmaps indicate that physical qubit numbers will remain limited in the foreseeable future, leading to residual logical errors that constrain the size and accuracy of achievable computations. Recent work suggested logical error mitigation (LEM), which applies known error mitigation (EM) methods to logical errors, eliminating their effect at the cost of a runtime overhead. We introduce syndrome-aware logical error mitigation (SALEM), which mitigates logical errors conditioned on the error syndromes measured during error correction. The runtime overhead of SALEM is exponentially lower than that of LEM schemes which do not make use of syndrome data, enabling substantially larger circuit volumes that can be executed accurately. Compared to the routinely used combination of error correction and syndrome rejection (post-selection), SALEM increases the size of reliably executable computations by orders of magnitude. In the practical setting where space and time overheads are fixed and error reduction methods are compared by their resulting estimation errors, we observe a surprising phenomenon: SALEM, which tightly combines EC with EM, can outperform physical EM even above the standard fault-tolerance (pseudo) threshold. Thus, SALEM can make use of EC in regimes of physical error rates where EC is commonly deemed useless.

06.
arXiv (CS.AI) 2026-06-25

BCoughBench: Benchmarking Respiratory Acoustic Foundation Models Under Body-Coupled Wearable Sensor Conditions

arXiv:2606.25116v1. Respiratory acoustic foundation models (FMs) are benchmarked exclusively on smartphone recordings, yet clinical deployment increasingly targets body-coupled (BC) wearables whose sensors attenuate high-frequency content through tissue and bone, leaving FM reliability uncharacterised. We introduce BCoughBench, evaluating five FMs (OPERA-CT/CE/GT, HeAR, M2D+Resp) on nine classification tasks (AUROC, sensitivity at 95% specificity, Expected Calibration Error) and three age regression tasks (MAE vs. a mean-predictor baseline) across five EBEN-simulated BC sensor conditions on five labeled cough datasets. Mean AUROC declines from 0.785 (smartphone) to 0.689-0.723, degrading most under temple vibration pickup ($\Delta$ = -0.096) and least under the soft in-ear ($\Delta$ = -0.062). No FM meets the clinical sensitivity threshold (Se@Sp95 $\geq$ 0.20) on most disease tasks under any BC sensor. Sex classification on the CIDRZ cohort collapses (AUROC 0.954 to 0.596-0.628, $\Delta$ = -0.341) while COVID detection is nearly unaffected ($\Delta$ = -0.004). Age regression is robust, improving under the forehead accelerometer on CoughVID (MAE 9.61 to 8.97 yr); HeAR leads on regression and demographic tasks, M2D+Resp on disease and characteristic tasks. BCoughBench provides a reproducible framework for FM evaluation under wearable conditions.
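For readers unfamiliar with two of the classification metrics named in this abstract, the following is a minimal sketch of sensitivity at a fixed specificity (Se@Sp95) and a binary Expected Calibration Error. It is not the benchmark's code: the function names, the threshold convention (score at or above the threshold counts as positive), and the ECE variant (binning on the predicted positive-class probability) are all assumptions made for illustration.

```python
from typing import Sequence


def sensitivity_at_specificity(labels: Sequence[int], scores: Sequence[float],
                               min_specificity: float = 0.95) -> float:
    """Best sensitivity over thresholds whose specificity is >= min_specificity.

    Assumed convention: a sample is called positive when score >= threshold.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    best = 0.0
    # Candidate thresholds: every observed score, plus one above the maximum.
    for thr in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < thr for s in negatives) / len(negatives)
        if specificity >= min_specificity:
            sensitivity = sum(s >= thr for s in positives) / len(positives)
            best = max(best, sensitivity)
    return best


def expected_calibration_error(labels: Sequence[int], probs: Sequence[float],
                               n_bins: int = 10) -> float:
    """Binary ECE (assumed variant): bin samples by predicted positive-class
    probability and average |observed positive rate - mean predicted prob|,
    weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if members:
            frac_pos = sum(y for y, _ in members) / len(members)
            mean_prob = sum(p for _, p in members) / len(members)
            ece += len(members) / total * abs(frac_pos - mean_prob)
    return ece
```

Under this convention, a Se@Sp95 of 0.20 means that at most one in five true cases is caught once the threshold is strict enough to keep false alarms at 5% of negatives.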

07.
arXiv (CS.CL) 2026-06-25

Scaling Laws for Agent Harnesses via Effective Feedback Compute

Agent harnesses shape language-model performance by controlling tool use, feedback, verification, memory, and repair. Yet raw test-time expenditure, such as tokens, tool calls, wall time, or cost, cannot distinguish useful feedback from redundant or unstable interaction. We introduce Effective Feedback Compute (EFC), a trace-level scaling coordinate for informative, valid, non-redundant, and retained feedback. We further define Estimated-EFC, NRS-EFC, harness efficiency $\eta$, and task-demand normalization for realistic traces and heterogeneous tasks. Across synthetic, real, held-out, and prospective evaluations, EFC-based coordinates outperform raw-compute baselines and SAS. Oracle-EFC/$D_{\mathrm{task}}$ reaches $R^2=0.99$ in controlled scaling, and NRS-EFC/$D_{\mathrm{task}}$ reaches $R^2=0.93$ on real traces where raw compute has near-zero or negative fit. Finally, our system uses EFC as a companion control layer for existing harnesses, improving mean pass rate from $61.2\%$ to $68.2\%$ while reducing mean raw cost from $213.8$ to $85.1$ under matched settings. These results suggest that harness scaling depends on durable, task-sufficient feedback rather than raw computation alone.

08.
arXiv (CS.CL) 2026-06-15

"I Didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts, missing the process through which goals themselves are jointly shaped. We introduce a goal-level attribution framework, CoTrace, that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that while models account for only 11-26% of goal-shaping contribution, they contribute substantially more on introducing lower-level concrete requirements, and make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.

09.
arXiv (CS.AI) 2026-06-11

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

arXiv:2606.09426v2. Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often evaluate these interfaces as separable capabilities, leaving long-horizon cross-interface orchestration under-tested. Thus, we introduce WeaveBench, a long-horizon hybrid-interface benchmark with 114 tasks across 8 real-world work domains, grounded in real user requests and publicly verifiable artifacts. Each task requires agents to combine GUI observations/actions with CLI/code operations within a single trajectory. We evaluate these tasks on a real Ubuntu desktop inside deployed CLI-agent runtimes, augmented with a minimal desktop-control plugin. We also propose a companion trajectory-aware judge that inspects deliverables, files, screenshots, logs, and action traces, while detecting shortcut behaviors such as fabricated visual evidence or hard-coded metrics. Across frontier model-runtime pairings, the best PassRate reaches only 41.2%, showing the benchmark remains far from saturated. The trajectory-aware judge further reveals that outcome-only grading substantially overestimates agent performance. Overall, WeaveBench exposes a critical gap in CUA evaluation and provides an effective testbed to measure whether agents can orchestrate GUI, CLI, and code operations across long-horizon real-world tasks.

10.
arXiv (CS.AI) 2026-06-11

Automating Geometry-Intensive Compliance Checking in BIM: Graph-Based Semantic Reasoning Framework

arXiv:2606.12065v1. Automating compliance checking for geometry-intensive regulations remains a significant technical bottleneck in Building Information Modeling (BIM), primarily due to the semantic disparity between high-level regulatory logic and structured IFC data. Existing methods, often reliant on static rule templates, struggle to traverse multi-hop reasoning chains or resolve latent spatial dependencies across multiple building entities. To address these challenges, a Spatial-Geometric Reasoning System for Building Information Modeling (SGR-BIM) is proposed as an integrative graph-driven reasoning framework. SGR-BIM dynamically constructs a cross-modal knowledge graph that aligns user intent, regulatory semantics, and BIM geometry, enabling interpretable reasoning without rigid hard-coding. Validated on 679 expert-verified queries from fire safety codes, the framework achieves 84.3% accuracy, representing an 8.6% improvement over enhanced-tool single-agent baselines. This research provides a graph-based semantic reasoning paradigm, enhancing the transparency and flexibility of automated geometric compliance-checking workflows in the Architecture, Engineering, and Construction (AEC) industry.

11.
arXiv (CS.CL) 2026-06-12

IVIE: A Neuro-symbolic Approach to Incremental and Validated Generation of Interactive Fiction Worlds

Computational creativity in Interactive Fiction faces a fundamental tension: Large Language Models (LLMs) may produce creative narratives but struggle with world coherence, while symbolic systems ensure consistency but lack creative flexibility. We present IVIE (Incremental & Validated Interactive Experiences), a neuro-symbolic approach to generating complete and playable interactive fiction worlds from scratch. Building upon PAYADOR's neuro-symbolic framework, IVIE implements a four-stage incremental generation pipeline that delegates creative decisions (setting and character creation, puzzle design) to LLMs while grounding the world state through symbolic validation. The system generates worlds with interconnected locations, functional items, non-player characters, and coherent puzzles, all structured around a central goal-oriented architecture. Human evaluation shows the approach generates immersive, thematically coherent worlds with high player engagement. Results seem to indicate that the neuro-symbolic approach successfully balances flexibility with narrative coherence: symbolic validation grounds LLM generation without eliminating generative freedom. However, challenges remain: LLM inconsistencies occasionally bypass puzzle constraints, and objective validation gaps allow some structurally impossible goals. We identify key design considerations for future neuro-symbolic interactive storytelling systems, particularly regarding LLM capabilities and their limitations.

12.
arXiv (CS.CV) 2026-06-16

Faithful Action-unit Causal Reasoning for Counterfactually Faithful Emotion Explanations

Multimodal models can name the action units (AUs) behind a facial emotion, but their AU->emotion rationales are typically plausible rather than faithful: nothing forces the AUs a model invokes to be the AUs that actually drive its prediction. We cast AU->emotion reasoning as a counterfactual-consistency problem between the rationale, the label, and a structural AU->emotion causal graph G, and propose FACR, which grounds the reasoner in an independently induced, polarity-aware G and trains a counterfactual-faithfulness objective: a do-intervention on an AU that G marks causal for a class must move the prediction, while one it marks irrelevant must leave it unchanged. Faithfulness is thereby both trainable and measurable through a matching interventional metric, which we evaluate against a known causal structure, the PSPI pain-AU composition, as no existing affective-reasoning benchmark allows. We are explicit that this metric tests fidelity to the supplied structure rather than its rediscovery: it asks whether the trained reasoner invokes the AUs the structure marks causal, on held-out subjects and a second dataset. Under subject-independent evaluation on UNBC-PAIN, the objective raises the agreement between the invoked AUs and the PSPI composition from a no-objective baseline of 0.08 to 0.57, at a small detection cost; an unfaithfulness control attributes the gain to the objective. On a cross-dataset emotion transfer, the objective likewise raises fidelity to G on a seven-class task (0.50 to 0.84). Finally, we attach a language verbalizer and extend the audit to the generated text: biasing each action unit's emission by its latent activation makes the rationale faithful by construction, so that ablating an AU removes it from the explanation, a property that transfers to a second language-model backbone, whereas a freely generated rationale is unfaithful.

13.
arXiv (math.PR) 2026-06-25

On the jump of the cover time in random geometric graphs

arXiv:2501.02433v4. In this paper we study the cover time of the simple random walk on the giant component of supercritical $d$-dimensional random geometric graphs on $\mathrm{Poi}(n)$ vertices. We show that the cover time undergoes a jump at the connectivity threshold radius $r_c$: with $r_g$ denoting the threshold for having a giant component, we show that if the radius $r$ satisfies $(1+\varepsilon)r_g \le r \le (1-\varepsilon)r_c$ for $\varepsilon > 0$ arbitrarily small, the cover time of the giant component is asymptotically almost surely $\Theta(n \log^2 n)$. On the other hand, we show that for $r \ge (1+\varepsilon)r_c$, the cover time of the graph is asymptotically almost surely $\Theta(n \log n)$ (which was known for $d=2$ only for a radius larger by a constant factor). Our proofs also shed some light onto the behavior around $r_c$.
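The quantity studied in this abstract can be explored numerically. The sketch below is an illustration only, under simplifying assumptions that differ from the paper's setting: it uses a fixed number of points `n` in the unit square ($d=2$) instead of $\mathrm{Poi}(n)$ vertices, and it records the covering time of one walk from a random start, whereas the cover time is usually defined as the worst-case expected covering time over start vertices. All function names are hypothetical.

```python
import random
from collections import deque


def random_geometric_graph(n, radius, rng):
    """Adjacency lists for n uniform points in the unit square; two points
    are joined when their Euclidean distance is at most `radius`."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    r2 = radius * radius
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy <= r2:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def largest_component(adj):
    """Vertices of the largest connected component, found by BFS."""
    seen = set()
    best = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        comp = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.append(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best


def cover_time(adj, component, rng):
    """Steps one simple random walk takes to visit every vertex of a
    connected `component`, starting from a uniformly random vertex in it."""
    if len(component) == 1:
        return 0
    current = rng.choice(component)
    unvisited = set(component)
    unvisited.discard(current)
    steps = 0
    while unvisited:
        current = rng.choice(adj[current])  # move to a uniform random neighbour
        unvisited.discard(current)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    adj = random_geometric_graph(300, 0.12, rng)
    giant = largest_component(adj)
    print(len(giant), cover_time(adj, giant, rng))
```

Averaging `cover_time` over many walks and graphs, for radii on either side of the connectivity threshold, gives a rough empirical view of the two regimes; at these small sizes the asymptotic rates should not be expected to show cleanly.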

14.
arXiv (CS.CL) 2026-06-24

Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules

Knowledge graphs (KGs) can be enhanced through rule mining; however, the resulting logical rules are often difficult for humans to interpret due to their inherent complexity and the idiosyncratic labeling conventions of individual KGs. This work presents Rule2Text, a comprehensive framework that leverages large language models (LLMs) to generate natural language explanations for mined logical rules, thereby improving KG accessibility and usability. We conduct extensive experiments using multiple datasets, including Freebase variants (FB-CVT-REV, FB+CVT-REV, and FB15k-237) as well as the ogbl-biokg dataset, with rules mined using AMIE 3.5.1. We systematically evaluate several LLMs across a comprehensive range of prompting strategies, including zero-shot, few-shot, variable type incorporation, and Chain-of-Thought reasoning. To systematically assess models' performance, we conduct a human evaluation of generated explanations on correctness and clarity. To address evaluation scalability, we develop and validate an LLM-as-a-judge framework that demonstrates strong agreement with human evaluators. Leveraging the best-performing model (Gemini 2.0 Flash), LLM judge, and human-in-the-loop feedback, we construct high-quality ground truth datasets, which we use to fine-tune the open-source Zephyr model. Our results demonstrate significant improvements in explanation quality after fine-tuning, with particularly strong gains in the domain-specific dataset. Additionally, we integrate a type inference module to support KGs lacking explicit type information. All code and data are publicly available at https://github.com/idirlab/KGRule2NL.

15.
arXiv (CS.AI) 2026-06-16

Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training

arXiv:2606.08898v2. In the task of few-shot class-incremental audio classification, the number of classes is assumed to always increase without considering the possibility of decrease. However, the number of classes generally increases or decreases in practice. In this paper, we investigate a problem of Few-shot Class-variable Incremental Audio Classification (FCIAC), in which the number of classes increases or decreases. We propose an FCIAC method using prototype adaptation and pseudo class-variable training. The model in our method consists of an encoder and a classifier. The classifier is initialized by a class-variable prototype adaptation network, whose structure dynamically changes with the change of classes. In addition, we design a pseudo class-variable training strategy to enhance the model's adaptability to changing classes. Experiments on three public datasets show that our method exceeds previous methods in average accuracy. The code is at: https://github.com/cgq2971-afk/FCIAC.

16.
arXiv (CS.LG) 2026-06-15

The Program Is Still There: A Conservation Law for Program Discovery

arXiv:2606.13799v1. Finding the shortest program that generates a sequence is uncomputable, and for six decades that fact has been mistaken for a wall around finding any generating program. It is not a wall but a price, and this paper measures it. For every algorithm that learns about a candidate program only through its score, a class spanning Levin search, evolutionary methods, simulated annealing, and the cross-entropy method, we define the coupling width of a search problem and prove an unconditional worst-case lower bound, exponential in that width with base one less than the domain size. From it follows a conservation law: structural knowledge injected into a search trades one for one against the search it removes, and their sum can never fall below the length of the program sought. Levin's 1973 upper bound and the lower bound proved here are the two ends of one conserved quantity, closing on each other as the instruction set grows. The only escape is to read a candidate's structure rather than its score, and its price, which we prove for generic targets, is incompleteness. A deterministic engine built on this theory recovers a generating program, certified by compressing its data and predicting an unseen continuation, for 2,383 of 3,914 sequences across four independent populations, including 244 of the 256 elementary cellular automata, with measured discovery cost rising along program length more than an order of magnitude inside the score-oracle worst case.

17.
bioRxiv (Bioinfo) 2026-06-17

Beyond phylogeny: Genome-wide DNA sequence patterns suggest DNA physical properties associated with thermal adaptation in extremophile microbes

Temperature is a fundamental constraint on biological systems, yet how it is reflected in genome sequence organization remains unclear. Here, we show that genome-wide distributions of short DNA sequences contain a robust signal of thermal adaptation that is largely independent of phylogeny. Using Structural Topic Modelling (STM), a machine-learning approach for identifying groups of co-occurring sequence motifs, we analyze canonical 6-mer and 9-mer frequency profiles of bacterial and archaeal genome proxies (randomly sampled genomic regions) and identify motif families systematically associated with thermophiles and psychrophiles. In bacterial thermophiles, the identified motif families are dominated by highly specific, overrepresented and co-occurring C- and G-stacked hexamers, and a distinct family of CG-periodic hexamers recurring across multiple temperature comparisons. In contrast, bacterial psychrophile-associated motifs are dominated by low-complexity A-, T-, and AT-run hexamers. Thermophilic archaea generally exhibit a distinct CTAG-centred hexamer family, suggesting that different domains may adapt to similar environmental constraints through different sequence-level solutions. However, this domain-level contrast is not absolute: in a targeted analysis of two thermophilic bacterium–archaeon pairs, we find unusually similar frequencies of all the STM-identified thermophile-associated hexamer families, suggesting that shared high-temperature environments can, in specific cases, partially override phylogenetic divergence. Notably, the identified motif families constitute only a small and highly selective subset of the vast space of possible G+C-rich or A+T-rich sequences. This indicates that thermal adaptation is associated with specific sequence architectures rather than broad shifts in nucleotide composition. 
Accordingly, the observed signal cannot be explained by overall base composition alone, but instead arises from structured combinations and positional arrangements of nucleotides within short sequence contexts. Related motif families are recovered at both k=6 and k=9, indicating that the signal reflects systematic shifts in genome-wide sequence organization rather than isolated sequence motifs. These patterns are consistent with known sequence-dependent DNA physical properties documented in biochemical and biophysical studies, including differences in base-stacking interactions and conformational flexibility. Together, our results suggest that genome-wide sequence organization reflects sequence-dependent DNA physical properties associated with thermal adaptation, revealing a previously underappreciated physical layer of genomic information beyond phylogenetic history.

18.
arXiv (CS.CV) 2026-06-18

AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model

Large Vision and Language Models (LVLMs) have advanced rapidly, yet European Portuguese (pt-PT) remains systematically underserved by existing open-source multimodal models, which either conflate it with Brazilian Portuguese or severely under-represent it in their training data mixes. We introduce AMALIA-VL, the first open-source instruction-tuned LVLM built natively for pt-PT, pairing a high-resolution vision encoder with dynamic image tiling and a fully open pt-PT-optimized language model via a learned connector. We contribute a purposefully designed three-stage training process (vision-language alignment, general visual instruction tuning, and preference optimization) together with a pt-PT-centric multimodal data mix combining curated and translated public datasets with novel datasets that address the near-total absence of European Portuguese multimodal resources. Our evaluation shows that AMALIA-VL establishes a strong baseline for open-source pt-PT LVLMs. We will release model weights, training data, and construction pipelines along with machine-translated pt-PT evaluation benchmarks to help democratize pt-PT LVLM development.

19.
arXiv (CS.LG) 2026-06-11

TimeRouter: Efficient and Adaptive Routing of Time-Series Foundation Models

arXiv:2606.11625v1. Time-series foundation models (TSFMs) are increasingly explored as predictive experts within emerging agentic time-series systems. However, TSFMs exhibit heterogeneous inductive biases, and no single model consistently dominates across forecasting regimes, making expert selection a critical challenge. Existing systems often delegate this decision to LLM-based controllers, incurring substantial inference overhead. We present TimeRouter, an efficient routing framework that leverages empirical complementarity across a pool of pretrained TSFMs through lightweight discriminative routing, selective gating, and ensemble fallback. Concretely, TimeRouter combines a learned routing head, a selective gate, and an ensemble fallback, enabling adaptive expert selection without invoking an LLM at inference time. TimeRouter achieves state-of-the-art performance on the GIFT-EVAL leaderboard, with an LB MASE of 0.6765. Beyond benchmark performance, our ablation studies provide empirical insights into TSFM routing design, highlighting the importance of pool composition and selective gating. Taken together, these results position TimeRouter as a modular and lightweight routing layer for future agentic time-series systems built upon foundation-model pools. Our code is available at https://github.com/UConn-DSIS/TimeRouter.
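The route, gate, and fallback pattern described in this abstract can be illustrated with a toy sketch. It is not TimeRouter's implementation: the abstract does not specify how the gate or the fallback works, so this example assumes the gate is a confidence threshold on the router's top probability and the fallback is an unweighted mean over the expert pool. The function name, the `gate_threshold` parameter, and the expert names are hypothetical.

```python
from typing import Dict, List, Sequence


def route_forecast(router_probs: Dict[str, float],
                   expert_forecasts: Dict[str, Sequence[float]],
                   gate_threshold: float = 0.6) -> List[float]:
    """Return one expert's forecast when the router is confident enough,
    otherwise the per-step mean of all experts (assumed fallback rule)."""
    best_expert = max(router_probs, key=router_probs.get)
    # Selective gate (assumed form): trust the router only above a threshold.
    if router_probs[best_expert] >= gate_threshold:
        return list(expert_forecasts[best_expert])
    # Ensemble fallback (assumed form): unweighted mean across the pool.
    horizon = len(next(iter(expert_forecasts.values())))
    pool_size = len(expert_forecasts)
    return [
        sum(forecast[t] for forecast in expert_forecasts.values()) / pool_size
        for t in range(horizon)
    ]


if __name__ == "__main__":
    forecasts = {"expert_a": [1.0, 2.0], "expert_b": [3.0, 4.0]}
    print(route_forecast({"expert_a": 0.9, "expert_b": 0.1}, forecasts))
    print(route_forecast({"expert_a": 0.5, "expert_b": 0.5}, forecasts))
```

In this sketch the router probabilities would come from a small learned classifier over series features, so no LLM call is needed at inference time, which is the cost the abstract aims to avoid.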

20.
arXiv (CS.LG) 2026-06-16

Incentives and Evidence in Learned Service Orchestration

arXiv:2606.16555v1. Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, DAG scheduling, and autoscaling, using pre-registered predictions about comparative degradation under production-relevant perturbations and paired inference with family-wise error correction. Across the tests, most predicted performance reversals do not occur. Diagnostic analyses show that these outcomes often reflect comparator collapse, artefact limitations, or evaluation choices rather than evidence that learned controllers tolerate the perturbations. One apparent advantage under observation lag is roughly fortyfold compared to a Kubernetes HPA-equivalent controller. Another widely cited result cannot be reconstructed from its released artefact, and the strongest reproducible margin is far smaller than the published results. Conclusions also reverse under changes in perturbation magnitude and evaluation mode. Based on these results and broader patterns in the literature, we identify an institutional problem. Publication and review incentives favour benchmark gains against convenient comparators, even when those gains provide little evidence of deployment performance. We argue that the problem is not solely technical. Rather, it is institutional, so learned orchestration needs production-grade comparators, registered perturbation models, separate operational metrics, and publication criteria that reward reproducible operational evidence. Without these changes, the literature can grow without establishing whether learning improves orchestration.

21.
arXiv (CS.CV) 2026-06-16

Variational Deep Unfolding with Mamba-Based Nonlocal Modeling for Underwater Image Enhancement

Underwater imaging plays a crucial role in ocean engineering, although captured data often suffer from poor visibility and color distortion. To address these challenges, we propose a model-based deep unfolding network for underwater image enhancement that integrates variational modeling into a learnable architecture. The framework is guided by a variational formulation based on a dehazing decomposition, incorporating a multiplicative residual component to absorb remaining artifacts and a nonlocal gradient-type constraint to preserve structural details and enhance edge sharpness. We provide a theoretical analysis establishing the existence of a solution for the associated minimization problem. The proposed unfolding method incorporates Mamba layers to efficiently capture self-similarities in the scene. In addition, we introduce a proximal trajectory loss that enforces consistency between the unfolding stages and the iterations of an ideal restoration regularizer. Experimental results demonstrate that the proposed unfolding approach achieves improved visual quality and competitive quantitative performance compared with recent state-of-the-art methods. The source code will be available at https://github.com/MIA-UIB/Variational-Unfolding-Mamba-Underwater-Enhancement .

22.
medRxiv (Medicine) 2026-06-22

Artificial Intelligence-Enabled Cardiac Function Estimation from Phone Videos of Echocardiograms

Importance: Mobile phone-recorded echocardiogram videos are commonly used in point-of-care, telemedicine, and resource-limited workflows, but artificial intelligence models for left ventricular ejection fraction (LVEF) estimation have primarily been evaluated on native Digital Imaging and Communications in Medicine (DICOM) videos. Objective: To evaluate whether previously described artificial intelligence models for LVEF estimation retain performance when applied to mobile phone-recorded echocardiographic videos. Design: Multicenter model validation study comparing model-estimated LVEF with clinician-reported LVEF. Setting: Three medical centers: Kaiser Permanente Northern California, Beth Israel Deaconess Medical Center through MIMIC-IV-ECHO, and Cedars-Sinai Medical Center. Participants: Source studies with clinician-reported LVEF and apical 4-chamber or apical 2-chamber views, yielding 6209 phone-recorded videos from 2648 studies and 2611 patients. Exposures: Mobile phone recording of native echocardiographic videos and fine-tuning of pretrained models using mobile phone-recorded videos from the Kaiser Permanente Northern California training cohort. Main Outcomes and Measures: Mean absolute error in ejection fraction percentage points, R^2 for continuous estimation, and area under the receiver operating characteristic curve for identifying ejection fraction greater than 50%. Results: The study included 6209 mobile phone-recorded echocardiographic videos from 2648 studies and 2611 patients; the weighted mean age was 68.4 years, and 1031 patients were male (39.5%). Without phone-video fine-tuning, the primary model achieved a mean absolute error of 7.00 percentage points, coefficient of determination of 0.49, and area under the receiver operating characteristic curve of 0.91 on phone-recorded videos; corresponding native DICOM performance was 6.08 percentage points, 0.60, and 0.93, respectively.
On the 2396-video fine-tuning evaluation cohort, fine-tuning improved primary model performance to a mean absolute error of 6.96 percentage points, coefficient of determination of 0.61, and area under the receiver operating characteristic curve of 0.93. Fine-tuning the public EchoNet-Dynamic model improved performance from 9.36 percentage points, 0.37, and 0.84 to 7.86 percentage points, 0.50, and 0.89, respectively. Progressive central zoom preprocessing degraded model performance. Conclusions and Relevance: These findings suggest that artificial intelligence-assisted left ventricular ejection fraction estimation from mobile phone-recorded echocardiograms may be feasible when native image export is unavailable, although prospective evaluation is needed before clinical deployment.
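
The three outcome measures named in this abstract (mean absolute error, coefficient of determination, and area under the receiver operating characteristic curve for LVEF greater than 50%) can be illustrated with a short sketch. This is not the study's evaluation code: the function names and the four example values are hypothetical, R^2 is assumed to mean 1 minus the ratio of residual to total sum of squares, and the AUROC is computed from the equivalent Mann-Whitney pairwise formulation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, here in LVEF percentage points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for flag, s in zip(labels, scores) if flag]
    neg = [s for flag, s in zip(labels, scores) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical values for illustration only; not data from the study.
reported = [60, 35, 55, 45]    # clinician-reported LVEF (%)
estimated = [58, 40, 50, 52]   # model-estimated LVEF (%)
labels = [ef > 50 for ef in reported]  # positive class: LVEF greater than 50%

print(mean_absolute_error(reported, estimated))  # → 4.75
print(auroc(labels, estimated))                  # → 0.75
print(round(r_squared(reported, estimated), 3))
```

The pairwise AUROC is quadratic in the number of videos, which is fine for a sketch; a rank-based implementation would be preferable at the scale of thousands of recordings.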

23.
medRxiv (Medicine) 2026-06-24

Cardiologists' perspectives on sociocultural and structural factors shaping cardiovascular genetic testing

Introduction: Genetic testing is increasingly central to the diagnosis and management of cardiovascular genetic conditions. However, use and follow-through vary across patient populations. Examining clinician perspectives on sociocultural and structural factors influencing testing is important for understanding these differences and informing public health genomics research and implementation efforts. Methods: We conducted semi-structured interviews with 15 cardiologists from health systems across the United States who have integrated cardiogenetics in their practice. Interviews explored experiences diagnosing cardiovascular genetic conditions among patients from underrepresented backgrounds, as well as approaches to incorporating social and contextual information into care. Data were coded thematically and analyzed using a framework analysis guided by the Health Equity Implementation Framework and Social Determinants of Health domains. Results: Clinicians described multi-level factors shaping genetic testing practices, including patient-provider interactions, clinical workflows, health system infrastructure, and broader policy contexts. Key themes included challenges communicating complex genetic information across language and literacy differences; patient trust shaped by prior healthcare experiences; fragmented insurance coverage separating genetic testing from genetic counseling; and challenges interpreting variants of uncertain significance, particularly for populations underrepresented in genomic reference databases. Clinicians also described adaptive strategies, such as interdisciplinary collaboration, telehealth, and patient assistance programs, that supported testing in some settings but were often inconsistent or resource-dependent. Conclusion: Among cardiologists using genetic testing, system-level and sociocultural factors shape the feasibility and downstream use of cardiovascular genetic testing. 
Findings highlight considerations for public health-informed genomic infrastructure that accounts for social context, supports communication, and reduces reliance on individual clinician workarounds, with implications for clinical decision support and related public health genomics initiatives.

24.
arXiv (quant-ph) 2026-06-24

From Spectral Singularities to Multipartite Entanglement Scaling at Higher-Order Exceptional Points

arXiv:2606.24205v1. Exceptional points (EPs) are non-Hermitian spectral singularities exhibiting fractional-power responses, yet their implications for multipartite entanglement of interacting quantum many-body systems remain largely unexplored. Here we develop a general framework that links higher-order non-Hermitian degeneracies to the scaling behavior of genuine multipartite entanglement in interacting identical-qubit systems. Permutation symmetry of the identical qubits decomposes the exponentially large Hilbert space into independent irreducible-representation sectors, thereby constraining the maximal EP order of $N$ qubits to $N+1$ rather than $2^N$. Near an $n$th-order EP, genuine multipartite entanglement inherits the spectral response and generically exhibits a fractional-power scaling under weak perturbations. Explicit examples show that conventional two-body interactions support third- and fourth-order EPs with the corresponding entanglement responses, whereas higher-order EPs with genuine multipartite-entangled coalesced states require additional independent interaction channels, such as three-body interactions. Our results establish a fundamental connection among non-Hermitian degeneracies, multipartite entanglement, and symmetry, extending higher-order EP physics from spectral singularities to genuine many-body quantum correlations.
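
The $N+1$ bound stated in this abstract can be motivated by a standard counting argument, sketched here under assumptions: the symbols $j$, $m_j$, $H_j$, $\epsilon$, and $\Delta\lambda$ are introduced for illustration and are not taken from the paper, and the last line is the generic eigenvalue-splitting exponent at an $n$th-order EP rather than the paper's specific entanglement measure.

```latex
% Decomposition of N spin-1/2 qubits into total-spin sectors (Schur-Weyl duality);
% m_j is the multiplicity of the spin-j irreducible representation.
(\mathbb{C}^2)^{\otimes N} \;\cong\; \bigoplus_{j} \mathbb{C}^{2j+1} \otimes \mathbb{C}^{m_j},
\qquad j = \tfrac{N}{2},\, \tfrac{N}{2}-1,\, \dots,\, \tfrac{1}{2}\ \text{or}\ 0 .

% A permutation-invariant (possibly non-Hermitian) generator is block diagonal,
% so each block acts on at most N+1 dimensions (the symmetric sector j = N/2).
H \;=\; \bigoplus_{j} H_j \otimes \mathbb{1}_{m_j},
\qquad H_j \in \mathbb{C}^{(2j+1)\times(2j+1)},
\qquad 2j+1 \;\le\; N+1 .

% The order n of an EP is the size of a Jordan block, hence bounded by the block size.
% Generic eigenvalue splitting under a perturbation of strength \epsilon:
\Delta\lambda \;\sim\; \epsilon^{1/n}, \qquad n \;\le\; N+1 .
```

For example, $N = 3$ qubits span $2^3 = 8$ dimensions, but the largest sector is the four-dimensional symmetric subspace, so the EP order is at most 4.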

25.
medRxiv (Medicine) 2026-06-11

PCRAgent: A Multi-Agent Framework for Transforming Noisy Clinical Conversations into Structured Pre-Consultation Medical Records and Reusable Clinical Data Resources

In primary care and outpatient settings, clinically important patient information is often embedded in fragmented, ambiguous, repetitive, and noisy communication between physicians and patients. This limits physicians' ability to obtain a clear preconsultation overview of symptoms, history of present illness, and visit intent, while also preventing real-world clinical dialogues from being reused in hospital information systems and medical artificial intelligence applications. To address this challenge, we developed PCRAgent, a centrally coordinated multi-agent framework for preconsultation clinical information organization. Guided by physician inquiry logic, PCRAgent identifies, extracts, corrects, and standardizes patient-reported information from noisy consultations. Its coordinated modules, including error detection, semantic editing, output control, contextual memory, and intent recognition, enable robust parallel handling of spelling errors, repetitions, grammatical inconsistencies, medical ambiguities, and non-medical interference. A traceable edit list records intermediate corrections and context, allowing iterative refinement without redundant modifications. PCRAgent generates two complementary outputs. One is a PreConsultation Clinical Report for rapid physician review. The other is a Structured Clinical Conversation Dataset for hospital data construction and downstream AI applications. In evaluations using 220,000 strongly perturbed consultations, PCRAgent maintained high robustness, achieving a clinical information accuracy of 4.99 out of 5 and key element completeness of 5 out of 5, outperforming GPT-4o. Expert review of Chinese and English dialogues confirmed high clinical accuracy of 4.85 out of 5 and high safety of 4.79 out of 5. Multicenter validation in real-world outpatient workflows further demonstrated practical utility.
These findings indicate that PCRAgent can efficiently transform noisy and unstructured consultations into physician-ready reports and AI-ready structured data, improving outpatient efficiency, reducing cognitive burden, ensuring information completeness, supporting precise decision-making, and enabling high-quality reuse of clinical data.