Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-11

Time-Frequency Grid States for Reconstruction and Correction of Channel-Induced Distortion in Entangled Photons

arXiv:2606.12216v1 Announce Type: new Abstract: Characterization of time-frequency (TF) quantum states requires reliable reconstruction of their TF distributions. However, imperfect transmission or measurement channels can distort reconstructed joint spectral intensities (JSIs), especially when the underlying perturbation mechanism is unknown. Here, we experimentally demonstrate a reconstruction and correction framework that uses a TF grid state as an intrinsic frequency-domain reference. By analyzing the displacement of the grid points, a Gaussian process regression model is employed to reconstruct a correction mapping for the nonlinear coordinate deformation without assuming a prior physical model of the distortion. The learned mapping reduces the residual coordinate deviation of the TF grid state by approximately a factor of 11 and, when applied to an independent frequency-entangled test state, improves the Gaussian-shape fidelity from 76.2\% to 90.0\%. These results establish TF grid states as practical metrological resources for diagnosing and correcting distortions in TF quantum systems, providing a pathway toward distortion-resilient quantum communication and high-dimensional quantum information processing.

02.
arXiv (CS.CL) 2026-06-15

Fractured Chain-of-Thought Reasoning

Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches the full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.

03.
arXiv (CS.CV) 2026-06-19

Can Agents Distinguish Visually Hard-to-Separate Diseases in a Zero-Shot Setting? A Pilot Study

The rapid progress of multimodal large language models (MLLMs) has led to increasing interest in agent-based systems. While most prior work in medical imaging concentrates on automating routine clinical workflows, we study an underexplored yet clinically significant setting: distinguishing visually hard-to-separate diseases in a zero-shot setting. We benchmark representative agents on two imaging-only proxy diagnostic tasks, (1) melanoma vs. atypical nevus and (2) pulmonary edema vs. pneumonia, where visual features are highly confounded despite substantial differences in clinical management. We introduce a multi-agent framework based on contrastive adjudication. Experimental results show improved diagnostic performance (an 11-percentage-point gain in accuracy on dermoscopy data) and reduced unsupported claims on qualitative samples, although overall performance remains insufficient for clinical deployment. We acknowledge the inherent uncertainty in human annotations and the absence of clinical context, which further limit the translation to real-world settings. Within this controlled setting, this pilot study provides preliminary insights into zero-shot agent performance in visually confounded scenarios.

04.
arXiv (quant-ph) 2026-06-15

Conditional squeezing induced by a two-level system: arbitrary-time Magnus coefficients in the quantum Rabi model

arXiv:2508.03506v5 Announce Type: replace Abstract: We present a systematic Magnus expansion treatment of the quantum Rabi model beyond the Rotating Wave Approximation. We show that at the second order of Magnus series, the second-order evolution operator contains a term that induces conditional squeezing of the field mode depending on the state of the atom, in addition to the energy shifts. We analyze the scaling behavior of the conditional squeezing coefficient for $^{87}\mathrm{Rb}$ $5^2S_{1/2}\rightarrow5^2P_{1/2}$ transition line and show that the slow envelope of the squeezing coefficient is maximized at half-detuning cycles, and that it scales with $\frac{4g^2}{\omega_0|\Delta|}$. We also show that the quadrature squeezing angle suggests a possible route towards quantum non-demolition readouts, while further investigation is required for a full first-order suppression. We then connect our work to the well-studied AC-Stark shift and Bloch-Siegert shift using the effective Hamiltonian theory. Finally, we show how the energy shifts and the conditional squeezing arise, as a whole $\mathrm{SU}(1,1)$ algebra, and how they can be disentangled as individual unitary evolutions.

05.
arXiv (CS.LG) 2026-06-16

MultiMolecule: a modular ecosystem for biomolecular sequence-model workflows

作者:

arXiv:2606.16540v1 Announce Type: cross Abstract: Biomolecular sequence models are increasingly reused outside the studies in which they were introduced, but public checkpoints rarely preserve the execution context needed to inspect source-defined behavior, adapt models to new assays, compare models under shared task definitions or deploy biological predictions. MultiMolecule is an open-source Python ecosystem that turns heterogeneous RNA, DNA and protein sequence-model releases into complete, source-checked model-family implementations with shared loading, workflow and prediction interfaces. The Resource state reported here includes 53 complete model-family implementations with 112 standardized model checkpoints, together with 16 curated dataset resources released through 39 public dataset repositories and 10 user-facing prediction pipelines. Standardized components are linked to source provenance, conversion or preparation code, source-reference checks, Extended Data summaries and public documentation, allowing users to inspect what was standardized, what behavior was checked and how each component enters training, evaluation, inference or deployment. By shifting reuse from repository-specific checkpoints to executable implementations connected to standardized checkpoints, curated datasets, Runner workflows and biological prediction pipelines, MultiMolecule provides common infrastructure for preserving source-defined model behavior, adapting models to new assays, enabling controlled evaluation and deploying biomolecular predictions.

06.
arXiv (CS.AI) 2026-06-16

User as Code: Executable Memory for Personalized Agents

作者:

arXiv:2606.16707v1 Announce Type: new Abstract: A personalized AI agent needs a user memory: a persistent model of who the user is, built across many conversations and consulted on each new one. Today this memory is almost always stored as unstructured text, a knowledge graph, or a flat store of facts, and consulted by retrieval – fetching the entries most similar to the current request. Such "bag-of-facts" memory recalls individual facts well, but because storing a fact and acting on it are separate steps, it struggles to resolve contradictions, aggregate over many records, or enforce rules. We argue that user memory should instead be executable. We introduce User as Code (UaC), a paradigm in which an agent's model of a user is a living software project: typed Python objects hold the user's state and ordinary Python functions encode the rules that govern it, so representing and reasoning about the user happen in one medium an interpreter can run. The enabling mechanism is a two-phase pipeline: an append-only log that never discards a fact, periodically checkpointed into typed code. This changes what memory can do. On standard long-term conversation benchmarks, UaC matches both a full-context upper bound and the strongest prior memory systems on recall (78.8% on LOCOMO). Its advantage emerges where representation matters most. On aggregate questions over a user's history – "how many international trips did I take last year?" – retrieval-based memory collapses (6-43%) while UaC stays near-perfect (99%), because the answer is a one-line computation over typed state rather than a search over text. And because its rules execute deterministically whenever the state changes, UaC can surface unsolicited, safety-critical alerts – such as a newly prescribed drug that conflicts with an allergy recorded months earlier – a capability query-driven memory cannot provide.

07.
arXiv (CS.LG) 2026-06-11

LakeFM: Toward a Foundation Model for Aquatic Ecosystems Using Irregular Multivariate Multi-depth Time Series Data

arXiv:2606.11268v1 Announce Type: new Abstract: Understanding and forecasting lake dynamics is critical for monitoring water quality and ecosystem health across lakes and reservoirs. While machine learning methods have been recently applied to ecological time-series data, existing works assume regular sampling in time and depth, and struggle to generalize across lakes with heterogeneous variables, depths, and observation patterns. To address these limitations, we introduce \textsc{LakeFM}, a foundation model for aquatic systems, pre-trained on large-scale ecological datasets comprising both simulated and observed lakes. Through extensive empirical evaluation, we show that \textsc{LakeFM} learns meaningful representations spanning broader lake-level characteristics, and achieves competitive or often superior-forecasting performance compared to existing time-series foundation and non-foundation models, while producing physically plausible predictions consistent with real-world lake dynamics.

08.
arXiv (CS.LG) 2026-06-17

Conditional Local Importance by Quantile Expectations

arXiv:2411.08821v4 Announce Type: replace-cross Abstract: Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current, popular methods, including LIME and SHAP, provide useful measures of feature contribution in the prediction space, while leaving opportunities for improved characterization of local structure in the model loss space. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that highlights locally dependent relationships, provides improved stability over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and assigns zero importance in regions where the response is invariant to changes in variables.

09.
arXiv (CS.CL) 2026-06-24

Towards Version-aware Operations and Transaction Memories for Multi-layer MeMo

作者:

MeMo proposes language models with explicit multi-layer correlation matrix memories (CMMs), where memorization, retrieval, and forgetting are architectural operations. This paper asks how such memories can reduce the need for retraining when knowledge changes. For changes expressible as MeMo memory associations, the model's accessible knowledge can be updated by editing explicit memories rather than retraining the whole model. We propose a version-aware operation layer in which high-level operations such as replace, obsolete, keep-history, rollback, and trace are compiled into MeMo-native primitive calls over sequences and tokens. The key observation is that a version-aware operation is rarely a single MeMo association. It is an ordered transaction of primitive edits, for example forgetting one sequence-token chain, memorizing another, preserving a historical chain, and recording an inverse program. The framework introduces two auxiliary CMMs: a Version CMM (V-CMM) for mapping version transitions to transaction handles, and a Transaction CMM (T-CMM) for storing reusable change contents and inverse programs. It supports both direct sequence-level edits and structured diff-level inputs, and outlines an evaluation route for update success, rollback, traceability, locality, and transaction reuse.

10.
bioRxiv (Bioinfo) 2026-06-11

AGZArank: Investigating epitope-conditioned antibody binder ranking with structure-derived synthetic supervision

Computational antibody design methods can generate large libraries of candidate binders for a target epitope, but prioritizing which candidates to test experimentally remains a major bottleneck. Existing scoring approaches, including physics-based affinity estimators, structure-prediction-derived confidence measures, and inverse-folding likelihood models, provide useful proxy signals but are not explicitly optimized for early enrichment of binders among many structurally similar candidates. Here we investigate epitope-conditioned antibody binder ranking as a dedicated learning problem and introduce AGZArank, a geometric deep learning framework trained with structure-derived synthetic supervision based on normalized pseudo-energy targets. On a benchmark of 45 experimentally validated antibody-antigen interfaces, AGZArank recovered the true binder within the top ten candidates in 44.4% of cases and showed stronger generalization on post-2021 structures than ProteinMPNN, ESM-IF, and PRODIGY. Ablation experiments indicate that ranking performance depends primarily on training scale and alignment between the optimization objective and retrieval-based evaluation, rather than architectural complexity alone. These results support candidate prioritization as a distinct and tractable problem in computational antibody design.

11.
arXiv (CS.CV) 2026-06-17

Critique of World Model: A Generative Latent Prediction Architecture for World Modeling

World Model, the algorithmic simulator of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of ``hypothetical thinking'' in psychology literature, we argue the primary goal of a world model to be {\it simulating all actionable possibilities of the real world for purposeful reasoning and acting}. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

12.
arXiv (CS.CV) 2026-06-11

Multi-View In-Cabin Monitoring System for Public Transport Vehicles

We introduce a multi-view in-cabin monitoring dataset for public transportation with synchronized RGB and depth images from four inward-facing cameras and a rotating LiDAR covering the vehicle interior of a digitalized and partly automated German city bus. The dataset contains 9.136 synchronized samples with annotations and is accompanied by a calibration and pseudo-labeling pipeline that generates 3D human pose estimates and oriented 3D bounding boxes for occupants. We further provide a nuScenes-format conversion and benchmark representative multi-view 3D detection models (e.g., Lift-Splat-Shoot and BEVFusion), supporting comparative evaluation and small-scale training of multi-view in-cabin perception models. The dataset and tools are available at https://github.com/EvgenyGorelik/multiview_incabin_dataset.

13.
arXiv (CS.LG) 2026-06-24

Stabilizing Black-Box Prompt Optimization with Textual Regularization and Signal Aggregation

arXiv:2507.09839v2 Announce Type: replace Abstract: An increasing number of NLP applications interact with large language models (LLMs) through black-box APIs, making prompt engineering critical for controlling model behavior. Recent Automatic Prompt Optimization (APO) methods iteratively refine prompts using model-generated critiques (often called textual gradients), but they predominantly optimize from failures and underutilize information contained in correct predictions, leading to instability and semantic drift. We propose TRAS (Textual Regularization with Aggregated Signals), a feedback-centric framework that is plug-and-play with existing APO search backbones. It retains the standard textual gradient signal from prior work for error correction and introduces a complementary textual regularizer derived from successful predictions to preserve beneficial prompt components. Because both signals are stochastic and can be noisy, we further introduce Monte Carlo Signal Aggregation (MCSA), which samples multiple gradients or regularizers and aggregates them into a single actionable directive, emphasizing consistent, actionable advice while filtering out outliers. Motivated by rapid model churn, we also formalize Automatic Prompt Migration (APM), the practical problem of adapting an expert prompt across model versions or API providers without losing critical instructions. Across standard APO and APM scenarios, our approach consistently outperforms strong baselines, yielding higher accuracy, faster convergence, and lower query cost, while substantially reducing the degradation observed under naive prompt migration.

14.
arXiv (CS.LG) 2026-06-24

Decentralized SGD with Controlled Disagreement Finds Flatter Minima

arXiv:2602.02899v2 Announce Type: replace Abstract: Decentralized training is often regarded as inferior to centralized training because the consensus errors between workers are thought to undermine convergence and generalization. This work challenges this view by introducing decentralized SGD with Adaptive Consensus (DSGD-AC), which uses a time-dependent scaling mechanism to maintain consensus errors throughout the training. We show that adaptive consensus changes the stationary variance of disagreement modes by balancing two effects: it preserves consensus-error magnitude through weaker graph damping while still allowing curvature-dependent damping to shape the disagreement directions. This balance can produce a stronger Hessian-weighted loss-envelope penalty around the deployed model, even when normalized Hessian alignment is weaker than in standard DSGD. Empirical results on image classification show that DSGD-AC reaches flatter solutions and higher test accuracy than standard DSGD and even centralized SGD. Together, these results support consensus errors as a useful implicit regularizer and open a new perspective on the design of decentralized learning algorithms.

15.
arXiv (CS.CV) 2026-06-17

Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post-hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks, where early detections are converted into spatial prompts via a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact and spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating our effectiveness for structured geometry perception. Our code is available at https://github.com/GalacticHogrider/Co-PLNet.

16.
medRxiv (Medicine) 2026-06-15

Modelling the public-health impact of indoor air quality interventions on respiratory virus transmission

Respiratory virus transmission occurs in indoor settings where ventilation, occupancy, and dwell time determine exposure levels. Improving indoor air quality (IAQ) therefore could help reduce disease burden associated with respiratory viruses, yet its population-level impact remains poorly quantified. Here, we develop an individual-based transmission modelling framework that links within-location airborne dynamics to individual infection risk and population-level spread, whilst explicitly incorporating heterogeneity in ventilation and baseline indoor air quality across locations. We use this modelling approach to evaluate IAQ-improving interventions (air-quality interventions or AQIs), using hypothetical endemic and pandemic pathogen archetypes with properties similar to SARS-CoV-2 and influenza, and evaluate how effects on key epidemiological metrics (such as annualized incidence and epidemic final size) depend on AQI coverage, efficacy and allocation strategy. At 20% AQI intervention coverage and 80% efficacy, annualized incidence was reduced by approximately 7.2% for an endemic 'SARS-CoV-2-like' respiratory virus, and 17.0% for an endemic 'influenza-like' virus; at 60% coverage (80% efficacy) the reductions were 26.3% and 56.4%, respectively. Targeting AQI installation to the highest-risk locations outperformed random allocation: for SARS-CoV-2-like transmission, 20% coverage at 80% efficacy cut absolute incidence by 10.8% when targeted versus 7.2% when random; for influenza-like transmission, this comparison was 28.9% versus 17.0%. In epidemic scenarios, random installation at 40% coverage and 60% efficacy reduced final size by 23.7% (influenza-like) versus 6.3% (SARS-CoV-2-like). These results support treating clean indoor air as core public-health infrastructure and prioritising risk-based deployment of IAQ-improving interventions to maximise population-level benefit within budgetary and operational constraints.

17.
bioRxiv (Bioinfo) 2026-06-17

In silico characterization of lysis and host-recognition modules in Staphylococcus aureus bacteriophage genomes

Background/aim: Antimicrobial resistance in methicillin-resistant Staphylococcus aureus (MRSA) requires precision non-antibiotic therapeutics, yet phage lytic efficacy is poorly predicted by phenotypic assays, as shown by paradoxical biofilm responses. This study characterized the genomic architecture of lytic S. aureus bacteriophages, focusing on the conservation of the lysis module and the variability of host-recognition modules, to provide a rational basis for phage candidate selection. Materials and methods: Twenty-two complete S. aureus phage genomes were retrieved from NCBI GenBank. Genomic features were extracted with custom Biopython scripts. Lysis (endolysin, holin) and host-recognition (tail fiber/receptor-binding protein) modules were annotated and validated by InterPro domain analysis, with disrupted endolysins resolved by tBLASTn. Phylogeny was reconstructed from large terminase subunit (TerL) sequences using maximum likelihood. Results: Genome size spanned three classes, from 17.5 to 148.6 kb. The LysK-type endolysin (CHAP, Amidase, SH3b) was highly conserved, whereas tail fiber/RBP genes were detected in only 14 of 22 phages. Domain analysis reclassified two proteins annotated as endolysins as virion-associated peptidoglycan hydrolases, and identified two independent mechanisms, HNH endonuclease insertion and intron splitting, that interrupt lysis-module genes and confound automated annotation. Maximum likelihood analysis recovered a strongly supported, highly conserved core clade with EW and SA13 as divergent lineages. Conclusion: Lysis modules are conserved whereas host-recognition modules are variable, indicating that host recognition rather than the lytic enzyme is the principal determinant of host range and the more rational target for phage selection and engineering.

18.
arXiv (CS.LG) 2026-06-12

GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems

arXiv:2602.08913v2 Announce Type: replace Abstract: High-dimensional, underdetermined and highly correlated systems are common in data science practice, especially when analyzing physical measurements. In such settings, feature selection poses a fundamental challenge because multiple distinct sparse subsets may explain the response equally well. Their identification is crucial not only for predictive modeling but also for generating domain-specific insights into the underlying mechanisms. Yet, conventional methods typically isolate a single solution, obscuring the full spectrum of plausible explanations. This work introduces GEMSS (Gaussian Ensemble for Multiple Sparse Solutions), a variational algorithm designed to simultaneously discover multiple, diverse sparse feature combinations. The method employs a structured spike-and-slab prior for sparsity, a mixture of Gaussians to approximate the intractable multimodal posterior, and a Jaccard-based penalty to further control solution diversity. A single objective function is optimized via stochastic gradient descent. The method is tested on 128 comprehensive experiments by a novel benchmarking framework designed to generate artificial problems with multiple sparse solutions of equal predictive properties. This allows us to measure the retrieval of ground truth features rather than only evaluating predictive performance – characteristics more fitting to our practical needs. A comparative analysis shows that GEMSS consistently outperforms five prominent feature selection methods adapted through the ALFESE framework. Finally, we demonstrate practical usability through 3 challenging real-world datasets from metabolomics and physical chemistry: GEMSS successfully isolates multiple distinct yet quality solutions. GEMSS is available as a PyPI package 'gemss'. The corresponding repository github.com/kat-er-ina/gemss/ includes the full codebase and a free, no-code application GEMSS Explorer.

19.
arXiv (math.PR) 2026-06-16

Sharp One-Dimensional Sub-Gaussian Comparison in Convex Order

作者:

arXiv:2604.26819v2 Announce Type: replace Abstract: We prove that any random variable $X$ whose moment generating function is point-wise upper bounded by that of $ G \sim \mathcal{N}(0,1) $ must be dominated by $ G/\mathbb{E}[|G|] $ in convex order, meaning $ \mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])] $ for all convex $f$. This is sharp as witnessed by $ X \sim \mathrm{Unif}(\{-1,1\}) $ and $ f(x) = |x| $.

20.
arXiv (CS.CL) 2026-06-17

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the current policy. To automate this process, we propose the LLM-as-Environment-Engineer framework in which the current policy model analyzes failure trajectories together with contextual information and proposes modifications to the next-stage training environment configuration. We also introduce MAPF-FrozenLake, a controllable testbed whose generator exposes multi-dimensional environment configurations, making it suitable for studying and benchmarking environment redesign. On this testbed, we condition the environment engineer on structured summaries of policy behavior, failure cases, and environment statistics, from which it produces the configuration for the next training stage. With Qwen3-4B as the backbone, our framework achieves the strongest aggregate performance on our benchmarks, outperforming larger proprietary LLMs (e.g., GPT, Gemini) and fixed-environment training baselines. We further analyze which forms of context are most effective, finding that successful environment updates rely on failure evidence and preserve configurations that already work. Interestingly, the current RL checkpoint serves as a better environment engineer than the original base model, suggesting that policy learning improves the model's ability to diagnose its remaining weaknesses.

22.
arXiv (CS.LG) 2026-06-17

Overcoming the Incentive Collapse Paradox

arXiv:2603.27049v2 Announce Type: replace-cross Abstract: AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task accuracy.To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.

23.
arXiv (CS.AI) 2026-06-19

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

arXiv:2606.19747v1 Announce Type: new Abstract: Quran Automatic Speech Recognition (ASR) aims to convert Quranic recitation into text, enabling applications such as aided memorisation tools and Quranic search engines. However, existing ASR models often exhibit high Word Error Rates (WER) on user-recited verses and lack full coverage of the Quranic corpus. This paper presents a systematic empirical study of domain-specific fine-tuning of pretrained Transformer-based models for Quranic ASR, using advanced speech feature extraction methods: Wav2Vec2.0, HuBERT, and XLS-R. These models apply self-supervised learning by masking portions of input audio and using Transformer architectures to learn context-aware speech features. The pretrained models are fine-tuned on a filtered Quranic dataset exceeding 870 hours of professional and user recitations. Through comprehensive ablation studies across feature extractors, output label formats, training strategies, and clip durations, we identify the key factors that affect transcription accuracy in this domain. Our best-performing configuration achieves a WER of 0.08 on the EveryAyah subset and 0.11 on the combined EveryAyah+Tarteel setting, representing roughly a five-percentage-point gain over the Citrinet baseline (WER = 0.163) while reducing combined-model training time from 140 hours to 40 hours. Arabic text without diacritics yields the best fine-tuning results, and Wav2Vec2-XLSR-53 provides the strongest overall representation. Future work includes improving dataset quality and developing phoneme-aware models to extract deeper speech feature representations for Tajweed-sensitive applications.

24.
arXiv (CS.LG) 2026-06-19

Multi-Task Bayesian In-Context Learning

arXiv:2606.20538v1 Announce Type: new Abstract: Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.

25.
arXiv (CS.AI) 2026-06-16

AI Supply Chain Galaxy: 3D Visual Analytics for License Compliance

arXiv:2606.16292v1 Announce Type: cross Abstract: The rapid proliferation of machine learning model reuse has transformed the AI ecosystem into a highly interconnected supply chain. Traditional compliance tools and static reports struggle to navigate these massive, multi-hop dependency networks. To address this, we present AI Supply Chain Galaxy (AISCG), an interactive 3D visual analytics system for model provenance and compliance auditing. AISCG maps models into a 3D spatial layout, integrating explicit structural dependencies with a rule-based compliance engine. It supports multi-scale exploration, from global community detection to localized, path-aware lineage tracing. We demonstrate its efficacy through an ecosystem-scale empirical analysis of 908,449 models from Hugging Face. Our findings reveal a concerning landscape: 55.46% of models exhibit compliance risks or metadata conflicts/omissions. We also identified distinct risk patterns, including a 56.67% license omission rate in adapter derivations and an 8.05% "license drift" rate in fine-tuning. Through a case study on the complex Llama model family, we show how AISCG empowers analysts to intuitively trace inherited restrictive terms and identify root causes across deep topological networks, significantly reducing the cognitive load of compliance auditing.