Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-11

Quantum Occam Learning: Sample-Supported Expressibility for Circuit-Based Quantum Learning

arXiv:2606.12211v1 Announce Type: cross Abstract: A central principle in quantum machine learning is that an ansatz should be expressive enough to represent the quantum data of interest. Yet, the expressibility is statistically meaningful only insofar as it can be learned from finitely many copies of an unknown quantum state. In this work, we develop an information-theoretic Occam theory for quantum data generated by finite-size quantum circuits. For the class $S_{n,G}$ of $n$-qubit pure states preparable with at most $G$ two-qubit gates, a metric-entropy argument gives the realizable sample law $\widetilde{\Theta}(G/\epsilon^2)$ in the circuit-limited regime. For an arbitrary source $\hat{\rho}$, we introduce the best $G$-gate approximation error $d_G(\hat{\rho})$ and the approximate circuit complexity $C_\eta(\hat{\rho})$. We prove an agnostic quantum Occam theorem: with $M$ copies, one can learn up to the best $G$-gate approximation error plus a statistical penalty $\widetilde{O}(\sqrt{G/M})$. We then remove the need to know $G$ in advance through an adaptive model-selection theorem whose oracle inequality selects the circuit complexity justified by the data. Matching lower bounds yield a sample-supported expressibility law: at trace-distance accuracy $\epsilon$, $M$ samples can support only $G_supported \simeq M\epsilon^2$ gates, up to logarithmic factors and tomography saturation at $2^n$. Thus, the circuit complexity becomes an adaptive statistical resource rather than a static promise. Our framework turns bounded circuit complexity into a model-selection principle for quantum machine learning.

02.
arXiv (CS.AI) 2026-06-17

Quantum Cinema: An Interactive Cinematic Exploration of Quantum Computing Hardware via Generative World Models

arXiv:2606.17102v1 Announce Type: cross Abstract: Quantum computing promises transformative advances across science and industry, yet the physical hardware that enables these computations remains invisible to the public: quantum processors operate inside sealed dilution refrigerators at temperatures near absolute zero, making direct observation impossible. This "imagination gap" between quantum computing's growing societal impact and the public's ability to visualize it represents a significant barrier to quantum literacy and workforce development. We present Quantum Cinema, an open-source, browser-based interactive application that closes this gap by transforming invisible quantum hardware into explorable, cinematic experiences using generative world models. Quantum Cinema guides users through a four-act narrative – from the foundational Nobel Prize-winning science of quantum entanglement, through curated video introductions to three major quantum computing architectures (trapped-ion, neutral-atom, and superconducting systems), into immersive three-dimensional generative worlds that make invisible quantum phenomena observable, and finally to interactive radar-chart comparisons grounded in real quantum device specifications. All three-dimensional environments are generated using WorldLabs' generative world model platform and are scientifically grounded in curated metrics from Amazon Web Services (AWS) Braket quantum hardware. Quantum Cinema requires no installation, no specialized hardware, and no quantum computing background. It is designed to serve two distinct communities: scholars and developers seeking to replicate or extend the platform, and educators, researchers, and science communicators seeking an intuitive tool for explaining quantum hardware to diverse audiences. This paper describes the system architecture, the generative world model pipeline, use cases for both communities, and directions for future work.

03.
arXiv (CS.LG) 2026-06-12

Geometry of Lightning Self-Attention: Identifiability and Dimension

arXiv:2408.17221v3 Announce Type: replace Abstract: We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

04.
arXiv (math.PR) 2026-06-16

Asymptotic behavior of some strongly critical decomposable 3-type Galton–Watson processes with immigration

arXiv:2406.09852v2 Announce Type: replace Abstract: We study the asymptotic behavior of a critical decomposable 3-type Galton-Watson process with immigration when its offspring mean matrix is triangular with diagonal entries 1. It is proved that, under second or fourth order moment assumptions on the offspring and immigration distributions, a sequence of appropriately scaled random step processes formed from such a Galton-Watson process converges weakly. The limit process can be described using independent squared Bessel processes $({\mathcal X}_{t,1})_{t\geq0}$, $({\mathcal X}_{t,2})_{t\geq0}$, and $({\mathcal X}_{t,3})_{t\geq0}$, the linear combinations of the integral processes of $({\mathcal X}_{t,1})_{t\geq0}$ and $({\mathcal X}_{t,2})_{t\geq0}$, and possibly the 2-fold iterated integral process of $({\mathcal X}_{t,1})_{t\geq0}$. The presence of the 2-fold iterated integral process in the limit distribution is a new phenomenon in the description of asymptotic behavior of critical multi-type Galton-Watson processes with immigration. Our results complete and extend some results of Foster and Ney (1978) for some strongly critical decomposable 3-type Galton-Watson processes with immigration.

05.
arXiv (CS.CL) 2026-06-17

FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback

Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We release our code and curated datasets at: https://github.com/BBeeChu/FeedEval.git.

06.
arXiv (CS.LG) 2026-06-11

Robustness of Mixtures of Experts to Feature Noise

arXiv:2601.14792v2 Announce Type: replace Abstract: Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence speed. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.

07.
arXiv (CS.LG) 2026-06-12

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Authors:

arXiv:2606.07334v2 Announce Type: replace-cross Abstract: This report treats chord-symbol sequences as an interpretable, controllable time series for genre-local harmonic modeling. The frozen Music Transformer base - released as a pop-jazz fine-tune endpoint but verified in this revision weight-identical to the pop-only Phase-0 baseline, so all gains are measured over a pure-pop prior (see Changes in v2) - is extended to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares LoRA, IA3, BitFit, prefix tuning, and full fine-tuning over 11 genres and 3 seeds, a complete 165-cell grid. All five methods improve over the frozen base on held-out chord prediction (macro gains +2.89 to +3.61 percentage points); LoRA and IA3 score highest, but pairwise Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this: at a common corpus size IA3 stays on top while LoRA drops to last, so the small method gaps are partly data-driven rather than representational. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting the adaptation effect is largely lightweight conditioning over a reusable harmonic base rather than genre-specific adapter memory. Further diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation that v2 reinterprets as a same-weights control, chord-only genre classification, output-distribution statistics, real-song evaluation, duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. Perceived genre authenticity and musical quality are left to controlled listener evaluation.

08.
arXiv (CS.AI) 2026-06-19

Grounded Inference: Principles for Deterministically Encapsulated Generative Models

Authors:

arXiv:2606.19753v1 Announce Type: new Abstract: The incorporation of generative models into traditional computational systems presents both enormous opportunity and tremendous peril. Although many early adopters have realized these perils at great expense, the field still requires foundational frameworks to de-risk incorporation of AI into traditional systems. This manuscript establishes this foundation through the definition of four specific primitives of AI blended architecture, designed to enable deterministic encapsulation of probabilistic models. It further establishes two overarching anti-patterns broadly represented across industry to serve as warnings for engineers in this field. This framework was designed to enable successful integration of AI into traditional systems while providing a foundation upon which generative model providers could build the next generation of generative model interfaces.

09.
arXiv (CS.CL) 2026-06-19

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor out-of-domain (OOD) generalization due to its rigid alignment with static teacher trajectories. While reinforcement learning (RL) offers an alternative, the capacity limitations of SLMs pose a severe dilemma: sparse outcome rewards provide insufficient guidance, whereas strict trajectory matching imposes overly restrictive constraints. To bridge this capacity-driven gap, we propose MENTOR, which introduces a flexible yet process-aware reward structure. Instead of enforcing rigid replication, MENTOR uses the teacher's reference to guide tool-use behavior, balancing behavioral alignment with downstream performance. Extensive experiments on controlled executable-tool benchmarks demonstrate that MENTOR improves OOD tool-use performance compared to SFT and strict RL baselines. Our findings suggest that within verifiable tool-use environments, flexible tool-use alignment offers a more effective approach than strict trajectory replication for developing adaptable small models.

10.
arXiv (CS.LG) 2026-06-18

Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning

arXiv:2606.18691v1 Announce Type: new Abstract: Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific calibration due to physicochemical diversity as well as mismatches between practical computational settings and those used in constructing the pre-training data. To address this, we propose a sparsity-promoting fine-tuning method that selectively updates model parameters by exploiting the structural properties of E(3)-equivariant materials foundation models. On energy and force prediction tasks across molecular and crystalline benchmarks, our method matches or surpasses full fine-tuning and equivariant low-rank adaptation while updating only $\sim$3~\% of parameters, and in some cases as little as $\sim$0.5~\%. Beyond energy and force calibration, we further demonstrate task generalizability by applying our method to magnetic moment prediction and magnetism-aware total energy modeling. Finally, analysis of sparsity patterns reveals physically interpretable signatures, such as enhanced $d$-orbital contributions in transition metal systems. Overall, our results establish sparsity-promoting fine-tuning as a flexible and interpretable method for domain specialization of equivariant materials foundation models.

11.
arXiv (CS.AI) 2026-06-16

Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification

arXiv:2606.16987v1 Announce Type: new Abstract: Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correct classification depends on hierarchical tariff structures, legal notes, and jurisdiction-specific rules. This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments. The framework integrates multi-agent information retrieval, semantic retrieval over official tariff documents, evidence-grounded reasoning, consensus-based validation, element-wise voting across hierarchical code components, confidence estimation, and human-in-the-loop escalation. We evaluate the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts. Experimental results show that exact 10-digit classification remains difficult even for advanced LLMs, with performance decreasing from coarse chapter-level prediction to fine-grained tariff and statistical suffix assignment. These findings demonstrate the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous single-step prediction. The proposed framework supports more interpretable, accountable, and compliance-oriented HTS classification for maritime logistics and smart-port operations. Our code is available at https://github.com/Analytics-Everywhere-Lab/hts.

12.
arXiv (quant-ph) 2026-06-17

Independent Chiral Control in Theory-Space Models:A Rank-Preserving Framework and Its Application to Neutrino Mass Generation

Authors:

arXiv:2409.09033v3 Announce Type: replace-cross Abstract: We develop a general framework of rank-preserving, element-wise matrix transformations for engineering fermion mass hierarchies in theory-space constructions. We prove that preservation of massless modes requires the transformation function to be separable, $g_f(i,j)=g^{(L)}_f(i)g^{(R)}_f(j)$, which in turn enables independent control of left- and right-chiral zero-mode profiles directly at the level of the theory-space mass matrix. This formalism unifies and extends the clockwork mechanism, permits controlled deformation of Kaluza–Klein spectra, and enhances hierarchy generation in GIM-like fine-cancellation scenarios. As a concrete application, we show that in theory-space models for neutrino masses, suitable transformations allow sub-eV light neutrinos to arise from TeV-scale new physics with only $\mathcal{O}(40)$ additional fermionic sites, while remaining consistent with charged-lepton flavor-violation bounds. In contrast, the corresponding untransformed models asymptote at the MeV scale and cannot access the phenomenologically required regime without extreme field multiplicities or hierarchical parameters.

13.
arXiv (quant-ph) 2026-06-19

Measuring Rényi entropy with an Echo Protocol

arXiv:2504.05237v3 Announce Type: replace Abstract: We present efficient and practical protocols to measure the second Rényi entropy, whose exponential is known as the purity. Our approach is based on expressing the purity in terms of transition probabilities generated by an echo-type forward-backward evolution sequence, making it applicable to quantum many-body systems. Notably, our approach does not rely on random-noise averaging, a feature that can be extended to protocols to measure out-of-time-order correlation functions, as we demonstrate. By way of example, we show that our protocols can be practically implemented in superconducting qubit-based platforms, as well as in cavity-QED trapped ultra-cold gases.

14.
arXiv (CS.AI) 2026-06-17

Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search

arXiv:2512.25065v2 Announce Type: replace-cross Abstract: Systems resource management tasks rely primarily on hand-designed heuristics. However, growing hardware heterogeneity and workload diversity require heuristics specialized to particular deployment instances, making manual design expensive and difficult to scale. In this paper, we explore how to synthesize systems heuristics using LLMs. The main challenge is ensuring that generated heuristics execute safely, integrate correctly with the surrounding system, and still achieve strong performance. We propose Vulcan, a framework that identifies LLM-friendly interfaces that isolate core decision logic from the rest of the implementation. With Vulcan, LLM-generated code is restricted to simple stateless decision functions, while trusted runtime abstractions provide rich derived statistics for meaningful policy exploration without system-integration bugs. To ensure execution safety, LLMs synthesize heuristics in a restricted language, Anvil, that guarantees important properties by construction. We evaluate Vulcan across three well-studied domains and demonstrate up to 4.9x higher savings for spot-VM scheduling, up to 2x lower miss ratios for cache eviction, and up to 10% higher application performance for tiered-memory systems, while ensuring execution safety throughout.

15.
arXiv (CS.CV) 2026-06-12

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

Pixel-bin image sensors are becoming the default choice for smartphone cameras due to their resolution vs light-gathering trade-off. However, their larger inter-color separation compared to the Bayer color filter array (CFA) makes them challenging to demosaic. Furthermore, existing deep learning-based demosaicing methods are CFA-specific, requiring multiple individual models that take up precious onboard resources and demand larger development and maintenance efforts. In this work, we propose a modular unified architecture for demosaicing various pixel-bin sensors that provides higher image quality while being extensible and lightweight. Additionally, to enable plug-and-play operation, we introduce a learning-free CFA-identification module to detect the CFA type of raw data accurately.

16.
bioRxiv (Bioinfo) 2026-06-16

Super Learner Ensemble Modeling of CPTAC Proteomic Data for Survival Prediction in Head and Neck Squamous Cell Carcinoma

Survival analysis in head and neck squamous cell carcinoma (HNSCC) is traditionally performed using Cox proportional hazards models, alongside some exploration into black-box machine learning methods. The Super Learner (SL) algorithm addresses this model selection dilemma by combining diverse candidate algorithms into a weighted ensemble to perform comparably to the best candidate method. This study evaluates the performance of SL in HNSCC. Proteomic features as well as clinical covariates from 96 CPTAC HNSCC samples were modeled with three candidate algorithms (Cox LASSO, Cox Ridge, and Random Survival Forest) as well as the ensemble SL method. Models were optimized via Uno's time-dependent Concordance Index (C-index) and tested at 1- and 3-year time horizons using 2000 bootstrap resamples. The Cox Ridge regression model achieved the highest predictive accuracy among the four total methods. However, the SL demonstrated stable performance over both time horizons (1-year C-index: 0.985; 3-year C-index: 0.960). Variable importance analysis of the Cox Ridge model successfully identified malignant proteins (ATR, MAML1, MIEN1) alongside novel potential prognostic indicators (ZNF800, KERA). This analysis emphasizes the statistical necessity for larger cohorts for ensemble learning, while providing a benchmark of proteomic indicators in HNSCC.

17.
arXiv (CS.LG) 2026-06-12

Accelerating Speculative Diffusions via Block Verification

arXiv:2606.13426v1 Announce Type: new Abstract: Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions – which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.

18.
Nature (Science) 2026-06-10

‘Hidden hero’ peptides guard crops against sudden cold

Authors: Unknown Author

A protein signal remains silent under normal conditions but is activated under cold stress to protect developing pollen. This ‘on-demand’ resilience mechanism could enable the development of ‘climate smart’ crops that maintain high yields in good years and food security under climate stress. A peptide signal ensures that, in cold conditions, developing pollen receives nutrients at the right time.

19.
Nature (Science) 2026-06-10

Measurement of reactor neutrino oscillation with the first JUNO data

Neutrino oscillations (see refs. 1,2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences3 that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO)4 is a 20-ktonne liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision5,6. Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $${\sin }^{2}{\theta }_{12}=0.3092\,\pm \,0.0087$$ and $$\Delta {m}_{21}^{2}=(7.50\,\pm \,0.12)\times 1{0}^{-5}\,{\mathrm{eV}}^{2}$$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the design of the detector and indicate the readiness of JUNO for resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights the potential of JUNO to push the frontiers of precision neutrino physics and paves the way for its broad scientific programme. The first data of the Jiangmen Underground Neutrino Observatory deliver high-precision neutrino oscillation parameters, improving measurements and demonstrating readiness to determine neutrino mass ordering.

20.
PLOS Medicine 2026-05-22

Differences in tuberculosis prevalence by sex in low- and middle-income countries over 1993–2025: A systematic review and meta-analysis

by Nicole A. Swartwood, Nanki Singh, Seyed Alireza Mortazavi, Melike Hazal Can, Hening Cui, Do Kyung Ryuk, Peter MacPherson, Katherine C. Horton, Nicolas A. Menzies Background Global and national initiatives to combat tuberculosis (TB) have expanded over recent years. Despite this, the TB burden remains high in some population groups, with men recognized as having elevated TB risks. Summary measures of sex differences in TB prevalence were last estimated in 2016. Since then, many additional prevalence surveys have been conducted, including in the highest TB burden countries. We conducted a systematic review of sex-stratified TB prevalence survey data published over 1993–2025, to provide updated estimates of male-to-female (M:F) TB prevalence ratios and determine whether sex-related disparities in TB burden have closed over time. Methods and findings We identified surveys reporting community-representative, sex-stratified estimates of pulmonary TB prevalence in low- and middle-income countries (LMICs), including surveys from an earlier review (covering January 1993–March 2016) and a new systematic review (covering 1st December 2015–13th October 2025). This review was prospectively registered with PROSPERO (CRD42024503853) and included searches of PubMed, Embase, Global Health, the Cochrane Library, Africa Index Medicus, LILACS, and SciELO. We extracted data on bacteriologically confirmed and smear-positive TB prevalence among adults (aged ≥ 15 years), stratified by sex. Risk of bias was evaluated using eight criteria specific to prevalence surveys. We fit multi-level Bayesian regression models with study- and country-level random effects to estimate the M:F ratio of TB prevalence (male prevalence divided by female prevalence), overall and for key subgroups. In meta-regression analyses, we estimated how prevalence ratios varied over time and according to known TB risk factors and TB case definitions.We identified 10,124 publications and extracted data from 100 eligible studies representing 102 unique prevalence surveys and 4,658,310 participants (45.6% male) in 33 LMICs. TB prevalence was higher in men than women in 90/102 of the included surveys, with a pooled M:F prevalence ratio of 2.02 (95% credible interval (CrI): 1.71, 2.34) for bacteriologically confirmed TB and 2.38 (95% CrI: 1.91, 2.90) for smear-positive TB. Time trend analyses showed a 2.0% (95% CrI: −0.2, 4.5%) average annual change in the M:F ratio of bacteriologically confirmed TB over the study period. The M:F prevalence ratio was estimated to be higher for countries with greater excess HIV prevalence among men, and countries with greater gender equity (as measured by the United Nation’s Gender Development Index). The estimated M:F prevalence ratio was also higher for surveys that did not restrict testing to individuals reporting TB symptoms. Study limitations include heterogeneity in survey methods and definitions, as well as limited data from the Americas, Eastern Mediterranean, and Europe WHO world regions and post-COVID-19 period. Conclusions Men in LMICs consistently experience TB at a higher prevalence than women. Time trend estimates are uncertain, but consistent with widening sex differences in TB prevalence over the last three decades, despite efforts to address the risk factors underlying this excess TB burden.

21.
arXiv (CS.CL) 2026-06-18

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution dataset of 35,368 validated symbolic math problems spanning integration, limits, differential equations, series, and hypergeometrics. Unlike prior benchmarks, ASyMOB systematically perturbs each seed problem using symbolic, numeric, and equivalence-preserving transformations, enabling a fine-grained assessment of generalization. Our evaluation reveals three key findings: (1) most models' performance collapses under minor perturbations, while top systems exhibit an apparent regime shift in robustness; (2) integrated code tools stabilize performance, particularly for weaker models; and (3) we identify examples where Computer Algebra Systems (CAS) fail while LLMs succeed, as well as problems solved only via a hybrid LLM-CAS approach, highlighting a promising integration frontier. ASyMOB serves as a principled diagnostic tool for measuring and accelerating progress toward building verifiable, trustworthy AI for scientific discovery.

22.
arXiv (CS.AI) 2026-06-16

Agentic Framework for Deep Learning workload migration via In-Context Learning

arXiv:2606.15994v1 Announce Type: new Abstract: Translating deep learning models from PyTorch's flexible, object-oriented design to JAX's functional, stateless setup is usually a manual and error-prone task. Automated migration is challenging because Large Language Models (LLMs) struggle with strict and dynamic API alignment and are prone to mistakes for exacting operations. We propose a fully autonomous system that combines In-Context Learning (ICL) with oracle-driven self-debugging. First, we curated an ICL context that serves as a strict reference for idiomatic JAX styling and test case generation. Second, instead of depending on the LLM to deduce mathematical outputs, we run the source PyTorch modules to get their actual dynamic tensor states. This creates an unchangeable execution oracle. We then use an autonomous agentic loop to synthesize tests based on the oracle data. The test cases are executed repeatedly, and the traceback is sent back to the LLM for self-correction. Ablations show that combining ICL references with oracle grounding and self-debugging greatly outperforms pure instructional and basic agentic baselines. This improvement does not add an excessive computational overhead. Our lightweight pipeline achieves 91% numerical equivalence (compared to baseline: 9%, instruction + self-debugging: 27%) on neural modules, providing a highly reliable, scalable blueprint for cross-framework migration. This has been validated across several state-of-the-art models including SAM (segment anything), T5, Code Whisper amongst others showing high numerical equivalency. Code: https://github.com/AI-Hypercomputer/accelerator-agents/tree/main/MaxCode

24.
arXiv (quant-ph) 2026-06-16

Noise-induced shallow circuits and absence of barren plateaus

arXiv:2403.13927v3 Announce Type: replace Abstract: Motivated by realistic hardware considerations of the pre-fault-tolerant era, we comprehensively study the impact of uncorrected noise on quantum circuits. We first show that in the task of estimating observable expectation values any noise truncates most quantum circuits to effectively logarithmic depth. We then prove that quantum circuits under any non-unital noise do not exhibit barren plateaus for cost functions composed of local observables. However, by using the effective shallowness, we also design an efficient classical algorithm to estimate observable expectation values within any constant additive accuracy, with high probability over the choice of the circuit, in any circuit architecture. Taken together, our results establish that, unless we carefully engineer quantum circuits to take advantage of the noise, noisy quantum circuits are unlikely to offer an advantage over shallow ones for algorithms that output observable expectation value estimates, such as many variational quantum machine learning proposals.

25.
arXiv (CS.CL) 2026-06-12

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.