Academic Intelligence · Curated Daily

Explore the Frontiers of Global Scholarship

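AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar, and use large language models to automatically generate cross-disciplinary literature analysis briefs.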

01.
arXiv (CS.AI) 2026-06-16

Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs

arXiv:2606.15258v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly capable of mathematical problem solving and can even assist with research-level proofs, yet we still lack a scalable and reproducible way to measure step-level reasoning in long proofs across diverse sources. This evaluation gap limits trustworthy AI assistance in proof-certified scientific progress. Existing evaluations often emphasize final answers or rely on costly expert grading, while end-to-end proof generation remains open-ended and hard to verify automatically. We introduce Mask-Proof, a pipeline that turns real proofs into automatically checkable masked-step tasks. It masks key formula steps, provides the necessary surrounding context, and evaluates model reconstructions with an LLM-based equivalence judge using repeated votes for stability. The resulting Mask-ProofBench contains 292 curated problems across diverse research areas. Experiments with 17 models show that reasoning-enhanced models outperform standard models by 12% to 27%. Our evaluator achieves 96.8% agreement with expert annotators, enabling faithful, reproducible, and comparable measurement of step-level mathematical reasoning. Benchmark, annotations, and code are available at https://github.com/weating/Mask-Proof.
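The repeated-vote judging step described above can be sketched as a majority vote over several judge calls. This is a minimal sketch, not the Mask-Proof implementation: `judge_with_votes`, `toy_judge`, and the default of five votes are hypothetical names and values, and a real pipeline would call an LLM where the toy judge simply compares whitespace-normalized strings.

```python
from collections import Counter
from typing import Callable


def judge_with_votes(
    judge: Callable[[str, str], bool],
    reference: str,
    candidate: str,
    n_votes: int = 5,
) -> bool:
    """Query a possibly noisy equivalence judge n_votes times; return the majority verdict."""
    if n_votes < 1 or n_votes % 2 == 0:
        raise ValueError("n_votes must be a positive odd number so ties cannot occur")
    votes = Counter(judge(reference, candidate) for _ in range(n_votes))
    return votes[True] > votes[False]


def toy_judge(reference: str, candidate: str) -> bool:
    """Hypothetical stand-in for an LLM judge: equal after removing all whitespace."""
    def normalize(text: str) -> str:
        return "".join(text.split())
    return normalize(reference) == normalize(candidate)


accepted = judge_with_votes(toy_judge, "x + 1", "x+1")
rejected = judge_with_votes(toy_judge, "x + 1", "x + 2")
```

An odd vote count is required here only so that the majority is always defined; how the paper aggregates votes is not stated in the abstract.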

02.
arXiv (quant-ph) 2026-06-16

On-chip semi-device-independent quantum random number generator exploiting contextuality

arXiv:2601.08392v2 Announce Type: replace Abstract: We present a semi-device-independent quantum random number generator (QRNG) based on the violation of a contextuality inequality, implemented by the integration of two silicon photonic chips. Our system combines a heralded single-photon source with a reconfigurable interferometric mesh to implement qutrit state preparation, transformations, and measurements suitable for testing a KCBS contextuality inequality. This architecture enables the generation of random numbers from the intrinsic randomness of single-photon interference in a complex optical network, while simultaneously allowing a quantitative certification of their security without requiring entanglement. We observe a contextuality violation exceeding the classical bound by more than $10\sigma$, unambiguously confirming non-classical behavior. From this violation, we certify a conditional min-entropy per experimental round of $H_{\min} = 0.077 \pm 0.002$, derived via a tailored semidefinite-programming-based security analysis. Each measurement outcome therefore contains at least $0.077 \pm 0.002$ bits of extractable genuine randomness, corresponding to an asymptotic generation rate of $21.7 \pm 0.5$ bits/s. These results establish a viable route towards general-purpose, untrusted quantum random number generators compatible with practical integrated photonic quantum networks.

03.
bioRxiv (Bioinfo) 2026-06-16

RetroMol: Parsing a shared encoding from natural products and their biosynthetic gene clusters

Natural products such as polyketides and nonribosomal peptides (NRPs) are important sources of bioactive compounds, including many antibiotics. Many of them are assembled by modular enzyme complexes and further modified and diversified by tailoring reactions encoded by biosynthetic gene clusters (BGCs). Although natural products and their coding BGCs describe different data modalities of the same biochemical process, a unified language to jointly describe their biochemistry is lacking. Here we introduce a sequence-based representation of the core biosynthesis of modular natural products, which we call primary sequences, that bridges chemical structures and BGCs. We also present RetroMol, an algorithm that parses either natural product structures or their encoding BGCs into their primary sequences of natural product building blocks. RetroMol allows for similarity scoring between natural products and BGCs, enabling the retrieval of compounds, BGCs, and a combination of the two, based on their biosynthetic similarity. This can, for instance, be used to retrieve biosynthetically similar but structurally dissimilar compounds, or link natural products to candidate coding BGCs in large experimental datasets. We demonstrate the latter by rediscovering the nocardichelin B BGC as a proof of principle. We also exemplify the utility of biosynthetic similarity by showing various pairs of biosynthetically similar compounds with low structural similarity. Together, these results establish primary sequences as a shared biosynthetic encoding for natural product comparison and BGC prioritization.

04.
arXiv (CS.AI) 2026-06-11

Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark

arXiv:2602.19502v2 Announce Type: replace Abstract: Agentic AI systems are increasingly capable of autonomous data science workflows, yet clinical prediction tasks demand domain expertise that purely automated approaches struggle to provide. We investigate how human guidance of agentic AI can improve multimodal clinical prediction, presenting our approach to all three AgentDS Healthcare benchmark challenges: 30-day hospital readmission prediction (Macro-F1 = 0.8986), emergency department cost forecasting (MAE = $465.13), and discharge readiness assessment (Macro-F1 = 0.7939). Across these tasks, human analysts directed the agentic workflow at key decision points: multimodal feature engineering from clinical notes, scanned PDF billing receipts, and time-series vital signs; task-appropriate model selection; and clinically informed validation strategies. Our approach ranked 5th overall in the healthcare domain, with a 3rd-place finish on the discharge readiness task. Ablation studies reveal that human-guided decisions compounded to a cumulative gain of +0.065 F1 over automated baselines, with multimodal feature extraction contributing the largest single improvement (+0.041 F1). We distill three generalizable lessons: (1) domain-informed feature engineering at each pipeline stage yields compounding gains that outperform extensive automated search; (2) multimodal data integration requires task-specific human judgment, because no single extraction strategy generalizes across clinical text, PDFs, and time-series; and (3) deliberate ensemble diversity with clinically motivated model configurations outperforms random hyperparameter search. These findings offer practical guidance for teams deploying agentic AI in healthcare settings where interpretability, reproducibility, and clinical validity are essential.

05.
arXiv (CS.AI) 2026-06-18

EffiNav: Fusing Depth and Vision-Language for Efficient Object Goal Navigation

arXiv:2606.18634v1 Announce Type: cross Abstract: Locating a target object while exploring an unknown environment is a fundamental capability for autonomous agents, with applications ranging from search-and-rescue to field robots. A simplified version of this task is Object Goal Navigation (ObjNav). In ObjNav, successful arrival at the target object provides a basic measure of performance; however, the efficiency of the navigation trajectory is equally important, as it indicates how intelligently the agent explores and how much time remains for subsequent tasks. In unknown environments, the key to efficient navigation lies in deciding where to explore next. While many prior works have addressed this core challenge and achieved promising performance in certain settings, recent training-based models and non-training frameworks still suffer from generalization and efficiency issues, respectively, which in the worst cases can lead to excessive exploration of already-visited areas or redundant back-and-forth motion. We evaluate EffiNav on two widely used simulation benchmarks, Habitat Matterport 3D (HM3D) and Open-Vocabulary Object goal Navigation (OVON), and further validate its effectiveness on physical robots in real-world settings. We conduct failure analysis on a large number of simulation episodes. With minimal modification, we also extend EffiNav to a memory-augmented ObjNav task on the GOAT-BENCH dataset, demonstrating its adaptability beyond standard ObjNav settings. Across two standard metrics, Success Rate (SR) and Success weighted by Path Length (SPL), EffiNav matches or outperforms recent baselines, reflecting its efficiency, robustness, and practical applicability. Given the different emphases of the two datasets, these results indicate that the framework is more balanced and generalizable for efficient ObjNav.
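For readers unfamiliar with the two metrics named in this abstract, the sketch below computes SR and SPL under the commonly used definition, where each successful episode is weighted by the ratio of the shortest-path length to the length of the path actually taken. The function names and the three-episode example are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[int]) -> float:
    """Fraction of episodes in which the agent reached the target (1 = success, 0 = failure)."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    taken_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i).

    l_i is the shortest-path length to the goal and p_i is the length of the
    agent's path; all lengths are assumed to be strictly positive.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


# Hypothetical episodes: an optimal success, a success with a 2x detour, a failure.
episode_success = [1, 1, 0]
episode_shortest = [5.0, 4.0, 3.0]
episode_taken = [5.0, 8.0, 10.0]
sr_value = success_rate(episode_success)
spl_value = spl(episode_success, episode_shortest, episode_taken)
```

Because failed episodes contribute zero and detours are penalized, SPL is never larger than SR, which is why the abstract treats it as the efficiency-sensitive metric.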

06.
arXiv (CS.CL) 2026-06-16

Can Agents Read the Room? Benchmarking Visual Social Intelligence in Multimodal Simulation

Social interaction depends on both language and visible social signals, such as facial expressions, posture, gaze, and emotional shifts. Yet existing social-agent benchmarks are largely text-based and rarely test whether multimodal agents can use visual cues to guide interaction. We introduce a benchmark evaluating visual social intelligence in multimodal social simulation. It contains 240 scenarios, 585 role instances, and 2,340 role-task instances, combining aligned textual-visual evidence, structured role profiles, and four role-level tasks: expression task, characteristic task, interaction regulation task, and interaction outcome task. Evaluating seven recent MLLMs under verbalized-vision and direct-vision reveals a clear gap between local role enactment and interaction management: role-specific expression and conflict handling are near saturation, whereas interaction regulation and visually grounded outcome achievement remain substantially more difficult. The code is released at https://github.com/JunsWan/AgentViSS, and the dataset is available at https://huggingface.co/datasets/JunsWan/AgentViSS.

07.
arXiv (CS.LG) 2026-06-16

DP-Hype: Federated Differentially Private Hyperparameter Search

arXiv:2510.04902v3 Announce Type: replace Abstract: Tuning hyperparameters in federated machine learning can substantially impact model performance. When hyperparameters are tuned on sensitive data, privacy becomes an important challenge, and differential privacy has emerged as the de facto standard for provable privacy. A standard setting in federated learning is that clients agree on a shared setup, i.e., find a compromise from a set of hyperparameters, like a model's learning rate. Yet, prior work on privacy-preserving hyperparameter tuning is tailored to specific learning tasks, does not account for the privacy leakage of aggregated results, or offers a sub-optimal privacy-utility trade-off. In this work, we present our algorithm DP-Hype, which performs a federated and privacy-preserving hyperparameter search by conducting a federated vote based on clients' local hyperparameter evaluations. In this way, DP-Hype selects hyperparameters that lead to a compromise supported by a majority of clients, while maintaining scalability and independence from specific learning tasks. We prove that DP-Hype preserves the strong notion of differential privacy called client-level differential privacy and, importantly, show that its privacy guarantees do not depend on the number of hyperparameters. We also provide bounds on its utility guarantees, that is, the probability of finding good hyperparameters, and implement DP-Hype as a submodule in the popular Flower framework for federated machine learning. In addition, we evaluate performance on multiple benchmark data sets in iid as well as multiple non-iid settings and demonstrate high utility of DP-Hype even under small privacy budgets.

08.
medRxiv (Medicine) 2026-06-22

Sex-specific multimorbidity clusters and all-cause mortality in relatively healthy older adults: findings from the ASPREE cohort

Background: Multimorbidity is common in older adults, but sex differences in chronic condition clustering remain unclear. This study explored multimorbidity clusters and their associations with all-cause mortality among community-dwelling adults aged 70 years and over. Methods: This was a secondary analysis of data from 16,095 Australian ASPREE participants aged at least 70 years without prior dementia or cardiovascular disease. Fifteen baseline chronic conditions were grouped using latent class analysis (LCA). Observed-to-expected (O/E) ratios characterised conditions over-represented within clusters, and Cox proportional hazards models assessed associations with all-cause mortality. Results: Among 16,095 participants (mean age 74 years), 88.3% had multimorbidity at baseline; 4,217 deaths occurred over a median follow-up of 10.85 years. Five clusters were identified overall: hypertension and dyslipidemia (52.1%), gout and metabolic (14.4%), depressive symptoms, osteoporosis and frailty (10.0%), anaemia and kidney disease (10.2%), and hypotension, thyroid disorder and past cancer (13.3%). Sex-stratified analyses revealed three clusters in males and four in females. The frailty, depressive symptoms and osteoporosis cluster was associated with higher mortality in both sexes (aHR 1.56 [95% CI 1.40-1.73] in males; 1.68 [1.49-1.89] in females). Higher mortality was also observed for the metabolic, gout and kidney disease cluster in males (aHR 1.63 [1.47-1.81]) and the gout, anaemia and kidney disease cluster in females (aHR 1.96 [1.74-2.21]). Conclusions: Distinct multimorbidity clusters differed by sex and were associated with increased all-cause mortality. These findings may support risk stratification, targeted screening, and more person-centred management of older adults with multimorbidity.

09.
medRxiv (Medicine) 2026-06-22

Pump-Free Patient-Derived Human Proximal Tubule Microphysiological System for Modeling Flow-Dependent Epithelial Maturation and Cisplatin Injury

Recent initiatives by the U.S. Food and Drug Administration and the National Institutes of Health to reduce animal testing in drug development have highlighted the need for in vitro platforms that better recapitulate human biology for preclinical safety assessment. Drug-induced nephrotoxicity remains a major cause of drug attrition, underscoring the need for human-relevant kidney models. To address this, a pump-free human patient-derived proximal tubule microphysiological system was developed by integrating human renal proximal tubular epithelial cells (hRPTECs), isolated from non-tumorous nephrectomy cortex, with a porous membrane-based microfluidic device. Expanded hRPTECs were cultured for 10 days under static conditions or rocker-driven shear stress approximating physiological proximal tubular flow. Shear stress increased epithelial density, enhanced proximal tubule marker expression (Na+/K+-ATPase and aquaporin-1), and improved Zonula occludens-1 and occludin localization. Bulk RNA sequencing demonstrated transcriptomic changes associated with enhanced apical maturation and epithelial signature. In cisplatin-induced injury assays, shear-conditioned epithelia exhibited reduced cell density and increased γH2AX staining, indicating greater sensitivity to nephrotoxicity. These findings demonstrate that rocker-driven shear stress promotes epithelial maturation in patient-derived hRPTECs. The pump-free human patient-derived proximal tubule microphysiological system offers a practical, scalable, and physiologically relevant platform for modeling flow-dependent proximal tubule biology and assessing human-relevant nephrotoxicity.

10.
arXiv (CS.CV) 2026-06-12

VLADriveBench: Evaluating CoT-Action Relationship in VLA for Autonomous Driving

Vision-language-action (VLA) models generate chain-of-thought (CoT) reasoning alongside driving trajectories, but existing benchmarks evaluate only trajectory quality and do not assess whether the CoT is relevant, consistent, or causally connected to the driving action. We introduce VLADriveBench, a framework that combines observational metrics (mentioning, hallucination, contradiction, action alignment) with a CoT intervention protocol to provide complementary views of the CoT-action relationship. Applying VLADriveBench to three models across two architectures, we find that the two analyses can diverge sharply: ORION scores highest on observational alignment yet its CoT is epiphenomenal, while Alpamayo v1.5 scores lower yet its CoT is strongly causal, with visual salience gating the extent of CoT influence.

11.
arXiv (CS.CV) 2026-06-16

An Extensive Benchmark for Single-round and Multi-round Instruction-based Image Editing

In recent years, there have been notable advancements in the area of instruction-based image editing (IIE), which focuses on the automatic alteration of input images using a model. Nevertheless, assessing the effectiveness of these editing models poses a considerable challenge due to the intricate nature of instructions and the wide variety of edits. To tackle this problem, one urgent task in this domain is the development of a robust evaluation framework that can precisely gauge the quality of editing outcomes and offer valuable benchmarks to guide future improvements. To address this challenge, we present a comprehensive evaluation benchmark named I2EBench2.0, designed for single-round and multi-round assessment of IIE models. I2EBench2.0 has four key features: 1) Evaluation Across Single and Multi-rounds: I2EBench2.0 simultaneously evaluates both single-round and multi-round instruction-based edits, assessing the precision and consistency of the edits. 2) Extensive Evaluation Criteria: I2EBench2.0 encompasses a broad range of criteria, evaluating both high-level and low-level aspects of each IIE model. Specifically, it incorporates 16 dimensions for single-round evaluations and 7 for multi-round evaluations. 3) Alignment with Human Judgment: To ensure our benchmark aligns with human evaluation, we conducted a comprehensive user study for each criterion. 4) Research-driven Insights: By analyzing the strengths and weaknesses of current IIE models across all 16 single-round and 7 multi-round dimensions, we provide critical insights aimed at directing future research in this area. We tested eight recently developed IIE models using I2EBench2.0 and derived academic insights through meticulous comparison and analysis. The related code, dataset, and images generated by all IIE models are available on GitHub: https://github.com/cocoshe/I2EBench.

12.
arXiv (CS.AI) 2026-06-19

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

arXiv:2507.19653v2 Announce Type: replace-cross Abstract: We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user-equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, and carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via Spearman correlation between measured and simulated powers, and by a fingerprint-based k-nearest-neighbor localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. In contrast, antenna locations and orientations prove decisive. With simple greedy optimization we improve the Spearman correlation by 5% to 130% for various base stations, while the kNN-based localization error using only simulated data as reference points decreases by one-third on real-world samples, though it remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
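The per-base-station fidelity score mentioned above is a Spearman rank correlation between measured and simulated powers. The sketch below implements it with the standard library only (ranks with tie averaging, then a Pearson correlation on the ranks). The function names and the four power readings are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical received powers in dBm for four UEs at one base station.
measured_dbm = [-80.0, -95.0, -70.0, -110.0]
simulated_dbm = [-75.0, -90.0, -72.0, -120.0]
rho = spearman(measured_dbm, simulated_dbm)
```

A rank-based score rewards the simulator for ordering links correctly even when absolute power levels are offset, which suits a setting where antenna gains are only nominally known.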

13.
arXiv (CS.AI) 2026-06-19

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

arXiv:2606.20373v1 Announce Type: cross Abstract: Large Language Models (LLMs) show promise for code compilation tasks, but applying them to runtime performance tuning is difficult due to complex microarchitectural effects and noisy runtime measurements. We present AutoPass, a multi-agent framework for compiler performance tuning that uses compiler and runtime evidence to guide LLM-generated optimization decisions. Rather than treating the compiler as a black box like prior auto-tuning schemes, AutoPass opens up the compiler to the LLM, enabling it to query compiler-internal optimization states and analyze the intermediate representation to orchestrate compiler options. The search process iteratively refines optimization configurations using measured runtime feedback to diagnose regressions and guide latency-improving edits. AutoPass operates in an inference-only, training-free setting and requires no offline training or task-specific fine-tuning, making it readily applicable to new benchmarks and platforms. We implement AutoPass on the LLVM compiler and evaluate it on server-grade x86-64 and embedded ARM64 systems. AutoPass outperforms expert-tuned heuristics and classical autotuning methods, achieving geometric-mean speedups of 1.043x and 1.117x over LLVM -O3 on x86-64 and ARM64, respectively.
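The headline numbers above are geometric-mean speedups, which average ratios multiplicatively so that no single benchmark dominates. A minimal sketch of that aggregation follows; the function name and the per-benchmark timings are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def geometric_mean_speedup(
    baseline_seconds: Sequence[float],
    tuned_seconds: Sequence[float],
) -> float:
    """Geometric mean of per-benchmark speedups (baseline time / tuned time)."""
    log_sum = 0.0
    for base, tuned in zip(baseline_seconds, tuned_seconds):
        log_sum += math.log(base / tuned)
    return math.exp(log_sum / len(baseline_seconds))


# Hypothetical runtimes: one benchmark gets 4x faster, the other is unchanged.
baseline = [8.0, 3.0]
tuned = [2.0, 3.0]
speedup = geometric_mean_speedup(baseline, tuned)
```

With these inputs the arithmetic mean of the two ratios would be 2.5, while the geometric mean is 2.0, illustrating why the geometric mean is the more conservative summary for speedup ratios.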

14.
arXiv (CS.LG) 2026-06-11

Persistent Homology as a Theory of Emergent Structure

arXiv:2507.03065v2 Announce Type: replace Abstract: Why do some macroscopic structures remain identifiable even though their microscopic constituents continually change? Vortices persist while fluid parcels turn over, neural memories persist while spikes and synapses fluctuate, and institutions persist while individuals enter and leave. We propose a scale-relative answer: an emergent property is a persistent nontrivial homology class $[z]\in H_p=\ker\partial_p/\operatorname{im}\partial_{p+1}$, a macro-feature that is closed but not exact across a filtration of descriptions. This identification turns emergence into a measurement problem. Persistent bars detect stable macro-features, and we introduce a contractive-similarity (CS) graph operator to supply scaffold spectral gaps that predict robustness. Hodge decomposition separates harmonic macro-scaffold from exact and co-exact micro-flow; and functorial condensation explains when one level's emergent class becomes a unit for the next. The resulting scaffold-flow framework expresses six familiar signatures of emergence (i.e., inevitability, coherence, irreducibility, complementarity, robustness, and hierarchy) within one mathematical language. It also yields falsifiable predictions across atmospheric, neural, and social systems: genuine emergent structures should persist across filtrations, remain spectrally stable, respond disproportionately to harmonic interventions, and require timescale separation for hierarchical autonomy.

15.
arXiv (CS.AI) 2026-06-19

Tri-Info: Generalizable, Interpretable Failure Prediction for VLA Models via Information Theory

arXiv:2606.19998v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models are increasingly deployed across diverse tasks, yet they remain black boxes whose physical interactions can cause irreversible harm, making generalizable and interpretable failure detection essential. We observe that successful and failed rollouts carry systematically different information-theoretic signatures. Building on this, we formalize VLA control as a closed-loop information pipeline and derive the Triple Information-theoretic (Tri-Info) signals that capture whether actions remain diverse, temporally consistent, and coupled to state transitions. Across six VLA models and three benchmark environments, Tri-Info matches the strongest baselines in-domain. Moreover, Tri-Info transfers across architectures, environments, and the sim-to-real gap without retraining, reaching 83% accuracy on real-world tasks where prior detectors collapse to chance. This establishes Tri-Info as a simple yet powerful method that not only detects failures with strong cross-domain generalization, but also delivers interpretable diagnostics of the underlying failure modes.

16.
medRxiv (Medicine) 2026-06-15

Differential DNA Methylation and Delirium After Anesthesia and Surgery

Background: DNA methylation is an epigenetic modification that regulates gene expression in response to environmental exposures. We measured differential DNA methylation levels in blood before and after general anesthesia and surgery in participants with and without postoperative delirium (POD) and postoperative neurocognitive disorder (PNCD). Methods: Blood sampling, delirium assessment and cognitive testing were prospectively performed at baseline before non-cardiac, non-neurologic surgery, and at 24 hours (24h) and 6 weeks (6wk) thereafter in 94 participants comprising 13 with POD and 81 without POD, and 40 with PNCD and 54 without PNCD 6wk after surgery who were matched for age and sex in the INTUIT and MADCO cohorts. DNA methylation was assessed using the Illumina Infinium MethylationEPIC Beadchip. Results: 132 differentially methylated positions (DMPs) annotated to 198 differentially methylated genes (DMGs) were identified in 94 participants 24h after surgery compared to baseline with a local false discovery rate (LFDR)

17.
arXiv (quant-ph) 2026-06-15

The Bilateral Efficiency of Ethernet: Recalibrating Metcalfe and Boggs After Fifty Years

arXiv:2603.19406v2 Announce Type: replace-cross Abstract: In July 1976, Metcalfe and Boggs published their foundational paper on Ethernet in Communications of the ACM. Their efficiency model – E = (P/C)/(P/C + W*T) – measures the fraction of Ether time carrying good forward packets under contention. For fifty years this model has framed how the community thinks about Ethernet performance. We argue it is silent on the question that matters for modern intra-rack interconnect: bilateral transaction efficiency – the fraction of link time that produces committed agreements between sender and receiver. Metcalfe and Boggs themselves planted the seed in their EFTP "end-dally" protocol (Section 7.2.2), and the deeper anchor is older still: Abramson's Alohanet carried positive acknowledgments at the link layer – a bilateral mechanism Metcalfe consciously removed in 1973 to obtain Ethernet's simple, ACK-free packet format. The result is a fifty-year bilateral zigzag: Aloha (bilateral) to Ethernet (unilateral) to the EFTP end-dally (bilateral) to TCP (unilateral-with-bilateral-above). We formalize bilateral efficiency, connect it to the back-to-back Shannon channel with Perfect Information Feedback, and – scoping the claim explicitly to intra-rack distances of one meter or less – describe how the Open Aethernet link recovers mutual knowledge at the link layer. The correction to Table 1 is not a different set of numbers. It is a different question.
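The efficiency model quoted in this abstract, E = (P/C)/(P/C + W*T), can be evaluated directly. The sketch below does so under stated assumptions: the abstract gives only the top-level formula, so the definitions of the acquisition probability A = (1 - 1/Q)^(Q-1) and the mean waiting slots W = (1 - A)/A are recalled from the 1976 paper rather than taken from this text, and the 3 Mb/s capacity and 16 µs slot time are likewise illustrative parameter choices. All function and parameter names are hypothetical.

```python
def acquisition_probability(stations: int) -> float:
    """A: probability that exactly one of Q contending stations transmits in a slot,
    assuming each transmits with probability 1/Q (assumed definition)."""
    return (1.0 - 1.0 / stations) ** (stations - 1)


def mean_waiting_slots(stations: int) -> float:
    """W: mean number of contention slots before a successful acquisition (assumed definition)."""
    a = acquisition_probability(stations)
    return (1.0 - a) / a


def unilateral_efficiency(
    packet_bits: float,
    capacity_bps: float,
    slot_seconds: float,
    stations: int,
) -> float:
    """E = (P/C) / (P/C + W*T): fraction of channel time carrying good packets."""
    transmit_time = packet_bits / capacity_bps
    contention_time = mean_waiting_slots(stations) * slot_seconds
    return transmit_time / (transmit_time + contention_time)


# Illustrative parameters: 4096-bit packets, 3 Mb/s channel, 16 microsecond slot.
for q in (1, 2, 32, 256):
    e = unilateral_efficiency(4096, 3e6, 16e-6, q)
    print(f"Q={q:>3}  E={e:.4f}")
```

Note that this quantity only counts forward packet time against contention time. It says nothing about whether the receiver has confirmed anything, which is the gap the abstract's bilateral efficiency is meant to address.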

18.
medRxiv (Medicine) 2026-06-22

Image-based deep learning for emergency electrocardiogram classification

Automated electrocardiogram analysis has advanced largely through digital waveforms, yet many emergency-care workflows rely on ECGs available only as printed tracings, scanned reports, PDFs or mobile photographs. We developed an image-based deep learning system for emergency ECG classification and evaluated it in InCor-EMG, an expert-adjudicated dataset of 18,519 emergency ECGs spanning 12 ECG categories, with labels from 19 cardiologists. On the held-out test set, the final ConvNeXt ensemble achieved a macro F1-score of 0.807 (95% CI, 0.788-0.825), compared with 0.820 (95% CI, 0.805-0.832) for annotating cardiologists, and higher F1-scores than Mortara Veritas in most evaluated categories. Performance was associated more strongly with inter-reader agreement than with training sample size and remained informative across scanned and photographed ECGs, with supportive performance in model-enriched temporal and heterogeneous public-image evaluations. These findings support ECG image classification when digital waveforms are unavailable.
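The macro F1-score reported above averages per-class F1 with equal weight for every class, so rare ECG categories count as much as common ones. A minimal standard-library sketch follows; the function name and the class labels in the example are hypothetical, not the InCor-EMG categories.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels for five ECGs.
truth = ["AF", "AF", "NSR", "NSR", "STEMI"]
predicted = ["AF", "NSR", "NSR", "NSR", "STEMI"]
score = macro_f1(truth, predicted)
```

In this toy example the per-class scores are 2/3, 0.8, and 1.0, so one missed case in a two-sample class lowers the macro average noticeably. That sensitivity is consistent with the abstract's observation that performance tracks inter-reader agreement per category.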

19.
arXiv (CS.CV) 2026-06-16

Learn Temporal Consistency For Robust Satellite Video Detector

Satellite video object detection (SVOD) for oriented and fine-grained objects plays an important role in satellite applications. Most existing SVOD methods only focus on one or a few coarse-grained categories of moving objects and represent objects with horizontal bounding boxes. They have difficulty extracting complete, accurate, and consistent information about objects in whole satellite videos. In this paper, we propose a satellite video object detection framework based on Temporal Consistency Learning (TCL). TCL adeptly detects oriented and fine-grained objects by leveraging the rich temporal contexts within satellite videos. The framework integrates three key modules: temporal and fine-grained feature aggregation (TFA), structure encoding (SE), and temporal consistency constraint (TCC). TFA and TCC modules facilitate consistent representation learning across frames, while the SE module encodes both appearance and structural information for precise fine-grained recognition. Experimental results on the SAT-MTB benchmark dataset demonstrate TCL's superior performance, achieving a new state-of-the-art oriented and fine-grained detection accuracy of 47.7% mAP, a 4.8% improvement over the baseline. Furthermore, our TCL framework readily accommodates existing image-based detectors, leading to enhanced detection accuracies.

20.
arXiv (quant-ph) 2026-06-12

Entropic order parameters and topological holography

arXiv:2512.24225v2 Announce Type: replace-cross Abstract: We show that the symmetry topological field theory (SymTFT) construction, also known as topological holography, provides a natural and intuitive framework for the entropic order parameter characterising phases with (partially) broken symmetries. Various examples of group and non-invertible symmetries are studied. In particular, the origin of the distinguishability of the vacua resulting from spontaneously broken non-invertible symmetries is made manifest with an information-theoretic perspective, where certain operators in the SymTFT are excluded from observation.

21.
arXiv (CS.LG) 2026-06-15

Private Prediction via PAC Privacy

arXiv:2601.14033v2 Announce Type: replace Abstract: Machine learning models are increasingly served behind APIs. This renders private prediction, i.e., privatizing a model's outputs rather than its parameters, a natural privacy target: model outputs are lower-dimensional and far more stable to training-data changes than weights. While differential privacy (DP) cannot effectively exploit this as it calibrates noise to worst-case sensitivity that is intractable to bound for non-convex models, we argue that PAC privacy is a natural fit for private prediction. It is instance-based, and calibrates noise to a black-box function's empirical stability to control mutual-information (MI) leakage. The missing ingredient is efficient, adaptive composition. Serving predictions means answering a long stream of adaptively chosen queries from untrusted users; existing composition either fails under adaptivity, grows quadratically, or reverts to input-independent, DP-like noise. We close this gap with a new adversarial composition result via adaptive noise calibration and prove that MI accumulates only linearly under adaptive and adversarial querying. Experiments across modalities show that prediction stability enables high utility even at a tiny per-query budget: on CIFAR-10, we achieve 87.79% accuracy with a per-query MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership-inference success to 51.08% – the same guarantee as $(0.04, 10^{-5})$-DP. Further, in the presence of auxiliary public data, the large volume of PAC-private predictions enables us to distill a publishable model that can be queried without limit. Concretely, 210,000 private labels on an ImageNet subset distill into a student reaching 91.86% accuracy on CIFAR-10 with membership inference success bounded by 50.49%, comparable to $(0.02, 10^{-5})$-DP.
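The figures in this abstract can be sanity-checked with a few lines of arithmetic. The sketch below assumes two things: that the total mutual-information leakage is the per-query budget times the number of queries (the linear composition the abstract claims), and that membership-inference success is bounded by a Pinsker-style expression 1/2 + sqrt(MI/2). The second is an assumption on my part. It happens to reproduce both quoted percentages, but the paper's actual derivation and MI units may differ. Function names are hypothetical.

```python
import math


def total_mi(per_query_budget: float, n_queries: int) -> float:
    """Total MI leakage under linear composition: budget times number of queries."""
    return per_query_budget * n_queries


def membership_inference_bound(mi: float) -> float:
    """Assumed Pinsker-style bound on balanced membership-inference success: 1/2 + sqrt(MI/2)."""
    return 0.5 + math.sqrt(mi / 2.0)


PER_QUERY = 2.0 ** -32

serving_bound = membership_inference_bound(total_mi(PER_QUERY, 1_000_000))
distill_bound = membership_inference_bound(total_mi(PER_QUERY, 210_000))

print(f"1,000,000 queries: {100 * serving_bound:.2f}%")
print(f"  210,000 queries: {100 * distill_bound:.2f}%")
```

Under these assumptions a million queries leak about 2.3e-4 in total, which is why a per-query budget as small as $2^{-32}$ still leaves the adversary close to coin-flipping. Quadratic growth in the number of queries, which the abstract says earlier composition results suffer from, would make the same budget useless at this scale.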

22.
arXiv (quant-ph) 2026-06-24

Phase-space microscopes for quantum gases: Imaging conjugate variables and momentum-weighted densities

arXiv:2603.29568v2 Announce Type: replace-cross Abstract: Quantum gas microscopes offer unprecedented insights into quantum many-body states of cold atomic gases. Here we introduce concrete protocols for extending quantum gas microscopes to measure in phase space, by mapping momentum onto auxiliary degrees of freedom and using positive operator-valued measures. We distinguish between two distinct operational modes. In the Husimi-Q phase space microscope, position and momentum are jointly measured; in this mode the fundamental quantum noise is distributed between position and momentum. Conversely, the averaged-mode phase space microscope extracts the spatial dependence of averages of the momentum density (and its moments); these averages can be retrieved with arbitrary spatial resolution. We illustrate the utility of these techniques in diverse physical settings.

23.
arXiv (CS.LG) 2026-06-19

Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting

arXiv:2606.19560v1 Announce Type: new Abstract: Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1- to 4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.

24.
Nature (Science) 2026-06-17

Emergent decadal predictability in Antarctic contribution to sea-level rise

Despite large uncertainties associated with future mass loss from the Antarctic Ice Sheet, ice-sheet models show that the rate of sea-level rise from Antarctic ice loss in 2025 is strongly predictive of the rate for the next several decades, regardless of emission pathway or model complexity. This finding is robust across all models that were considered in the Intergovernmental Panel on Climate Change Sixth Assessment Report global mean sea-level projections, including the low-likelihood, high-impact scenarios of sea-level rise. Given this strong near-term decadal predictability, ice-sheet models that can accurately reproduce present-day ice-mass loss provide a reliable basis for near-term sea-level planning and adaptation through to mid-century. The predictability breaks down by the end of the twenty-first century as feedbacks, such as those related to marine ice-sheet retreat, begin to emerge, leading to accelerating ice loss. Drawing on these results, we identify key feedback mechanisms that can account for the transition between near-term decadal predictability and the longer-term, feedback-driven evolution, and suggest priorities for ice-sheet model development aimed at resolving long-term sea-level rise uncertainty. Although Antarctic ice-loss projections diverge widely by 2100, this Perspective shows that present-day rates robustly predict mid-century sea-level rise, providing a firm basis for near-term planning, while highlighting priorities for model development aimed at resolving longer-term sea-level rise uncertainty.

25.
arXiv (quant-ph) 2026-06-24

Electrical-Circuit Simulation of the Uhlmann Phase

arXiv:2606.24559v1 Announce Type: new Abstract: The Uhlmann phase extends the concept of geometric phases to mixed quantum states through a parallel-transport condition on purification amplitudes, but its experimental realization has so far required sophisticated quantum platforms with carefully engineered auxiliary degrees of freedom. In this work, we reformulate the Uhlmann parallel-transport condition as a linear matrix differential equation and vectorize it to obtain an effective dynamical generator. This generator can be directly mapped onto the admittance matrix of a classical RC circuit, thereby translating the Uhlmann dynamics into the evolution of circuit node voltages. We illustrate the mapping using the equatorial-loop model and, via a rotating-frame transformation followed by a real decomposition, derive a time-independent, real-valued dynamical system suitable for analog implementation. LTspice simulations of the resulting active RC network faithfully reproduce the Uhlmann geometric phase and its topological transition at the critical purity, demonstrating that classical electrical circuits offer a simple and accessible platform for probing mixed-state geometric phases.
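
The vectorization step described in the abstract rests on a standard identity: with column stacking, vec(A X B) = (Bᵀ ⊗ A) vec(X). A linear matrix equation of the generic form dW/dt = A W + W B therefore becomes d vec(W)/dt = (I ⊗ A + Bᵀ ⊗ I) vec(W). The sketch below checks this numerically in plain Python. The matrices A, B, and W are arbitrary placeholders, and the form A W + W B is an assumed generic shape; neither is the paper's actual Uhlmann generator.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(a):
    """Stack the columns of a matrix into one flat list."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def generator(a, b):
    """Effective generator of dW/dt = A W + W B acting on vec(W) (square A, B)."""
    n = len(a)
    return add(kron(identity(n), a), kron(transpose(b), identity(n)))


# Placeholder integer matrices, chosen only to exercise the identity.
A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
W = [[5, -1], [2, 3]]

lhs = matvec(generator(A, B), vec(W))
rhs = vec(add(matmul(A, W), matmul(W, B)))
print(lhs == rhs)  # True
```

Once the dynamics are written as a single matrix acting on a vector, identifying that matrix with a circuit admittance matrix and the vector with node voltages is the mapping the abstract describes; the circuit side is not modelled in this sketch.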