Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-12

Generalized Exact Fractional Quantum Information Model with Memory Effects

arXiv:2606.13525v1 Announce Type: new Abstract: In this paper, we analyze quantum information measures in fractional quantum mechanics using the Riemann-Liouville derivative formalism adopted here. In this case, we initially reconsider the conventional definitions of Shannon entropy and Fisher information, subsequently extending them to fractional quantum systems described by nonlocal differential operator frameworks adopted. Within this generalized formulation, fractional expressions of Shannon entropy and Fisher information are constructed and their mathematical structures examined thoroughly. Also, the formalism is then applied to the quantum harmonic oscillator, yielding explicit analytical expressions derived as functions of the fractional parameter therein. The obtained results demonstrate that fractional derivatives alter the localization properties of probability densities and generate nontrivial variations in information content and sensitivity across system behavior. In this context, the fractional parameter plays a central role in controlling deviations from the standard quantum information measures framework. Also, the study establishes a consistent framework for describing information-theoretic properties of quantum systems governed by nonlocal dynamics.

02.
arXiv (CS.AI) 2026-06-12

Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement

arXiv:2606.12834v1 Announce Type: new Abstract: As scientific workflows shift from deterministic executables to LLM-based agents, the development practices on offer, such as fine-tuning, reinforcement learning, and prompt-and-go, bury the scientist's judgment. We propose treating agent construction as a workflow stage and introduce AgentBuild, which builds a scientific agent from a contract the scientist authors. The contract is a version-controlled rubric, a difficulty-graded curriculum, and a curated external knowledge base. A rubric-driven judge gates a meta-optimizer coding agent that edits the agent within a declared boundary, so the build compiles the agent, not the scientist's judgment. We instantiate this for Rietveld refinement of X-ray diffraction data through GSAS-II behind MCP and A2A, where a blank-harness construction run progresses through a lithium lanthanum zirconium oxide (LLZO) signal-to-noise ladder, reaches the 4 hour scan as a frontier case, and exposes the workflow-scope limits that remain. The same rubric that rewards credible fits also scores trajectory scope, making the frontier a contract failure rather than a pattern-fitting failure. As base models evolve, re-running AgentBuild is a re-tune, not a rebuild, and the scientist's authored contract remains the durable asset.

03.
arXiv (CS.CV) 2026-06-12

Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation

Vision-language models (VLMs) have become a common foundation for vision-and-language navigation in continuous environments (VLN-CE). Yet most VLM-based methods cast navigation as low-level action prediction, an interface that is ambiguous, tied to short-horizon motion primitives, and inefficient due to repeated VLM querying. We propose Goal2Pixel, a pure pixel-based paradigm that reformulates VLN-CE as navigable pixel grounding. Rather than predicting actions, Goal2Pixel uses the image plane as a unified spatial interface between VLM reasoning and robot motion: the model predicts a visible navigable pixel to the agent, which is back-projected into a 3D waypoint for forward navigation. For non-forward actions, we append auxiliary directive regions to the image plane, where the left/right/bottom regions are interpreted as turning left, turning right, and stopping, respectively. To enable long-horizon navigation, we propose a visibility-aware keyframe memory for compact and informative history representation. To adapt pretrained VLMs to navigable pixel grounding, we introduce semantic embeddings and coordinate-aware auxiliary losses. Goal2Pixel achieves competitive state-of-the-art performance while requiring fewer VLM inference calls than prior methods. On R2R-CE Val-Unseen it achieves 54.1% SR and 52.5% SPL with just 7.75 VLM calls per episode, 6x fewer than the 46.62 required by direct action prediction at 32.9% SR. The same trend holds on RxR-CE.Project Page: https://baobao0926.github.io/Goal2Pixel/.

04.
arXiv (CS.AI) 2026-06-15

Quantile-Free Uncertainty Quantification in Graph Neural Networks

arXiv:2605.04847v2 Announce Type: replace-cross Abstract: Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice, and achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show QpiGNN achieves average 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.

05.
arXiv (CS.AI) 2026-06-16

Quantifying the Impact of Lossy Compression on Neural Generative Surrogate Modeling

arXiv:2606.15959v1 Announce Type: cross Abstract: Neural networks are used as generative surrogate models for scientific discovery, which are trainable approximations of scientific simulations. These models enable users to replace time-consuming numerical simulations with learned alternatives, providing quick solutions. However, high-fidelity generative surrogate models require massive training datasets, which can create storage and I/O challenges. Lossy compression is a promising way to reduce this burden, but compression errors may affect the model quality in subtle ways, making it challenging to quantify their impact. In this work, we examine how lossy compression of training data impacts the quality of generative surrogate models. We begin by characterizing the uncertainty inherent in training neural networks, showing that identical training configurations can produce different models. By exploiting this variability, we propose a method to estimate how much compression-induced error a surrogate model can tolerate without affecting its accuracy. Evaluation of two application simulations demonstrates that our approach significantly reduces memory/storage requirements and speeds up training while producing high-quality surrogate models. These results show that lossy compression saves data storage up to 23.7x and 39x with negligible impact on the quality of the surrogate model. Meanwhile, reducing the size of the training data set also enhances the data loading speed and reduces the training time by up to 3x.

06.
arXiv (CS.CV) 2026-06-17

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

Multimodal Foundation Models are increasingly used as reasoning agents, making reliability, knowing when a model may hallucinate, critical. A common intuition, which we call the Attention-Confidence Assumption, holds that reliability follows from "structural" visual perception: tight attention on relevant regions should signal a trustworthy answer, while scattered attention signals confusion. We challenge this through the VLM Reliability Probe (VRP), a systematic cross-family study of reliability signals in contemporary Vision-Language Models (VLMs). We introduce structural-attention metrics, cluster counts (C_k) and spatial entropy (H_s), to quantify the visual encoder's gaze, and track its evolution (Delta H_s) across layers. This reveals a "Symbolic Detachment": models often "Early Lock" visual features only to diffuse attention later, severing early perception from final generation. Contrary to the grounding hypothesis, we find a "Cluster Failure": spatial attention has near-zero correlation (R approx 0.001) with accuracy. Instead, reliability is a phenomenon of generation dynamics and internal-state distributions. Self-Consistency, the agreement rate across sampled reasoning paths, is the dominant predictor of truth (R = 0.429). Scaling causal interventions exposes a sharp architectural divergence: LLaVA locks its prediction in a fragile late-stage bottleneck, whereas PaliGemma and Qwen2-VL distribute reliability globally, staying resilient even when ~50% or more of their most predictive layer is destroyed. For current VLMs, reliability signals are detached from visual grounding maps and are best inferred from generation-time dynamics and hidden-state probes.

07.
arXiv (CS.CL) 2026-06-24

Same Lesson, Different Story: Cross-Lingual Reconstruction of Cultural Narratives in Large Language Models

The evaluation of cultural grounding context becomes complex when multiple cultures convey the same moral lesson. This challenge is particularly relevant to large language models (LLMs), which produce narratives across a wide range of languages and cultural contexts. However, it remains uncertain whether these models preserve culturally grounded meaning when equivalent moral lessons are conveyed through distinct cultural forms. This study introduces a multilingual evaluation narrative framework that integrates a cross-linguistic collection of 414 proverbs spanning 15 languages and uses four LLMs to generate 13k narratives. By employing semantically equivalent proverbs as culturally grounded prompts, the analysis assesses whether models preserve meaning across languages, how cross-lingual conditioning influences narrative realization, and whether different model families converge on similar interpretations. Results indicate that cross-lingual prompting largely preserves proverb-level semantic meaning while systematically redistributing agency, social positioning, and narrative structure. Additionally, strong inter-model convergence is observed in both monolingual and cross-lingual settings, suggesting that multilingual LLMs rely on shared semantic abstractions despite architectural and linguistic differences. These findings shed light on the need for more comprehensive evaluations of cultural grounding. Relying exclusively on semantic similarity in multilingual narrative assessments may overestimate cultural preservation by neglecting culturally meaningful variations in narrative expression.

08.
arXiv (CS.CL) 2026-06-19

Quality Over Clicks: Iterative Reinforcement Learning for Early-Stage E-Commerce Query Suggestion

Existing dialogue systems rely on query suggestion to enhance user engagement. Recent approaches mainly optimize generative models using click-through rate (CTR) models to align with user preferences. However, these methods are less effective in early-stage deployment scenarios, where click feedback is sparse and insufficient for training a reliable CTR model. To bridge this gap, we propose QualEQS, a quality-first iterative reinforcement learning framework for e-commerce query suggestion. We formalize actionable suggestion quality along three dimensions that directly affect downstream usability: answerability, factuality, and information gain. To continuously improve from online traffic without click supervision, we further propose group-level disagreement among candidate suggestions to identify ambiguous query contexts and mine hard training cases for iterative refinement. We also introduce EQS-Benchmark, a dataset of 16,949 real-world e-commerce queries for offline training and evaluation. Experiments show that our quality-based offline metrics correlate strongly with online performance, providing a practical evaluation recipe for sparse-feedback deployment. In both offline and online settings, QualEQS consistently outperforms strong baselines, yielding a 6.81% improvement in online ChatPV in a real-world enterprise-level conversational shopping assistant system.

09.
medRxiv (Medicine) 2026-06-22

Reliable quantification of renal function from frozen blood samples

BACKGROUND: Differences in renal function may affect Alzheimer disease (AD) blood biomarker levels independent of AD pathology. Although renal function was unaccounted for in foundational AD blood biomarker studies, there is potential to address this through quantification of estimated glomerular filtration rate (eGFR) from frozen serum and plasma samples. However, the validity of eGFR evaluation from long-term frozen blood samples is unknown. METHODS: Adults aged 50-85 with at least 2 vascular risk factors were recruited from vascular surgery or cardiology clinics in Tucson, Arizona from 2022-2025. Individuals with creatinine assessments in point-of-care whole blood (POC-WB) and frozen serum and plasma samples using the iSTAT (Abbott) were included. eGFR was calculated using the 2021 CKD-EPI creatinine equation without race. Agreement between POC-WB and frozen blood samples was assessed using Cohen's kappa with linear weights. RESULTS: 134 participants (mean [SD] age: 72.6 [7.5] years, 39.6% female, 23.1% chronic kidney disease) had POC-WB eGFR available. Frozen serum and plasma samples had strong agreement with POC-WB for eGFR (Kw= 0.90-0.95, P

10.
arXiv (CS.CL) 2026-06-12

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain outputs. As an example, one might be interested in which samples in the data could be the source of toxic behavior after training the LLM. Many methods quantify this conditioning through the paradigm of influence functions. While methods of this family are effective in its function, they lack the necessary processing speed and storage compactness to be practically implemented on large datasets. We propose a method, Influcoder, as a quick and cost-effective approach to influence-based Data Attribution at scale.

11.
arXiv (CS.AI) 2026-06-24

MVG-KAN: Multi-View Geo-Wind Guided KAN for PM$_{2.5}$ Forecasting

arXiv:2606.24347v1 Announce Type: new Abstract: Accurate short-term PM$_{2.5}$ forecasting is important for public health protection, air-quality early warning, and urban environmental management. However, PM$_{2.5}$ variation is driven by multiple coupled factors, including stable periodic changes induced by human activities and meteorological regularity, station-specific short-term concentration evolution, and meteorology-driven pollutant dispersion among monitoring stations. Existing spatio-temporal forecasting methods may capture station relationships to some extent, but distance-only, correlation-based, or purely adaptive graphs are often insufficient to comprehensively represent these heterogeneous factors, especially wind-direction-dependent pollutant transport. To address this problem, we propose a Multi-View Geo-Wind Guided KAN model for PM$_{2.5}$ forecasting, named MVG-KAN, which models station-level PM$_{2.5}$ evolution from three complementary views: local periodic regularity, station-wise residual temporal dynamics, and meteorological-environment-guided spatial dispersion. Specifically, the periodic-residual forecasting backbone first separates stable daily and weekly patterns from non-periodic residual variations. A Geo-Wind Graph is constructed by combining geographic distance decay with wind-direction- and wind-speed-aware transport, providing a lightweight physically motivated directed spatial prior for residual propagation among stations. In addition, a temporal Kolmogorov-Arnold network (TKAN) residual head is then introduced to learn station-wise nonlinear autoregressive correction from de-periodized PM$_{2.5}$ residuals and historical multi-pollutant sequences, thereby enhancing the modeling of local residual inertia and pollutant co-variation.

13.
arXiv (CS.AI) 2026-06-16

NVMOS: Non-Verbal Vocalization Quality Assessment in Speech

arXiv:2606.15888v1 Announce Type: cross Abstract: Non-verbal vocalizations (NVs), such as laughter, sighs, and coughs, are important acoustic cues for emotion and intent. Existing speech quality assessment methods typically focus on overall naturalness, while non-verbal TTS evaluations mainly examine whether a target NV appears with the correct type and position. However, the perceptual quality of NV events themselves remains underexplored. To address this gap, we construct an NV-MOS dataset containing outputs from multiple NV-TTS systems and naturally occurring NV samples, with ratings collected from three acoustic experts on a perceptual quality scale. We further analyze audio-capable multimodal large language models such as Gemini and find clear inconsistencies between their scores and expert ratings. These results suggest that general-purpose multimodal models cannot reliably replace human judgments for NV quality assessment. We then propose NVMOS, to our knowledge the first model that can reliably predict the perceptual quality of NV events in speech. Experimental results show that, with a local NV-event focusing module, NVMOS reaches expert-level or stronger agreement with human MOS.

14.
arXiv (CS.AI) 2026-06-16

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

arXiv:2606.16590v1 Announce Type: cross Abstract: Exploration in deep reinforcement learning (RL) is commonly implemented as temporally uncorrelated white noise. However, recent works show that temporally correlated colored noise can improve exploration efficiency by producing smooth trajectories with better coverage of the state space. We inquire whether action noise inspired by infant spontaneous movements can also improve exploration in deep RL. We find that the power spectral densities of babies' end-effector velocities follow a colored noise process where the spectral exponent increases with age. Inspired by this developmental pattern, we introduce a mechanism that progressively increases the temporal auto-correlation of exploration noise during RL training, matching the infant statistics. Experiments across several RL environments show that infant-inspired noise produces structured exploratory behavior and can improve learning efficiency compared to conventional exploration strategies. These findings suggest that human motor and cognitive development can provide useful guidance for designing learning mechanisms in artificial agents. Our code is available at https://github.com/trieschlab/baby-noise-rl.

15.
PLOS Medicine 2026-05-14

First-trimester nonsteroidal anti-inflammatory drugs exposure and risk of major congenital malformations: A retrospective register-based cohort study

by Ariel Avraham Hasidim, Itamar Ben Shitrit, Daphna Idan, Tal Michael, Amalia Levy, Gali Pariente, Eitan Lunenfeld, Sharon Daniel Background Pain and fever are common in early pregnancy, yet their management poses a major clinical dilemma. Although not confirmed, recent studies have raised safety concerns regarding acetaminophen. Evidence on the use of nonsteroidal anti-inflammatory drugs (NSAID) in the first trimester remains inconclusive. This uncertainty has left clinicians with limited evidence to guide treatment decisions. This study evaluated the association between first-trimester NSAID exposure and the risk of major congenital malformations (MCMs) in a large, population-based cohort of pregnancies. Methods and findings We conducted a population-based retrospective cohort study within the Southern Israeli Pregnancy Registry (siPREG) project, including all singleton pregnancies of women aged 15–45 years resulting in live births, stillbirths, or elective terminations for fetal malformations at a Soroka University Medical Center between 1998 and 2018. Pregnancies exposed to established teratogens, multiple gestations, and those with documented genetic or chromosomal anomalies were excluded. First-trimester NSAID exposure was defined by pharmacy dispensations (overall and by specific agents). MCMs were identified from linked clinical, hospitalization, and termination records through the first postnatal year.Propensity scores were estimated using covariates selected via a directed acyclic graph, including maternal age, ethnicity, diabetes, medical indication for NSAID use, exposure to other antipyretics, obesity, smoking, folic-acid use, gravidity, perinatal care, and year of pregnancy. Generalized full matching was used to balance covariates. Adjusted risk ratios were derived using weighted Poisson regression with G-computation, and two-way cluster-robust standard errors, jointly clustering by maternal identifier and matching subclass. Sensitivity analyses included a dose–response assessment across defined-daily-dose (DDD) categories and a tipping-point analysis evaluating the impact of potential misclassification from unrecorded over-the-counter NSAID use.A total of 264,858 singleton pregnancies were included in the final cohort; 20,202 (7.6%) were exposed to NSAID, most commonly ibuprofen (5.1%), diclofenac (1.6%), and naproxen (1.2%). NSAID exposure, in total and as individual agents, was not associated with MCMs overall (8.2% versus 7.0%; matched-adjusted-Relative Risk (aRR) = 0.99 (95% CI [0.90,1.10])) or with organ-system-specific MCMs, including cardiovascular (matched-aRR = 1.05 (95% CI [0.92,1.20]), musculoskeletal (matched-aRR = 1.03 (95% CI [0.77,1.39])), central nervous system (matched-aRR = 0.77 (95% CI [0.53,1.11])), cleft palate (matched-aRR = 0.95 (95% CI [0.47–1.91])), gastrointestinal (matched-aRR = 1.03 (95% CI [0.64–1.63])), and genitourinary (matched-aRR = 0.99 (95% CI [0.72,1.35])) malformations. Dose–response analyses showed no significant association with MCMs across cumulative NSAID exposure: short-term (1–7 DDD, matched-aRR = 1.06 (95% CI [0.97,1.15]), medium-term (8–21 DDD, matched-aRR = 1.10 (95% CI [0.99,1.22]), and long-term (>21 DDD, matched-aRR = 1.24 (95% CI [0.94,1.63])). The main limitation was the potential for minor exposure misclassification due to over-the-counter availability of ibuprofen, although sensitivity analyses simulating such misclassification suggested minimal impact on the risk estimates. Conclusion In this large, population-based cohort, we found no evidence supporting an association between first-trimester exposure to NSAID and MCMs, providing reassuring evidence regarding their fetal safety in early pregnancy.

16.
arXiv (quant-ph) 2026-06-25

Sp(2N, R) interferometry in multi-mode Gaussian bosonic systems for optimal metrology and quantum control

arXiv:2606.25768v1 Announce Type: new Abstract: Multi-mode interferometers for bosons in Gaussian states are important systems for quantum metrology with precision beyond the standard quantum limit and for bosonic quantum computing. However, there is a lack of theoretical foundation for generic $N$-mode Gaussian interferometry. In this work, we study quantum metrology and quantum control in multi-mode bosonic systems with quadratic Hamiltonians, exploiting the fundamental Sp$(2N,R)$ symmetry of such interferometers. We show that the optimal quantum control to maximize sensitivity requires aligning squeezing and displacement in the same direction. We propose Sp$(2N,R)$ echo, a multi-mode generalization of the SU$(1,1)$ interferometry, to achieve the sensitivity of phase estimation set by the quantum Fisher information. In addition, we introduce a geometrical means for reversing many-body dynamics with Sp$(2N,R)$ dynamical symmetry, such as dynamics of the bosonic Kitaev chain. Our schemes are readily realizable in optical, atomic, and mechanical platforms.

17.
medRxiv (Medicine) 2026-06-17

Impact of the disposable vape ban in Great Britain: a representative interrupted time-series study 2022-2026

Objective: To examine changes in vaping and smoking trends following the announcement and implementation of the disposable vape ban in Great Britain. Design: Interrupted time-series analysis of representative monthly cross-sectional data from the Smoking Toolkit Study. Setting: Great Britain. Participants: 118,946 adults ([≥]16y), including 12,042 young adults (16-24y), surveyed between Jan-2022 and Feb-2026. Main outcome measures: Changes in trends in disposable vape use among vapers, and current vaping and smoking prevalence, using seasonally-adjusted generalised additive models with comparisons against a no-ban counterfactual in which pre-announcement trends continued unchanged. Results: The proportion of vapers mainly using disposable devices began to decline following the announcement of the ban in Jan-2024, with the fall accelerating after implementation in June-2025. By Feb-2026, 5.6% (95%CI 4.6-6.9) of adult vapers and 7.1% (5.1-10.1) of young adult vapers mainly used disposables, compared with 62.0% (53.6-71.8) and 63.6% (52.7-76.7), respectively, under a no-ban counterfactual. Increases in vaping prevalence slowed post-announcement and plateaued post-implementation; by Feb-2026, prevalence was lower than the no-ban counterfactual in adults (13.6% v 18.8%; difference -5.2 percentage points, 95%CI -7.1 to -3.3) and young adults (27.8% v 39.1%; -11.3, -18.6 to -4.1). Declines in smoking prevalence stalled among adults and reversed among young adults post-announcement, before shifting downward again post-implementation; by Feb-2026, smoking prevalence was similar to the no-ban counterfactual in adults (difference +0.9 percentage points, -0.5 to +2.2) but possibly higher in young adults (+3.3, -0.5 to +7.1). Conclusions: The disposable vape ban in Great Britain was associated with substantial changes after both announcement and implementation, including a marked reduction in disposable vape use and a slowing then plateauing of growth in overall vaping prevalence. However, declines in smoking also temporarily slowed–and among young adults, reversed–after the announcement, before downward trends resumed after implementation.

18.
arXiv (CS.CL) 2026-06-12

NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in RAG settings remains poorly understood. We conduct a systematic study across four benchmarks, revealing that LLMs exhibit poor calibration performance especially when noisy contexts are retrieved. Specifically, contradictory or irrelevant evidence tends to exacerbate the model's overconfidence issue. To address this, we propose NOVA Rules (NOise-Aware Verbal Confidence CAlibration Rules) to provide a principled foundation for resolving overconfidence under noise. We further design NOVA, a noise-aware calibration framework that synthesizes supervision from ~2K HotpotQA examples guided by these rules. By performing supervised fine-tuning (SFT) with this data, NOVA equips models with intrinsic noise awareness without relying on stronger teacher models. Empirical results show that NOVA yields substantial gains, improving ECE scores by 10.9% in-domain and 8.0% out-of-domain. By bridging the gap between retrieval noise and verbal calibration, NOVA paves the way for both accurate and epistemically reliable LLMs.

19.
medRxiv (Medicine) 2026-06-17

Accounting for Human Movement to Improve Exposure-Health Models

Background. Current exposure-health models rely on averaged, residential-based environmental exposures, failing to account for human movement. This aggregation can lead to exposure misclassification and biased exposure-response estimates, potentially distorting our understanding of the true health effects of environmental conditions. We developed exposure disaggregation regression models that explicitly account for human movement when linking environmental exposures to health outcomes. Methods. By weighting pixel-level exposures according to distance from home as a simple proxy for human movement, our model linked disaggregated environmental exposures to individual-level health outcomes. Weights were either fixed a priori or derived from a latent distance-decay power parameter learned from the data. We additionally evaluated model performance under a nonlinear exposure-response relationship. Model performance was assessed across multiple sample sizes (N = 1,114; 50,000; and 100,000). A simulation study examined parameter recovery using bias, empirical standard error (EmpSE), and credible interval coverage. As a case study, Demographic and Health Surveys (DHS) data from Albania were used to link acute respiratory infection (ARI) outcomes among children under five to pixel-level NDVI within a 3 km buffer around DHS cluster centroids, and the proposed models were applied to these data. Results. Across all models (fixed-weight, learned-weight, and restricted cubic spline models), parameter recovery improved with increasing sample size. At N = 1,114, estimates were biased and imprecise, with incorrect effect direction for exposure-response parameters (e.g., learned-weight {beta}1 bias = - 0.79; EmpSE = 2.61; coverage = 0.88). In contrast, the models accurately recovered parameters at larger sample sizes, including the latent distance-decay parameter (bias = - 0.02; EmpSE = 0.15; coverage = 0.95 at N = 100,000), demonstrating their ability to reliably learn movement-based exposure weights when sufficient data were available. Conclusion. Instead of relying on arbitrarily-sized buffers, this statistical framework provides a novel method for studying environmental exposure-health relationships whilst accounting for human movement. With sufficiently large sample sizes, it can accurately estimate the influence of disaggregated environmental exposures on individual-level health and help address exposure misclassification arising from residential-only metrics. This methodological framework remains scalable, interpretable, and adaptable to other exposures and outcomes, offering a foundation for future work that integrates richer mobility-informed exposure-health research.

20.
arXiv (quant-ph) 2026-06-17

Independent Chiral Control in Theory-Space Models:A Rank-Preserving Framework and Its Application to Neutrino Mass Generation

Authors:

arXiv:2409.09033v3 Announce Type: replace-cross Abstract: We develop a general framework of rank-preserving, element-wise matrix transformations for engineering fermion mass hierarchies in theory-space constructions. We prove that preservation of massless modes requires the transformation function to be separable, $g_f(i,j)=g^{(L)}_f(i)g^{(R)}_f(j)$, which in turn enables independent control of left- and right-chiral zero-mode profiles directly at the level of the theory-space mass matrix. This formalism unifies and extends the clockwork mechanism, permits controlled deformation of Kaluza–Klein spectra, and enhances hierarchy generation in GIM-like fine-cancellation scenarios. As a concrete application, we show that in theory-space models for neutrino masses, suitable transformations allow sub-eV light neutrinos to arise from TeV-scale new physics with only $\mathcal{O}(40)$ additional fermionic sites, while remaining consistent with charged-lepton flavor-violation bounds. In contrast, the corresponding untransformed models asymptote at the MeV scale and cannot access the phenomenologically required regime without extreme field multiplicities or hierarchical parameters.

21.
arXiv (quant-ph) 2026-06-24

On estimating Schatten norm and power distances between quantum states

arXiv:2505.00457v3 Announce Type: replace Abstract: We study the computational complexity of estimating the quantum Schatten $\alpha$-norm distance $T_\alpha(\rho_0,\rho_1)$, given $poly(n)$-size state-preparation circuits of $n$-qubit quantum states $\rho_0$ and $\rho_1$. This quantity serves as a lower bound on the trace distance and, for $\alpha > 1$, is interchangeable with its powered version $\Lambda_\alpha(\rho_0,\rho_1)$. For any constant $\alpha > 1$, we develop an efficient rank-independent quantum estimator for $T_\alpha(\rho_0,\rho_1)$ with time complexity $poly(n)$, achieving an exponential speedup over the prior best results of $\exp(n)$ due to Wang, Guan, Liu, Zhang, and Ying (TIT 2024). When $01$, QSD$_{\alpha}$ is $\sf BQP$-complete. 2. For any $1 \leq \alpha(n) \leq 1+negl(n)$, QSD$_\alpha$ is $\sf QSZK$-complete, implying that no efficient quantum estimator for $T_\alpha(\rho_0,\rho_1)$ exists unless ${\sf BQP}={\sf QSZK}$. This $\sf QSZK$-hardness result also extends to the promise problem defined by $\Lambda_\alpha(\rho_0,\rho_1)$ for constant $0

22.
arXiv (CS.AI) 2026-06-17

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

arXiv:2606.18206v1 Announce Type: new Abstract: Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagation problem induced by depth as the halting decision is postponed. In this paper, we address this signal propagation issue using pre-norm layers and residual scaling. Building on these architectural modifications, we propose FPRM, a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. We show that fixed-point halting allows FPRM to adapt its compute to task difficulty. FPRM is effective on common reasoning benchmarks, namely Sudoku, Maze, state-tracking, and ARC-AGI.

23.
arXiv (CS.CV) 2026-06-25

Color Matters: Trigger Color Affects Success in Federated Backdoor Attacks

Federated learning is vulnerable to backdoor attacks in which malicious clients inject poisoned updates while preserving benign-task performance. In this paper, we study a semantics-driven backdoor mechanism in which attackers use natural visual accessories as triggers and manipulate only the trigger color while keeping the attack pipeline fixed. Our framework considers semantic trigger objects such as masks and sunglasses, instantiated in black and white variants, and evaluates their effect in a controlled federated learning setting. Malicious clients construct poisoned samples by applying a trigger to source-class images and relabeling them to an attacker-chosen target class, while benign clients train only on clean data. We analyze this mechanism under both a standard poisoning objective and a stronger SABLE-based objective that combines clean classification loss, triggered target loss, feature-separation loss in the penultimate representation space, and regularization to keep malicious updates close to the global model. This design enables the attack to remain effective while reducing excessive update drift. Experiments on a four-class CelebA hair-color task show that trigger color significantly changes attack success rate even when trigger semantics, placement, and poisoning budget are unchanged. White triggers are more effective for attacks targeting the blond class, whereas black triggers perform better for attacks targeting the black class. The same trend persists under robust aggregation, showing that trigger color is a meaningful factor in the operation, persistence, and evaluation of semantic backdoor mechanisms in federated learning.

24.
arXiv (CS.AI) 2026-06-24

Evaluating the Interpretability of Sparse Autoencoders with Concept Annotations

arXiv:2606.24716v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract interpretable concepts from vision and vision language models, yet existing evaluation methods largely rely on proxy metrics or qualitative inspection rather than measuring semantic correspondence. We present a human-grounded evaluation framework that quantifies alignment between SAE latents and human-annotated concepts, without requiring user studies, and validate this matching through targeted attribute perturbations. To enable this intervention-style evaluation in vision, we construct synCUB and synCOCO, synthetic benchmarks of paired images that differ in exactly one attribute. We introduce Fully-Binary Matching Pursuit (FBMP), a coalition-based matching procedure that supports many-to-one mappings between SAE latents and annotated concepts, and consistently outperforms one-to-one baselines. For functional validation, we propose a Targeted Attribute Perturbation Alignment Score (TAPAScore), which tests whether matched concepts respond selectively and in the expected direction under targeted image-level attribute perturbations. Under sanity checks, our matching and TAPAScore are the only evaluated metrics that reliably distinguish trained SAEs from untrained ones. Across SAEs trained on CLIP and DINOv2 embeddings, we find that increased overcompleteness can reduce perturbation alignment, indicating a reduction in interpretability. Our evaluation framework suggests that moderate dictionary sizes provide the best trade-off, yielding the most interpretable SAEs. Code and datasets are available at https://github.com/JonasKlotz/sae-concept-eval.

25.
Science (Express) 2026-05-06

A 481-meter-high landslide-tsunami in a cruise ship–frequented Alaska fjord | Science

Authors: Unknown Author

Early in the morning of 10 August 2025, a >64 × 10 6 m 3 landslide struck Tracy Arm fjord in Alaska. The landslide was preconditioned by glacial retreat caused by climate change. The resulting 481 m runup megatsunami followed an initial 100-m-high breaking wave traveling >70 m s −1 . The landslide was preceded by several days of microseismicity, which increased in rate and magnitude until ~1 hour before failure. The landslide produced globally observed long-period seismic waves equivalent in size to a M5.4 earthquake. A long-period (~66 s) global seismic signal, produced by a landslide-induced seiche trapped within the fjord, persisted for up to 36 hours, the second time a days-long seiche has been thus observed. With fjord regions increasingly visited by cruise ships, and climate change making similar events more likely, this unanticipated, near-miss event highlights the growing risk from landslides and tsunamis in coastal environments.