Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-19

Electrical Noise Produced by Micron-Sized Particles above a Surface Paul Trap

arXiv:2606.19585v1 Announce Type: new Abstract: Electric field noise produced by the surface of ion trap electrodes reduces the fidelity of quantum computing operations. Despite decades of investigation its microscopic origins remain unclear. Here, we measure electric field noise at trapping locations along the symmetry axis of a linear surface Paul trap. We find that noise levels vary by three orders-of-magnitude in one 600$\,\mu$m section of the trap. Optical and scanning electron microscope images show micron-sized particles close to the trapping locations with the highest noise levels. We find that modeling the particles as a lossy dielectric with a effective loss tangent $\tan\theta=0.33(0.06)$ describes the magnitude of the noise, as well as its spatial and frequency dependence. Our observations may explain the large variation of reported noise levels in literature.

02.
arXiv (CS.CV) 2026-06-18

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Weakly supervised segmentation enables model training from plane-level labels. Existing methods often rely on 2D encoders, neglecting the volumetric nature of medical data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context via cross-plane modeling. TranSamba augments a Vision Transformer backbone with Cross-Plane Mamba blocks, leveraging linear-time modeling for efficient information exchange across neighboring planes. This exchange improves in-plane self-attention and subsequent attention maps for object localization. TranSamba maintains linear time complexity and constant space complexity with respect to the input volume depth. Extensive experiments on three datasets covering diverse modalities and pathologies show that TranSamba achieves state-of-the-art performance, demonstrating the generalizable efficacy of cross-plane modeling. Code is available at: https://github.com/YihengLyu/TranSamba.

03.
arXiv (CS.AI) 2026-06-16

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

arXiv:2606.14975v1 Announce Type: cross Abstract: How the wiring and functional organization of cortex shape recurrent computation remains a central question in both neuroscience and machine learning. Here, we leverage data released through the Machine Intelligence from Cortical Networks (MICrONS) program–a functional connectomics resource spanning multiple areas of mouse visual cortex, in which dense calcium imaging is co-registered with high-resolution electron microscopy reconstruction from the same animal–to build biologically grounded recurrent neural networks. Using neuronal spatial coordinates, anatomical connectivity, and function-derived relationships from nearly 12,000 coregistered excitatory neurons, we initialize recurrent weights and impose communication-aware spatial constraints during learning. Across three cognitive decision-making tasks, networks constrained by cortical structure and function consistently outperform baseline and partially constrained models. Functional weight initialization provides the largest gain, while real spatial embedding yields robust additional improvements across conditions. These biologically grounded networks also develop low-entropy, modular, and small-world organization, and retain strong performance even when recurrence is restricted to positive weights. Together, our results show that the machinery of cortex–its geometry, wiring, and functional structure–can be harnessed as a powerful inductive basis for building recurrent networks that learn more effectively while converging toward key organizational principles of biological computation.

04.
arXiv (quant-ph) 2026-06-16

Inflationary branch decoherence and the cosmological arrow of time

作者:

arXiv:2602.21263v3 Announce Type: cross Abstract: We analyze branch decoherence in inflationary quantum cosmology by computing reduced density matrices and branch-overlap factors for long-wavelength perturbations. The Hartle-Hawking no-boundary state is real in the semiclassical regime and contains both expanding and contracting WKB components, whereas the tunneling state is selected as an outgoing complex WKB branch; expanding-contracting decoherence is therefore central for the former and mainly diagnostic for the latter. Using the influence-functional formalism, we derive the noise kernel for a light spectator environment and evaluate decoherence under horizon-based and EFT-motivated coarse grainings. We then compute the single-mode branch overlap directly from the Bunch-Davies mode functions, obtaining $|\mathcal{D}_k(z)|=[z^2/(z^2+1)]^{1/4}$ in the massless limit and $|\mathcal{D}_k(z)|\sim z^\nu$ on superhorizon scales for massive fields, where $z=-k\eta$ is the dimensionless wavenumber with $\eta$ the conformal time. In the massless case, the accumulated geometric branch functional is evaluated in closed form, with a leading cutoff-sensitive phase-space term and a universal subleading contribution. The calculation provides an explicit quantitative bridge between quantum-cosmological boundary conditions, inflationary squeezing, and the emergence of effectively classical cosmological histories.

05.
arXiv (quant-ph) 2026-06-15

Landscape-Similarity-Guided Optimization in Divide-and-Conquer QAOA

arXiv:2602.21689v3 Announce Type: replace Abstract: Divide-and-conquer strategies mitigate hardware constraints for the Quantum Approximate Optimization Algorithm (QAOA) on Noisy Intermediate-Scale Quantum (NISQ) devices by partitioning large interaction graphs into smaller, hardware-compatible sub-problems. However, this approach introduces a severe classical training bottleneck: a decomposition across $m$ boundary nodes generates $2^m$ distinct sub-problems that typically require independent optimization. In this work, we demonstrate that across diverse synthetic and real-world interaction graphs, the variational landscapes of these reduced QAOA instances actually exhibit a robust universality. Adapting the replica-overlap framework of spin-glass physics, we define a landscape-overlap order parameter $q$ to quantify geometric correlations between energy landscapes, revealing a sharp landscape-similarity transition as graph connectivity is tuned. Exploiting this, we introduce Doubly Optimized QAOA (DO-QAOA), an adaptive pipeline that collapses the sub-problems from $2^m$ distinct sub-problems into $K=\mathcal{O}(1)$ effective landscape classes. By performing optimization on a single representative sub-problem and dynamically transferring parameters to remaining sub-problems, DO-QAOA lowers runtime and quantum measurement overhead by orders of magnitude while maintaining a competitive Approximation Ratio Gap (ARG).

06.
arXiv (CS.CV) 2026-06-16

Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions

While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions (e.g., 480P), leaving efficient, scalable, real-time high-resolution video generation a fundamental open challenge. To bridge this gap, we present Ultra Flash, a cascaded streaming framework capable of real-time high-resolution video generation. Ultra Flash achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU through three key contributions: (1) an architecture-preserving T2V-to-TV2V super-resolution training paradigm coupled with an AIGC-oriented data degradation pipeline that effectively preserves the generative capability of the base model, enabling enhanced high-resolution detail when cascaded after mainstream low-resolution generative models; (2) a causal streaming latent upsampler paired with a high-resolution decoder, which enhances spatiotemporal coherence while enabling efficient latent spatial scaling and precise high-resolution decoding with negligible computational overhead; and (3) a cascade high-resolution streaming video generation optimization scheme that first performs hybrid-reward-enhanced sparse causalization and single-step distillation of the super-resolution model, then introduces cascaded streaming self-forcing preference optimization with dynamic cache management, jointly enhancing overall coherence, improving quality, and enabling real-time high-resolution streaming video generation. Extensive experiments demonstrate that Ultra Flash reliably produces ultra-high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency. Project Page: https://xin1u.github.io/UltraFlash/

07.
arXiv (CS.LG) 2026-06-16

Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support

arXiv:2606.15940v1 Announce Type: new Abstract: Synthetic and distilled student data are increasingly used to enable privacy-conscious learning analytics, yet their suitability for decision-facing institutional support remains uncertain. In dropout support, generated data must preserve not only predictive utility or distributional resemblance, but also the financial-status evidence used to guide advising, payment-plan assistance, and scholarship-related decisions. Method: This study introduces CaP-Eval, a decision-facing causal-privacy audit workflow for evaluating generated student data under a fixed estimand, timing-aware adjustment design, estimator set, and empirical privacy-governance screen. The workflow compares original, distilled, adversarial synthetic, statistical synthetic, and DPGNet privacy-oriented generated data on predictive utility, treatment-effect fidelity, robustness to alternative estimators, and local training-record proximity. Results: DPGNet and distilled data preserved the original financial-status treatment-effect structure more reliably than the adversarial and Gaussian Copula baselines. DPGNet preserved full direction and rank agreement across epsilon levels; epsilon = 10 produced the smallest non-original IPW and DML deviations, while epsilon = 1 and epsilon = 5 amplified several financial-status contrasts. Distilled data remained highly faithful but retained the strongest local training-record proximity signal. TabularGNet preserved qualitative directions with moderate attenuation, and Gaussian Copula compressed effect magnitudes. Conclusions: Predictive utility, privacy orientation, empirical disclosure signals, and causal fidelity diverged; generated student data require joint audits of direction, magnitude, overlap, and release-governance risk before decision use.

08.
arXiv (CS.AI) 2026-06-16

Learning aligned EEG representations with subject-specific encoders

arXiv:2606.16462v1 Announce Type: cross Abstract: Cross-subject EEG decoding promises more training data, but it also exposes neural networks to strong inter-subject distribution shifts. We study whether task supervision and architecture alone can learn subject-aligned representations. We replace a shared EEG encoder with subject-specific encoders followed by a common classifier, and compare this hybrid model with standard EEGNet, AttentionBaseNet, and CTNet baselines with Euclidean Alignment (EA) on four motor-imagery datasets. EA improves shared encoders by recentering subject covariances, but the hybrid encoder largely internalises this role: validation-loss curves and latent-distance analyses change little when EA is removed. Subject-specific heads increase class distinctiveness and place each subject close to its own latent manifold, improving most subjects while leaving a method-sensitive subset. These results support subject-specific encoders as a learned alignment mechanism for EEG decoding and identify head selection for unseen subjects as the remaining bottleneck.

09.
arXiv (CS.CV) 2026-06-17

DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models

Autonomous driving has shifted towards end-to-end policy learning, where reliable, interpretable policy evaluation is a fundamental challenge as driving quality is highly context-dependent. Commonly used rule-based driving metrics like EPDMS are interpretable but lack context-awareness, while recent VLMbased evaluations are context-aware but limited by ambiguous VLM outputs and weak physical grounding. To evaluate driving in a manner that is both interpretable and context-aware, we introduce DriveJudge. DriveJudge is a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning and selectively invokes physically-grounded deterministic rule functions after interpreting the environmental context. To train and evaluate DriveJudge, we curate a large-scale dataset of 33,577 challenging driving samples with human annotations on whether the driving behavior is reasonable in the given scenario. With this dataset, we address the underexplored problem of driving metric evaluation, and introduce two human-aligned benchmark tasks: Driving Quality Classification and Trajectory Preference Selection. DriveJudge outperforms EPDMS for driving quality classification by 21.23 AUC, and the recent VLM-based DriveCritic for trajectory preference selection by 6.5%, setting a new standard for interpretable and precise driving evaluation.

10.
PLOS Computational Biology 2026-06-22

Cell-type resolved transcriptional network analysis of <i>in vivo</i> cellular senescence following injury

作者:

by Alda Sabalic, Victoria Moiseeva, Andres Cisneros, Oleg Deryagin, Eusebio Perdiguero, Pura Muñoz-Cánoves, Jordi Garcia-Ojalvo Identifying the genetic correlates of complex phenotypes is a challenging task. Methods coming from the field of complex networks can help finding such molecular patterns, by revealing statistical associations among groups of genes that correlate with the phenotype. Here we study cellular senescence, a complex cell state whose molecular underpinnings are still under active investigation. We analyze cell type–resolved RNA sequencing data obtained from injured muscle tissue in mice, with a network-based approach that merges eigenvector centrality feature selection and community detection. Our analysis identifies genetic markers that had not been associated with senescence so far, which are validated with existing single-cell RNA sequencing data in a different type of tissue. The identified key genes belong to transcriptional pathways associated with established hallmarks of senescence, and thus can be interpreted as molecular correlates of such hallmarks. The method proposed here could be applied to any complex cellular phenotype even when only bulk RNA sequencing is available, provided the data is resolved by cell type.

11.
arXiv (CS.CL) 2026-06-16

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Improving the reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. We ask whether selectively skipping latent iterations can improve accuracy, and reveal significant potential with an oracle iteration policy that boosts performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration, only at tokens likely to be incorrect after the standard forward pass. During latent iterations, depth-aware Low-Rank Adaptation (LoRA) modules shift the objective from general next-token prediction to focused hard-token refinement. A duo-causal attention mechanism extends attention from the token sequence dimension to an additional iteration depth dimension, enabling cross-iteration information flow with full sequential parallelism. Experiments on nine benchmarks show consistent gains across math, QA, and coding tasks. With identical parameter counts, TaH outperforms always-iterate baselines by 3.8-4.4% while skipping iterations on 93% of tokens, and exceeds single-iteration Qwen3 baselines by 3.0-3.8%. When allowing

12.
arXiv (CS.CV) 2026-06-12

On Pitfalls of $RemOve-And-Retrain$: Data Processing Inequality Perspective

The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, cannot add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.

13.
arXiv (CS.CV) 2026-06-18

Motion-Focused Latent Action Enables Cross-Embodiment VLA Training from Human EgoVideos

Training generalist Vision-Language-Action(VLA) models typically requires massive, diverse robotic datasets with high-fidelity action annotations. While egocentric human manipulation videos are abundant and capture significant environmental diversity, the absence of action labels makes them difficult to use in conventional training paradigms. To address this, we propose a latent-action-based framework designed to extract general action priors from unlabeled human videos. The architecture features a Hybrid Disentangled VQ-VAE that decouples motion dynamics from environmental backgrounds through physical masks, enabling the construction of a cross-embodiment action codebook. By pre-training on human videos with the codebook, the VLM backbone learns deep representations of action intent. For adaptation to specific embodiments, we introduce an intent-perception decoupling strategy where the VLM predicts the action intent while a separate frozen visual encoder provides state-specific features to the action expert, thereby reducing action hallucinations. Results in simulation and real-world environments show that our method, pre-trained exclusively on unlabeled human videos, performs competitively with state-of-the-art VLA models trained on massive annotated datasets, requiring only 50 trajectories for downstream adaptation.

14.
arXiv (CS.CL) 2026-06-12

A Survey on Long-Term Memory Security in LLM Agents: Attacks, Defenses, and Governance Across the Memory Lifecycle

The emergence of writable, cross-session persistent memory in LLM agents introduces a qualitatively different threat landscape from conventional input-centric security concerns, characterized by three properties: persistence, statefulness, and propagation. To systematically characterize this landscape, we propose a Memory Lifecycle Framework that organizes attacks, defenses, and their cross-phase dependencies along two axes: six lifecycle phases (Write, Store, Retrieve, Execute, Share & Propagate, Forget & Rollback) and four security objectives (Integrity, Confidentiality, Availability, Governance). This analysis in turn exposes the need for formal security guarantees at the system level, motivating Verifiable Memory Governance(VMG), a framework of five architectural primitives that specifies what verifiable mechanisms a long-term-memory system must provide to maintain auditable, recoverable control over its memory state. Our analysis indicates that robust Long-Term Memory (LTM) security cannot be retrofitted at retrieval or execution time alone, but must be anchored in storage-time provenance, versioning, and policy-aware retention from the outset.

15.
arXiv (CS.CV) 2026-06-11

Image Quality Assessment of Identity Cards Using Measures from Open Face Image Quality

This paper addresses the challenge of assessing image quality in ID cards in remote verification systems by applying capture-related quality measures from the Open Face Image Quality (OFIQ) standard to ID card images. Our preprocessing pipeline includes corner detection, perspective normalization, and comprehensive foreground masking to ensure accurate and unbiased quality measure computation. We evaluate the effectiveness of these measures by analyzing their correlation with the performance of three presentation attack detection (PAD) algorithms across four diverse ID card datasets, where two datasets contain bona fide, i.e. pristine, images and two contain printed mock ID cards. Our results suggest that quality assessment based on some OFIQ measures can significantly improve PAD performance.

16.
arXiv (CS.AI) 2026-06-16

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

arXiv:2605.22664v3 Announce Type: replace Abstract: LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions. To meet enterprise needs, frontier AI labs have developed agents that can construct entire spreadsheets from scratch. This is especially relevant in finance, where core workflows such as financial modeling, forecasting, and scenario analysis are commonly conducted through spreadsheets. Yet, existing spreadsheet benchmarks do not measure this advanced capability, focusing instead on question-answering or single-formula edits. To address this gap, we provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis. Since deliverables therein are routinely reviewed and revised by multiple stakeholders, judging their quality necessarily involves high-level criteria such as readability or ease of modification. To reflect the multidimensional nature of solution quality, we develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards. The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review, but even the strongest agents frequently fall short of professional finance standards and degrade sharply as the difficulty increases beyond a few chained calculations. This suggests that current agents are not yet able to reliably produce professional-quality spreadsheets at the level of complexity real-world workflows demand.

17.
arXiv (CS.LG) 2026-06-16

QuantKAN: A Unified Quantization Framework for Kolmogorov Arnold Networks

arXiv:2511.18689v3 Announce Type: replace Abstract: Kolmogorov–Arnold Networks (KANs) replace linear weights with spline-based functions, offering strong expressivity but posing challenges for low-precision deployment due to heterogeneous parameter distributions. We introduce QuantKAN, the first unified framework for quantization-aware training (QAT) and post-training quantization (PTQ) of KANs. The framework employs branch-aware quantizers for base and spline parameters and extends modern QAT and PTQ methods to spline-based layers across EfficientKAN, FastKAN, PyKAN, and KAGN. Experiments on MNIST, CIFAR-10/100, TinyImageNet, and ImageNet provide the first unified QAT/PTQ KAN benchmarks and show that DSQ is the most robust QAT method at aggressive low-bit settings, while GPTQ is the strongest PTQ method at moderate precision. Sensitivity analyses reveal architecture-specific failure modes: spline/basis parameters dominate in FastKAN, while base or scaling parameters dominate in EfficientKAN, GRAM, and PyKAN. Vivado HLS estimates on a Xilinx UltraScale+ device further suggest up to 3.32$\times$ throughput and 7.7$\times$ lower estimated dynamic energy per inference under W4A4, exposing a residual basis-evaluation tax that motivates basis-aware microarchitecture. QuantKAN is available at https://github.com/OSU-STARLAB/QuantKAN/.

18.
medRxiv (Medicine) 2026-06-15

Recruitment, Retention Approaches and Community Engagement in the THRIVE pilot Trial: Lessons Learned from a Food is Medicine Trial

Background: Recruitment of underrepresented populations, including Black and Hispanic populations, for Food is Medicine (FIM) and cardiovascular trials, may pose significant challenges. Methods: We implemented a multi-component recruitment approach for the THRIVE (AdapTive personalized dietitian coacHing and messaging with pRoduce prescrIptions to improVE healthy dietary behaviors) pilot trial to engage primarily Black and Hispanic adults in a Food is Medicine for hypertension intervention. The recruitment approaches included community engagement at approximately 40 community events (cultural festivals and neighborhood gatherings); partnerships with 8 community and faith-based service hubs and food distribution sites; recruitment through safety net primary care clinics, digital outreach via the study website, and social media campaigns; and direct recruitment at places of worship. We report lessons learned from the community engagement process, recruitment efficiency, representativeness, and retention outcomes. Results: Within 6 months, the enrollment target was exceeded by 40%, with an accrual index of 1.04. Over 1,000 individuals were reached through the direct-to-community engagement process, while faith-based partnerships engaged about 900 adults. There were 2,673 visits to the study webpage, and social media achieved 12,259 impressions with 399 clicks. About 95% of participants resided within 10 miles of the faith-based recruitment sites. Face-to-face engagement at the food distribution sites within faith-based organizations or community service hubs outperformed digital methods. Faith leader endorsements and follow-up in-person meetings (following unsuccessful email outreach) dramatically increased recruitment. Regarding retention, pre-randomization attrition was 6%, and 82% of participants completed the study. Conclusion: Culturally tailored, community-engaged recruitment grounded in faith-based and local community partnerships, was highly effective in engaging Black and Hispanic populations in this FIM cardiovascular trial. This provides a replicable model for implementing equitable and sustainable cardiovascular health interventions.

19.
arXiv (CS.LG) 2026-06-19

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

arXiv:2605.05481v2 Announce Type: replace Abstract: We revisit a classic "chicken-and-egg" problem in reinforcement learning: to safely improve a policy, the value function must be accurate on the state-visitation distribution of the updated policy. That distribution over states is unknown and cannot be sampled for the purposes of training the value function. Conservative updates solve this problem, but at the cost of shrinking the policy update. This paper explores an alternative solution, Approximate Next Policy Sampling (ANPS), which addresses the problem by modifying the training distribution rather than constraining the policy update. ANPS is satisfied if the distribution of the training data approximates that of the next policy. To demonstrate the feasibility and efficacy of ANPS, we introduce Stable Value Approximate Policy Iteration (SV-API). SV-API modifies the standard approximate policy iteration loop to hold the target policy fixed while an iteratively updated behavioral policy gathers relevant experience. It only commits to a new policy once a convergence criterion has been met. If certain stability criteria are met, the update is guaranteed to be safe; otherwise, it remains no less safe than standard approximate policy iteration. Applying SV-API to PPO yields Stable Value PPO (SV-PPO), which matches or improves performance on high-dimensional discrete (Atari) and continuous control benchmarks while executing substantially larger target policy updates. These results demonstrate the viability of ANPS as a new solution to this classic challenge in RL.

20.
arXiv (CS.CL) 2026-06-16

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning

Post-training LLMs with Reinforcement Learning, specifically Group Relative Policy Optimization (GRPO), has emerged as a paradigm for enhancing mathematical reasoning. However, standard GRPO relies on scalar correctness rewards that are often non-injective with respect to semantic content: distinct reasoning paths receive identical rewards. This leads to a Diversity-Quality Inconsistency, where the policy collapses into a narrow set of dominant modes while ignoring equally valid but structurally novel strategies. To bridge this gap, we propose Diversity-aware Reward Adjustment (DRA), a theoretically grounded framework that calibrates the reward signal using the semantic density of sampled groups. By leveraging Submodular Mutual Information (SMI), DRA implements an Inverse Propensity Scoring (IPS) mechanism that effectively de-biases the gradient estimation. This creates a repulsive force against redundancy, driving the policy to achieve better coverage of the high-reward landscape. Our method is plug-and-play and integrates seamlessly with GRPO variants. Empirical evaluations on five math benchmarks demonstrate that DRA-GRPO consistently outperforms strong baselines, achieving an average accuracy of 58.2% on DeepSeek-R1-Distill-Qwen-1.5B with only 7,000 training samples and $55 cost, highlighting the critical role of diversity calibration in data-efficient alignment. The code is available at https://github.com/xiwenc1/DRA-GRPO.

21.
arXiv (CS.LG) 2026-06-18

Generative models for decision-making under distributional shift

arXiv:2604.04342v2 Announce Type: replace Abstract: Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.

23.
arXiv (quant-ph) 2026-06-15

Electromagnetic Wightman functions and vacuum densities for a brane intersecting the AdS boundary

arXiv:2604.17583v2 Announce Type: replace-cross Abstract: We investigate the combined effects of a brane intersecting the AdS boundary and background gravitational field on the local characteristics of the electromagnetic vacuum. Two types of boundary conditions on the brane are considered, which are higher-dimensional generalizations of the perfect electric (PEC) and perfect magnetic (PMC) boundary conditions in Maxwell's electrodynamics. The brane-induced contributions to the Wightman functions of the vector potential and field tensor are explicitly extracted. Simple expressions in terms of elementary functions are provided. The behavior of the vacuum expectation values (VEVs) is mimicked by a scalar field with a negative effective mass squared determined by the radius of the AdS spacetime. The expectation values of the electric and magnetic fields squares and of the energy-momentum tensor are investigated as local characteristics of the vacuum state. The brane-induced contributions to these VEVs have opposite signs for the PEC and PMC conditions. For the PMC condition, this contribution is negative for the electric field squared and positive for the magnetic field squared. The VEV of the energy-momentum tensor has a nonzero off-diagonal component. The brane-induced vacuum energy density is positive for PMC condition, whereas the normal and parallel stresses change sign as functions of the distance from the brane. Unlike the problem involving a planar boundary in the Minkowski bulk, the vacuum energy-momentum tensor does not vanish in (3+1)-dimensional AdS spacetime.

24.
arXiv (CS.AI) 2026-06-18

How Well Do Large Language Models Capture Human Personality?

arXiv:2606.18263v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to simulate human populations via persona prompting, often under the assumptions that richer persona descriptions improve behavioral fidelity, similarly sized attribute combinations are equally simulatable, and persona definitions generalize across tasks. In this work, we formalize these assumptions and systematically evaluate them across multiple architectures, scales, and simulation settings. We identify a fundamental limitation we term persona manifold collapse, where increasingly expressive persona specifications lead to systematic contraction of representational and behavioral diversity. Across models, increasing persona complexity consistently reduces inter-persona separation in latent space and weakens behavioral differentiation in downstream simulation tasks. These effects persist across multiple analyses as richer personas fail to preserve human subgroup disagreement, performance varies across attribute combinations of similar size, and adding descriptive detail often degrades rather than improves simulation fidelity. Surprisingly, simple Age-Gender personas consistently outperform richly specified Ideal Customer Profiles (ICPs) across industries, achieving substantially higher downstream prediction accuracy. We find that collapse is not uniform across attributes. Certain combinations remain behaviorally stable and preserve stronger alignment with human responses, forming localized regions we term alignment bridges. Together, our results provide empirical and conceptual foundations for understanding the limits of persona-conditioned simulation, highlighting the need for representation-aware persona construction rather than increasing persona expressivity alone.

25.
medRxiv (Medicine) 2026-06-15

Modelling the public-health impact of indoor air quality interventions on respiratory virus transmission

Respiratory virus transmission occurs in indoor settings where ventilation, occupancy, and dwell time determine exposure levels. Improving indoor air quality (IAQ) therefore could help reduce disease burden associated with respiratory viruses, yet its population-level impact remains poorly quantified. Here, we develop an individual-based transmission modelling framework that links within-location airborne dynamics to individual infection risk and population-level spread, whilst explicitly incorporating heterogeneity in ventilation and baseline indoor air quality across locations. We use this modelling approach to evaluate IAQ-improving interventions (air-quality interventions or AQIs), using hypothetical endemic and pandemic pathogen archetypes with properties similar to SARS-CoV-2 and influenza, and evaluate how effects on key epidemiological metrics (such as annualized incidence and epidemic final size) depend on AQI coverage, efficacy and allocation strategy. At 20% AQI intervention coverage and 80% efficacy, annualized incidence was reduced by approximately 7.2% for an endemic 'SARS-CoV-2-like' respiratory virus, and 17.0% for an endemic 'influenza-like' virus; at 60% coverage (80% efficacy) the reductions were 26.3% and 56.4%, respectively. Targeting AQI installation to the highest-risk locations outperformed random allocation: for SARS-CoV-2-like transmission, 20% coverage at 80% efficacy cut absolute incidence by 10.8% when targeted versus 7.2% when random; for influenza-like transmission, this comparison was 28.9% versus 17.0%. In epidemic scenarios, random installation at 40% coverage and 60% efficacy reduced final size by 23.7% (influenza-like) versus 6.3% (SARS-CoV-2-like). These results support treating clean indoor air as core public-health infrastructure and prioritising risk-based deployment of IAQ-improving interventions to maximise population-level benefit within budgetary and operational constraints.