Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Measuring Non-Stabilizerness in an SU(2) Lattice Gauge Theory

arXiv:2606.14842v1 Announce Type: new Abstract: One of the goals of quantum simulation is to provide novel insights into quantum systems, such as the gauge theories that are relevant for high-energy and nuclear physics. Recent years have seen rapid improvements in both the hardware and software necessary for these simulations. A central consideration in the design of such simulations is the quantum complexity of a given quantum state. This work takes a step towards studying a specific kind of complexity, namely the non-stabilizerness, in a simple yet non-trivial system: SU(2) lattice gauge theory of two plaquettes. The non-stabilizerness of low-energy eigenstates is studied and the implications for quantum simulations are discussed. The real-time evolution of this system is simulated on ibm_marrakesh and the non-stabilizerness is measured using a random measurement protocol. New techniques enhancing the efficiency of this protocol are developed, including both a new way to calculate the estimator for non-stabilizerness and a flexible error mitigation technique called Bit String Decoherence Renormalization. This mitigation method is central to accurately resolving the experimental time dependence of non-stabilizerness, and is anticipated to have broad applicability in digital quantum simulations.

02.
arXiv (CS.CV) 2026-06-16

SAMTok: Representing Any Mask with Two Words

Pixel-wise capabilities are essential for building interactive intelligent systems. However, pixel-wise multi-modal LLMs (MLLMs) remain difficult to scale due to complex region-level encoders, specialized segmentation decoders, and incompatible training objectives. To address these challenges, we present SAMTok, a discrete mask tokenizer that converts any region mask into two special tokens and reconstructs the mask using these tokens with high fidelity. By treating masks as new language tokens, SAMTok enables base MLLMs (such as the QwenVL series) to learn pixel-wise capabilities through standard next-token prediction and simple reinforcement learning, without architectural modifications and specialized loss design. SAMTok builds on SAM2 and is trained on 209M diverse masks using a mask encoder and residual vector quantizer to produce discrete, compact, and information-rich tokens. With 5M SAMTok-formatted mask understanding and generation data samples, QwenVL-SAMTok attains state-of-the-art or comparable results on region captioning, region VQA, grounded conversation, referring segmentation, scene graph parsing, and multi-round interactive segmentation. We further introduce a textual answer-matching reward that enables efficient reinforcement learning for mask generation, delivering substantial improvements on GRES and GCG benchmarks. Our results demonstrate a scalable and straightforward paradigm for equipping MLLMs with strong pixel-wise capabilities. Our code and models are available.

03.
medRxiv (Medicine) 2026-06-10

Resolving Diagnostic Discordance in Group 2 Pulmonary Hypertension Through Staged Physiologic Testing: Insights From PVDOMICS

Background World Symposium on Pulmonary Hypertension (WSPH) Group 2 pulmonary hypertension (PH) is a clinically integrated phenotype attributed to left heart disease, whereas pre- versus post-capillary classification is operationalized primarily by pulmonary capillary wedge pressure (PCWP). Although current recommendations emphasize contextual interpretation and provocative testing for intermediate PCWP values, the relationship between PCWP-based classification and underlying phenotype has not been systematically evaluated. We aim to quantify phenotype-hemodynamic discordance across the PCWP spectrum and evaluate a staged physiology-guided framework incorporating inhaled nitric oxide (iNO), ventricular geometry, and provocative testing. Methods We studied 1,032 participants from the NHLBI-sponsored PVDOMICS cohort with multidisciplinary adjudicated phenotypes integrating clinical, imaging, physiologic, and hemodynamic data. Stage-specific PCWP thresholds classified pre- versus post-capillary physiology at rest, during iNO, and during provocation (fluid challenge or invasive cardiopulmonary exercise testing [iCPET]). Echocardiographic right ventricular-to-left ventricular (RV/LV) ratio was evaluated as a marker of ventricular interdependence. Restricted cubic spline and staged concordance analyses defined certainty-based PCWP ranges and incremental diagnostic yield. Results Adjudicated Group 2 phenotype was present in 37.0% of participants. Resting PCWP demonstrated good discrimination (AUC 0.86), but substantial bidirectional phenotype-hemodynamic discordance persisted across intermediate PCWP ranges. At a resting PCWP of 12 mmHg, 25% of participants classified as pre-capillary had adjudicated Group 2 PH, whereas at 18 mmHg, 35% classified as post-capillary remained discordant non-Group 2. Concordance did not approach 90% until PCWP values were 24 mmHg. Dynamic testing incrementally improved concordance within these overlap zones. Nearly half of adjudicated Group 2 PH participants (46.5%) were not identified by resting PCWP alone; incorporation of iNO and provocative testing increased cumulative Group 2 identification by 63.4% and improved sensitivity from 79.9% to 83.7%. Model discrimination improved from an AUC of 0.863 to 0.908 (likelihood-ratio P

04.
arXiv (quant-ph) 2026-06-11

Measurement incompatibility and quantum steering via linear programming

arXiv:2506.03045v3 Announce Type: replace Abstract: The problem of deciding whether a set of quantum measurements is jointly measurable is known to be equivalent to determining whether a quantum assemblage is unsteerable. This problem can be formulated as a semidefinite program (SDP). However, the number of variables and constraints in such a formulation grows exponentially with the number of measurements, rendering it intractable for large measurement sets. In this work, we circumvent this problem by transforming the SDP into a hierarchy of linear programs that compute upper and lower bounds on the incompatibility robustness with a complexity that grows polynomially in the number of measurements. The hierarchy is guaranteed to converge and it can be applied to arbitrary measurements – including non-projective POVMs (Positive Operator-Valued Measures) – in arbitrary dimensions. While convergence becomes impractical in high dimensions, in the case of qubits our method reliably provides accurate upper and lower bounds for the incompatibility robustness of sets with several hundred measurements in a short time using a standard laptop. We also apply our methods to qutrits, obtaining non-trivial upper and lower bounds in scenarios that are otherwise intractable using the standard SDP approach, although such bounds are significantly looser than the ones obtained in the qubit case. Finally, we show how our methods can be used to construct local hidden state models for states (i.e., to prove that a state cannot lead to steering under any possible local measurements), or conversely, to certify that a given state exhibits steering; for two-qubit quantum states, our approach is comparable to, and in some cases outperforms, the current best methods.

05.
arXiv (CS.AI) 2026-06-15

HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

arXiv:2602.05670v2 Announce Type: replace-cross Abstract: Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework designed to capture high-order relations associated with synergistic patterns through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments on 13 test sets show that HyperPotter improves over the baseline on 11 sets, yielding an average relative EER reduction of 12.68\% across all test sets and 22.15\% on the improved sets. These results demonstrate strong cross-scenario generalization, while also revealing robustness limits under severe codec or channel distortion.

06.
arXiv (CS.CV) 2026-06-12

Measurement Plasticity: Sensor-Level Adaptation for Vision-Language Models

We propose Multi-View Physical-prompt (MVP) for Test-Time Adaptation (TTA), a forward-only framework that moves TTA from tokens to photons by treating the camera exposure triangle (i.e., ISO, shutter speed, and aperture) as physical prompts. At inference, MVP acquires selected multiple physical views using a source-affinity score, evaluates digitally augmented variants of each retained view and filters the lowest-entropy predictions, and aggregates predictions with hard voting. This selection-then-vote design is simple, calibration-friendly, and requires no gradients or model modifications. On ImageNet-ES and ImageNet-ES-Diverse, MVP outperforms digital-only TTA on both Auto-Exposure and a combination with conventional sensor control. MVP remains effective under reduced parameter candidates that lower capture latency, demonstrating its practicality.

08.
arXiv (CS.CV) 2026-06-19

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that can jointly learn a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder, our open-source project aims to promote the future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve the transmission efficiency, thereby indeed redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

09.
arXiv (CS.AI) 2026-06-18

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv:2606.18304v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.

10.
arXiv (quant-ph) 2026-06-19

Robust Generation of Topological Biphoton Mode via Adiabatic Passage

arXiv:2606.19786v1 Announce Type: new Abstract: Topological waveguide arrays support robust mode propagation in the presence of fabrication imperfections, providing a significant advantage for on-chip quantum information processing. However, this robustness does not fully extend to nonlinear biphoton generation. Structural disorder can enhance the excitation of non-topological biphoton modes during nonlinear interactions, which degrades the quantum properties of the generated state. To overcome this limitation, we propose an adiabatic passage that connects an isolated site to a topological defect array. By initiating the nonlinear process in a strongly isolated regime, nonlinear coupling to unwanted modes is effectively suppressed, thereby preserving the Schmidt number of the generated state. The subsequent adiabatic connection facilitates the high fidelity transfer of the generated biphoton into the topological biphoton mode. Our numerical simulations demonstrate that, unlike conventional topological structures, the adiabatic scheme maintains both high biphoton fidelity and a unit Schmidt number in the presence of waveguide gap disorder. Furthermore, we show that this robustness extends to path entangled NOON states, achieving a near-unity quantum interference visibility. Our approach provides a practical design strategy for disorder-tolerant integrated quantum photonic devices.

13.
arXiv (quant-ph) 2026-06-15

Trap-Quenched Matter-Wave Optics for Dual Species Lensing

arXiv:2606.14577v1 Announce Type: cross Abstract: Dual-species atom interferometry in space promises precise tests of the Universality of Free Fall (UFF), with a sensitivity that grows quadratically with the extended interrogation time accessible in weightlessness. These tests demand exquisite control over the expansion energies of both condensed sources as well as over their differential center-of-mass dynamics. We propose a trap-quenched collimation technique featuring in-trap excitations of collective modes compatible with state-of-the-art atom-chip setups. Using NASA's Cold Atom Laboratory aboard the International Space Station, we demonstrate it on a single-species $^{87}$Rb condensate. By controlling the center-of-mass release dynamics we observe free expansion times up to 700 ms and measure a two-dimensional expansion energy of $k_B \cdot 78\pm 9 \;\mathrm{pK}$ in the imaging plane. A detailed model of the magnetically-induced dynamics indicates that this corresponds to a two-dimensional expansion energy of about $k_B \cdot 15^{+12}_{-5}\; \mathrm{pK}$ along two of the condensate's eigenaxes. Finally, we theoretically study this trap-quenched collimation scheme for a $^{41}$K-$^{87}$Rb mixture, predicting a simultaneous collimation that meets the expansion energy requirements for a state-of-the-art UFF test at the $10^{-15}$ accuracy level.

14.
arXiv (quant-ph) 2026-06-15

Quantum sensing through bosonic-fermionic Bell-state transitions in two-photon interference

arXiv:2606.14408v1 Announce Type: new Abstract: Hong-Ou-Mandel (HOM) interference has become a central resource for quantum sensing and metrology owing to its sensitivity to temporal delay and photon indistinguishability. However, existing HOM-based sensing schemes generally rely on inserting a sample into one arm of the interferometer, making the measurement vulnerable to optical loss, alignment instability, and bandwidth-dependent distortion of the interference profile. Here, we demonstrate a symmetry-controlled quantum sensing scheme based on continuous transitions between symmetric (bosonic-like) and antisymmetric (fermionic-like) Bell states in two-photon interference. By imprinting a geometric phase onto the classical pump beam and transferring it to polarization-entangled photons generated via spontaneous parametric down-conversion, we coherently tune the exchange symmetry of the entangled state without altering the temporal or spectral indistinguishability of the photons. The HOM response evolves continuously from bunching to antibunching with a sine square phase dependence, producing a coincidence modulation of approximately 10 * 10^4 counts s^-1 counts/s. In contrast to conventional HOM sensing, the phase-modulation linewidth remains fixed at pi/2, independent of photon bandwidth. Using a birefringent crystal placed directly in the pump beam, we measure thermo-dispersive birefringence with a resolution of the order of 10^{-6} over a broad temperature range. Our results establish exchange symmetry as a controllable resource for robust quantum sensing and symmetry-engineered photonic quantum information processing.

15.
arXiv (CS.CL) 2026-06-17

findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding

Syllable-level units offer compact and linguistically meaningful representations for spoken language modeling and unsupervised word discovery, but research on syllabification remains fragmented across disparate implementations, datasets, and evaluation protocols. We introduce findsylls, a modular, language-agnostic toolkit that unifies classical syllable detectors and end-to-end syllabifiers under a common interface for syllable segmentation, embedding extraction, and multi-granular evaluation. The toolkit implements and standardizes widely used methods (e.g., Sylber, VG-HuBERT) and allows their components to be recombined, enabling controlled comparisons of representations, algorithms, and token rates. We demonstrate findsylls on English and Spanish corpora and on new hand-annotated data from Kono, an underdocumented Central Mande language, illustrating how a single framework can support reproducible syllable-level experiments across both high-resource and under-resourced settings.

16.
arXiv (CS.CV) 2026-06-11

Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation

作者:

PCA can be used for rotation invariant features, describing a shape with its $p_{ab}=E[(x_i-E[x_a])(x_b-E[x_b])]$ covariance matrix approximating shape by ellipsoid, allowing for rotation invariants like its traces of powers. However, real shapes are usually much more complicated, hence there is proposed its extension to e.g. $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$ order-3 or higher tensors describing central moments, or polynomial times Gaussian allowing decodable shape descriptors of arbitrarily high accuracy, and their analogous rotation invariants. Its practical applications could be rotation-invariant features to include shape modulo rotation e.g. for molecular shape descriptors, or for up to rotation object recognition in 2D images/3D scans maybe also for 3D scene understanding, or shape similarity metric allowing inexpensive comparison of objects modulo rotation avoiding costly optimization over rotations.

17.
arXiv (CS.CV) 2026-06-16

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary task-level supervision, but applying RL to real-world autonomous driving is challenging in offline settings without interactive simulators, where datasets are dominated by expert actions and provide limited behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples imitation and reward optimization into separate actors to alleviate objective conflicts, uses imagined future rollouts for long-horizon reward estimation, and introduces a competition mechanism that selectively transfers beneficial behaviors while keeping RL anchored to expert-like driving. Experiments on the nuScenes benchmark show that CoIRL-AD consistently improves robustness over strong IL-based baselines, with especially large gains in cross-city generalization and long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.

18.
arXiv (CS.CL) 2026-06-16

Enhancing LLM Safety Through a Theoretical Minimax Game Lens

The rapid advancement of large language models (LLMs) necessitates effective mechanisms to ensure their responsible deployment by accurately distinguishing unsafe content from benign content. While substantial safety datasets are available in English, multilingual safety modeling remains underexplored due to limited open-source safety datasets in other languages. Even within English datasets, safe yet sensitive corner-case content is scarce, leading to shortcut learning by models and non-trivial false-positive rates. To mitigate these issues, we introduce a novel minimax reinforcement learning (RL) framework wherein a data generator and a classifier model co-evolve, facilitating the production of high-quality synthetic multilingual safety data. We theoretically formalize this interaction as a minimax game and rigorously demonstrate convergence to a Nash equilibrium. Empirical evaluations confirm that our synthetic data generation method significantly enhances the classifier model performance, enabling a substantially smaller model to surpass the state-of-the-art by nearly 10% on English benchmarks while achieving 4.5x faster inference speed. These results establish a scalable and efficient methodology for synthetic data generation, advancing the development of safer and more robust multilingual LLM deployments.

19.
arXiv (CS.AI) 2026-06-18

AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces

arXiv:2606.19152v1 Announce Type: cross Abstract: Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, yet exhaustive exploration with ab initio calculations is computationally prohibitive. Machine-learning force fields (MLFFs) accelerate structural relaxation but leave the search over the vast configurational space a major bottleneck, and open-loop large language model (LLM) agents lack a physics-grounded feedback mechanism to correct erroneous initial guesses. We propose AdsMind (Adsorption configuration discovery with Machine intelligence and relaxation feedback), a closed-loop multi-agent framework that enables autonomous error correction through MLFF relaxation feedback. Across four LLM backends, AdsMind achieves consistently high search reliability, with success rates of 100% and 98.8% on the benchmarks AA20 and OCD-GMAE62. Relative to its single-pass (1-Shot) ablation it reduces cross-backend energy dispersion, and it uses only 4.11 and 4.67 MLFF relaxations per case, respectively – an approximately 14-fold reduction over heuristic enumeration baselines. Density functional theory (DFT) validation using VASP/PBE on six representative AA20 systems shows that the reported open-loop Adsorb-Agent outputs exhibit qualitative adsorption-energy sign errors for molecular adsorbates, whereas AdsMind preserves the correct sign in all tested cases with closer quantitative agreement. AdsMind thus delivers reliability, self-reflection, and interpretability simultaneously, supporting more DFT-informed autonomous chemistry workflows.

20.
arXiv (CS.AI) 2026-06-19

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

arXiv:2606.19376v1 Announce Type: cross Abstract: Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.

21.
arXiv (CS.CL) 2026-06-19

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

Online hate speech and misinformation frequently overlap, yet NLP research has mainly treated them in isolation. While LLMs represent a scalable solution for assisting humans in the generation of counterspeech for both threats, zero-shot models frequently generate repetitive and vague responses, underscoring the need for high-quality examples to steer model generation. However, existing counterspeech datasets against the overlap of hate and misinformation are scarce and limited to single-turn English dialogues, while real-life interactions span across multiple turns and languages. To bridge this gap, we introduce the first large-scale, expert-curated, multilingual dataset of dialogues tackling the intersection of hate and misinformation. To ensure factual grounding, the dialogues are also anchored in verified external knowledge (i.e., fact-checking articles and NGO reports) and include document- and chunk-level span annotations, making it directly applicable for RAG systems. Covering five languages and targeting hate directed at seven marginalized groups, this novel resource enables the training and evaluation of more persuasive, factually grounded counterspeech models.

22.
arXiv (CS.AI) 2026-06-17

Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning

arXiv:2606.17591v1 Announce Type: new Abstract: Training-free verbal reinforcement learning enables LLM agents to learn from world feedback – objective signals such as dynamic task outcomes, market returns, or demand forecasts – by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes. However, in non-stationary environments these agents face a retention-forgetting dilemma: retaining stale insights causes negative transfer, while discarding them causes catastrophic forgetting when conditions recur. We identify four requirements for navigating this dilemma – outcome-driven evaluation, persistent structured evidence, non-monotonic knowledge lifecycle, and compositional governance – and show that existing methods invest heavily in experience extraction while underinvesting in insight governance. We propose a three-layer architecture – rules, evidence, and skills – connected by a feedback-driven curation loop that closes the governance gap. Rules capture distilled experience from world outcomes; evidence logs track each rule's reliability across episodes; skills govern which rules to apply, how to resolve conflicts, and when to abstain. On financial forecasting as a case study, where world feedback is naturally abundant, noisy, and non-stationary, we show that the same accumulated experience either degrades performance below the zero-shot baseline or dramatically improves accuracy and risk-adjusted returns, depending on whether the curation loop is present.

23.
arXiv (CS.CV) 2026-06-17

Attention Sinks in Diffusion Transformers: A Causal Analysis

Attention sinks – tokens that receive disproportionate attention mass – are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired, training-free interventions on the score and value paths. Across 553 GenEval prompts on Stable Diffusion~3 (with SDXL corroboration), removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at $k{=}1$; only under stronger interventions ($k\!\geq\!10$) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific – $\sim\!6\times$ larger than equal-budget random masking – revealing an empirical dissociation between trajectory-level perturbation and semantic alignment in diffusion transformers. \footnote{Code available at https://github.com/wfz666/ICML26-attention-sink.}

24.
arXiv (CS.AI) 2026-06-17

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

arXiv:2606.17657v1 Announce Type: new Abstract: People make decisions differently in strategic interactions. Some update beliefs like a Bayesian; others exhibit biases like motivated reasoning. Although creators of large language models use simulated humans for safety evaluations and training, they often fail to cover this breadth of human behavior. We argue that cognitive science and economics provide a convenient tool for doing so, making use of mathematical models of human decision-making. We propose an approach that we call Equation-to-Behavior Prompting for guiding large language models to match cognitive models, and evaluate this approach on persuasion games based on legal decision-making. We find that large models can approximate equation-based specifications – Bayesian updating, affine distortion, motivated updating, and Grether's $\alpha$-$\beta$ model – using prompting, but small models fail to do so. However, training small models with reinforcement learning to adhere to mathematical rules, Equation-to-Behavior RL, reduces belief error by 26.5% in out-of-distribution parameterizations. We show that these simulations can help create diverse training environments; training small models to consider different kinds of decision-makers improves average belief change by 2.5%–12% over Bayesian-only training, even when persuading GPT-5-mini. Our work could improve human simulations for training and evaluation in increasingly realistic settings, and could also enable novel research into more complicated mathematical models of human decision-making.

25.
medRxiv (Medicine) 2026-06-16

Diurnal variation in brain-derived tau and five other blood-based biomarkers for dementia and their association with cognitive performance

Blood-based biomarkers of dementia are a promising scalable tool for early diagnosis, tracking disease progression, and evaluating therapeutic efficacy. Utility of these biomarkers will not only be dependent on the reliability of their association with pathology but also contingent on their ability to track cognitive status. Previously, we demonstrated diurnal variation in several biomarkers (amyloid beta (A{beta}) 42 and 40, 42/40 ratio, glial fibrillary acidic protein (GFAP), neurofilament light (NfL), and phosphorylated-Tau 217 (p-Tau217)) which has implications for their reliability. Here, we extend these observations to a larger cohort, include brain-derived tau (BD-Tau), which is assumed to be produced exclusively in the brain, and report endocrine measures of circadian rhythmicity. We not only assessed whether these biomarkers vary with time of day, but also whether they associate with daytime function and whether these associations vary with cognitive domain and number of repeated assessments. Data collected in 20 PLWA (72.4{+/-}5.9 years, mean{+/-}SD) and 19 controls (68.9{+/-}9.8 years) were analysed. Participants completed 14 days of home monitoring and one laboratory assessment of sleep and daytime function: mood, daytime sleepiness, reaction time, immediate and delayed memory recall, everyday memory errors. During the 27-hour residential laboratory session, 3-hourly blood samples were collected and analysed for the six blood-based biomarkers of dementia as well as melatonin and cortisol. Rhythmicity of melatonin and cortisol did not differ between groups. P-Tau217 and GFAP (p