Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-17

PACE-RAG: Patient-Aware Contextual and Evidence-Constrained RAG for Clinical Drug Recommendation

Drug recommendation requires a deep understanding of individual patient context, especially for complex conditions like Parkinson's disease. While LLMs possess broad medical knowledge, they fail to capture the subtle nuances of actual prescribing patterns. Existing RAG methods also struggle with these complexities because guideline-based retrieval remains too generic and similar-patient retrieval often replicates majority patterns without accounting for the unique clinical nuances of individual patients. To bridge this gap, we propose PACE-RAG (Patient-Aware Contextual and Evidence-Constrained RAG). Rather than directly copying frequent medications from retrieved patients, PACE-RAG personalizes recommendations by first extracting patient-specific clinical features, retrieving cases around these features, and then refining the final prescription using the patient's current symptoms, active medication history, and focus-specific prescribing tendencies. By analyzing treatment patterns tailored to specific clinical features, PACE-RAG generates patient-specific medication recommendations along with an explainable clinical summary. Evaluated on a Parkinson's cohort and the MIMIC-IV benchmark using Llama-3.1-8B and Qwen3-8B, PACE-RAG achieved state-of-the-art performance, reaching F1 scores of 80.84% and 47.22%, respectively. These results suggest that PACE-RAG is a robust and clinically grounded framework for personalized decision support. Our code is available at: https://github.com/ChaeYoungHuh/PACE-RAG.

02.
arXiv (CS.LG) 2026-06-18

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

arXiv:2606.19101v1 Announce Type: cross Abstract: Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

03.
arXiv (CS.CL) 2026-06-12

M\"OVE: A Holistic LLM Benchmark for the German Public Sector

We present M\"OVE (Modelle für die \"Offentliche Verwaltung Evaluieren), a holistic benchmark for evaluating large language models (LLMs) in the context of the German public sector. While LLMs are increasingly adopted in public administration, model selection remains largely ad hoc, and existing benchmarks offer limited guidance: they are predominantly English-centric, US-centric in content, and focus exclusively on task performance. M\"OVE addresses these gaps by evaluating 39 models across two complementary dimensions. Performance criteria cover summarization, question answering, and topic extraction. Governance criteria assess hallucination tendencies, energy consumption, provider transparency, and alignment with German constitutional values and knowledge about positions by German political parties. In total, we utilize ten German-language datasets, including gold- and silverstandard datasets that we constructed to reflect public-administration domains. We employ a multi-metric evaluation strategy combining classical NLP metrics, embedding-based methods, and LLM-as-a-judge approaches. Our results show that no single model dominates across all criteria: top performers differ between tasks, and model size alone is a poor predictor of quality. We further evaluate the benchmark itself, analyzing its statistical precision, LLM judge reliability, the impact of our private datasets on model rankings, the sensitivity of our results to prompt formulation, and the validity of our energy consumption estimates. M\"OVE is designed as a living benchmark under active development; results are publicly available at https://moeve.bundesdruckerei.de/.

04.
arXiv (CS.CL) 2026-06-12

Multi-Turn Reasoning When Context Arrives in Pieces: Scalable Sharding and Memory-Augmented RL

When a user reveals task-critical information across several conversation turns, LLM accuracy drops by up to 65% despite full context availability. We show that this Lost in Conversation degradation can be substantially mitigated by training models to maintain a compact rolling memory instead of attending to a growing history. To make such training scalable, we introduce a low-cost sharding pipeline that converts single-turn QA datasets into multi-turn fragmented-information episodes, eliminating the need for hours of manual annotation. Training only on sharded GSM8K, our memory-augmented policy significantly improves multi-turn accuracy and generalises zero-shot to harder math and out-of-domain long-context QA. Moreover, memory-trained models outperform full-history baselines even when given the full history at test time, suggesting that learning to compress induces more robust incremental reasoning than full-context exposure alone.

05.
arXiv (CS.AI) 2026-06-18

AI-Driven Assessment of Human Tutors: Linking Training Performance to Real-Life Practice

arXiv:2606.18617v1 Announce Type: cross Abstract: There exist numerous tutor training platforms. However, few provide AI-driven training and evaluation for human tutors based on real-life performance. We present an AI-driven system that assesses both open responses during training and authentic real-life tutoring. Unlike platforms that only assess learning through online training or simulations, our system utilizes Generative AI (Gemini-2.5-pro) to analyze transcriptions of authentic tutoring, measuring the transfer of tutor skills to real-life application. Human tutors instructing students remotely in math (N=86) completed six scenario-based lessons, averaging a significant 7.4% learning gain. Using mixed-effects models across 405 session-to-lesson pairs, we found that training performance significantly predicted real-life transcript scores with an effect size of 0.25 SD. Model comparison (AIC/BIC) indicated averaging open response and multiple choice performance during training predicted real-life tutor performance best, although open responses were comparatively more predictive. Exploratory analysis showed that after training, tutors were significantly more likely to encounter pedagogical opportunities to apply their skills (61.1% to 68.9%) and demonstrated higher execution quality within those opportunities (65.5% to 68.1%). Interrupted time series analysis suggested that these tutor improvements were part of a gradual trend over time rather than an immediate intervention effect of training. We illustrate an AI-driven method to link tutor training with real-life assessment. In doing so, we contribute open datasets, AI prompts, and scoring rubrics to support transparency and reproducibility.

06.
arXiv (CS.LG) 2026-06-18

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

arXiv:2606.18438v1 Announce Type: cross Abstract: In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.

07.
arXiv (CS.LG) 2026-06-18

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

arXiv:2606.19092v1 Announce Type: cross Abstract: Chronic disease management relies on regular patient-provider interactions to follow-up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if lab value at current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy. These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.

08.
arXiv (CS.CV) 2026-06-18

Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity

The unstructured and irregular nature of points poses a significant challenge for accurate point cloud quality assessment (PCQA), particularly in establishing accurate perceptual feature correspondence. To tackle this, we propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM). Unlike traditional point-to-point matching, MS-ISSM utilizes radial basis function (RBF) to represent local features continuously, transforming distortion measurement into a comparison of implicit function coefficients. This approach effectively circumvents matching errors inherent in irregular data. Additionally, we propose a ResGrouped-MLP quality assessment network, which robustly maps multi-scale feature differences to perceptual scores. The network architecture departs from traditional flat multi-layer perceptron (MLP) by adopting a grouped encoding strategy integrated with residual blocks and channel-wise attention mechanisms. This hierarchical design allows the model to preserve the distinct physical semantics of luma, chroma, and geometry while adaptively focusing on the most salient distortion features across High, Medium, and Low scales. Experimental results on multiple benchmarks demonstrate that MS-ISSM outperforms state-of-the-art metrics in both reliability and generalization. The source code is available at: https://github.com/ZhangChen2022/MS-ISSM.

09.
arXiv (CS.AI) 2026-06-11

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

arXiv:2606.12146v1 Announce Type: cross Abstract: Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along each axis or empirically mix frequencies, which limits cross-dimensional interactions and yields direction-dependent representations. To address these limitations, we propose nD-RoPE, a decomposition-free generalization of RoPE to arbitrary dimensions. From a translation-invariant formulation in continuous Hilbert space, we derive a spectral condition for isotropy that requires treating positions and frequencies as coupled \(n\)-dimensional vectors. We instantiate this formulation with a multi-scale regular-simplex wave-vector design, which provides non-degenerate spatial coverage and a symmetric, directionally balanced second-order response. Experiments across images, videos, and point clouds demonstrate consistent performance gains and improved generalization in high-dimensional settings.

10.
arXiv (CS.AI) 2026-06-17

Confusion-Aware Transfer Teacher Curriculum Learning Framework: Disentangling Scoring and Pacing Effects

arXiv:2606.17706v1 Announce Type: cross Abstract: Curriculum learning couples two design choices, how samples are scored by difficulty and how harder samples are paced into training, making it difficult to attribute observed gains to either component. We disentangle these factors with two evaluation protocols: stage-wise test subsets that validate scoring functions independently of curriculum training, and a baseline that applies the same pacing schedule to randomly ordered data. Within the Transfer Teacher framework (TTF), we use these protocols to evaluate a confusion-aware difficulty score that considers both correct-class confidence and the probability distribution over incorrect classes. On CIFAR-10 with ResNet-18 and VGG-16, the proposed score produces model-interpretable difficulty rankings that align with human intuition. However, at full data, neither curriculum nor anti-curriculum ordering improves accuracy over standard training, indicating that improving the scoring function alone is insufficient to overcome the known failure modes of curriculum learning in TTF. In contrast, We find that confusion-aware curriculum ordering result in consistent data-efficiency benefits, outperforming random ordering by up to 8.7% points at the 20% data regime, suggesting the potential of TTF as a data-efficient training method.

11.
arXiv (CS.AI) 2026-06-19

Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models

arXiv:2606.19793v1 Announce Type: cross Abstract: The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid DNN/HMM sequence discriminative training. This paper presents a comprehensive investigation of various combinations of acoustic features tailored to different Acoustic Models, offering suitable feature selections for each. The incorporation of Pitch features notably improved recognition performance, especially for sentence recognition tasks involving dysarthric speech. Through a systematic examination of the TORGO database, we have demonstrated the potential to enhance the performance of the state-of-the-art Factorized Time Delay Neural Network (F-TDNN) model for recognizing dysarthric speech. Our methods, implemented with the F-TDNN model, resulted in a 4.65\% relative improvement in isolated word recognition and a 4.63\% relative improvement in sentence recognition for dysarthric speech, compared to previous research. This improvement effectively compensates for speech variability, attributable to our deliberate selection of the number of overlapping frames between consecutive training example chunks.

12.
arXiv (CS.LG) 2026-06-18

On the Residual Scaling of Looped Transformers: Stability and Transferability

arXiv:2606.18524v1 Announce Type: new Abstract: Looped (weight-tied) Transformers apply a shared residual block $N$ times ($h \leftarrow h + \varepsilon\,f(h)$, same $f$ at each step), increasing effective depth without adding parameters. Prior depth-scaling analyses prescribe $\varepsilon = 1/\!\sqrt{L}$ for depth-$L$ residual networks. We show that this is insufficient for looped architectures: weight sharing makes residual updates correlated across iterations, requiring the stronger scaling $\varepsilon = 1/N$. For multi-layer blocks ($L$ unique layers looped $N$ times), we derive a factored parameterization $\varepsilon = \lambda/(N\!\sqrt{L})$ that separates the two sources of growth: $1/N$ controls the within-layer loop correlation, and $1/\!\sqrt{L}$ controls the across-layer variance. A key consequence is that the optimal learning rate depends only on the number of unique layers $L$, not on the loop count $N$, enabling direct hyperparameter transfer from small to large $N$ without retuning. Experiments on looped Transformers confirm that $1/N$ scaling improves trainability and yields better loss than $1/\!\sqrt{N}$ scaling across loop counts.

13.
arXiv (CS.LG) 2026-06-11

Deep Learning of Solver-Aware Turbulence Closures from Nudged LES Dynamics

arXiv:2604.23874v3 Announce Type: replace-cross Abstract: The differentiable physics paradigm may be leveraged as an a-posteriori approach for discovering turbulence closure models by embedding a neural network parameterization directly inside the solver and optimizing it given potentially sparse target data. This addresses a key limitation of a-priori learning where direct numerical simulation (DNS) data is used to approximate the subgrid stress with the assumption of a low-pass filter. Closures trained in this a-priori manner frequently lead to unstable deployments due to the mismatch between the assumed filter and the effect of numerical discretizations and coarse-graining. In comparison, while typically stable during deployment, a-posteriori learning incurs high computational costs due to the need to backpropagate through a large eddy simulation (LES) solver. Furthermore, a-posteriori methods are challenging to apply broadly since they require significant modification of existing solvers. Finally, both approaches are limited when generalization is desired across different numerical schemes with their implicit filtering characteristics. In this work, we present a deep-learning approach for turbulence closure modeling built on the continuous data assimilation framework. Our approach enables the a-priori training of closures using sparsely observed DNS data without modifying or differentiating through the LES solver, while preserving stability during deployment for the recovery of invariant statistics. We focus on the model's ability to adapt to different discretizations by explicitly conditioning it on the numerical scheme. We use two- and three-dimensional canonical cases to test our framework and show that the learned correction systematically tracks the discretization error of the coarse solver.

14.
arXiv (math.PR) 2026-06-16

On the empirical spectral distribution of matrix perpetuities

arXiv:2605.31054v2 Announce Type: replace Abstract: We study matrix perpetuities, that is, solutions to affine fixed-point equations of the form \[ \mathbf{X} \stackrel{d}{=} \mathbf{A}\,\mathbf{X} \,\mathbf{A}^\top+\mathbf{B},\qquad (\mathbf{A},\mathbf{B})\mbox{ and }\mathbf{X} \mbox{ are independent}, \] with particular emphasis on the empirical spectral distribution of the solution. We first establish existence and uniqueness results by relating the problem to classical vector perpetuities, and then develop tools that preserve the matrix structure under orthogonal invariance. For positive semidefinite, orthogonally invariant models, we obtain power-law tail asymptotics for the expected empirical spectral distribution and show that the tail is governed by the largest eigenvalue. We also prove that, in the subcritical regime, the expected empirical spectral distribution of matrix perpetuities converges weakly, as the dimension tends to infinity, to the distribution of the corresponding free perpetuity. Our results are illustrated by matrix Beta prime perpetuities, for which explicit limiting spectral distributions are available.

15.
arXiv (CS.CV) 2026-06-11

Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection

Vision-language models (VLMs) are increasingly used for scene understanding in autonomous driving, but robustness analysis often relies on task-agnostic embedding stability alone. We study whether corruption-induced embedding drift predicts changes in a task-aligned hazard score derived from CLIP image-text similarities. Using controlled corruptions on BDD100K road scenes, we compare embedding drift against margin drift, defined as the change in hazard score under perturbation. The relationship is highly corruption-dependent: some families exhibit strong coupling between representation drift and decision drift, while others induce hazardous decision instability despite relatively modest embedding change. Furthermore, corruption families differ in failure direction: most suppress hazard detections via false negatives, while occlusion instead triggers false alarms, suggesting that benchmark design should account for asymmetric failure modes, not just overall instability rates. These results suggest that robustness benchmarks should include task-aligned stability measures in addition to embedding-level perturbation statistics.

16.
arXiv (quant-ph) 2026-06-16

A Gauge-Covariant Geometric Framework for Non-Hermitian Quantum Systems

arXiv:2606.15922v1 Announce Type: new Abstract: We develop a comprehensive, gauge-covariant geometric framework for non-Hermitian quantum systems in the quasi-Hermitian regime, that is, the region of parameter space where the non-Hermitian Hamiltonian admits a real spectrum and a positive-definite metric operator. We build this framework by elevating the Dyson map to a central geometric object. This map is the transformation that converts a non-Hermitian Hamiltonian into an equivalent Hermitian one. From it we construct the Dyson connection and decompose it into Hermitian and anti-Hermitian parts, identified respectively as {\it stretching } and {\it rotation } components. This decomposition cleanly separates the genuine physical metric deformations from the unitary gauge redundancies. Working with manifestly gauge-covariant states, we then derive the complex non-Hermitian Berry phase and the quantum geometric tensor (QGT), and show that the non-Hermitian geometric curvature originates from the non-commutativity of the stretching components at the operator level. We further analyse the geometric singularities near an exceptional point (EP) and uncover a distinct hierarchy of divergences. For a general two-level non-Hermitian model, the quantum metric tensor (QMT) exhibits a leading-order divergence $\sim |\epsilon_\mu|^{-2}$, while the Berry curvature shows a weaker, subleading divergence $\sim |\epsilon_\mu|^{-3/2}$, with $\epsilon_\mu$ denoting the parameter displacement from the EP along an individual parameter axis $\mu$. Finally, we examine physical realizations of this model, including the non-Hermitian Su–Schrieffer–Heeger (SSH) and Hatano–Nelson (HN) models, where exact analytical results confirm the predicted critical scaling laws and illustrate the metric-deformation-driven non-Hermitian geometries.

17.
arXiv (CS.AI) 2026-06-19

QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval

arXiv:2606.19733v1 Announce Type: cross Abstract: Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a formidable challenge in multimedia analysis. Existing approaches predominantly follow a "scene-level embedding" paradigm, which requires distilling high-dimensional semantic features into every 3D primitive. This strategy suffers from a fundamental architectural bottleneck: memory and computational costs scale linearly with scene complexity, inevitably triggering out-of-memory (OOM) failures in city-scale environments. To address this barrier, we propose QueryGaussian, a training-free framework for expeditious and scalable open-vocabulary 3D instance retrieval. Unlike holistic semantic distillation, QueryGaussian employs an instance-level query mechanism that decouples semantic understanding from geometric representation. Specifically, we leverage pre-trained 2D vision models to interpret user prompts and lift segmentation masks into 3D via a concurrent maximum-weight association strategy, ensuring semantic-visual consistency. To mitigate projection ambiguity, we introduce a temporal fusion module with multi-stage adaptive density clustering. Experimental results demonstrate that QueryGaussian not only matches the accuracy of state-of-the-art methods but also delivers a decisive efficiency leap, reducing GPU memory usage by over 70% and accelerating inference by 180x. Crucially, QueryGaussian enables expeditious instance retrieval on city-scale scenes containing tens of millions of Gaussians using consumer-grade hardware.

18.
arXiv (CS.LG) 2026-06-12

Efficient Stochastic Optimisation via Sequential Monte Carlo

arXiv:2601.22003v2 Announce Type: replace-cross Abstract: The problem of optimising functions with intractable gradients frequently arises in machine learning and statistics, ranging from maximum marginal likelihood estimation procedures to fine-tuning of generative models. Stochastic approximation methods for this class of problems typically require inner sampling loops to obtain (biased) stochastic gradient estimates, which rapidly becomes computationally expensive. In this work, we develop sequential Monte Carlo (SMC) samplers for optimisation of functions with intractable gradients. Our approach replaces expensive inner sampling methods with efficient SMC approximations, which can result in significant computational gains. We establish convergence results for the basic recursions defined by our methodology which SMC samplers approximate. We demonstrate the effectiveness of our approach on the reward-tuning of energy-based models within various settings.

19.
arXiv (CS.LG) 2026-06-12

A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling

arXiv:2606.12710v1 Announce Type: new Abstract: Diffusion models provide expressive data-driven priors for Bayesian inverse problems, but many diffusion posterior samplers rely on heuristic guidance approximations that can fail for nonlinear operators and multimodal posteriors. In this work, we develop a stabilized path-space framework for diffusion-based posterior sampling. Starting from a base diffusion process whose terminal marginal represents the prior, we define a likelihood-weighted target measure on trajectories and cast posterior sampling as learning a controlled stochastic process whose path measure matches this target. This formulation connects diffusion posterior sampling to stochastic optimal control while preserving the Bayesian structure needed for uncertainty quantification. We introduce a time reparameterization that makes the path-space control problem well posed by removing the bias induced by the unknown initial value function, without auxiliary training. We then learn the control via a trust-region path-space optimization method with log-variance objectives. The path-space perspective also unifies our learned control approach with existing guidance-based samplers, quantifies the sampling error induced by approximate controls, and yields importance sampling corrections for asymptotically exact posterior expectations. We evaluate the proposed framework on a suite of benchmark inverse problems with analytically characterized or high-quality reference posteriors, enabling principled assessment of sampling accuracy and uncertainty quantification. These experiments provide insight into the behavior of diffusion-based posterior samplers and demonstrate improved accuracy and robustness over leading approaches.

20.
arXiv (CS.AI) 2026-06-19

RACL: Reasoning-Agent Control Layers for Continuous Metaheuristic Learning

arXiv:2606.20142v1 Announce Type: new Abstract: This paper introduces RACL, a Reasoning-Agent Control Layer for metaheuristics. RACL places a reasoning agent above an existing optimizer. The agent does not replace the optimizer and does not modify business constraints. Instead, it controls the optimizer's internal search behavior by observing operational memory, reasoning over past behavior, formulating bounded hypotheses, testing interventions, evaluating outcomes, applying guardrails, consolidating useful policies and explaining its decisions. The experiment uses vehicle routing as a testbed, but the contribution is not a new routing solver, a particular ALNS configuration or a specific set of routing rules. The contribution is the RACL method: a way for a reasoning agent to discover, validate, consolidate and explain algorithmic control rules for a metaheuristic. In the current experimental setting, RACL improves or ties the Operational Memory Policy in 21 of 21 feasible cases and improves or ties a non-reasoning Stagnation-Triggered Policy in 18 of 21 feasible cases, with an average RACL vs STP cost delta of -0.641%. In the Sevilla-9/10 runtime sample, RACL improves average cost by -8.337% versus Fixed and -1.605% versus STP without showing material computational overhead. During the proof-of-concept, Codex was used as an in-the-loop reasoning agent observing executions, interpreting logs and proposing live bounded interventions. The policy proxy was later used only to make quantitative evaluation reproducible.

21.
arXiv (CS.CL) 2026-06-19

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship

Large language models (LLMs) increasingly review and revise text, including their own. A documented self-preference bias (models favoring their own generations when acting as judges) raises the question of whether models also resist valid corrections to their own writing. We test this in a setting where "valid" is decided not by another model but by a deterministic verifier: instruction-following revision on IFEval. A model writes a draft; the official IFEval checker confirms the draft violates a constraint and that a candidate edit fixes it; the model then accepts or rejects that edit either as the genuine in-context author or as a fresh model that sees the draft neutrally. Across four mid-tier model families and 85 author-versus-fresh comparisons, we find no detectable self-preference: authors reject verified-good fixes to their own drafts at essentially the same rate as fresh models judging the same drafts (gap -5.1 pp, 95% CI [-12.9, +2.7]). A self-skepticism hint from a smaller pilot did not replicate at scale. The one robust observation is qualitative: when authors do reject a verified-good fix, 97% of their stated reasons are flaw-catching rather than preference, that is, about the character of rejections, not an elevated rate. Effects smaller than ~13 pp cannot be excluded at this sample size.

22.
arXiv (CS.CV) 2026-06-16

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs

True general intelligence requires not only a model of the physical world but also a social world model: the capacity to infer how individual mental states interact and crystallize into group-level outcomes. Despite notable progress in individual-level Theory of Mind (ToM) reasoning, existing multimodal large language models fail at this broader task. Collective behavior emerges non-linearly from social tensions, conformity dynamics, and structural constraints, meaning it cannot be recovered by merely summing individual intentions. We present GroupToM-Bench, the first multimodal benchmark for group-level ToM, built around a causal chain spanning micro-level BDI states (belief, desire, intention), meso-level group tension and structural constraints, and macro-level outcome prediction and mechanistic attribution. To probe this full arc, we develop a seven-level cognitive audit framework. Experiments reveal a gap between current models and human baselines, highlighting a failure to process social structures and non-linear collective dynamics.

23.
arXiv (CS.CL) 2026-06-16

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.

24.
arXiv (quant-ph) 2026-06-15

Digital programming of spin correlations in a fermionic lattice quantum simulator

arXiv:2606.13772v1 Announce Type: cross Abstract: Analog quantum simulation provides a highly controlled platform to study diverse quantum many-body phenomena. However, current methods for state initialisation are limited to thermal ensembles or uncorrelated product states. Here we present a hybrid approach that complements analog preparation with a digital quantum-gate protocol. This approach enables the engineering of target states with specific, long-range spin-correlations from the same initial resource state. By applying collisional gates to adiabatically prepared and filtered four-fermion singlet chains, we program diverse spin-correlation patterns, including that of a Heisenberg chain. We measure the spin correlations using a sequence of quantum gates followed by singlet-pair measurements. Our method paves the way to the targeted preparation of strongly correlated states of matter.

25.
arXiv (math.PR) 2026-06-11

On the spatio-temporal increments of nonlinear parabolic SPDEs and the open KPZ equation

arXiv:2508.05032v3 Announce Type: replace Abstract: We study spatio-temporal increments of the solutions to nonlinear parabolic SPDEs on a bounded interval with Dirichlet, Neumann, or Robin boundary conditions. We identify the exact local and uniform spatio-temporal moduli of continuity for the sample functions of the solutions. These moduli of continuity results imply the existence of random points in space-time at which spatio-temporal oscillations are exceptionally large. We also establish small-ball probability estimates and Chung-type laws of the iterated logarithm for spatio-temporal increments. Our method yields extension of some of these results to the open KPZ equation on the unit interval with inhomogeneous Neumann boundary conditions. Our key ingredients include new strong local non-determinism results for linear stochastic heat equation under various types of boundary conditions, and detailed estimates for the errors in linearization of spatio-temporal increments of the solution to the nonlinear equation.