Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-24

Universal Dynamical Response to Slow Driving in Chaotic Systems

arXiv:2606.23810v1 Announce Type: cross Abstract: We propose a unified perspective on classical and quantum chaos based on the stability of a system's stationary states under slow driving. We probe this sensitivity via the system's susceptibility to the average protocol speed, which we call the ``speed-Fisher information," and relate it to irreversible entropy production in the system. We show that chaotic dynamics manifests as a divergence of the speed-Fisher information with the protocol time, and that this response is controlled by the perturbation's low-frequency spectral weight. This approach to chaos applies to both classical and quantum Hamiltonian systems, and naturally extends to non-Hamiltonian classical flows. We illustrate this framework with simple classical and quantum examples, along with a non-Hamiltonian flow that qualitatively exhibits analogous low-frequency spectral behavior.

02.
arXiv (CS.LG) 2026-06-17

QueryMarket: Cost-Aware Online Active Learning in Data Markets

arXiv:2606.17805v1 Announce Type: new Abstract: Data acquisition is a major bottleneck for learning in real-time streams: analysts must decide on the fly which labels to purchase while respecting a rolling budget. However, existing online active learning rarely unifies pricing, information gain, and rolling budget constraints under concept drift. We introduce QueryMarket, a market-inspired framework that queries each incoming data point based on its estimated utility to the model and its price. Within this framework, we propose OVBAL (online variance-based active learning), which integrates data pricing with information-driven selection by estimating each sample's marginal utility via a D-optimality criterion with exponential forgetting and executing cost-aware purchases under rolling budget constraints. OVBAL yields a simple, fully online decision rule that adapts to nonstationary streams and heterogeneous label costs. Experiments on synthetic data and a real-world solar power generation forecasting task show that OVBAL is particularly effective under seller-centric pricing and yields a more favorable long-run error-cost trade-off in the real-world task under both pricing schemes.

03.
arXiv (CS.AI) 2026-06-19

FFinRED: An Expert-Guided Benchmark Generation and Evaluation Framework for Financial LLM Red-Teaming

arXiv:2606.19887v1 Announce Type: cross Abstract: Existing safety benchmarks target general adversarial scenarios but miss finance-specific risks. Financial LLMs face regulatory compliance violations, fraud facilitation, and systemic trust erosion that require targeted evaluation. We introduce FinRED, an expert-guided red-teaming framework for financial LLM safety evaluation developed with financial experts. FinRED uses a novel two-level taxonomy mapping global standards (e.g., FATF and EU DORA) to threats ranging from regulatory evasion to complex fraud, integrated with a scalable pipeline that converts real financial documents into context-rich red-teaming Behavioral Prompts (seeds) through an expert-defined schema. Rigorous expert validation confirms seed plausibility and realism for meaningful LLM safety evaluation. We also provide an expert-validated, finance-specific rubric that goes beyond disclaimer checks, aligns more closely with human experts than static one-size-fits-all rubrics, and reduces critical false negatives from 28 to 12. Aligned with internationally adopted risk-management and information-security standards (e.g., ISO/IEC 27001), FinRED is deployed in South Korea's Financial Security Institute (FSI) regulatory sandbox for generative AI security evaluation in real financial services. To mitigate dual-use risks, the dataset, generation pipeline, prompt template, and evaluation framework are gated for qualified researchers at https://github.com/selectstar-ai/FinRED-paper and https://huggingface.co/datasets/datumo/FinRED.

04.
arXiv (CS.AI) 2026-06-11

Market Design for AI: Beyond the Copyright Binary

arXiv:2606.12260v1 Announce Type: cross Abstract: How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and – by modeling as a static Stackelberg game – strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance – a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

05.
arXiv (CS.AI) 2026-06-24

On the Position Bias of On-Policy Distillation

arXiv:2606.22600v2 Announce Type: replace-cross Abstract: On-Policy Distillation (OPD) improves the learning efficiency of standard reinforcement learning through dense, token-level supervision from teachers. In the standard KL objective of OPD, token-level losses are uniformly averaged, implying equal weights for all tokens. However, we discover that not all tokens are created equal: as student rollouts grow longer, they deviate further from the teacher's distribution, leading to degraded supervision quality at later positions. As a result, OPD using only the first 30% of tokens can perform comparably to using all tokens, whereas OPD using only the last 30% of tokens barely learns anything. In this work, we provide a principled understanding of this issue through the lens of constrained optimization. Based on these insights, we derive Importance-Weighted On-Policy Distillation (IW-OPD), in which the weight assigned to each token depends on the accumulated discrepancy between the student's and teacher's distributions, naturally upweighting earlier tokens and downweighting later ones with larger deviations. We show that IW-OPD converges significantly faster than OPD, with better learning efficiency, and achieves better final performance than standard OPD in both same-size and cross-scale settings, improving performance up to 6.9 points on AIME-2025.

06.
arXiv (CS.LG) 2026-06-18

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

arXiv:2606.18898v1 Announce Type: new Abstract: Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, or healthcare. Real-world data is often sparse, irregularly sampled or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on Latent SDEs that projects the observed time series on a continuous-time stochastic dynamical system, directly being able to handle missing observations and irregular sampling, while also naturally capturing possible cyclic behavior that many real-world use cases inherently possess. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while performance significantly degrades for the tested baseline methods. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in presence of real-world irregularities.

07.
arXiv (CS.AI) 2026-06-11

The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning

arXiv:2505.03296v2 Announce Type: replace-cross Abstract: We present Mixture of Discrete-time Gaussian Processes (MiDiGap), a novel approach for flexible policy representation and imitation learning in robot manipulation. MiDiGap enables learning from as few as five demonstrations using only camera observations and generalizes across a wide range of challenging tasks. It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug. MiDiGap learns these tasks on a CPU in less than a minute and scales linearly to large datasets. We also develop a rich suite of tools for inference-time steering using evidence such as collision signals and robot kinematic constraints. This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer. MiDiGap achieves state-of-the-art performance on diverse few-shot manipulation benchmarks. On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%. On multimodal tasks, it improves policy success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success. We make the code publicly available at https://midigap.cs.uni-freiburg.de.

08.
arXiv (CS.LG) 2026-06-15

Cluster LOCO: Feature Importance For Interpreting Clusters

arXiv:2606.14592v1 Announce Type: cross Abstract: Clustering is widely used for exploratory analysis and scientific discovery, driving insights from market segmentation to biological data analysis, but its outputs can be difficult to interpret, audit, and reproduce as modern datasets become increasingly large and complex. Reliable use of clustering requires understanding which features drive the discovered structure, yet feature-level explanations for clustering remain scarce compared with methods in supervised learning. Furthermore, existing clustering feature importance scores are often tied to specific algorithms and data assumptions. To address these challenges, we propose Cluster LOCO (Leave-One-Covariate-Out), a family of model-agnostic feature importance scores for clustering. Cluster LOCO is built on feature occlusion and clustering generalizability, defined as whether cluster labels learned on one subset of the data can be accurately predicted on held-out samples. For any chosen clustering algorithm, Cluster LOCO quantifies a feature's importance by measuring how much its removal degrades generalizability. We first introduce Cluster LOCO-Split, which relies on data splitting, and then extend it to Cluster LOCO-MP, a minipatch ensemble-based version designed for large-scale data. Across synthetic simulations and an application to cell-type discovery in single-cell transcriptomics, we show that Cluster LOCO more reliably recovers informative features than existing clustering feature importance methods.

09.
medRxiv (Medicine) 2026-06-17

Determinants of non-utilization of insecticide-treated nets among children under five in Rwanda: analyses of the 2024 Rwanda malaria indicator survey

Background Insecticide-treated nets (ITNs) are effective for preventing malaria among children under five years, who bear a disproportionate burden of malaria. This study assessed the prevalence and determinants of ITN non-utilization among children under five in Rwanda using data from the 2024 Rwanda Malaria Indicator Survey (RMIS).Methodology This cross-sectional study utilized nationally representative data from the 2024 RMIS. Analyses were restricted to children under five residing in households that owned at least one ITN. The outcome was non-utilization of ITN, defined as not sleeping under an ITN the night preceding the survey. Survey-weighted descriptive statistics were used to estimate the prevalence of ITN non-utilization. Factors associated with non-utilization were identified using a survey-weighted Poisson regression model. Adjusted prevalence ratios (aPRs), 95% confidence intervals and p-values were reported.Results A total of 1,979 children were included in the study. The weighted prevalence of ITN non-utilization among children under five years was 20.11% (95% CI: 17.81 - 22.63). After adjusting for other factors, children aged 2 - 3 years were associated with an 83% higher prevalence of ITN non-utilization compared with those aged [&le;]1 year (aPR = 1.83, 95% CI: 1.423 - 2.352, p < 0.001). Compared with households that owned only one ITN, children in households with three or more ITNs were associated with a 76% lower prevalence of ITN non-utilization (aPR = 0.24, 95% CI: 0.171 - 0.332, p < 0.001). Children living in households with 5 - 7 members were associated with an 87% higher prevalence of ITN non-utilization compared with those in households with 1 - 4 members (aPR = 1.87, 95% CI: 1.476 - 2.358, p < 0.001).Conclusion The findings suggest that ITN utilization among children is influenced not only by household access to nets but also by household composition and dynamics that shape the allocation and use of available preventive resources.

10.
arXiv (CS.CV) 2026-06-11

Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation

作者:

PCA can be used for rotation invariant features, describing a shape with its $p_{ab}=E[(x_i-E[x_a])(x_b-E[x_b])]$ covariance matrix approximating shape by ellipsoid, allowing for rotation invariants like its traces of powers. However, real shapes are usually much more complicated, hence there is proposed its extension to e.g. $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$ order-3 or higher tensors describing central moments, or polynomial times Gaussian allowing decodable shape descriptors of arbitrarily high accuracy, and their analogous rotation invariants. Its practical applications could be rotation-invariant features to include shape modulo rotation e.g. for molecular shape descriptors, or for up to rotation object recognition in 2D images/3D scans maybe also for 3D scene understanding, or shape similarity metric allowing inexpensive comparison of objects modulo rotation avoiding costly optimization over rotations.

11.
arXiv (CS.AI) 2026-06-12

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

arXiv:2606.12418v1 Announce Type: cross Abstract: The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze 23000+ posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns - romantic relationships, careers, exams, and in-game gacha draws - via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with Barnum and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.

12.
arXiv (CS.CV) 2026-06-17

DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models

Autonomous driving has shifted towards end-to-end policy learning, where reliable, interpretable policy evaluation is a fundamental challenge as driving quality is highly context-dependent. Commonly used rule-based driving metrics like EPDMS are interpretable but lack context-awareness, while recent VLMbased evaluations are context-aware but limited by ambiguous VLM outputs and weak physical grounding. To evaluate driving in a manner that is both interpretable and context-aware, we introduce DriveJudge. DriveJudge is a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning and selectively invokes physically-grounded deterministic rule functions after interpreting the environmental context. To train and evaluate DriveJudge, we curate a large-scale dataset of 33,577 challenging driving samples with human annotations on whether the driving behavior is reasonable in the given scenario. With this dataset, we address the underexplored problem of driving metric evaluation, and introduce two human-aligned benchmark tasks: Driving Quality Classification and Trajectory Preference Selection. DriveJudge outperforms EPDMS for driving quality classification by 21.23 AUC, and the recent VLM-based DriveCritic for trajectory preference selection by 6.5%, setting a new standard for interpretable and precise driving evaluation.

13.
arXiv (CS.LG) 2026-06-16

Finite-Width Neural Tangent Kernels from Feynman Diagrams

arXiv:2508.11522v4 Announce Type: replace Abstract: Neural tangent kernels (NTKs) are a powerful tool for analyzing deep, non-linear neural networks. In the infinite-width limit, NTKs can easily be computed for most common architectures, yielding full analytic control over the training dynamics. However, at infinite width, important properties of training such as NTK evolution or feature learning are absent. Nevertheless, finite width effects can be included by computing corrections to the Gaussian statistics at infinite width. We introduce Feynman diagrams for computing finite-width corrections to NTK statistics. These dramatically simplify the necessary algebraic manipulations and enable the computation of layer-wise recursion relations for arbitrary statistics involving preactivations, NTKs and certain higher-derivative tensors (dNTK and ddNTK) required to predict the training dynamics at leading order. We demonstrate the feasibility of our framework by extending stability results for deep networks from preactivations to NTKs and proving the absence of finite-width corrections for scale-invariant nonlinearities such as ReLU on the diagonal of the Gram matrix of the NTK. We numerically implement the complete set of equations necessary to compute the first-order corrections for arbitrary inputs and demonstrate that the results follow the statistics of sampled neural networks for widths $n\gtrsim 20$.

14.
arXiv (CS.AI) 2026-06-12

Stubborn: A Streamlined and Unified Reinforcement Learning Framework for Robust Motion Tracking and Fall Recovery for Humanoids

arXiv:2606.12814v1 Announce Type: cross Abstract: Recent reinforcement learning approaches have shown great promise in improving humanoid motion tracking performance and achieving fall recovery under disturbances. However, most existing works treat motion tracking and fall recovery as different tasks and require multi-stage training with specialized recovery rewards and/or separate recovery policies. Moreover, existing reinforcement learning-based methods often terminate training episodes immediately after severe tracking failures, limiting recovery-oriented exploration in unstable or fallen states. To address the above issues, we propose Stubborn, a streamlined and unified reinforcement learning framework to achieve robust humanoid motion tracking and fall recovery. Specifically, Stubborn uses an asymmetric Actor-Critic architecture and consists of three major components. First, a yaw-aligned tracking representation is adopted to reduce sensitivity to global drift and heading disturbances while preserving gravity-related balance information. Second, we introduce a Bernoulli-based probabilistic termination mechanism that enables the policy to encourage exploration of fall-recovery behaviors under varying failure modes. Third, we propose a probabilistic termination and tracking-error-driven strategy that dynamically reshapes the sampling distribution based on tracking performance, increasing the training efficiency for difficult motion segments and unstable states. Extensive comparisons with SOTA methods and ablation studies show that Stubborn achieved competitive performance, and the proposed probabilistic termination mechanism and adaptive sampling strategy contributed to the performance and robustness gains. For real-world demonstrations, please refer to https://aislab-sustech.github.io/Stubborn/.

15.
arXiv (CS.CV) 2026-06-11

The Latent Color Subspace: Emergent Order in High-Dimensional Chaos

Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.

16.
arXiv (CS.AI) 2026-06-17

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

arXiv:2605.26195v2 Announce Type: replace-cross Abstract: LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce \textsc{CyberEvolver}, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. \textsc{CyberEvolver} addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate \textsc{CyberEvolver} on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, \textsc{CyberEvolver} improves the seed agent's success rate by $13.6$\,\% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.

17.
Nature (Science) 2026-06-17

Reimagining machine vision with optical computing

作者: 未知作者

A general-purpose artificial-intelligence vision system for use in image-sensing devices has been developed by embedding fundamentals of core computer-vision operations into a light-manipulating planar material called an optical metasurface. A prototype enables accurate, real-time perception and processing across diverse tasks, suggesting that this could be a solution for rapid, low-energy, on-device vision intelligence. A specialized ‘metasurface’ can preprocess incoming scene information on image-generating devices.

18.
arXiv (CS.CL) 2026-06-11

SAGE: Answer-Conditioned Uncertainty Targets for Verbal Uncertainty Alignment

Large language models increasingly express uncertainty through natural-language statements, yet these expressions often fail to reflect the model's sampled behavior. We study verbal uncertainty alignment as a distributional calibration problem: the appropriate uncertainty target for a prompt should be estimated from repeated model outputs rather than from an isolated response. However, group rollouts alone are insufficient, since the resulting target must provide a useful training signal. Existing targets only partially satisfy this requirement. We propose SAGE, Semantic-Answer Guided Entropy, a group-level uncertainty target that constructs an answer-conditioned uncertainty geometry over sampled responses. SAGE preserves categorical, numeric, and symbolic answer distinctions while maintaining a smooth and scale-preserving calibration signal. We further apply this target through Group-Uncertainty Preference Optimization, or GUPO, an uncertainty-channel training framework that supervises verbal uncertainty expressions rather than the full response. Experiments across factual, mathematical, and multiple-choice reasoning tasks show improved uncertainty ranking, lower calibration error, and reduced overconfidence.

19.
arXiv (CS.AI) 2026-06-11

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

arXiv:2507.21164v2 Announce Type: replace-cross Abstract: Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that couples representation learning with an analytically solvable One-Class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a \deleted{new} benchmark based on MNIST-C, and a challenging brain MRI \deleted{subtle} lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds to target small, non-hyperintense lesions, while we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and texture or population age variations in MRI. Results demonstrate performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning.

20.
medRxiv (Medicine) 2026-06-22

Anterior-superior hypothalamic enlargement as specific marker in episodic migraine: converging evidence from an independent discovery-replication design

Background: Growing evidence implicates the hypothalamus as a key structure in migraine pathophysiology; however, our understanding of its precise role and of the specific nuclei involved remains limited. We combined MRI data from our laboratory with publicly available MRI datasets from OpenNeuro to examine hypothalamic subunit volumes in episodic migraine and assess the specificity of these alterations relative to chronic pain conditions. Methods: Structural MRI combined with an automated atlas-based segmentation algorithm and a discovery-replication design was employed to investigate cross-sectional volumetric differences across 5 bilateral hypothalamic subunits in two independent migraine cohorts: DS1-MIG (DS1-MIG-base, n = 111 patients, n = 35 controls) and DS2-MIG (n = 27 patients, n = 31 controls). The adjusted volumes were compared between groups using MANOVA as an omnibus test, followed by Welch t-tests to test univariate follow-up. Longitudinal volumetric changes were additionally assessed in DS1-MIG participants with available follow-up scans using linear mixed models. To assess the specificity of findings to migraine, the same pipeline was applied to two chronic pain datasets, one including patients with fibromyalgia (DS-FM, n = 33 patients, n = 33 controls) and the other including patients with trigeminal neuralgia (n = 119 patients, n = 55 controls). Results: MANOVA revealed significant multivariate group differences in the discovery and replication migraine cohorts (DS1-MIG-base: = .006; DS2-MIG: = .008). Follow-up univariate analyses identified a consistent enlargement of the left anterior-superior subunit across both cohorts (FDR = .023 in DS1-MIG-base and FDR = .046 in DS2-MIG), representing the only cross-cohort replication finding. Beyond this shared signature, DS2-MIG exhibited additional significant enlargements of the right anterior-inferior and right tubular-inferior subunits. Longitudinal analyses in DS1-MIG showed that hypothalamic subunit volumes remained broadly stable over time within both migraine patients and control participants. No significant volumetric alterations were detected in the fibromyalgia or trigeminal neuralgia cohorts, either in multivariate or univariate analyses, underscoring migraine-specific findings. Conclusions: These findings provide evidence for subunit-specific hypothalamic structural alterations in migraine localized in the left anterior hypothalamic subunit. The stability of these differences over time and their absence in other chronic pain conditions suggest a migraine-specific structural organisation of hypothalamic circuitry.

21.
arXiv (CS.AI) 2026-06-16

WorkflowPerturb: Calibrated Stress Tests for Evaluating Multi-Agent Workflow Metrics

arXiv:2602.17990v2 Announce Type: replace Abstract: Multi-agent LLM systems that generate structured workflows from natural-language requests are now deployed in production across cloud automation, DevOps, and enterprise process orchestration. Operating such systems exposes a recurring change-management problem. Routine updates, such as re-running the same input, swapping the underlying LLM, or refactoring an agent's prompt or orchestration code, frequently produce workflows that differ substantially from previously validated references. Engineers are then left without a principled way to decide whether a change is safe to ship. Automatic workflow evaluation is the natural tool for answering this question. In practice, however, metric scores are poorly calibrated, and a numeric change rarely communicates the severity of the underlying degradation. We introduce WorkflowPerturb, a controlled benchmark for studying workflow evaluation metrics by applying realistic, graded perturbations to golden workflows. WorkflowPerturb contains 4,973 golden workflows and 44,757 perturbed variants across three perturbation types (Missing Steps, Compressed Steps, and Description Changes), each applied at severity levels of 10%, 30%, and 50%. We benchmark multiple metric families and analyze their sensitivity and calibration using expected score trajectories and residuals. Our results characterize systematic differences across metric families and support severity-aware interpretation of workflow evaluation scores in change-management settings. Our dataset will be released upon acceptance.

22.
arXiv (math.PR) 2026-06-11

Percolation phase transition on planar spin systems

arXiv:2105.13314v2 Announce Type: replace Abstract: In this article we study the continuity and sharpness of the phase transition for percolation models defined on top of planar spin systems. The two examples that we treat in detail concern the Glauber dynamics for the Ising model and a Dynamic Bootstrap process. For both of these models we prove that their phase transition is continuous and sharp, providing also quantitative estimates on the two point connectivity. The techniques that we develop in this work can be applied to a variety of different percolation models based on spin-flip dynamics. We also discuss some of the problems that can be tackled in a similar fashion.

23.
arXiv (CS.CV) 2026-06-11

From Nominal Intensity to Equivalent Rainfall: A Path-Based Credibility Evaluation Framework for Simulated Rainfall in Autonomous-Driving Perception Tests

Credible simulated-rainfall conditions are essential for identifying perception-system boundaries and supporting SOTIF-oriented risk assessment in automated driving. However, closed-field tests are often described only by nominal rainfall intensity or single-point measurements, making it difficult to align simulated rain fields with real rainfall and map test results to real-world scenarios. This paper proposes a path-based credibility evaluation method for simulated rainfall in autonomous-driving perception tests. Using the drop size and velocity joint distribution of real rainfall as the reference, each candidate path is represented by path-equivalent rainfall intensity, an uncertainty band, and a path-averaged Realism of Raindrop Distribution (RRD) score. Lidar target point-cloud count and mean reflectivity are further used for perception-consistency correction, quantifying the proxy capability of each simulated-rainfall path for real-rainfall perception effects. Experiments are conducted using about 10,000 real-rainfall raindrop-spectrum samples, 728 RainSense perception samples, and 45 spatial sampling points in a 2.4 m x 7.2 m simulated-rainfall area. Results show that spatial non-uniformity remains under the same nominal condition, confirming the need for path-based evaluation. The method identifies Path IV and Path VI as preferable candidates, with results of 11.54 +/- 0.31 mm/h, RRD = 0.43, and 8.28 +/- 0.34 mm/h, RRD = 0.46, respectively. These paths show more balanced performance in rainfall-intensity stability, raindrop-spectrum realism, and perception consistency. The proposed method supports path selection, condition description, and credible interpretation of autonomous-driving perception tests under rainfall.

24.
arXiv (CS.CL) 2026-06-16

Evaluative Judgement in Teaching AI-based Translation: A Class-room Case Study of AI-Mediated Translation and Post-Editing

作者:

Drawing on 23 anonymized student pro-jects from a fourth-year Machine Transla-tion and Post-editing course in a BA-level translation programme, this paper exam-ines how structured comparison of gen-eral-purpose LLMs and online MT sys-tems can elicit evaluative judgement in AI-mediated translation. Students translat-ed short specialised English Wikipedia texts into Catalan or Spanish, generated four system outputs, evaluated them using automatic metrics and human adequa-cy/fluency assessment, selected one output for post-editing, and justified their deci-sion in written reports. Descriptive counts are reported for all 23 projects, while qualitative interpretation is based on the 22 cases accompanied by written reports. Results show that students did not treat automatic metrics as final authority: final post-editing selections often diverged from metric rankings and were justified through adequacy, fluency, terminology, naturalness, and expected post-editing ef-fort. The study therefore does not bench-mark systems under controlled conditions; it analyses how students justified system choice within an authentic classroom as-signment.

25.
arXiv (CS.AI) 2026-06-15

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

arXiv:2606.01730v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used as heuristic advisors for black-box optimization, yet their suggestions and self-reported confidence are not necessarily calibrated to downstream objective values. This issue becomes more pronounced in multi-objective Bayesian optimization, where different objectives may require different expert knowledge and where an LLM expert can be useful for one objective but misleading for another. We study how to use LLM-generated expert priors in discrete multi-objective Bayesian optimization without blindly trusting them. We propose an objective-wise reputation-market mechanism that treats each expert-objective pair as a falsifiable prior source. Expert weights are updated online from observed objective feedback, discounted over time, and gated by market-level trust. We then introduce a decoupled counterfactual gate that can use the LLM prior without confidence, use it with confidence, or abstain from the LLM prior entirely. Across controlled synthetic stress tests and three molecule optimization benchmarks with \qwenflash{}-generated expert priors, we find that dynamic objective-wise calibration improves robustness over fixed LLM priors. However, raw LLM confidence is not reliably beneficial: on ESOL, confidence is positively correlated with prediction error; on FreeSolv, confidence can help; and on Lipophilicity, ignoring confidence remains strongest. Our fixed three-arm counterfactual gate improves over the first counterfactual variant on ESOL and FreeSolv, while an attempted margin portfolio exposes a useful negative result: margin selection should be acquisition-aware rather than based only on one-step prior error.