Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-17

From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing

arXiv:2503.05598v2 Announce Type: replace-cross Abstract: This review examines neural operator architectures for learning solution operators of parametric partial differential equations (PDEs), with an emphasis on conceptual clarity and practical implementation. The work analyzes key models, including DeepONet, PCANet, and the Fourier Neural Operator, highlighting their underlying representations, computational structures, and comparative performance. These architectures are demonstrated on three canonical PDE problems: the Poisson equation, a linear elasticity problem, and a hyperelasticity problem. To make the presentation self-contained, key foundational topics are introduced, including finite-dimensional representations of function spaces, singular-value decomposition, and sampling from infinite-dimensional function spaces. Beyond forward modeling, the review discusses the use of neural operators as surrogate models within a Bayesian inverse-problem framework, including prior specification, forward-map approximation, and posterior computation. The performance of the three neural-operator architectures is evaluated on in-distribution samples, out-of-distribution samples, and Bayesian inference tasks. The review also discusses challenges related to prediction accuracy and generalization, outlining emerging strategies such as residual-based error correction and multi-level training. The review concludes by positioning neural operators within broader scientific-computing workflows and by identifying directions for reliable, scalable operator learning.

02.
arXiv (CS.LG) 2026-06-17

Randomized Midpoint Method for Log-Concave Sampling under Constraints

arXiv:2405.15379v3 Announce Type: replace-cross Abstract: In this paper, we study the problem of sampling from log-concave distributions supported on convex and compact sets, with a particular focus on the randomized midpoint discretization of both overdamped and kinetic Langevin diffusions in constrained domains. We revisit the proximal framework for handling constraints through projection operators and develop a more general formulation that encompasses Euclidean, Bregman, and Gauge projections. The resulting smooth approximation allows a unified and tractable analysis of Langevin algorithms and their variants under constraints. Within this framework, we establish convergence guarantees in Wasserstein-$q$ $(q\geqslant 1)$ distances between the smooth surrogate and the target distribution. We further derive complementary lower bounds, showing that the results are near-optimal in order. Building upon this tight approximation analysis, we obtain new convergence guarantees for the randomized midpoint Langevin algorithms and refined bounds for both vanilla and kinetic Langevin Monte Carlo methods under constraints, thereby advancing the theoretical understanding of constrained diffusion-based sampling.

03.
arXiv (CS.CV) 2026-06-15

CausalMotion: Structured Physical Reasoning as Keyframe and Trajectory Guidance for Training-Free Video Generation

Recent advances in diffusion-based video generation have significantly improved visual quality and short-term temporal coherence. However, existing methods still struggle to produce videos with physically consistent and causally plausible dynamics, especially in scenarios involving long-horizon interactions. This limitation arises from the fact that video diffusion models primarily learn physical consistency implicitly, while vision-language models can directly model physical laws. Based on this idea, in this work, we propose CausalMotion, a training-free framework that injects explicit physical reasoning into video generation through structured intermediate representations. Our key idea is to decouple reasoning from generation by leveraging a vision-language model to decompose a text prompt into a sequence of causally consistent keyframes and object-centric motion trajectories. These representations are then aligned and integrated as soft constraints to guide a pretrained video diffusion model during inference. This design enables explicit modeling of object dynamics and causal transitions without requiring additional training or supervision. Extensive experiments show that our method consistently improves physical plausibility and temporal coherence, particularly in dynamics-intensive scenarios, while maintaining high perceptual video quality.

04.
arXiv (CS.AI) 2026-06-18

The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot

arXiv:2606.19197v1 Announce Type: cross Abstract: Abduction is a central approach to explain missing entailments from a knowledge base by providing a hypothesis, that would, if added to the knowledge base, make the missing entailment become true. Abduction under repair semantics has recently been investigated in detail, where several desirable properties and optimality criteria were considered, such as signature-restrictions and minimality in size and of introduced conflicts. Naturally, hypotheses that satisfy more than one of these properties or combine a property with an optimality criterion would be even more desirable for applications. So far, such hypotheses have not been investigated in the literature. In the present paper, we consider the ABox abduction problem for hypotheses satisfying more than one property or additional optimality criteria, for EL_bot under brave and AR semantics. Our main observation is that often requiring additional properties for hypotheses does not lead to an increase of complexity.

05.
medRxiv (Medicine) 2026-06-22

Artificial Intelligence-Enabled Cardiac Function Estimation from Phone Videos of Echocardiograms

Importance: Mobile phone-recorded echocardiogram videos are commonly used in point of care, telemedicine, and resource-limited workflows, but artificial intelligence models for left ventricular ejection fraction (LVEF) estimation have primarily been evaluated on native Digital Imaging and Communications in Medicine (DICOM) videos. Objective: To evaluate whether previously described artificial intelligence models for LVEF estimation retain performance when applied to mobile phone-recorded echocardiographic videos. Design: Multicenter model validation study comparing model-estimated LVEF with clinician reported LVEF. Setting: Three medical centers: Kaiser Permanente Northern California, Beth Israel Deaconess Medical Center through MIMIC-IV-ECHO, and Cedars-Sinai Medical Center. Participants: Source studies with clinician reported LVEF and apical 4-chamber or apical 2-chamber views, yielding 6209 phone-recorded videos from 2648 studies and 2611 patients. Exposures: Mobile phone recording of native echocardiographic videos and fine-tuning of pretrained models using mobile phone-recorded videos from the Kaiser Permanente Northern California training cohort. Main Outcomes and Measures: Mean absolute error in ejection fraction percentage points, R^2 for continuous estimation, and area under the receiver operating characteristic curve for identifying ejection fraction greater than 50%. Results: The study included 6209 mobile phone recorded echocardiographic videos from 2648 studies and 2611 patients; the weighted mean age was 68.4 years, and 1031 patients were male (39.5%). Without phone-video fine-tuning, the primary model achieved a mean absolute error of 7.00 percentage points, coefficient of determination of 0.49, and area under the receiver operating characteristic curve of 0.91 on phone-recorded videos; corresponding native DICOM performance was 6.08 percentage points, 0.60, and 0.93, respectively. On the 2396-video fine-tuning evaluation cohort, fine-tuning improved primary model performance to a mean absolute error of 6.96 percentage points, coefficient of determination of 0.61, and area under the receiver operating characteristic curve of 0.93. Fine-tuning the public EchoNet-Dynamic model improved performance from 9.36 percentage points, 0.37, and 0.84 to 7.86 percentage points, 0.50, and 0.89, respectively. Progressive central zoom preprocessing degraded model performance. Conclusions and Relevance: These findings suggest that artificial intelligence assisted left ventricular ejection fraction estimation from mobile phone-recorded echocardiograms may be feasible when native image export is unavailable, although prospective evaluation is needed before clinical deployment.

06.
arXiv (CS.CL) 2026-06-12

Language Model Circuits Are Sparse in the Neuron Basis

The high-level concepts that a neural network uses to perform computation need not be aligned to individual neurons (Smolensky, 1986). Language model interpretability research has thus turned to techniques which decompose the neuron basis into more interpretable units of model computation, such as sparse autoencoders (SAEs). However, not all neuron-based representations are uninterpretable. For the first time, we empirically show that MLP neurons are as sparse a feature basis as SAEs. We use this finding to develop an end-to-end gradient-based attribution pipeline for circuit tracing on the MLP neuron basis, which surfaces causally effective neurons on a variety of tasks. On a standard subject-verb agreement benchmark (Marks et al., 2025), a circuit of $\approx 10^2$ MLP neurons is enough to control model behaviour. On the multi-hop city-state-capital task from (Lindsey et al., 2025), we find a circuit in which small sets of neurons encode specific latent reasoning steps (e.g. mapping a city to its state), and can be steered to change the model's output. This work thus advances automated interpretability of language models without imposing additional training costs.

07.
arXiv (CS.AI) 2026-06-16

Frame-Conditioned Moral Computation in LLaMA 3.1-8B-Instruct: A Mechanistic Interpretability Audit of Ethical Reasoning

arXiv:2606.15507v1 Announce Type: new Abstract: Behavioral audits of Large Language Models on moral prompts measure what the model says, not the internal computation producing it. We use Transluce, an AI-driven mechanistic-interpretability platform, to examine LLaMA 3.1-8B-Instruct on 54 moral prompts in four batteries: 17 dilemmas, policy, and meta-ethical questions (B1); 6 role-playing scenarios (B3); and a controlled trolley contrast varying the switching mechanism with people fixed (B4, 15 prompts) or identity attributes with mechanism fixed (B5, 16 prompts). Two complementary metric families, five cluster-level metrics and a six-metric neuron-level panel, converge on a Situational Anchor Effect: domain-specific representations dominate the top of the activation list across every battery. The model's ethics-labeled capacity stays essentially constant; its salience (rank, priority, top-of-list presence) is highly sensitive to the interpretive frame the prompt selects. The B4-vs-B5 contrast confirms the model attends to whichever surface feature varies: aggregate ethics metrics are indistinguishable, but the dominant non-ethics distractor mirrors the design. A multi-temperature audit identifies a candidate ethics neuron (L16/N3837) stable across temperatures; a cross-model behavioral proxy on two frontier models yields preliminary evidence of divergence in self-reported moral focus, consistent with an Alignment Wrapper in which RLHF re-orders surface text without removing underlying domain-first frames. We unify these as Frame-Conditioned Moral Computation: the prompt's surface vocabulary selects a feature manifold, and the moral conclusion is downstream of that selection. Behavioral alignment must be supplemented by Mechanistic Alignment: a research program asking whether ethics-related features can be shown causally privileged under controlled frame variation, not merely loud in the explanation.

08.
arXiv (CS.AI) 2026-06-11

A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

arXiv:2606.12040v1 Announce Type: new Abstract: The design of reinforced concrete highway barriers is a safety-critical process that requires strict compliance with regulatory provisions such as the AASHTO-LRFD bridge design guidelines. Current engineering practice relies heavily on manual, iterative, and heuristic calculations to satisfy complex nonlinear material and mechanics constraints. Although Large Language Models (LLMs) demonstrate strong generative capabilities, their direct application to structural engineering remains limited by hallucination risks and insufficient physical grounding. To address these challenges, this study proposes a novel "generation-evaluation-optimization" closed-loop framework for automated concrete barrier design using the multi-agent orchestration capabilities of AutoGen. Experimental results demonstrate that the proposed agentic framework achieves over 98% design accuracy, significantly outperforming standalone general-purpose LLMs. More importantly, the study reveals that design performance is not necessarily correlated with model scale, where an 8B-parameter lightweight model could outperform unconstrained 631B-parameter flagship models. This finding highlights the potential to substantially reduce computational costs while improving the accessibility of AI-assisted engineering tools for industry applications. The source code for the proposed multi-agent design framework is available at the project GitHub repository: https://github.com/MXY820/barrier-design. Keywords: Structural Engineering; Multi-Agent Systems; Large Language Models; Concrete Barrier Design; AutoGen; Design Automation.

09.
arXiv (CS.CV) 2026-06-12

ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection

Time-series anomaly detection (TSAD) is critical in domains such as industrial monitoring, healthcare, and cybersecurity, but it remains challenging due to rare and heterogeneous anomalies and the scarcity of labelled data. This scarcity makes unsupervised approaches predominant, yet existing methods often rely on reconstruction or forecasting, which struggle with complex data, or on embedding-based approaches that require domain-specific anomaly synthesis and fixed distance metrics. We propose ASTER, a framework that generates pseudo-anomalies directly in the latent space, avoiding handcrafted anomaly injections and the need for domain expertise. A latent-space decoder produces tailored pseudo-anomalies to train a Transformer-based anomaly classifier, while a pre-trained LLM enriches the temporal and contextual representations of this space. Experiments on three benchmark datasets show that ASTER achieves state-of-the-art performance and sets a new standard for LLM-based TSAD.

10.
arXiv (CS.CL) 2026-06-16

When the Same Musical Knowledge Forgets Differently: A Clean Probe of Pathway-Dependent Forgetting

A model can learn that the piano piece Für Elise is calm and reflective by listening to the audio or by reading a text description, but does it matter which route that knowledge took when it is later at risk of being forgotten? Forgetting research in multimodal models measures what knowledge is lost under adaptation, yet has not asked whether acquisition route affects how easily that knowledge is forgotten. We call this untested premise the Pathway-Invariant Assumption. Music understanding enables a clean test because a music clip and a canonical text description can be aligned to the same perceptual content, allowing the same knowledge unit to enter a model through listening or reading while the target remains fixed. Across multiple architecturally distinct audio-language models, we observe a consistent asymmetry: text-pathway knowledge is forgotten more than matched audio-pathway knowledge under identical adaptation pressure. To attribute this effect to route rather than confounds, we introduce the Paired Pathway Controlled Protocol (PPCP), a three-phase design that establishes matched pathway baselines, activates both pathways under symmetric supervision on the same knowledge pool, and applies identical forgetting pressure to both pathways. The gap is stable across models and gain-controlled analyses, persists when contradictory overwrite is replaced by correct-label cross-domain learning, remains under single-modality pressure, and is not removed by lightweight replay. Two independent routing-depth controls confirm that the effect is not explained by architectural depth, pointing to input representation as the dominant factor. Under PPCP, our results demonstrate that forgetting is highly route-dependent, establishing acquisition route as a new analytical dimension for forgetting research and multimodal system design.

11.
arXiv (CS.LG) 2026-06-16

GRASP: Gradient-Aligned Sequential Parameter Transfer for Memory-Efficient Multi-Source Learning

arXiv:2606.14900v1 Announce Type: new Abstract: Multi-source transfer learning faces a fundamental scalability bottleneck: existing approaches require either loading all K source models into memory simultaneously during parameter fusion, requiring O(K) memory, or deploying all models at inference time, making production deployment infeasible. We propose GRASP (Gradient-Aligned Sequential Parameter Transfer), which achieves superior knowledge integration while maintaining O(1) memory consumption through three key innovations: (1) sequential processing that merges one source at a time into an evolving target model, (2) parameter-wise gradient alignment that selectively transfers only parameters whose optimization directions align with the target domain, avoiding negative transfer, and (3) iterative fine-tuning that adapts transferred knowledge before integrating the next source. Extensive experiments across three continual learning benchmarks (Yearbook, CLEAR-10, CLEAR-100) spanning 10 to 108-year temporal distribution shifts and four architectures (1.3M to 25.6M parameters) demonstrate that GRASP achieves 93.5% mean accuracy over all datasets and architectures compared to ensemble method's 71.7% accuracy while requiring only constant memory versus K models for standard multi-source fusion. Critically, GRASP's sequential previously merged models and scales to arbitrarily many sources without memory growth, making it uniquely suitable for resource-constrained deployment and continually evolving source domains.

12.
arXiv (CS.AI) 2026-06-16

ALCL: An Adaptive Log-Correntropy Loss for Robust Learning under Non-Gaussian Noise

arXiv:2606.16050v1 Announce Type: cross Abstract: Robust deep learning under heavy-tailed and impulsive noise remains challenging because conventional losses such as mean squared error (MSE) exhibit unbounded sensitivity to outliers. Although correntropy-based objectives improve robustness, existing formulations rely on fixed kernel parameters that must be empirically tuned and remain static during training. To address these limitations, we propose an Adaptive Log-Correntropy Loss (ALCL), a heavy-tailed loss formulation that adaptively learns its robustness geometry during optimization. ALCL introduces a logarithmic residual model whose shape and scale parameters are learned jointly with network weights through differentiable reparameterization. This yields a principled maximum likelihood formulation whose influence function is formally bounded and redescending, allowing the loss geometry to adapt dynamically to evolving residual statistics while suppressing extreme outliers. Comparative experiments on four widely used benchmark datasets spanning grayscale and red-green-blue (RGB) image data under mixed heavy-tailed and impulsive noise demonstrate that ALCL consistently outperforms MSE and optimally tuned generalized correntropy losses in both reconstruction fidelity and downstream classification accuracy. While performance differences remain small under low-noise conditions, under high-noise regimes ALCL improves median accuracy by up to 4.75% on grayscale benchmarks and 4.51% on RGB datasets, with reduced variance across runs. These results demonstrate that adaptive robustness through joint learning of loss parameters provides a computationally efficient alternative to static correntropy-based losses for deep learning in non-Gaussian environments.

13.
arXiv (CS.CV) 2026-06-17

OpenTie: Open-vocabulary Sequential Rebar Tying System

Robotic practices on the construction site emerge as an attention-attracting manner owing to their capability of tackling complex challenges, especially in the rebar-involved scenarios. Most of existing products and research are mainly focused on the collection of large amounts of data with model training demands. To fulfill this gap, we propose OpenTie, a 3D training-free rebar tying framework utilizing a RGB-to-point-cloud generation and an open-vocabulary rebar detection on the real-world test. We implement the OpenTie via a robotic arm with a binocular camera and guarantee a high accuracy by applying the prompt-based object detection method on the image filtered by our proposed post-processing procedure for the image-to-point-cloud generation framework. Our pipeline requires no training efforts and outperforms the training-based object detection, i.e., YOLO-based method, with the verification on the real-world sequential rebar tying test. The system is flexible for horizontal and vertical rebar tying tasks and holds the potential application to the real construction site with possibility of commercialization.

14.
arXiv (CS.AI) 2026-06-16

Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning

arXiv:2606.15107v1 Announce Type: new Abstract: Time series data in real-world deployments is overwhelmingly irregular. Observations are asynchronous, missing values are informative rather than random, and sampling frequencies vary across sensors and operational windows. However, existing Time Series Question Answering (TSQA) benchmarks mostly assume regularly sampled inputs, leaving a fundamental gap in understanding how large language models (LLMs) and AI agents perform under irregular conditions. To bridge this gap, we introduce IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains. IRTS-ToolBench is designed to be used independently by any researcher working on LLM-based irregular time series analysis, providing standardized inputs and a reproducible evaluation protocol. Code can be found in https://github.com/SanhornC/IRTS-ToolBench.

15.
arXiv (CS.LG) 2026-06-16

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

作者:

arXiv:2605.04998v2 Announce Type: replace-cross Abstract: This revision updates a pop-to-jazz chord-generation rehearsal study. Best-epoch metrics still show that modest pop rehearsal preserves pop accuracy while improving jazz prediction, but v2 corrects released-checkpoint selection: the released F1 equals Phase 0, F2 had a transcription error, and ft-pop80-v2 restores a hash-distinct jazz-adapted F1 across 3 seeds.

16.
arXiv (CS.LG) 2026-06-11

NetBurst: Event-Centric Forecasting of Bursty, Intermittent Time Series

arXiv:2510.22397v2 Announce Type: replace-cross Abstract: Network operators monitor their infrastructure by collecting telemetry data such as packet counts, byte rates, or flow volumes, yet answering the questions that effective operations demand – forecasting future load, diagnosing and characterizing anomalies, and searching for and retrieving historical precedents – requires more than raw measurements. Bridging this gap calls for learned representations: compact per-entity summaries that capture temporal dynamics from each entity's univariate time series. Time-series foundation models are the natural starting point, but they are designed for dense, periodic benchmark datasets – the mild statistical regime. However, network telemetry data inhabits the wild regime: operationally relevant events are rare, separated by variable-length stretches of low or no activity (``ebbs''), with intermittent bursts of heavy-tailed extremes (``tides''). We present NetBurst, an event-centric pipeline that collapses ebbs, separates each time series into a stream of burst timings and a stream of burst magnitudes, and learns a single representation serving all three operational tasks. Compared to the strongest competitors among eight baselines – including Amazon's Chronos-2 and Datadog's Toto – and across nine production telemetry configurations, NetBurst reduces median forecasting error by $1.3$–$116\times$ on wild-regime data with a $1.0$–$7.5\times$ better match to the true burst distribution, and matches baselines on mild-regime benchmarks. For characterizing anomalies, NetBurst produces balanced, well-spread clusters that are $16\times$ more describable in operator-familiar terms under a novel interpretability score, and cluster-filtered search delivers $7.5\times$ faster end-to-end retrieval.

17.
arXiv (quant-ph) 2026-06-17

Learning Arbitrary Lindbladians with Quantum Error Correction

arXiv:2606.18188v1 Announce Type: new Abstract: We study ansatz-free Lindbladian learning, the problem of reconstructing the generator of an open quantum system without prior knowledge of its Hamiltonian or dissipator structures. This problem exhibits two distinct information-theoretic precision limits: Hamiltonian components unmasked by dissipation are Heisenberg-limited, while the remaining Lindbladian components are subject to the quadratically worse standard quantum limit. Existing approaches that attain these optimal scalings strongly rely on pre-specified structure of interaction and noise, leaving the ansatz-free setting an open problem. In this work, we present the first standard-quantum-limited algorithm for learning arbitrary sparse Lindbladians. Under an additional physically motivated regularity condition, our framework also learns the Hamiltonian component disjoint from the dissipator at the Heisenberg limit, without prior knowledge of either the Hamiltonian or dissipator supports. Our main technical ingredient is a recursive random stabilizer-code construction that suppresses the strongest Lindbladian terms while preserving sensitivity to weaker unknown ones. These results establish a scalable framework for characterizing unknown open quantum systems, with quantum error correction serving as a key learning primitive.

18.
arXiv (quant-ph) 2026-06-16

Counterdiabatic Raman Atom Optics for Compact High-Sensitivity Gravimetry

arXiv:2606.16945v1 Announce Type: new Abstract: Large-momentum-transfer (LMT) atom interferometry provides a route toward enhanced inertial sensitivity in compact quantum sensors, but its scalability is limited by the accumulation of pulse-transfer errors across long Raman pulse sequences. We investigate theoretically the use of stimulated Raman shortcut-to-adiabatic passage (STIRSAP) for high-fidelity LMT atom optics in a Mach–Zehnder interferometer geometry. The counterdiabatic correction is encoded directly into the Raman pulse envelopes, eliminating the need for auxiliary microwave or radio-frequency control fields. Numerical simulations based on an effective Raman model show that $1~\mu\mathrm{s}$ STIRSAP pulses achieve single-pulse transfer fidelities of $F_\pi = 0.99902$ while maintaining negligible pulse-time overhead even at high momentum order. We analyze the resulting tradeoff between interferometric phase enhancement and compound contrast decay and identify an unconstrained shot-noise optimum near $n\approx270$. The analysis further shows that practical operation at extreme LMT order is constrained by wave-packet separation, vibration noise, Doppler detuning, and accumulated systematic effects rather than by pulse duration itself. These results establish superadiabatic Raman control as a promising approach for scalable high-fidelity atom optics and clarify the physical limitations governing compact high-order atom interferometers.

19.
medRxiv (Medicine) 2026-06-22

Anterior-superior hypothalamic enlargement as specific marker in episodic migraine: converging evidence from an independent discovery-replication design

Background: Growing evidence implicates the hypothalamus as a key structure in migraine pathophysiology; however, our understanding of its precise role and of the specific nuclei involved remains limited. We combined MRI data from our laboratory with publicly available MRI datasets from OpenNeuro to examine hypothalamic subunit volumes in episodic migraine and assess the specificity of these alterations relative to chronic pain conditions. Methods: Structural MRI combined with an automated atlas-based segmentation algorithm and a discovery-replication design was employed to investigate cross-sectional volumetric differences across 5 bilateral hypothalamic subunits in two independent migraine cohorts: DS1-MIG (DS1-MIG-base, n = 111 patients, n = 35 controls) and DS2-MIG (n = 27 patients, n = 31 controls). The adjusted volumes were compared between groups using MANOVA as an omnibus test, followed by Welch t-tests to test univariate follow-up. Longitudinal volumetric changes were additionally assessed in DS1-MIG participants with available follow-up scans using linear mixed models. To assess the specificity of findings to migraine, the same pipeline was applied to two chronic pain datasets, one including patients with fibromyalgia (DS-FM, n = 33 patients, n = 33 controls) and the other including patients with trigeminal neuralgia (n = 119 patients, n = 55 controls). Results: MANOVA revealed significant multivariate group differences in the discovery and replication migraine cohorts (DS1-MIG-base: = .006; DS2-MIG: = .008). Follow-up univariate analyses identified a consistent enlargement of the left anterior-superior subunit across both cohorts (FDR = .023 in DS1-MIG-base and FDR = .046 in DS2-MIG), representing the only cross-cohort replication finding. Beyond this shared signature, DS2-MIG exhibited additional significant enlargements of the right anterior-inferior and right tubular-inferior subunits. Longitudinal analyses in DS1-MIG showed that hypothalamic subunit volumes remained broadly stable over time within both migraine patients and control participants. No significant volumetric alterations were detected in the fibromyalgia or trigeminal neuralgia cohorts, either in multivariate or univariate analyses, underscoring migraine-specific findings. Conclusions: These findings provide evidence for subunit-specific hypothalamic structural alterations in migraine localized in the left anterior hypothalamic subunit. The stability of these differences over time and their absence in other chronic pain conditions suggest a migraine-specific structural organisation of hypothalamic circuitry.

20.
arXiv (CS.CV) 2026-06-12

Dual-State Slot Attention: Decoupling Appearance and Identity for Video Object-Centric Learning

Unsupervised video object-centric learning aims to decompose dynamic scenes into persistent, object-level representations without supervision. However, existing slot-based methods struggle to maintain stable object identity in challenging settings such as rapid motion and partial occlusion. First, they typically encode both the per-frame appearance of an object and its identity across frames in a single slot vector, creating an objective conflict that leads to slot swapping: reconstruction requires sensitivity to transient visual changes, whereas temporal consistency requires invariance to them. Second, the token renormalization used in Slot Attention can amplify weakly attending slots, allowing them to absorb tokens from other objects and destabilize slot-to-object correspondence. We propose Dual-State Slot Attention (DSSA), a fully self-supervised framework that addresses these limitations by separating appearance from identity and by reducing spurious updates from weakly matching slots. DSSA decomposes each slot into a local state for per-frame appearance and an identity state for temporally stable object information, thereby aligning reconstruction and temporal consistency with separate representations. The identity state is updated through a learned recurrent transition that acts as a temporal filter on the local state, while competition-modulated aggregation (CMA) down-weights updates from weakly matching slots and prevents them from absorbing tokens from other objects. Experiments on MOVi-C, MOVi-D, and YouTube-VIS demonstrate that DSSA consistently improves segmentation quality and temporal consistency over prior methods, while also yielding stronger downstream object recognition and video dynamics prediction. Code and models will be made publicly available upon acceptance.

21.
arXiv (math.PR) 2026-06-12

Pathwise integration beyond Young via Faber–Schauder energy spaces

作者:

arXiv:2606.13331v1 Announce Type: cross Abstract: We develop a pathwise integration theory based on Faber–Schauder energy spaces. The approach replaces the classical Hölder–Young and finite-variation Young conditions by dyadic summability conditions expressed in terms of Faber–Schauder coefficients. On the normalized interval $[0,1]$, these conditions define Banach spaces $\mathcal{E}^p$, which we call Faber–Schauder energy spaces. For $p,q>1$ satisfying $1/p+1/q\ge1$, we prove that every pair $f\in\mathcal{E}^p$ and $g\in\mathcal {E}^q$ admits a continuous pathwise integral $I_{f,g}$, constructed from dyadic left Riemann sums. We call $I_{f,g}$ the Faber–Schauder integral, and show that it depends boundedly and bilinearly on $(f,g)$ in the corresponding energy norms. The integral satisfies additivity, integration by parts, and a dyadic Young–Loève estimate. It is also the uniform limit of classical Riemann–Stieltjes integrals of finite Faber–Schauder approximations. The Faber–Schauder integral agrees with the classical Young integral whenever the latter is available, but also applies to deterministic and Gaussian examples for which neither the Hölder–Young condition nor the finite-variation Young condition can be verified. In this sense, it provides a Faber–Schauder coefficient-based extension of Young's framework.

22.
arXiv (quant-ph) 2026-06-11

TensorKit.jl: A Julia package for large-scale tensor computations, with a hint of category theory

arXiv:2508.10076v2 Announce Type: replace-cross Abstract: TensorKit$.$jl is a Julia-based software package for tensor computations, especially focusing on tensors with internal symmetries. This paper introduces the design philosophy, core functionalities, and distinctive features, including how to handle abelian, non-abelian, and anyonic symmetries through the ``TensorMap'' type. We highlight the software's flexibility, performance, and its capability to extend to new tensor types and symmetries, illustrating its practical applications through select case studies.

23.
arXiv (CS.AI) 2026-06-12

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

arXiv:2606.12674v1 Announce Type: new Abstract: Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve dependencies across intermediate outputs, and ground final responses in executed evidence. Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. We argue that this failure mode is poorly handled by small-corpus distillation. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. We introduce Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows. It evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks spanning live MCP servers and 250 tools, Evoflux raises execution feasibility from roughly 3% to 17-24% across small planners. In contrast, SFT and SFT+DPO on the same search-mined data match, underperform, or collapse below zero-shot performance; ReAct reaches higher peaks, but with higher variance and token cost. These results show that execution-grounded search is more reliable under scarce teacher-trace budgets.

24.
arXiv (CS.AI) 2026-06-15

Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

arXiv:2602.00845v3 Announce Type: replace Abstract: Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, but yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees, including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner

25.
arXiv (CS.CL) 2026-06-12

PiDA: Phonetically-Informed Data Augmentation for Robust Vietnamese Speech Translation

Cascaded speech translation (ST) systems suffer from error propagation when Automatic Speech Recognition (ASR) outputs incorrect transcripts. We present the first systematic categorization of ASR errors for Vietnamese ST, classifying substitution errors by phonetic cause and quantifying their impact on downstream Neural Machine Translation (NMT) performance using Linear Mixed-Effects Modelling. We confirm that most ASR substitution errors arise from phonetic confusions rather than random noise, and that these phonetic errors significantly degrade ST quality. Motivated by this finding, we propose Phonetically-Informed Data Augmentation (PiDA), which generates ASR-like corruptions by substituting words with phonetically similar alternatives using phonetic word embeddings. Fine-tuning on a PiDA-augmented version of FLEURS Vietnamese-English improves translation of erroneous ASR outputs (up to +2.04 BLEU over standard fine-tuning) while also slightly improving clean-text performance.