Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

ERTS: Adversarial Robustness Testing of Ethical AI via Semantic Perturbation in a Bounded Consequence Space

arXiv:2606.13282v1 Announce Type: new Abstract: As AI systems are deployed in high-stakes ethical contexts such as healthcare triage, autonomous vehicle control, and employment screening, formal methods for evaluating their robustness against adversarial manipulation of ethical reasoning remain underdeveloped. This paper introduces the Ethical Robustness Testing System (ERTS), a closed-pipeline framework that: (1) encodes ethical dilemmas into a 22-dimensional Ethical Consequence Space (ECS) grounded in established ethical theory; (2) applies 17 semantic perturbation functions subject to 6 validity constraint classes including a novel semantic coherence constraint; (3) measures decision deviation via a 4-component Ethical Instability Index (EII); and (4) produces domain-adaptive pre-deployment robustness assessment verdicts. We evaluate 4 structured baseline models and 2 production LLMs (Gemini 2.0 Flash and Llama 3.2) across 50 ethical scenarios spanning 8 deployment domains, generating 1,500 adversarial test cases. Results demonstrate that only 33% of models achieve assessment clearance, with the local Llama-3.2 model proving particularly vulnerable to fairness corruption and information degradation attacks (ERS = 0.737). To the best of our knowledge, no existing framework combines a bounded ethical consequence space, semantic coherence constraints, and domain-adaptive assessment in a single adversarial testing pipeline.

02.
arXiv (quant-ph) 2026-06-11

Collective neutrino oscillations: Many-body non-forward effects and non-classicality

arXiv:2606.12404v1 Announce Type: cross Abstract: Neutrino evolution in dense astrophysical environments is typically described either within a quantum kinetic framework, which neglects the build-up of multi-body correlations, or through simplified many-body calculations that allow significant entanglement to develop. In this work, we compare these two approaches in a simple neutrino-gas configuration, with particular emphasis on the role of non-forward scattering processes. These effects are incorporated either through a collision term in the kinetic description, or by considering the full neutrino-neutrino many-body Hamiltonian. We highlight differences between the two descriptions in both their characteristic timescales and asymptotic behavior. Motivated by the natural suitability of quantum computing for many-body calculations, we further investigate the non-classicality of neutrino evolution, discussing Trotter error scaling, along with the associated costs of constructing quantum circuits in terms of entangling gates and non-Clifford gates. We find that the resources needed for neutrino many-body evolution are on the low end of typical high-energy physics problems and on the mid to high end with respect to quantum chemistry problems. For the full Hamiltonian, resource requirements increase relative to the truncated version. We emphasize the importance of efficient fermion-to-qubit encodings, which are essential for reducing the substantial computational resources required for such simulations.

03.
arXiv (CS.CL) 2026-06-19

How LLMs Fail and Generalize in RTL Coding for Hardware Design?

Translating sequential programming priors into the parallel temporal logic of hardware design remains a crucial bottleneck for large language models(LLM). To investigate this, we introduce a new error taxonomy grounded in problem solvability, inspired by cognitive theory. Our taxonomy categorizes failures into syntactic, semantic, solvable functional, and unsolvable functional types. Evaluations reveal a strict empirical ceiling on the VerilogEval benchmark, as frontier models plateau at a 90.8% initial pass rate. These plateaus are defined by unsolvable functional errors, exposing persistent knowledge gaps immune to test time compute scaling. Furthermore, we expose a striking surface convergence gap: optimization readily eliminates syntax errors but concurrently exacerbates deeper functional failures. Our findings demonstrate that alignment techniques merely teach models to compile. While repeated sampling strategies can patch solvable errors, register-transfer level(RTL) coding capacity remains strictly bounded by pretraining knowledge. Addressing challenges in the current LLM based hardware generation pipeline requires more studies in model reasoning rather than alignment interventions.

04.
arXiv (CS.CL) 2026-06-12

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single behavioral mode. Recent work introduces style variation through post-hoc labels on observational data or LLM-inferred reward weights, but these signals act as proxies for what a style should reward rather than demonstrations of humans explicitly asked to drive in that style. We introduce PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, in which participants drive CARLA leaderboard routes under aggressive, neutral, and conservative instructions on a driver-in-the-loop rig. The pipeline has three stages: (i) offline triplet mining over per-style human driving data using a combined image-text similarity score; (ii) training a lightweight retrieval head that fuses frozen visual features with a small control encoder over per-style databases; and (iii) fine-tuning a single VLA backbone to treat retrieved context points as in-context behavioral demonstrations during waypoint prediction. At inference, the same backbone is conditioned on any style by swapping which per-style database the retrieval head queries, so selecting a style requires no per-style retraining while enabling human-style, style-diverse non-ego agents for closed-loop simulation. On Bench2Drive, PersonaDrive (no style) improves the driving score by 4.6% over SimLingo and 2.5% over HiP-AD, and under style conditioning attains the highest driving score in every style within a roughly 2% band (its weakest style surpassing the strongest baseline, DMW, by 5.4%), while average speed and acceleration rise by 18% and 25% from the conservative to the aggressive instruction.

05.
arXiv (CS.LG) 2026-06-19

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

arXiv:2606.19883v1 Announce Type: new Abstract: We study a multi-agent multi-armed bandit problem in the competitive setup with two-sided matching markets under a human centric decision making model. To capture human preferences, we use cumulative prospect theory (CPT) that weighs the actions of the agent in a nonlinear fashion using a ($\alpha$-Hölder continuous) weight function. CPT has been widely used in behavioral economics and risk sensitive machine learning to emulate human preferences. We analyze the state-of-the-art learning algorithm with CPT weight distorted rewards and obtain a player optimal regret of $\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$, where $K$ denotes the number of arms, $T$ is the learning horizon, and $\Delta$ represents (suitably defined) players' minimum preference gap. Noticing the dependence on $\Delta$ to be sub-optimal, we further improve this regret by judiciously selecting the active set of arms during exploration, which removes the dependence on $K$ in the dominant term and achieves an improved (optimal) regret guarantees in the setting where the number of arms $K$ is significantly larger than the number of players $N$. In addition, we consider adversarial markets where the observed rewards of the agents may be corrupted. We propose and analyze algorithms for robust markets with CPT as risk sensitive measure in both settings where the total corruption budget is known and where it is unknown, and establish logarithmic player-optimal regret guarantees in both cases.

06.
arXiv (CS.AI) 2026-06-11

End-to-End Machine Learning for Depressive State Classification via EEG and fNIRS

arXiv:2606.11555v1 Announce Type: cross Abstract: The escalating demand for mental healthcare, driven by rising societal stress, highlights the limitations of traditional psychiatric diagnostics. Conventional methods - relying primarily on clinical interviews and patient self-reports - are inherently vulnerable to subjective bias and the varying empirical judgment of practitioners. To address the need for quantitative evaluation, biological signal-based detection, including electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a promising objective alternative. Such technology is particularly vital for identifying latent depressive states that may be unrecognized by the subjects themselves. Furthermore, in aging populations, the high comorbidity between depression and dementia necessitates early differentiation to prevent mutual symptom exacerbation and maintain Quality of Life (QoL). This pilot study of eleven healthy students establishes a framework for biological signal-based depression detection, serving as a foundational step toward automated, objective diagnostic tools for clinical use.

07.
arXiv (CS.AI) 2026-06-18

What Must Generalist Agents Remember?

arXiv:2606.18746v1 Announce Type: new Abstract: This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.

08.
medRxiv (Medicine) 2026-06-17

Preserved Medial Temporal Lobe Flexibility Predicts Memory Generalization Only in the Context of Good Sleep Quality among Older African Americans

Objectives: Poor sleep quality is a risk factor for Alzheimer's disease (AD). Older African Americans experience disproportionately high rates of sleep disturbance and AD. Medial temporal lobe (MTL) flexibility reflects dynamic neural reorganization and may be a marker of generalization performance. This study examined whether sleep quality moderates the association between MTL flexibility and memory generalization. Methods: Fifty older African Americans (MeanAge=69.7{+/-}6.21 years; 80% women) underwent rs-fMRI to quantify MTL flexibility, Rutgers Acquired Equivalence Task for memory generalization, and Pittsburgh Sleep Quality Index for sleep quality. Results: Greater MTL flexibility was associated with better generalization (r=0.367, p=.017). Good sleepers showed higher MTL flexibility (F(1,44)=8.11, p2=.156, p=.007) and superior generalization (F(1,46)= 12.33, p2=.211, p=.001). Sleep quality significantly moderated the MTL flexibility and generalization relationship ({beta}=-1.519, p=.012). Conclusions: Preserved MTL flexibility may confer generalization only in good sleepers, suggesting that sleep disturbance may disrupt the MTL neural resilience among older African Americans.

09.
arXiv (CS.CV) 2026-06-18

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We argue that 3D points in world coordinates provide a general representation that is class-agnostic, view-stable, compact, and directly useful for downstream tasks. We formalize the task of goal-conditioned 3D point motion forecasting: given a short visual history, a set of 3D query points on an object of interest, and a language description of the intended goal, the model predicts the future 3D trajectory of each point. We introduce a full stack to study this task at scale: (1) MolmoMotion-1M is a large corpus of action-described, object-grounded 3D point trajectories annotated from 1.16M unconstrained videos; (2) PointMotionBench is a human-verified benchmark spanning 111 object categories and 61 motion types; and (3) MolmoMotion is a general motion forecasting model that supports both autoregressive coordinate prediction and flow-matching-based trajectory generation. MolmoMotion accurately predicts diverse motion patterns with different language instructions, and significantly outperforms existing motion prediction baselines on PointMotionBench. Finally, we show that the learned 3D motion prior transfers well to downstream applications: it improves training efficiency and generalization for robot manipulation, and its predicted trajectories provide effective motion guidance for generative models to synthesize videos with more realistic object motion.

10.
arXiv (CS.CV) 2026-06-16

Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

Autoregressive video diffusion models enable streaming generation but often degrade over long rollouts: static scene layouts drift, while mechanisms that improve spatial stability tend to suppress motion, causing natural flows such as water, fire, or smoke to stagnate. We study this stability-motion trade-off in fixed-camera long-horizon nature video generation, where the two failure modes can be more clearly separated than in moving-camera settings. We propose Steady-Forcing, a memory and training framework combining a persistent visual anchor (V-Sink), an exponential moving-average motion memory (EMA-Sink), block-relative temporal encoding, periodic cache purification, and distillation from a Wan2.1-14B teacher with motion-rewarded priors under task-focused configurations. Together, these components are designed to preserve background identity while sustaining visually plausible fluid dynamics over multi-minute autoregressive rollouts. Evaluations across seven baselines show that Steady-Forcing improves long horizon background consistency and imaging quality, while a blind user study indicates stronger perceived stability and motion continuity. The benchmark evaluation further suggest that generic VBench aggregate scores under-penalize fixed-camera artifacts as well as rewarding drift-induced optical flow as Dynamic Degree while not directly penalizing texture hardening or flow stagnation - motivating future task-specific benchmarks for static-camera nature-flow evaluation. Project page: https://minar09.github.io/steadyforcing/

11.
arXiv (CS.CV) 2026-06-18

Bidirectional Cross-Attention Fusion of High-Resolution RGB and Low-Resolution Hyperspectral Inputs for Multimodal Semantic Segmentation

Multimodal semantic segmentation with heterogeneous sensors must reconcile complementary information across modalities that differ in spatial resolution and channel dimensionality. In particular, high-resolution RGB imaging provides detailed spatial structure but often fails to distinguish visually similar materials, whereas hyperspectral imaging (HSI) provides discriminative spectral signatures but at lower spatial resolution. We present Bidirectional Cross-Attention Fusion (BCAF), which aligns high-resolution RGB with low-resolution HSI at their native grids via localized, bidirectional cross-attention, avoiding pre-upsampling or early spectral collapse. BCAF uses two independent backbones: a standard Swin Transformer for RGB and an HSI-adapted Swin backbone that preserves spectral structure through 3D tokenization with spectral self-attention. Although our evaluation targets RGB-HSI fusion, BCAF is modality-agnostic and applies to co-registered RGB with lower-resolution, high-channel auxiliary sensors. On the benchmark SpectralWaste dataset, BCAF delivers strong performance, achieving 75.4% at 55 images/s. We further evaluate a novel industrial dataset: K3I-Cycling (first RGB subset already released on Fordatis). On this dataset, BCAF reaches 62.3% mIoU for material segmentation (paper, metal, plastic, etc.) and 66.2% mIoU for plastic-type segmentation (PET, PP, HDPE, LDPE, PS, etc.). These results show that preserving native-grid spatial detail and spectral structure improves multimodal segmentation under real-time constraints. Code and model checkpoints are publicly available at https://github.com/jonasvilhofunk/BCAF_2026.

12.
arXiv (CS.LG) 2026-06-17

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

arXiv:2606.18001v1 Announce Type: new Abstract: Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms highlighting how their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios when combinations of these half-links are observed or not. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can be a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.

13.
arXiv (CS.CV) 2026-06-12

Modality-Aware Feature Matching in Visual and Vision-Language Applications: A Comprehensive Survey

Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by detector-free strategies like CNN-based SuperPoint and transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.

14.
arXiv (CS.AI) 2026-06-16

Unifying Post-hoc Explanations of Knowledge Graph Completions

arXiv:2507.22951v2 Announce Type: replace Abstract: Knowledge Graphs organize information as entity-relation-entity triples, enabling machine learning models to predict plausible missing triples in a task known as Knowledge Graph Completion (KGC). Post-hoc explainability for KGC addresses the problem of identifying which triples most influence the predictions of machine learning models. Currently, the field lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified taxonomy for post-hoc explainability in KGC. First, we propose a characterization of post-hoc explanations via multi-objective optimization that unifies existing post-hoc explainability algorithms in KGC and the explanations they produce, balancing explanation effectiveness and conciseness. Next, we examine improved evaluation protocols based on popular metrics, such as Mean Reciprocal Rank and Hits@k, through illustrative experiments. Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end users. By unifying methods and discussing evaluation standards, this work puts forward a case for more reproducible and impactful research in KGC explainability.

15.
arXiv (CS.LG) 2026-06-15

Learning Variable-Length Tokenization for Generative Recommendation

arXiv:2605.17779v2 Announce Type: replace Abstract: Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.

16.
arXiv (CS.CL) 2026-06-18

Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey

Applications of narrative theories using large language models (LLMs) deliver promising methods in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research uses LLM methods to engage with diverse concepts from narrative studies. We use established distinctions from narratology to categorise ongoing efforts and discover the following: \redtext{(a) narrative texts come from diverse sources beyond just literature, (b) theoretical synthesis and validation are potential outcomes, (c) generation tasks lag behind understanding in several ways: theoretical application, post-training methods, exploring non-fiction narratives and addressing narrative levels beyond fabula and discourse.} For future directions, instead of the pursuit of a single, generalised benchmark for `narrative quality', we believe that progress can benefit from efforts that focus on the following: defining and improving theory-based metrics for individual narrative attributes; continue conducting large-scale, theory-driven literary/social/cultural analysis; generating narratives in situated contexts; and continuing experiments where outputs can be used to validate or refine narrative theories. This work provides a contextual foundation for more systematic and theoretically informed narrative research in NLP by providing an overview to ongoing research efforts and the broader narrative studies landscape.

17.
arXiv (CS.LG) 2026-06-16

Cross-Silo De-Anonymization Under Local Differential Privacy: Threat Model, Phase Transition, and Coordination Necessity

arXiv:2606.16763v1 Announce Type: cross Abstract: When a person's records appear in k independent data silos, each protected by (epsilon, delta)-differential privacy, standard composition yields a valid (k*epsilon, k*delta)-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: at what k can an adversary actually identify a target person? This paper develops the information-theoretic framework needed to answer that question. We introduce cross-silo person-level DP (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at k* = Theta(log n / epsilon^2) (population size n, per-silo RR parameter epsilon): a Fano lower bound shows any estimator fails for k > k*. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target, yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once k exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and Theta-level threshold for cross-silo inference attacks under local DP.

18.
medRxiv (Medicine) 2026-06-16

Comparative Effectiveness and Safety of Prophylactic Vasopressors for Preventing Post-induction Hypotension in the Elderly: A Systematic Review and Network Meta-analysis

Background: Post-induction hypotension is a predictable haemodynamic hazard in older adults undergoing general anaesthesia. Prevention remains divided among volume optimisation, anaesthetic dose reduction, rescue treatment after hypotension occurs and proactive vasoactive support. Methods: We searched PubMed, Embase, Web of Science, CENTRAL, CNKI, Wanfang and VIP from inception to 30 March 2026. Eligible studies were randomised trials of prophylactic vasoactive drugs given before, during or immediately after induction in older adults. The primary outcome was post-induction hypotension. Secondary outcomes were post-induction mean arterial pressure (MAP), systolic arterial pressure (SBP), heart rate (HR) and reported haemodynamic adverse events. Random-effects network meta-analysis was used, and confidence in network estimates was assessed using CINeMA principles. Results: Thirty-one trials including 2,821 participants were included in the revised network. Compared with placebo/control, all active agents favoured lower post-induction hypotension. The most favourable point estimates were observed for phenylephrine (odds ratio [OR] 0.17, 95% confidence interval [CI] 0.01 to 2.16) and metaraminol (OR 0.19, 95% CI 0.02 to 1.53), although both were imprecise. More precise reductions were observed for methoxamine (OR 0.23, 95% CI 0.13 to 0.43), norepinephrine (OR 0.25, 95% CI 0.13 to 0.47) and ephedrine (OR 0.34, 95% CI 0.19 to 0.63). Phenylephrine ranked highest for MAP support, norepinephrine ranked highest for SBP support, and ephedrine ranked highest for HR preservation. Global inconsistency was detected for SBP but not for hypotension incidence, MAP or HR, supporting cautious profile-based interpretation. Conclusions: Prophylactic vasopressor choice during induction should be guided by haemodynamic phenotype rather than ranking alone. In the revised network, active prophylaxis consistently favoured lower hypotension, but sparse nodes produced uncertainty. Norepinephrine retained a comparatively balanced profile when vasodilatory post-induction hypotension is anticipated, phenylephrine and related alpha-agonists provided stronger pressure support when HR and cardiac-output reserve are preserved, and ephedrine was most relevant when chronotropic support is desired. Keywords: general anaesthesia; induction; hypotension; norepinephrine; phenylephrine; ephedrine; network meta-analysis; older adults.

19.
arXiv (CS.AI) 2026-06-16

Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning

arXiv:2606.15107v1 Announce Type: new Abstract: Time series data in real-world deployments is overwhelmingly irregular. Observations are asynchronous, missing values are informative rather than random, and sampling frequencies vary across sensors and operational windows. However, existing Time Series Question Answering (TSQA) benchmarks mostly assume regularly sampled inputs, leaving a fundamental gap in understanding how large language models (LLMs) and AI agents perform under irregular conditions. To bridge this gap, we introduce IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types across 13 domains. IRTS-ToolBench is designed to be used independently by any researcher working on LLM-based irregular time series analysis, providing standardized inputs and a reproducible evaluation protocol. Code can be found in https://github.com/SanhornC/IRTS-ToolBench.

20.
medRxiv (Medicine) 2026-06-10

Frozen elephant trunk repair in heritable thoracic aortic disease: Impact of genetic aortopathy on long-term outcomes - A multicenter analysis

Aims This multicenter study aims to compare outcomes of total aortic arch replacement (TAR) using the frozen elephant trunk (FET) technique in patients with and without heritable thoracic aortic disease (HTAD) and to assess whether HTAD influences postprocedural adverse aortic events (AAEs). Methods From 06/2007 to 05/2024, aortic databases from 13 European centers were screened for HTAD patients undergoing TAR with FET. All consecutive dissection and aneurysm non-HTAD patients from the four core centers served as comparator. The primary outcome was AAE, a composite of diameter progression, distal stent graft induced new entry (dSINE), malperfusion, rupture and pseudoaneurysm at 5 years after FET implantation. Results Of 2739 FET patients, 196 (7.2%) were diagnosed with HTAD. The control group consisted of 867 non-HTAD FET patients. Marfan syndrome was the most common condition (72%), followed by Loeys-Dietz syndrome (11%), vascular Ehlers-Danlos syndrome (5.6%) and Turner syndrome (2.0%). Seventeen (8.8%) patients were diagnosed with ns-HTAD. At 5 years 46 (24%) AAEs occurred in the HTAD group, 169 (20%) in the non-HTAD group (p=0.2). Diameter progression was the most common event (10% vs. 12%; p=0.6), followed by dSINE (5.8% vs. 4.5%; p=0.5), malperfusion (4.2% vs. 3.3%; p=0.5), rupture (2.1% vs. 0.7%; p=0.09) and pseudoaneurysm (0.5% vs. 0.2%; p=0.5). Conclusions The FET technique appears safe and effective for acute and chronic aortic disease in HTAD patients, with outcomes comparable to non-HTAD cases and no increase in graft-related complications, challenging traditional concerns about stent graft use in genetically mediated aortic disease.

21.
arXiv (CS.AI) 2026-06-19

Hybrid ANN-SNN Pipeline with Local Plasticity

arXiv:2606.20151v1 Announce Type: cross Abstract: This work proposes a hybrid ANN-SNN pipeline that effectively leverages the rich embeddings of pretrained artificial neural networks (ANNs) to enable high-performance spiking neural networks (SNNs). The architecture couples a pretrained EfficientNet encoder with a CoLaNET spiking classifier. We convert the encoder's activations into spike trains via rate-coding and train the subsequent SNN classifier using local, biologically inspired learning rules, bypassing end-to-end gradient propagation. This approach achieves 99.09% accuracy on a 64-class ImageNet benchmark, demonstrating performance on par with conventional deep networks. The work presents a biologically plausible and efficient framework for adapting powerful pretrained encoders to downstream spiking neural network tasks.

22.
arXiv (quant-ph) 2026-06-11

Circulators Based on Coupled Quantum Anomalous Hall Insulators and Resonators

arXiv:2505.07770v2 Announce Type: replace Abstract: Integrated plasmonics is advancing rapidly, enabling a wide range of functionalities to be incorporated onto a single chip. Applications span information processing, computation, quantum sensing, and dark-matter detection. This progress has driven the development of integrated non-reciprocal devices, which are essential for preventing unwanted feedback that can degrade system performance. While non-reciprocal devices have been realized in edge magnetoplasmon materials via classical interference effects, their operation is often limited by the input power range. Here, we demonstrate that topological circulators utilizing asymmetric coupling offer improved input power range, isolation, and insertion loss. In this configuration, we demonstrate the coupling between a chiral edge magnetoplasmonic resonator and a pair of LC resonators is well described by an effective non-Hermitian two-site Hatano-Nelson model with asymmetric directional couplings, resulting in nonreciprocal behavior. The coherent photon-plasmon interaction enables a circulator with up to 50 dB of isolation across a broad range of excitation power. These results suggest that magnetic topological insulators provide a promising platform for realizing asymmetric non-Hermitian couplings at radio frequencies and for exploring regimes of strong directional suppression and possible exceptional-point physics. More broadly, they highlight the potential of topological-material-based microwave devices for future integration with superconducting quantum information platforms.

23.
arXiv (CS.LG) 2026-06-12

Earth Science Foundation Models: From Perception to Reasoning and Discovery

arXiv:2605.12542v2 Announce Type: replace-cross Abstract: Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.

24.
arXiv (CS.CL) 2026-06-12

GENIE: A Fine-Grained Measure for Novelty

Large Language Models have consistently demonstrated a lack of creativity and diversity across tasks. Prior work has focused on addressing whether models are capable of generating creative outputs. Here, we aim to consider novelty and investigate what makes model-generated content novel or not novel in a task-specific manner. We propose a fine-grained evaluation metric GENIE to measure the novelty of responses along task-specific features with respect to a population of responses. We show that unlike GENIE, holistic metrics struggle to capture the high-dimensionality of novelty and do not provide insight on which properties they target. Finally, we use GENIE to measure the effectiveness of mitigation methods that address creativity to better understand where these methods can improve novelty.

25.
arXiv (CS.LG) 2026-06-11

Point-Identification of a Robust Predictor Under Latent Shift with Imperfect Proxies

arXiv:2603.15158v2 Announce Type: replace Abstract: Addressing the domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches that address latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that proxies have sufficient information about variations in latent confounders. For imperfect proxies the mapping from confounders to the space of proxy distributions is non-injective, and multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption and observed data are consistent with multiple potential predictors (set-identified). To address this, we introduce latent equivalent classes (LECs). LECs are defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification for the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a small, targeted set of diverse domains that satisfy this rank condition. PQAL can recover the point-identified predictor, demonstrates robustness to varying degrees of shift and outperforms previous methods on synthetic data and semi-synthetic dSprites, IHDP, ACS Folktables datasets.