Academic Intelligence · Curated Daily

Explore the Frontiers of Global Scholarship

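AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.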

01.
arXiv (CS.AI) 2026-06-25

Ramanujan Graph Rewiring with Non Negative Resistance Curvature

arXiv:2606.21333v2 Announce Type: replace-cross Abstract: Graph Neural Networks (GNNs) have emerged as a powerful paradigm for learning on graph-structured data by iteratively propagating and aggregating information across edges. However, conventional message passing schemes often suffer from over-squashing, whereby exponentially large neighborhoods are compressed into fixed-dimensional embeddings, impeding effective long-range dependency learning. In this work, we introduce Ramanujan Propagation, a graph rewiring strategy that leverages Ramanujan graphs to alleviate topological bottlenecks in GNNs. We first establish that suitably chosen Ramanujan graphs guarantee non-negative resistance curvature, which mitigates over-squashing and facilitates efficient information flow. We then propose an algorithmic framework to construct a Ramanujan rewired graph that preserves the local connectivity of the original graph. Our experiments demonstrate that our method outperforms nine state-of-the-art rewiring techniques. These results establish Ramanujan graphs as a rigorous structural prior for scalable, topology-aware message passing in GNNs.
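
As a rough illustration of the Ramanujan property the abstract relies on: a connected, non-bipartite d-regular graph is Ramanujan when every nontrivial adjacency eigenvalue has magnitude at most 2·sqrt(d − 1). The sketch below checks this on the Petersen graph using a plain power iteration. It is not the paper's rewiring algorithm; the function names are hypothetical, and the iteration assumes a single dominant nontrivial eigenvalue, so a proper eigensolver is preferable in practice.

```python
import math

def second_eigenvalue_magnitude(adj, iters=200):
    """Largest |eigenvalue| of a regular graph, excluding the trivial one (d)."""
    n = len(adj)
    v = [1.0 if i == 0 else 0.0 for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]                 # project out the all-ones eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        w = [sum(v[j] for j in adj[i]) for i in range(n)]   # w = A v
        lam = sum(a * b for a, b in zip(v, w))    # Rayleigh quotient of the unit vector v
        v = w
    return abs(lam)

def is_ramanujan(adj):
    """Check the bound |lambda| <= 2*sqrt(d-1) for a d-regular, non-bipartite graph."""
    d = len(adj[0])
    if any(len(nbrs) != d for nbrs in adj):
        raise ValueError("graph must be regular")
    return second_eigenvalue_magnitude(adj) <= 2.0 * math.sqrt(d - 1) + 1e-9

# Petersen graph: outer 5-cycle, inner pentagram, and five spokes (3-regular).
petersen = [[] for _ in range(10)]
for i in range(5):
    edges = [(i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)]
    for a, b in edges:
        petersen[a].append(b)
        petersen[b].append(a)

print(is_ramanujan(petersen))
```

The Petersen graph has nontrivial eigenvalues 1 and −2, so the bound 2·sqrt(2) is met with room to spare.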

02.
arXiv (CS.LG) 2026-06-16

M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics

arXiv:2606.15244v1 Announce Type: new Abstract: Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents. While this context improves prediction quality, constructing it for every training anchor has become a hidden systems bottleneck. In a representative maritime AIS pipeline, spatial context construction requires roughly 17 CPU-days for a 5.48M-anchor corpus, dominating the cost of the downstream predictor. We present M-CTX, an exact and scalable spatial context-retrieval framework for trajectory analytics. M-CTX recasts context construction as an ingest-once, query-many spatial database workload and replaces three brute-force stages – OSM range retrieval, SDF computation, and moving-vessel neighbour lookup – with composable, index-backed operators. Its learned range-index backend, BR-LZ, provides recall-complete MBR-overlap range retrieval and reduces candidate amplification by 1.1x–2.7x relative to global-expansion one-curve baselines. Across four maritime regions, eight baseline systems, synthetic workloads with up to 40M spatial features, and 10^7-record AIS streams, M-CTX reproduces the reference context exactly. On the 5.48M-anchor corpus, it reduces context construction from about 17 CPU-days to 1.8 hours, a measured 226x end-to-end speed-up. An optional storage mode further compresses SDF context by 64x with only a 0.04 m ADE change. These results establish exact spatial context retrieval as a first-class database problem in modern trajectory analytics. Code and datasets are publicly available at https://github.com/mark000071/M-CTX-Traj.

03.
arXiv (math.PR) 2026-06-16

Exact Label Recovery in Euclidean Random Graphs

arXiv:2407.11163v3 Announce Type: replace-cross Abstract: In this paper, we propose a family of label recovery problems on weighted Euclidean random graphs. The vertices of a graph are embedded in $\mathbb{R}^d$ according to a Poisson point process, and are assigned to a discrete community label. Our goal is to infer the vertex labels, given edge weights whose distributions depend on the vertex labels as well as their geometric positions. Our general model provides a geometric extension of popular graph and matrix problems, including submatrix localization and $\mathbb{Z}_2$-synchronization, and includes the Geometric Stochastic Block Model (proposed by Sankararaman and Baccelli) as a special case. We study the fundamental limits of exact recovery of the vertex labels. Under a mild distinctness of distributions assumption, we determine the information-theoretic threshold for exact label recovery, in terms of a Chernoff-Hellinger divergence criterion. Impossibility of recovery below the threshold is proven by a unified analysis using a Cramér lower bound. Achievability above the threshold is proven via an efficient two-phase algorithm, where the first phase computes an almost-exact labeling through a local propagation scheme, while the second phase refines the labels. The information-theoretic threshold is dictated by the performance of the so-called genie estimator, which decodes the label of a single vertex given all the other labels. This shows that our proposed models exhibit the local-to-global amplification phenomenon.

04.
arXiv (CS.CL) 2026-06-11

VietMed-MCQ: A Consistency-Filtered Data Synthesis Framework for Vietnamese Traditional Medicine Evaluation

Large Language Models (LLMs) have demonstrated remarkable proficiency in general medical domains. However, their performance significantly degrades in specialized, culturally specific domains such as Vietnamese Traditional Medicine (VTM), primarily due to the scarcity of high-quality, structured benchmarks. In this paper, we introduce VietMed-MCQ, a novel multiple-choice question dataset generated via a Retrieval-Augmented Generation (RAG) pipeline with an automated consistency check mechanism. Unlike previous synthetic datasets, our framework incorporates a dual-model validation approach to ensure reasoning consistency through independent answer verification, though the substring-based evidence checking has known limitations. The complete dataset of 3,190 questions spans three difficulty levels and underwent validation by one medical expert and four students, achieving 94.2 percent approval with substantial inter-rater agreement (Fleiss' kappa = 0.82). We benchmark seven open-source models on VietMed-MCQ. Results reveal that general-purpose models with strong Chinese priors outperform Vietnamese-centric models, highlighting cross-lingual conceptual transfer, while all models still struggle with complex diagnostic reasoning. Our code and dataset are publicly available to foster research in low-resource medical domains.
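
For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of the standard Fleiss' kappa formula. The rating counts are toy data for five raters (matching the one expert plus four students mentioned), not the VietMed-MCQ annotations, and the function name is hypothetical.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_exp = sum(p * p for p in p_cats)
    return (p_bar - p_exp) / (1.0 - p_exp)

# Toy data: columns are (approve, reject) counts from five raters.
perfect = [[5, 0], [0, 5]]
split = [[3, 2], [2, 3]]
print(fleiss_kappa(perfect), fleiss_kappa(split))
```

Values near 1 indicate agreement well above chance; values at or below 0 indicate agreement no better than chance.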

05.
arXiv (CS.LG) 2026-06-11

Triangular-Reference Schrödinger Bridges for Time Series Generation

arXiv:2605.27478v3 Announce Type: replace-cross Abstract: Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya–Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

06.
bioRxiv (Bioinfo) 2026-06-18

Structure Bioinformatics of Eight Human ATP Synthase Fo Subunits and Their AlphaFold3-Predicted Water-Soluble QTY Analogs

Human mitochondrial ATP synthase is an essential rotary motor enzyme that produces most of the cellular ATP through oxidative phosphorylation. Its membrane-embedded Fo sector contains highly hydrophobic transmembrane subunits that are challenging to study in aqueous environments without detergents. This study explores whether applying the QTY code can reduce the hydrophobicity of selected ATP synthase Fo subunits while preserving their overall molecular structures. We applied the QTY code to eight human ATP synthase Fo subunits: ATP6, ATP8, ATPK, ATP68, ATPMK, AT5G1, AT5G2, and AT5G3. Hydrophobic amino acids leucine (L), isoleucine (I), valine (V), and phenylalanine (F) in transmembrane regions were systematically replaced with hydrophilic glutamine (Q), threonine (T), and tyrosine (Y). Four native subunits with available CryoEM structures from human ATP synthase (PDB: 8H9S) were superposed with their AlphaFold3-predicted QTY analogs. The native ATP synthase Fo subunits superposed well with their respective QTY analogs. For the CryoEM-native comparisons, RMSD values ranged from 0.565 Å to 2.546 Å. For the AlphaFold3-native comparisons of subunits without CryoEM structures, RMSD values ranged from 0.204 Å to 0.297 Å. Despite substantial QTY substitutions in the transmembrane regions, ranging from 38.89% to 50.79%, the QTY analogs retained similar overall folds, molecular weights, and isoelectric points. Hydrophobic surface analysis showed that the QTY analogs had reduced hydrophobic patches compared with their native counterparts, with average hydrophobicity decreasing from 0.2959 in native proteins to -1.1023 in QTY analogs. These structural bioinformatics studies suggest that the QTY code can be applied to ATP synthase Fo subunits to generate more hydrophilic, potentially water-soluble analogs while preserving overall structural similarity. These results extend the application of the QTY code to the membrane-embedded Fo sector of ATP synthase and provide a foundation for future experimental studies testing whether these QTY analogs can be expressed, purified, and evaluated for assembly or proton-transfer-related functions.

07.
arXiv (math.PR) 2026-06-15

Longest weakly increasing subsequences of discrete random walks on the integers with heavy tailed distribution of increments

arXiv:2603.29047v2 Announce Type: replace-cross Abstract: We investigate the behavior of the length of the longest weakly increasing subsequences (weak LIS) of $n$-step random walks with nonzero integer increments $k = \pm 1, \pm 2, \dots$ given by a symmetric heavy tailed mass distribution proportional to $|k|^{-1-\alpha}$ for several values of the real parameter $\alpha > 0$ together with that of the simple random walk ($k=\pm 1$), to which the $n$-step heavy tailed walks reduce when $\alpha$ grows large enough that step jumps beyond $\pm 1$ become essentially absent on the scale of $n$. By means of exploratory fits, weighted nonlinear least squares, and nested-model comparisons, we found that the sample average length $\langle{L_{n}}\rangle$ scales like $\langle{L_{n}}\rangle \sim \sqrt{n}\log{n}$ when the distribution of increments has finite variance ($\alpha > 2$) and $\langle{L_{n}}\rangle \sim n^{\theta}$ with a varying exponent $\theta > 0.5$ when the variance is infinite ($\alpha \leq 2$). Distributional diagnostics indicate that the bulk of the $L_{n}$ distribution is very well-approximated by a lognormal model, though systematic deviations are observed in the tails. Our results corroborate and expand upon previous results for the LIS of other types of heavy-tailed random walks and raise a conjecture as to whether the distribution of $L_{n}$ is given, or can be effectively described, by a lognormal distribution.
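
The quantity studied above, the length of the longest weakly increasing subsequence of a walk, can be computed in O(n log n) time with the usual patience-sorting idea. The sketch below is a generic implementation, not the authors' code; the function names are hypothetical. Using `bisect_right` rather than `bisect_left` is what allows equal values to extend a subsequence.

```python
from bisect import bisect_right
from itertools import accumulate

def weak_lis_length(seq):
    """Length of the longest weakly increasing (non-decreasing) subsequence."""
    tails = []  # tails[k] = smallest possible last value of a subsequence of length k + 1
    for x in seq:
        pos = bisect_right(tails, x)   # ties go to the right, so equal values may follow each other
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def walk_positions(increments):
    """Positions of a walk after each step, given its integer increments."""
    return list(accumulate(increments))

positions = walk_positions([1, -1, 1, 1, -1])   # [1, 0, 1, 2, 1]
print(weak_lis_length(positions))
```

Averaging `weak_lis_length` over many sampled walks of length n would give an estimate of the sample mean discussed in the abstract.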

08.
arXiv (CS.CL) 2026-06-12

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context and forcing the model to manage low-level dataflow in the trace. We introduce HyperTool, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution. A model invokes HyperTool with a code block that can call existing tools through their original schemas, manipulate returned values, and pass intermediate results locally, folding deterministic tool subroutines into a single outer call. To train models to use this interface, we synthesize HyperTool-format trajectories from cross-tool compositional tasks and verify them in real MCP environments. On MCP-Universe, HyperTool improves average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, and surpasses GPT-OSS and Kimi-k2.5 on average accuracy, showing that HyperTool can substantially improve multi-step tool use.

09.
arXiv (CS.LG) 2026-06-24

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

arXiv:2606.08671v2 Announce Type: replace Abstract: Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. On deep-research benchmarks, SkillHone runs without a pre-integrated search stack and outperforms the commercially backed deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods. We further deploy SkillHone on internal tool-mediated analysis scenarios, where it improves accuracy by an average of 18.8 points across seven settings.

10.
arXiv (math.PR) 2026-06-16

A Machine-Checked Itô Calculus for Brownian Motion

arXiv:2606.15089v1 Announce Type: cross Abstract: We present a machine-checked development of the $L^2$ Itô calculus of Brownian motion on a bounded time interval $[0,T]$, formalized in Lean 4 on top of Mathlib and the BrownianMotion package. The development contains: the construction of the Itô integral as an isometry of Hilbert spaces, from a predictable-rectangle $\pi$-system through the density of simple adapted processes; the Itô integral as a process, proved to be an $L^2$-continuous martingale through a single structural identity (the integral at time $t$ is the conditional-expectation projection of its terminal value onto $\mathcal{F}_t$), from which adaptedness, the martingale property, the contraction bound, and both the terminal and the time-indexed Itô isometries follow as corollaries; and Itô's formula for $C^3$ functions with bounded derivatives, including its time-dependent form $df = f_x\,dB + (f_t + \tfrac12 f_{xx})\,dt$, obtained by a discrete-to-continuous argument through weighted quadratic variation and explicit $L^2$ remainder bounds. To our knowledge this includes the first machine-checked proof of Itô's formula, and the first machine-checked construction of the Itô integral as a martingale-valued process, in any proof assistant. We are deliberate about the boundary: the theory is the $L^2$ theory on $[0,T]$ with bounded-derivative integrand classes; localization to the unrestricted $C^2$ formula, integrators beyond Brownian motion, and pathwise statements are out of scope, and we say precisely why and where. The development is roughly 7,200 lines of Lean across 22 modules; every theorem is sorry-free, the axioms of each headline result are pinned to Mathlib's classical defaults by a build-enforced gate, and the whole is reproducible from a pinned toolchain.
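
To make the time-dependent Itô formula concrete, the block below writes it in integral form and specializes it to one function with bounded derivatives. This is a standard textbook instance, assuming a standard Brownian motion with $B_0 = 0$; it is not taken from the Lean development itself.

```latex
% Integral form of the time-dependent Ito formula on [0, T]
f(T, B_T) - f(0, B_0)
  = \int_0^T f_x(t, B_t)\, dB_t
  + \int_0^T \Bigl( f_t(t, B_t) + \tfrac12 f_{xx}(t, B_t) \Bigr)\, dt

% Example: f(t, x) = \sin x, so f_t = 0, f_x = \cos x, f_{xx} = -\sin x
\sin B_T = \int_0^T \cos B_t \, dB_t - \tfrac12 \int_0^T \sin B_t \, dt
```

The choice $f(t,x) = \sin x$ stays inside the bounded-derivative class the abstract describes, whereas the more familiar $f(x) = x^2$ has an unbounded first derivative and would fall outside it.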

11.
arXiv (CS.AI) 2026-06-25

Agentic evolution of physically constrained foundation models

arXiv:2606.25532v1 Announce Type: new Abstract: Artificial intelligence increasingly drives automated scientific discovery, yet contemporary generalist agents lack physical grounding, frequently hallucinating hardware-incompatible designs. Here, we present a physically grounded, multi-agent discovery engine that autonomously architects hardware-compliant computing systems. Anchored by an Evolutionary Knowledge Graph structuring past scientific innovations, the framework extracts an "algorithmic Chain-of-Thought" to transform blind stochastic search into directed structural evolution. Applied to the extreme testbed of foundation model deployment, the engine evolved two hardware-aware compression methodologies surpassing human-engineered heuristics: Q-Enhance mitigates long-context accuracy loss in dense models, and MoE-Salient-AQ outperforms state-of-the-art manual sparse Mixture-of-Experts designs by 3.7% at sub-3-bit regimes. Utilizing a bandwidth-efficient Sensitivity Profile, we successfully deployed a massive 235-billion-parameter model onto a constrained dual-A100 server, reducing memory requirements by 75% with a marginal 0.64% accuracy degradation. By transforming unconstrained combinatorial search into knowledge-driven autonomy, this establishes a scalable hardware-software co-design paradigm for machine-driven discovery within strict physical boundaries.

12.
arXiv (CS.CL) 2026-06-17

OpenLID-v3: Improving the Precision of Closely Related Language Identification – An Experience Report

Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID) often struggle to identify closely related languages and to distinguish valid natural language from noise, which contaminates language-specific subsets, especially for low-resource languages. In this work we extend the OpenLID classifier by adding more training data, merging problematic language variant clusters, and introducing a special label for marking noise. We call this extended system OpenLID-v3 and evaluate it against GlotLID on multiple benchmarks. During development, we focus on three groups of closely related languages (Bosnian, Croatian, and Serbian; Romance varieties of Northern Italy and Southern France; and Scandinavian languages) and contribute new evaluation datasets where existing ones are inadequate. We find that ensemble approaches improve precision but also substantially reduce coverage for low-resource languages. OpenLID-v3 is available on https://huggingface.co/HPLT/OpenLID-v3.

13.
arXiv (CS.CL) 2026-06-15

Poker Arena: Multi-Axis Profiling of Strategic Reasoning and Memory in LLMs

Strategic reasoning under uncertainty underpins consequential decisions in negotiation, finance, and policy, but prevailing game-play benchmarks collapse heterogeneous reasoning dimensions into a single scalar, leaving the capability structure of frontier LLMs unexamined. We introduce Poker Arena, a no-limit Texas Hold'em tournament platform that couples a three-layer memory architecture (within-hand, session, and cross-session) with a nine-axis cognitive profile decomposing strategic reasoning into interpretable dimensions such as bet-sizing calibration and positional awareness. We evaluate seven frontier models across 50 sessions of 1,000 hands and a controlled memory ablation; tournament chips and aggregate axis score order the field differently: Claude Opus 4.6 wins +$15,730 chips with 14 first-place finishes, yet ranks only fifth of seven on mean axis score, while persistent memory helps some models and hurts others. These findings show that multi-axis evaluation surfaces capability structure that scalar leaderboards systematically misrank, with cross-dimensional consistency outweighing peak performance on any single axis.

14.
arXiv (CS.LG) 2026-06-19

Diffuse AI Control on Fuzzy Tasks

arXiv:2606.08892v2 Announce Type: replace Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus 4.6 can write proposals that are worse according to the ground truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. We then propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

15.
arXiv (CS.CV) 2026-06-12

MinhwaNet: Faithful but Insufficient Object Grounding in Korean Folk Painting

Korean folk painting (minhwa) is built from a small vocabulary of auspicious symbols (a tiger for protection, a pair of birds for marital harmony, a peony for wealth) that recur across many of its painted genres. This suggests an obvious computational approach: identify which symbols appear in a painting and read the genre from the inventory. Working with a public corpus that pairs whole paintings, eight-field bilingual curatorial captions, and a separate set of expert object crops, we find that this approach does not work. A model given only a list of which symbols a painting contains predicts the genre far worse than a model that fuses the image with the curatorial text, and forcing the genre representation to be object-grounded actively hurts accuracy. The visual evidence on which the genre prediction rests is nonetheless localized and inspectable. A leakage-safe object evidence map projected from a part-level detector is spatially faithful to where curators isolated symbolic objects and to a patch-based surrogate's own gradient saliency. We name this configuration a faithful-but-insufficient dissociation. The part-level explanation is honest about what the part-level model sees, yet the genre target turns on how symbols are arranged rather than on which ones appear. The same lens separates a content label that survives transfer to held-out source institutions (genre) from a style label that does not (era), a prediction we confirm on two further labels in the corpus. We release the multimodal system, a worked-example reading of one painting's evidence map against its catalogue, and a set of evaluation cautions that recur in long-tailed heritage collections.

16.
arXiv (CS.LG) 2026-06-11

Knowledge Manifold: A Riemannian Geometric Framework for Semantic Mapping and Geodesic Analysis of Scientific Literature

arXiv:2606.05907v2 Announce Type: replace-cross Abstract: We present the knowledge manifold: a Riemannian geometric space in which a corpus of documents is arranged according to semantic positional relationships derived from character n-gram TF-IDF representations. The framework proceeds in five tightly coupled stages. First, each document is converted to a character-level n-gram TF-IDF vector (4-7 grams, up to 250,000 features, L2-normalized) and embedded in a two-dimensional knowledge map via constrained stress minimization with repulsion, variance, and centering regularizers. Second, knowledge at an arbitrary query point is estimated through Smoothed Particle Hydrodynamics (SPH) interpolation using a cubic-spline kernel, yielding an interpolated TF-IDF feature vector that can be linguistically characterized. Third, directional knowledge gradients at 0, 45, and 90 degrees are computed from the SPH interpolation map, and pairwise directional similarity is quantified via inner product and cosine similarity. Fourth, a Gaussian Process Regression (GPR) model, with a Constant x RBF + White kernel fitted on a 10-dimensional SVD projection, provides a Bayesian posterior mean, uncertainty estimate, and per-document contribution rate at the query point. Fifth, geodesics in the knowledge space are obtained by minimizing a discrete Riemannian path energy derived from the SPH-induced metric tensor, using L-BFGS-B with seven deterministic initial-path candidates. We apply the formulation to a corpus of 20 papers in fiber-reinforced composite materials and aerospace structural mechanics, showing that the semantic map recovers meaningful research clusters, geodesic paths reveal natural conceptual bridges between distant topics, and SPH/GPR interpolation enables the generation of virtual knowledge: hypothetical paper abstracts describing unstudied but geometrically predicted research directions.
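
The first stage described above (character n-gram TF-IDF with L2 normalization) can be sketched in a few lines of standard-library Python. The smoothed IDF weighting used here is an assumption, since the abstract does not specify the variant, and the three short documents and all function names are made up for illustration. The later stages (stress-minimizing embedding, SPH, GPR, geodesics) are not covered.

```python
import math
from collections import Counter

def char_ngrams(text, n_min=4, n_max=7):
    """All character n-grams of the lowercased text for n in [n_min, n_max]."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]

def tfidf_vectors(docs, n_min=4, n_max=7):
    """Sparse, L2-normalized TF-IDF vectors (dicts) over character n-grams."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter(g for c in counts for g in c)   # document frequency of each n-gram
    vectors = []
    for c in counts:
        # Assumed weighting: raw term frequency times a smoothed IDF.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())

docs = [
    "carbon fiber composite laminate",
    "composite laminate buckling",
    "quantum spin chain",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share long character runs, such as the two composite-laminate titles, end up with a higher cosine similarity than unrelated ones.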

18.
arXiv (CS.CV) 2026-06-17

NeuroClaw Technical Report

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent research assistant for executable and reproducible neuroimaging research. NeuroClaw operates directly on raw neuroimaging data across formats and modalities, grounding decisions in dataset semantics and BIDS metadata so users need not prepare curated inputs or bespoke model code. The platform combines harness engineering with end-to-end environment management, including pinned Python environments, Docker support, automated installers for common neuroimaging tools, and GPU configuration. In practice, this layer emphasizes checkpointing, post-execution verification, structured audit traces, and controlled runtime setup, making toolchains more transparent while improving reproducibility and auditability. A three-tier skill/agent hierarchy separates user-facing interaction, high-level orchestration, and low-level tool skills to decompose complex workflows into safe, reusable units. Alongside the NeuroClaw framework, we introduce NeuroBench, a system-level benchmark for executability, artifact validity, and reproducibility readiness. Across multiple multimodal LLMs, NeuroClaw-enabled runs yield consistent and substantial score improvements compared with direct agent invocation. Project homepage: https://cuhk-aim-group.github.io/NeuroClaw/index.html

19.
arXiv (quant-ph) 2026-06-12

Quasi-local Edge Mode in XXX Spin Chain/Circuit with Interaction Boundary Defect

arXiv:2603.17835v2 Announce Type: replace-cross Abstract: We study the Heisenberg spin-1/2 model on a semi-infinite chain - or, equivalently, a trotterized unitary SU(2) symmetric six-vertex quantum circuit - with a boundary defect where the interaction between the two spins nearest the edge differs from that in the bulk. For sufficiently strong boundary interaction we explicitly construct a conserved operator quasi-localized near the boundary using a matrix-product ansatz. This quasi-local edge mode leads to non-decaying boundary correlation functions, corresponding to a nonzero boundary Drude weight. The correlation length of the edge mode diverges at a finite critical value of the boundary interaction, signaling a transition to ergodic boundary dynamics for subcritical interactions.

20.
arXiv (CS.LG) 2026-06-18

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

arXiv:2606.18418v1 Announce Type: new Abstract: The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations help individuals understand and potentially alter unfavorable decisions in areas such as loan applications, job selections, and more, by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.
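
The notion of Pareto-optimality used above can be illustrated with a small dominance filter. This is only the generic definition, not the P$^2$CE algorithm (there is no isolation forest or SHAP step here); the two cost objectives and the candidate values are hypothetical, with lower being better for both.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical counterfactual candidates: (features changed, distance from the original input).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
print(front)
```

Here (3, 3) is dropped because (2, 2) is better on both objectives, while the remaining three candidates represent different trade-offs a user could choose between.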

21.
arXiv (quant-ph) 2026-06-16

Enhanced Sensitivity near a Quantum Exceptional Point in the Absence of Engineered Dissipation

arXiv:2606.16060v1 Announce Type: new Abstract: Non-Hermitian systems exhibit phenomena absent from Hermitian systems, including exceptional points (EPs), at which two or more eigenvectors coalesce. Conventional implementations rely on gain and loss, which strongly limit quantum coherence. Here, following a proposal by Wang and Clerk (PRA 2019), we realize a closed four-mode quantum system that emulates the dynamics of a PT dimer - two coupled resonators with balanced gain and loss - without engineered dissipation. The four modes are implemented as harmonics of a superconducting coplanar-waveguide resonator, with parametric couplings engineered using a current-pumped SNAIL. We use this device as a sensor for small variations in the PT dimer coupling strength. From signal-to-noise-ratio measurements, we observe enhanced sensitivity near the EP in a non-quantum-limited regime.
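
For intuition about why sensitivity is enhanced near an exceptional point, consider the standard textbook effective Hamiltonian of a PT dimer, H = [[iγ, g], [g, −iγ]], with coupling g and balanced gain/loss rate γ. Its eigenvalues are ±sqrt(g² − γ²), which coalesce at g = γ. The sketch below uses that model only; it is an assumption for illustration and does not describe the four-mode device in the abstract. Function names are hypothetical.

```python
import cmath

def pt_dimer_eigenvalues(g, gamma):
    """Eigenvalues of [[i*gamma, g], [g, -i*gamma]]: +/- sqrt(g^2 - gamma^2)."""
    root = cmath.sqrt(g * g - gamma * gamma)
    return root, -root

def splitting(g, gamma):
    """Magnitude of the difference between the two eigenvalues."""
    lam_plus, lam_minus = pt_dimer_eigenvalues(g, gamma)
    return abs(lam_plus - lam_minus)

gamma = 1.0
for eps in (1e-2, 1e-4, 1e-6):
    # Near the EP the splitting grows like sqrt(eps), far faster than eps itself.
    print(eps, splitting(gamma + eps, gamma))
```

For g below γ the eigenvalues are purely imaginary (the PT-broken phase), and for g above γ they are real; the square-root dependence on the perturbation at the boundary is the usual argument for EP-based sensing.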

22.
arXiv (CS.CL) 2026-06-16

Enhancing LLM Safety Through a Theoretical Minimax Game Lens

The rapid advancement of large language models (LLMs) necessitates effective mechanisms to ensure their responsible deployment by accurately distinguishing unsafe content from benign content. While substantial safety datasets are available in English, multilingual safety modeling remains underexplored due to limited open-source safety datasets in other languages. Even within English datasets, safe yet sensitive corner-case content is scarce, leading to shortcut learning by models and non-trivial false-positive rates. To mitigate these issues, we introduce a novel minimax reinforcement learning (RL) framework wherein a data generator and a classifier model co-evolve, facilitating the production of high-quality synthetic multilingual safety data. We theoretically formalize this interaction as a minimax game and rigorously demonstrate convergence to a Nash equilibrium. Empirical evaluations confirm that our synthetic data generation method significantly enhances the classifier model performance, enabling a substantially smaller model to surpass the state-of-the-art by nearly 10% on English benchmarks while achieving 4.5x faster inference speed. These results establish a scalable and efficient methodology for synthetic data generation, advancing the development of safer and more robust multilingual LLM deployments.

23.
arXiv (CS.AI) 2026-06-11

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

arXiv:2606.11918v1 Announce Type: new Abstract: Current Large Reasoning Models (LRMs) exhibit remarkable general capabilities but significantly underperform in spatial reasoning tasks. Existing approaches treat this gap as a knowledge deficit, relying on supervised fine-tuning (SFT) to ingest labeled spatial data from external vision sources or synthetic engines. In contrast, we argue that for many tasks, spatial reasoning capabilities are already present in pre-trained LRMs but require alignment through logical coherence under geometric 2D and 3D constraints. In this work, we propose a self-supervised reinforcement learning (RL) framework that targets the internal reasoning process without requiring ground-truth annotations. By formalizing the notion of consistency verifiers – reward functions that check for geometric and semantic consistency under transformations – we demonstrate that models can improve their spatial reasoning abilities. We use both image transformations, like flipping, and textual transformations, like swapping the order of objects in the question, and propose a new optimal transport-based RL strategy, OT-GRPO, which is a minimal-matching variant of group relative policy optimization tailored to pairwise verifiers. We show that this label-free consistency training approaches the accuracy of models trained with ground-truth supervision and achieves similar generalization across diverse tasks and data domains.
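
The idea of a pairwise consistency verifier can be illustrated with a toy example. The sketch below is not the paper's implementation: the function names, the answer vocabulary, and the binary reward are all assumptions chosen for illustration. It rewards a model when its answer to a horizontally flipped image is the mirror of its answer to the original, without consulting any ground-truth label.

```python
# Hypothetical pairwise consistency verifier for a horizontal image flip.
# All names here are illustrative assumptions, not from the paper.

# Under a horizontal flip, "left" and "right" swap; other relations are unchanged.
FLIP_MAP = {"left": "right", "right": "left"}


def expected_after_flip(answer: str) -> str:
    """Return the answer that would be consistent after a horizontal flip."""
    return FLIP_MAP.get(answer, answer)


def consistency_reward(answer_original: str, answer_flipped: str) -> float:
    """Return 1.0 if the two answers are mutually consistent, else 0.0.

    No ground-truth label is used: the reward depends only on whether the
    pair of model outputs agrees with the known effect of the transformation.
    """
    a = answer_original.strip().lower()
    b = answer_flipped.strip().lower()
    return 1.0 if expected_after_flip(a) == b else 0.0
```

A pair such as `("left", "right")` is rewarded, while `("left", "left")` is not. Note that a model could be consistently wrong and still earn the reward, so consistency is a proxy signal rather than a correctness check.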

24.
arXiv (CS.AI) 2026-06-24

Inclusive Interactive Collisions for Multi-View Consistent Compositional 3D Generation

arXiv:2606.24206v1 Announce Type: cross Abstract: Recent breakthroughs in 3D generation have followed the development of text-to-image diffusion models. However, existing methods face two practical challenges: (1) They primarily generate a single 3D object and struggle to generate multi-object compositional 3D assets, because they do not model how Gaussian primitives should interact plausibly. (2) They often suffer from cross-view inconsistency during 3D optimization, because Score Distillation Sampling inherently operates on each view independently, which inevitably results in cross-view hallucinations. To address these issues, we propose I2C-3D, a novel optimization-based method for generating multi-view consistent compositional 3D assets with reasonable interactions. Specifically, we propose an Inclusive Interactive Collisions strategy that guides Gaussian primitives to appear naturally in reasonable interaction regions, thereby ensuring that objects in the compositional scene interact in a physically plausible and visually coherent way. Additionally, to enhance multi-view consistency, we devise Multi-View Adaptive Score Distillation Sampling, which distills a multi-view consistency prior and a layout prior from a pre-trained diffusion model by modulating the attention maps of instance tokens and spatial tokens across viewpoints. With these designs, I2C-3D not only generates high-fidelity, multi-view consistent compositional 3D assets but also supports flexible 3D editing, facilitating complex scene generation. Extensive experiments demonstrate that I2C-3D outperforms existing methods in generation quality and multi-view consistency.

25.
arXiv (CS.LG) 2026-06-25

From Forecasting Leaderboards to Deployment Decisions: A Fail-Closed Certification Protocol

arXiv:2606.24996v1 Announce Type: new Abstract: Forecasting leaderboards rank models by predictive quality, but their winners are often read as deployment-ready top-1 advice. That reading can fail when forecasts are passed through a fixed decision interface, such as an alert threshold, a top-k budget, or a switching-cost policy. We study when a forecast-side winner can be certified as deployment-actionable for a specified interface and deployed utility. We introduce a fail-closed certification protocol whose gates are sufficient evidential conditions for a strong claim: a friction-caused, non-tie, statistically supported, and recurrent deployment-side reversal. Traffic-Hourly provides a certified anchor: winners agree at zero friction, but positive switching friction makes the forecast winner deployed-suboptimal. A locked native audit tests overclaiming: across 22 verified candidates and 362 full-grid cells, 155 apparent forecast/deployment winner inversions are blocked before certification. The contribution is not a new forecaster, metric, or universal utility, but a conservative protocol for deciding when forecasting leaderboard winners should be read as deployment-actionable top-1 advice.
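
The fail-closed behaviour can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's protocol: the `Evidence` class, its field names, and the `certify` function are hypothetical, and each gate is reduced to a single boolean. The four fields mirror the conditions named in the abstract (friction-caused, non-tie, statistically supported, recurrent). The key property is that missing evidence blocks certification rather than passing by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical evidence record; None means the gate was not evaluated."""
    friction_caused: Optional[bool] = None
    non_tie: Optional[bool] = None
    statistically_supported: Optional[bool] = None
    recurrent: Optional[bool] = None


def certify(evidence: Evidence) -> bool:
    """Certify a deployment-side reversal only if every gate is explicitly True.

    Fail-closed: a gate that is False or unevaluated (None) blocks the claim.
    """
    gates = (
        evidence.friction_caused,
        evidence.non_tie,
        evidence.statistically_supported,
        evidence.recurrent,
    )
    return all(gate is True for gate in gates)
```

With this shape, `certify(Evidence(True, True, True, True))` passes, while a record with any gate left unset is blocked. A fail-open design would instead treat unevaluated gates as passing, which is the overclaiming risk the abstract's audit is meant to catch.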