Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-16

SimSiam Naming Game: A Unified Approach for Emergent Communication and Representation Learning

Emergent Communication (EmCom) investigates how agents develop symbolic communication through interaction without predefined language. Recent frameworks, such as the Metropolis–Hastings Naming Game (MHNG), formulate EmCom as the learning of shared external representations negotiated through interaction under joint attention, without explicit success or reward feedback. However, MHNG relies on sampling-based updates that suffer from high rejection rates in high-dimensional perceptual spaces, making the learning process sample-inefficient for complex visual datasets. In this work, we propose the SimSiam Naming Game (SSNG), a feedback-free EmCom framework that replaces sampling-based updates with a symmetric, self-supervised representation alignment objective between autonomous agents. Building on a variational inference–based probabilistic interpretation of self-supervised learning, SSNG formulates symbol emergence as an alignment process between agents' latent representations mediated by message exchange. To enable end-to-end gradient-based optimization, discrete symbolic messages are learned via a Gumbel–Softmax relaxation, preserving the discrete nature of communication while maintaining differentiability. Experiments on CIFAR-10 and ImageNet-100 show that the emergent messages learned by SSNG achieve substantially higher linear-probe classification accuracy than those produced by referential games, reconstruction games, and MHNG. These results indicate that self-supervised representation alignment provides an effective mechanism for feedback-free EmCom in multi-agent systems.

02.
arXiv (CS.AI) 2026-06-11

Sparsified Kolmogorov-Arnold Networks for Interpretable Quantum State Tomography

arXiv:2606.11814v1 Announce Type: cross Abstract: Machine-learning approaches to quantum state tomography can achieve high reconstruction fidelity, but the physical structure used by the trained model often remains implicit. Here we ask whether a sparsified Kolmogorov-Arnold Network (KAN) can be used not only as a regressor, but also as an inspectable reconstruction rule whose internal organization can be checked against known Pauli structure. We study a controlled three-qubit GHZ-family benchmark in which all 63 non-identity Pauli expectation values are used to reconstruct three GHZ-subspace variables: the population imbalance $z$, the real off-diagonal component $c$, and the imaginary off-diagonal component $s$. Under finite-shot sampling and depolarizing noise, external ablation identifies the extended 12-channel GHZ-relevant Pauli set from the 63 measurements, with exact top-12 recovery across the tested shot counts and depolarizing-noise strengths. These support patterns remain stable across multi-seed random-initialization and noise-level analyses, and collapse under random-label controls. The dominant pruned input-hidden-output pathways organize Z-type population observables and X/Y off-diagonal observables in a pattern consistent with the analytic GHZ Pauli grouping, and sparse formula recovery recovers the canonical signed Pauli relations. The contribution of the KAN is therefore pathway-level structural interpretability within a neural reconstruction model, rather than superior sparse regression. Together with negative controls, these probes provide a consistency chain for auditing learned reconstruction rules against known physical structure.

03.
arXiv (CS.AI) 2026-06-16

TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate

arXiv:2605.13909v2 Announce Type: replace-cross Abstract: Negotiation is a central mechanism of economic exchange, shaping markets, procurement, labor agreements, and resource allocation. It is also a canonical testbed for agentic language models, requiring multi-turn interaction under hidden preferences, strategic communication, and binding constraints. These properties make negotiation hard to evaluate: unlike math or code, it has no intrinsic verifier. Existing LLM negotiation evaluations rely on LLM-vs.-LLM interaction or aggregate outcomes such as deal rate, leaving failures opaque. We introduce Terms-Bench, short for Testbed for Economic Reasoning in Multi-turn Strategy, a Bayesian-game framework that makes the environment itself the verifier by specifying the counterpart's latent type, policy, and payoff structure. We instantiate it in bilateral price negotiation, where the counterpart's private state and simulator policy are hidden from the agent but observable to the evaluator. This turns the counterpart from a black-box opponent into a diagnostic instrument, enabling agent-attributable failure analysis and oracle-reference optimality gaps. Evaluating 13 LLM agents spanning frontier systems from major providers, Terms-Bench turns negotiation evaluation from aggregate ranking into actionable diagnosis: where agents fail, why they fail, and what to strengthen. Empirically, frontier models saturate deal rate yet diverge in surplus extraction, cue use, belief calibration, and compliance, revealing agent-specific bargaining bottlenecks masked by prior benchmarks.

04.
arXiv (CS.CL) 2026-06-11

Cross-Layer Discrete Concept Discovery for Interpreting Language Models

Interpreting language models remains challenging due to the existence of residual stream, which linearly mixes and duplicates features across adjacent layers, causing single-layer analyses to miss this cross-layer structure. Cross-layer sparse autoencoders (SAEs) address layer mixing but operate in continuous space, where concepts split across many neurons without clear boundaries. We introduce Cross-Layer Vector Quantized-Variational Autoencoder (CLVQ-VAE), a novel framework which maps representations from a lower layer to a higher layer through a discrete vector-quantization bottleneck, collapsing duplicated residual-stream features into compact, interpretable concept vectors. Our approach combines top-k temperature-based sampling with exponential moving average (EMA) codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. Across both encoder- and decoder-based models on ERASER-Movie, Jigsaw, and AGNews, CLVQ-VAE outperforms clustering, single-layer vector quantized-variational autoencoder (VQ-VAE), and sparse autoencoder (SAE) baselines across three evaluation axes: removing identified concepts drops model accuracy by up to 93%, LLM judges rank our concepts first in 66.7% of comparisons, and human annotators recover model predictions from our visualizations with 78% accuracy versus 54% for clustering.

06.
medRxiv (Medicine) 2026-06-16

Physiological Aging of the Respiratory System (PARS): from development to application

Background: Aging has a critical role in lung changes and the outcome of lung disease. Several lung aging equations have been proposed to measure deviation from physiological aging of the respiratory system. In this study, we aimed to develop a single measure of accelerated lung aging and show its application as a measure of lung aging. Method: We used a pre-bronchodilator pulmonary function test (PFT) from NHANES adult participants recruited from 2007 to 2011. We applied Klemera-Dubal Method (KDM) to four PFT measurements, FEV1, FVC, FEF25-75, and PEF, to calculate a measure of lung biological aging. Physiological Aging of the Respiratory System (PARS) was calculated from the residual method vs. chronological age. We tested the construct validity of PARS by measuring its association with risk factors of lung health. The prognostic validity was measured using a survival analysis. Sampling weights were applied to all analyses. Results: In 14,123 adult participants, the mean (SD) of accelerated lung age (PARS) was 0 (8.2) years. Participants with a history of asthma and emphysema had 4- and 10-year higher PARS. Cigarette smoking, lower socioeconomic status, black race, higher serum cadmium, and lower serum selenium and magnesium were associated with higher PARS. During 116 months of follow-up, PARS was associated with a higher mortality (HR = 1.06, 95%CI: 1.05-1.07 per year). Females with higher PARS had a higher risk of death (P for interaction < 0.001). Results were consistent across different subgroups and sensitivity analyses. Conclusion: PARS is a noninvasive lung aging marker and can be applied as a single measure of lung accelerated aging in the adult population. Its strong construct and predictive validity support its future application among different populations with and without lung disease.

07.
arXiv (CS.LG) 2026-06-19

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

arXiv:2606.20055v1 Announce Type: new Abstract: Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.

08.
arXiv (CS.CV) 2026-06-24

Full-resolution MLPs Empower Medical Dense Prediction

Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation. Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks.

09.
arXiv (CS.CV) 2026-06-18

URDF Synthesis from RGB-D Sequences via Differentiable Joint Inference and Energy-Consistent Verification

作者:

Reconstructing simulation-ready digital twins of articulated objects from sensor observations remains constrained by two persistent gaps: (i) part-level geometric reconstruction is decoupled from kinematic-parameter estimation, and (ii) the recovered models often violate basic dynamic invariants such as energy conservation, leading to drift when the URDF is replayed in physics simulators. We present KinemaForge, a constraint-driven pipeline that jointly infers part-level shape, joint topology, and joint parameters from short RGB-D sequences and validates the result against an energy-consistent verifier built on differentiable rigid-body dynamics. The pipeline introduces three components: a kinematic constraint graph that encodes joint-part incidences as soft edges; a differentiable screw-axis solver that backpropagates from rendered observations through Featherstone's articulated-body algorithm to joint parameters; and an energy residual loss that penalises non-physical free responses of the reconstructed model. Across five PartNet-Mobility categories and an internal RGB-D benchmark, KinemaForge reduces the average joint-axis error from 4.52 degrees to 2.83 degrees (-37.4%) over the strongest geometric baseline (PARIS) and from 5.30 degrees to 2.83 degrees (-46.6%) over the interaction-based Ditto baseline, lowers long-horizon simulation drift by 64% (vs. PARIS) over 50 s rollouts, and yields URDFs whose closed-loop manipulation success rate improves by 14.6 percentage points over Ditto in our preliminary evaluation. Code and reconstruction data will be released upon acceptance.

10.
arXiv (CS.CV) 2026-06-17

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

Reconstructing fast-dynamic scenes from multi-view videos is crucial for high-speed motion analysis and realistic 4D reconstruction. However, the majority of 4D capture systems are limited to frame rates below 30 FPS (frames per second), and a direct 4D reconstruction of high-speed motion from low FPS input may lead to undesirable results. In this work, we propose a high-speed 4D capturing system only using low FPS cameras, through novel capturing and processing modules. On the capturing side, we propose an asynchronous capture scheme that increases the effective frame rate by staggering the start times of cameras. By grouping cameras and leveraging a base frame rate of 25 FPS, our method achieves an equivalent frame rate of 100-200 FPS without requiring specialized high-speed cameras. On processing side, we also propose a novel generative model to fix artifacts caused by 4D sparse-view reconstruction, as asynchrony reduces the number of viewpoints at each timestamp. Specifically, we propose to train a video-diffusion-based artifact-fix model for sparse 4D reconstruction, which refines missing details, maintains temporal consistency, and improves overall reconstruction quality. Experimental results demonstrate that our method significantly enhances high-speed 4D reconstruction compared to synchronous capture.

11.
arXiv (CS.AI) 2026-06-16

ROSA-RL: Uncertainty-Aware Roundabout Optimized Speed Advisory with Reinforcement Learning

arXiv:2606.16558v1 Announce Type: new Abstract: Roundabouts challenge automated driving in mixed traffic, as heterogeneous and non-deterministic human behavior, unknown driving intentions, and high interaction complexity create uncertainty about whether the conflict zone will be blocked or available at the moment of entry. We present ROSA-RL – uncertainty-aware Roundabout Optimized Speed Advisory with Reinforcement Learning. It enables safe and efficient roundabout entry for automated and human-driven vehicles in mixed traffic through probabilistic conflict forecasting. A Transformer-based model predicts conflict zone occupancy over a five-second horizon, capturing multi-agent interactions to anticipate upcoming conflicts and available gaps. The prediction outputs encode uncertainty in future motion and intent, and augment the state of a classical RL framework, enabling uncertainty-aware speed coordination. Evaluated in simulations grounded in real-world data, ROSA-RL can effectively handle uncertainty and outperform a comparable model-based baseline, closing the gap to an ideal setting assuming fully known occupancy while improving traffic efficiency and safety. The source code of this work is available under: github.com/urbanAIthi/ROSA-RL.

12.
arXiv (CS.CL) 2026-06-19

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

LLMs can personalize education, although current static-prompt tutoring systems struggle to adapt to diverse academic disciplines. We develop and test a system with subject-aware prompting, based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulation environment, and then deploy it for online adaptation with actual high-school students. The simulation benchmark shows the router outperforming two static baselines ($0.694$ vs. $0.647$ and $0.64$, $p

13.
arXiv (CS.AI) 2026-06-15

I'm Sorry Driver, I'm Afraid I Can't Do That: Appraising the Safety of LLMs within Automotive Contexts

arXiv:2606.14327v1 Announce Type: cross Abstract: This paper appraises recent frameworks within AI development to integrate LLMs into control tasks in automotive contexts from the perspective of safety assurance. This work has built upon the rapid integration of LLMs across automotive settings. However, we find that at present, these frameworks face significant challenges, limiting their efficacy in real-time safety-critical contexts. Firstly, we consider conceptual challenges, including the fact that deployers are faced with a dual challenge, wherein they must assure a model which has been developed upstream, i.e. as general-purpose tools by the large AI labs, in a downstream context, i.e. into specific vehicle architectures. Secondly, we consider concrete challenges from across existing standards. We show that there are currently both fundamental engineering constraints covered in ISO21448, such as latency, and novel LLM-specific issues, such as alignment-related issues covered in ISO/PAS8800. We ground both examples in a concrete introductory, experimental case study exploring an existing open-source repository, Talk2Drive. We present a safety argument in order to make explicit the limitations of existing solutions. Nonetheless, given that the use of LLMs in automotive contexts is being explored at a technical level and operationalised, we propose potential assurance mechanisms for LLM-related hazardous events going forward.

14.
arXiv (CS.AI) 2026-06-15

Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech

arXiv:2606.13989v1 Announce Type: cross Abstract: Recent alignment-free non-autoregressive (NAR) text-to-speech (TTS) models formulate synthesis as a conditional infilling task, bypassing explicit duration predictors and external aligners. When speech is represented with neural codec tokens, the infilling problem becomes discrete, making Discrete Flow Matching (DFM), a Continuous-Time Markov Chain (CTMC) framework for discrete generation, a natural fit. However, inference-time control for stable low-step conditional infilling remains underexplored. We propose Mask, Sample, Revise, an inference-time CTMC stack for alignment-free DFM-TTS. The stack combines predictor-free guidance to strengthen text conditioning, prompt-matched conditional coupling to align the probability path with the acoustic prompt, and SC-ReMask, a schedule-constrained remasking mechanism that introduces token-to-mask transitions so early de-masking decisions can be revised. These components require no post-hoc fine-tuning and operate in a single tau-leaping sampler. Controlled ablations show that this stack improves intelligibility and robustness in the low-NFE prompted setting, outperforming unguided and guidance-only samplers with substantially more steps.

15.
arXiv (CS.CL) 2026-06-16

ttda704 at SemEval-2026 Task 4: Modeling Narrative Structures via Pseudonymization and Multi-View Sentence Alignment

We present our approach to SemEval 2026 Task 4: Narrative Story Similarity and Narrative Representation Learning. Our solution uses contrastive learning with fine-tuned sentence transformers to capture narrative similarity across abstract themes, course of action, and outcomes. We develop two pipelines: (Track A) a single-view method that encodes full narratives with smart layer freezing to reduce overfitting, and (Track B) a multi-view method that models theme, plot, and outcome with view-specific projection heads and self-supervised alignment. Both pipelines build on sentence-transformers models and are trained with contrastive loss on synthetic data. The code is available at the following GitHub repository: https://github.com/dinhthienan33/SemEval2026-Task4-ttda704.

16.
arXiv (CS.LG) 2026-06-18

Riemannian MeanFlow for One-Step Generation on Manifolds

arXiv:2603.10718v3 Announce Type: replace Abstract: Flow Matching enables simulation-free training of generative models on Riemannian manifolds, yet sampling typically still relies on numerically integrating a probability-flow ODE. We propose Riemannian MeanFlow (RMF), extending MeanFlow to manifold-valued generation where velocities lie in location-dependent tangent spaces. RMF defines an average-velocity field via parallel transport and derives a Riemannian MeanFlow identity that links average and instantaneous velocities for intrinsic supervision. We make this identity practical in a log-map tangent representation, avoiding trajectory simulation and heavy geometric computations. For stable optimization, we decompose the RMF objective into two terms and apply conflict-aware multi-task learning to mitigate gradient interference. RMF also supports conditional generation via classifier-free guidance. Experiments on spheres, tori, SO(3), and SE(3) demonstrate competitive one-step sampling with improved quality-efficiency trade-offs and substantially reduced sampling cost.

17.
arXiv (CS.LG) 2026-06-17

Finsler Geometry, Graph Neural Networks, and You

arXiv:2606.17185v1 Announce Type: new Abstract: Graph neural network architectures based on the graph Laplacian approximate the Laplace-Beltrami operator, thus limiting their application to isotropic operators. As a nonlinear alternative to the Laplace-Beltrami operator, we consider estimates of the Finsler Laplacian on point clouds sampled from a manifold. We prove that these discrete estimates converge to the true operator on the manifold as the number of point samples grows. Moreover, we show that this operator can be expressed as a graph neural network layer, which we use to define a family of Finslerian graph neural networks constrained to express Finsler geometry. We show that Finslerian graph neural networks recover the geometry underlying nonlinear diffusion equations in practice.

18.
arXiv (quant-ph) 2026-06-24

A no-go theorem for privacy in distributed sensing using Gaussian states

arXiv:2606.23796v1 Announce Type: new Abstract: In the discrete variable setting, entangled resource states allow a set of parties to learn a global function of a set of spatially separated systems, whilst keeping the local parameters of those systems completely private. In the continuous variable setting, distributed sensing has been carried out using Gaussian resource states, but without the same guarantees about privacy. Here, we show that perfect privacy is impossible to achieve for any distributed sensing protocol that uses Gaussian states as a resource. We also introduce a measure of relative privacy, bounding the degree to which any Gaussian distributed sensing protocol can keep local parameters hidden.

19.
arXiv (CS.CV) 2026-06-18

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. By modelling correlation and intent from the historical and future trajectories of the pedestrian, it will usually result in a multimodal (i.e. multiple modes) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gesture, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module to model the future trajectory distributions on two modes, crossing and non-crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic, which can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

20.
arXiv (CS.LG) 2026-06-16

SDVDiag: Multimodal Causal Discovery for Online Diagnosis in Software-defined Vehicles

arXiv:2606.15559v1 Announce Type: cross Abstract: The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and consistently outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, showing a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.

21.
medRxiv (Medicine) 2026-06-15

Multidimensional nutritional assessment in Crohns disease: cross-sectional comparison of active disease and remission

Malnutrition is common in Crohns disease (CD), and its assessment requires multiple tools. Comprehensive evaluation of nutritional status in a population with CD, predominantly characterized by metabolic phenotype, was inadequately reported. This study evaluated the nutritional status of CD patients using anthropometric, clinical, and biochemical measures and compared patients with active disease with those in remission. This cross-sectional study included 127 adults with CD: 63 with active disease and 64 in remission. Disease activity was classified using the Crohns Disease Activity Index, the Simple Endoscopic Score for Crohns Disease, and magnetic resonance enterography. Nutritional assessment included body mass index (BMI), mid-upper arm circumference, calf circumference, triceps skinfold thickness, mid-arm muscle circumference, Mini Nutritional Assessment-Short Form (MNA-SF), and biochemical markers including hemoglobin, serum iron, folate, vitamin B12, albumin, and zinc. Malnutrition was defined using the Global Leadership Initiative on Malnutrition criteria. Overall, 47.2% of participants were malnourished. Malnutrition was significantly more frequent in active disease than in remission (81.0% vs. 14.1%, P

22.
arXiv (CS.CL) 2026-06-17

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has become a popular paradigm, in which two responses to the same prompt are compared and the resulting judgments are aggregated into an overall ranking. A central challenge of this paradigm is intransitivity: the induced comparison outcomes may fail to support any coherent global ranking. For example, one may observe cyclic preferences such as $A \succ B \succ C \succ A$, or inconsistencies involving ties such as $A \equiv B\equiv C\neq A$. Such contradictions make the resulting leaderboard unstable and challenging to interpret. In this paper, we propose a prompt perturbation framework for improving the consistency of pairwise LLM evaluation. Our approach generates perturbed variants of each prompt, uses the resulting comparison graphs to identify and filter out structurally inconsistent comparison patterns, and then applies standard ranking methods to the filtered comparisons. A key feature of the proposed framework is that graph-level structural consistency is incorporated explicitly into the evaluation pipeline before ranking aggregation. This provides a simple and principled way to reduce cyclic inconsistencies and improve the reliability of LLM rankings.

23.
arXiv (CS.CV) 2026-06-16

Dehaze-GaussianImage: Zero-Shot Dehazing via Efficient 2D Gaussian Splatting Representation

Existing single image dehazing methods are often constrained by computational redundancy in pixel-level optimization and the lack of physical interpretability in implicit neural networks. These limitations hinder the balance between representation efficiency and reconstruction fidelity. To address these issues, we propose Dehaze-GaussianImage, the first zero-shot framework that introduces 2D Gaussian Splatting (2DGS) into the image dehazing domain to break the traditional pixel-grid processing paradigm. Distinct from static convolutional neural networks (CNNs) or Transformers, our approach models hazy images as continuous and dynamically evolvable anisotropic Gaussian fields. Specifically, we propose a novel reconstruction-decoupling zero-shot learning strategy that embeds the atmospheric scattering model into the Gaussian parameter space. This strategy drives Gaussian primitives to adaptively split, clone, and prune during optimization, achieving geometric-level decoupling of the transmission medium and clear textures. Furthermore, explicit structure-preserving constraints are introduced to suppress artifacts commonly caused by traditional physical priors. Experimental results demonstrate that the proposed method achieves state-of-the-art (SOTA) performance in a fully unsupervised manner with minimal parameters, highlighting the potential of explicit Gaussian representation for low-level vision tasks.

24.
arXiv (CS.LG) 2026-06-24

A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input

arXiv:2505.14251v2 Announce Type: replace Abstract: We study the problem of differentially private second moment estimation and present a new algorithm that achieve strong privacy-utility trade-offs even for worst-case inputs under subsamplability assumptions on the data. We call an input $(m,\alpha,\beta)$-subsamplable if a random subsample of size $m$ (or larger) preserves w.p $\geq 1-\beta$ the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm \alpha$. Building upon subsamplability, we give a recursive algorithmic framework similar to Kamath et al 2019, that abides zero-Concentrated Differential Privacy (zCDP) while preserving w.h.p. the accuracy of the second moment estimation upto an arbitrary factor of $(1\pm\gamma)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $\mathcal{D}$, even when a noticeable fraction of the input are outliers.

25.
arXiv (CS.AI) 2026-06-17

An AI Security Agent for Banking: Multi-Vector Fraud and AML Detection Across Retail and Corporate Accounts

arXiv:2606.17555v1 Announce Type: cross Abstract: Banks simultaneously face signature-based fraud (card-not-present attacks, account takeover, ATM cloning) and behavioural financial crime (structuring, layering, mule networks, business email compromise) – two threat families with fundamentally different detection requirements. Static rule engines that reliably catch brute-force and high-velocity events are structurally blind to business-email-compromise (BEC) payment redirection, session hijacking, and money-laundering layering, which are engineered to appear indistinguishable from legitimate activity at the individual transaction or session level. This paper presents an AI security agent for retail and corporate banking that addresses this gap through a three-component fusion architecture operating on two parallel event streams: a transaction stream (card fraud, ACH/wire fraud, AML categories) and a session stream (account takeover, session hijacking, SIM-swap, insider abuse). Each stream combines an LSTM sequence model capturing per-account behavioural history, a statistical velocity/threshold monitor, and a graph/network module capturing account-counterparty relationship patterns (fan-in, fan-out, pass-through ratio) for money-laundering detection. Experiments on a synthetic event log of 237,669 transactions and 113,508 sessions across 13 threat categories and 3,470 simulated accounts demonstrate overall F1 of 0.787 (transaction stream) and 0.867 (session stream) for the proposed model, versus 0.562/0.733 for a rule-based baseline and 0.655/0.713 for an LSTM-only baseline. The agent includes a customer-facing transaction-verification chatbot (96.6% identity verification accuracy, 86.8% mass-reset attack detection) and an analyst case-summary assistant (99.3% action-recommendation F1), with Critical-tier automated response latency under 0.43 ms at the 95th percentile.