Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows

arXiv:2606.14790v1 Announce Type: cross Abstract: LLM-based multi-agent systems increasingly coordinate planning, reasoning, tool use, and human interaction, yet their reliability remains limited. A central source of this limitation is the underspecified prompt–harness boundary. Current systems lack a principled way to decide which workflow commitments should remain in prompts and which should become harness structure. We present XFlow, an executable protocol programming system for reliable multi-agent workflows, and XPF (XFlow Protocol Format), its domain-specific protocol programming language. XFlow occupies a middle position between prompt-only orchestration and markup-like workflow descriptions. XPF remains readable as a literate protocol, but it is compiled and executed as a program. Its design keeps informal semantic work inside actors while moving selected commitments into harness structure that can be checked, preserved, and enforced. At runtime, XFlow stages uncertainty through lifecycle-governed symbols, which are typed state cells with validation and commit states. Actor outputs are mediated before they become shared state, instead of spreading through prompts, transcripts, or implicit memory. Our experiments cover Constrained Interaction, Long-Context Reasoning, and Agentic Software Engineering. They show that XFlow improves reliability by making constraints, evidence handling, and process requirements explicit and enforceable.

02.
arXiv (CS.LG) 2026-06-16

Deep Learning-Based Lunar Crater Terrain Relative Navigation

arXiv:2606.14776v1 Announce Type: cross Abstract: Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from the monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by the consensus-based outliers removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft is off from its actual location up to 5 km, TRN could recover from this situation, achieving navigation error reduction to a few hundred meters. It should be noted that in order to maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.

03.
arXiv (CS.CV) 2026-06-16

FireRed-Image-Edit-1.0 Technical Report

We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt re-indexing. To stabilize optimization and enhance controllability, we propose Asymmetric Gradient Optimization for DPO, DiffusionNFT with layout-aware OCR rewards for text editing, and a differentiable Consistency Loss for identity preservation. We further establish REDEdit-Bench, a comprehensive benchmark spanning 15 editing categories, including newly introduced beautification and low-level enhancement tasks. Extensive experiments on REDEdit-Bench and public benchmarks (ImgEdit and GEdit) demonstrate competitive or superior performance against both open-source and proprietary systems. To support future research, our code, models, and benchmark suite are publicly available at https://github.com/FireRedTeam/FireRed-Image-Edit/ .

04.
arXiv (CS.CL) 2026-06-16

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility – yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position – a content-free artifact – shifts recommendations causally, worth about \$12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.

05.
arXiv (CS.AI) 2026-06-15

History of the Muddy Children Puzzle

arXiv:2606.13703v1 Announce Type: new Abstract: The Muddy Children Puzzle is a puzzle about knowledge and ignorance that has been inspiring for the development of epistemic logic. Who came up with it first? This is unclear. We trace the origin of the Muddy Children Puzzle through logical and literary publications over the past two centuries. The puzzle inspired a numerous variations such as involving numbers or coloured hats. We also present a novel hats puzzle involving self-reference.

06.
arXiv (CS.AI) 2026-06-17

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

arXiv:2606.06523v2 Announce Type: replace Abstract: Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods for specifying, verifying, and debugging their workflow and execution trajectories. This challenge mirrors a long-standing problem in mathematics, where the ambiguity of natural languages (NLs) motivates the development of formal languages (FLs). Inspired by this paradigm, we propose **Lean4Agent**, to the best of our knowledge, the first framework that uses Lean4, a dependent-type FL to model and verify agent behavior. **Lean4Agent** launches **FormalAgentLib**, an extensible Lean4 library for formally modeling and verifying agent workflows' semantic consistency under explicit assumptions, and enabling localization of execution-time failures revealed by trajectories. Building on **FormalAgentLib**, we further develop **LeanEvolve**, which applies results in **FormalAgentLib** to revise workflows to enhance its capability. Extensive experiments on a hard problem subset of SWE-Bench-Verified and a subset of ELAIP-Bench across 5 leading LLMs indicate that the verification-passing workflows outperform the failing ones by an average of **11.94%**, and **LeanEvolve** further improves SWE performance by **7.47%** on average. Furthermore, **Lean4Agent** establishes a foundation for a new field of using expressive dependent-type FL to formally model and verify agent behavior.

07.
arXiv (quant-ph) 2026-06-16

Towards Interpretability of Neural Quantum States

arXiv:2508.14152v2 Announce Type: replace Abstract: Neural quantum states (NQS) have emerged as a powerful variational ansatz for representing quantum many-body wave functions. Their internal mechanisms, however, remain poorly understood. We investigate the role of correlations for NQS-like quantum state representation by employing a correlation-based interpretable neural network architecture and then proving our observations using Boolean function theory. The correlator neural network demonstrates that, even for simple product states, up to all system-size correlation orders in the chosen computational basis are required to represent a quantum state faithfully. We explain these observations using Fourier expansion, which reveals the correlator basis as the effective basis of the internal NQS structure, the resulting necessity for high-order correlations that is supported by an entanglement bound that scales with the correlation order, consequences of linear dependencies in constrained Hilbert spaces for correlation requirements, and connections between spin basis rotations and the correlator basis. Furthermore, we analyze how neural networks achieve high correlation orders by increasing the magnitude of the network weights, which can be compensated by increasing the network depth. Lastly, we discuss how activation functions, network architectures, and choice of reference basis influence correlation requirements. Our results provide new insights and a better understanding of the internal structure and requirements of NQS, enabling a more systematic use of NQS in future research.

08.
arXiv (CS.AI) 2026-06-19

Mitigating Simplicity Bias in OOD Detection through Object Co-occurrence Analysis

arXiv:2605.07821v2 Announce Type: replace-cross Abstract: Out-of-distribution (OOD) detection is crucial for ensuring the reliability of deep learning models. Existing methods mostly focus on regular entangled representations to discriminate in-distribution (ID) and OOD data, neglecting the rich contextual information within images. This issue is particularly challenging for detecting near-OOD, as models with simplicity bias struggle to learn discriminative features in disentangled representations. The human visual system can use the co-occurrence of objects in the natural environment to facilitate scene understanding. Inspired by this, we propose an Object-Centric OOD detection framework that learns to capture Object CO-occurrence (OCO) patterns within images. The proposed method introduces a new OOD detection paradigm that understands object co-occurrence within an image by predicting disentangled representations for the test sample, then adaptively divides patterns into three scenarios based on object co-occurrence patterns observed in ID training data, and finally performs OOD detection in a divide-and-conquer manner. By doing so, OCO can distinguish near-OOD by considering the semantic contextual relationships present in their images, avoiding the tendency to focus solely on simple, easily learnable regions. We evaluate OCO through experiments across challenging and full-spectrum OOD settings, demonstrating competitive results and confirming its ability to address both semantic and covariate shifts. Code is released at https://github.com/Michael-McQueen/OCO.

09.
bioRxiv (Bioinfo) 2026-06-17

AMaNITA: an end-to-end workflow for native tRNA nanopore sequencing data analysis

Transfer RNA (tRNA) molecules serve as essential adapters during protein translation. While direct RNA sequencing (DRS) via Oxford Nanopore Technologies has emerged as a powerful platform for systematic tRNAome profiling, we currently lack a simple and robust statistical framework for nanopore tRNA data analyses. Here, we address this gap by developing AMaNITA (Abundance, Modifications, and Nanopore Intensity Toolbox Application), an end-to-end bioinformatic workflow that enables simplified, robust, and scalable analyses of nanopore native tRNA sequencing datasets. AMaNITA streamlines the entire analytical trajectory: from upstream processing (basecalling, mapping, filtering, batch effect correction) to downstream assessment of differential tRNA abundance and modification stoichiometry. The workflow generates an interactive HTML report for data exploration and analysis, allowing the user to download the source data files and resulting plots. AMaNITA can be executed using Singularity from the command line, without requiring installation of dependencies.

10.
arXiv (math.PR) 2026-06-15

Stability of the $k$-Plane Transform on Measures and Hölder-Type Comparisons of Wasserstein Metrics

arXiv:2605.00375v2 Announce Type: replace-cross Abstract: We establish stability estimates for the $k$-plane transform on finite positive Radon measures, with emphasis on Fourier and Wasserstein metrics. We first introduce a metric on $k$-plane transform data and prove a bi-Lipschitz stability estimate showing that this metric is equivalent to a generalized Fourier metric obtained by augmenting the Fourier distance between centered normalized measures with separate barycenter and total mass difference terms. Building on a Hölder-type comparison between Fourier and Wasserstein metrics due to Carrillo and Toscani, we extend this comparison to positive Radon measures under uniform bounds on centered moments of order slightly larger than $2$. This yields Hölder-type stability for the $k$-plane transform in a generalized $2$-Wasserstein metric and, in particular, a $W_2$-stability estimate for centered probability measures. We also compare the $2$-Wasserstein distance with its max-sliced analogue. For centered probability measures with uniformly bounded moments of order slightly larger than $2$, we prove a two-sided Hölder-type comparison between these distances. We then extend the result to positive Radon measures by applying it to centered normalized measures and adding separate barycenter and mass terms. Finally, for absolutely continuous compactly supported probability measures with bounded densities, we prove a strong equivalence between the $2$-Wasserstein distance of the measures and the $(k/2-1)$-order Sobolev norm of the $k$-plane transform data of the difference of their densities.

11.
arXiv (CS.CL) 2026-06-12

Emergence of Hierarchical Emotion Organization in Large Language Models

As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels, i.e., a psychological framework that argues emotions organize hierarchically, we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.

12.
arXiv (CS.AI) 2026-06-18

AI Sovereignty as National Learning Capacity: A Human-Centered Learning Mechanics Viewpoint on France, the United States, and China

Authors:

arXiv:2606.00729v2 Announce Type: replace Abstract: Artificial intelligence in France is often discussed through separate dimensions such as investment, compute, regulation, employment, sovereignty, and education. This viewpoint paper proposes a unified interpretation: France can be analyzed as a national AI learning system. Building on Human-Centered Learning Mechanics (HCLM), we use HCLM not as a validated econometric model, but as a conceptual and diagnostic lens for interpreting national AI development as a balance between information injection, absorptive capacity, and institutional dissipation. Information injection includes compute, data, talent, research, capital, industrial deployment, and policy experimentation. Institutional dissipation refers to avoidable frictions such as administrative overload, coordination failures, energy constraints, regulatory uncertainty, talent mobility pressures, and weak industrial absorption. Regulation is not treated as mere friction: adaptive governance, trusted data spaces, and safety-oriented standards may increase long-term learning capacity by improving legitimacy, interoperability, and social trust. The central claim is not that a country follows neural-network equations, but that AI sovereignty depends on how effectively it converts distributed information into absorbed, coordinated, and socially legitimate capability. The paper connects HCLM with neural scaling laws, endogenous growth theory, creative destruction, absorptive capacity, and coordination mechanisms. It offers a formal heuristic, policy indicators, illustrative scenarios, and implications for France. The numerical results are diagnostic scenarios, not econometric estimates or official rankings. The proposed viewpoint reframes AI policy as the governance of an open, strategic, non-equilibrium learning system that should be tested with historical and cross-country data.

13.
arXiv (CS.CV) 2026-06-16

Scribby: A Multi-Level LLM Framework for Semantic Video Analysis

As video content continues to expand across educational platforms, recorded lectures, and live-streamed entertainment, the need for efficient and structured analysis of long-form footage has increased [1]. Although many existing AI programs provide high-level video summaries based on AI-generated transcripts [2,3,4,5], these approaches are often limited to coarse overviews and lack detailed analysis of a video's structure, thematic progression, and semantic relationships, all of which are required for comprehensive video analysis. This paper proposes an LLM-based video summarization framework that balances macro-level comprehension with micro-level semantic analysis [6,12,13]. The first stage of the process indexes the video at a micro level by (1) analyzing the full transcript, (2) analyzing individual transcript sentences, and (3) grouping these sentences by semantic similarity using an LLM as a judge [6,13]. Contextual continuity is retained during sentence-level processing by incorporating both the global transcript analysis and adjacent sentence information into each evaluation prompt. This framework establishes a foundation for video analysis tools that visualize semantic chunking and semantic matching through relevance-based heatmaps. Limitations and future expansions of the framework are also discussed.

14.
bioRxiv (Bioinfo) 2026-06-19

ContinuumCellAgent: A Framework-Guided Agent for Long-Horizon Scientific Research

AI-scientist systems are beginning to automate parts of scientific research. We present ContinuumCellAgent, an autonomous agent that executes literature review, hypothesis formation, computational experimentation, manuscript drafting, and adversarial peer review as a single unattended run. Existing AI scientist systems remain difficult to diagnose because they lack modularity, systematic prompt grounding, and observability into long-running behavior. ContinuumCellAgent addresses these gaps with a modular supernode architecture for stage-wise backend swapping, protocols grounded in curated research-method checklists that also define reviewer rubrics, and a diagnostics layer that records file-based artifacts, message traces, and state transitions. We evaluate the system on open-domain QA benchmarks and biomedical/longevity case studies, showing that it can produce checkable research artifacts while exposing pipeline dynamics for rigorous AI co-scientist research.

15.
arXiv (CS.AI) 2026-06-12

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

arXiv:2606.13079v1 Announce Type: cross Abstract: Nowadays, the autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance, and cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Specifically, on the target-server side, we design two levels of target environments based on the number of secure services without known vulnerabilities deployed alongside a vulnerable service: Tier~1 (one secure service) and Tier~2 (three secure services), resulting in a total of 300 target servers. Meanwhile, the agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs, and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.

16.
arXiv (CS.CV) 2026-06-15

Avatar V: Scaling Video-Reference Avatar Video Generation

Generating avatar videos that are not merely visually similar to a target individual but behaviorally recognizable, faithfully reproducing their talking rhythm, gestural tendencies, and expression dynamics, remains an open challenge. Existing methods predominantly condition on single static images, which provide insufficient identity information and cannot capture dynamic motion traits, while standard pixel-level objectives underserve the perceptually critical facial regions that determine avatar fidelity. We present Avatar V, a production-scale framework that addresses these limitations through video-reference-conditioned identity modeling. Rather than compressing identity into fixed-size embeddings, the model conditions directly on the full token sequence of a reference video, learning to reproduce both static identity attributes (facial geometry, skin texture) and dynamic behavioral patterns (talking rhythm, micro-expressions) through attention over the reference context. We introduce Sparse Reference Attention, an asymmetric mechanism achieving linear-complexity conditioning on arbitrarily long references; a motion representation stream enabling closed-loop talking style transfer; and an identity-aware super-resolution refiner inheriting the full reference conditioning. These are supported by a data engine curating 100M+ training clips from 50M raw videos, and a five-stage training pipeline with flow matching pre-training, personality fine-tuning, two-phase distillation (>10x acceleration), and RLHF alignment, deployed across thousands of GPUs. Avatar V generates 1080p videos of unlimited duration, achieving state-of-the-art identity preservation, lip synchronization, and generation quality on our cross-scene benchmark, consistently outperforming leading systems including Seedance 2.0, Kling O3 Pro, Veo 3.1, and OmniHuman 1.5 in both automated metrics and human evaluation.

17.
medRxiv (Medicine) 2026-06-15

Toward a National Registry for Inborn Errors of Immunity in Peru: A Qualitative Implementation Study

Background: Peru lacks an integrated information system for patients with Inborn Errors of Immunity (IEI). Although disease registries are essential tools for data management and health planning, their success depends on implementation science approaches that account for local contextual factors. This study reports Phase I of a three-phase mixed-methods implementation project to design and develop a national IEI registry. Methods: Phase I consisted of a phenomenological qualitative study exploring stakeholder perspectives. Semi-structured focus groups and in-depth interviews were conducted with 29 key stakeholders across four groups: policy-makers, clinical experts, end-users (immunologists, residents, allied health personnel), and patient organization representatives. Interviews followed a guide structured around four a priori domains (structure, navigation, feasibility, and perception of existing systems). Discussions were conducted in Spanish, audio-recorded, transcribed verbatim, and coded using ATLAS.ti. A hybrid thematic analysis combining deductive and inductive coding was performed. Data elements proposed for the registry were triangulated with qualitative findings. Results: Thirty-six initial codes were consolidated into 15 categories, which were further integrated into four overarching themes conceptualized as pathways toward intention to use: (1) Environment, where governance, regulatory backing, and sustainable financing were identified as key enablers, while limited interoperability emerged as a structural barrier; (2) Technical Dimension, emphasizing usability, alignment with clinical workflow, and a hierarchical data architecture (demographic, clinical, therapeutic); (3) Users, highlighting clinical leadership, protected time, digital readiness, and perceived usefulness as stronger motivators than financial incentives; and (4) Patients, underscoring data protection, transparency, trust, and advocacy as essential for legitimacy and sustainability. Conclusions: A national IEI registry in Peru is perceived as necessary and feasible if implemented with strong regulatory foundations, interoperable design, robust data security, and user-centered architecture. These findings informed the development of an initial functional prototype and the operational plan for Phase II, focused on usability evaluation.

18.
arXiv (math.PR) 2026-06-19

Asymptotic properties for fully coupled delayed forward-backward stochastic differential equations

arXiv:2606.19925v1 Announce Type: new Abstract: We investigate the asymptotic behavior of solutions to a class of fully coupled forward-backward stochastic differential equations with time-delayed generators. Such systems arise naturally in stochastic models with memory effects and constitute a significant extension of the classical fully coupled FBSDE framework. The presence of delay introduces additional analytical difficulties due to the dependence of the coefficients on the past trajectories of the solution processes and the resulting non-Markovian structure. Under suitable assumptions on the coefficients, we study the asymptotic properties of a perturbed delayed FBSDE driven by a small noise parameter. We first establish the convergence in distribution of the associated solution processes as the perturbation parameter tends to zero. We then prove almost sure convergence towards the solution of the corresponding deterministic limiting system. As a consequence of these asymptotic results, we derive a large deviation principle for the solution processes. Our results extend the asymptotic analysis of Cruzeiro, Gomes and Zhang (2014) from the classical fully coupled FBSDE setting to the delayed framework, and complement existing works on weakly coupled delayed forward-backward systems. They provide, to the best of our knowledge, the first large deviation principle for fully coupled forward-backward stochastic differential equations with delayed generators.

19.
arXiv (CS.CL) 2026-06-19

LaViSA: A Language and Vision Structural Ambiguity Benchmark

Structural ambiguity arises when a single sentence admits multiple valid interpretations due to its syntactic structure, posing a fundamental challenge for language understanding. Visual scenes serve as useful cues for resolving such ambiguity, and Vision and Language Models (VLMs) need to be capable of deriving possible semantic interpretations from visual scenes. We introduce Language and Vision Structural Ambiguity (LaViSA), a benchmark designed to evaluate the ability of VLMs to resolve structural ambiguity leveraging visual scenes. LaViSA consists of ambiguous sentences, their disambiguated sentences, and corresponding images of these disambiguated sentences across seven ambiguity categories. Using LaViSA, we conduct a comprehensive evaluation of diverse VLMs, including both proprietary and open-source models with varying parameter scales and reasoning capabilities. Experimental results show that although recent VLMs can leverage visual scenes to resolve structural ambiguity to a some extent, they still struggle with certain ambiguity types and visually subtle semantic distinctions, indicating remaining limitations in resolving structural ambiguity using visual scenes.

20.
arXiv (CS.CV) 2026-06-16

Sinkhorn-CPD: Robust point cloud registration via unbalanced entropic optimal transport

Coherent Point Drift (CPD) is widely used for rigid point cloud registration because of its soft correspondences and closed-form parameter updates. However, CPD's target-side marginal constraint forces every observation, including outliers, to receive exactly unit probability mass. This assumption degrades registration accuracy under heavy outliers and partial overlap. Optimal transport (OT) methods can handle missing mass through unbalanced formulations, but require hand-tuned annealing schedules. In this paper, we propose Sinkhorn-CPD, which replaces CPD's target-side marginal constraint with dual Kullback-Leibler penalties, allowing the algorithm to discard outliers on both sides. The resulting formulation is a fully unbalanced entropic optimal transport problem, which can be efficiently solved by generalized Sinkhorn iterations. Moreover, Sinkhorn-CPD preserves the closed-form Procrustes and variance updates of CPD. In our method, the variance sigma^2 plays the role of the entropic regularization parameter, which induces an automatic annealing schedule from diffuse to sharp correspondences without manual temperature tuning. Experiments on synthetic, cross-category, and scan-to-CAD benchmarks show that Sinkhorn-CPD achieves state-of-the-art accuracy, with strong robustness to outliers and partial overlap.

21.
arXiv (CS.AI) 2026-06-15

Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models

arXiv:2606.14507v1 Announce Type: new Abstract: Fine-tuning vision-language models to emit dense coordinate lists improves visual grounding but also changes how models serialize, repeat, and terminate structured outputs. We study this behavior as a generation and control surface. In Gemma 4 12B, high-capacity q/k/v/o LoRA raises class-aware F1@0.3 from 0.007 to 0.448 while inducing repeated-tail pressure (duplicate rate 0.080, max repeat 23). A q/v rank sweep keeps max repeat at 21-22 across ranks 4-64, showing capacity persistence. The target signal is separable: object-level repeat-stop removes exact repeated records (duplicate rate 0.000, max repeat 1) while preserving F1 (0.494 to 0.490) and stricter F1@0.5 (0.381 to 0.385). Structure-axis probes localize the effect to bbox-coordinate object lists; dense non-bbox and spatial/count JSON remain repeat-clean, including under high-capacity adapters. Qwen3-VL-8B reproduces a clean controlled endpoint (F1@0.3 0.318, duplicate rate 0.000), and COCO 2017 reproduces acquisition plus duplicate pressure. Dense coordinate-list adaptation therefore creates a structure-bound, cross-family interference surface that can be measured and controlled.

22.
arXiv (CS.LG) 2026-06-16

A Compositional Framework for Open-ended Intelligence

arXiv:2606.15386v1 Announce Type: new Abstract: Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. We formalize open-ended intelligence as the closure induced by a finite primitive set \(P\) and a set of composition operators \(C\). We characterize properties of the induced closure \(\mathcal{L}(P,C)\) that support unbounded compositional generation across families of tasks and worlds. A mathematics of open-ended intelligence requires two pillars: a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., nearest neighbor), together with composition motifs (e.g., recursion, sequencing) that reflect an acquired compositional grammar. The closure of these two pillars enables the generation of infinite adaptive responses across a wide range of settings. The mathematics supports complementary research agendas, including evaluation metrics for explanation and interpretability, as well as building architectures where compositional generalization is native. We propose next primitive prediction as a novel architectural objective, where the training objective encourages the acquisition of reusable algorithmic primitives and their compositional grammar, such that new solutions are generated through recombination. Curriculum learning and self-play enable lifelong learning and expansion of the closure by discovering reusable primitives and transition motifs across families of tasks and worlds. We ground the framework through case studies in physics, evolution, and neuroscience.

23.
arXiv (CS.CL) 2026-06-19

Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families

Fine-tuning language models on insecure code induces emergent misalignment with poorly understood internal structure. We investigate whether this misalignment corresponds to a causally actionable activation-space direction shared across architectures. Across four instruction-tuned model families (Qwen2.5-1.5B, Gemma-2-2B, Llama-3.2-1B, Ministral-3-3B) finetuned identically, a difference-in-means direction achieves 99.6% separation of aligned and misaligned activations at each model's final layer. Causal steering by subtracting this direction reduces code spillover by 21-51 points, while a secure-code control confirms content specificity. Cross-architecture transfer via ridge regression maps yields large behavioral suppression (up to 46 points) but fails specificity controls as random and orthogonal directions perform comparably. We identify a two-tier specificity structure: within-model directions are causally specific and actionable; cross-model directions are causally real but non-specific. An asymmetric transfer topology emerges, with Gemma and Qwen acting as geometric donors and Llama as a receiver. These findings define the limits of linear cross-architecture correction and recommend within-model probing for auditing.

24.
arXiv (CS.CL) 2026-06-12

PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration. To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation. Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.

25.
arXiv (quant-ph) 2026-06-12

Reservoir-controlled electromagnetically induced gratings in a weakly driven two-level medium

arXiv:2606.13085v1 Announce Type: cross Abstract: We theoretically investigate the transmission and diffraction of a weak probe field from an electromagnetically induced grating formed in a weakly driven two-level medium coupled to engineered quantum reservoirs. Using a perturbative solution of the optical Bloch equations in the weak-driving regime, we analyze how normal-vacuum, thermal, and broadband squeezed-vacuum environments modify the probe susceptibility and consequently reshape both the spatial transmission function and the far-field diffraction patterns. We show that reservoir statistics have a pronounced impact on the diffraction response by altering the amplitude and phase of the induced grating. Thermal reservoirs enhance the transmission modulation and increase the intensity of the dominant diffraction orders, whereas squeezed-vacuum reservoirs generate strongly phase-sensitive modifications that selectively redistribute optical power among diffraction channels. We further demonstrate that the detuning between the squeezed reservoir and the driving field provides an efficient mechanism for controlling diffraction directionality, leading to substantial amplification of selected angular orders. In two-dimensional geometries, squeezed-vacuum correlations produce highly structured phase landscapes and strongly anisotropic diffraction patterns, enabling directional enhancement of specific diffraction channels while suppressing others. These results establish reservoir engineering as a versatile approach for controlling transmission, diffraction efficiency, and angular selectivity in minimal two-level systems, with potential applications in programmable photonic devices, beam steering, and quantum optical platforms.