Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
bioRxiv (Bioinfo) 2026-06-12

PHI-Reason: evidence-grounded species-level phage-host prediction from structured biological text profiles

Phage–host interaction (PHI) prediction is a fundamental problem in microbiology with applications in microbial ecology and microbiome engineering. Existing computational approaches typically convert phage and host information into numerical representations derived from sequence similarity, protein content, genome composition or reference databases, then score candidate hosts or train host-prediction models. Although effective, such representations often make it difficult to inspect which biological evidence supports a prediction. Here, we present PHI-Reason, a species-level PHI prediction framework that reformulates host prediction as constrained biological text reasoning. Instead of embedding phages and hosts directly as numerical vectors, PHI-Reason converts heterogeneous PHI-related evidence from phage genomes, host genomes, functional annotations, homology searches and biological metadata into modular natural-language profiles. A frozen large language model then performs species-level candidate-host ranking or pairwise PHI assessment by integrating the supplied evidence at inference time. Across species-level benchmarks, PHI-Reason achieved competitive host-prediction performance and recovered complementary correct assignments relative to established sequence- and reference-based methods. Its explicit profile design enabled systematic evidence perturbation and rationale-grounding analyses, showing that predictions depend on coherent multi-source biological evidence and that hallucination risk from unsupported or incomplete profiles can be made operationally measurable. These results position PHI-Reason as a constrained evidence-integration framework for species-level PHI prediction. Rather than replacing sequence-based predictors, it provides an interpretable layer that shows how far explicit biological evidence can support host inference, and where that evidence falls short.

02.
arXiv (CS.CV) 2026-06-25

Gastroendoscopy View Synthesis: A New Real Dataset and Evaluation

Novel view synthesis (NVS) is an active research topic in computer vision, owing to the success of neural radiance field (NeRF) and 3D Gaussian splatting (3DGS) methods. While NVS opens the door to potential applications in gastroendoscopy, such as extending the field of view of endoscopic images and enabling digital twins for 3D archiving and endoscopist manipulation training, the dataset is insufficient to evaluate NVS for gastroendoscopy. In this paper, we present the first real gastroscopy dataset for NVS, namely the GastroNVS dataset, which contains a set of gastroscopic images, camera poses, and a point cloud for real gastroendoscopy inspection. To assess the suitability of the GastroNVS dataset, we evaluate several 3DGS methods and discuss the challenges for future development. The dataset is available on request from our project page.

03.
arXiv (CS.CV) 2026-06-12

Diffusion Transformer World-Action Model for AV Scene Prediction

Action-conditioned world models let an autonomous vehicle predict future camera scenes from its own planned controls, enabling planning and simulation without real-world rollouts, but at compact, trainable scale the futures are ambiguous and the field's standard distortion metrics actively mislead: they reward a blurry regression mean over a realistic prediction. We confront this with a compact latent world model that, given the present front-camera latent and a sequence of ego-actions, predicts future scene latents a frozen decoder renders to $256 \times 256$ frames up to 8 seconds ahead, evaluated on 150 held-out nuScenes scenes. We first benchmark where to predict: across six frozen encoders spanning four representation families, V-JEPA2 with temporal context reduces steering RMSE by 40% over the best single-frame encoder. We then train a latent Diffusion Transformer (DiT) and, through a controlled diagnosis, identify the four ingredients it needs: spatial tokens, the $x_0$ objective, residual anchoring, and sampling matched to target uncertainty. In a Stable-Diffusion-VAE encode-predict-decode pipeline we expose the central tension: distortion metrics (cosine similarity, SSIM) favor the blurry mean, masking that the diffusion model is far closer to the real frame distribution. Inception-based FID and KID reveal a clean perception-distortion frontier: diffusion attains KID 0.078 versus 0.375 for regression ($4.8\times$ better), and a deployable train-derived calibration makes this practical without test-time ground truth. The model is genuinely action-controllable (steering drives scene displacement, Spearman $\rho = 0.81$, vs $-0.18$ for regression). We trace limited single-pass motion to a shared-present anchor and engineer a compact 1.7M-parameter "jump" model that recovers full ground-truth motion magnitude ($1.02\times$ GT), where single-pass models capture less than half.

04.
arXiv (CS.CV) 2026-06-16

Through-Foliage Surface-Temperature Reconstruction for Early Wildfire Detection

We present a method to reconstruct surface temperatures through forest vegetation by combining signal processing and machine learning, enabling fully automated aerial wildfire monitoring with drones for early fire detection. Synthetic aperture (SA) sensing reduces canopy occlusion but introduces thermal blur. To overcome this, we train a visual state space model to recover subtle thermal signals of partially occluded soil and fire hotspots from blurred data. To address limited real-world training data, we generate realistic surface temperature simulations using a latent diffusion model, temperature augmentation, and procedural thermal forest modeling. On simulated datasets, our method reduces RMSE by 2-2.5 versus conventional thermal and uncorrected SA imaging; in field experiments on hotspots, RMSE improved by 12.8-fold and 2.6-fold, respectively. Our approach also generalizes to other thermal signals, including human signatures, capturing morphology and extent – critical where simple thresholding fails – while conventional imaging struggles with partial occlusion.

05.
arXiv (quant-ph) 2026-06-24

q-Askey Deformations of Double-Scaled SYK

arXiv:2605.13956v2 Announce Type: replace-cross Abstract: We construct families of deformations of the double-scaled SYK (DSSYK) model and investigate their bulk interpretation. We introduce microscopic deformations of the SYK model which, after ensemble averaging and in the double-scaling limit, are described by a transfer matrix encoding the recurrence relations of basic orthogonal polynomials in the q-Askey scheme. For certain families of deformations in the semiclassical limit at finite temperature, the chord number (encoding Krylov complexity) corresponds to the length of an Einstein-Rosen bridge connecting an End-Of-The-World brane to an anti-de Sitter asymptotic boundary. By increasing one of the deformation parameters, the models eventually exhibit discrete energy levels, signaling a new geometric transition in sine dilaton gravity. Via the SYK-Schur duality, Krylov complexity also admits a representation-theoretic interpretation as the spread of the SU(2) spin in the index of an $\mathcal{N}=2$ SU(2) gauge theory. We study the operator algebras of the deformed theories. The algebras can be type II$_1$ or type I$_\infty$ factors, depending on the operators that are included. The entanglement entropy between the type II$_1$ algebras for a pure state manifests as an extremal surface through the Ryu-Takayanagi formula. We discuss connections between our results and the emergence of baby universes in the bulk.

06.
arXiv (CS.CV) 2026-06-16

Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their internal visual representations remain difficult to interpret. Sparse Autoencoders (SAEs) provide a scalable way to decompose dense model activations into sparse, interpretable features. However, existing SAE architectures primarily recover flat feature dictionaries and are less suited for explicit multi-level concept organization. In this paper, we introduce cascaded sparse autoencoders (CSAEs) for learning hierarchical visual concepts in MLLMs. Rather than nesting or stacking SAE sparse activation codes, CSAEs train a second-level SAE directly on the decoder weights of the first-level SAE, treating learned low-level feature directions as inputs for higher-level abstraction. This design enables CSAEs to learn "concepts of concepts" while avoiding drawbacks from the shared-prefix coupling of nesting, Matryoshka-style hierarchies and the bottlenecks of naively stacked SAEs. Experiments across Qwen3-VL, Gemma-3, and LLaVA on multiple visual datasets show that CSAEs improve interpretability in terms of hierarchical concept coherence over state-of-the-art SAE baselines. Results on concept steering further demonstrate that the learned concept groups support effective group-level interventions in MLLM outputs.

07.
medRxiv (Medicine) 2026-06-12

Design, Implementation, and Evaluation of a Shadowing Program for Medical Students in the Basic Sciences Phase

Introduction Shadowing, as an educational method based on active observation, can foster a realistic understanding of professional roles and enhance the communication skills of medical students. This study aimed to design, implement, and evaluate a shadowing program for basic sciences medical students. Methods This development study was conducted based on the ADDIE model in five phases. The study population consisted of 799 medical students in semesters 2 to 5. The stages included Analysis (determining needs through literature review and expert panels), Design (specifying learning environments and evaluation methods), Development (preparing guides and educational tools), Implementation (within the Medical Ethics course), and Evaluation (using questionnaires and reflection forms). Findings This study aimed to design and evaluate an educational shadowing program based on the ADDIE model. In the Analysis phase, the profiles of 799 students and learning objectives were determined. In the Design phase, a structured program for four types of shadowing was designed. In the Development phase, all guides and educational tools were prepared. In the Implementation phase, the program was carried out with complete coverage and adherence to ethical considerations. Finally, the program evaluation showed that "Motivation to become a good physician" (3.75-3.95) and "Enhancing empathy" (3.50-3.94) received the highest scores, while "Increasing understanding of the basic science-clinical connection" (2.53-2.89) and "Willingness to attend on holidays" (1.87-2.31) received the lowest scores. Conclusion The findings indicate that implementing the shadowing program is an effective method for strengthening the professional attitudes and academic motivation of medical students. However, the program did not significantly improve students perception of the basic science-clinical connection, indicating a need for curricular refinement. The continuation and extension of this program to other levels and fields of medical sciences are recommended.

08.
arXiv (CS.LG) 2026-06-16

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

arXiv:2606.15569v1 Announce Type: new Abstract: Test-time training (TTT) adapts a pretrained model to each prompt via parameter updates, improving accuracy under pretraining-to-test distribution shifts. Yet, its performance often suffers from instability and sensitivity to hyperparameters such as update steps and subspace. We explain this behavior through a decision-theoretic lens, treating TTT as implicit Bayesian inference in the kernel regime. Under a Gaussian process benchmark, we show that TTT reduces prediction error when updates are spectrally matched to the prompt's signal-to-noise ratio and aligned with query-relevant eigen-directions. This perspective underpins the following results: (1) we show when fixed update steps and subspaces fail under distribution shifts, motivating adaptive strategies; (2) we prove that selecting update steps via prompt evidence admits a PAC-Bayes guarantee against overfitting; and (3) we characterize the Bayes-optimal update subspace under a linear-Gaussian correction model, yielding a scoring rule for selecting Transformer blocks and heads. Our theory helps explain the empirical instability of TTT, taking a step toward principled guidance for when, how far, and which directions to adapt.

09.
arXiv (CS.AI) 2026-06-16

CIWI-CKT: Chaos-Informed Wave Interference Feature Fusion and Cross-City Knowledge Transfer for Traffic Flow Forecasting

arXiv:2606.15642v1 Announce Type: cross Abstract: Accurate traffic flow prediction remains challenging in cross-city, data-scarce scenarios where limited historical data hinders model generalisation. The chaotic nature of traffic dynamics, complex spatio-temporal dependencies, and heterogeneous urban networks complicate few-shot learning across cities. Existing deep learning approaches either treat traffic as purely deterministic or lack mechanisms to model wave-like interference patterns essential for cross-regime traffic dynamics. To address these limitations, this paper proposes CIWI-CKT, a novel Chaos-Informed Wave Interference Feature Fusion framework with Cross-City Knowledge Transfer. Our framework introduces three core innovations: chaos-informed wave generation that extracts measurable chaos invariants and models traffic as adaptive wave components; meta-interference processing that captures wave interactions between support and query regimes while producing a predictability score for confidence estimation; and chaos-aware meta-learning that enables efficient cross-city knowledge transfer while preserving chaotic characteristics. We establish theoretical guarantees including chaos-to-wave stability, wave-induced dimension reduction, and meta-learning generalisation bounds. Extensive experiments on four real-world traffic datasets demonstrate that CIWI-CKT significantly outperforms state-of-the-art spatio-temporal graph learning, transfer learning, prompt-based, and few-shot methods, improving prediction accuracy while substantially reducing required training data.

10.
arXiv (CS.AI) 2026-06-24

Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering

arXiv:2505.17338v3 Announce Type: replace-cross Abstract: Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require prohibitive per-scan optimization (hours for NeRF, about 30 minutes for 3DGS), making them impractical in clinical settings. We propose Render-FM, a feedforward model that eliminates this bottleneck by directly regressing 6D Gaussian Splatting (6DGS) parameters from a CT volume in a single 2.8-second forward pass, a 500x speedup over per-scan optimization. To bridge the domain gap between natural scene reconstruction and medical volumetric rendering, we introduce Anatomy-Guided Priming (AGP), which incorporates segmentation masks and transfer functions as structural and appearance priors, information that existing Gaussian splatting methods overlook. Built on an nnU-Net-inspired 3D U-Net trained on diverse CT scans, Render-FM predicts per-voxel 6DGS parameters and supports immediate real-time rendering. Unlike per-scan methods, it generalizes to unseen anatomies, novel transfer functions, and enables compositional organ visualization with zero additional preparation time. Optional 89-second fine-tuning further improves quality, surpassing per-scan optimized baselines. Project page: https://gaozhongpai.github.io/renderfm/.

11.
arXiv (CS.CL) 2026-06-12

AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas

Students at all levels of higher education face a significant barrier in the form of information overload, which often paralyzes the initial stages of the research process and suppresses motivation. In response, this article introduces a pedagogical framework that leverages AI SciBrief, a platform powered by a Large Language Model (LLM) designed to automatically generate digests of scientific trends. We describe how this multidisciplinary tool - with initial coverage in finance, medicine, and education - can be integrated into the curriculum to overcome this "entry barrier." The framework provides concrete methodologies for utilizing these digests to facilitate topic selection for term papers, accelerate literature reviews for dissertations, and enable postgraduate students to continuously monitor emerging trends. We conclude that AI SciBrief functions as a "gateway to research" effectively reducing students' cognitive load and empowering them to transition more rapidly from information searching to knowledge creation.

12.
arXiv (CS.AI) 2026-06-15

Sensitivity Shaping for Latent Modeling

arXiv:2606.14585v1 Announce Type: cross Abstract: Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution (OOD) transitions. Existing methods typically treat the learned dynamics as fixed and attach post hoc support surrogates. We show that these surrogates can fail when the dynamics are locally insensitive to critical action choices: unsupported control actions may produce latent predictions that resemble demonstrated transitions, suppressing OOD signals despite large true predictive errors. To address this, we introduce support-conditioned control-sensitivity regularization, which promotes sensitive local response to control input changes in learned dynamics in high-support training regions. This preserves control-induced variation while limiting unstable extrapolation due to weak empirical support. Experiments in vision-based obstacle avoidance, manipulation, and real-robot navigation show improved OOD detection and safer closed-loop planning.

13.
arXiv (CS.CL) 2026-06-19

A Survey of On-Policy Distillation for Large Language Models

As Large Language Models continue to grow in both capability and cost, transferring frontier capabilities into smaller, deployable students has become an important engineering problem, and knowledge distillation remains a common technique for this transfer. The prevailing recipe in industrial pipelines, static imitation of teacher-generated text, carries a structural weakness that grows more severe as tasks become longer and more reasoning-intensive. Because the student is trained on flawless teacher prefixes but generates its own at inference, small errors tend to accumulate into trajectories it has rarely been trained to recover from, and the resulting exposure bias has been shown to scale roughly with the square of sequence length. On-Policy Distillation reorganizes the training loop around this observation by having the teacher provide feedback on what the student actually produces, with the goal of reducing the compounding term toward linear and reframing distillation as an iterative correction process rather than single-pass imitation. The resulting literature has expanded along divergence design, reward-guided optimization, and self-play, yet contributions remain scattered across the knowledge distillation, RLHF, and imitation learning communities without a unified treatment. This survey provides such a treatment. We formalize OPD as f-divergence minimization over student-sampled trajectories, organize the field along three design axes (what to optimize, where the signal comes from, and how to stabilize training in practice), and consolidate success conditions, recurring failure modes, and the connection between OPD and KL-constrained reinforcement learning. We close with open problems that emerge from this synthesis, including distillation scaling laws, uncertainty-aware feedback, agent-level distillation, and the growing overlap between knowledge distillation and RL.

14.
arXiv (CS.CL) 2026-06-24

Decoherence as Defence and the Magnitude of Noise Regularisation: A Rigorous N -Qubit Theory of Stochastic Quantum Neural Networks for Adversarially Robust Network Intrusion Detection

Stochastic quantum neural networks (SQNNs) encode neuronal activations as qubits, synaptic topology as entanglement, and neural noise through a Lindblad master equation. A recent conference study applied a ring-entangled SQNN to collaborative intrusion detection and reached three conclusions: ring entanglement is essential for non-local anomaly detection; an adversarial-resilience bound holds but is conservative; and the depolarising channel fails to act as a dropout-style regulariser, behaving instead as output noise. It left open whether a per-gate stochastic deactivation (``true quantum dropout'') could regularise where the depolarising channel could not, and whether the loose robustness bound could be replaced by a predictive theory. This paper resolves both and extends the framework to real data and to neutral-atom hardware. We give an $N$-qubit formulation through the stochastic master equation and its vectorised Liouvillian, and prove a decoherence-contraction theorem: a depolarising channel of strength $\gamma$ over $L$ entangling layers contracts every weight-$w$ Pauli read-out by a factor $(1-4\gamma/3)^{wL}$ (for the weight-$1$ read-out used here, $(1-4\gamma/3)^{L}$); building on the general noise-as-defence result of Du et al., we make this quantitative and operational for intrusion detection. On the real NSL-KDD dataset under white-box FGSM and PGD attacks, a depolarising SQNN trained with the channel is, over seven seeds under strong $\ell_\infty$/$\ell_2$ attacks, significantly more robust than the noiseless circuit ($\ell_\infty$ PGD-$20$, $p=0.04$, large effect) and, critically, never suffers the catastrophic robustness collapse that the noiseless model and gradient-trained classical detectors (which fall from $95\%$ to $47\%$) do, cutting robustness variance roughly twofold; we show this robustness arises from a noise-reshaped training boundary rather than from attack-time gradient contraction. For generalisation, we derive an adaptive-penalty formula showing that per-gate dropout implements a curvature-weighted $L_2$ penalty $\tfrac{p(1-p)}{2}\sum\theta^2\partial^2_\theta L$ in weight space, maximised at $p=1/2$, whereas depolarising noise implements an output-space penalty. A $30$-seed study confirms the formula's quantitative prediction: both mechanisms reduce the train-test gap by a small but statistically significant margin ($\approx\!0.01$; $p

15.
Nature Medicine 2026-06-11

Microglia at a key inflection point in Alzheimer’s disease

Authors: Unknown Author

We analyzed brains from octogenarians and cognitively resilient centenarians to understand why some individuals with substantial Alzheimer’s disease pathology develop dementia whereas others remain cognitively intact. Spatial transcriptomics revealed gene expression changes in discrete tissue domains surrounding amyloid plaques and tau pathology that distinguish early, clinically silent, disease from later stages associated with cognitive decline.

16.
arXiv (CS.LG) 2026-06-16

Context-Aware Markov VAE for CSI Compression in Wireless Systems

arXiv:2606.16607v1 Announce Type: cross Abstract: This paper considers neural channel state information (CSI) compression for time-varying massive multiple-input multiple-output (MIMO) channels in frequency division duplex (FDD) systems with limited feedback resources. The main challenge lies in obtaining a compact and efficient representation of the CSI given that it exhibits strong temporal correlation across successive snapshots. Existing memoryless compression models do not exploit this property, while simple temporal extensions often incorporate multiple observations without explicitly modeling the latent dynamics. We propose a context-aware compression framework based on a k-memory Markov variational autoencoder (k-MMVAE), which uses a finite temporal window to capture the evolution of CSI in the latent space. The model introduces Markov-structured latent dynamics with finite memory, enabling efficient use of temporal dependencies for compression. Simulation results show that the proposed approach improves target CSI reconstruction performance compared to memoryless and weakly sequential baselines, particularly at low and moderate compression rates. These results suggest that explicit latent temporal modeling can provide an effective mechanism for CSI compression under limited feedback constraints.

17.
arXiv (CS.CL) 2026-06-15

Fractured Chain-of-Thought Reasoning

Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches the full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.

18.
arXiv (CS.CV) 2026-06-19

SAM3 Self-Distillation for Fine-Grained GOOSE 2D Semantic Segmentation

Authors:

We describe our 4th-place entry to the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge, which reached a composite mean Intersection-over-Union (mIoU) of 69.73% on the official 1,815-image test set. Our model adapts the image encoder of a recent visual foundation model, Segment Anything Model 3 (SAM3), with a lightweight decoder. Beyond this, we contribute two techniques and one empirical finding: (i) a self-distillation scheme that re-uses SAM3 itself, prompted with ground-truth boxes, as a teacher on the classes where it outperforms our own model; (ii) an image-level multi-scale test-time augmentation scheme that restores multi-scale inference for a fixed-input-size model by rescaling the image rather than the model input; and (iii) the finding that an aggressive photometric distortion from a winning 2025 GOOSE 2D entry, transplanted onto our pipeline, is its single largest source of improvement.

19.
arXiv (CS.AI) 2026-06-19

Reward as An Agent for Embodied World Models

arXiv:2606.19990v1 Announce Type: new Abstract: While RL has become a promising tool for refining world models, existing methods largely rely on conservative rollouts near the training distribution, limiting exploration, behavioral diversity, and richer dynamic discovery. In this work, we challenge this conservative paradigm. We argue that the core limitation is not exploration itself, but the lack of reliable verification strategies to support broader exploration. Without reliable verification, expanded exploration becomes highly susceptible to reward hacking, where policies exploit imperfect rewards without achieving genuine improvement. To evaluate this motivation, we instantiate our method in embodied world models, where physical plausibility, and task completion provide a rigorous testbed for scalable RL under complex dynamics. On the verification side, we introduce Reward as an Agent, an agentic reward framework that actively evaluates generated behaviors to provide robust reward signals and mitigate reward hacking under distribution shifts. On the exploration side, we introduce Dynamic-Aware Rollout Diversification through DynDiff-GRPO, which explicitly expands action-space exploration to diversify trajectories, broaden state-action coverage, and encourage richer embodied behaviors beyond conservative rollout regimes. By unifying Reward as an Agent with DynDiff-GRPO, we enable RL on a more reliable reward foundation with substantially diversified sampling, effectively mitigating reward hacking while yielding significant accuracy gains across multiple open-source world models, thereby demonstrating that broader exploration can scale successfully when grounded in robust verification.

20.
arXiv (CS.AI) 2026-06-17

LineageMark: Multi-user White-box Watermarking for Contribution Tracing in Model Derivation Chains

arXiv:2606.17123v1 Announce Type: cross Abstract: In open large language model (LLM) ecosystems, models are frequently adapted across multiple domains and applications, forming multi-stage derivation chains. Consequently, tracking and verifying historical contributions is essential for model provenance and intellectual property protection. However, existing watermarking methods are mainly designed for single-user, one-time embeddings, often fail under repeated model derivation and incremental updates. To address this problem, we propose LineageMark, a multi-user white-box watermarking framework for model derivation chains. The framework encodes watermarks in model parameters using a projection-based approach. Stable carriers are first selected to reduce sensitivity to model changes, each watermark bit is then represented as a projection statistic over these carriers. Additional watermark insertions introduce only bounded perturbations in the projection space, and margin constraints are used to maintain signal integrity. We evaluate the effectiveness of LineageMark in multi-stage model derivation chains. Experimental results show that LineageMark preserves contributor watermarks across multi-stage derivation and supports incremental multi-user watermark insertion. Furthermore, it exhibits robustness against perturbations such as re-watermarking, fine-tuning, quantization, and pruning.

21.
arXiv (CS.AI) 2026-06-12

ERTS: Adversarial Robustness Testing of Ethical AI via Semantic Perturbation in a Bounded Consequence Space

arXiv:2606.13282v1 Announce Type: new Abstract: As AI systems are deployed in high-stakes ethical contexts such as healthcare triage, autonomous vehicle control, and employment screening, formal methods for evaluating their robustness against adversarial manipulation of ethical reasoning remain underdeveloped. This paper introduces the Ethical Robustness Testing System (ERTS), a closed-pipeline framework that: (1) encodes ethical dilemmas into a 22-dimensional Ethical Consequence Space (ECS) grounded in established ethical theory; (2) applies 17 semantic perturbation functions subject to 6 validity constraint classes including a novel semantic coherence constraint; (3) measures decision deviation via a 4-component Ethical Instability Index (EII); and (4) produces domain-adaptive pre-deployment robustness assessment verdicts. We evaluate 4 structured baseline models and 2 production LLMs (Gemini 2.0 Flash and Llama 3.2) across 50 ethical scenarios spanning 8 deployment domains, generating 1,500 adversarial test cases. Results demonstrate that only 33% of models achieve assessment clearance, with the local Llama-3.2 model proving particularly vulnerable to fairness corruption and information degradation attacks (ERS = 0.737). To the best of our knowledge, no existing framework combines a bounded ethical consequence space, semantic coherence constraints, and domain-adaptive assessment in a single adversarial testing pipeline.

22.
arXiv (CS.CV) 2026-06-16

CRIS: Cross-Plane Self-Supervised Isotropic Restoration for Anisotropic Volumetric Imaging Across Modalities

Anisotropic volumetric acquisitions are common in clinical MRI and volume electron microscopy (vEM), where sparse through-plane sampling creates thick slices or sections that degrade orthogonal reformats and downstream analysis. We present CRIS, a cross-plane self-supervised framework for isotropic restoration without paired isotropic ground truth. CRIS casts 3D restoration as 2D stripe completion on orthogonal reformats of an isotropic grid: high-resolution in-plane slices are synthetically degraded and periodically masked for training, while at inference blank slices define the isotropic grid, two orthogonal reformats are restored, and predictions are fused by multi-view averaging. We evaluate CRIS on two MRI cohorts and two microscopy benchmarks up to 8x anisotropy. On brain MRI, CRIS achieves 32.921 +/- 0.436 dB PSNR and 0.9631 +/- 0.0027 SSIM, outperforming interpolation, SMORE4, SIMPLE, SA-INR, and ATME, and gives the best segmentation consistency (Dice 0.940 +/- 0.004, ASSD 0.245 +/- 0.014 mm, HD99 1.275 +/- 0.061 mm). On reference-free abdominal MRI, CRIS reduces FID/KID to 48.714/0.023. On vEM, CRIS outperforms interpolation, NIIV, and vEMINR, reaching 29.133 dB/0.834 3D PSNR/SSIM at 4x, 27.123 dB/0.734 on EPFL at 8x, and 21.915 dB/0.699 on noisy hemibrain data. In a robustness experiment, one variable-gap CRIS model evaluated across gap factors 3–7 and coronal, axial, and sagittal degradations maintained higher PSNR/SSIM than interpolation (36.36–31.14 dB and 0.977–0.932 vs. 33.07–27.85 dB and 0.951–0.853). These results support CRIS as a modality-flexible route to isotropic restoration without paired isotropic targets or configuration-specific retraining. Code is available at https://github.com/adi-hatav/CRIS.

23.
arXiv (CS.CL) 2026-06-25

Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultaneously enabled, multiple open-weight models cease invoking tools despite maintaining high schema compliance. We refer to this behavior as Tool Suppression. Through controlled experiments across multiple model families and deployment settings, we consistently reproduce Tool Suppression under joint constraints, while tool execution and schema compliance remain functional when evaluated independently. Further analysis reveals that JSON Schema constraints are compiled into grammar-based token masks, causing tool-call tokens to become unreachable during decoding. This provides an implementation-level explanation for the observed behavior. To interpret the phenomenon, we formulate the Constraint Priority Inversion (CPI) hypothesis, which suggests that schema satisfaction may dominate action-selection behavior under multiple simultaneous constraints. We present CPI as a behavioral hypothesis consistent with the observed evidence rather than a verified internal mechanism. To mitigate the problem, we propose Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation. Experimental results show that this approach restores tool invocation while preserving structured output guarantees without requiring model retraining. These findings suggest that evaluating tool use and structured output separately may overlook important reliability issues in production Agent systems. Code, data, and docs will be released at https://github.com/Fzsama/Constrain-Tax-26-06.git.

24.
arXiv (CS.LG) 2026-06-24

Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization

arXiv:2606.24543v1 Announce Type: new Abstract: Large Language Models (LLMs) are traditionally viewed as autoregressive generators. However, from the perspective of collective computation, they function as high-dimensional Dense Associative Memories that store complex reasoning patterns as latent attractors. In this work, we investigate the energy landscape of mathematical reasoning. We posit that correct reasoning chains correspond to deep, wide attractor basins ("flat minima") in the model's output distribution, whereas hallucinations manifest as sharp, unstable local minima. To exploit this geometry, we introduce a retrieval mechanism based on a Gibbs measure of the trajectory's spectral entropy. By sampling multiple reasoning paths and weighting them by their inverse energy ($P \propto e^{-\beta E}$), we approximate the equilibrium distribution of the associative memory, effectively ``relaxing'' the system into a robust solution. Empirically, this physics-inspired mechanism improves Microsoft Phi-3.5 performance on GSM8K by 5.38\% (84.7\% $\to$ 90.1\%), demonstrating that inference is better modeled as a dynamic settling process into an attractor basin rather than greedy next-token prediction.

25.
arXiv (CS.CL) 2026-06-24

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B classifiers. Under gold-policy conditioning, a compact structured state improves macro-F1 over raw trajectories by 0.13-0.17 after tuning. We then replace the benchmark-designated policy clause with the top-ranked clause retrieved from decision-time context. Although the exact governing clause is retrieved at rank 1 for only 7% of airline states, the primary 3B classifier obtains macro-F1 0.58 with retrieved clauses versus 0.60 with gold clauses (Delta=-0.02, task-cluster 95% CI [-0.23,+0.21]); mismatched-policy and no-policy controls score 0.32 and 0.21. We do not detect a macro-F1 difference between retrieved and gold clauses in this configuration, although the interval remains too wide to establish non-inferiority. The same qualitative pattern appears with a second retriever and at 7B, while varying across fine-tuning configurations. These results indicate that exact-match clause recall can underestimate downstream policy utility in this benchmark setting, motivating evaluation with retrieved policies in the classification loop rather than recall alone.