Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-18

Architectural Bias in Face Presentation Attack Detection: A Comparative Study of Vision Transformers and Convolutional Neural Networks

Face Presentation Attack Detection (PAD) systems constitute a critical security layer in biometric authentication; however, existing approaches exhibit systematic performance disparities across demographic groups, disproportionately affecting individuals with darker skin tones. This paper presents a comparative empirical investigation of whether Vision Transformer architectures reduce demographic bias in face PAD systems relative to convolutional baselines. Experiments are conducted on the CASIA-SURF Cross-Ethnicity Face Anti-Spoofing (CeFA) dataset. Three architectures are evaluated: a Multimodal ViT-Tiny trained from scratch, a ResNet18 CNN baseline, and a pretrained DeiT-S fine-tuned on CeFA across African, East Asian, and zero-shot Central Asian demographic groups. DeiT-S achieves the highest overall accuracy of 97.27% and the lowest EER of 0.86%, outperforming ResNet18 at 90.15% accuracy. In terms of fairness, DeiT-S reduces the inter-ethnic ACER gap between African and East Asian subjects to 0.13%, compared to 0.75% reported in an LBP-based work [6], representing an 83% reduction. Most notably, while ResNet18 records a BPCER of 10.44% on zero-shot Central Asian subjects, DeiT-S maintains 2.89% on the same unseen group, demonstrating a 3.6x generalization advantage. These results suggest that pretrained Vision Transformers achieve superior PAD accuracy, produce smaller demographic performance gaps, and generalize more equitably across unseen demographic groups, indicating that cross-demographic fairness in PAD may partly be influenced by architectural design.

02.
arXiv (CS.AI) 2026-06-18

EffiNav: Fusing Depth and Vision-Language for Efficient Object Goal Navigation

arXiv:2606.18634v1 Announce Type: cross Abstract: To locate a target object while exploring the unknown environment is a fundamental capability for autonomous agents, with applications ranging from search-and-rescue to field robots. A simplified version of such task is Object Goal Navigation (ObjNav). In ObjNav, successful arrival at the target object provides a basic measure of performance; however, the efficiency of the navigation trajectory is equally important, as it indicates how intelligently the agent explores and how much time remains for subsequent tasks. In unknown environments, the key to efficient navigation lies in deciding where to explore next. While many prior works aim to address this core challenge and achieved promising performance in certain settings, recent training-based models and non-training frameworks still suffer from generalization and efficiency issues respectively, which in the worst cases can lead to excessive exploration of already-visited areas or redundant back-and-forth motion. We evaluate EffiNav on two widely used simulation benchmarks Habitat Matterport 3D (HM3D) and Open-Vocabulary Object goal Navigation (OVON), and further validate its effectiveness on physical robots in real-world settings. We conduct failure analysis on massive simulation episodes. With minimal modification, we also extend EffiNav to a memory-augmented ObjNav task on the GOAT-BENCH dataset, demonstrating its adaptability beyond standard ObjNav settings. Across two standard metrics–Success Rate (SR) and Success weighted by Path Length (SPL), EffiNav matches or outperforms recent baselines, reflecting its efficiency, robustness, and practical applicability. Recognizing the different emphases of the two datasets, the performances reveals this framework is more balanced and generalizable for efficient ObjNav.

03.
arXiv (CS.CL) 2026-06-24

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

Transformer-based language models have become the default substrate for natural language processing and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental announcements. This review works at two levels. At the level of mechanism, we organise the main transformer families into a working taxonomy, covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. We then extend the discussion to post-2023 developments that changed the picture in practice: instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the current flagship model families from OpenAI, Anthropic, Google, Meta, Mistral and DeepSeek. At the level of use, we survey deployments across healthcare, finance, legal, education, customer service, creative writing and scientific work. Based on this we link each to the specific capabilities that make a transformer the appropriate tool. The contribution of this paper is a critical assessment that is based on the survey. We compare architectures on four axes that matter to deployment decisions, we quantify the trade-off between parameter count and energy cost. We also discuss how alignment methods, data provenance and benchmark saturation change what it means to call a model "state of the art". The final section lists the research questions that we think deserve more attention.

04.
arXiv (CS.AI) 2026-06-12

(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

arXiv:2606.12848v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for tasks once reserved for trained researchers, including hypothesis generation, specification choice, and drafting conclusions. We argue that the reliability of AI-assisted research depends not only on model capability, but also on how cognitive labour is structured between humans and machines. We study this problem through Human-in-the-Loop Economic Research (HLER), a decision architecture based on pre-commitment, decision sequencing, accountability, and attention allocation. In a pre-specified 2*4 factorial experiment with 280 complete research runs across four datasets, an unconstrained multi-agent baseline produced critical failures in 72% of runs. Using the same underlying model, the same agent decomposition, and identical prompts for the shared reasoning agents, HLER reduced the failure rate to 16% by imposing three architectural commitments: LLMs reason but do not execute data work, data and estimation are handled deterministically, and three human decision gates bind the workflow. Fisher's exact test rejects equality of failure rates at p

05.
arXiv (math.PR) 2026-06-16

Cluster sizes in subcritical soft Boolean models

arXiv:2404.13730v2 Announce Type: replace Abstract: We consider the soft Boolean model, a model that interpolates between the Boolean model and long-range percolation, where vertices are given via a stationary Poisson point process. Each vertex carries an independent Pareto-distributed radius and each pair of vertices is assigned another independent Pareto weight with a potentially different tail exponent. Two vertices are now connected if they are within distance of the larger radius multiplied by the edge weight. We determine the tail behaviour of the Euclidean diameter and the number of points of a typical maximally connected component in a subcritical percolation phase. For this, we present a sharp criterion in terms of the tail exponents of the edge-weight and radius distributions that distinguish a regime where the tail behaviour is controlled only by the edge exponent from a regime in which both exponents are relevant. Our proofs rely on fine path-counting arguments identifying the precise order of decay of the probability that far-away vertices are connected.

06.
arXiv (CS.AI) 2026-06-16

Continuous Cross-Domain Traffic State Prediction via Memory-Augmented Graph Liquid Time-Constant Networks

arXiv:2606.15807v1 Announce Type: cross Abstract: Traffic state prediction is a fundamental task in intelligent transportation systems. In practical applications, some regions suffer from limited traffic observations due to insufficient sensing infrastructure, making cross-domain knowledge transfer an important solution for data-scarce traffic prediction. However, existing cross-domain traffic prediction methods still face several limitations, including coarse-grained source-target adaptation, limited capability in handling unseen target-domain patterns, and insufficient modeling of continuous traffic dynamics under irregular or heterogeneous temporal conditions. To address these issues, this paper proposes a continuous cross-domain traffic prediction framework, termed Memory-Augmented Graph Liquid Time-Constant Network (MA-GLTC). Specifically, we first construct spatio-temporal units (STUs) to decompose traffic networks into transferable local units, enabling fine-grained knowledge alignment across domains. Then, a graph liquid time-constant network (GLTC) is developed to model graph-coupled traffic evolution in continuous time. Different from generic graph neural ODE-based models, GLTC introduces graph-coupled recurrent conductance into liquid time-constant dynamics, allowing node states to evolve with leakage, adaptive time constants, and neighborhood-aware feedback. Furthermore, a Memory-based Transfer Storage (MTS) mechanism is designed to preserve source-domain knowledge, retrieve matched traffic patterns, and update reliable target-domain patterns when unseen states emerge. Experiments on five public traffic datasets demonstrate that MA-GLTC consistently outperforms representative innerdomain and cross-domain baselines in both short-term and longterm prediction tasks. Compared with the second-best method, MA-GLTC reduces the average prediction errors by 3.02%, 0.33%, 8.92%, 10.09%, and 2.11%, respectively.

07.
arXiv (CS.LG) 2026-06-11

Coverage Guarantees for Pseudo-Calibrated Conformal Prediction under Distribution Shift

arXiv:2602.14913v2 Announce Type: replace Abstract: Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.

08.
arXiv (CS.LG) 2026-06-16

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

arXiv:2602.05779v2 Announce Type: replace Abstract: The Edge-of-Chaos (EoC) theory developed for the random initialization of deep networks allows more efficient training by both preserving information in the initial outputs of the network and minimising exploding or vanishing gradients through characterisation of the intermediate layers as Gaussian processes. This EoC theory provides formulae for the choice of the initialisation distribution variances of the weights and biases. For activations which are approximately linear around the origin, the EoC theory typically encourages the Gaussian process variance to converge towards zero with increasing depth. Here we consider the less studied setting of highly sparsity inducing activations where a large region of values near the origin are set to zero. In this setting we prove a new phenomenon whereby initialisations leading to larger fixed Gaussian processes are beneficial to training stability. This theory informs a new, yet simple, initialisation strategy that allows training DNNs and CNNs with as large as 90\% sparsity in the hidden layers.

09.
arXiv (CS.CV) 2026-06-12

Towards More General Control of Diffusion Models Using Jeffrey Guidance

A key strength of diffusion models lies in their flexibility, since their outputs can be controlled at sampling time through guidance. However, beyond simple cases such as conditional sampling, the target distribution is often left implicit, defined only through a sampling rule or a heuristic energy function. To address this, we propose Jeffrey guidance, a principled framework that extends diffusion-model control to applications beyond what standard guidance can express. It leverages Jeffrey's rule of conditioning to update marginal distributions towards a prescribed target, preserving the conditional structure and minimally perturbing the joint distribution. We first demonstrate Jeffrey guidance by targeting a prescribed embedding distribution. With Inception embeddings as the target, this leads to substantial reductions in FID on both CIFAR-10 and FFHQ. We further apply Jeffrey guidance to fairness on CelebA-HQ, updating an unconditional diffusion model to enforce independence between attributes.

10.
arXiv (CS.CL) 2026-06-25

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or sentence level, or at tokens where failure has already occurred. Neither identifies the precise token that triggers the shift toward failure. We introduce the cliff token, a token where the token-wise potential drops significantly under an adaptive threshold that scales with the local token-wise potential, based on a one-sided two-proportion z-test. Across seven models and three mathematical reasoning benchmarks (GSM1K, MATH500, AIME 2025), cliff tokens act as failure triggers; deleting the first cliff token and resampling recovers pass@64 to 1.0, while keeping it limits recovery to between 0.71 and 1.00. We further introduce a cliff taxonomy of deterministic, uncertain, and sampled-off cliffs, defined by greedy choice and token entropy. Each type has distinct probabilistic characteristics, and the taxonomy generalizes across model scales. Finally, we validate the taxonomy via single-token preference optimization at cliff positions (Cliff-DPO). Trained on GSM8K, Cliff-DPO improves accuracy across benchmarks by up to +6.6. Optimizing at uncertain and sampled-off cliffs improves reasoning, while deterministic cliffs do not.

11.
arXiv (quant-ph) 2026-06-17

Universal Design and Physical Applications of Non-Uniform Cellular Automata on Translationally Invariant Lattices

arXiv:2605.13379v2 Announce Type: replace Abstract: Motivated by recent theoretical and experimental advances, hyperbolic lattices have emerged as a paradigmatic setting in which geometry becomes an active organizing principle of quantum systems. Their negative curvature, exponential volume growth, and non-Abelian translation symmetry make them fundamentally distinct from Euclidean lattices and give rise to rich geometry-dependent physics, but also hinder the direct application of well-established analytical and computational approaches originally developed for physical systems defined on Euclidean lattices. To establish a unified framework for geometry-dependent physics on Euclidean and hyperbolic lattices, we develop higher-order non-uniform cellular automata (NUCA) as a local-to-global construction for translationally invariant regular lattices. This construction derives geometry-dependent update rules through a lattice-deforming procedure that embeds hyperbolic lattices into a Euclidean square lattice, thereby encoding hyperbolic geometry while preserving physical locality. It thus provides a systematic route toward quantum and classical physics on hyperbolic lattices. We demonstrate the framework in three applications ranging from quantum many-body physics to non-equilibrium statistical physics. First, on the hyperbolic $\{5,4\}$ lattice, a linear NUCA generates exactly solvable subsystem symmetry-protected topological (SSPT) models and spontaneous subsystem symmetry-breaking models. Second, as a quantum generalization, we construct non-uniform Clifford quantum cellular automata (CQCA) for the hyperbolic cluster state. Third, we formulate a probabilistic NUCA for directed percolation (DP) on the hyperbolic lattice.

12.
arXiv (CS.CL) 2026-06-19

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

Knowledge graph (KG) reasoning infers new knowledge from existing facts and is widely applied in question answering, recommendation, and decision support. With the rapid development of large language models (LLMs), LLM-based KG reasoning frameworks have become increasingly popular by leveraging retrieved KG information. However, hallucinations in LLMs remain a critical issue. Even when relevant KG knowledge is incorporated, models may still generate incorrect outputs, leading to misinformation and unreliable decisions. Existing hallucination detection methods either focus on LLM internal states or verify consistency with retrieved contexts, but both overlook the structural information in KGs, resulting in suboptimal performance. To address this gap, we propose LUCID, the first halLUcination deteCtIon method for LLM-based knowleDge graph reasoning frameworks. LUCID jointly leverages LLM attention scores, KG semantics, and structural information. Specifically, it extracts node and edge features from attention scores and semantic similarities, and integrates them with KG structure using a graph neural network. We also construct manually annotated benchmark datasets for evaluation. Experiments on nine datasets show that LUCID achieves state of the art performance compared to 15 baselines.

13.
arXiv (CS.CV) 2026-06-15

TSA: Temporal Slot Activation for Persistent Object-Centric Video Representation

Unsupervised video object-centric learning aims to decompose dynamic scenes into temporally persistent entity representations. Existing recurrent video slot-attention methods propagate a fixed set of slots across frames, but typically assume unconditional slot propagation: every slot is updated and decoded at every frame, regardless of whether its corresponding object is visible. We show that this design violates a basic lifecycle requirement for persistent slots: when an object is absent or fully occluded, its slot should preserve its previous state and avoid explaining unrelated visible content. Instead, unconditional propagation creates two failure pathways: update-induced state drift, where current-frame evidence overwrites the absent object's representation, and decoder-induced reconstruction interference, where the inactive slot remains coupled to reconstruction through decoder attention. We propose Temporal Slot Activation (TSA), a mechanism that learns a per-slot, per-frame activation score $\alpha_{k,t} \in (0, 1)$ without visibility supervision. TSA uses this activation as a shared latent control variable for slot lifecycle modeling. When a slot is inactive, TSA anchors its state to the previous slot via activation-gated updating and suppresses its decoder participation through an activation-dependent additive bias on attention logits before softmax normalization. This jointly reduces state drift and reconstruction-driven interference. To improve decisions under partial occlusion and gradual reappearance, TSA further conditions activation prediction on a per-slot temporal memory produced by a Temporal Context Encoder. We evaluate TSA on MOVi-C/E, YT-VIS, and OVIS benchmarks using both standard and tracking-based metrics (FG-ARI, mBO, IDF1, HOTA). TSA consistently improves object decomposition and temporal identity preservation, with large gains on long, heavily occluded videos.

14.
arXiv (CS.LG) 2026-06-19

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

arXiv:2606.19690v1 Announce Type: new Abstract: From the past few years, web intelligent enhancement systems increasingly rely on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address the challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured and unstructured sources are collected and preprocessed for generating unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled by using graph embeddings enhanced by attention mechanisms for capturing both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions like content recommendation, navigation optimization, and service adaptation. Finally, the continuous online feedback is further integrated to update graph representations and learning policies in real time by ensuring sustained adaptability and performance. The proposed MGAR-WIES acheived better results in terms of accuracy (80%) when compared with existing approaches.

15.
bioRxiv (Bioinfo) 2026-06-18

Metrics for Evaluating Biological AI Model Predictive Accuracy at the Data-Substrate Level

Authors:

Reports in the biological literature disagree on whether a given model can predict a biological outcome from a given data sample — one study finding a model capable, another, on the same kind of data, finding it is not. This is particularly a challenge in relation to LLMs–where the models are large and opaque, with weights and training data inaccessible.textbf{ }Such disagreements cannot be settled by directly inspecting the model. To address this challenge, we considertextbf{ }an alternative approach: assessing whether the data sample is adequate to support the prediction asserted. For a given dataset, its substrate — the underlying structure of the data — determines what any model can recover, independent of architecture or capacity. At the same time, predicting the present state of a biological process and predicting the direction of its future change are different tasks; the second is supportable among AI models only where the data encode direction as determinable from the state — a property we call encoding — and is unsupportable where the same observed state precedes change in opposite directions — a property we call non-identifiability, in the informational rather than the statistical sense. We introduce two generic metrics, Predictive Blindness Risk (PBR) and Prediction Indeterminacy Measure (PIM), that evaluate a data substrate for predictive accuracy directly — without access to model weights, architecture, or training data — and locate the regions of a data substrate where a predictive claim can be supported and where it cannot. Using human biological subjects, we employ the Yale Brain Metastases Longitudinal Data (1,430 human subjects; 11,892 MRI studies; four sequences) and show that direction of change was non-identifiable across regions encompassing the majority of transitions; a nonlinear AI model gained essentially nothing over majority-direction prediction there while recovering direction near-perfectly where the state encoded it; and model accuracy tracked data-substrate resolvability continuously (Spearman {rho} = -0.95 to -1.00). The metrics adjudicate, before any model is trusted and from the data alone, where claims of predictive accuracy — of state, or of the law of change — can be supported.

16.
arXiv (CS.AI) 2026-06-24

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

arXiv:2602.23334v2 Announce Type: replace-cross Abstract: Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in related lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Therefore, mixed-precision quantization becomes an alternative solution by applying different precision in different layers to trade off resource consumption and accuracy. Because regular designs for multiplication on hardware cannot support the precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model in runtime, we propose a runtime reconfigurable multi-precision multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our work can achieve 1.3185 to 3.5671 times speedup in inferring mixed-precision models and has less critical path delay, supporting a higher clock frequency (250MHz).

17.
medRxiv (Medicine) 2026-06-22

Maternal-Fetal immune networks and viral signatures in the healthy amniotic cavity

The intrauterine environment has traditionally been viewed as a privileged site protected by the placental barrier. However, emerging evidence suggests that early in utero microbial exposure may prime the developing fetal immune system. Here, using target-enriched metagenomics and high-dimensional proteomics, we characterized the intra-amniotic viral landscape and immune networks in 114 healthy pregnancies including both normal and anomalous fetuses. We identify a sparse yet heterogeneous human viral signature in 26% of samples, predominantly composed of Herpesviridae, Polyomaviridae, and Picornaviridae. Although viral reads abundance was associated with fetal abnormalities, viral detection generally did not induce overt inflammatory activation, supporting a state of immune homeostasis within the amniotic cavity. Instead, viral presence was associated with subtle and selective immune modulation, including altered inducible antimicrobial peptide expression (HBD-2 and HBD-3), coupled with an attenuation of regulatory cytokines. Our results further reveal that the amniotic immune environment is primarily governed by gestational age, transitioning from a Th1-predominant "alert" phase to innate-readiness preceding parturition. These findings suggest that fragments of viral genetic material within the amniotic cavity may contribute to fetal immune instruction without triggering overt inflammation, providing a foundational framework for understanding how "silent" viral-exposure during gestation influences the developmental origins of neonatal immunity.

18.
arXiv (CS.LG) 2026-06-17

Variational autoencoders with latent high-dimensional steady geometric flows for dynamics

Authors:

arXiv:2410.10137v5 Announce Type: replace Abstract: We develop Riemannian approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics, which we refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the VAE framework such that manifold geometries, subject to our geometric flow, embedded in Euclidean space are learned in the intermediary latent space developed by encoders and decoders. By tailoring the geometric flow in which the latent space evolves, we induce latent geometric properties of our choosing, which are reflected in empirical performance. We reformulate the traditional evidence lower bound (ELBO) loss with a considerate choice of prior. We develop a linear geometric flow with a steady-state regularizing term. This flow requires only automatic differentiation of one time derivative, and can be solved in moderately high dimensions in a physics-informed approach, allowing more expressive latent representations. We discuss how this flow can be formulated as a gradient flow, and maintains entropy away from metric singularity. This, along with an eigenvalue penalization condition, helps ensure the manifold is sufficiently large in measure, nondegenerate, and a canonical geometry, which contribute to a robust representation. Our methods focus on the modified multi-layer perceptron architecture with tanh activations for the manifold encoder-decoder. We demonstrate, on our datasets of interest, our methods perform at least as well as the traditional VAE, and oftentimes better. Our methods can outperform this and a VAE endowed with our proposed architecture, frequently reducing out-of-distribution (OOD) error between 15% to 35% on select datasets. We highlight our method on ambient PDEs whose solutions maintain minimal variation in late times. We provide empirical justification towards how we can improve robust learning for external dynamics with VAEs.

19.
arXiv (CS.CL) 2026-06-15

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

To address the challenge that current dialogue policy planning methods struggle to dynamically adapt to diverse user characteristics, this paper proposes a User Portrait based Nested Rollout Policy Adaptation (UP-NRPA) online framework with Large Language Models. In contrast to conventional approaches dependent on model training and require offline reinforcement learning policy models for user groups, UP-NRPA enables dynamic customization of dialogue strategies through an adaptive mechanism. This is achieved by leveraging real-time user feedback alongside personality, preferences, and objectives mapped from the current user portrait, thereby adapting to user characteristics without offline reinforcement learning. In collaborative and non-collaborative dialogue benchmarks, UP-NRPA demonstrated considerable benefits, achieving an impressive 100% success rate in multiple dialogue tasks. Particularly in negotiation tasks, the sale-to-list ratio (SL) increased by 56.41%. This demonstrates that UP-NRPA can adapt to diverse user needs without requiring a training mechanism, enabling the dialogue system to adapt to user characteristics.

20.
arXiv (CS.CL) 2026-06-11

Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models

With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-personality induction, multi-personality induction, and personality switching. Experiments show that personality induction improves image captioning performance but can impair performance on tasks requiring precise reasoning, such as visual question answering (VQA). Balancing and residual effects are observed during multi-trait composition and dynamic switching, indicating that model behavior is co-modulated by both previous and current personality constraints. Existing prompt-based personality induction methods show limited transferability to multimodal settings. Our work reveals the dynamic and complex nature of personality modeling in MLLMs and underscores the need for robust, tailored methods for personality induction and evaluation. The code will be released when the paper is accepted.

21.
arXiv (CS.CL) 2026-06-16

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite the effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search, while maintaining accuracy. Our code and implementation details are released at https://github.com/XMUDeepLIT/SAAS.

22.
arXiv (CS.CV) 2026-06-12

Modality-Aware Feature Matching in Visual and Vision-Language Applications: A Comprehensive Survey

Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by detector-free strategies like CNN-based SuperPoint and transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.

23.
medRxiv (Medicine) 2026-06-24

Computational Decomposition of New Memory Failure in Alzheimer's Disease Through a Hippocampal Cortical Consolidation Bottleneck Model

Alzheimer's disease (AD) is clinically marked by difficulty retaining newly learned information, yet routine memory scores often conflate poor initial encoding with failure to stabilise information after encoding. This ambiguity limits the mechanistic interpretability of cognitive assessment during the transition from mild cognitive impairment to AD. Here we propose a Hippocampal Cortical Consolidation Bottleneck (HCCB) model to computationally separate these two components of new memory failure. The model represents newly presented information as a rapidly formed hippocampal trace and a slowly stabilised cortical trace, predicting a residual bottleneck when delayed recall falls below the level expected from immediate recall. We operationalised this prediction as Consolidation Bottleneck Index*(CBI*), a cognitively normal reference normalised residual index, and evaluated it using Alzheimer's Disease Neuroimaging Initiative (ADNI) cognitive and MRI data, with independent dynamical support from OpenNeuro EEG. Simulations showed recent memory vulnerability when hippocampal vulnerability exceeded cortical vulnerability. In ADNI, CBI* increased from cognitively normal participants to mild cognitive impairment nonconverters, reached Alzheimer like levels in mild cognitive impairment converters, and was associated with hippocampal atrophy. CBI* added minimal discrimination beyond established clinical and structural predictors, supporting its role as a mechanistic phenotype rather than a replacement prognostic model. OpenNeuro EEG further showed increased neurodynamic rigidity in AD. Our findings provide a computational framework for quantifying failed stabilisation of newly encoded information in AD progression.

24.
PLOS Medicine 2026-05-21

Novel symptoms associated with eclampsia could improve detection and save lives

by Alice Beardmore-Gray, Andrew Shennan Eclampsia is a life-threatening complication of pre-eclampsia, yet remains difficult to predict. In this Perspective, Alice Beardmore-Gray and Andrew Shennan highlight a recent study that identifies 10 novel prodromal symptoms of eclampsia, with potential to better predict which women are at risk and therefore reduce delays in intervention.

25.
arXiv (CS.LG) 2026-06-15

Neural Slack Variables for Shape Constraints

arXiv:2606.13803v1 Announce Type: new Abstract: Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.