Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

Clothes-changing person re-identification (CC-ReID) aims to recognize individuals despite drastic appearance changes caused by clothing variation. While existing methods rely on adversarial learning to disentangle clothing features, we propose Ortho-ReID, which explicitly models a low-rank clothing subspace from VLM text descriptions and extracts clothing-invariant representations via direct geometric constraints. A critical component is our transformer-based Basis Maker, which refines a shared, low-dimensional clothing prior into an instance-adaptive low-rank subspace through cross-attention with image patches, enabling robust clothing feature extraction even under varying visibility conditions. This instance-adaptive subspace is supervised via alignment with clothing text embeddings, while identity features are extracted via a learnable projection head and geometrically constrained to be strictly orthogonal to it. Extensive experiments demonstrate state-of-the-art performance on PRCC (+5.9% top-1), Celeb-reID-light (+3.5%), and LaST (+5.3%), with competitive results on LTCC.

02.
arXiv (CS.CL) 2026-06-15

Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal

Deployed language models must decide not only what to answer but also when not to answer. We present UniCR, a unified framework that turns heterogeneous uncertainty evidence including sequence likelihoods, self-consistency dispersion, retrieval compatibility, and tool or verifier feedback into a calibrated probability of correctness and then enforces a user-specified error budget via principled refusal. UniCR learns a lightweight calibration head with temperature scaling and proper scoring, supports API-only models through black-box features, and offers distribution-free guarantees using conformal risk control. For long-form generation, we align confidence with semantic fidelity by supervising on atomic factuality scores derived from retrieved evidence, reducing confident hallucinations while preserving coverage. Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics, lower area under the risk-coverage curve, and higher coverage at fixed risk compared to entropy or logit thresholds, post-hoc calibrators, and end-to-end selective baselines. Analyses reveal that evidence contradiction, semantic dispersion, and tool inconsistency are the dominant drivers of abstention, yielding informative user-facing refusal messages. The result is a portable recipe of evidence fusion to calibrated probability to risk-controlled decision that improves trustworthiness without fine-tuning the base model and remains valid under distribution shift.

03.
arXiv (CS.CL) 2026-06-24

Sexualised synthetic personas encode and amplify gendered power asymmetries through voice

This work examines sexualised AI-generated English-speaking voices offered by a popular commercial platform. New technologies may enable sexual empowerment and greater diversity in gender expression, yet toxic masculinity, heteronormativity, and the abuse of women and LGBTQ+ people remain pervasive online. Drawing on a Feminist HCI perspective, we examine how commercial voice AI systems reproduce and circulate particular performances of gender. We conducted a listening experiment with a diverse group of listeners, combining quantitative adjective selection, qualitative free-text responses, and acoustic analysis. Participants evaluated male- and female-coded voices presented with either sexualised scripts or neutral text. Results reveal a narrow range of gender expression, largely binary and heteronormative. Female-coded voices are more frequently described using sexualised and submissive terms, while male-coded voices are more often associated with dominance and positive traits.

05.
arXiv (CS.CL) 2026-06-17

ALAS: An Automatic Latent Alignment Score for Audio Language Models

Large Language Models (LLMs) are extended into Speech-LLMs, and the quality of the audio–text alignment they learn affects most downstream Spoken Language Understanding (SLU) behavior. Yet despite a growth of fusion strategies, there is no standard way to measure how well a Speech-LLM internally binds audio frames to text tokens. We introduce ALAS (Automatic Latent Alignment Score), a model and task-agnostic metric that probes the LLM's per-layer hidden states, scoring the cross-modal cosine similarity between audio and text representations against a Whisper-derived reference. ALAS needs only a frozen forward pass and an off-the-shelf ASR reference, with no training or fitted classifier, and is calibrated to an interpretable uniform baseline comparable across tasks. Applying ALAS to four open-source Speech-LLMs (AF3, Qwen2-Audio, Qwen-Omni, SALMONN) across emotion recognition (IEMOCAP), open-ended SQA (LibriSQA), and multi-choice audio understanding (MMAU-speech), we find that the depth and strength of alignment reflect each model's audio-encoder design and the acoustic-versus-semantic demands of the task, and that ALAS tracks but does not duplicate task accuracy, exposing models that score well without genuinely grounding in the audio. We release ALAS as an open-source library so that practitioners can probe their own Speech-LLMs or try it on new tasks.

06.
arXiv (CS.AI) 2026-06-24

Polycepta: Object-Centric Appearance Estimation for Multi-Object Tracking

arXiv:2606.23604v2 Announce Type: replace-cross Abstract: The tracking-by-detection paradigm in multi-object tracking (MOT) typically relies on static appearance descriptors to complement motion estimation. However, these descriptors are frame-independent, limiting their robustness as visual cues. Since such descriptors are often obtained from computationally intensive pretrained backbones, real-time MOT systems frequently abandon appearance cues altogether and rely solely on motion prediction and geometric association. In this work, we introduce Polycepta, an object-centric appearance state estimation framework that reformulates appearance modeling as a recursive estimation problem rather than a frame-wise matching task. Polycepta constructs and continuously updates an independent appearance state for each tracked object, enabling future appearance representations to be estimated from accumulated observations. Polycepta is encouraged to learn the appearance-state construction of object-specific representations rather than memorize them through a proposed learning strategy, enabling appearance estimation for unseen classes. A key property of Polycepta is that the quality of appearance estimation improves as object states evolve during inference. While conventional appearance descriptors remain static or degrade over time, Polycepta progressively refines appearance estimates as additional observations are accumulated. Extensive experiments on KITTI, the Waymo Open Dataset, and MOT17 demonstrate consistent reductions in identity switches and improvements in tracking performance when integrated into the tracking-by-detection pipelines. Polycepta operates at 90.57 Hz and delivers state-of-the-art performance on the KITTI benchmark when integrated into the RobMOT framework, achieving a MOTA of 92.27\%.

07.
arXiv (CS.LG) 2026-06-11

Triangular-Reference Schrödinger Bridges for Time Series Generation

arXiv:2605.27478v3 Announce Type: replace-cross Abstract: Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya–Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

08.
arXiv (CS.AI) 2026-06-16

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

arXiv:2606.01365v2 Announce Type: replace Abstract: Failure-aware observability diagnoses wasted computation in multi-agent LLM systems before final-answer evaluation can explain what went wrong. We propose a trace-based framework for a three-agent architecture – orchestrator, search agent, and execution agent – that converts structured events into online signals for loops, budget pressure, low information gain, and tool instability, then adds offline semantic grounding metrics and selective LLM-as-judge evaluation. On 165 GAIA validation traces under identical caps, 98 runs produce usable final answers and 67 fail or stop without one. Among warned failed runs, 58.1% of tokens are spent after the first warning on average, indicating substantial opportunity for intervention. A 10-task Level-2 pilot uses warnings to diversify search or require evidence, reducing post-warning token fraction from 0.638 in the baseline to 0.304. The results support a layered design: cheap online signals help the orchestrator redirect or halt redundant behavior, while deeper semantic checks identify whether completed answers are grounded enough to trust.

09.
bioRxiv (Bioinfo) 2026-06-16

Super Learner Ensemble Modeling of CPTAC Proteomic Data for Survival Prediction in Head and Neck Squamous Cell Carcinoma

Survival analysis in head and neck squamous cell carcinoma (HNSCC) is traditionally performed using Cox proportional hazards models, alongside some exploration into black-box machine learning methods. The Super Learner (SL) algorithm addresses this model selection dilemma by combining diverse candidate algorithms into a weighted ensemble to perform comparably to the best candidate method. This study evaluates the performance of SL in HNSCC. Proteomic features as well as clinical covariates from 96 CPTAC HNSCC samples were modeled with three candidate algorithms (Cox LASSO, Cox Ridge, and Random Survival Forest) as well as the ensemble SL method. Models were optimized via Uno's time-dependent Concordance Index (C-index) and tested at 1- and 3-year time horizons using 2000 bootstrap resamples. The Cox Ridge regression model achieved the highest predictive accuracy among the four total methods. However, the SL demonstrated stable performance over both time horizons (1-year C-index: 0.985; 3-year C-index: 0.960). Variable importance analysis of the Cox Ridge model successfully identified malignant proteins (ATR, MAML1, MIEN1) alongside novel potential prognostic indicators (ZNF800, KERA). This analysis emphasizes the statistical necessity for larger cohorts for ensemble learning, while providing a benchmark of proteomic indicators in HNSCC.

10.
arXiv (CS.AI) 2026-06-24

Heterogeneous 2D/1D Signal Representation Fusion for Underwater Acoustic Modulation Recognition Under Distribution Shift

arXiv:2606.23702v1 Announce Type: cross Abstract: Modulation recognition systems rely on heterogeneous signal representations. 2D signal-image modalities such as time-frequency and cyclostationary maps capture structural patterns, while 1D statistical descriptors such as higher-order power spectra encode complementary cues. Under distribution shift, these modalities degrade unevenly, making robust fusion a central challenge for practical deployment. Progress is further limited by the lack of a unified evaluation protocol that systematically separates different shift types. This paper addresses both challenges through a joint benchmark-and-model study in underwater acoustic modulation recognition. UAMR-ShiftBench is the first benchmark to jointly cover in-distribution, low-SNR, unseen-environment, unseen-communication-parameter, and measured sea-trial evaluation under a single matched protocol, with two independent real-world subsets collected during two sea-trial campaigns conducted in March and November in the South China Sea. SCP-TriCA fuses STFT, cyclostationary, and P2/P4 (second- and fourth-order power spectra) modalities hierarchically: the two 2D modalities are first aligned through bidirectional cross-attention, and the 1D statistical modality is then incorporated through a sample-adaptive selective gate. On UAMR-ShiftBench, SCP-TriCA achieves 95.33% in-distribution accuracy and 74.59% simulated OOD average, outperforming the strongest baseline by 5.12 percentage points, and reaches 91.14% and 94.86% on the two sea-trial subsets, exceeding the best baseline by 15.71 and 23.00 percentage points respectively. Ablation results confirm that the gains stem from modality complementarity and the hierarchical fusion design. Code and models are available at https://github.com/ronglaiqian/UAMR-ShiftBench.

11.
PLOS Computational Biology 2026-06-22

TCRBinder: Unified pre-trained language model with paired-chain synergy for predicting T-cell receptor binding specificity

作者:

by Weihe Dong, Qiang Yang, Long Xu, Xiaokun Li, Kuanquan Wang, Suyu Dong, Gongning Luo, Xianyu Zhang, Tiansong Yang, Xin Gao, Guohua Wang Deciphering how human T cells recognise peptide-HLA (pHLA) complexes underpins next-generation vaccines and personalised immunotherapies, yet extreme sequence diversity and paired-chains interdependence still hamper reliable in silico prediction of T-cell receptor (TCR) specificity. To overcome these hurdles, we built TCRBinder, a paired-chain-aware deep model with a multi-branch encoder that routes each molecular component through dedicated transformer-based modules to capture contextual signals in both HLA pseudo-sequences and antigenic peptides while simultaneously processing the TCR α and β chains. This design captures the synergistic interaction between paired chains to emulate peptide-HLA-TCR (PHT) interactions and expose residue-level contact motifs. Across PHT and peptide-TCR (pTCR) benchmarks, the model delivered state-of-the-art performance (AUC-ROC = 0.911, AUPR = 0.791 for the PHT task) and remained superior on multiple independent datasets. We tracked the dynamics of clonal expansion and, in a large SARS-CoV-2 repertoire containing completely unseen peptides, improved the AUC-ROC by up to 16.3% over the leading alternatives. Moreover, TCRBinder provided mechanistic insights by pinpointing contact hotspots and quantifying residue contributions to binding probability. These capabilities position TCRBinder as a versatile tool for rational antigen discovery, immunotherapy stratification, and neoantigen vaccine design.

12.
arXiv (quant-ph) 2026-06-25

Maximal global device-independent randomness from projective measurements in every dimension

arXiv:2606.21369v2 Announce Type: replace Abstract: Device-independent random number generation (DIQRNG) is the most secure form of generating private randomness using quantum physical processes. Its strength lies in producing numbers that are impossible to predict by any eavesdropper restricted by the laws of quantum theory. Moreover, security is proven solely from observed measurement statistics, without the need to characterise or trust the devices used in random number generation. Implementing DIQRNG is, however, costly, as it requires high-quality entangled systems. It is therefore important to make the best use of available resources. In this work, we show that using projective measurements – which are most readily implementable experimentally – one can certify $2\log(d)$ bits of device-independent randomness from a bipartite system of local dimension $d$ for every $d \ge 2$, thus reaching the theoretically maximum possible rate of DIQRNG. We provide explicit protocols reaching $2\log(d)$ bits based on mutually unbiased bases. Furthermore, we compute numerical bounds on the rate for the case of imperfect implementations, showing that our protocols are robust to experimental noise.

13.
arXiv (CS.CL) 2026-06-16

From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text

Modeling dimensional affect in longitudinal text requires distinguishing current affect estimation from future affective change forecasting. Existing approaches often treat each text as an independent observation and apply similar assumptions to both tasks, without testing whether they rely on different information sources. This paper investigates that distinction using longitudinal self-reported ecological essays and feeling-word entries. We propose the Trait–State Affective Prediction (TSAP) framework and its temporal extension E-TSAP for per-text valence and arousal prediction, evaluated on a held-out prediction test set of 1,737 entries from 91 users. We further propose the Affective Change Forecaster Hybrid (ACF-Hybrid) for next-step affective change forecasting, evaluated on a held-out forecasting test set of 46 users. For prediction, E-TSAP achieves composite Pearson correlations of 0.670 for valence and 0.449 for arousal. For forecasting, textual representations perform worse than compact numeric trajectory baselines: the text-inclusive model achieves only r=0.316 for valence and r=0.284 for arousal, whereas a simple prior-state baseline reaches r=0.615 and r=0.670, respectively. ACF-Hybrid, using dimension-specific numeric trajectory features, achieves r=0.659 for valence and $r=0.658$ for arousal. These results show that textual semantics support current affect prediction, whereas future affective change is better captured through prior numeric trajectory dynamics.

14.
Nature (Science) 2026-06-17

Optical metasurfaces for general vision processing on the edge

作者:

Large-scale artificial intelligence (AI) models achieve notable performance in computer vision but require substantial computational resources, limiting their deployment on edge devices1,2. Optical neural networks (ONNs) promise reduced latency and energy consumption by making use of the inherent parallelism of light3. However, present ONNs struggle to scale and are confined to simple tasks, owing to the challenges of replicating exact algebraic operations of digital models using physical (analogue) systems. This work introduces a new paradigm that directly embeds core computer vision principles, including similarity-based recognition, attention-guided perception and detail–context fusion, into a large-scale optical metasurface. By unifying optical physics with these computer vision fundamentals, we develop a photonic–electronic engine that overcomes scalability and generality barriers, enabling high-accuracy, general-purpose computer vision at the edge. The resulting system combines a 41-million-parameter optical metasurface front end with a co-designed, ultraefficient 87,000-parameter digital back end, outperforming many digital models with tens of millions of parameters across object detection, segmentation, 3D reconstruction and video understanding. We build a deployable prototype and demonstrate real-time edge visual processing in natural scenes. This work represents a path towards practical optical computing for general vision tasks in complex natural environments, enabling a new paradigm for low-energy, low-latency, real-time on-device vision intelligence. By embedding core computer vision principles into a large-scale optical metasurface, an efficient vision processing system using far fewer parameters is demonstrated to outperform many digital models and enables deployment on edge devices.

15.
arXiv (CS.CV) 2026-06-19

BAFIS: Dataset + Framework to assess occupational Bias and Human Preference in modern Text-to-image Models

Generative artificial intelligence has the potential to improve productivity and transform the production of creative content. However, existing research indicates that image generation models are significantly influenced by biases. This work investigates the inherent biases and language-induced biases present in text-to-image models within the context of occupation-related image generation, complementing established metrics with human preference feedback. We present a comprehensive evaluation of five current text-to-image models: Midjourney v6.1, Stable Diffusion 3 Medium, DALL-E 3, Playground v2.5, and FLUX.1-dev , focusing on gender and ethnicity bias, image quality, and prompt alignment. To facilitate this evaluation, we developed the "Battle-Arena for Fair Image Synthesis" (BAFIS), a platform designed to collect human feedback on bias in generated images. Furthermore, we created a dataset comprising 21,140 synthetic images generated using multilingual prompts, which serves as a basis for our analysis. We further place our results within a broader social context by comparing them to official statistics from the German Federal Employment Agency. Our findings reveal systematic biases in text-to-image models, with established evaluation metrics in partial correlation with subjective user ratings. Thus, our research emphasizes the need for including human preferences to develop fairer and more inclusive text-to-image models.

16.
arXiv (CS.CV) 2026-06-16

All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. Existing approaches rely on a dictionary of activity label as input, cannot provide frame-by-frame labeling explanations, or rely on superseded computer vision techniques. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.

18.
arXiv (CS.CV) 2026-06-24

Modality-Aware Out-of-Distribution Detection for Multi-Modal Action Recognition

The incorporation of additional modalities into action recognition models increases their performance across a wide range of settings. However, how this additional information can contribute to making the models more robust remains underexplored, particularly for the case of multi-modal out-of-distribution (OOD) detection. While methods exist that regularize the multi-modal training process with OOD detection in mind, they still apply off-the-shelf OOD detectors designed for the uni-modal case during inference, discarding important information. Based on an interesting relationship we find between the multi-modal and uni-modal predictions, we propose to use this signal to build a post-hoc detector explicitly designed for the multi-modal scenario. We combine this new source of information with a feature-space score, which detects off-manifold samples in the multi-modal space, and normalize them by the multi-modal logits. In doing so, the proposed hybrid detector is compatible with existing training-time approaches and consistently improves performance. Experiments on a wide range of established datasets from the MultiOOD benchmark show that, on average, our approach outperforms the state of the art. Our results show the importance of explicitly considering the different modalities at inference time for multi-modal OOD detection.

19.
arXiv (quant-ph) 2026-06-25

A Candidate Framework for Free-Space Quantum Key Distribution based on Geometrical-Configuration Modulation

arXiv:2606.25807v1 Announce Type: new Abstract: This paper proposes a candidate framework for free-space quantum key distribution (QKD) based on geometrical-configuration modulation (GM). In the minimal implementation considered here, Alice coherently splits a single photon emitted from one source into two spatial output modes with a tunable separation, and uses the source separation $R$ as the GM variable that defines the prepared single-photon spatial superposition state. Bob records the single-photon detection coordinate in the far field or Fourier plane, providing the correlated data used for soft-input information reconciliation. Based on this physical mechanism, we first establish an $R-x$ protocol model in which the source separation $R$ and the single-photon detection coordinate $x$ are random variables, and further propose an $R-\Delta x$ extension based on the difference variable $\Delta x$ between adjacent accepted detection events to mitigate slowly varying center drift in free-space links. The framework specifies state preparation, far-field conditional probabilities, soft-input information generation, parameter estimation, reconciliation, and asymptotic candidate key-rate formulas. A complete composable security analysis further requires derive an explicit computable upper bound on Eve's information from experimentally observed parameters, together with finite-key analysis and experimental validation under free-space conditions. The proposed candidate framework (GM-QKD) provides a modulation approach based on spatial degrees of freedom in which the source geometry serves as the modulation variable.

20.
arXiv (CS.CV) 2026-06-12

CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing

Editing natural images using textual descriptions in text-to-image diffusion models remains a significant challenge, particularly in achieving consistent generation and handling complex, non-rigid objects. Existing methods often struggle to preserve textures and identity, require extensive fine-tuning, and exhibit limitations in editing specific spatial regions or objects while retaining background details. This paper proposes Context-Preserving Adaptive Manipulation (CPAM), a novel zero-shot framework for complicated, non-rigid real image editing. Specifically, we propose a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and background effectively. This ensures that the objects' shapes, textures, and identities are maintained while keeping the background undistorted during the editing process using the mask guidance technique. Additionally, we develop a localized extraction module to mitigate the interference with the non-desired modified regions during conditioning in cross-attention mechanisms. We also introduce various mask-guidance strategies to facilitate diverse image manipulation tasks in a simple manner. CPAM can be seamlessly integrated with multiple diffusion backbones, including SD1.5, SD2.1, and SDXL, demonstrating strong generalization across different model architectures. Extensive experiments on our newly constructed Image Manipulation BenchmArk (IMBA), a robust benchmark dataset specifically designed for real image editing, demonstrate that our proposed method is the preferred choice among human raters, outperforming existing state-of-the-art editing techniques. The source code and data will be publicly released at the project page: https://vdkhoi20.github.io/CPAM

21.
arXiv (CS.CV) 2026-06-24

TopoPult-SSL: Gland-Mask-Free Cross-Device Meibomian Gland Segmentation via Self-Distilled Weak Clinical Priors

Every new clinical imaging device creates a domain shift where dense gland masks are expensive yet cheap clinical signals – eyelid outlines, Pult grades, morphometric ratios – are routinely recorded. We present TopoPult-SSL, a two-stage framework for cross-device meibomian gland segmentation. Stage 1 adapts a source-trained model without target gland masks in the training loss, using four weak-prior anchors driven by target eyelid masks and clinical metadata only. Stage 2, when target gland masks are available, distils complementary Stage-1 teachers into a single compact student via supervised self-distillation. We develop and validate the technique on the public MGD-1k to CAMG research benchmark (1,000 to 100 images, different device), where the distilled model achieves Dice 0.716+/-0.006 (best 0.726), surpassing UA-MT (0.710) and the ensemble teacher (0.720) – with a single pass. The gland-mask-free Stage-1 variant reaches Precision 0.694 vs. 0.30-0.34 for SAM/MedSAM (p

22.
arXiv (CS.LG) 2026-06-16

High-Dimensional Random Projection for Activation Steering in Language Models

arXiv:2606.15092v1 Announce Type: new Abstract: Activation steering has emerged as a key methodology for controlling the behavior of large language models (LLMs). Existing difference-in-means based methods, however, are fundamentally limited: they capture only mean differences between class activations and fail to recover discriminative signals that naturally exist in the nonlinear feature subspace under the superposition hypothesis. Motivated by that, we propose High-Dimensional Random-projection for Activation Steering (HiDRA), a training-free approach that integrates seamlessly with existing activation steering methods. By performing activation addition in the projected high-dimensional space, HiDRA can provably capture a better discriminative structure beyond the reach of linear methods. Experiments across diverse LLM families and benchmarks demonstrate that HiDRA consistently outperforms baseline counterparts, achieving stronger behavioral control without significant computational overhead.

23.
bioRxiv (Bioinfo) 2026-06-12

From Proteome Mining to Structural Validation: Phosphopyruvate Hydratase as a Structurally Tractable Drug Target in Kinetoplastid Parasites

Chagas disease, caused by Trypanosoma cruzi, demands novel therapeutic strategies that overcome the toxicity and limited efficacy of current treatments. To address this need, herein we report an integrative, target-centric strategy that combines parasite proteome mining, structural modeling, and experimental validation. Functional enrichment and druggability analyses identified phosphopyruvate hydratase (PPH) as a promising candidate due to its essential metabolic role and limited similarity to human homologs. Notably, proteome mining revealed the presence and conservation of PPH across kinetoplastid parasites, including Leishmania donovani, supporting its evaluation beyond T. cruzi. For the selected PPH sequences, AlphaFold-derived three-dimensional models underwent extensive molecular dynamics refinement, yielding stable conformational ensembles suitable for structure-based studies. Using this validated model, virtual screening of the Latin American Natural Products Database - LANaPDB - identified aptosimon as a top-ranked compound candidate. Molecular dynamics simulations further showed ligand-dependent binding behavior, suggesting alternative binding modes distinct from the canonical substrate configuration. In vitro assays demonstrated consistent antiparasitic activity against intracellular T. cruzi amastigotes (IC50 = 3.52 ug/mL) and Leishmania donovani promastigotes (IC50 = 13.06 ug/mL), supporting the biological relevance of the aptosimon-related lignan chemotype, hinokinin, across two kinetoplastid parasite models. Together, these results support PPH as a structurally tractable and biologically relevant candidate target, while identifying an aptosimon-related lignan chemotype, represented experimentally by hinokinin, as a cross-species antiparasitic scaffold that warrants further biochemical target-validation studies.

24.
arXiv (CS.AI) 2026-06-16

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

arXiv:2606.16112v1 Announce Type: cross Abstract: Residual architectures are ubiquitous in deep learning, but they suffer from a subtle structural limitation: the norm of the residual stream can grow rapidly with depth. As a result, updates from later layers become small relative to the accumulated residual state. This reduces their impact on the representation and limits the benefits of scaling models in depth. To address this, we introduce NAG, a norm-agnostic residual architecture that separates magnitude from directional information in the residual stream, preserving meaningful layer contributions throughout depth and preventing later updates from being systematically suppressed by residual-norm growth. Importantly, NAG introduces only a negligible number of additional parameters and relies on simple operations that are easily kernel-fusible, preserving training efficiency in practice. We show that this architecture outperforms baseline Transformers, with gains that increase substantially as depth grows, enabling effective training of much deeper models. The norm-agnostic formulation also leads to an interpretable Mixture-of-Depths (MoD) mechanism that adaptively skips both attention and MLP layers. Beyond serving as a post-training accuracy-compute tradeoff, this mechanism can be used as a pretraining-time scaling strategy: under iso-FLOP training, compute saved by reducing per-token forward-pass cost can be reinvested into training on more tokens while keeping the total parameter count and KV-cache budget fixed. In our experiments, moderate Mixture-of-Depths rates of approximately 20%-25% match full-depth baseline performance under equal training compute while substantially reducing the number of executed layer parameters and forward-pass FLOPs. These results identify sparsity in depth as a new scaling axis for fixed-compute training, enabling very deep yet FLOP-efficient models.

25.
arXiv (CS.LG) 2026-06-25

Stagnant Neuron: Towards Understanding the Plasticity Loss in Multi-Agent Reinforcement Learning Value Factorization Methods

arXiv:2606.25335v1 Announce Type: new Abstract: Multi-Agent Reinforcement Learning (MARL) value factorization methods can suffer from a loss of plasticity, gradually failing to adapt when transferring to new task instances. We trace this issue to stagnant neurons, units whose gradient updates become negligibly small relative to their weights, thereby hindering learning. While existing plasticity injection methods exist, they prove ineffective for such neurons. To address this, we propose Knowledge-retentive Neuron-level PlastIcity Focusing InjEction (KNIFE), a novel method that directly targets stagnant neurons. KNIFE replaces each stagnant neuron with a composite unit comprising three specialized components: a frozen knowledge neuron to preserve acquired knowledge, a re-initialized active neuron to restore learning capacity, and a compensation neuron to ensure the combined output matches the original, thus maintaining previous learned cooperation knowledge. Extensive experiments on SMACv2, predator-prey, and matrix games demonstrate that KNIFE significantly outperforms state-of-the-art plasticity injection methods.