论文广场 - AcademicHub

01.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.19501

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

作者:

Aijie Shu ↗Bowei Chen ↗Wenbin Wu ↗Cathy Yi-Hsuan Chen ↗Fengxiang He ↗

Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no regulator-aligned way to measure the resulting false alarms. We introduce DeXposure-Claw, a forecast-grounded agentic supervision system that routes LLM decisions through structured evidence: (1) DeXposure-FM, a graph time-series foundation model, forecasts future exposure networks; (2) deterministic monitors and stress scenarios then turn those forecasts into typed alerts, attribution signals, and scenario evidence; and (3) data-health and confidence gates constrain escalation before DeXposure-Claw emits auditable supervisory tickets with rationales. We further develop DeXposure-Bench, a six-axis evaluation harness, whose decision axis scores tickets against a regulator-aligned absolute-loss ground truth and an explicit false-intervention rate. Experiments on five years of weekly real data fully support our system. Code is at https://github.com/EVIEHub/DeXposure-Claw.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.CV) 2026-06-24 DOI: arXiv:2606.24570

Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

作者:

Vision-language contrastive pretraining has become the dominant recipe for 3D medical foundation models, leveraging the large volumes of paired scans and reports produced in clinical practice. However, medical images usually span dozens of organs, and radiological reports are much longer than typical natural image captions and are composed of multiple structured sections. CLIP-style pretraining compresses this structure by encoding each modality into a single global token, at the risk of losing important details. We introduce ConQuer (Concept Queries), an image-text pretraining method that augments CLIP's global alignment with a set of localized alignments, one per concept. ConQuer splits the report into concept-specific sections and learns cross-attention queries that pool the matching image features without using any segmentation mask or spatial supervision. Contrastive learning is then applied independently for each concept. Concepts can be any unit of semantic localization; here, they are anatomical regions, one query per organ or gross body region. As a byproduct, each query learns attention maps focused on its concept, providing built-in spatial interpretability. We use ConQuer to train Jolia, a 3D CT foundation model on chest and abdominal CT. Jolia consistently outperforms a CLIP baseline on findings classification, report generation, and cross-center transfer, and sets a new state of the art across multiple public benchmarks. Jolia's weights will be released upon acceptance.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2605.15980

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

作者:

Xiaoxuan He ↗Siming Fu ↗Zeyue Xue ↗Weijie Wang ↗Ruizhe He ↗Yuming Li ↗Dacheng Yin ↗Shuai Dong ↗Haoyang Huang ↗Hongfa Wang ↗Nan Duan ↗Bohan Zhuang ↗…

Group Relative Policy Optimization has emerged as essential for aligning video diffusion models with human preferences, but faces a critical computational bottleneck: training a 14B parametered model typically demands hundreds of GPU days per experiment. Existing efficiency methods reduce costs through sliding window subsampling training timesteps, but fundamentally compromise optimization, exhibiting severe instability and failing to reach full trajectory performance. We present Flash-GRPO, a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges: iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, decoupling policy performance from timestep difficulty; temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2602.06470

Improve Large Language Model Systems with User Logs

作者:

Changyue Wang ↗Weihang Su ↗Qingyao Ai ↗Xingzhao Yue ↗Rui Zhang ↗Xiaojia Chang ↗Yiqun Liu ↗

Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and diminishing returns from rising computational costs. As a result, recent work is increasing the focus on continual learning from real-world deployment, where user interaction logs provide a rich source of authentic human feedback and procedural knowledge. However, learning from user logs is challenging due to their unstructured and noisy nature. Vanilla LLM systems often struggle to distinguish useful feedback signals from noisy user behavior, and the disparity between user log collection and model optimization (e.g., the off-policy optimization problem) further strengthens the problem. To this end, we propose UNO (User log-driveN Optimization), a unified framework for improving LLM systems (LLMsys) with user logs. UNO first distills logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model's prior knowledge and the log data. This assessment guides the LLMsys to adaptively filter out noisy feedback and construct different modules for primary and reflective experiences extracted from user logs, thereby improving future responses. Extensive experiments show that UNO achieves state-of-the-art effectiveness and efficiency, significantly outperforming Retrieval Augmented Generation (RAG) and memory-based baselines. We have open-sourced our code at https://github.com/bebr2/UNO .

阅读与讨论 → 访问原文 →

05.

arXiv (CS.CV) 2026-06-24 DOI: arXiv:2606.24586

EERLoss: A Novel Loss Function for Training Deep Biometric Models. A Case Study in Keystroke Dynamics

作者:

Nahuel Gonzalez ↗Marta Robledo-Moreno ↗Ivan DeAndres-Tame ↗Ruben Vera-Rodriguez ↗Ruben Tolosana ↗

Deep learning approaches to biometric verification are commonly trained by optimizing indirect objectives, creating a misalignment between the optimization process and the primary evaluation metric, typically the Equal Error Rate (EER). This paper introduces EERLoss: a subdifferentiable, arbitrarily accurate approximation to EER for training deep biometric models. Furthermore, this framework has the potential to be adapted to optimize any specific operating point on the DET curve, enhancing its generalizability. To validate this approach, EERLoss is evaluated on a particularly demanding behavioral biometric modality: keystroke dynamics verification. This task is characterized by its high intra-class and low inter-class variability. Experiments are conducted on the large-scale KVC-onGoing benchmark, incorporating data from over 185,000 subjects across different scenarios. A comprehensive ablation study initially demonstrates the superiority of EERLoss in comparison to existing state-of-the-art loss functions. It also converges substantially faster compared to other losses, reducing the overall training cost. Additionally, a comparison is made between the proposed loss and the KVC-winning architecture by re-training it with EERLoss, demonstrating that the proposed approach significantly outperforms the original SoTA, achieving a relative EER reduction of up to approx. 30\%. This improvement on a challenging, large-scale benchmark validates the effectiveness of EERLoss as a task-aligned training objective specifically suited for high-variance biometric traits.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16690

PATCH: Action-Chunk-Conditioned Latent Patch Innovation Monitoring for Robot Manipulation

作者:

Yanan Zhou ↗Ranpeng Qiu ↗Yincong Chen ↗Jiajie Cui ↗Weiming Zhi ↗

Learning-based manipulation policies have made substantial progress in real-world robot manipulation, particularly for short-horizon action generation. However, deployment in open workspaces remains fragile under unexpected local scene dynamics, such as moving objects, transient occlusions, or disturbances near the intended motion. Existing runtime monitors often rely on global observation anomalies, policy uncertainty, or frame-level visual changes, and struggle to distinguish task-relevant execution risk from benign visual variation. We introduce PATCH, an action-chunk-conditioned latent patch innovation monitor for deployment-time intervention. Given the active action chunk, PATCH defines a projected execution corridor, predicts latent patch evolution inside it, and accumulates persistent residuals unexplained by the robot's own motion. These residuals form a localized intervention signal that allows PATCH-Router to pause execution, select an available recovery source, and resume the original policy once localized innovation subsides. Experiments on real robot rollout data show that PATCH produces more stable and context-relevant triggers than competing runtime monitors. Real-robot deployment further demonstrates monitor-driven intervention and policy resumption for disturbance-aware manipulation. Project Page: https://yananzhou5555.github.io/PATCH/.

阅读与讨论 → 访问原文 →

07.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17872

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

作者:

Ning Ni ↗Yingjie Lao ↗

arXiv:2606.17872v1 Announce Type: cross Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability [zhao2023survey], the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods [jo2025fastkv,li2024snapkv,zhou2024dynamickv] reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks [jiang2024robustkv] or degrade safety alignment under aggressive eviction. We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach [arditi2024refusal,zou2023representation] to the layer-specific key projection space used in KV caching. Based on this anchor, a soft penalty token selection rule trades a small amount of utility for substantially improved safety alignment, while reducing to the original compressor when the penalty is zero.

阅读与讨论 → 访问原文 →

08.

Nature (Science) 2026-06-18 DOI: HASH:529da27c28342033c467c4ecc3df3041

Clues to the sloth’s sloth found in its genome

作者: 未知作者

Sequencing shows duplication of genes that affect mitochondria, the organelles that provide energy for cells. Sequencing shows duplication of genes that affect mitochondria, the organelles that provide energy for cells.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.11415

Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings

作者:

Maryam Ostadsharif Memar ↗Nima Dehghani ↗

arXiv:2606.11415v1 Announce Type: cross Abstract: Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-24 DOI: arXiv:2606.23888

E-MRL: Cross-view Aligned Evidence-driven Multimodal Reinforcement Learning for Reliable 3D Tumor Analysis

作者:

Sijing Li ↗Zhongwei Qiu ↗Zhuoya Wang ↗Boxiang Yun ↗Zhenyu Yi ↗Jianwei Xu ↗Wenqiao Zhang ↗Yingda Xia ↗Ling Zhang ↗

arXiv:2606.23888v1 Announce Type: cross Abstract: While Vision-Language Models (VLMs) show great promise in volumetric medical report generation, they frequently suffer from visual hallucinations and a lack of grounding in 3D CT data. Current Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) strategies typically optimize text fidelity alone, essentially rewarding correct diagnoses derived from language priors rather than genuine visual perception. To address this, we propose cross-view aligned Evidence-driven Multimodal Reinforcement Learning (Evidence-MRL, noted as E-MRL), a reliable RL reasoning framework that formulates the generation process as a Markov Decision Process of "diagnosis-localization-verification". Unlike standard approaches, our model is explicitly trained to identify a "key evidence slice" alongside the global diagnostic report, grounding its findings in verifiable visual evidence. Crucially, we introduce a novel cross-view consistency reward, which validates the semantic alignment between the golden-standard report and a local visual re-query of the selected key slice, providing additional rewards for correctly-localized reasoning. Experiments on large-scale 3D CT tumor datasets demonstrate that E-MRL significantly reduces hallucinations and improves diagnostic accuracy compared to SFT and RL baselines, offering a clinically interpretable solution for visually-grounded and tumor analysis.

阅读与讨论 → 访问原文 →

11.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.13571

Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting

作者:

Yifan Hu ↗Hongzhou Chen ↗Peiyuan Liu ↗Yiding Liu ↗Zewei Dong ↗Jiang-Ming Yang ↗

arXiv:2606.13571v1 Announce Type: cross Abstract: Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolved from impute-then-forecast pipelines to continuous-time models such as Neural ODEs and continuous-time graph networks. While these approaches improve the modeling of historical irregularity, they still rely on an implicit oracle assumption at inference time: the timestamps of future valid observations are presumed to be known in advance. This assumption limits practical relevance, since in many real systems the more fundamental question is not only what the future value will be, but also whether a valid observation will occur at all. In this paper, we propose Timeflies, a unified framework that reformulates forecasting as a joint problem of future observability inference and value estimation. To explicitly model the interaction between observation dynamics and state evolution, Timeflies adopts an observation stream and a value stream, coupled through three dedicated modules for reliability-aware embedding, observation-guided dependency modeling, and joint prediction. We further construct Shadow, a benchmark that combines natural missingness from public datasets with real-world industrial data, and introduce the Observation-Value Joint Entropy (OVJE) metric to comprehensively evaluate this coupled predictability. Extensive experiments show that Timeflies consistently outperforms existing methods, highlighting the importance of explicitly modeling future observability in time series forecasting with missing values. Code and dataset are available in https://github.com/ant-intl/Timeflies.

阅读与讨论 → 访问原文 →

12.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15611

Mutual Distillation of Dual-Foundation Models for Semi-Supervised PET/CT Segmentation

作者:

Fuyou Mao ↗Beining Wu ↗Yanfeng Jiang ↗Bohan Xu ↗Lixin Lin ↗Naye Ji ↗Hao Zhang ↗Yan Tang ↗

Organ segmentation from PET/CT is critical for quantitative analysis and radiotherapy planning in oncology. To ease the high annotation cost of PET/CT segmentation, semi-supervised learning (SSL) provides a practical and effective solution for developing deep models with limited labeled data. Recent developments in visual foundation models have demonstrated remarkable adaptability with improved efficiency. In this work, we propose a mutual distillation framework that seamlessly exploits both structural and functional foundation models, which act as modality-specific generalists for distilling knowledge from structural CT and metabolic PET imaging. By bridging the gap between the task-specific precision of student models and the segmentation priors of generalist foundation models, we propose MuDuo, a mutual distillation framework that synergistically leverages SAM-Med3D for CT and SegAnyPET for PET to distill their knowledge into a lightweight student network. Our approach eliminates the need for manual prompts while maximizing the utility of unlabeled data for automatic segmentation, achieving state-of-the-art performance on the AutoPET dataset with only 5 labeled cases. Our source code is available at https://github.com/Wu-beining/MuDuo.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2508.21380

The Algorithm Is Not the Behavior: Learned Priors Override Look-Ahead in a Chess-Playing Neural Network

作者:

Elias Sandmann ↗Sebastian Lapuschkin ↗Wojciech Samek ↗

arXiv:2508.21380v3 Announce Type: replace-cross Abstract: Recent mechanistic work has uncovered learned algorithms within neural networks, from modular arithmetic to search and planning in game-playing agents. But does algorithmic structure guarantee algorithmic behavior? We investigate this in Leela Chess Zero, the strongest neural chess engine, where prior work identified learned look-ahead. By extending the logit lens to its move-selecting policy network, we discover that correct puzzle solutions-including immediate checkmates-often appear in intermediate layers but are systematically overridden in the final output, a phenomenon we term "forgotten puzzles". Replicating prior analyses on these positions, we find that look-ahead operates normally-future moves of the correct continuation are represented, causally important, and linearly decodable-ruling out a failure of the algorithm itself. Instead, late layers increasingly shift toward prioritizing safe play over aggression. To test whether this shift drives the override, we steer the model against these preferences and recover 61.7% of forgotten puzzles, providing causal evidence that safety priors override algorithmically computed solutions. These findings demonstrate that algorithmic structure does not guarantee algorithmic behavior: a model can internally solve a problem and still output the wrong answer.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.14805

Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces

作者:

Dong Ho Kang ↗Hyeonjeong Cha ↗Daein Weon ↗

arXiv:2606.14805v1 Announce Type: cross Abstract: Reliable operation of multi-agent large language model (LLM) systems depends on debugging long execution traces, where the few causally decisive events are buried in unstructured logs of messages, routes, memory writes, and tool calls. The standard tool is counterfactual replay (rewind, edit, and re-run the trajectory to measure each event's effect), but its cost grows linearly with the number of candidate events, making exhaustive replay infeasible at scale. We frame trace debugging as a knowledge-based decision-support problem. Each trace is compiled into a structured event knowledge graph over routing, memory, tool-use, uncertainty, and latent evidence, and a calibrated predictor decides where a scarce replay budget should be spent. We do not propose a new replay oracle; we propose a method to predict its results without paying the replay cost. We formulate zero-replay counterfactual-effect prediction: given a trace under a fixed budget, predict which events the oracle would mark high-effect before any replay is performed. BranchPoint-Latent is a lightweight predictor over observable, structural, uncertainty, and latent features of the knowledge graph. Calibrated against a deterministic replay oracle across 37 trace families, a single learning-to-rank gradient-boosted predictor raises per-trace localization (Branch Recall@5) from 0.73 to 0.93 on held-out families at zero oracle-replay cost. Rather than claiming universal dominance, we characterize when cheap graph centrality suffices and when learned evidence is necessary. The result is an auditable, cost-efficient decision-support system for AI-reliability debugging, positioned explicitly on the cost-accuracy frontier with reproducible artifacts.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.15179

CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation

作者:

Xuedong Hu ↗Zhiqing Tang ↗Zhi Yao ↗Tian Wang ↗Weijia Jia ↗

arXiv:2606.15179v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, creating a document-isolated dual-end RAG setting. However, existing methods rely on frequent remote synchronization and dense evidence transfer, limiting throughput under realistic latency and bandwidth conditions. To address this issue, we propose CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation. CONCORD treats the cloud as an asynchronously arriving evidence source rather than a continuously synchronized co-generator. Specifically, we introduce waiting debt control to decide whether each decoding step should continue waiting for remote participation based on the observed return of waiting. We also design a certificate-guided minimal supplementation mechanism that requests only the remote evidence needed to determine the current greedy decision. Steps that consult the cloud preserve the same greedy token as dense dual-end aggregation, while the remaining steps commit locally without remote evidence. Experiments on Natural Questions and WikiText-2 show that CONCORD improves end-to-end throughput over baselines by $1.66\times$ and $2.15\times$, respectively, while reducing per-token communication by over two orders of magnitude and maintaining comparable answer quality and perplexity.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2606.13349

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

作者:

Haishuo Fang ↗Yue Feng ↗Iryna Gurevych ↗

Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitation is the lack of flexibility to proactively investigate suspicious parts of a paper based on accumulated evidence, as human reviewers do. In this paper, we explore how to enable an LLM-based review agent to perform such proactive investigation. We find that this can be naturally formulated as a Markov Decision Process (MDP), and propose ProReviewer, a scientific peer review agent that proactively reviews a paper guided by a maintained, structured review log. The structured review log serves as a workspace for the agent to track evidence and intermediate findings collected during review. Experiments show that ProReviewer with an 8B backbone, trained by supervised fine-tuning and optimized by reinforcement learning, achieves the highest average score across five quality dimensions, outperforming prompt-based methods with much larger frontier LLMs by up to 39% and the strongest fine-tuned baseline by 16% relatively. It also attains the highest win rates against baselines in human evaluation.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.11385

DeceptionX: Explainable Deception Detection with Multimodal Large Language Models

作者:

Jiayu Zhang ↗Shuo Ye ↗Jiajian Huang ↗Yawen Cui ↗Taorui Wang ↗Wei Xia ↗Zeheng Wang ↗Haowen Tang ↗Hui Ma ↗Zitong Yu ↗

Deception detection is a critical and highly challenging task within affective computing and behavioral analysis. Existing deep learning methods typically treat this task as a straightforward classification problem; however, this black-box approach lacks interpretability and fails to capture the complex logical deduction processes utilized by human experts when identifying lies. While Multimodal Large Language Models (MLLMs) have shown potential, applying them effectively requires a bridge between low-level audiovisual cues and high-level logical reasoning. In this paper, we propose DeceptionX, a novel MLLM framework that shifts the paradigm of deception detection from black-box classification to an interpretable Observe-Think-Summarize reasoning process. To address the scarcity of high-quality reasoning data, we first constructed DeceptChain, a high-quality dataset developed through a human-in-the-loop process. This dataset synthesizes fine-grained visual and auditory evidence (such as micro-expressions and vocal tremors) into structured chain-of-thought reasoning data. Furthermore, we propose a three-stage training pipeline and a Discrepancy-Aware Redundancy Elimination~(DARE) strategy for DeceptionX to further enhance the model's generalization capabilities. Extensive experiments demonstrate that DeceptionX not only outperforms existing MLLM baselines and state-of-the-art methods on standard real-world benchmarks but also provides transparent, expert-level reasoning paths, bridging the critical gap between accuracy and interpretability in multimodal deception detection.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.04547

Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization

作者:

Heng Cao ↗Fan Zhang ↗Jian Yao ↗Yujie Zheng ↗Changlin Zhao ↗Lu Hao ↗Yuxuan Wei ↗Wangze Ni ↗Huaiyu Fu ↗Yuqian Sun ↗Xuyan Mo ↗

Personalizing large language models requires adapting model behavior to individual users while preserving robustness and deployment-scale efficiency. Existing approaches typically personalize LLMs either at the input level, by retrieving user histories or constructing profile prompts, or at the parameter level, by maintaining user-specific parameter-efficient modules. The former makes personalization sensitive to retrieval quality and prompt design, whereas the latter incurs storage and maintenance costs that grow with the user population. To address these limitations, we propose TAP-PER (Temporal Attentive Prefix for PERsonalization), a prefix-based framework that encodes user preferences as learnable representations, eliminating explicit prompt construction and replacing heavy per-user adapters with lightweight user-state prefix embeddings. Inspired by personalized recommendation systems, TAP-PER decomposes user modeling into user-state and query-conditioned components, and incorporates temporal signals to capture the evolving nature of user interests. Experiments on six LaMP tasks show that TAP-PER consistently outperforms prompt-based and model-based baselines across classification, rating, and generation settings. Moreover, TAP-PER uses 130x fewer per-user parameters than OPPU and roughly half the total parameter footprint of PER-PCS at the 1,000-user scale, demonstrating that scalable LLM personalization can be achieved without explicit prompt construction or heavy per-user adapters.

阅读与讨论 → 访问原文 →

19.

Nature (Science) 2026-06-10 DOI: HASH:b6f1a0237bd0c0b73dd3d169988d920c

JUNO experiment ushers in next generation of neutrino experiments

作者:

Patricia Vahle ↗

The first measurements from the JUNO experiment demonstrate unprecedented precision and promise exciting results. The first measurements from the JUNO experiment demonstrate unprecedented precision and promise exciting results.

阅读与讨论 → 访问原文 →

20.

bioRxiv (Bioinfo) 2026-06-15 DOI: HASH:afb0144a38d292cd77f6bf7a01f30320

Multiple Fault Analysis and Drug Therapy on Signaling Pathways Using Dynamic Bayesian Network-based Model

作者:

Chowdhury ↗Maitra ↗Agarwal ↗Sur ↗Sarkar ↗Majumder ↗Lodh ↗

Cell growth is an intricate biological phenomenon that is closely regulated by the interplay between various growth factors and transcription factors. Signaling pathways are the main mediators in this event, which provide the driving force for mitosis or sometimes meiosis. However, when malfunctions occur within the biological network, they can cause uncontrolled cell division, regardless of external stimuli. By employing Dynamic Bayesian Networks (DBNs), these malfunctions can be explicitly simulated, offering insights into their effects on cellular behavior and growth regulation. To a significant extent, the resultant outcomes can be mitigated through the use of reduced drug combinations. This study delves into the intricacies of signaling pathway behavior under the influence of concurrent malfunctions. Initially, we replicate the effects of these dysfunctions within DBNs. Subsequently, drug therapy is applied to alleviate their impact. Our methodology introduces a parameter known as efficiency_score, enabling the identification of optimized drug combinations without prior knowledge of specific dysfunctions. Particularly relevant in the context of realistic cancer conditions, these tailored drug inhibition points demonstrate enhanced efficacy compared to conventional treatments. Leveraging GPU acceleration throughout the modeling process accelerates the analysis of multiple faults within the biological networks, rendering our approach notably faster and more efficient.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15784

Bayesian Networks with Latent Time Embedding for Stage-Aware Causal Modeling of Alzheimer's Disease Progression

作者:

Nguyen Linh Dan Le ↗

arXiv:2606.15784v1 Announce Type: new Abstract: Alzheimer's disease (AD) progression is often described through the amyloid-tau-neurodegeneration, or AT(N), cascade. However, most longitudinal models represent this cascade either as a fixed sequence of biomarkers or as a black-box forecasting task. This makes it difficult to determine when biologically guided biomarker relationships influence future regional pathology. In this study, we introduce Bayesian Networks with Latent Time Embedding (BN-LTE), a Bayesian structural framework for stage-aware modeling of AD progression. BN-LTE estimates disease pseudotime from baseline biomarker profiles and constrains directed dependencies according to biologically plausible AT(N) ordering. Posterior spline-varying structural equations are then used to link initial multimodal measurements with future annualized regional tau-PET change. Across repeated subject-disjoint evaluations using ADNI data, BN-LTE shows strong spatial reconstruction of tau progression compared with the included forecasting baselines. Beyond spatial reconstruction, BN-LTE recovers posterior stage-varying AT(N)-constrained effects and identifies a mid-pseudotime window of amyloid sensitivity. This window is supported by model-implied g-formula contrasts, root-adjusted AIPW, mechanism-sensitive ablations, and robustness analyses across spline and prior specifications. Overall, these findings position BN-LTE as a Bayesian structural framework for forecasting tau progression while examining stage-dependent AT(N)-cascade mechanisms in observational longitudinal neuroimaging data. Our code is available at https://github.com/danleneurocom/BN-LTE.

阅读与讨论 → 访问原文 →

22.

arXiv (quant-ph) 2026-06-19 DOI: arXiv:2511.23301

Inhibited radiative decay enhances single-photon emitters

作者:

Florian Burger ↗Stephan Rinner ↗Andreas Gritsch ↗Kilian Sandholzer ↗Andreas Reiserer ↗

arXiv:2511.23301v2 Announce Type: replace Abstract: Quantum networks and modular quantum computers require efficient spin-photon interfaces, often realized using optical resonators that enhance radiative decay on a desired transition. However, this requires small mode volumes and high quality factors, which limits multiplexing capacity and demands precise frequency tuning. Here, we demonstrate an alternative approach that circumvents these bottlenecks for upscaling. Using a W1 silicon photonic crystal waveguide with a tailored photonic bandgap, we selectively inhibit unwanted decay pathways, thereby redirecting emission to the desired transition. This enables efficient photon collection over a large frequency range, allowing the resolution and individual addressing of tens of erbium dopants. Their lifetimes are preserved, or even increased, compared to bulk material. The extended mode volume of the devices enables the use of lower dopant concentrations, thereby improving emitter coherence. Our approach can be combined with Purcell enhancement and applied to other spin-qubit platforms, opening intriguing perspectives for photonic quantum technologies.

阅读与讨论 → 访问原文 →

23.

Nature Biotechnology 2026-06-11 DOI: HASH:867fd3ca4f4a98778a787127577b6dac

Large-scale, spatially resolved panoramic CRISPR screening in native tissue environments using Perturb-DBiT

作者:

Alev Baysoy ↗

Spatially resolved CRISPR screening in vivo has been limited to small perturbation panels and subsets of protein-coding RNAs. We present Perturb-DBiT, a method for co-sequencing of spatial total RNA whole transcriptomes and single guide RNAs (sgRNAs) on the same tissue section in situ. In a human cancer metastatic colonization model, we applied large (80,000+) sgRNA panels across tumor colonies in multiple consecutive tissue sections alongside their corresponding total RNA transcriptomes. We linked perturbations affecting long noncoding RNA covariation, microRNA–mRNA interactions and distinct amino acid-specific tRNA alterations to tumor migration and growth. By integrating transcriptional pseudotime trajectories, we further observed the impact of perturbations on clonal dynamics and cooperation. In an immune-competent syngeneic mouse model, investigation of the tumor immune microenvironment indicated distinct, synergistic effects on immune infiltration and suppression. Perturb-DBiT provides a spatially resolved comprehensive view of perturbation responses in complex tissues, including small and large RNA regulation, tumor proliferation, migration, metastasis and immune interactions. In vivo CRISPR genetic perturbations are spatially mapped at scale.

阅读与讨论 → 访问原文 →

24.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.14838

A Definition of Good Explanations and the Challenges Explaining LLM Outputs

作者:

Louis Mahon ↗Elliot Ford ↗Callum Hackett ↗

arXiv:2606.14838v1 Announce Type: new Abstract: How to define a good explanation is a long-standing philosophical debate which has found recent renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but in order to produce good explanations of AI systems, we must first have an understanding of what good explanations are. In this paper we propose a definition inspired by the notion of counterfactual explanations, however we argue that one must also take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why LLM outputs are difficult to produce good explanations for.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CV) 2026-06-24 DOI: arXiv:2606.24602

ViTexQA: A Multi-Frame Temporal Perception Dataset for Video Text Question Answering

作者:

Zhentao Guo ↗Chen Duan ↗Tongkun Guan ↗Zining Wang ↗Kai Zhou ↗Pengfei Yan ↗

Despite remarkable progress in multimodal understanding, current MLLMs still exhibit limitations in video text understanding, particularly when semantics emerge through the integration of temporally distributed textual cues across multiple frames. This perception challenge fundamentally differs from static image text understanding, yet existing datasets fail to capture: the vast majority of questions remain answerable from single frames, inadequately reflecting real-world video text comprehension demands. To address this, we present ViTexQA, a large-scale video-text QA dataset, and FrameThinker for robust multi-frame temporal reasoning. We build ViTexQA via a quality-controlled Chain-of-Thought (CoT) annotation pipeline boosted with temporal constraints; all its QA pairs demand cross-frame text fusion to solve, enforcing true temporal reliance. FrameThinker adopts two-stage training for explicit temporal modeling: CoT-Guided Supervised Fine-Tuning (SFT) generates frame-aware reasoning chains, followed by Temporally-grounded Reinforcement Learning (RL) optimized with multi-frame coherence rewards. Evaluations show our method outperforms SOTA baselines on ViTexQA, lifting ROUGE-L by 6.3%.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络