Academic Intelligence · Curated Daily

Explore the Landscape of Frontier Research Worldwide

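AcademicHub brings together real-time literature from top journals and preprint platforms. Customize your own research radar, and use large language models to automatically generate literature analysis briefings across intersecting fields.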

01.
arXiv (CS.CL) 2026-06-19

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of $512$ text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.

02.
arXiv (CS.CL) 2026-06-24

Dialogue to Discovery: Attribute-Aware Preference Elicitation for Conversational Product Search Assistants

Conversational product search assistants offer a more expressive, natural, and interactive alternative to traditional keyword-based product search. With limited screen space, showing only a few items increases the need for precise preference elicitation, which can prolong conversations, leading to user frustration and session abandonment. Conversely, rushing to recommend items without a clear understanding of preferences risks poor matches and a degraded user experience. We present Dialogue to Discovery (D2D), an attribute-oriented preference elicitation framework that dynamically exploits the structure of product attributes to efficiently steer conversations toward the user's desired item. D2D adaptively prioritizes the most informative queries and strategically times product recommendations, reducing premature or off-target suggestions that harm engagement. To evaluate D2D, we curate three datasets from the Amazon Reviews corpus. In simulated conversations modelled using a multi-factor utilitarian patience framework, D2D achieves a 22.2-29.9% improvement in target-finding accuracy, 6.6-16.1% reduction in abandonment, and 27.5% shorter average conversations over the state-of-the-art baselines. A complementary user study further confirms significant gains in both user satisfaction and perceived efficiency.

03.
medRxiv (Medicine) 2026-06-22

A Parent-Generated Framework of Early Connection: Findings from a CBPR Qualitative Study

Background: Early relational health (ERH) constructs are derived from research observations rather than lived experiences. This study foregrounds diverse parent voices to examine how they describe connection with their young children. Methods: Using community-based participatory research (CBPR), this study was co-designed with parent leaders from Reach Out and Read. A semi-structured interview guide was co-designed, and parent leaders subsequently conducted and transcribed 18 interviews with parents from their networks. Researchers analyzed transcripts using Reflexive Thematic Analysis. Member checking sessions with parent leaders informed the analytic framework. Results: Six organizing principles were identified. (1) Parent-child connection begins with an instinctual sense of responsibility. (2) Connection ebbs and flows as parent and child adapt to one another through daily activities. (3) Family circumstances, including family structure, cultural expectations, and intergenerational values, directly shape this connection. (4) Parents' own upbringings and past relationships indirectly shape how they connect with their child. (5) For connection to grow, parents must show up physically and emotionally for their children despite competing demands. (6) Parents grow through engaged parenting, and that growth feeds back into the connection, creating a self-sustaining cycle of relational health. Conclusions: Our analysis generated two constructs underspecified in ERH frameworks. Parents described their sense of responsibility as immediate and instinctual, preceding an emotional bond. Parents demonstrated their agency in deciding what to carry forward from their relational histories, a pattern this study terms relational legacy. Integrating parent-generated language into ERH measurement research may shape a more comprehensive picture of ERH reflecting how families experience connection.

04.
arXiv (CS.CL) 2026-06-15

Harsher on Male? Evaluating LLMs on Gender-Asymmetric Moral Framing Across Diverse Conflict Scenarios

Existing studies on gender bias in LLMs have largely focused on stereotypes, occupational associations, or explicit harmful outputs. In this work, we ask whether LLMs apply consistent response standards to the same negative behavior under matched male-actor and female-actor conditions. We introduce GAMA-Bench, a gender-mirrored benchmark of 1,298 scenarios covering intimate relationship and public social conflicts. It constructs gender-neutral misconduct templates through controlled grids and cross-model review, then compiles them into paired first-person prompts with matched actor-gender and role-reference variations. We further design a structured response-framing protocol to measure how models allocate punishment, empathy, escalation, instruction, and blame. Experiments on 10 representative LLMs reveal a consistent male-disadvantaging asymmetry: male actors receive more punitive, escalatory, and blame-centered framing, whereas female actors receive more therapeutic and empathy-oriented framing for the same misconduct. Further analyses show that this pattern persists across model families, scenario tracks, model scale, and explicit thinking-style reasoning. The official code is available at https://github.com/xufeiqiong/GAMA-Bench.

05.
arXiv (CS.CL) 2026-06-12

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images, creating broader attack surfaces. However, available resources for jailbreak detection are scarce, particularly in finance. To address this gap, we present FENCE, a bilingual (Korean-English) multimodal dataset for training and evaluating jailbreak detectors in financial applications. FENCE emphasizes domain realism through finance-relevant queries paired with image-grounded threats. Experiments with commercial and open-source VLMs reveal consistent vulnerabilities, with GPT-4o showing measurable attack success rates and open-source models displaying greater exposure. A baseline detector trained on FENCE achieves 99 percent in-distribution accuracy and maintains strong performance on external benchmarks, underscoring the dataset's robustness for training reliable detection models. FENCE provides a focused resource for advancing multimodal jailbreak detection in finance and for supporting safer, more reliable AI systems in sensitive domains. Warning: This paper includes example data that may be offensive.

06.
medRxiv (Medicine) 2026-06-24

Durability and Seasonal Variation in the Effectiveness of Nirsevimab over Three Seasons in Connecticut

Background: Nirsevimab has been widely administered in the United States since 2023 to protect infants and young children from severe disease caused by respiratory syncytial virus (RSV). Although early post-licensure studies have shown high effectiveness against medically attended RSV infection, uncertainty remains about the durability of protection, effectiveness beyond the first RSV season, and the extent to which changing RSV seasonality influences real-world effectiveness. Objective: To estimate the effectiveness of nirsevimab against medically attended RSV infection across three consecutive RSV seasons and to examine how effectiveness varies by season and time since immunization. Methods: We conducted a test-negative case-control study using electronic health records of infants and young children tested for RSV by polymerase chain reaction in outpatient and inpatient settings within the Yale New Haven Health System between October 1, 2023, and March 1, 2026. Effectiveness of nirsevimab was estimated using multivariable logistic regression, adjusting for age, weekly RSV activity, pre-existing risk factors, and other potential confounders. Variation in effectiveness was examined by season, encounter setting, and time since immunization up to 24 months. Results: Overall, 17,755 infants and young children were tested for RSV infection, of whom 2,388 (13.4%) were cases and 15,367 (86.6%) were controls. The overall effectiveness of nirsevimab was 67.3% (95% confidence interval [CI]: 59.8, 73.3%) against all medically attended RSV infections, 60.2% (95% CI: 49.6, 68.5%) against RSV-associated outpatient visits, and 88.9% (95% CI: 82.3, 93.0%) against RSV-associated hospitalization. Effectiveness against medically attended RSV infection declined across seasons, from 76.7% (95% CI: 60.5, 86.3%) in 2023/24 to 54.4% (95% CI: 33.0, 68.9%) in 2025/26. Lower season-specific effectiveness in later seasons corresponded with progressively delayed RSV activity. Protection against RSV-associated hospitalization declined with increasing time since immunization, from 92.5% (95% credible interval [CrI]: 85.9, 96.4%) at 1 month, to 77.2% (95% CrI: 60.4, 87.6%) at 6 months, and 39.9% (95% CrI: 2.4, 63.3%) at 12 months post-immunization, after which effectiveness plateaued. Conclusions: Nirsevimab remained effective against RSV-associated hospitalization through 6 to 12 months after immunization. Delayed RSV activity was associated with lower effectiveness, highlighting the importance of aligning administration with local RSV circulation.
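In test-negative designs, effectiveness is conventionally reported as (1 − odds ratio) × 100, where the odds ratio compares the odds of immunization among cases with those among test-negative controls. The sketch below shows that conversion under this standard definition. The function names and the 2x2 counts are hypothetical and not taken from the study, which used adjusted odds ratios from multivariable logistic regression rather than crude ones.

```python
def effectiveness_from_odds_ratio(odds_ratio: float) -> float:
    """Convert an odds ratio of immunization (cases vs. test-negative
    controls) into percent effectiveness: VE = (1 - OR) * 100."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return (1.0 - odds_ratio) * 100.0


def crude_odds_ratio(imm_cases: int, unimm_cases: int,
                     imm_controls: int, unimm_controls: int) -> float:
    """Unadjusted odds ratio from a 2x2 table (illustrative counts only)."""
    return (imm_cases * unimm_controls) / (unimm_cases * imm_controls)


# Hypothetical table: 10 immunized and 90 unimmunized cases,
# 30 immunized and 70 unimmunized controls.
example_or = crude_odds_ratio(10, 90, 30, 70)
example_ve = effectiveness_from_odds_ratio(example_or)
```

Under this definition, the reported overall effectiveness of 67.3% would correspond to an adjusted odds ratio of roughly 0.327. That value is back-calculated here for illustration and is not quoted in the abstract.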

07.
arXiv (math.PR) 2026-06-16

Probabilities

arXiv:2601.18853v4 Announce Type: replace-cross Abstract: Probabilities is the English translation of the book Probabilités Tome 1 and Tome 2. The mathematical content is authored by Prof. Jean-Yves Ouvrard. The English version was prepared by his eldest son, Dr. Xavier Ouvrard. This probability theory book covers not only an introduction to this field, but also advanced concepts based on measure theory. The first part introduces the fundamentals of probability theory across 7 chapters, targeting the bachelor level, including event algebras, random variables, independence, conditional probabilities, moments of discrete and continuous random variables, generating functions, and limit theorems. The second part contains 10 chapters and corresponds to the master level. Following a brief introduction to measure theory, this part develops more advanced topics: probability measures and their complements, distributions and moments of random variables, modes of convergence, laws of large numbers, conditional expectation, Fourier transforms and characteristic functions, Gaussian random variables, convergence of measures, convergence in distribution, discrete-time stochastic processes, martingales, and Markov chains. The reader's work is greatly facilitated by the inclusion, in every chapter, of numerous exercises, all accompanied by detailed solutions that often provide substantial extensions to the theoretical material.

08.
arXiv (CS.LG) 2026-06-24

Target-Aware Linear Regression Under Distribution Shift

arXiv:2606.22775v2 Announce Type: replace-cross Abstract: Distribution shift between training and deployment is a pervasive challenge for modern AI systems. In many cases, the target marginals of covariates and response are known or specified through population-level observations, boundary conditions, properties of simulator configurations, or alignment-time distributional constraints. Such knowledge may provide valuable side information for regression estimation. We study this problem in the multivariate linear regression setting with a stable conditional mean $E[Y\mid X]$ across source and target, and identify the hybrid-loss estimator, which jointly incorporates both target marginals, as a benchmark target-aware estimator. Its direct computation, however, requires solving a coupled nonlinear optimization that is expensive at scale. Our main contribution is to develop and evaluate two computationally tractable alternatives: a constrained moment-matching estimator and a two-stage estimator that augments ordinary least squares with a calibration step. For all three estimators, we derive and compare closed-form asymptotic mean squared errors, yielding conditions under which the tractable alternatives match or closely approximate the hybrid benchmark, and regimes in which they do not. Monte Carlo experiments across three controlled shift regimes validate the theoretical results, investigate the accuracy-runtime tradeoffs among the three estimators, and translate into guidance on estimator choice. In particular, the two-stage estimator nearly matches the hybrid benchmark in the high signal-to-noise regime at essentially no additional cost, providing theoretical grounding for empirical observations in nonlinear settings.

09.
arXiv (CS.CL) 2026-06-25

BitNet Text Embeddings

LLM-based text embedders have substantially improved retrieval and semantic representation quality, but their deployment remains costly: large backbone models slow down embedding inference, while high-dimensional full-precision embeddings impose substantial storage and bandwidth overhead on large-scale indexes. In this paper, we present BITEMBED, an extreme low-bit framework for LLM-based text embedding that jointly targets encoding efficiency and vector storage. BITEMBED converts pretrained LLM backbones into BitNet-style embedding encoders with ternary weights, quantized activations, and lightweight normalization refinement. The converted model is adapted to representation learning through continual contrastive pre-training, followed by supervised contrastive fine-tuning with both similarity-distribution distillation and attention-relation distillation from a full-precision teacher. Beyond quantizing the backbone, BITEMBED further trains output embeddings to support multiple storage precisions meeting different storage needs in various scenarios. Experiments on MMTEB (eng, v2) with Qwen3-0.6B and Gemma3-270M show that BITEMBED is largely comparable to full precision teacher embedders. Moreover, BITEMBED flexibly obtains text embeddings of various precisions, achieving a trade-off between performance and storage cost.

10.
arXiv (CS.AI) 2026-06-11

Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation

arXiv:2606.12006v1 Announce Type: cross Abstract: Predicting time-to-event outcomes such as mortality is a fundamental task in clinical decision-making, commonly addressed through survival analysis. While classical statistical and deep learning approaches have been widely studied, they typically require task-specific training and sufficient labeled data. Recent advances in tabular foundation models offer a new paradigm by learning general-purpose representations for structured data. However, their applicability to censored time-to-event prediction in clinical settings remains underexplored, as typical applications are restricted to discrete classification rather than survival analysis tasks. In this work, we propose a lightweight adaptation approach for applying tabular foundation models to clinical survival analysis by directly training a survival-aware head on top of the pretrained representations. We study representative architectures, including TabPFN, TabDPT, and TabICL, and adapt them using a multi-task logistic regression (MTLR) head to model right-censored time-to-event outcomes. We evaluate this approach on a diverse set of public survival benchmarks and two large-scale ICU cohorts, MIMIC-IV and eICU. Our results show that this transfer learning approach achieves competitive or superior performance compared to strong baselines. On MIMIC-IV, TabDPT-FT-MTLR reaches a C-index of 0.856, corresponding to a relative improvement of +1.4% over the best non-FM baseline (DeepSurv, 0.844) and +6.7% over the best zero-shot model (0.802). On eICU, TabICL-FT-MTLR achieves 0.797, yielding gains of +1.7% (DeepSurv, 0.784) and +6.4% (0.749), respectively. These findings highlight the importance of combining pretrained tabular representations with survival-aware objectives and suggest that tabular foundation models provide a practical and effective alternative for clinical survival prediction.
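The relative improvements quoted above follow from the reported C-index values as (new − baseline) / baseline. A small check of that arithmetic, using only the numbers in the abstract; the function and dictionary names are illustrative.

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (fine-tuned model C-index, comparison C-index) as reported in the abstract
reported = {
    "MIMIC-IV vs DeepSurv": (0.856, 0.844),
    "MIMIC-IV vs best zero-shot": (0.856, 0.802),
    "eICU vs DeepSurv": (0.797, 0.784),
    "eICU vs best zero-shot": (0.797, 0.749),
}

gains = {name: round(relative_gain_percent(new, base), 1)
         for name, (new, base) in reported.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```

The rounded results are 1.4, 6.7, 1.7, and 6.4 percent, matching the figures in the abstract. These are relative gains, not absolute differences in C-index.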

11.
arXiv (CS.AI) 2026-06-24

When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

arXiv:2407.18957v5 Announce Type: replace-cross Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large language model based agents. We have developed a multi-agent AI system called StockAgent, driven by LLMs, designed to simulate investors' trading behaviors in response to the real stock market. The StockAgent allows users to evaluate the impact of different external factors on investor trading and to analyze trading behavior and profitability effects. Additionally, StockAgent avoids the test set leakage issue present in existing trading simulation systems based on AI Agents. Specifically, it prevents the model from leveraging prior knowledge it may have acquired related to the test data. We evaluate different LLMs under the framework of StockAgent in a stock trading environment that closely resembles real-world conditions. The experimental results demonstrate the impact of key external factors on stock market trading, including trading behavior and stock price fluctuation rules. This research explores the study of agents' free trading gaps in the context of no prior knowledge related to market data. The patterns identified through StockAgent simulations provide valuable insights for LLM-based investment advice and stock recommendation. The code is available at https://github.com/MingyuJ666/Stockagent.

12.
arXiv (math.PR) 2026-06-25

An example of Ensemble Kalman Filter with resampling

arXiv:2606.25539v1 Announce Type: cross Abstract: This paper introduces the Exact Ensemble Kalman Filter (ExEnKF), a novel algorithm for state estimation in discrete-time nonlinear filtering problems with linear observations. Unlike traditional Ensemble Kalman Filters (EnKFs), which approximate the filtering distribution using ensembles of Dirac measures, the ExEnKF employs Gaussian measures, enabling more efficient exploration of the state space and potentially alleviating the curse of dimensionality. We prove the algorithm's asymptotic consistency with the optimal filter (Theorem 3.1), establishing a convergence rate of order $1/\sqrt{N}$ for $N$ particles. Numerical experiments on the Lorenz-96 multiscale model demonstrate that the ExEnKF outperforms the standard EnKF under model misspecification and poor initialization, particularly in highly stochastic regimes. The algorithm's robustness is further highlighted by its ability to track hidden components of the true signal, even when observations are generated from a different model (e.g., multiscale vs. single-scale). This work advances the theoretical understanding of ensemble methods in nonlinear filtering and provides a practical alternative to sequential Monte Carlo methods for high-dimensional systems.
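For readers unfamiliar with the baseline, here is a minimal sketch of one analysis step of the classical perturbed-observation EnKF, for a scalar state that is observed directly. It is not the ExEnKF from the paper, whose Gaussian-measure construction is not described in the abstract. All names and numeric values are illustrative assumptions.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_var, rng):
    """One analysis step of the stochastic (perturbed-observation) EnKF
    for a scalar state with observation operator H = 1."""
    prior_var = statistics.variance(ensemble)   # sample forecast variance
    gain = prior_var / (prior_var + obs_var)    # scalar Kalman gain
    updated = []
    for x in ensemble:
        # Each member assimilates its own noisy copy of the observation.
        perturbed_obs = observation + rng.gauss(0.0, obs_var ** 0.5)
        updated.append(x + gain * (perturbed_obs - x))
    return updated


rng = random.Random(0)                                  # fixed seed
prior = [rng.gauss(0.0, 2.0) for _ in range(500)]       # forecast ensemble
posterior = enkf_update(prior, observation=1.5, obs_var=0.5, rng=rng)
```

Each ensemble member here is a single point, which is the "ensemble of Dirac measures" picture the abstract contrasts with. The update pulls the ensemble mean toward the observation and shrinks its spread.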

13.
arXiv (CS.LG) 2026-06-18

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

arXiv:2511.05221v3 Announce Type: replace Abstract: Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
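The AUROC values above can be read as the probability that a randomly chosen RBD recording receives a higher score than a randomly chosen control recording. A minimal sketch of that pairwise definition follows; the function name and toy scores are hypothetical, and this is not ActiTect's evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one positive (0.4) scores below one negative (0.6).
toy_auroc = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The quadratic loop is fine for cohorts of the sizes reported here. A rank-based implementation would be preferable for large datasets.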

14.
arXiv (CS.AI) 2026-06-16

Hybrid NARX-LLM for Greenland Iceberg Discharge: Prompt-Driven Residual Correction

arXiv:2606.15288v1 Announce Type: cross Abstract: Greenland iceberg discharge exhibits complex nonlinear dynamics with limited observability, challenging traditional predictive models. We present a Hybrid NARX-LLM framework that combines a nonlinear autoregressive model with exogenous inputs (NARX) and a large language model (LLM) for residual correction. We further propose a Physics-Informed Prompt (PIP) method that transforms unstructured physical knowledge into structured prompts for zero-shot in-context reasoning. The primary objective is to explore the corrective potential of this framework for modeling Greenland iceberg discharge, rather than merely optimizing predictive accuracy. The NARX component captures intrinsic temporal dependencies, while the LLM, guided by PIP, encodes glacier dynamics and environmental drivers and perceives key trend patterns to correct systematic prediction errors. This integration allows the model to reason about unmodeled factors and produce interpretable residuals, enhancing overall predictive accuracy. Applied to Greenland iceberg discharge time series, our approach addresses extreme events that are difficult to predict due to rare variations and nonstationary trends, a limitation often overlooked by traditional methods. By fusing structured time-series modeling with knowledge-driven foundation AI, the framework offers a scalable and interpretable pathway to bridge data-limited climate forecasting with physics-informed LLM reasoning. The code is available.

15.
arXiv (quant-ph) 2026-06-12

Coulomb crystallization of xenon highly charged ions in a laser-cooled Ca+ matrix

arXiv:2512.12266v2 Announce Type: replace-cross Abstract: We report on the sympathetic cooling and Coulomb crystallization of xenon highly charged ions (HCIs) with laser-cooled Ca$^+$ ions. The HCIs are produced in a compact electron beam ion trap, then charge selected, decelerated, and finally injected into a cryogenic linear Paul trap. There, they are captured into $^{40}$Ca$^+$ Coulomb crystals, and co-crystallized within them, causing dark voids in their fluorescence images. Fine control over the number of trapped ions and HCIs allows us to realize mixed-species crystals with arbitrary ordering patterns. By investigating Xe$^{q+}$–Ca$^+$ strings, we confirm the HCI charge states, measure their lifetime and characterize the mixed-species motional modes. Our system effectively combines the established quantum control toolbox for Ca$^+$ with the rich set of atomic properties of Xe highly charged ions, providing a resourceful platform for optical frequency metrology, searches for signatures of new physics, and quantum information science.

16.
arXiv (quant-ph) 2026-06-24

Suppressing Self-Discharging of Quantum Batteries by Cavity Interactions

arXiv:2606.23999v1 Announce Type: new Abstract: We analyse a two-cavity architecture, in which a lossy cavity hosting $N$ qubits is coherently coupled to an auxiliary cavity, as a resource for the storage phase of an open quantum battery at non-zero temperature. Within a local Lindblad treatment in the resonant configuration, we find that the inter-cavity coupling enhances the suppression of self-discharging across every initial preparation, battery size, and temperature we examine, with the protection degrading smoothly as the mean thermal occupation increases. For a single qubit, the energy-basis coherence of a pure superposition leads to better long-time retention than the fully excited state, highlighting the beneficial role of quantum coherence in protecting stored energy against thermal degradation. For two-qubit batteries, Bell-state preparations exhibit enhanced long-time ergotropy retention compared with the fully excited state, while the inclusion of qubit-qubit interactions produces only a weak dependence on the interaction type and strength within the parameter regime considered. Extending the analysis to multi-qubit GHZ-charged batteries with all-to-all Heisenberg interactions, we find that the normalized retained ergotropy increases monotonically with the number of qubits. This behavior is consistent with the collective enhancement of the qubit-cavity coupling in the symmetric Dicke manifold, indicating that larger quantum batteries can benefit from improved protection against self-discharge. These findings establish cavity-assisted protection as a promising strategy for mitigating self-discharging and realizing long-lived quantum batteries in experimentally accessible platforms.

17.
arXiv (CS.CL) 2026-06-25

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation. However, applying RL to open-ended tasks, such as creative writing, remains challenging because LLM-as-a-judge reward models often exhibit stylistic biases and positional inconsistencies, leading to unstable supervision. To address this, we propose OPERA (Objective Perplexity-based Reflective Alignment), which replaces unreliable external judges with intrinsic rewards derived from perplexity dynamics. Specifically, we derive an intrinsic reward signal from perplexity dynamics, quantifying uncertainty reduction at critical reflective states. During the cold-start phase, we introduce a data synthesis method that leverages carefully designed guiding words to generate diverse reasoning traces, along with perplexity-prioritized rollouts that utilize internal log-probabilities to identify logically consistent reasoning branches. This pipeline yields a large-scale dataset comprising 20,000 high-quality reasoning trajectories. Empirical evaluations consistently demonstrate the scalability and efficacy of our approach in alignment for open-ended tasks. Implementing OPERA on Qwen3-8B establishes a new state-of-the-art among open-source models, achieving parity with or surpassing proprietary models like Gemini2.5 and MiniMax-M2.5 in some open-ended tasks. The code is available at https://github.com/pangpang-xuan/OPERA.
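Perplexity is the exponential of the negative mean token log-probability, so it can be computed directly from a model's internal log-probabilities. The sketch below uses that standard definition. The `perplexity_drop` helper is a hypothetical stand-in for "uncertainty reduction" and should not be read as OPERA's actual reward, which the abstract does not specify.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sequence from natural-log token probabilities:
    exp(-mean(log p))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_drop(logprobs_before, logprobs_after):
    """Hypothetical proxy: how much perplexity on the same continuation
    falls after some intermediate step (positive means less uncertainty)."""
    return perplexity(logprobs_before) - perplexity(logprobs_after)


# Toy values: per-token probability rises from 0.25 to 0.5,
# so perplexity falls from 4 to 2.
before = [math.log(0.25)] * 3
after = [math.log(0.5)] * 3
drop = perplexity_drop(before, after)
```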

18.
arXiv (CS.AI) 2026-06-15

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

arXiv:2606.14418v1 Announce Type: new Abstract: We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen unsupervised object-centric encoder with a transformer-based world model, in which actions are bound to objects through a novel action-slot fusion mechanism that is used in slot transition prediction. Policy and value heads use object-causal attention, modulating token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET adds an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, COMET achieves a higher mean normalized score during the early stages of training compared to object-centric and monolithic baselines.

19.
arXiv (CS.LG) 2026-06-12

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

arXiv:2510.16311v2 Announce Type: replace Abstract: Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex and real domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
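For context, the magnetic Laplacian mentioned above is commonly defined as follows for a digraph with adjacency matrix $A$ and a phase parameter $q \ge 0$. This is the standard textbook form, given here as background; the abstract does not state how S2-DiGCL's personalized perturbations enter, so nothing below should be read as the paper's exact formulation.

```latex
\begin{aligned}
A_s &= \tfrac{1}{2}\left(A + A^{\top}\right), \qquad
D_s = \operatorname{diag}\!\Big(\textstyle\sum_{v} A_s(u,v)\Big), \\
\Theta^{(q)} &= 2\pi q \left(A - A^{\top}\right), \\
H^{(q)} &= A_s \odot \exp\!\left(i\,\Theta^{(q)}\right), \\
L^{(q)} &= D_s - H^{(q)} .
\end{aligned}
```

Here $\odot$ and $\exp$ act elementwise. Edge magnitudes come from the symmetrized graph, while direction is carried by the complex phase, so $L^{(q)}$ is Hermitian and reduces to the ordinary graph Laplacian of $A_s$ at $q = 0$.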

20.
arXiv (CS.CL) 2026-06-17

In-Context Environments Induce Evaluation-Awareness in Language Models

Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategically underperform, or sandbag, to avoid triggering capability-limiting interventions such as unlearning or shutdown. Prior work demonstrates sandbagging under hand-crafted prompts, but this underestimates the true vulnerability ceiling. We introduce a black-box adversarial optimization framework treating the in-context prompt as an optimizable environment, and develop two approaches to characterize sandbagging: (1) measuring whether models expressing intent to underperform can actually execute it across different task structures, and (2) causally isolating whether underperformance is driven by genuine evaluation-aware reasoning or shallow prompt-following. Evaluating Claude-3.5-Haiku, GPT-4o-mini, and Llama-3.3-70B across four benchmarks (Arithmetic, GSM8K, MMLU, and HumanEval), optimized prompts induce up to 94 percentage point (pp) degradation on arithmetic (GPT-4o-mini: 97.8% $\rightarrow$ 4.0%), far exceeding hand-crafted baselines which produce near-zero behavioral change. Code generation exhibits model-dependent resistance: Claude degrades only 0.6pp, while Llama's accuracy drops to 0%. The intent-execution gap reveals a monotonic resistance ordering: Arithmetic $

21.
arXiv (CS.AI) 2026-06-19

REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

arXiv:2606.19522v1. The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with a risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer's disease (AD). A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning. We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk. We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines. By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.
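
The soft multi-positive idea described above can be illustrated with a small NumPy sketch. This is one possible reading, not the REVEAL++ objective: averaging the two intra-modality similarity matrices, the row-wise softmax used as the aggregation step, the temperatures `tau` and `tau_target`, and all function names are assumptions introduced here.

```python
import numpy as np

def _normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def _log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def soft_targets(img: np.ndarray, txt: np.ndarray, tau_target: float = 0.1) -> np.ndarray:
    """Soft positive weights from intra-modality similarities (assumed: simple average)."""
    img, txt = _normalize(img), _normalize(txt)
    intra = 0.5 * (img @ img.T + txt @ txt.T)        # image-image and text-text similarity
    return np.exp(_log_softmax(intra / tau_target))  # each row sums to 1

def soft_target_contrastive_loss(img, txt, tau=0.1, tau_target=0.1) -> float:
    """Cross-entropy between cross-modal similarities and the soft targets."""
    targets = soft_targets(img, txt, tau_target)
    logits = _normalize(img) @ _normalize(txt).T / tau
    loss_i2t = -(targets * _log_softmax(logits)).sum(axis=1).mean()    # image -> text
    loss_t2i = -(targets * _log_softmax(logits.T)).sum(axis=1).mean()  # text -> image
    return float(0.5 * (loss_i2t + loss_t2i))

# Hypothetical random embeddings standing in for retinal-image and risk-profile encoders.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))
txt_emb = rng.normal(size=(4, 8))
weights = soft_targets(img_emb, txt_emb)
loss = soft_target_contrastive_loss(img_emb, txt_emb)
```

With hard group assignments the target matrix would contain only zeros and uniform block entries; here every pair receives a graded weight. In an end-to-end setup these weights would be computed in an autodiff framework so that gradients also flow through the targets.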

22.
arXiv (CS.AI) 2026-06-24

Skills for the future software profession: beyond agentic AI!

arXiv:2606.21894v2. As coding agents are rapidly changing software engineering, a natural question is: what are the core skills needed by future software engineers? To identify where software engineering is headed and thus what skills will be needed, we summarize the results of two round-tables with researchers and industrial practitioners, held in 2026 in New York and Singapore. One key finding is that verification and validation is increasing in importance as agents handle implementation, as highlighted by anecdotes from the events. From our observations, we identify the skills developers need in the agentic era of development, with implications for training and educating future software engineers in coming years.

23.
arXiv (CS.LG) 2026-06-11

Flow Matching with In-Context Priors for Out-of-Distribution Brain Dynamics

arXiv:2606.11833v1. Flow matching and diffusion models enable conditional generation across domains ranging from images to proteins, with recent extensions to out-of-distribution contexts. Yet generative models of neural time series have largely remained restricted to categorical conditioning, precluding compositional and zero-shot generalization. In this work, we propose a per-timestep conditioned diffusion transformer for generating realistic fMRI brain dynamics during unseen cognitive tasks by injecting both compositional language and optional spatial priors in-context. Such zero-shot generation could enable counterfactual neuroscience by supporting in-silico design and evaluation of novel cognitive experiments before empirical validation. Leveraging this model, we evaluate across hundreds of held-out task conditions and characterize predictive performance in relation to the training manifold. From language alone, the model recovers region-specific recruitment across tasks and held-out spatial activation patterns. Spatial priors, when available, complement the text pathway by anchoring generation in regions of task space where language alone degrades, while retaining the compositional structure needed for counterfactual task specification. To our knowledge this is the first generative model of whole-cortex fMRI dynamics for unseen cognitive tasks, advancing counterfactual neuroscience and data-driven experimental design.

24.
medRxiv (Medicine) 2026-06-22

AI-driven Multimodal Representation Learning for Latent Mediation Structure Discovery of Socioeconomic Disadvantage, Psychosocial Factors, and Cardiometabolic Multimorbidity

Authors:

Social disadvantage is associated with multimorbidity, but the pathways linking social conditions to disease burden remain poorly understood. We developed an AI-driven multimodal mediation framework that integrates socioeconomic, psychosocial, clinical, laboratory, behavioral, and genomic data from the All of Us Research Program. Modality-specific variational autoencoders were used to derive latent representations of each data domain, and mediation analyses were subsequently performed in latent space to evaluate indirect associations between socioeconomic disadvantage, psychosocial factors, and multimorbidity. The final analytic cohort included 20,804 participants with complete multimodal data. Across 800 exposure–mediator–outcome combinations, mediation signals were concentrated within a small number of latent dimensions. The strongest indirect association linked a socioeconomic disadvantage dimension, a psychosocial vulnerability dimension, and a cardiometabolic multimorbidity dimension (NIE = 0.002517). The psychosocial dimension was characterized by poorer mental health, greater loneliness, lower social well-being, and lower health literacy, whereas the outcome dimension was associated with hypertension, diabetes, hyperlipidemia, obesity, chronic kidney disease, and heart disease. Bootstrap analyses supported the stability of the leading pathway. These findings suggest that psychosocial vulnerability may contribute to the association between socioeconomic disadvantage and cardiometabolic multimorbidity. More broadly, the proposed framework illustrates how AI-based representation learning can be used to investigate complex relationships across high-dimensional multimodal health data.
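
To make the natural indirect effect (NIE) reported above concrete, here is a minimal sketch of the classical product-of-coefficients estimate for a single exposure, mediator, and outcome. The abstract does not state which estimator the authors used, so the linear models, the function names, and the synthetic data below are all assumptions for illustration and do not reproduce the reported value.

```python
import numpy as np

def ols_slopes(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares slopes of y on the columns of X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

def indirect_effect(x: np.ndarray, m: np.ndarray, y: np.ndarray) -> float:
    """Product-of-coefficients indirect effect a * b under linear, no-interaction models."""
    a = ols_slopes(x, m)[0]                         # exposure -> mediator
    b = ols_slopes(np.column_stack([m, x]), y)[0]   # mediator -> outcome, adjusting for exposure
    return float(a * b)

# Synthetic latent dimensions (hypothetical): true a = 0.5, b = 0.4, so a * b = 0.2.
rng = np.random.default_rng(0)
n = 20_000
exposure = rng.normal(size=n)                                  # e.g. a disadvantage dimension
mediator = 0.5 * exposure + rng.normal(size=n)                 # e.g. a psychosocial dimension
outcome = 0.4 * mediator + 0.2 * exposure + rng.normal(size=n)
nie = indirect_effect(exposure, mediator, outcome)
```

In a latent-space analysis like the one described, the same calculation would be repeated for each exposure, mediator, and outcome dimension, with bootstrap resampling used to assess stability.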

25.
arXiv (CS.CL) 2026-06-15

Same-Origin Policy for Agentic Browsers

Agentic browsers integrate autonomous AI agents into web browsers, enabling users to accomplish web tasks through natural-language instructions. The same-origin policy (SOP) is a fundamental browser security mechanism that prevents unauthorized automated cross-origin data flows induced by scripts. However, whether SOP remains effective in agentic browsers is an open question that has not been systematically studied. In this work, we bridge this gap. We first observe that an agentic browser can itself serve as an automated channel for cross-origin data flows, potentially leading to SOP violations. To investigate this phenomenon, we construct SOPBench, a benchmark for evaluating SOP violations in agentic browsers. Our evaluation shows that existing agentic browsers frequently violate SOP, both in benign settings and under attacks. To address this problem, we propose SOPGuard, an SOP enforcement mechanism tailored to agentic browsers. We implement SOPGuard in BrowserOS, an open-source agentic browser. Extensive evaluations demonstrate that SOPGuard effectively enforces SOP while preserving utility and incurring only a small runtime overhead. Our code and data are available at https://github.com/wxl-lxw/BrowserOS-SOPGuard.
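
As background for the same-origin policy discussed above, the sketch below shows the basic origin comparison that the policy rests on: two URLs share an origin only if scheme, host, and port all match. It is not SOPGuard's mechanism, which the abstract does not detail; the helper names `origin` and `same_origin` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports implied when a URL omits an explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(url_a: str, url_b: str) -> bool:
    """True only if scheme, host, and port all match."""
    return origin(url_a) == origin(url_b)

# An explicit default port still counts as the same origin.
same_default_port = same_origin("https://example.com/a", "https://example.com:443/b")
# A different scheme, subdomain, or port is a different origin.
cross_scheme = same_origin("http://example.com/", "https://example.com/")
cross_subdomain = same_origin("https://example.com/", "https://mail.example.com/")
cross_port = same_origin("https://example.com/", "https://example.com:8443/")
```

An agent that reads content from one origin and writes it into a page from another performs a cross-origin data flow even if no script does, which is the kind of channel the abstract describes.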