Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

ExpRL: Exploratory RL for LLM Mid-Training

arXiv:2606.17024v1 Announce Type: new Abstract: Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive coverage is enough for much harder problems, which require combining these skills into broader solution strategies. We study a more automated approach: RL-based mid-training using large corpora of human-written question-answer data. Rather than treating reference solutions as targets to imitate, our method, ExpRL, uses them as reward scaffolds: references are hidden from the policy and used only to construct problem-specific grading rubrics for judging on-policy reasoning traces. The policy samples from the original problem prompt, while an LLM judge compares the sampled reasoning trace against the reference solution and assigns outcome-level or process-level dense rewards. This lets ExpRL reinforce partial progress, useful intermediate reductions, and productive reasoning behaviors that sparse final-answer rewards often fail to upweight. On challenging math reasoning tasks, ExpRL yields stronger RL priming than SFT, sparse-reward GRPO, and self-distillation, and provides a better initialization for subsequent sparse-reward RL. Additional mixed-domain experiments further suggest that ExpRL can extend beyond the original math-only setting.

02.
arXiv (CS.AI) 2026-06-12

A Tutorial on World Models and Physical AI

作者:

arXiv:2606.12783v1 Announce Type: new Abstract: World modeling is emerging as a central principle for building intelligent systems capable of prediction, reasoning, and decision making. A central distinction can be drawn between explicit world models, which learn structured dynamics for rollout-based reasoning and planning, and implicit world models, which encode predictive structure within scalable learned representations. These complementary paradigms provide a foundation for physical AI in domains such as robotics and autonomous driving, enabling intelligence beyond reactive control under real-world constraints. Recent foundation models further suggest a pathway toward unified systems integrating perception, prediction, and action. Despite rapid progress, major challenges remain in hierarchical reasoning, long-horizon planning, and autonomous goal formation, which are critical for advancing toward artificial general intelligence. This tutorial presents a coherent framework in which diverse world modeling approaches are unified through shared predictive structure and differentiated by how such structure is represented and exploited.

03.
arXiv (CS.LG) 2026-06-12

Interpretable Factor Decomposition for Decision Intelligence in Large-Scale Financial Markets: Evidence from China's A-Share Market

arXiv:2606.12843v1 Announce Type: new Abstract: We present an interpretable machine learning pipeline to decompose Cross-Sectional Equity Return Predictability into auditable factor contribution. We apply an XGBoost model with TreeSHAP attribution and conduct stress testing on 3632 Chinese A-share stocks from 2009 until 2019. Using 60-month, rolling windows over 55 months of out-of-sample data, XGBoost obtains a mean AUC of 0.547 and +2.38%/month (Newey-West t = 5.94; Annualized Sharpe 2.23) long-short spread for the top vs bottom quintiles. This alpha is persistent after adjusting for the Carhart four-factor model (+2.31%/month; t = 7.48). SHAP Decomposition indicates that behavioral signals (turnover and momentum) account for 58.2% of predictive attribution compared to 10.7% for valuation ratios, on average, across 55 industry groups. Ablation analysis serves to cross-validate this ranking and provides evidence that SHAP and ablation diverge in a manner that highlights feature substitutability structure that is largely invisible to either method used in isolation.

04.
arXiv (CS.LG) 2026-06-16

Leveraging Physiological Signals to Predict Exam Outcomes with Machine Learning

arXiv:2606.14960v1 Announce Type: new Abstract: This study investigates the application of machine learning models to predict exam outcomes using physiological data collected during examination sessions. Physiological stress indicators, including electrodermal activity, heart rate, and skin temperature, were analyzed to uncover their association with academic performance. A variety of machine learning approaches were employed, ranging from standard models like logistic regression, random forest, and support vector machines to more advanced architectures, including transformers, long short-term memory (LSTM), and gated recurrent unit (GRU) models. This diversity aimed to capture the complex interactions within the data effectively. A key focus was assessing the adaptability of transformers in processing numerical data and evaluating their performance in this novel context. Standard performance metrics, such as accuracy, precision, recall, and F1-score, were used to compare model efficacy. The experimental results demonstrate that while deep learning models generally excel at capturing complex relationships in physiological data, simpler models like random forests can sometimes achieve superior performance while offering computational efficiency and interpretability. Furthermore, transformers demonstrated notable versatility, showcasing performances comparable to those of the LSTM and GRU models. This research underscores the importance of experimenting with a broad class of models that align with the objectives of the problem at hand, balancing precision, efficiency, and interpretability. By elucidating the relationships between physiological signals and academic performance, this study contributes to understanding stressors affecting students' mental health. It further promotes leveraging physiological data to enhance student well-being and academic outcomes.

05.
arXiv (CS.CL) 2026-06-19

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Effective Automated Essay Scoring (AES) are expected to support both reliable assessment and actionable instructional feedback. However, existing approaches often treat scoring and feedback as separate components: neural scoring models provide limited interpretability, while Large Language Model (LLM)-based feedback is typically insensitive to learners proficiency levels. To address this fragmentation, this work proposes PsyScore, a psychometrically-aware framework that integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer that incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling the precise estimation of student ability while maintaining psychometric interpretability, a ZPD-Scaffolded Feedback Generator, which conditions multi-agent feedback strategies on the diagnosed ability parameter to adapt instructional focus across different proficiency levels, and a Multi-Perspective Feedback Evaluation Strategy that assesses feedback quality via pairwise preference judgements and student revision simulations. Experiments on the ASAP++ dataset demonstrate that PsyScore achieves competitive scoring performance while providing more pedagogically aligned feedback.

06.
medRxiv (Medicine) 2026-06-15

Genome-wide colocalization of body fat distribution GWAS and subcutaneous adipose eQTLs identifies SNX10, DGKQ, and CBX3 as candidate causal genes for cardiometabolic disease

作者:

Background: Genome-wide association studies (GWAS) have identified hundreds of loci associated with body fat distribution, yet the causal genes and regulatory mechanisms through which these variants exert their effects remain largely unknown. Expression quantitative trait locus (eQTL) colocalization provides a powerful framework for identifying genes whose expression is genetically coregulated with complex traits. Methods: We performed a genome-wide colocalization analysis integrating waist-hip ratio adjusted for body mass index (WHRadjBMI) GWAS summary statistics from 694,649 individuals (Pulit et al., 2019) with subcutaneous adipose tissue eQTLs from the Genotype-Tissue Expression (GTEx) Project v8 (N = 581 donors). GWAS coordinates were lifted from GRCh37 to GRCh38 to enable direct alignment with GTEx data. We incorporated CAVIAR fine-mapping results to overcome the limitation of FDR-significant eQTL filtering. Colocalization was assessed using the approximate Bayes factor framework (coloc.abf) across 335 independent genome-wide significant loci. Results: Of 2,897 locus-gene pairs tested, 489 (16.9%) showed strong colocalization (PP.H4 > 0.8) and 618 (21.3%) showed moderate evidence (PP.H4 > 0.5). The strongest colocalization was observed for SNX10 (sorting nexin 10; PP.H4 = 1.000), a recently characterized regulator of adipocyte differentiation and female-specific diet-induced obesity. Other top hits included DGKQ (diacylglycerol kinase theta; PP.H4 = 0.9999999), an emerging pharmacological target for insulin resistance, and CBX3 (chromobox 3; PP.H4 = 0.9999974), an epigenetic regulator linked to cardiovascular disease. Established adiposity genes including GRB14 (PP.H4 = 0.681) and KLF14 (PP.H4 = 0.590) were recovered, validating our approach. Several loci exhibited extensive allelic heterogeneity, with 50 genes colocalizing at a single chromosome 3 locus. Conclusions: Our analysis provides a comprehensive map of adipose tissue gene regulatory mechanisms underlying genetic risk for body fat distribution. The identification of SNX10, DGKQ, and CBX3 as high-confidence candidate causal genes advances the translation of GWAS associations into mechanistic understanding and therapeutic targets for obesity-related cardiometabolic disease.

07.
arXiv (CS.AI) 2026-06-18

Towards Multi-Agent-Simulation-Based Community Note Evaluation

arXiv:2606.18268v1 Announce Type: cross Abstract: Community-based fact-checking that relies on cross-consensus is expanding rapidly on social media platforms. However, the delay and low-ratio of cross-consensus community fact-checks rated by human contributors remains a significant challenge. To address this, we first created ComRate, a large-scale dataset comprising 2.5 million community notes and over 209 million ratings sourced from $\mathbb{X}$. We then propose MultiCom, a persona-guided multi-agent rating framework for community note evaluation. MultiCom simulates diverse rater population by clustering contributors in a matrix-factorized rater space and prompting persona agents to generate structured assessments based on the official community notes rating schema. These agents output structured and explainable judgments, such as confidence, agreement signals and reasons. An out-of-fold calibrated aggregation algorithm combines features such as raw votes and diagnostic reason signals for reliable prediction. Extensive evaluations demonstrate that MultiCom outperforms alternative methods, achieving an average accuracy of 84.7% (balanced accuracy 68.3%, macro-F1 60.1%) on the evaluation set.

08.
arXiv (CS.LG) 2026-06-18

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

arXiv:2606.18464v1 Announce Type: cross Abstract: Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.

10.
arXiv (CS.CL) 2026-06-16

SkillWiki: A Living Knowledge Infrastructure for Agent Skills

While knowledge is managed through Wikipedia and software through GitHub, agent skills still lack an infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure that supports the organization, grounding, and continuous evolution of agent skills by transforming heterogeneous knowledge into reusable skill assets linked to their originating evidence. Our demonstration presents the complete skill lifecycle, from knowledge ingestion and skill production to provenance-aware exploration, governance, and execution-driven evolution. SkillWiki highlights a future in which knowledge, skills, and execution experience co-evolve within a shared infrastructure. The live demonstration and source code are publicly available at https://github.com/Huangdingcheng/SkillWiki.

11.
arXiv (CS.CV) 2026-06-16

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose \textsc{Light Forcing}, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit prior knowledge in earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. Such two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention in quality (e.g., 84.5 on VBench) and efficiency (e.g., $1.2{\sim}1.3\times$ end-to-end speedup). Combined with other efficient solutions, \textsc{Light Forcing} further achieves a $2.0{\sim}3.0\times$ end-to-end speedup across diverse GPUs (e.g., 27.4\,FPS on RTX 5090 and 33.9\,FPS on H100). Code is released via this \href{https://github.com/chengtao-lv/LightForcing}{link}.

12.
Nature (Science) 2026-06-11

Daily briefing: Deep-sea whale graveyard is a treasure trove of fossils

作者:

Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better. Researchers have uncovered more than 400 fossilized whale bones in an ocean-floor chasm. Plus, the working lives of scientists, in pictures, and how AI could slow the pace of research publication for the better.

13.
arXiv (CS.LG) 2026-06-18

Signature filtering: a lightweight enhancement for statistical watermark detection in large language models

arXiv:2606.18430v1 Announce Type: new Abstract: Statistical watermarks help organizations attribute large language model (LLM) outputs, yet existing detectors often struggle when watermark signals are weak, texts are repetitive, or watermarks are edited. We propose signature filtering, a detection-time module that enhances watermark detection without modifying watermark embedding and text generation. It learns a small set of ``signature'' tokens whose presence makes watermark tests unreliable, and removes these tokens before detection. The signatures are obtained by solving a mixed-integer linear program on a small training set, with constraints that maximize the true positive rate. We additionally derive finite-sample and asymptotic bounds under several attacker models (color-blind, color-adaptive, and distributionally correlated). On four well-known watermark families (Kgw, Sweet, Unigram, Exp), four benchmark corpora (C4, MBPP, HumanEval, Code-Search-Net), and six LLMs (Opt-1.3b, Opt-6.7b, Llama2-13b, Llama3.1-8b, Qwen2.5-14b, Phi-3-medium-14b), 2- and 3-gram signatures raise detection rates in weak-signal and low-entropy settings from 8~31% without filtering to 78~99% with filtering, while keeping false positives controllable and often negligible. In stress tests where we scramble sentences and perturb 25~50% of tokens by dilution, deletions, and substitutions, 2-gram filters for Kgw-style watermarks preserve most of the clean-text detection gains, often matching or outperforming the advanced WinMax watermark detector. Signature filtering thus provides a simple, scalable, and model-agnostic add-on to strengthen watermark-based provenance checks for LLM text in information processing workflows.

14.
arXiv (CS.CL) 2026-06-12

Detecting Functional Memorization in Code Language Models

Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap.

15.
arXiv (CS.CV) 2026-06-19

TriFlow: Generating Artist-Like 3D Mesh Topology via Nearest-Vertex Vector Fields

We present TriFlow, a new generative approach for producing compact 3D meshes with artist-like triangle topology directly from input geometry conditions such as signed distance fields. Our key insight is to represent mesh topology as a nearest-vertex vector field (NVF) defined over the surface, where each point encodes its association to the nearest triangle vertex in the local barycentric frame. We train a latent flow-matching model to synthesize this field, enabling topology generation conditioned on the input geometry. To extract a coherent mesh, we cluster surface regions using the generated NVF and guide a constrained quadric error metric (QEM) mesh simplification with topology-aware optimization. This yields output meshes that closely match the input geometry while exhibiting structured, artist-like connectivity. Experiments demonstrate that TriFlow achieves stronger generalization and significantly improved topology quality compared to state-of-the-art learning-based approaches, alongside 90% lower Chamfer Distance and an 8x speedup.

16.
arXiv (CS.CV) 2026-06-15

HARBOR: Heading Analysis and Reconstruction from Behavioral Observation and Radar

Maritime situational awareness often relies on Automatic Identification System (AIS) transmissions to track vessel movements. However, in operational or conflict scenarios, these data may be unavailable due to signal loss, deliberate deactivation, or intentional spoofing. In such conditions, synthetic aperture radar (SAR) imagery becomes a critical sensing alternative for wide-area maritime monitoring, despite providing only static scene snapshots. This work introduces HARBOR (Heading Analysis and Reconstruction from Behavioral Observation and Radar), a complete pipeline for transforming a single SAR image into predictive motion information without requiring any auxiliary data source at inference time. The method begins with SAR image preprocessing to enhance and segment vessel candidates, followed by automatic detection, size-based classification, and heading estimation using skeleton geometry and local intensity patterns. AIS data are used exclusively during an offline calibration phase to derive vessel-type-dependent motion parameters, which are then applied to generate probabilistic heatmaps of candidate future vessel positions. A case study using real COSMO-SkyMed SAR imagery demonstrates the pipeline on a maritime scene in southern Brazil, showing its ability to extract motion tendencies and generate probabilistic projections of vessel positions in data-denied environments.

17.
arXiv (CS.AI) 2026-06-16

Decision-Weighted Flow Matching for Contextual Stochastic Optimization

arXiv:2606.16790v1 Announce Type: cross Abstract: Conditional generative models are increasingly used as scenario generators for stochastic optimization, but standard training objectives emphasize uniform distributional fit rather than the downstream decisions induced by generated scenarios. This creates an objective mismatch: errors in statistically common regions may have little effect on decision regret, whereas errors in decision-sensitive regions can substantially change the optimal action. We propose Decision-Weighted Flow Matching (DW-FM), a regret-aligned training framework that preserves the simplicity of standard flow matching while reweighting its velocity-regression objective using decision-sensitive endpoint information. Theoretically, we connect downstream regret to pathwise velocity mismatch through a loss-induced decision discrepancy and an adjoint transport argument, yielding an ideal regret-aligned surrogate and practical endpoint-weighted objectives with regret guarantees. Empirically, we demonstrate the effectiveness of DW-FM on three CVaR-based contextual stochastic optimization benchmarks spanning synthetic portfolio, semi-real financial, and traffic-CVaR tasks, where DW-FM improves downstream regret over standard baselines.

18.
arXiv (CS.CL) 2026-06-11

Beyond representational alignment with brain-guided language models for robust reasoning

The correspondence between large language models (LLMs) and the neural mechanisms underlying human higher-order cognition remains insufficiently characterized. Given that language and reasoning in the human brain appear dissociable, an open question is whether LLMs align with neural signals from reasoning-related regions and whether such signals can improve them. Here, focusing on deductive reasoning, we show that LLM internal representations are not only partially aligned with task-fMRI activity but can also be directly enhanced by these signals. Using a neural-predictivity metric, we find that LLMs explain a substantial fraction of the explainable variance in reasoning-related regions at the aggregate level, whereas predictivity within specific reasoning types is lower, indicating both alignment and divergence. Building on this, we propose a brain-guided framework: we steer model representations along directions induced by the joint structure of model and brain representations, applying intervention at inference and fine-tuning during training. We demonstrate that task-evoked brain signals can directly enhance LLM reasoning, yielding gains orthogonal to language-only supervision across 10 LLMs (1.5B-72B), with transfer across reasoning types and up to 13\% absolute accuracy gain. Our results advance LLM-brain correspondences from correlation to guidance, establishing a brain-signal-driven pathway toward more robust and cognitively aligned AI.

19.
arXiv (CS.CL) 2026-06-12

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset

Questions in political interviews and hearings serve strategic purposes beyond information gathering including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: Interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th–117th Congress. Our analysis reveals systematic differences in questioning strategies across parties, by showing the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics, but also provide a general framework for analyzing question-answering across interview-like settings.

20.
arXiv (CS.AI) 2026-06-11

Sparsified Kolmogorov-Arnold Networks for Interpretable Quantum State Tomography

arXiv:2606.11814v1 Announce Type: cross Abstract: Machine-learning approaches to quantum state tomography can achieve high reconstruction fidelity, but the physical structure used by the trained model often remains implicit. Here we ask whether a sparsified Kolmogorov-Arnold Network (KAN) can be used not only as a regressor, but also as an inspectable reconstruction rule whose internal organization can be checked against known Pauli structure. We study a controlled three-qubit GHZ-family benchmark in which all 63 non-identity Pauli expectation values are used to reconstruct three GHZ-subspace variables: the population imbalance $z$, the real off-diagonal component $c$, and the imaginary off-diagonal component $s$. Under finite-shot sampling and depolarizing noise, external ablation identifies the extended 12-channel GHZ-relevant Pauli set from the 63 measurements, with exact top-12 recovery across the tested shot counts and depolarizing-noise strengths. These support patterns remain stable across multi-seed random-initialization and noise-level analyses, and collapse under random-label controls. The dominant pruned input-hidden-output pathways organize Z-type population observables and X/Y off-diagonal observables in a pattern consistent with the analytic GHZ Pauli grouping, and sparse formula recovery recovers the canonical signed Pauli relations. The contribution of the KAN is therefore pathway-level structural interpretability within a neural reconstruction model, rather than superior sparse regression. Together with negative controls, these probes provide a consistency chain for auditing learned reconstruction rules against known physical structure.

21.
arXiv (CS.CV) 2026-06-15

Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation

Personalizing Image-to-Video (I2V) diffusion models with specific visual effects is increasingly demanded for high-end video generation. Current practice requires training a separate Low-Rank Adaptation (LoRA) module for each effect, incurring substantial data curation and iterative optimization costs that hinder interactive control. We present Prompt2Effect, a weight-driven hypernetwork that amortizes per-effect training by directly synthesizing effect-specific LoRA weights in a single forward pass. Unlike prior hypernetworks that regress adapter weights purely from semantics, Prompt2Effect is explicitly conditioned on the frozen base model weights, grounding weight prediction in the structural geometry of each layer. Furthermore, instead of predicting raw LoRA matrices, we introduce an SVD-canonicalized parameterization that resolves factorization ambiguity and stabilizes large-scale weight synthesis. Together, these design principles enable accurate and scalable LoRA prediction for high-dimensional I2V diffusion models. Extensive experiments demonstrate that Prompt2Effect achieves on-par or superior video quality and effect alignment compared to conventional LoRA fine-tuning, while reducing the computational cost from 56 GPU training hours to 3.3 seconds of hypernetwork inference. When used as initialization for subsequent fine-tuning, our predicted weights further improve final performance and accelerate optimization by approximately 10x.

22.
arXiv (CS.LG) 2026-06-16

Priority-Aware Shapley Value

arXiv:2602.09326v2 Announce Type: replace Abstract: Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".

23.
arXiv (CS.AI) 2026-06-18

Code-Augur: Agentic Vulnerability Detection via Specification Inference

arXiv:2606.18619v1 Announce Type: cross Abstract: The advent of agentic vulnerability detection is already becoming a watershed moment for software security. Audits conducted entirely by autonomous LLM agents are uncovering critical vulnerabilities in fundamental software underpinning digital society. Many of these vulnerabilities remained masked for years, surfacing only now with AI agents. Yet the reasoning behind these discoveries remains alarmingly opaque and unvalidated. What assumptions did the agent make about a function's inputs when it deemed that function to be secure? Failures in reasoning and incorrect assumptions can lead to missed vulnerabilities and reduce trust in agentic analysis. We propose a security-specification-first paradigm that (1) exposes the agent's tacit assumptions explicitly as security specifications and (2) continuously refines those specifications via runtime falsification. We realize our approach in Code-Augur, a novel harness for agentic vulnerability detection. Given a codebase, Code-Augur analyzes each component of the system for vulnerable code. When it deems a component to be secure, it commits the local invariants behind that judgment as in-source assertions. In parallel, Code-Augur leverages a guided fuzzer to attempt to falsify those assumptions. When the fuzzer triggers an assertion, this either reveals a genuine vulnerability or a flawed specification to refine. In both cases, this process grounds the agent's understanding, aligning its view of code intent with how the code actually behaves. On real-world subjects, Code-Augur effectively leverages security specifications to detect more vulnerabilities than other state-of-the-art agents. Additionally, Code-Augur found 22 new vulnerabilities in key open-source projects. Compared to curated specialized models like Claude Mythos, Code-Augur offers effective agentic vulnerability detection built on widely available LLMs like Sonnet and DeepSeek.

24.
PLOS Computational Biology 2026-06-15

Environmental “knees” and “wiggles” as strong stabilizers of species’ range limits set by interspecific competition

by Farshad Shirani, Benjamin G. Freeman Whether interspecific competition is a major contributing factor to setting species’ range limits has been debated for a long time. Theoretical studies have proposed that the interactions between interspecific competition and disruptive gene flow along an environmental gradient can halt range expansion of ecologically similar species where they meet. However, the stability of such range limits has not been well addressed. We use a deterministic mathematical model of adaptive range evolution over a continuous habitat to show that the range limits set by interspecific competition are unlikely to be evolutionarily stable if the environmental optima for fitness-related traits vary (almost) linearly in space. That is, in a linear environment without a dispersal barrier or a third (or more) species, the range borders formed between two competing species constantly move towards the weaker species. We demonstrate that environmental nonlinearities such as “knees” and “wiggles”—wherein an isolated sharp change or a step-like change occurs in the steepness of a trait optimum—can strongly stabilize competitively formed range limits. The stabilization mechanism relies on the contrast that such nonlinearities create in the level of disruptive gene flow to the peripheral population of each species, and succeeds when an additional process, such as Allee effects, prevents the establishment of an infinitesimal population in the presence of an abundant competitor. We show that the stability of the range limits at these nonlinearities is robust against moderate environmental disturbances. Whether strong disturbances such as rapid high-amplitude climate changes can destabilize such range limits depends on how the competitive dominance of the species changes across the nonlinearity. Therefore, our findings underscore the importance of assessing species’ competitive ability when predicting responses to climate change, and identify geographic regions where established range limits are likely to persist as well as regions where shifting limits may eventually stabilize.

25.
arXiv (CS.LG) 2026-06-15

Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches

arXiv:2606.13780v1 Announce Type: cross Abstract: Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim10\sigma$ excesses and $\approx5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.