Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-15

HierSVA: A Data Synthesis Pipeline, Dataset, and Benchmark for LLM-Driven Hierarchical Hardware Formal Verification

arXiv:2606.13706v1 Announce Type: cross Abstract: We present HierSVA, an integrated suite that combines a pipeline, dataset, and benchmark for LLM-driven hierarchical hardware formal verification. HierSVA-SP pairs an RTL preprocessing toolchain with an LLM-in-the-loop formal verification flow to produce reference SystemVerilog Assertions (SVA) on hierarchical RTL. Applying it to BaseJump STL yields HierSVA-DS, a dataset of 342 modules, with hierarchy metadata and depths 0–9, accompanied by a deep subset of 28 module-bug pairs with natural-language specifications and bug variants. HierSVA-B decomposes assertion quality into six metric axes: syntax correctness, assertion proof success rate, vacuity, specification faithfulness, mutation coverage, and formal core coverage. Applying HierSVA-B to twelve recent LLMs reveals three findings. First, the module-level compile rate is 67.1\%; among generated assertions in evaluable runs, 82.1\% prove non-vacuously, but the corresponding assertion sets detect only 70.2\% of eligible injected faults and cover 36.2\% of the formal core. Second, on 211 evaluable model–module entries in the deep subset, assertion sets flag buggy RTL with 0.87 recall, but 40\% of predicted-buggy outcomes are false positives on correct RTL, limiting precision to 0.60. Third, agentic mode improves S1-style provability and strength metrics, but gains plateau and oscillate. Codes and artifacts are available at \href{https://github.com/HierSVAAnon/HierSVACodeAndArtifacts}{https://github.com/HierSVAAnon/HierSVACodeAndArtifacts}. Dataset is available at \href{https://huggingface.co/datasets/AnonymousHierSVA/HierSVA}{https://huggingface.co/datasets/AnonymousHierSVA/HierSVA}.

02.
arXiv (CS.CL) 2026-06-18

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

Counterfactual story retelling exposes LLM shortcomings in constrained narrative solution spaces where they can no longer rely on recalling memorised training data. Ground-truth-based post-training, such as SFT, fails to teach LLMs how to generate logical and rational narrative events. In this paper, we introduce Retell, Reward, Repeat (RRR), an RL-based pipeline synthesising Structuralist Narratology with scalar narrativity to teach storytelling structure. We extend the TimeTravel dataset with human-annotated stages of narrative equilibrium to evaluate reward models. By using d-RLAIF, RRR derives training signals from the narrativity of textual features without the need for reference outputs. Evaluations demonstrate that RRR-trained LLMs outperform few-shot and SFT baselines in logic, rationality, and completeness, with output quality additionally validated by blind human preference. Relying on a small, query-only dataset, RRR provides a linguistically grounded, cost-effective post-training mechanism for storytelling–a domain currently lacking effective post-training methods. RRR highlights the continued relevance of integrating established linguistic theories into contemporary NLP.

03.
arXiv (CS.CL) 2026-06-24

QuechuaTok: Morphological Boundary Accuracy as a Necessary Metric for Tokenizer Evaluation in Agglutinative Low-Resource Languages

Tokenization is a foundational step in NLP pipelines, yet standard evaluation metrics such as fertility rate fail to capture morphological correctness for agglutinative languages. We present QuechuaTok, a systematic benchmark comparing four tokenization strategies - BPE, Unigram LM, WordPiece, and a morphology-aware PRPE tokenizer - for Southern Quechua (quz), a low-resource agglutinative language spoken by 8-10 million people in South America. Using a 200k-sentence corpus and the SQUOIA finite-state morphological analyzer (Rios, 2016) as silver standard, we evaluate three metrics: fertility rate, OOV rate, and morphological boundary accuracy (MorphAcc). Our results show that BPE achieves the lowest fertility rate (1.636 at 16k vocab) by memorizing surface word forms, while achieving only 6.67% MorphAcc. PRPE achieves 83.33% MorphAcc - the highest of all systems - demonstrating that fertility rate alone is insufficient to evaluate tokenizers for agglutinative languages. All code and models are publicly available at kaggle.com/code/macmaky/quechuatok

04.
arXiv (CS.CL) 2026-06-16

Data Augmentations for Data-Constrained Language Model Pretraining

As AI labs approach a data ceiling where compute capacity outpaces the rate of new high-quality text generation, language model pretraining is shifting toward a data-constrained, compute-abundant regime that demands productive multi-epoch training on fixed corpora. Standard autoregressive (AR) pretraining overfits severely in this setting, reaching its optimum early and then continuously deteriorating. We investigate data augmentation as a regularizer to mitigate this overfitting and enable productive training for hundreds of epochs on the same data. We introduce three orthogonal categories of augmentation for AR pretraining: token-level noise (masking, random replacement), sequence permutations (right-to-left prediction, Fill-in-the-Middle), and target offset prediction ($x_{t+i}$ for $i > 1$). Through systematic ablations, we find that individual augmentations delay overfitting and lower validation loss relative to the baseline, with random token replacement achieving the best minimum loss among individual methods. Combining augmentation categories further lowers the minimum validation loss. Our experiments demonstrate that data augmentations mitigate AR pretraining's data inefficiency and offer a promising solution to the data-constrained regime. All code and data are available at https://github.com/michaelchen-lab/data-augmentations-for-pretraining

05.
arXiv (CS.AI) 2026-06-18

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

arXiv:2605.17986v3 Announce Type: replace-cross Abstract: AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email, downloaded files, webpages, repositories, or group-chat messages. Existing evaluations are often small, purely simulated, or focused on a narrow set of channels. We introduce LivePI (Live Prompt Injection), a structured benchmark for IPI risk in a production-like but test-controlled environment. LivePI covers seven input surfaces, twelve attack/rendering families, and five malicious goals, including protected-information exfiltration, unauthorized security-control changes, unsafe code retrieval or execution, inbox-summary exfiltration, and cryptocurrency transfer. We run LivePI on a real virtual machine with live but test-controlled email, chat, web, local-file, repository, and wallet interfaces. Across GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5, total attack success rates range from 10.7% to 29.6%. Group-chat injection is uniformly successful across the evaluated backbones in our deployment, and repository-link attacks produce high-severity failures despite a small denominator. We also evaluate a two-layer defense consisting of prompt-level filtering and pre-execution tool-call authorization. In the GPT-5.3-Codex setting, the defense intercepts all tested malicious-goal completions in LivePI before execution while preserving benign utility on PinchBench-derived workloads.

06.
arXiv (CS.CV) 2026-06-11

Vision Transformers for Face Recognition Need More Registers

Recent advances in Vision Transformers (ViTs) for face recognition (FR) have moved beyond the standard CLS-token paradigm. In this paradigm, a special classification token (CLS) is prepended to the patch embeddings and used as a representation of the input for downstream tasks. An alternative approach, Concatenated Patch Embeddings (CPE), instead leverages all patch tokens by concatenating them into a single vector, which is then projected into a compact face representation. CPE has been shown to improve recognition performance in comparison to CLS-based ones, but our qualitative analysis of attention maps showed the presence of artifacts that limit their interpretability. To address this issue, we incorporate register tokens, learnable tokens concatenated to the initial patch embeddings, and processed jointly through the ViT encoder blocks. This mechanism has been shown to produce more structured and interpretable attention maps compared to baseline ViT. We empirically demonstrate that these artifacts consistently appear across various ViT backbones, including small and large models, and that introducing register tokens effectively mitigates them. Adding four or eight registers significantly enhances interpretability, with eight registers providing the highest verification accuracies and smoothest attention structures. Our resulting model, ViT-8R, corresponds to a CPE-based ViT-B architecture augmented with eight register tokens achieves state-of-the-art performance among ViT-based FR models on large-scale IJB-B and IJB-C benchmarks. Also, ViT-8R produces substantially clearer attention maps compared with the baseline model, which offer deeper insight into the model's attention behavior (https://github.com/TaharChettaoui/ViT-FR-Registers)

07.
arXiv (CS.CL) 2026-06-16

Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning

作者:

Instruction-tuned language models can answer the same causal-reasoning question differently after its English variable names are replaced by type-preserving placeholders, although the structural causal model and the gold answer are unchanged. We ask whether this lexical gap reflects information loss in the placeholder view or a misaligned read-out from a representation that still carries answer-relevant content. Vernier uses a paired-view weight update as an instrument and then inspects the mechanism left after the gap closes. In the working regimes, the evidence favours representational misalignment. A variable-name probe becomes more accurate on the placeholder view, and activation patching on Qwen-7B, Qwen-14B, and Llama-3.1-8B shows that the decision-token representation can transfer answer identity between views. The update that realigns the views is counterfactual augmentation over original and placeholder prompts, while the answer-subspace KL mainly sharpens intermediate answer-belief agreement. Success is bounded by model family, scale, and task. CRASS transfer is reliable across Qwen scales and Llama, e-CARE remains weak, and preliminary non-causal rename tasks show a similar qualitative pattern.

08.
arXiv (CS.AI) 2026-06-12

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

arXiv:2606.12425v1 Announce Type: cross Abstract: Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.

09.
arXiv (CS.CL) 2026-06-19

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes. We study this as a deployment allocation problem rather than a new-verifier problem. We introduce \sevra, Selective Verification for Reasoning Allocation, a serving-layer controller that decides whether to preserve a frozen solver's initial answer or invoke active verification. Using a frozen Qwen3-4B solver, we log intervention outcomes and train recoverability-aware gates from serving-visible attempt state. On \mathfive, selective verification reaches 76.3\% accuracy, compared with 75.5\% for always verifying, while reducing post-generation tokens by 26.8\% and harmful flips from 2.2\% to 1.0\%. However, an 8,192-token initial solve reaches 76.0\% accuracy with 28\% fewer total model tokens, showing that selective recovery is useful but not the best tested cost frontier. In frozen transfer to \gsm, the selective policy verifies only 3.0\% of examples, improves accuracy from 93.4\% to 94.5\%, and reduces verification tokens by 91.2\% relative to always verifying; again, a longer initial solve matches its accuracy with fewer realized tokens. On CommonsenseQA, always-on verification hurts, while Self-Consistency@5 improves accuracy at about five times the realized token cost. The resulting deployment rule is: tune the initial budget first, then use selective recovery when explicit checks, bounded retries, auditability, or regression-risk control matter.

10.
arXiv (CS.CV) 2026-06-16

HAFMat: Hybrid Priors Guided Adaptive Fusion for Single-Image Human Material Estimation

Physically based rendering (PBR) material estimation is a fundamental appearance decomposition task with broad applications in virtual content creation, relighting, and digital human rendering. However, estimating PBR materials from a single human image remains highly ill-posed, since illumination, geometry, and reflectance are heavily entangled in the observed appearance. To mitigate this ambiguity, we propose HAFMat, a hybrid-prior-guided framework for single-image human material estimation. Our method introduces guidance maps that encode complementary cues, including appearance, body geometry, structure, and prior material predictions from pre-trained models. A key observation is that these guidance cues are heterogeneous: some cues mainly provide texture-level constraints, while others convey higher-level semantic information. To exploit this property, we design a Multi-layer Adaptive Feature Fusion Mechanism, which adaptively fuses guidance features with decoder features at different stages. This design enables texture-dominant and semantic-dominant cues to guide material decoding at appropriate levels, leading to more accurate and physically plausible material estimation. Extensive experiments on both synthetic and real data demonstrate that our method achieves state-of-the-art performance in material estimation and downstream relighting.

11.
arXiv (CS.AI) 2026-06-24

Sensing Intelligence as a Trainable Metamaterial Property

arXiv:2605.23967v2 Announce Type: replace-cross Abstract: In biological systems, sensing is not performed by the brain alone: the body deforms, vibrates, and filters external stimuli before they are transduced into neural signals. In engineered systems, this processing burden is placed largely on electronics and computation, while the mechanical body is usually designed only for strength and stability. Here, we present sensing intelligence as a trainable property of the body. We show that the geometry of a metamaterial can be optimized to reshape external stimuli into internal signals that are easier for a neural network to interpret. Rather than hand-designing this physical preprocessing, we let the neural network train its own body for sensing by backpropagating the sensing loss to the body's design parameters through differentiable simulation. Across numerical and experimental sensing scenarios, the optimized body improves sensing accuracy by up to fivefold or reduces the number of required electronic sensors by nearly an order of magnitude.

12.
arXiv (CS.CV) 2026-06-18

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-$\lambda$ Mix Interaction ($L\lambda MI$) block. Comprising Local-$\lambda$ and Interactive-$\lambda$ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2\% of the parameters (0.22B vs. 11.9B) while delivering a $>15\times$ acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.

13.
arXiv (CS.AI) 2026-06-12

TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

arXiv:2606.13148v1 Announce Type: new Abstract: Climate and environmental decision-making increasingly requires reasoning across heterogeneous inputs, including gridded physical data, satellite imagery, geospatial context, and simulator outputs. Weather and climate foundation models can forecast well, but do not reason interactively in language, while large language models (LLMs) reason in language but cannot operate directly on high-dimensional Earth-system data. As a result, real scientific workflows in Earth-science remain underserved. We introduce TerraBench, a benchmark for grounded Earth-science reasoning, built on TerraAgent, a ReAct-style executable framework that interleaves reasoning, tool calls, and observations to couple LLM planning with scientific tools for environmental retrieval, geospatial processing, simulation, and artifact-backed computation. TerraBench unifies analysis of Earth observation imagery, gridded data, GIS reasoning and simulation in a single executable interface, whereas prior benchmarks isolate these capabilities into narrow individual tasks. It is also the first in this space to pair process-level tool-use metrics with tolerance-aware numeric scoring. The benchmark comprises 403 extensive agentic tasks across three tracks (Fundamentals, Simulator-Grounded, and Document-Grounded Verification) and eight application domains with 24,500 verified execution steps. These results indicate that reliable Earth-science agents must go beyond tool access to coordinate heterogeneous workflows, parameterize tools precisely, and preserve artifact provenance.

14.
arXiv (CS.AI) 2026-06-16

Surprise-Guided MergeSort: Budget-Efficient Human-in-the-Loop Ranking via Adaptive Comparison Scheduling

arXiv:2606.15623v1 Announce Type: cross Abstract: Pairwise comparison is the gold standard for subjective ranking tasks; however, exhaustive annotation requires a massive number of human comparisons ($O(n^2)$). While sorting-based methods have reduced this burden to $O(n\log n)$, they still require expensive human judgment for every single comparison. To further improve annotation efficiency, we propose leveraging a Vision-Language Model (VLM) not as an annotator replacement, but as a question prioritizer to identify which comparisons genuinely require human judgment. The proposed Surprise-Guided MergeSort (SGS) framework achieves this through three integrated components: (1) a bottom-up MergeSort scheduler that structures comparisons and exploits transitivity, (2) a composite Surprise Scorer – combining position-bias-cancelled VLM confidence, Elo gap, and vote entropy – to quantify comparison ambiguity, and (3) an adaptive budget allocator that routes high-surprise pairs to humans while automating low-surprise pairs via transitivity inference. Validation was conducted on six diverse benchmarks spanning text similarity (STS-B, BIOSSES, SICKR-STS) and image quality assessment (KonIQ-10k, TID2013, LIVE Challenge). SGS effectively identified and skipped up to 535 non-informative comparisons per session. Consequently, it achieved Kendall's $\tau{\times}100$ improvements of $+6$ to $+12$ over Active Elo under the same total budget. These results demonstrate that combining VLM-guided surprise metrics with algorithmic sorting provides a generally consistent accuracy-efficiency trade-off across diverse domains.

15.
arXiv (CS.LG) 2026-06-18

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

arXiv:2606.18390v1 Announce Type: new Abstract: Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

16.
arXiv (CS.CL) 2026-06-19

Benchmarking Agentic Review Systems

A new class of agentic review systems are emerging as a remedy to the pressure placed on peer review systems by AI-assisted research, but it is unclear how they should be evaluated. We evaluate two open-source systems (OpenAIReview and coarse), one proprietary system (Reviewer3), and a zero-shot baseline, across six LLMs spanning frontier and efficient models. First, we study whether AI reviews on ICLR/NeurIPS papers track with papers' quality as approximated by external signals such as citations and acceptance decisions. Every system performs above chance in pairwise accuracy, and the best is OpenAIReview + GPT-5.5 at 83.0%. Second, to test whether systems can catch errors with known ground truth, we construct a perturbation benchmark that injects four categories of errors into papers across eight arXiv subject classes and measure detection recall. The strongest configuration (OpenAIReview + GPT-5.5) catches 71.6% of injected errors, leaving substantial room for improvement. The union of detections across six models reaches 83.3% recall, suggesting different models detect different errors and better harness design can potentially increase performance. Beyond these benchmarks, we study a public deployment of OpenAIReview with real users. Votes on its comments skew positive at 1.44 to 1, and the most common complaints are about false positives and minor nitpicks. Together, by evaluating full review systems backed by state-of-the-art models on real research papers, we show that while AI reviews still have room for improvement, they can already track human quality judgments well, catch important errors, and earn positive feedback from real users.

17.
arXiv (CS.AI) 2026-06-11

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

arXiv:2605.06100v2 Announce Type: replace-cross Abstract: Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods learn measurement weighting through the solver, but they still use position-only objectives. As a result, the position estimate may improve while the reported covariance remains too small, too large, or incorrectly oriented. We propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. A Weighting Generation Network (WGN) predicts per-satellite reliability weights, and a differentiable Gauss-Newton solver maps these weights to a position estimate and a Hessian-derived posterior covariance. We use proper scoring rules to supervise the East-North predictive distribution end to end. We study negative log-likelihood (NLL), the energy score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in covariance credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes; on the deep-urban scene, both the mean horizontal error and the 95th-percentile error improve. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05 relative to DFGO (MAE). Case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.

18.
arXiv (CS.LG) 2026-06-11

Efficient Time Series Clustering from Multiscale Reservoir Dynamics with Granular-Ball Anchoring Graph Optimization

arXiv:2606.12077v1 Announce Type: new Abstract: Time-series clustering remains challenging due to the inherent trade-off between clustering effectiveness and computational efficiency. Similarity-based methods often suffer from quadratic complexity caused by pairwise distance computations, while deep learning-based approaches typically rely on costly iterative training and a large number of trainable parameters. In this paper, we propose MSRGC-Net, an efficient time-series clustering framework that integrates multiscale reservoir computing, granular-ball-based anchoring graph construction, and consensus learning. MSRGC-Net adopts a training-free reservoir computing paradigm to extract multiscale temporal representations from raw time series without backpropagation, significantly reducing computational overhead. To capture the intrinsic structure of the resulting representations, granular-ball computing is employed to adaptively model data distributions via density-consistent regions, yielding compact and robust anchor graph representations. Furthermore, a consensus-based anchoring graph optimization strategy is introduced to effectively align multiscale reservoir representations and integrate complementary information across temporal scales. Extensive experiments on widely used univariate and multivariate benchmark datasets demonstrate that MSRGC-Net consistently outperforms state-of-the-art methods in clustering performance while maintaining superior computational efficiency.

19.
arXiv (CS.AI) 2026-06-24

Ensemble Feature Selection and Harris Hawks Optimization for Explainable Mental Health Risk Prediction in Female Sex Workers

arXiv:2606.24047v1 Announce Type: new Abstract: One of the significant mental health issues affecting female sex workers (FSWs) is mental disorders, especially depression. Exposure to violence, stigma, and economic hardship further increases their psychological risk. Current machine learning (ML) models are typically ineffective at capturing the high-dimensional and complex risk patterns that exist in this marginalized group. This paper suggests a hybrid predictive model that merges an ensemble feature selection strategy using ANOVA and mutual information and Harris Hawks optimization-tuned logistic regression and represents a new application of swarm intelligence to predict mental health in vulnerable groups. The explainable AI (XAI) methods can be used to understand the factors of trauma associated with model predictions. When applied to a group of 3,005 FSWs, it can be seen that the proposed model is more effective than traditional classifiers, with an accuracy of 95.78%, an F1 score of 95.77%, and an AUC of 0.96, and identifying post-traumatic stress, client-related violence, and occupational factors as major contributors to depression. This work bridges the gaps between conventional and ML approaches to develop an XAI tool that enables vulnerable groups to receive early assistance, evidence-based targeted psychosocial care, and health planning.

20.
medRxiv (Medicine) 2026-06-24

Association of antiseizure medication with lower amyloid and tau burden

Network hyperexcitability is increasingly implicated in prodromal Alzheimer's disease and may be suppressed by antiseizure medications (ASMs). ASMs are widely prescribed to older adults, yet whether their use relates to Alzheimer's-disease biomarkers at the population level is unknown. In 52,537 participants in the National Alzheimer's Coordinating Center (NACC) study, we compared cerebrospinal-fluid biomarkers, amyloid and tau positron emission tomography (PET) between ASM users and non-users using inverse-probability-of-treatment weighting with gradient-boosted propensity scores. ASM users showed directionally lower amyloid across multiple brain regions, amplifying markedly in APOE epsilon 4 carriers (Centiloid beta = -25.7, p = 0.007). All three temporal tau-PET composites were significantly lower in users (META-temporal beta = -0.05, p = 0.01). The amyloid finding replicated independently in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (Centiloid beta = -8.6, p = 0.01), whereas four comparator drug classes showed no amyloid signal. These convergent observational findings provide a quantitative framework for evaluating ASMs as candidate disease-modifying agents in Alzheimer's disease.

21.
arXiv (quant-ph) 2026-06-19

Interaction geometry and ground-state properties of sparse quantum lattice models

arXiv:2606.20387v1 Announce Type: new Abstract: We investigate how interaction geometry shapes the low-energy phases of sparse tunable long-range quantum models. We focus on a class of graphs whose degree grows logarithmically with system size, and show how symmetry and frustration in graph connectivity can drive, suppress, and reshape ground-state phase transitions. The central examples are power-of-$p$ graphs, where even and odd values of $p$ exhibit qualitatively distinct behaviour: even-$p$ graphs inherit the rich phase structure of the power-of-two model, while odd-$p$ graphs are governed by geometric frustration. Fibonacci graphs provide a contrasting case, lacking the discrete self-similarity of the power-of-$p$ family but exhibiting a direct geometric mapping between the short- and long-range limits. Across our models, we find that phase structure and criticality are governed by the same effective-geometry principle, unifying our framework for experimentally motivated long-range quantum systems.

22.
arXiv (CS.CL) 2026-06-11

Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness

End-to-end task-success is the dominant way to evaluate LLM agents, but one aggregate number tells you that an agent regressed, not where. We present layer-isolated evaluation: a deployed ordering agent is decomposed into a fixed taxonomy of layers (ontology, intent, routing, decomposition, escalation, safety, memory, and cross-cutting envelope/defense), each exercised by its own assertion slice in a deterministic, no-LLM "pure" mode. The pure suite (238 cases across 23 slices; 225 run in 2.39 s, ~10 ms/case) runs in CI on every change against a locked per-slice baseline. We validate by controlled regression injection, degrading one layer at a time across seven non-safety layers. The effect we did not design in is masking: the aggregate pass-rate barely moves (-1.7 to -5.9 pp for six local regressions), while the matching slice craters (-25 to -91 pp). A layer's slice reacting to its own fault is partly by construction; the measured results are (i) the aggregate masking and (ii) that damage stays off the other slices: the injected layer's slice is the single worst-hit in 5 of 7 cases and top-3 in 7 of 7 (mean rank 1.29 of 19). Localization replicates on a second, structurally different tenant (Starbucks SG): all seven matching slices crater, so it is not a single-catalog artifact. We position it as a concrete, deterministic instantiation of the component-level evaluation EDDOps prescribes but leaves unimplemented, with CheckList as ancestor and as the deterministic mirror image of whole-workflow stochastic mutation testing. Our contributions: (a) a fully decomposed, sub-second, no-LLM per-layer harness for a production agent, (b) a coverage-honesty test-adequacy criterion that refuses to score an unexercised layer, and (c) the regression-injection demonstration that per-slice baseline-locked gates localize regressions an aggregate metric masks.

24.
arXiv (CS.LG) 2026-06-11

Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

arXiv:2606.11474v1 Announce Type: new Abstract: In this paper, we study Mahalanobis-guided latent out-of-distribution (OOD) detection for test-time RL controller switching in nonlinear time-varying systems. RL controllers can quickly control high-dimensional systems within the training distribution, but their performance can degrade when time-varying dynamics produce unseen observations. We consider a combined ES–DRL controller, where RL provides fast in-distribution actions and bounded extremum seeking (ES) provides robust model-independent control under OOD operation. The key challenge is deciding when to switch. We train a variational autoencoder (VAE) on in-distribution beam-profile observations and use Mahalanobis distance in the VAE latent space to detect OOD beam profiles at test time. This OOD decision sets a binary switch that selects either the RL controller or the ES controller. We evaluate the approach in safety-critical particle accelerator control. In this setting, spatial magnet motion creates OOD beam profiles that were not seen during RL training. Visualization of the VAE latent space shows that the proposed method identifies this OOD scenario and provides an interpretable signal for switching between RL and ES in the combined controller.

25.
arXiv (CS.AI) 2026-06-19

Neural Additive and Basis Models with Feature Selection and Interactions

arXiv:2606.19850v1 Announce Type: cross Abstract: Deep neural networks (DNNs) exhibit attractive performance in various fields but often suffer from low interpretability. The neural additive model (NAM) and its variant called the neural basis model (NBM) use neural networks (NNs) as nonlinear shape functions in generalized additive models (GAMs). Both models are highly interpretable and exhibit good performance and flexibility for NN training. NAM and NBM can provide and visualize the contribution of each feature to the prediction owing to GAM-based architectures. However, when using two-input NNs to consider feature interactions or when applying them to high-dimensional datasets, training NAM and NBM becomes intractable due to the increase in the computational resources required. This paper proposes incorporating the feature selection mechanism into NAM and NBM to resolve computational bottlenecks. We introduce the feature selection layer in both models and update the selection weights during training. Our method is simple and can reduce computational costs and model sizes compared to vanilla NAM and NBM. In addition, it enables us to use two-input NNs even in high-dimensional datasets and capture feature interactions. We demonstrate that the proposed models are computationally efficient compared to vanilla NAM and NBM, and they exhibit better or comparable performance with state-of-the-art GAMs.