Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

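AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.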

01.
arXiv (CS.AI) 2026-06-11

Precomputing Multi-Agent Path Replanning Using Temporal Flexibility

arXiv:2601.04884v3 Announce Type: replace Abstract: Executing a multi-agent plan can be challenging when an agent is delayed, because this typically creates conflicts with other agents. So, we need to quickly find a new safe plan. Replanning only the delayed agent often does not yield an efficient plan, and sometimes cannot even yield a feasible one. On the other hand, replanning other agents may lead to a cascade of changes and delays, and it is computationally expensive. We show how to efficiently replan a single delayed agent by tracking and using the temporal flexibility of other agents while avoiding cascading delays. This flexibility is the maximum delay that the agent can take without changing the order with agents other than the initially delayed agent, or further delaying other agents. Our algorithm, FlexSIPP, precomputes all possible plans for the delayed agent and returns the changes to the other agents within the given scenario. We demonstrate our method in a real-world case study of replanning trains in the densely-used Dutch railway network and in the MovingAI MAPF benchmark set. Our experiments show that FlexSIPP provides effective solutions relevant to real-world adjustments, and within a reasonable timeframe.

02.
arXiv (quant-ph) 2026-06-16

Quantum Information Geometry of Multicomponent Superconducting Fluctuation Transport

arXiv:2606.15928v1 Announce Type: cross Abstract: Quantum geometry underlies many electronic responses, but its transport signatures have so far been established mainly for pure single-particle Bloch states. Whether collective many-body fluctuations possess a measurable quantum geometry remains largely unexplored. Here we show that superconducting fluctuation transport provides a direct probe of quantum information geometry in collective many-body matter. Starting from a multicomponent time-dependent Ginzburg-Landau theory in the Gaussian fluctuation regime, we identify the equilibrium density matrix of fluctuating Cooper pairs as the static pair propagator, which defines a positive mixed-state manifold in momentum space. The geometry of this manifold is directly measurable through paraconductivity: the longitudinal paraconductivity is governed by the quantum Fisher information of superconducting fluctuation modes, while the fluctuational anomalous Hall effect is governed by the mean Uhlmann curvature, the mixed-state counterpart of Berry curvature. This correspondence further yields geometric bounds between these two transport components, with no direct analogue in normal electronic transport. Applied to chiral superconducting fluctuations in quarter-metal systems motivated by rhombohedral multilayer graphene, a symmetry-allowed Lifshitz invariant generates finite mean Uhlmann curvature and logarithmically enhances the anomalous Hall conductivity above the critical temperature. Our results establish collective superconducting fluctuations as an experimentally accessible transport probe of mixed-state quantum information geometry.

03.
arXiv (CS.CL) 2026-06-15

Creative Integration: A Decidable Criterion of Creativity

"Integrative" solutions are widely praised but rarely defined: we lack an operational way to tell a genuine integration – one that makes the world cheaper to describe – from a tidy re-description. Building on the lineage that treats creativity and intelligence as compression, we give such a criterion for creative integration (CI): the resolution of a real conflict between A and B is CI if and only if, under a fixed description language, the description length strictly shrinks (C = L_pre/L_post > 1), with the reduction located in the conflict itself. We make the judgment decidable through four binary, conjunctive gates, and we fix its extension through a taxonomy of pseudo-integration that names and rejects the look-alikes. We back the criterion with a curated, multi-domain corpus and – crucially – validate it not by human inter-rater agreement but by four falsifiable tests it could fail: an independent computational check, discrimination against hard negatives, out-of-sample prediction, and description-language robustness; all pass with margin. The contribution is not "creativity is compression" but its decidability, discrimination, and corpus: on this account, what makes a move genuinely creative – rather than merely novel – is that it compresses a conflict, with novelty and value as downstream symptoms; whether all creativity is so constituted we state as an explicit conjecture. We claim only the sign of C-1; we judge, not generate. The result is a citable primitive for a broader program.

04.
arXiv (CS.AI) 2026-06-19

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

arXiv:2606.19528v1 Announce Type: cross Abstract: Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory during fine-tuning often exceeds device limits, especially for models with billions of parameters and long-context training data. This paper introduces a suite of complementary techniques to reduce memory footprint without sacrificing model quality: (1) base model quantization with on-the-fly dequantization, (2) memory-efficient checkpointing combining selective activation caching and disk offloading, (3) softmax approximation using semantically relevant token subsets, and (4) logits masking. Experiments on Llama-3.2 3B and Qwen-2.5 3B demonstrate up to $26\times$ and $28\times$ reduction in peak memory, enabling fine-tuning on resource-constrained devices.

05.
arXiv (CS.LG) 2026-06-16

Imbalanced Classification under Capacity Constraints

arXiv:2605.03289v2 Announce Type: replace-cross Abstract: Detecting observations from a minority class under severe class imbalance is a central challenge in applications such as fraud detection, medical screening, and industrial quality control. In these settings, each positive prediction triggers a costly follow-up action, such as an MRI scan or a transaction audit, whose execution is subject to real operational constraints. This paper proposes a formal classification framework under capacity constraints: given a user-defined bound $b$ on the proportion of observations that can be labeled as belonging to the minority class, the goal is to find the classifier that maximizes sensitivity on that class. We characterize the optimal classifier under this constraint and establish its equivalence with the classical Bayes classifier under a reweighting of the prior probabilities. We also introduce a capacity-adjusted performance metric $M$ that accounts for the effective detection rate when the capacity constraint is binding. The framework is implemented on top of standard learning methods (k-NN, SVM, random forests, and neural networks), and statistical consistency is established for each. We further show that these methods reduce to post-hoc thresholding when no hyperparameters are oriented toward the capacity-constrained objective, and introduce a capacity-aware support vector machine that exploits the constraint during training and achieves the strongest empirical performance. Experiments on the Taiwanese credit card default dataset confirm that capacity-constrained classifiers substantially outperform both classical approaches and SMOTE under high imbalance regimes. The framework extends naturally to multiclass settings and online environments.
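
The post-hoc thresholding baseline mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a classifier has already produced one score per observation, and it flags at most a proportion `b` of them, taking the highest scores first. The function names are hypothetical and this is not the authors' capacity-aware SVM, only the simple baseline the abstract refers to.

```python
import math


def capacity_threshold_predict(scores, b):
    """Flag at most floor(b * n) observations, taking the highest scores first."""
    n = len(scores)
    # Small epsilon guards against float error, e.g. 0.29 * 100 = 28.999999999999996.
    k = math.floor(b * n + 1e-9)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:k])
    return [1 if i in flagged else 0 for i in range(n)]


def sensitivity(y_true, y_pred):
    """Share of true minority-class observations that were flagged."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Ten observations, capacity b = 0.2, so at most two can be flagged.
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.15]
y_true = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
y_pred = capacity_threshold_predict(scores, 0.2)
print(y_pred, sensitivity(y_true, y_pred))
```

In this toy example only one of the two true positives fits within the capacity, which is the situation the capacity-adjusted metric in the abstract is meant to account for.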

06.
arXiv (CS.LG) 2026-06-15

Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

arXiv:2602.12379v2 Announce Type: replace Abstract: Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task transformer with a covariate simulator head for auxiliary supervision, regularizing representation learning, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that our model, D3-Net, robustly reduces bias and variance across different horizons, counterfactuals, and time-varying confoundings, compared to existing state-of-the-art ICE-based estimators.

07.
arXiv (CS.AI) 2026-06-16

PAL-Bench: Evidence-Grounded Profile Reconstruction from Longitudinal Personal Albums

arXiv:2606.16175v1 Announce Type: new Abstract: Longitudinal personal albums are weak-schema multimodal databases: noisy perceptual records whose key facts require joins across faces, text, timestamps, locations, and repeated events. Existing visual, video, document, and lifelog benchmarks test sub-problems, but not album-scale profile reconstruction with social identity binding and evidence citation. Benchmarking this task is difficult because the ground truth needed for evaluation–owner profiles, social graphs, face-name maps, and evidence provenance–is private state that real albums cannot safely release. We introduce PAL-Bench, a controlled benchmark for evidence-grounded reconstruction under a public-record contract. Its Evidence Compiler builds latent private worlds, programs target-level evidence paths, renders album pixels, re-measures them through perception pipelines, and exports audited public/private views. Agents receive only perception-derived public records; targets, identifier maps, and evidence paths remain hidden. PAL-Bench contains 50 synthetic users, 36,659 public photo records, and 2,799 targets over owner facts, identities, and relations. A privacy-preserving audit with 10 participants confirms that PAL-Bench evidence structures match real private albums, though equivalent releases remain privacy-prohibitive. Across seven systems and two compute-matched diagnostics, a seven-metric protocol reveals a gap between plausible profile summarization and faithful social reconstruction: systems recover some owner facts but struggle with recurring identities and evidence citation. PAL-TRACE, a reference framework that freezes identity bindings before owner-fact mining, performs best but leaves hard identity resolution far from solved. PAL-Bench provides a testbed for perceptual entity resolution, multimodal data integration, temporal evidence aggregation, and provenance-aware structured prediction.

08.
arXiv (CS.AI) 2026-06-11

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

arXiv:2606.11339v1 Announce Type: cross Abstract: We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under restricted secant inequality (RSI), a constant step-size yields linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity, while a diminishing step-size achieves O(1/k) convergence without shared-minimizer assumptions. Under Polyak-Lojasiewicz (PL) inequality, we obtain linear-to-neighborhood convergence in the same stochastic quantized setting. Our results match the best-known centralized stochastic rates in oracle complexity, and are supported by experiments demonstrating the predicted tradeoffs between quantization level, step-size choice, and graph structure.
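
One common way to realize random, unbiased, finite-bit quantization is stochastic rounding onto a uniform grid. The Python sketch below assumes that scheme purely for illustration; the quantizer used by q-PDGD may differ, and the names `stochastic_quantize` and `empirical_mean` are hypothetical.

```python
import math
import random


def stochastic_quantize(x, step, rng=None):
    """Round x to a neighboring multiple of `step` so that the expected output equals x."""
    rng = rng or random.Random()
    lower = math.floor(x / step) * step
    frac = (x - lower) / step  # position of x between its two grid neighbors, in [0, 1)
    # Round up with probability frac, down otherwise: E[q] = lower + frac * step = x.
    return lower + step if rng.random() < frac else lower


def empirical_mean(x, step, n, seed):
    """Average of n independent quantizations of the same value."""
    rng = random.Random(seed)
    return sum(stochastic_quantize(x, step, rng) for _ in range(n)) / n


print(empirical_mean(0.3, 0.25, 20000, seed=0))  # should land close to 0.3
```

Each output is a grid point, so it can be sent with a fixed number of bits, while the averaging over many steps or nodes keeps the estimate centered on the true value.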

09.
medRxiv (Medicine) 2026-06-16

Re-evaluating the Cross-Sectional Prevalence of Severe Age-Related Hearing Loss Using Extreme Value Statistics

Standard demographic models of age-related hearing loss (presbycusis) predominantly utilize symmetric functions, such as log-normal distributions for age-binned thresholds and 4-parameter logistic curves for prevalence estimates. While these models capture early-to-moderate degradation effectively, they structurally struggle to characterize the heavy tails associated with severe clinical impairment. In this study, we present a statistical critique using a secondary analysis of the historical Medical Research Council (MRC) National Study of Hearing (1980-1986) dataset. By applying Generalized Extreme Value (GEV) distribution theory, we demonstrate that as severity increases, the underlying statistical geometry of hearing loss shifts. The asymmetric, heavy-tailed GEV distribution provides a parsimonious description of severe impairment, requiring fewer parameters than standard symmetric models. However, we explicitly acknowledge that utilizing static population data to infer progression introduces an ecological fallacy. Furthermore, the dataset's historical nature embeds unquantified generational cohort effects. We conclude that while extreme value statistics offer a compelling mathematical framework for modeling the variance of severe presbycusis, true longitudinal datasets are required to isolate physiological degradation from historical cohort variance.
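
For reference, the GEV family has three parameters: location, scale, and shape. The Python sketch below writes out its cumulative distribution function in the standard parameterization, where a positive shape gives a heavy upper tail. The parameter names are illustrative and nothing here is fitted to the MRC data.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the Generalized Extreme Value distribution.

    mu: location, sigma: scale (> 0), xi: shape (xi > 0 gives a heavy upper tail).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel limit of the family.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Upper-tail probability P(X > x) for a hypothetical heavy-tailed fit.
print(1.0 - gev_cdf(3.0, mu=0.0, sigma=1.0, xi=0.3))
```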

10.
arXiv (CS.CV) 2026-06-18

Learning to Distort: Weakly-Supervised Image Quality Transfer for Prostate DWI Correction

Single-shot echo-planar prostate diffusion-weighted imaging (DWI) is frequently complicated by geometric distortions, which impact the ability to derive reliable diagnoses from such images. Developing automated correction methods is challenged by the absence of paired distorted and undistorted clinical scans. In this paper, we first propose a novel weakly-supervised image quality transfer (IQT) framework from undistorted to distorted images that utilizes image quality assessment (IQA) signals to supervise the transfer process. Unlike traditional methods that require expensive, voxel-wise paired data or resort to developing unpaired algorithms, our approach utilizes image-level quality labels (here, distorted vs. undistorted) to establish latent quality prototypes within a pre-trained feature space. Recognizing that simulating realistic distortions is more reliable than direct unpaired correction, we describe a weakly-supervised prototype flow matching algorithm to explicitly regularize generative trajectories towards distorted prototypes, producing realistic susceptibility artifacts that mimic clinical degradations. By synthesizing these realistic pairs, we enable a second IQT model to be trained in the forward direction for distortion correction. Experimental results demonstrate that our generated images successfully mimic the diagnostic interference of real-world artifacts, which leads to more capable distortion correction IQT models. In addition to qualitative comparisons, we also conduct exhaustive quantitative evaluations that compare our approach with existing unpaired approaches (e.g., CycleGAN, UNIT-DDPM, and OT-FM) - as either forward or reverse alternatives - by assessing clinical downstream task performance in PI-RADS and Gleason score classification, using both in-distribution and external data sets.

11.
medRxiv (Medicine) 2026-06-11

Incremental costs of transitioning from four to eight WHO-recommended antenatal care visits in Uganda: A costing analysis from a societal perspective

Background: In 2016, the World Health Organization revised its antenatal care (ANC) recommendation from four to eight visits. For low- and middle-income countries like Uganda, where achieving even four visits remains a challenge, this transition has significant cost implications for both the health system and households. This study estimated the incremental costs of adopting the eight-visit model from a societal perspective. Methods: The study was conducted in six government health facilities in southwestern Uganda. A micro-costing approach estimated health facility costs (personnel, equipment, consumables, and overhead). Costs incurred at the patient's end (transport, ultrasound, medical expenses, and time) were collected from 785 women using a questionnaire, with all costs in 2025 USD. Results: For an average of 4.3 visits, total cost per woman was $100.1: facility costs $43.7 (43.7%), and patient costs $56.4 (56.3%). Transitioning to eight visits would increase total cost by $57.8 (57.8%), of which $36.4 (63.0%) would fall on households, equivalent to 68.8% of average monthly household income. Total costs would rise by 55.4% ($115.5 to $179.5) at Health Center IVs and 64.3% ($102.3 to $168.1) at Health Center IIIs, with facility costs up 43.4% and 62.9% and patient costs up 61.2% and 65.7%, respectively. Conclusion: Transitioning to eight ANC visits would impose a large financial burden on households, with the incremental patient cost equivalent to more than two-thirds of average monthly household income. Equitable implementation requires improving availability of medicines and diagnostics, subsidizing transport, exploring telemedicine or community-based models, and improving efficiency at lower-tier health centers.
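
The headline figures in this abstract can be cross-checked with simple arithmetic. The Python sketch below uses only the rounded values quoted above, so results may differ from the authors' unrounded calculations in the last digit; the variable names are illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


# Baseline cost per woman at an average of 4.3 visits (2025 USD).
facility_cost = 43.7
patient_cost = 56.4
total_cost = round(facility_cost + patient_cost, 1)

# Incremental cost of moving to eight visits.
incremental_total = 57.8
incremental_household = 36.4
incremental_facility = round(incremental_total - incremental_household, 1)

# Monthly household income implied by "36.4 USD is 68.8% of income".
implied_monthly_income = round(incremental_household / 0.688, 1)

print("total:", total_cost)
print("facility share %:", pct(facility_cost, total_cost))
print("patient share %:", pct(patient_cost, total_cost))
print("relative increase %:", pct(incremental_total, total_cost))
print("household share of increase %:", pct(incremental_household, incremental_total))
print("facility increment:", incremental_facility)
print("implied monthly income:", implied_monthly_income)
```

Recomputing the relative increase from the rounded inputs gives about 57.7%, close to the reported 57.8%; a gap of that size is consistent with rounding of the published figures.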

12.
arXiv (CS.CL) 2026-06-15

Implicit Reasoning for Large Language Model-based Generative Recommendation

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disrupting LLMs' natural-language reasoning interface because these tokens are unseen by the LLM during pretraining. Existing approaches address this with expensive multi-stage pipelines that ground SIDs and elicit explicit rationales, but offer limited insight into when and why each stage is necessary. In this work, we systematically decompose explicit reasoning training pipelines for LLM-based GR, revealing three key limitations: weakened world-knowledge verbalization, misalignment between SID and natural-language token embedding spaces, and sensitivity to rationale quality, all of which hurt explicit reasoning performance. To circumvent these issues, we propose PauseRec, a lightweight implicit reasoning paradigm tailored for GR. PauseRec is exceptionally practical, avoiding costly reasoning trace acquisition and reasoning alignment training, leading to a multitude of benefits: (1) it outperforms standard explicit CoT methods by up to 6.22%, (2) it reduces training cost by up to 65% GPU hours, and (3) it speeds up inference by up to 71.3%. These results position PauseRec as a lightweight alternative to explicit rationale generation, enabling more effective and efficient LLM-based GR.

13.
arXiv (CS.CL) 2026-06-16

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals – e.g., a block of text conditioned on past and future tokens. Recent work aims to solve this problem through novel architectures, but they often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT) which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals – including past, future, and mixed contexts – within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.

14.
arXiv (CS.LG) 2026-06-15

Diffusion Policy Optimization without Drifting Apart

arXiv:2606.13795v1 Announce Type: new Abstract: RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose DiPOD, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.

15.
arXiv (CS.CV) 2026-06-16

HSQ-VLM: A Novel Spatially-Constrained Quadrant Segmentation VLM Model for Explainability in Diabetic Retinopathy

Diabetic Retinopathy (DR) is an aggressive retinal disease and a leading cause of global blindness, yet its clinical management is currently hindered by the black-box nature of diagnostic AI. While deep learning models achieve high classification accuracy, there is a critical lack of explainability methods capable of detailing the exact anatomical landmarks and lesion distributions that lead to a clinical decision for DR. Therefore, we propose HSQ-VLM, a novel quadrant segmentation pipeline on fundus images that utilizes a Landmark-Anchored Cartesian Cross-Attention mechanism to unify visual feature extraction with structured clinical reasoning. Unlike traditional methods that rely on arbitrary image partitioning, our pipeline implements 4-quadrant Topological Latent Partitioning (TLP) to dynamically align retinal features with a fovea-centered coordinate system. This allows the Vision-Language Model to generate natural language reports that quantify pathology with anatomical precision. On a dataset of 3,500 high-resolution fundus images, this innovative methodology achieved a lesion detection sensitivity of 99.6% for hemorrhages and 96.4% for microaneurysms, while demonstrating a significant reduction in boundary-ambiguity errors compared to standard segmentation baselines.

16.
arXiv (CS.AI) 2026-06-17

Offline Preference-Based Trajectory Evaluation

arXiv:2606.17541v1 Announce Type: cross Abstract: Offline evaluation of agentic systems often collapses trajectories to terminal success, discarding information about partial progress and inducing widespread ties, creating substantial statistical inefficiency by reducing effective sample size and weakening the ability to distinguish systems. We propose preference-based trajectory evaluation, which compares trajectories directly through temporal preferences over progress and time-to-return profiles. We find that, across diverse agentic and interactive benchmarks, standard success-based metrics produce tied comparisons on roughly 75% of instances, whereas trajectory-aware preferences reduce ties to roughly 35%, improving discriminative power, ranking stability, and data efficiency. Our results suggest that benchmark saturation, often attributed to poor data collection or problem difficulty, may also be explained by the choice of evaluation measure.
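
To see why collapsing trajectories to terminal success produces ties, consider the toy Python sketch below. The trajectory-aware ordering used here, a lexicographic comparison on success, progress, and step count, is a hypothetical stand-in; the abstract does not spell out the paper's preference relation, and the field names are assumptions.

```python
def success_key(traj):
    """Terminal-success view: only the final outcome matters."""
    return traj["success"]


def trajectory_key(traj):
    """Hypothetical trajectory-aware view: outcome, then progress, then fewer steps."""
    return (traj["success"], traj["progress"], -traj["steps"])


def tie_rate(pairs, key):
    """Fraction of instances on which two systems cannot be told apart under `key`."""
    ties = sum(1 for a, b in pairs if key(a) == key(b))
    return ties / len(pairs)


# One (system A, system B) trajectory pair per benchmark instance.
pairs = [
    ({"success": True, "progress": 1.0, "steps": 10},
     {"success": True, "progress": 1.0, "steps": 14}),
    ({"success": False, "progress": 0.6, "steps": 20},
     {"success": False, "progress": 0.2, "steps": 20}),
    ({"success": True, "progress": 1.0, "steps": 8},
     {"success": False, "progress": 0.5, "steps": 20}),
    ({"success": False, "progress": 0.0, "steps": 20},
     {"success": False, "progress": 0.0, "steps": 20}),
]

print(tie_rate(pairs, success_key), tie_rate(pairs, trajectory_key))
```

Every tied instance contributes nothing to a comparison between systems, so lowering the tie rate raises the effective sample size without collecting more data.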

17.
arXiv (CS.AI) 2026-06-12

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

arXiv:2606.12808v1 Announce Type: cross Abstract: Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.

18.
arXiv (CS.CV) 2026-06-17

Bridging Spatial And Frequency Views For Disaster Assessment: Benefits And Limitations

Rapid assessment of building damage from satellite imagery is essential for effective disaster response and recovery. While most deep learning methods rely on spatial-domain features, frequency-domain representations can capture complementary structural cues such as debris patterns and collapse-induced textures. This study presents a controlled comparison of spatial-domain, frequency-domain, and dual-domain deep learning approaches for multi-class building damage classification using post-disaster imagery from the xView2 (xBD) dataset. To ensure fairness, all models are built on an EfficientNet-B0 backbone and trained under identical settings, differing only in their input representations and fusion strategies. Performance is evaluated using accuracy, macro F1-score, per-class metrics, and confusion matrices. Results show that dual-domain models provide measurable improvements over single-domain approaches. The dual spatial configuration achieves the highest test accuracy (0.4688) and lowest loss, while the spatial-only model attains the best macro F1-score (0.4254), indicating more balanced class performance. In contrast, frequency-only models perform worst and exhibit overfitting, suggesting limited generalization. Despite these gains, all models struggle to detect subtle damage levels, particularly the Minor class, due to class imbalance and fine-grained visual ambiguity. While dual-domain approaches improve detection of severe damage, challenges remain. These findings highlight the benefits and limitations of hybrid representations and motivate future work on data balancing, advanced fusion, and regularization.
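
The split between the best-accuracy model and the best macro-F1 model reflects how the two metrics treat imbalanced classes. The Python sketch below uses made-up labels, not xBD data, to show two prediction sets with equal accuracy but very different macro F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = [0, 1, 2]                      # e.g. three damage levels
y_true = [0, 0, 0, 0, 1, 2]             # class 0 dominates
majority_only = [0, 0, 0, 0, 0, 0]      # always predicts the majority class
balanced = [0, 0, 1, 2, 1, 2]           # trades majority hits for rare-class hits

for name, pred in [("majority_only", majority_only), ("balanced", balanced)]:
    print(name, accuracy(y_true, pred), macro_f1(y_true, pred, labels))
```

Both prediction sets get four of six labels right, but the one that finds the rare classes scores far higher on macro F1.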

19.
arXiv (CS.AI) 2026-06-17

CausalT5k: Diagnosing Refusal and Failure Modes in Trustworthy Causal Reasoning Across Causal Rungs

arXiv:2602.08939v2 Announce Type: replace Abstract: Large language models increasingly produce fluent causal explanations, yet they often fail in ways aggregate accuracy cannot diagnose: confusing association with intervention, abandoning correct judgments under pressure, over-refusing valid claims, or answering when evidence is underdetermined. We introduce CTK, a diagnostic benchmark of 5,147 cases and growing, across 10 domains and all three levels of Pearl's Ladder of Causation. Unlike benchmarks that only score correctness, CTK reveals why a model failed by annotating causal rung, trap type, pressure sensitivity, refusal quality, and Utility-Safety tradeoffs. Its Sheep/Wolf taxonomy separates valid causal designs from inferential traps; paired neutral/pressure variants measure sycophantic drift through Bad Flip Rate; and Wise Refusal fields test whether a model identifies the missing information needed before endorsing a claim. CTK exposes failure modes hidden by aggregate accuracy: the Skepticism Trap, Rung Collapse under scaling, pressure-induced drift, Detection-Correction gaps, and counterfactual error modes. Rather than prescribing a correction method, it provides the diagnostic substrate for studying causal-reasoning failure profiles.

20.
Nature (Science) 2026-06-17

Navigating a crowded developing brain leaves neurons with broken DNA

As neurons migrate to their final destinations in the forming brain, their DNA gets damaged. The brain has evolved a fix, but there can be lasting consequences if repair fails.

21.
arXiv (CS.CL) 2026-06-12

More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8-4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons, improving each tested model family and context condition under early fusion. However, scaling from DeBERTa-v3-base to large and from 12B to larger LLMs does not guarantee gains, and simple early fusion outperforms the tested late-fusion and cross-attention RAG variants for encoders. Per-value analyses show that context and retrieval help most for socially situated or conceptually confusable values. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.

22.
arXiv (CS.LG) 2026-06-17

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

arXiv:2602.03045v2 Announce Type: replace Abstract: Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural-language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to follow user instructions reactively and hallucinate dimensions when the text is ambiguous. To address this, we propose ProCAD, a proactive agentic framework for text-to-CadQuery generation that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9% and lowering the invalidity ratio from 4.8% to 0.9%. Our code and datasets are publicly available at https://github.com/BoYuanVisionary/Pro-CAD.

23.
arXiv (CS.AI) 2026-06-16

SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation

arXiv:2606.16244v1 Announce Type: cross Abstract: Large language models routinely generate code with exploitable security flaws. Prior literature attributes this limitation to a lack of security expertise, steering current defense mechanisms toward heavy fine-tuning or external knowledge retrieval, which introduces significant computational overhead and data bias through redundant code examples. Contrary to this view, we argue that pretraining corpora are already rich in security material. The bottleneck is activation: without an explicit and brief cue, statistical pressure toward common training-distribution patterns suppresses the model's safety-relevant representations. We present SPARK, an inference-time security harness that activates this latent knowledge without any retraining. The harness has two parts. Component I retrieves a few of the relevant Common Weakness Enumeration (CWE) entries for each coding task and appends a short structured cue to the prompt; this alone is enough to surface the model's existing security representations. Component II adds a precomputed token bias to the logits at every decoding step. We obtain the bias by projecting a safe-direction vector, the unit difference between the mean safe and mean unsafe last-layer hidden states, through the language model head. The bias is computed once offline; applying it costs a single vector addition per generated token. We evaluate SPARK on 9 open-source models across C++, Java, and Python, and compare with 7 baselines spanning fine-tuning and retrieval-augmented methods. SPARK matches or improves on the best baseline in every setting while preserving HumanEval utility. We further test Component I in a black-box setting on 7 of today's strongest models, including Claude, DeepSeek, and GPT, demonstrating the bottleneck of insecure code generation and the improvements enabled by our method.
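The second component, as described in the abstract, can be sketched with NumPy. This is an illustrative reading, not the authors' implementation: the function names, the `scale` parameter, the array shapes, and the random stand-in data are all assumptions.

```python
import numpy as np


def safe_direction(safe_hidden: np.ndarray, unsafe_hidden: np.ndarray) -> np.ndarray:
    """Unit difference between mean safe and mean unsafe hidden states.

    Inputs are (num_examples, hidden_dim) arrays of last-layer states.
    """
    diff = safe_hidden.mean(axis=0) - unsafe_hidden.mean(axis=0)
    return diff / np.linalg.norm(diff)


def token_bias(direction: np.ndarray, lm_head: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Project the direction through the LM head, shape (vocab, hidden_dim).

    `scale` is a hypothetical strength knob, not taken from the abstract.
    """
    return scale * (lm_head @ direction)


def biased_logits(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Per-step cost: one vector addition over the vocabulary."""
    return logits + bias


# Random stand-ins for real model activations and weights.
rng = np.random.default_rng(0)
hidden_dim, vocab = 8, 16
safe_hidden = rng.normal(size=(32, hidden_dim)) + 0.5
unsafe_hidden = rng.normal(size=(32, hidden_dim)) - 0.5
lm_head = rng.normal(size=(vocab, hidden_dim))

direction = safe_direction(safe_hidden, unsafe_hidden)
bias = token_bias(direction, lm_head)   # computed once, offline
logits = rng.normal(size=vocab)         # one decoding step
steered = biased_logits(logits, bias)
```

Since the bias is fixed per model, the only work at decoding time is the addition in `biased_logits`, which matches the abstract's cost claim.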

24.
arXiv (quant-ph) 2026-06-16

Decoherence-free algebras in quantum dynamics

arXiv:2403.12926v2 Announce Type: replace Abstract: In this Article we analyze the algebraic properties of the asymptotic dynamics of finite-dimensional open quantum systems in the Heisenberg picture. In particular, a natural product (the Choi-Effros product) can be defined in the asymptotic regime. Motivated by this structure, we introduce a new space called the Choi-Effros decoherence-free algebra. Interestingly, this space is both a C*-algebra with respect to the composition product and a B*-algebra with respect to the Choi-Effros product. Moreover, this space admits a direct-sum decomposition revealing a clear relationship with the attractor subspace of the dynamics. In particular, the equality between the attractor subspace and the Choi-Effros decoherence-free algebra is a necessary and sufficient condition for a faithful dynamics. Finally, we show that none of these findings rely on complete positivity; they require only the much weaker Schwarz property.

25.
arXiv (quant-ph) 2026-06-19

Locally Gentle State Certification for High Dimensional Quantum Systems

arXiv:2602.04550v3 Announce Type: replace Abstract: Standard approaches to quantum statistical inference rely on measurements that induce a collapse of the wave function, effectively consuming the quantum state to extract information. In this work, we investigate the fundamental limits of locally-gentle quantum state certification, where the learning algorithm is constrained to perturb the state by at most $\alpha$ in trace norm, thereby allowing for the reuse of samples. We analyze the hypothesis testing problem of distinguishing whether an unknown state $\rho$ is equal to a reference $\rho_0$ or $\epsilon$-far from it. We derive the minimax sample complexity for this problem, quantifying the information-theoretic price of non-destructive measurements. Specifically, by constructing explicit measurement operators, we show that the constraint of $\alpha$-gentleness imposes a sample size penalty of $\frac{d}{\alpha^2}$, yielding a total sample complexity of $n = \Theta(\frac{d^3}{\epsilon^2 \alpha^2})$. Our results clarify the trade-off between information extraction and state disturbance, and highlight deep connections between physical measurement constraints and privacy mechanisms in quantum learning. Crucially, we find that the sample size penalty incurred by enforcing $\alpha$-gentleness scales linearly with the Hilbert-space dimension $d$ rather than the number of parameters $d^2-1$ typical for high-dimensional private estimation.
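The two quantities quoted in the abstract are consistent as a product. The factorization below is plain arithmetic on the stated expressions. Reading the remaining $\frac{d^2}{\epsilon^2}$ factor as the cost without the gentleness constraint is an interpretation made here, not a claim from the abstract.

```latex
\[
  n \;=\; \Theta\!\left(\frac{d^{3}}{\epsilon^{2}\alpha^{2}}\right)
    \;=\; \Theta\!\left(\frac{d^{2}}{\epsilon^{2}} \cdot \frac{d}{\alpha^{2}}\right),
  \qquad
  \frac{n(\alpha/2)}{n(\alpha)} \;=\; \frac{\alpha^{2}}{(\alpha/2)^{2}} \;=\; 4 .
\]
```

The second identity treats the hidden constant as fixed: under that assumption, halving the allowed disturbance $\alpha$ quadruples the required number of samples.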