Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

RQUL-UIE: Revitalizing Quality-Unstable Labels for Underwater Image Enhancement via In-Dataset Self-Supervision

Underwater Image Enhancement (UIE) is essential for mitigating degradations caused by water medium. Although learning-based methods have advanced significantly, most rely on paired datasets with unstable label quality, which bottlenecks model performance. This paper proposes a diffusion-based, in-dataset self-supervised learning strategy designed to exploit the quality distribution of training labels. Specifically, we evaluate label quality via semantic perception embeddings from a pre-trained diffusion model in a training-free manner. These quality scores are subsequently quantized into noise-level indices, guiding a multi-step denoising process for level-wise supervision. This mechanism prevents low-quality labels from degrading the model while maximizing their utility during training. Furthermore, a Fourier-based refinement network is incorporated to explicitly reconstruct high-frequency components. Extensive evaluations demonstrate that our method consistently outperforms SOTA approaches in restoration quality. The code and pre-trained model will be available once accepted in link.

02.
arXiv (quant-ph) 2026-06-17

A Quantum Approach to Stochastic Optimization in Insurance Underwriting

arXiv:2605.01169v2 Announce Type: replace Abstract: The presence of stochastic elements in combinatorial optimization problems makes them particularly challenging, as such problems quickly become intractable for classical computers even at relatively small sizes. In this work, we propose a novel quantum-classical hybrid scheme for solving a class of stochastic optimization problems known as chance-constrained knapsack problems, in which item weights follow probability distributions and constraints may be violated within a specified risk tolerance. Our method employs knapsack-specific QAOA-based circuits to generate samples which, when combined with a new self-consistent classical recovery scheme introduced in this work, produce high-quality solutions. Experiments carried out on IBM Heron processors, using circuits with depths up to 177 and comprising 3443 gates acting on as many as 150 qubits, yield solutions that indicate performance comparable to classical optimization schemes. The proposed quantum-classical scheme paves the way to tackling such problems, with the potential to outperform approaches that rely solely on classical computation.

03.
arXiv (CS.CV) 2026-06-19

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

The systemic, metabolic, lifestyle factors have established associations with Alzheimer's Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. To determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using 62,876 CFPs from 44,501 unique participants from the UK Biobank, DL models were trained to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC= 0.5654-0.9480 for categorical and R2=-0.0291-0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based score consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. It also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident AD and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring the potential AD vulnerability.

04.
arXiv (CS.AI) 2026-06-16

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase – inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build – the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

05.
arXiv (CS.AI) 2026-06-15

A Two-Stage Statistical Framework for Evaluating Associative Interference in Large Language Models

arXiv:2606.14117v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly evaluated for bias using adaptations of human psychological paradigms, yet methodological limitations-particularly the conflation of refusal behavior with task performance-have hindered clear interpretation. Here, we adapt the Implicit Association Test (IAT) to a controlled, forced-choice framework and introduce a two-stage modeling approach that separates response compliance from task-consistent classification. Across three contemporary LLMs (Claude Sonnet-4, Gemini 2.5 Pro, and GPT-5), we evaluate associative interference, defined as reduced task-consistency in incongruent relative to congruent conditions. While compliance with the structured response format was uniformly high, interference effects varied substantially across models and domains. Claude Sonnet-4 exhibited strong interference in the Gender–Career domain (DeltaP = 0.086, 95% CrI [0.026, 0.173]) and smaller but credible effects in Gender–Science. Gemini 2.5 Pro showed attenuated interference, and GPT-5 exhibited minimal or no detectable interference across domains. These findings demonstrate that IAT-style associative asymmetries are not a universal property of LLMs, but instead depend on model-specific characteristics. By isolating interference from compliance and modeling item-level variability, this study provides a principled framework for evaluating structured response patterns in LLMs. The results highlight the importance of model-specific assessment and suggest that associative interference can be substantially mitigated in modern systems.

06.
arXiv (CS.CL) 2026-06-15

MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains unclear. Existing embodied and game-based benchmarks often compress interaction into short-horizon tasks or entangle success with domain-specific game mechanics. In this paper, we introduce MineExplorer benchmark for evaluating open-world exploration capabilities of MLLM agents in Minecraft. We first filter atomic tasks whose solutions rely heavily on Minecraft-specific knowledge to better reflect general open-world reasoning. Then we organize the benchmark around a ReAct-style capability formulation and compose atomic tasks into implicit multi-hop tasks. To further construct reliable instances, MineExplorer uses a multi-agent synthesis workflow that jointly designs task graphs, sandbox scenes, and rule-based milestone evaluators. Human evaluation shows that the multi-agent synthesis workflow produces significantly more reliable instances than a single-agent baseline. Experiments with advanced MLLM agents show that open-world exploration remains challenging, as strong models can handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories. Further analysis finds that task difficulty tracks agent completion, and larger models or thinking modes do not consistently translate into better performance. Code and dataset are available at https://github.com/Jometeorie/MineExplorer.

07.
bioRxiv (Bioinfo) 2026-06-23

Learning interpretable structural similarity from tandem mass spectra for small molecule analog discovery

Analog discovery remains a central bottleneck in mass spectrometry-based untargeted metabolomics, as conventional spectral similarity scores poorly reflect molecular structure. We introduce SIMBA, a transformer-based model that infers two interpretable graph-based distances, maximum common edge subgraph and substructure edit distance, directly from tandem mass spectra. SIMBA consistently retrieves structurally closer analogs than existing methods, enabling structure-aware small molecule identification beyond exact spectral matching.

08.
arXiv (quant-ph) 2026-06-17

Microwave-free vector magnetometry and crystal orientation determination with Nitrogen-Vacancy centers using Bayesian inference

arXiv:2512.13835v2 Announce Type: replace Abstract: Nitrogen-vacancy (NV) centers in diamond provide a solid-state platform for quantum sensing. While optically detected magnetic resonance techniques offer high sensitivity, their reliance on microwaves introduces heating and stray electromagnetic fields that can perturb nearby samples. Optical approaches based on cross-relaxation between differently oriented NV centers remove this constraint but have so far required stringent alignment of the external field with crystallographic axes, restricting their practicality. Here we introduce a general framework for microwave-free vector magnetometry at near-zero field that leverages Bayesian inference to extract both the magnetic field vector and the NV orientation directly from photoluminescence maps. An analytical model of cross-relaxation resonances enables efficient inference under arbitrary field and orientation configurations, while naturally incorporating the discrete degeneracies of the NV symmetry. We experimentally demonstrate robust orientation determination and vector-field reconstruction, establishing a general route toward compact and alignment-free NV magnetometers for practical sensing applications.

09.
bioRxiv (Bioinfo) 2026-06-19

Nickel-Driven Dynamics of Urease in Sporosarcina pasteurii: Integrated Computational and Experimental Insights

Urease is a nickel-dependent enzyme that plays an important role in urea hydrolysis and in a process named as microbial-induced calcium carbonate precipitation (MICP), which is widely used in sustainable environmental biotechnology. Despite its ecological importance, urease powers Biogrout (biocementation), a promising green technology for soil stabilization and infrastructure repair. Yet, the relationship between nickel availability, enzyme activation, and bacterial fitness remains poorly understood. In this study, we reveal a striking dual effect of nickel on Sporosarcina pasteurii: while high Ni2+ concentrations strongly inhibit growth (IC50 {approx} 637.7 {micro}M), they simultaneously boost specific urease activity up to six-fold. This uncoupling between biomass and enzymatic efficiency highlights a previously overlooked adaptive strategy under metal stress. Using structural bioinformatics and molecular docking, we show that Ure1–the catalytic subunit–exhibits the strongest nickel affinity (-4.3 kcal{middle dot}mol-1), supported by highly conserved active-site residues, whereas accessory proteins UreE and UreG display moderate and weak binding, consistent with their roles in metal delivery and GTP-dependent maturation. In addition, microscopic observations confirmed that calcium carbonate precipitation was most pronounced at intermediate nickel concentrations (approximately 400-1000 {micro}M), whereas higher concentrations ([≥]1000-1300 {micro}M) led to reduced mineral formation due to loss viable cells. Taken together, these results indicates that nickel availability controls both urease activation and bacterial fitness, and that an optimal balance is required to maximize biomenerilization efficiency in environmental applications, particularly in biocementation technology.

10.
arXiv (CS.CL) 2026-06-16

ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech

Mental health industry faces growing concerns regarding hate speech directed at children's on social media, as exposure to such content can contribute to adverse psychological outcomes during critical stages of development. Current hate speech datasets and detection systems provide limited support for child-focused applications because they are primarily designed for adults and lack dedicated representations of age-specific characteristics associated with hate speech directed at children's. To address this gap, we introduce ChildGuard, a large-scale English dataset for child-targeted hate speech containing 351,877 annotated instances collected from X (formerly Twitter), Reddit, and YouTube. The dataset covers three age groups such as younger children's (under 11), pre-teens (11-12), and teens (13-17). ChildGuard contains two subsets such as a contextual subset (157K) and a lexical subset (194K). Evaluation using recent transformer-based models and LLMs achieves a best Macro-F1 of 82.07%, decreasing to 79.41%, 79.24%, 76.04%, and 74.88% on younger children's, contextual, implicit hate, and cross-subset settings, respectively.

11.
arXiv (CS.LG) 2026-06-19

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

arXiv:2606.19825v1 Announce Type: new Abstract: Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs–such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph–as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to Long Short-Term Memory (LSTM) model in dust source emission forecasting.

12.
arXiv (CS.AI) 2026-06-11

JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization

arXiv:2606.11425v1 Announce Type: cross Abstract: Jailbreak attacks expose persistent safety weaknesses in large language models (LLMs), but existing stateless single-turn methods face a trade-off: hand-crafted prompts are expressive but static, while iterative prompt optimization can adapt but often relies on low-level mutations that require many target queries. We propose JailbreakOPT, a tool-assisted framework for improving iterative single-turn jailbreak prompt optimization. JailbreakOPT organizes diverse atomic jailbreak prompts into an attack tool library and composes them through a unified intra-episode optimization abstraction to generate stronger standalone attack prompts. To reuse experience across attack episodes, JailbreakOPT further frames tool selection as a contextual bandit problem and applies contextual Thompson sampling to guide exploration and exploitation based on past outcomes. Experiments across multiple target LLMs and attack goals show that JailbreakOPT improves attack success rate (ASR) while reducing the number of attacks until success (No.A) compared with atomic single-turn attacks and existing iterative optimization baselines. This paper may contain offensive or harmful content.

13.
arXiv (CS.CL) 2026-06-12

Understanding helpfulness and harmless tension in reward models

Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their conflicts remain poorly understood. We study alignment tension in reward models trained under helpfulness-only, harmlessness-only, and mixed-objective settings. We find that mixed-objective models often underperform single-objective models, indicating interference between objectives. Using activation-based methods, we identify neurons associated with each objective and study their functional roles via targeted ablations. We find that these neurons causally support their corresponding objectives while often negatively affecting the opposing one. We find that a substantial proportion of neurons are shared between helpfulness and harmlessness, and that these shared neurons exert a disproportionate influence on model behaviour, contributing to alignment tension. Additionally, our results provide insights and mechanistic interpretation into how alignment objectives are represented in reward models and why multi-objective alignment remains challenging, motivating future work on disentangled and controllable alignment methods.

14.
arXiv (CS.CV) 2026-06-18

SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation

Conventional production workflow of high-precision mesh assets necessitates a cumbersome and laborious process of manual sculpting by specialized 3D artists/modelers. The recent years have witnessed remarkable advances in AI-empowered 3D content creation for generating plausible structures and intricate appearances from images or text prompts. However, synthesizing realistic surface details still poses great challenges, and enhancing the geometry fidelity of existing lower-quality 3D meshes (instead of image/text-to-3D generation) remains an open problem. In this paper, we introduce SuperCarver, a 3D geometry super-resolution pipeline for supplementing texture-consistent surface details onto a given coarse mesh. We start by rendering the original textured mesh into the image domain from multiple viewpoints. To achieve detail boosting, we construct a deterministic prior-guided normal diffusion model, which is fine-tuned on a carefully curated dataset of paired detail-lacking and detail-rich normal map renderings. To update mesh surfaces from potentially imperfect normal map predictions, we design a noise-resistant inverse rendering scheme through deformable distance field. Experiments demonstrate that our SuperCarver is capable of generating realistic and expressive surface details depicted by the actual texture appearance, making it a powerful tool to both upgrade historical low-quality 3D assets and reduce the workload of sculpting high-poly meshes.

15.
arXiv (CS.LG) 2026-06-16

Q-Learning with Fine-Grained Gap-Dependent Regret

arXiv:2510.06647v2 Announce Type: replace-cross Abstract: We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al.,2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.

16.
arXiv (CS.AI) 2026-06-16

Deep Q-Learning on Hölder Spaces

作者:

arXiv:2606.16846v1 Announce Type: cross Abstract: We study the operator-theoretic core of Q-learning in continuous-time stochastic control with continuous states and actions. In value-based reinforcement learning, each Q-learning or DQN update is built from a Bellman optimality target; our analysis isolates this target in a diffusion setting and studies its regularity and approximation complexity. Under uniform ellipticity and Hölder-regular coefficients, we show that a Bellman update maps bounded inputs into an anisotropic regularity class, smoothing the state variable while leaving only Lipschitz dependence on the action variable. This yields a compact family of Bellman iterates and motivates a tensor-product DeepONet architecture adapted to the mixed regularity of the problem. We then derive explicit approximation and resource bounds, together with a stiffness–complexity trade-off as the time step $\delta \to 0$. The resulting theory makes a direct contribution to Q-learning theory at the level of Bellman target regularity and approximation in continuous stochastic control. At the same time, we do not claim a full convergence theorem for practical sampled Q-learning with exploration, replay, and stochastic gradient updates.

17.
arXiv (CS.CV) 2026-06-16

A biological vision inspired framework for machine perception of abutting grating illusory contours

Higher levels of machine intelligence demand alignment with human perception and cognition. Deep neural networks (DNN) dominated machine intelligence have demonstrated exceptional performance across various real-world tasks. Nevertheless, recent evidence suggests that DNNs fail to perceive illusory contours like the abutting grating, a discrepancy that misaligns with human perception patterns. Departing from previous works, we propose a novel deep network called illusory contour perception network (ICPNet) inspired by the circuits of the visual cortex. In ICPNet, a multi-scale feature projection (MFP) module is designed to extract multi-scale representations. To boost the interaction between feedforward and feedback features, a feature interaction attention module (FIAM) is introduced. Moreover, drawing inspiration from the shape bias observed in human perception, an edge detection task conducted via the edge fusion module (EFM) injects shape constraints that guide the network to concentrate on the foreground. We assess our method on the existing AG-MNIST test set and the AG-Fashion-MNIST test sets constructed by this work. Comprehensive experimental results reveal that ICPNet is significantly more sensitive to abutting grating illusory contours than state-of-the-art models, with notable improvements in top-1 accuracy across various subsets. This work is expected to make a step towards human-level intelligence for DNN-based models.

18.
arXiv (CS.AI) 2026-06-16

How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations

arXiv:2606.16973v1 Announce Type: cross Abstract: Incorporating textual reviews into a Recommender System has become a prominent strategy for enriching collaborative signals with semantic information. However, the actual contribution of review-derived representations remains an open question, particularly when strong collaborative baselines are employed. In this work, we systematically investigate the impact of textual information on Matrix Factorization by introducing and comparing three enrichment strategies over a common collaborative backbone. First, we propose a learnable gating mechanism that adaptively balances collaborative and textual signals during training. This mechanism is applied to two distinct review representations: (i) aggregated topic profiles extracted from user and item histories, and (ii) full text embedding representations derived from reviews. Additionally, we explore a cross-attention mechanism that identifies and emphasizes the most informative dimensions of the textual representation before fusion with collaborative factors. We evaluate six variants: pure, enriched with topic profiles and text via gating; enriched with topics and text via gating; and enhanced with cross-attention over textual features. Experiments across multiple review-based datasets reveal that although adaptive fusion mechanisms improve representation flexibility, the marginal contribution of textual signals remains limited compared to the collaborative backbone. These findings suggest that, under typical rating-prediction settings, collaborative information continues to dominate performance, raising important considerations for the effective integration of semantic review signals into recommendation models.

19.
bioRxiv (Bioinfo) 2026-06-15

DAQplugin: Deep Learning based Real-time Model Evaluation Plugin for ChimeraX

Although an increasing number of protein structures are determined by cryogenic electron microscopy (cryo-EM), protein structure modeling frequently suffers from residue misassignments and sequence register shifts, particularly in regions with ambiguous density. Here, we present DAQplugin, a ChimeraX plugin that performs real-time evaluation of protein models against cryo-EM density maps using the deep-learning-based residue-wise model quality (DAQ) score. Unlike existing validation tools that are typically applied after model construction, DAQplugin enables real-time deep-learning-based validation during model building and refinement. To our knowledge, DAQplugin is the first tool that provides real-time deep-learning based validation of protein models for cryo-EM map within an interactive modeling environment. In addition to identifying potential modeling errors, DAQplugin also provides guidance for correcting sequence register shifts by suggesting alternative residue placements along the backbone. The computation in this plugin is designed to run efficiently on general CPUs without requiring GPU hardware. Using DAQplugin, users can perform deep-learning-based validation on standard laptops during interactive model building, model-map fitting, and refinement. DAQplugin is able to facilitate more accurate interpretation of cryo-EM density maps and improve the reliability assessment of protein structure models.

20.
arXiv (CS.CV) 2026-06-15

Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

Vision-language-action (VLA) policies typically inherit their vision encoder from upstream VLM releases, but it is unclear whether an encoder choice validated on a small VLA transfers to a larger backbone. We introduce a frozen-backbone grafting diagnostic: the vision tower of a released VLA is replaced by a candidate encoder under a fixed protocol (adaptive average pooling, LayerNorm, and a single trainable linear projector), with the language model and action expert frozen. Across four encoders, two LIBERO suites, two backbones (SmolVLA-450M and $\pi_{0.5}$-3.3B), and two-to-three seeds per cell (40 main grafting runs plus native, LoRA, pooling, and zero-/shuffled-image controls, all scored by offline action MSE), the small-backbone winner does not reliably select the large-backbone top tier: SigLIP is best on SmolVLA across both suites, while on $\pi_{0.5}$ DINOv2-small leads the spatial suite and the object suite is a seed-sensitive near-tie band; three of the four backbone-suite comparisons (and 11 of 12 seed-level cells) support backbone-dependent rankings. The grafting wrapper is itself non-neutral with opposite sign across backbones (+45-56% MSE on the SmolVLA native tower, -50-52% on $\pi_{0.5}$), so all conclusions are conditional on the fixed grafting protocol. We position frozen grafting as a cheap target-backbone diagnostic to run before committing to an encoder at scale, not as a closed-loop deployment claim.

21.
arXiv (quant-ph) 2026-06-11

Optimizing Encoder Circuits of Entanglement-Assisted Quantum LDPC Codes via Beam Search

arXiv:2606.11468v1 Announce Type: new Abstract: Entanglement-assisted (EA) quantum QC-LDPC codes offer strong error-correction capabilities with structured parity-check matrices, but their practical use depends on efficient encoder circuits and the availability of pre-shared Bell pairs (ebits). In all encoder implementations based on the stabilizer formalism, the dominant contribution to this complexity comes from the use of controlled gates. In this paper, we adopt the Sharma-Kumar-Garani (SKG) encoder construction. We formulate the encoder optimization as a search over GF(2) row operations that decompose the binary matrix derived from its CNOT sub-sequence. We solve this problem using a beam search algorithm guided by a Hamming-distance heuristic. For the tested EA quantum QC-LDPC code families, the proposed method achieves CNOT-count reductions of 7.3-34.0% relative to the SKG baseline encoder. The optimized circuits also yield lower CNOT counts than Patel-Markov-Hayes synthesis on all tested instances and are verified by stabilizer-tableau simulation. These results show that substantial encoder simplification is possible for structured EA QC-LDPC codes.

22.
arXiv (CS.LG) 2026-06-17

Learning Credal Ensembles via Distributionally Robust Optimization

arXiv:2602.08470v3 Announce Type: replace Abstract: Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.

23.
arXiv (CS.LG) 2026-06-11

Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents

arXiv:2606.11998v1 Announce Type: new Abstract: Trusted monitoring is a cornerstone of AI control. However, as frontier models grow more capable, the increasing capabilities gap between trusted and untrusted models may render trusted models unreliable monitors. We introduce bootstrapped monitoring, a protocol that addresses this by inserting a stronger, intermediate untrusted model with transparent chain-of-thought reasoning into the oversight chain. The untrusted monitor ($U_m$) evaluates the agent's actions, while a weaker trusted model ($T$) oversees $U_m$'s reasoning to detect collusion. We evaluate bootstrapped monitoring on multi-turn software engineering tasks (BashArena) across multiple agents and monitors. Bootstrapped monitoring substantially improves catch rates over trusted-only monitoring, even when the untrusted monitor actively colludes with the agent, provided we have access to its raw chain-of-thought. Our results suggest that bootstrapped monitoring can extend the useful lifetime of trusted models in control as AI capabilities advance.

24.
arXiv (CS.CV) 2026-06-16

Through-Foliage Surface-Temperature Reconstruction for Early Wildfire Detection

We present a method to reconstruct surface temperatures through forest vegetation by combining signal processing and machine learning, enabling fully automated aerial wildfire monitoring with drones for early fire detection. Synthetic aperture (SA) sensing reduces canopy occlusion but introduces thermal blur. To overcome this, we train a visual state space model to recover subtle thermal signals of partially occluded soil and fire hotspots from blurred data. To address limited real-world training data, we generate realistic surface temperature simulations using a latent diffusion model, temperature augmentation, and procedural thermal forest modeling. On simulated datasets, our method reduces RMSE by 2-2.5 versus conventional thermal and uncorrected SA imaging; in field experiments on hotspots, RMSE improved by 12.8-fold and 2.6-fold, respectively. Our approach also generalizes to other thermal signals, including human signatures, capturing morphology and extent – critical where simple thresholding fails – while conventional imaging struggles with partial occlusion.

25.
arXiv (CS.AI) 2026-06-11

An Ethical eValuation Agent (EeVA): Results of a Proof-of-Concept Test on a Prototype Agentic-like Workflow to Assist Ethical Deliberations

arXiv:2606.11218v1 Announce Type: cross Abstract: Ethical deliberation is often misunderstood as a search for single right or wrong answers, creating difficulties for non-ethically trained personnel who must address ethically laden challenges. We developed EeVA, an agentic-like LLM-based workflow designed to support comparative ethical reflection rather than deliver definitive ethical answers. EeVA was programmed in n8n using three interconnected workflows: starter, worker, and emitter. It evaluated uploaded use cases against 10 ethical frameworks through evaluator and synthesis prompts. Proof-of-concept testing used three published cases from urban mobility, peer-to-peer energy trading, and social-service resource allocation. Across all cases, EeVA produced consistently structured framework-specific evaluations and integrated syntheses. Outputs differentiated between frameworks, identified convergences and divergences, recommended modifications to increase alignment, and highlighted persistent ethical tensions. Syntheses were readable for non-specialists and shifted attention away from simplistic answers toward design conditions, safeguards, and areas where full cross-framework agreement was unlikely. The findings suggest that LLMs can be organised into usable workflows that preserve ethical plurality while helping bridge the communicative gap between ethicists and non-ethically trained personnel. EeVA's value lies not in replacing ethicists or resolving moral disagreement, but in scaffolding structured ethical deliberation. EeVA offers a promising proof of concept for supporting ethical reflection where access to ethics expertise is limited. Further work is needed on reproducibility, human evaluation, user testing, and efficiency before it can be considered a mature tool.