Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-17

LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings

arXiv:2606.17727v1 Announce Type: new Abstract: Recent vision-language models (VLMs) have shown promising progress in generating webpages from visual inputs, yet existing evaluations mainly focus on short, single-screen, and largely static webpages. We introduce LongWebBench, a benchmark for evaluating long-horizon webpage generation from both structural and functional perspectives. LongWebBench contains 490 real-world long webpages for structural fidelity evaluation and 507 goal-oriented interaction tasks over 129 webpages for functional evaluation. It employs two complementary protocols: a multi-dimensional VLM-based metric for assessing long-range structural coherence, and a DOM-augmented agent-based pipeline for end-to-end functional verification. We further examine the automatic evaluation protocols through human agreement analysis. Experiments with state-of-the-art open-source and proprietary VLMs under single-image and multi-image settings reveal that structural fidelity degrades as webpage length increases, while visually plausible generations often fail to support executable multi-step interactions. These results highlight the need to evaluate long webpage generation beyond visual similarity, with executable interaction as a core criterion. Our code and data are available at https://github.com/zheny2751-dotcom/LongWebBench.

02.
arXiv (CS.AI) 2026-06-16

Steering Emotional Dynamics for Art Therapy: Controllable Narrative Script Generation through Hierarchically Guided LLM Agents

arXiv:2606.16481v1 Announce Type: new Abstract: Art therapy plays a vital role in emotional healing, in which narrative creation acts as the primary vehicle for emotional expression. Given the inherently dynamic nature of emotions during healing, narratives with finely controlled emotional fluctuations enable individuals to safely project inner conflicts and achieve emotional catharsis. Recently, with the rapid development of Large Language Models (LLMs), automated narrative generation technology has provided a new pathway to support such artistic designs. However, while existing methods can produce fluent texts, they struggle to generate narratives that adhere to specified affective trajectories, failing to meet the demands of emotion-oriented psychological healing. To address these issues, this paper proposes EC-Script, an LLM agent-based framework that enables hierarchical control of the affective trajectory in narrative generation for emotional healing. To ensure that the generated narratives strictly follow the given emotional patterns, EC-Script establishes overall narrative direction through Emotion-Trajectory Planning, propels scene-level plot development with Character-Driven Scene Generation, and regulates local emotional changes of characters via Emotion-Controlled Script Writing. Ultimately, it outputs scene-by-scene script content that remains highly consistent with the preset affective trajectory. Experimental results demonstrate that EC-Script significantly outperforms baseline methods in affective trajectory adherence, exhibiting excellent and reliable emotional controllability, thereby providing effective technical support for AI-assisted emotional healing scenarios.

03.
arXiv (CS.CV) 2026-06-12

Spatially Grounded Concept Bottleneck Models via Part-Factorized Attention

Concept bottleneck models (CBMs) predict a layer of human-named attributes before predicting a class, which makes their decisions auditable. On fine-grained recognition tasks the concept heads are usually free to attend anywhere in the image, so a head named for one body region can be satisfied by evidence on another. This work studies a part-factorized CBM that removes that freedom by construction. The method has three components built on a frozen DINOv3 vision transformer. A learned foreground gate, trained on DINOv3 patch features, suppresses background patches inside the part attention. A set of part queries cross-attends to patch features and each of the 312 CUB attributes is routed, through a fixed concept-to-part map, to read only from the part token its name implies. A learnable two-dimensional Gaussian prior, injected additively in log space into the attention logits, breaks the permutation symmetry among part queries; its means are initialized from the dataset-average keypoint location of each part, which requires no per-image keypoint supervision at training or test time. On CUB-200-2011 the spatial-prior model matches a fully supervised baseline (88.85% versus 88.95% top-1) while raising pointing accuracy by 16 points (52.6% versus 36.4%). Replacing bounding-box supervision with a PCA foreground target and combining it with the Gaussian prior removes all per-image supervision and reaches 88.6% top-1 at about 70% pointing accuracy. A keypoint-fraction sweep shows that 0.5% of the training set (about 27 images) suffices to initialize the prior with no measurable loss. Removing part identity entirely is the harder case: without any spatial prior, pointing accuracy collapses to $2.9\%$.

04.
arXiv (CS.AI) 2026-06-12

(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

arXiv:2606.12848v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for tasks once reserved for trained researchers, including hypothesis generation, specification choice, and drafting conclusions. We argue that the reliability of AI-assisted research depends not only on model capability, but also on how cognitive labour is structured between humans and machines. We study this problem through Human-in-the-Loop Economic Research (HLER), a decision architecture based on pre-commitment, decision sequencing, accountability, and attention allocation. In a pre-specified 2*4 factorial experiment with 280 complete research runs across four datasets, an unconstrained multi-agent baseline produced critical failures in 72% of runs. Using the same underlying model, the same agent decomposition, and identical prompts for the shared reasoning agents, HLER reduced the failure rate to 16% by imposing three architectural commitments: LLMs reason but do not execute data work, data and estimation are handled deterministically, and three human decision gates bind the workflow. Fisher's exact test rejects equality of failure rates at p

05.
bioRxiv (Bioinfo) 2026-06-10

Pseudoperplexity Probes Memorization in Protein Language Models

Protein Language Models (pLMs) have significantly advanced computational biology. Yet their scale and reliance on redundant training data raise a fundamental question: do pLMs generalize the statistical grammar of proteins, or do they simply memorize their training data? To investigate this, we used pseudoperplexity as a probe for sequence-level memorization, comparing ProtT5's pseudoperplexity on a pre-training proxy dataset against a post-training holdout of genuinely novel sequences. To ensure a valid comparison, we matched the datasets by sequence length, cluster size, and taxonomic family. As a statistical baseline, we trained n-gram language models; analysis of higher-order n-gram composition and a statistically significant divergence in perplexity confirmed that the post-training sequences were genuinely novel at the local sequence level. ProtT5 showed a statistically significant difference in pseudoperplexity between seen and unseen sequences, though further analysis revealed this memorization signal to be modest. These findings suggest that ProtT5 exhibits detectable but limited memorization of its training data as measured by a pseudoperplexity-based probe.

06.
arXiv (CS.CV) 2026-06-19

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck. Extensive experiments conducted on the LIDC-IDRI and RAD-ChestCT datasets demonstrate that PRDiT consistently outperforms state-of-the-art models, such as HA-GAN, 3D LDM and WDM-3D, achieving significantly lower 3D FID, MMD and Wasserstein distance scores.

07.
arXiv (CS.CV) 2026-06-17

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision–language models (VLMs), in contrast, provide strong semantic grounding and general knowledge by reasoning over uniformly sampled frames, but they are not ideal as standalone dense predictors due to compute-driven sparse sampling, a language-output bottleneck that compresses fine-grained interaction states into text-oriented representations, and a data-regime mismatch when adapting to small action-conditioned datasets. We propose a VLM-guided JEPA-style latent world modeling framework that combines dense-frame dynamics modeling with long-horizon semantic guidance via a dual-temporal pathway: a dense JEPA branch for fine-grained motion and interaction cues, and a uniformly sampled VLM thinker branch with a larger temporal stride for knowledge-rich guidance. To transfer the VLM's progressive reasoning signals effectively, we introduce a hierarchical pyramid representation extraction module that aggregates multi-layer VLM representations into guidance features compatible with latent prediction. Experiments on hand-manipulation trajectory prediction show that our method outperforms both a strong VLM-only baseline and a JEPA-predictor baseline, and yields more robust long-horizon rollout behavior.

08.
arXiv (CS.LG) 2026-06-19

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

arXiv:2510.19893v2 Announce Type: replace Abstract: Medical AI systems demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.

09.
arXiv (quant-ph) 2026-06-11

Logical error estimation from syndrome data of surface-code experiments

arXiv:2606.11496v1 Announce Type: new Abstract: Decoders for quantum error correction (QEC) experiments rely on detector error models (DEMs), which encode, for each error, its probability and the detectors and logical observables it flips. Here we show that estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. We evaluate our methods using open-source data from surface-code memory experiments performed on Google's Willow chip, and we carry out analogous surface-code experiments on IBM's \texttt{ibm\_miami} processor. Despite the different physical error scales of the Google and IBM devices, in both cases our estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the $5\%-10\%$ level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.

10.
arXiv (CS.LG) 2026-06-19

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

arXiv:2606.20324v1 Announce Type: cross Abstract: Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

11.
arXiv (CS.CL) 2026-06-11

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking checkpoints, these capabilities are complementary but misaligned. The Instruct model is concise and tool-disciplined, whereas the Thinking model offers stronger planning and recovery behavior but often over-deliberates and degrades agent performance. We present CRANE (Constrained Reasoning Injection for Code Agents via Nullspace Editing), a training-free parameter-editing method that treats the Thinking-Instruct delta as a directional pool of candidate reasoning edits for the Instruct backbone. CRANE combines magnitude thresholding to denoise the delta, a Conservative Taylor Gate to retain edits that are jointly beneficial for reasoning transfer and tool-use preservation, and Graduated Sigmoidal Projection to suppress format-critical update directions. By merging paired Instruct and Thinking checkpoints, CRANE delivers strong gains over either individual model while preserving Instruct-level efficiency: on Roo-Eval it achieves pass1 of 66.2% (+19.5%) for Qwen3-30B-A3B and 81.5% (+8.7%) for Qwen3-Next-80B-A3B; on SWE-bench-Verified it resolves up to 14 additional instances at both scales (122/500 and 180/500); and on Terminal-Bench v2 it improves pass1/pass5 by up to 2.3%/7.8%, reaching 7.6%/17.9% and 14.8%/30.3%, respectively, consistently outperforming alternative merging strategies across all three benchmarks.

12.
arXiv (CS.AI) 2026-06-17

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

arXiv:2510.01359v2 Announce Type: replace-cross Abstract: Code-capable large language model (LLM) agents are embedded in software engineering workflows where they can read, write, and execute code, raising "jailbreak" stakes beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection, leaving open whether agents compile and run malicious programs. We present JAWS-Bench (Jailbreaks Across WorkSpaces), a benchmark spanning three escalating workspace regimes mirroring attacker capability: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). We pair this with a hierarchical, executable-aware Judge Framework that tests (i) compliance, (ii) attack success, (iii) syntactic correctness, and (iv) runtime executability, to measure deployable harm. Across seven LLM backends from five families, prompt-only attacks in JAWS-0 achieve 61% compliance; 58% are harmful, 52% parse, and 27% run end-to-end. In JAWS-1, compliance reaches ~100% for stronger models with a mean ASR (Attack Success Rate) ~71%; JAWS-M raises mean ASR to ~75%, with 32% runnable attack code. Wrapping an LLM in an agent increases ASR by 1.6$\times$, by overturning initial refusals during planning and tool use. Similar trends hold for OpenHands, SWE-Agent, and OpenAI Codex, suggesting our JAWS-Bench is agent-agnostic. Category analyses identify which attack classes are most vulnerable and deployable, motivating execution-aware defenses and refusal-preserving agent designs.

13.
arXiv (math.PR) 2026-06-15

Uniform-in-time error estimates for McKean-Vlasov SDEs with common noise and stochastic algorithms

arXiv:2606.14170v1 Announce Type: new Abstract: In this work, by construct an asymptotic coupling by reflection, we first explore the uniform-in-time estimate on probability distance for two measure-valued processes induced by a McKean-Vlasov SDE with common noise and an interacting particle system, where the drift terms are dissipative merely in the long distance. As direct applications of this estimate, we establish the uniform-in-time error estimates for the numerical solutions derived via backward/tamed/adaptive Euler-Maruyama methods. Moreover, as another direct application, the uniform-in-time conditional propagation of chaos is quantified.

14.
arXiv (CS.AI) 2026-06-17

AI Adoption Across a Multinational Workforce: Sociotechnical Conditions for GenAI Acceptance in Human Resources

arXiv:2606.17887v1 Announce Type: cross Abstract: Generative AI (GenAI) deployment in the workplace is accelerating rapidly. Nevertheless, questions of who adopts, who benefits, and who is left behind and why are still understudied. In this paper, we investigate these dynamics in the context of a multinational tech company transitioning from a legacy Human Resources (HR) search system to a GenAI-supported system, analyzing search log data, survey data (n=25), and ten semi-structured interviews. Our findings show that adoption depended on the fit between the GenAI system's design assumptions and employees' work positionalities (role, spoken language, tenure). Further, we find that employees' trust in GenAI answers was built through source-checking, comparison among systems, and seeking input from colleagues or HR when in doubt. Our contribution is twofold. First, we provide empirical evidence of workplace GenAI adoption during a live organizational transition, showing that adoption is influenced by factors such as situational fit, search literacy, and trust calibration. It is also further shaped by knowledge conditions such as the system's content quality, employee training, and guidance. Second, we translate these findings into design considerations for inclusive deployment and adoption in high-stakes environments such as HR. We argue that organizations should design systems considering the role and context-sensitive benefits they yield to different social groups. They also need to treat the organizational knowledge infrastructure as AI infrastructure to improve the accountability and usability of GenAI systems

15.
medRxiv (Medicine) 2026-06-12

Effect of tenofovir on the outcomes of COVID-19 in persons with chronic hepatitis B: a nationwide cohort study in Sweden.

Background: Patients with chronic hepatitis B (CHB) may have an increased risk of severe COVID-19. Tenofovir has been hypothesized to confer protection against severe disease, but evidence is inconclusive. We evaluated the risk of severe COVID-19 among CHB patients treated with tenofovir compared with other nucleos(t)ide analogues (NAs). Methods and findings: In this nationwide, registry-based cohort study, we included all adults with CHB and laboratory-confirmed COVID-19 in Sweden between February 2020 and July 2022. Data from national health and socioeconomic registers were linked using unique personal identification numbers (PINs). Patients with HIV, hepatitis C, or hepatitis D coinfection were excluded. Exposure was defined as tenofovir versus other NA therapy. The primary outcome was severe COVID-19, defined as hospitalization >2 days or death within 30 days of diagnosis. Logistic regression was used to estimate adjusted odds ratios (aOR) with 95% confidence intervals (CI), controlling for age, sex, comorbidities, vaccination, socioeconomic status, and region of birth. Among 5,877 CHB patients with COVID-19, 672 were receiving NA therapy (437 tenofovir, 235 other NAs). Severe COVID-19 occurred in 8.0% of tenofovir-treated patients and 14.5% of those receiving other NAs (unadjusted OR 0.52; 95% CI, 0.31-0.85). After adjustment, the association was attenuated and no longer significant (aOR 0.72; 95% CI, 0.39-1.31). Older age, comorbidities, and unvaccinated status were strongly associated with severe disease. Conclusions: The apparent protective effect of tenofovir against severe COVID-19 in unadjusted analyses was largely explained by confounding factors. The risk of severe disease was primarily driven by age, comorbidities, and vaccination status. Prevention of severe COVID-19 in patients with CHB should instead focus on vaccination and management of comorbidities.

16.
arXiv (CS.CV) 2026-06-12

MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics

To study the ability to infer physical dynamics from videos and extrapolate them forward in time, we assemble a dataset of 2D Material Point Method (MPM) physical simulations covering rich physical phenomena such as deformable objects, fluids, kinetic objects, and emitters. We study code generation and video diffusion approaches on this dataset, identifying their strengths and weaknesses by varying the amount of physically relevant side information. The code generation model, beyond giving a working demonstration of automatic synthesis of MPM simulations, reveals that such an approach struggles with inferring physical parameters from visual input, but relative to video diffusion, produces physically and temporally stable extrapolations forward in time, while the video diffusion model more strongly identifies geometric properties from visual input but produces physically implausible extrapolations.

17.
arXiv (CS.CV) 2026-06-17

Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation

Semi-supervised medical image segmentation has emerged as a dominant research problem in medical image analysis, mitigating annotation scarcity by leveraging consistency regularization on unlabeled data. However, existing approaches operate predominantly via visual pattern matching, relying heavily on pixel-level similarities. This visual-centric dependency often falters in clinical scenarios characterized by the visual-semantic mismatch, where visually similar lesions warrant distinct diagnostic conclusions, thus failing to capture the underlying diagnostic logic used by experts. To address this, we move beyond visual cues and propose CERS (CoT-Enhanced Reasoning Segmentation), a framework that integrates Chain-of-Thought (CoT) reasoning to distinguish pathologically distinct cases. Specifically, we construct a knowledge pool enriched with linguistic reasoning descriptions generated by large language models (LLMs). A semantic-aware reference selection strategy is introduced to identify historical evidence, filtering candidates first by morphology, and then refining them via CoT consistency to eliminate hard negatives. Furthermore, a multi-scale coordinate attention module (MCAM) is designed to effectively fuse this reasoning-derived context into the decoding process. Extensive experiments demonstrate the superiority of CERS against state-of-the-art approaches, particularly in resolving boundary ambiguities and semantic inconsistencies. The code is available at https://github.com/cymasuna/CERS.

18.
arXiv (CS.CV) 2026-06-16

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

Reward models play a pivotal role in reinforcement learning (RL) and multi-modal trajectory selection for autonomous driving. However, acquiring such rewards typically relies on hand-crafted rule-based objectives or perception ground truth, which hinders generalization for data-scaling. While Vision-Language Models (VLMs) have demonstrated feasibility as reward models in other domains, their effectiveness in driving tasks remains underexplored. In this work, we bridge this gap by (1) introducing DriveReward, a reasoning trajectory evaluation dataset rigorously labeled via temporally-grounded visual guidance, and augmented with counterfactual driving behaviors., (2) alongside a specialized Vision-Language Reward Model. To address the scarcity of failure cases in conventional datasets, we propose a counterfactual data annotation scheme to construct cases encompassing diverse driving styles and erroneous behaviors. Evaluations on our proposed benchmark reveal that even leading open-source and proprietary VLMs fail to excel across all tasks, highlighting significant room for improvement in existing models. Building on these findings, we subsequently tailor a specialized 1B reward model that outperforms larger VLMs on task-specific reward alignment. Finally, we validate our reward model's effectiveness by integrating it into RL finetuning and multi-modal trajectory scoring across multiple baselines, achieving performance comparable to rule-based reward calculations in both open-loop and closed-loop evaluation.

19.
bioRxiv (Bioinfo) 2026-06-13

MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts

作者:

De novo protein binder design has been dominated by structure-based pipelines that require known three-dimensional target conformations and consume substantial compute and generation time per design, limiting their throughput and accessibility for routine large-scale binder exploration. Sequence-only generative models promise a faster and lighter alternative, yet existing systems remain uniformly dense and frequently reintroduce structural computation at inference, undermining the core advantages they were intended to deliver. Across the broader language modelling community, transformers have meanwhile transitioned from fully dense designs to sparse Mixture-of-Experts architectures that decouple capacity from per-token compute, a shift that has yet to reach sequence-only protein binder generation. We present MoE-Bind, an autoregressive protein binder generator that, for the first time in this domain, combines Multi-head Latent Attention with a sparse Mixture-of-Experts feed-forward network and is evaluated under two independent structure predictors, Boltz-2 and AlphaFold2-Multimer. Despite activating less than half the per-token parameters of compute-matched dense baselines, MoE-Bind matches or exceeds them on full-length receptor-conditioned binder generation on a leakage-free Docking Benchmark 5.0 evaluation, transfers without peptide-specific training to short-peptide design, and reduces training and inference compute by a large margin. Routing analysis on generated binders reveals interpretable expert specialization at both the individual amino acid and biochemical group level, a structured expert-token alignment not previously reported for natural-language MoE models. These results show that sparse architectural design, rather than scale, can deliver fast, structure-free, and interpretable protein binder generation.

20.
arXiv (quant-ph) 2026-06-17

Quantum conditional entropies from convex trace functionals

arXiv:2410.21976v4 Announce Type: replace Abstract: We study geometric properties of trace functionals that generalize those in [Zhang, Adv. Math. 365:107053 (2020)], arising from a novel family of conditional entropies with applications in quantum information. Building on new convexity results for these functionals, we establish data-processing inequalities and additivity properties for our entropies, demonstrating their operational significance. We further prove completeness under duality, chain rules, and various monotonicity properties for this family. Our proofs draw on tools from complex interpolation theory, multivariate Araki–Lieb and Lieb–Thirring inequalities, variational characterizations of trace functionals, and spectral pinching techniques.

21.
arXiv (CS.LG) 2026-06-11

Counterexample Guided Learning in the Large using Reasoning Agents

arXiv:2606.11521v1 Announce Type: new Abstract: LLMs and LLM agents should improve when given feedback, but identifying when they are able to do so is difficult: feedback is heterogeneous, domain-specific, and difficult to control. We approach this challenge by asking LLMs to perform regular-expression induction, a classical symbolic learning problem where precise mechanisms for feedback exist in the form of counterexamples. In counterexample-guided learning, a learner (LLM) proposes candidate regular expressions from positive/negative-labeled strings, and the teacher (verifier) returns counterexamples showcasing the difference between the candidate and target languages. We identify novel counterexample-guided refinement strategies that enable effective regex learning, such as regularization and symbolic counterexample clusters. We also explore agentic strategies such as reflection and repair loops. Empirically, we find that verifier feedback substantially improves sample efficiency on challenging regex-induction tasks, reducing the number of labeled examples required and enabling learning of complex target expressions where standard prompting fails. For example, on the hardest task groups, our counterexample-guided framework improves success from 3.2% to 38.1% and from 38.9% to 74.1% on two different regex domains. These results suggest that LLMs can benefit from rich feedback beyond treating it as additional data, opening the door for robust verifier-guided methods for LLM-based program synthesis and formal reasoning.

22.
arXiv (CS.AI) 2026-06-15

Rethinking Backdoor Adversarial Unlearning through the Lens of Catastrophic Forgetting in Continual Learning

arXiv:2606.14078v1 Announce Type: cross Abstract: Existing studies reveal that current backdoor defenses exhibit limited robustness and often fail against specific types of attacks. More concerningly, prevailing safety tuning strategies tend to provide only superficial safety protection, as they fall short of completely eliminating the backdoor effects. In this work, we present a novel formulation of backdoor learning and unlearning as a sequential, three-stage process from a continual learning perspective. Within this framework, we formally define complete backdoor unlearning and further derive the necessary conditions for achieving it based on the mechanism of catastrophic forgetting. Guided by these insights, we propose Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which formulates the generation of adversarial examples satisfying the unlearning conditions as a blind inversion problem. We solve this by integrating the bi-level optimization process of adversarial training into an Expectation-Maximization (EM) algorithm framework to optimize the maximum a posteriori (MAP) objective. Furthermore, BI-BAU is extended to untargeted adversarial scenarios with unknown target classes, as well as to multi-modal contrastive learning tasks, enhancing its applicability to real-world deployment scenarios where pre-trained models may be compromised. Extensive experiments demonstrate that our method exhibits general applicability across a wide spectrum of backdoor attacks and can effectively and thoroughly eliminate the backdoor effects from a backdoor model.

23.
arXiv (CS.AI) 2026-06-15

PLAIground: SLO-Driven Runtime Model Selection for Compound AI Systems in the Edge-Cloud-Space Continuum

arXiv:2606.14356v1 Announce Type: cross Abstract: Applications in the 3D Computing Continuum, which unifies edge, cloud, and space, require combining multiple AI tasks such as object detection, time-series analytics, and natural language processing into Compound AI systems. These systems must satisfy stringent Service Level Objectives (SLOs) on accuracy, latency, and cost. A key mechanism for maintaining SLO compliance of Compound AI systems is runtime model selection, where AI models are dynamically switched for each workflow task. However, existing distributed and compound AI frameworks do not natively support runtime model selection. We present PLAIground, a framework that enables runtime model selection for Compound AI systems. PLAIground introduces Compoundable AI Model (CAIM) abstraction, which decouples task semantics from AI model implementations via Task and Data Contracts, enabling model switching without workflow changes. Additionally, PLAIground introduces Pixie, an SLO-driven runtime model selection algorithm, which dynamically selects the most suitable model for each task during execution. Our evaluation on two realistic Compound AI workflows demonstrates that Pixie achieves up to 91.3% accuracy while maintaining SLO compliance where fixed-model strategies either violate cost and latency budgets up to 21x or miss accuracy targets by 4%.

24.
arXiv (CS.CV) 2026-06-16

Bridging Information Asymmetry: A Hierarchical Framework for Blind Face Restoration with Reduced Uncertainty

Blind face restoration remains a persistent challenge due to the inherent ill-posedness of reconstructing holistic structures from severely constrained observations. Current generative paradigms, while capable of synthesizing realistic facial details, remain limited by the under-constrained nature of blind restoration, where severely degraded inputs can be mapped to plausible yet identity-inconsistent outputs. To address this issue, we present Pref-Restore, a hierarchical framework for BFR with reduced restoration uncertainty. Our design is organized around three complementary principles: (1) Semantic Information Augmentation, where an auto-regressive semantic branch converts image and text cues into structured tokens that provide a stable high-level anchor; (2) Texture-level Fidelity Alignment, where the diffusion generator is trained under this anchor to recover identity-relevant details; and (3) Fidelity-constrained Preference Optimization, where a face-aware reward refines the diffusion trajectory while controlling the quality–fidelity trade-off. Extensive experiments on synthetic and real-world benchmarks show that Pref-Restore achieves state-of-the-art performance, with stronger identity-sensitive fidelity and lower restoration uncertainty across repeated sampling. Systematic ablations further attribute these gains to the proposed hierarchical design, showing the necessity of staged training, the robustness and quality dependence of the text pathway, and the benefit of fidelity-constrained preference optimization.

25.
arXiv (CS.AI) 2026-06-15

Korzhinskii-Net: Physics-Informed Neural Network for Sub-Surface Mineral Prospectivity Modelling

作者:

arXiv:2606.13695v1 Announce Type: cross Abstract: Mineral prospectivity modelling (MPM) underpins exploration economics, yet most operational pipelines reduce to data-driven classifiers trained on shallow surface proxies. Such models are blind to the subsurface physics that actually localises ore: heat advection, fluid flow, and lithology-dependent precipitation. We present Korzhinskii-Net, a 2-D radial physics-informed neural network (PINN) that couples Darcy flow, advective-diffusive heat transport, and a softplus-saturated reaction rate into a single differentiable forward model, weakly supervised by surface and remote-sensing proxies. The network is named after Dmitri S. Korzhinskii (1899-1985), whose theory of infiltration metasomatism provides the physical scaffold. We evaluate Korzhinskii-Net on five ore provinces spanning four commodity classes – Norilsk (Ni-Cu-PGE), Pechenga (Ni-Cu sulphide), Udokan (sandstone-hosted Cu), Sukhoi Log (orogenic Au), and Mirny (kimberlitic diamond) – under a fair, leakage-controlled 5-fold cross-validation protocol with hard ring-shaped negatives. Korzhinskii-Net attains a mean PR-AUC of 0.885 versus 0.281 for the strongest classical baseline (gradient boosting), and a mean fractional rank of 0.019 versus 0.413. The improvement is consistent across all five provinces and four commodity systems, suggesting that physics-informed differentiable simulators, even when constrained only by global open-data proxies, can recover localisation patterns that pure feature-based learners systematically miss. We release the full pipeline and evaluation harness as open source.