Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

Algorithmic Constitutionalism

arXiv:2606.12437v1 Announce Type: cross Abstract: The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

02.
Science (Express) 2026-06-04

Long-range extended chains arising from polymerization-driven spontaneous assembly | Science

作者: 未知作者

A central challenge for conjugated polymers is to achieve long-range order while remaining solution-processable, which is essential for matching the electrical performance of their counterparts of crystalline inorganic semiconductors. Here we show that n-doped poly(benzodifurandione) (n-PBDF) can undergo polymerization-driven spontaneous assembly (PSA), in which chain growth, chemical doping, and structural ordering are intrinsically coupled, yielding long-range chain extension over hundreds of nanometers. We reveal that the spontaneously formed n-PBDF nanoribbons arise from a self-initiated, convergent growth mechanism driven by cooperative monomer–polymer interactions and stabilized by proton-coupled duplex chains and the polymer’s intrinsic polyelectrolyte character. With long-range extended chains in the nanoribbons, the aligned n-PBDF thin films demonstrate metallic-level conductivity (>10 4 Siemens per centimeter).

03.
arXiv (CS.CL) 2026-06-16

LLM-based Visual Code Completion for Aerospace Geometric Design

Recent advances in both Large Language Models (LLMs) and Vision Language Models (VLMs) have seen a step change in their ability to perform visual code completion, but the aerospace industry, which prioritizes safety and explainabilty over rapid LLM adoption, currently has no publicly announced LLM-based geometric design copilot systems in commercial use by aerospace Original Equipment Manufacturers (OEMs). This paper presents a LLM-based visual programming copilot application for aerospace engineering design tasks, using a visual programming variant of the ReAct methodology and GPT 5.4. In addition to the copilot, we describe Wingbuilder, a new Grasshopper plugin library with custom components for aerospace-specific geometry abstraction, and an associated Aerospace Visual Programming Dataset (AVPD) with 18 aerospace expert designed tasks at different levels of difficulty alongside ground truth solutions. We evaluate our copilot application with a user trial involving two experienced aerospace engineers from a large aircraft manufacturing company. We find our copilot visual programming ReAct methodology was successful in generating suggestions that participants found helpful, but slow ReAct inference times limit its usefulness to more complex time-consuming tasks where waiting for good copilot solution suggestion was worthwhile. Participants reported they liked the tool and would be willing to use it in the future.

04.
arXiv (CS.CL) 2026-06-16

Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget

We study context compression for multi-hop question answering with small language models. We propose Telegraph English, a readable symbolic format that rewrites retrieved passages into structured entity-relation statements, preserving reasoning evidence at lower token cost. In controlled experiments on MuSiQue, TwoWiki, and HotpotQA, Telegraph English outperforms three matched-budget compression baselines (character-level deletion, truncation, and random sub-sampling) on every dataset, with gains of 13 to 20 F1 percentage point. It also outperforms a coherent prose summary produced by the same encoder on the hardest dataset. A pre-registered depth-interaction hypothesis is null: the advantage does not grow with reasoning depth within datasets. We interpret these results as evidence that readable symbolic re-expression preserves entity content more densely than either natural language or coherent summarization at matched token budget.

05.
arXiv (CS.CV) 2026-06-16

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differences, while reasoning-based generative rewards provide stronger judgments but are costly to deploy and difficult to use as direct optimization signals. We propose Z-Reward, a teacher-student reward modeling framework that decouples reasoning-heavy judgment from efficient reward deployment. The teacher is a large VLM that uses reasoning to infer rubric-aligned score distributions, and is trained with Group-wise Direct Score Optimization (GDSO), which combines policy-gradient rewards from distribution expectations with direct pointwise and pairwise supervision on score distributions and score gaps. The student is trained with Reasoning-Internalized Score Distillation (RISD), which transfers the teacher's reasoning-conditioned score distribution into a compact VLM without requiring explicit reasoning chains at inference time. On our internally annotated evaluation set, the 27B GDSO teacher reaches 89.6% human preference accuracy, outperforming SFT, RewardDance, and GRPO, while the 9B RISD student reaches 88.6%, outperforming the OPD baseline and closely matching the larger teacher. We further show that Z-Reward can serve as a differentiable reward signal for text-to-image optimization, yielding a 41.3% net human-preference improvement over the SFT baseline.

06.
arXiv (CS.CL) 2026-06-18

Steerable Cultural Preference Optimization of Reward Models

It is essential for large language model (LLM) technology to serve many different cultural sub-communities in a manner that is acceptable to each community. However, research on LLM alignment has so far predominantly focused on predicting a unified response preference of annotators from certain regions. This paper aims to advance the development of alignment models with a more global outlook, that are able to accurately represent the preferences of subcommunities and do not exhibit excessive bias towards any of them. We focus on the development of reward models for this purpose and present a novel reward model training algorithm (SCPO) that can incorporate diverse cultural preferences in a balanced manner. Our method results in performance increases of the minority reward model of up to 7 points over the baseline model across two datasets, PRISM and GlobalOpinionQA, and across 7 countries. SCPO is up to 280% more training data-efficient than full-data finetuning of reward models. In addition, we perform analysis of bias by separately evaluating on the preference of subcommunities and show that excessive bias is mitigated via our weighting method. Our code is available at https://github.com/minsik-ai/Steerable-Cultural-Preference

07.
arXiv (CS.CV) 2026-06-17

Geometry-Consistent Endoscopic Representations for Image-Guided Navigation via Structured Foundation Model Adaptation

Accurate vision-based navigation in monocular endoscopy is difficult due to limited depth cues, weak tissue texture, non-rigid deformation, and substantial appearance variation across domains, all of which complicate pose estimation, depth prediction, and image-to-anatomy alignment. Although recent vision foundation models have shown promise, their learned representations often remain insufficiently geometry-consistent, hindering stable feature correspondence and limiting their reliability for downstream navigation tasks. We propose a unified framework for learning geometry-consistent and domain-robust image representations for monocular endoscopy. The framework combines a synthetic data pipeline that provides accurate geometric supervision with Hierarchy-Aware Geometry-Semantic Adaptation, a structured alternative to standard LoRA that inserts low-rank adapters selectively across the transformer hierarchy and couples them with layer-wise training objectives to encourage geometric correspondence in intermediate features and semantic consistency in deeper features. Experiments on public and proprietary datasets show improved geometric and semantic representation quality, leading to better performance on downstream navigation tasks including pose estimation and monocular depth estimation. The learned representations show favorable synthetic-to-real transfer on clinical bronchoscopy and provide a useful initialization for adaptation to sinus endoscopy and colonoscopy under limited supervision. The framework also shows favorable scaling with model size and training data. These results support hierarchy-aware, geometry-guided adaptation as a practical approach for endoscopic representation learning.

08.
arXiv (CS.CL) 2026-06-11

Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention

Activation steering can shift LLM behaviour, but standard evaluations do not typically test whether a sycophancy-reduction direction also suppresses agreement with factually correct statements. We introduce dual-stance evaluation, which tests both stances of each topic, and apply it to centroid-difference steering on Llama-3-8B-Instruct. We find a dissociation: the model represents sycophantic and factual agreement in geometrically distinct subspaces, yet the steering direction projects equally onto both and cannot differentially target either. The direction accordingly reduces agreement with factually correct statements (e.g. that the Earth is round) as well as sycophantic ones. All other static properties of the two activation groups are matched, suggesting the behavioural dissociation arises from generation dynamics or from finer-grained structure that residual-stream analysis cannot resolve. The pattern illustrates a general gap: representations that are readable from activations may not be writable through them.

09.
arXiv (CS.CV) 2026-06-18

Hallucination Detection and Correction in Medical VLMs via Counter-Evidence Verification

Vision-Language models (VLMs) reliability in medical diagnosis is challenged by trust-undermining hallucinations. Existing hallucination detection approaches mainly focus on identifying factual inconsistencies between generated text and reference data. While some studies analyze where models attend in images, they seldom verify whether such attention truly reflects the visual evidence supporting the generated text. To address this gap, we propose Co}unter-Evidence Verification (CoEV), a training-free plug-and-play framework that detects and corrects hallucinations through evidence-based factual consistency verification. CoEV performs bidirectional verification between textual assertions and visual evidence, testing whether each statement is supported by its corresponding evidence region, and assigns each statement into a four-quadrant diagnostic map capturing combinations of text factuality and visual grounding. CoEV detects hallucinated content and serves as a post hoc refinement tool, correcting hallucinations without retraining. Extensive experiments on four medical datasets show that CoEV combats hallucinations in VLMs.For hallucination detection, CoEV consistently outperforms existing methods, improving average PR-AUC and ROC-AUC by 3.0% and 3.9% absolute points respectively, with notable gains of up to 18.5% in specific VQA scenarios. For hallucination correction, it improves Micro-F1 by up to 12.5%, reduces hallucination rates by over 11.9% on medical report generation, and also boosts medical VQA accuracy. These results show that CoEV enables reliable detection and correction of hallucinations, providing clinicians with dependable, evidence-based cues for diagnosis. Code will be released upon acceptance.

10.
arXiv (CS.LG) 2026-06-11

SpAArSIST: Sparsified AASIST for Efficient and Reliable Anti-Spoofing

arXiv:2606.11674v1 Announce Type: cross Abstract: We present SpAArSIST, a deployment-oriented refinement of the widely used AASIST graph pooling backend for self-supervised learning (SSL) based anti-spoofing. Motivated by redundant operations in public implementations, we replace learned pooling and stack-node attention with explicit, lightweight choices: separate train and inference graph pooling ratios $(k_{\mathrm{tr}},k_{\mathrm{inf}})$, magnitude-based node scoring, and mean aggregation of graph nodes. The best overall configuration (rank 1) cuts backend compute by 20.7% (195.045M $\rightarrow$ 154.706M MACs) and model size by 4.1% (611.8k $\rightarrow$ 586.4k params), while improving out-of-domain robustness on In-the-Wild to 2.82% EER and 0.078 minDCF (from 4.64% and 0.133) and remaining competitive on ASVspoof5. We further provide a composite selection score that summarizes accuracy, calibration, and compute to support balanced deployment-oriented model choice.

11.
Nature (Science) 2026-06-10

Hybrid refinery process turns plant material into industrially important chemical

An ingredient of nylon has been made in high yields from lignin — revealing a fresh strategy for turning this complex plant biopolymer into industrial chemicals. An ingredient of nylon has been made in high yields from lignin — revealing a fresh strategy for turning this complex plant biopolymer into industrial chemicals.

12.
arXiv (CS.AI) 2026-06-19

See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View

arXiv:2606.20045v1 Announce Type: cross Abstract: UAV Vision-Language Navigation (UAV-VLN) is typically formulated as a holistic search-and-reach problem, where long-range target discovery and final target approach are optimized and evaluated jointly. This formulation makes it difficult to assess a critical capability of aerial embodied agents, namely whether a UAV can accurately ground a visible target and translate vision-language evidence into precise 3D motion once the target enters its field of view. To address this limitation, we introduce UAV-VLN-FOV, a target-visible navigation task that isolates the see-and-reach stage and enables a more diagnostic evaluation of terminal reaching ability. We further propose 3DG-VLN, a vision-language waypoint prediction framework guided by dynamic 3D direction cues to enhance fine-grained visual grounding and spatial direction alignment for precise target reaching. Specifically, 3DG-VLN adaptively processes high-resolution front-view and downward-view observations to preserve fine-grained visual and geometric details for target grounding. It also updates the target-relative direction online during closed-loop navigation, allowing the agent to maintain spatial alignment with the target and reduce accumulated direction drift. To support this task, we construct a dedicated high-resolution benchmark which contains 2,717 trajectories with target-oriented high-level instructions, high-resolution front-view and downward-view egocentric observations, and continuous 3D waypoint annotations. Experiments show that 3DG-VLN outperforms competitive UAV-VLN baselines, achieving a 13.82\% improvement in success rate. Real-world trials further demonstrate the potential of 3DG-VLN for practical see-and-reach navigation. The source code and benchmark are available at https://github.com/xuefanfu/3DG-VLN.

13.
arXiv (CS.AI) 2026-06-19

Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions

arXiv:2606.20459v1 Announce Type: new Abstract: IVF pregnancy rates are routinely modeled using patient-level variables, while high-resolution laboratory environmental data remain underutilized. We show that this is a missed opportunity. Rather than relying on raw sensor averages, we engineer 55 context-aware temporal features, including rolling thermal stability, simultaneous temperature-humidity adherence, peak stress duration, and post-stress recovery speed, that capture the dynamics of incubator microenvironments. On 61 weeks of data from an Asian IVF clinic, these features reduce cross-validated prediction error to 1.27%, compared to 3-5% for raw averages. We then train a hierarchical Bayesian Beta regression model that shares environmental effects across an Asian and a Northern European clinic via partial pooling, while preserving site-specific baselines. On held-out data from the Northern European clinic, the model achieves R2 = 0.86 and a 64% error reduction for the 35-39 age group over a naive baseline, demonstrating that structured environmental monitoring contains clinically meaningful, transferable signal.

14.
medRxiv (Medicine) 2026-06-11

Computer Vision for Real-Time Anatomical Navigation in Neurosurgery: First-in-Human Clinical Evaluation and Iterative Development (IDEAL Stage 1)

Introduction: Precise anatomical navigation is fundamental to safe endoscopic pituitary surgery, a high-stakes procedure characterised by a challenging learning curve. While traditional navigation systems often rely on workflow-disrupting probes or static preoperative imaging, advancements in computer vision AI (CVAI) now enable dynamic, real-time anatomical segmentation directly from live surgical video1-3. Our group has previously conducted a series of preclinical human-computer interaction studies to refine the system's design, alongside digital and high-fidelity physical simulations demonstrating the benefit of AI assistance in improving overall performance, training, and safety4-8. Building on this foundation, the current study represents a first-in-human application of real-time CVAI assistance in the neurosurgical operating room, serving to assess feasibility and safety, and to iteratively improve the system. Method: Guided by DECIDE-AI and IDEAL frameworks, this single-centre evaluation comprises an initial proof-of-concept phase (n=6) for endoscopic transsphenoidal pituitary surgeries. The AI model utilised a DINOv3-derived vision transformer architecture, deployed via a high-performance edge computing unit to achieve low-latency, real-time inference without reliance on cloud infrastructure2. Given the high-risk nature of the procedure and the early stage of clinical AI integration, the system was initially deployed as an educational adjunct on a secondary monitor, ensuring the primary surgical feed remains uncompromised. Functionality and safety were assessed via structured questionnaire, prospective observation, and blinded retrospective review of the recordings of the endoscopic surgical video feed and wider operating room environment. Continuous multi-stakeholder feedback through validated human factors surveys drove iterative technical refinements between cases. Results: Six patients with pituitary adenomas were enrolled. The CVAI system was successfully deployed in four cases, demonstrating acceptable real-time sella segmentation accuracy. Deployment failed pre-operatively in two cases owing to a single recurring system reboot bug. Iterative refinement between cases were driven by our experience and surgical team feedback. This resulted in the integration of additional anatomical structure segmentations (e.g., carotid arteries), enhanced model accuracy via training dataset expansion, and hardware firmware upgrades. Multi-stakeholder surveys demonstrated satisfactory system feasibility, usability, and acceptability among the surgical team. Both prospective observation and retrospective video review confirmed the absence of adverse events, including no significant distraction to the primary surgeon, and there were no AI-related clinical complications. Conclusion: This first-in-human early clinical evaluation demonstrates the feasibility, safety and iterative development of real-time, CVAI-based anatomical navigation during high-stakes neurosurgery. Future work will include a larger single-centre case series (IDEAL Stage 2a) with more surgical teams to further iterate the system and explore its impact on training and workflow. As the underpinning technology improves, deployment will transition to direct intra-operative decision support and integration with other intra-operative navigational technologies.

15.
arXiv (CS.LG) 2026-06-12

Limits of spectral learning under noise

arXiv:2606.13067v1 Announce Type: new Abstract: Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.

16.
Nature (Science) 2026-06-12

Daily briefing: How Venus flytraps snap shut

作者:

Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message. Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message.

17.
arXiv (quant-ph) 2026-06-16

Stronger Entanglement Dies Faster: Quantum Mpemba Effect in Dissipative Qubits

arXiv:2605.23197v3 Announce Type: replace Abstract: In classical thermodynamics, the Mpemba effect refers to the counterintuitive observation that hot water can freeze faster than cold water, manifesting as an anomalous crossing of dynamical trajectories. While analogues of this phenomenon have been explored in open quantum systems and spin-chain entanglement asymmetry, its connection to the finite-time decoupling of quantum correlations remains elusive. In this work, we report a distinct Mpemba effect for quantum entanglement in a dissipative quantum system associated with entanglement sudden death (ESD). By analyzing two qubits interacting with local amplitude damping reservoirs, we demonstrate that a more strongly entangled initial state can experience a faster collapse into a separable state than a more weakly entangled state. This anomalous decay stems from the competition between initial coherence and excited-state population, where the latter acts as a catalyst for ESD. We provide exact analytical derivations for the trajectory crossover and ESD time, and map the phase diagram to precisely identify the parameter regime where the effect occurs. Our results offer a new strategy for controlling the lifetime of quantum resources in dissipative environments.

18.
bioRxiv (Bioinfo) 2026-06-20

Evaluation of Trypanosoma brucei Phosphofructokinase Allosteric Inhibition: An In-Silico Study

Human African trypanosomiasis, caused by a protozoan parasite Trypanosoma brucei, is a neglected tropical disease for which well-tolerated, conveniently administered, and highly efficacious medicines are still missing. Previously, T. brucei Phosphofructokinase was targeted by small-molecule inhibitor development efforts. This approach has shown promise both in vitro and in vivo. In this study, we have used these wet-lab results, evaluated the compounds already characterised by Molecular Dynamics simulations, found relationships between in silico and wet-lab data and used these observations to evaluate compounds that we selected through several different approaches of virtual screens. We observed that inhibitor-ATP interactions are highly predictive of the inhibitory activity. Several compounds selected through virtual screens have outperformed previously characterised compounds.

19.
arXiv (CS.AI) 2026-06-16

Medical world models: representing medical states, modelling clinical dynamics and guiding intervention policies

arXiv:2606.16721v1 Announce Type: new Abstract: Medical diagnosis and treatment are dynamic processes in which patient states evolve over time and clinical interventions alter future outcomes. Although current medical AI can detect disease, estimate risk and generate reports, many systems still return static labels or scores, offering limited insight into how illness may progress or how alternative interventions may reshape its trajectory. Medical world models adapt the world-model idea from artificial intelligence to healthcare by learning internal simulators of patient-state dynamics. Their long-term goal is to help clinicians anticipate deterioration, compare treatment-conditioned futures and tailor care to individual patients. Yet relevant work remains scattered across foundation models, longitudinal modelling, disease simulation, treatment-effect estimation, reinforcement learning and digital twins. To bridge this gap, this review outlines a roadmap for advancing medical AI from isolated diagnosis and prediction toward medical world models that simulate disease evolution and support intervention decisions. This roadmap is organized around three coupled capabilities: patient-state construction, clinical dynamics modelling and intervention decision support. Across representative systems, the comparison highlights what each capability contributes and how partial components can be integrated into more mature perception–dynamics–planning systems. Finally, we identify the challenges involved in turning plausible rollouts into clinically useful simulators. Related literature is available at https://github.com/1999kevin/awesome_medical_world_models.

20.
arXiv (CS.CV) 2026-06-16

SP$^3$: Spherical Priors for Plug-and-Play Restoration

In this paper, we introduce SP$^3$, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP$^3$ approximates the intractable proximal prior step by utilizing the SE tightly structured latent space as a robust projection onto the natural image manifold. Alternating this projection with a closed-form data-consistency step, via Half-Quadratic Splitting, achieves stable convergence without requiring gradient computation during inference. This unique formulation unlocks "anytime" restoration capabilities, producing sharp, plausible images from the first iteration. Evaluations across a variety of image restoration tasks demonstrate that SP$^3$ achieves perceptual quality comparable to state-of-the-art zero-shot diffusion and flow methods while being $3$-$630\times$ faster.

21.
arXiv (CS.AI) 2026-06-11

MetaPlate: Counterfactual-Guided RAG-LLM Tool for Personalized Food Recommendation and Hyperglycemia Prevention

arXiv:2606.10120v2 Announce Type: replace-cross Abstract: Postprandial hyperglycemia is a key risk factor for metabolic disorders; however, existing dietary guidance is often static, impractical, and insufficiently personalized, providing recommendations that are difficult to follow or not impactful. While recent advances leverage continuous glucose monitoring (CGM) and machine learning to predict glycemic responses, these approaches are largely predictive and lack actionable guidance. Moreover, recommendation systems are often misaligned with user goals and require extensive input. We present MetaPlate, a counterfactual explanation (CF) guided, context-aware decision-support framework that generates personalized meal recommendations to mitigate postprandial glucose excursions in healthy adults. MetaPlate integrates multimodal data, including CGM readings, wearable-derived physiological signals, and user-provided meal inputs from $25$ individuals to model pre-meal context. A machine learning model predicts glucose response, while a CF optimization module adjusts meal composition modifying macronutrient amounts to maintain glucose levels within a target range ($\leq 140$ mg/dL). An LLM-based retrieval-augmented generation (RAG) layer enhances interpretability by producing human-readable recommendations using constrained search of the USDA food database. We evaluate MetaPlate via a structured expert-in-the-loop assessment with registered dietitians (RDs), comparing performance before and after prompt refinement. Results show improvements in meal realism, portion suitability, and recommendation likelihood, with expert feedback indicating a shift from clinically implausible outputs to actionable, contextually appropriate recommendations. Our findings emphasize the importance of domain knowledge and structured constraints in LLM-driven systems and highlight the potential of MetaPlate as a real-time personalized dietary decision-support tool.

22.
arXiv (CS.CL) 2026-06-12

Reward Modeling for Multi-Agent Orchestration

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating orchestration quality without human annotations. OrchRM leverages intermediate artifacts from multi-agent executions to construct win-lose pairs for Bradley-Terry reward model training. Unlike existing MAS test-time scaling and orchestrator training frameworks that rely on costly sub-agent rollouts, OrchRM operates directly at the orchestration level, enabling efficient and high-performing reward-guided orchestrator training and MAS test-time scaling. OrchRM improves training efficiency by up to 10x in token usage while improving MAS test-time scaling performance by up to 8% in accuracy. These gains consistently transfer across multiple domains, including mathematical reasoning, web-based question answering, and multi-hop reasoning, demonstrating orchestration-level reward modeling as a scalable direction for robust multi-agent orchestration. Code will be available at https://github.com/Wang-ML-Lab/OrchRM.

23.
arXiv (CS.CL) 2026-06-17

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments across LibriSpeech, Switchboard, AMI, and web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.

24.
arXiv (CS.AI) 2026-06-19

Denoising Implicit Feedback for Cold-start Recommendation

arXiv:2606.19658v1 Announce Type: new Abstract: Implicit feedback is widely used in recommender systems due to its accessibility and generality, yet it usually presents noisy samples (e.g., clickbait, position bias). Meanwhile, recommenders inevitably face the item cold-start problem due to the continuous influx of new items. We identify that cold items are more prone to noisy samples due to the aforementioned factors, and researchers often overlook the significance of denoising implicit feedback for cold items. Previous denoising studies usually identify noisy samples based on heuristic patterns, such as higher loss values, and mitigate noise through sample selection or re-weighting. However, these methods have limited adaptability and are ineffective in cold-start scenarios. To achieve denoising implicit feedback for cold-start recommendation, we propose a model-agnostic denoising method called DIF. First, user preferences for content remain stable, which allows us to infer pseudo-labels indicating whether a user is interested in a cold item through content-similar warm items. Furthermore, to improve pseudo-label accuracy, we model the confidence of pseudo-labels based on the content similarity between the cold item and warm items, and then aggregate multiple pseudo-labels for each sample. Finally, we explicitly estimate the uncertainty of the noisy sample label by considering its relative entropy and the cold-start status of the item, which adaptively guides the role of pseudo-labels to correct the noisy labels at the sample level. DIF's superiority is supported by both theoretical justification and extensive experiments on real-world datasets. The method has been deployed on a billion-user scale short video application Kuaishou and has significantly improved various commercial metrics within cold-start scenarios.

25.
arXiv (CS.LG) 2026-06-19

An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling

arXiv:2606.19770v1 Announce Type: new Abstract: We propose an information-theoretic framework for graph novelty generation, which aims to generate data that are distinct from existing patterns while preserving global structural consistency. Our approach embeds data into a latent space, models the latent distribution using finite mixture models, and generates novel samples by imposing explicit novelty and reliability conditions formulated in terms of description length. Specifically, novelty is enforced by requiring generated samples to be poorly explained by all existing mixture components, while reliability constrains their impact on the overall mixture structure under the Minimum Description Length (MDL) principle. We provide a theoretical analysis showing that, with appropriate threshold choices, the probabilities of misclassifying non-novel or unreliable samples converge to zero with explicit rates. Experiments on synthetic and benchmark graph datasets demonstrate that the proposed method enables principled novelty generation with quantifiable risk.