Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-15

Trimodal Glioma Representation Alignment via Volumetric Contrastive Learning

Glioma grading and survival prediction require the integration of heterogeneous information collected at different spatial and biological scales. Histopathology describes tissue morphology, mRNA expression captures molecular activity, and magnetic resonance imaging provides a non-invasive view of tumor extent and radiological heterogeneity. Existing glioma prognosis models often combine only two of these sources, while their alignment objectives remain mostly pairwise. This paper introduces GLORIA, a novel trimodal framework for GLioma Omics - Radiology - hIstopathology Alignment. GLORIA processes whole-slide image regions, gene-expression profiles, and 3D MRI volumes through modality-specific encoders, projects them into a shared latent space, and aligns them with a Gramian contrastive loss that measures the volume spanned by the three modality embeddings. The aligned representations are fused through a cross-modal gating module and optimized jointly for three-class glioma grading and overall survival prediction. We evaluate GLORIA on a matched TCGA-GBM/LGG and BraTS21 cohort, comprising 132 patients with all three modalities. On the shared trimodal test set, GLORIA improves over the bimodal WSI-mRNA baseline in all the metrics considered.

02.
arXiv (quant-ph) 2026-06-17

Hybrid Acousto-Optical Double Dressing of a Two-Level System

arXiv:2509.25847v2 Announce Type: replace Abstract: We experimentally investigate resonance fluorescence from a two-level system in a novel configuration where a strong laser drives an optical Rabi oscillation while an acoustic field parametrically modulates the frequency of the two-level system. We observe emission spectra that deviate markedly from the standard Mollow triplet, including dynamical cancellation of the central peak. A doubly dressed state model incorporating hybridization among the emitter, optical field, and acoustic field captures these features. Guided by this model, we experimentally validate the condition for optimal cooling of acoustic phonons in an emitter-optomechanical system. These results reveal new regimes of strongly driven quantum nonlinear interactions.

03.
arXiv (CS.AI) 2026-06-11

Conformal Risk-Averse Decision Making with Action Conditional Guarantee

arXiv:2606.05551v2 Announce Type: replace-cross Abstract: Reliable decision making pipelines powered by machine learning models require uncertainty quantification (UQ) methods that come with explicit safety guarantees. Conformal prediction provides such UQ by wrapping ML predictions into prediction sets, and recent work by Kiyani et al. (2025b) established that these sets can be translated into optimal risk-averse decision policies – yet only inheriting marginal safety guarantees. We generalize and strengthen their results by (i) introducing action-conditional conformal prediction, which yields safety guarantees conditioned explicitly on each action taken by the decision maker, (ii) showing that action-conditional prediction sets serve as a proxy for the feasible decision space for risk-averse decision makers aiming to optimize action-conditional value-at-risk, and (iii) proposing a principled finite-sample algorithm based on pinball-loss minimization, connecting the framework of Gibbs et al. (2025) to action-conditional guarantees. Experiments on two real-world datasets confirm that our approach significantly improves action-conditional performance over conformal baselines.

04.
arXiv (quant-ph) 2026-06-12

Exotic critical states as fractional Fermi seas in the one-dimensional Bose gas

arXiv:2602.17656v2 Announce Type: replace-cross Abstract: Critical quantum field theories occupy a central position in modern theoretical physics for their inherent universality stemming from long-range correlations. As an example, the Tomonaga-Luttinger liquid (TLL) describes a wealth of one-dimensional quantum systems at low temperatures. Its behavior is deeply rooted in the emergence of an effective Fermi sea, leading to power-law correlations and Friedel oscillations. A promising direction to realize systems exhibiting novel universal behavior beyond TLL is through the generalization of the underlying Fermi sea. In this Letter, we show that fractional Fermi seas with reduced occupancy arise in an integrable Bose gas driven out of equilibrium by cyclic changes in interactions from repulsive to attractive. The correlation functions feature signatures of criticality incompatible with a conventional TLL, suggesting a novel critical phase. Our predictions, based on Generalized Hydrodynamics, are directly relevant to cold atoms.

05.
arXiv (CS.AI) 2026-06-19

Reward as An Agent for Embodied World Models

arXiv:2606.19990v1 Announce Type: new Abstract: While RL has become a promising tool for refining world models, existing methods largely rely on conservative rollouts near the training distribution, limiting exploration, behavioral diversity, and richer dynamic discovery. In this work, we challenge this conservative paradigm. We argue that the core limitation is not exploration itself, but the lack of reliable verification strategies to support broader exploration. Without reliable verification, expanded exploration becomes highly susceptible to reward hacking, where policies exploit imperfect rewards without achieving genuine improvement. To evaluate this motivation, we instantiate our method in embodied world models, where physical plausibility, and task completion provide a rigorous testbed for scalable RL under complex dynamics. On the verification side, we introduce Reward as an Agent, an agentic reward framework that actively evaluates generated behaviors to provide robust reward signals and mitigate reward hacking under distribution shifts. On the exploration side, we introduce Dynamic-Aware Rollout Diversification through DynDiff-GRPO, which explicitly expands action-space exploration to diversify trajectories, broaden state-action coverage, and encourage richer embodied behaviors beyond conservative rollout regimes. By unifying Reward as an Agent with DynDiff-GRPO, we enable RL on a more reliable reward foundation with substantially diversified sampling, effectively mitigating reward hacking while yielding significant accuracy gains across multiple open-source world models, thereby demonstrating that broader exploration can scale successfully when grounded in robust verification.

06.
bioRxiv (Bioinfo) 2026-06-14

Somatic variant detection in normal tissues from single-cell sequencing data

A crucial advantage of single-cell sequencing (SCS) is its ability to identify somatic variants in individual cells, enabling phylogenetic analysis of cellular populations within bulk tissues. While identifying somatic variants in tumor tissues via SCS has become a common practice, doing so in normal tissues remains challenging due to the rarity of somatic variants in normal cells. To evaluate the feasibility of somatic variant calling from widely available single-nucleus RNA-seq (snRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, we profiled a Cell-line mix of six HapMap samples prepared by the SMaHT consortium using 10x Genomics 5' snRNA-seq (12k cells with 36k mean reads per cell) and snATAC-seq (11k cells with 14k median high-quality fragments per cell) for variant calling. PacBio long-read whole genome sequencing (WGS) data (109x) generated from individual cell lines were used as ground truth. Two computational tools, Monopogen and SComatic, were used for somatic variant calling from the SCS data. Monopogen achieved single nucleotide variant (SNV) detection accuracies of 93.30% in the snRNA-seq and 99.64% in the snATAC-seq data, both of which outperformed SComatic (74.35% and 94.29%, respectively). Monopogen also consistently detected somatic SNVs at cellular fractions as low as 0.5% (2.54% in snRNA and 0.81% in snATAC) in individual samples. Notably, snATAC-seq exhibited higher genomic coverage breadth and larger number of variants detected than snRNA-seq. While the SCS data have lower overall genome coverage than that of the bulk WGS, the single-cell level variant resolution allows Monopogen to assign variants to their cells of origin with over 80% accuracy in both RNA and ATAC modalities, thereby facilitating studies of clonal evolution and cell-type-specific mutagenesis. Other benchmarking methods were also evaluated (DeepVariant, Cellsnp-lite and Mutect2) for comparison. In conclusion, our study demonstrated the feasibility of performing reliable single-cell somatic mutation calling in a cell-line mixture and discussed the strengths and limitations of current computational methods when applied to normal tissues.

07.
arXiv (math.PR) 2026-06-12

Mixing times of one-sided $k$-transposition shuffles

arXiv:2112.05085v2 Announce Type: replace Abstract: We study mixing times of the one-sided $k$-transposition shuffle. We prove that this shuffle mixes relatively slowly, even for $k$ big. Using the recent ``lifting eigenvectors'' technique of Dieker and Saliola and applying the $\ell^2$ bound, we prove different mixing behaviors and explore the occurrence of cutoff depending on $k$.

08.
arXiv (CS.CV) 2026-06-18

GUMP-Net: An interpretable model-data-driven intelligent algorithm for multi-class pelvic segmentation

Pelvic segmentation is one of the most important and fundamental research problems in precise and intelligent diagnosis and treatment, as well as surgical planning and navigation for pelvic fractures. By combining an improved geodesic active contour model with deep neural networks, we propose GUMP-Net, an interpretable model-data-driven intelligent algorithm for multi-class pelvic segmentation, in which three network modules are designed to constitute the overall segmentation framework together: the object detection module for automatic level set initialization, the edge detector module for learning an anatomy-aware edge detector function and the iteration module for deep level set evolution. Leveraging the advantages of level set representation and deep learning, GUMP-Net shows more accurate, robust and consistent segmentation performance, especially in small training data situation, compared to the state-of-the-art methods. Extensive experiments on pelvic datasets demonstrate the rationality and effectiveness of the proposed algorithm. Further experiments extended to ankle dataset indicate broader applications to other anatomies. The proposed algorithm not only provides an efficient segmentation method for complex fracture reduction, but also gives an interpretable geometric perspective for understanding deep learning segmentation.

09.
medRxiv (Medicine) 2026-06-16

Validating an Early Pregnancy HbA1c as the Screening Test for Gestational Diabetes Mellitus: Findings from PRISMA Pakistan Cohort

Background: Early identification of gestational diabetes mellitus (GDM) is critical to improving maternal and neonatal outcomes, particularly in resource-constrained settings where universal oral glucose tolerance testing (OGTT) is burdensome. We assessed whether early-pregnancy HbA1c alone or combined with common risk factors can predict GDM and reduce the burden of OGTT requirements in a peri-urban cohort in Karachi, Pakistan. Methods: We conducted a secondary analysis of the Pregnancy Risk Infant Surveillance and Measurement Alliance (PRISMA) Pakistan cohort. Women enrolled before 20 weeks' gestation with available early-pregnancy HbA1c and a 2-hour 75g OGTT at 24 to 28 weeks were included. We externally validated GDM prediction models originally developed in the STRiDE-India cohort. Model performance was evaluated using receiver operating characteristic (ROC) curves and area under the curve (AUC). We assessed four models: HbA1c alone (Model 1a); age, BMI, and family history of diabetes mellitus (FH DM) (Model 1b); HbA1c combined with age, BMI, and FH DM (Model 2); and an extended model, i.e., Model 2 combined with socioeconomic status, gestational age, parity, systolic and diastolic blood pressure (Model 3). A dual-threshold approach was applied to assess rule-in and rule-out performance. Results: Among 2,489 women, GDM incidence was 7.5% (n=186). Models with a broader set of predictors demonstrated higher AUC values, with Model 2 achieving an AUC of 0.61 (95% CI: 0.57, 0.66). Including additional factors (Model 3) did not further improve predictive ability (AUC: 0.62; 95% CI: 0.58, 0.66). In addition, at predefined thresholds, Model 2 achieved sensitivity of 73.7% (rule-out) and specificity of 83.5% (rule-in), with the potential to reduce OGTT requirements (58.5%). Conclusions: Early-pregnancy risk stratification using HbA1c combined with simple clinical predictors offers a pragmatic approach to streamline GDM screening among high-risk pregnant women. A dual-threshold strategy using Model 2 could reduce reliance on universal OGTT while prioritizing high-risk women for confirmatory testing.

10.
arXiv (CS.AI) 2026-06-16

How Much Do Reviews Really Contribute? A Study on Text-Enriched Matrix Factorization for Recommendations

arXiv:2606.16973v1 Announce Type: cross Abstract: Incorporating textual reviews into a Recommender System has become a prominent strategy for enriching collaborative signals with semantic information. However, the actual contribution of review-derived representations remains an open question, particularly when strong collaborative baselines are employed. In this work, we systematically investigate the impact of textual information on Matrix Factorization by introducing and comparing three enrichment strategies over a common collaborative backbone. First, we propose a learnable gating mechanism that adaptively balances collaborative and textual signals during training. This mechanism is applied to two distinct review representations: (i) aggregated topic profiles extracted from user and item histories, and (ii) full text embedding representations derived from reviews. Additionally, we explore a cross-attention mechanism that identifies and emphasizes the most informative dimensions of the textual representation before fusion with collaborative factors. We evaluate six variants: pure, enriched with topic profiles and text via gating; enriched with topics and text via gating; and enhanced with cross-attention over textual features. Experiments across multiple review-based datasets reveal that although adaptive fusion mechanisms improve representation flexibility, the marginal contribution of textual signals remains limited compared to the collaborative backbone. These findings suggest that, under typical rating-prediction settings, collaborative information continues to dominate performance, raising important considerations for the effective integration of semantic review signals into recommendation models.

11.
arXiv (CS.CV) 2026-06-12

DuET: Dual Expert Trajectories for Diffusion Image Editing

Recent diffusion editors perform diverse instruction-based edits while conditioning on the source image at every denoising step. Yet persistent source-image conditioning can limit how fully an edit is executed and how natural the result appears, especially when the target scene diverges substantially from the input. We introduce DuET (Dual Expert Trajectories), a training-free inference method that temporarily relaxes source-image conditioning by transitioning through a text-to-image phase before returning to edit mode, allowing the denoising trajectory to move toward the target distribution while retaining the structural benefits of image-conditioned editing. Without modifying model weights or increasing sampling cost, DuET consistently improves instruction relevance, semantic fidelity, and perceptual quality across diverse models and benchmarks. In some cases, these gains come with a modest reduction in source-image preservation, revealing a predictable trade-off between source preservation and edit fidelity.

12.
arXiv (CS.LG) 2026-06-11

MASK: Multi-Agent Semantic K-Scheduling for Risk-Sensitive 6G Robotics

arXiv:2606.11249v1 Announce Type: cross Abstract: Realizing the vision of 6G connected robotics requires reconciling high-performance collaborative control with the rigid spectral limitations of physical wireless channels. In realistic collaborative sensing scenarios, spectral resources are quantized into finite physical resource blocks or orthogonal subcarriers, rendering simultaneous transmission by all agents infeasible. To address this, we propose Multi-Agent Semantic K-Scheduling (MASK), a control architecture designed to sustain robust, risk-aware coordination under strict instantaneous bandwidth caps. We introduce Arbiter-Assisted Semantic Information Gating (A-SIG), a lightweight coordination mechanism that enforces hard access constraints by scheduling only the top-K agents based on locally computed semantic importance scores. By aggregating these prioritized observations into a compact latent state, a self-supervised global encoder enables a distributional policy to mitigate tail risks despite data sparsity. We evaluate MASK across diverse benchmarks, demonstrating that it matches the performance of communication-unconstrained baselines even when channel access is restricted to a small fraction of the swarm size. Furthermore, the framework exhibits inherent resilience to packet erasures, validating semantic scheduling as a critical enabler for resource-constrained 6G systems.

13.
arXiv (CS.CV) 2026-06-24

A Dual Edge Spatial Jacobian Image Graph for Interpretable Diabetic Retinopathy Grading

Automated diabetic retinopathy (DR) grading from colour fundus photographs can achieve strong predictive performance, but clinical interpretation requires more than an image-level label. It requires understanding how lesion evidence is distributed around retinal vessels and how this evidence relates to quantitative vascular biomarkers. We present a dual-edge spatial-Jacobian image graph for interpretable DR grading. Each fundus image is represented as a graph node with four aligned evidence streams: AutoMorph vessel information ($X_1$), DR-XAI-style lesion evidence maps ($X_2$), a 128-dimensional lesion-based contrastive image embedding ($X_3$), and AutoMorph morphometric biomarkers ($X_4$). The spatial edge branch ($X_{12}$) encodes vessel-lesion geometry, while the Jacobian branch ($X_{34}$) models embedding-biomarker sensitivity. Lightweight two-token attention fuses both edge families into a final image graph. On 2,910 matched non-augmented APTOS images, the full graph achieves 0.8076 accuracy, 0.8312 quadratic weighted kappa, 0.5915 macro-F1, and 0.9330 adjacent-grade accuracy; referable DR reaches 0.9055 accuracy and 0.9711 AUROC. The framework is positioned as an explainable representation-learning tool for lesion-biomarker hypothesis generation, rather than as a deployment-ready clinical classifier. The code is available at https://github.com/Inamullah-Colab/dual-edge-dr-graph-xai.

14.
arXiv (CS.CL) 2026-06-11

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

Authors:

User-side memory in LLMs is typically scored as a single "personalization" capability: given a user's history, is the output more user-aware? We show this aggregate metric hides opposite-direction failures. Memory factorises into at least three orthogonal axes – behavioral consistency (style, voice), factual presence (recall facts in history), and factual absence (abstain when a fact is absent) – and no single substrate wins all three. Comparing per-user gamma-LoRA (a small LoRA adapter trained on each user's history; gamma denotes per-user, not per-task) against BGE-large dense top-K retrieval on a controlled 50-user synthetic corpus and a real-data probe (LaMP-3), we find gamma-LoRA decisively wins behavioral style while RAG decisively wins factual absence – and the same query-projection cells in attention layers 21-35 causally load-bear both effects in opposite directions (zeroing those LoRA weights raises absence-probe TPR by +33 pp and drops presence-probe TPR by 20 pp). On the more heavily RLHF-tuned Llama-3.1-8B-Instruct the asymmetry strengthens, not heals: parametric memory's behavioral advantage collapses while its absence-calibration deficit against retrieval widens – an alignment tax on parametric user-memory. On real-data LaMP-3, gamma-LoRA underperforms a majority baseline; a 9-condition mitigation sweep diagnoses this as instruction-following collapse, not substrate failure (a 9x2 cross-product shows the eval-time {1..5} logit mask drives main_acc to >=0.995 on every recipe), and the best training-time fix replicates bit-identically on Llama. Finally, substrate-selection routing is question-classification, not calibration: a 110M DistilBERT on the question text alone beats every logit-based router. We contribute the diagnostic framework, the diagnosed real-data negative, the alignment-tax replication, and the routing-as-classification finding.

15.
arXiv (quant-ph) 2026-06-11

Tight Bounds for Quantum Phase Estimation and Related Problems

arXiv:2305.04908v3 Announce Type: replace Abstract: Phase estimation, due to Kitaev [arXiv'95], is one of the most fundamental subroutines in quantum computing. In the basic scenario, one is given black-box access to a unitary $U$, and an eigenstate $\lvert \psi \rangle$ of $U$ with unknown eigenvalue $e^{i\theta}$, and the task is to estimate the eigenphase $\theta$ within $\pm\delta$, with high probability. The cost of an algorithm for us is the number of applications of $U$ and $U^{-1}$. We tightly characterize the cost of several variants of phase estimation where we are no longer given an eigenstate, but are required to estimate the maximum eigenphase of $U$, aided by advice in the form of states (or a unitary preparing those states) which are promised to have at least a certain overlap $\gamma$ with the top eigenspace. We give algorithms and nearly matching lower bounds for all ranges of parameters. We show that a small number of copies of the advice state (or of an advice-preparing unitary) are not significantly better than having no advice at all. We also show that having lots of advice (applications of the advice-preparing unitary) does not significantly reduce cost, and neither does knowledge of the eigenbasis of $U$. We immediately obtain a lower bound on the complexity of the Unitary recurrence time problem, resolving an open question of She and Yuen~[ITCS'23]. Lastly, we study how efficiently one can reduce the error probability in the basic phase-estimation scenario. We show that a phase-estimation algorithm with precision $\delta$ and error probability $\epsilon$ has cost $\Omega\left(\frac{1}{\delta}\log\frac{1}{\epsilon}\right)$, matching an easy upper bound. This contrasts with some other scenarios in quantum computing (e.g., search) where error-probability reduction costs only a factor $O(\sqrt{\log(1/\epsilon)})$. Our lower bound uses a variant of the polynomial method with trigonometric polynomials.

16.
arXiv (CS.LG) 2026-06-15

Utility-Constrained Policy Optimization

arXiv:2606.14029v1 Announce Type: new Abstract: Constrained MDPs (CMDPs) are a widely adopted framework for incorporating safety into RL agents; however, the framework does not support risk-sensitive constraints. This can be problematic: For example, CMDPs allow for optimal solutions that, in order to satisfy the risk-neutral constraints, mix infrequent catastrophic behaviors and frequent, overly conservative ones. Moreover, prior empirical results suggest that enforcing stricter, risk-sensitive constraints can improve performance even under risk-neutral evaluation. The natural framework to incorporate risk-sensitive constraints is utility-constrained MDPs (UCMDPs), but no practical solutions for this problem existed. In this work, we introduce a simple yet powerful methodology for UCMDPs and constrained RL. Besides allowing for risk-sensitive constraints, our framework does not require us to fix constraint limits in advance of training the agent, provided that a sensible range is known. This increases policy flexibility and, in practice, allows for adjustments to these limits at no extra training cost. Besides benefiting from the generality of the framework, our agent shows strong performance in practice, consistently matching or outperforming existing baselines in several Safety Gymnasium benchmark tasks.

17.
arXiv (CS.LG) 2026-06-16

Dynamic Link Prediction with Temporally Enhanced Signed Graph Neural Networks

arXiv:2605.26290v2 Announce Type: replace Abstract: Temporal signed networks (TSNs) model the time evolution of cooperative and adversarial relationships that arise in applications such as social media analysis, trust and reputation systems, and financial transaction networks. While graph neural networks (GNNs) perform well for static or unsigned link prediction, effective learning in temporal signed graphs remains challenging due to the interaction of signed relations, evolving structure, and balance-theoretic constraints. To address this gap, we propose a modular temporal enhancement framework for signed GNNs that integrates historical context into otherwise static architectures. The framework introduces a Historical Context Integration Module (HCIM) that combines learnable recency-aware temporal weighting, LSTM-based embedding trajectory modeling, and multi-head temporal attention to capture both short- and long-term signed interaction dynamics. Historical information is fused with current node representations using either global or node-adaptive weighting, allowing the architecture-agnostic framework to accommodate heterogeneous temporal behaviors. We instantiate the approach on the Self-Explainable Signed Graph Transformer (SE-SGformer), preserving interpretability while extending it with temporal awareness. Experiments on real-world and synthetic TSNs, including Bitcoin OTC, Bitcoin Alpha, Reddit, and small-world network models, demonstrate consistent and statistically significant improvements over the static baseline.

18.
arXiv (CS.CV) 2026-06-16

Multi-view feature High-order Fusion for Space Weak Object Detection and Segmentation

Weak objects are common in images and videos of space applications. However, it is hard to learn proper representations from their limited appearance information. Inspired by multi-view learning, we develop simple multi-view attentions, treating their outputs as multi-view features. We also propose a multi-view feature high-order fusion method (MHF) to aggregate more accurate and richer features of weak objects. Our MHF extends the commonly used low-order feature fusion method to higher orders. It enhances the model's capacity to capture relevant and complementary information about weak objects. This is achieved by introducing high-order multi-view features perception and a recursive task-contribution gated selection of multi-view features. The new operation is highly flexible and customizable. It is compatible with various variants of multi-view feature representations. We conduct extensive experiments on two newly constructed space science datasets and an open, large-scale satellite video dataset. Our MHF serves as a plug-and-play module and significantly improves various vision transformers and convolution-based detection and segmentation models. We achieve all state-of-the-art accuracies on both tasks across three datasets. Our MHF can be a new basic module for visual modeling that effectively represents weak objects in terms of multi-view learning. The code will be available at https://github.com/Kingdroper/MHF.

19.
arXiv (CS.CL) 2026-06-19

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - depends heavily on the wording of the prompt. While literature on prompt engineering is expanding, few studies focus on classification tasks, and even fewer address domains like psychology, where constructs have precise, theory-driven definitions that may not be well represented in pre-training data. We present an empirical framework for optimizing LLM performance for identifying constructs in texts via prompt engineering. We experimentally evaluate five prompting strategies – codebook-guided empirical prompt selection, automatic prompt engineering, persona prompting, chain-of-thought reasoning, and explanatory prompting - with zero-shot and few-shot classification. We find that persona, chain-of-thought, and explanations do not fully address performance loss accompanying a badly worded prompt. Instead, the most influential features of a prompt are the construct definition, task framing, and, to a lesser extent, the examples provided. Across three constructs and two models, the classifications most aligned with expert judgments resulted from a few-shot prompt combining codebook-guided empirical prompt selection with automatic prompt engineering. Based on our findings, we recommend that researchers generate and evaluate as many prompt variants as feasible, whether human-crafted, automatically generated, or ideally both, and select prompts and examples based on empirical performance in a training dataset, validating the final approach in a holdout set. This procedure offers a practical, systematic, and theory-driven method for optimizing LLM prompts in settings where alignment with expert judgment is critical.

20.
arXiv (CS.AI) 2026-06-12

A Minimal Model of Bounded Trade-Off Screening in Multi-Attribute Choice

arXiv:2606.13201v1 Announce Type: new Abstract: Human decision-making often involves choosing between multi-attribute alternatives, yet classical models assume fully compensatory utility aggregation despite evidence that people reject options with poor performance on critical attributes. We propose a bounded trade-off reasoning framework in which decisions are governed by a screening process that evaluates the balance between gains and losses across attributes. The model introduces a trade-off tolerance parameter that controls acceptable imbalance and can vary across contexts. Through simulation, we show that this mechanism produces preference patterns that differ from standard utility-based models and captures context-dependent variation in trade-off behavior. These results establish bounded trade-off screening as a plausible computational mechanism for multi-attribute choice and generate testable predictions for future behavioral studies.

21.
arXiv (CS.CL) 2026-06-18

RedactionBench

Large Language Models are increasingly applied to sensitive domains that require redaction of personally identifiable information (PII). While redacting PII is a data cleaning prerequisite, existing benchmarks conflate extraction mechanics with privacy semantics. A public phone number is not equivalent to a phone number in a medical record. Whether information constitutes a violation depends heavily on who holds it, why, and in what context, fundamentally differentiating redaction from simple entity recognition. Grounded in contextual integrity, we introduce RedactionBench, a manually annotated benchmark comprising 200 diverse documents across 11 domains, mostly seeded from real-world sources. We also introduce R-Score, a novel character-level metric that treats semantically similar redactions equally and nullifies shallow formatting choices, such as varying masking styles for phone numbers. Evaluations across Named Entity Recognition models, entity extraction Small Language Models, and frontier models equipped with agentic tools demonstrate that contextual redaction remains an unsolved problem. A human evaluation with over 80 users on RedactionBench reveals a stark dichotomy in privacy perceptions. Annotators show consensus with target labels for mandatory redactions (89.4 percent) and safe text preservations (94.1 percent), but fail to agree on contextual redactions (47.7 percent). This variance demonstrates the subjective nature of contextual privacy and motivates R-Score, which decouples contextual ambiguity from strict precision. We compare 35 models across families and report their performance in redacting PII. Finally, we release RedactionBench to establish a baseline for future privacy-preserving systems, hoping to inspire efficient model design and standardized evaluations.

22.
arXiv (CS.LG) 2026-06-11

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

arXiv:2606.11798v1 Announce Type: cross Abstract: In this paper, we develop a continuous-time model-free reinforcement learning algorithm to learn deterministic equilibrium policies in general time-inconsistent control problems. Utilizing the extended Hamilton-Jacobi-Bellman system, we recast the original time-inconsistent problem into an equivalent two-stage problem. In the first stage, for given auxiliary functions, we employ the deterministic policy gradient approach to learn an optimal policy in an auxiliary time-consistent control problem. In the second stage, given the updated policy, we exploit the inner fixed point iterations and some martingale characterizations to learn the auxiliary functions. As a theoretical contribution, we provide some mild model assumptions and establish the convergence of inner fixed point iterations. By repeating this actor-critic style of iterations across two stages, our algorithm aims to learn the equilibrium under different sources of time-inconsistency in a unified manner. The superior effectiveness of the proposed algorithm are illustrated in two classical financial applications with time-inconsistency: mean-variance portfolio management and optimal tracking portfolio under non-exponential discounting.

23.
Nature Medicine 2026-06-17

General-purpose chatbots outperform clinical AI tools on physicians’ real-world questions

Authors: Unknown Author

Specialized clinical AI tools are entering medical practice with little independent testing. In a head-to-head evaluation across two public benchmarks and real questions from physicians, three general-purpose frontier large language models outperformed two leading clinical AI tools, which performed no better than Google search AI overview.

24.
arXiv (CS.CV) 2026-06-17

GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning

Generalist vision-language-action systems need object-centric 3D evidence and reusable manipulation experience to plan reliable robot trajectories. GeneralVLA provides a hierarchical interface for converting language and RGB-D observations into 3D end-effector paths, but two bottlenecks remain. First, monocular SAM3D-style object reconstruction can hallucinate pose and unseen geometry, while manipulation benefits from stable object shape when calibrated multi-view observations are available. Second, the original KnowledgeBank mainly retrieves semantically similar snippets and appends new knowledge, which makes it difficult to control memory quality, conflicts, confidence, and geometric relevance. To address the first challenge, we introduce GeoFuse-MV3D, a geometry-prior-guided MV-SAM3D reconstruction branch that verifies external geometry cues with input-view masks, applies soft visual-hull support, performs axis-wise refinement, and fuses only geometry while preserving appearance. To address the second challenge, we upgrade KnowledgeBank into a governed long-term memory system with explicit quality, confidence, lifecycle, verifier, and conflict metadata, together with precision-oriented retrieval. Finally, we evaluate the reconstruction branch on GSO-30 and the memory module on Terminal-Bench 2.0 and SWE-Bench Verified; GeoFuse-MV3D improves over the MV-SAM3D baseline by reducing CD and LPIPS by 2.20% and 2.02% while increasing PSNR and SSIM by 2.36% and 1.03%, and KnowledgeBank improves over ReasoningBank by 4.53% on Terminal-Bench SR and 3.73% on SWE-Bench resolve rate, while reducing AS by 4.95% and 5.65%, respectively. Code: https://github.com/AIGeeksGroup/GeneralVLA-2. Website: https://aigeeksgroup.github.io/GeneralVLA-2.

25.
arXiv (CS.CV) 2026-06-19

SA-VIS: Sparse frame Annotations for training Video Instance Segmentation

Recent online video instance segmentation (VIS) methods have achieved impressive results, thus becoming the preferred approach to segment instances in videos. Despite the resurgence of impressive single image models, the online (or semi-online) VIS approaches outperform single-image models (e.g., based on SAM) by using long sequences of densely annotated frames during training. However,such a training setup of VIS is expensive in the sense of compute as well as dense annotations required. In order to solve these major flaws, we argue that the effective modeling of the instances and their evolution in videos do not require densely annotated frames. To that end, we propose a simple and effective module, called Past-frames Feature Propagation (PFP) which aggregates low-dimensional features from the image encoder of multiple frames. This simple low-compute module provides tremendous learning capability in using sparse video frame labels for end-to-end training. Combined with a light-weight frame-specific Instance Queries, our Sparse frame Annotation VIS (SA-VIS) significantly improves performance over its baseline. Most interestingly, our simple design that avoids complexities effectively bridges the gap in accuracy between training on sparsely and densely annotated video sequences. This translates to a mere 0.4% drop in performance of SA-VIS when using annotations for only 1/5 of the images in the dataset. Empirically, SA-VIS shows strong improvements over the baseline on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS) and an over 1% improvement in AP on the state-of-the-art in a limited annotations scenario.