Academic Intelligence · Curated Daily

Explore the frontiers of global research

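AcademicHub aggregates real-time literature from leading journals and preprint platforms. Customize your own research radar and let large language models automatically generate briefs that analyze literature across disciplines.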

01.
arXiv (CS.CL) 2026-06-12

Language Model Circuits Are Sparse in the Neuron Basis

The high-level concepts that a neural network uses to perform computation need not be aligned to individual neurons (Smolensky, 1986). Language model interpretability research has thus turned to techniques which decompose the neuron basis into more interpretable units of model computation, such as sparse autoencoders (SAEs). However, not all neuron-based representations are uninterpretable. For the first time, we empirically show that MLP neurons are as sparse a feature basis as SAEs. We use this finding to develop an end-to-end gradient-based attribution pipeline for circuit tracing on the MLP neuron basis, which surfaces causally effective neurons on a variety of tasks. On a standard subject-verb agreement benchmark (Marks et al., 2025), a circuit of $\approx 10^2$ MLP neurons is enough to control model behaviour. On the multi-hop city-state-capital task from Lindsey et al. (2025), we find a circuit in which small sets of neurons encode specific latent reasoning steps (e.g. mapping a city to its state), and can be steered to change the model's output. This work thus advances automated interpretability of language models without imposing additional training costs.
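
One common form of gradient-based attribution scores each neuron by its activation times the gradient of the output with respect to that activation. The sketch below applies this score to a hypothetical one-layer toy MLP with a linear readout and keeps the highest-scoring neurons as a candidate circuit. The weights, `neuron_attributions`, and `top_k_neurons` are illustrative assumptions, not the paper's pipeline.

```python
from typing import List


def relu(x: float) -> float:
    return max(0.0, x)


def neuron_attributions(x: List[float], w_in: List[List[float]], w_out: List[float]) -> List[float]:
    """Activation-times-gradient score for each hidden neuron of a toy one-layer MLP.

    Hypothetical model: y = sum_i w_out[i] * relu(w_in[i] . x).
    dy/da_i = w_out[i], so neuron i's first-order attribution is a_i * w_out[i].
    """
    activations = [relu(sum(w * xj for w, xj in zip(row, x))) for row in w_in]
    return [a * g for a, g in zip(activations, w_out)]


def top_k_neurons(attributions: List[float], k: int) -> List[int]:
    """Indices of the k neurons with the largest absolute attribution (a candidate circuit)."""
    order = sorted(range(len(attributions)), key=lambda i: abs(attributions[i]), reverse=True)
    return order[:k]


attrs = neuron_attributions(
    x=[1.0, 2.0],
    w_in=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]],
    w_out=[0.5, -2.0, 4.0, 0.1],
)
print(top_k_neurons(attrs, k=2))  # [1, 0]
```

Neuron 2 has the largest readout weight but receives zero attribution because it is inactive on this input. With a linear readout the scores sum exactly to the output, which is why activation times gradient is a natural first-order estimate; in a real language model the gradient would come from automatic differentiation rather than a closed form.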

02.
arXiv (CS.LG) 2026-06-17

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

arXiv:2606.06227v2

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative $17\%$ under honest accounting.

03.
arXiv (CS.CL) 2026-06-16

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

With the rapid progress of speech language models (SLMs), discrete speech tokens have emerged as a core interface between speech and text, enabling unified modeling across modalities. Recent speech tokenization approaches aim to isolate semantic information from low-level acoustics to better align with language models (LMs). In particular, previous methods use self-supervised learning (SSL) teachers such as HuBERT to extract semantic representations, which are then distilled into a semantic quantizer to suppress acoustic redundancy as well as capture content-related latent structures. However, these tokenizers often operate at relatively high frame rates, producing token sequences significantly longer than their textual counterparts and hindering seamless integration with pretrained LMs. Although recent methods attempt to reduce the token rate by applying uniform average pooling to SSL features, this can over-smooth content-bearing regions and dilute the structural information, thereby potentially limiting the LM alignment. To address this, we propose LM-SPT, an LM-aligned speech tokenization method based on semantic speech-resynthesis distillation. Instead of directly matching teacher and student features via pooling, LM-SPT resynthesizes speech from semantic tokens only and minimizes the discrepancy between representations extracted from the original and resynthesized waveforms using a frozen, LM-aligned speech encoder. This indirect supervision avoids rigid temporal alignment and encourages the learning of dedicated semantic units that are more semantically aligned with LMs under reduced frame rates. Experimental results show that the proposed LM-SPT consistently outperforms previous semantic-enhanced speech tokenizers when applied to SLMs for the tasks of automatic speech recognition and text-to-speech, without compromising the speech reconstruction fidelity at the codec level.

04.
arXiv (CS.CV) 2026-06-24

EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation

Sound effects build an essential layer of multimodal storytelling, shaping the emotional atmosphere and the narrative semantics of videos. Despite recent advances in video-text-to-audio (VT2A) generation, the current formulation faces three key limitations: first, an imbalance between visual and textual conditioning that leads to visual dominance; second, the absence of a concrete definition for fine-grained controllable generation; and third, weak instruction understanding and following, as existing datasets rely on brief categorical tags. To address these limitations, we introduce EchoFoley, a new task designed for video-grounded sound generation with both event-level local control and hierarchical semantic control. Our symbolic representation for sounding events specifies when, what, and how each sound is produced within a video or instruction, enabling fine-grained controls such as sound generation, insertion, and editing. To support this task, we construct EchoFoley-6k, a large-scale, expert-curated benchmark containing over 6,000 video-instruction-annotation triplets. Building upon this foundation, we propose EchoVidia, a sounding-event-centric agentic generation framework with a slow-fast thinking strategy. Experiments show that EchoVidia surpasses recent VT2A models by 40.7% in controllability and 12.5% in perceptual quality.

05.
arXiv (CS.CL) 2026-06-18

Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science

Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to excessive length and information redundancy. Therefore, this paper proposes a segment combination strategy that partitions the full-text content according to its physical position. Using an annotated corpus of 1,954 full-text articles from three representative journals in Library and Information Science (JASIST, LISR, and JDoc), we evaluate the classification performance of various segments and their combinations across multiple models. Experimental results indicate that methodological information is distributed unevenly within the full-text content, with the middle-to-late and final segments exhibiting greater discriminative power. Furthermore, integrating bibliographic metadata with cross-segment combination strategies effectively enhances classification performance.
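
The position-based segmentation can be sketched as follows, assuming the full text is already tokenized and cut into k contiguous segments of near-equal length. The function names, the choice of k, and the metadata format are hypothetical, since the abstract does not specify how segment boundaries are set.

```python
from typing import List, Optional, Sequence


def split_by_position(tokens: Sequence[str], k: int) -> List[List[str]]:
    """Split a full text into k contiguous segments of near-equal length, in reading order."""
    n = len(tokens)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(tokens[bounds[i]:bounds[i + 1]]) for i in range(k)]


def combine_segments(segments: List[List[str]], chosen: Sequence[int],
                     metadata: Optional[List[str]] = None) -> List[str]:
    """Concatenate optional bibliographic metadata with the chosen segments, in order."""
    combined: List[str] = list(metadata or [])
    for i in chosen:
        combined.extend(segments[i])
    return combined


tokens = [f"t{i}" for i in range(10)]
segments = split_by_position(tokens, k=5)
# Metadata plus the last two segments of the document
print(combine_segments(segments, chosen=[3, 4], metadata=["TITLE"]))
# ['TITLE', 't6', 't7', 't8', 't9']
```

The combined token list would then be fed to whichever classifier is being evaluated; comparing different `chosen` sets is one way to measure how much each region contributes.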

06.
medRxiv (Medicine) 2026-06-19

The Impact of Pregnant Women's Dietary Behavior on the Physiological Adaptation Paradox and Maternal-Fetal Resource Conflict in Conflict Settings: A Predictive Analytical Study

This scientific study aims to assess the level of awareness, nutritional knowledge, and actual behavioral practices among pregnant women in the Capital District of Sanaa, Republic of Yemen, and to determine their impact on the health and clinical indicators of the mother and fetus under complex conflict conditions. The study employed a descriptive-analytical approach based on a simple random sample of 200 pregnant women attending government-run hospitals and specialized medical centers in the Capital District. Field data were collected during December 2025 using a structured and validated questionnaire consisting of 42 items measuring demographic variables, awareness, practices, barriers, and health outcomes. The results of the statistical analysis using SPSS software showed a high level of nutritional awareness (87%) and healthy dietary practices (80%) among the sample participants. Simple and multiple linear regression tests revealed a statistically significant effect of awareness and practices in explaining 20.2% of the variance in the health status of the mother and fetus (R² = 0.204, p < 0.001). The study demonstrated that actual behavioral practices have greater predictive power (β = 0.316, p = 0.001) compared to theoretical cognitive awareness (β = 0.232, p = 0.005) in determining clinical outcomes for the mother and fetus, highlighting the widening gap between knowledge and behavior under structural pressures. "Morning sickness" (80%) and the deterioration of "family economic status" (71%) emerged as the greatest physiological and material barriers to proper nutrition.
With their inferential impact established as an extension of the maternal-fetal resource allocation conflict in a physiologically and economically challenging environment, the study also identified significant differences in nutritional behavior and health outcomes in favor of housewives and mothers who are more educated and have higher incomes, while no significant differences were recorded attributable to obstetric variables such as stage or order of pregnancy. The study offers a unique theoretical and practical contribution by formulating an integrated causal model that demonstrates that the fetus acts as a biological drain on the mother's cellular and mineral reserves in a war environment, which necessitates directing antenatal care and support programs toward effective behavioral empowerment and nutritional support to overcome the structural and material barriers faced by pregnant women.

07.
arXiv (CS.AI) 2026-06-12

Humor Style Drives Laughter, Topic Shapes Acceptability: Evaluating Bilingual Personal and Political Robot-Delivered AI Jokes

arXiv:2606.13256v1

Humor plays a central role in human social relationships, and recent advances in computational humor create new opportunities for integrating humor into human-robot interaction (HRI). While large language models (LLMs) can generate diverse forms of humor, it remains unclear how humor style, joke content, and language preference shape perceptions of robot-delivered humor in group settings. In this exploratory study, we employed a mixed factorial design in which participants evaluated AI-generated jokes delivered by a robot in a university classroom. We examined the effects of humor type (Affiliative, Self-Enhancing, Aggressive, Self-Defeating) and joke content (person-related vs. political) on perceived funniness and appropriateness, as well as preferred language. Results show that humor type significantly influences funniness, with Aggressive and Affiliative humor rated higher, while joke content primarily affects appropriateness, with person-related jokes preferred over political ones. Language preference was shaped by both joke content and participants' self-reported fluency and humor practices.

08.
arXiv (CS.LG) 2026-06-24

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

arXiv:2603.08287v2

We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is a heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either have a sub-optimal growth rate, require strong smoothness assumptions, or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. We then use the chaining method to control the regret suffered by GP-PSRL under weak smoothness conditions. Our main result is a Bayesian regret bound of the order $\widetilde{\mathcal{O}}(H\sqrt{\gamma_T T})$, where $H$ is the horizon, $T$ is the number of time steps and $\gamma_T$ is the expected information gain. With this result, we resolve the limitations of prior theoretical work on PSRL and provide the theoretical foundation and tools for analyzing PSRL in complex settings.

09.
arXiv (quant-ph) 2026-06-15

Multi-entropy in random tensor networks

arXiv:2606.04470v2

We study the evaluation of Rényi multi-entropies $S^{(q)}_n$ in random tensor network (RTN) states in the large-bond-dimension limit. For Rényi index $n=2$ and an arbitrary number of parties $q$, we prove that the multi-entropies are determined by minimal multiway cuts through the network. When the minimal multiway cut is degenerate, we characterize the full minimizer set via compatible families of minimal cuts and give a criterion for all minimizers to come from ordinary cut partitions. For $n=2$, this gives a natural generalization of the minimal-cut description of bipartite entanglement to multipartite systems with arbitrarily many parties. For integer $n>2$, we show that the minimal multiway cut conjecture is not true in general by providing explicit counterexamples both for a single random tensor and for networks built from isometric tilings. We discuss the implications of our results for multipartite entanglement structures in RTNs and holography.
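
For small networks, the minimal multiway cut can be found by brute force. The sketch below assumes the usual random-tensor-network convention that each edge is weighted by the logarithm of its bond dimension, fixes each boundary leg to its party, and tries every assignment of bulk tensors to parties. The graph encoding and `min_multiway_cut` are illustrative, and the sketch computes only the cut weight, not the multi-entropy itself.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (vertex, vertex, weight = log of bond dimension)


def min_multiway_cut(bulk: List[str], terminals: Dict[str, int],
                     edges: List[Edge], q: int) -> float:
    """Brute-force minimal multiway cut weight separating q parties.

    `terminals` fixes each boundary vertex to its party (0..q-1); each bulk tensor
    may join any party. An edge is cut when its endpoints lie in different parties.
    Runtime is q ** len(bulk), so this is only for toy networks.
    """
    best = float("inf")
    for labels in product(range(q), repeat=len(bulk)):
        party = dict(terminals)
        party.update(zip(bulk, labels))
        cost = sum(w for u, v, w in edges if party[u] != party[v])
        best = min(best, cost)
    return best


# One random tensor T with legs to parties A, B, C; A's leg has a larger bond dimension.
edges = [("T", "A", 3.0), ("T", "B", 1.0), ("T", "C", 1.0)]
print(min_multiway_cut(["T"], {"A": 0, "B": 1, "C": 2}, edges, q=3))  # 2.0
```

With q = 2 the same search reduces to the ordinary minimal cut used for bipartite entanglement.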

10.
medRxiv (Medicine) 2026-06-16

Adherence to Red Reflex and Vision Screening Recommendations: A Deep Dive into Primary Care Implementation Gaps

Introduction: Early childhood vision screening is critical for detecting amblyopia and other vision-threatening conditions. Despite recommendations to screen during well-child visits, screening rates remain low. Red reflex assessment is recommended to identify serious ocular pathology, yet its use in primary care is not well described. We examined rates and drivers of vision screening in pediatric primary care. Methods: We conducted a retrospective review of electronic health records for children aged 3 to 5 years attending well-child visits in 2022 at one of three representative primary care clinics within a university health system. Outcomes were documented red reflex assessments and functional vision tests. We evaluated associations with patient demographics and clinic site using multivariable logistic regression. Results: Among 1,003 visits, 21.1% (n=212) had a documented red reflex assessment and 60.8% (n=610) a functional vision test. Younger children (ages 3 and 4 vs. 5 years) had higher odds of red reflex assessment [adjusted odds ratio (aOR) 9.00 and 8.64] and lower odds of a functional vision test (aOR 0.47 and 0.59). Females had higher odds of red reflex assessment (aOR 1.53). Other/Multiracial children had lower odds of red reflex assessment than Non-Hispanic White children (aOR 0.48). Screening rates varied significantly by clinic site. Conclusions: Visual function and red reflex assessments are inconsistently performed in pediatric primary care, with particularly low rates of red reflex documentation. Screening rates varied between clinics and were affected by age. These findings highlight missed opportunities for early detection of vision-threatening conditions and identify targets for improving adherence to pediatric vision screening recommendations.

11.
medRxiv (Medicine) 2026-06-10

Human-centred design approaches to health facility design: Evidence from perinatal care settings in Ethiopia and Bangladesh

While significant progress has been made in perinatal outcomes over recent decades in low- and middle-income countries (LMICs), maternal and newborn quality improvement initiatives often fail to account for the spatial conditions in which they are implemented. Health systems are increasingly deploying evidence-based care models into built environments that are not optimally structured to meet the needs of their patient populations. As the principal users, patients and health care workers can offer pragmatic insights about improving these structural designs. Our objective was to gather insights from patients, providers, and companions about how the physical design of their health facilities influenced their experience receiving or delivering perinatal care. We conducted a prospective observational study using a human-centred design (HCD) approach to analyse perceptions of the quality of perinatal care across two low resource settings: Ethiopia and Bangladesh. Using engagement and assessment tools, we conducted interviews, focus groups, facility walk-throughs, co-design workshops, and infrastructural assessments with patients, companions, providers, and Ministry of Health representatives. Descriptive statistics and thematic analysis were used to identify key learnings and develop recommendations. Across both countries, participants identified the need for facility layouts that better support privacy, mobility during labour, alternative birth positions, companion involvement, cultural and religious practices, sanitation, and provider visibility. Based on these insights, we developed six recommendations to better align health facility infrastructure with maternal and newborn care delivery needs. Our findings suggest that investments in health facility infrastructure may improve care experiences and help enable respectful, safe, and evidence-based maternal and newborn care.
Alongside targeted spatial improvements, government authorities responsible for health facility planning should incorporate participatory design processes to ensure infrastructure reflects the needs of patients, companions, and providers and supports high-quality care delivery.

12.
arXiv (CS.CL) 2026-06-12

LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis

Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simultaneously provide realistic patient simulation, clinician-verified diagnostic labels, and support for dynamic multi-turn consultation. We present LingxiDiagBench, a large-scale multi-agent benchmark that evaluates LLMs on both static diagnostic inference and dynamic multi-turn psychiatric consultation in Chinese. At its core is LingxiDiag-16K, a dataset of 16,000 EMR-aligned synthetic consultation dialogues designed to reproduce real clinical demographic and diagnostic distributions across 12 ICD-10 psychiatric categories. Through extensive experiments across state-of-the-art LLMs, we establish key findings: (1) although LLMs achieve high accuracy on binary depression–anxiety classification (up to 92.3%), performance deteriorates substantially for depression–anxiety comorbidity recognition (43.0%) and 12-way differential diagnosis (28.5%); (2) dynamic consultation often underperforms static evaluation, indicating that ineffective information-gathering strategies significantly impair downstream diagnostic reasoning; (3) consultation quality assessed by LLM-as-a-Judge shows only moderate correlation with diagnostic accuracy, suggesting that well-structured questioning alone does not ensure correct diagnostic decisions. We release LingxiDiag-16K and the full evaluation framework to support reproducible research at https://github.com/Lingxi-mental-health/LingxiDiagBench.

13.
arXiv (CS.CV) 2026-06-18

SVHighlights: Towards Extremely Long Sport Video Highlight Detection

While highlight detection for long-form videos is of great practical importance, most existing methods remain limited to short-form content, largely due to the absence of a suitable benchmark. To bridge this gap, we introduce SVHighlights, to the best of our knowledge, the first benchmark for highlight detection in extremely long sports videos, each exceeding one hour in duration, across multiple sports categories. SVHighlights is constructed from pairs of full-length sports videos and their corresponding official highlight videos using a dataset generation pipeline, enabling scalable label generation without conventional per-clip saliency annotation. The benchmark comprises 320 videos with an average duration of 2.00 hours and a total of 640.18 hours, substantially exceeding previous datasets. Existing methods also face fundamental challenges on long videos: models trained on short clips fail to generalize to hour-long content, and their clip-level scoring lacks the broader context needed to identify highlights. To address this and provide a strong baseline, we present TF-SELECTOR, a training-free segment-based approach that divides each video into context-aware segments by merging adjacent shots sharing the same semantic content, and predicts segment-level saliency scores using a large language model with multimodal inputs including visual captions, transcripts, and audio volume. Experiments demonstrate that TF-SELECTOR achieves superior performance across most metrics compared to Video Temporal Grounding (VTG)-tuned baselines, with improvements of +2.50 in HIT@1, +4.04 in HIT@K, and +2.95 in IoU. These results establish SVHighlights as a challenging testbed for long-form highlight detection and demonstrate that a simple segment-based strategy can effectively scale to hour-long videos.
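
The segment-building step of TF-SELECTOR, merging adjacent shots that share the same semantic content, can be sketched as a single pass over the shot list. Here the shared content is assumed to be summarized by one discrete label per shot, and a fixed score table stands in for the multimodal LLM scorer. `Shot`, `Segment`, and both functions are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Shot:
    start: float  # seconds
    end: float
    label: str    # hypothetical semantic tag, e.g. a caption cluster id


@dataclass
class Segment:
    start: float
    end: float
    label: str


def merge_shots(shots: List[Shot]) -> List[Segment]:
    """Merge runs of adjacent shots that share the same semantic label into one segment."""
    segments: List[Segment] = []
    for shot in shots:
        if segments and segments[-1].label == shot.label:
            segments[-1].end = shot.end  # extend the current segment
        else:
            segments.append(Segment(shot.start, shot.end, shot.label))
    return segments


def top_segments(segments: List[Segment], score: Callable[[Segment], float], k: int) -> List[Segment]:
    """Rank segments by a saliency score (stand-in for the LLM scorer) and keep the top k."""
    return sorted(segments, key=score, reverse=True)[:k]


shots = [Shot(0, 10, "crowd"), Shot(10, 25, "crowd"), Shot(25, 40, "goal"),
         Shot(40, 50, "replay"), Shot(50, 60, "replay")]
segments = merge_shots(shots)
toy_scores = {"goal": 0.9, "replay": 0.5, "crowd": 0.1}  # made-up scores
print([(s.label, s.start, s.end) for s in segments])
# [('crowd', 0, 25), ('goal', 25, 40), ('replay', 40, 60)]
print(top_segments(segments, lambda s: toy_scores[s.label], k=1)[0].label)  # goal
```

Scoring whole segments rather than fixed-length clips is what gives each scoring call the surrounding context that, according to the abstract, clip-level methods lack.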

14.
arXiv (CS.CL) 2026-06-11

Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering

Full-duplex spoken language models (FD-SLMs) enable seamless speech interaction by allowing models to listen and speak simultaneously, yet the internal mechanism by which they coordinate listening and speaking remains underexplored. We analyze the predictive behavior encoded in FD-SLM hidden representations and find that they exhibit stream-specific predictive patterns: during listening, they preferentially predict the incoming user stream, whereas during speaking, they preferentially predict the model output stream. Building on this observation, we show that FD-SLMs dynamically modulate their internal predictive focus between two states: a generative state aligned with model output generation and a perceptive state aligned with incoming user input. However, this modulation can lag behind abrupt changes in conversational context. During user interruptions, the model remains transiently biased toward the generative state before transitioning into the perceptive state, causing it to miss the beginning of the incoming input. We term this delayed internal transition state inertia. To quantify its downstream impact, we introduce the Zero-Buffer Benchmark (ZBB), a diagnostic benchmark for evaluating immediate interruption comprehension when user speech begins abruptly. We evaluate this setting using response correctness and initial-word occurrence rate (IWOR). Finally, we mitigate state inertia through activation steering with a perception vector, a training-free intervention with little additional computational overhead. Across multiple state-of-the-art FD-SLMs, activation steering substantially improves interruption handling; for example, on PersonaPlex, it improves correctness from 28% to 45% and IWOR from 40% to 72% without any fine-tuning.
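
Activation steering of this kind is often implemented by adding a scaled direction vector to a hidden state. The sketch assumes the perception vector is the difference between mean hidden states collected in perceptive and generative states (a common difference-of-means recipe; the abstract does not say how the vector is derived) and uses plain Python lists in place of model activations. `alpha` is a hypothetical steering strength.

```python
from typing import List, Sequence

Vec = List[float]


def mean_vector(vectors: Sequence[Vec]) -> Vec:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def perception_vector(perceptive: Sequence[Vec], generative: Sequence[Vec]) -> Vec:
    """Difference of mean hidden states: perceptive-state mean minus generative-state mean."""
    return [p - g for p, g in zip(mean_vector(perceptive), mean_vector(generative))]


def steer(hidden: Vec, direction: Vec, alpha: float) -> Vec:
    """Shift a hidden state along the steering direction: h' = h + alpha * v."""
    return [h + alpha * v for h, v in zip(hidden, direction)]


v = perception_vector(perceptive=[[1.0, 0.0], [3.0, 2.0]],
                      generative=[[0.0, 1.0], [0.0, 3.0]])
print(v)                                # [2.0, -1.0]
print(steer([0.5, 0.5], v, alpha=0.5))  # [1.5, 0.0]
```

Because steering only adds a fixed vector at inference time, it needs no gradient updates, which is consistent with the training-free, low-overhead intervention the abstract describes.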

15.
arXiv (quant-ph) 2026-06-15

Merged amplitude encoding for Chebyshev quantum Kolmogorov–Arnold networks: trading qubits for circuit executions

arXiv:2603.02818v3

Quantum Kolmogorov–Arnold networks based on Chebyshev polynomials (CCQKAN) evaluate each edge activation function as a quantum inner product, creating a trade-off between qubit count and the number of circuit executions per forward pass. We introduce merged amplitude encoding, a technique that packs the element-wise products of all $n$ input-edge vectors for a given output node into a single amplitude state, reducing circuit executions by a factor of $n$ at a cost of only 1–2 additional qubits relative to the sequential baseline. The merged and original circuits compute the same mathematical quantity exactly; the open question is whether they remain equally trainable within a gradient-based optimization loop. We address this question through numerical experiments on 10 network configurations under ideal, finite-shot, and noisy simulation conditions, comparing original, parameter-transferred, and independently initialized merged circuits over 16 random seeds. Wilcoxon signed-rank tests show no significant difference between the independently initialized merged circuit and the original ($p > 0.05$ in 28 of 30 comparisons), while parameter transfer yields significantly lower loss under ideal conditions ($p < 0.001$ in 9 of 10 configurations). On 10-class digit classification with the $8\times8$ MNIST dataset using a one-vs-all strategy, original and merged circuits achieve comparable test accuracies of 53–78% with no significant difference in any configuration. These results provide empirical evidence that merged amplitude encoding preserves trainability under the simulation conditions tested.

16.
PLOS Medicine 2026-05-15

Spatial transcriptomic-metabolic features of tumor foci and tumor capsule in microvascular invasion with hepatocellular carcinoma: A spatial multi-omics study

Authors: Zhi-Hui Luo, Na Wang, Jingwei Zhao, Fei Long, Si Wu, Wei Zhong, Wei-Ming Chen, Bicheng Wang, Kun Wang, Yufeng Yuan, Jingjiao Zhou, Chunhui Yuan, Fubing Wang

Background: Microvascular invasion (MVI) is closely related to the recurrence and metastasis of hepatocellular carcinoma (HCC), but the underlying cellular mechanism remains largely elusive. This study aims to elucidate the regional cellular discrepancy between MVI-positive (MVI+) and MVI-negative (MVI−) HCC by integrating spatial transcriptomics (ST) and spatial metabolomics (SM).

Methods and findings: ST and SM were performed on six tissue samples from four patients (including 2 MVI+, 2 MVI−, and 2 paratumor tissues), with the integration of 79 public single-cell RNA sequencing datasets of HCC. Patient identity was used as a covariate in the linear equation for regional differentially expressed gene analysis with the ST data. Clinical validation was conducted through multiplex immunofluorescence staining in 79 patients, together with external validation in The Cancer Genome Atlas (TCGA) liver hepatocellular carcinoma (LIHC) cohort (n = 299) and an independent microarray dataset (n = 62). For cell-type-specific metabolic profiling, spatial transcriptomic-metabolic registration was performed. The functional roles of key metabolites were further validated in vitro using inflammatory cancer-associated fibroblasts (iCAFs) derived from hepatic stellate cells (HSCs) and primary CAFs through co-culture models and various functional assays assessing cell proliferation, migration, and invasion. In the tumor lesion, a malignant STMN1+HMGN2+GPC3+ cell subtype enriched in MVI+ HCC was identified, which exhibited enhanced proliferative activity and was associated with poor prognosis. This finding was further confirmed in a local cohort of 79 patients, where multiplex immunofluorescence staining for the three genes (STMN1, HMGN2, and GPC3) showed significantly higher expression in the MVI+ group than in the MVI− group (p = 0.046). Integrated SM analysis further revealed that this cell population underwent metabolic reprogramming characterized by suppressed glycerolipid metabolism. In the tumor capsule, iCAF-related genes were downregulated in MVI+ cases, and iCAFs were located distally from the tumor boundary. Spatial metabolite mapping showed a strong correlation between taurine and iCAFs, and functional assays demonstrated that taurine promotes HCC proliferation and migration by suppressing iCAF activity. One limitation of this study is the small sample size of spatial omics data, which hinders a more complete molecular functional analysis of the STMN1+HMGN2+GPC3+ cell subtype and iCAFs in MVI+ HCC. Larger-scale ST cohorts are required to further validate and expand the findings of this study.

Conclusions: This integrative spatial atlas proposes the hypothesis that a highly proliferative and metabolically reprogrammed malignant cell subtype exists in the tumor lesion of MVI+ HCC, and that taurine in the tumor capsule modulates iCAF activity to influence tumor progression. The exploratory results provide mechanistic insights into MVI-related HCC progression and offer potential avenues for targeted therapeutic intervention in MVI+ HCC.

17.
arXiv (CS.AI) 2026-06-11

When Context Returns: Toward Robust Internalization in On-Policy Distillation

arXiv:2606.11627v1

Recent work has shown that on-policy distillation can internalize privileged context, such as system prompts or task hints, into a student model so that the context is no longer needed at inference time. Although this approach successfully improves the student's no-context performance, we identify an interesting and previously unstudied phenomenon: in many settings, reintroducing the original privileged context to the distilled student actually degrades its performance, even on instances it already solves correctly without context. We term this context-induced degradation and argue that robust internalization demands not only matching the teacher's context-conditioned behavior, but also remaining stable when the context is reintroduced, a property we call context removability. Motivated by this observation, we propose a lightweight consistency regularizer that first anchors the student's no-context output via stop-gradient, then penalizes the context-conditioned output for deviating from it via forward KL divergence. This simple addition requires only one extra forward pass per training step, yet it effectively mitigates context-induced degradation and, in many cases, even improves no-context performance. Across 12 configurations spanning diverse domains and model families, our method improves context-conditioned accuracy in the majority of settings, reduces context-induced harm in 11 out of 12 settings, and effectively eliminates response-length inflation. A mechanistic case study further confirms that context removability is achieved at the representation level, with hidden states remaining nearly identical regardless of whether the context is present.
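
A minimal sketch of the consistency regularizer for a single next-token distribution, reading "forward KL" as KL(anchor ‖ context-conditioned output) with the no-context output as the anchor. Plain Python has no automatic differentiation, so the stop-gradient appears only as a comment; in a real training loop the anchor would be detached. The function names and the per-token framing are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_penalty(logits_no_ctx: List[float], logits_ctx: List[float]) -> float:
    """Penalize the with-context distribution for drifting from the no-context one.

    The no-context distribution is the fixed anchor (target). In an autodiff framework
    it would be wrapped in a stop-gradient (e.g. jax.lax.stop_gradient, or .detach()
    in PyTorch), so gradients flow only through the with-context branch.
    """
    anchor = softmax(logits_no_ctx)
    return forward_kl(anchor, softmax(logits_ctx))


print(consistency_penalty([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0
print(consistency_penalty([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

The extra cost is one additional forward pass to obtain both sets of logits, matching the overhead stated in the abstract; the penalty is zero exactly when adding the context leaves the output distribution unchanged.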

18.
arXiv (quant-ph) 2026-06-15

Certifying Macroscopic Quantum Mechanics via Hypothesis Testing with Finite Data

arXiv:2506.22092v2

We address the challenge of certifying quantum behavior with single macroscopic massive particles, subject to decoherence and finite data. We propose a hypothesis testing framework that distinguishes between classical and quantum mechanics based on position measurements. While interference pattern visibility in single-particle quantum superposition experiments has been commonly used as a sufficient criterion to falsify classical mechanics, we show that, from a hypothesis testing perspective, it is neither necessary nor efficient. Focusing on recent proposals to prepare macroscopic superposition states of levitated nanoparticles, we show that the likelihood ratio test, which leverages differences across the entire probability distribution, provides an exponential reduction in the number of measurements needed to reach a given confidence level. These results generalize to a broad class of quantum states and offer a principled, efficient method to falsify classical mechanics in interference experiments, relaxing the experimental constraints faced by current efforts to test quantum mechanics at the macroscopic scale.
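
The contrast between a visibility criterion and a likelihood ratio test can be made concrete on binned position data: the test uses the probability of every bin under each hypothesis rather than only the fringe contrast. The toy four-bin distributions below are made up for illustration, not derived from a levitated-nanoparticle model, and the threshold rule is a simple conservative choice that is not necessarily the one used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def log_likelihood_ratio(samples: Sequence[int], p_quantum: Dict[int, float],
                         p_classical: Dict[int, float]) -> float:
    """log L_Q - log L_C for i.i.d. binned position measurements."""
    counts = Counter(samples)
    return sum(n * math.log(p_quantum[b] / p_classical[b]) for b, n in counts.items())


def reject_classical(llr: float, alpha: float) -> bool:
    """Reject the classical hypothesis if the likelihood ratio is at least 1/alpha.

    Under the classical model the likelihood ratio has expectation at most 1, so by
    Markov's inequality this rule has type-I error at most alpha.
    """
    return llr >= math.log(1.0 / alpha)


# Toy 4-bin distributions: "fringes" (quantum) vs flat (classical). Not physical models.
p_q = {0: 0.4, 1: 0.1, 2: 0.4, 3: 0.1}
p_c = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
llr = log_likelihood_ratio([0, 2, 0, 2], p_q, p_c)  # 4 * log(1.6), about 1.88
print(reject_classical(llr, alpha=0.2), reject_classical(llr, alpha=0.05))  # True False
```

Each additional detection adds its own log-ratio term, so the evidence accumulates linearly in the number of measurements and the error probability can fall exponentially, which is the intuition behind the efficiency claim above.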

19.
arXiv (CS.CV) 2026-06-24

Compact Object-Level Representations with Open-Vocabulary Understanding for Indoor Visual Relocalization

Indoor visual relocalization plays a critical role in emerging spatial and embodied AI applications. However, prior research has focused predominantly on low-level vision schemes, which struggle to perceive scene semantics and composition, limiting both interpretability and applicability. In this paper, we explore how to organize rich object information in a scene, including semantics, layout, and geometry, into a structured map representation, and then use object units exclusively to drive camera relocalization. To this end, we propose OpenReLoc, a camera relocalization system that provides both scene understanding and accurate pose estimation. Leveraging recent foundation models, we first introduce a multi-modal mechanism that integrates open-vocabulary semantic knowledge for effective 2D-3D object matching. Additionally, we design object-oriented reference frames as position priors, paired with a reference frame selection strategy based on Distance-IoU (DIoU), enabling extension to larger scenes. Moreover, to ensure stable and accurate pose optimization, we propose a dual-path 2D Iterative Closest Pixel loss guided by object shape. Experimental results demonstrate that OpenReLoc achieves superior relocalization recall and accuracy across various datasets. Our source code will be released upon acceptance.
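The abstract does not spell out the selection rule, so the following is only a sketch. It implements the standard Distance-IoU for axis-aligned 2D boxes: IoU minus the squared centre distance divided by the squared diagonal of the smallest enclosing box. It also adds a hypothetical `select_reference_frame` that picks the candidate frame whose object boxes best match the query. The box pairing and the mean-DIoU scoring are assumptions, not OpenReLoc's method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU: IoU minus squared centre distance over squared enclosing diagonal."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centres.
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return iou - rho2 / (cw ** 2 + ch ** 2)


def select_reference_frame(query_boxes: List[Box], frames: List[List[Box]]) -> int:
    """Hypothetical rule: index of the frame with the highest mean DIoU over paired boxes."""
    def score(frame: List[Box]) -> float:
        return sum(diou(q, f) for q, f in zip(query_boxes, frame)) / len(query_boxes)
    return max(range(len(frames)), key=lambda i: score(frames[i]))
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, DIoU still ranks them by how far apart their centres are.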

20.
Nature (Science) 2026-06-17

Optical fibre gripper for high-performance 3D micromanipulation


Optical tweezers offer precise, non-contact control, but they operate in a limited force regime and impose strict requirements on the characteristics of the targets as well as on the environmental conditions [1–4]. Millimetre-scale mechanical tweezers can offer higher gripping forces but are not suitable for precise manipulation [5–11]. Integrating microgrippers directly at the optical fibres provides a new approach for precise micromanipulation. However, existing fibre-integrated tweezers still face challenges in achieving high-performance manipulation of micro-objects (for example, single cells) within narrow spaces, mainly due to simplified architectures, constrained designs and millimetre-scale footprints [12–14]. Here we report a three-dimensional (3D) optical fibre gripper (OFG), which is fabricated by two-step two-photon polymerization. The OFG consists of rigid photoresist microclaws and a soft thermoresponsive hydrogel muscle doped with silver nanoparticles, and its size is only 38 × 38 × 61 μm³. The OFG exhibits a force-to-mass ratio of about 340 μN mg⁻¹, outperforming previously reported fibre-integrated tweezers by one to two orders of magnitude. The OFG can manipulate opaque particles, irregular micromechanical components and diverse single-cell types. We further demonstrate its potential in the 3D microassembly of complex microdevices (bearings, shafts and gearboxes) and in biomimetic sampling in narrow environments (<300 μm). These results position the OFG as a compact fibre-tip manipulator for 3D micromanipulation, offering reversible and tunable gripping in an intermediate force regime between optical field trapping and millimetre-scale mechanical tweezers. A miniature three-dimensional optical fibre gripper enables powerful, precise micromanipulation of particles and single cells in confined spaces, bridging the gap between optical and mechanical tweezers.

21.
arXiv (CS.AI) 2026-06-17

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

arXiv:2606.18108v1 Announce Type: cross Abstract: We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a dataset of 110 NL/SQL pairs. We propose a step-by-step generation framework comprising four modules: schema linking, query classification, prompt decomposition, and self-correction. The performance of thirteen LLMs is evaluated using in-context learning and prompt engineering techniques. Text-to-SQL performance is assessed using the perfect-match (PM) rate for row identifiers (e.g., object identifiers) and column identifiers (i.e., column names). The proposed step-by-step framework consistently outperforms a direct-inference baseline, while the self-correction module consistently reduces execution errors. For Claude Opus 4.6, PM performance on row (column) identifiers is high for simple queries, reaching 0.97 (0.94), and decreases with query complexity to 0.44 (0.72) for medium queries and 0.59 (0.49) for hard queries. Among the thirteen evaluated models, the best-performing LLMs for the text-to-SQL task are Claude Opus 4.6, Gemini 2.5 Pro, Gemini 3 Flash, and GPT-5.2-Codex.
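The self-correction module can be pictured as an execute-and-retry loop. The sketch below assumes a hypothetical `generate_sql(question, feedback)` callable standing in for the LLM. It uses an in-memory SQLite table with made-up columns rather than the real ALeRCE schema. When the database raises an error, the loop passes the error text back to the generator for another attempt.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical LLM interface: (question, error feedback or None) -> SQL string.
SqlGenerator = Callable[[str, Optional[str]], str]


def query_with_self_correction(conn: sqlite3.Connection, question: str,
                               generate_sql: SqlGenerator,
                               max_attempts: int = 3) -> Tuple[str, List[tuple]]:
    """Execute generated SQL; on failure, feed the database error back to the generator."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # The error message becomes part of the next prompt.
            feedback = f"The query\n{sql}\nfailed with: {exc}"
    raise RuntimeError(f"no executable SQL after {max_attempts} attempts")
```

For testing, `generate_sql` can be a stub that returns a deliberately broken query first and a corrected one once it receives feedback. This loop only catches execution errors. A query that runs but returns the wrong rows is what the perfect-match metric measures.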

22.
medRxiv (Medicine) 2026-06-11

What level of expertise is necessary to generate ACLS training test questions: pre-med students vs. artificial intelligence?

Introduction: In-hospital cardiac arrest carries high mortality despite standardized ACLS training. Educators face increasing time constraints in developing assessment tools for ACLS training. Two possible solutions are to have pre-medical students or artificial intelligence generate test questions. This study compared the quality of ACLS test questions generated by pre-medical students with those generated by AI, testing the hypothesis that AI-generated questions are non-inferior to student-generated questions. Methods: Ten pre-medical students created ACLS questions following predefined criteria, while an AI model (Northwell's Artificial Intelligence Hub) generated comparable questions. A blinded ACLS-certified physician rated the questions for Alignment, Clarity, Cognitive Level, and Question Design using a standardized rubric (Likert scale: 1 = poor quality, 5 = excellent). Student's t-test and chi-square analysis were used to compare question quality across rubric domains within each arm (student vs. AI) and within one domain (e.g., question Clarity) between arms. Student's t-test was used when 2 comparator groups were compared (e.g., Clarity of student-generated vs. AI-generated questions) within one arm. ANOVA was used when comparing more than 2 comparator groups (e.g., Alignment vs. Clarity vs. Cognitive Level) within one arm. Statistical significance was set a priori at p
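For readers checking the statistics, the two test statistics named above can be computed directly from raw rubric scores. This sketch computes Student's pooled-variance t statistic and the one-way ANOVA F statistic for whatever score lists the caller supplies. P-values would still have to come from the t and F distributions, for example via `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def student_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def one_way_anova_f(groups: List[Sequence[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

`statistics.variance` uses the n − 1 denominator, which matches the pooled-variance formula.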

23.
arXiv (quant-ph) 2026-06-11

Quest for quantum advantage: Monte Carlo wave-function simulations of the Coherent Ising Machine

arXiv:2501.02681v2 Announce Type: replace Abstract: The Coherent Ising Machine (CIM) is a quantum network of optical parametric oscillators (OPOs) intended to find ground states of the Ising model. This is an NP-hard problem, related to several important minimization problems, including the max-cut graph problem. In order to enhance its potential performance, we analyze the coherent coupling strategy for the CIM in a highly quantum regime. To explore this limit without assuming Gaussianity, we employ accurate numerical simulations. Due to the inherent complexity of the system, the maximum network size is limited. While master equation methods can be used, their scalability diminishes rapidly for larger systems. Instead, we use Monte Carlo wave-function methods, which scale as the wave-function dimension, and use large numbers of samples. These simulations involve Hilbert spaces exceeding $10^{7}$ dimensions. To evaluate success probabilities, we use quadrature probabilities. We demonstrate the potential for quantum computational advantage by reducing the time required to reach maximum success probability in a low-dissipation regime enabled by initial quantum superpositions and entanglement. Furthermore, we demonstrate that tailored time-dependent couplings can amplify these quantum effects. Comparisons with classical CIM models give evidence that quantum tunneling effects in this strong coupling limit can overcome trapping in false minima. This can greatly increase success rates, indicating a potential for quantum advantage. Finally, we perform a coherence analysis based on the state purity to examine the role of quantum coherence in CIM performance and to determine how state purity correlates with improved optimization outcomes.
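The link between max-cut and the Ising ground-state problem can be checked by brute force on tiny graphs. This is a classical reference computation, not a CIM or Monte Carlo wave-function simulation. It assumes the convention E(s) = −Σ J_ij s_i s_j with antiferromagnetic couplings J_ij = −w_ij. Under that convention, the cut value equals (Σ w_ij − E)/2, so the lowest-energy spin configuration defines a maximum cut.

```python
import itertools
from typing import Dict, Sequence, Tuple

Edge = Tuple[int, int]


def ising_energy(spins: Sequence[int], couplings: Dict[Edge, float]) -> float:
    """E(s) = -sum over edges (i, j) of J_ij * s_i * s_j, with s_i in {-1, +1}."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def max_cut_via_ising(n: int, weights: Dict[Edge, float]):
    """Exhaustively find the Ising ground state with J_ij = -w_ij and read off its cut."""
    couplings = {e: -w for e, w in weights.items()}
    best = min(itertools.product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings))
    # An edge is cut when its endpoints carry opposite spins.
    cut = sum(w * (1 - best[i] * best[j]) / 2 for (i, j), w in weights.items())
    return best, cut
```

Exhaustive search costs 2^n energy evaluations, so it is only practical for a handful of spins. It is, however, a useful ground truth when estimating success probabilities on small instances.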

24.
arXiv (CS.LG) 2026-06-11

MPK: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

arXiv:2512.22219v2 Announce Type: replace-cross Abstract: We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance mega-kernel. MPK introduces an SM-level graph representation that captures data dependencies at the granularity of individual streaming multiprocessors (SMs), enabling cross-operator software pipelining, fine-grained overlap of computation and communication, and other optimizations that are infeasible under the conventional kernel-per-operator execution model. The MPK compiler lowers tensor programs into optimized SM-level task graphs and generates fast CUDA implementations for each task, while the MPK in-kernel parallel runtime executes these tasks within a single persistent mega-kernel using decentralized scheduling across SMs. Together, these components provide end-to-end kernel fusion with minimal developer effort, while preserving the flexibility of existing programming models. Our evaluation shows that MPK significantly outperforms existing kernel-per-operator LLM serving systems, achieving up to 1.7$\times$ lower end-to-end inference latency and pushing LLM inference performance close to the limits of the underlying hardware. MPK is publicly available at https://github.com/mirage-project/mirage.
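The dependency-driven execution model can be illustrated with a small task-graph simulation. Each task keeps a count of unfinished prerequisites and becomes ready when that count reaches zero. At each step, up to `num_workers` ready tasks run, one per simulated SM. This is a sequential Python sketch with a single shared ready queue. MPK, by contrast, runs its tasks inside one persistent CUDA kernel with scheduling decentralized across SMs. The task names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_task_graph(deps: Dict[str, List[str]], num_workers: int = 2) -> List[Tuple[int, int, str]]:
    """Simulate dependency-counting dispatch; returns (step, worker, task) triples.

    deps maps every task to its prerequisite tasks (every task must appear as a key).
    """
    remaining = {t: len(pre) for t, pre in deps.items()}  # unfinished prerequisites
    dependents: Dict[str, List[str]] = {t: [] for t in deps}
    for t, pre in deps.items():
        for p in pre:
            dependents[p].append(t)
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    schedule: List[Tuple[int, int, str]] = []
    step = 0
    while ready:
        # Up to num_workers ready tasks execute concurrently in this step.
        batch = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        schedule.extend((step, w, task) for w, task in enumerate(batch))
        # Completing a task decrements its dependents' counters; zero means ready.
        for task in batch:
            for d in dependents[task]:
                remaining[d] -= 1
                if remaining[d] == 0:
                    ready.append(d)
        step += 1
    if len(schedule) != len(deps):
        raise ValueError("dependency cycle: some tasks never became ready")
    return schedule
```

No kernel boundary separates operators here, so a communication task can start as soon as its own inputs are done rather than waiting for a whole operator to finish. This is the intuition behind the fine-grained overlap the abstract describes.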

25.
arXiv (CS.LG) 2026-06-15

A General Framework for Decision Trees via Bregman Divergences

arXiv:2606.13984v1 Announce Type: cross Abstract: Decision trees are among the fundamental tools in statistical learning due to their interpretability, flexibility, and ability to adapt to nonlinear structures. Among them, the Classification and Regression Trees (CART) algorithm, introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remains one of the most widely used methods for classification and regression problems. On the other hand, Bregman divergences, introduced by Lev Bregman in 1967 in the context of convex optimization, provide a broad family of loss functions that naturally generalize the squared Euclidean distance. This family includes, among others, the Kullback-Leibler divergence, the Poisson divergence, and the Itakura-Saito divergence, as well as several losses associated with distributions belonging to the exponential family. Moreover, Bregman divergences possess a rich geometric structure and deep connections with convex analysis and information geometry. In this work, we propose a generalization of the CART paradigm based on Bregman divergences, thereby obtaining a broader family of decision trees adapted to different statistical models and underlying geometries. Although algorithms such as CART and classical implementations such as rpart incorporate different impurity criteria, these are usually introduced in an ad hoc manner for each specific model. In contrast, the Bregman divergence approach provides a unified framework that allows these criteria to be derived and interpreted from common convex and geometric principles. Beyond the algorithmic construction, we also investigate theoretical properties of these trees. In particular, we study how properties of the generating convex function, such as strong convexity or smoothness, influence impurity gains between parent and child nodes, as well as the stability and consistency of the estimator.
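Here is a minimal sketch of a Bregman-impurity split on one feature. It assumes node impurity is the sum of D_φ(y_i, ȳ) around the node mean, where D_φ(x, y) = φ(x) − φ(y) − φ′(y)(x − y); the mean minimizes this sum for every Bregman divergence. With φ(x) = x², the impurity reduces to CART's sum-of-squares criterion. With φ(x) = x log x − x, it gives half the Poisson deviance. The helper names and the exhaustive threshold search are illustrative, not the paper's algorithm.

```python
import math
from typing import Callable, Sequence, Tuple


def bregman(phi: Callable[[float], float], dphi: Callable[[float], float],
            x: float, y: float) -> float:
    """D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)


def node_impurity(ys: Sequence[float], phi, dphi) -> float:
    """Sum of D_phi(y_i, node mean); the mean is the optimal node representative."""
    m = sum(ys) / len(ys)
    return sum(bregman(phi, dphi, y, m) for y in ys)


def best_split(xs, ys, phi, dphi) -> Tuple[float, float]:
    """Exhaustive threshold search on one feature; returns (threshold, impurity gain)."""
    pairs = sorted(zip(xs, ys))
    parent = node_impurity([y for _, y in pairs], phi, dphi)
    best = (float("nan"), -math.inf)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # equal feature values cannot be separated
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        gain = parent - node_impurity(left, phi, dphi) - node_impurity(right, phi, dphi)
        if gain > best[1]:
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, gain)
    return best


# phi(x) = x^2: squared Euclidean distance, i.e. the CART sum-of-squares criterion.
SQUARED = (lambda x: x * x, lambda x: 2 * x)
# phi(x) = x log x - x: half the Poisson deviance; node means must be positive.
POISSON = (lambda x: x * math.log(x) - x if x > 0 else 0.0, lambda x: math.log(x))
```

Because each child's mean minimizes its own impurity, the gain (parent minus left minus right) is never negative, whichever φ is chosen. Swapping φ changes only the two lambdas, not the tree-growing code.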