Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-18

An In-depth Study of LLM Contributions to the Bin Packing Problem

arXiv:2510.27353v2 Announce Type: replace Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.

02.
arXiv (CS.CL) 2026-06-11

Building Social World Models with Large Language Models

Understanding and predicting how social beliefs evolve in response to events – from policy changes to scientific breakthroughs – remains a fundamental challenge in social science. Given LLMs' commonsense knowledge and social intelligence, we ask: Can LLMs model the dynamics of social beliefs following social events? In this work, we introduce the concept of the Social World Model (SWM), a general framework designed to capture how social beliefs evolve in response to major events. SWM learns state-transition functions for social beliefs by mining temporal patterns in social data and optimizing the evidence lower bound, without the need for explicit human annotations linking events to belief shifts, or for expensive census data. To evaluate SWM, we introduce a benchmark, SWM-bench, derived from real-world prediction markets, specifically Kalshi and Polymarket. SWM-bench includes over 12k data points for social belief prediction tasks spanning diverse domains such as politics, finance, and cryptocurrency. Our experimental results show that SWM significantly outperforms time-series foundation models, achieving state-of-the-art results on Kalshi data and demonstrating competitive performance on Polymarket data, while offering interpretable insights into the underlying mechanisms of social belief dynamics.

03.
arXiv (CS.CV) 2026-06-12

Improving Pre-trained Adult Glioma Segmentation Models Using only Post-processing Techniques

Gliomas are the most common malignant brain tumors in adults and are among the most lethal. Despite aggressive treatment, the median survival rate is less than 15 months. Accurate multiparametric MRI (mpMRI) tumor segmentation is critical for surgical planning, radiotherapy, and disease monitoring. While deep learning models have improved the accuracy of automated segmentation, large-scale pre-trained models generalize poorly and often underperform, producing systematic errors such as false positives, label swaps, and slice discontinuities in slices. These limitations are further compounded by unequal access to GPU resources and the growing environmental cost of large-scale model training. In this work, we propose adaptive post-processing techniques to refine the quality of glioma segmentations produced by large-scale pretrained models developed for various types of tumors. We demonstrated the techniques in multiple BraTS 2025 segmentation challenge tasks, with the ranking metric improving by 14.9 % for the sub-Saharan Africa challenge and 0.9% for the adult glioma challenge. This approach promotes a shift in brain tumor segmentation research from increasingly complex model architectures to efficient, clinically aligned post-processing strategies that are precise, computationally fair, and sustainable.

04.
arXiv (CS.CL) 2026-06-15

Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

Byte-level tokenization enables language models to handle any Unicode input, but models can generate invalid UTF-8 sequences when encountering rare or unseen characters. We investigate the relationship between training scale and UTF-8 generation reliability with a 355M parameter model trained on 80B tokens from a balanced multilingual corpus of English, Japanese, Korean, and Chinese. We introduce multiple evaluation protocols that isolate UTF-8 structural validity from language modeling. UTF-8 validity convergence lags perplexity by a roughly a factor of two: perplexity stabilizes after 2.1B tokens, but UTF-8 validity requires 4.2B tokens. In context-free generation, rare characters achieve higher structural validity than common characters, suggesting over-specialization of frequent character representations. Through experiments, we observed that reliable UTF-8 generation is a distinct capability requiring evaluation beyond perplexity.

05.
arXiv (quant-ph) 2026-06-16

The Quantum Transition State

arXiv:2606.10266v2 Announce Type: replace Abstract: The transition state – the critical configuration separating reactants from products – is the central organizing concept of chemical reaction rate theory, yet for nearly a century it has been thought to have no exact quantum counterpart: the recrossing-free, one-way flux through a transition state appears to demand simultaneous knowledge of position and momentum, in conflict with the uncertainty principle. We show this obstruction is illusory and construct the quantum transition state directly from the exact quantum flow. Its stable and unstable invariant manifolds intersect in a unique bounded trajectory – the quantum transition-state trajectory – anchoring a moving dividing surface that each reactive characteristic crosses exactly once, yielding a one-way flux of the standard quantum probability current. The geometric framework underlying classical transition-state theory thus survives intact in exact quantum mechanics, in a fundamentally quantum form.

06.
arXiv (CS.AI) 2026-06-19

Human-on-the-Loop Orchestration for AI-Assisted Legal Discovery

arXiv:2606.19812v1 Announce Type: new Abstract: Autonomous Large Language Model (LLM) agents are increasingly deployed in electronic discovery (e-discovery), where compounding errors across multi-step reasoning chains can constitute legal malpractice. Unlike single-turn retrieval, agentic workflows operating over privileged document corpora exhibit a class of failure we term "trajectory collapse": an early misclassification silently propagates, rendering an entire privilege review invalid. This paper makes three contributions. First, we propose a structured taxonomy of agentic failures in legal information retrieval, organized by functional stage. Second, we introduce a four-layer verification architecture – spanning planning, reasoning, execution, and uncertainty quantification – designed to intercept these failures before they compound. Third, we present a preliminary simulation study on a synthetic e-discovery corpus that demonstrates how mandatory Human-on-the-Loop (HOTL) escalation thresholds reduce privilege-waiver risk relative to fully autonomous baselines. Our results suggest that calibrated uncertainty thresholds can reduce privilege-waiver risk by up to 61% versus fully autonomous deployment, while routing fewer than one quarter of documents to attorney review.

07.
arXiv (quant-ph) 2026-06-15

Tamed Feynman-Kac diffusion processes: Killing-branching intertwine

arXiv:2605.07824v2 Announce Type: replace-cross Abstract: Relaxation to equilibrium of a drifted Brownian motion is quantified by a transition probability density function, whose main (multiplicative) entry is an inferred Feynman-Kac kernel of the Schr\"{o}dinger semigroup operator. Although seemingly devoid of a natural probabilistic significance (except for its explicit path integral definition), the pertinent kernel relaxes to equilibrium as well. The implicit Feynman-Kac potential ${\cal{V}}(x)$, continuous, confining and bounded from below, may take negative values. If positive, ${\cal{V}}(x)$ can be interpreted as the killing rate of the decaying diffusion process. In case of relaxing F-K kernels the killing effects are tamed (often overcompensated). The taming inavoidably appears in conjunction with the existence of the negativity subdomains of ${\cal{V}}(x)$ in $R$. If locally ${\cal{V}}(x) < 0$, its sign inversion $- {\cal{V}}(x)$ can be interpreted as the branching (cloning, alternatively bifurcation) rate in the course of the other wise free random motion. The arising killed diffusion processes with branching, we interpret as the possible path-wise background of tamed (relaxing) Feynman-Kac diffusions. We present acomputer-assisted path-wise arguments, towards a consistency of the killing/branching taming scenario, for a number of nonlinear model systems in one space dimension. Special attention is paid to Feynman-Kac potential shapes in the double well form, where an analytic access to eigenvalues and eigenfunctions is scarce. Throughout the paper the dynamics refers to the positive real time. Since the Newton-type equations of motion for admissible classical trajectories have a Euclidean form (due to the sign inverted force term), we give a brief resume of a couple of their explicit solutions, without recourse to the Euclidean time intuitions, and the instanton lore of related quantum model systems.

08.
arXiv (CS.CV) 2026-06-16

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. Many multimodal tasks, especially those involving complex spatial compositions, multiple interacting objects, or evolving instructions, require decomposing instructions, verifying intermediate results, and making iterative corrections. While test-time scaling (TTS) has demonstrated that allocating additional inference compute for iterative reasoning substantially improves language model performance, extending this paradigm to unified multimodal models remains an open challenge. We introduce UniT, a framework for multimodal chain-of-thought test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds. UniT combines agentic data synthesis, unified model training, and flexible test-time inference to elicit cognitive behaviors including verification, subgoal decomposition, and content memory. Our key findings are: (1) unified models trained on short reasoning trajectories generalize to longer inference chains at test time; (2) sequential chain-of-thought reasoning provides a more scalable and compute-efficient TTS strategy than parallel sampling; (3) training on generation and editing trajectories improves out-of-distribution visual reasoning. These results establish multimodal test-time scaling as an effective paradigm for advancing both generation and understanding in unified models.

09.
arXiv (quant-ph) 2026-06-19

Entanglement structure of the dynamical phases in the sub-Ohmic spin-boson model

arXiv:2606.20313v1 Announce Type: new Abstract: The sub-Ohmic spin-boson model exhibits three distinct dynamical regimes in its spin population dynamics, classified as coherent, incoherent, and pseudo-coherent. Whether these regimes correspond to distinct spin-bath entanglement structures remains an open question. Here we address this using tree tensor network states with projector-splitting time evolution (TTN-TDVP-PS), scanning a broad grid in the sub-Ohmic $(s, \alpha)$ plane. We find that the spin entanglement entropy $S_\mathrm{spin}(t)$ reaches a stationary plateau on a timescale shorter than the polarization relaxation, enabling construction of a stationary entropy landscape from the stationary value $S_\mathrm{stable}$. Within this scalar entropy landscape, the entropy ridge broadly follows the population-based phase boundary at small $s$, but does not reproduce the two-branch structure at large $s$. The ridge remains single-valued within the incoherent region rather than separately tracking both population-based transitions. The Bloch-sphere representation provides a geometric interpretation of this behavior. The entropy plateau corresponds to trajectories settling onto constant-radius shells, with the ridge marking the parameters of smallest stationary Bloch radius. Mode-resolved bath entanglement shows that low-frequency modes dominate the environmental entropy scale and that coherent dynamics enhance bath-mode correlations beyond direct spin–mode correlations. These results establish the stationary spin entanglement entropy as a physically informative observable that complements population-based classifications of dissipative quantum dynamics.

10.
arXiv (CS.LG) 2026-06-17

Noise-Driven Exploration and Transient Freezing Select Flat Minima in Stochastic Gradient Descent

arXiv:2601.10962v2 Announce Type: replace Abstract: Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism that governs solution selection during training. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and migrate toward flatter regions of the loss landscape before becoming confined to a final basin. Using a tractable physical model, we show that SGD noise reshapes the loss landscape into an effective potential that preferentially stabilizes flat solutions. We further uncover a transient freezing mechanism: as training progresses, the flattening landscape suppresses transitions between competing valleys. Stronger SGD noise delays this freezing transition, prolonging the exploratory phase and thereby increasing the probability of convergence to flatter minima. Together, these results provide a unified physical framework connecting learning dynamics, loss-landscape geometry, and generalization, and suggest guiding principles for the design of more effective optimization algorithms.

11.
arXiv (CS.AI) 2026-06-17

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

arXiv:2606.18122v1 Announce Type: cross Abstract: Embedded machine learning moves inference from cloud services to resource-constrained devices that must acquire data, preprocess signals, run a model, and act within tight limits on memory, energy, and latency. This paper presents a systems-oriented synthesis of an embedded machine-learning workflow for microcontroller-class platforms. The emphasis is placed on engineering decisions that are often hidden in generic machine-learning introductions: sampling and buffering, feature extraction as dimensionality reduction, validation under class imbalance, model/runtime co-design, and streaming deployment. Two representative signal families are used throughout the paper. The first is inertial motion recognition, where a two-second, three-axis accelerometer window is transformed from raw samples into root-mean-square and spectral features before classification. The second is keyword spotting, where audio is sampled, anti-aliased, transformed into mel-frequency cepstral coefficients, and processed by a compact one-dimensional convolutional network. The paper concludes with practical design rules for robust on-device inference, including data curation, quantization, thresholding, scheduling, and field monitoring.

12.
arXiv (CS.LG) 2026-06-17

On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies

arXiv:2606.17276v1 Announce Type: cross Abstract: Generative recommendation (GR) has emerged as a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly adopted for GR, as their rich pretrained knowledge is expected to help them generalize beyond common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR works largely ignore LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would restrict their utilization of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, where a model recommends items that are direct successors of items in the training data. We show that LLMs do this more than non-LLM-based GR models-in fact, the vast majority of their gains over GR baselines are actually on users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To achieve this, we propose IIRG, a novel training strategy that teaches LLMs to capture: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly improves over LLMs trained solely with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.

13.
arXiv (CS.AI) 2026-06-16

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

arXiv:2606.17006v1 Announce Type: cross Abstract: We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.

14.
arXiv (CS.AI) 2026-06-16

MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions

arXiv:2606.14715v1 Announce Type: cross Abstract: LLM agents are increasingly used to simulate real world interactions, but it remains unclear whether simulated behaviors preserve the content patterns and interaction dynamics of real human behaviors. Existing evaluations remain fragmented, which makes it difficult to compare systems or measure progress. In this paper, we focus on Reddit discussions as a concrete first step toward evaluating real-world social simulation. Reddit threads provide public, topic-grounded, multi-party interactions where people share experiences, debate, seek advice, express emotion, and collectively respond to products, events, and social issues. These discussions offer an observable window into broader social behavior, making them a useful setting for testing whether LLM agents can reproduce not only fluent text, but also the distributional patterns and interaction dynamics of real online communities. We introduce MiroBench, a benchmark for Reddit discussion simulation built from 4,292 real Reddit threads. MiroBench uses statistical tests to compare generated and real discussions across four major aspects: repetition and semantic uniformity, narrative content, toxicity and aggression, and structural complexity. Experiments across five domains and five models show that current simulators remain distributionally mismatched with real Reddit threads, while a lightweight prompt-based improvement procedure provides only limited gains. MiroBench offers a concrete benchmark for measuring, diagnosing, and improving realism in LLM-based social simulation.

15.
arXiv (CS.CL) 2026-06-17

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8\% and reduces execution time by up to 40.4\% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.

16.
medRxiv (Medicine) 2026-06-11

Computer Vision for Real-Time Anatomical Navigation in Neurosurgery: First-in-Human Clinical Evaluation and Iterative Development (IDEAL Stage 1)

Introduction: Precise anatomical navigation is fundamental to safe endoscopic pituitary surgery, a high-stakes procedure characterised by a challenging learning curve. While traditional navigation systems often rely on workflow-disrupting probes or static preoperative imaging, advancements in computer vision AI (CVAI) now enable dynamic, real-time anatomical segmentation directly from live surgical video1-3. Our group has previously conducted a series of preclinical human-computer interaction studies to refine the system's design, alongside digital and high-fidelity physical simulations demonstrating the benefit of AI assistance in improving overall performance, training, and safety4-8. Building on this foundation, the current study represents a first-in-human application of real-time CVAI assistance in the neurosurgical operating room, serving to assess feasibility and safety, and to iteratively improve the system. Method: Guided by DECIDE-AI and IDEAL frameworks, this single-centre evaluation comprises an initial proof-of-concept phase (n=6) for endoscopic transsphenoidal pituitary surgeries. The AI model utilised a DINOv3-derived vision transformer architecture, deployed via a high-performance edge computing unit to achieve low-latency, real-time inference without reliance on cloud infrastructure2. Given the high-risk nature of the procedure and the early stage of clinical AI integration, the system was initially deployed as an educational adjunct on a secondary monitor, ensuring the primary surgical feed remains uncompromised. Functionality and safety were assessed via structured questionnaire, prospective observation, and blinded retrospective review of the recordings of the endoscopic surgical video feed and wider operating room environment. Continuous multi-stakeholder feedback through validated human factors surveys drove iterative technical refinements between cases. Results: Six patients with pituitary adenomas were enrolled. The CVAI system was successfully deployed in four cases, demonstrating acceptable real-time sella segmentation accuracy. Deployment failed pre-operatively in two cases owing to a single recurring system reboot bug. Iterative refinement between cases were driven by our experience and surgical team feedback. This resulted in the integration of additional anatomical structure segmentations (e.g., carotid arteries), enhanced model accuracy via training dataset expansion, and hardware firmware upgrades. Multi-stakeholder surveys demonstrated satisfactory system feasibility, usability, and acceptability among the surgical team. Both prospective observation and retrospective video review confirmed the absence of adverse events, including no significant distraction to the primary surgeon, and there were no AI-related clinical complications. Conclusion: This first-in-human early clinical evaluation demonstrates the feasibility, safety and iterative development of real-time, CVAI-based anatomical navigation during high-stakes neurosurgery. Future work will include a larger single-centre case series (IDEAL Stage 2a) with more surgical teams to further iterate the system and explore its impact on training and workflow. As the underpinning technology improves, deployment will transition to direct intra-operative decision support and integration with other intra-operative navigational technologies.

17.
arXiv (CS.LG) 2026-06-16

Dynamic Link Prediction with Temporally Enhanced Signed Graph Neural Networks

arXiv:2605.26290v2 Announce Type: replace Abstract: Temporal signed networks (TSNs) model the time evolution of cooperative and adversarial relationships that arise in applications such as social media analysis, trust and reputation systems, and financial transaction networks. While graph neural networks (GNNs) perform well for static or unsigned link prediction, effective learning in temporal signed graphs remains challenging due to the interaction of signed relations, evolving structure, and balance-theoretic constraints. To address this gap, we propose a modular temporal enhancement framework for signed GNNs that integrates historical context into otherwise static architectures. The framework introduces a Historical Context Integration Module (HCIM) that combines learnable recency-aware temporal weighting, LSTM-based embedding trajectory modeling, and multi-head temporal attention to capture both short- and long-term signed interaction dynamics. Historical information is fused with current node representations using either global or node-adaptive weighting, allowing the architecture-agnostic framework to accommodate heterogeneous temporal behaviors. We instantiate the approach on the Self-Explainable Signed Graph Transformer (SE-SGformer), preserving interpretability while extending it with temporal awareness. Experiments on real-world and synthetic TSNs, including Bitcoin OTC, Bitcoin Alpha, Reddit, and small-world network models, demonstrate consistent and statistically significant improvements over the static baseline.

18.
arXiv (CS.CL) 2026-06-18

Application of integrated gradients explainability to sociopsychological semantic markers

Classification of textual data in terms of sentiment, or more nuanced sociopsychological markers (e.g., agency), is now a popular approach commonly applied at the sentence level. In this paper, we exploit the integrated gradient (IG) method to capture the classification output at the word level, revealing which words actually contribute to the classification process. This approach improves explainability and provides in-depth insights into the text. We focus on sociopsychological markers beyond sentiment and investigate how to effectively train IG in agency, one of the very few markers for which a verified deep learning classifier, BERTAgent, is currently available. Performance and system parameters are carefully tested, alternatives to the IG approach are evaluated, and the usefulness of the result is verified in a relevant application scenario. The method is also applied in a scenario where only a small labeled dataset is available, with the aim of exploiting IG to identify the salient words that contribute to building the different classes that relate to relevant sociopsychological markers. To achieve this, an uncommon training procedure that encourages overfitting is employed to enhance the distinctiveness of each class. The results are analyzed through the lens of social psychology, offering valuable insights.

19.
arXiv (CS.CL) 2026-06-18

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by it, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential.Code is available at https://github.com/hp-luo/STARE.

20.
arXiv (CS.CL) 2026-06-11

Schützen: Evaluating LLM Safety in Bulgarian and German Contexts

Large language models are increasingly deployed across professional domains, bringing hard-to-predict risks, including the generation of harmful or disrespectful content. Although substantial progress has been made in developing safety evaluation datasets, existing resources remain overwhelmingly English- and Chinese-centric. This limitation is particularly pronounced when evaluating languages that operate within shared sociocultural, legal, and ethical contexts. To address this gap, we introduce Sch\"{u}tzen: a German–Bulgarian safety dataset designed to assess model answerability under risk, covering both a low-resource language (Bulgarian) and a high-resource language (German). Experiments with multilingual and language-specific LLMs reveal pronounced cross-language differences in safety behavior, highlighting the necessity of tailored, region-specific evaluation resources to support the responsible deployment of LLMs in Germany and Bulgaria. Datasets and code are available at https://github.com/xnlp-lab/Schutzen. Warning: this paper contains examples that may be offensive, harmful, or biased.

22.
medRxiv (Medicine) 2026-06-11

Parent and physiotherapist perceptions about movement skills of young children with juvenile idiopathic arthritis

Objective: The onset of juvenile idiopathic arthritis (JIA) in the early years ([&le;]5 years) may negatively impact movement skill (encompassing related concepts of gross motor skills, fundamental movement skills, and functional ability) development. Few studies have explored the perceptions and needs of parents and physiotherapists towards children's difficulty with these movement skills, essential to identify potential areas for added support. The objective of this study is to understand the perceptions of physiotherapists and parents towards movement skills of children with JIA. Methods: Seventeen parents and 24 physiotherapists completed an online questionnaire consisting of multiple choice and open-ended questions about the movement skills of young children with JIA. Demographic and multiple choice questions were quantitively analysed using descriptive statistics. Open-ended responses were analyzed using qualitative conventional content analysis. Results: About half (47%) of parents perceived their children to have movement difficulties, and 75% of physiotherapists described the movement skills of children with JIA as worse than other children of the same age. Our qualitative analysis revealed three general themes including: functional task difficulties; clinical variability in movement skills; and psychosocial components of movement skill difficulties. Conclusion: This study provides an analysis of perceptions of physiotherapists and parents towards the movement skills of young children with JIA. A significant proportion of parents and physiotherapists identify movement difficulties among children with JIA that impact daily life. Future interventions co-designed with both parents and care providers targeting movement skills are needed.

23.
arXiv (CS.LG) 2026-06-17

Gradual Fine-Tuning for Flow Matching Models

arXiv:2601.22495v2 Announce Type: replace Abstract: Fine-tuning flow matching models is a central challenge in settings with limited data, evolving distributions, or computational constraints. While recent work has produced significant advances, particularly in the area of reward-based fine-tuning, current methods fail to demonstrate both theoretical correctness as well as strong empirical results in terms of stability, efficiency, and diversity preservation. In this work, we propose Gradual Fine-Tuning (GFT), a simple yet principled annealing-based framework for fine-tuning flow generative models when only samples from the target distribution are available. For stochastic flows, GFT defines a temperature-controlled sequence of intermediate objectives that smoothly interpolate between the pretrained and target drifts, provably approaching the true target as the temperature approaches zero. We analytically demonstrate that sample generation after GFT can be made substantially more efficient with the use of arbitrary (e.g., optimal transport) couplings, as well as by utilizing few-step inference methods. Empirically, GFT significantly improves convergence stability, while maintaining or improving generation quality, training speed, and generation diversity compared to other fine-tuning methods. Our results position GFT as a simple yet theoretically grounded and practically effective alternative for scalable adaptation of flow matching models under distribution shift.

24.
arXiv (quant-ph) 2026-06-12

Quantum-Driven Neuromorphic Computing for Million-Qubit-Scale Workloads

arXiv:2606.12968v1 Announce Type: new Abstract: We introduce Apollo, a 10000 node p-qubit neuromorphic processor fabricated in 16 nm mixed signal CMOS and operating fully at room temperature with a typical analog core power envelope of about 0.5 W. Its fundamental element, the p-qubit, is a bistable stochastic unit whose continuous time state fluctuations are driven by integrated quantum entropy units that inject true quantum derived randomness. This enables ultrafast stochastic transitions at low energy while preserving a classical state representation. Apollo combines these p-qubits with a high degree Hyperion 256 interconnect topology, allowing efficient embedding of dense Ising and QUBO problems with substantially reduced minor embedding overhead compared with sparse annealing platforms. We show that, through the Suzuki Trotter correspondence, the equilibrium statistics and annealing dynamics of the p-qubit network reproduce key properties of transverse field quantum annealing without cryogenic cooling, long lived coherence, or microwave control. Beyond device level validation, Apollo is evaluated on a three dimensional spin glass benchmark previously used to study quantum advantage in superconducting annealers. Across 300 disorder realizations, Apollo reaches substantially lower ground state energies than reported cryogenic quantum annealing hardware, while remaining distinct from classical simulated annealing and simulated quantum annealing. A 350 nm release candidate device experimentally validates the core p-qubit dynamics, thermodynamic sampling correctness, and continuous time annealing behavior. These results establish Apollo as a room temperature, industrially scalable platform for quantum driven energy based optimization, probabilistic inference, generative modeling, and hybrid classical quantum workflows.

25.
PLOS Computational Biology 2026-06-02

PepAnno: A structure-aware deep learning framework for bioactive peptide prediction, structural visualization, and physicochemical profiling

作者:

by Enyan Liu, Yueming Hu, Liya Liu, Yifan Chen, Shilong Zhang, Sida Li, Haoyu Chao, Luyao Xie, Yi Shen, Liangwei Wu, Julio Raúl Fernández Massó, Ming Chen Peptides are gaining prominence as therapeutic candidates due to their diverse physiological functions and structural simplicity. Although multiple computational tools exist for bioactive peptide prediction, many suffer from limitations such as non-intuitive interfaces, sequence-only representations, insufficient structural awareness, restricted interpretability, or fragmented analysis workflows, leading to reduced research efficiency and higher costs. To address these challenges, we present PepAnno (https://bis.zju.edu.cn/pepanno/), a comprehensive and user-friendly web server for multi-functional peptide annotation. PepAnno is powered by a novel structure-aware, multi-view geometric deep learning framework that integrates pre-trained sequence embeddings with predicted 3D structural graphs through a dual-stream architecture combining a Transformer and a GATv2 network. A cross-modal attention mechanism is employed to effectively fuse semantic and geometric representations, enabling accurate multi-task prediction across 7 key bioactivities, including antimicrobial and anticancer properties. Comprehensive evaluation on seven curated bioactivity datasets demonstrates that PepAnno achieves robust and competitive predictive performance across tasks, consistently outperforming or matching existing methods in terms of discrimination and stability. Beyond functional prediction, PepAnno provides automated calculation of physicochemical properties, structure visualization, and access to an integrated repository of peptide-related databases and tools. By enabling one-click peptide annotation, PepAnno offers an efficient and interpretable solution for large-scale peptide analysis and facilitates downstream experimental design and peptide-based drug discovery.