Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-24

SENTRY: SAM2-Enhanced Neighbor-Aware and Temporally Reasoned Memory for Visual Tracking

We revisit the memory update mechanism in SAM2-based visual object tracking and identify confidence-only mask selection as the dominant cause of drift under occlusion, rapid motion, and distractors. We introduce SENTRY, a training-free, plug-and-play, refine-before-write module that validates each memory update for short-horizon temporal consistency before committing it. SENTRY aggregates diverse segmentation hypotheses per frame, backtracks them into short tracklets, and uses neighbor-aware cycle-consistent matching against recent trajectories to favor temporally and geometrically consistent masks. It leaves the base architecture untouched, replacing confidence-driven writes with consistency-validated ones. For fair evaluation, we re-evaluate major open-source SAM2-based trackers across all available scales and datasets, filling gaps in prior reports. Integrated into five strong baselines, SENTRY delivers consistent gains across nine benchmarks, achieving new zero-shot SOTA on LaSOT, LaSOT_ext, GOT-10k, VOT20, VOT22, and DiDi. Despite these checks, the SAM2-L version runs at 32.8 FPS on an A100, and across compatible hosts adds only about 0.4–0.6 GB VRAM. Our results provide the first unified all-scale evaluation of SAM2-based trackers and show that enforcing temporal validity at write time stabilizes memory-augmented tracking without retraining.

02.
arXiv (CS.AI) 2026-06-11

Can Open-Source LLM Agents Replace Static Application Security Testing Tools? An Empirical Assessment

arXiv:2606.11672v1 Announce Type: cross Abstract: This paper explores the value of agentic AI tools for cybersecurity purposes. We evaluate the efficacy of a general-purpose GenAI Large Language Model- (GenAI-) based agent when powered by three different Ollama-hosted general-purpose open source models. We assess each agent's performance using precision, recall, false positive count, and a calculated composite score based upon the interplay of the captured metrics, against the baseline performance of an existing, vetted Static Application Security Testing (SAST) tool, Bandit. Our findings refute the notion that a modern open-source GenAI LLM-based agent is currently suitable for the specialized task of SAST scanning under realistic conditions.

03.
arXiv (CS.LG) 2026-06-24

Flow-Corrected Thompson Sampling for Non-Stationary Contextual Bandits

arXiv:2606.23933v1 Announce Type: cross Abstract: We study non-stationary linear contextual bandits where the reward model drifts over time, rendering classical contextual bandit algorithms brittle because historical data becomes systematically biased. We propose Flow-Corrected Thompson Sampling (fcTS), a Bayesian method that reuses experience by transporting past rewards to the present using an explicit drift model and incorporating each transported observation with a confidence weight that reflects transport reliability. This yields a unified template that specializes in (i) linear parameter drift via online slope estimation and reward correction, (ii) periodic variation via phase-aware reuse across cycles, and (iii) recurring regime switches via changepoint detection and regime-specific posterior memory. The resulting posterior updates remain closed-form under a linear Gaussian model and can be implemented efficiently with truncated, incrementally updated sufficient statistics. Across five controlled case studies and a semi-synthetic portfolio-selection benchmark with multiple overlapping non-stationarities, fcTS outperforms standard forgetting-based baselines (discounting, sliding windows, and periodic restarts), with the largest gains in settings exhibiting recurring temporal structure. These results demonstrate that when non-stationarity is structured, correcting and reweighting historical observations can be substantially more sample-efficient than uniformly discarding them.

04.
arXiv (CS.LG) 2026-06-11

Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers

Authors:

arXiv:2606.11949v1 Announce Type: new Abstract: We present an online monitoring system for distributional shift in deployed safety classifiers, using calibrated sequential statistics to detect when a classifier has moved out of distribution. Upon detection, a conformal abstention layer adapts decision thresholds to recover a target error rate epsilon=0.1. In a pre-registered factorial evaluation (4 classifiers x 5 shift conditions x 20 seeds x 2 window sizes, 800 cells), the system achieves 86.6% valid detection (693/800, 95% CI [84.1%, 88.8%]) with mean latency of 39.5 steps. Detection holds across three ground-truth regimes: synthetic onset (86.6%), real temporal jailbreaks (85%, 17/20), and GCG adversarial attacks. Weighted conformal prediction recovers up to 39 pp of lost coverage for DeBERTa (ESS=46/300) but collapses for all other classifiers (ESS~300): logistic density ratio estimation achieves perfect source/target separability in high-dimensional embedding spaces, clipping all importance weights to the floor. DeBERTa shows a gradient from effective correction (paraphrase, ESS=46) to near-total collapse (adversarial suffix, ESS=206). PCA to 32 dimensions breaks the collapse, recovering 33 pp for Llama Guard and 21 pp for ShieldGemma. Variance decomposition reveals classifier (eta^2=0.243), shift type (eta^2=0.237), and their interaction (eta^2=0.185) all contribute substantially to detection latency variance (all p

05.
arXiv (CS.AI) 2026-06-12

Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

arXiv:2505.13102v4 Announce Type: replace-cross Abstract: Unlike conventional "black-box" transformers with classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$ and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve competitive traffic forecast performance as state-of-the-art prediction schemes, while reducing parameter counts drastically.

06.
arXiv (CS.CL) 2026-06-12

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

Authors:

Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and Mistral-7B, via QLoRA on SciFact and HealthVer, providing the first study of QLoRA models against GPT-4o and fine-tuned BioLinkBERT encoders. Mistral-7B QLoRA surpasses both GPT-4o and GPT-5 (up to 12% F1 gain) at a fractional cost using just 1,008 training examples. We conduct extensive in-domain and cross-domain evaluation: models trained on SciFact tested on HealthVer and vice versa, at matched sizes to isolate dataset structure from data quantity. We identify a previously unreported structural artifact in SciFact that inflates in-domain scores, and show through bidirectional out-of-domain evaluation that training on structurally sound data enables robust cross-domain transfer. We plan to release all code and adapter checkpoints.

07.
arXiv (CS.CL) 2026-06-24

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

The composition of training data, governed by the diversity of sources and their mixing strategy, is a cornerstone of Large Language Model (LLM) pre-training. Online Data Mixing (ODM), the technique of adaptively adjusting data mixtures during training, has emerged as a promising direction to improve efficiency. However, existing methods are constrained by their reliance on a singular optimization perspective, which fundamentally overlooks the need for complex LLM pre-training to consider the dynamic data composition from multiple dimensions. To overcome this limitation, we introduce the Holistic Data Scheduler (HDS), a novel online data mixing framework. HDS formulates the data scheduling challenge as a reinforcement learning problem in a continuous control space and leverages the Soft Actor-Critic (SAC) algorithm for its stability and sample efficiency in exploring the high-dimensional policy space. At the core of HDS lies a novel multi-objective, holistic reward function that integrates three critical perspectives: a data-driven reward for quality, a loss-driven reward capturing inter-domain influence, and a model-driven reward based on weight norms. To validate our design and determine its optimal configuration, we conducted systematic experiments on LLMs of various sizes. On The Pile benchmark, HDS reaches the final validation perplexity of the next best method with 44% fewer training iterations. Furthermore, it achieves a 7.2% improvement on the MMLU 0-shot task along with consistent gains on other benchmarks, showcasing its ability to enhance both training efficiency and final model capability.

08.
arXiv (CS.CV) 2026-06-15

FLaRA: Predicting Future Latent Representations for Accident Anticipation

Anticipating traffic accidents from dashcam videos is a critical challenge in intelligent transportation systems. Existing methods typically map visual context directly to a collision probability without explicitly modeling the future evolution of the driving scene. In this paper we propose FLaRA (Predicting Future Latent Representations for Accident Anticipation), a novel predictive architecture that shifts this paradigm by forecasting future latent representations for accident anticipation. Building upon the Video Joint-Embedding Predictive Architecture (V-JEPA2), our model conditions a predictor network on observed context frames to predict the forthcoming latent features of the scene. A classifier then operates on these predicted future representations rather than only on past observations. To ensure these forecasts remain grounded in realistic future dynamics, we introduce a joint training objective that simultaneously optimizes an auxiliary feature-level reconstruction loss and a cross-entropy classification loss. Extensive evaluations on the Nexar dataset, alongside cross-domain validations on the DAD, DADA-2000, and DoTA benchmarks, demonstrate that our approach achieves state-of-the-art performance while maintaining realistic early warning capabilities.

09.
arXiv (CS.CL) 2026-06-16

The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models

High-grade gliomas integrate into neural circuits through functional synapses with neurons, raising the question of which noncoding elements shape synaptogenic gene expression in tumor cells. The regulatory program written across the dark genome, what we call the $dark regulome$, is the natural substrate to probe, and sequence foundation models offer a zero-shot route through in-silico mutagenesis (ISM); yet likelihood-based scoring is tautologically coupled to local sequence predictability, leaving the regulatory interpretation underdetermined. Across three architecturally distinct foundation models (Caduceus-Ph, HyenaDNA, Enformer) and 30,448 dark genome elements at 92 glioma-relevant loci, we introduce a residualization-and-permutation diagnostic that separates predictability-driven from regulation-driven RIS variance. A sharp 10kb proximal-regulatory horizon survives every control we apply, but the LM-derived element-class hierarchy does not: a six-feature linear baseline matches Caduceus top-decile membership at AUC $= 0.985$. Cross-architecture decomposition cleanly separates a sequence-predictability layer (the two language models co-rank long well-predicted transposable elements) from a regulatory-output layer (Enformer alone retains residual cCRE-discriminative signal), with literally zero overlap between the two top-100 lists. Conservation, brain cis-eQTL, and STRING-PPI cross-checks then anchor what biology survives: top-100 elements across all three models are $3.3\times$ enriched per model for matching brain eQTLs ($p_\mathrm{emp} < 5\times 10^{-3}$), while a tempting transposable-element regulatory layer and a striking NRXN1+NLGN1 protein-pair convergence both fail proper permutation tests once those tests are constructed. We deliver the diagnostic as a general methodological tool for any ISM-based regulatory study.

10.
arXiv (CS.LG) 2026-06-18

Task-Restricted Symmetries in Recurrent Weight Space

arXiv:2606.18457v1 Announce Type: new Abstract: Recurrent networks can contain substantial functional redundancy in weight space: changing a recurrent matrix may leave the input-output rollout nearly unchanged on a task distribution, while similar-scale changes can destroy the same behavior. We study this redundancy in one-layer tanh RNNs using ordered real Schur coordinates. The Schur form separates spectral blocks from directed nonnormal couplings, giving a diagnostic basis for structured ablations that keep the input and readout maps fixed. In a fixed-length copy task, selected nonnormal Schur couplings can be removed with little loss in some trained solutions, whereas other couplings are necessary for accurate autonomous replay. Across flip-flop, sine generation, and context-dependent integration, the loss-preserving ablation profile varies across tasks and trained solutions. These results identify candidate approximate functional invariances, not universal symmetries of recurrent weight space. Schur-coordinate ablations provide a practical diagnostic for which structured perturbations preserve a trained recurrent solution and which ones disrupt its computation.

11.
arXiv (CS.AI) 2026-06-12

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

arXiv:2606.13054v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit exceptional general language processing capabilities, but their memory and compute costs hinder deployment. Ternarization has emerged as a promising compression technique, offering significant reductions in model size and inference complexity. However, existing methods struggle with heavy-tailed activation distributions and therefore keep activations in high precision, fundamentally limiting end-to-end inference acceleration. To overcome this limitation, we propose TWLA, a post-training quantization (PTQ) framework that achieves 1.58-bit weight compression and 4-bit activation quantization while maintaining high accuracy. TWLA comprises three components: (1) Euclidean-to-Manifold Asymmetric Ternary Quantizer (E2M-ATQ) minimizes layer-output error under weight ternarization via a two-stage optimization from Euclidean initialization to manifold relocation; (2) Kronecker Orthogonal Tri-Modal Shaping (KOTMS) applies a Kronecker-structured orthogonal rotation to reshape weights into ternary-friendly tri-modal distributions, while the shared rotation statistically suppresses activation outliers; and (3) Inter-Layer Aware Activation Mixed Precision (ILA-AMP) explicitly introduces adjacent-layer second-order interaction costs in bit allocation and jointly optimizes for the layer-wise disparity of activation quantization gains induced by the shared orthogonal transform, preventing cascades triggered by a few weak layers. Extensive experiments demonstrate that TWLA maintains high accuracy under W1.58A4, while delivering significant inference acceleration. The code is available at .

12.
arXiv (CS.CL) 2026-06-15

SciDef: Datasets and Tools for Automated Definition Extraction from Scientific Literature with LLMs

Scientific concepts are often defined inconsistently across papers, making it difficult to compare findings, reuse terminology, and build reliable downstream resources. We present SciDef, a resource suite for scientific definition extraction. The suite contains DefExtra, a benchmark of 268 human-validated author-stated definitions from 75 academic papers; DefSim, 60 human-labeled definition-pair similarity judgments; and an open LLM-based pipeline for PDF preprocessing, chunking, definition extraction, prompt optimization, and evaluation. We validate the resources by benchmarking 16 language models across prompting strategies and chunking schemes. The strongest set-level configuration achieves a score of 0.397, while the highest-coverage configuration matches at least one prediction to 86.4% of gold definitions but over-generates candidate definitions. We further show that an NLI-based matching metric agrees strongly with human DefSim judgments. These results position SciDef as a reusable benchmark and tooling layer for definition-centric literature analysis, while highlighting relevance-aware filtering as the key bottleneck for fully automatic definition extraction. Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.

13.
arXiv (quant-ph) 2026-06-24

Certifying Quantum Optimization and Circuit Cutting by Using Quantum-Classical Moment Duality

Authors:

arXiv:2606.23727v1 Announce Type: new Abstract: We establish a direct quantum-classical duality based on the degree-$2$ Sum-of-Squares (SoS) semidefinite programming cone: the matrix of two-qubit Pauli-$Z$ correlation functions obtained from any quantum state $\rho$ is automatically a feasible point of the classical Goemans-Williamson (GW) relaxation. This observation provides a universal ``safety net'' for quantum optimization algorithms: applying GW random hyperplane rounding to the quantum-driven moment matrix yields a certified expected cut value $\mathbb{E}[\mathrm{Cut}] \ge \alpha_{\mathrm{GW}}\langle\mathcal{H}\rangle_\rho$, valid for every state produced by variational algorithms such as QAOA or the Variational Quantum Power Method (VQPM), regardless of convergence quality. We further show that the same moment matrix reveals the tensor-product structure of the underlying unitary circuit, enabling a polynomial-time, correlation-based circuit cutting procedure with rigorous error bounds. The framework is validated numerically on Max-Cut instances for variational quantum algorithms and on random states for circuit cutting, demonstrating that the cheap two-point correlation data are sufficient to locate near-optimal bipartitions and that the theoretical error bounds hold in practice.

14.
arXiv (math.PR) 2026-06-24

Queues with Correlated Service Times – the $M/M_D/c$ Model

arXiv:2606.24881v1 Announce Type: new Abstract: This paper studies multi-server queueing systems with correlated service times, modeled as the $M/M_D/c$ queue, which is a natural extension of the recent work by Thapa and Zhao [Thapa-Zhao:2026]. In this model, arrivals follow a Poisson process, while service times across servers exhibit dependence captured by the Marshall–Olkin multivariate exponential distribution (MO-MVED). We first develop a rigorous sample-path construction of the system and establish that the resulting queueing process is a continuous-time Markov chain. We then analyze the stationary behavior of the $M/M_D/c$ model. In the homogeneous case, we derive a complete solution via geometric tail structure and explicit boundary equations, recovering a tractable one-dimensional representation. In the heterogeneous case, we establish a general framework combining a geometric tail with a finite boundary system, and prove existence, uniqueness, and nonnegativity of the stationary distribution. The above results provide a unified analytic framework extending classical $M/M/c$ theory to correlated-service settings, and reveal how dependence among service times fundamentally affects system performance and structure. Beyond the $M/M_D/c$ model, We next study the interplay between Marshall–Olkin service dependence and queue-state Markovianity. On the one hand, Marshall–Olkin dependent service completions are shown to preserve Markovianity for a broad class of queueing systems. On the other hand, if a queueing process admits a Markovian state description without tracking service ages, residual service times, or service phases, then its service mechanism must satisfy a weak multivariate lack-of-memory property and consequently belongs to the Marshall–Olkin family. These results provide a probabilistic foundation for the use of Marshall–Olkin multivariate exponential service times in Markovian queueing models.

15.
Nature (Science) 2026-06-10

Mitochondria directly interact with the nuclear pore complex

Mitochondria regulate cellular processes through direct and indirect interactions with other organelles. A well-studied example has been contact with the endoplasmic reticulum at mitochondrial-associated endoplasmic reticulum membranes1, which control pathways including redox and calcium homeostasis2,3. Recent studies have also reported direct mitochondria–nuclear membrane contacts in cancer cells and yeast that promote pro-survival signalling4,5. Here we identify direct interactions between mitochondria and nuclear pores. Using two unbiased proteomic screens, GST pulldown and BioID, we found that VDAC1 was the top mitochondrial candidate that interacts with the filamentous nuclear pore protein RANBP2. In vitro RANBP2 CRISPR knockout,&nbsp;RANBP2 truncation&nbsp;or site-directed mutagenesis of RANBP2–VDAC1 interacting amino acids resulted in reduced mitochondria–nucleus proximity and decreased nuclear ATP and phosphocreatine levels. This was accompanied by a decline in the levels of the nuclear phosphoproteome and downregulation of pathways involved in histone modification, cellular differentiation and transcriptional regulation in vitro. Moreover, deletion of the RANBP2 C-terminal domain in vivo in mice resulted in embryonic lethality due to cardiac and neural crest differentiation defects. Collectively, these results describe a mechanism by which mitochondria directly interact with the nuclear pore complex, a phenomenon critical for regulation of nuclear energetics and cellular differentiation. Undoubtedly, additional roles of this interaction remain to be revealed. Mitochondria interact directly with the nuclear pore complex via VDAC1–RANBP2&nbsp;binding to sustain nuclear ATP levels.

16.
arXiv (CS.AI) 2026-06-12

CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

arXiv:2606.12666v1 Announce Type: cross Abstract: Screenshot-based mobile GUI agents can operate ordinary smartphone apps through the same visual interface as a human user, but this capability also turns every screen observation into a privacy boundary. During normal task execution, screenshots may expose contacts, messages, photos, files, recommendations, health cues, and other sensitive context that is unrelated to the user's request. We call this problem incidental visual privacy exposure. It is difficult to address with existing defenses: text anonymization misses many visual and inferential cues, while generic privacy masking can remove the evidence and controls that a GUI agent needs to complete the task. This paper presents CAPED, a context-aware pre-upload exposure control layer for mobile GUI agents. CAPED is designed as a phone-side protection layer: before screenshots are released to a remote multimodal agent, it extracts task requirements, uses screen context as a privacy prior, parses visible UI elements, and selectively exposes only content needed for the current task while masking incidental private content. We evaluate CAPED on AndroidWorld for broad task utility and with a controlled 28-task seeded privacy evaluation used as a measurement instrument for trajectory-level incidental leakage. In this seeded evaluation, Full CAPED reduces success-conditioned weighted seeded leakage from 0.766 under raw screenshots to 0.268 while preserving high task utility. A broader AndroidWorld run shows a remaining prototype-level utility cost, but the results support the central claim that screenshot upload should be treated as an explicit device–cloud boundary decision, governed by task-driven selective exposure rather than all-or-nothing screen sharing.

17.
medRxiv (Medicine) 2026-06-22

Development of a Novel Risk Prediction Model for Rheumatoid Arthritis-Associated Interstitial Lung Disease (RA-ILD): A Longitudinal Study

Background: Interstitial lung disease (ILD) is one of the most common and potentially most devastating extra-articular complication of rheumatoid arthritis (RA) and is associated with substantial morbidity and mortality. However, reliable tools for the early identification of ILD in patients with RA remain limited. This study aimed to identify plasma protein biomarkers of RA-ILD and develop an interpretable machine learning model for risk prediction using data from the UK Biobank. Methods: We first evaluated the association between baseline RA and the risk of incident ILD in the UK Biobank using Cox proportional hazards models. Mendelian randomization analysis was then performed to investigate the potential causal relationship between RA and ILD. Finally, we analyzed 2,920 plasma proteins measured using the Olink platform in 781 eligible RA patients. Proteins associated with ILD risk were identified using Cox proportional hazards models and subsequently used to construct eight machine learning models. Model performance was assessed using the receiver operating characteristic curve (ROC) and decision curve analysis. The best-performing model was further interpreted using Shapley additive explanations (SHAP) to evaluate feature importance. Results: Compared with participants without RA, Patients with baseline RA had a significantly higher risk of developing ILD (Hazard ratio: 4.425, 95% CI: 3.549,5.518). The MR supported a potential causal association between RA and ILD (Odds ratio: 1.227, 95% CI: 1.121,1.343). Among the eight machine learning models, the CatBoost model showed the best performance, achieving an area under the curve (AUC) of 0.884 (95% CI: 0.773,0.996). The SHAP analysis identified LAG3, NPC2, and LAMP3 are the three most important plasma protein predictors of ILD development in patients with RA. Conclusion: Plasma proteomics combined with machine learning may provide a promising approach for identifying biomarkers and predicting ILD risk in patients with RA. LAG3, NPC2, and LAMP3 may serve as candidate biomarkers for RA-ILD and warrant further validation. Keywords: Rheumatoid arthritis, Interstitial lung disease, Mendelian randomization, Machine learning, Plasma proteins.

18.
arXiv (CS.CV) 2026-06-15

Self-Evolving Visual Questioner

Vision-language models (VLMs) are typically trained as passive answerers, while their ability to actively ask diverse, non-trivial, visual-centric and grounded questions remains underexplored. Existing visual questioners' performance is bottlenecked by the availability of high-quality training data or the cost of curating them. We show that a VLM can continuously improve itself as a visual questioner without any external supervision. We propose a self-evolving framework that uses a VLM itself as both a proposer and a filter to produce harder, more informative, and visual-centric questions, while maintaining their exploration diversity to avoid training collapse. These questions are then used to train the VLM in both questioner and answerer modes. To evaluate the questioner, we introduce an agentic protocol that assesses questions along perception, reasoning, and diversity dimensions. Experiments across various backbone VLMs show that our method substantially enhances the quality and substantially expands the difficulty boundary of autonomous question generation. Under the same budget, our self-supervision is more effective than training on the static source data. Moreover, the self-evolving questioner remains a competitive or even better answerer.

19.
arXiv (CS.AI) 2026-06-17

Patients With Personality: Realistic Patient Simulation through Controlled Diversity and Selective Disclosure

arXiv:2606.17441v1 Announce Type: cross Abstract: Simulating realistic patient interactions is a key requirement to testing clinical applications of LLMs at scale without time-consuming and expensive user studies. However, existing approaches often lack realism and controllability, often oversharing information unprompted, and failing to capture the wide variability of patient behavior. Here, we introduce PatientsWithPersonality (PWP), a patient simulation framework that generates realistic yet diverse virtual patient responses through explicit personality parametrization over a latent patient state. Grounded in HEXACO, a six-dimensional personality space used to quantify and parameterize human behavioral traits, our approach enables fine-grained control over conversational style, cooperativeness, and information disclosure within a unified framework. In a clinician evaluation, PWP is judged nearly as realistic as recorded human actors and clearly ahead of prior simulators, while being flagged as "too informative" far less often. Conditioning on HEXACO axes yields personas whose configured traits are recoverable by both clinicians and an autorater, span a substantially wider behavioral footprint than the closest baseline, and prevent oversharing. Altogether, our framework paves the way for more accurate and informative LLM benchmarking through our realistic and steerable patient simulator.

20.
arXiv (quant-ph) 2026-06-17

Canonical regularization of the stationary Coulomb problem and an Aufbau-like spectral ordering

arXiv:2606.17359v1 Announce Type: new Abstract: The stationary hydrogen atom has Coulomb degeneracy across orbital levels, whereas the Aufbau/Madelung ordering is an empirical, many-electron rule established in atomic physics. We examine the hydrogen atom through a regularized de Broglie–Bohm representation, in which stationary amplitude current constraints generate separable Sturm–Liouville branches. In this formulation, the radial, orbital, and magnetic sectors acquire canonical Langer-like inverse square corrections. The modified boundary value problems allow analytical solutions and produce a hydrogen-like spectrum with regularized radial and angular indices. Consequently, radial Coulomb quantization acquires an orbital dependent shift, lifting the Coulomb degeneracy and producing a spectral ordering that follows the Aufbau/Madelung sequence. On this basis, we construct the ordering of the regularized de Broglie–Bohm states and show that the spectral structure retains the standard degenerate Rydberg sequence in the l=0 sector. The separated amplitudes are represented by generalized special function branches, including the associated Laguerre, Legendre, and Bessel functions with non-integral parameters arising from regularized separation. Therefore, the treatment is intended as an analytical examination of spectral ordering in a regularized one center Coulomb problem rather than as a replacement for the many electron atomic structure theory. Keywords: de Broglie–Bohm representation; Coulomb spectrum; canonical regularization; Langer correction; Sturm–Liouville equations; Aufbau principle; Madelung ordering; associated Legendre functions; associated Laguerre functions; Bessel functions.

21.
arXiv (CS.CL) 2026-06-15

Cross-Dataset Bloom Question Classification: Supervised Models and Prompted LLMs

Automatic Bloom's taxonomy classification of assessment questions can substantially reduce instructor workload, but labeling is subjective and teacher-dependent. Prior machine learning (ML) and deep learning (DL) approaches reported strong within-dataset results, yet were rarely evaluated in cross-dataset settings, leaving real-world generalizability unclear; meanwhile, LLM effectiveness for Bloom question classification has not been systematically studied. We evaluated the cross-dataset generalization of existing ML/DL methods and assessed LLMs with multiple prompting strategies on five datasets; the best prompting strategy combined in-context examples with course-specific action verbs. Supervised ML/DL models degraded substantially on unseen datasets, whereas LLMs were more stable, suggesting a robust alternative across diverse educational contexts. Based on the best prompting strategy, we also presented a lightweight UI that supports instructors in automatically classifying large question banks; a usability study indicated low workload and high usability.

22.
Nature (Science) 2026-06-10

Measurement of reactor neutrino oscillation with the first JUNO data

Neutrino oscillations (see refs. 1,2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences3 that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO)4 is a 20-ktonne liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision5,6. Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $${\sin }^{2}{\theta }_{12}=0.3092\,\pm \,0.0087$$ and $$\Delta {m}_{21}^{2}=(7.50\,\pm \,0.12)\times 1{0}^{-5}\,{\mathrm{eV}}^{2}$$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the design of the detector and indicate the readiness of JUNO for resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights the potential of JUNO to push the frontiers of precision neutrino physics and paves the way for its broad scientific programme. The first data of the Jiangmen Underground Neutrino Observatory deliver high-precision neutrino oscillation parameters, improving measurements and demonstrating readiness to determine neutrino mass ordering.

23.
arXiv (CS.CV) 2026-06-17

Pareto LoRA: Mitigating Modality Imbalance in Unified Multimodal Models via Pareto-Optimal Gradient Integration

Unified multimodal models (UMMs) have recently emerged as a promising paradigm for integrating multimodal understanding and generation within a single autoregressive transformer. However, during multimodal instruction tuning, these models often exhibit pronounced modality imbalance: language gradients dominate optimization, thus leading to lower image generation quality, especially under parameter-efficient fine-tuning such as LoRA. In this work, we systematically analyze modality imbalance in LoRA-based fine-tuning of UMMs for interleaved text-image generation. We show that vision modality performance degrades substantially more than text modality performance when compared to unimodal counterparts, and that modality-specific gradients can differ by orders of magnitude across various tasks and layers. Motivated by this observation, we reformulate the multimodal instruction tuning as a bi-objective optimization problem and propose Pareto LoRA, a Pareto-optimal gradient integration strategy that balances the text and image objectives by modulating the gradient direction and strength. Experiments on the CoMM benchmark with Emu2 demonstrate that Pareto LoRA consistently improves multimodal generation balance, achieving up to 44.9% gains in perceptual image quality over vanilla LoRA while maintaining comparable text performance.

24.
arXiv (CS.AI) 2026-06-19

Variable-Length Tokenization via Learnable Global Merging for Diffusion Transformers

arXiv:2606.20076v1 Announce Type: cross Abstract: Latent Diffusion Models (LDMs) have become dominant in visual synthesis, but their quality-compute trade-off is largely constrained by the tokenizer's fixed compression ratio. Variable-length tokenizers (VLTs) promise adaptive compression by varying token counts, allowing diffusion models to flexibly balance quality and compute. However, conventional VLTs modulate length by truncating ordered token sequences, which makes token semantics depend on token position and breaks representational alignment across lengths. This leads to a cross-length shift in the latent distribution that hinders a single variable-length diffusion model from operating effectively. To address this, we propose a novel variable-length tokenizer that modulates length by merging tokens. We show that encouraging similar tokens to merge enables direct cross-length representation alignment when the diffusion transformer operates according to the merging pattern. Since conventional merging methods are data-dependent, making the merging pattern inaccessible during generation, we introduce learnable global merging, which is data-independent, to ensure compatibility with diffusion transformers. On ImageNet 256$\times$256 generation, our merging-based variable-length tokenizer integrated with a diffusion transformer achieves a superior gFID-compute trade-off compared to prior VLT methods. Code is available at [this https URL](https://github.com/movinghoon/lgm)

25.
Nature (Science) 2026-06-17

Spatial distribution of the proteome in the human body and in cancers

Authors:

A detailed, spatially resolved quantitative map of the human proteome is essential for a deeper understanding of human biology and disease1–4. Here we present a comprehensive human proteomic landscape, generated by profiling more than 13,000 proteins across 2,856 samples using data-independent acquisition mass spectrometry. The dataset spans 58 major tissue types, 251 specific tissue subtypes and 25 distinct carcinomas. This resource enables the depiction of spatially resolved proteome trajectories across tissue types and physiological states, including fetal, tumour, adjacent non-tumour and healthy adult tissue, thereby providing insight into both developmental processes and oncogenic progression. Furthermore, quantitative proteomics comparisons across diverse tissue types and states facilitate the indication of organ-specific toxicity, the identification of repurposable anticancer drug candidates and the prioritization of therapeutic targets for cancers. This study establishes a quantitative resource for navigating the proteome in the human body and in common cancers. A spatially resolved map of the human proteome across a variety of healthy tissues and cancers provides wide-ranging insights in developmental biology and oncology, and could aid the identification of therapeutic targets and development of treatments for cancer.