Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier

arXiv:2606.13236v1 Announce Type: cross Abstract: Passive acoustic monitoring holds great promise for ecological inference, yet existing automated tools are typically narrowly trained and non-transferable. We address these limitations with PULSE, a semi-supervised, multi-task framework for Orthoptera bioacoustics, combining weakly-supervised species classification, self-supervised learning on unlabelled field audio, and knowledge distillation from a general-purpose bioacoustic model. Our domain-adapted specialist model outperforms a state-of-the-art general model across all metrics (macro F1: 0.21 vs. 0.07; AUC: 0.74 vs. 0.45; AP: 0.32 vs. 0.19), with active learning further raising F1 to 0.34 and AUC to 0.84. Beyond classification, the learned embeddings encode ecologically meaningful structure, exposed through an interactive visualisation tool for ecological discovery.

02.
arXiv (CS.LG) 2026-06-19

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

arXiv:2606.19491v1 Announce Type: new Abstract: Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, the one exception (Gemma$4$-$31$B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.

03.
arXiv (CS.LG) 2026-06-24

Low-rank Updates in Slowly Time-varying Graphs for Spatial-Temporal Signal Interpolation

arXiv:2606.24011v1 Announce Type: cross Abstract: A crucial assumption in graph signal processing (GSP) is the existence of an underlying graph that captures the pairwise similarities between nodes, allowing filters to be designed based on this graph for tasks such as denoising. For spatial-temporal data in which node-to-node similarities evolve over time, a static spatial graph is insufficient. In this paper, to represent slowly time-varying pairwise relationships, we model the graph changes in two consecutive adjacency matrices $P = W^{(2)} - W^{(1)}$ across time as a low-rank matrix. % Specifically, given an initial adjacency matrix $W^{(1)}$ at time $t=1$, we jointly interpolate a signal $x_2$ and estimate $W^{(2)}$ at $t=2$ using both a graph signal smoothness prior for $x_2$ and a low-rank prior on $\P$. We alternate optimization steps. With $W^{(2)}$ fixed, $x_2$ is interpolated by solving a linear system. Alternatively, holding $x_2$ fixed, $W^{(2)}$ is updated via proximal gradient descent (PGD). The proximal mapping of the rank term $Gamma(W^{(2)} - W^{(1)})$ is approximated in linear time using a fast orthogonal matching pursuit (OMP) algorithm that selects a sparse combination of atoms from a dictionary $cR$ formed by the outer products of $W^{(1)}$'s eigenvectors. We unroll iterations of our algorithm into layers to build a lightweight neural network for limited data-driven parameter tuning. Experiments show that our joint optimization achieves better signal interpolation compared to existing time-varying graph models.

04.
medRxiv (Medicine) 2026-06-12

Does the method matter? Evaluating the effectiveness, efficiency and ease of hearing-aid gain self-adjustment

In conventional hearing-aid personalisation, clinicians cannot hear what their patients hear, and patients cannot often reliably detect or describe what they hear. Self-adjustment avoids this issue but requires user controls that adjust hearing-aid signal processing parameters to be effective, efficient and easy. In this study, we explored (a) the roles of interface complexity and stimulus type in the self-adjustment of hearing-aid gain, and (b) how well individuals can adjust one sound to match another to assess the same interfaces and stimuli. Adult hearing-aid users with mild to moderate symmetrical sensorineural hearing loss repeatedly adjusted the gain (a) to their preference from individual prescription (n = 41) and (b) to match their previous preferences from a random starting point (n = 32) using three interfaces representing different bass/mid/treble configurations and three stimuli (music, speech and speech-in-noise). The large interindividual variability in self-adjusted gains clustered into three patterns of deviation from initial prescription: increased relative bass, overall gain reduction, and close to initial prescription. There were no substantial effects of interface nor stimulus on self-adjustment reliability (median {sigma} = 2.8 dB), whereas absolute sound-matching error increased with increasing interface complexity and centre frequency. Neither individual matching accuracy nor questionnaire responses predicted either self-adjusted gains or reliability. Overall, these results show that many - but not all - hearing-aid users can adjust gains with reasonable reliability, and while it can be difficult to predict the behaviour from the individual, the individual applies a similar self-adjustment behaviour across different interfaces and stimuli.

05.
arXiv (CS.CL) 2026-06-25

Graph-Based Phonetic Error Correction of Noisy ASR

Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors are often structured, arising from phonetic similarity rather than random noise, making naive token-level correction insufficient. We propose a structured ASR correction framework, that we call G-SPIN, that combines phonetic graph modeling with contextual language understanding. A graph neural network (GNN) first constructs acoustically plausible candidate neighborhoods for flagged tokens, explicitly restricting the correction search space to phonetic alternatives. A masked language model (MLM) then provides local contextual scoring, and an instruction-tuned large language model (LLM) performs final context-aware re-ranking over this compact candidate set. By decoupling structured phonetic reasoning from contextual semantic selection, our method avoids unconstrained generation while improving correction accuracy. The framework is lightweight, modular, and operates entirely at inference time.

06.
medRxiv (Medicine) 2026-06-17

Brain age gap correlates with DTI-derived microstructural abnormalities in multiple sclerosis.

Background: Brain age gap (BAG) is increased in multiple sclerosis (MS), but whether it reflects microstructural pathology beyond conventional atrophy remains unclear. Objective: To test whether BAG is elevated in MS and correlates with conventional and diffusion tensor imaging (DTI) abnormalities relative to healthy controls. Methods: A case-control study of 43 people with MS and 18 healthy controls was performed. BAG was estimated from T1-weighted MRI using brainageR. Controls were used as MRI reference distributions. MRI values were expressed as deviation z-scores and correlated with BAG within MS. Conventional MRI and DTI domains were analysed using age/sex-adjusted partial correlations with domain-wise Benjamini-Hochberg FDR correction, where appropriate. Results: BAG was higher in MS than controls (4.79 vs -2.58 years; p

07.
arXiv (CS.AI) 2026-06-11

Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends

arXiv:2606.12207v1 Announce Type: cross Abstract: Embodied intelligence now spans navigation, household assistance, manipulation, autonomous driving, aerial agents, and multimodal large-model control. This expansion has made benchmark construction a central bottleneck for reliable evaluation. Unlike static datasets, embodied benchmarks combine task specifications, environments, robot data, demonstrations, annotations, metrics, evaluation scripts, and release policies into a single evaluation system. This survey reviews the literature through a five-stage construction pipeline: requirement and task construction, data acquisition, data cleaning and annotation, benchmark suite generation and metric definition, and evaluation execution with diagnostic feedback. For each stage, the survey analyzes the transition from manual curation to traditional automation, foundation-model assistance, and agentic closed-loop workflows. It also compares qualitative construction costs across human labor, data and asset acquisition, compute and simulation, validation and debugging, governance and maintenance, and rework risk. The main conclusion is that automation does not simply reduce benchmark cost. Instead, it often shifts cost toward validation, auditability, version control, and long-term governance. Progress in embodied evaluation will therefore depend not only on larger benchmark suites, but also on construction pipelines that are diagnosable, auditable, and responsibly refreshable.

08.
arXiv (CS.LG) 2026-06-19

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

arXiv:2606.20107v1 Announce Type: new Abstract: Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.

09.
arXiv (CS.AI) 2026-06-24

Sensing Intelligence as a Trainable Metamaterial Property

arXiv:2605.23967v2 Announce Type: replace-cross Abstract: In biological systems, sensing is not performed by the brain alone: the body deforms, vibrates, and filters external stimuli before they are transduced into neural signals. In engineered systems, this processing burden is placed largely on electronics and computation, while the mechanical body is usually designed only for strength and stability. Here, we present sensing intelligence as a trainable property of the body. We show that the geometry of a metamaterial can be optimized to reshape external stimuli into internal signals that are easier for a neural network to interpret. Rather than hand-designing this physical preprocessing, we let the neural network train its own body for sensing by backpropagating the sensing loss to the body's design parameters through differentiable simulation. Across numerical and experimental sensing scenarios, the optimized body improves sensing accuracy by up to fivefold or reduces the number of required electronic sensors by nearly an order of magnitude.

10.
arXiv (CS.CL) 2026-06-11

Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation

While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability to adhere to a provided reference. We identify a critical failure mode of such reference-based LLM QA evaluation: when the provided reference conflicts with the judge model's parametric knowledge, the resulting scores become unreliable, substantially degrading evaluation fidelity. To study this phenomenon systematically, we introduce a controlled swapped-reference QA framework that induces reference-belief conflicts. Specifically, we replace the reference answer with an incorrect entity and construct diverse pairings of original and swapped references with correspondingly aligned candidate answers. Surprisingly, grading reliability drops sharply under swapped references across a broad set of judge models. We empirically show that this vulnerability is driven by judges' over-reliance on parametric knowledge, leading judges to disregard the given reference under conflict. Finally, we find that this failure persists under common prompt-based mitigation strategies, highlighting a fundamental limitation of LLM-as-a-judge evaluation and motivating reference-based protocols that enforce stronger adherence to the provided reference.

11.
bioRxiv (Bioinfo) 2026-06-24

Statistical tests for bivariate spatial association across multi-omics data with disjoint coordinates

Spatial biology has entered a new era of multimodal profiling, with multiple, high-dimensional spatial omics types being measured on consecutive tissue slices, or co-assayed on the same slice. Interest then lies in statistical testing for spatial association between the features of the different modalities, to gain insight in biological processes. One major challenge is the multitude of bivariate combinations, leading to high computational demands. Another difficulty is the difference in spatial resolution between technologies, implying no one-to-one matching between the measurement spots of the two modalities, even after alignment. As a result, common statistical measures such as joint distributions and correlations are not defined, and tests need to rely on spatial vicinity only. Moreover, we argue that many existing bivariate association tests address an inappropriate null hypothesis, or make inappropriate assumptions, both implying absence of spatial autocorrelation in any of the features and leading to misleading conclusions. As a remedy, we modify tests for the detection of spatially variable genes (Moran's I, Gaussian processes and generalized additive models (splines)) to derive bivariate tests across modalities with non-overlapping coordinate sets and provide variance estimators that do account for spatial autocorrelation. We develop inference methods for single sections as well as for replicated experiments with multiple sections, and compare their performance in nonparametric and parametric simulations. Finally, we apply the newly developed methods to two co-assayed spatial transcriptomics and metabolomics datasets from mouse and human. The full suite of tests is available from github.com/sthawinke/sbivar as the R-package sbivar.

12.
medRxiv (Medicine) 2026-06-11

Ferritin across long-term conditions in England: cross-sectional primary care study

Background Iron deficiency (ID) is a readily treatable condition once identified. Ferritin is the primary diagnostic marker, but cut-offs vary and inflammation complicates interpretation in patients with long-term conditions (LTCs). Aim To describe ferritin distribution and the prevalence of threshold-defined low ferritin in adults with and without LTCs in primary care. Design and setting Cross-sectional observational study using routinely collected electronic health records from a national primary care database in England (1st January 2015 to 31st December 2021). Method Adults with >1 ferritin test in Clinical Practice Research Datalink (CPRD) Aurum were included. LTCs were identified using validated primary-care code lists. Outcomes included ferritin distribution and threshold-defined ID prevalence using World Health Organization (WHO) (

13.
arXiv (CS.LG) 2026-06-17

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv:2606.17526v1 Announce Type: new Abstract: Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step-sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step-sizes to the rest. As a nearly {plug-and-play} module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.

14.
arXiv (CS.LG) 2026-06-18

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

arXiv:2606.19186v1 Announce Type: cross Abstract: Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) Extreme class imbalance where delayed/false triggers are overwhelmed by true triggers; (2) Asymmetric label noise where mislabeled majority samples (true triggers) suppress minority samples (delayed/false triggers) learning. To overcome these challenges, we propose two key innovations: (1) Specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization

15.
arXiv (math.PR) 2026-06-15

Longest weakly increasing subsequences of discrete random walks on the integers with heavy tailed distribution of increments

arXiv:2603.29047v2 Announce Type: replace-cross Abstract: We investigate the behavior of the length of the longest weakly increasing subsequences (weak LIS) of $n$-step random walks with nonzero integer increments $k = \pm 1, \pm 2, \dots$ given by a symmetric heavy tailed mass distribution proportional to $|k|^{-1-\alpha}$ for several values of the real parameter $\alpha > 0$ together with that of the simple random walk ($k=\pm 1$), to which the $n$-step heavy tailed walks reduce when $\alpha$ grows large enough that step jumps beyond $\pm 1$ become essentially absent on the scale of $n$. By means of exploratory fits, weighted nonlinear least squares, and nested-model comparisons, we found that the sample average length $\langle{L_{n}}\rangle$ scales like $\langle{L_{n}}\rangle \sim \sqrt{n}\log{n}$ when the distribution of increments has finite variance ($\alpha > 2$) and $\langle{L_{n}}\rangle \sim n^{\theta}$ with a varying exponent $\theta > 0.5$ when the variance is infinite ($\alpha \leq 2$). Distributional diagnostics indicate that the bulk of the $L_{n}$ distribution is very well-approximated by a lognormal model, though systematic deviations are observed in the tails. Our results corroborate and expand upon previous results for the LIS of other types of heavy-tailed random walks and raise a conjecture as to whether the distribution of $L_{n}$ is given, or can be effectively described, by a lognormal distribution.

16.
arXiv (CS.CV) 2026-06-12

ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation

Customizing image generation remains a core challenge in controllable image synthesis. For single-concept generation, maintaining both identity preservation and prompt alignment is challenging. In multi-concept scenarios, relying solely on a prompt without additional conditions like layout boxes or semantic masks, often leads to identity loss and concept omission. In this paper, we introduce ShowFlow, a comprehensive framework designed to tackle these challenges. We propose ShowFlow-S for single-concept image generation, and ShowFlow-M for handling multiple concepts. ShowFlow-S introduces a KronA-WED adapter, which integrates a Kronecker adapter with weight and embedding decomposition, and together with a novel Semantic-Aware Attention Regularization (SAR) training objective to enhance single-concept generation. Building on this foundation, ShowFlow-M directly reuses robust models learned by ShowFlow-S to support multi-concept generation without extra conditions, incorporating a Subject-Adaptive Matching Attention (SAMA) and a Layout Consistency guidance as the plug-and-play module. Extensive experiments and user studies validate ShowFlow's effectiveness, highlighting its potential in real-world applications like advertising and virtual dressing. Our source code will be publicly available at: https://htrvu.github.io/showflow.

17.
arXiv (CS.CV) 2026-06-16

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation

In the context of novel view synthesis, 3D Gaussian Splatting (3DGS) has recently emerged as an efficient and competitive counterpart to Neural Radiance Field (NeRF), enabling high-fidelity photorealistic rendering in real time. Beyond novel view synthesis, the explicit and compact nature of 3DGS enables a wide range of downstream applications that require geometric and semantic understanding. This survey provides a comprehensive overview of recent progress in 3DGS applications. It first reviews the reconstruction preliminaries of 3DGS, followed by the problem formulation, 2D foundation models, and related NeRF-based research areas that inform downstream 3DGS applications. We then categorize 3DGS applications into three foundational tasks: segmentation, editing, and generation, alongside additional functional applications built upon or tightly coupled with these foundational capabilities. For each, we summarize representative methods, supervision strategies, and learning paradigms, highlighting shared design principles and emerging trends. Commonly used datasets and evaluation protocols are also summarized, along with comparative analyses of recent methods across public benchmarks. To support ongoing research and development, a continually updated repository of papers, code, and resources is maintained at https://github.com/heshuting555/Awesome-3DGS-Applications.

18.
arXiv (math.PR) 2026-06-11

An Information-Theoretic Analysis of Threshold Group Testing

arXiv:2606.11353v1 Announce Type: cross Abstract: We study the Threshold Group Testing (TGT) problem in the noiseless and non-adaptive setting, where the objective is to exactly recover a sparse binary vector from pooled tests, using as few tests as possible. In TGT, each test applied to a subset of items returns a positive outcome if the number of 1's (defective items) in that subset meets or exceeds a specified threshold, and has a negative outcome otherwise. We investigate how the complexity of TGT compares to that of Classical Group Testing (CGT), corresponding to the special case of the threshold equal to one, and analyse the impact of increasing the threshold on the required number of tests. Our main contribution is the derivation of a sharp information-theoretic phase transition at $c_{\mathrm{inf}}^{\mathrm{TGT}}k\log(n/k)$ (non-adaptive) tests for TGT within the constant-column test design. The threshold constant $c_{\mathrm{inf}}^{\mathrm{TGT}}$ is expressed as a function of the prevalence of defectives and the threshold value. Our upper bound is derived under an analytic assumption, and we verify that this assumption is satisfied for a threshold value of 2. The value of $c_{\mathrm{inf}}^{\mathrm{TGT}}$ reveals that TGT on the constant-column design has the same information-theoretic behaviour as CGT in the low-prevalence regime. Yet, strikingly, at higher prevalences, the threshold leads to a significant reduction in the number of tests. On the other hand, we provide evidence that when the asymptotic proportion of defective items is positive, TGT actually becomes strictly harder than CGT (excluding trivial reductions).

19.
arXiv (CS.CV) 2026-06-18

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce \method – a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

20.
arXiv (CS.LG) 2026-06-16

Convergence Rate Analysis of the AdamW-style Shampoo: Unifying One-Sided and Two-Sided Preconditioning

arXiv:2601.07326v4 Announce Type: replace-cross Abstract: This paper studies AdamW-style Shampoo, an effective variant of the classical Shampoo that won the external tuning track of the AlgoPerf neural network training competition. Our analysis unifies one-sided and two-sided preconditioning. When the exponents of the two preconditioners sum to $1/2$, we establish the convergence rate $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$, where $K$ represents the number of iterations, $(m,n)$ denotes the dimensions of the matrix-valued parameters, and $C$ matches the constant appearing in the optimal convergence rate of SGD. Theoretically, the nuclear norm and Frobenius norm satisfy $||\nabla f(X)||_F\leq ||\nabla f(X)||_*\leq \sqrt{\min\{m,n\}}||\nabla f(X)||_F$, which suggests that our convergence rate is analogous to the optimal $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case where $||\nabla f(X)||_*= \Theta(\sqrt{\min\{m,n\}})||\nabla f(X)||_F$ and $m$ and $n$ are of comparable magnitude. Then, we extend our analysis to settings where the preconditioning exponents do not sum to 1/2, and establish convergence with an explicit but more involved rate.

21.
arXiv (CS.CV) 2026-06-12

OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

We present OpenMedQ, a medical vision-language model pretrained on the broadest fully-open medical mix to date: 14 datasets totaling ~3.35M pretraining samples spanning pathology, radiology, microscopy, and text-only clinical QA. OpenMedQ reaches state-of-the-art BLEU-1 on PathVQA (75.9), beating Med-PaLM M variants up to 562B parameters (~80x larger), and matches the best reported VQA-MED BLEU-1 (64.5). Its vision encoder, transferred to 8 unseen medical classification benchmarks under an identical downstream recipe, obtains the highest average macro-F1 (0.757) among BiomedCLIP (0.745), PMC-CLIP (0.745), PubMedCLIP (0.746), and a from-scratch baseline (0.616). We release our code and an interactive demo is publicly available as a reproducible baseline for the community.

22.
arXiv (CS.LG) 2026-06-16

Assessing Predictive Models for Fairness Based on Movement Patterns

arXiv:2605.23234v3 Announce Type: replace Abstract: Assessing the spatial fairness of predictive models involves establishing whether they are statistically penalizing (favoring) individuals associated with certain geographical locations. Literature on this topic makes the fundamental assumption that each individual is assigned to a single geographical location (e.g., place of residence). However, fairness with respect to the set of locations where one has been, i.e., their movement patterns over different regions, also matters when fairness is considered. Consequently, we argue that it is necessary to generalize the notion of spatial fairness to also include movement patterns, leading to the novel problem of assessing predictive models for fairness relative to the movements of individuals. To deal with this problem, we propose an approach that first associates the movements of individuals to certain geographic regions, considering multiple spatial partitions with different resolutions and alignments, and then employs a suitable spatial scan statistic to assess whether a predictive model is fair based on movement patterns. In the experimental evaluation, we study the performance of our approach over thousands of synthetic unfair datasets, showing that it is effective at detecting this new type of unfairness and at retrieving the set of objects treated unfairly, while localization performance exhibits a consistent multi-resolution trade-off.

23.
arXiv (quant-ph) 2026-06-25

Quantum Optimal Control Using MAGICARP: Combining Pontryagin's Maximum Principle and Gradient Ascent

arXiv:2505.21203v2 Announce Type: replace Abstract: We introduce the MAGICARP algorithm, a numerical optimization method for quantum optimal control problems that combines the structure provided by Pontryagin's Maximum Principle (PMP) and the robustness of gradient ascent techniques, such as GRAPE. MAGICARP is formulated as a "shooting technique", aiming to determine the appropriate initial adjoint momentum to realize a target quantum gate. This method naturally incorporates time and energy optimal constraints through a PMP-informed pulse structure. We demonstrate MAGICARP's effectiveness through illustrative numerical examples, comparing its performance to GRAPE and highlighting its advantages in specific scenarios.

24.
arXiv (CS.CV) 2026-06-18

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I$^2$SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I$^2$SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I$^2$SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, representing over a 700-fold speedup compared to cDDPM (135 s) and surpassing DiffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency indicates that I$^2$SB has potential for real-time or clinical deployment.

25.
arXiv (CS.AI) 2026-06-24

LemonHarness Technical Report

arXiv:2606.24311v1 Announce Type: new Abstract: As large language model (LLM) agents are applied to longer tasks, they increasingly modify workspace state across multiple rounds of iteration. However, agents typically observe only tool outputs and log fragments, while the actual state changes occur in the file system. Without explicit workspace boundaries, state-changing operations such as file writes and temporary artifact generation may scatter changes across paths. Over time, these weakly constrained changes accumulate, making states such as modified files difficult to track. This paper presents LemonHarness, an integrated execution framework for long-horizon agents. LemonHarness establishes an explicit execution boundary by constraining state-changing operations within a clearly defined workspace and bringing model invocation, tool execution, and rule knowledge within a single controlled boundary. State-changing operations, including file writes, dependency installation, and temporary artifact creation, are executed through structured tool interfaces, with execution feedback recorded as observations available to subsequent model decisions. The system also introduces a reusable rule knowledge base, which turns recurring execution rules and acceptance criteria into runtime knowledge. LemonHarness further adds a time-aware execution mechanism that exposes elapsed and remaining budget to the model, so it can rebalance exploration, implementation, and validation effort as time pressure shifts and avoid timeouts from long waits or excessive verification. On Terminal-Bench 2.0, LemonHarness_GPT-5.3-CodeX reached 84.49% accuracy over 445 trials; pairing the same framework with the stronger GPT-5.5 backbone raised the average accuracy to 86.52% across five jobs. The results suggest that a unified runtime boundary, callable rule knowledge, and time-aware execution can improve the stability of long-horizon agent execution.