Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-19

Steady-state entanglement of spin qubits mediated by nonreciprocal and chiral magnons

arXiv:2509.13094v3 Announce Type: replace Abstract: We propose a hybrid quantum system in which a magnet supporting non-reciprocal magnons, chiral magnons, or both mediates the dissipative and unidirectional coupling of spin qubits. By driving the qubits, the steady state of this qubit-qubit coupling scheme becomes the maximally entangled Bell state. We devise a protocol where the system converges to this entangled state and benchmark it including qubit decay and dephasing. The protocol is numerically tested on a hybrid system consisting of nitrogen-vacancy (NV) centers coupled to magnon surface modes of an yttrium iron garnet (YIG) film. We show that the dephasing time of the NV centers forms the bottleneck for achieving the entanglement of NV centers separated by a distance within the magnon coherence length. Our findings identify the key technological requirements and demonstrate a viable route toward steady-state entanglement of solid-state spins over distances of several microns using magnonic quantum networks, expanding the toolbox of magnonics for quantum information purposes.

02.
arXiv (CS.AI) 2026-06-11

MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

arXiv:2606.12018v1 Announce Type: new Abstract: We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, multi-modal data pertinent to social intelligence is precisely localized. Furthermore, relevant long-tail events are identified, extracted, and rendered as formatted, explicit text. This formatting strategy prevents critical long-tail information from being overshadowed by head events and environmental noise during the tokenization process. Specifically, we integrate Test-Time Adaptation (TTA) across the entire reasoning pipeline, encompassing the extraction and representation of long-tail events, Chain-of-Thought (CoT) prompting, and self-reflection. This TTA mechanism is also distillation-enhanced, utilizing Low-Rank Adaptation (LoRA) to fine-tune the foundation model exclusively for instance-level reasoning. Extensive evaluations against various open-source and proprietary AI models across multiple benchmarks demonstrate the effectiveness of the proposed framework. With around 30% of training data from IntentTrain, we achieve state-of-the-art results. Codes are available at https://github.com/eeee-sys/MODF-SIR, demo is available at https://huggingface.co/spaces/Harry-1234/MODF-SIR, LoRA is available at https://huggingface.co/Harry-1234/MODF-SIR and the dataset for training router is available at https://huggingface.co/datasets/Harry-1234/IntentRouterTrain.

03.
arXiv (CS.CV) 2026-06-15

OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains

Current automated pipelines for audio-visual Question Answering (QA) generally adopt a ``video-caption-QA'' paradigm. However, these methods typically segment videos into short clips and generate separate descriptions for audio and visual modalities. This decoupled processing severs inherent associations between sounds and their visual sources, while independent clip processing often causes inconsistent descriptions of the same entity across segments. Furthermore, coupling long-text comprehension and QA synthesis into a single step often restricts models to localized events, yielding questions lacking long-term temporal connections and deep cross-modal reasoning. To address these issues, we propose an automated data engine featuring two mechanisms: (1) Entity-Anchored Video Scripting transforms videos into structured scripts, comprising summaries, main entity lists, and segment-wise audio-visual descriptions. The entity list serves as a global prior to ensure cross-segment referential consistency and reconstruct audio-visual associations. (2) Clue-Guided QA Generation prompts models to first mine cross-segment, multimodal clues from the script, and subsequently generate QA pairs based on these high-value clues. Leveraging this pipeline, we construct the instruction-tuning dataset OmniVideo-100K and a human-verified test set, OmniVideo-Test. Fine-tuning VITA-1.5, Qwen2.5-Omni-7B and Qwen3-Omni-30B on OmniVideo-100K yields performance gains of up to 20.59% on OmniVideo-Test, demonstrating strong generalization (up to 12.64% improvements) across established benchmarks like Daily-Omni and JointAVBench.

04.
arXiv (CS.LG) 2026-06-15

DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

arXiv:2606.14116v1 Announce Type: new Abstract: The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and it clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

05.
arXiv (CS.CV) 2026-06-11

FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback

We present FronTalk, a benchmark for front-end code generation that pioneers the study of a unique interaction dynamic: conversational code generation with multi-modal feedback. In front-end development, visual artifacts such as sketches, mockups and annotated creenshots are essential for conveying design intent, yet their role in multi-turn code generation remains largely unexplored. To address this gap, we focus on the front-end development task and curate FronTalk, a collection of 100 multi-turn dialogues derived from real-world websites across diverse domains such as news, finance, and art. Each turn features both a textual instruction and an equivalent visual instruction, each representing the same user intent. To comprehensively evaluate model performance, we propose a novel agent-based evaluation framework leveraging a web agent to simulate users and explore the website, and thus measuring both functional correctness and user experience. Evaluation of 20 models reveals two key challenges that are under-explored systematically in the literature: (1) a significant forgetting issue where models overwrite previously implemented features, resulting in task failures, and (2) a persistent challenge in interpreting visual feedback, especially for open-source vision-language models (VLMs). We propose a strong baseline to tackle the forgetting issue with AceCoder, a method that critiques the implementation of every past instruction using an autonomous web agent. This approach significantly reduces forgetting to nearly zero and improves the performance by up to 9.3% (56.0% to 65.3%). Overall, we aim to provide a solid foundation for future research in front-end development and the general interaction dynamics of multi-turn, multi-modal code generation. Code and data are released at https://github.com/shirley-wu/frontalk

06.
arXiv (CS.LG) 2026-06-15

High-Frequency Pricing at Scale for E-Commerce

arXiv:2606.13741v1 Announce Type: new Abstract: This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach combining daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.

07.
bioRxiv (Bioinfo) 2026-06-18

Metrics for Evaluating Biological AI Model Predictive Accuracy at the Data-Substrate Level

作者:

Reports in the biological literature disagree on whether a given model can predict a biological outcome from a given data sample — one study finding a model capable, another, on the same kind of data, finding it is not. This is particularly a challenge in relation to LLMs–where the models are large and opaque, with weights and training data inaccessible.textbf{ }Such disagreements cannot be settled by directly inspecting the model. To address this challenge, we considertextbf{ }an alternative approach: assessing whether the data sample is adequate to support the prediction asserted. For a given dataset, its substrate — the underlying structure of the data — determines what any model can recover, independent of architecture or capacity. At the same time, predicting the present state of a biological process and predicting the direction of its future change are different tasks; the second is supportable among AI models only where the data encode direction as determinable from the state — a property we call encoding — and is unsupportable where the same observed state precedes change in opposite directions — a property we call non-identifiability, in the informational rather than the statistical sense. We introduce two generic metrics, Predictive Blindness Risk (PBR) and Prediction Indeterminacy Measure (PIM), that evaluate a data substrate for predictive accuracy directly — without access to model weights, architecture, or training data — and locate the regions of a data substrate where a predictive claim can be supported and where it cannot. Using human biological subjects, we employ the Yale Brain Metastases Longitudinal Data (1,430 human subjects; 11,892 MRI studies; four sequences) and show that direction of change was non-identifiable across regions encompassing the majority of transitions; a nonlinear AI model gained essentially nothing over majority-direction prediction there while recovering direction near-perfectly where the state encoded it; and model accuracy tracked data-substrate resolvability continuously (Spearman {rho} = -0.95 to -1.00). The metrics adjudicate, before any model is trusted and from the data alone, where claims of predictive accuracy — of state, or of the law of change — can be supported.

08.
arXiv (CS.CL) 2026-06-16

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

作者:

Where an LLM sits in an agent memory pipeline – between the recall plane that retrieves stored facts (extensively benchmarked) and the control plane that mutates them via supersede, release, purge (largely untested) – shapes which forgetting failure modes the system recovers. Comparing thirteen system configurations on a 385-case adversarial surface, we observe three placement regimes with partly complementary coverage: deterministic primitives suffice for lexical/temporal categories but fail canonicalization (5% on identifier-obfuscation, 0% on cross-lingual); inscribe-time LLM recovers canonicalization (100%) but cannot help intent-aware deletion (0% on prefix-collision and compound-fact); a mutation-time hook recovers intent-aware deletion (78-85%) and brightens nearly all categories simultaneously (91.7-93.2% overall, $0.17 per 385-case run, 2.3s/case mutation latency vs. 64-191ms/case deterministic, recall path unchanged). We expose the trade-off via ForgetEval, a 1000-case templated suite plus a 385-case adversarial layer (132 hand-crafted + 253 LLM-drafted oracle-validated) scored by deterministic substring match, paired with a six-method Adapter Protocol with honest N/A scoring that lets heterogeneous memory stores enter in 130 lines. Admission is corroborated by 10-annotator IAA (Fleiss' kappa = 0.958) and a 77-case external-authored subset (four blind contributors) that replicates the canonicalization asymmetry and amplifies the joint-placement lift (+27.8 pt). Production failures are predominantly forgetting failures rather than recall failures, yet existing benchmarks measure only recall. ForgetEval and all adapters are released under MIT.

09.
arXiv (CS.CL) 2026-06-16

It's About Time: Temporal References in Emergent Communication

Emergent communication enables agents to develop bespoke languages that improve communication efficiency. Despite the known importance of temporal structure in natural language, there is no existing evidence of temporal references in emergent communication. This paper addresses this gap, by exploring how agents communicate about temporal relationships. We analyse three potential factors for the emergence of temporal references: environmental, external, and architectural. Our experiments demonstrate that altering the loss function is insufficient for temporal references to emerge; rather, architectural changes are necessary. A minimal change in agent architecture, using a different batching method, allows the emergence of temporal references. This modified design is compared with the standard architecture in a temporal referential games environment, which emphasises temporal relationships. The analysis shows that over 95% of the agents with the modified batching method develop temporal references, without changes to their loss function. We consider temporal referencing necessary for future improvements to the agents' communication efficiency, enabling future agents to use a closer to optimal coding as compared to purely compositional languages. These insights provide the basis for incorporation of temporal references into other emergent communication settings, and investigation of other aspects of language.

10.
arXiv (quant-ph) 2026-06-17

Canonical regularization of the stationary Coulomb problem and an Aufbau-like spectral ordering

arXiv:2606.17359v1 Announce Type: new Abstract: The stationary hydrogen atom has Coulomb degeneracy across orbital levels, whereas the Aufbau/Madelung ordering is an empirical, many-electron rule established in atomic physics. We examine the hydrogen atom through a regularized de Broglie–Bohm representation, in which stationary amplitude current constraints generate separable Sturm–Liouville branches. In this formulation, the radial, orbital, and magnetic sectors acquire canonical Langer-like inverse square corrections. The modified boundary value problems allow analytical solutions and produce a hydrogen-like spectrum with regularized radial and angular indices. Consequently, radial Coulomb quantization acquires an orbital dependent shift, lifting the Coulomb degeneracy and producing a spectral ordering that follows the Aufbau/Madelung sequence. On this basis, we construct the ordering of the regularized de Broglie–Bohm states and show that the spectral structure retains the standard degenerate Rydberg sequence in the l=0 sector. The separated amplitudes are represented by generalized special function branches, including the associated Laguerre, Legendre, and Bessel functions with non-integral parameters arising from regularized separation. Therefore, the treatment is intended as an analytical examination of spectral ordering in a regularized one center Coulomb problem rather than as a replacement for the many electron atomic structure theory. Keywords: de Broglie–Bohm representation; Coulomb spectrum; canonical regularization; Langer correction; Sturm–Liouville equations; Aufbau principle; Madelung ordering; associated Legendre functions; associated Laguerre functions; Bessel functions.

11.
arXiv (CS.CV) 2026-06-16

ATV-Net: Adaptive Triple-View Network with Dynamic Feature Fusion

Recent advances in semantic segmentation rely heavily on attention-based and transformer-style architectures that, while accurate, introduce considerable architectural complexity and computational cost. This paper asks whether a compact CNN-based segmentation head can remain competitive by adaptively selecting useful receptive-field evidence. We propose ATV-Net, an Adaptive Triple-View Network that attaches a lightweight head to a conventional backbone. The head organizes three complementary views – point-wise, neighborhood-level, and enlarged context – and fuses them through an Adaptive Decision Gate that generates image-dependent weights from global feature statistics. This allows the model to emphasize different receptive-field responses according to scene content, without dense attention or multi-scale aggregation. Experiments on Cityscapes and Pascal VOC 2012 show that ATV-Net achieves 80.31% mIoU on Cityscapes with ResNet-101 and 80.90% with ConvNeXt-Tiny, and 86.7% and 88.5% mIoU on Pascal VOC 2012, respectively, while requiring fewer GFLOPs than representative context-aggregation and attention-based heads. The results indicate that adaptive receptive-field selection remains a practical and effective design choice for CNN-based semantic segmentation.

12.
arXiv (CS.CV) 2026-06-25

StyleFusion360: View-Consistent Head Stylization via Adaptive Style Modulation

3D head stylization enables expressive reimagining of human faces for creative visual experiences in digital media. Existing 3D-aware methods often require computationally intensive optimization or per-style fine-tuning, limiting flexibility and user control. To overcome these challenges, we introduce StyleFusion360, a diffusion-based framework for multi-view consistent, identity-preserving 3D head stylization from a single style reference image, without per-style training. Our approach enhances the Style Fusion Attention mechanism with a style-conditioned key modulation mechanism that aligns content and style representations for fine-grained and controllable stylization. We further provide a user-controllable slider for adjusting stylization intensity. In addition, StyleFusion360 supports local multi-edit stylization, enabling targeted edits such as modifying hair or eyes independently. Extensive experiments on FFHQ and RenderMe360 demonstrate that StyleFusion360 produces high-quality, controllable, and visually compelling stylizations, outperforming state-of-the-art GAN- and diffusion-based methods across diverse style domains.

13.
medRxiv (Medicine) 2026-06-23

Differential Recovery Trajectories of Emergency Otolaryngologic Conditions across the COVID-19 Pandemic: A Six-year Longitudinal Study from an Urban Emergency Center

作者:

Objective: The COVID-19 pandemic markedly altered social activity patterns, healthcare utilization, and the epidemiology of infectious diseases. However, its long-term impact on emergency otolaryngologic conditions remains incompletely understood. This study investigated long-term trends in emergency otolaryngologic conditions before, during, and after the COVID-19 pandemic using comprehensive data from a large urban emergency clinic in Osaka, Japan. Methods: All new otolaryngologic outpatients who visited the Chuo Emergency Medical Clinic (CEMC) in Osaka City between 2019 and 2024were retrospectively analyzed. Annual trends in absolute numbers and relative proportions of emergency otolaryngologic conditions were examined by anatomical region and disease category, using 2019 as the pre-pandemic baseline. Results: A total of 99,324 new otolaryngologic outpatients were analyzed. Overall emergency visits declined sharply to approximately half of baseline in 2020, followed by a gradual but incomplete recovery toward pre-pandemic levels by 2024. Most anatomical categories declined to 45-61% of baseline in 2020 and exhibited gradual yet incomplete recovery through 2023; in stark contrast, laryngeal conditions diverged sharply, surging beyond pre-pandemic levels after 2022. Acute infectious otorhinolaryngologic diseases fell to 23-50% of baseline in 2020 and showed variable recovery (69-103%) by 2024. Notably, laryngitis exceeded the baseline, reaching 132% in 2023, whereas epiglottic edema exhibited only a transient increase approaching the baseline in 2021. Non-infectious emergency conditions generally showed only a marginal decrease in 2020 and remained relatively stable throughout the study period, except for sudden sensorineural hearing loss (SSNHL), which dropped sharply to 39% of the baseline in 2020 and remained persistently reduced through 2024. Traumatic emergencies declined variably to 53-81% of the baseline in 2020, followed by an incomplete recovery, reaching only 55-69% by 2024. Conclusion: Emergency otolaryngologic conditions demonstrated heterogeneous recovery trajectories following the COVID-19 pandemic. While most infectious and traumatic conditions gradually but incompletely normalized, laryngeal conditions showed a distinct post-pandemic surge, and SSNHL remained persistently suppressed. These findings reveal heterogeneous, condition-specific recovery trajectories that reflect both genuine shifts in community pathogen burden, true traumatic incidence, and persistent alterations in healthcare-seeking behaviors, insights essential for resource allocation during future public health emergencies.

14.
arXiv (CS.AI) 2026-06-12

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

arXiv:2606.13192v1 Announce Type: new Abstract: User experience (UX) centered on usability, perceived consistency, and functional clarity is fundamental to real-world user interfaces (UI). The application of multimodal large language models (MLLMs) in the field of user interfaces is evolving rapidly, such as visual element grounding, graphical user interface (GUI) agents, and design-to-code generation. However, research efforts on evaluating UX based on UI screenshots are still immature. To address this, we propose UXBench, a novel multimodal benchmark consisting of 2,000 VQA data samples designed to assess MLLMs' ability to perform UI-based reasoning. UXBench includes 8 tasks based on real-world UI screenshots that require fine-grained diagnosis of UX issues across layout relationships, visual hierarchy, and content consistency. Our extensive evaluation of mainstream MLLMs shows that they remain fundamentally limited in their capacity for UI-based reasoning. The results underscore the need for further advancements in this area. To bridge this gap, we propose UI-UX, an MLLM based on Qwen3-VL-4B-Thinking foundation model and enhanced via reinforcement learning with two key innovations: a reward routing mechanism that dynamically balances perceptual understanding and logical reasoning during inference, and an asymmetric transition reward that suppresses redundant or insufficient reasoning steps. Experiments demonstrate that UI-UX achieves state-of-the-art (SOTA) performance on UXBench, attaining an accuracy of 0.7963 – surpassing Claude-4.5-Sonnet's 0.6550 – while exhibiting strong generalization across diverse UI tasks and maintaining low inference latency.

15.
arXiv (quant-ph) 2026-06-11

On the Addressability Problem on CSS Codes

arXiv:2502.13889v4 Announce Type: replace Abstract: Recent discoveries in asymptotically good quantum codes have intensified research on their application in quantum computation and fault-tolerant operations. This study focuses on the addressability problem within CSS codes: we ask what circuits might implement logical gates on strict subsets of logical qubits. With some notion of fault-tolerance, we prove several impossibility results: for CSS codes with non-zero rate, one cannot address a logical $H$, $HS$, $SH$, or $\mathsf{CNOT}$ to any non-empty strict subset of logical qubits using a circuit made only from 1-local Clifford gates. Furthermore, we show that one cannot permute the logical qubits in a code purely by permuting the physical qubits, if the rate of the code is (asymptotically) greater than 1/3 and the distance is at least 3. We can show a similar no-go result for $\mathsf{CNOT}$s and $\mathsf{CZ}$s between two such high-rate codes, albeit under a more restrictive assumption on the circuit, which we call "global" (though recent addressable CCZ gates use global circuits). This work pioneers the study of distance-preserving addressability in quantum codes, mainly by considering automorphisms of the code. This perspective offers new insights and potential directions for future research. We argue that studying this trade off between addressability and efficiency of the codes is essential to understand better how to do efficient quantum computation.

16.
bioRxiv (Bioinfo) 2026-06-20

Evaluation of Trypanosoma brucei Phosphofructokinase Allosteric Inhibition: An In-Silico Study

Human African trypanosomiasis, caused by a protozoan parasite Trypanosoma brucei, is a neglected tropical disease for which well-tolerated, conveniently administered, and highly efficacious medicines are still missing. Previously, T. brucei Phosphofructokinase was targeted by small-molecule inhibitor development efforts. This approach has shown promise both in vitro and in vivo. In this study, we have used these wet-lab results, evaluated the compounds already characterised by Molecular Dynamics simulations, found relationships between in silico and wet-lab data and used these observations to evaluate compounds that we selected through several different approaches of virtual screens. We observed that inhibitor-ATP interactions are highly predictive of the inhibitory activity. Several compounds selected through virtual screens have outperformed previously characterised compounds.

17.
medRxiv (Medicine) 2026-06-16

AI-assisted continuous-time modelling of metastatic breast cancer reveals subtype-specific spatiotemporal organ interactions

Metastatic breast cancer is one of the leading causes of premature mortality among women worldwide. A major barrier to optimal care is the marked heterogeneity in both the temporal dynamics of metastatic spread and the organ-specific spatial distribution of metastases. Existing analyses do not adequately capture this complexity, as they either neglect temporal dependencies or assume independence between metastasic sites. As a result, it remains unclear how established metastases influence subsequent organ-specific dissemination. We address this question using patient-level longitudinal trajectories from a large multicentre real-world metastatic breast cancer registry, combined with an AI-assisted disease-progression modelling framework based on continuous-time Markov chains that represent combinations of metastatic sites and the non-uniform and practice-driven timing of radiologic response assessments, as encountered in routine clinical care. We present a stochastic model determined by progression rates, which are parameterised to capture baseline organ-specific transition risks, patient-level covariates, and pairwise inter-organ interaction effects. High-dimensional treatment information is incorporated using an large language model based encoding. We find that metastatic spread follows non-independent, subtype-specific spatiotemporal patterns, with subtype-specific inter-organ interaction patterns that shape progression. Visceral metastases, particularly lung and liver metastasis, are associated with an increased hazard of subsequent brain metastasis, with effects varying across hormone receptor-positive, HER2-positive, and triple-negative subtypes. Together, these findings define a clinically relevant spatiotemporal architecture of metastatic progression in breast cancer. This framework enables refined mechanism-informed risk stratification and provides a data-driven rationale for targeted and risk-adapted – rather than symptom-triggered – surveillance strategies.

18.
Nature (Science) 2026-06-24

Fourier pixels for bidirectional light control

Digital cameras1 and displays2 use picture elements (pixels3) that perform a single function: detecting or emitting light intensity. To exploit the full information content of electromagnetic waves, more advanced elements are required. This has driven the development of multifunctional components that, for example, simultaneously detect and emit intensity4,5 or extract intensity and spectral information6–8. However, no pixel exists that both senses and generates optical wavefronts with full control over amplitude, phase and polarization, limiting bidirectional control and feedback of sophisticated light fields. Here we present a route to such pixels by demonstrating a versatile platform of miniaturized diffractive elements based on Fourier optics9. We use plasmonic surface waves10, which propagate coherently11 and efficiently12–15 across metallic surfaces. When these plasmons are launched towards wavy microstructures16 designed with simple Fourier analysis, arbitrary and background-free optical wavefronts are generated. Conversely, incoming light can be sensed, and its amplitude, phase and polarization can be fully characterized. By combining or superposing several such components, we create multifunctional ‘Fourier pixels’ that provide compact and accurate control over the optical field. Our approach, which we extend to photonic waveguide modes, establishes a scalable, universal architecture for vectorially programmable pixels with applications in adaptive optics17,18, holographic displays19–21, optical communication22,23 and quantum information processing24. A versatile platform of miniaturized Fourier-optics-based diffractive elements enables multifunctional pixels that fully control and sense the amplitude, phase and polarization of optical wavefronts for advanced photonic applications.

19.
arXiv (CS.LG) 2026-06-16

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

arXiv:2510.01175v2 Announce Type: replace Abstract: While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized matrix sensing problem. We prove that WN with Riemannian optimization achieves linear convergence, yielding an exponential speedup over standard methods that do not use WN. Our analysis further demonstrates that both iteration and sample complexity improve polynomially as the level of overparameterization increases. To the best of our knowledge, this work provides the first characterization of how WN leverages overparameterization for faster convergence in matrix sensing.

20.
arXiv (CS.AI) 2026-06-11

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

arXiv:2605.06100v2 Announce Type: replace-cross Abstract: Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods learn measurement weighting through the solver, but they still use position-only objectives. As a result, the position estimate may improve while the reported covariance remains too small, too large, or incorrectly oriented. We propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. A Weighting Generation Network (WGN) predicts per-satellite reliability weights, and a differentiable Gauss-Newton solver maps these weights to a position estimate and a Hessian-derived posterior covariance. We use proper scoring rules to supervise the East-North predictive distribution end to end. We study negative log-likelihood (NLL), the energy score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in covariance credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes; on the deep-urban scene, both the mean horizontal error and the 95th-percentile error improve. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05 relative to DFGO (MAE). Case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.

21.
arXiv (quant-ph) 2026-06-12

Quantum optical photoelectron interferometry

arXiv:2606.13447v1 Announce Type: new Abstract: We present a general theoretical framework for multiphoton processes driven by quantum light fields, establishing a direct link between photon statistics and photoelectron observables. Our results show that the autocorrelation and cross-correlation functions, which quantify the underlying photon statistics, are directly mapped onto the resulting photoelectron spectra. Although our framework is broadly applicable, we demonstrate specifically in the example of reconstruction of attosecond beating by interference of two-photon transitions (RABBIT) the influence of the light statistical properties. In this approach, the amplitude, contrast and phase of the oscillations of the sideband signal as a function of pump-probe delay reveal the quantum nature of light. We analyze these observables across several quantum configurations, including correlated infrared and harmonic modes, as well as the uncorrelated case with non-classical harmonic statistics, thereby establishing a general framework for quantum-light RABBIT spectroscopy. We compare the analytical theory with numerical simulations for the case of classical harmonics and an infrared field in a squeezed coherent state, obtaining excellent agreement. Our results reveal how the interplay between classical and quantum correlations dictates the coherence of the photoemission process, providing a new window into the quantum-optical foundations of attosecond science.

22.
arXiv (CS.CV) 2026-06-16

Towards Global AI-Driven Cervical Cancer Screening

The global elimination of cervical cancer is a key public health goal set by the World Health Organization (WHO), with screening programs reducing mortality by up to 80%. However, access to experts and biopsy services is limited in low- to middle-income countries (LMICs). Deep learning (DL)-based algorithms offer promising support for screening, but most existing approaches have been developed and validated on private datasets from single countries. We present the first DL-based approach to cervical cancer screening validated on data from multiple countries. Technically, we phrase the problem of detecting and classifying lesions in colposcopy images as a multi-task learning problem, in which we simultaneously perform image-level classification and lesion segmentation. Our model was trained on a private data set of acid stain colposcopy images with manually generated lesion segmentation masks and corresponding histopathological results, employing extensive data augmentation to address image variability. In an in-distribution validation with pathology results serving as ground truth, our algorithm outperformed medical experts (Balanced Accuracy: 0.68 vs 0.64) in CIN1- (Cervical intraepithelial neoplasia grade 1 or lower) versus CIN2+ (grade 2 or higher) classification. External validation on four colposcopy data sets from four countries featuring radical differences in prevalence and patient characteristics yielded superior performance of our method compared to baseline methods. Performance variability across countries was high with AUC values ranging from 0.54 - 0.80. Overall, algorithm performance varied with age, transformation zone (cervical area most prone to lesion development), presence of comorbidities and pathognomonic signs, with comorbidities having by far the largest negative effect. Future work should focus on improving model robustness and generalizability.

23.
arXiv (CS.CV) 2026-06-18

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from only one-shot imaging, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. The in vivo results showed 84.8% SNR and 25.7% CNR gains using only two aperture data of the heart in six echocardiographic views, liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.

24.
arXiv (CS.CL) 2026-06-24

Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable

A number of scientific conferences and journals have recently enacted policies that prohibit LLM usage by peer reviewers, except for polishing, paraphrasing, and grammar correction of otherwise human-written reviews. But, are these policies enforceable? To answer this question, we assemble a dataset of peer reviews simulating multiple levels of human-AI collaboration, and evaluate five state-of-the-art detectors, including two commercial systems. Our analysis shows that all detectors misclassify a non-trivial fraction of LLM-polished reviews as AI-generated, thereby risking false accusations of academic misconduct. We further investigate whether peer-review-specific signals, including access to the paper manuscript and the constrained domain of scientific writing, can be leveraged to improve detection. While incorporating such signals yields measurable gains in some settings, we identify limitations in each approach and find that none meets the accuracy standards required for identifying AI use in peer reviews. Importantly, our results suggest that recent public estimates of AI use in peer reviews through the use of AI-text detectors should be interpreted with caution, as current detectors misclassify mixed reviews (collaborative human-AI outputs) as fully AI generated, potentially overstating the extent of policy violations.

25.
arXiv (CS.AI) 2026-06-11

SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving

arXiv:2606.11244v1 Announce Type: cross Abstract: Efficient large language model (LLM) serving is increasingly constrained by deployment cost. Quantization is a key technique for reducing serving cost, yet even state-of-the-art 4-bit quantizers exhibit a noticeable quality gap from FP16, particularly for smaller models where low-bit serving is most beneficial. We identify a fundamental cause of this gap: quantization error is highly input-dependent and varies substantially across tokens, while existing post-quantization compensation methods are static and apply identical corrections to all inputs. As a result, easy tokens are over-corrected while hard tokens remain under-corrected. We present SPEAR, a system for post-quantization error-adaptive recovery that improves low-bit LLM serving. SPEAR introduces lightweight Error Compensators (ECs) modulated by per-token gates and places them only at the most error-sensitive layers identified through a CKA-guided entropy-aware diagnostic. This focuses a small parameter budget where it is most effective. Efficient deployment of ECs presents several systems challenges, including additional computation, tensor-parallel synchronization caused by input-dependent gating, and latency instability across configurations. SPEAR addresses these issues through adaptive kernel-fusion dispatch, combining an epilogue-integrated peer-reduction kernel with P2P dual-write to fuse the post-EC computation into low-bit GEMMs, and an SLO-constrained EC-aware scheduler for predictable serving performance. Across challenging per-channel quantization settings, SPEAR recovers 56-75% of the perplexity gap between W4 and FP16 while adding less than 1% model memory overhead and maintaining latency comparable to a widely used 4-bit serving deployment.