Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

02.
arXiv (quant-ph) 2026-06-16

Entanglement-Rank Duality in Quadratic Phase Quantum States

arXiv:2605.05167v2 Announce Type: replace Abstract: Absolutely maximally entangled (AME) states are fundamental resources in quantum information theory, yet their construction and certification remain a nontrivial problem. Within the family of quadratic phase quantum states, defined by symmetric matrices $P$ over finite fields $\mathbb{F}_{p^m}$, we show that the Rank-Purity Duality $\operatorname{Tr}(\rho_S^2) = |\mathbb{F}|^{-\operatorname{rk}_{\mathbb{F}}(P_{S,\bar{S}})}$ follows from additive character orthogonality and holds over all $\mathbb{F}_{p^m}$, yielding a polynomial-time AME certification criterion. For square-free dimensions $d = p_1\cdots p_r$, the Chinese Remainder Theorem induces a prime-field factorisation. This implies additivity of Rényi-2 entropy and yields sharp obstruction criteria that rule out cases such as $\operatorname{AME}(4,6)$ and constrain the open case $\operatorname{AME}(8,6)$. As a proof of concept, we construct an explicit $\operatorname{AME}(17,10001)$ state, certified across all $65{,}535$ bipartitions, demonstrating that the framework scales to large systems and previously inaccessible local dimensions.

03.
arXiv (quant-ph) 2026-06-11

Time-Frequency Grid States for Reconstruction and Correction of Channel-Induced Distortion in Entangled Photons

arXiv:2606.12216v1 Announce Type: new Abstract: Characterization of time-frequency (TF) quantum states requires reliable reconstruction of their TF distributions. However, imperfect transmission or measurement channels can distort reconstructed joint spectral intensities (JSIs), especially when the underlying perturbation mechanism is unknown. Here, we experimentally demonstrate a reconstruction and correction framework that uses a TF grid state as an intrinsic frequency-domain reference. By analyzing the displacement of the grid points, a Gaussian process regression model is employed to reconstruct a correction mapping for the nonlinear coordinate deformation without assuming a prior physical model of the distortion. The learned mapping reduces the residual coordinate deviation of the TF grid state by approximately a factor of 11 and, when applied to an independent frequency-entangled test state, improves the Gaussian-shape fidelity from 76.2\% to 90.0\%. These results establish TF grid states as practical metrological resources for diagnosing and correcting distortions in TF quantum systems, providing a pathway toward distortion-resilient quantum communication and high-dimensional quantum information processing.

04.
arXiv (CS.CL) 2026-06-17

Variable-Width Transformers

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing distinct computational roles. In this work, we empirically investigate nonuniform capacity allocation across network depth by proposing a $\times$-shaped >

05.
arXiv (CS.CV) 2026-06-16

SPARK: Spatial Policy-driven Adaptive Reinforcement learning for Knowledge distillation

Low-bit quantization enables deployment of image restoration (IR) networks on resource-constrained devices, but introduces rounding noise that disproportionately degrades high-frequency regions such as edges and fine textures. Existing knowledge distillation (KD) methods apply distillation signals uniformly across all spatial locations, overlooking the varying reconstruction difficulty across image regions. To address this, we propose SPARK (Spatial Policy-driven Adaptive Reinforcement Learning for Knowledge Distillation), a framework that adaptively allocates distillation effort using a lightweight reinforcement learning (RL) policy network. At each training step, a difficulty feature extractor computes four signals, namely Laplacian variance, pixel variance, student reconstruction error, and teacher-student knowledge gap, which are fed into a compact policy CNN that produces a stochastic spatial weight map to modulate the KD loss during quantization-aware training (QAT). SPARK is IR task-agnostic, adds no inference cost, and integrates into any existing QAT pipeline without architectural changes. Experiments on benchmark datasets demonstrate that SPARK consistently outperforms PTQ, QAT, and state-of-the-art (SOTA) KD approaches across multiple student architectures, achieving reconstruction quality closest to the full-precision teacher under significant computational constraints.

06.
arXiv (math.PR) 2026-06-18

Milstein-type Schemes for Hyperbolic SPDEs

arXiv:2512.19647v4 Announce Type: replace-cross Abstract: This article studies the temporal approximation of hyperbolic semilinear stochastic evolution equations with multiplicative Gaussian noise by Milstein-type schemes. We take the term hyperbolic to mean that the leading operator generates a contractive, not necessarily analytic $C_0$-semigroup. Optimal convergence rates are derived for the pathwise uniform strong error \[ E_h^\infty := \Big(\mathbb{E}\Big[\max_{1\le j \le M}\|U_{t_j}-u_j\|_X^p\Big]\Big)^{1/p} \] on a Hilbert space $X$ for $p\in [2,\infty)$. Here, $U$ is the mild solution and $u_j$ its Milstein approximation at time $t_j=jh$ with step size $h>0$ and final time $T=Mh>0$. For sufficiently regular nonlinearity and noise, we establish strong convergence of order one, with the error satisfying $E_h^\infty\lesssim h\sqrt{\log(T/h)}$ for rational Milstein schemes and $E_h^\infty \lesssim h$ for exponential Milstein schemes. This extends previous results from parabolic to hyperbolic SPDEs and from exponential to rational Milstein schemes. Moreover, root-mean-square error estimates are strengthened to pathwise uniform estimates. Numerical experiments validate the convergence rates for the stochastic Schrödinger equation. Further applications to Maxwell's and transport equations are included.

07.
arXiv (quant-ph) 2026-06-11

Split-Evolution Quantum Phase Estimation for Particle-Conserving Hamiltonians

arXiv:2604.14921v2 Announce Type: replace Abstract: We present a hardware demonstration and resource analysis of split-evolution quantum phase estimation (SE-QPE) on a Quantinuum System Model H2 quantum computer. SE-QPE is a modification to canonical QPE for particle-conserving Hamiltonians in which controlled time evolution is replaced by CSWAP-based interference between a target register and a reference register. For factorizations of time evolution with a shared eigenbasis, SE-QPE preserves the phase-register outcome distribution of canonical QPE and, unlike with compute–uncompute substitutions, it remains compatible with non-exact eigenstates. The substitution removes controlled-simulation overhead and enables parallel evolution on two registers, reducing the depth of each phase-kickback block. Resource analysis for Trotterized double-factorized chemistry Hamiltonians shows that the substitution becomes increasingly favorable at higher phase powers and combining QPE and SE-QPE implementations can be a useful option. Over a range of FeMoco active spaces, SE-QPE reduces time evolution resources, with asymptotic reductions of about 33% in CX count, 25% in $T$ count, and an asymptotic depth ratio of $3/N$ for CX layers. On Quantinuum H2-2, a four-qubit model ethylene demonstration with explicit inverse QFT and repeated phase-kickback steps up to 8 phase bits yields distinct energies and shows the auxiliary registers provide useful error detection filters.

08.
arXiv (quant-ph) 2026-06-17

Quantum-inspired Ising machine using sparsified spin connectivity

arXiv:2604.04606v2 Announce Type: replace-cross Abstract: Combinatorial optimization problems become computationally intractable as these NP-hard problems scale. We previously proposed extraction-type majority voting logic (E-MVL), a quantum-inspired algorithm using digital logic circuits. E-MVL mimics the thermal spin dynamics of simulated annealing (SA) through controlled sparsification of spin interactions for efficient ground-state search. This study investigates the performance potential of E-MVL through systematic optimization and comprehensive benchmarking against SA. The target problem is the Sherrington-Kirkpatrick (SK) model with bimodal and Gaussian coupling distributions. Through equilibrium state analysis, we demonstrate that the sparsity control mechanism provides a consistent search of the solution space regardless of the problem's coupling distribution (bimodal, Gaussian) or size. E-MVL not only achieves the best performance among all tested algorithms–solving exact solutions up to 1600 spins where the best SA baseline is limited to 400 spins–but also provides insights that significantly improve SA's own temperature scheduling. These results establish E-MVL's dual contribution as both an efficient optimizer and a practical methodology for enhancing SA performance. Moreover, FPGA implementation achieved an approximately 6-fold faster solution speed than SA.

09.
arXiv (CS.CL) 2026-06-16

Entity Labels Are Not Entity Signals: A Framework for Observable Relevance in Document Re-Ranking

Entity-aware document retrieval uses query-associated entities as ranking signals, assuming that semantically relevant entities are also useful retrieval signals. We show this assumption is insufficient- and explain why. Unlike terms, which are ground-truth observations, entity links are hypotheses produced by an imperfect linker: an entity can be topically central yet provide no discriminative signal if the linker fires indiscriminately across relevant and non-relevant documents. We formalize this as a distinction between Conceptual Entity Relevance (CER)- whether an entity is topically related to a query- and Observable Entity Relevance (OER)- whether its observed presence in a collection discriminates relevant from non-relevant documents. Across four collections and annotation sources including human entity judgments, CER and OER exhibit near-chance agreement ($\kappa \approx 0$), while OER operationalizations agree substantially ($\kappa \approx 0.5$), confirming CER as the systematic outlier. CER-based supervision selects topically plausible but weakly discriminative entities, pruning fewer than 4% of non-relevant documents on some collections. Aligning supervision with OER improves non-relevant pruning by up to 10x and open-world MAP by 0.051 over BM25. Our findings motivate a shift from conceptual to observable notions of entity relevance in entity-aware retrieval.

10.
arXiv (quant-ph) 2026-06-17

Practical Tests and Witnesses of Fermionic non-Gaussianity

arXiv:2605.26218v2 Announce Type: replace Abstract: Fermionic Gaussian states describe free fermions and underlie the mean-field picture of matter, from metals to superconductors; they are also efficiently simulable on classical computers. Departures from Gaussianity – the correlations produced by interactions – are therefore what make a fermionic system hard to simulate classically and useful for quantum computation, analogous to the role of magic in stabilizer-based quantum computation. Yet detecting and quantifying such non-Gaussianity at scale has remained challenging. Here we introduce practical tests and witnesses of fermionic non-Gaussianity built on fermionic antiflatness, a measure derived from the two-point covariance matrix. We estimate it with two protocols – a two-copy Bell measurement and a single-copy scheme using commuting Majorana bilinears – that determine whether a state is Gaussian or far from it at lower measurement cost than existing approaches, using only operations native to fault-tolerant hardware. For mixed states, a purity-corrected witness certifies non-Gaussianity and remains robust under strong noise; running it on the IQM quantum processor, we find that noise can both reduce and enhance non-Gaussianity. Finally, we show that preparing pseudorandom fermionic states requires extensive non-Gaussianity. Together, these tools enable the study and certification of non-Gaussian fermionic resources on present-day quantum devices.

11.
arXiv (CS.AI) 2026-06-17

Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis

arXiv:2606.17115v1 Announce Type: cross Abstract: Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet their generalizability to datasets under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) oncology dataset. The analysis focuses on two modalities, whole-slide images and transcriptomic profiles, drawn from the IH multimodal data. We first benchmark unimodal probing performance across five FMs on eight downstream classification tasks, and find that image and omics representations carry complementary predictive signals. Then we investigate whether multimodal fusion can yield additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. The trustworthiness of selected unimodal and multimodal pipelines is further assessed through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.

12.
arXiv (CS.AI) 2026-06-16

Critically Engaged Pragmatism: Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools

arXiv:2601.09753v2 Announce Type: replace-cross Abstract: AI science evaluation tools aim to assess research credibility. As with traditional metrics such as impact factors, their edicts can be decontextualised and repurposed in problematic ways. To address this, I propose Critically-Engaged Pragmatism as a scientific norm enjoining scientific communities to scrutinise the purposes and purpose-specific reliability of AI science evaluation tools. To foster Critically Engaged Pragmatism, creators of AI science evaluation tools should transparently and fully report design, training, and benchmarking details to facilitate assessments of purpose-specific reliability, liability to different types of error, and bias. What count as best practices for the transparent reporting of AI science evaluation tools should be updated as new forms of error, bias, and gamesmanship are discovered. Under this framework, AI science evaluation tools are not objective arbiters of scientific credibility. Rather, they are the object of critical discursive practices that ultimately ground the credibility of scientific communities.

13.
arXiv (CS.AI) 2026-06-19

Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models

arXiv:2606.19793v1 Announce Type: cross Abstract: The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid DNN/HMM sequence discriminative training. This paper presents a comprehensive investigation of various combinations of acoustic features tailored to different Acoustic Models, offering suitable feature selections for each. The incorporation of Pitch features notably improved recognition performance, especially for sentence recognition tasks involving dysarthric speech. Through a systematic examination of the TORGO database, we have demonstrated the potential to enhance the performance of the state-of-the-art Factorized Time Delay Neural Network (F-TDNN) model for recognizing dysarthric speech. Our methods, implemented with the F-TDNN model, resulted in a 4.65\% relative improvement in isolated word recognition and a 4.63\% relative improvement in sentence recognition for dysarthric speech, compared to previous research. This improvement effectively compensates for speech variability, attributable to our deliberate selection of the number of overlapping frames between consecutive training example chunks.

14.
arXiv (CS.AI) 2026-06-16

Adaptive $k$NN graph model

arXiv:2601.16509v2 Announce Type: replace-cross Abstract: The $k$-nearest neighbors ($k$NN) algorithm is a cornerstone of non-parametric classification in artificial intelligence, yet its deployment in large-scale applications is persistently constrained by the computational trade-off between inference speed and accuracy. Existing approximate nearest neighbor solutions accelerate retrieval but often degrade classification precision and lack adaptability in selecting the optimal neighborhood size ($k$). Here, we present an adaptive graph model that decouples inference latency from computational complexity. By integrating a Hierarchical Navigable Small World (HNSW) graph with a pre-computed voting mechanism, our framework completely transfers the computational burden of neighbor selection and weighting to the training phase. Within this topological structure, higher graph layers enable rapid navigation, while lower layers encode precise, node-specific decision boundaries with adaptive neighbor counts. Benchmarking against eight state-of-the-art baselines across six diverse datasets, we demonstrate that this architecture significantly accelerates inference speeds, achieving real-time performance, without compromising classification accuracy. These findings offer a scalable, robust solution to the inherent inference bottleneck of $k$NN, laying an adaptive structural foundation for graph-based nonparametric learning.

15.
arXiv (quant-ph) 2026-06-17

Twin-beam advantage in quantum LiDAR under correlated noise

arXiv:2606.17908v1 Announce Type: new Abstract: Quantum light promises improved precision in optical remote sensing, but its practical advantage depends critically on whether nonclassical resources remain useful under realistic noise and experimentally accessible detection. This question becomes especially relevant for LiDAR systems, where a quantum advantage has been demonstrated for target detection and joint range-velocity estimation, but mostly under idealized conditions or simple noise models, such as optical loss and thermal background. A key open point is whether entanglement provides an operational advantage when the dominant disturbance is not independent noise, but structured interference across sensing modes. Here, we address this question by studying the joint estimation of target range and velocity with bright two-mode Gaussian probes and homodyne detection, comparing coherent, separable squeezed, and twin-beam states at a fixed resource budget. Our results reveal a hierarchy of quantum resources set by the noise structure: separable squeezing provides a robust advantage over coherent illumination under loss and thermal background, whereas twin-beam probes become superior under correlated jamming when the receiver is adaptively optimized. These results establish correlated noise as the operational regime in which entanglement provides a robustness advantage beyond local squeezing, opening a receiver-aware route to quantum-enhanced LiDAR in realistic and potentially adversarial environments.

16.
arXiv (CS.LG) 2026-06-15

Zero-shot generalization of transformer neural operators to larger domains

arXiv:2606.14597v1 Announce Type: new Abstract: Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the attention logits computation, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.

17.
arXiv (CS.CV) 2026-06-11

Semantic Segmentation of Node and Edge Diagrams for Assistive Technology

In this paper, we present a novel set of related models for semantic segmentation of node-link diagrams. These diagrams are frequently used to represent mathematical graphs, relationships between concepts, and flowcharts. Such diagrams are difficult to access non-visually; while some assistive interfaces have been designed for node-link diagrams, they rely upon a machine-readable representation of the diagram, whereas such diagrams will generally be made available as bitmap images. Our compact deep learning models show excellent quantitative and qualitative performance on a large synthetic dataset of node-link diagrams, reaching per-pixel accuracy over 93\%.

18.
arXiv (CS.AI) 2026-06-16

PISA: A Pragmatic Psych-Inspired Unified Memory System for Enhanced AI Agency

arXiv:2510.15966v2 Announce Type: replace Abstract: Memory systems are fundamental to AI agents, yet existing work often lacks adaptability to diverse tasks and overlooks the constructive and task-oriented role of AI agent memory. Drawing from Piaget's theory of cognitive development, we propose PISA, a pragmatic, psych-inspired unified memory system that addresses these limitations by treating memory as a constructive and adaptive process. To enable continuous learning and adaptability, PISA introduces a trimodal adaptation mechanism (i.e., schema updation, schema evolution, and schema creation) that preserves coherent organization while supporting flexible memory updates. Building on these schema-grounded structures, we further design a hybrid memory access architecture that seamlessly integrates symbolic reasoning with neural retrieval, significantly improving retrieval accuracy and efficiency. Our empirical evaluation, conducted on the existing LOCOMO benchmark and our newly proposed AggQA benchmark for data analysis tasks, confirms that PISA sets a new state-of-the-art by significantly enhancing adaptability and long-term knowledge retention.

19.
medRxiv (Medicine) 2026-06-16

Fidelity-Derived Quantum Dissimilarity-Enhanced k-Nearest Neighbor Algorithm for Arterial Hypertension Prediction

We present a quantum-enhanced version of the classic k-Nearest Neighbors (kNN) classification algorithm, applied to the prediction of arterial hypertension. The traditional Euclidean distance metric of the kNN algorithm is replaced with a Fidelity-derived quantum dissimilarity measure to evaluate the similarity between data samples. We map classical real-world clinical and ECG-derived data features into quantum states via the Dense-Angle Encoding, which efficiently utilizes parameterized rotation gates to pack multiple features into minimal qubits while maintaining pure states. We evaluate the performance of the dissimilarity measure using both the noiseless state vector Simulator and the IBM Qiskit Estimator primitives. The quantum circuit demonstrates robust predictive capabilities comparable to the classical model. While it does not claim computational supremacy over the classical baseline, the framework proves that fidelity-based similarity is a physically meaningful and efficient approach for hybrid quantum classical classification.

20.
arXiv (CS.AI) 2026-06-18

IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

arXiv:2604.03275v2 Announce Type: replace-cross Abstract: Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global to regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25 degree resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty

21.
medRxiv (Medicine) 2026-06-19

Validation of an Artificial Intelligence-Assisted Mobile Application for Dietary Oxalate Assessment in Kidney Stone Prevention

Background: Calcium oxalate nephrolithiasis is the most common type of kidney stone disease. Dietary oxalate intake is an important modifiable factor. Assessing dietary oxalate exposure in clinical practice poses challenges due to limitations of traditional dietary recall tools and variability in food composition data. Artificial intelligence (AI) applications in mobile health may offer scalable solutions for better dietary monitoring and kidney stone prevention. We examined the ability of StoneFree AI to estimate dietary oxalate from verbal and image-based food inputs. Objective: To evaluate the accuracy and limitations of StoneFree AI, for estimating dietary oxalate intake from verbal food descriptions and meal images, and to evaluate errors from entries that may inform future clinical use in kidney stone prevention. Methods: StoneFree AI is a cross-platform mobile application that uses a multimodal large language model (Google Gemini) to interpret verbal food descriptions and visual food images. The identified foods were mapped to oxalate values using the Harvard Oxalate Database. System performance was evaluated using 804 verbal food entries and 276 portion-size food images obtained from the ASA24 dietary assessment database. Verbal inputs were compared with reference oxalate values using absolute error and predefined agreement thresholds ({+/-}1, {+/-}5, {+/-}10 mg). Image-based inputs were evaluated against mutually exclusive primary error categories, including food identification, portion estimation, ingredient recognition, oxalate reference selection, and non-analyzable cases. Results: For verbal food entries, the AI system showed strong agreement with reference oxalate values. Overall, 82.1% of estimates were within {+/-}1 mg, 91.5% within {+/-}5 mg, and 94.5% within {+/-}10 mg of reference values. The mean absolute error was 3.32 mg, the median absolute error was 0.10 mg, and the concordance correlation coefficient (CCC) was 0.860. Image-based inputs showed a higher overall error rate of 63.0%, primarily due to food identification errors (33.0%), inaccurate portion estimation (11.0%), and ingredient recognition errors (9.8%). Most errors occurred with visually complex meals, such as mixed dishes and grain-based foods. Conclusions: AI-assisted estimation of dietary oxalate intake demonstrated high accuracy when structured verbal inputs were used but was less reliable for image-based meal analysis. These findings suggest AI-enabled mobile tools may support dietary monitoring for kidney stone prevention, particularly when user input is structured. Further refinement of computer vision models and prospective clinical validation are required before widespread clinical implementation.

22.
arXiv (CS.AI) 2026-06-18

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

arXiv:2606.18303v1 Announce Type: cross Abstract: We develop a mathematically explicit link between shock-wave theory and the symmetry-quotiented learning dynamics of stochastic gradient descent, drawing on differential geometry, Lie group theory, and fluid mechanics. Specifically, after quotienting parameter symmetries and applying local-entropy coarse-graining, the effective dynamics satisfy a viscous Hamilton–Jacobi equation on the quotient manifold. Moreover, under the assumption that the raw parameter dynamics can be summarized by a gradient field on the quotiented space, the gradient of the coarse-grained loss function obeys a Burgers-type equation, and shock formation can be established rigorously. We apply our theory to multilayer perceptrons, convolutional neural networks, Transformers, and mean-field networks, and show that they obey the Hamilton–Jacobi or Burgers-type equations. We conjecture that this framework also yields practical diagnostics for deep learning. In architectures such as Transformers, raw parameter norms are often distorted by symmetry redundancy and may therefore be misleading, whereas symmetry-corrected quotient observables provide a principled basis for monitoring, forecasting, and controlling training-phase transitions.

23.
arXiv (CS.LG) 2026-06-18

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

arXiv:2606.19270v1 Announce Type: cross Abstract: Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

24.
arXiv (quant-ph) 2026-06-19

Asymmetric and chiral dynamics of two-component anyons with synthetic gauge flux

arXiv:2512.19139v3 Announce Type: replace-cross Abstract: In this work, we investigate the non-equilibrium dynamics in a one-dimensional two-component anyon-Hubbard model, which can be mapped to an extended Bose-Hubbard ladder with density-dependent hopping phase and synthetic gauge flux. Through numerical simulations of two-particle dynamics and the symmetry analysis, we reveal the asymmetric transport with broken inversion symmetry and two dynamical symmetries in the expansion dynamics. The expansion of two-component anyons is dynamically symmetric under spatial inversion and component flip, when the sign of anyonic statistics phase or the signs of gauge flux and interaction are changed. In the non-interacting case, we show the dynamical suppression induced by both the statistics phase and gauge flux. In the interacting case, we demonstrate that both chiral and antichiral dynamics can be exhibited and tuned by the statistics phase and gauge flux. The dynamical phase regimes with respect to the chiral-antichiral dynamics are obtained. These findings highlight the rich dynamical phenomena arising from the interplay of anyonic exchange statistics, synthetic gauge fields, and interactions in multi-component anyons.

25.
arXiv (CS.CV) 2026-06-15

MooMIns – Monocular 3D Reconstruction and Object Pose Estimation from Multiple Instances

Simultaneous 3D reconstruction and 6D object pose estimation from a single monocular image is an inherently ill-posed problem. In industrial settings, however, multiple instances of an object are often randomly arranged in bins, implicitly providing several views of the same object within a single image. We show that this implicit multi-view geometry can be exploited to simultaneously reconstruct the object in 3D and estimate the 6D pose of each visible object instance. We present MooMIns, a new Gaussian-splatting-based approach that inverts the original Gaussian splatting formulation: instead of rendering a single scene from multiple cameras, we render multiple object instances from a single camera. Our method is initialized with SAM3 instance segmentation masks and a modified Structure from Motion (SfM) pipeline. In contrast to learned monocular depth estimation, we perform true geometry-based reconstruction from image evidence, avoiding hallucinations caused by training data priors. We evaluate MooMIns on synthetic and real bin-picking scenarios, and demonstrate accurate reconstruction of previously unseen objects as well as reliable pose estimation of individual instance