Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Rescaling Confidence: What Scale Design Reveals About LLM Metacognition

arXiv:2603.09309v2 Announce Type: replace Abstract: Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0–100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78\% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions: granularity, boundary placement, and range regularity, and evaluate metacognitive sensitivity using $meta-d'$. We find that a 0–20 scale consistently improves metacognitive efficiency over the standard 0–100 format, while boundary compression degrades performance and round-number preferences persist even under irregular ranges. These results demonstrate that confidence scale design directly affects the quality of verbalized uncertainty and should be treated as a first-class experimental variable in LLM evaluation.

02.
arXiv (CS.CL) 2026-06-16

Measuring Whether LLM Tutors Teach or Solve: A Diagnostic for Educational Impact

Large language models are increasingly proposed as educational tutors, yet stronger task-solving ability does not necessarily imply stronger learning support. Motivated by recent calls to measure the social impact of NLP systems in practice, we study whether public LLM tutoring benchmarks distinguish learning-supportive behavior from mere answer production. We propose a lightweight diagnostic based on the gap between solving-oriented and pedagogy-oriented benchmark performance. Using public MathTutorBench leaderboard results, we show that these dimensions are only partially aligned: across eight publicly reported models, the correlation between solving and pedagogy composites is 0.421, and several models shift meaningfully in rank when evaluation moves from solving to pedagogy. We then analyze the public TutorBench sample and show that agency-relevant behaviors are explicitly encoded in benchmark rubrics, especially in active-learning settings that reward guiding questions, calibrated hints, and non-disclosive scaffolding. Together, these findings suggest that educational-impact evaluation should not treat task success as a sufficient proxy for learning support. We argue that public tutoring benchmarks can better support positive-impact evaluation by reporting solving-oriented and pedagogy-oriented scores separately and by making disclosure-sensitive, student-agency-preserving criteria more explicit.

03.
arXiv (CS.CV) 2026-06-12

Comparing Commercial Depth Sensor Accuracy for Medical Applications

Depth estimation has numerous medical and surgical applications. We benchmark four depth sensors on a porcine bone specimen, a porcine belly specimen, and a silicone kidney phantom using stylus-sampled references. These objects contain several real-world challenges, including homogeneous surfaces, specular surfaces, and subsurface scattering. The comparison includes stereo, structured-light, and time-of-flight sensors at a distance of approximately 50 cm. Specifically, the Intel RealSense D405 (Intel RealSense, United States), PMD Flexx2 (pmdtechnologies, Germany), Stereolabs ZED 2i (Stereolabs, France), and Zivid 2M+ 60 (Zivid, Norway) are compared. The Zivid 2M+ 60 performed best across all objects and metrics considered in this work. The ZED ranked second for real tissue, but last on the phantom.

04.
arXiv (CS.CV) 2026-06-16

Learning Topology-Aware Implicit Field for Unified Pulmonary Tree Modeling with Incomplete Topological Supervision

Pulmonary trees extracted from CT images frequently exhibit topological incompleteness, such as missing or disconnected branches, which substantially degrades downstream anatomical analysis and limits the applicability of existing pulmonary tree modeling pipelines. Current approaches typically rely on dense volumetric processing, explicit graph reasoning, or generic point cloud completion priors, leading to limited efficiency, weak structural awareness, and reduced robustness under realistic structural corruption. We propose TopoField, a topology-aware implicit modeling framework that treats topology repair as a first-class modeling problem and enables unified multi-task inference for pulmonary tree analysis. TopoField represents pulmonary anatomy using sparse surface and skeleton point clouds and learns a continuous implicit field that supports topology repair without relying on complete or explicit disconnection annotations, by training on synthetically introduced structural disruptions over already incomplete trees. Building upon the repaired implicit representation, anatomical labeling and lung segment reconstruction are jointly inferred through task-specific implicit functions within a single forward pass. Extensive experiments on the Lung3D+ dataset demonstrate that TopoField consistently improves topological completeness and achieves accurate anatomical labeling and lung segment reconstruction under challenging incomplete scenarios. We further validate TopoField on real incomplete outputs from an external segmentation model, demonstrating its applicability to realistic segmentation pipelines. Owing to its implicit formulation, TopoField attains high computational efficiency, completing all tasks in just over one second per case, highlighting its practicality for large-scale and time-sensitive clinical applications.

05.
arXiv (CS.CL) 2026-06-18

As Easy as Rocket Science: Assessing the Ability of Large Language Models to Interpret Negation in Figurative Language

Figurative language and negation are two areas that challenge current language models, however, both are widely used throughout written and spoken language. Large language models (LLMs) are also widely used in everyday contexts where they cannot necessarily be tuned for a specific dataset. It is therefore essential to understand the ability of LLMs to correctly interpret text that includes both negation and figurative language. To investigate this, we develop a set of new annotations to an existing dataset of figurative language, and test a range of language models on the dataset. We find that the combination of negation and figurativeness can present a particular challenge, and that performance overall and across different negation types is particularly dependent on the prompt style used.

06.
arXiv (CS.AI) 2026-06-17

A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

arXiv:2606.18075v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing large language models (LLMs) with external knowledge, yet existing graph-based methods face a fundamental limitation: entity-centric and chunk-centric approaches operate on representations anchored to original text without true knowledge fusion. While entity-centric methods connect logically related content and chunk-centric methods preserve context, both retrieve information separately through similarity search, missing emergent understanding from their synthesis. In this paper, we propose HyGRAG, a hierarchical graph RAG framework that transcends source documents by addressing three core challenges: constructing summaries that genuinely integrate contextual and relational information, leveraging these synthesized representations to access emergent knowledge during retrieval, and efficiently updating hierarchical structures for dynamic corpora. Specifically, we design hierarchical index structures over hybrid graphs with both chunk and entity nodes, then iteratively cluster them and generate LLM-based summaries. Then, we design context and relation-aware retrieval that searches across all abstraction levels while expanding through community membership. Moreover, we enable dynamic knowledge update through attachment-based algorithms with only local re-summarization. Experimental results show that HyGRAG improves the average accuracy of multi-hop reasoning tasks by 9.7%, while maintaining reasonable efficiency.

07.
arXiv (CS.LG) 2026-06-15

Minimum Distance Summaries for Robust Neural Posterior Estimation

arXiv:2602.09161v2 Announce Type: replace-cross Abstract: Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.

08.
arXiv (CS.AI) 2026-06-19

Implicit Semantic-Aware Communication Based on Hypergraph Reasoning

arXiv:2606.20162v1 Announce Type: new Abstract: Semantic-aware communication has emerged as a transformative paradigm for next-generation communication systems, shifting the fundamental goal from transmitting bit-level symbols to reliably recovering and understanding the semantic meaning of information. Previous studies have demonstrated that representing the semantic content of source messages as graph-based structures can significantly improve communication efficiency and the accuracy of semantic inference at the receiver. However, existing solutions typically employ graphs that capture only pairwise relationships, thereby neglecting higher-order implicit correlations commonly observed in real-world scenarios, such as group interactions, multi-entity associations, and complex relational contexts. This limitation reduces semantic expressiveness and makes semantic inference susceptible to ambiguity and performance degradation, particularly under noisy or corrupted channel conditions. To address these issues, this paper proposes a novel hypergraph-based implicit semantic reasoning framework, HISR, which leverages hypergraphs to represent complex multi-entity relationships among semantic knowledge entities. In HISR, entities and their associated higher-order relations are mapped into dedicated semantic subspaces tailored to distinct relational contexts. This design not only disentangles diverse semantic interactions to mitigate the over-smoothing effects commonly found in traditional graph embedding methods but also enables robust semantic inference even when partial information loss occurs during transmission. Numerical results show that the proposed HISR achieves up to a 36.6% improvement in implicit semantic interpretation accuracy over the state-of-the-art benchmarks.

09.
arXiv (CS.CV) 2026-06-12

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failures. Diagnosing these failures requires instance-level feedback that answers where a defect occurs, what type it is, why it is defective, and its importance to overall image quality. While recent dense-feedback methods move beyond scalar supervision, their heatmap-centric representations still formulate diagnosis as pixel-field regression, making it difficult to localize variable-cardinality defects and bind semantic reasons to individual failures. To address this representation bottleneck, we propose Structured Defect Grounding (SDG), which casts T2I diagnosis as structured set prediction by modeling each defect as a (location, type, reason, importance) tuple. To make this formulation trainable and measurable, we introduce SDG-30K, a 30K-image dataset with box-grounded annotations across four modern T2I generators, together with a dedicated evaluation protocol, SDG-Eval. Building on this structured representation, we further present a diagnosis-to-alignment framework in which a Vision-Language Model (VLM) serves as the SDG detector, and BoxFlow-GRPO converts predicted defect sets into box-derived, importance-weighted spatial rewards for diffusion model alignment. Extensive experiments show that our SDG detector outperforms leading proprietary VLMs on structured defect grounding, while SDG-guided rewards consistently improve T2I alignment and support localized image refinement. These results establish SDG as a unified, instance-level interface for diagnosing, evaluating, and enhancing modern generative models.

10.
arXiv (CS.LG) 2026-06-17

Continual Self-Improvement with Lightweight Experiential Latent Memories

arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate whether models can instead learn online from this experience, converting transient computation (reasoning traces) into persistent reusable knowledge, and without external supervision or access to future data. We show that In-Context Learning (ICL) over raw reasoning traces fails to generalize, reflecting a fundamental limitation of token-level reuse: individual traces lack the abstraction needed for transfer, even after refinement (e.g. self-reflection). In contrast, drawing inspiration from recent works on unsupervised reinforcement learning, we find that lightweight per-instance training with self-generated test-time signals (majority voting) as rewards yields substantial gains, often surpassing full-dataset offline training, motivating a shift from raw traces to learned latent representations. Building on this insight, we propose an online method that distills inference-time compute spent on encountered problems into compact modular latent memories capturing the underlying reasoning structure. These memories are stored and retrieved for future inputs, enabling continual improvement while avoiding catastrophic forgetting through modular design. Importantly, our method is highly efficient, parametrized as extremely lightweight soft prompt memories (~0.001% of model parameters) and trained with only a few gradient steps, yet achieving performance competitive with full parametric updates and offline training. Across challenging mathematical reasoning benchmarks, our approach significantly outperforms zero-shot and raw data ICL baselines, while transferring effectively across datasets.

11.
arXiv (CS.LG) 2026-06-16

Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension

arXiv:2602.12471v2 Announce Type: replace Abstract: We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that an accelerated $1/T^2$ rate is possible by choosing a large stepsize $\eta = \Theta(\gamma^2 T)$ (where $\gamma$ is the dataset's margin) despite the resulting non-monotonicity of the loss. In this paper, we provide a tighter analysis of gradient descent for this problem when the data is two-dimensional: we show that GD with a sufficiently large learning rate $\eta$ finds a point with loss smaller than $\mathcal{O}(1/(\eta \gamma^2 T))$, as long as $T \geq \Omega(n/\gamma + 1/\gamma^2)$, where $n$ is the dataset size. Our improved rate comes from a tighter bound on the time $\tau$ that it takes for GD to transition from unstable (non-monotonic loss) to stable (monotonic loss), via a fine-grained analysis of the oscillatory dynamics of GD in the subspace orthogonal to the max-margin classifier. We also provide a lower bound of $\tau$ matching our upper bound up to logarithmic factors, showing that our analysis is tight.

12.
arXiv (CS.CV) 2026-06-11

Contour Field based Elliptical Shape Prior for the Segment Anything Model

The elliptical shape prior information plays a vital role in improving the accuracy of image segmentation for specific tasks in medical and natural images. Existing deep learning-based segmentation methods, including the Segment Anything Model (SAM), often struggle to produce segmentation results with elliptical shapes efficiently. This paper proposes a new approach to integrate the prior of elliptical shapes into the deep learning-based SAM image segmentation techniques using variational methods. The proposed method establishes a parameterized elliptical contour field, which constrains the segmentation results to align with predefined elliptical contours. Utilizing the dual algorithm, the model seamlessly integrates image features with elliptical priors and spatial regularization priors, thereby greatly enhancing segmentation accuracy. By decomposing SAM into four mathematical sub-problems, we integrate the variational ellipse prior to design a new SAM network structure, ensuring that the segmentation output of SAM consists of elliptical regions. Experimental results on some specific image datasets demonstrate an improvement over the original SAM.

13.
medRxiv (Medicine) 2026-06-10

Epidemiology of Cervical Precancerous Lesions: Prevalence and Predictors from Pap Smear Screening in Hawassa City Hospitals, Sidama Region, Ethiopia. Institutional-Based Cross-sectional Study

Background: Cervical cancer is the fourth most common cancer in women worldwide and remains a major public health challenge. In Ethiopia, it is the second leading cause of cancer deaths, with around 8,000 new cases and 6,000 deaths each year. Region?specific data on the prevalence and predictors of precancerous lesions remain scarce, yet such information is vital for guiding targeted reproductive health strategies. This study therefore examined the prevalence and predictors of cervical precancerous lesions among women aged 21-60 years undergoing Pap smear screening in public hospitals in Hawassa City, Sidama Region. Methods: An institution-based cross-sectional study was conducted among 241 women attending Pap smear screening at public hospitals in Hawassa City from March to August 2025. Sociodemographic and clinical data were collected via interviews and medical records. Lesions were classified based on the standardized international framework for reporting cervical cytology results from Pap smears per the Bethesda system. Multivariable logistic regression identified predictors p

14.
arXiv (math.PR) 2026-06-16

Structure preserving properties of higher order moment closures for TASEP

arXiv:2604.15925v2 Announce Type: replace-cross Abstract: The totally asymmetric simple exclusion process (TASEP) is a stochastic model for the unidirectional flow of interacting particles on a 1D-lattice that is much used in systems biology and statistical physics. Its master equation describes the evolution of the probability distribution on the configuration space. The size of the master equation grows exponentially with the length of the lattice. It is known that the complexity of the system may be reduced using mean-field approximations. We provide a rigorous definition of a family of such models using moments of any order and an extension to the pair approximation for obtaining closures for the system. The dimension of these models grows linearly with the lattice size and exponentially in the order of the approximation. Moreover, we show that the states of these models still have a probabilistic interpretation and that basic structural properties of the master equation are preserved. This extends known results on the Ribosome Flow Model which can be viewed as the first order approximation for TASEP.

15.
arXiv (CS.CV) 2026-06-16

Learning Directional Semantic Transitions for Longitudinal Chest X-ray Analysis

Chest X-ray (CXR) interpretation often requires longitudinal comparison to assess disease progression. Existing approaches typically rely on temporal feature fusion or inter-study discrepancy modeling, yet remain limited in capturing subtle progression semantics and overlook the inherently directional nature of disease trajectories. In this paper, we propose ProTrans, a novel vision-language pretraining framework that formulates disease progression as a directional semantic transition between paired CXR studies. ProTrans leverages radiology reports to anchor individual CXR representations within interpretable disease states, and introduces a learnable progression feature map to explicitly encode semantic shifts between states, aligned with report-derived progression descriptions. To enforce direction-aware perception, ProTrans incorporates a reversed temporal modeling process and imposes bidirectional reconstruction consistency across states and transitions, thereby disentangling directional semantics and promoting coherent trajectory modeling. Extensive experiments on longitudinal downstream tasks, including disease progression classification and progression captioning, demonstrate that ProTrans consistently outperforms existing methods, establishing a unified pretraining framework for longitudinal CXR understanding. https://github.com/RPIDIAL/ProTrans

16.
arXiv (CS.LG) 2026-06-19

3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning

arXiv:2606.19451v1 Announce Type: new Abstract: We introduce 3D-DLP, a self-supervised object-centric representation learning model that decomposes scene-level RGB-D or voxel observations into a set of 3D latent particles. Building on the Deep Latent Particles (DLP) framework, each particle encodes disentangled attributes, including 3D keypoint position, bounding box dimensions, and appearance features, and represents a distinct entity in the scene. The model learns interpretable per-particle segmentation maps through an end-to-end self-supervised reconstruction objective. We demonstrate on both simulated and real-world datasets that the learned latent space is interpretable and controllable: by manipulating particle positions and decoding, we can generate novel scene configurations. Furthermore, we show that leveraging these compact 3D latent particles for downstream robotic manipulation improves performance over baselines that either lack explicit 3D information or rely on memory-intensive dense 3D inputs without object-centric structure. Code and videos are available at https://eubooks3003.github.io/3d-dlp.

17.
bioRxiv (Bioinfo) 2026-06-18

Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction

Next-token prediction has produced predictable scaling in language, but the recipe presumes a sequence of tokens with a meaningful order. Single-cell RNA-seq counts have no natural gene ordering, so applying the recipe directly to raw expression fails under an ill-suited left-to-right bias. We instead ask whether a learned latent can supply the structure the recipe needs. We introduce texttt{ExpressionVAE} (eVAE), a discrete-latent perturbation model that compresses each cell into a short sequence of discrete codes through a finite-scalar-quantization (FSQ) bottleneck and trains a perturbation-conditioned discrete prior over those codes. On Replogle and Parse~1M, eVAE sets a new state of the art on every distributional metric and leads on most cell-eval perturbation metrics, with Fr'echet distance and $mathrm{MMD}^2$ roughly $3$ to $20times$ lower than the strongest continuous-latent baseline. Swapping the prior between autoregressive and masked discrete diffusion leaves performance near-identical, isolating the gain to the discrete latent itself rather than the prior family. A decoder-head ablation then exposes a single design axis, the richness of the predictive distribution at inference, that splits the standard metrics into two groups, variance-sensitive and mean-sensitive, which move in opposite directions along the axis. Finally, on a held-out CRISPRi reversion benchmark of $1{,}732$ perturbations under inflammatory cytokine stress, the frozen eVAE encoder outperforms UMAP and differential expression and matches scGPT on perturbation ranking at a fraction of the data.

18.
arXiv (CS.LG) 2026-06-16

The Information-Theoretic Benefit of Shared Representations under Orthogonality Constraints

arXiv:2606.16028v1 Announce Type: new Abstract: Modern deep learning architectures are increasingly multi-task and multi-modal, using a pretrained foundation model combined with task-specific, fine-tuned models. Empirically, exploiting similarity across different problems, instead of solving them individually, can significantly improve overall performance. While the generalization and sample complexity properties of multitask learning have been widely studied, the parametric complexity of joint approximation in comparison to separate approximation remains less well understood. The question is particularly relevant in modern deep learning, where models are increasingly required to satisfy structural constraints such as equivariance, conservation laws, or orthogonality. We prove lower and upper bounds on the description-length for separate and joint approximation classes, respectively, in uniform norm. We build a class of orthogonal functions by composing a shared hard feature, realized by a Rademacher-Haar wavelet series, with Sawtooth-Walsh readouts to enforce orthogonality of output coordinates. The dyadic tree structure of the Rademacher-Haar wavelet concentrates the approximation hardness in the common feature component, while the readouts act as task-specific heads. Using an information-theoretic framework, we obtain a sharp gap between the optimal approximation rates achievable by joint and separate coding. Finally, we realize this separation in a neural network model using Heaviside activations via reduction to triangle-wave approximation. Our results show that even under an orthogonality constraint joint approximation requires strictly fewer bits in compositional architectures, provided the tasks share a latent hard feature. This provides theoretical insight into the description-length-efficiency of compositional multi-output architectures and clarifies how neural networks can retain expressivity under geometric constraints.

19.
arXiv (CS.LG) 2026-06-16

Machine Learning-Driven Chemical Reactor Network Modeling of the Sandia-D Flame

arXiv:2606.14729v1 Announce Type: cross Abstract: Turbulent combustion simulations are crucial for many scientific and engineering systems. However, the high cost to fully resolve the complex multiscale and multiphysics behavior makes direct simulation typically infeasible. The equivalent reactor network (ERN) approach attempts to improve computational efficiency by replacing a multidimensional turbulent simulation with a series of much cheaper 0-D and 1-D chemical reactors, providing a surrogate model that retains detailed chemistry at the cost of simplified flow physics. However, their development remains a challenge, often requiring either expert analysis, or automated approaches that sacrifice accuracy. In this work, we develop an automated machine-learning-assisted framework for constructing ERNs of the Sandia-D turbulent methane/air flame. Principal component analysis is first used to reduce high-dimensional thermochemical computational fluid dynamics (CFD) data to a low-dimensional latent space, where k-means clustering identifies physically interpretable flame regions used to initialize a reactor-network graph. This initialization is then refined using finite-difference gradient descent wrapped around non-differentiable Cantera reactor simulations. Across 30 RANS simulations spanning a range of pilot temperatures and inlet methane compositions, the optimized 7-reactor ERN achieves a maximum-temperature $R^2$ score of 0.7945 while preserving a $\sim6000\times$ speedup over the CFD solver. Outlet CO prediction remains more challenging, with a final $R^2$ score of $-0.4183$, but improves substantially from the unoptimized clustering initialization. These results show that unsupervised thermochemical feature extraction can provide effective physics-informed initializations for ERN construction, while gradient-based refinement can significantly improve predictive accuracy without manual reactor-network design.

20.
arXiv (quant-ph) 2026-06-19

Sparse positive maps on qutrits with exact nondecomposability thresholds and PPT-entanglement transitions

arXiv:2606.19765v1 Announce Type: new Abstract: We study a family of sparse positive maps on qutrits for which positivity, decomposability, and PPT entanglement can all be analysed explicitly. The block structure of the associated Choi matrices reduces positivity to a Hermitian biquadratic form and leads to exact positivity boundaries for three representative parametric families. For the same families we determine the exact transition between decomposable and non-decomposable maps and construct associated PPT states of two classes. The first consists of witness-adapted deformations naturally tied to the non-decomposability analysis. The second consists of analytically tractable families whose full PPT-entangled branch is detected by fixed positive maps, yielding exact thresholds between separability and bound entanglement. For the trace-preserving subclass, we further compare positivity with a recent eigenvalue bound for 2-positive maps, thereby making the gap between positivity and higher-order positivity fully explicit within this family.

21.
arXiv (CS.LG) 2026-06-16

Enhancing Physics-Informed Neural Networks Through Feature Engineering

arXiv:2502.07209v4 Announce Type: replace Abstract: Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters – on average, 65% fewer than the competing feature engineering methods – while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.

22.
arXiv (CS.CV) 2026-06-17

Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins

Text-to-video retrieval in operating rooms (OR) is an enabling technology for OR safety, as it allows stakeholders to retrieve and inspect recordings of specific events. However, because the most safety-critical events may not follow the common structure, to unlock its full potential text-to-video retrieval must be able to handle implicit queries that require reasoning to identify the right video (e.g., the step right before clipping). However, existing methods rely on global embeddings that cannot reason over such queries. We propose OR3, a text-to-video retrieval method that converts clips into action-driven digital twins (ActDTs), grouping concurrent subject-action-object triplets under non-overlapping temporal intervals. Moreover, rather than cross-modal matching through paired encoders, OR3 performs imagination-based retrieval where an LLM generates hypothetical ActDTs from queries. This enables intra-modal matching via a single encoder trained with ActDT-tailored hard negatives. Finally, evidence-grounded refinement revises imagined ActDTs based on discrepancies with top candidates to capture procedure-specific patterns. We construct a benchmark from MM-OR with 276 implicit queries across four reasoning categories over 386 clips from robotic knee procedures. OR3 achieves 57.6 R@1 and 77.3 R@5, outperforming the strongest baseline. These results demonstrate that OR3 enables fine-grained discrimination between visually similar OR video clips through temporal action reasoning.

23.
arXiv (quant-ph) 2026-06-15

Local correlations in long-range dual-unitary kicked Hamiltonian chains

arXiv:2606.13857v1 Announce Type: new Abstract: Many-body Floquet models with exact space–time symmetry, such as the kicked Ising spin chain (KIC), provide natural examples of systems with dual-unitary dynamics. The requirement of exact space–time symmetry is, however, highly restrictive, as it permits only nearest-neighbor interactions. Based on a pair of Hadamard matrices, we construct a wide family of dual-unitary kicked spin chains with long-range interactions. We show that local two-point correlations in such models propagate along the light-cone edges \( |n| = r|t| \), where \(r\) is the interaction range, and can be derived analytically for operators with local support. This approach is illustrated using the example of a kicked Ising spin chain with next-to-next-neighbor interactions.

24.
medRxiv (Medicine) 2026-06-17

Method comparisons for differentiation of Schizophrenia and Bipolar based on rs-fMRI Intrinsic and Functional Networks

Psychosis as a symptom manifests in schizophenia and bipolar disorder, two highly heterogeneous psychiatric illnesses with overlapping clinical manifestations. Resting-state functional Magnetic Resonance Imaging (rsfMRI), represents a promising tool for identifying objective biomarkers of functional brain alterations to aid differential diagnosis. In this work, we comparatively evaluate multiple rs-fMRI representations for differentiating schizophrenia and bipolar disorder using intrinsic connectivity network (ICN) temporal profiles and several functional network connectivity (FNC) approaches, including static, dynamic, and high-order connectivity analyses. The study was conducted on a cohort of 371 subjects with psychosis, while evaluation was performed using a separate held-out cohort of 315 subjects. We investigated convolutional neural network architectures applied to ICN temporal profiles, spectrograms, and scalograms, alongside classical machine learning models trained on connectivity-derived features. Across the evaluated approaches, ICN temporal profiles provided the most consistent discriminative performance, with a 1D convolutional neural network achieving the strongest overall results under the benchmark protocol. Among connectivity-based methods, static functional connectivity generally outperformed dynamic and high-order representations, suggesting that increased representational complexity did not necessarily translate into improved generalization. Although the obtained classification performance remained modest, the results highlight the challenges of robust psychosis differentiation using rs-fMRI while emphasizing the relative stability of low-order connectivity representations and temporal ICN features. These findings contribute to ongoing efforts toward reproducible and interpretable neuroimaging biomarkers for psychiatric disorders.

25.
arXiv (CS.LG) 2026-06-17

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

arXiv:2606.17260v1 Announce Type: cross Abstract: We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.