Academic Intelligence · Curated Daily

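Explore the Frontiers of Global Academic Research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefings across interdisciplinary fields.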

01.
arXiv (quant-ph) 2026-06-11

Nonlocal continuous-variable gates by amplified optical connections

arXiv:2603.12866v2 Announce Type: replace Abstract: Nonlocal quantum gates, coupling quantum systems located at a distance, are crucial for distributed quantum computing. To this end, high-capacity optical noiseless connections between different processing units are essential for transmitting large amounts of information per mode. Simultaneously, optical quantum computing offers future high-speed multimode quantum processors. We propose a library of feasible protocols to implement a necessary nonlocal continuous-variable (CV) quantum nondemolition (QND) gate between two distant users sharing a quantum channel and exploiting classical communication. The users are endowed with a newly achieved high-fidelity and large-bandwidth element, a single-pass phase-sensitive optical parametric amplifier (OPA), that allows for both online squeezing and channel-loss compensation. The use of OPAs enhances the quality of the resulting gate in terms of both excess noise and entangling capability. The proposed schemes are also applicable to CV cluster state fusion, providing a first step towards the development of distributed CV measurement-based quantum computation.

02.
arXiv (CS.AI) 2026-06-16

Orchestrated Reality: From Role-Play to Living, Playable Game Worlds – LLM-Driven World Simulation as a Parameterized-Action POMDP

arXiv:2606.16014v1 Announce Type: cross Abstract: Many games rely on storytelling combined with systems that track levelling, NPC behaviour, and consequence simulation; bridging tightly-authored narrative with deeply-simulated worlds – most acute in sandbox and open-world settings – has been prohibitively expensive. LLM-driven worlds open a new path: a single harness can coordinate numerical state, narrative voice, storytelling pacing, and rule logic together. Realising this requires the LLM system to sustain a persistent world (who is where, what has just happened, what is currently true), which today's deployed systems do not: the narrative voice asserts state in free prose without any validated representation, so a fully autonomous game engine remains infeasible. We treat this as an architectural choice, not a limitation of language models, and report work in progress on a framework – orchestrated reality – that makes the world a canonical object owned by a singleton orchestration agent analogous to the tabletop-RPG Game Master (GM). We formalise an LLM-driven game world for a human player as a Parameterized-Action POMDP: state is a tree of canonical JSON entities, actions decompose as $a=(k, x_k)$ (a discrete intent kind plus structured JSON parameters), the agent observes only a narrative projection $o=O(s)$ of state, and the transition kernel $F$ is an LLM-driven Plan-Diff-Validate-Apply (PDVA) pipeline that commits schema-validated, content-hashed JSON deltas. We give the formal model, a JSON-state example, a worked single-turn example, and a catalogue of 15 illustrative incidents drawn from a real deployment showing the framework in action. Empirical validation through a planned human player study – together with multi-NPC concurrent agency and deployment as an RL environment – is situated as future work.
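To make the Plan-Diff-Validate-Apply idea concrete, here is a minimal Python sketch of a single turn. It is an illustration under stated assumptions, not the authors' implementation: the `SCHEMA` table, the `move` intent, the entity fields, and the rule-based `plan` function (standing in for the LLM planner) are all hypothetical.

```python
import copy
import hashlib
import json

# Hypothetical schema: entity kind -> mutable fields and their types.
SCHEMA = {"npc": {"name": str, "location": str, "hp": int}}


def content_hash(obj):
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def plan(state, action):
    """Stand-in for the LLM planner: map a = (k, x_k) to proposed field values."""
    kind, params = action
    if kind == "move":
        return {params["entity"]: {"location": params["to"]}}
    raise ValueError(f"unknown intent kind: {kind}")


def diff(state, proposal):
    """Keep only the proposed fields that differ from the current state."""
    delta = {}
    for entity_id, fields in proposal.items():
        current = state.get(entity_id, {})
        changed = {k: v for k, v in fields.items() if current.get(k) != v}
        if changed:
            delta[entity_id] = changed
    return delta


def validate(state, delta):
    """Reject deltas that touch unknown entities or violate the schema."""
    for entity_id, fields in delta.items():
        if entity_id not in state:
            return False
        spec = SCHEMA[state[entity_id]["kind"]]
        for key, value in fields.items():
            if key not in spec or not isinstance(value, spec[key]):
                return False
    return True


def step(state, action):
    """One PDVA turn. Returns (new_state, record); record is None on rejection."""
    delta = diff(state, plan(state, action))
    if not validate(state, delta):
        return state, None  # invalid deltas never reach the canonical state
    new_state = copy.deepcopy(state)
    for entity_id, fields in delta.items():
        new_state[entity_id].update(fields)
    return new_state, {"delta": delta, "hash": content_hash(delta)}


def observe(state):
    """Narrative projection o = O(s): the player sees prose, not the JSON tree."""
    return "; ".join(f"{e['name']} is in the {e['location']}" for e in state.values())
```

Calling `step(state, ("move", {"entity": "npc_1", "to": "market"}))` on a state holding one `npc` entity returns a new state plus a hashed delta record, while a malformed proposal leaves the state untouched.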

03.
medRxiv (Medicine) 2026-06-12

High coverage, persistent gaps: quality of Antenatal Care and its determinants in Zambia based on the 2024 Demographic and Health Survey.

Abstract Background Evaluating antenatal care (ANC) quality is critical to reducing maternal and neonatal mortality. In Zambia, despite high basic ANC attendance, comprehensive national evidence on the clinical content and quality of services remains limited. This study assessed the coverage of WHO-recommended ANC interventions and identified factors associated with care quality using the latest national data. Methods A cross-sectional analysis was conducted using data from the 2024 Zambia Demographic and Health Survey. The final analytic sample comprised 4,829 women aged 15-49 with a live birth in the preceding 5 years. A composite index of 15 selected, equally weighted WHO-recommended components evaluated clinical assessment, counseling/screening, preventive interventions, and utilization. Survey-weighted Poisson regression estimated adjusted incidence rate ratios (aIRRs) for the count of ANC components received. Results The mean ANC quality score was 12.5 out of 15 (95% CI: 12.4-12.6), and 78.5% (95% CI: 77.0-80.0) of women achieved adequate ANC (≥ 12/15 components). While individual clinical and counseling coverage generally exceeded 90%, only 47.2% (95% CI: 45.3-49.0) of women initiated care during the first trimester, and just 4.8% (95% CI: 4.1-5.6) achieved ≥ 8 ANC contacts. Maternal education was the strongest and most stable predictor of quality across all models. Compared to no education, higher education was associated with an 8.0% higher expected quality score (aIRR = 1.080, 95% CI: 1.051-1.110). Lower ANC quality was significantly associated with unwanted pregnancies (aIRR = 0.970, 95% CI: 0.956-0.993) and with residence in Western (aIRR = 0.923, 95% CI: 0.897-0.951) and North Western (aIRR = 0.966, 95% CI: 0.937-0.996) provinces. Absence of distance barriers and residence in Eastern, Luapula, and Copperbelt provinces were associated with higher quality scores.
Conclusion While average ANC component coverage in Zambia is high, critical gaps persist in early initiation and total contact frequency. Care adequacy is strongly influenced by maternal education, relationship status, pregnancy intention, and regional inequities. These findings underscore the need for interventions targeted at uneducated women, preventing unintended pregnancies, and underserved regions such as Western and North Western Provinces. Keywords: Antenatal care quality, ANC content, Zambia, maternal education.

05.
arXiv (CS.CV) 2026-06-12

Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

We describe a Camera and LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space and predicts boxes through a CenterPoint-style head with a generalized IoU regression loss and an IoU quality re-ranking head. Trained on the provided train and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames are also present in the released train (40) and validation (4) splits with their labels. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). All three configurations and their per-class results are reported.

06.
arXiv (CS.AI) 2026-06-15

VeriGeo: Controllable Geometry Question Generation with Numerical and Analytical Verification

arXiv:2606.14176v1 Announce Type: new Abstract: Geometry problem generation is useful for AI-assisted education and multimodal mathematical reasoning, but reliable synthesis remains difficult because the problem statement, diagram, constraints, and solution should be mutually consistent. Existing methods often trade off controllability and reliability: seed-based rewriting is flexible but weakly verifiable, whereas diagram-first construction improves validity but is less suited to arbitrary user-specified constraints. We introduce VeriGeo, a controllable geometry generation framework grounded in executable reasoning traces. Given user constraints such as target concepts and difficulty, an Author agent generates a problem and diagram, and a Solver agent produces a proof-aligned solution. Both agents use a shared action sequence that connects natural language, diagrams, geometric constraints, and proof steps into a verifiable representation. A three-stage pipeline checks numerical consistency, analytical realizability, and global consistency, using verification-guided reflection to repair recoverable failures and reject unrecoverable ones. Across five LLM backbones, raw generations frequently fail these checks, while VeriGeo repairs a substantial fraction of the invalid attempts. Supervised fine-tuning on 8.7k examples generated by VeriGeo achieves the best reported GeoQA performance among end-to-end multimodal LLM-based solvers, and obtains strong results on PGPS9K and MathVista-GPS, demonstrating the effectiveness of verified synthetic data for improving multimodal geometry reasoning.

07.
medRxiv (Medicine) 2026-06-12

Reduced nighttime smartphone use among cohabiting partners: a longitudinal study under the lens of social control of health behaviors theory

Objective: We examined the link between cohabitation with a partner and nighttime smartphone use through the social control of health behavior theory. Background: Nighttime smartphone use is a behavioral risk factor for sleep problems. While previous research has predominantly focused on individual-level risks of sleep disturbances, the role of social context remains underexplored. Theoretical frameworks, specifically the Social Control of Health Behavior, suggest that social relationships regulate health-related behaviors; however, it is unclear how far this regulation extends to modern digital behaviors among couples. Method: We analyzed survey data from three waves of the SmartSleep Study (2018, 2020, and 2023; total N = 25,028), including a longitudinal follow-up subset (N = 1,003). We tested multivariate associations between living with a partner, changes in cohabitation status and frequent nighttime smartphone use by fitting generalized linear mixed-effects models. Additionally, we mapped the complex interplay between indicators of social integration, social support, smartphone use, and sleep quality using hierarchical clustering of non-linear correlations. Results: Cohabiting participants had lower odds of frequent nighttime smartphone use compared to those living alone (OR = 0.66; 95% CI: 0.61, 0.72). This lower risk was driven primarily by cohabitation with a partner (OR = 0.49; 95% CI: 0.36, 0.66). Longitudinal analysis supported these findings, showing that sustained cohabitation was associated with less frequent nighttime use (OR = 0.56; 95% CI: 0.38, 0.82). Clustering analysis revealed that indicators of social integration and support clustered with favorable sleep quality. Conclusion: Our findings suggest that the health-protective effects of cohabitation with a partner extend to digital behaviors. 
Consistent with social control of health behavior theory, the presence of a partner appears to reduce frequent nighttime smartphone use, highlighting the critical importance of considering social context when addressing digital health hygiene and promoting sleep.

08.
arXiv (CS.AI) 2026-06-19

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

arXiv:2606.19602v1 Announce Type: new Abstract: Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generation fails on this data, mishandling temporal reasoning, cross-document dependencies, and missing metadata. We deploy ACIE (Agentic Clinical Information Extraction) at University Medicine Essen: an on-premise agentic RAG pipeline that reasons over complete patient contexts and grounds every answer in source passages for clinician verification. We quantify the metadata gap, trace the architectural decisions it shaped, and evaluate extraction alongside an independent retrospective lymphoma registry study, in which nuclear-medicine physicians verify every extracted value against its cited sources. Across 7,326 judgments, clinicians accepted 96.5% of extractions, with per-type acceptance ranging from 80% to 99%.

09.
bioRxiv (Bioinfo) 2026-06-18

A data-driven rediscovery of the specificity-conferring code of adenylation domains in nonribosomal peptide synthetases

Nonribosomal peptide synthetases (NRPSs) are large modular enzymes that assemble structurally diverse peptides, many of pharmacological importance, including antibiotics and immunosuppressants. Within each NRPS module, the adenylation (A) domain selects the substrate to be incorporated, a choice governed by a small set of residues lining the binding pocket. For two decades, computational prediction of A-domain substrate specificity has relied on residue sets - most prominently the Stachelhaus code and the 34-residue "8 Angstrom code" - that were defined by spatial proximity to the substrate rather than by demonstrated predictive value. Here we revisit which residues govern substrate specificity from a purely data-driven perspective. We assembled a non-redundant dataset of 5,366 A-domain sequences (4,693 bacterial and 673 fungal) and used information-theoretic measures to rank alignment positions by their statistical association with substrate identity, without restricting candidate positions to any predefined structural shell. This procedure yielded two compact, kingdom-specific codes: IG15B (15 positions) for bacterial and IG13F (13 positions) for fungal A-domains. Both match or exceed the predictive accuracy of the 34-residue 8 Angstrom code while using fewer than half its positions, and both independently recover the majority of the classical Stachelhaus positions. Notably, our analysis identifies four positions (242, 280, 281, and 284) that lie outside all conventional codes yet carry non-redundant specificity information and co-localize with classical determinants on two helices flanking the binding pocket. These positions provide new candidate sites for the rational engineering of A-domain specificity.

10.
arXiv (CS.AI) 2026-06-16

Input-Dependent Fisher Information for Local Sensitivity Analysis of Medical Image Classifiers

arXiv:2606.16362v1 Announce Type: cross Abstract: Deep neural networks have achieved strong performance in medical image classification, but often behave as black boxes. Commonly used post-hoc interpretation methods often provide heuristic visualizations whose relationship to the classifier's predictive distribution is indirect. This work introduces a local sensitivity analysis framework based on the input-dependent Fisher Information Matrix (iFIM) of a trained classifier. The iFIM characterizes how the classifier's predictive distribution changes under infinitesimal perturbations of the input image. By using a Gram-matrix formulation, the nonzero eigenspectrum of the iFIM can be recovered without explicitly forming the full image-dimensional Fisher matrix. The leading iFIM eigenspace is then used to project an input image into a high local-sensitivity component and its orthogonal component. These components provide a model-intrinsic description of local predictive sensitivity, rather than a conventional pixel-wise attribution heatmap or a causal segmentation of task-relevant anatomy. The framework is evaluated on controlled and clinical medical image classification tasks using multiple classifier architectures. Perturbation-based experiments show that high-sensitivity iFIM components are more strongly coupled to changes in predictive confidence and classification performance than lower-sensitivity complementary components. The results support the iFIM framework as a principled tool for analyzing local decision sensitivity and for complementing existing attribution-based interpretability methods in medical imaging.
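The Gram-matrix step can be illustrated with a small NumPy sketch. It assumes a toy linear-softmax classifier with logits `W @ x`, so the input gradients are available in closed form. A real image classifier would obtain them by automatic differentiation, and the paper's exact construction may differ. The function names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def ifim_factor(W, x):
    """Return A (C x D) such that the input-space Fisher matrix is A.T @ A.

    Row c of A is sqrt(p_c) * grad_x log p(c | x) for logits W @ x.
    """
    p = softmax(W @ x)
    C = W.shape[0]
    # grad_x log p_c = W.T @ (e_c - p); stack these as rows.
    score_grads = (np.eye(C) - p[None, :]) @ W
    return np.sqrt(p)[:, None] * score_grads


def ifim_spectrum(A):
    """Nonzero eigenpairs of A.T @ A, computed from the small C x C Gram matrix."""
    vals, vecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals[0]  # drop numerically zero directions
    vals, vecs = vals[keep], vecs[:, keep]
    # Gram eigenvector u maps to input-space eigenvector A.T @ u / sqrt(lambda).
    V = A.T @ vecs / np.sqrt(vals)
    return vals, V


def split_input(x, V):
    """Split x into its leading-eigenspace component and the orthogonal remainder."""
    high = V @ (V.T @ x)
    return high, x - high
```

For C classes and D input pixels, only a C x C eigenproblem is solved, which is the point of the Gram formulation when D is image-sized.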

11.
arXiv (CS.CL) 2026-06-19

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings (vectors that encode the semantic relationships between words) through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.
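A group-sparse penalty of this kind is commonly handled with row-wise soft thresholding. The sketch below is a generic proximal-gradient step, not the paper's two-stage estimator. It assumes each row of `U` is one word's embedding, penalizes the sum of row-wise L2 distances from the source embeddings `U_source`, and takes the loss gradient `grad` as given. All names are hypothetical.

```python
import numpy as np


def group_sparse_penalty(U, U_source):
    """Sum over words of the L2 norm of each row's shift from the source embedding."""
    return np.linalg.norm(U - U_source, axis=1).sum()


def group_soft_threshold(delta, tau):
    """Shrink every row of delta toward zero; rows with norm <= tau become exactly zero."""
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return delta * scale


def transfer_step(U, U_source, grad, lr, lam):
    """One proximal-gradient step: gradient move, then shrink shifts from the source."""
    delta = (U - lr * grad) - U_source
    return U_source + group_soft_threshold(delta, lr * lam)
```

Rows whose shift is thresholded to zero keep their source-domain embedding exactly, which matches the intuition that only a few domain-specific words change meaning.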

12.
arXiv (CS.AI) 2026-06-11

Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate

arXiv:2606.02670v3 Announce Type: replace-cross Abstract: Many recent multivariate time series anomaly detection (MTSAD) models incorporate cross-channel modeling, under the implicit assumption that the structure of anomalies may be spread across multiple channels. We evaluate this assumption on eight widely used public benchmarks by introducing a per-segment diagnostic framework that flags, for each labeled anomaly, whether at least one channel deviates individually from its normal history, whether the cross-channel correlation structure changes, or both. The framework shows that no cross-channel rupture occurs without an accompanying univariate deviation across a range of reasonable thresholds. A complementary metric also reveals that on six of the eight benchmarks, at least half of the labeled anomaly segments deviate univariately on 89% to 100% of their timesteps, reaching 100% on three of these datasets. To verify that our framework captures cross-channel structure when present, we construct synthetic data of phase-shifted sinusoidal channels with shared noise. Each anomalous segment is altered through one of two channel-wise corruptions that preserve the per-channel marginal distribution while breaking cross-channel structure, and our framework correctly characterizes these segments as cross-channel-only. On these data, channel-dependent (CD) models successfully exploit the cross-channel signal whereas channel-independent (CI) ones fail. The CI/CD comparison of a recent SOTA detector on real benchmarks further confirms that CD modeling brings no measurable gain. We conclude that current MTSAD benchmarks are unsuitable for validating cross-channel modeling capabilities, and we call for the development of more structurally diverse evaluation sets. The code for this study is publicly available.
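The per-segment diagnostic can be approximated with a few lines of NumPy. This is a simplified sketch rather than the authors' framework: the z-score test, the correlation-difference test, both thresholds, and the synthetic generator parameters are assumptions chosen for illustration.

```python
import numpy as np


def diagnose_segment(history, segment, z_thresh=4.0, corr_thresh=0.5):
    """Flag univariate deviation and cross-channel change for one labeled segment.

    history, segment: arrays of shape (time, channels), with at least 2 channels.
    """
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-12
    # Univariate test: does any channel leave its own normal range?
    z = np.abs((segment - mu) / sigma)
    univariate = bool((z > z_thresh).any())
    # Cross-channel test: does the correlation structure change?
    corr_hist = np.corrcoef(history, rowvar=False)
    corr_seg = np.corrcoef(segment, rowvar=False)
    cross_channel = bool(np.abs(corr_hist - corr_seg).max() > corr_thresh)
    return {"univariate": univariate, "cross_channel": cross_channel}


def make_channels(n_steps, rng, shift=0.3, period=50, noise=0.05):
    """Two phase-shifted sinusoidal channels that share one noise source."""
    t = np.arange(n_steps)
    shared = noise * rng.standard_normal(n_steps)
    a = np.sin(2 * np.pi * t / period) + shared
    b = np.sin(2 * np.pi * t / period + shift) + shared
    return np.stack([a, b], axis=1)
```

Shuffling one channel of a segment in time keeps that channel's marginal distribution but breaks its alignment with the other channel, so such a segment should come out as cross-channel-only. Adding a large spike to one channel should trip the univariate flag.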

13.
arXiv (CS.CL) 2026-06-16

SPI: Query-Depth-Adaptive Indexing for Streaming RAG in Vector Databases

Vector databases (VecDBs) are increasingly deployed in retrieval-augmented generation (RAG) pipelines where query processing and document ingestion occur concurrently. The index layer needs to provide low-latency search while incorporating new vectors without frequent global rebuilding. Existing VecDB pipelines typically operate within a uniform representation regime, despite substantial variation in the semantic granularity required across queries. This motivates an index design that supports incremental updates while adapting retrieval depth to query distribution and complexity. We propose Semantic Pyramid Indexing (SPI), a VecDB-layer indexing framework that organizes embeddings into $L$ semantically aligned resolution levels and selects retrieval depth per query via a lightweight uncertainty-aware controller. SPI supports progressive coarse-to-fine ANN search, level-wise streaming insertion without global rebuilds, and distributed execution through LSH partitioning with asynchronous gRPC coordination. Unlike hierarchical ANN structures with fixed traversal rules (e.g., SPANN), SPI adapts resolution at query time while remaining compatible with FAISS and Qdrant backends. On MS MARCO and Natural Questions, SPI achieves competitive Recall@10 with lower latency under the same dense encoder family, yielding a 1.4–2.3$\times$ average retrieval latency reduction under fixed Recall@10 targets relative to comparable approximate-ANN baselines. A prototype scaling study up to 8 nodes shows $6.2\times$ throughput scaling (${\approx}73\%$ efficiency); the 16-node configuration is included for completeness but shows diminishing efficiency. We provide a top-$K$ stability guarantee: queries with sufficient retrieval margin return an identical top-$K$ set at a shallower level. Code and configurations are available at https://github.com/FastLM/SPI_VecDB.

14.
arXiv (CS.CV) 2026-06-16

HemExp: Clinically-Guided Latent Diffusion for Modeling Hematoma Expansion

Hematoma expansion (HE) after spontaneous intracerebral hemorrhage (ICH) is a major determinant of acute triage and treatment decisions in neurosurgical care. However, most existing methods provide either a binary expansion risk or a single follow-up volume, limiting uncertainty-aware decisions. We introduce HemExp, a clinically-guided latent diffusion model that generates patient-specific follow-up non-contrast CT images, along with segmentations of intraparenchymal and intraventricular hemorrhage. Generation is conditioned on baseline imaging, clinical variables, and an explicit expansion indicator, enabling controllable simulation of realistic clinical scenarios. HemExp uses a hemorrhage-aware multi-head variational autoencoder and models progression as the difference between baseline and follow-up latent representations with a conditional diffusion model. The model is trained on paired scans from 450 patients across multiple centers and evaluated on 107 patients from a held-out institution. HemExp produces spatial HE probability maps by generating multiple synthetic follow-up images per patient to estimate distributions of plausible follow-up hematoma volumes. Perturbing clinical inputs such as symptom-onset-to-imaging time or anticoagulant status shifts the predicted follow-up volume distribution. HemExp extends binary predictors and demonstrates robust estimation of clinically relevant outcomes in the imaging space, such as hematoma volume, intraventricular involvement, and mass effects. Overall, our results support controllable latent diffusion as a promising direction for uncertainty-aware modeling of early ICH progression.

15.
arXiv (CS.AI) 2026-06-19

Multi-Head Attention-Based Feature Extractor Integration with Soft Actor-Critic for Porosity Prediction and Process Parameter Optimization in Additive Manufacturing

arXiv:2606.20087v1 Announce Type: new Abstract: Additive manufacturing process optimization requires precise parameter control to minimize defects such as porosity. Traditional reinforcement learning (RL) approaches using discrete action spaces suffer from slow convergence and susceptibility to local optima, limiting their effectiveness for high-precision manufacturing tasks. This study addresses these limitations by employing a continuous action space combined with a novel architecture that integrates a multi-head attention mechanism with the Soft Actor-Critic (SAC) algorithm. The attention-based feature extractor enhances the agent's ability to capture subtle variations in low-dimensional input features, enabling more effective exploration-exploitation balance for navigating value spaces with local minima. We validate our approach on porosity prediction and process parameter optimization in laser powder bed fusion, demonstrating faster convergence and higher final reward values compared to standard RL methods including DQN, PPO, TD3, and vanilla SAC. The proposed methodology achieves a convergence value of 322.79 within 14 episodes, outperforming existing approaches while maintaining stability throughout training.

16.
arXiv (CS.CV) 2026-06-19

DiffMath: Symbol- and Graph-Aware Latent Diffusion Transformer for Handwritten Mathematical Expression Generation

Handwritten Mathematical Expression Generation (HMEG) is challenging due to the complex two-dimensional layouts and long-range structural dependencies of mathematical expressions. Existing methods typically rely on explicit spatial supervision, such as symbol-level bounding boxes, which incurs high annotation costs and limits scalability. In this work, we propose DiffMath, a symbol- and graph-aware latent diffusion framework that leverages the hierarchical structure inherent in LaTeX as a structural prior, eliminating the need for positional supervision. First, we design a Relational Abstract Syntax Tree (RelAST), a generation-oriented representation that distills MathML trees into compact triplet sequences [S, R, D], where each token directly encodes a symbol identity, spatial relation, or nesting depth. Second, we introduce MathVAE, which learns structure-preserving latent representations through symbol-aware and relation-aware perceptual regularization, ensuring that the latent space captures both character semantics and spatial topology. Third, MathDiT performs conditional denoising in this structured latent space, further guided by a global symbol-count prior via Adaptive Layer Normalization (AdaLN) to improve structural coherence. Experiments show that DiffMath produces structurally consistent handwritten expressions, achieves superior performance over existing methods, and improves the accuracy of downstream OCR models through synthetic data augmentation.

17.
arXiv (CS.CL) 2026-06-18

REVES: REvision and VErification–Augmented Training for Test-Time Scaling

Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a fundamental misalignment with multi-step inference dynamics. While recent work treats this as multi-turn reinforcement learning (RL), conventional approaches optimize over the multi-step trajectories directly, failing to exploit the high-quality mistakes in intermediate steps, which the model can learn from by correcting them. We propose a two-stage iterative framework that alternates between online data/prompt augmentation and policy optimization. By converting the intermediate steps ("near-miss" answers) in the successful recovery trajectories into decoupled revision and verification prompts, our approach concentrates training on both effective answer transformation and error identification. This approach enables efficient off-policy data generation and reduces the computational overhead of long-horizon sampling compared to standard multi-turn RL. On LiveCodeBench, using publicly available test cases as feedback, we observe gains of +6.5 points over the RL baseline and +4.0 points over standard multi-turn training. Beyond coding, our approach matches the previously reported SOTA result on circle packing while using the smallest base model (4B) and far fewer rollouts than the much larger evolutionary search systems. Math results under ground-truth verification further confirm improved correction ability. It also generalizes to out-of-distribution constraint-satisfaction puzzles such as n_queens and mini_sudoku, where correctness is defined entirely by problem constraints. Code is available at https://github.com/yxliu02/REVES.git.

18.
arXiv (quant-ph) 2026-06-11

Necessary and Sufficient Conditions for Universal Gates with Pauli Strings and Beyond

arXiv:2606.12096v1 Announce Type: new Abstract: Any quantum computation consists of a sequence of unitary evolutions described by a finite set of Hamiltonians. For the case where this set consists of only products of Pauli operators, known as Pauli strings, we provide a necessary and sufficient condition for it to generate $\mathfrak{su}(2^n)$, i.e., to be universal for quantum computation on $n$ qubits. When combining Pauli strings with a general Hamiltonian, we show a sufficient (and in certain circumstances even necessary) condition for universality based on the Pauli-basis expansion of the Hamiltonian. As an application of these results, we prove two corollaries: (i) a necessary and sufficient condition for the universality of a general Hamiltonian given arbitrary single-qubit control on all qubits, and (ii) the universality of an XYZ Heisenberg Hamiltonian with local control of just two adjacent qubits.
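For small $n$, universality of a set of Pauli strings can be checked by brute force, which gives a feel for what the condition decides. The sketch below is not the paper's criterion. It relies on the standard fact that the commutator of two Pauli strings is zero or proportional to their product, so the generated Lie algebra is spanned by the Pauli strings reachable through products of anticommuting pairs. The helper names are hypothetical.

```python
def parse(pauli):
    """Encode a Pauli string such as 'XZ' as symplectic bit masks (x, z)."""
    x = z = 0
    for i, ch in enumerate(pauli):
        if ch not in "IXYZ":
            raise ValueError(f"invalid Pauli letter: {ch}")
        if ch in "XY":
            x |= 1 << i
        if ch in "ZY":
            z |= 1 << i
    return x, z


def anticommute(a, b):
    """Two Pauli strings anticommute iff their symplectic product is odd."""
    parity = bin(a[0] & b[1]).count("1") + bin(a[1] & b[0]).count("1")
    return parity % 2 == 1


def lie_closure(generators):
    """Pauli strings (phases dropped) spanning the Lie algebra of the generators."""
    algebra = {parse(g) for g in generators} - {(0, 0)}  # identity contributes nothing
    frontier = list(algebra)
    while frontier:
        new = []
        for a in frontier:
            for b in list(algebra):
                if anticommute(a, b):
                    c = (a[0] ^ b[0], a[1] ^ b[1])  # product, up to a phase
                    if c not in algebra:
                        algebra.add(c)
                        new.append(c)
        frontier = new
    return algebra


def is_universal(generators):
    """True if the closure contains all 4**n - 1 non-identity Pauli strings."""
    n = len(generators[0])
    return len(lie_closure(generators)) == 4 ** n - 1
```

Single-qubit X and Z on two qubits generate only a 6-dimensional algebra, and adding the coupling `ZZ` brings the closure to all 15 non-identity strings. The search is exponential in $n$, which is exactly why a closed-form condition is valuable.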

19.
arXiv (CS.LG) 2026-06-17

A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks

arXiv:2606.18175v1 Announce Type: cross Abstract: We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters and solved by a single direct linear least-squares QR factorization. The trial space, which we term Linear-in-Learnables (LiL), comprises representations whose trainable parameters enter linearly, including random-feature extreme learning machines, spectral polynomial bases, and trigonometric expansions, each implemented as a physics-informed neural network. The method thus replaces the nonconvex gradient-based training that limits standard PINNs with a convex per-step solve. We establish local Newton-Kantorovich convergence of the outer iteration to a residual-limited neighborhood under an explicit smallness condition, with the limiting accuracy governed by the best-approximation residual of the trial space rather than by an optimization tolerance. The method, denoted LiL-Q, is assessed on seven benchmarks spanning scalar nonlinear PDEs (Bratu, viscous Burgers, Buckley-Leverett), coupled systems (plane-strain elasticity and the incompressible Navier-Stokes equations in two and three spatial dimensions), and steady-state Darcy flow with heterogeneous permeability. Across these problems, LiL-Q converges in single-digit outer iterations in most cases, even at the coarsest basis sizes and independent of the parameter count. When the exact solution lies in the span of the trial space, the method recovers it to machine precision in a single solve. On the Navier-Stokes benchmarks, it matches or exceeds published PINN solvers with up to two orders of magnitude fewer trainable parameters, without gradient-based optimization.
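The outer loop can be sketched on the one-dimensional Bratu problem $u'' + \lambda e^{u} = 0$ with $u(0)=u(1)=0$. Linearizing around the current iterate $u_k$ gives the linear subproblem $u'' + \lambda e^{u_k} u = \lambda e^{u_k}(u_k - 1)$, which is solved by collocation and linear least squares. This is a toy illustration, not the LiL-Q code: the polynomial trial functions, the collocation grid, and the stopping rule are assumptions, and `numpy.linalg.lstsq` (SVD-based) stands in for the QR solve described in the abstract.

```python
import numpy as np


def basis(x, m):
    """Trial functions phi_j(x) = x**(j+1) * (1 - x), j = 0..m-1, zero at x = 0 and 1.

    Returns (phi, phi_xx), each of shape (len(x), m).
    """
    j = np.arange(m)
    X = x[:, None]
    phi = X ** (j + 1) - X ** (j + 2)
    # Second derivative; the first term has a zero coefficient for j = 0.
    phi_xx = (j + 1) * j * X ** np.maximum(j - 1, 0) - (j + 2) * (j + 1) * X ** j
    return phi, phi_xx


def solve_bratu(lam=1.0, m=8, n_colloc=60, tol=1e-10, max_iter=20):
    """Quasilinearization: each outer step is one linear least-squares solve."""
    x = np.linspace(0.0, 1.0, n_colloc)
    phi, phi_xx = basis(x, m)
    c = np.zeros(m)  # start from u_0 = 0
    for it in range(1, max_iter + 1):
        u_k = phi @ c
        w = lam * np.exp(u_k)
        A = phi_xx + w[:, None] * phi  # operator u'' + lam * exp(u_k) * u
        b = w * (u_k - 1.0)            # right-hand side of the linearized PDE
        c_new, *_ = np.linalg.lstsq(A, b, rcond=None)
        if np.max(np.abs(phi @ (c_new - c))) < tol:
            return c_new, it
        c = c_new
    return c, max_iter


def evaluate(c, x):
    """Evaluate the trial solution at the points x."""
    return basis(np.asarray(x, dtype=float), len(c))[0] @ c
```

Because the coefficients enter linearly, every subproblem is convex and no learning rate or optimizer state is involved. Only the outer iteration handles the nonlinearity.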

20.
arXiv (quant-ph) 2026-06-15

A new class of degenerate solutions to the massless Dirac equation and their potential applications in optical memories

arXiv:2606.14256v1 Announce Type: new Abstract: In this article, we present a novel class of degenerate solutions to the massless Dirac equation, corresponding to a wide variety of electromagnetic 4-potentials and fields, including both zero field and circularly polarized electromagnetic waves. An interesting property of these solutions is that the spin of the particles rotates in synchronization with the electric and magnetic fields of the electromagnetic waves. These results could be utilized for the development of optical memories based on materials supporting massless Dirac fermions, such as graphene.

21.
arXiv (CS.AI) 2026-06-15

From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing

arXiv:2606.14639v1 Announce Type: cross Abstract: Recent advances in speech generation have significantly improved the naturalness of synthetic speech, making spoofing detection increasingly challenging. A key limitation of current anti-spoofing systems is their limited robustness to unseen synthesis methods. In this work, we transform a self-supervised speech representation model into a Mixture-of-Experts (MoE) architecture to improve generalization. Feed-forward blocks in selected encoder layers are replaced by multiple expert networks controlled by a layer-wise gating mechanism, allowing experts to capture complementary acoustic patterns while preserving the representations learned during self-supervised pretraining. We further analyze the architectural choices affecting the performance of this MoE conversion and investigate the activation behavior of the experts. The proposed approach is evaluated on 14 spoofing datasets and reduces the macro EER from 5.46% to 4.81%, corresponding to an 11.9% relative improvement over the baseline.
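
One way to picture replacing a feed-forward block with gated experts is the NumPy sketch below. It is an illustration under stated assumptions, not the paper's architecture: the dense softmax gate, the ReLU activation, and the idea of initializing every expert as a copy of the pretrained block are assumptions, and `ffn` and `moe_ffn` are hypothetical names. With copied experts the gate weights sum to one, so the converted layer initially reproduces the original block's output.

```python
import numpy as np

def ffn(x, w1, b1, w2, b2):
    """Plain two-layer feed-forward block (ReLU assumed for simplicity)."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_ffn(x, experts, gate_w):
    """Gate-weighted sum of expert feed-forward outputs.

    x:       (frames, dim) input features
    experts: list of (w1, b1, w2, b2) parameter tuples, one per expert
    gate_w:  (dim, n_experts) gating weights shared by the layer
    """
    gates = softmax(x @ gate_w)                              # (frames, n_experts)
    outs = np.stack([ffn(x, *p) for p in experts], axis=1)   # (frames, n_experts, dim)
    return (gates[:, :, None] * outs).sum(axis=1)            # (frames, dim)

rng = np.random.default_rng(0)
dim, hidden, n_experts = 8, 16, 4
pretrained = (rng.normal(size=(dim, hidden)), rng.normal(size=hidden),
              rng.normal(size=(hidden, dim)), rng.normal(size=dim))
# Assumed initialization: every expert starts as a copy of the pretrained block.
experts = [tuple(p.copy() for p in pretrained) for _ in range(n_experts)]
gate_w = rng.normal(size=(dim, n_experts))
x = rng.normal(size=(5, dim))
print(np.allclose(moe_ffn(x, experts, gate_w), ffn(x, *pretrained)))
```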

22.
arXiv (CS.AI) 2026-06-11

When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

arXiv:2605.26418v2 Announce Type: replace-cross Abstract: A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload we test - so when, if ever, does DRL actually help? We study this in RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, where an agent allocates compute to a dynamic workload under cost and service-level constraints. We evaluate PPO, DQN, A2C, SAC, TD3, and DDPG under matched architectures, training budgets, and reward functions against a calibrated rule-based baseline across six workload patterns and five seeds (240 runs), instantiate the benchmark on Kubernetes Horizontal Pod Autoscaling, and probe distribution-shift generalization. Three findings challenge common assumptions: (i) the calibrated controller achieves the lowest cost on all six workloads, though it trails the best RL agents on bursty and flash traffic; (ii) discrete-action algorithms outperform continuous-action ones by one to two orders of magnitude in constraint violations due to action-space mismatch; and (iii) no single algorithm dominates across workloads, with rankings shifting by up to four positions. The bottleneck in RL-based resource control is not algorithm selection but baseline calibration, reward engineering, and realistic evaluation protocols.
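
For readers unfamiliar with rule-based autoscaling, the sketch below shows a generic ratio rule of the kind used by Kubernetes Horizontal Pod Autoscaling, where the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. It is not the calibrated controller from the paper: the function name `desired_replicas`, the replica bounds, and the example numbers are assumptions.

```python
import math

def desired_replicas(current, metric, target, tolerance=0.1, min_r=1, max_r=10):
    """Ratio-based scaling rule (hypothetical helper).

    current:   current replica count
    metric:    observed metric value, e.g. average CPU utilization in percent
    target:    desired metric value in the same units
    tolerance: dead band around ratio 1.0 within which no scaling happens
    """
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current                      # close enough to target, keep as is
    proposed = math.ceil(current * ratio)   # round up so capacity is not undershot
    return max(min_r, min(max_r, proposed)) # clamp to the allowed replica range

print(desired_replicas(4, 90, 60))   # load above target
print(desired_replicas(4, 62, 60))   # inside the dead band
print(desired_replicas(4, 30, 60))   # load below target
```

Calibration in this setting amounts to choosing the target, the dead band, and the bounds for a given workload, which is the kind of baseline tuning the abstract argues is often neglected.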

23.
arXiv (CS.CV) 2026-06-16

Training-free sparse attention based on cumulative energy filtering

Sparse attention accelerates Diffusion Transformers (DiTs) for video generation by computing only the important tokens while skipping the rest. The token selection strategy is key to balancing sparsity and accuracy. We formulate the token filtering process as a dual-goal optimization problem: maximizing sparsity and minimizing accuracy degradation. Existing algorithms cannot fulfill both objectives simultaneously. For example, Top-p only considers the accuracy constraint, while Top-k maintains a fixed computational budget but loosens the accuracy constraint. This paper demonstrates that maintaining a fixed recall rate is sufficient for ensuring accuracy, whereas a fixed threshold is suboptimal for reducing computational cost. Therefore, we propose a dynamic thresholding scheme to improve sparsity while maintaining the same level of accuracy. Furthermore, our algorithm is deeply integrated with Flash Attention (FA), eliminating the need for any additional masking computation overhead. Experimental results on Wan 2.2 validate that, compared to the BLASST algorithm which is also integrated with FA, our dynamic thresholding strategy enhances sparsity from 61.42% to 82% with a VBench metric drop of less than 5%. This results in an approximate 15% in attention computation and a $1.61\times$ increase in computational efficiency, which is $1.18\times$ higher than that of BLASST.
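
The contrast between the two baseline selection rules named in this abstract can be made concrete with a small NumPy sketch. It covers only Top-k and Top-p on a single row of attention probabilities, not the paper's dynamic thresholding or its Flash Attention integration. The function names and the example rows are assumptions.

```python
import numpy as np

def top_k_keep(probs, k):
    """Keep the k largest entries: fixed compute budget, variable retained mass."""
    order = np.argsort(probs)[::-1]
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order[:k]] = True
    return keep

def top_p_keep(probs, p):
    """Keep the smallest set whose cumulative mass reaches p: fixed recall, variable budget."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    n_keep = min(int(np.searchsorted(cum, p)) + 1, probs.size)
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order[:n_keep]] = True
    return keep

def recall(probs, keep):
    """Attention mass retained by the kept tokens."""
    return float(probs[keep].sum())

peaked = np.array([0.70, 0.20, 0.05, 0.03, 0.02])   # a few tokens dominate
flat = np.full(5, 0.2)                               # mass spread evenly

for name, row in (("peaked", peaked), ("flat", flat)):
    k_mask, p_mask = top_k_keep(row, 2), top_p_keep(row, 0.85)
    print(name, "top-k kept", int(k_mask.sum()), "recall", recall(row, k_mask),
          "| top-p kept", int(p_mask.sum()), "recall", recall(row, p_mask))
```

On the peaked row both rules keep few tokens, while on the flat row Top-k loses most of the attention mass and Top-p keeps every token, which is the tension between sparsity and accuracy that the abstract describes.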

24.
arXiv (CS.LG) 2026-06-18

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

arXiv:2606.18464v1 Announce Type: cross Abstract: Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.
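
To give a sense of scale for the 25 cm/s amplitude quoted above, the non-relativistic Doppler relation can be evaluated directly. The 500 nm reference wavelength is an assumption chosen only as a representative visible-light line.

```latex
\frac{\Delta\lambda}{\lambda} \approx \frac{v}{c}
  = \frac{0.25\ \mathrm{m\,s^{-1}}}{2.998\times 10^{8}\ \mathrm{m\,s^{-1}}}
  \approx 8.3\times 10^{-10},
\qquad
\Delta\lambda \approx 8.3\times 10^{-10} \times 500\ \mathrm{nm}
  \approx 4.2\times 10^{-7}\ \mathrm{nm}.
```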

25.
arXiv (CS.LG) 2026-06-16

Scalable Graph Condensation with Evolving Capabilities

arXiv:2502.17614v3 Announce Type: replace Abstract: The rapid growth of graph data creates significant scalability challenges as most graph algorithms scale quadratically with size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This limitation leads to inefficiencies when condensing growing training sets. This work introduces a novel framework for continual graph condensation, enabling efficient updates to the distilled graph that handle data streams without requiring costly retraining. Specifically, we introduce GECC (Graph Evolving Clustering Condensation), a scalable graph condensation method designed to handle large-scale and evolving graph data. GECC employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, thereby attaining an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments, including a real-world scenario, show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering a roughly 1000$\times$ speedup on large datasets.
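
The two ideas named in this abstract, class-wise clustering on aggregated features and reuse of earlier results as starting centroids, can be sketched in NumPy as follows. This is not the GECC implementation: the one-hop mean aggregation, the plain k-means loop, the fixed number of centroids per class, and the names `aggregate`, `kmeans`, and `condense` are all assumptions made for illustration.

```python
import numpy as np

def aggregate(adj, feats):
    """One hop of mean aggregation over neighbors plus self (assumed aggregation)."""
    a = adj + np.eye(adj.shape[0])
    return (a / a.sum(axis=1, keepdims=True)) @ feats

def kmeans(points, k, init=None, n_iter=20, seed=0):
    """Plain k-means; `init` lets a previous result serve as the starting centroids."""
    rng = np.random.default_rng(seed)
    if init is None:
        centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    else:
        centroids = np.array(init, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            members = points[assign == c]
            if len(members):                 # an empty cluster keeps its old centroid
                centroids[c] = members.mean(axis=0)
    return centroids

def condense(adj, feats, labels, per_class, prev=None):
    """Cluster aggregated features class by class; returns {class: centroids}."""
    agg = aggregate(adj, feats)
    condensed = {}
    for cls in np.unique(labels):
        init = None if prev is None else prev.get(int(cls))
        condensed[int(cls)] = kmeans(agg[labels == cls], per_class, init=init)
    return condensed

def random_graph(n, rng):
    """Small synthetic undirected graph with two classes (illustrative data only)."""
    adj = (rng.random((n, n)) < 0.1).astype(float)
    adj = np.maximum(adj, adj.T)
    np.fill_diagonal(adj, 0.0)
    return adj, rng.normal(size=(n, 3)), np.arange(n) % 2

rng = np.random.default_rng(1)
first = condense(*random_graph(40, rng), per_class=3)
# After the graph grows, the earlier centroids are reused as the starting point.
second = condense(*random_graph(60, rng), per_class=3, prev=first)
print({cls: c.shape for cls, c in second.items()})
```

Warm-starting from earlier centroids avoids clustering from scratch each time the training set grows, which is the intuition behind the evolving capability described above.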