Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-24

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

arXiv:2605.01482v3 Announce Type: replace Abstract: Multi-Hop Fact Verification requires complex reasoning across disparate evidence, posing significant challenges for Large Language Models , which may suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought , often lack explicit modeling of the structural dependencies between evidence and claims. In this work, we introduce an SCM-inspired framework that grounds reasoning in explicit directed dependency graphs, treating verification as a constructive structural reasoning process rather than full causal inference with interventions or counterfactual semantics. We empirically identify an "inverted U-shaped" correlation between reasoning-chain length and accuracy, revealing that excessive structural complexity can degrade performance. To address this, we propose a rule-based reinforcement learning strategy using Group Relative Policy Optimization. This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework outperforms strong baselines while producing more traceable reasoning structures for complex fact verification.

02.
arXiv (math.PR) 2026-06-15

Lehner's operator norm formulas, semidefinite programming, and spiked matrix models

arXiv:2606.14687v1 Announce Type: new Abstract: Lehner (1999) derived elegant formulas for the operator norm $\|\mathfrak{X}\|$ of operators of the form $\mathfrak{X} = \mathbf{A}_0 \otimes \mathfrak{1} + \sum_{i = 1}^n \mathbf{A}_i \otimes \mathfrak{m}_i$, also easily generalized to the spectral edge $\lambda_{\max}(\mathfrak{X})$, in terms of nonlinear optimization problems over positive definite matrices. Here the $\mathbf{A}_i$ are finite-dimensional Hermitian matrices, the $\mathfrak{m}_i$ are either free semicircular or free Rademacher families of operators, and $\mathfrak{1}$ is the identity operator. We first show that both of Lehner's nonlinear optimizations can be rewritten as linear semidefinite programs (SDPs), even in the Rademacher case where Lehner's optimization is not itself convex. We give the primal and dual forms of these SDPs, derive the complementary slackness relations and consequences thereof, and propose that the SDPs are more stable and accurate than the iterative numerical scheme proposed in Lehner's original work. We then apply the SDPs from the semicircular case to spiked matrix models, studied recently via Lehner's formula by Bandeira, Cipolloni, Schröder, and van Handel (2024). We give a new proof of the Baik–Ben Arous–Péché (BBP) transition they establish in models with isotropic (but possibly correlated) Gaussian noise by constructing feasible variables for the associated primal and dual SDPs. Combining our construction with a sensitivity interpretation of optimal dual variables, we study the fluctuations of leading eigenvectors of such models. We conjecture and give numerical evidence that these fluctuations are Gaussian but anisotropic and non-universal, and that their covariance may be computed in terms of the optimizer of the dual of Lehner's formula, which in turn is approximately the leading eigenmatrix of a completely positive operator associated to the covariance of the noise model.

03.
arXiv (CS.LG) 2026-06-12

Majority-of-Three is Optimal

arXiv:2606.13614v1 Announce Type: cross Abstract: We give a short proof that the majority vote of three independent consistent classifiers is an optimal learner in the realizable PAC setting. This proves optimality for the simplest voting scheme, while simplifying both the algorithmic structure and the probabilistic analysis of previous voting learners, including the algorithm of S. Hanneke and the analysis of bagging by K. Green Larsen.

04.
arXiv (CS.AI) 2026-06-12

TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation

arXiv:2606.12657v1 Announce Type: new Abstract: Human mobility data is important for transportation, urban planning, and epidemic control, but large-scale trajectory collection is often costly and privacy-constrained, motivating realistic synthetic trajectory generation. Existing LLM-based generators typically rely on either prompt engineering, which preserves zero-shot reasoning but lacks fine-grained spatiotemporal grounding, or trajectory-level fine-tuning, which improves statistical precision but incurs substantial computational cost and may weaken general reasoning. We propose TrajGenAgent, a semantic-aware hierarchical LLM-agent framework for human mobility trajectory generation without model fine-tuning. TrajGenAgent uses a two-stage orchestrator-worker design: an LLM first synthesizes an individual- and weekday-conditioned activity chain from historical evidence via in-context learning, and a deterministic workflow then grounds each activity into a complete visit using personalized POI retrieval, distance-aware location selection, kinematics-aware travel-time propagation, and LLM-based duration estimation. To evaluate realism beyond aggregate spatiotemporal statistics, we introduce an anomaly-detection-based evaluation framework using two complementary detectors to assess behavioral and semantic plausibility. Experiments on benchmark and large-scale simulation datasets show that TrajGenAgent improves spatiotemporal fidelity, semantic coherence, and individual-specific behavioral realism over representative neural and LLM-based baselines, while avoiding parameter updates.

05.
arXiv (CS.CV) 2026-06-16

VEPHand: View-Efficient Photometric Hand Performance Capture at Scale

Robust, high-fidelity 3D hand capture, while fundamental to digital human creation, remains challenging with practical multi-view systems that balance rich photometry with the geometric ambiguities of reconstruction arising from limited viewpoint density. This paper presents an end-to-end pipeline for dynamic hand performance capture and registration, specifically designed for view-efficient setups ($\sim$20 views). We address key challenges with two primary innovations. First, to overcome reconstruction difficulties like limited view overlap and background clutter, our mask-free neural method robustly extracts detailed hand geometry and appearance from unmasked images using scene parameterization and scenario-specific density regularization. Second, addressing registration challenges such as accurately capturing non-linear skin deformations and ensuring plausible results during severe self-contact, we propose a physics-inspired framework. It aligns reconstructions to a personalized hand model by optimizing intrinsic volumetric offsets within its canonical tetrahedral mesh, alongside pose parameters. This approach, supported by robust losses and optimization, captures fine surface deformations, ensures plausible results under severe articulation and self-contact, and demonstrates strong tolerance to input noise. We demonstrate the scalability and robustness of our automated pipeline on an extensive dataset of over 12,000 sequences, from which we also derive a large-scale, high-quality synthetic 2D/3D hand dataset for training downstream tasks. This showcases its effectiveness for single hands, intricate two-hand interactions, and natural hand-object manipulations. Our method achieves state-of-the-art reconstruction fidelity in view-efficient, unmasked scenarios and highly accurate registration. Our project page are available at https://zyshen021.github.io/VEPHand/.

06.
arXiv (CS.AI) 2026-06-18

Self-CTRL: Self-Consistency Training with Reinforcement Learning

arXiv:2606.18327v1 Announce Type: cross Abstract: Language models (LMs) that faithfully describe their own behavior can more easily be audited, understood, and trusted by users. This paper describes Self-Consistency Training with Reinforcement Learning (Self-CTRL), a method that optimizes for consistency between a LM's self-explanations and behavior on related inputs by updating explanations to better predict behavior or updating behavior to better match explanations. We apply our method in two domains. First, we study a formal probabilistic reasoning task in which LMs must learn to imitate a family of biased samplers and evaluated on their ability to report the associated biases. We find that consistency training improves the correlation between self-reported and behaviorally-measured latent biases from $R^2=0.24$ to $R^2=0.64$ on a set of held-out distributions, matching the generalization of direct ground-truth supervision. Second, we study a constitutional AI domain in which LMs must describe when they will refuse or comply with user requests. Here, Self-CTRL produces rules that faithfully describe the model's behavior on held-out requests, improving the refusal predictions of a third-party auditor model from $36\%$ to $92\%$. In the other direction, behavior updates improve alignment, reducing HarmBench failure rate from $15.0\%$ to $0.5\%$ without substantially increasing refusal on harmless prompts. By aligning explanations and behavior, our work provides a general recipe for training AI models to be safer, more transparent, and more controllable.

07.
arXiv (CS.CV) 2026-06-17

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Group Relative Policy Optimization has emerged as essential for aligning video diffusion models with human preferences, but faces a critical computational bottleneck: training a 14B parametered model typically demands hundreds of GPU days per experiment. Existing efficiency methods reduce costs through sliding window subsampling training timesteps, but fundamentally compromise optimization, exhibiting severe instability and failing to reach full trajectory performance. We present Flash-GRPO, a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges: iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, decoupling policy performance from timestep difficulty; temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.

08.
medRxiv (Medicine) 2026-06-22

T Cell Receptor repertoire analysis reveals antigenic convergence and immunotherapeutic opportunities in Prostate Cancer

Background: The T-cell receptor {beta} (TCR{beta}) repertoire reflects antigen-driven adaptive immune responses and provides insight into tumor-immune interaction. In prostate cancer (PCa), the immunosuppressive tumor microenvironment limits effective T-cell activation, and the antigenic drivers shaping intratumoral TCR repertoires remains poorly defined. This study aimed to characterize matched tumor and peripheral TCR{beta} repertoires from treatment-naive PCa patients and to identify shared clonotypes and antigenic specificities associated with disease severity. Methods: Next-generation sequencing was used to profile TCR{beta} repertoires from matched tumor biopsies and peripheral blood mononuclear cells obtained from treatment-naive PCa patients. Repertoires clonality, diversity, and was assessed using established metrics. Antigenic convergence was evaluated using GLIPH2 to identify shared CDR3{beta} motifs and predicted tumor-associated antigen (TAA) recognition, followed by functional validation using IFN-{gamma} ELISpot and T-cell expansion assays. Results: Tumor-derived TCR{beta} repertoires displayed reduced richness and increased clonality compared with peripheral blood mononuclear cells, consistent with local antigen-driven expansion. High-grade tumors demonstrated greater interpatient clonotype sharing and motif-level convergence, indicative of recognition of common TAAs. GLIPH2 analysis associated expanded clonotypes with epitopes derived from prostate-specific G-protein coupled receptor (PSGR), prostate-specific membrane antigen (PSMA), and prostate-specific antigen (PSA). Functional validation confirmed that peptide pools containing PSGR- and PSMA-derived epitopes induced IFN-{gamma} production and antigen-specific T-cell proliferation in vitro. Conclusions: These findings reveal an oligoclonal, antigen-driven intratumoral TCR{beta} landscape and identify PSGR and PSMA as immunogenic, potentially actionable targets. Integration of TCR profiling with antigen discovery pipelines may support the development of TCR-based biomarkers and precision immunotherapeutic strategies in prostate cancer.

09.
arXiv (CS.CL) 2026-06-11

Factions Within, Uncertain Across: Within-Document Reader Sub-Groups in Social Highlighting

When many people highlight the same document, is the crowd a single consensus, or is it internally structured into reader sub-groups that mark different things – and is that structure a stable property of a reader or of the document? Building on prior work showing an individual's within-document highlighting signal is a whisper while individuality lives in selection, we ask the group-level question on a co-readership platform using a margin-preserving curveball null. Experiment 1: within a document, readers form strong sub-groups – pairs agree far beyond what shared salience, mark density, and sentence popularity predict (nearest-neighbour agreement z=+6.3, significant in 88% of documents). Under an eight-block region-preserving null, shared engagement with the same coarse regions of the document accounts for about 40% of this excess; the majority survives as finer reader-specific agreement (z=+3.6, 77% significant). So the within-document crowd is, in a descriptive sense, factional. Experiment 2: is that grouping a stable reader trait? Here we are honest about power. The cross-document split-half reproducibility of a pair's agreement is near zero pooled (+0.078 and 0.000 in two separately drawn samples), and a power calibration shows the test is informative only for pairs that co-read many documents. In the only informative high-overlap subset (k>=4), point estimates are positive but small-sample, imprecise across the separately drawn samples, never significant, and attenuate under the region-preserving null. We therefore leave cross-document stability unresolved: the data is consistent with anything from situational grouping to a weak-to-moderate stable reader trait. The crowd is factional within a document; whether its factions follow the reader across documents is, honestly, beyond our reach.

10.
arXiv (CS.AI) 2026-06-19

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

arXiv:2606.20031v1 Announce Type: cross Abstract: Dynamic environmental changes, confined workspaces, and stringent real-time constraints make pathfinding in Robotic Mobile Fulfillment Systems (RMFS) a challenging problem for conventional search- and rule-based methods, which typically suffer from high computational complexity and long decision latency. While reinforcement learning (RL) has emerged as a powerful alternative, deploying learned policies with extreme energy efficiency on resource-constrained hardware remains an open challenge. We present SDQN-RMFS, an end-to-end framework that achieves high-fidelity deployment of an RL-trained policy from a full-precision artificial neural network (ANN) through to a neuromorphic chip. By computing only when triggered by sparse events, this framework unlocks ultra-low-power RMFS pathfinding. Our full-stack pipeline operates as follows: an ANN policy is first efficiently trained via a collision-allowing strategy to densify informative trajectories, and then converted into a spiking neural network (SNN) via a hard-label knowledge distillation approach. This effectively addresses the output distribution mismatch, preserving policy capability across the ANN-to-SNN pipeline while substantially reducing inference latency. Hardware experiments demonstrate up to 11,281$\times$ energy savings and a nearly two-fold reduction in latency compared to a high-performance GPU baseline, while maintaining decision quality on par with the original trained policy. These results establish physical neuromorphic inference as a practical and energy-sustainable pathway for large-scale RMFS operations.

11.
arXiv (CS.CV) 2026-06-16

ST-DiffEye: Diffusion-based Continuous Gaze Generation via Joint Scanpath-Trajectory Modeling

We study the problem of human gaze modeling, which aims to generate the gaze patterns a viewer produces while observing a visual stimulus. Gaze is primarily captured through two modalities: continuous eye-tracking trajectories, which describe fine-grained motion dynamics, and discrete scanpaths, which describe high-level fixation structure. Because gaze varies substantially across viewers and trials, we treat this variability as a defining property rather than noise and model gaze as a stochastic generative process. Existing generative gaze models supervise on only one of these two representations in isolation. We hypothesize that trajectories and scanpaths describe gaze at complementary scales and are jointly informative during training, and test this hypothesis through ST-DiffEye, a joint trajectory-scanpath diffusion framework that couples both modalities by concatenating them as an additional raw input channel, requiring no architectural overhead beyond an input and output channel expansion. We further introduce a principled evaluation framework based on the Continuous Ranked Probability Score (CRPS), which generalizes any existing sequence similarity metric into a proper scoring rule that jointly assesses the accuracy and diversity of generated gaze. Experiments on task-driven visual search, covering both target-present and target-absent scenarios, and on free-viewing benchmarks demonstrate state-of-the-art performance. These results, along with detailed ablations, confirm the benefit of joint modeling and the value of distribution-aware evaluation in capturing the intrinsic variability of human gaze. Project webpage: https://st-diffeye.github.io/

12.
arXiv (CS.LG) 2026-06-19

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty

arXiv:2412.18980v2 Announce Type: replace Abstract: Uncertainty-aware deep learning (DL) models recently gained attention in fault diagnosis as a way to promote the reliable detection of faults when out-of-distribution (OOD) data arise from unseen faults (epistemic uncertainty) or the presence of noise (aleatoric uncertainty). In this paper, we present the first comprehensive comparative study of state-of-the-art uncertainty-aware DL architectures for fault diagnosis in rotating machinery, where different scenarios affected by epistemic uncertainty and different types of aleatoric uncertainty are investigated. The selected architectures include sampling by dropout, Bayesian neural networks, and deep ensembles. Moreover, to distinguish between in-distribution and OOD data in the different scenarios two uncertainty thresholds, one of which is introduced in this paper, are alternatively applied. Our empirical findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems. In particular, they reveal that, in the presence of epistemic uncertainty, all DL models are capable of effectively detecting, on average, a substantial portion of OOD data across all the scenarios. However, deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination. In the presence of aleatoric uncertainty, the noise level plays an important role. Specifically, low noise levels hinder the models' ability to effectively detect OOD data. Even in this case, however, deep ensemble models exhibit a milder degradation in performance, dominating the others. These achievements, combined with their shorter inference time, make deep ensemble architectures the preferred choice.

13.
arXiv (CS.AI) 2026-06-16

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

arXiv:2606.16613v1 Announce Type: new Abstract: As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent, requiring autonomous agents to communicate, negotiate, and transact while pursuing their own objectives over extended periods. We introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy composed of heterogeneous firms. In CoffeeBench, two farmers, two roasters, and two retailers autonomously operate their businesses over a 90-day simulation, each seeking to maximize cumulative net income through communication and transactions while managing cash, inventory, and pricing. The evaluated model controls one coffee roaster, while the remaining firms are controlled by fixed reference agents. Across several recent open-weight and proprietary LLMs, all models outperform a passive baseline that takes no actions, with most achieving positive net income. Analysis of agent behavior reveals substantial differences in long-horizon economic interaction: higher-performing models communicate more actively with other firms, whereas Claude~Haiku~4.5 exhibits an idle-drift failure mode, repeatedly choosing inaction despite producing coherent assessments and plans. We release our code and agent trajectories to support future research.

14.
bioRxiv (Bioinfo) 2026-06-10

Pseudoperplexity Probes Memorization in Protein Language Models

Protein Language Models (pLMs) have significantly advanced computational biology. Yet their scale and reliance on redundant training data raise a fundamental question: do pLMs generalize the statistical grammar of proteins, or do they simply memorize their training data? To investigate this, we used pseudoperplexity as a probe for sequence-level memorization, comparing ProtT5's pseudoperplexity on a pre-training proxy dataset against a post-training holdout of genuinely novel sequences. To ensure a valid comparison, we matched the datasets by sequence length, cluster size, and taxonomic family. As a statistical baseline, we trained n-gram language models; analysis of higher-order n-gram composition and a statistically significant divergence in perplexity confirmed that the post-training sequences were genuinely novel at the local sequence level. ProtT5 showed a statistically significant difference in pseudoperplexity between seen and unseen sequences, though further analysis revealed this memorization signal to be modest. These findings suggest that ProtT5 exhibits detectable but limited memorization of its training data as measured by a pseudoperplexity-based probe.

15.
arXiv (quant-ph) 2026-06-11

Quantum thermodynamics of the Caldeira-Leggett model with non-equilibrium Gaussian reservoirs

arXiv:2405.00215v5 Announce Type: replace Abstract: We introduce a non-equilibrium version of the Caldeira-Leggett model in which a quantum particle is strongly coupled to a set of engineered reservoirs. The reservoirs are composed by collections of squeezed and displaced thermal modes, in contrast to the standard case in which the modes are assumed to be at equilibrium. The model proves to be very versatile. Strongly displaced/squeezed reservoirs can be used to generate an effective time dependence in the system Hamiltonian and can be identified as sources of pure work. In the case of squeezing, the time dependence is stochastic and breaks the fluctuation-dissipation relation, this can be reconciled with the second law of thermodynamics by correctly accounting for the energy used to generate the initial non-equilibrium conditions. To go beyond the average description and compute the full heat statistics, we treat squeezing and displacement as generalized Hamiltonians on a modified Keldysh contour. As an application of this technique, we show the quantum-classical correspondence between the heat statistics in the non-equilibrium Caldeira-Leggett model and the statistics of a classical Langevin particle under the action of squeezed and displaced colored noises. Finally, we discuss thermodynamic symmetries of the heat generating function, proving a fluctuation theorem for the energy balance and showing that the conservation of energy at the trajectory level emerges in the classical limit.

16.
arXiv (quant-ph) 2026-06-25

Single-Period Floquet Control of Bosonic Codes with Quantum Lattice Gates

arXiv:2601.08782v2 Announce Type: replace Abstract: Bosonic codes constitute a promising route to fault-tolerant quantum computing. Existing Floquet protocols enable analytical construction of bosonic codes but typically rely on slow adiabatic ramps with thousands of driving periods. In this work, we circumvent this bottleneck by introducing an analytical and deterministic Floquet method that directly synthesizes arbitrary unitaries within a single period. The phase-space unitary ensembles generated by our approach reproduce the Haar-random statistics, enabling practical pseudorandom states in continuous-variable systems. We prepare various prototypical bosonic codes from vacuum and implement single-qubit logical gates with high fidelities using quantum lattice gates. By harnessing the full intrinsic nonlinearity of Josephson junctions, quantum lattice gates decompose quantum circuits into primitive operations for efficient continuous-variable quantum computing.

17.
arXiv (CS.LG) 2026-06-19

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

arXiv:2606.19876v1 Announce Type: new Abstract: The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetilde{\Omega}(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.

18.
arXiv (CS.CV) 2026-06-16

Shift-and-Sum Quantization for Visual Autoregressive Models

Post-training quantization (PTQ) enables efficient deployment of deep networks using a small set of data. Its application to visual autoregressive models (VAR), however, remains relatively unexplored. We identify two key challenges for applying PTQ to VAR: (i) large reconstruction errors in attention-value products, especially at coarse scales where high attention scores occur more frequently; and (ii) a discrepancy between the sampling frequencies of codebook entries and their predicted probabilities due to limited calibration data. To address these challenges, we propose a PTQ framework tailored for VAR. First, we introduce a shift-and-sum quantization method that reduces reconstruction errors by aggregating quantized results from symmetrically shifted duplicates of value tokens. Second, we present a resampling strategy for calibration data that aligns sampling frequencies of codebook entries with their predicted probabilities. Experiments on class-conditional image generation, inpainting, outpainting, and class-conditional editing show consistent improvements across VAR architectures, establishing a new state of the art in PTQ for VAR.

19.
bioRxiv (Bioinfo) 2026-06-18

A Two-Stage Interpretable Framework for Predicting Plant-Derived Small RNA Targets on Human 3'UTRs

Authors:

Can plant-derived small RNAs target human mRNA 3'UTRs via complementary base pairing and produce experimentally detectable regulatory effects? This question concerns not only the fundamental feasibility of cross-kingdom RNA regulation but also the technological pathway for screening plant-derived active small nucleic acids. Existing miRNA target prediction tools are predominantly designed for endogenous miRNA-mRNA systems, exhibiting notable limitations when applied to cross-species small RNA inputs and small-sample wet-lab experimental adaptation. In this study, we developed a two-layer prediction framework, MetaLulu-AI. The first layer builds upon publicly available human miRNA-mRNA 3'UTR interaction data, utilizing XGBoost to learn foundational binding rules on human 3'UTRs based on 41 interpretable computational features, including seed region pairing types, local context sequence composition, site positioning, and RNA secondary structures. The second layer is tailored to the experimental system of plant-derived small RNAs and human target genes. It introduces 40 experimental samples using significant changes in endogenous protein expression as the regulatory standard (determined by Western blot or ELISA 48 hours post-transfection of small RNAs via Lipo3000). Using 52-dimensional computational features and the optimal transcript scores from the first layer as inputs, this layer employs TabPFN for experimental label adaptation. The first-layer dataset consists of 38,752 training samples, 5,536 validation samples, and 11,073 testing samples (totaling 55,361), with a positive-to-negative sample ratio of approximately 1:5.4. On the randomly split test set, the model achieved an AUC of 0.9686, a recall of 0.8523, a precision of 0.8080, and an accuracy of 0.9452 (at a decision threshold of 0.4797). Group-based splitting revealed that the model maintains high discriminative power for unseen genes (AUC = 0.9541), though its generalization ability for completely unseen miRNAs decreases (AUC = 0.7390). For the 40 experimental samples in the second layer, the TabPFN model achieved an average AUC of 0.7406 {+/-} 0.092 across ten repeated 70/30 random splits, outperforming the baseline of directly using the first-layer scores (0.3563 {+/-} 0.149); the average AUC in a 5-fold cross-validation was 0.770 {+/-} 0.177. SHAP analysis demonstrated a clear divergence in the discriminative basis of the two models: the first layer relies more heavily on the thermodynamics of the small RNA itself and the quality of canonical seed sites, whereas the second layer focuses more on the local UTR environment and statistical site features. Although the current second-layer results are constrained by sample size and gene coverage, this framework serves as a preliminary observation of the adaptation mechanism for cross-kingdom regulation experiments, and motivating future large-scale validation. Under stricter leave-one-gene-out and leave-one-small-RNA-out evaluation, the adapter exceeded the first-layer score baseline but only matched the majority-class baseline, underscoring that entity-level generalization is not yet established.

20.
arXiv (CS.AI) 2026-06-17

Conservation Laws for Modern Neural Architectures

arXiv:2606.17816v1 Announce Type: cross Abstract: Understanding gradient descent dynamics is key to explaining the success of over-parameterized models, where implicit bias manifests through conservation laws in gradient flow. While such laws are well understood for linear and ReLU networks, they remain largely unexplored for modern architectures. This work develops a unified framework to characterize conservation laws for contemporary models, including feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts architectures under diverse gating designs. Our theoretical findings are supported by experiments that validate the predicted invariants.

21.
arXiv (CS.AI) 2026-06-12

BASENet: Band-Adapted Speech Enhancement Network with Cross-Band Attention

arXiv:2606.12662v1 Announce Type: cross Abstract: Speech enhancement models typically apply uniform capacity across all frequencies, disregarding the non-uniform spectral resolution of human hearing. We propose BASENet, a frequency-adapted architecture that partitions the spectrum into Bark-scale bands and assigns each a scaled-capacity encoder derived from critical-band density, automatically granting deeper branches to perceptually dense low frequencies and lighter ones to high frequencies. A cross-band attention module captures harmonic dependencies across bands through compact frequency-pooled representations at linear complexity. Built on inverted residual blocks with dense connectivity and a convolutional recurrent network, BASENet achieves 3.55 PESQ and STOI~96% on VoiceBank+DEMAND with only 0.83M parameters and 7.3 G~MACs, the fewest parameters among all methods with PESQ > 3.50. A causal variant (3.44 PESQ) surpasses several non-causal baselines, confirming suitability for real-time streaming on resource-constrained devices.

23.
arXiv (CS.CL) 2026-06-11

Mapping Scientific Literature with Large Language Models and Topic Modeling

Scientific literature is increasingly fragmented by disciplinary boundaries, specialized terminology, and potentially sparse keyword systems, making it difficult to capture the evolving structure of modern science. This study introduces a large language model (LLM)-driven framework for mapping scientific literature from a topic modeling perspective. The approach is demonstrated on a 20-year corpus of more than 1,500 engineering-related articles published in the Proceedings of the National Academy of Sciences (PNAS). A two-stage classification pipeline first assigns a primary thematic category to each article based on its abstract, followed by full-text analysis to identify secondary classifications that reveal latent cross-topic connections within the corpus. Unlike conventional topic models, the LLM-based framework produces semantically interpretable topics while maintaining strong quantitative performance. Comparative evaluation against established topic modeling methods shows higher topic diversity and lower overlap with competitive coherence metrics. Manual validation on a randomly sampled subset of abstracts yields an accuracy of 75.9%. Additional traditional natural language processing analyses confirm that the generated topics correspond to meaningful linguistic patterns in the corpus. A bipartite network linking primary and secondary classifications further reveals implicit thematic relationships that are not readily observable through abstracts or keyword systems alone. The findings indicate that the framework independently recovers much of the journal's editorial dual-classification structure without prior knowledge of its schema. Overall, the proposed approach offers a powerful tool for mapping science and identifying emerging cross-topic connections in research.

24.
arXiv (CS.CL) 2026-06-16

IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset

IMPACTeen is a dataset of textual social influence scenarios spanning interpersonal, media-based, and digital settings in an adolescent context. It contains 1,021 texts, 5,100 individual annotation records, and gold labels for social influence techniques, with each text annotated from five distinct perspectives: teenagers, parents, psychologists, communication experts, and teachers. The resource was constructed through constrained LLM generation, followed by a two-step human editing and validation phase aimed at ensuring youth-context realism. A multi-dimensional annotation covered influence presence, techniques, intentions, consequences, resistance, reactions, and annotation confidence. The dataset supports research on social influence detection, annotator disagreement, cross-lingual modeling, and the training and evaluation of language models. The dataset was created in Polish and is accompanied by a corresponding English version.

25.
arXiv (CS.CV) 2026-06-12

MAMVI: 3D Test-Time Adaptation via Masked Multi-View Point Clouds

3D point cloud models suffer significant performance degradation under distribution shifts caused by sensor noise, occlusions, and environmental changes. Test-time adaptation (TTA) has emerged as a practical paradigm for mitigating this issue during inference. Recently, leveraging multi-view augmentation has shown promise in improving 3D TTA performance. However, existing multi-view approaches are often constrained by sequential optimization that treats each view independently. This sequential optimization leads to substantial inference latency due to repetitive optimization steps, making real-time adaptation impractical. To address this, we propose Masked Multi-View Test-Time Adaptation (MAMVI), which replaces sequential optimization with a unified single-step adaptation. Specifically, MAMVI utilizes a hybrid masking strategy that combines fixed ratios for stability with Beta-distributed sampling for diversity. By aggregating losses across multiple views, MAMVI performs adaptation through a single backward pass based on multi-view consensus. Additionally, a confidence-based adaptive learning rate is used to dynamically adjust the adaptation intensity for each sample. Extensive experiments on ModelNet-40C, ShapeNet-C, and ScanObjectNN-C demonstrate that MAMVI achieves state-of-the-art accuracy on ShapeNet-C and ScanObjectNN-C. Moreover, it remains competitive on ModelNet-40C while delivering 4.9-8.9 times faster inference, making it highly suitable for real-time applications. Our code is available at https://github.com/Inseok-kong/MAMVI