Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
PLOS Medicine 2026-06-01

The NIH 2025 Public Access Policy: Immediate access, unequal costs

by Caitlin R. Ryus, Caroline Raymond King, Edward R. Melnick The NIH 2025 Public Access Policy eliminates embargo periods for federally funded research, expanding who can read science. Yet without addressing article processing charges and market concentration, the policy risks creating new barriers to who can afford to perform and publish their science. In this Perspective, Caitlin Ryus and colleagues discuss the NIH 2025 Public Access Policy, highlighting that while expanding who can read science, the policy risks creating new barriers to who can afford to perform and publish their science.

03.
arXiv (CS.LG) 2026-06-16

Schattor: Schatten-family methods for deep learning optimization

arXiv:2606.15702v1 Announce Type: cross Abstract: Modern deep learning optimization features heterogeneous parameter structures, noisy gradients, and highly nonconvex landscapes, posing significant challenges for both algorithm design and theoretical analysis. Motivated by the limitations of SGD and the success of adaptive optimizers, we propose {\it Schattor}, a family of adaptive first-order methods based on Schatten norms. Schattor unifies SGD and the recently proposed matrix-variate adaptive optimizer Muon within a single Schatten-norm-based framework. We establish dimension-free stationarity guarantees for methods in the Schattor family for stochastic matrix optimization problems via a novel matrix martingale moment bound. We also develop multi-block extensions that adaptively balance block-wise optimization progress and prove dimension-free stationarity guarantees in this more general setting.

04.
arXiv (CS.CL) 2026-06-11

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

Spoken dialogue models typically start from text LLM backbones, yet reasoning often degrades when conditioning on speech instead of text. We attribute part of this modality gap to a temporal-granularity mismatch: speech tokens are temporally redundant and far longer than text under matched semantics, diluting per-token semantic density and weakening text-native reasoning dynamics. We study speech token design as a representation selection problem and sweep frame rates under a frozen LLM backbone with a fixed information rate. To make low frame rates feasible, we introduce factorized FSQ and a lightweight non-autoregressive audio LM head, scaling capacity to nearly 300\,bits/frame without sacrificing efficient prediction. With the bottleneck removed, we sweep frame rates (50$\rightarrow$2.08\,Hz) and alignment depth, and observe a consistent best regime for speech QA at 4.17\,Hz with intermediate-layer representation alignment.

05.
arXiv (quant-ph) 2026-06-11

PHASE: Pauli Hierarchical Assembly on Subdivided Elements for Quantum-Compatible Operator Synthesis

arXiv:2606.11478v1 Announce Type: new Abstract: Efficiently decomposing finite element stiffness matrices into the Pauli basis is challenging due to the exponential growth of Pauli strings with problem size. A naive Pauli expansion requires $\Theta(8^{\lceil \log_2 N \rceil})$ operations, where $N$ denotes the number of degrees of freedom, rendering direct decomposition infeasible for large systems. Existing approaches exploit algebraic sparsity or operator structure but do not incorporate the geometric organization intrinsic to finite element discretizations, and consequently exhibit poor scaling for stiffness matrices. To address this problem, we introduce PHASE, a hierarchical, geometry-aware Pauli decomposition algorithm that leverages recursive mesh partitioning to organize element contributions across multiple spatial scales. PHASE employs a hybrid strategy that combines full- and reduced-space Tensorized Pauli Decomposition with Fast Walsh-Hadamard Transform-based aggregation to assemble global Pauli coefficients efficiently. We show that this approach yields a dimension-dependent reduction in the exponential scaling exponent of Pauli assembly asymptotic complexity relative to existing methods, reducing the cost from $2^{2{\lceil \log_2 N \rceil}}$ to $2^{\gamma_d{\lceil \log_2 N \rceil}}$ with $\gamma_d < 2$ under standard mesh regularity and balanced partition assumptions. These results substantially improve the feasibility of quantum-compatible operator synthesis for large-scale finite element models.

06.
arXiv (CS.AI) 2026-06-25

WinDOM: Self-Family Distillation for Small-Model GUI Grounding

arXiv:2606.25964v1 Announce Type: new Abstract: Small ($\sim$2B) GUI-grounding agents are attractive for on-device deployment, accessibility tooling, and low-cost iteration, but at this scale they face two open recipe questions: how to obtain bounding-box training data without expensive human annotation, and how to combine supervised fine-tuning with reinforcement learning. We address both, with the explicit goal of pushing small-model performance rather than scaling up. WinDOM is a $54{,}425$-record grounding corpus harvested by driving an open-source Windows 11 web reimplementation under headless Playwright, with bounding boxes read directly off the DOM and no OCR or human annotation. Self-Family Distillation (SFD) is a single rejection-sampling cold-start parameterised only by the teacher choice: either an EMA of the student (no external model) or a frozen larger same-family teacher. We then treat the saturation depth of the SFD cold-start as an explicit GRPO hyperparameter. On a Qwen3.5-2B student, the under-saturated cold-start is a better GRPO initialiser than the converged one: SFD-4B with Early-init RL gains $+5.4$ OOD-mean ($+3.5$ ScreenSpot-Pro, $+7.0$ OSWorld-G, $+5.8$ ScreenSpot-V2) over the base. The same-size EMA mode lands within roughly one OOD-mean point of the cross-size $4$B variant ($65.2$ vs $66.3$) without an external teacher.

07.
arXiv (CS.CV) 2026-06-11

Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction

Reconstructing visual stimuli from fMRI signals is a central challenge bridging machine learning and neuroscience. Recent diffusion-based methods typically map fMRI activity to a single neural embedding, using it as static guidance throughout the entire generation process. However, this fixed guidance collapses hierarchical neural information and is misaligned with the stage-dependent demands of image reconstruction. In response, we propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier introduces three components: a Hierarchical fMRI Encoder to extract multi-level neural embeddings, a Hierarchy-to-Hierarchy Alignment scheme to enforce layer-wise correspondence with CLIP features, and a Scale-Aware Coarse-to-Fine Neural Guidance strategy to inject these embeddings into autoregression at matching scales. These designs make MindHier an efficient and cognitively aligned alternative to diffusion-based methods by enabling a hierarchical reconstruction process that synthesizes global semantics before refining local details, akin to human visual perception. Extensive experiments on the NSD dataset show that MindHier achieves superior semantic fidelity, 4.67$\times$ faster inference, and more deterministic results than the diffusion-based baselines.

08.
arXiv (CS.AI) 2026-06-19

Controlled Comparison of Machine Learning Models for Fault Classification and Localization in Power System Protection

arXiv:2510.00831v2 Announce Type: replace Abstract: The increasing complexity of modern power systems, driven by the integration of inverter-based and distributed energy resources, challenges the reliability of conventional protection schemes and motivates the use of machine learning for protection tasks. However, published results are often difficult to compare because datasets, sensing assumptions, and decision horizons vary across studies. This paper presents a controlled comparison of machine learning models for fault classification (FC) and fault localization (FL) under identical sensing, timing, and validation conditions on a common electromagnetic transient dataset, using decision windows of 10-50 ms to reflect protection-relevant time scales. For FC, the best-performing nonlinear models achieve F1 scores above 0.98 already at 10 ms, while lower-capacity models degrade at shorter horizons but improve with longer windows, indicating that relevant fault-type information is already present in the earliest transient. For FL, the top-performing models reach a stable localization error of about 10 % of normalized line length across all evaluated horizons, while weaker models form a clearly separated second performance tier. Line-resolved analysis shows that localization accuracy varies across grid segments, indicating topology-dependent difficulty rather than insufficient temporal context alone. These findings provide a controlled reference for comparing machine learning models across two protection tasks with fundamentally different information requirements.

09.
bioRxiv (Bioinfo) 2026-06-15

Multi-platform reassessment of human mitochondrial DNA methylation reveals signals consistent with technical artifacts

The existence and functional relevance of mitochondrial DNA methylation remain controversial. Here, we systematically profiled cytosine methylation and hydroxymethylation across human brain and blood tissues spanning healthy and malignant states using orthogonal sequencing approaches that avoid chemical conversion during library preparation. While nuclear DNA exhibited canonical methylation patterns, mitochondrial DNA consistently showed negligible signal, indistinguishable from background technical noise. By mapping cytosine-guanine sites between mitochondrial DNA and nuclear-embedded mitochondrial sequences, we demonstrate the potential of these nuclear counterparts to confound not only cytosine methylation but also hydroxymethylation measurements, corroborating and extending prior findings implicating nuclear contamination as a potential source of apparent mitochondrial epigenetic signals. Additional technical factors that inflate apparent mtDNA methylation signals were identified, including sequence context biases, flow cell chemistries, and coverage-dependent discrepancies between the heavy and light strands. Collectively, these results provide convergent evidence against the presence of biologically meaningful cytosine methylation or hydroxymethylation in mitochondrial DNA. These findings caution against interpreting apparent mtDNA methylation signals in human adult tissues as meaningful without rigorous orthogonal validation and comprehensive consideration of technical and analytical confounding factors.

10.
arXiv (CS.AI) 2026-06-16

Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap

Authors:

arXiv:2606.15091v1 Announce Type: cross Abstract: Millions of individuals worldwide suffer from sensory and communication deficits caused by neurodegenerative diseases, stroke, or trauma. Brain-computer interfaces (BCIs) offer a promising avenue for sensory and motor restoration. However, the scientific literature remains highly fragmented between invasive neuroprosthetics and non-invasive electrophysiological decoders, with a lack of consistent terminology and comparison metrics. This chapter proposes a unified 2 x 2 framework categorizing BCIs along two axes: degree of invasiveness (invasive vs. non-invasive) and signal direction (afferent sensory-IN vs. efferent sensory-OUT). We define and distinguish the paradigms of restoration, substitution, and augmentation. Furthermore, we outline a structural roadmap for the convergence of these modalities over near-, medium-, and long-term horizons, focusing on physical limits and the integrative role of machine learning foundation models.

11.
arXiv (CS.AI) 2026-06-12

PlaceRep: Geospatial Place Representation Learning from Large-Scale Point-of-Interest Data

arXiv:2507.02921v4 Announce Type: replace-cross Abstract: Learning effective representations of urban environments requires capturing spatial structure beyond fixed administrative boundaries. Existing geospatial representation learning approaches typically aggregate Points of Interest (POIs) into pre-defined administrative regions such as census units or ZIP code areas, assigning a single embedding to each region. However, POIs often form semantically meaningful groups that extend across, within, or beyond these boundaries, defining places that better reflect human activity and urban function. To address this limitation, we propose PlaceRep, a geospatial representation learning method that constructs place-level representations by clustering spatially and semantically related POIs. PlaceRep summarizes large-scale POI graphs from U.S. Foursquare data to produce general-purpose urban region embeddings while automatically identifying places across multiple spatial scales. By eliminating model pre-training, PlaceRep provides a scalable and efficient solution for multi-granular geospatial analysis. Experiments using the tasks of population density estimation and housing price prediction as downstream tasks show that PlaceRep outperforms most state-of-the-art graph-based geospatial representation learning methods and achieves up to a x100 speedup in generating region-level representations on large-scale POI graphs. The implementation of PlaceRep is available at https://github.com/mohammadhashemii/PlaceRep.

12.
arXiv (CS.CV) 2026-06-15

QualiaNet: An Experience-Before-Inference Network

Authors:

Human 3D vision involves two distinct stages: an Experience Module, where stereo depth is extracted relative to fixation, and an Inference Module, where this experience is interpreted to estimate 3D scene properties. Paradoxically, although stereo vision does not provide us with absolute distance information, it nonetheless affects our inferences about distance. We propose the Inference Module exploits a natural scene statistic: near scenes produce vivid disparity gradients, while far scenes appear comparatively flat. QualiaNet implements this two-stage architecture computationally: disparity maps simulating human stereo experience are passed to a CNN trained to estimate distance. The network can recover distance from disparity gradients alone, validating this approach.

13.
arXiv (CS.AI) 2026-06-25

Exploring Information Seeking Agent Consolidation

arXiv:2602.00585v2 Announce Type: replace Abstract: Information-seeking agents have emerged as a powerful paradigm for knowledge-intensive tasks, yet today's systems remain specialized for the open web, documents, or local knowledge bases, hindering scalable and cross-domain deployment. We present the first systematic empirical study of consolidating these information-seeking agents into a single foundation agentic model. We compare two paradigms – data-level mixing, which trains a unified model on a mixture of datasets, and parameter-level merging, which merges independently trained experts in parameter space – across 3 training scenarios, evaluating 26 representative parameter-level methods on 10 benchmarks. To compare across heterogeneous benchmarks, we introduce a geometric Composite Score and an Imbalance Score that describe overall performance and task skew. Our analysis shows that (i) well-designed parameter-level merging attains parity with data mixing at a fraction of its training cost and is order-agnostic; (ii) parameter-level merging structurally preserves out-of-domain capabilities that data mixing universally forgets; and (iii) cross-scenario stability is strongly tied to consolidation quality. We distil our observations into a method-selection guide and design principles for next-generation merging operators.

14.
arXiv (CS.CV) 2026-06-12

Point-Wise Geometry-Aware Transformer for Partial-to-Full Point Cloud Registration in Computer-Assisted Surgery

Partial-to-full registration remains challenging due to varying overlap ratios, fluctuating point densities, and the presence of noise. While transformers have shown strong potential for point cloud processing, prior methods typically confine them to global context aggregation, overlooking fine-grained local geometry crucial for accurate correspondence. We propose GAPR-Net, a learning-based point cloud registration framework with a coarse-to-fine architecture that combines convolution and transformer modules, in which local and global information is fused between the partial and full point clouds using a cross-attention mechanism. To achieve this, a transformation-invariant point-wise geometric feature representation is proposed, which can robustly capture relative geometric features for individual points with respect to their neighboring points. To evaluate the effectiveness of the proposed approach, experiments are conducted on four geometrically distinct bones, including the tibia, femur, pelvis, and thoracic cartilage. The overall registration recall reaches 94.2\%, the method results in a low RMSE of 1.992 mm and $R^2$ values of 0.908 and 0.974 for rotation and translation, respectively. The results demonstrate that the proposed method effectively addresses the partial-to-full point cloud registration problem. The proposed method enables highly accurate 3D point cloud registration using partial observation, providing a critical foundation for precise surgical navigation and robotic interventions in computer-assisted surgery. The code will be accessed after the double-blind review process.

15.
arXiv (CS.CV) 2026-06-17

Contrastive Action-Image Pre-training for Visuomotor Control

Existing vision encoders for robotics face a fundamental bottleneck: robotic datasets lack the scale necessary for large-scale pre-training. Prior work circumvents this data scarcity by turning to internet-scale image and language data or egocentric human video. While these models show promise, neither paradigm learns from paired vision and action data, which downstream visuomotor control policies require. However, robot trajectories, the most direct source of this paired signal, are not available at pre-training scale, motivating us to extract action signals from abundant human video instead. To this end, we introduce CAIP (Contrastive Action-Image Pre-training), a vision encoder that treats human hand poses from large-scale egocentric video as a proxy for end-effector actions. By extracting 3D hand keypoints, a representation that aligns naturally with downstream robot action spaces, CAIP learns a unified action-image representation through a contrastive objective. Leveraging 32,041 hours of egocentric human video and only 88 hours of robotic manipulation data, CAIP outperforms state-of-the-art vision encoders including DINOv2, SigLIP, MVP, and R3M. Evaluated on a challenging real-world dexterous manipulation setup using Dexmate Vega and Sharpa Wave hands, CAIP yields performance gains of more than 30% on tasks involving folding, pouring, and fine-grained manipulation. Our results show that our method of contrastive action-centric pre-training yields a scalable path to achieving robust visual representations better suited for physical interaction.

16.
bioRxiv (Bioinfo) 2026-06-10

Bias-mitigated microbiome inference refines coronary artery disease signature

Authors:

Roughly half the cells in the human body are microbial, and changes in these communities are increasingly implicated in cardiovascular, metabolic, and oncological diseases. Yet identifying which taxa truly differ in abundance, differential abundance (DA), is distorted by four major sources of bias: loss of total microbial load, taxa measurement efficiencies, arbitrary pseudocounts required to handle pervasive zeros, and contamination which has recently driven retractions. No existing DA method accounts for all four. Here we introduce BootDA, a non-parametric bootstrap-based method that explicitly models each bias source without data transformations, pseudocounts, parametric assumptions, or assuming that most taxa are non-DA. In semi-parametric simulations preserving the sparsity (>70% zeros) and correlation structure of real 16S amplicon data, BootDA achieved the highest sensitivity among tested methods, including ANCOM-BC2, LinDA, MaAsLin 3, and Wilcoxon tests, while controlling the false discovery rate. Performance was retained in low biomass settings when contamination contributed ~50% of counts, and without negative controls, indicating de novo decontamination capability. Applied to a coronary artery disease cohort, BootDA refined the original signature to two co-enriched genera, Klebsiella and Gemmiger, and excluded likely contaminants. BootDA is available as an R package and could generalise to other sparse, high dimensional biological data.

17.
arXiv (quant-ph) 2026-06-15

Trap-Quenched Matter-Wave Optics for Dual Species Lensing

arXiv:2606.14577v1 Announce Type: cross Abstract: Dual-species atom interferometry in space promises precise tests of the Universality of Free Fall (UFF), with a sensitivity that grows quadratically with the extended interrogation time accessible in weightlessness. These tests demand exquisite control over the expansion energies of both condensed sources as well as over their differential center-of-mass dynamics. We propose a trap-quenched collimation technique featuring in-trap excitations of collective modes compatible with state-of-the-art atom-chip setups. Using NASA's Cold Atom Laboratory aboard the International Space Station, we demonstrate it on a single-species $^{87}$Rb condensate. By controlling the center-of-mass release dynamics we observe free expansion times up to 700 ms and measure a two-dimensional expansion energy of $k_B \cdot 78\pm 9 \;\mathrm{pK}$ in the imaging plane. A detailed model of the magnetically-induced dynamics indicates that this corresponds to a two-dimensional expansion energy of about $k_B \cdot 15^{+12}_{-5}\; \mathrm{pK}$ along two of the condensate's eigenaxes. Finally, we theoretically study this trap-quenched collimation scheme for a $^{41}$K-$^{87}$Rb mixture, predicting a simultaneous collimation that meets the expansion energy requirements for a state-of-the-art UFF test at the $10^{-15}$ accuracy level.

18.
arXiv (CS.LG) 2026-06-19

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

arXiv:2605.30456v2 Announce Type: replace Abstract: Many learning tasks in science and engineering are characterized by sparse datasets, which limits the effectiveness of purely data-driven approaches. At the same time, these problems are often accompanied by rich domain knowledge derived from physical laws, operational requirements, and expert heuristics. Such knowledge is frequently expressed as rules involving logical propositions and linear inequalities. Existing neuro-symbolic methods typically enforce these rules approximately through soft penalties, assume input-independent rules when designing specialized architectures, or rely on non-differentiable post-processing at inference time to achieve hard constraint satisfaction. While recent advances in differentiable optimization layers enable end-to-end feasibility enforcement within neural networks, extending these approaches to logical or mixed-integer rules remains challenging due to inherent nonconvexity. In this work, we propose a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks. Our approach represents rules as disjunctive constraints and applies hierarchical convex relaxations to obtain convex hull formulations. These relaxations yield tractable linear constraints that can be embedded as differentiable optimization layers while enabling exact rule satisfaction. We demonstrate the effectiveness of the proposed framework on real-world datasets, achieving perfect rule satisfaction and strong predictive performance.

19.
arXiv (CS.AI) 2026-06-16

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase – inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build – the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

20.
arXiv (CS.CV) 2026-06-16

PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning

6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly constructing feature correspondences between the query image and either the object model or template images. In this work, we propose PoseGAM, a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images, eliminating the need for explicit matching. Built upon recent multi-view-based foundation model architectures, the method integrates object geometry information through two complementary mechanisms: explicit point-based geometry and learned features from geometry representation networks. In addition, we construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions to enhance robustness and generalization. Extensive evaluations across multiple benchmarks demonstrate our state-of-the-art performance, yielding an average AR improvement of 5.1% over prior methods and achieving up to 17.6% gains on individual datasets, indicating strong generalization to unseen objects. Project page: https://windvchen.github.io/PoseGAM/ .

21.
PLOS Computational Biology 2026-06-23

A novel biclustering algorithm for mining m<sup>6</sup>A co-methylation patterns based on beta-binomial distribution and data screening strategy

Authors:

by Zhaoyang Liu, Yuteng Xiao, Dao Xiang, Hao Shi, Kaijian Xia Studies have shown that m6A plays a key role in different life processes such as RNA metabolism, physiology and pathology. However, due to the complexity of life processes, its specific regulatory details are still not revealed. The computational approach based on co-methylation pattern mining of m6A sequencing data can assist in revealing its mechanism and save time and economic cost, however, the current algorithms suffer from the problems of insufficient robustness to low signal-to-noise data and unreliable performance. Based on this, this paper proposes an enhanced beta-binomial distribution biclustering algorithm (EBBM) based on data screening strategy. This algorithm is based on the framework of Bayesian, adopts Gibbs sampling method for parameter inference, and introduces the data screening strategy in the process of parameter inference, which effectively removes the problem that the low signal-to-noise data in the original sequencing data of m6A affects the reliability of the clustering results. The simulation experiment results show that this algorithm can effectively deal with the interference of low signal-to-noise data and accurately mine the co-methylation patterns pre-planted in the data, which is significantly better than the current mainstream biclustering algorithm. In real human m6A sequencing data with 32 samples, this algorithm mined two effective co-methylation patterns, which were enriched to different biological processes, such as negative regulation of phosphorylation and peptidyl lysine methylation, etc. The scoring results of GEO_Score indicate that the results of this algorithm are more biologically meaningful than the clustering results of current mainstream m6A co-methylation pattern mining algorithms.

22.
arXiv (CS.LG) 2026-06-25

Memory-Efficient Policy Libraries with Low-Rank Adaptation in Reinforcement Learning

arXiv:2606.25700v1 Announce Type: new Abstract: When fine-tuning Large Language Models (LLMs), there has been success in minimizing both memory usage and computation with Parameter-Efficient Fine-Tuning (PEFT), like Low Rank Adaptation (LoRA). In this article, we have explored whether this approach is transferable to the world of robotics and Reinforcement Learning (RL), allowing learning with reduced memory usage and improved computational performance. Specifically, we focused on a version of multi-task robotics, where a library of specialist policies are created. In such a library memory efficiency is especially important. We used a Proximal Policy Optimization (PPO) algorithm and fine-tuned a baseline model to different tasks using LoRA. Our results demonstrate that, depending on the hyperparameters, LoRA can minimize memory usage by a factor of 20-160 compared to full fine-tuning of all layers. This implies a 90-95% storage saving when deploying a library of many (10-50) specialized policies, which can be the differentiating factor between being able to store the entire library in memory or having to use swap-memory in an applied robotics setting. At the same time, our results indicate that there is no significant difference in the success-rate between full fine-tuning and LoRA fine-tuning for the selected tasks.

23.
arXiv (CS.AI) 2026-06-25

Rational Neural Networks have Expressivity Advantages

arXiv:2602.12390v2 Announce Type: replace-cross Abstract: We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, allowing rationals to match or outperform fixed activations under identical architectures and optimizers.

24.
arXiv (CS.CV) 2026-06-17

Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

Vision backbone networks play a central role in modern computer vision. Enhancing their efficiency directly benefits a wide range of downstream applications. To measure efficiency, many publications rely on MACs (Multiply Accumulate operations) as a predictor of execution time. In this paper, we experimentally demonstrate the shortcomings of such a metric, especially in the context of edge devices. By contrasting the MAC count and execution time of common architectural design elements, we identify key factors for efficient execution and provide insights to optimize backbone design. Based on these insights, we present LowFormer, a novel vision backbone family. LowFormer features a streamlined macro and micro design that includes Lowtention, a lightweight alternative to Multi-Head Self-Attention. Lowtention not only proves more efficient, but also enables superior results on ImageNet. Additionally, we present an edge GPU version of LowFormer, that can further improve upon its baseline's speed on edge GPU and desktop GPU. We demonstrate LowFormer's wide applicability by evaluating it on smaller image classification datasets, as well as adapting it to several downstream tasks, such as object detection, semantic segmentation, image retrieval, and visual object tracking. LowFormer models consistently achieve remarkable speed-ups across various hardware platforms compared to recent state-of-the-art backbones. Code and models are available at https://github.com/altair199797/LowFormer/blob/main/Beyond_MACs.md.

25.
arXiv (CS.AI) 2026-06-16

Steering Emotional Dynamics for Art Therapy: Controllable Narrative Script Generation through Hierarchically Guided LLM Agents

arXiv:2606.16481v1 Announce Type: new Abstract: Art therapy plays a vital role in emotional healing, in which narrative creation acts as the primary vehicle for emotional expression. Given the inherently dynamic nature of emotions during healing, narratives with finely controlled emotional fluctuations enable individuals to safely project inner conflicts and achieve emotional catharsis. Recently, with the rapid development of Large Language Models (LLMs), automated narrative generation technology has provided a new pathway to support such artistic designs. However, while existing methods can produce fluent texts, they struggle to generate narratives that adhere to specified affective trajectories, failing to meet the demands of emotion-oriented psychological healing. To address these issues, this paper proposes EC-Script, an LLM agent-based framework that enables hierarchical control of the affective trajectory in narrative generation for emotional healing. To ensure that the generated narratives strictly follow the given emotional patterns, EC-Script establishes overall narrative direction through Emotion-Trajectory Planning, propels scene-level plot development with Character-Driven Scene Generation, and regulates local emotional changes of characters via Emotion-Controlled Script Writing. Ultimately, it outputs scene-by-scene script content that remains highly consistent with the preset affective trajectory. Experimental results demonstrate that EC-Script significantly outperforms baseline methods in affective trajectory adherence, exhibiting excellent and reliable emotional controllability, thereby providing effective technical support for AI-assisted emotional healing scenarios.