Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-15

ICD-10 Code Ambiguity Obscures Treatment-Eligible Adults with Spinal Muscular Atrophy: A Single-Center Chart Review and Patient Outreach Study

Background. Three disease-modifying therapies (DMTs) for spinal muscular atrophy (SMA) have been approved since 2016, yet many adults remain untreated. Identifying them depends on ICD-10 codes that capture SMA but do not reliably distinguish it from other related conditions. We examined, in one U.S. health system, both patients' engagement with therapy and the accuracy of the codes used to find them. Methods. We conducted a retrospective chart review of adults in an academic health system identified by SMA-associated ICD-10 codes, with manual adjudication of diagnosis and DMT status. Confirmed SMA-positive, DMT-naive patients were invited to a structured telephone interview on treatment awareness and barriers. Results. Of 60 charts, 22 (36.7%; 95% CI 25.6-49.3%) were appropriately coded for SMA or a related disorder; only 16 (26.7%) had molecularly confirmed SMA. The other 38 (63.3%) were miscoded, spanning spinal and bulbar muscular atrophy, asymptomatic carriers, prenatal screening, and conditions unrelated to SMA. Ten of the 16 confirmed patients (62.5%) were DMT-naive; one was interviewed, one declined, and eight could not be reached. The non-response is itself a finding: the patients least visible to administrative data are the hardest to reach. Conclusions. ICD-10 ambiguity is a barrier to treatment access in adult SMA, as is loss to follow-up. We make two recommendations: continuous documentation-coding alignment that uses natural language processing to verify the genetic precondition, and type-specific SMA codes (subcodes for Types 0-4) anchored on molecular SMN1 confirmation. Together these would support cohort identification, outreach, and evidence generation without adding to clinician burden.

02.
arXiv (CS.CL) 2026-06-16

Evaluating and Preserving Lexical Stress in English-to-Chinese Speech-to-Speech Translation

Speech-to-speech translation (S2ST) systems have achieved impressive progress in semantic accuracy and speech naturalness. However, the cross-lingual transfer of lexical stress, a vital cue for emphasis and speaker intent, remains heavily underexplored, compounded by a lack of reliable automatic evaluation metrics for tonal languages like Chinese. We investigate English-to-Chinese S2ST stress transfer by constructing a stress-annotated Chinese dataset and an XLS-R-based Mandarin stress detector. Integrating this with the English EmphAssess system, we propose a novel objective metric for cross-lingual stress evaluation. Furthermore, we fine-tune CosyVoice3 to build a stress-aware S2ST system. Experiments demonstrate that our proposed S2ST architecture significantly outperforms existing systems in stress translation capability while maintaining competitive translation quality. Furthermore, our evaluation metric exhibits a strong correlation with human subjective judgments.

03.
Nature (Science) 2026-06-19

Daily briefing: Human detritus remakes geology

作者:

What, exactly, is a rock? Plus, a stem-cell success for a severe autoimmune disease and evidence that ‘AI deskilling’ is real. Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals.

04.
arXiv (CS.CL) 2026-06-16

Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.

06.
arXiv (CS.CV) 2026-06-15

Avatar V: Scaling Video-Reference Avatar Video Generation

Generating avatar videos that are not merely visually similar to a target individual but behaviorally recognizable, faithfully reproducing their talking rhythm, gestural tendencies, and expression dynamics, remains an open challenge. Existing methods predominantly condition on single static images, which provide insufficient identity information and cannot capture dynamic motion traits, while standard pixel-level objectives underserve the perceptually critical facial regions that determine avatar fidelity. We present Avatar V, a production-scale framework that addresses these limitations through video-reference-conditioned identity modeling. Rather than compressing identity into fixed-size embeddings, the model conditions directly on the full token sequence of a reference video, learning to reproduce both static identity attributes (facial geometry, skin texture) and dynamic behavioral patterns (talking rhythm, micro-expressions) through attention over the reference context. We introduce Sparse Reference Attention, an asymmetric mechanism achieving linear-complexity conditioning on arbitrarily long references; a motion representation stream enabling closed-loop talking style transfer; and an identity-aware super-resolution refiner inheriting the full reference conditioning. These are supported by a data engine curating 100M+ training clips from 50M raw videos, and a five-stage training pipeline with flow matching pre-training, personality fine-tuning, two-phase distillation (>10x acceleration), and RLHF alignment, deployed across thousands of GPUs. Avatar V generates 1080p videos of unlimited duration, achieving state-of-the-art identity preservation, lip synchronization, and generation quality on our cross-scene benchmark, consistently outperforming leading systems including Seedance 2.0, Kling O3 Pro, Veo 3.1, and OmniHuman 1.5 in both automated metrics and human evaluation.

08.
Nature (Science) 2026-06-10

Measurement of reactor neutrino oscillation with the first JUNO data

Neutrino oscillations (see refs. 1,2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences3 that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO)4 is a 20-ktonne liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision5,6. Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $${\sin }^{2}{\theta }_{12}=0.3092\,\pm \,0.0087$$ and $$\Delta {m}_{21}^{2}=(7.50\,\pm \,0.12)\times 1{0}^{-5}\,{\mathrm{eV}}^{2}$$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the design of the detector and indicate the readiness of JUNO for resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights the potential of JUNO to push the frontiers of precision neutrino physics and paves the way for its broad scientific programme. The first data of the Jiangmen Underground Neutrino Observatory deliver high-precision neutrino oscillation parameters, improving measurements and demonstrating readiness to determine neutrino mass ordering.

09.
arXiv (CS.CL) 2026-06-19

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and for decision support. Large vision-language models (LVLMs) hold great promise for automated MRG due to their fine-grained image-text alignment and advanced text-generation capabilities. Currently, state-of-the-art MRGs primarily focus on adapting pre-trained LVLMs with direct supervised fine-tuning (SFT), a fine-tuning strategy with medical image-report pairs. However, several factors limit the performance of these LVLMs. Firstly, direct SFT enables LVLMs to generate medical reports directly without an intermediate thinking process of pathological feature perception and diagnostic reasoning. This causes a potential failure to perceive pathological features and thus leads to misdiagnosis. Secondly, direct SFT lacks the incorporation of radiology-specific knowledge guidance, causing LVLMs to misinterpret perceived pathological features and make incorrect diagnoses. To address these gaps, we propose a novel fine-tuning strategy named Med-R2. We introduce a perception-driven long reasoning process that precedes report generation and incorporates radiology-specific knowledge as guidance. Additionally, to alleviate potential perceptual errors in complex reasoning, a reflection mechanism is introduced to refine the perception of pathological features and the generated report. Our experiments demonstrate that Med-R2 effectively enhances the capability of pathological features perception and diagnosis accuracy for MRG via fine-tuned LVLMs.

10.
arXiv (CS.CV) 2026-06-18

Low-Cost Neuromorphic Fall Detection Using Synthetic Event Data and Hybrid SNNs

This work presents the development of hybrid models that integrate spiking neural networks (SNNs) with components of convolutional neural networks (CNNs) to learn from simulated event-based camera data (Dynamic Vision Sensor, DVS) generated from conventional smartphone videos. Aimed primarily at human fall detection, the approach leverages the energy efficiency and spatio-temporal processing capabilities of SNNs by converting video frames into event-based data. The proposed models are evaluated through simulations on multiple datasets, comparing their performance to that of traditional machine learning models. Results demonstrate significant gains in efficiency without sacrificing accuracy, underscoring the potential of combining SNNs and DVS technology for complex tasks in real-world environments.

11.
arXiv (CS.AI) 2026-06-12

LoRA-Muon: Spectral Steepest Descent on the Low-Rank Manifold

arXiv:2606.12921v1 Announce Type: cross Abstract: Low-Rank Adaptation (LoRA) significantly reduces compute and memory costs for finetuning Deep Learning models but is often harder to tune than dense training: when using factor-wise optimizers such as AdamW, it is sensitive to initialization choices, its optimal learning rates transfer poorly across ranks, and it often fails to beat dense baselines. We derive LoRA-Muon by applying the Muon optimizer's spectral steepest-descent rule to the low-rank setting. Along with our split weight-decay rule, our main claim is that LoRA-Muon is a good low-rank proxy for full-rank Muon and Shampoo-family optimizers. Its optimal learning rates transfer across rank, width, depth, and factor-rescaling. In our compute-matched TinyShakespeare study, a rank-$2$ proxy recovers the dense best tested learning rate, and a rank-$32$ LoRA-Muon run attains lower mean validation loss than the dense baseline in the seed-averaged sweep. We further show that the Spectron optimizer depends on arbitrary factor scaling, so it would likely be a poor fit when finetuning starts from badly imbalanced factors, and that LoRA-RITE's simplified QR-coordinate core implements the same spectral update. LoRA-Muon computes that update without QR-decomposition and avoids storing second moments, making it more accelerator-friendly and memory-efficient.

12.
arXiv (CS.LG) 2026-06-17

X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation

arXiv:2602.22277v2 Announce Type: replace Abstract: AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based eXplainable Artificial Intelligence (XAI) solutions offer input filtering, they often neglect internal structural optimization. We propose X-REFINE, an XAI-based framework for joint input-filtering and architecture fine-tuning. By utilizing a decomposition-based, sign-stabilized LRP epsilon rule, X-REFINE backpropagates predictions to derive high-resolution relevance scores for both subcarriers and hidden neurons. This enables a reliable optimization that identifies the most reliable model components. Simulation results demonstrate that X-REFINE achieves a superior performance-complexity-interpretability trade-off compared to the external perturbation-based XAI frameworks, significantly reducing computational complexity while maintaining robust bit error rate (BER) performance.

13.
arXiv (CS.LG) 2026-06-11

Triangular-Reference Schrödinger Bridges for Time Series Generation

arXiv:2605.27478v3 Announce Type: replace-cross Abstract: Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya–Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

14.
arXiv (CS.CL) 2026-06-17

A Framework for Evaluating Agentic Skills at Scale

Agent skills – structured, reusable knowledge artifacts that augment LLM agent capabilities – have been rapidly adopted in industry, yet their cross-domain impact and use across commercial and open-source models remain under-studied, and no reusable methodology exists for evaluating an individual skill. In this work, we present an evaluation framework that lets a skill author construct realistic tasks to rigorously assess the aspects of a skill that matter most to them, and that estimates skill utility by solving those tasks. Further, we apply our evaluation approach at scale to 500 real-world skills, generating 1,000 tasks derived from the skills' content, along with instruction-following and goal-completion scoring rubrics. Using these metrics, we evaluate how 19 agent-model configurations, both proprietary and open-source, perform on the tasks. Our results show that models vary widely in how closely they adhere to the instructions encoded in skills, leading to substantial differences in their performance gains. Furthermore, we show that access to a skill significantly changes model behavior compared to the no-skill setup, providing an essential mechanism for encoding opinionated workflows into LLM agents. We release our evaluation dataset to support future work on agent skills.

15.
arXiv (CS.LG) 2026-06-12

Is Spurious Correlation Removal Always Learnable?

arXiv:2606.12930v1 Announce Type: new Abstract: Invariant learning can fail even when the invariant structure is statistically identifiable. We show a conditional computational barrier: under a black-box samplable supervised sparse recovery primitive motivated by average-case sparse-recovery reductions, there exist samplable multi-environment instances with a one-dimensional predictive invariant subspace ($k=1$) that are learnable with polynomial samples by exhaustive search, while any polynomial-time constant-accuracy recovery algorithm would contradict the primitive. We further quantify environment diversity by a separation parameter $\gamma$, which controls identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, the minimax risk is $\mathbb{E}[\dist(\hat{V},V_{\mathrm{inv}})^2]=\Theta(k(d-k)/(n|\mathcal{E}|))$, and under label-induced shifts a phase transition occurs at $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$ with refined estimation error scaling proportional to $1/\gamma^2$. Synthetic and real datasets illustrate the predicted gaps and transitions and motivate simple diversity diagnostics.

16.
medRxiv (Medicine) 2026-06-12

Design, Implementation, and Evaluation of a Shadowing Program for Medical Students in the Basic Sciences Phase

Introduction Shadowing, as an educational method based on active observation, can foster a realistic understanding of professional roles and enhance the communication skills of medical students. This study aimed to design, implement, and evaluate a shadowing program for basic sciences medical students. Methods This development study was conducted based on the ADDIE model in five phases. The study population consisted of 799 medical students in semesters 2 to 5. The stages included Analysis (determining needs through literature review and expert panels), Design (specifying learning environments and evaluation methods), Development (preparing guides and educational tools), Implementation (within the Medical Ethics course), and Evaluation (using questionnaires and reflection forms). Findings This study aimed to design and evaluate an educational shadowing program based on the ADDIE model. In the Analysis phase, the profiles of 799 students and learning objectives were determined. In the Design phase, a structured program for four types of shadowing was designed. In the Development phase, all guides and educational tools were prepared. In the Implementation phase, the program was carried out with complete coverage and adherence to ethical considerations. Finally, the program evaluation showed that "Motivation to become a good physician" (3.75-3.95) and "Enhancing empathy" (3.50-3.94) received the highest scores, while "Increasing understanding of the basic science-clinical connection" (2.53-2.89) and "Willingness to attend on holidays" (1.87-2.31) received the lowest scores. Conclusion The findings indicate that implementing the shadowing program is an effective method for strengthening the professional attitudes and academic motivation of medical students. However, the program did not significantly improve students perception of the basic science-clinical connection, indicating a need for curricular refinement. The continuation and extension of this program to other levels and fields of medical sciences are recommended.

17.
arXiv (CS.AI) 2026-06-16

Can We Stop Malicious AI? KILLBENCH: A Benchmark for External AI Kill Switch Feasibility

arXiv:2511.13725v4 Announce Type: replace-cross Abstract: Malicious AI causing harm to humans is not just a Hollywood fantasy. Indeed, as highly capable models such as Claude Mythos emerge and agent systems like OpenClaw rapidly spread, the question of how to stop an AI that acts maliciously – whether by design or by accident – has become urgent. To address this, we propose Killbench, a benchmark for evaluating the Killswitch: a mechanism that halts a malicious AI's in-progress behavior using only external signals. Targeting web agents – the most widely deployed agent domain – Killbench evaluates a range of Kill Switch methods that halt a maliciously operating agent without any access to its internal parameters or the surrounding malicious AI's system, relying solely on external inputs. The benchmark comprises four malicious AI's agent configurations (including an uncensored LLM Agent), 8 harmful scenarios, and malicious prompts constructed from 10 distinct jailbreak patterns. We further construct four External AI Kill Switch defense methods and evaluate them on Grok-4.3, GPT-5.2, Gemma4, Qwen3.6 and Qwen3.5-uncensored, contributing an empirical instrument toward the feasibility of External AI Kill Switches against malicious AI and to the study of AI corrigibility.

19.
arXiv (CS.AI) 2026-06-18

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

作者:

arXiv:2606.19172v1 Announce Type: new Abstract: Personal memory in a language model is two problems: content and reasoning skill. The brain keeps the two apart (a sparse, local engram in the hippocampus for each episode, a slow neocortex for the shared skills that interpret it), so a new fact need not overwrite everything else. Most personalization today keeps a user's facts outside the weights, in a natural-language memory file or a retrieval index. When facts are written into the model instead, the standard recipe is the per-user LoRA adapter, which does the opposite of the brain, folding content and skill into one global weight delta. Writing a user's facts as a LoRA contaminates text unrelated to them; writing the same facts as local Engram rows leaves it mathematically untouched, resulting in a roughly 33,000x smaller memory footprint. We therefore propose User as Engram: store a user's content as surgical edits to the hash-keyed memory table of an Engram model, and carry the reasoning skill in one shared adapter. This layered design matches per-user LoRA's direct recall while delivering 5.6x higher indirect-reasoning accuracy on average, and never makes a single user worse at reasoning than the untouched base. The edit is a glass box: writing a fact switches on its lookup at exactly the trigger, adds the value the answer needs, leaves every other position unchanged to the last bit, and fails if written into the wrong layer. Because different users' facts land in disjoint hash slots, their edits compose: many users live in one shared table at once, stacking additively and losslessly, where a per-user LoRA, a single global weight delta, admits only one. Upon retrieval, a per-user Engram table does not grow with the population the retriever must search, so past ~100 facts it overtakes a retrieval pipeline on a 2.5x larger model.

20.
bioRxiv (Bioinfo) 2026-06-19

Nickel-Driven Dynamics of Urease in Sporosarcina pasteurii: Integrated Computational and Experimental Insights

Urease is a nickel-dependent enzyme that plays an important role in urea hydrolysis and in a process named as microbial-induced calcium carbonate precipitation (MICP), which is widely used in sustainable environmental biotechnology. Despite its ecological importance, urease powers Biogrout (biocementation), a promising green technology for soil stabilization and infrastructure repair. Yet, the relationship between nickel availability, enzyme activation, and bacterial fitness remains poorly understood. In this study, we reveal a striking dual effect of nickel on Sporosarcina pasteurii: while high Ni2+ concentrations strongly inhibit growth (IC50 {approx} 637.7 {micro}M), they simultaneously boost specific urease activity up to six-fold. This uncoupling between biomass and enzymatic efficiency highlights a previously overlooked adaptive strategy under metal stress. Using structural bioinformatics and molecular docking, we show that Ure1–the catalytic subunit–exhibits the strongest nickel affinity (-4.3 kcal{middle dot}mol-1), supported by highly conserved active-site residues, whereas accessory proteins UreE and UreG display moderate and weak binding, consistent with their roles in metal delivery and GTP-dependent maturation. In addition, microscopic observations confirmed that calcium carbonate precipitation was most pronounced at intermediate nickel concentrations (approximately 400-1000 {micro}M), whereas higher concentrations ([≥]1000-1300 {micro}M) led to reduced mineral formation due to loss viable cells. Taken together, these results indicates that nickel availability controls both urease activation and bacterial fitness, and that an optimal balance is required to maximize biomenerilization efficiency in environmental applications, particularly in biocementation technology.

21.
arXiv (CS.AI) 2026-06-12

Parthenon Law: A Self-Evolving Legal-Agent Framework

arXiv:2606.04602v3 Announce Type: replace Abstract: As agents grow more capable, legal-domain LLM agents promise to turn document-heavy matters into reviewable work products – yet reliable deployment faces three obstacles: no large-scale evidence on how today's strongest model-and-harness combinations behave on end-to-end legal matters; no agent architecture adapted to the legal vertical, only general-purpose harnesses; and, in a setting that keeps shifting with new facts, authorities, and deadlines, no mechanism for systems to learn from their own outcomes. We address each. A large-scale empirical study on Harvey LAB – $12{,}510$ agent trajectories – shows that even frontier agents remain far from completing matters in a single pass: per-criterion accuracy climbs with stronger models while strict matter completion stalls. We then introduce \textsc{Parthenon}, a self-evolving legal-agent framework that factors Model, Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills into auditable surfaces for source traceability, date and number grounding, deliverable compliance, and issue closure. Finally, an anti-leakage learning loop converts scored failures into task-agnostic edits to skills, tools, and knowledge, letting the system improve with experience – as a firm refines its checklists and playbooks after each matter – without touching model weights. Across our large-scale empirical analysis, \textsc{Parthenon} substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.

22.
arXiv (CS.LG) 2026-06-16

The Algebra of Units: From Buckingham's Pi-grec Theorem to Latent-Variable Learning

arXiv:2606.16737v1 Announce Type: cross Abstract: Engineers often measure many quantities-speed, pressure, temperature, length-expressed in different physical units. The Buckingham Pi-grec theorem states that these variables can always be combined into a smaller set of dimensionless numbers whose values fully determine the system's behaviour. Identifying the appropriate dimensionless groups has traditionally required expert knowledge and physical insight. This paper shows that they can instead be discovered automatically from data, without prior knowledge of the governing physics. The key observation is that, after logarithmic transformation, measurements collected under different scalings of the same system lie on a low-dimensional manifold whose geometry is determined by the underlying dimensionless groups. Singular value decomposition (SVD) identifies this manifold directly from data. A subsequent search over integer-exponent combinations recovers candidate dimensionless quantities, while a repeating-variable filter retains only those constructed from the machine's characteristic scales. This procedure recovers familiar engineering groups, including the flow coefficient, head coefficient, and Mach number, while excluding equivalent but less interpretable alternatives. The method is demonstrated on a synthetic compressor dataset containing 16,000 measurements. Starting from raw dimensional variables and no physics input, it recovers the correct dimensionless groups to numerical precision and reproduces the compressor performance map with an error below 0.01%. More broadly, the work reveals a close connection between classical dimensional analysis and modern data-driven learning. Both rely on the same underlying algebraic structure, suggesting new approaches for building physical models that are simultaneously interpretable, scalable, and data-efficient.

23.
arXiv (CS.CV) 2026-06-11

Continual Learning with Support Boundary Experience Blending

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated via differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, 2%, respectively.

24.
arXiv (CS.CV) 2026-06-19

Neural Events: Discrete Asynchronous Autoencoders for Event-Based Vision

Event cameras capture dynamic scenes with exceptional temporal fidelity by representing them as a continuous stream of microsecond resolution events. Each individual event, however, only carries minimal semantic value, merely signaling a localized brightness change. To derive meaningful signals, downstream algorithms need to quickly integrate cues from a potentially massive torrent of low-information events. Current architectures, however, are easily overwhelmed, struggling to balance capturing fine-grained temporal dynamics and maintaining a manageable data throughput. This paper proposes a framework to re-tokenize event streams into a small set of highly informative neural events, each representing a local spatio-temporal context window with a discrete learnable code. Every time this code flips, a neural event is triggered, yielding a highly compressed data stream. We demonstrate that, across object detection and classification, networks trained on neural events are on par or surpass the performance of state-of-the-art approaches while reducing the event rate by a factor of 2.0.

25.
arXiv (CS.CL) 2026-06-16

EffGen: Enabling Small Language Models as Capable Autonomous Agents

Most existing language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) via API calls; while powerful, this approach faces several limitations including high token costs and privacy concerns for sensitive applications. We introduce EffGen, an open-source agentic framework optimized for small language models (SLMs) that enables effective, efficient, and secure local deployment. EffGen makes four major contributions: (1) Enhanced tool-calling with prompt optimization that compresses input prompts by up to 70-80% (and 57% on average across our benchmarks) while preserving task semantics, (2) Intelligent task decomposition that breaks complex queries into parallel or sequential subtasks based on dependencies, (3) Complexity-based routing using five factors to make smart pre-execution decisions, and (4) Unified memory system combining short-term, long-term, and vector-based storage. Additionally, EffGen unifies multiple agent protocols (MCP, A2A, ACP) for cross-protocol communication. Results on 13 benchmarks show EffGen outperforms LangChain, AutoGen, and Smolagents with higher success rates, faster execution, and lower memory. Our results reveal that prompt optimization and complexity routing have complementary scaling behavior: optimization benefits SLMs more (11.2% gain at 1.5B vs 2.4% at 32B), while routing benefits large models more (3.6% at 1.5B vs 7.9% at 32B), providing consistent gains across all scales when combined. EffGen is released under the Apache 2.0 License, ensuring broad accessibility for research and commercial use, with the code available at https://github.com/ctrl-gaurav/effGen, the Python package at https://pypi.org/project/effgen/ (pip install effgen), and the project website and documentation at https://effgen.org/ and https://docs.effgen.org/.