Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

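AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.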

01.
medRxiv (Medicine) 2026-06-16

Adverse Childhood Experiences and Growth Outcomes in Childhood: A Longitudinal EHR-Based Study

Question: Are adverse childhood experiences (ACEs) associated with altered growth trajectories in childhood? Findings: In this cohort study of 412,549 children and adolescents, ACEs were associated with lower height throughout childhood, earlier pubertal timing, and shorter final stature. Height differences emerged approximately 2 years before ACE documentation and were greatest among those with earlier documentation. Meaning: These findings suggest that early adversity affects physical growth in children and may serve as a measurable indicator of the biological consequences of early-life stress, especially in those with documentation of ACEs prior to the onset of typical pubertal growth. Importance: Adverse childhood experiences (ACEs) are among the strongest risk factors for long-term mental and physical health complications, yet their impact on physical growth in childhood remains incompletely understood. Objective: To determine the association of ACEs with childhood growth trajectories and growth dynamics. Design, Setting, and Participants: Retrospective cohort study using longitudinal electronic health record data. Data were collected between February 1999 and August 2025. The setting was a large academic medical center biobank linked to deidentified electronic health records in the southeastern United States. A total of 412,549 individuals with at least 2 recorded height measurements between the ages of 2 and 20 were included in the primary analysis. Growth curve analyses were performed in a subset of 199,844 individuals with at least 3 height measurements spanning at least 2 years. Genetic analyses were performed in a subset of 10,114 individuals of primarily European ancestry. Exposure(s): Documented exposure to adverse childhood experiences before age 18 years identified through a natural language processing algorithm.
Main Outcome(s) and Measure(s): Height-for-age z-scores across childhood, final attained height, and growth curve parameters estimated using SuperImposition by Translation and Rotation (SITAR) modeling. Results: Among 412,549 participants, 18,502 (4.5%) had clinically documented ACEs during childhood. ACE documentation was associated with lower height-for-age z-scores throughout childhood and adolescence. Final attained height was significantly lower among ACE-documented individuals, with mean differences of -3.0 cm among males (174.0 cm vs 177.0 cm, p < 0.001) and -1.3 cm among females (161.8 cm vs 163.1 cm, p < 0.001). Height differences emerged approximately 2 years before clinical ACE documentation. Earlier age at first ACE documentation was associated with progressively shorter final attained height, with each 1-year decrease in age at ACE documentation associated with a 0.20 cm decrease in final height in females and a 0.35 cm decrease in males. Those with first ACE documented prior to pubertal age also showed the most pronounced growth dynamic differences, with males demonstrating a mean reduction in size of 5.25 cm (95% CI, -6.79 cm to -3.70 cm) and 1.26-year earlier pubertal timing (95% CI, -1.50 to -1.03 years), and females demonstrating a reduction in growth curve size of 3.62 cm (95% CI, -4.83 to -2.41 cm) and 1.14-year earlier pubertal timing (95% CI, -1.29 to -0.99 years). Conclusions and Relevance: In this large clinical cohort, clinically documented ACEs were associated with time-dependent reductions in stature, earlier pubertal timing, and shorter final attained height. These findings suggest that early childhood adversity may have lasting effects on physical development and highlight growth trajectories as a potential marker of the biological consequences of early-life stress.

02.
arXiv (CS.CL) 2026-06-12

Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking–silence timing patterns by over 40%, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.
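The contrast between the sequential baseline and generation that overlaps with playback can be illustrated with a small producer/consumer sketch. This is not the authors' system: `generate_text`, `play`, the latency constants, and the buffer size are hypothetical stand-ins, and the sketch buffers one utterance stream rather than multiple candidates.

```python
import asyncio
import time

GEN_SECONDS = 0.02   # hypothetical stand-in for LLM text generation latency
PLAY_SECONDS = 0.05  # hypothetical stand-in for speech synthesis plus playback


async def generate_text(i: int) -> str:
    await asyncio.sleep(GEN_SECONDS)
    return f"utterance {i}"


async def play(text: str) -> None:
    await asyncio.sleep(PLAY_SECONDS)


async def run_sequential(n: int) -> list[float]:
    """Baseline: request the next text only after playback has finished."""
    gaps, last_end = [], None
    for i in range(n):
        text = await generate_text(i)
        if last_end is not None:
            gaps.append(time.monotonic() - last_end)  # silence before this utterance
        await play(text)
        last_end = time.monotonic()
    return gaps


async def run_parallel(n: int, buffer_size: int = 2) -> list[float]:
    """Generate ahead into a bounded buffer while earlier utterances play."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=buffer_size)

    async def producer() -> None:
        for i in range(n):
            await queue.put(await generate_text(i))
        await queue.put(None)  # sentinel: no more utterances

    task = asyncio.create_task(producer())
    gaps, last_end = [], None
    while (text := await queue.get()) is not None:
        if last_end is not None:
            gaps.append(time.monotonic() - last_end)  # silence before this utterance
        await play(text)
        last_end = time.monotonic()
    await task
    return gaps


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


if __name__ == "__main__":
    seq = asyncio.run(run_sequential(5))
    par = asyncio.run(run_parallel(5))
    print(f"sequential mean silence: {mean(seq):.3f} s")
    print(f"parallel mean silence:   {mean(par):.3f} s")
```

In the sequential loop every gap includes a full generation call, whereas in the buffered version the next text is usually already waiting when playback ends, provided generation is not slower than playback.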

03.
arXiv (CS.AI) 2026-06-18

Do Neural Networks Lose Plasticity in a Gradually Changing World?

arXiv:2602.09234v2 Announce Type: replace-cross Abstract: Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on benchmarks with abrupt task transitions, without examining whether the abruptness itself contributes to the observed plasticity loss. In this paper, we investigate the role of transition abruptness by simulating gradually changing environments through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the severity of plasticity loss is closely tied to the abruptness of task transitions, and can be substantially reduced when the environment changes gradually.

04.
arXiv (CS.CL) 2026-06-18

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. This coupling makes grounding hard to inspect, tune, reuse, or port, and can trigger Search-Induced Verbosity that breaks strict output contracts. We present Decoupled Search Grounding (DSG), a vendor-agnostic boundary that moves grounding outside the reasoning model through an MCP-compatible gateway, exposing provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and exact plus semantic caching as first-class controls. Across five frontier models on SimpleQA, FreshQA, and HotpotQA, native search leads on recency-sensitive FreshQA, but DSG exposes a stronger frontier when control matters: on SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, preserves concise answer contracts, and reaches a 99.4% warm-cache hit rate with 68% lower latency. Deployed as a shared production grounding layer for large-scale agentic workloads with interchangeable models, DSG matches or slightly exceeds native-search accuracy on an e-commerce query-understanding (QIU) workload while cutting search cost by over 98%. Real-time grounding is best treated as an optimizable interface boundary, not a fixed model feature.

05.
arXiv (CS.AI) 2026-06-19

PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation

arXiv:2606.19935v1 Announce Type: new Abstract: Humanoid robots require co-speech motions that are not only expressive and speech-aligned, but also physically executable under embodiment constraints. Existing co-speech generation pipelines are predominantly human-centric: motions are first generated in human-body representations such as SMPL-X and subsequently retargeted to humanoid robots. In this work, we identify a fundamental embodiment gap in this paradigm, where the mismatch between human motion manifolds and humanoid embodiment constraints disrupts embodiment consistency during motion transfer and physical execution. Through extensive analysis, we show that although retargeting can preserve coarse motion semantics, it significantly compresses motion diversity and weakens prosody-motion synchronization, limiting expressive humanoid behaviors. To address this problem, we first propose IK-EER, a prosody-preserving humanoid motion curation framework that jointly optimizes kinematic feasibility and speech-motion temporal alignment during retargeting. Building upon the curated robot-native motion dataset, we further introduce PhysDrift, an embodiment-aware co-speech motion generation framework that directly predicts executable humanoid joint trajectories from speech without relying on intermediate human-body representations. Unlike conventional human-centric pipelines, PhysDrift maintains embodiment consistency throughout both training and inference while incorporating physical regularization to stabilize robot motion dynamics. Extensive experiments and real-world humanoid deployment demonstrate that embodiment-aware robot-native generation substantially improves speech-motion alignment, physical plausibility, motion smoothness, inference efficiency, and real-time interaction capability.

06.
arXiv (CS.AI) 2026-06-18

EMORSION: Examining the Impact of Audio Parameters on Emotional Responses and Immersion in Film

arXiv:2606.18266v1 Announce Type: cross Abstract: EMORSION is an exploratory proof-of-concept study examining how film audio design shapes audience emotion and immersion in a cinema setting. Four film scenes were selected across the horror (2) and drama (2) genres, balanced between mainstream and independent productions. For each scene, multiple alternative audio mixes were created by systematically manipulating three core aspects of audio design: frequency (pitch), dynamics (loudness), and directionality (spatial placement). Three audience groups viewed the scenes, with each group exposed to one manipulated mix alongside a control mix for each scene. Audience responses were assessed through a triangulated multimodal framework combining self-reported emotion and immersion via a questionnaire, physiological measures including heart rate monitoring, and video-based motion tracking. The protocol successfully captured measurable, interpretable differences across audio conditions, indicating that even subtle changes in audio design can shape emotional perception and immersion. Unconventional mixes tended to produce greater variability in audience interpretation, while conventional immersive mixes were associated with stronger cross-audience agreement. These findings establish the feasibility of the EMORSION protocol and motivate larger-scale studies to characterise the role of specific audio parameters in shaping audience experience.

07.
arXiv (CS.LG) 2026-06-16

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

arXiv:2606.17011v1 Announce Type: cross Abstract: Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

08.
arXiv (CS.AI) 2026-06-15

When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

arXiv:2606.14476v1 Announce Type: new Abstract: A growing line of work equips large language model (LLM) agents with graph neural networks (GNNs) as callable tools, assuming the agent exercises judgment over when and how much to rely on such a tool. We test this directly. We expose a frozen GNN to a ReAct-style LLM agent as an explicit tool and measure, on node classification over a text-attributed graph (ogbn-arxiv, replicated on WikiCS), whether the agent uses the tool or merely obeys it. We find the agent does not exercise judgment: its predictions agree with the raw GNN's 97.6-99.2% of the time (5 seeds), collapsing into a GNN parrot that adopts the tool's output wholesale and bypasses its own reasoning. Sweeping backbone capability (Qwen2.5 0.5B-7B), the deference is not a weak-model artifact: among models able to invoke the tool, agreement rises with capability (0.60 to 0.98 from 1.5B to 7B). Crucially, the cost of deference does not shrink as capability grows and grows where alternatives emerge: a per-node oracle over the available actions beats the parrot by 0.09-0.18 at 3B and 0.12-0.22 at 7B, roughly doubling at high homophily, because the parrot is pinned to the frozen GNN while the agent's alternatives improve; at 7B a simple neighbour-label tool overtakes the GNN at high homophily (0.81 vs 0.71) yet the agent still defers. A simple selective-invocation gate recovers about half of that high-homophily gap (0.71 to 0.83) but yields no net global gain, and held-out estimates bound the best achievable gate over standard test-time features to at most a third of the oracle headroom: reliable selective invocation looks limited by available information, not merely router design. Our results are a cautionary measurement: evaluations of agent+tool systems cannot assume the agent adds judgment on top of the tool, and selective invocation must be designed in rather than expected to emerge from scale.
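The quantities reported above (agent/tool agreement, accuracy of the deferring "parrot", and the headroom of a per-node oracle over available actions) can be sketched on toy data. The labels and predictions below are synthetic and purely illustrative, the function names are hypothetical, and the paper's exact metric definitions may differ.

```python
def accuracy(preds, labels):
    """Fraction of nodes whose prediction equals the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def agreement(preds_a, preds_b):
    """Fraction of nodes on which two predictors output the same class."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def oracle_accuracy(action_preds, labels):
    """Per-node oracle: a node counts as correct if any available action is correct."""
    hits = sum(
        any(preds[i] == y for preds in action_preds.values())
        for i, y in enumerate(labels)
    )
    return hits / len(labels)


# Synthetic example with five nodes (not data from the paper).
labels = [0, 1, 2, 1, 0]
actions = {
    "gnn_tool": [0, 1, 1, 1, 2],
    "neighbour_label_tool": [0, 2, 2, 1, 0],
}
agent = list(actions["gnn_tool"])  # an agent that copies the GNN tool wholesale

parrot_agreement = agreement(agent, actions["gnn_tool"])
parrot_accuracy = accuracy(agent, labels)
headroom = oracle_accuracy(actions, labels) - parrot_accuracy

print(f"agreement with GNN: {parrot_agreement:.2f}")
print(f"parrot accuracy:    {parrot_accuracy:.2f}")
print(f"oracle headroom:    {headroom:.2f}")
```

The headroom is an upper bound on what selective invocation could gain, since the oracle picks the right action per node using the true label.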

09.
arXiv (CS.CV) 2026-06-18

Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization

Even during fixation the human eye is constantly in low amplitude motion, jittering over small angles in random directions at up to 100 Hz. This motion results in all features of the image on the retina constantly traversing a number of cones, yet objects which are stable in the world are perceived to be stable, and any object which is moving in the world is perceived to be moving. A series of experiments carried out over a dozen years revealed the psychophysics of visual stabilization to be more nuanced than might be assumed, say, from the mechanics of stabilization of camera images, or what might be assumed to be the simplest solution from an evolutionary perspective. The psychophysics revealed by the experiments strongly implies a specific set of operations on retinal signals resulting in the observed stabilization behavior. The presentation is in two levels. First is a functional description of the action of the mechanism that is very likely responsible for the experimentally observed behavior. Second is a more speculative proposal of circuit-level neural elements that might implement the functional behavior.

10.
arXiv (CS.LG) 2026-06-19

Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States

arXiv:2606.20053v1 Announce Type: new Abstract: The Doyle-Fuller-Newman (DFN) model resolves internal electrochemical states in lithium-ion batteries with high fidelity. However, the numerical solution of its governing equations is computationally prohibitive for real-time deployment, limiting scalability from individual cells to pack and fleet-scale applications. While machine learning surrogates can substantially reduce inference latency through GPU acceleration, most existing approaches learn solution approximations tied to specific operating conditions rather than learning generalizable state-evolution dynamics. This work presents a systematic comparison of four neural network architectures (MLP, ResNet, U-Net, FNO) formulated as autoregressive state-transition operators that predict full DFN internal states across a wide range of operating conditions. To ensure a controlled architectural comparison, all models are trained under a unified framework using multi-step unrolling and current-conditioning, isolating the impact of spatial inductive bias. Results demonstrate that the U-Net's multi-scale feature hierarchy achieves a mean final-step nRMSE of 3% averaged across all internal state variables after 300-step autoregressive rollouts, while providing a 5.38x speed-up over the numerical solver. These findings highlight spatial inductive bias as a critical determinant of surrogate performance, advancing the development of surrogates for internal state observability for next-generation battery management systems and digital twins.

11.
arXiv (CS.CV) 2026-06-16

Implementation of Licensed Plate Detection and Noise Removal in Image Processing

A car license plate recognition system is an image processing technology used to identify vehicles by capturing their license plates. The technology is also known as automatic number-plate recognition, automatic vehicle identification, or optical character recognition for cars. In Malaysia, the rapidly increasing number of vehicles on the road has created considerable demand for car license plate recognition systems. Such systems can be implemented in electronic parking payment systems, highway toll-fee systems, and traffic surveillance systems, and as police enforcement tools. Additionally, car license plate recognition technology has the potential to be combined with techniques from other fields, such as biology and aerospace, to solve specialized problems.

12.
arXiv (CS.CL) 2026-06-16

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility – yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position – a content-free artifact – shifts recommendations causally, worth about $12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.
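For readers unfamiliar with the average marginal component effect (AMCE): when attributes are independently randomized, a simple estimator is the difference in selection rates between profiles showing a given attribute level and profiles showing the baseline level. The sketch below assumes that estimator; the abstract does not state which estimator the authors use, and the profiles are synthetic, not data from the audit.

```python
def amce(profiles, attribute, level, baseline):
    """Difference-in-means AMCE estimate for `level` versus `baseline`.

    Each profile is a dict of attribute values plus a binary "chosen" flag.
    """
    def selection_rate(value):
        chosen = [p["chosen"] for p in profiles if p[attribute] == value]
        return sum(chosen) / len(chosen)

    return selection_rate(level) - selection_rate(baseline)


# Synthetic hotel profiles (illustrative only).
profiles = [
    {"rating": "high", "price": "low", "chosen": 1},
    {"rating": "high", "price": "high", "chosen": 1},
    {"rating": "low", "price": "low", "chosen": 1},
    {"rating": "low", "price": "high", "chosen": 0},
    {"rating": "high", "price": "low", "chosen": 1},
    {"rating": "low", "price": "high", "chosen": 0},
]

rating_effect = amce(profiles, "rating", "high", "low")
price_effect = amce(profiles, "price", "high", "low")
print(f"AMCE of high rating: {rating_effect:+.3f}")
print(f"AMCE of high price:  {price_effect:+.3f}")
```

Multiplying an estimate by 100 gives percentage points, the unit used in the abstract.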

13.
arXiv (CS.CV) 2026-06-12

Why Commodity WiFi Sensors Fail at Multi-Person Gait Identification: A Systematic Analysis Using ESP32

WiFi Channel State Information (CSI) has shown promise for single-person gait identification, raising interest in its use for contactless biometrics, continuous authentication, and passive identification. However, the feasibility of multi-person identification on low-cost commodity devices remains unclear. A critical question is whether weak multi-person performance is primarily an algorithmic limitation, or whether it reflects a more fundamental sensing ceiling on commodity WiFi hardware. We address this question through a systematic empirical study using commodity ESP32 WiFi sensors. We evaluated six different signal separation methods (FastICA, SOBI, PCA-ICA, NMF, Wavelet, and Tensor decomposition) across seven scenarios spanning 1-10 people in both controlled and realistic indoor environments. To investigate beyond classification accuracy, we introduce three diagnostic metrics: intra-subject variability (ISV), inter-subject distinguishability (ISD), and performance degradation rate (PDR). In all methods, performance remains moderate (39%-56% accuracy), with limited evidence that algorithmic choice alone solves the problem. The best-performing method, NMF, reaches 56% accuracy, while all methods exhibit extremely high feature-space overlap (97%-99%), unstable within-subject representations, and marked environmental sensitivity. These findings suggest that, under commodity ESP32 CSI constraints, dense multi-person gait identification is limited more by sensing quality and spatial diversity than by the chosen separation algorithm. Our results have direct implications for security and privacy: they call into question the practicality of commodity WiFi CSI as a robust multi-user biometric primitive for authentication, while also placing important bounds on the passive identification capabilities achievable with low-cost off-the-shelf WiFi hardware.

14.
arXiv (CS.AI) 2026-06-18

Graph Grounded Cross Attention Transformer Neural Network for Structurally Constrained Full Event Sequence Generation in Predictive Process Monitoring

arXiv:2606.18726v1 Announce Type: cross Abstract: Structurally constrained event sequence generation remains challenging because generated paths must preserve transition feasibility, temporal order, termination, and attribute consistency. In predictive process monitoring (PPM), this challenge appears as full event sequence generation, whereas existing work mainly addresses component tasks such as next activity, remaining time, outcome, and attribute prediction. This paper proposes the Graph Grounded Cross Attention Transformer Neural Network (GGATN) for this unified PPM task. GGATN uses a global process graph as structured activity memory, contextualizes sequence positions through Transformer self attention, and injects process topology through graph grounded cross attention. Unlike autoregressive decoding, GGATN generates activities, timestamps, length, and event level and sequence level attributes in a single pass, followed by Viterbi style graph constrained decoding for feasible paths and explicit termination. Experiments on six benchmark event logs show more reliable generation quality than local instruction prompted LLM baselines. GGATN achieves strong performance on sequence similarity, Damerau Levenshtein similarity, bigram based control flow similarity, and duration distribution, while maintaining zero hallucinated activities and zero sequence level attribute inconsistency. Ablation analyses confirm the global graph encoder as a stable structural prior. Interpretability analyses show how graph structure, sequence context, feedback refinement, and constrained decoding shape generation.
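The idea of Viterbi-style graph-constrained decoding can be illustrated with a generic dynamic program: given per-position activity scores and a set of allowed transitions, keep only the best-scoring path that respects the process graph and can terminate. This is a minimal sketch under assumed inputs, not GGATN's decoder. The function name, the `START`/`END` markers, the fixed sequence length, and the toy scores are all hypothetical.

```python
def constrained_viterbi(scores, allowed, start="START", end="END"):
    """Return the highest-scoring activity path that follows allowed transitions.

    scores:  list of dicts, one per position, mapping activity -> log-score.
    allowed: dict mapping an activity to the set of activities that may follow it.
    Returns None when no feasible path exists.
    """
    # best[a] = (total score, path) for the best feasible path ending in activity a
    best = {
        a: (s, [a])
        for a, s in scores[0].items()
        if a in allowed.get(start, set())
    }
    for position in scores[1:]:
        nxt = {}
        for a, s in position.items():
            candidates = [
                (prev_score + s, path + [a])
                for prev, (prev_score, path) in best.items()
                if a in allowed.get(prev, set())
            ]
            if candidates:
                nxt[a] = max(candidates, key=lambda c: c[0])
        best = nxt
    # Explicit termination: the last activity must be allowed to reach END.
    finals = [sp for a, sp in best.items() if end in allowed.get(a, set())]
    return max(finals, key=lambda c: c[0])[1] if finals else None


# Toy process graph and scores (illustrative only).
allowed = {"START": {"A"}, "A": {"B", "C"}, "B": {"C"}, "C": {"END"}}
scores = [
    {"A": -0.1, "B": -2.3},
    {"A": -0.2, "B": -1.0, "C": -1.5},
    {"B": -0.3, "C": -0.9},
]

path = constrained_viterbi(scores, allowed)
print(path)
```

Picking the top-scoring activity independently at each position would give A, A, B here, which violates the graph twice (A cannot follow A, and B cannot terminate); the constrained decoder returns a feasible path instead.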

15.
bioRxiv (Bioinfo) 2026-06-15

AliceDB database and pipeline for identification of natural protein variants based on mass spectrometry measurement data

The natural variation that distinguishes living organisms within a single species is currently being studied intensively, primarily at the genetic level. Unfortunately, studies of natural variants at the level of protein gene products are not very common, mainly due to the lack of appropriate databases and bioinformatics tools. The main research technique used to study proteomes/peptidomes is mass spectrometry (MS). A classic method for interpreting raw mass spectrometry data in proteomic/peptidomic studies involves the use of databases containing representative (canonical) sequences that define the proteome of the organism under study. In this paper, we present the AliceDB database, which contains information on over 7 million natural variants of protein sequences described in the scientific literature for Homo sapiens. The data contained in the AliceDB database can be utilized using widely available and commonly used software for interpreting proteomic data. Test results regarding the use of the AliceDB database for the interpretation of proteomic data indicate that accounting for the presence of natural variants increases both the number and quality of identified proteins. Furthermore, it is easy to identify protein sequence variants that may, for example, be of significance in medicine.

16.
arXiv (CS.CV) 2026-06-16

Adaptive Inference-Time Scaling via Early-Step Latent Verification for Image Editing

Instruction-based image editing has made notable progress with recent advances in generative models. However, the quality of the edited result is still influenced by the randomly sampled initial noise, particularly in complex editing scenarios. An unsuitable initial noise may lead to unsatisfactory editing results. Recent inference-time scaling methods address this issue by sampling multiple initial noises and selecting better candidates. Nevertheless, most of them follow a decode-then-verify scheme which introduces an efficiency-accuracy trade-off. When decoding is performed after limited inference steps, the decoded images often remain too noisy for reliable assessment, whereas sufficiently denoised images require much higher computational cost. To address this issue, we propose VeriLatent, a plug-and-play adaptive inference-time scaling framework with early-step latent verification for image editing. Specifically, we propose a novel verifier that scores each initial noise through a latent-space editing activation map at an early stage. It identifies promising candidates by assessing whether they can induce an effective edit in the correct region. This enables efficient early pruning without decoding latents into images. Building on this, we further develop an adaptive search strategy for inference-time scaling. It allocates inference budgets according to editing difficulty, thereby reducing the number of function evaluations (NFE). Extensive experiments on multiple benchmarks and different base models demonstrate that VeriLatent consistently improves both editing performance and inference-time scaling efficiency.

17.
arXiv (quant-ph) 2026-06-16

Sharp Transitions for Subsystem Complexity

arXiv:2510.18832v2 Announce Type: replace-cross Abstract: The circuit complexity of time-evolved pure quantum states grows linearly in time for an exponentially long time. This behavior has been proven in certain models, is conjectured to hold for generic quantum many-body systems, and is believed to be dual to the long-time growth of black hole interiors in AdS/CFT. Achieving a similar understanding for mixed states remains an important problem. In this work, we study the circuit complexity of time-evolved subsystems of pure quantum states. We find that for greater-than-half subsystem sizes, the complexity grows linearly in time for an exponentially long time, similarly to that of the full state. However, for less-than-half subsystem sizes, the complexity rises and then falls, returning to low complexity as the subsystem equilibrates. Notably, the transition between these two regimes occurs sharply at half system size. We use holographic duality to map out this picture of subsystem complexity dynamics and rigorously prove the existence of the sharp transition in random quantum circuits. Furthermore, we use holography to predict features of complexity growth at finite temperature that lie beyond the reach of techniques based on random quantum circuits. In particular, at finite temperature, we argue for an additional sharp transition at a critical less-than-half subsystem size. Below this critical value, the subsystem complexity saturates nearly instantaneously rather than exhibiting a rise and fall. This novel phenomenon, as well as an analogous transition above half system size, provides a target for future studies based on rigorous methods.

18.
arXiv (CS.CL) 2026-06-11

RedAct: Redacting Agent Capability Traces for Procedural Skill Protection

Users rely on execution traces to observe agent behavior, diagnose failures, and ensure accountability. These traces contain rich procedural detail, including tool invocations, intermediate decisions, and error-recovery logic. Yet this detail can expose private procedural skills, allowing downstream methods to recover key formulas, thresholds, and strategies without access to model weights or skill files. To quantify this risk and evaluate protection, we construct CapTraceBench, a benchmark of 75 specialized long-horizon tasks and 154 curated skills across seven domains. We also introduce RedAct (https://github.com/XuShuwenn/RedAct), a protected trace release framework that localizes protected key information, rewrites traces while preserving verifier-critical evidence, and embeds behavioral watermarks for downstream provenance analysis. Across representative trace reuse methods, RedAct reduces normalized skill transfer (NST) from 44.7–67.1% on raw traces to below the no-skill baseline, while preserving audit evidence. Its standalone behavioral watermarks reach 93.6–100.0% true detection with a false alarm rate of at most 1.9%. These results frame public agent traces as security interfaces and show that selective redaction can reduce procedural capability leakage without removing audit evidence.

19.
arXiv (CS.CL) 2026-06-12

LEDGER: A Long-Context Benchmark of Corporate Annual Reports for Grounded Financial Retrieval and Extraction

Finance reporting is a natural proving ground for large language models, and the very-long-context capabilities of recent models across all sizes make rigorous evaluation in this domain an increasingly pressing need. Yet most public financial resources reduce the task to plain-text SEC 10-K filings paired with a handful of question-answer items. We release LEDGER (Long-context Evaluation of Documents for Grounded Extraction and Retrieval), a corpus of 4,999 digitized corporate annual reports - full documents with figures, tables, and narrative, not just regulatory filings. Each report is labeled with 31 consolidated financial KPIs to be extracted and linked to the market's reaction at the earnings date. From this data we derive three evaluation benchmarks spanning the difficulty spectrum: a pure page-level KPI retrieval task with TREC-style relevance judgments over 118,048 questions in natural language, a conversational "needle-in-a-haystack" single-value lookup, and a full KPI extraction task, both from long, numerically dense reports. We additionally provide human OCR-quality annotations with inter-annotator agreement and the complete extraction, validation, and scoring toolchain. We further demonstrate the dataset's research utility with a case study linking CEO-letter rhetoric to post-publication market impact.

20.
arXiv (CS.AI) 2026-06-16

NeuronFabric: A Software Reference Architecture for On-Chip Transformer Training with Local Adam

arXiv:2606.16440v1. Publicly documented accelerator architectures generally separate training computation from optimizer-state updates or rely on external memory and host orchestration. This paper presents NeuronFabric, a software reference architecture intended for future FPGA and ASIC implementations of transformer training with local Adam updates. A complete C# prototype implements forward pass, backpropagation, and Adam optimization without external machine-learning frameworks. The goal is to validate numerical correctness and memory requirements before hardware implementation. The evaluated model is a 334K-parameter autoregressive transformer (d=88, H=4, f=264, L=4, vocab=256) trained on the Shakespeare corpus. The BF16W configuration achieves evaluation loss 1.5426 after 80K samples, compared with 1.5224 for an FP32 GPU reference, while producing coherent character-level text. The paper introduces BF16W, which stores weights in BF16 while retaining Adam optimizer moments in FP32. This reduces memory requirements for on-chip training. A 334K-parameter FP32 model with Adam moments requires approximately 4.0 MB, matching the BRAM capacity of a Xilinx ZCU102 device. The BF16W variant requires approximately 3.34 MB, leaving memory available for activation storage. We describe the vocabulary-budget constraint observed during earlier experiments, quantify BF16W memory savings, and outline FPGA training as the next stage of development. No FPGA measurements are included in this paper. This publication serves as a public architectural disclosure and software reference implementation for future FPGA and ASIC exploration of the NeuronFabric architecture.
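
The memory figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is not taken from the paper's C# prototype; it assumes a round count of 334,000 parameters, two Adam moment buffers per parameter, decimal megabytes (10^6 bytes), and no allowance for gradients or activations. The function name `training_state_bytes` is hypothetical.

```python
# Back-of-the-envelope check of the FP32 vs. BF16W memory figures.
# Assumptions (not from the paper): exactly 334,000 parameters,
# 1 MB = 10**6 bytes, gradients and activations excluded.

PARAMS = 334_000   # approximate parameter count quoted in the abstract
BYTES_FP32 = 4     # 32-bit float
BYTES_BF16 = 2     # 16-bit bfloat


def training_state_bytes(num_params, weight_bytes, moment_bytes=BYTES_FP32):
    """Bytes for the weights plus Adam's two moment buffers (m and v)."""
    return num_params * (weight_bytes + 2 * moment_bytes)


fp32_total = training_state_bytes(PARAMS, BYTES_FP32)   # FP32 weights, FP32 moments
bf16w_total = training_state_bytes(PARAMS, BYTES_BF16)  # BF16 weights, FP32 moments

print(f"FP32 weights + FP32 moments: {fp32_total / 1e6:.2f} MB")
print(f"BF16 weights + FP32 moments: {bf16w_total / 1e6:.2f} MB")
print(f"Saved by BF16W:              {(fp32_total - bf16w_total) / 1e6:.2f} MB")
```

Under these assumptions the totals come to about 4.01 MB and 3.34 MB, in line with the approximate 4.0 MB and 3.34 MB stated above; the saving comes entirely from halving the weight storage, since the moments stay in FP32.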

21.
arXiv (quant-ph) 2026-06-12

Effective Geometry and Position-Dependent Mass in Dual-$q$ Quantum Mechanics

arXiv:2606.12444v1. This work investigates the deformed-derivative formalism introduced by Borges, with emphasis on the relation between the linear operator $D_{(q)}$ and its nonlinear dual counterpart $D^{(q)}$. Directly inserting the dual derivative into the kinetic term leads to a nonlinear Schrödinger equation and obscures the usual interpretation of superposition and probability. We show that this nonlinearity can be removed by a simultaneous transformation of the coordinate and of the wave function. The transformed problem is an ordinary linear Schrödinger equation in a deformed coordinate, and its representation in the physical coordinate is equivalent to a Hermitian position-dependent-mass (PDM) Hamiltonian. In this formulation, the deformation parameter $q$ determines both the effective mass profile and the associated metric. The formalism is applied to the free particle, the infinite square well, the rectangular barrier, and the harmonic oscillator in the weak-deformation regime. Comparison with the nonadditive-translation approach of Costa Filho et al. shows that the Borges dual-$q$ framework provides an alternative route to the same effective geometric structure. For $q>1$, the effective length is increased, which lowers the spectrum and suppresses tunneling relative to the undeformed limit $q=1$.
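
The closing remark about effective length can be read against two textbook relations. This is only an illustration, assuming the deformed-coordinate problem reduces to a standard well or barrier whose width is replaced by an effective one; the symbols $L_{\mathrm{eff}}$ and $a_{\mathrm{eff}}$ are introduced here and their dependence on $q$ is not specified in the abstract.

```latex
% Infinite square well of effective width L_eff:
E_n = \frac{n^2 \pi^2 \hbar^2}{2 m L_{\mathrm{eff}}^2}, \qquad n = 1, 2, \dots

% Rectangular barrier of height V_0 and effective width a_eff,
% for E < V_0 in the opaque limit \kappa a_eff \gg 1:
T \approx 16\,\frac{E}{V_0}\left(1 - \frac{E}{V_0}\right) e^{-2 \kappa a_{\mathrm{eff}}},
\qquad \kappa = \frac{\sqrt{2 m (V_0 - E)}}{\hbar}
```

A larger $L_{\mathrm{eff}}$ lowers every $E_n$, and a larger $a_{\mathrm{eff}}$ shrinks $T$ exponentially, which matches the qualitative statement that an increased effective length lowers the spectrum and suppresses tunneling.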

22.
arXiv (CS.LG) 2026-06-19

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

arXiv:2606.19411v1. Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning – data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective – pick a size-$k$ subset $S$ maximizing $\log\det(L_S)$ – is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges over millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver where the diversity objective drives an eigenvector-dependent operator. The resulting algorithm requires only matrix-vector products with the kernel and runs in time $O\!\big((ndk+nk^2)\,t\big)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
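
To make the DPP-MAP objective concrete, the following sketch evaluates $\log\det(L_S)$ and runs a naive greedy selection on a toy kernel. It illustrates the greedy baseline the abstract refers to, not the paper's NEPv/SCF solver; the 3×3 kernel and the helper names (`logdet_psd`, `submatrix`, `greedy_dpp_map`) are hypothetical, and the determinant is recomputed from scratch at every step for clarity rather than speed.

```python
import math


def logdet_psd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    Returns -inf if the matrix is not numerically positive definite.
    """
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def submatrix(kernel, subset):
    """Principal submatrix L_S indexed by the items in `subset`."""
    return [[kernel[i][j] for j in subset] for i in subset]


def greedy_dpp_map(kernel, k):
    """Naive greedy: repeatedly add the item that maximizes log det(L_S)."""
    selected = []
    for _ in range(k):
        best_item, best_value = None, float("-inf")
        for item in range(len(kernel)):
            if item in selected:
                continue
            value = logdet_psd(submatrix(kernel, selected + [item]))
            if value > best_value:
                best_item, best_value = item, value
        if best_item is None:  # no remaining item keeps L_S positive definite
            break
        selected.append(best_item)
    return selected


# Toy kernel (hypothetical): items 0 and 1 are near-duplicates, item 2 is distinct.
L = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 0.8],
]

chosen = greedy_dpp_map(L, 2)
print(chosen, logdet_psd(submatrix(L, chosen)))
```

On this toy kernel the greedy pass picks item 0 and then item 2, because pairing the two near-duplicates gives a determinant of 0.19 while the diverse pair gives 0.8. Each greedy step here scans every remaining candidate and refactorizes the submatrix, which is the kind of cost a matrix-vector-product-only solver aims to avoid at large $n$.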

23.
arXiv (CS.CV) 2026-06-16

MoECa: Aligning Feature Reuse with Expert Decomposition in Diffusion Transformers

Diffusion Transformers with Mixture-of-Experts (DiT-MoE) improve model capacity under sparse activation, but diffusion inference is still bottlenecked by redundant computation across timesteps. Existing caching methods mainly operate at the token level, which becomes suboptimal in DiT-MoE because each token update is internally decomposed into multiple routed expert branches. Our analysis shows that cross-timestep redundancy in DiT-MoE is better characterized at the expert-branch level than at the whole-token level. Based on this observation, we propose MoECa, a fine-grained caching framework that performs branch-level feature reuse across timesteps. MoECa further introduces expert-aware adaptive control and synchronized cache updates across MoE and attention paths to maintain stable intermediate states. Experiments on multiple DiT-MoE models show that MoECa consistently achieves a better speed-quality trade-off than prior caching methods, with up to 2.83$\times$ inference speedup and minimal quality degradation.

24.
arXiv (CS.CV) 2026-06-16

An Extensive Benchmark for Single-round and Multi-round Instruction-based Image Editing

In recent years, there have been notable advancements in the area of instruction-based image editing (IIE), which focuses on the automatic alteration of input images using a model. Nevertheless, assessing the effectiveness of these editing models poses a considerable challenge due to the intricate nature of instructions and the wide variety of edits. To tackle this problem, one urgent task in this domain is the development of a robust evaluation framework that can precisely gauge the quality of editing outcomes and offer valuable benchmarks to guide future improvements. To address this challenge, we present a comprehensive evaluation benchmark named I2EBench2.0, designed for single-round and multi-round assessment of IIE models. I2EBench2.0 has four key features: 1) Evaluation Across Single and Multi-rounds: I2EBench2.0 simultaneously evaluates both single-round and multi-round instruction-based edits, assessing the precision and consistency of the edits. 2) Extensive Evaluation Criteria: I2EBench2.0 encompasses a broad range of criteria, evaluating both high-level and low-level aspects of each IIE model. Specifically, it incorporates 16 dimensions for single-round evaluations and 7 for multi-round evaluations. 3) Alignment with Human Judgment: To ensure our benchmark aligns with human evaluation, we conducted a comprehensive user study for each criterion. 4) Research-driven Insights: By analyzing the strengths and weaknesses of current IIE models across all 16 single-round and 7 multi-round dimensions, we provide critical insights aimed at directing future research in this area. We tested eight recently developed IIE models using I2EBench2.0 and derived academic insights through meticulous comparison and analysis. The related code, dataset, and images generated by all IIE models are available on GitHub: https://github.com/cocoshe/I2EBench.

25.
arXiv (CS.CL) 2026-06-11

The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

Large Language Models (LLMs) have achieved strong performance across natural language processing tasks, yet reliable reasoning remains an open challenge. Although modern LLMs show progress in structured inference, multi-step problem solving, and contextual understanding, their reasoning behavior is often inconsistent and sensitive to prompting strategies, task design, and model scale. This survey provides a systematic analysis of more than 300 recent papers from arXiv, Semantic Scholar, Google Scholar, Papers with Code, and the ACL Anthology to examine how reasoning capabilities emerge in LLMs and where they fail. We make three main contributions. First, we introduce a structured taxonomy of LLM reasoning research, covering Chain-of-Thought reasoning, multi-hop reasoning, mathematical reasoning, common sense reasoning, visual and temporal reasoning, code and algorithmic reasoning, retrieval-augmented reasoning, tool-augmented and agentic reasoning, and reinforcement learning-based reasoning. Second, we analyze methodological trends across these paradigms, including prompting methods, model architectures, training objectives, reward modeling, and evaluation benchmarks. Third, we synthesize recurring limitations and failure modes, such as reasoning hallucinations, brittle multi-step inference, weak causal abstraction, and poor cross-domain generalization. By organizing a rapidly expanding literature, this survey offers a unified view of the current capabilities and limitations of reasoning in LLMs. We also identify emerging research directions, including meta-reasoning, self-evolving reasoning frameworks, multimodal reasoning, and socially grounded reasoning. Overall, this work aims to serve as a reference for developing more robust, interpretable, and generalizable reasoning systems in future language models.