Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-19

Too long; didn't solve

arXiv:2604.07593v2 Announce Type: replace Abstract: Mathematical benchmarks consisting of a range of mathematics problems are widely used to evaluate the reasoning abilities of large language models, yet little is known about how their structural properties influence model behaviour. In this work, we investigate two structural length variables, prompt length and solution length, and analyse how they relate to model performance on a newly constructed adversarial dataset of expert-authored mathematics problems. We find that both prompt and solution lengths correlate positively with increased model failure across models. We also include a secondary, exploratory analysis of cross-model disagreement. Under a difficulty-adjusted normalised analysis, both variables retain weak negative associations with realised model separation, slightly stronger for prompt length. Overall, our main robust finding is that structural length is linked to empirical difficulty in this dataset.

02.
arXiv (CS.CL) 2026-06-24

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

Discrete diffusion language model (DLM) fine-tuning inherits inexpensive diagnostics from denoising-time confidence monitors, but their PEFT-training meaning is untested. We test top-1 argmax concentration as a collapse warning. Across 816 LoRA/PEFT configurations from three DLM families, the warning fires for every configuration while logs record 0/816 actual collapses at the 200 step horizon, giving zero precision. The cause is pre-equilibrium saturation: top-1 concentration is already high before optimization and quickly becomes insensitive to final training stability. We then evaluate max LoRA gradient norm, a parameter-side signal that samples gradient routing rather than token concentration. On a pooled held-out LLaDA-family split, a train-optimized threshold identifies top-decile final-loss configurations with precision 0.68 and F1=0.79, above the all-positive top-1 baseline even at the lower split-bootstrap confidence bound. Autoregressive controls and cross-family threshold failures bound the result to short-horizon DLM-LoRA inspection rather than a universal collapse detector. Workflow: drop top-1 as a PEFT alarm, log max-gradient early in training, and calibrate thresholds per DLM family before routing runs for inspection.

03.
arXiv (CS.AI) 2026-06-17

KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

arXiv:2606.17070v1 Announce Type: cross Abstract: Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting encounters the double challenges of modeling complex dynamical systems and ensuring computational efficiency. We present Koopman Fourier Time-Differentiable (KFTD) Network, a time continuous twostage paradigm that decouples interpolation from prediction to achieve efficient and scalable spatiotemporal modeling. We map complex nonlinear dynamics into the Koopman linear space and exploit Fourier analysis to enable continuous time interpolation at arbitrary sub-steps. A lightweight residual network consumes the high fidelity intermediate states to yield the final forecast. Unlike diffusion models, KFTD eliminates multi step noise sampling and directly evolves the system in continuous time, yielding a 4 computational speedup. We further introduce a DPP Loss that supports arbitrary PDE constraints in an endtoend manner, breaking the physical consistency bottleneck of pure data-driven approaches. Empirical results on four ocean datasets confirm that our continuous time framework reduces MSE by an average of 5.6% (up to 12.7% for SST) and improves efficiency over MCVD by 76.25%.

04.
arXiv (CS.CV) 2026-06-24

ForensicsTok: Forensics-Guided Tokenized Modeling for Image Tampering Localization

Multi-modal Large Language Models (MLLMs) offer powerful reasoning for forensic tasks, yet existing approaches utilizing exogenous segmentation decoders often suffer from suboptimal localization. The reliance on stitched pipelines introduces information bottlenecks during backpropagation, which dilutes spatial signals and is limited by semantic priors of the segmentor. To address these limitations, we propose ForensicsTok, which reformulates image manipulation localization as an autoregressive sequence generation task. ForensicsTok directly generates spatially grounded token sequences, enabling precise mask prediction without intermediary supervision. Specifically, we introduce a Token Splatting Decoder (TSD) to map tokens to binary masks via codebook-aware code smoothing, which mitigates sharp gradients from deterministic detokenizers. Furthermore, to capture diverse tampering clues, we propose a Hierarchical Expert Fusion (HEF) module that injects multi-scale features from a forensic expert model. This unified architecture effectively compensates for the lack of forensic priors in standard MLLMs. Extensive experiments on six benchmarks show that ForensicsTok substantially improves over existing MLLM-based baselines and slightly improves over strong forensic expert baselines, while exhibiting stronger robustness to perturbations.

05.
arXiv (CS.AI) 2026-06-24

Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering

arXiv:2505.17338v3 Announce Type: replace-cross Abstract: Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require prohibitive per-scan optimization (hours for NeRF, about 30 minutes for 3DGS), making them impractical in clinical settings. We propose Render-FM, a feedforward model that eliminates this bottleneck by directly regressing 6D Gaussian Splatting (6DGS) parameters from a CT volume in a single 2.8-second forward pass, a 500x speedup over per-scan optimization. To bridge the domain gap between natural scene reconstruction and medical volumetric rendering, we introduce Anatomy-Guided Priming (AGP), which incorporates segmentation masks and transfer functions as structural and appearance priors, information that existing Gaussian splatting methods overlook. Built on an nnU-Net-inspired 3D U-Net trained on diverse CT scans, Render-FM predicts per-voxel 6DGS parameters and supports immediate real-time rendering. Unlike per-scan methods, it generalizes to unseen anatomies, novel transfer functions, and enables compositional organ visualization with zero additional preparation time. Optional 89-second fine-tuning further improves quality, surpassing per-scan optimized baselines. Project page: https://gaozhongpai.github.io/renderfm/.

06.
arXiv (CS.CL) 2026-06-12

Unraveling Syntax: Language Modeling and the Substructure of Grammars

While language models achieve impressive results, their learning dynamics are far from understood. Many domains of interest – such as natural language syntax, coding languages, arithmetic – are captured by context-free grammars (CFGs). In this work, we extend prior work on neural language modeling of CFGs in a novel direction: how language modeling behaves with respect to CFG substructure, namely subgrammars. We define subgrammars, and prove a set of fundamental theorems connecting language modeling and subgrammars. We show that language modeling loss recurses linearly over its top-level subgrammars; applied recursively, the loss decomposes into losses for "irreducible" subgrammars. Under additional assumptions, and empirically, parametrized models learn subgrammars in parallel, unlike children who first master simple substructures. We find that subgrammar pretraining can improve final performance, but only for tiny models relative to the grammar, while alignment analyses show that pretraining consistently leads to internal representations that better reflect the grammar's substructure.

07.
arXiv (math.PR) 2026-06-11

On multidimensional infinite dihedral group extensions of Gibbs Markov maps

arXiv:2601.08961v2 Announce Type: replace-cross Abstract: We obtain a local central limit theorem for cocycles associated with a class of non abelian and non compact group extensions of Gibbs Markov maps. This class consists of multidimensional infinite dihedral groups. Unlike in the set up of the random walks on groups, we cannot use the convolution of measures on the group and instead we resort to an approach based on irreducible representations. Depending on the dimension of the group, we obtain either mixing, and thus ergodicity, or dissipativity. Also, we obtain the asymptotics of the first return time of the group extension to the origin.

08.
arXiv (CS.AI) 2026-06-24

Legal Reasoning Is Not Lawyering: Rethinking Legal Benchmarks for Pro Se Access to Justice

arXiv:2606.23716v1 Announce Type: cross Abstract: Legal AI benchmark research frequently invokes the assumption that large language models can improve access to justice, including for people who cannot access lawyers in order to understand and exercise their legal rights. We argue that current benchmarks are not equipped to support this assumption because they evaluate legal reasoning over inputs that have already been preprocessed by legal experts, which measures the upper bound of model performance. Access to justice depends on a lower bound: how models perform when inputs come from pro se litigants, whose prompts may contain noisy narratives, buried facts, omissions, folk-legal assumptions, and surface-level errors. These degradations are comparable to conditions under which LLMs are known to degrade in the general machine learning literature, including long-context sensitivity, underspecification, hallucination, and typographical perturbations. We connect evidence from pro se literature with this body of machine learning research and present a small perturbation experiment on LEXam, a legal benchmark, to illustrate the gap between these two bounds. If model development continues to focus on benchmarks that measure only the upper bound, this gap may remain hidden or even widen. We conclude by calling for legal benchmarks that directly measure robustness under pro se-like inputs so that access-to-justice claims about legal AI can become empirically testable.

09.
arXiv (CS.CL) 2026-06-11

Multi-Agent Reasoning with Adaptive Worker Allocation for Stance Detection

Stance detection requires identifying an author's position toward a target, often from short-form texts where stance is implicit, indirect, or rhetorically framed. Although large language models (LLMs) achieve strong performance on this task, single-pass prompting can be brittle when multiple interpretations are plausible. Existing aggregation strategies, such as majority voting or self-consistency, improve robustness by combining labels, but they discard the intermediate reasoning needed to resolve conflicting interpretations. We introduce a multi-agent reasoning framework with adaptive worker allocation for stance detection that shifts aggregation from label-level voting to reasoning-level synthesis. The framework employs a Manager-Worker architecture in which a Manager adaptively allocates a variable number of Worker agents based on input complexity. Each Worker analyzes the input from a distinct perspective and produces a reasoning-only explanation without emitting a stance label; the Manager then synthesizes these explanations to produce the final prediction. We evaluate the proposed framework on SemEval-2016, P-Stance, and COVID-19 Stance using Llama, Mistral, and Gemini. Results show that the framework yields the largest gains on implicit and context-dependent stance cases, achieving 86.07 Macro-F1 on COVID-19 and 82.90 on SemEval-2016, while remaining competitive on more explicit stance datasets such as P-Stance. These findings suggest that adaptive reasoning-level aggregation is most beneficial when stance cannot be reliably inferred from surface cues alone.

10.
arXiv (quant-ph) 2026-06-12

Proper and improper mixed states serve as different prior beliefs for quantum state retrodiction

arXiv:2502.10030v2 Announce Type: replace Abstract: A mixed quantum state can be taken as capturing an unspecified form of ignorance; or as describing the lack of knowledge about the true pure state of the system ("proper mixture"); or as arising from entanglement with another system that has been disregarded ("improper mixture"). These different views yield identical density matrices and therefore identical predictions for future measurements. But when used as prior beliefs for inferring the past state from later observations ("retrodiction"), they lead to different updated beliefs. This is a purely quantum feature of Bayesian agency. Based on this observation, we establish a framework for retrodicting on any quantum belief and we prove a necessary and sufficient condition for the equivalence of beliefs. We also illustrate how these differences have operational consequences in quantum state recovery.

11.
arXiv (quant-ph) 2026-06-16

Bath memory as a precision resource in quantum transport

arXiv:2606.17026v1 Announce Type: new Abstract: Structured baths can reshape transport fluctuations in mesoscopic quantum devices, yet a predictive criterion for when this enhances precision has been lacking. We propose a route towards such precision advantages by utilizing bath memory in coherent fermionic transport through a noninteracting quantum-dot chain. Using the Landauer-Büttiker formalism, we derive a dual impedance-matching condition that synchronizes the conductor mode splitting, boundary dissipation, and bath bandwidth, and sustains constructive multimode interference across the transmission window. The analytical predictions for the optimal bath bandwidths show excellent agreement with exact nonequilibrium Green's function calculations of the transport for Lorentzian, Gaussian, and Newns spectral densities. The prescription yields an optimal bath bandwidth at which the current Fano factor is minimized and the thermodynamic and kinetic precision coefficients are simultaneously enhanced beyond their Markovian limits. The alignment of the optimal precision regime with the experimentally accessible current Fano factor minimum thus provides a practical strategy for designing precision-enhanced transport in mesoscopic platforms such as semiconductor quantum-dot arrays and ultracold fermionic channels.

12.
arXiv (CS.LG) 2026-06-19

Influence-Guided Concolic Testing of Transformer Robustness

arXiv:2509.23806v2 Announce Type: replace-cross Abstract: Concolic testing for neural networks alternates concrete execution with constraint solving to search for inputs that flip model decisions. We present a concolic tester for Transformer classifiers that uses SHAP estimates to rank pending path predicates by their impact on the current prediction. To support self-attention with multiple heads in execution backed by SMT solving, we implement attention semantics in pure Python that are compatible with the solver and make the softmax boundary explicit by concretizing exponentiation arguments. We evaluate our method on CIFAR-10 across three compact Transformer classifiers, ResNet18, and VGG16 under a one-pixel budget and a 900s horizon. Across the 500 model–input pairs in this matched comparison, our method achieves 60% success, compared with 15% for a differential evolution baseline that treats the model as a black box. In the primary two-layer Transformer branch-ordering study, SHAP-based predicate prioritization raises success from 56% to 60% and reduces median attack time by 51%. These results show that influence-guided path exploration can make concolic testing a practical way to find adversarial examples in Transformer models.

13.
arXiv (CS.AI) 2026-06-11

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

arXiv:2606.11559v1 Announce Type: new Abstract: Reinforcement learning typically improves multi-turn agent capabilities through the terminal outcome of the trajectories, which makes it difficult to determine credit assignments for each intermediate turns. Recent on-policy self-distillation methods offer a promising alternative by converting privileged feedback into dense token-level supervision through a self-teacher. Our study is motivated by the unexpected performance degradation observed when naively extending this paradigm to multi-turn settings, which we attribute to a lack of alignment between privileged feedback, such as successful trajectories or terminal outcomes, and the student's current decision context. We introduce HERO, a hindsight-enhanced self-distillation framework that uses next environment observations as locally aligned feedback. After each rollout, HERO reflects on the completed interaction to convert each observation into a compact turn-level diagnosis, that captures actionable feedback about the original action such as its necessity, validity or failure cause. On TauBench and WebShop, HERO improves task success and reduces unnecessary turns over environment-feedback-only self-distillation and GRPO. It is especially effective under limited training turn budgets, where successful rollouts are rare and GRPO provides weak reward-contrast signals.

14.
arXiv (CS.CV) 2026-06-15

SAFformer:Improving Spiking Transformer via Active Predictive Filtering

Spiking Neural Networks (SNNs) offer notable advantages in biological plausibility and energy efficiency, making them promising candidates for building low-power Transformers. However, existing Spiking Transformers largely adhere to a passive reactive paradigm, which struggles to focus on task-relevant information and incurs substantial computational overhead when processing redundant visual data. To overcome this fundamental yet underexplored limitation, we propose SAFformer, a novel Spiking Transformer architecture based on an active predictive filtering paradigm. Inspired by the brain's predictive coding mechanism, SAFformer actively suppresses predictable signals and focuses on salient visual features. Extensive experiments show that SAFformer establishes new state-of-the-art performance on CIFAR-10/100 and CIFAR10-DVS. Remarkably, on ImageNet-1K, it achieves 80.44% Top-1 accuracy with only 26.58M parameters and an energy consumption of 5.88 mJ, demonstrating an exceptional balance between accuracy and efficiency.

15.
arXiv (CS.CV) 2026-06-17

Geometry-Consistent Endoscopic Representations for Image-Guided Navigation via Structured Foundation Model Adaptation

Accurate vision-based navigation in monocular endoscopy is difficult due to limited depth cues, weak tissue texture, non-rigid deformation, and substantial appearance variation across domains, all of which complicate pose estimation, depth prediction, and image-to-anatomy alignment. Although recent vision foundation models have shown promise, their learned representations often remain insufficiently geometry-consistent, hindering stable feature correspondence and limiting their reliability for downstream navigation tasks. We propose a unified framework for learning geometry-consistent and domain-robust image representations for monocular endoscopy. The framework combines a synthetic data pipeline that provides accurate geometric supervision with Hierarchy-Aware Geometry-Semantic Adaptation, a structured alternative to standard LoRA that inserts low-rank adapters selectively across the transformer hierarchy and couples them with layer-wise training objectives to encourage geometric correspondence in intermediate features and semantic consistency in deeper features. Experiments on public and proprietary datasets show improved geometric and semantic representation quality, leading to better performance on downstream navigation tasks including pose estimation and monocular depth estimation. The learned representations show favorable synthetic-to-real transfer on clinical bronchoscopy and provide a useful initialization for adaptation to sinus endoscopy and colonoscopy under limited supervision. The framework also shows favorable scaling with model size and training data. These results support hierarchy-aware, geometry-guided adaptation as a practical approach for endoscopic representation learning.

16.
arXiv (CS.AI) 2026-06-16

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

arXiv:2604.13085v2 Announce Type: replace-cross Abstract: Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolidation in continual reinforcement learning. AMC is conceptually inspired by the qualitative structure of synaptic tagging and capture (STC) theory, the idea that memories transition through discrete stability phases, but makes no claim to model the underlying molecular or synaptic mechanisms. AMC models memory as a continuous crystallization process in which experiences migrate from plastic to stable states according to a multi-objective utility signal. The framework introduces a three-phase memory hierarchy (Liquid–Glass–Crystal) governed by an Itô stochastic differential equation (SDE) whose population-level behavior is captured by an explicit Fokker–Planck equation admitting a closed-form Beta stationary distribution. We provide proofs of: (i) well-posedness and global convergence of the crystallization SDE to a unique Beta stationary distribution; (ii) exponential convergence of individual crystallization states to their fixed points, with explicit rates and variance bounds; and (iii) end-to-end Q-learning error bounds and matching memory-capacity lower bounds that link SDE parameters directly to agent performance. Empirical evaluation on Meta-World MT50, Atari 20-game sequential learning, and MuJoCo continual locomotion consistently shows improvements in forward transfer (+34–43\% over the strongest baseline), reductions in catastrophic forgetting (67–80\%), and a 62\% decrease in memory footprint.

17.
arXiv (CS.LG) 2026-06-16

Towards Functional Correctness of Large Code Models with Selective Generation

arXiv:2505.13553v3 Announce Type: replace-cross Abstract: The hallucination of code generation models hinders their applicability to systems requiring higher safety standards. One critical bottleneck in addressing code hallucination is the difficulty of identifying the functional correctness of generated code, due to its unnatural form. We address this core bottleneck by automatically generating unit tests using dynamic code analysis tools, leveraging the executable nature of code. Accordingly, we propose a selective code generator that abstains from uncertain generations – based on the functional correctness evaluated by generated unit tests – to theoretically control the correctness among non-abstained answers, \ie the false discovery rate. Finally, we propose to use generated unit tests in evaluation as well as in learning for precise code evaluation, calling this paradigm FuzzEval. We demonstrate the efficacy of our method along with the controllability of code hallucination and reasonable selection efficiency.

18.
arXiv (CS.AI) 2026-06-11

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

arXiv:2606.11671v1 Announce Type: cross Abstract: Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19–20 out of 20 malicious skills across rounds.

19.
arXiv (CS.LG) 2026-06-18

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

arXiv:2606.18853v1 Announce Type: cross Abstract: A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

20.
medRxiv (Medicine) 2026-06-15

Socioeconomic inequalities in smoking prevalence and intensity in Germany: A repeated cross-sectional analysis from 1998 to 2024

Background: Smoking inequalities by socioeconomic status have widened consistently in Germany, but sex-specific trends after 2013 and inequalities in daily cigarette consumption among smokers (intensity) are unknown. We analyzed trends in absolute and relative socioeconomic inequalities in smoking prevalence and intensity among German adults across three decades. Methods: We used 14 waves (1998-2024) of population-representative cross-sectional data from the German Socio-Economic Panel to estimate sex-specific trends in smoking prevalence and intensity in adults aged 25-64. Inequalities were quantified across strata of education, occupation, and equivalized household income using the absolute and relative concentration index with 95% bootstrap confidence intervals. Results: Overall smoking prevalence declined from 35.05% (CI: [33.90%, 36.20%] in 1998 to 22.19% (CI: [21.15%, 23.24%]) in 2024, and mean intensity from 17.49 (CI: [17.09,17.90]) to 13.33 (CI: [12.88, 13.79]) cigarettes/day. Over this period sex-differences in both outcomes narrowed almost completely. Absolute and relative inequalities in smoking prevalence widened across all SES dimensions, particularly for education and occupation. By 2024, inequalities were larger among women than men driven by a stagnating or rising smoking prevalence among low-SES women at least until 2018 alongside continued declines in higher-SES women and for men. Inequalities in smoking intensity, particularly related to income, were generally smaller than those in prevalence. Conclusion: Socioeconomic smoking inequalities in Germany widened from 1998 to 2024 primarily driven by reductions among higher-SES groups and increases in low-SES women. However, recent reductions in low-SES women may indicate a new phase in the smoking epidemic. Health equity considerations should be integrated into a targeted German tobacco control strategy.

21.
arXiv (CS.AI) 2026-06-17

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Authors:

arXiv:2502.17518v3 Announce Type: replace-cross Abstract: This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(\tau\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.

22.
arXiv (CS.CV) 2026-06-16

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

Reward models play a pivotal role in reinforcement learning (RL) and multi-modal trajectory selection for autonomous driving. However, acquiring such rewards typically relies on hand-crafted rule-based objectives or perception ground truth, which hinders generalization for data-scaling. While Vision-Language Models (VLMs) have demonstrated feasibility as reward models in other domains, their effectiveness in driving tasks remains underexplored. In this work, we bridge this gap by (1) introducing DriveReward, a reasoning trajectory evaluation dataset rigorously labeled via temporally-grounded visual guidance, and augmented with counterfactual driving behaviors., (2) alongside a specialized Vision-Language Reward Model. To address the scarcity of failure cases in conventional datasets, we propose a counterfactual data annotation scheme to construct cases encompassing diverse driving styles and erroneous behaviors. Evaluations on our proposed benchmark reveal that even leading open-source and proprietary VLMs fail to excel across all tasks, highlighting significant room for improvement in existing models. Building on these findings, we subsequently tailor a specialized 1B reward model that outperforms larger VLMs on task-specific reward alignment. Finally, we validate our reward model's effectiveness by integrating it into RL finetuning and multi-modal trajectory scoring across multiple baselines, achieving performance comparable to rule-based reward calculations in both open-loop and closed-loop evaluation.

23.
arXiv (CS.CL) 2026-06-18

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad of document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law among several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1

25.
arXiv (CS.AI) 2026-06-12

AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude

Authors:

arXiv:2606.12424v1 Announce Type: cross Abstract: As generative AI and low-code workflow platforms become routine in software practice, a key educational question is whether the next generation of computer engineers will accept these tools as useful, usable, and worthy of sustained engagement. This paper reports a mixed-methods, cross-sectional study of undergraduate computer engineering students' acceptance of AI automation tooling, instantiated through the open-source platform n8n across three identically scripted workshops in Thailand (n = 103). A 12-item, five-point Likert instrument mapped to six TAM/UTAUT constructs - Performance Expectancy (PE), Effort Expectancy (EE), Behavioral Intention (BI), Self-Efficacy (SE), Hedonic Motivation (HM), and Output Quality (OQ) - was complemented by inductive thematic analysis of open-ended feedback. Analyses combined ordinal reliability estimation, bootstrap confidence intervals, non-parametric tests, multiple-comparison-controlled correlations, polychoric dimensionality diagnostics, a common-method-bias check, and between-session comparisons. Acceptance was favorable across all six constructs with large effect sizes, with PE emerging as the strongest construct and HM as the weakest. Dimensionality diagnostics further revealed that canonical TAM/UTAUT sub-facets collapsed into a single general acceptance factor in this short-form post-workshop context, a finding with important methodological and theoretical implications. Qualitative themes converged with the quantitative profile regarding usefulness and enthusiasm but diverged on output quality, revealing a small yet articulate reliability-skeptical minority. The findings support the curricular adoption of AI automation tooling in undergraduate computing education and identify three theory-grounded instructional levers: instruction-sequencing scaffolds, self-efficacy supports, and trust-calibration interventions.