Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

From Physics to Representation: Audio Learning with Synthetic Pre-training via Procedural Generation

arXiv:2606.14791v1 Announce Type: cross Abstract: Self-supervised learning advances audio representation for multimedia analysis. However, prevailing data-centric approaches rely on massive real-world corpora, increasing training costs, curation burdens, and privacy barriers. To address this, we present AudioPG, a procedural synthesis framework eliminating real audio recordings during pre-training. AudioPG trains a Transformer-based masked autoencoder on waveforms generated on-the-fly from basic acoustic primitives and composition rules. The encoder transfers effectively to real audio benchmarks, achieving 90.60% accuracy on ESC-50, 0.546 mAP on FSD50K, 88.17% on UrbanSound8K, and 97.03% on Speech Commands V2. Notably, pre-training completes in under 20 minutes on a single GPU. Latent space analysis reveals physical factors, including fundamental frequency and relative intensity, emerge in orthogonal subspaces, making representations linearly decodable. These results establish procedural synthesis as an efficient, interpretable pre-training signal when large-scale corpora are unavailable. Our code is available at: https://github.com/Freyliu0516/audioPG.

02.
arXiv (quant-ph) 2026-06-12

Matrix phase-space representations for quantum symmetries

arXiv:2606.12769v1 Announce Type: new Abstract: We introduce a general phase-space representation that includes global quantum symmetries in the basis expansion. This method, called matrix phase-space, projects the basis onto a reduced Hilbert space, which can greatly reduce sampling errors of many-body quantum simulations and unifies several previous phase-space methods. The purpose of this paper is to provide detailed proofs of basic theorems and operator identities. We also treat several different types of symmetries. To illustrate the benefits of matrix phase-space methods, we give a detailed derivation of a recent application to the topical problem of verifying the outputs of Gaussian boson sampling (GBS) quantum computers with photon number resolving detectors. This has exponential complexity, and using parity symmetry reduces sampling errors by very large factors relative to earlier methods.

03.
arXiv (math.PR) 2026-06-16

Transposition Approach to Optimal Control of McKean-Vlasov SPDEs

arXiv:2603.06245v2 Announce Type: replace Abstract: In this paper, we investigate an optimal control problem for McKean-Vlasov stochastic partial differential equations, in which the coefficients depend on the law of the state process. For systems with nonconvex control sets, we establish a Pontryagin-type stochastic maximum principle that provides necessary optimality conditions for admissible controls. The analysis is based on the classical spike variation method together with the introduction of an adjoint backward stochastic partial differential equation involving Lions derivatives with respect to probability measures. Our results extend the stochastic maximum principle for McKean-Vlasov controlled stochastic differential equations to the infinite-dimensional SPDE setting.

04.
arXiv (CS.CV) 2026-06-16

Wavelength-Multiplexed 2D Beam Steering via a Passive Diffractive Network

We introduce a wavelength-addressable diffractive optical network that transforms illumination wavelength into a high-dimensional control parameter for arbitrarily programmable 2D beam steering. The proposed passive architecture comprises cascaded spatially optimized diffractive layers, jointly designed using deep learning, to rapidly map distinct wavelengths to predefined/desired output angles. Unlike conventional single-layer dispersive optical elements, which are physically restricted to 1D linear mapping, this framework harnesses complex wavefront transformations to utilize the illumination wavelength as an intrinsic addressing key for arbitrary 2D beam steering, eliminating the need for mechanical scanning or electronic phase control. We numerically demonstrate wavelength-controlled beam steering across 625 wavelength channels spanning 400-750 nm, realizing a 25 x 25 array of independently addressable beam positions with subwavelength positioning accuracy and high channel fidelity. Unlike conventional gratings, which constrain wavelength routing to a linear trajectory, the proposed diffractive network performs nonlocal wavefront transformations, enabling arbitrary wavelength-to-angle mappings across a 2D field of view. We further validate the proposed framework experimentally in both the terahertz and visible spectral regimes, demonstrating wavelength-multiplexed beam steering using 3D fabricated passive diffractive layers at terahertz frequencies and phase-only spatial light modulators in the visible spectrum. This wavelength-addressable diffractive architecture establishes a compact and scalable paradigm for high-speed programmable beam steering, with potential applications in optical communications, routing, imaging, sensing, and emerging photonic information-processing systems.

05.
arXiv (CS.LG) 2026-06-19

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

arXiv:2505.18201v2 Announce Type: replace-cross Abstract: Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

06.
arXiv (CS.LG) 2026-06-16

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

arXiv:2606.17043v1 Announce Type: cross Abstract: When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse outcome to a single scalar reward or advantage signal, which conflates distinct forms of transition-level feedback and provides limited guidance once basic task success becomes achievable. First, a single scalar signal conflates the two objectives of viability and efficiency; once basic success is achieved, the binary label provides no gradient to distinguish efficient completions from slow ones. Second, real-world rollouts mix autonomous and intervention segments; naively assigning episode outcomes across these boundaries introduces incorrect credit assignment. To address these issues, we propose Hierarchical Advantage-Weighted Behavior Cloning (HABC), which trains separate critic heads for these two objectives on different data subsets and combines their outputs with a state-adaptive balance. A state-adaptive gate $g_t$ merges their one-step advantages, prioritizing viability when success is uncertain and shifting to efficiency only when viability is high, and converts the result into per-transition weights on the actor loss. Intervention-aware credit assignment further restricts outcome labels to segments executed by the current policy, preventing supervision from leaking across intervention boundaries. In real-robot experiments on three contact-rich bimanual tasks, HABC raises success from supervised fine-tuning (SFT) baselines of 36%, 44%, and 12% to 92%, 88%, and 38%.

07.
arXiv (CS.CV) 2026-06-17

Spatio-Temporal Fusion Model for Standard View Classification of Echocardiographic Videos

Automated classification of standard echocardiographic views is crucial for efficient clinical workflow but faces three main challenges. First, publicly available datasets are scarce and limited in scale and view coverage. Second, the performance of some modern video-level architectures for echocardiographic view classification remains underexplored. Third, some view categories exhibit highly similar spatial appearances, making single-frame features insufficient for discrimination, while heterogeneous frame quality complicates robust temporal information fusion. To address these challenges, we release the Echocardiographic Videos of Nine Views (EV9V) dataset, comprising 5,138 videos, 910,579 frames, and 9 standard views, which is, to the best of our knowledge, the largest publicly available echocardiography video dataset. Using EV9V, we systematically benchmark representative video classification architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. Furthermore, we propose a Spatio-Temporal Fusion Model (STFM), an efficient dual-stream CNN-LSTM (Long Short-Term Memory) framework that jointly captures spatial anatomical structures and temporal cardiac dynamics. The proposed framework leverages uncertainty-aware learning to preferentially sample representative video segments during training and evidence-based fusion during inference, improving robustness to variations in frame quality across echocardiographic videos. Extensive experiments demonstrate that our method achieves competitive performance across diverse video classification models, validating the effectiveness of uncertainty-aware spatio-temporal learning for echocardiographic view classification. The code is available at https://github.com/bgx666/stfm.

08.
arXiv (CS.LG) 2026-06-12

Computationally tractable robust differentially private mean estimation

作者:

arXiv:2606.12654v1 Announce Type: cross Abstract: We develop a new, differentially private mean estimator called the balloon mean. The main features of the balloon mean are that it is computationally tractable and enjoys robustness to outlying observations. It is based on an iterative clipping procedure over expanding Mahalanobis balls, or ``balloons.'' The method satisfies zero-concentrated differential privacy and depends on a small number of interpretable tuning parameters. We provide theoretical guarantees under heavy-tailed and contaminated elliptical models, characterizing its statistical performance and robustness to outliers. Extensive simulations demonstrate that the balloon mean is robust to heavy-tailed and contaminated data, and outperforms existing differentially private mean estimators in contaminated settings.

09.
arXiv (CS.LG) 2026-06-16

Sobolev Approximation by Fixed-Size Neural Networks with Arbitrary Accuracy

arXiv:2606.16975v1 Announce Type: cross Abstract: In this work, we investigate new activation functions for achieving arbitrary-accuracy Sobolev approximation by fixed-size neural networks. We first show that any function in $W^{2,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy, measured in the $W^{1,\infty}$-norm, by a fixed-size neural network using the Elementary Universal Activation Function ($\mathrm{EUAF}$). To extend this result to $W^{s,\infty}((a,b)^d)$ for $s\in\mathbb{N}$, we introduce a smooth activation $\mathrm{DUAF}_{\infty}$ from the family of Differentiable Universal Activation Functions ($\mathrm{DUAF}_n$). We prove that any function in $W^{s,\infty}((a,b)^d)$ can be approximated with arbitrary accuracy in the $W^{s-1,\infty}$-norm by a fixed-size $\mathrm{DUAF}_{\infty}$-activated network. We further construct sigmoidal variants $\widetilde{\mathrm{DUAF}}_n$ and show that, for every $1\leq s\leq n$, fixed-size $\widetilde{\mathrm{DUAF}}_n$-activated networks still approximate any $f\in W^{s,\infty}((a,b)^d)$ with arbitrary accuracy in the $W^{s-1,\infty}$-norm. In all these results, the width and depth bounds are computed explicitly, and the proposed activations are elementary.

10.
arXiv (CS.CL) 2026-06-12

KCSAT-ML: Probing Reasoning Models with Nationwide-Cohort Human Difficulty

Math reasoning benchmarks have proliferated, yet most lack a per-item difficulty signal grounded in actual human performance. We introduce KCSAT-ML, a decade (2014-2025) of Korean College Scholastic Ability Test (KCSAT; Suneung) mathematics: 664 problems with a 339-item core set carrying official per-item error rates from nationwide cohorts of hundreds of thousands of examinees. We pair the benchmark with Difficulty-aligned Reasoning Gain (DRG): a score-orthogonal metric that asks whether a model's mistakes concentrate on the items humans found hard, or on items humans found easy. Together they expose, across a wide range of VLMs (and LLMs via OCR), three patterns: (i) low-budget accuracy collapses on the high-human-error tail at every model size; (ii) test-time scaling (TTS) raises token use roughly linearly with cohort error rate, while accuracy gains follow a non-monotonic curve; (iii) within a single family, TTS flips between anti-scaling on the hardest items and overthinking on easier ones – two faces of the same alignment failure. On DRG, models with near-identical accuracy can sit at near-opposite values: one model gets wrong what humans also find hard, while another solves the hardest items yet fails on items humans find easy – a contrast that aggregate accuracy hides. Our code and dataset builder will be open-sourced at https://github.com/naver-ai/KCSAT-ML.

11.
arXiv (CS.AI) 2026-06-12

Strategic Decision Support for AI Agents

arXiv:2606.12587v1 Announce Type: new Abstract: Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools becomes support mechanisms around them. This role reversal brings reliability concerns to the forefront, since agentic errors can be consequential and agent behavior must remain aligned with human goals and constraints. Departing from the classical view of decision support, we revisit its two basic principles, the cost–value tradeoff of seeking support and the role of uncertainty quantification, in a setting where AI agents are the central actors. We propose a framework for strategic decision support for AI agents through an optimization problem that minimizes support usage subject to controlling a counterfactual missed-support error: the probability that the agent acts alone on instances where support would have materially improved its output. At the population level, we show that the optimal policy is a threshold rule on the value of support. Building on this structure, we develop an online algorithm that adaptively thresholds such a score and uses randomized exploration to control missed-support error without distributional assumptions. We further introduce a calibration-on-the-fly method that reduces unnecessary support calls online. We instantiate this framework across diverse scenarios, including information gathering, human–AI collaboration, and tool use, showing how each can be modeled through the same strategic decision-support lens. Experiments across these settings show that our method reliably controls the target error while substantially reducing support usage in practice.

12.
arXiv (CS.CV) 2026-06-11

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding

Reward models for text-to-video (T2V) generation guide post-training but often fail at fine-grained semantic alignment. We trace this to two structural weaknesses in existing reasoning-based reward models: they do not systematically verify every condition described in the prompt, and the visual evidence supporting each judgment remains implicit in their free-form reasoning. We propose SG-PVR, a video reward model that addresses these limitations through plan-and-verify reasoning grounded in spatio-temporal scene graphs. The verification plan decomposes the prompt into atomic claims, ensuring every requirement is checked. The spatio-temporal scene graph, encoding entities, attributes, and temporally-grounded relations, is extracted from the video and maintained as a persistent structured visual reference throughout reasoning. Each claim is verified against both the video and the scene graph, anchoring judgments in explicit visual evidence. SG-PVR achieves strong performance on semantic alignment, including fine-grained temporal semantics. As a test-time reranker, it further enhances compositional alignment in T2V generation.

13.
arXiv (CS.LG) 2026-06-11

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

arXiv:2606.12372v1 Announce Type: cross Abstract: Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL-RL frameworks remain intervention-intensive, relying on frequent human corrections to redirect the policy out of unproductive exploration, which incurs high labor cost and limits real-world scalability. To address this, we propose UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states, taking over the bulk of interventions from human operators. Specifically, UniIntervene first performs future-conditioned action-value estimation, predicting the latent consequence of the current action and evaluating its induced value, which provides a more stable progress signal. Building on this, a temporal value-risk critic aggregates recent value dynamics and triggers intervention when the estimated value exhibits sustained stagnation or degradation. When intervention is required, UniIntervene retrieves a high-value recovery target from a memory of past intervention episodes and produces executable corrective actions through a goal-conditioned recovery policy. In this way, UniIntervene turns intervention from passive human correction into a value-aware recovery process for efficient real-world RL. Extensive experiments on diverse real-world manipulation tasks demonstrate that UniIntervene improves the average success rate by 8.6% while reducing human interventions by 57% relative to state-of-the-art HiL-RL baselines.

14.
arXiv (CS.CL) 2026-06-24

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black-box optimization methods, which treat the model as an opaque function, relying solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees for robustness to data poisoning attacks and extraction attacks, while ensuring privacy by design. In experiments with LLMs, we demonstrate empirically that black-box optimization methods-despite the scalability and computational challenges inherent to black-box approaches-are able to learn, showing how a few iterations of BBoxER improve performance, generalize well on a benchmark of reasoning datasets, and are robust to membership inference attacks. This positions BBoxER as an attractive add-on on top of gradient-based optimization, offering suitability for deployment in restricted environments while also providing non-vacuous generalization guarantees.

15.
arXiv (CS.CL) 2026-06-12

Can Factual Opinions Be Edited (Manipulated) in Large Language Models?

Large Language Models (LLMs) are increasingly integrated into various domains, making knowledge editing techniques crucial yet potentially hazardous. Current editing methods primarily target atomic facts, overlooking the significant risks associated with manipulating factual opinions, e.g., documented stances of public figures on societal issues. Such manipulation could reshape public images, influence elections, and alter societal views. To systematically assess this threat, we introduce the Factual Opinion Editing with Evidence (FOE) benchmark, which encompasses 261 public figures, 19 issue categories, and 2,178 complete opinion records. Our evaluations demonstrate that current editing techniques struggle significantly with factual opinions, often achieving only superficial changes while failing to preserve consistency between the edited opinion and the supporting evidence generated by the model. To address this limitation, we further propose a simple yet effective Self-Generated Evidence-Aligned method that achieves opinion-evidence alignment without relying on explicit instructions. Together, our benchmark and method provide a foundation for understanding the emerging security implications of factual opinion editing in LLMs.

16.
arXiv (CS.CL) 2026-06-25

How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring

作者:

Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safety classifier trained for the task, or a general chat model prompted to grade. The judge is rarely checked. We check it. Using 596 human-labeled completions from the HarmBench classifier validation set, we compare the two judge families against human majority votes and then attack them. The two families fail in opposite ways. The dedicated classifier over-flags (precision 0.835, recall 0.974); three different LLM-as-judges keep high precision (0.81 to 0.94) but show erratic recall (0.06 to 0.65), so the same responses produce very different ASR depending on which judge scores them. The two families also differ sharply in robustness. Wrappers that leave the harmful text untouched and only add benign framing flip every LLM-judge between 57% and 100% of the time, and a single prepended refusal sentence accounts for much of this (39% to 88%). The dedicated classifier resists these surface attacks (at most 6.7%), but a white-box GCG attack on its open weights flips 70% of confident true positives (21 of 30; 95% CI 54 to 86%) even at a small optimization budget. A two-annotator audit confirms the attacks leave the harm intact: every one of 80 sampled flips still contained the harmful content. Because a large and growing share of reported ASR comes from LLM-judges, many such numbers are unreliable both on average and under deliberate pressure. We recommend that papers report judge precision and recall on a human-labeled slice, report ASR corrected for judge precision, and include an adversarial check of the judge. Our code is released.

17.
medRxiv (Medicine) 2026-06-24

Cognitive-emotional responses to ultrasonic neuromodulation of anterior cingulate cortex

The anterior cingulate cortex (ACC) is a key brain center involved in cognitive and emotional processing that is implicated in a variety of neuropsychiatric disorders including chronic pain and depression. Circuit-targeted diagnosis and treatment of these disorders will require the capacity to precisely modulate ACC subregions. Toward that end, we recently developed and validated a novel low-intensity transcranial focused ultrasound device that can noninvasively and directly modulate ACC subdivisions in humans with millimeter precision. Here we describe the subjective reports of 36 individuals diagnosed with either chronic pain or major depression who received repeated brief stimulation trials (807 active, 797 sham; duration 30s-3min) spanning the dorsoventral extent of the ACC. Sonication immediately altered cognitive-emotional states (odds ratio 5.6, active versus sham), eliciting a positive-valence experience more often than negative (29% versus 8%) in both diagnostic groups. Sham-adjusted response rate varied across ACC targets, with the largest effects (Cohen's d ~ 0.8) observed in pregenual and subgenual ACC in subjects with chronic pain and depression, respectively. These rapid trial-by-trial responses to ACC stimulation predicted subsequent improvements in pain and depression severity at 24 hours. Collectively, these findings reveal that transcranial ultrasound can robustly evoke immediate, target-specific, clinically meaningful changes in cognitive-emotional state, demonstrating the potential of ultrasonic neuromodulation as a tool for individualized probing of circuit function and dysfunction.

18.
bioRxiv (Bioinfo) 2026-06-18

Deciphering shared and divergent tissue architectures from cross-species spatial transcriptomics

作者:

The integration of spatial transcriptomics (ST) data across species is essential for cross-species and translational studies, but remains challenging due to molecular divergence and anatomical differences between organisms. We present STACAME, a graph attention autoencoder-based framework to decipher shared and divergent tissue architectures from cross-species ST data by explicitly modeling both orthologous and species-specific genes. STACAME aligns ST slices in a spatially aware manner, identifies homologous and species-specific domains, and enables a suite of downstream comparative analyses. We demonstrate its utility by integrating ST datasets from diverse tissues, including hippocampus, isocortex, embryo, breast, liver, and cerebellum, across multiple species such as human, macaque, marmoset, mouse, and zebrafish. STACAME supports cross-species spatial domain alignment, the detection of shared and divergent spatially variable genes, development alignment and comparison, and the 3D integration of tissue architecture. This flexible approach facilitates the translation of findings from model organisms to humans, providing a unified computational platform for cross-species spatial transcriptomics.

19.
arXiv (quant-ph) 2026-06-25

Halo-Independent Quantum Sensor Probes of Low-Velocity Dark Matter

arXiv:2606.25129v1 Announce Type: cross Abstract: We present a halo-independent framework for sub-GeV dark matter (DM) direct detection using quantum sensors with sub-eV energy thresholds. Such detectors enable access to low DM velocities and may be sensitive to departures from the Standard Halo Model that are challenging to probe with conventional direct DM detection experiments. The method expresses the DM scattering event rate in terms of a detector and particle model-dependent response function, and a universal halo function common to all experiments to be determined from data. This allows the local DM velocity distribution to be constrained. As representative implementations, we consider TES (Al) and MKID (TiN)-like sensors and show that their differing material responses probe complementary regimes of the DM velocity distribution. Applying the framework to mock data derived from several benchmark local halo models, we demonstrate how the assumed halo function could be reconstructed. This framework demonstrates the potential of quantum sensors as a new avenue for mapping the local DM velocity distribution.

20.
arXiv (CS.AI) 2026-06-24

Structural Kolmogorov-Arnold Convolutions: Learnable Function on the Values or the Filter Shape as Parameter-Efficient Alternative to Per-Edge Convolutional KANs

arXiv:2606.24371v1 Announce Type: cross Abstract: Convolutional Kolmogorov–Arnold Networks (KANs) replace the fixed weights of a convolutional kernel with learnable univariate functions. The dominant formulation attaches one such function to every kernel entry and lets it act on pixel values, expressive but parameter-heavy and prone to overfitting. We argue that the learnable functions are better placed in the structure of the convolution than on each edge, and we organise the design space along a single axis: whether the function acts on the pixel values or on the filter shape. We study three realisations. SV-KAN applies one shared univariate function to the values and leaves the spatial filter free and static, aa classical convolution with a single learnable shared activation. AG-KAN keeps the shared value function but supplies the spatial structure through a content-adaptive Gaussian gate. RF-KAN instead moves the learnable functions onto the filter shape, building each filter from oriented ridge profiles expanded in a localised oscillatory (Morlet) wavelet basis with content-adaptive amplitudes. Under a matched four-layer protocol with in-run references and three seeds, RF-KAN and SV-KAN reach $88.47\pm0.10\%$ and $88.20\pm0.31\%$ on CIFAR-10 and $64.40\pm0.19\%$ and $64.57\pm0.30\%$ on CIFAR-100, at about $0.4$M parameters. At this matched scale the shape model and the simplest value model meet at the top, both above a plain convolution and every per-edge KAN we tested, including the official Gram variant, at roughly a fifth of the parameters. A controlled study attributes the RF-KAN gain to an intrinsically localised oscillatory basis and to content adaptivity, and an ablation that removes the learned shape entirely, leaving only the shared value function, collapses accuracy by over forty points, identifying the learned shape as the load-bearing ingredient at this scale.

21.
arXiv (CS.CV) 2026-06-11

XPR: An Extensible Cross-Platform Point-Based Differentiable Renderer

Point-based differentiable rendering underpins modern 3D reconstruction, novel-view synthesis, and learning-based graphics pipelines, but developing new rendering methods often requires extensive low-level implementation, hardware-specific kernels, and manually written backward passes. This limits rapid prototyping, reproducibility, exploration, and deployment, especially across diverse hardware platforms. This paper presents XPR, an extensible cross-platform framework for point-based differentiable rendering. XPR introduces a high-level programming interface that separates method-specific logic from the shared rendering pipeline, allowing users to implement new methods in a few lines of code. Its pipeline decomposes rendering into modular, statically shaped parallel operations that can be lowered by a cross-platform compiler to GPUs, TPUs, CPUs, and other ML accelerators. We demonstrate implementations of 3DGS, 3DGUT, and LinPrim, with only a few 100s lines of Python code, each of which can be compiled to a range of hardware platforms with the XLA compiler. These results show that XPR enables fast experimentation and portable execution for emerging point-based differentiable rendering systems.

22.
arXiv (CS.CV) 2026-06-17

Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent

The primary tools used to monitor and defend object detectors under adversarial attack assume that when accuracy degrades, detection count drops in tandem. This coupling was assumed, not measured. We report a counterexample observed on a single model: under standard PGD, EMS-YOLO, a spiking neural network (SNN) object detector, retains more than 70% of its detections while mAP collapses from 0.528 to 0.042. We term this count-preserving accuracy collapse Quality Corruption (QC), to distinguish it from the suppression that dominates untargeted evaluation. Across four SNN architectures and two threat models (l-infinity and l-2), QC appears only in one of the four detectors tested (EMS-YOLO). On this model, all five standard defense components fail to detect or mitigate QC, suggesting the defense ecosystem may rely on a shared assumption calibrated on a single substrate. These results provide, to our knowledge, the first evidence that adversarial failure modes can be substrate-dependent.

23.
arXiv (math.PR) 2026-06-25

Uniform Consistency of Generalized Fréchet Means

arXiv:2408.07534v2 Announce Type: replace-cross Abstract: Loss-based notions of centre on nonlinear spaces range from the Fréchet mean and power means to the geometric median and, in a limiting sense, the Chebyshev centre. To use such summaries statistically, one first needs a law of large numbers that remains valid beyond smooth manifolds and beyond a fixed choice of loss. We study generalized Fréchet means on metric spaces with the Heine–Borel property, obtained by replacing squared distance with a convex loss under a mild exponential-growth condition. We prove existence and compactness of the population mean set, establish a sharp diameter bound, obtain almost-sure consistency of empirical $\phi$-means, and derive a uniform strong law over compact classes of losses. The analysis is driven by a deterministic argmin principle together with a Glivenko–Cantelli theorem for monotone classes. For isotropic densities on Riemannian symmetric spaces, we identify the population $\phi$-mean for every strictly increasing loss for which the objective is finite, including bounded robust losses. We also illustrate the framework on spheres and on the polyhedral space of ultrametric phylogenetic trees.

24.
arXiv (CS.AI) 2026-06-24

Context-Aware Prediction of Student Quiz Performance with Multimodal Textbook Features

作者:

arXiv:2606.24770v1 Announce Type: cross Abstract: Educational platforms often predict student performance from prior interactions, but the assessment content itself also varies in linguistic and visual complexity. This paper studies whether lightweight content features extracted from CourseKata chapter-review questions improve prediction of end-of-chapter quiz scores beyond a student's average prior exercise performance. The study combines 2023 CourseKata student response data with chapter-level text features from review-question wording and image features from textbook visuals. Across 4,742 student-chapter observations from 562 class-student IDs, adding content features improves student-grouped five-fold quiz prediction performance by 9.1% relative to a prior-performance baseline. In leave-chapter-out validation, text features reduce prediction error relative to the baseline, while image-containing models have higher error. This paper suggests that a context-aware model adds useful signal about the text and visual features of questions to better predict student quiz performance compared with using past student performance alone.

25.
arXiv (CS.LG) 2026-06-24

Hybrid Sequence Modeling and Reinforced Verification for Controllable Target-Conditioned Decision Making

arXiv:2508.16420v3 Announce Type: replace Abstract: Target-conditioned sequence models provide a simple interface for controllable offline decision making, but the requested target return can be an unreliable control signal, especially when the target return lies in underrepresented regions of the dataset. This paper proposes Doctor, a hybrid sequence modeling and reinforced verification framework for controllable target-conditioned offline decision making. Doctor trains a shared masked trajectory Transformer with two complementary objectives: masked trajectory reconstruction for candidate generation and in-sample value learning for action-value verification. At inference time, the model samples multiple nearby target returns, generates candidate actions in parallel, and selects the action whose verified value is closest to the requested target return. We analyze this verifier-guided selection rule and show that its value-level alignment error is bounded by candidate-value coverage around the target return and verifier accuracy. Experiments on D4RL and EpiCare show that Doctor improves target-return alignment under reduced high-return coverage, remains competitive on standard offline return-maximization benchmarks, and enables a single policy to modulate between conservative and aggressive operating points in a simulated clinical decision-making task. These results suggest that reinforced verification can improve the controllability of target-conditioned policies.