Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature (Science) 2026-06-12

‘Student Geng’ ignites research-integrity scandal in China after calling out senior academics<b> </b>

Authors:

Video blogger’s viral accusations of data manipulation in Nature journals have sparked intense debate and speedy institutional investigations. Video blogger’s viral accusations of data manipulation in Nature journals have sparked intense debate and speedy institutional investigations.

02.
arXiv (CS.AI) 2026-06-16

RetailBench: Benchmarking long horizon reasoning and coherent decision making of LLM agents in realistic retail environments

arXiv:2606.15862v1 Announce Type: new Abstract: Large language model (LLM) agents have made rapid progress on short-horizon, well-scoped tasks, yet their ability to sustain coherent decisions in dynamic long-horizon environments remains uncertain. We introduce RetailBench, a data-grounded simulation benchmark for evaluating tool-using LLM agents in single-store supermarket operation. RetailBench models retail management as a partially observable decision process and is designed to support thousand-day-scale simulations. In this environment, agents must manage pricing, replenishment, supplier selection, shelf assortment, inventory aging, customer feedback, external events, and cash-flow constraints. We evaluate seven contemporary LLMs under representative agent frameworks over a 180-day evaluation horizon and compare them with a privileged oracle policy. Results show substantial variation across models: only a small subset survives the full evaluation horizon, and even the strongest LLM runs remain substantially behind the oracle policy in final net worth and sales outcomes. Behavioral analysis attributes these gaps to incomplete evidence acquisition, surface-level decision making, and the lack of a consistent long-horizon policy. RetailBench provides a controlled testbed for studying reliable autonomy in economically grounded long-horizon decision-making.

03.
arXiv (CS.LG) 2026-06-15

Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It

arXiv:2606.14488v1 Announce Type: cross Abstract: Recent finite-time analyses of nonlinear two-time-scale stochastic approximation show that under contractive assumptions the slow iterate $Y_k$ with stepsizes $\beta_k=\Theta(k^{-1})$ and $\alpha_k=\Theta(k^{-a})$, $a\in(1/2,1)$, generally satisfies a mean-square rate of order $k^{-a}$; decoupled $k^{-1}$ rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order $1+\rho$ ($\rho\in[0,1]$), the uncorrected recursion satisfies \[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \] and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled $k^{-1}$ rate is guaranteed for the uncorrected recursion exactly when $a(1+\rho)\ge 1$. This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator \[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \] and subtracting $M_k$ from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves $\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$ for every $\rho\in[0,1]$, including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage, and one for the tracked nonlinear bias.

04.
arXiv (CS.AI) 2026-06-16

The Distributed Detectability Band Against Marginal-Preserving Attacks

arXiv:2606.10456v2 Announce Type: replace-cross Abstract: AI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to benign, so mean, max, top-k tail, and threshold monitors (Monitor A) are defeated by construction, while harm is encoded in the temporal correlation structure. We sequence the paper around three reviewer-mandated gates. (1) Realizability gate: the stealthy attack achieves KS-distance to benign of 0.013 (effectively zero) at all tested harm levels up to 3.0, confirming that harm is fully decoupled from the per-step marginal and realizability is not harm-limited. (2) Monitor-A-vs-B reconciliation: we show formally that the attack, built against Monitor A's score marginal, remains marginal-preserving under a different-score Monitor B (the correlation/sequence family: CUSUM, SPRT, HMM-LR, runs test, autocorrelation, windowed logistic), and scope worst-case claims to score functions that admit a temporal signature. (3) Non-empty detectability band: Monitor A achieves AUC 0.52 (chance); Monitor B spans AUC 0.79-0.97 at the same 1% FPR target, and as harm is amortized over more steps Monitor A collapses to chance while Monitor B holds at AUC ~0.95. These results demonstrate a non-empty detectability band and characterize the sub-threshold sabotage frontier: distribution-shape monitors fail by construction; temporal-correlation monitors can detect but are not trivially optimal.

05.
arXiv (CS.CV) 2026-06-19

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

Memory remains a critical bottleneck for long-horizon robotic manipulation, as standard Vision-Language-Action (VLA) policies often fail when task-relevant cues become occluded or unobservable over time. While existing memory-augmented methods utilize historical context, they either suffer from severe information bottlenecks, incur high latency via decoupled dual systems, or rely on unselective buffers that accumulate massive visual redundancies. To address these limitations, we introduce EventVLA, an end-to-end framework founded on the concept of sparse visual evidence memory that comprises two core components: foundational visual anchors to retain initial and short-term contexts, and a dynamic Keyframe Evidence Memory (KEM) module. Specifically, KEM directly predicts future keyframe probabilities from the VLA's latent embeddings to autonomously capture and store sparse, task-critical visual events. This foresight-driven mechanism empowers the policy to dynamically evaluate the future causal utility of current observations, preserving transient visual evidence before it becomes unobservable. Furthermore, we propose RoboTwin-MeM, a diagnostic benchmark specifically designed to evaluate non-Markovian manipulation tasks with interactive visual evidence. Extensive evaluations show that across 17 memory-requiring simulation tasks and 4 real-world bimanual tasks, EventVLA achieves an average success rate improvement of +40% over state-of-the-art memory-augmented VLAs.

06.
arXiv (CS.LG) 2026-06-11

Triangular-Reference Schrödinger Bridges for Time Series Generation

arXiv:2605.27478v3 Announce Type: replace-cross Abstract: Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya–Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

07.
arXiv (quant-ph) 2026-06-16

Hardy and Cabello Arguments in Spatial and Temporal Frauchiger-Renner Scenarios

arXiv:2606.15467v1 Announce Type: new Abstract: We investigate Hardy- and Cabello-type logical structures within spatial and temporal extensions of the Frauchiger–Renner (FR) framework, embedding these constructions directly into the FR multi-observer architecture. In the spatial multi-observer scenario, both Hardy and Cabello contradictions arise, with the Cabello construction yielding the stronger violation,$\(\Delta_Cabello^{\max}=0.1078\)$, which exceeds the maximal Hardy probability $\(P_{H}^{\max}=\frac{5\sqrt{5}-11}{2}\approx 0.09017\)$. We then develop a sequential temporal FR protocol based on coherent multi-observer measurements performed on a single spin-$\tfrac12$ system. In this temporal setting, the Hardy contradiction disappears identically due to dynamical constraints imposed by sequential state updates, whereas a finite Cabello-type violation survives, \(\Delta_Cabello^{\max}\approx 0.0674\). Our results establish a fundamental structural distinction between spatial entanglement and temporal multi-observer correlations in FR-type logical scenarios, and demonstrate that certain observer-independent description failures persist even without spacelike separation.

08.
arXiv (CS.AI) 2026-06-16

STRIDE: Strategic Trajectory Reasoning via Discriminative Estimation for Verifiable Reinforcement Learning

arXiv:2606.15866v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become an effective post-training paradigm for improving the reasoning abilities of large language models. However, existing RLVR methods typically rely on final-answer correctness to assign trajectory-level rewards, providing sparse supervision and treating all tokens uniformly regardless of their actual contribution to reasoning. Although recent studies introduce intermediate signals such as process rewards, high-entropy tokens, and semantic uncertainty, these signals are often not inherently verifiable and may fail to distinguish beneficial strategic patterns from harmful ones. To address this limitation, we propose STRIDE (Strategic Trajectory Reasoning with Discriminative Estimation), a fine-grained RLVR framework that derives strategic reasoning supervision from verifiable outcomes. STRIDE contrasts successful and failed trajectories within each response group to estimate the outcome-discriminative preference of each $n$-gram strategic pattern, and further combines this signal with reasoning saliency entropy to identify decision-relevant strategic patterns. These patterns are assigned differentiated advantage values during RL optimization, enabling more precise credit assignment while preserving the verifiability of RLVR. Extensive experiments demonstrate that STRIDE consistently improves reasoning performance across diverse models, tasks, and extended settings, including VLMs and agent-based systems.

09.
arXiv (CS.AI) 2026-06-16

Minimalist Genetic Programming

arXiv:2606.10237v2 Announce Type: replace Abstract: Genetic programming (GP) is based on two important insights. First, that any learning task can fundamentally be posed as a program induction problem, where the goal is to construct a symbolic hierarchical model that is expressed as a syntax tree. Second, to pose this task as a search problem, and use evolution to locate the desired model. Since it was proposed, GP has produced notable results in a wide range of tasks and problem domains. This work presents an alternative view by modifying the second core insight of GP, posing the problem as a syntactic derivation task instead. In particular, this paper presents Minimalist Genetic Programming (MGP), an algorithm that like GP is biologically inspired, but instead of evolution it takes inspiration from the Minimalist Program to human language, in which syntax is understood as an optimal solution to the problem of linking two other mental systems. In minimalism, the core computational process is a binary set formation operator called $MERGE$, than can be used to incrementally construct complex syntactic structures using a simple Markovian process. MGP is able to discover the core building blocks of the symbolic expressions, and to incrementally combined them using $MERGE$. The proposed system is benchmarked on symbolic regression tasks that are known to be difficult to solve with standard GP systems because of the propensity for bloat. Results show that when a proper lexicon of atomic syntactic objects are chosen, MGP is able to consistently produce the exact ground truth model on a set of symbolic regression tasks where standard GP struggles to do the same. The insights provided by minimalism are shown to be relevant to the problem of program induction, and should be explored further based on the potential exhibited by MGP in this work.

10.
arXiv (CS.AI) 2026-06-11

Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends

arXiv:2606.12207v1 Announce Type: cross Abstract: Embodied intelligence now spans navigation, household assistance, manipulation, autonomous driving, aerial agents, and multimodal large-model control. This expansion has made benchmark construction a central bottleneck for reliable evaluation. Unlike static datasets, embodied benchmarks combine task specifications, environments, robot data, demonstrations, annotations, metrics, evaluation scripts, and release policies into a single evaluation system. This survey reviews the literature through a five-stage construction pipeline: requirement and task construction, data acquisition, data cleaning and annotation, benchmark suite generation and metric definition, and evaluation execution with diagnostic feedback. For each stage, the survey analyzes the transition from manual curation to traditional automation, foundation-model assistance, and agentic closed-loop workflows. It also compares qualitative construction costs across human labor, data and asset acquisition, compute and simulation, validation and debugging, governance and maintenance, and rework risk. The main conclusion is that automation does not simply reduce benchmark cost. Instead, it often shifts cost toward validation, auditability, version control, and long-term governance. Progress in embodied evaluation will therefore depend not only on larger benchmark suites, but also on construction pipelines that are diagnosable, auditable, and responsibly refreshable.

11.
arXiv (math.PR) 2026-06-12

Interference Queueing Networks: A Replica Mean-Field Approach in the Symmetric Setting

arXiv:2606.13264v1 Announce Type: new Abstract: We propose a model for evaluating the performance of wireless communication networks beyond the ubiquitous full-buffer assumption, under which every transmitter is always active. The network is represented by N interacting queues arranged on a torus, with homogeneous arrival rate and service rates depending on the activity of neighboring interferers. More precisely, each queue is associated with a transmitter-receiver pair, and its service rate is given by the Shannon capacity, which depends on the corresponding Signal-to-Interference-plus-Noise Ratio (SINR). Since interfering transmitters only emit when their queue is non-empty, the SINR and hence the service rate improves when neighboring queues are empty. We derive the stability region of the system, together with approximations of its stationary distribution and its exponential rate of convergence to stationarity. These approximations are obtained via a replica mean-field limit, for which we establish propagation of chaos and long-time behavior results.

12.
arXiv (CS.LG) 2026-06-17

On Randomized Algorithms in Online Strategic Classification

arXiv:2602.06257v2 Announce Type: replace Abstract: Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal H|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings; our bounds depend on the Littlestone dimension $\mathrm{Ldim}(\mathcal H)$ of the hypothesis class $\mathcal H$ and the maximum degree $\Delta$ of the manipulation graph. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal H) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal H) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We then provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta \log \Delta)$. In the agnostic setting, we give an improper randomized learner that improves the regret upper bound to $O(\sqrt{T\log|\mathcal H|})$, matching the standard online learning rate. We also show a larger lower bound for all proper learning rules, demonstrating that improperness is necessary to achieve the optimal rate.

13.
arXiv (CS.AI) 2026-06-15

Discovery under Hypothesis Redundancy: A Geometric Theory of Discovery Bottlenecks

arXiv:2606.14386v1 Announce Type: cross Abstract: Scientific discovery saturates when new hypotheses cease to provide independent information, even if the nominal hypothesis space remains large. We study hybrid discovery systems that combine structured local search with LLM-generated non-local proposals and pose the Search Compression Hypothesis: non-local exploration helps only when three geometric conditions co-occur: spectral compression, orthogonal escape from the explored span, and residual signal alignment with the target. We formalize these conditions, derive necessary conditions for hybrid advantage, and test the mechanism in controlled synthetic environments, large-scale A-share factor discovery, and symbolic-regression benchmarks; a public tabular operational sanity check tests the associated budget-allocation implication. Signal-planting and directed-versus-random experiments show that novelty alone is insufficient: random orthogonal jumps expand coverage but do not improve yield without predictive alignment. Across compression sweeps, real factor archives, and LLM-SRBench tasks, hybrid gains concentrate in weakly represented but target-bearing directions and vanish as the hypothesis space approaches full rank. The framework turns LLM-guided discovery from generic novelty search into a diagnostic procedure for deciding when directed non-local exploration is warranted.

14.
arXiv (CS.CL) 2026-06-15

Token-Level LLM Collaboration via FusionRoute

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.

15.
arXiv (CS.AI) 2026-06-17

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

arXiv:2606.17266v1 Announce Type: new Abstract: Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard threshold certification, forgetting, and capacity-consuming training actions constrained by the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies with budget variants over 60-shift horizons with paired statistical tests. The results are regime-dependent rather than a ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan, a deliberately favorable comparator whose structure encodes relevant skill contingencies, acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No policy class dominates across regimes, motivating forecast-driven controllers that decide when to buy skill insurance and when to react.

16.
arXiv (CS.CV) 2026-06-18

Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning

Unified multimodal models (UMMs) interleave generated ''visual thoughts'' (VTs) with text reasoning to improve spatial tasks. This incurs roughly an order-of-magnitude inference cost from multi-step diffusion. We find this cost yields limited direct benefit. On ThinkMorph, removing or noising VTs barely changes accuracy across nine benchmarks. Once rendered, attention concentrates on the VT regardless of content. Yet a KL diagnostic shows that conditioning on a privileged VT trace shifts the model's completion distribution. This suggests the generation pathway encodes useful reasoning beyond the rendered pixels. Motivated by this gap, we propose Visual On-Policy Self-Distillation(Visual-OPSD). Teacher and student share identical weights but differ in context: the teacher sees privileged VTs while the student sees only the question. Token-level JSD distillation on on-policy student trajectories transfers the teacher's reasoning to a text-only student. Across nine benchmarks, Visual-OPSD improves over its generative teacher by $+3.40$pp with $14.3\times$ speedup (10.0s vs. 142.8s per sample) and outperforms same-scale VLMs by $+63.83$pp on VSP. A Gaussian-noise control ($+0.40$pp vs. $+10.28$pp for real VTs) and $58.4\%$ closure of the KL gap confirm that gains come from the semantic content of the generation pathway.

17.
arXiv (math.PR) 2026-06-11

Persistent Homology of the Planar Wiener Sausage: Brownian Scaling and a Logarithmic Expectation Law

arXiv:2606.11248v1 Announce Type: new Abstract: We study degree-one persistent homology of the planar Wiener-sausage filtration generated by standard Brownian motion without drift. In the drifted case, regeneration along the drift direction leads to linear-in-time laws for persistent-homological observables. In the recurrent zero-drift case, this renewal structure disappears. The organizing mechanism is instead Brownian self-similarity: the persistence diagram at time $T$ is equal in law to the image of the unit-time diagram under spatial dilation by $\sqrt T$. Consequently, large-time questions on fixed radius windows are transformed into small-radius questions for the unit-time Brownian trace. Let $B$ be standard planar Brownian motion, let $K_T=B\left(\left[0,T\right]\right)$, and let $K_T^{\left(r\right)}$ be the radius-$r$ Wiener sausage. Since $K_T^{\left(r\right)}$ is connected, its first Betti number $\beta_1^T\left(r\right)$ is the number of bounded complementary components of $K_T^{\left(r\right)}$. For a bounded nonnegative Borel function $\psi$ supported in a compact interval $\left[a,b\right]\subset\left(0,\infty\right)$, we consider the smoothed Betti-curve observable $\left[r_0,r_1\right] \mathrm{\Phi}_\psi \left(T\right) = \int_{r_0}^{r_1} \beta_1^T \left( r \right) \psi \left( r \right) dr$. We prove that there exist absolute constants 0

18.
arXiv (CS.CV) 2026-06-17

SPHINX: First Explain, Then Explore

Generating adversarial driving scenarios is critical for evaluating and improving autonomous vehicle decision-making systems in simulation. Recent approaches, such as ChatScene and LLM-Attacker, rely primarily on the prior knowledge of Large Language Models and Vision-Language Models to generate driving scenarios procedurally. We argue that adversarial scenes should be generated based on the failure diagnosis (e.g., indecisiveness, multi-frame inconsistency) of the driving policy to specifically address the policy's weaknesses instead of relying on prior assumptions. In this paper, we propose SPHINX, a closed-loop framework for adversarial scenario synthesis guided by a simple principle: first explain, then explore. Beyond blindly exploring the scenario space, SPHINX leverages explainable artificial intelligence methods to analyze the policy, identifying key visual concepts and their influence on policy outputs, and the uncertainty of the decisions. Given the interpretable evidence extracted from the policy's own decision process, we use a vision language model to rationalize and criticize failure modes of the current policy. These critics are then used to generate targeted adversarial scenarios for policy retraining and improvement. We demonstrate that SPHINX can highlight an interpretable account of policy failures while other adversarial scene generation cannot. Across the evaluated benchmarks and test suites, SPHINX can be applied to diverse state-of-the-art autonomous vehicle architectures and yields consistent robustness improvements over existing scenario-generation methods.

19.
arXiv (CS.AI) 2026-06-16

Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

arXiv:2606.15762v1 Announce Type: cross Abstract: We ran 300 repeated vulnerability-finding scans to measure how repeatable agentic large language model (LLM) security review is on the same JavaScript code, prompt, and benchmark harness. The headline result is that LLM security findings were unevenly repeatable: reference-matched findings were stable, but extra model reports varied heavily from run to run. Across 250 model runs, 80 of 161 unique unmatched findings appeared in only one of five identical repetitions, while only 22 appeared in all five. By contrast, when Claude matched a Snyk Code reference finding, the behavior was much more stable: 134 of 158 unique reference-matched findings appeared in all five repetitions. The benchmark also shows complementarity. Models consistently found familiar, high-signal exploit shapes, and in one case surfaced a likely Snyk Code product gap. Snyk Code static application security testing (SAST) was deterministic and better at systematically enumerating repeated data-flow sinks. The results support combining agentic LLM review with deterministic SAST rather than treating either technique as a replacement for the other.

20.
medRxiv (Medicine) 2026-06-16

The Target48 Neurodegeneration Panel: A Novel Tool for Profiling Protein Signatures in Neurodegenerative Disorders

Introduction: Novel tools for absolute quantification of established and emerging fluid neuro-biomarkers are required to advance diagnostic studies and improve biological insights. Methods: We conducted an extensive analytical and clinical validation of the Olink Target 48 Neurodegeneration panel (T48 Neuropanel) in 352 paired CSF and plasma samples from cognitively unimpaired controls (CU), Alzheimer dementia (AD), frontotemporal dementia (FTD), and dementia with Lewy bodies (DLB), n=44 per group. Comparisons with benchmark assays were performed. Results: Good detectability (CSF: 31 out of 42 assays; plasma: 38 out of 42 assays) and technical performance was observed. Benchmark assays showed good correlations, supporting method transformation formulas. Next to emerging biomarkers (MMP10, ITGB2), discriminative performance was excellent in AD: CSF pTau217: AUC=1; FTD: plasma NfL: AUC=0.952; and DLB: CSF DDC: AUC=0.901. Discussion: This analytical and clinical validation of the T48 Neuropanel highlights initial cut-offs and emerging biomarkers to aid clinical studies for the diagnosis, prognosis, and monitoring of neurodegenerative diseases. Highlights: The T48 Neuropanel shows robust analytical performance, with high detectability across both plasma and CSF matrices. The T48 Neuropanel validates established (i.e., pTau217, Abeta42, NfL, and GFAP) and emerging biomarkers (i.e., DDC, MMP10, ITGB2, ITGAM, NPTX2, NPTXR, SMOC1, sTREM1, and sTREM2) in CSF and plasma. CSF NfL, GFAP, ITGB2, and ITGAM and plasma GFAP were dysregulated across AD, FTD, and DLB dementias. -The multiplex design of the T48 Neuropanel enables rich biological interpretation by simultaneously quantifying established and emerging neurodegeneration biomarkers. Importantly, the inclusion of absolute quantification facilitates the establishment of cut-offs, supporting its potential for clinical translation.

21.
arXiv (CS.CV) 2026-06-15

Boundary-Centric Clip-Budgeted Active Learning for Temporal Action Segmentation

Temporal action segmentation (TAS) in untrimmed videos requires dense temporal supervision. However, most of the annotation cost is spent identifying action transitions where segmentation errors concentrate and small temporal shifts can disproportionately degrade segment-level metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these error-prone boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queries unlabeled videos using predictive uncertainty, and (ii) within each selected video, it detects candidate transitions from the current model predictions and selects the top-$K$ boundaries via a novel boundary score. The boundary score fuses neighborhood uncertainty, class ambiguity, and temporal prediction dynamics to reveal the underlying importance of each frame. Importantly, our annotation protocol requests labels only at the boundary frames while still training on boundary-centered clips to exploit temporal context through the model's receptive field. Extensive experiments on GTEA, 50Salads, and Breakfast demonstrate that boundary-centric supervision delivers strong label efficiency and consistently surpasses representative TAS active learning baselines and prior state of the art under sparse budgets. Gains are largest on datasets where performance is highly sensitive to boundary placement, as measured by edit and overlap-based F1 metrics.

22.
arXiv (CS.AI) 2026-06-16

A Model-Free Universal AI

arXiv:2602.23242v3 Announce Type: replace Abstract: In general reinforcement learning, all established optimal agents, including AIXI, are model-based, explicitly maintaining and using environment models. This paper introduces Universal AI with Q-Induction (AIQI), the first model-free agent proven to be asymptotically $\varepsilon$-optimal in general RL. AIQI performs universal induction over distributional action-value functions, instead of policies or environments like previous works. Under a grain of truth condition, we prove that AIQI is strong asymptotically $\varepsilon$-optimal and asymptotically $\varepsilon$-Bayes-optimal. We also apply our novel proof techniques to show asymptotic $\varepsilon$-optimality of Self-AIXI without any ad-hoc assumptions. Our results significantly expand the diversity of known universal agents.

23.
arXiv (quant-ph) 2026-06-16

Quantum speedup from nonclassical polarization

arXiv:2603.23124v2 Announce Type: replace Abstract: We develop a framework for identifying nonclassical speedups in systems with polarization, likewise spin degrees of freedom. By confining the dynamics to the manifold of angular momentum coherent states, which act as the classical reference in this case, we compute the speed limit that bounds the rate of change of the state achievable without generating quantum coherence. A comparison with the unrestricted quantum speed limit enables the quantitative identification of speedups arising from polarization nonclassicality. We apply this framework to the cross-Kerr interaction, demonstrating a persistent speedup scaling as $\mathcal{O}(\sqrt{N})$ with the photon number $N$ with a parity effect in favour of even photon numbers. The results establish polarization nonclassicality as a genuine dynamical resource, linking quantum coherence to quantum-enhanced evolution speeds in nonlinear photonic systems.

24.
arXiv (CS.AI) 2026-06-16

LLM-as-Code Agentic Programming for Agent Harness

arXiv:2606.15874v1 Announce Type: new Abstract: Every major LLM agent framework gives the LLM the role of orchestrator; the model decides what to do next, when to call tools, and when to stop. We argue that token explosion, control-flow hallucination, and unreliable completion are not implementation bugs but architectural consequences of assigning the deterministic work of looping, branching, and sequencing to a probabilistic system. A better prompt or a stronger model cannot guarantee the reliability of the LLM agent. We therefore propose Agentic Programming, in which the program governs all control flow, and the LLM is itself part of it, an adaptive component we call LLM-as-Code and invoke only where a task calls for reasoning or generation. Within each call the model keeps full flexibility, but it cannot alter the program's execution path. With control in the program, the LLM's context is built from the execution history's call tree and forms a directed acyclic graph (DAG). Each call's context length is then determined by its call depth rather than by accumulation over steps. A case study of computer-use agents shows that the design is practical, not just a theoretical stance, substantially improving the stability of long visual operation sequences.

25.
arXiv (CS.LG) 2026-06-11

Efficient Multinomial Logistic Bandit via Frequent Directions

arXiv:2606.11968v1 Announce Type: new Abstract: This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $\tilde{\mathcal{O}}(Kd\sqrt{T})$, but still requires $\mathcal{O}(K^3d^3)$ time and $\mathcal{O}(K^2d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $\mathcal{O}(Kd(m+K)^2)$ and space complexity $\mathcal{O}(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.