Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-18

Continual Adaptation for Pacific Indigenous Speech Recognition

Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity. Furthermore, full fine-tuning risks catastrophic forgetting. To address this gap, we present an empirical study adapting models to real-world Pacific datasets. We investigate the impact of data volume, adaptation strategies, and representational drift on speech foundation models for various Pacific languages. Additionally, we analyze a continual learning framework for sequential language acquisition. Empirical results across three distinct Pacific Indigenous languages demonstrate that adapting to these linguistically distant languages induces severe internal representational drift. Consequently, these models face a strict plasticity and stability dilemma. While LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. Ultimately, this study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.

02.
arXiv (CS.CL) 2026-06-11

Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining

Short pretraining runs can reduce experimental cost, but they can also over-promote configurations that only look strong at tiny budgets. We study an auditable staged-promotion protocol for a fixed micro-pretraining runner on two heterogeneous host blocks: Windows A100 and Linux L40S. Starting from twelve prior-screened configurations, we use staged budgets of 2 minutes, 5 minutes, 10 minutes, 60 minutes, and 12 hours, with frozen promotion rules before expensive continuations. The early screens are intentionally treated as unstable: the 5- and 10-minute rankings are host-sensitive, and the eventual 12-hour top-ranked condition is not the mean-best condition at the replicated 10-minute gate. Because seed ranges differ across stages, these changes are operational promotion evidence, not within-seed curves. A replicated 60-minute gate keeps the Staged Factorial Screening bridge reference in the promoted set, where it ranks first in all four 60-minute host-seed cells. In the final 12-hour confirmation package, the bridge condition ranks first in all four host-seed cells across two seeds; the greedy comparator does not meet the frozen 0.010 val_bpb near-equivalence rule; and the cheaper d8/ar48 (depth-8, aspect-48) sentinel does not meet the frozen 0.020 mean-gap rule. The executed 12-hour branch spends 144 GPU-hours, and the full staged protocol records 169.2 training GPU-hours including screening stages. Continuing all four 60-minute candidates would spend 192 GPU-hours, while continuing all nine replicated 10-minute candidates would spend 432 GPU-hours. The latter numbers are accounting counterfactuals for unrun continuations, not evidence that skipped candidates could not have overtaken the reference. The result is a bounded cost-allocation finding, not a claim of global optimality, capacity-normalized superiority, or superiority over adaptive hyperparameter optimization methods.

03.
arXiv (quant-ph) 2026-06-25

Quantum metrology of electric and magnetic dipole moments: ultimate limits and optimal regimes

arXiv:2606.25510v1 Announce Type: new Abstract: The characterization of electric and magnetic dipole moments (EDM and MDM) in quantum systems is central to fundamental physics and quantum sensing. While EDM searches provide powerful probes of CP violation within and beyond the Standard Model, precise MDM estimation is crucial for high-precision magnetometry and the development of quantum sensors. In this work, we address the ultimate precision limits for separate and simultaneous estimation of both dipole moments in a generic two-level system coupled to electromagnetic fields. We analyze three classes of quantum probes/strategies: unitary and depolarizing dynamics, and thermal equilibrium states. For each, we derive the quantum Fisher information (matrix), identify optimal probes, and determine the ideal operating conditions, such as evolution times and temperatures, that maximize estimation precision. We further assess the compatibility and sloppiness of the statistical models, showing that orthogonal dipole moments configurations enable joint estimation of EDM and MDM, whereas parallel configurations are intrinsically sloppy, permitting only the estimation of a single parameter combination. Our results provide a unified metrological framework for estimation schemes ranging from neutron EDM searches to molecular magnetometry, and highlight the distinct roles of coherence, noise, and thermalization in multiparameter quantum sensing of dipole moments.

04.
arXiv (CS.CL) 2026-06-24

An Approach to Simultaneous Acquisition of Real-Time MRI Video, EEG, and Surface EMG for Articulatory, Brain, and Muscle Activity During Speech Production

Speech production is a complex process spanning neural planning, motor control, muscle activation, and articulatory kinematics. While the acoustic speech signal is the most accessible product of the speech production act, it does not directly reveal its causal neurophysiological substrates. We present the first simultaneous acquisition of real-time (dynamic) MRI, EEG, and surface EMG, capturing several key aspects of the speech production chain: brain signals, muscle activations, and articulatory movements. This multimodal acquisition paradigm presents substantial technical challenges, including MRI-induced electromagnetic interference and myogenic artifacts. To mitigate these, we introduce an artifact suppression pipeline tailored to this tri-modal setting. Once fully developed, this framework is poised to offer an unprecedented window into speech neuroscience and insights leading to brain-computer interface advances. The source code and data are available.

05.
arXiv (CS.LG) 2026-06-16

A Penalty Approach for Differentiation Through Black-Box Quadratic Programming Solvers

arXiv:2602.14154v3 Announce Type: replace Abstract: Differentiating through the solution of a quadratic program (QP) is a central problem in differentiable optimization. Most existing approaches differentiate through the Karush–Kuhn–Tucker (KKT) system, but their computational cost and numerical robustness can degrade at scale. To address these limitations, we propose dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation. In the solving step (forward pass), dXPP is solver-agnostic and can leverage any black-box QP solver. In the differentiation step (backward pass), we map the solution to a smooth approximate penalty problem and implicitly differentiate through it, requiring only the solution of a much smaller linear system in the primal variables. This approach bypasses the difficulties inherent in explicit KKT differentiation and significantly improves computational efficiency and robustness. We evaluate dXPP on various tasks, including randomly generated QPs, large-scale sparse projection problems, and a real-world multi-period portfolio optimization task. Empirical results demonstrate that dXPP is competitive with KKT-based differentiation methods and achieves substantial speedups on large-scale problems. Our implementation is open source and available at https://github.com/mmmmmmlinghu/dXPP.

06.
arXiv (CS.CV) 2026-06-24

DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates

We propose DLTPose, a novel method for 6DoF object pose estimation from RGBD images that combines the accuracy of sparse keypoint methods with the robustness of dense pixel-wise predictions. DLTPose predicts per-pixel radial distances to a set of minimally four keypoints, which are then fed into our novel Direct Linear Transform (DLT) formulation to produce accurate 3D object frame surface estimates, leading to better 6DoF pose estimation. Additionally, we introduce a novel symmetry-aware keypoint ordering approach, designed to handle object symmetries that otherwise cause inconsistencies in keypoint assignments. Previous keypoint-based methods relied on fixed keypoint orderings, which failed to account for the multiple valid configurations exhibited by symmetric objects, which our ordering approach exploits to enhance the model's ability to learn stable keypoint representations. Extensive experiments on the benchmark LINEMOD, Occlusion LINEMOD and YCB-Video datasets show that DLTPose outperforms existing methods, especially for symmetric and occluded objects. The code is available at https://anonymous.4open.science/r/DLTPose_/ .

07.
arXiv (quant-ph) 2026-06-16

Against probability: A quantum state is more than a list of probability distributions

arXiv:2601.18872v2 Announce Type: replace Abstract: The state of a quantum system can be represented by listing the outcome probabilities for a tomographically complete set of measurements. Such representations appear throughout physics, for example, in quantum field theory via correlation functions and in quantum foundations within generalized probabilistic frameworks. In this paper, we show a no-go result: To enable useful statements, the probability representation must be topologically robust$\unicode{x2014}$preserving the notion of closeness between states. Yet, a topologically robust probability representation cannot simultaneously retain other essential structure, such as the subsystem structure.

08.
arXiv (CS.AI) 2026-06-16

Attribute Inference from Interactive Targeted Ads

作者:

arXiv:2606.15209v1 Announce Type: new Abstract: Targeted advertising systems can pair audiences selected by advertisers with ad units that expose visible user actions. When an interaction remains linked to the campaign that elicited it, the advertiser may receive an observation tied to a user rather than only an aggregate report. We model that channel as a noisy oracle for attribute inference. The model separates targeting predicates, exposure, interaction, and disclosure. These boundaries capture the gap between eligibility and delivery, and the gap between interaction and advertiser visibility. We build a reproducible benchmark using synthetic populations calibrated with public data, each with known sensitive labels. A generated campaign semantics layer provides topic variants and response priors. The simulator generates the ground truth, event traces, disclosed observations, and metrics. The evaluation compares Bayesian, supervised, positive and unlabeled, and adaptive attacks under common campaign and disclosure definitions. The final evaluation uses four topic variants, seven simulator seeds, and two interaction settings. Repeated campaigns with identity exposure produce measurable but bounded inference signal. At $160$ campaigns, Bayesian and supervised attacks reach about $0.64$ AUC in the main setting and about $0.65$ AUC in the higher interaction setting. Disclosure policy is the strongest control. Aggregate reporting removes the evaluated oracle input tied to users. Type filtering and randomized disclosure reduce the released signal. The result is a model, artifact, and defense evaluation method for privacy in interactive targeted advertising. The code is available at https://github.com/P-HOW/Interactive-Ad-Oracle.

09.
arXiv (CS.CL) 2026-06-19

CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility – Semantic Metrics and Convergence Analysis

Decomposing compound sentences into atomic, verifiable claims is a prerequisite for reliable automated fact-checking. Prior work has relied on token-overlap (Jaccard) metrics that systematically underestimate decomposition quality for paraphrastic claims, and has lacked formal termination analysis for the repair loop. We present Credence, a revised claim decomposition and evaluation framework addressing both shortcomings. Our contributions are: (1) Semantic-F1: we use BGE-large cosine similarity fidelity metric that resolves Jaccard's penalisation and improves downstream fact-checking accuracy; (2) Convergence theorems: we formally characterise four properties of the repair pipeline, establishing that rule-based repair is monotone and finitely terminating under an oracle parser assumption; LLM-based self-repair is provably non-monotone and requires an early-exit guard; (3) Three evaluation benchmarks spanning social-media, encyclopaedic, and news domains for cross-domain generalisation measurement; (4) Multi-model benchmarking across four decomposer models (3.8B-12B) and a closed API model. Experiments on SocialClaimSplit, WikiSplitBench, and ClaimDecompBench show that Semantic-F1 outperforms Jaccard-F1 by +15-32pp. EPR ranges from 0.94 to 1.00 on SocialClaimSplit and WikiSplitBench, while ClaimDecompBench includes lower base EPR cases (down to 0.824) due to harder news-domain constructions, and rule-repair reduces the Atomicity Violation Rate (AVR) by 47-100% relative to the base model without degrading fidelity.

10.
PLOS Computational Biology 2026-06-09

Evolution of phenocopying in a dynamical model of developmental trajectories

by Yuuki Matsushita, Archishman Raju Developmental trajectories are known to be canalized, or robust to both environmental and genetic perturbations. However, even when these trajectories are decanalized by an environmental perturbation outside the range of conditions to which they are robust, they often produce phenotypes similar to known mutants, called phenocopies. This correspondence between the effects of environmental and genetic perturbations has received little theoretical attention. Here, we study an abstract regulatory model that is evolved to follow a specific trajectory. We then study the effects of small and large perturbations to the trajectory, both by changing parameters and by perturbing the state at specific times. We find that the phenomenon of phenocopying emerges in evolved trajectories and is not present in a null model of randomly sampled trajectories. Our results suggest that, in this class of dynamic models, evolution can allow high-dimensional phenotypic landscapes to simultaneously exhibit robustness and phenocopying.

11.
arXiv (CS.LG) 2026-06-18

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

arXiv:2606.18972v1 Announce Type: cross Abstract: Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable – e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.

12.
medRxiv (Medicine) 2026-06-22

Symptom-based phenotype discovery in motor neuron disease using natural language processing of electronic health records

Background: Motor neuron disease (MND) is a fatal neurodegenerative condition with significant clinical heterogeneity that is incompletely captured by existing phenotype classifications based on onset site. Electronic health records (EHRs) contain detailed symptom documentation in clinical narratives that may enable data-driven discovery of clinically meaningful patient subgroups. Methods: We developed a natural language processing (NLP) pipeline using MedCAT to extract symptoms from clinical notes of 2,361 people with a confirmed diagnosis of MND at a tertiary neurology center. MND cohort confirmation used three complementary methods: clinic attendance records, text-based diagnosis detection, and NLP extraction with negation detection. Extracted symptoms were filtered to Unified Medical Language System semantic type T184 (Sign or Symptom) with removal of negated concepts. Patients were clustered using latent class analysis on binary symptom profiles. Survival differences were assessed using Kaplan-Meier analysis, log-rank tests, and Cox proportional hazards regression. Results: From the first clinical notes, we identified four clusters of symptoms among 872 patients and 76 symptoms: Motor-Bulbar (n=373), Motor-Tremor (n=154), Sensory-Pain (n=222), and Motor-Respiratory (n=123). When extended to all clinical notes (n=2,065; 184 symptoms), these reorganized into three clusters: Autonomic-Respiratory (n=472), Nocturnal-Respiratory (n=338), and Classic Motor (n=1,255). Survival differences were significant across all clusters in both the first notes and all notes analyses (log-rank p < 0.001). Conclusions: NLP-based symptom extraction from EHRs identifies clinically meaningful MND subgroups that extend beyond traditional onset-site classifications. Autonomic-respiratory symptom burden is associated with poorer survival while a newly identified Sensory-Pain subtype with a better prognosis. These data-driven phenotypes may improve prognostication and inform targeted supportive care.

13.
arXiv (CS.AI) 2026-06-11

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

arXiv:2606.11918v1 Announce Type: new Abstract: Current Large Reasoning Models (LRMs) exhibit remarkable general capabilities but significantly underperform in spatial reasoning tasks. Existing approaches treat this gap as a knowledge deficit, relying on supervised fine-tuning (SFT) to ingest labeled spatial data from external vision sources or synthetic engines. In contrast, we argue that for many tasks, spatial reasoning capabilities are already present in pre-trained LRMs but require alignment through logical coherence under geometric 2D and 3D constraints. In this work, we propose a self-supervised reinforcement learning (RL) framework that targets the internal reasoning process without requiring ground-truth annotations. By formalizing the notion of consistency verifiers – reward functions that check for geometric and semantic consistency under transformations – we demonstrate that models can improve their spatial reasoning abilities. We use both image transformations, like flipping, and textual transformations, like swapping the order of objects in the question, and propose a new optimal transport-based RL strategy, OT-GRPO, which is a minimal-matching variant of group relative policy optimization tailored to pairwise verifiers. We show that this label-free consistency training approaches the accuracy of models trained with ground-truth supervision and achieves similar generalization across diverse tasks and data domains.

14.
arXiv (CS.CL) 2026-06-11

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires an additional student model training. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To handle this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.

15.
arXiv (CS.AI) 2026-06-24

AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

arXiv:2606.24245v1 Announce Type: cross Abstract: Large language model (LLM) agents increasingly automate complex tasks by integrating language models with external tools and environments. However, their autonomy poses significant safety risks: agents may execute destructive commands, leak sensitive data, or violate domain constraints. Existing safety approaches face a fundamental tradeoff: hand-crafted rules are interpretable but brittle, with overly conservative rules blocking safe operations (high false positives) while permissive rules miss unsafe behaviors (high false negatives). Neural classifiers lack the interpretability required for safety-critical deployments. We present AutoSpec, a framework that automatically evolves deployed expert-designed safety rules from user safe/unsafe annotations through counterexample-guided inductive synthesis (CEGIS) guided by inductive logic programming (ILP). Starting from the expert rules and a stream of annotated traces, AutoSpec iteratively evaluates rules, mines false-positive and false-negative counterexamples, uses ILP to learn which predicates discriminate them, generates candidate rule edits, and verifies candidates to select the best revision. The key insight is that ILP efficiently identifies predicates that appear frequently in false negatives but rarely in false positives (or vice versa), dramatically pruning the exponential search space of rule edits. This continues until convergence, producing interpretable rules that balance precision and recall. We evaluate AutoSpec on 291 execution traces spanning code execution and embodied agent domains. AutoSpec raises rule F1 to 0.98 and 0.93 across the two domains, achieving up to 94% false positive reduction while maintaining high recall, and converges within 4-5 iterations. The ILP-guided approach achieves up to 4.8x higher F1 than heuristic CEGIS. The learned rules are human-readable, auditable, and generalize to unseen scenarios.

16.
arXiv (CS.CL) 2026-06-18

RCEM: Robust Conversational Search EMbedder in Distributional Shift

We propose RCEM, a Robust Conversational search EMbedder that is additionally equipped with LLM's query reformulation capability without losing base model's generalization. Unlike prior conversational dense retrieval approaches that learn direct conversation-to-passage matching, RCEM aligns conversations, prepended by special token, to LLM-rewritten queries, while preserving the original embedding space. The unchanged embedding space automatically maps the rewritten-query to the relevant passages. As a result, RCEM (1) reduces overfitting by simplifying the alignment task from long passages to shorter rewritten queries, (2) eliminates the need for conversation-to-passage relevance labels for training, and (3) maintains its original embedding space that allows conversational queries against indexes built by original embedder without rebuilding them. Extensive experiments show that RCEM consistently outperforms prior approaches, achieving up to 30% improvement under distributional shift.

17.
arXiv (quant-ph) 2026-06-16

Finite-Element Matrix Product States for Continuum Models in One Dimension

arXiv:2606.14873v1 Announce Type: new Abstract: We present a matrix product state framework for simulating one-dimensional quantum many-body systems in the continuum using non-orthogonal single-particle basis sets. By mapping the physical problem to an auxiliary computational space, we show that the resulting many-body overlap operator can be efficiently encoded as a matrix product operator for sufficiently localized orbitals, thereby generalizing a construction that first appeared in [arXiv:2405.10285]. This construction recasts the variational ground-state search into a generalized eigenvalue problem, which can be solved using a generalized density matrix renormalization group algorithm. As a primary application, we employ a first-order finite-element expansion to study the ground state properties of the Lieb-Liniger gas in the presence of inhomogeneities. This approach also provides a natural setting for exactly refining the lattice, thereby enabling multigrid optimization strategies for matrix product states.

18.
Science (Express) 2026-05-06

A 481-meter-high landslide-tsunami in a cruise ship–frequented Alaska fjord | Science

作者: 未知作者

Early in the morning of 10 August 2025, a >64 × 10 6 m 3 landslide struck Tracy Arm fjord in Alaska. The landslide was preconditioned by glacial retreat caused by climate change. The resulting 481 m runup megatsunami followed an initial 100-m-high breaking wave traveling >70 m s −1 . The landslide was preceded by several days of microseismicity, which increased in rate and magnitude until ~1 hour before failure. The landslide produced globally observed long-period seismic waves equivalent in size to a M5.4 earthquake. A long-period (~66 s) global seismic signal, produced by a landslide-induced seiche trapped within the fjord, persisted for up to 36 hours, the second time a days-long seiche has been thus observed. With fjord regions increasingly visited by cruise ships, and climate change making similar events more likely, this unanticipated, near-miss event highlights the growing risk from landslides and tsunamis in coastal environments.

19.
arXiv (quant-ph) 2026-06-12

Supersymmetry of dissipative Bose-Fermi systems with application to Jaynes-Cummings and Dicke models

arXiv:2606.12682v1 Announce Type: new Abstract: We demonstrate how supersymmetries of Hamiltonians for coupled Bose-Fermi systems can be used to place the Hamiltonians of the Jaynes-Cummings model and Dicke model under the rotating wave approximation in matrix form and provide explicit analytic solutions for their eigenvalues. We then use this supersymmetry to place the Liouvillians of the associated Markovian open systems in matrix form and provide explicit solutions for their eigenvalues. These results are a consequence of the fact that the Hamiltonian of the Jaynes-Cummings model commutes with the linear Casimir invariant of the superalgebra $u(1|1)$ and that the Hamiltonian of the Dicke model commutes both with the linear invariant of $\sum_{i} u_{i}(1|1)$ and with the invariant of an additional $su(2)$ algebra. Our methods apply to various coupled Bose-Fermi systems with $u(1|1)$ and more generally with $u(n|m)$ dynamical superalgebras, and may provide efficient tools for studying more complicated examples.

20.
PLOS Medicine 2026-05-08

Climate change and non-communicable diseases: An invisible syndemic

by Gokul Parameswaran, Sadeer Al-Kindi, Sanjay Rajagopalan Climate change accelerates non-communicable diseases (NCDs) through cascading environmental disruptions and is attributed to driving increased NCD-related mortality. Yet this syndemic remains invisible and underfunded. We detail why addressing the climate-NCD intersection is critical for improving health. In this Perspective, Sanjay Rajagopalan and colleagues discusses how climate change accelerates non-communicable diseases (NCDs) and exacerbates NCD-related mortality, and calls for greater visibility and funding to address this syndemic and improve human health.

21.
arXiv (CS.CV) 2026-06-11

Task-Aligned Stability Analysis of Vision-Language Models for Autonomous Driving Hazard Detection

Vision-language models (VLMs) are increasingly used for scene understanding in autonomous driving, but robustness analysis often relies on task-agnostic embedding stability alone. We study whether corruption-induced embedding drift predicts changes in a task-aligned hazard score derived from CLIP image-text similarities. Using controlled corruptions on BDD100K road scenes, we compare embedding drift against margin drift, defined as the change in hazard score under perturbation. The relationship is highly corruption-dependent: some families exhibit strong coupling between representation drift and decision drift, while others induce hazardous decision instability despite relatively modest embedding change. Furthermore, corruption families differ in failure direction: most suppress hazard detections via false negatives, while occlusion instead triggers false alarms, suggesting that benchmark design should account for asymmetric failure modes, not just overall instability rates. These results suggest that robustness benchmarks should include task-aligned stability measures in addition to embedding-level perturbation statistics.

22.
arXiv (CS.CL) 2026-06-25

Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making

Recent studies on world modeling for Large Language Model (LLM) agents typically formulate the learning objective as next-observation prediction. However, this objective ties supervision to what a transition happens to reveal, which may omit the dynamics most relevant to the agent's current decision. To bridge this gap, we propose Agent-Authored World Modeling (AAWM), a training procedure that constructs supervision from the policy's own decision needs. Specifically, at each state, the agent identifies what it needs to understand about the environment before acting. These needs drive the retrieval of relevant transition evidence across trajectories, which is then synthesized into training targets that capture decision-oriented dynamics instead of reconstructing the next observation. This aligns the training objective with the dynamics the policy needs before acting, not with the contents of the next observation. Experimental results validate the effectiveness of AAWM across multiple environments and training settings. These results show that decision-aware world-model targets provide a more effective learning signal than next-observation prediction.

23.
arXiv (CS.CV) 2026-06-16

RefGC-SR$^2$: Reference-guided Generated Content Super-Resolution and Refinement

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by users is downsampled to a fixed low-resolution (LR) before being fed into the model, so the fine-grained details are discarded before the output is even produced. In addition, the generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task: reference-guided generated content super-resolution-refinement (RefGC-SR$^2$), where the original HRRI is reused at the post-processing stage to recover lost details, refine generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for this RefGC-SR$^2$ task, training a diptych-conditioned generator to synthesize paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR$^2$ that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR$^2$ model successfully (i) refines the object identity faithfully with respect to the reference, and (ii) recovers high-resolution details, so that the final result is significantly higher quality and practically more usable compared to existing RefGCR and RefSR baselines.

24.
arXiv (CS.AI) 2026-06-15

Regulating the Machine Contributor: Governance and Policy Alignment in Open Source

arXiv:2606.14594v1 Announce Type: cross Abstract: AI-assisted software development has moved from line-level autocomplete to agents that can plan changes, edit files, and submit pull requests with limited human supervision. Open-source software, however, evolves through a process designed for humans: contributor agreements, codes of conduct, and review norms all assume a legally accountable person who can attest to provenance and answer reviewer questions. Autonomous and semi-autonomous AI contributors strain those assumptions, and the 2025-2026 record of agent-driven incidents, AI-generated nuisance volume, and platform-level shutdowns shows that the gap is operationally consequential. Several open-source organisations have responded with contribution policies, but the result is fragmented, and its alignment with emerging AI governance frameworks (EU AI Act, NIST AI RMF with the UC Berkeley Agentic AI Profile, ISO/IEC 42001 and 23894) is unmapped at the contribution level. We compare policies across six organisations (SymPy, LLVM, matplotlib, OpenInfra, the Apache Software Foundation, and the Linux Foundation) using Most-Similar Systems Design with indicator-based coding and process tracing for SymPy and LLVM. From this we derive a six-dimensional taxonomy (disclosure, responsibility, human oversight, licensing, enforcement, maintainer workload), an ordinal Policy Maturity Score, and a mapping of documented agent incidents onto the dimensions each policy fails to govern. Aligning the dimensions with the regulatory frameworks above identifies overlapping gaps neither side currently closes, and we close by sketching the shape of a harmonised tiered framework and the empirical evaluation needed to calibrate it.

25.
arXiv (CS.LG) 2026-06-16

Reinforcement Learning for LLM-based Event Forecasting

arXiv:2606.15917v1 Announce Type: new Abstract: We use Group Relative Policy Optimization (GRPO), a recently devised sample and memory efficient reinforcement learning method, to finetune pretrained LLMs in the range of 1.5B to 14B parameters equipped with the ability to get current information through the use of a Wikipedia revisions tool, or news summaries, to forecast real events beyond the knowledge cutoff of the LLM, as well as problems made to simulate different aspects of the dynamics of that training. We use the results of these experiments to comment on the scaling capability of LLMs for forecasting, as well as classify how judgmental forecasting fits into the verifiable/unverifiable domain taxonomy, considering the impact of the inherent aleatoric uncertainty when forecasting future events (e.g. the roll of a die). As a result of the GRPO training, we manage to bring a 1.5B parameter transformer (Qwen 2.5 1.5B) to forecasting performance superior to Claude Sonnet 3.5 over the same dataset as measured by cross entropy from the market agreed probabilities. We also discuss various dead ends on the path to this result.