Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

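AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.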

01.
arXiv (quant-ph) 2026-06-17

Quantum algorithm for dephasing of coupled systems: decoupling and IQP duality

arXiv:2601.06298v2 Announce Type: replace Abstract: Noise and decoherence are ubiquitous in the dynamics of quantum systems coupled to an external environment. In the regime where environmental correlations decay rapidly, the evolution of a subsystem is well described by a Lindblad quantum master equation. In this work, we introduce a quantum algorithm for simulating unital Lindbladian dynamics by sampling unitary quantum channels without extra ancillas. Using ancillary qubits we show that this algorithm allows approximating general Lindbladians as well. For interacting dephasing Lindbladians coupling two subsystems, we develop a decoupling scheme that reduces the circuit complexity of the simulation. This is achieved by sampling from a time-correlated probability distribution, determined by the evolution of one subsystem, which specifies the stochastic circuit implemented on the complementary subsystem. We demonstrate our approach by studying a model of bosons coupled to fermions via dephasing, which naturally arises from anharmonic effects in an electron-phonon system coupled to a bath. Our method enables tracing out the bosonic degrees of freedom, reducing part of the dynamics to sampling an IQP circuit. The sampled bitstrings then define a corresponding fermionic problem, which in the non-interacting case can be solved efficiently classically. We comment on the computational complexity of this class of dissipative problems, using the known fact that sampling from IQP circuits is believed to be difficult classically.
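
As a toy illustration of the idea of realizing a unital channel by sampling unitaries, the sketch below applies single-qubit dephasing as a random mixture of the identity and Pauli-Z. This is not the paper's algorithm: the single-qubit setting, the flip probability `p`, and all function names are assumptions made for this example.

```python
import random

# Identity and Pauli-Z as 2x2 nested lists (single-qubit toy model).
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conjugate(u, rho):
    """Return u @ rho @ u^dagger; u is real and diagonal here, so u^dagger == u."""
    return matmul(matmul(u, rho), u)

def exact_dephasing(rho, p):
    """Dephasing channel: rho -> (1 - p) * rho + p * Z rho Z."""
    zrz = conjugate(Z, rho)
    return [[(1 - p) * rho[i][j] + p * zrz[i][j] for j in range(2)]
            for i in range(2)]

def sample_dephasing(rho, p, shots, seed=0):
    """Estimate the same channel by averaging over randomly sampled unitaries."""
    rng = random.Random(seed)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(shots):
        u = Z if rng.random() < p else I2  # apply Z with probability p
        out = conjugate(u, rho)
        for i in range(2):
            for j in range(2):
                acc[i][j] += out[i][j] / shots
    return acc

# Density matrix of the |+> state; its coherences are the off-diagonal entries.
plus = [[0.5, 0.5], [0.5, 0.5]]
```

In this toy model the off-diagonal entries shrink by a factor of `1 - 2p` while the populations are unchanged, and the sampled average approaches the exact channel as `shots` grows.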

02.
arXiv (quant-ph) 2026-06-11

Measurement incompatibility and quantum steering via linear programming

arXiv:2506.03045v3 Announce Type: replace Abstract: The problem of deciding whether a set of quantum measurements is jointly measurable is known to be equivalent to determining whether a quantum assemblage is unsteerable. This problem can be formulated as a semidefinite program (SDP). However, the number of variables and constraints in such a formulation grows exponentially with the number of measurements, rendering it intractable for large measurement sets. In this work, we circumvent this problem by transforming the SDP into a hierarchy of linear programs that compute upper and lower bounds on the incompatibility robustness with a complexity that grows polynomially in the number of measurements. The hierarchy is guaranteed to converge and it can be applied to arbitrary measurements – including non-projective POVMs (Positive Operator-Valued Measures) – in arbitrary dimensions. While convergence becomes impractical in high dimensions, in the case of qubits our method reliably provides accurate upper and lower bounds for the incompatibility robustness of sets with several hundred measurements in a short time using a standard laptop. We also apply our methods to qutrits, obtaining non-trivial upper and lower bounds in scenarios that are otherwise intractable using the standard SDP approach, although such bounds are significantly looser than the ones obtained in the qubit case. Finally, we show how our methods can be used to construct local hidden state models for states (i.e., to prove that a state cannot lead to steering under any possible local measurements), or conversely, to certify that a given state exhibits steering; for two-qubit quantum states, our approach is comparable to, and in some cases outperforms, the current best methods.

03.
arXiv (CS.CV) 2026-06-11

Tac-DINO: Learning Vision-Tactile Features with Patch Alignment

Touch is the primary medium through which humans interact with the environment. Currently, tactile learning mainly focuses on image-level pretraining or alignment. However, tactile signals correspond to local object contact, while research into scale alignment and holographic matching remains limited, and proper datasets and benchmarks are also lacking. To bridge this gap, we first construct a data collection system to acquire a large-scale tactile dataset, with over 20K tactile contacts from 505 real-world objects. Building on this dataset, we design a Vis-Tac Holographic Matching Benchmark to evaluate vision-tactile local-to-global alignment ability. Then we propose Vision-Tactile Patch Alignment (VTPA) methods for vision-tactile representation learning. Experiments demonstrate that these methods exceed the performance of methods without alignment and align with whole-object images.

04.
arXiv (quant-ph) 2026-06-12

Coupling-Grouped XY-QAOA for Joint Anomaly-Feature Selection

arXiv:2606.13244v1 Announce Type: new Abstract: Selecting anomalous samples and explanatory features under fixed budgets defines a coupled constrained-optimization problem. Sequential feature-first selection ranks features before choosing samples, which can overlook features whose utility depends on which samples are selected, especially when scores are calibrated from reference data that may be limited, noisy, or drifting. We instead formulate the task as joint sample-feature selection under the same fixed counts. In the analyzed formal model, calibration-error sensitivity grows linearly with the number of samples for feature-first ordering but stays constant for joint selection. We introduce Coupling-Grouped XY-QAOA, a constraint-preserving grouped-angle variant for the resulting optimization problem. On matched sparse IBM Heron R3 benchmarks, a hardware-aware implementation reduces circuit depth by 45.9%-61.3% and two-qubit gates by 2.6%-5.2% relative to Qiskit optimization level 3 on the CZ-basis target. It enables, to our knowledge, the largest reported width-depth configurations for constraint-preserving bipartite-selection QAOA hardware executions with feasible-sector retention: 64 qubits at p=2 and 36 qubits at p=3. The 20-qubit p=5 runs retain 63% valid samples. Across 36-64 qubits, fixed-angle runs yield lower-energy feasible samples than matched random-feasible sampling. Warm starts reduce the gap to strict-feasible classical references by 57.5%-80.5%, and near-budget repair matches the sparse classical reference at 36 qubits. Benchmarks show gains in balanced fixed-budget regimes, and noiseless simulations show that problem-structured angle grouping improves over same-depth XY-QAOA and matched-parameter, type-preserving randomization controls. Overall, the results support calibrated joint selection and hardware-realizable constrained-mixer execution in the tested regimes.

05.
arXiv (CS.LG) 2026-06-19

Model soups need only one ingredient

arXiv:2602.09689v2 Announce Type: replace Abstract: Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weigh these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
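
The abstract does not spell out its effective-rank formula. A minimal sketch, assuming the common definition (the exponential of the Shannon entropy of the normalized singular values), is shown below; the function name is hypothetical and the layer-wise re-weighting used by MonoSoup is not reproduced here.

```python
import math

def effective_rank(singular_values):
    """Entropy-based effective rank of a spectrum (assumed definition).

    Normalizes the singular values into a probability distribution and
    returns exp(H), where H is the Shannon entropy of that distribution.
    """
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    # Zero singular values contribute nothing to the entropy.
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

Under this definition a flat spectrum of four equal values has effective rank 4, and a spectrum dominated by one value has effective rank close to 1, which is the kind of signal that could separate high-energy from low-energy directions of a layer update.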

06.
arXiv (CS.LG) 2026-06-15

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

arXiv:2603.18464v3 Announce Type: replace Abstract: Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models is severely bottlenecked by synchronization barriers and the high cost of environment data acquisition. To overcome these challenges, we propose AcceRL, a distributed asynchronous RL framework that physically isolates environment rollouts, model inference, and gradient updates. By eliminating the cascading long-tail idle bubbles inherent in synchronous systems, AcceRL maximizes hardware utilization and ensures scalable throughput. Furthermore, AcceRL features a modular design that supports the integration of diverse, plug-and-play world models into its distributed pipeline. Extensive experiments demonstrate that the base framework achieves highly competitive performance across all four LIBERO[liu2023libero] task suites. Systematically, the asynchronous architecture delivers a $2.4\times$ throughput speedup over leading synchronous baselines. Algorithmically, by leveraging a world model pre-trained on 1,000 offline trajectories, AcceRL achieves up to a $200\times$ improvement in online sample efficiency on LIBERO-Spatial, establishing a robust framework that is both sample-efficient and time-efficient for embodied AI. Code is included in the supplementary material. Code is available at https://github.com/distanceLu/AcceRL.

07.
arXiv (CS.CV) 2026-06-12

Magnifying What Matters: Attention-Guided Adaptive Rendering for Visual Text Comprehension

Visual Text Comprehension (VTC) renders text into images for a vision-language model (VLM) to read, sidestepping LLM context-window limits and powering applications from long-page OCR to multi-page memory QA. Yet existing VTC pipelines treat rendering and layout as a fixed, content-agnostic preprocessing step and offer little mechanistic understanding of how VLMs internally process visualized text. Through a focused empirical study on VTC QA tasks, we reveal that VLMs exhibit a localization-without-utilization regime: evidence-localizing attention emerges sharply in the middle-to-late layers and is largely decoupled from answer correctness, yet simply enlarging the localized spans on the rendered page recovers a large fraction of the failures. Building on these observations, we propose AGAR (Attention-Guided Adaptive Rendering), a training-free, model-agnostic method that leverages a VLM's own middle-to-late layer attention to identify the top-K important visual patches, maps them back to word spans, and re-renders the page with those spans enlarged before re-inferring the answer. Extensive experiments across nine VTC benchmarks (short-form, long-context, and multi-page memory QA) and four VLM backbones show that AGAR (i) consistently improves off-the-shelf VLMs as a plug-and-play enhancement, (ii) composes with VLM post-training to yield further gains, and (iii) remains robust under both visual- and text-side input degradation.

08.
arXiv (quant-ph) 2026-06-19

Quantum Dynamics from Lax Pair Theory: A Reconstruction from Spectrum Preservation

arXiv:2606.19664v1 Announce Type: new Abstract: We reconstruct unitary quantum dynamics from a minimal axiomatic foundation built on Hilbert-space observables and isospectral evolution. The only dynamical assumption is that physical time evolution is a continuous one-parameter flow of Hermitian observables that preserves their spectra, i.e. the possible outcomes of measurement. We show that this assumption is already sufficient to force the Lax form of quantum dynamics. The Heisenberg equation, the time-dependent and time-independent Schrödinger equations, conservation laws, and good quantum numbers then follow as theorems rather than postulates. In this formulation, Lax pair theory supplies the missing dynamical bridge between the measurement structure of a Hilbert space and standard quantum evolution: the Hamiltonian is not assumed, but emerges as the generator required for an isospectral observable flow.
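
The step from spectrum preservation to the Lax form can be sketched with the standard textbook argument below. It assumes the isospectral flow is implemented by a differentiable family of unitaries U(t); the paper's own axioms and proof may differ.

```latex
% Isospectral flow: A(t) stays unitarily similar to A(0).
A(t) = U(t)\,A(0)\,U(t)^{\dagger}, \qquad U(t)\,U(t)^{\dagger} = I
% Differentiate and use \dot{U}U^{\dagger} + U\dot{U}^{\dagger} = 0:
\frac{dA}{dt} = \dot{U}U^{\dagger}A + A\,U\dot{U}^{\dagger} = [B, A],
\qquad B := \dot{U}U^{\dagger} = -B^{\dagger}
% Writing the anti-Hermitian generator as B = (i/\hbar) H with H Hermitian:
\frac{dA}{dt} = \frac{i}{\hbar}\,[H, A]
```

The last line is the Heisenberg equation, with the Hamiltonian H appearing only as the Hermitian generator of the flow rather than as a separate postulate.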

09.
arXiv (CS.AI) 2026-06-16

InstantForget: Update-Free Backdoor Unlearning with Inference-Time Feature Reset

arXiv:2606.15730v1 Announce Type: cross Abstract: Backdoor unlearning aims to remove a malicious trigger behavior from a deployed model while preserving clean utility. We study the update-free inference-time setting, where model parameters remain frozen. First, we audit a common projection assumption under oracle paired clean and triggered features. Projection succeeds mainly on BadNets and leaves WaNet, Blended, and SIG at 0.683, 0.888, and 0.941 ASR on CIFAR-10 ResNet-18. This failure is not explained by spectral compactness, spatial locality, or subspace misalignment. It is predicted by a logit-triplet gap involving the target margin, target-logit drop, and non-target logit rise. We then introduce InstantForget, a clean-calibrated gated reset that flags anomalous features with a Mahalanobis score and moves only flagged features toward a neutral non-target representation. With one fixed operating point selected on held-out triggered validation, InstantForget reduces average ASR to 0.071 across four non-adaptive CIFAR-10 triggers without triggered samples or parameter updates at deployment. It also reaches 0.981 detection AUROC and transfers to six of eight tested backbones. Reported failures under WaNet, ModelNet10 point blend, two backbone geometries, and adaptive feature-compactness attacks define the method's scope.
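
A gated reset of this kind can be sketched as follows. This is an illustration only, not the InstantForget implementation: it assumes a diagonal covariance estimated from clean features, and the `neutral` target, `threshold`, and `strength` parameters are hypothetical names introduced for the example.

```python
import math

def fit_clean_stats(clean_features, eps=1e-6):
    """Per-dimension mean and variance of clean features (diagonal covariance assumed)."""
    n = len(clean_features)
    dim = len(clean_features[0])
    mean = [sum(f[d] for f in clean_features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in clean_features) / n + eps
           for d in range(dim)]
    return mean, var

def mahalanobis_score(feature, mean, var):
    """Mahalanobis distance to the clean distribution under a diagonal covariance."""
    return math.sqrt(sum((x - m) ** 2 / v for x, m, v in zip(feature, mean, var)))

def gated_reset(feature, mean, var, neutral, threshold, strength=1.0):
    """Move a feature toward `neutral` only if its score exceeds `threshold`.

    Returns (possibly modified feature, flagged?). Unflagged features pass
    through unchanged, so clean inputs are left alone.
    """
    if mahalanobis_score(feature, mean, var) <= threshold:
        return list(feature), False
    moved = [x + strength * (t - x) for x, t in zip(feature, neutral)]
    return moved, True
```

Because the model parameters are never touched, the only deployment-time state in this sketch is the clean statistics, the neutral vector, and one threshold.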

10.
Nature (Science) 2026-06-17

Optical metasurfaces for general vision processing on the edge

Large-scale artificial intelligence (AI) models achieve notable performance in computer vision but require substantial computational resources, limiting their deployment on edge devices [1,2]. Optical neural networks (ONNs) promise reduced latency and energy consumption by making use of the inherent parallelism of light [3]. However, present ONNs struggle to scale and are confined to simple tasks, owing to the challenges of replicating exact algebraic operations of digital models using physical (analogue) systems. This work introduces a new paradigm that directly embeds core computer vision principles, including similarity-based recognition, attention-guided perception and detail–context fusion, into a large-scale optical metasurface. By unifying optical physics with these computer vision fundamentals, we develop a photonic–electronic engine that overcomes scalability and generality barriers, enabling high-accuracy, general-purpose computer vision at the edge. The resulting system combines a 41-million-parameter optical metasurface front end with a co-designed, ultraefficient 87,000-parameter digital back end, outperforming many digital models with tens of millions of parameters across object detection, segmentation, 3D reconstruction and video understanding. We build a deployable prototype and demonstrate real-time edge visual processing in natural scenes. This work represents a path towards practical optical computing for general vision tasks in complex natural environments, enabling a new paradigm for low-energy, low-latency, real-time on-device vision intelligence. By embedding core computer vision principles into a large-scale optical metasurface, an efficient vision processing system using far fewer parameters is demonstrated to outperform many digital models and enables deployment on edge devices.

11.
Science (Express) 2026-06-04

Long-range extended chains arising from polymerization-driven spontaneous assembly

A central challenge for conjugated polymers is to achieve long-range order while remaining solution-processable, which is essential for matching the electrical performance of their crystalline inorganic semiconductor counterparts. Here we show that n-doped poly(benzodifurandione) (n-PBDF) can undergo polymerization-driven spontaneous assembly (PSA), in which chain growth, chemical doping, and structural ordering are intrinsically coupled, yielding long-range chain extension over hundreds of nanometers. We reveal that the spontaneously formed n-PBDF nanoribbons arise from a self-initiated, convergent growth mechanism driven by cooperative monomer–polymer interactions and stabilized by proton-coupled duplex chains and the polymer’s intrinsic polyelectrolyte character. With long-range extended chains in the nanoribbons, the aligned n-PBDF thin films demonstrate metallic-level conductivity (>10^4 siemens per centimeter).

12.
arXiv (CS.AI) 2026-06-11

SPEA2$^+$: Improved Density Estimation in SPEA2 with Provable Runtime Guarantees

arXiv:2606.12382v1 Announce Type: cross Abstract: The Strength Pareto Evolutionary Algorithm 2 (SPEA2) is a popular and prominent evolutionary algorithm for solving multi-objective optimisation problems. Despite its popularity, theoretical analyses of SPEA2 have only appeared recently. Moreover, these analyses focus exclusively on how SPEA2 handles non-dominated solutions and disregard the algorithmic components responsible for handling dominated solutions. We conduct a first runtime analysis of SPEA2 for which these components are analysed. We prove that, unlike other prominent algorithms, including NSGA-II, NSGA-III and SMS-EMOA under the same setting of constant population size and duplicate elimination, SPEA2 is unable to cover the Pareto front of the OneTrapZeroTrap benchmark efficiently. Our results indicate that using k-th nearest-neighbour distance in the fitness assignment provides an insufficient signal to maintain diversity among dominated individuals. To address this issue, we propose an improved variant, SPEA2$^+$, that considers all pairwise distances. The new algorithm achieves the same performance guarantees as the other prominent algorithms on OneTrapZeroTrap, while matching the performance of the original SPEA2 on simpler problems. Experimental results complement our theoretical findings.
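
For reference, the k-th nearest-neighbour density used in the original SPEA2 fitness assignment can be sketched as below, using the usual form D(i) = 1 / (sigma_k + 2), where sigma_k is the distance from individual i to its k-th nearest neighbour in objective space. The function name is hypothetical, and the all-pairwise-distance variant proposed as SPEA2$^+$ is not reproduced because the abstract does not define it.

```python
import math

def kth_nn_density(points, k):
    """SPEA2-style density: 1 / (distance to the k-th nearest neighbour + 2).

    `points` are objective vectors; requires 1 <= k <= len(points) - 1.
    Larger values mean a more crowded neighbourhood.
    """
    densities = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        sigma_k = dists[k - 1]
        densities.append(1.0 / (sigma_k + 2.0))
    return densities
```

Only a single distance per individual enters the score, so configurations that differ in every other pairwise distance can receive identical densities. This is consistent with the abstract's observation that the signal can be too weak to keep dominated individuals diverse.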

13.
arXiv (CS.LG) 2026-06-15

Recovery thresholds for hidden weighted sparse graphs

arXiv:2606.14335v1 Announce Type: cross Abstract: Recovering structural information from noisy high-dimensional data is a fundamental task in statistical inference. We investigate the recovery thresholds for a graph hidden in a randomly weighted complete graph. Specifically, an unknown graph $H^* \in H_n$ is chosen uniformly at random, and hidden in a complete graph of $n$ vertices as follows: the weight of an edge $e \in H$ is distributed independently according to $P_n$; otherwise the weight is distributed independently according to $Q_n$. The goal is to recover almost all of $H$ from these edge weights. Assuming a local Lipschitzness of the Rényi divergence between distributions $P_n$ and $Q_n$, and a mild density condition for the graphs $H_n$, we give a unified characterization of the information-theoretic limit for recovering almost all of $H$ (also known as almost exact recovery). Our characterization connects the KL divergence between $P_n$ and $Q_n$ to the logarithm of the first moment threshold of $H$ in the Erdős-Rényi random graph model $G(n,p)$. Our lower bound also extends to the task of partial recovery, in which only a constant $\lambda$-fraction of $H$ needs to be recovered. Last but not least, for certain Bernoulli and Exponential regimes, and for Gaussian distributions, we are able to show an All-or-Nothing (AoN) threshold phenomenon at the exponential scale.

14.
arXiv (CS.AI) 2026-06-18

RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories

arXiv:2512.04144v2 Announce Type: replace Abstract: Targeted interventions on language models, such as unlearning or model editing, aim to modify specific information, but their effects often propagate to related, unintended areas (e.g., removing virology content may degrade performance on allergies); these side-effects are commonly referred to as the ripple effect. We introduce RippleBench-Maker, an automatic pipeline that retrieves semantic neighbors of any source concept from a knowledge repository and generates multiple-choice questions at varying semantic distances. We instantiate this framework using WikiRAG, an open-source RAG system over English Wikipedia, to construct RippleBench-WMDP-Bio (584 seed topics, 352,961 questions), and evaluate eight unlearning methods on Llama3-8B-Instruct. All eight exhibit accuracy drops that are largest near the unlearned target and decay with semantic distance, each with a distinct propagation profile. We replicate these findings across Mistral-7B, Zephyr-7B, and Yi-34B; cross-model delta curves are nearly identical, suggesting ripple effects are a property of the unlearning method rather than the base model. We validate all major pipeline stages using a four-experiment Mechanical Turk study (5,200+ responses, 61 workers). We release all code, data, and infrastructure.

15.
arXiv (CS.CV) 2026-06-16

Decoupled Object-Centric Video Understanding for Generating Robotic Manipulation Commands

Translating video demonstrations into executable robot commands remains challenging because existing methods often fail to identify which objects are functionally involved in the demonstrated action. As a result, they may generate commands that are linguistically plausible but operationally ambiguous. We propose an object-centric video understanding framework that decouples action recognition from object identification to generate precise, grammar-free manipulation commands. Our approach integrates Temporal Shift Modules (TSM) for efficient spatio-temporal action classification with a novel Object Selection algorithm that identifies task-relevant objects through trajectory-based role classification, blur detection, and overlap minimization. The selected objects are then processed by Vision-Language Models (VLMs) for robust category recognition and zero-shot generalization. Evaluated on a modified Something-Something V2 dataset, our method achieves 86.79% action classification accuracy and BLEU-4 scores of 0.337 on standard objects and 0.261 on novel objects. These results improve over the strongest task-specific baseline by 80.2% and 143.9%, respectively. Larger gains are observed in METEOR and CIDEr, reaching 157.9% and 171.7% on novel objects. Across all semantic metrics, our approach consistently outperforms task-specific methods and remains competitive with, or surpasses, large general-purpose VLMs while retaining a modular, object-centric design.

16.
bioRxiv (Bioinfo) 2026-06-14

Prediction of parsimonious and temporally sensitive sets of cell fate engineering transcription factors with IMCell

Transcription factor (TF) cocktails used in cell identity reprogramming protocols have largely been developed from experimental approaches. A handful of computational approaches have been reported, though they have not been widely adopted by the scientific community. To standardize their use and assess their performance, we built CompForce, a platform that integrates these tools. Using CompForce, we found that existing computational methods offer modest improvements over differential expression on both synthetic and literature-curated data, and that their lackluster and inconsistent performance could be attributed to a reliance on local centrality metrics. To improve upon these methods, we developed IMCell, a prediction method that is inspired by the influence maximization problem. Unlike existing tools, IMCell returns optimized TF sets rather than ranked TF lists. We demonstrate that IMCell vastly outperforms existing tools, and further extend it to dynamic, stepwise contexts. The tools presented here are available in the R packages CompForce and IMCell.

17.
arXiv (quant-ph) 2026-06-19

A Quantum Encoding of Traveling Salesperson Tours via Route Generation, Cost Phases, and a Reversible Valid-Permutation Oracle

arXiv:2603.21283v3 Announce Type: replace Abstract: For a traveling salesperson problem (TSP) of n cities, we present a compact quantum encoding based on a time-register representation of tours. A candidate route is represented as a sequence of n-1 city labels over discrete time steps, with one fixed start city and the remaining cities encoded in binary registers. We describe three ingredients of the construction: uniform route generation over the route register, a reversible validity oracle, and a phase oracle that encodes the total tour cost. The validity oracle checks both that the non-start city labels form a permutation and, for incomplete graphs, that every directed edge used by the route exists. The cost oracle then accumulates the start-edge, intermediate-transition, and return-edge costs into a tour-dependent phase for valid routes. This yields a coherent superposition of candidate routes with feasibility and tour-length information embedded directly in the quantum state. The complete construction uses O(n log n) qubits, while a naive implementation has worst-case elementary-gate complexity O(n^3 log n). The encoding is compatible with amplitude amplification or spectral filtering techniques such as the quantum singular value transform (QSVT) or Grover's algorithm. However, due to the exponentially small fraction of valid tours, the overall complexity remains exponential even when combined with amplitude amplification.
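
The scaling claims can be made concrete with a back-of-the-envelope count. The sketch below assumes a complete graph, one binary label of ceil(log2 n) bits for each of the n-1 non-start time steps, and a uniform superposition over every basis state of the route register; ancilla qubits for the oracles are not counted. These choices and the function names are assumptions for illustration, not figures taken from the paper.

```python
import math

def bits_per_label(n):
    """Bits needed to write a city label in 0..n-1 (equals ceil(log2 n) for n >= 2)."""
    return (n - 1).bit_length()

def route_register_qubits(n):
    """Qubits in the route register: n-1 time steps, one binary label each."""
    return (n - 1) * bits_per_label(n)

def valid_tour_fraction(n):
    """Fraction of route-register basis states that are permutations of the n-1 cities."""
    total_states = 2 ** route_register_qubits(n)
    return math.factorial(n - 1) / total_states

def amplification_rounds(n):
    """Order-of-magnitude amplitude-amplification rounds, about (pi/4) / sqrt(fraction)."""
    return (math.pi / 4) / math.sqrt(valid_tour_fraction(n))
```

For n = 4 this gives a 6-qubit route register in which 6 of the 64 basis states are valid tours. The fraction shrinks rapidly with n, so the number of amplification rounds in this estimate still grows exponentially, in line with the abstract's closing caveat.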

18.
arXiv (CS.AI) 2026-06-18

Explaining Attention with Program Synthesis

arXiv:2606.19317v1 Announce Type: cross Abstract: A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of components of deep networks with executable programs. We focus on attention heads in transformer language models. For a given head, we first compute its associated attention matrices on a collection of randomly selected training examples. Next, we prompt a pre-trained language model with a summary of these matrices, and instruct it to generate a set of Python programs that can reproduce the associated attention patterns given only text from the input sentence. Finally, we re-rank programs according to how well our final set of programs predict behavior on held-out inputs. We demonstrate that a set of fewer than 1,000 such generated programs can reproduce the attention patterns of heads in GPT-2, TinyLlama-1.1B, and Llama-3B, achieving an average Intersection-over-Union similarity above 75% on TinyStories. Moreover, the best-fit programs can replace neural attention heads without substantially affecting model behavior: replacing 25% of attention heads with programmatic surrogates across the three models incurs only a 16% average perplexity increase, while maintaining performance on a variety of downstream question answering benchmarks. This work contributes a scalable pipeline for reverse-engineering attention heads in transformer models using human-readable, executable code, advancing a path toward symbolic transparency in neural models.
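
To make the evaluation idea concrete, the sketch below scores a hand-written program against a reference attention matrix. The abstract does not define its Intersection-over-Union measure, so this example assumes a simple variant: threshold both matrices and compare the resulting sets of (query, key) positions. The `previous_token_head` program and the `threshold` value are hypothetical.

```python
def previous_token_head(tokens):
    """Hypothetical surrogate program: each token attends to the previous token.

    The first token has no predecessor, so it attends to itself.
    """
    n = len(tokens)
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        attn[i][max(i - 1, 0)] = 1.0
    return attn

def attention_iou(predicted, reference, threshold=0.1):
    """IoU between the sets of positions whose attention weight is >= threshold."""
    def active(matrix):
        return {(i, j) for i, row in enumerate(matrix)
                for j, w in enumerate(row) if w >= threshold}
    pred_set, ref_set = active(predicted), active(reference)
    union = pred_set | ref_set
    if not union:
        return 1.0  # both patterns are empty, so they agree trivially
    return len(pred_set & ref_set) / len(union)
```

A program like this takes only the input text, so it can be re-ranked on held-out sentences by comparing its output against the attention matrices the real head produces.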

19.
arXiv (CS.AI) 2026-06-17

Constitutional On-Policy Safe Distillation

arXiv:2606.03089v2 Announce Type: replace-cross Abstract: On-policy self-distillation (OPSD) has emerged as an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse in verifiable reasoning tasks, but safety alignment differs in that it is guided by high-level constitutions rather than explicit target answers, making it a natural setting to revisit dense distillation. However, our pilot study shows that safety OPSD still suffers from severe collapse: constitutional conditioning contracts the teacher distribution toward short and overly conservative responses, and Reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, where safety pressure transfers into the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold-start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD achieves a consistently stronger safety–helpfulness trade-off than baselines while substantially reducing the safety tax on general reasoning ability.

20.
arXiv (CS.CL) 2026-06-12

WildIFEval: Instruction Following in the Wild

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval, a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, extracted from natural user instructions. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. WildIFEval clearly differentiates between small and large models, and demonstrates that all models have substantial room for improvement on such tasks. We analyze the effects of the number and type of constraints on performance, revealing interesting patterns of model constraint-following behavior. We release our dataset to promote further research on instruction-following under complex, realistic conditions.

21.
arXiv (CS.CV) 2026-06-16

SceneCraft: Interactive System for Image Editing via Scene Graph

Recent advances in generative AI have enabled natural language-driven image editing, yet existing systems often fail in complex scenes with multiple interacting objects because they rely heavily on users crafting precise text prompts. To address the absence of structured control, we propose SceneCraft, a novel interactive framework that bridges user intent and model execution by representing images as editable scene graphs. Instead of guessing text prompts through trial and error, users interact directly with a visual graph to perform complex spatial and relational operations. These graph modifications are automatically translated into precise, context-aware editing prompts, effectively eliminating linguistic ambiguity. To ensure robust and diverse results, structured prompts are dispatched to multiple state-of-the-art generative models. Evaluations across diverse editing scenarios show that SceneCraft provides a more intuitive control mechanism, significantly reducing the cognitive burden of manual prompt engineering while generating outputs that users consistently rate as higher in quality and fidelity.

22.
arXiv (CS.CV) 2026-06-16

Mask Proposal Voting Based on Geodesic Framework for Robust Image Segmentation

Despite great advances, finding accurate segmentation remains a challenging task, especially in scenarios with cluttered backgrounds, complex intensity variations and topology appearance. Minimal path models have exhibited their strong ability in addressing image segmentation tasks. However, the performance of minimal paths-based segmentation approaches is heavily influenced by model initialization, hence limiting their application scope in practice. In this work, we propose a novel mask proposal voting framework that overcomes the major drawback of classical approaches, allowing robust segmentation even in complicated scenarios. Firstly, we introduce an efficient method for constructing adaptive domain cuts as a constraint for initializing the region-based min-cut evolution, by which diverse and reliable mask proposal candidates can be generated, substantially increasing the possibility of accurately covering the objective region by these proposals. Secondly, we propose a new mask voting scheme to build a voting score map encoding the final segmentation information. In contrast to classical path voting methods, our model allows incorporating priors to assign different importance to each individual mask. As a consequence, the proposed segmentation model is capable of accurately delineating object boundaries under complex scenarios, and is insensitive to initialization. Experiments demonstrate that our method consistently outperforms state-of-the-art minimal path-based approaches in both accuracy and robustness.

23.
arXiv (CS.CL) 2026-06-11

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol that makes heterogeneous agent harnesses, or claws, comparable under fair settings including a fixed prompt, runtime budget, workspace contract, patch extraction procedure, and evaluator. The full benchmark contains 350 GitHub issue-resolution instances across 8 languages and 43 repositories, drawn from SWE-bench-Multilingual and SWE-bench-Verified-Mini after future-commit cleanup. We also release Claw-SWE-Bench Lite for faster validation, which is an 80-instance subset selected by a cost-aware, rank-aware procedure over 17 calibration columns. On the full benchmark, OpenClaw with a minimal direct-diff adapter scores only $19.1\%$ Pass@1, whereas the full adapter reaches $73.4\%$ with the same GLM 5.1 backbone, showing that adapter design is essential for enabling OpenClaw-style harnesses to perform coding tasks effectively. Across an OpenClaw $\times$ nine-model sweep and a five-claw $\times$ two-model sweep, model choice changes Pass@1 by $29.4$ pp and harness choice by $27.4$ pp under fixed models; systems with similar accuracy can differ substantially in total API cost. Claw-SWE-Bench therefore treats harness and cost accounting as first-class axes of SWE-style coding-agent evaluation, providing both a full benchmark and a low-cost reference set for reproducible comparison. The data is available at https://github.com/opensquilla/claw-swe-bench and https://huggingface.co/datasets/TokenRhythm/Claw-SWE-Bench.

24.
arXiv (CS.AI) 2026-06-12

Algorithmic Constitutionalism

arXiv:2606.12437v1 (cross-listed). The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct, in real time, object-level operations that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which was adopted in October 2022.

25.
medRxiv (Medicine) 2026-06-16

A Multicenter Swedish Histopathology Image Dataset of Pediatric Central Nervous System Tumors

Refined detection methods, more detailed tumor characterization, and adequate distinction between different pediatric tumor subtypes are necessary to improve diagnosis and treatment, enable precision medicine, and advance patient prognosis. However, the application of computational approaches to pediatric brain tumors remains limited, largely due to the lack of accessible datasets. To address part of this gap, we provide whole slide images (WSIs) of hematoxylin and eosin (H&E)-stained tissue sections from all pediatric central nervous system (CNS) samples collected in Sweden between 2013 and 2023. These data represent a population-based national cohort encompassing all six pediatric oncology centers in Sweden and are available through the Swedish Childhood Tumor Biobank (BTB). The dataset includes 1,446 WSIs of sufficient image quality with confirmed CNS tumor diagnoses, derived from 537 unique subjects (562 cases). In addition, diagnostically relevant clinical information is included. Corresponding whole-genome sequencing (WGS), whole-transcriptome sequencing (WTS), and methylation array data are available for most tumor samples through separate resources. This H&E dataset has been specifically curated to support artificial intelligence-based analyses, while also serving broader applications in medical research and education. When combined with matched molecular data, it provides a valuable resource for advancing multimodal and precision diagnostic approaches in the pediatric population.
To enable the development of AI models, comprehensive datasets covering a wide range of pediatric brain tumors are needed.