Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

Data-driven Control with Real-time Uncertainty Compensation for Multi-Fuel Engines

arXiv:2606.16171v1 Announce Type: cross Abstract: Multi-fuel compression ignition (CI) engines offer superior power density and fuel flexibility. However, achieving consistent and optimal combustion phasing across a wide range of operating conditions remains a major challenge, particularly in the presence of modeling uncertainties. This paper presents a novel, data-driven real-time uncertainty compensation framework for combustion control in multi-fuel CI engines. The proposed approach introduces a pseudo-engine speed that enables dynamic adaptation of control inputs in response to uncertainty affecting the engine. To model the underlying combustion process, a Gaussian Process Regression (GPR) model is first trained on available input-output data, capturing the nonlinear and fuel-dependent behavior across varying operating conditions. Control inputs are then synthesized through model inversion of the learned GPR surrogate and augmented with an uncertainty compensator designed to mitigate deviations caused by dynamic variations in operating conditions and model inaccuracies. This integrated control strategy allows for real-time input corrections within a finite number of combustion cycles. Theoretical analysis establishes finite-time convergence guarantees for the proposed controller. Simulation results demonstrate that the proposed method steers the combustion phasing to the desired value in real-time, providing a scalable and adaptive control solution for multi-fuel CI engine operation.

02.
Nature (Science) 2026-06-10

Gene ancestries reveal diverse microbial associations during eukaryogenesis

The origin of eukaryotes remains a central enigma in biology1. Continuing debates agree on the pivotal role of a symbiosis between an alphaproteobacterium and an Asgard archaeon2,3. However, the nature, timing and contributions of other potential bacterial partners4–6 and the role of interactions with viruses7–9 remain contentious. To address these questions, we used advanced phylogenomic approaches and comprehensive datasets spanning the known diversity of cellular life and viruses. Our analysis provided a revised reconstruction of the last eukaryotic common ancestor (LECA) proteome, in which we traced the phylogenetic origin of each protein family. We found compelling evidence for multiple waves of horizontal gene transfer from diverse bacterial donors, with some likely to have preceded mitochondrial endosymbiosis. We inferred plausible traits of the major donors and their functional contributions to the LECA. Our findings support a contribution of horizontal gene transfers to shaping the proteomes of pre-LECA ancestors and suggest a facilitating role of Nucleocytoviricota viruses. Taken together, our results suggest that ancient eukaryotes may have originated within complex microbial ecosystems through a succession of diverse associations that left a footprint of horizontally transferred genes. Phylogenomic reconstruction of the proteome of the last eukaryotic common ancestor sheds light on the origin of eukaryotes, indicating an important role of horizontal transfer of genes from diverse bacterial and viral donors.

03.
arXiv (CS.CV) 2026-06-16

Learned Image Compression for Vision-Language-Action Models

Vision-language-action (VLA) models increasingly rely on high-frequency multi-camera observations, making visual communication a major bottleneck for real-time robotic control in bandwidth-constrained or distributed deployment settings. Existing image and video codecs, however, are designed to preserve generic visual fidelity rather than the control performance of downstream VLA policies. In this work, we introduce SPARC (SPatially Adaptive Rate Control), a learned image compression framework tailored for VLA-driven robots. Our key observation is that the importance of visual information varies substantially across both camera views and spatial regions within an image. Based on this observation, SPARC employs a lightweight temporal mask selector that adaptively allocates bitrate over latent representations according to task relevance while leveraging temporal context. We further introduce a tilted rate loss that stabilizes training by reducing the tendency of entropy-based objectives to over-suppress rare yet task-critical visual patterns. Experiments on diverse robotic benchmarks, including RoboCasa365, VLABench, and LIBERO, show that SPARC consistently achieves stronger control performance than conventional image/video codecs and recent learned compression methods under the same bitrate budget. We additionally demonstrate real-world deployment benefits in remote-control settings, where our method substantially improves the bitrate-success tradeoff.

04.
arXiv (quant-ph) 2026-06-16

Generative modelling powered by room-temperature polariton condensates

arXiv:2606.15344v1 Announce Type: cross Abstract: Generative modelling requires efficient stochastic nonlinear transformations and physical platforms that can naturally realise them. We experimentally demonstrate that nonlinear optical systems operating in the strong light-matter coupling regime can serve as physical transformation layers for conditional generative modelling. Specifically, we develop a workflow in which room-temperature exciton-polariton condensates formed in organic dye microcavities act as a physical stochastic transform within a generative adversarial network and enable conditional digit-to-image translation. By using the nonlinear many-body dynamics and intrinsic stochasticity of polariton condensates, the workflow outperforms baseline approaches based on digitally injected perturbations. We find that polariton-enabled sampling via generative adversarial network (Polariton GAN) yields improved inception score, digit preservation accuracy and structural similarity compared with both digital sampling and laser-based systems. We further show that spatially correlated output variations can naturally regularise adversarial training and enhance output diversity. Our results establish polariton condensation as a new computational resource for generative modelling, opening a pathway towards physics-enhanced machine learning systems.

05.
arXiv (CS.AI) 2026-06-24

Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

arXiv:2605.21401v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents that make sequences of decisions over extended interactions in high-stakes domains. However, the behaviour of LLMs under sustained authority pressure is still an open question with direct implications for the safety of agentic pipelines. We ran a variation of Milgram's obedience experiment on 11 open-source LLMs and found that most models reached or approached the final shock level before refusing, across 8 conditions with 30 trials per model per condition. Model behaviour varies considerably in multiple aspects both across models and across trials of the same model. We found four main takeaways: (1) LLMs are subject to pressure and they comply despite explicitly expressing distress, just like human subjects did in the original experiment; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore the response format requirements, so the response is discarded by the orchestrator, which causes a retry that can result in compliance with the underlying request even when refusal was intended initially; (4) we hypothesise that there is a runaway low-level token pattern continuation attractor that might be contributing to obedience, overriding higher level processing of the situation's meaning and values.

06.
arXiv (CS.CL) 2026-06-15

MoDiCoL: A Modular Diagnostic Continual Learning Dataset for Robust Speech Recognition

Modern Automatic Speech Recognition (ASR) systems have made remarkable progress on standard benchmarks, yet performance gaps have emerged under real-world distribution shifts, caused by recording conditions, accents, speech impairments, and noise. Existing datasets and benchmarks typically isolate these factors, which overlooks their co-occurrence in real-world applications. In this paper, we argue that model robustness can be treated as a dynamic capability that continually develops, and we introduce MoDiCoL, a Modular Diagnostic Continual Learning dataset designed for controlled analysis of linguistic content, speaker characteristics, and acoustic environments. Furthermore, we propose a real-world-inspired continual learning curriculum to simulate incremental updates and study how robustness is acquired, transferred, and forgotten. We evaluate three continual learning strategies and provide detailed insights into robustness under evolving conditions.

07.
arXiv (math.PR) 2026-06-16

Super-Arrhenius relaxation of the triangular plaquette model in any dimension

arXiv:2606.16259v1 Announce Type: new Abstract: Consider the following plaquette model from statistical physics: a lamp lies at every vertex of the triangular lattice and a switch lies at every even vertex of the (bipartite) dual hexagonal lattice. Each switch toggles the three lamps on its face. The energy of a configuration is the number of ON lamps. For the Glauber dynamics associated with the Gibbs measure defined by this Hamiltonian at any inverse temperature $\beta>0$, we show that, in any dimension $d\ge 2$, the infinite volume relaxation time satisfies \[e^{\beta^2/C}/C \le T_{\mathrm{rel}}\le Ce^{e^{C\beta}}\] for some $C>0$. Our result entails that the Gibbs measure is unique. The $e^{\beta^2}$ scaling was conjectured by Newman and Moore in 1999 and matches the behaviour of supercritical rooted kinetically constrained models such as the East model, thus recovering fragile glass phenomenology in the absence of kinetic constraints. More precisely, we show that, on a torus of side length $2^k$, when $\beta\to\infty$ and $k/\beta\to0$, we have $T_{\mathrm{rel}}=e^{2\beta k(1+o(1))}$. Quite surprisingly, however, we also prove that, on non-periodic finite domains of size $n\le e^{\beta/C}$ for large $C>0$, we have the much larger asymptotics $\ln T_{\mathrm{rel}}=\beta n^{\Theta(1)}$. The main ingredients of the proofs are new results in extremal and enumerative combinatorics and rely on renormalisation ideas for the dynamics and its groundstates also known as the Ledrappier subshift. We note consequences of our results to geometric group theory (more precisely to the complexity of the word problem for the Baumslag finitely presented group) and to ergodic theory.

08.
arXiv (CS.AI) 2026-06-15

When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms

arXiv:2606.14200v1 Announce Type: new Abstract: Open platforms increasingly route tasks among heterogeneous LLM agents–differing in base model, scaffold, and tool stack–whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)–the trust to place in agent i for a task requiring skill k, rather than one score per agent–and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime–high agent heterogeneity, sparse per-skill evidence, and correlated skills–and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime–a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN–while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance–we quantify the trade-off.

09.
arXiv (CS.CV) 2026-06-12

MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models

World Action Models (WAMs) present a promising paradigm for robotic control via video prediction. However, current WAMs suffer from fundamental spatial bottlenecks: standard text inputs introduce referential ambiguity in cluttered scenes, while unstructured RGB predictions lack semantic grounding and remain biased by task-irrelevant backgrounds. To overcome these limitations, we introduce MaskWAM, an object-centric world-action model. By jointly integrating masks as both explicit inputs and predictions via a unified Mixture of Transformers (MoT), MaskWAM unlocks robust policy generalization. This design provides two key benefits: (1) predicting future masks yields object-centric semantic supervision that suppresses visual noise, significantly enhancing even standard text-conditioned WAMs; and (2) coupling this predictive supervision with first-frame visual prompts, such as target object masks, establishes a precise spatial anchor that substantially reduces language ambiguity. Crucially, as WAMs are inherently vision-driven architectures, direct mask conditioning yields substantially stronger guidance than text alone, establishing a precise and robust paradigm for manipulating unseen objects. Evaluations on LIBERO, RoboTwin, and real-world tasks demonstrate that MaskWAM significantly outperforms baselines in both language-clear and language-ambiguous tasks.

10.
arXiv (quant-ph) 2026-06-16

Classical Explanations in (and of) General Probabilistic Theories

arXiv:2603.05627v2 Announce Type: replace Abstract: We introduce a notion of the ``explanation" of one (generalized) probabilistic model by another as particular kind of span in the category $\Prob$ of probabilistic models and morphisms. We show that explanations compose under a standard pullback construction (notwithstanding that $\Prob$ does not support arbitrary pullbacks). We then show that every locally-finite probabilistic model has a canonical, sharp classical explanation. The construction is functorial, so every locally-finite probabilistic theory has a canonical, sharp classical (though of course, usually non-local) representation.

11.
arXiv (quant-ph) 2026-06-25

Many-Body Second Order Green's Function Theory for Ab Initio Molecular Quantum Electrodynamics

arXiv:2606.26076v1 Announce Type: new Abstract: In this work, we develop two many-body quantum electrodynamic methods to calculate the ground-state energies of strongly coupled light-matter molecular systems. Specifically, we extend the second-order many-body Green's function theory (GF2) for electronic systems to incorporate electron-boson couplings. We employ two ansätze to treat the bosonic part of the system, namely the coherent-state (CS) and Lang-Firsov (LF) transformed vacuum state. These are combined with the GF2 method to construct two new approaches, which we refer to as CS-GF2 and LF-GF2. We benchmark CS- and LF-GF2 by studying various molecular systems inside an optical cavity. We investigate $\mathrm{H}_2$ and $\mathrm{LiH}$ potential energy surfaces, keto-eneol tautomerization energy barrier, van-der Waals interactions between two $\mathrm{H_2}$ molecules and the torsional potential energy surface of the ethylene molecule, $\mathrm{C_2H_4}$. Both methods provide highly accurate energies, with only modest additional improvement observed in LF-GF2.

12.
arXiv (CS.AI) 2026-06-16

Localizing Credit at the Divergence: Path-Conditioned Self-Distillation for LLM Reasoning

arXiv:2606.15576v1 Announce Type: cross Abstract: Reinforcement learning from verifiable rewards assigns a single scalar to each rollout, leaving token-level credit assignment underspecified in long reasoning traces. On-policy self-distillation addresses this by letting the same model act as a teacher conditioned on privileged information, producing a dense per-token signal. But the common choice of a ground-truth answer is only an endpoint cue: on terse-answer tasks, the teacher falls silent at the intermediate positions where path-level guidance matters most. We propose Hindsight Self-Distillation (HSD), which conditions the teacher on a successful peer rollout drawn from the current training group. Such a peer is an exact sample from the success-conditioned policy, requiring no additional sampled rollouts. By providing a full successful continuation rather than only the final answer, the resulting credit signal concentrates at the divergence position between a failed rollout and a successful peer. Across Qwen3-8B and Qwen3-32B on math and code benchmarks, HSD obtains the best result against GRPO variants and on-policy distillation baselines, with the largest gains on terse-answer tasks such as AIME.

13.
arXiv (CS.AI) 2026-06-15

Quantile-Free Uncertainty Quantification in Graph Neural Networks

arXiv:2605.04847v2 Announce Type: replace-cross Abstract: Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice, and achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show QpiGNN achieves average 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.

14.
arXiv (CS.LG) 2026-06-16

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

arXiv:2602.20427v2 Announce Type: replace Abstract: Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional ones like exact solvers or heuristics with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose a novel differentiable framework, GauS, that models operator scheduling as a stochastic relaxation using Gaussian distributions, which fully utilize modern parallel computing devices like GPUs. By representing schedules as continuous Gaussian variables, we successfully capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is highly flexible to represent various objectives and constraints, which provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that Gaus achieves Pareto-optimal results.

15.
arXiv (CS.CV) 2026-06-12

CD-RCM: Generalizable Continuous-Depth Novel View Synthesis for Reflectance Confocal Microscopy

Reflectance confocal microscopy (RCM) provides noninvasive, cellular-resolution "optical biopsies" of human skin in vivo by acquiring en-face images at successive depths, forming a sparse z-stack. Due to optical limitations, these stacks are anisotropic 3D volumes with lateral resolution (0.5 $\mu$m) $\sim$6 times higher compared to axial resolution, which is defined by the optical sectioning (3 $\mu$m), limiting the interpretation of tissue. Our goal is to provide continuous-depth visualization by interpolating intermediate sections and making the 3D volume isotropic. Such a representation permits arbitrary-direction sectioning, including histopathology-like cross-sectional examination, without requiring per-patient optimization. To that end, we introduce the first RCM-specific novel-view synthesis (NVS) approach, CD-RCM, a feedforward model that predicts realistic, unseen depths from sparsely sampled RCM stacks. Classical neural rendering methods focus on reconstruction from surface-level multi-view observations. In contrast to surface-level camera views, RCM can acquire optically sectioned en-face images of tissue beyond the surface up to 200 $\mu$m. However, during visualization of the RCM stacks, observations of the shallower sections (towards the surface) obscure the deeper ones. This unique axial imaging geometry and layer-dependent anatomical organization motivated our development of a tailored architectural and training framework that explicitly accounts for RCM's depth-resolved, occlusive imaging physics. Experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis with sub-second inference time.

16.
arXiv (CS.LG) 2026-06-24

Zero-Shot Neural Priors for Generalizable Cross-Subject and Cross-Task EEG Decoding

arXiv:2606.23706v1 Announce Type: cross Abstract: The development of generalizable electroencephalography (EEG) decoding models is essential for robust brain-computer interfaces (BCI) and objective neural biomarkers in mental health. Conventional approaches have been hindered by poor cross-subject and cross-task generalization, owing to high inter-subject variability and non-stationary neural signals. We address this challenge with a zero-shot cross-subject decoding framework on the large-scale Healthy Brain Network dataset, benchmarking a convolutional neural network baseline, a hybrid LSTM, and a Transformer-based foundation model. To adapt the Transformer for regression while averting catastrophic forgetting, we propose a novel progressive unfreezing strategy. The baseline yielded an nRMSE of 0.9991, whereas our fine-tuned Transformer achieved 0.9799 on unseen subjects. This work advances scalable, calibration-free EEG decoding for computational psychiatry and behavioral prediction.

17.
bioRxiv (Bioinfo) 2026-06-16

Phylogenetic tree inference using generative models

Accurate inference of phylogenetic trees is fundamental to evolutionary biology, yet existing methods rely on complex pipelines involving multiple sequence alignment, explicit evolutionary models, and computationally intensive tree search procedures. Here, we present BetaInfer, a generative framework that reformulates phylogenetic tree inference as a sequence transduction problem. BetaInfer leverages hybrid transformer-based architectures to directly map sets of unaligned sequences to phylogenetic trees represented in Newick format. Trained on large-scale simulated evolutionary data with known ground truth, BetaInfer learns to capture complex evolutionary signals directly from sequence data. Ensemble-based generation of multiple candidate trees further improves robustness, reducing reconstruction error by over 30% relative to single predictions. Across extensive evaluations on both simulated and empirical datasets, BetaInfer achieves competitive performance relative to state-of-the-art phylogenetic pipelines, matching, and in some cases exceeding, the accuracy of established likelihood-based and distance-based methods under a wide range of conditions. Interpretability analyses reveal that BetaInfer leverages internal pairwise-distance computations to synthesize evolutionary relationships into an integrated, global representation that supports direct tree generation. Together, these results demonstrate that generative models can serve as a viable and scalable alternative to standard phylogenetic pipelines.

18.
arXiv (quant-ph) 2026-06-24

Erasure cost of a quantum process: A thermodynamic meaning of the dynamical min-entropy

arXiv:2506.05307v5 Announce Type: replace Abstract: The erasure of information is fundamentally an irreversible logical operation, carrying profound consequences for the energetics of computation and information processing. We investigate the thermodynamic costs associated with erasing (and preparing) quantum processes. Specifically, we analyze an arbitrary bipartite unitary gate acting on logical and ancillary input-output systems, where the ancillary input is always initialized in the ground state. We focus on the adversarial erasure cost of the reduced dynamics - that is, the minimal thermodynamic work cost to erase the logical output of the gate for any logical input, assuming full access to the ancilla but no access to any purifying reference of the logical input state. We determine that this adversarial erasure cost is directly proportional to the negative min-entropy of the reduced dynamics, thereby giving the dynamical min-entropy a clear operational meaning. The dynamical min-entropy can take positive and negative values, depending on the underlying quantum dynamics. The negative value of the erasure cost implies that the extraction of thermodynamic work is possible instead of its consumption during the process. A key foundation of this result is the quantum process decoupling theorem, which quantitatively relates the decoupling ability of a process with its min-entropy. This insight bridges thermodynamics, information theory, and the fundamental limits of quantum computation.

19.
arXiv (math.PR) 2026-06-19

Optimal Sparsification of Gaussian Processes

arXiv:2606.19763v1 Announce Type: new Abstract: We prove an optimal dimension-free sparsification theorem for suprema of centered Gaussian processes. Given a bounded set $T\subseteq\mathbb{R}^n$, we show that the supremum of the canonical Gaussian process on $T$ can be $L^2$-approximated by the supremum of a shifted subprocess indexed by only $\exp(O(1/\varepsilon^2))$ points, with error at most $\varepsilon$ times the Gaussian width of $T$. In particular, the size of the approximating process is independent of both the ambient dimension and the cardinality of the original index set. This improves a recent sparsification theorem of De, Nadimpalli, O'Donnell, and Servedio (2026) by an exponential factor, and we show that the dependence on $\varepsilon$ is tight up to constants in the exponent. As consequences, we obtain an exponentially improved junta theorem for norms over Gaussian space and sharpen results on learning, property testing, and polyhedral approximation of convex sets under the Gaussian measure. The proof is based on an interpolation argument that combines Sudakov's minoration with the Brascamp–Lieb inequality.

20.
medRxiv (Medicine) 2026-06-22

GCH1 p.Ser80Asn Confers Risk for Parkinson's Disease in East Asian Populations

Introduction: GCH1 has been implicated in Parkinson's disease (PD), but its risks variants and associations are not well defined. Objectives: To investigate the clinical relevance and PD risk associated with the GCH1 p.Ser80Asn variant. Methods: We first identified a segregating GCH1 p.Ser80Asn variant in a Malaysian Chinese PD family via whole genome sequencing (WGS). We assessed its risk association using multi-ancestry WGS data from the Global Parkinson's Genetics Program (GP2) (n=22,372PD vs n=8,826Controls) and meta-analysis of East Asian (EAS) cohorts (n=4,712PD vs 38,733Controls). Clinico-demographic details of affected variant carriers were collated. Results: The GCH1 p.Ser80Asn variant was enriched in GP2 EAS PD populations (n=9/2,757; 0.33%) but not detected in other ancestries. Meta-analysis revealed increased PD risk in EAS populations (odds ratio:5.1; 95%CI:2.3-10.7; p=2.89x10-5). Affected carriers (mean age at onset:56.3+-12.5 years) had additional occurrence of dystonia, while dementia was rare. Conclusions: The GCH1 p.Ser80Asn variant is a rare, EAS-enriched risk variant for PD.

21.
medRxiv (Medicine) 2026-06-15

Active commuting, anxiety symptoms and mental wellbeing: a dose-response study

Climate change draws attention to the planetary health perspective in sport and exercise sciences, that is, to physical activity that supports both human wellbeing and environmental sustainability. Active commuting is a sustainable form of physical activity with well-established somatic health benefits. However, more knowledge is needed on its relationship with mental health. We examined dose-response associations between active commuting, anxiety symptoms, and mental wellbeing among Finnish adults, and whether green commuting environment moderates these relationships. We used data from the cross-sectional Environment and Health Survey collected in June-September 2023 in the ten largest cities in Finland. Employed participants with data on anxiety symptoms (Generalized Anxiety Disorder-7, GAD-7), mental wellbeing (World Health Organization-Five Well-Being Index, WHO-5), commuting profile over a year (mode, frequency, distance, and perceived greenness along the commute route), and sociodemographic and lifestyle factors were included (n=1,672; mean age 45.3 years; 53.8% women). Active commuting was defined as travelling the entire commute by walking or cycling (including e-biking) that was converted into approximated annual km/week and MET-h/week. We used linear and logistic regression with restricted cubic splines to evaluate dose-response associations, adjusted for key covariates. The role of perceived greenness was tested using an active commuting x commute greenness interaction term. We found no dose-response relationships between active commuting and anxiety symptoms or mental wellbeing in any of the models. No effect modification by commute greenness was observed. More research on how active commuting may support planetary health from a mental health perspective is needed.

22.
arXiv (CS.CV) 2026-06-17

Complex Layout Classification in the Wild: A Low-Resource Approach with Layout-Preserving Augmentations

Many digitized corpora suffer from low resources because annotations may be scarce, page scans are noisy and of poor resolution, or layouts are structurally complex in ways that negatively affect the quality of automatic transcription. Developing robust classification models for low-resource languages is inhibited by the lack of large-scale annotated data and by the frequent semantic complexity of page layouts. To this end, we have curated a complex-layout dataset, manually classified into eight distinct layout types based on their separator regions. To overcome data scarcity, we propose a novel training strategy in the form of a CNN-based classifier that employs strong, domain-aware augmentations to improve generalization. We utilize narrow anisotropic Gaussian masking to suppress incidental textual details while preserving essential separations, compelling the model to learn global geometric arrangements. Additionally, we implement reflection-induced label transformations to enrich the training distribution while maintaining label consistency across asymmetric categories. The results demonstrate that layout-specific augmentations can substantially improve page-level layout classification under severe annotation scarcity.

23.
arXiv (CS.AI) 2026-06-12

Reframing AI Loss of Control: What It Is, How to Have It, How to Lose It

arXiv:2606.12442v1 Announce Type: cross Abstract: At present, loss of control risks have gained much prominence in public discussion, particularly in relation to AI, with extensive discourse present among academics, frontier labs, and even governments. However, in the existing literature, the concept seems to rest on surprisingly weak foundations, where even those that discuss loss of control extensively do not first establish what control is and what exactly is being lost. Our paper aims to address these gaps. We establish a working definition of control by anchoring it to the "setting and getting of goals". Then, we discuss various aspects of control, built on foundational concepts from related fields like cybernetics, management control, and control theory. This includes who (or what) can be in control, and the things they require to be in control, such as the ability to set goals, having a functional control loop, having requisite variety, and having sufficient goal alignment. Once a framework for control is established, we then discuss how control can be lost, how AIs can contribute to such loss of control, and offer relevant recommendations for how one can maintain control. One interesting consequence of our work is that humanity, as individuals and as groups, can lose varying degrees of control as a result of AI behaviour that is far below the level of superintelligence; the potential for loss of control scenarios (as we define them) already exist, and have existed for a long time.

24.
arXiv (CS.CV) 2026-06-15

PMOF: A Dataset and Benchmark for Passenger Monitoring Using Overhead Fisheye Cameras

Autonomous staff-free public transport requires reliable in-vehicle passenger monitoring. However, perception inside moving vehicles is challenged by confined spaces, variable illumination, motion-induced background variation, occlusion, and limited viewpoints. To mitigate these spatial constraints, ceiling-mounted fisheye cameras provide full-scene coverage from a single viewpoint. Yet existing public overhead fisheye datasets are recorded in static environments and do not capture the domain shift introduced by vehicle motion. To fill this gap, we introduce PMOF, Passenger Monitoring using Overhead Fisheye cameras, the first public dataset of top-view fisheye imagery captured inside a moving vehicle, comprising over 19k manually annotated frames. PMOF provides rotated bounding boxes, tracking identifiers, and action labels, supporting object detection, tracking, and action recognition. We benchmark PMOF using YOLO26m-obb models fine-tuned under multiple dataset configurations that combine PMOF with existing overhead fisheye datasets. Cross-domain fine-tuning with custom rotation-aware augmentation achieves 94.8% AP50 on PMOF and 96.5% AP50 on an unseen overhead fisheye dataset from a different domain. Our results highlight the domain gap between static and moving environments and show that incorporating PMOF improves detection performance and advances generalization beyond passenger monitoring to broader fisheye-based person detection tasks. The dataset and code are available at https://swermuth.github.io/pmof/.

25.
arXiv (CS.LG) 2026-06-17

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

arXiv:2606.18066v1 Announce Type: new Abstract: We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20$\times$ reduction in compute.