Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

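AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.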

01.
arXiv (CS.LG) 2026-06-11

Tree-Structured Orthonormal Decomposition of the Aitchison Simplex

arXiv:2606.11646v1 Announce Type: new Abstract: Compositional data – vectors encoding relative proportions – arise across scientific domains, including ecology, geochemistry, and genomics. The features in these data often come with known hierarchical structure (e.g., taxonomies, phylogenies, ontologies), yet existing methods either ignore this structure, discard the intrinsic Aitchison geometry, are designed for binary trees, or yield incomplete coordinate systems. We describe PolyILR, a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. Our construction defines a weighted local geometry at each internal node capturing full branching structure, then lifts these to a global orthonormal basis where every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiscale tree resolution. We also establish a novel theoretical connection to softmax classifiers, suggesting possible applications to probabilistic modeling.
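
For readers unfamiliar with Aitchison geometry, the following is a minimal sketch of the classical centered log-ratio (CLR) transform and the isometric log-ratio (ILR) balances for a binary tree. It is not PolyILR, which the abstract says handles any tree topology; the four-part composition and the tree `((0, 1), (2, 3))` are hypothetical, chosen only for illustration.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def balance(x, left, right):
    """Classical ILR balance between two disjoint groups of part indices."""
    r, s = len(left), len(right)
    mean_log_left = sum(math.log(x[i]) for i in left) / r
    mean_log_right = sum(math.log(x[i]) for i in right) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_left - mean_log_right)

# Hypothetical composition and binary tree ((0, 1), (2, 3)).
x = closure([10.0, 20.0, 30.0, 40.0])
coords = [
    balance(x, [0, 1], [2, 3]),  # root split
    balance(x, [0], [1]),        # left subtree
    balance(x, [2], [3]),        # right subtree
]
```

Each balance is attached to one internal node of the tree, and the three balances together preserve the Aitchison norm of the composition (their squared sum equals the squared norm of the CLR vector), which is the sense in which such coordinates are orthonormal.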

02.
arXiv (CS.AI) 2026-06-19

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

arXiv:2502.06866v3 Announce Type: replace-cross Abstract: The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.

03.
arXiv (CS.CV) 2026-06-15

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

Creative image editing tools, such as Photoshop's Remove or Generative Fill buttons, are central to everyday customer use and account for a major share of traffic in Photoshop and Lightroom. However, current generative AI models face significant latency challenges, which become even more pronounced when transitioning from convolution-based U-Nets to Diffusion Transformers (DiTs). In our evaluation on hundreds of representative image editing samples spanning a wide range of mask ratios, the DiT module alone accounts for an average of 73% of the total model latency, even after being distilled from 50 timesteps down to 8 timesteps. To tackle this challenge, we propose HiLo-Token, an input-adaptive token compression framework that allocates more token budget to high-frequency, rich-context regions while assigning fewer tokens to low-frequency areas. Specifically, for the editing region specified by the user mask, we retain all tokens within a dilated mask to preserve strong locality and contextual relevance. Outside the editing region, we introduce a simple yet effective high-frequency token selection strategy based on spatial frequency to capture important local details, while using tokens from a 16x downsampled image to represent low-frequency components and preserve the blurry but global structure. Extensive experiments on production-level evaluation data validate the effectiveness of the proposed method, achieving 3.13x, 2.59x, and 1.67x DiT speedups on A100-80GB for image editing tasks across small, medium, and large mask ratio categories with average ratios of 6.38%, 15.92%, and 35.36%, respectively, without any regression in generation quality.
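
The selection rule described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the square dilation neighbourhood, the per-token `freq_score` grid, the fixed `budget`, and all function names are assumptions, and the low-frequency tokens taken from the downsampled image are omitted.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2D token mask with a square neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di in range(-radius, radius + 1):
                    for dj in range(-radius, radius + 1):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            out[ni][nj] = True
    return out

def select_tokens(mask, freq_score, budget, radius=1):
    """Keep every token in the dilated mask plus the `budget`
    highest-scoring tokens outside it (hypothetical scoring)."""
    dilated = dilate(mask, radius)
    h, w = len(mask), len(mask[0])
    inside = [(i, j) for i in range(h) for j in range(w) if dilated[i][j]]
    outside = [(i, j) for i in range(h) for j in range(w) if not dilated[i][j]]
    outside.sort(key=lambda p: freq_score[p[0]][p[1]], reverse=True)
    return sorted(inside + outside[:budget])

# Hypothetical 4x4 token grid: one masked token and two textured tokens.
mask = [[False] * 4 for _ in range(4)]
mask[0][0] = True
freq_score = [[0.0] * 4 for _ in range(4)]
freq_score[3][3] = 0.9
freq_score[2][3] = 0.5
kept = select_tokens(mask, freq_score, budget=1)
```

With a budget of one, the toy keeps the four tokens of the dilated mask and the single highest-scoring token outside it, so 5 of the 16 tokens are retained.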

04.
arXiv (CS.AI) 2026-06-18

The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot

arXiv:2606.19197v1 Announce Type: cross Abstract: Abduction is a central approach to explain missing entailments from a knowledge base by providing a hypothesis, that would, if added to the knowledge base, make the missing entailment become true. Abduction under repair semantics has recently been investigated in detail, where several desirable properties and optimality criteria were considered, such as signature-restrictions and minimality in size and of introduced conflicts. Naturally, hypotheses that satisfy more than one of these properties or combine a property with an optimality criterion would be even more desirable for applications. So far, such hypotheses have not been investigated in the literature. In the present paper, we consider the ABox abduction problem for hypotheses satisfying more than one property or additional optimality criteria, for EL_bot under brave and AR semantics. Our main observation is that often requiring additional properties for hypotheses does not lead to an increase of complexity.

05.
arXiv (CS.AI) 2026-06-11

Towards an Inferentialist Account of Information Through Proof-theoretic Semantics

arXiv:2605.05368v5 Announce Type: replace-cross Abstract: Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting components. First, conceptual analysis: the metaphysics of information. Dretske expressed the key concepts of information in terms of intentionality, truth, and transmissibility. We replace truth with inferability, and trace the consequences of this replacement. Second, logic: proof-theoretic semantics (P-tS) provides a mathematical-logical realization of inferentialist reasoning. Using P-tS, we develop the first steps towards a mathematical-logical theory of an inferentialist primitive unit of information, the 'inferon'. This proof-theoretic approach counterpoints the model-theoretic view of information articulated in situation theory. Furthermore, we argue that it facilitates addressing all three components of van Benthem and Martinez's categorization of the understandings of information, as range, as correlation, and as code. Our focus is on information-as-correlation. Third, systems: the P-tS tools we develop provide the basis for a mathematical account of distributed systems modelling – a key tool from informatics for understanding the organization of information processing systems. This yields a reasoning-based theory of information flow in models of distributed systems. Overall, we seek to give a conceptually rigorous mathematical-logical account of information and its role within informatics, grounded in inference and reasoning.

06.
arXiv (quant-ph) 2026-06-11

Towards the implementation of a quantum classifier

arXiv:2606.10150v2 Announce Type: replace Abstract: In this work, we investigate the use of a quantum circuit as a binary classification model in the context of quantum machine learning. We call this model a binary quantum classifier. First, we describe fundamental concepts of quantum computing and introduce the computational tool used: Qibo, an open-source framework for efficient quantum simulations and quantum hardware control. Then, we describe how to design a binary quantum classifier for the classification of images and small arrays of variables by showing how to input data in the circuit, defining a quantum circuit model Ansatz with trainable parameters and a loss function, and implementing multiple minimizers. We test our quantum classifier with two data sets. The first one is the MNIST data set, which is composed of handwritten digits (reduced to only handwritten zeros and handwritten ones for binary classification). We study the behavior of different minimizers by increasing the number of layers of the Ansatz. The second data set represents two different high energy collisions that can occur at colliders such as the LHC (CERN). Due to in-time proton-proton interactions known as pile-up, we distinguish two different data sets: "without pile-up" and "with pile-up". These collisions can be represented by images of size 32x32 or by six high-level variables that we call features. By increasing the size of the training data set and the number of layers of the Ansatz, we search for the best minimizer. Splitting the data set into a training set and a test set, we compute the ROC curve, AUC score, confusion matrices, and test set accuracy. For "with pile-up" images, we compare the results obtained with the quantum classifier with those of a small convolutional neural network. We conclude that it is possible to build a binary quantum classifier with a quantum circuit and we highlight its performance and limitations in comparison with classical technologies.

07.
arXiv (quant-ph) 2026-06-16

Sharp Transitions for Subsystem Complexity

arXiv:2510.18832v2 Announce Type: replace-cross Abstract: The circuit complexity of time-evolved pure quantum states grows linearly in time for an exponentially long time. This behavior has been proven in certain models, is conjectured to hold for generic quantum many-body systems, and is believed to be dual to the long-time growth of black hole interiors in AdS/CFT. Achieving a similar understanding for mixed states remains an important problem. In this work, we study the circuit complexity of time-evolved subsystems of pure quantum states. We find that for greater-than-half subsystem sizes, the complexity grows linearly in time for an exponentially long time, similarly to that of the full state. However, for less-than-half subsystem sizes, the complexity rises and then falls, returning to low complexity as the subsystem equilibrates. Notably, the transition between these two regimes occurs sharply at half system size. We use holographic duality to map out this picture of subsystem complexity dynamics and rigorously prove the existence of the sharp transition in random quantum circuits. Furthermore, we use holography to predict features of complexity growth at finite temperature that lie beyond the reach of techniques based on random quantum circuits. In particular, at finite temperature, we argue for an additional sharp transition at a critical less-than-half subsystem size. Below this critical value, the subsystem complexity saturates nearly instantaneously rather than exhibiting a rise and fall. This novel phenomenon, as well as an analogous transition above half system size, provides a target for future studies based on rigorous methods.

08.
medRxiv (Medicine) 2026-06-22

Maternal-Fetal immune networks and viral signatures in the healthy amniotic cavity

The intrauterine environment has traditionally been viewed as a privileged site protected by the placental barrier. However, emerging evidence suggests that early in utero microbial exposure may prime the developing fetal immune system. Here, using target-enriched metagenomics and high-dimensional proteomics, we characterized the intra-amniotic viral landscape and immune networks in 114 healthy pregnancies including both normal and anomalous fetuses. We identify a sparse yet heterogeneous human viral signature in 26% of samples, predominantly composed of Herpesviridae, Polyomaviridae, and Picornaviridae. Although viral read abundance was associated with fetal abnormalities, viral detection generally did not induce overt inflammatory activation, supporting a state of immune homeostasis within the amniotic cavity. Instead, viral presence was associated with subtle and selective immune modulation, including altered inducible antimicrobial peptide expression (HBD-2 and HBD-3), coupled with an attenuation of regulatory cytokines. Our results further reveal that the amniotic immune environment is primarily governed by gestational age, transitioning from a Th1-predominant "alert" phase to innate-readiness preceding parturition. These findings suggest that fragments of viral genetic material within the amniotic cavity may contribute to fetal immune instruction without triggering overt inflammation, providing a foundational framework for understanding how "silent" viral exposure during gestation influences the developmental origins of neonatal immunity.

09.
arXiv (CS.AI) 2026-06-17

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

arXiv:2501.00826v3 Announce Type: replace-cross Abstract: Cryptocurrency portfolio management requires the fusion of heterogeneous multi-modal signals, including structured price and on-chain time series, unstructured news text, and technical indicators, under high-volatility and real-time constraints. While deep learning approaches show predictive capability, their opacity limits practical adoption, and single large language model (LLM) agents struggle to process the breadth of modality-specific inputs needed for robust decision-making. We propose a multi-agent system (MAS) framework in which three modality-specialised agents, a Crypto Agent for market dynamics, a News Agent for weekly news sentiment, and a Trading Agent for signal fusion and portfolio execution, decompose the task across three communication architectures: hierarchical, collaborative, and debate. We evaluate four capability configurations: zero-shot, chain-of-thought (CoT), retrieval-augmented generation (RAG), and skill-augmented. In a 52-week backtest over calendar year 2025 across the top 15 L1 blockchain native cryptocurrencies by market capitalisation as of January 2025, the best configuration, Hierarchical (Skill), achieves a cumulative return of 133.52% and a Sharpe ratio of 1.502, outperforming single-agent variants, passive benchmarks, and deep learning baselines. An ablation study identifies the Crypto Agent as the most critical component, with its removal reducing cumulative return by 42.57 percentage points. A cross-model comparison further shows that MAS outperforms the single-agent baseline under GPT-4o, GPT-5, and Claude Sonnet 4.5, suggesting that the benefit of multi-agent coordination is model-agnostic. Unlike black-box deep learning models, every portfolio decision is traceable to explicit agent reasoning, offering an interpretable and effective approach to multi-modal cryptocurrency portfolio management.
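
The two headline metrics can be illustrated with a short sketch. The abstract does not state how the Sharpe ratio was computed, so the weekly sampling, the zero risk-free rate, the sample standard deviation, and the annualisation by the square root of 52 are all assumptions, as are the function names and the toy return series.

```python
import math

def cumulative_return(weekly_returns):
    """Compound simple weekly returns into one cumulative return."""
    growth = 1.0
    for r in weekly_returns:
        growth *= 1.0 + r
    return growth - 1.0

def annualised_sharpe(weekly_returns, weekly_risk_free=0.0, periods_per_year=52):
    """Mean excess return over its sample standard deviation,
    scaled by sqrt(periods_per_year) (assumed convention)."""
    excess = [r - weekly_risk_free for r in weekly_returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

# Toy series, not data from the paper: +10% then -10% compounds to -1%.
toy_cumulative = cumulative_return([0.10, -0.10])
toy_sharpe = annualised_sharpe([0.02, 0.00])
```

The toy series shows why cumulative return compounds rather than sums: a 10% gain followed by a 10% loss leaves the portfolio 1% below its starting value.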

10.
arXiv (CS.LG) 2026-06-19

A Differentiable Composite Approximation Framework for Autonomous Underwater Vehicle Maneuvering Modeling from Sea-Trial Data

arXiv:2606.19711v1 Announce Type: cross Abstract: Field-based modeling from onboard measurements can produce autonomous underwater vehicle (AUV) maneuvering models that reflect real operating characteristics. From an approximation perspective, conventional maneuvering models use predefined constraint polynomial bases, whereas data-driven models use data-adaptive bases. Motivated by this basis-function view, this paper presents a differentiable composite-approximation formulation, in which the polynomial-basis component and the data-adaptive basis component are treated as differentiable parts of a single predictor and calibrated jointly. A gradient-based co-calibration method is developed for full-scale AUV maneuvering prediction, where a sensitivity-aware mechanism regulates bounded polynomial updates while the neural residual captures remaining nonlinear discrepancies under a shared prediction objective. To account for ocean-current effects in field data, a turning-motion-based current estimation and compensation procedure is incorporated to construct current-compensated learning targets for training and rollout. The framework is evaluated using sea-trial data collected from a 7-meter AUV under multiple maneuvering conditions. Results show that the proposed method improves recursive trajectory and velocity prediction compared with polynomial-only, neural-only, and frozen-prior hybrid baselines, demonstrating its applicability to field-data-based AUV maneuvering modeling.

11.
arXiv (CS.CV) 2026-06-11

Spatially Selective Self-Training for Unsupervised Building Change Detection

Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision. SST-CD uses temporal discrepancies as candidate pseudo labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines.

12.
arXiv (CS.CL) 2026-06-19

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Mean opinion score (MOS) prediction models are widely used as proxy metrics in text-to-speech (TTS) research, yet their ability to capture quality differences beyond acoustic fidelity remains unclear. We investigate this via controlled perturbations on speech: acoustic degradation, prosodic errors, and manipulation of speaker-specific characteristics such as pitch and speaking rate. We obtained MOS predictions for these speech samples from both human listeners and the model, and analyzed the differences in their perceptual characteristics. Results show that most models track acoustic degradation well, while all are insensitive to prosodic errors despite large subjective score drops. For speaker characteristics, models exhibit a double dissociation: strong mean fundamental frequency (F0) biases absent in human ratings, yet insensitivity to speaking rate and F0 variability that humans notice. These findings highlight limitations of scalar MOS prediction beyond acoustic fidelity.

13.
arXiv (CS.CV) 2026-06-12

Bridging Modal Isolation in Interleaved Thinking: Supervising Modality Transitions via Stepwise Reinforcement

Interleaved thinking, where a unified multimodal model alternates between textual reasoning and visual generation, has shown promise on spatial and physical tasks. However, in complex long-chain scenarios, we identify a fundamental failure mode: generated images diverge from the textual context while subsequent text ignores the visual evidence, causing the two modalities to alternate without genuinely informing each other. We term this Modal Isolation and attribute it to compounding information loss at modality boundaries. We decompose each reasoning cycle into atomic operations and define modality transition loss, quantifying cross-modal hallucination (text-to-image) and visual utilization deficit (image-to-text) at each boundary. We propose MoTiF (Modality Transition Fidelity), a two-stage training framework that directly optimizes these transitions: Reflective SFT trains the model to detect and recover from erroneous visual outputs; Flow-GRPO improves image generation fidelity via reinforcement learning. All training signals in MoTiF derive from transition-level fidelity rather than end-task accuracy. Across four visual puzzle benchmarks, this transition-level supervision substantially improves both cross-modal coherence and final task accuracy. The results demonstrate that effective interleaved reasoning requires explicit structural supervision at modality boundaries, not merely scaling or end-task optimization.

14.
bioRxiv (Bioinfo) 2026-06-17

Beyond phylogeny: Genome-wide DNA sequence patterns suggest DNA physical properties associated with thermal adaptation in extremophile microbes

Temperature is a fundamental constraint on biological systems, yet how it is reflected in genome sequence organization remains unclear. Here, we show that genome-wide distributions of short DNA sequences contain a robust signal of thermal adaptation that is largely independent of phylogeny. Using Structural Topic Modelling (STM), a machine-learning approach for identifying groups of co-occurring sequence motifs, we analyze canonical 6-mer and 9-mer frequency profiles of bacterial and archaeal genome proxies (randomly sampled genomic regions) and identify motif families systematically associated with thermophiles and psychrophiles. In bacterial thermophiles, the identified motif families are dominated by highly specific, overrepresented and co-occurring C- and G-stacked hexamers, and a distinct family of CG-periodic hexamers recurring across multiple temperature comparisons. In contrast, bacterial psychrophile-associated motifs are dominated by low-complexity A-, T-, and AT-run hexamers. Thermophilic archaea generally exhibit a distinct CTAG-centred hexamer family, suggesting that different domains may adapt to similar environmental constraints through different sequence-level solutions. However, this domain-level contrast is not absolute: in a targeted analysis of two thermophilic bacterium–archaeon pairs, we find unusually similar frequencies of all the STM-identified thermophile-associated hexamer families, suggesting that shared high-temperature environments can, in specific cases, partially override phylogenetic divergence. Notably, the identified motif families constitute only a small and highly selective subset of the vast space of possible G+C-rich or A+T-rich sequences. This indicates that thermal adaptation is associated with specific sequence architectures rather than broad shifts in nucleotide composition. 
Accordingly, the observed signal cannot be explained by overall base composition alone, but instead arises from structured combinations and positional arrangements of nucleotides within short sequence contexts. Related motif families are recovered at both k=6 and k=9, indicating that the signal reflects systematic shifts in genome-wide sequence organization rather than isolated sequence motifs. These patterns are consistent with known sequence-dependent DNA physical properties documented in biochemical and biophysical studies, including differences in base-stacking interactions and conformational flexibility. Together, our results suggest that genome-wide sequence organization reflects sequence-dependent DNA physical properties associated with thermal adaptation, revealing a previously underappreciated physical layer of genomic information beyond phylogenetic history.

15.
arXiv (quant-ph) 2026-06-19

Steady-state entanglement of spin qubits mediated by nonreciprocal and chiral magnons

arXiv:2509.13094v3 Announce Type: replace Abstract: We propose a hybrid quantum system in which a magnet supporting non-reciprocal magnons, chiral magnons, or both mediates the dissipative and unidirectional coupling of spin qubits. By driving the qubits, the steady state of this qubit-qubit coupling scheme becomes the maximally entangled Bell state. We devise a protocol where the system converges to this entangled state and benchmark it including qubit decay and dephasing. The protocol is numerically tested on a hybrid system consisting of nitrogen-vacancy (NV) centers coupled to magnon surface modes of an yttrium iron garnet (YIG) film. We show that the dephasing time of the NV centers forms the bottleneck for achieving the entanglement of NV centers separated by a distance within the magnon coherence length. Our findings identify the key technological requirements and demonstrate a viable route toward steady-state entanglement of solid-state spins over distances of several microns using magnonic quantum networks, expanding the toolbox of magnonics for quantum information purposes.

16.
arXiv (CS.CL) 2026-06-16

DoubtProbe: Black-Box Jailbreak Defense via Structural Verification and Semantic Auditing

As large language models (LLMs) are increasingly deployed in user-facing systems, black-box jailbreak defense has become an important practical problem. Existing defenses often rely on known-attack coverage, prompt-level semantic judgment, or local runtime control, yet these paths can become unstable under evolving prompt packaging, expression rewriting, and structure manipulation. We observe that many black-box jailbreaks do not remove the harmful goal, but reorganize the information needed to express and execute it, thereby evading safety alignment while remaining recoverable during generation. Motivated by this observation, we propose DoubtProbe, a dual-branch inference-time defense framework that combines structural verification with semantic auditing and formulates black-box jailbreak defense as consistency checking under controlled transformation. The structural branch extracts a structured representation from the original request, reconstructs the request under representation constraints, and detects information-preservation failures between the original and reconstructed requests; the semantic branch audits the original prompt directly. We evaluate DoubtProbe against representative black-box defenses on jailbreak and benign-request benchmarks, and further test backbone transfer from Qwen2.5-72B to Llama-3.1-70B. Results show that DoubtProbe achieves a stronger and more stable defense-utility trade-off: on Qwen2.5-72B, it reduces the JBB attack success rate from 0.293 to 0.100 and the CodeAttack attack success rate from 0.152 to 0.001, while maintaining false positive rates of 0.022 and 0.016 on AlpacaEval and OR-Bench; the same pattern remains stable on Llama-3.1-70B. These findings show that structural inconsistency signals provide a practical and generalizable basis for black-box jailbreak defense, especially when combined with semantic auditing.

17.
arXiv (CS.CV) 2026-06-17

Contactless Respiratory Monitoring on Heterogeneous Mobile Robots: A Multimodal Edge-Computing Framework

Respiratory-rate (RR) monitoring is a critical component of remote triage and victim assessment in emergency response, disaster recovery, and infectious-disease scenarios, where minimizing physical contact can reduce responder risk and improve operational safety. However, field deployment of contactless RR monitoring remains challenging due to variable illumination, posture changes, platform heterogeneity, and the impracticality of wearable sensors in hazardous environments. In this paper, we present a modality-adaptive contactless RR monitoring framework for heterogeneous mobile robots with onboard edge computing. The proposed system combines brightness-adaptive sensor selection across RGB, thermal, near-infrared (NIR), and low-light cameras, keypoint-guided chest ROI extraction for posture-robust monitoring, and a signal-quality-index (SQI)-based filtering mechanism for reliable respiratory estimation. We implement and evaluate the framework on three robotic platforms spanning quadruped and wheeled locomotion and multiple edge-computing architectures. Experiments conducted across diverse lighting conditions, subject poses, and robot-to-subject distances demonstrate that the framework generalizes across platforms without per-platform algorithmic retuning, while revealing modality-specific operational boundaries. RGB provides the broadest coverage up to 8m, NIR remains effective up to 6m, thermal is reliable only at short range, and low-light sensing supports monitoring in complete darkness up to 8m. Overall, the results demonstrate the feasibility of multimodal contactless RR monitoring on mobile robots and support its use as a foundation for autonomous triage and victim assessment in hazardous search-and-rescue settings.

18.
arXiv (CS.CL) 2026-06-16

PreLort: Prefix-Nested LoRA for Federated Fine-Tuning under Rank Heterogeneity

Federated fine-tuning of large language models using parameter-efficient methods such as LoRA enables privacy-preserving adaptation of foundation models. Heterogeneous hardware resources introduce challenges, as clients with different adapter ranks cannot be directly aggregated. While existing methods enable aggregation under heterogeneous ranks, they fail to control how information is distributed across rank dimensions, leading to suboptimal use of shared low-rank representations. Instead, we propose PreLort: a nested low-rank formulation for federated LoRA that organizes adapter dimensions into a prefix hierarchy. Our approach ensures that lower-rank dimensions encode task-relevant information, while higher-rank dimensions capture additional capacity. Building on this, we introduce (i) a segment-wise aggregation rule that averages only over clients contributing to each rank segment, avoiding dilution from zero-padded lower-rank clients, and (ii) a prefix-nested training strategy that optimizes each adapter under multiple rank truncations, encouraging useful signal to concentrate in low-rank prefix dimensions. Together, these components encourage a consistent low-rank prefix capturing the most task-relevant information, while higher-rank dimensions learn additional capacity. This allows low-rank clients to benefit from richer information contributed by higher-rank clients, as prefix dimensions are consistently learned and aggregated. Experiments demonstrate that our method consistently outperforms prior heterogeneous federated LoRA methods in accuracy and ROUGE-L, while achieving lower or comparable perplexity across multiple base models.
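
The segment-wise aggregation rule in (i) can be sketched as below, contrasted with a zero-padded average. This is a simplified illustration rather than the paper's implementation: each client adapter is reduced to a single matrix stored as a list of rank rows, clients are weighted uniformly, and the function names and toy values are hypothetical.

```python
def segment_wise_average(client_rows):
    """Average each rank row over only the clients that hold that row."""
    max_rank = max(len(rows) for rows in client_rows)
    aggregated = []
    for j in range(max_rank):
        contributors = [rows[j] for rows in client_rows if len(rows) > j]
        width = len(contributors[0])
        aggregated.append(
            [sum(row[c] for row in contributors) / len(contributors)
             for c in range(width)]
        )
    return aggregated

def zero_padded_average(client_rows):
    """Contrast case: treat missing rows as zeros and divide by all clients."""
    max_rank = max(len(rows) for rows in client_rows)
    width = len(client_rows[0][0])
    n_clients = len(client_rows)
    aggregated = []
    for j in range(max_rank):
        totals = [0.0] * width
        for rows in client_rows:
            if len(rows) > j:
                for c in range(width):
                    totals[c] += rows[j][c]
        aggregated.append([t / n_clients for t in totals])
    return aggregated

# Toy clients: one rank-1 adapter and one rank-2 adapter.
clients = [[[1.0, 1.0]], [[3.0, 3.0], [4.0, 4.0]]]
segment_result = segment_wise_average(clients)
padded_result = zero_padded_average(clients)
```

In the toy case the second rank row is held only by the rank-2 client. The segment-wise rule keeps it at `[4.0, 4.0]`, whereas zero-padding halves it to `[2.0, 2.0]`, which is the dilution the abstract refers to.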

19.
arXiv (CS.AI) 2026-06-11

Preregistration for Experiments with AI Agents

arXiv:2606.11217v1 Announce Type: cross Abstract: The proliferation of large language models (LLMs) and autonomous AI agents has given rise to a rapidly growing methodological paradigm: "in silico" behavioral experiments. Originally conceived as a way to use AI agents as proxies for human participants in studies of cognition, decision-making, and social dynamics, this approach has taken on new significance – as AI agents increasingly negotiate, transact, and make consequential decisions on behalf of people and organizations, understanding their behavior has become a research priority in its own right. While these experiments with AI agents offer unprecedented advantages in terms of scalability, cost efficiency, and experimental control, they also inherit, and in some cases amplify, methodological vulnerabilities that have long plagued human subjects research. To address these issues, this paper argues that preregistration practices – central to improving the credibility of human subjects experiments – should now be extended to experiments with AI agents. We systematically catalog the researcher degrees of freedom that experiments with AI agents introduce – model selection, prompt wording, settings, and outcome-contingent redesign, for example – and show how the low cost of iteration and lack of reporting norms make these choices both easy to exploit and difficult to detect. We propose a preregistration template tailored to experiments with AI agents and call on conferences, journals, and funding agencies to make preregistration standard practice for this emerging research paradigm.

20.
arXiv (CS.AI) 2026-06-15

An Agentic Retrieval Framework for Autonomous Context-Aware Data Quality Assessment

arXiv:2606.13692v1 Announce Type: cross Abstract: Data quality assessment is a critical prerequisite for effective data analytics and data-driven decision-making, yet it remains a challenging task due to the inherently context-dependent nature of data quality. Existing approaches often rely on static rules or manual assessment strategies, limiting their adaptability to diverse usage scenarios and constraining automation at scale. Recent advances in artificial intelligence, particularly large language models, offer new opportunities for automating data quality assessment, but raise concerns related to reliability, grounding, and execution safety. In this paper, we propose a unified agentic-retrieval framework for autonomous context-aware data quality assessment. The framework interprets natural-language descriptions of intended data usage, derives context-aware assessment strategies, and generates executable validation logic through a multi-agent workflow. To ensure operational reliability, the framework introduces a feasibility validation stage that evaluates the realism and executability of generated assessment specifications before execution, enabling iterative refinement when necessary. Accepted validation logic is executed deterministically to guarantee reproducible and auditable results. We implement the proposed framework as an end-to-end prototype and evaluate it across multiple usage scenarios applied to the same dataset. The results demonstrate that assessment outcomes adapt meaningfully to different intended uses, while feasibility-gated execution reduces unrealistic or non-executable rule generation. The proposed approach provides a practical foundation for deploying autonomous yet controlled data quality assessment in modern data-driven environments.
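
The feasibility-gated execution step described in this abstract might look roughly like the sketch below. The abstract does not specify the rule format or the validation criteria, so the rule dictionaries, the `CHECKS` registry, and the functions `is_feasible` and `run_gated` are all hypothetical. Here "feasible" is taken to mean only that a rule refers to an existing column and a supported check, which is likely much simpler than the framework's actual validation stage.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Dict[str, str]  # {"column": ..., "check": ...}; hypothetical spec format

# Deterministic checks that accepted rules are allowed to use (assumed set).
CHECKS: Dict[str, Callable[[object], bool]] = {
    "not_null": lambda value: value is not None,
    "non_negative": lambda value: isinstance(value, (int, float)) and value >= 0,
}


def is_feasible(rule: Rule, columns: List[str]) -> bool:
    """Accept a rule only if its column exists and its check is supported."""
    return rule.get("column") in columns and rule.get("check") in CHECKS


def run_gated(rules: List[Rule], rows: List[Row]) -> Tuple[Dict[str, int], List[Rule]]:
    """Execute feasible rules deterministically; return failure counts and rejects."""
    columns = list(rows[0].keys())
    accepted = [rule for rule in rules if is_feasible(rule, columns)]
    rejected = [rule for rule in rules if not is_feasible(rule, columns)]
    results: Dict[str, int] = {}
    for rule in accepted:
        check = CHECKS[rule["check"]]
        failures = sum(1 for row in rows if not check(row[rule["column"]]))
        results[f'{rule["column"]}:{rule["check"]}'] = failures
    return results, rejected


rows = [
    {"age": 34, "email": "a@example.org"},
    {"age": -1, "email": None},
]
rules = [
    {"column": "age", "check": "non_negative"},
    {"column": "email", "check": "not_null"},
    {"column": "salary", "check": "non_negative"},  # column does not exist
    {"column": "age", "check": "is_prime"},         # unsupported check
]

results, rejected = run_gated(rules, rows)
print(results)        # {'age:non_negative': 1, 'email:not_null': 1}
print(len(rejected))  # 2
```

In a fuller pipeline the rejected rules would presumably be sent back for refinement rather than dropped; that loop is omitted here.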

21.
medRxiv (Medicine) 2026-06-18

Web-based education on Metabolism and Obesity is associated with improved lifestyle and health behaviours among Brazilian school teachers

Background: Obesity is a major global public health challenge, and teachers play a critical role in school-based health promotion. This study examined the perceived impact of a web-based educational program on metabolism and obesity delivered to Brazilian school teachers. Methods: This analytical cross-sectional study included 217 teachers who responded to the evaluation questionnaire after attending the course between 2017 and 2022. Statistical analyses included logistic regression and chi-square tests. Findings: The course completion rate was 81.98%, substantially exceeding the 5-15% typical of global MOOCs. However, ethnic disparities were observed: White respondents were 4.95 times more likely to complete the course than Black respondents (p=0.00097) and Brown respondents were 3.05 times more likely (p=0.0268) than Black respondents. Among non-completers, lack of time (64.7%) was the primary barrier. Participation was concentrated in Sao Paulo (77%), with no respondents from three northern states. Perceived difficulty showed a non-significant trend (p=0.0893) whereby Black respondents had the lowest predicted difficulty; the most challenging course material was Scientific Content/Reading papers (50%). Completion was strongly associated with applying learned activities in teaching (p

22.
arXiv (CS.CV) 2026-06-16

HadBalance: A Plug-and-Play Unified Global Geometric Prior Framework for Generalizable Biomedical Segmentation

Precise biomedical image segmentation is crucial for clinical diagnosis. Geometric cues (e.g., boundary, shape, and topology) can improve structural consistency, yet most are task-specific and lack a unified geometric foundation that generalizes across organs and modalities. We are motivated by the observation that several medical segmentation targets can be approximated as globally near-convex shapes. A convex region is one in which any two interior points can be connected by a line segment entirely contained within the region. In practice, medical targets may exhibit small local concavities or boundary irregularities; we refer to such globally convex-like shapes as near-convex. Motivated by this, we derive Hadwiger Shape Priors from Hadwiger's theorem as an interpretable global regularizer using three 2D measures: area A, perimeter P, and Euler characteristic chi, enabling transfer across organs and modalities. However, because medical datasets are shape-heterogeneous, enforcing near-convex priors uniformly can over-regularize non-convex anatomy with significant concavities, washing out concavities and fine details and degrading segmentation accuracy. To address this challenge, we propose Conflict-Aware Objective Balancing (CAOB), which integrates shape priors with segmentation in a gradient-aware manner. For each prior, CAOB removes only the gradient component that conflicts with segmentation while preserving the remaining aligned component, and adaptively regulates objective influences to prevent prior dominance. This enables stable use of shape priors on shape-heterogeneous data without erasing genuine concavities or fine structural details. We call this plug-and-play framework HadBalance.
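
The gradient handling attributed to CAOB (remove only the conflicting component of a prior's gradient and keep the aligned remainder) resembles a projection step, which can be sketched as follows. This is an assumption, not the authors' rule: the sketch uses a PCGrad-style projection onto the plane orthogonal to the segmentation gradient, works on plain Python lists, and omits the adaptive weighting of objectives entirely. The names `dot` and `remove_conflict` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_conflict(prior_grad: Vector, seg_grad: Vector) -> Vector:
    """Drop the part of prior_grad that opposes seg_grad; keep the rest."""
    overlap = dot(prior_grad, seg_grad)
    seg_norm_sq = dot(seg_grad, seg_grad)
    if overlap >= 0.0 or seg_norm_sq == 0.0:
        return list(prior_grad)  # no conflict: leave the prior gradient unchanged
    scale = overlap / seg_norm_sq
    # Subtract the projection of prior_grad onto seg_grad (the conflicting part).
    return [p - scale * s for p, s in zip(prior_grad, seg_grad)]


seg_grad = [1.0, 0.0]
conflicting = remove_conflict([-1.0, 1.0], seg_grad)
aligned = remove_conflict([1.0, 1.0], seg_grad)
print(conflicting)  # [0.0, 1.0]
print(aligned)      # [1.0, 1.0]
```

After projection the prior gradient is orthogonal to the segmentation gradient, so in this simplified picture the prior can no longer push directly against the segmentation objective.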

23.
arXiv (CS.CV) 2026-06-11

Towards Conditional Feature Alignment for Cross-Domain Counting

Object counting models often degrade under cross-domain deployment because density composition varies across domains and is itself task-relevant. Standard feature alignment methods tend to suppress such variation by encouraging global domain invariance, which can be harmful when source and target domains contain different proportions of background, sparse foreground, and dense foreground. We propose Conditional Feature Alignment (CFA), a cross-domain counting framework that aligns representations within label-induced conditions rather than across full marginal feature distributions. Given density annotations or pseudo-density predictions, CFA constructs foreground/background or density-level conditions and aligns only features belonging to matching conditions. We formalise this idea through a conditional divergence perspective, showing that conditional alignment removes within-condition discrepancy while preserving condition-marginal density shift. For unsupervised domain adaptation, CFA estimates source conditions from annotations and target conditions from detached pseudo-density maps, then performs condition-wise adversarial alignment with full-image consistency regularisation. For source-domain generalisation, we instantiate the same principle with MPCount by enforcing condition-wise memory-consistency between generated source-domain views. Experiments on crowd and cell counting benchmarks show competitive or improved performance across diverse UDA and DG settings. For example, on JHU-CROWD++ FH→SN, CFA-DG reduces MAE/RMSE from MPCount's 216.3/421.4 to 90.5/169.9, indicating that condition-wise alignment is especially effective under large weather- and density-induced shifts. These results suggest that condition-wise alignment is a promising design principle for domain-adaptive counting.
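
A toy example may help show why aligning within conditions differs from aligning full marginal distributions. The sketch below is not CFA itself: it swaps the adversarial alignment named in the abstract for a simple difference of per-condition feature means, uses 1-D features, and introduces hypothetical helpers (`global_gap`, `conditional_gaps`). The two domains share identical per-condition features but mix background and foreground in different proportions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, str]  # (1-D feature, condition label); hypothetical toy format


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def global_gap(source: List[Sample], target: List[Sample]) -> float:
    """Discrepancy between marginal feature means, ignoring conditions."""
    return abs(mean([f for f, _ in source]) - mean([f for f, _ in target]))


def conditional_gaps(source: List[Sample], target: List[Sample]) -> Dict[str, float]:
    """Discrepancy between feature means, computed separately per condition."""
    def group(samples: List[Sample]) -> Dict[str, List[float]]:
        by_condition: Dict[str, List[float]] = defaultdict(list)
        for feature, condition in samples:
            by_condition[condition].append(feature)
        return by_condition

    src, tgt = group(source), group(target)
    shared = sorted(set(src) & set(tgt))  # only conditions present in both domains
    return {cond: abs(mean(src[cond]) - mean(tgt[cond])) for cond in shared}


# Source is mostly background; target is mostly foreground.
source = [(0.0, "bg"), (0.0, "bg"), (0.0, "bg"), (1.0, "fg")]
target = [(0.0, "bg"), (1.0, "fg"), (1.0, "fg"), (1.0, "fg")]

print(global_gap(source, target))        # 0.5
print(conditional_gaps(source, target))  # {'bg': 0.0, 'fg': 0.0}
```

The marginal gap of 0.5 here comes entirely from the different background/foreground mix. A global alignment objective would try to shrink it, while the condition-wise gaps are already zero, so a conditional objective would leave the density shift in place.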

24.
arXiv (CS.CV) 2026-06-16

CausalDrive: Real-time Causal World Models for Autonomous Driving

World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, yet existing video generative models fall short as interactive simulators. Layout-conditioned renderers rely on "oracle" future trajectories of all background agents, rendering them strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, hindering closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, we compel the model to intrinsically predict causal interactions, enabling text-driven control over Driving Sociology, allowing users to dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address the covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. This combines continuous flow-matching with a self-correcting distillation objective, achieving interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.

25.
arXiv (CS.CV) 2026-06-19

Thinking in Boxes: 3D Editing in Real Images Made Easy

Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing – particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating approximate object location rather than specifying the transformation. We instead use 3D boxes as structured specifications: the user provides the input and output boxes of the edit, casting editing as a well-posed geometry problem. This "thinking in boxes" interface, where each box face is color-coded to convey 3D orientation, gives precise control over translation, rotation, scaling, and viewpoint changes in real images while preserving scene and object identity, and recovering previously unseen object regions. To ground transformations in scene appearance, we introduce a depth-aligned planar floor as a global reference frame, shaded with depth-aware cues. Conditioned on this structure, an image generator produces consistent results under large transformations. Trained in two stages – on synthetic multi-object scenes and a small set of real-world videos from Objectron – the system generalizes to complex, in-the-wild real images. Our method operates directly on real photographs and substantially outperforms recent state-of-the-art methods on large 3D edits.
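
To illustrate why an input box plus an output box pins down an edit, the sketch below reads off the translation, scale, and rotation implied by a pair of boxes. The box parameterisation (center, size, yaw about the vertical axis) and the names `Box3D` and `edit_from_boxes` are assumptions made for this example; the abstract does not state how boxes are represented, and viewpoint changes are not modelled here.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    """Hypothetical box: center position, edge lengths, yaw in radians."""
    center: Vec3
    size: Vec3
    yaw: float


def edit_from_boxes(src: Box3D, dst: Box3D) -> Tuple[Vec3, Vec3, float]:
    """Return (translation, per-axis scale, yaw change) taking src to dst."""
    translation = tuple(d - s for s, d in zip(src.center, dst.center))
    scale = tuple(d / s for s, d in zip(src.size, dst.size))
    delta = dst.yaw - src.yaw
    # Wrap the yaw change into the range [-pi, pi].
    rotation = math.atan2(math.sin(delta), math.cos(delta))
    return translation, scale, rotation


src = Box3D(center=(0.0, 0.0, 2.0), size=(1.0, 1.0, 1.0), yaw=0.0)
dst = Box3D(center=(1.0, 0.0, 3.0), size=(2.0, 2.0, 2.0), yaw=math.pi / 2)

translation, scale, rotation = edit_from_boxes(src, dst)
print(translation)  # (1.0, 0.0, 1.0)
print(scale)        # (2.0, 2.0, 2.0)
print(math.degrees(rotation))
```

Under these assumptions the pair of boxes fully determines the object-level transform, which is the sense in which the edit becomes a geometry problem rather than an ambiguous text instruction.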