Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-24

Predicting 24-Month MCI-to-Alzheimer's Conversion Using Routine Clinical Assessments Without Neuroimaging or Genetic Testing

作者:

ABSTRACT INTRODUCTION: Early identification of individuals with mild cognitive impairment (MCI) at high risk of conversion to Alzheimer's disease (AD) is essential for timely intervention. We evaluated whether routinely obtainable clinical assessments can accurately predict 24-month MC to AD conversion. METHODS: Data from 2,430 participants with MCI in the Alzheimer's Disease Neuroimaging Initiative were analyzed. XGBoost, Random Forest, and Logistic Regression models were evaluated. SHAP-based feature selection and feature ablation analyses assessed the incremental value of APOE4 genotype. RESULTS: A six-feature model incorporating age, sex, education, RAVLT Immediate Recall, MMSE, and EcogSPTotal achieved an AUC of 0.922 (95% CI, 0.911~0.933). APOE4 provided negligible additional predictive value once cognitive measures were included. The XGBoost model outperformed Clinical Dementia Rating Sum of Boxes classification. DISCUSSION: Routine cognitive assessments accurately predict 24-month MCI-to-AD progression without biomarkers, neuroimaging, or genetic testing, offering a practical, low-cost tool for clinical risk stratification.

02.
arXiv (CS.LG) 2026-06-11

Persistent Homology as a Theory of Emergent Structure

作者:

arXiv:2507.03065v2 Announce Type: replace Abstract: Why do some macroscopic structures remain identifiable even though their microscopic constituents continually change? Vortices persist while fluid parcels turn over, neural memories persist while spikes and synapses fluctuate, and institutions persist while individuals enter and leave. We propose a scale-relative answer: an emergent property is a persistent nontrivial homology class $[z]\in H_p=\ker\partial_p/\im\partial_{p+1}$, a macro-feature that is closed but not exact across a filtration of descriptions. This identification turns emergence into a measurement problem. Persistent bars detect stable macro-features, and we introduce a contractive-similarity (CS) graph operator to supply scaffold spectral gaps that predict robustness. Hodge decomposition separates harmonic macro-scaffold from exact and co-exact micro-flow; and functorial condensation explains when one level's emergent class becomes a unit for the next. The resulting scaffold-flow framework expresses six familiar signatures of emergence (i.e., inevitability, coherence, irreducibility, complementarity, robustness, and hierarchy) within one mathematical language. It also yields falsifiable predictions across atmospheric, neural, and social systems: genuine emergent structures should persist across filtrations, remain spectrally stable, respond disproportionately to harmonic interventions, and require timescale separation for hierarchical autonomy.

03.
arXiv (CS.CV) 2026-06-15

CineOrchestra: Unified Entity-Centric Conditioning for Cinematic Video Generation

Cinematic video depicts multiple subjects acting or interacting at specific moments, captured with deliberate camera movement, and stitched together by shot transitions. Together, these elements demand a level of fine-grained control beyond current text-to-video models. Existing work addresses each axis in isolation: multi-subject personalization, temporal control, multi-shot synthesis, or camera control; no prior framework jointly integrates all four. We present CineOrchestra, a unified video diffusion model that controls subjects, events, cameras, and shot transitions simultaneously. Our key insight is that these heterogeneous cinematic elements share a fundamental structure: each is an entity acting over a specific temporal interval, which can therefore all be expressed through one shared structure of entity-centric conditioning primitives, augmented with reference images for visual entities. This formulation reduces the architectural challenge to a single positional encoding problem, which we solve with two parameter-free coordinated rotary embeddings: (a) an interval-sampled temporal RoPE that yields consistent attention behavior across events of dramatically varying duration, and (b) a 2D entity-temporal cross-attention RoPE that disambiguates per-entity conditions and routes each to its corresponding spatiotemporal region. On two new benchmarks, CineOrchestra outperforms six per-axis specialists on dense caption following and shot-transition timing, with consistent gains in a pairwise user study and component ablations.

04.
arXiv (CS.AI) 2026-06-17

Trustworthy Self-Composable Big-Data-as-a-Service: An LLM-Orchestrated Multi-Agent Framework for Automated Data Engineering, AutoML, MLOps Deployment, and Drift-Aware Lifecycle Optimization

arXiv:2606.17915v1 Announce Type: cross Abstract: Big-Data-as-a-Service (BDaaS) platforms require re liable automation across data ingestion, cleaning, feature engi neering, model development, deployment, and post-deployment monitoring. However, existing LLM-based data science agents and AutoML systems mainly focus on isolated workflow stages, leaving limited support for lifecycle-level orchestration, artifact governance, human oversight, and drift-aware adaptation. This paper proposes a trustworthy self-composable BDaaS frame work based on LLM-orchestrated multi-agent collaboration. The proposed architecture decomposes the BDaaS lifecycle into specialized agents for data ingestion, data cleaning, feature engineering, AutoML training, model evaluation, MLOps de ployment, monitoring, and drift detection. A central LLM or chestration layer coordinates agent execution, validates interme diate outputs, manages workflow context, and enables dynamic workflow composition. The framework also incorporates shared artifact governance, reproducibility support, human-in-the-loop checkpoints, and drift-aware feedback loops. A prototype-based evaluation is conducted using controlled tabular benchmark datasets with missing values, categorical variables, outliers, class imbalance, and simulated covariate drift. Compared with manual ML, AutoML-only, and single-agent LLM baselines, the pro posed multi-agent BDaaS pipeline achieves competitive predictive performance while improving lifecycle-level reliability, including workflow completion, artifact traceability, deployment readiness, reproducibility, and drift recovery. The results suggest that LLM-orchestrated multi-agent systems can extend conventional AutoML toward trustworthy, adaptive, and production-oriented BDaaS lifecycle automation.

05.
arXiv (math.PR) 2026-06-17

Moment generating function of the tacnode process

作者:

arXiv:2606.17771v1 Announce Type: cross Abstract: The tacnode process is a universal determinantal point process arising in non-intersecting particle systems and random tiling models. In this paper, we study the generating function for the counting functions of the tacnode process on a union of $m$ intervals, $m\in\mathbb{N}^{+}$. Our first result provides an integral representation for the $m$-point generating function in terms of the Hamiltonian governing a system of $8m+4$ coupled differential equations. Combined with several differential identities for this Hamiltonian, the representation yields the large gap asymptotics, up to and including the constant term. As further applications, we obtain asymptotic formulae for the expectations, variances, and covariances of the counting functions, and establish a central limit theorem for their joint fluctuations. These results extend the previously known $1$-point theory for the tacnode process to the multi-interval setting with multiple discontinuities.

06.
arXiv (quant-ph) 2026-06-12

SAT, MaxSAT, and SMT for QLDPC Distance Computation: A Large-Scale Empirical Study

arXiv:2606.12445v1 Announce Type: new Abstract: Exact distance computation for quantum LDPC (QLDPC) codes plays a central role in validating candidate fault-tolerant quantum-code constructions, yet the computational structure of this problem remains poorly understood. Despite substantial recent progress in QLDPC design, it remains unclear which algorithmic principles govern the practical scalability of exact distance computation and which classes of exact solvers are best suited to this task. To address these questions, we conduct a systematic study of SAT- and MaxSAT-based formulations for exact QLDPC distance computation across representative codes. We further compare these formulations against several established exact-distance approaches in order to better understand the algorithmic landscape of exact QLDPC distance computation. Our study challenges and refines several prevailing intuitions about exact QLDPC distance computation. First, despite the XOR-rich structure of QLDPC parity checks, practical scalability appears to be governed more by the handling of cardinality constraints and optimization bounds than by parity reasoning alone. Accordingly, XOR-aware reasoning does not provide a systematic advantage across our benchmark suite. Second, Brouwer-Zimmermann-style search, long regarded as the benchmark paradigm for exact distance computation in sparse classical codes, no longer maintains its traditional scalability advantage in the QLDPC setting. This finding challenges the expectation that techniques successful for sparse classical codes remain dominant for QLDPC codes. Third, substantial qualitative differences arise even among MaxSAT solvers themselves. Branch-and-bound MaxSAT significantly outperforms unsat-core-based MaxSAT on challenging benchmarks, demonstrating that solver architecture and optimization strategy play a decisive role in practical scalability.

07.
arXiv (CS.LG) 2026-06-15

Beyond task performance: Decoding bioacoustic embeddings with speech features

arXiv:2606.14662v1 Announce Type: new Abstract: Pretrained audio embeddings are standard in bioacoustics, yet little is known about which acoustic features these models encode, nor which are useful for a given task. This hinders transparency and limits extension to rare species or data-scarce domains. Here we reveal which speech-like features are encoded in bioacoustic representations. Using the 88~eGeMAPS features across six taxonomic groups, we apply linear and nonlinear regression probes to quantify which acoustic properties each model captures. Results confirm a ``no free lunch'' pattern: no single model captures the full feature space. A concatenated embedding achieves the highest performance, suggesting complementary acoustic space coverage across models. Loudness features are best encoded ($R^2 = 0.76$) while F0 is hardest to recover ($R^2 = 0.33$). By cross-referencing recoverability with per-species feature salience (NMI), we derive data-driven model selection guidance for bioacoustics.

08.
arXiv (CS.CL) 2026-06-16

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

For the development of Large language models (LLMs), recent approaches to generating pseudo intermediate reasoning have shown remarkable progress. But they typically rely on large numbers of correctly annotated answers to assess reasoning quality. This paper presents a semi-supervised framework that scales reasoning learning from minimal supervision, turning reasoning verification itself into a data creation mechanism. We train a lightweight reasoning-correctness classifier on only a few labeled samples, which judges whether intermediate reasoning traces generated by an LLM are valid. Furthermore, an entropy-based confidence threshold filters out unreliable samples, and the remaining high-confidence reasoning traces are used to fine-tune the model. Experiments on Verifiable Math Problems (Orca-Math subset) and Question Answering on Image Scene Graphs (GQA) with Visual Programming show that our method achieves accuracy comparable to using 10-15x more labeled data. Ablation analyses confirm that both the classifier and entropy filtering are essential for scalable and noise-resistant pseudo-labeling. By replacing expensive answer-level supervision with lightweight reasoning verification, our method provides a practical path toward constructing large-scale reasoning resources and paves the way for future autonomous reasoning systems that learn from minimal human input.

09.
arXiv (CS.CV) 2026-06-15

Value-order Decomposition for Generalist Anomaly Detection

Industrial anomaly detection suffers from limited data, making cross-domain generalization particularly challenging. Generalist Anomaly Detection (GAD) aims to train a unified model on a source domain that can effectively detect anomalies in unseen target domains. In the initial semantic feature space, strong entanglement between anomalies and object categories or defect types hinders effective generalization across domains. Recent works address this issue by projecting features into a residual space; however, such methods primarily increase cross-domain overlap for normal features, while anomalous features remain specific to object categories, defect types and data domains, leading to poor alignment and generalization. To address this limitation, we propose Value-order Decomposition (VOD), a simple yet effective technique that bridges three types of generalization gaps across object categories, defect types (including real and synthetic defects), and data domains. VOD disentangles and suppresses object-category-, defect-type-, and domain-specific information, promoting alignment within normal and abnormal samples while preserving their separability, thereby enabling robust generalization across the three gaps. Leveraging the strong alignment between real and synthetic defects within the same object, we perform anomaly detection using only normal and synthetic-abnormal reference, and effectively generalize to unseen real defect types. Experiments on diverse industrial and medical benchmarks demonstrate that our method, using a simple cut-and-paste anomaly simulation strategy, achieves strong generalization across the three gaps.

10.
arXiv (CS.LG) 2026-06-16

MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions

arXiv:2606.16562v1 Announce Type: new Abstract: Five years after the discovery of persistent anti-Muslim bias in large language models, most evaluations remain confined to single-turn prompt completion, a setting that no longer reflects how frontier LLMs are deployed. We introduce MIRAGE (Muslim-Identity Reasoning and Agentic Generation Evaluation), a benchmark of 1{,}200 prompts spanning three deployment-realistic conditions: direct completion, chain-of-thought reasoning, and simulated agentic decision-making across content moderation, lending triage, refugee claim summarization, and hiring screens. Across six frontier models, we find that (i) chain-of-thought reasoning amplifies rather than suppresses Muslim-violence associations by 12–34\% relative to direct completion, (ii) agentic decisions exhibit a 9–22 percentage-point asymmetry between Muslim and matched non-Muslim cases on identical evidence, and (iii) bias is sharply time-coupled to retrieved news context, increasing 18–27\% under recent-conflict retrieval. Existing prompt-based mitigations transfer poorly across our three conditions, suppressing direct-completion bias while leaving agentic asymmetry largely intact. We release MIRAGE and an open evaluation harness to support targeted mitigation research.

11.
arXiv (CS.CV) 2026-06-24

Full-resolution MLPs Empower Medical Dense Prediction

Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation. Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks.

12.
arXiv (CS.CV) 2026-06-25

Position: Reasoning After Perception Means Reasoning Without Vision

A common belief in multimodal research is that the perceptual weaknesses of vision–language models can be compensated by stronger language reasoning (e.g., chain-of-thought, in-context learning, or external tools). We challenge this assumption. We argue that for a broad class of visual tasks hard to specify in language, failures stem from a structural fatality where the temporal decision of when to reason strictly dictates the spatial constraint of where reasoning takes place. When visual reasoning is deferred to language generation, current architectures do not merely delay computation; they displace it from the continuous visual representation to a discrete textual space. Consequently, the sequential ``Perception-then-Reasoning'' paradigm degenerates perception into a passive, one-off feature encoding process, rendering it functionally equivalent to ``Reasoning-in-Text-Space'', where task-critical spatial signals are collapsed before reasoning begins. We substantiate this claim with the Turing Eye Test (TET): tasks that must be resolved in visual space and are hard to verbalize; results show text-only reasoning cannot remedy these perceptual failures. Our findings suggest rethinking the architectural divide: shifting from reasoning about perception to reasoning within perception. This facilitates actively reasoning-driven perception that operates directly on pixel-level visual representations, rather than within a collapsed textual space.

13.
arXiv (CS.CV) 2026-06-15

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

View-conditioned 3D generators such as SAM 3D, TRELLIS, and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism. As the stream progresses, the memory dynamically updates to retain a fixed number of informative frames, preventing the memory footprint from growing linearly with sequence length. This also prevents degradation over long sequences and keeps the underlying generator completely unchanged without retraining, architectural modifications, or auxiliary losses. Evaluated on both realistic and synthetic streaming benchmarks, Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, across both photometric and geometric metrics. More details can be found at: https://stream-3d.github.io/stream3d.github.io/.

14.
arXiv (CS.CL) 2026-06-11

Teaching Diffusion to Speculate Left-to-Right

Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs due to inherently sequential token generation. Speculative decoding addresses this bottleneck by employing a lightweight draft model to propose multiple future tokens that are subsequently verified in parallel by a larger target model. Recent work has demonstrated that diffusion language models are well suited for this setting, as they can generate entire blocks of draft tokens in parallel and thereby alleviate the sequential constraints of autoregressive drafting. A subtlety of this regime is that block-diffusion drafters generate tokens bidirectionally within a block, whereas verification is performed by an autoregressive target model that evaluates tokens in a strictly left-to-right manner, leaving a gap between the symmetric training-time objective and the asymmetric verification-time reward. In this work, we offer an empirical analysis of three training-time interventions that narrow this gap: token positional weighting, a first-error focal loss that targets the position that breaks the accepted prefix within each block, and a chain loss term that substitutes a differentiable surrogate for the expected accepted length. The three interventions act along orthogonal axes (position, block-conditional first error, joint prefix) and compose additively; they are likewise orthogonal to test-time alignment mechanisms such as multi-draft self-selection, with which they can in principle be combined. Across four target models and six reasoning, code, and dialogue benchmarks, the three interventions raise accepted draft length by 21-76% per benchmark over a position-uniform baseline, without adding additional forward passes and without changing the inference pipeline or the rejection-sampling exactness contract.

15.
arXiv (CS.LG) 2026-06-18

Toward Simultaneously Optimal Regret in U-Calibration

arXiv:2606.18527v1 Announce Type: cross Abstract: U-calibration studies online forecasting algorithms whose predictions can be consumed by any unknown downstream agent, guaranteeing sublinear regret simultaneously for all proper loss functions. Existing U-calibration algorithms achieve worst-case optimal $O(\sqrt{T})$ regret for every bounded proper loss, but they fail to adapt to easier losses: as we show, even for smooth losses such as squared loss, they incur $\Omega(\sqrt{T})$ regret instead of the optimal $O(\log T)$ regret. In this work, we show that this limitation is not inherent. Specifically, we design a single forecast algorithm that simultaneously achieves $\tilde O(\sqrt{T})$ regret for every bounded proper loss and $O(\log T)$ regret for every bounded smooth proper loss. More generally, our algorithm also attains logarithmic regret for losses that are smooth relative to the log-barrier, which include several non-Lipschitz examples. Our approach is based on a novel variant of Follow-the-Perturbed-Leader (FTPL) in which perturbations are applied directly in the prediction space using self-concordant noise. The resulting analysis also departs substantially from prior FTPL analyses due to the complex nature of this noise and may be of independent interest.

16.
arXiv (quant-ph) 2026-06-24

Enhancing quantum-classical configuration interaction methods using a neural-network classifier

arXiv:2606.24332v1 Announce Type: cross Abstract: Selected configuration interaction methods achieve near-exact electronic structure calculations by iteratively constructing compact variational spaces, but their efficiency depends critically on the heuristics used to identify important determinants. Here, we introduce a data-driven selection framework that recasts determinant importance as a binary classification task and integrates a neural-network classifier into the iterative CI workflow through an active-learning loop. At each iteration, a random subset of candidate determinants is labelled via temporary diagonalisation, and the trained classifier guides selection of the remaining configurations. We demonstrate the utility of this framework for both classical and quantum CI methods by calculating the ground-state energy of a diatomic molecule. Our method achieves result parity with traditional configuration interaction methods at substantially lower computational cost: roughly a $\times 5$ reduction in memory and per-iteration cost for the classical cHCI variant, and convergence in markedly fewer iterations for the quantum-classical cSQD variant. These results establish classifier-assisted determinant selection as a lightweight, method-agnostic tool for compressing variational spaces and accelerating both classical and hybrid quantum-classical configuration interaction algorithms.

17.
arXiv (CS.CV) 2026-06-16

ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT

Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. While structured weight pruning and token compression have emerged as promising solutions, they suffer from prolonged retraining and inter-layer dependencies that complicate optimization, respectively. We propose ToaSt, a decoupled framework applying specialized strategies to distinct ViT components. We apply coupled head-wise structured pruning to Multi-Head Self-Attention modules, leveraging attention operation characteristics to enhance robustness. For Feed-Forward Networks (over 60% of FLOPs), we introduce Token Channel Selection (TCS), a training-free method that filters redundant noise channels at inference time. Extensive evaluations across nine diverse models, including DeiT, ViT-MAE, and Swin Transformer, demonstrate that ToaSt achieves superior trade-offs between accuracy and efficiency, consistently outperforming existing baselines. On ViT-MAE-Huge, ToaSt achieves 88.52% accuracy (+1.64%p) with 39.4% FLOPs reduction. ToaSt also transfers effectively to diverse downstream tasks (COCO detection, ADE20K segmentation, CIFAR-100 classification), achieving 52.2 versus 51.9 mAP on COCO. Code: github.com/SHANNonLab-HUFS/ToaSt

18.
arXiv (CS.CV) 2026-06-24

Automated Residual Plot Assessment With the R Package autovi and the Shiny Application autovi.web

Visual assessment of residual plots is a common approach for diagnosing linear models, but it relies on manual evaluation, which does not scale well and can lead to inconsistent decisions across analysts. The lineup protocol, which embeds the observed plot among null plots, can reduce subjectivity but requires even more human effort. In today's data-driven world, such tasks are well suited for automation. We present a new R package that uses a computer vision model to automate the evaluation of residual plots. An accompanying Shiny application is provided for ease of use. Given a sample of residuals, the model predicts a visual signal strength (VSS) and offers supporting information to help analysts assess model fit.

19.
arXiv (CS.AI) 2026-06-15

The Shrinking Lifespan of LLMs in Science

arXiv:2604.07530v2 Announce Type: replace-cross Abstract: Scaling laws describe how language model capabilities grow with compute and data, but say nothing about how long a model matters once released. We introduce time-to-peak and lifespan as measures of model obsolescence and use them to characterize the scientific adoption trajectories of 62 LLMs across more than 108k citing papers (2019-2025), separating active adoption from background citation to recover per-model trajectories that citation counts cannot resolve. We find that a model's longevity is shaped more by when it was released than by its characteristics: release year predicts time-to-peak and lifespan more strongly than architecture, openness, or scale. LLM adoption follows an inverted-U curve (rising after release, peaking, and then declining), but this pattern is rapidly compressing. Each successive release year is associated with a 27% shorter time-to-peak and a 23% shorter lifespan ($p < 0.001$), robust to minimum-age thresholds and controls for model size. These adoption-side dynamics are invisible to scaling laws and suggest that specialization on any single model may be a depreciating investment, with costs falling on reproducibility and migration.

20.
bioRxiv (Bioinfo) 2026-06-24

BATTLE-AMP: Benchmarking Antimicrobial Peptide Predictors

As antimicrobial resistance outpaces antibiotic development, antimicrobial peptides (AMPs) have emerged as a promising class of alternative antibacterials, and computational predictors are increasingly used to prioritize AMP candidates. Such predictors are typically evaluated on binary AMP/non-AMP classification, which does not test whether they can identify peptides with clinically relevant potency against specific pathogens. We present BATTLE-AMP, a benchmarking framework that evaluates AMP predictors against experimentally measured minimum inhibitory concentrations (MICs) across clinically relevant bacterial species and strains. We surveyed 48 published methods, finding fewer than 25% reproducible, and benchmarked 10 model families (21 variants) using experimental MIC data, synthetic sequence perturbations, activity cliff analyses, and all-atom molecular dynamics (MD) simulations. Four findings emerge: (i) models trained on MIC data outperform binary classifiers regardless of architecture; (ii) the best model depends on the target pathogen, so model selection must be guided by the biological question; (iii) most models cannot distinguish active peptides from inactive sequences with identical amino acid composition; and (iv) activity cliffs remain unresolved by both machine learning and MD, marking a limit of current computational methods. BATTLE-AMP is released as an open Snakemake framework at https://github.com/szczurek-lab/battleamp-snakemake for benchmarking new models and scoring novel candidate libraries.

21.
arXiv (CS.CV) 2026-06-25

Evaluation Protocols and Validation for Cameras in Indoor Healthcare Monitoring

Camera-based monitoring systems are increasingly adopted in healthcare settings for the continuous assessment of patient movement and activities. However, their technical performance under real-world indoor conditions remains insufficiently characterised, preventing appropriate camera selection for clinical or home adoption and reproducibility. Existing validation studies typically assess either device metrological performance or algorithm accuracy in isolation, and often do not systematically account for practical deployment factors, such as lighting variability, occlusions, and camera positioning. We present two technical validation protocols: the first evaluates the metrological performance of RGB and RGB-D cameras, and the second assesses their use in supporting human pose estimation, validated using state-of-the-art pose estimators. The proposed protocols systematically assess five cameras, four RGB-D and one RGB, under controlled variations in lighting, camera height, viewing angle, and occlusion level within representative indoor scenarios. The experimental results show that metrological performance varies substantially across cameras, with depth bias at 5 m ranging from 50 mm to over 1400 mm depending on the device. For 2D pose estimation, all cameras achieve broadly comparable accuracy, with mean mAP between approximately 78% and 90% across cameras and estimators, whereas 3D reconstruction error differs markedly across devices, with MPJPE ranging from 104 mm to 365 mm, closely reflecting underlying depth-sensing quality. Environmental factors have a camera- and estimator-dependent effect on 3D performance, while camera mounting height has minimal influence within the evaluated range. This work provides evidence-based guidance for the selection and deployment of cameras in healthcare monitoring applications, addressing an important gap in current technical validation practice.

22.
arXiv (CS.LG) 2026-06-19

On the Redundancy of Timestep Embeddings in Diffusion Models

arXiv:2606.20416v1 Announce Type: new Abstract: Diffusion models rely heavily on explicit timestep embeddings to modulate the denoising process across various noise scales. In this work, we challenge the necessity of these temporal signals by analyzing their impact on U-Net and Diffusion Transformer architectures. Beyond empirical evidence, we provide a theoretical framework demonstrating that, under certain conditions, the global minimizer of the diffusion training objective can be achieved without explicit timestep conditioning. Our findings reveal a surprising robustness when timestep embeddings are completely removed. Extensive ablation studies on the CelebA and CIFAR-10 datasets show that these time-agnostic models can maintain high structural fidelity and even surpass their conditioned counterparts in competitive metrics, including FID, precision, and recall. Our analysis suggests these architectures can implicitly infer noise scales from the corrupted input under specific assumptions, rendering explicit temporal conditioning redundant. This study challenges long-standing temporal conditioning paradigms and paves the way for more efficient and structurally focused generative architectures.

23.
arXiv (CS.LG) 2026-06-12

Disparate Impact in Synthetic Data Generation

arXiv:2606.13105v1 Announce Type: new Abstract: We revisit the fairness notion of disparate impact for synthetic data generation (SDG), that assesses whether the utility of generated records is the same across sensitive groups. Our approach departs from existing work on fair SDG, that address the problem of correcting for undue biases in the observed distribution, hence redefining SDG as learning a distribution that is not that of the real data. By contrast, non-disparate impact is notably achieved when the synthetic and real distributions are the same. We expose reasons why SDG may fail to reach that solution and discuss why approximation and estimation errors occur and can be disparate across groups. We notably look into the expressive power of SDG methods relative to distribution complexity, sampling errors due to group proportions, and estimation errors induced by differential privacy mechanisms. We illustrate cases of disparate impact on both artificial and real-world data, focusing on SDG methods that rely on probabilistic graphical models. We also introduce a strategy of learning group-wise SDG models and illustrate how it can improve both the overall utility and its parity in many settings.

24.
arXiv (CS.LG) 2026-06-12

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

arXiv:2606.13191v1 Announce Type: new Abstract: Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD), as practical diagnostics for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect geometry of data and dynamics of diffusion generation.

25.
arXiv (CS.CV) 2026-06-15

FLaRA: Predicting Future Latent Representations for Accident Anticipation

Anticipating traffic accidents from dashcam videos is a critical challenge in intelligent transportation systems. Existing methods typically map visual context directly to a collision probability without explicitly modeling the future evolution of the driving scene. In this paper we propose FLaRA (Predicting Future Latent Representations for Accident Anticipation), a novel predictive architecture that shifts this paradigm by forecasting future latent representations for accident anticipation. Building upon the Video Joint-Embedding Predictive Architecture (V-JEPA2), our model conditions a predictor network on observed context frames to predict the forthcoming latent features of the scene. A classifier then operates on these predicted future representations rather than only on past observations. To ensure these forecasts remain grounded in realistic future dynamics, we introduce a joint training objective that simultaneously optimizes an auxiliary feature-level reconstruction loss and a cross-entropy classification loss. Extensive evaluations on the Nexar dataset, alongside cross-domain validations on the DAD, DADA-2000, and DoTA benchmarks, demonstrate that our approach achieves state-of-the-art performance while maintaining realistic early warning capabilities.