Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-17

PhaseWin: An Efficient Search Algorithm for Faithful Visual Attribution

Visual attribution is a fundamental tool for interpreting modern vision and vision-language models, particularly when their decisions must be inspected, diagnosed, or audited. Its goal is to explain how a model's decision depends on local regions of the visual input, typically by assigning an importance ordering over candidate image regions. Given an image partitioned into $n$ regions, faithful attribution can be cast as an ordered subset-search problem, in which progressively inserting the selected regions should recover the target model response as early as possible. Exhaustive search over region subsets incurs exponential cost, while the widely used greedy search still requires a quadratic number of model evaluations, because every selection step rescores all remaining candidates. We propose PhaseWin, an efficient subset-search algorithm for faithful visual attribution. PhaseWin reorganizes greedy region selection into a phased window-search procedure: rather than re-evaluating the full candidate set at every step, it alternates between global candidate screening, adaptive pruning, and localized window refinement, while preserving the essential region-ranking behavior of greedy search. We analyze PhaseWin under monotone evidence-accumulation conditions and show that, under feature-level structural assumptions, it attains controllable linear evaluation complexity together with near-greedy faithfulness guarantees. Extensive experiments on image classification, object detection, visual grounding, and image captioning show that, among all compared attribution methods, PhaseWin reaches high faithfulness with the fewest forward passes, empirically realizing the predicted reduction from $O(n^2)$ to $O(n)$. The code is available at https://github.com/Qihuai27/phasewin-va.
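
The quadratic cost of the greedy baseline described above can be illustrated with a minimal sketch. This is not PhaseWin; it is only the plain greedy insertion search the abstract compares against, and `toy_score` and `weights` are hypothetical stand-ins for a real model response.

```python
from typing import Callable, FrozenSet, List, Tuple


def greedy_insertion_order(
    n: int, score: Callable[[FrozenSet[int]], float]
) -> Tuple[List[int], int]:
    """Order n regions greedily and count how many times `score` is called."""
    selected: List[int] = []
    remaining = set(range(n))
    calls = 0
    while remaining:
        best_region, best_value = -1, float("-inf")
        # Every step rescores all remaining candidates: this is the quadratic part.
        for region in sorted(remaining):
            value = score(frozenset(selected + [region]))
            calls += 1
            if value > best_value:
                best_region, best_value = region, value
        selected.append(best_region)
        remaining.remove(best_region)
    return selected, calls


# Hypothetical additive score standing in for the model response on a region subset.
weights = [0.1, 0.7, 0.05, 0.4]


def toy_score(regions: FrozenSet[int]) -> float:
    return sum(weights[i] for i in regions)


order, calls = greedy_insertion_order(len(weights), toy_score)
print(order, calls)
```

With n regions the loop makes n + (n - 1) + ... + 1 = n(n + 1)/2 score calls, which is the evaluation count a linear-cost search would aim to avoid.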

02.
arXiv (CS.CV) 2026-06-11

A Comprehensive Ecosystem for Open-Domain Customized Video Generation

Recent progress in video generation has shown impressive visual synthesis capabilities. However, open-domain customized video generation remains limited by the lack of large-scale, annotated datasets capturing diverse identity-specific attributes. To address this, we introduce PexelsCustom-1M, the first publicly available million-scale dataset for identity-preserving video generation, containing one million curated triplets across 8,000+ categories. Leveraging this, we propose CustoMDiT, a parameter-efficient framework that adapts a pretrained multimodal Diffusion Transformer into a customized video generator with only 8% additional learnable parameters. Our method surpasses prior state-of-the-art. However, benchmarks such as DreamBooth cover only 100 classes, which is insufficient for real-world applications. To overcome this, we construct OpenCustom, a new benchmark with 1,000+ categories, created via cross-dataset knowledge fusion from ImageNet and MS-COCO. Extensive experiments confirm the advantages of both our dataset and model. We will open-source the entire ecosystem–including dataset, pipeline, benchmark, and implementations–to support further research.

04.
medRxiv (Medicine) 2026-06-11

Conversational Speech for Respiratory Triage in Primary Care: A Pilot Study

Background. Respiratory complaints account for a substantial share of adult ambulatory care visits, and triaging them accurately has direct consequences for antibiotic stewardship and pathogen-specific therapy. Prior work has investigated voice as a triage signal, but that literature is dominated by single-condition detection from scripted speech in crowdsourced or controlled clinical settings and has not been evaluated at primary care scale on conversational ambient audio. Methods. A dataset of 514,377 ambient-recorded primary care visits from 379,225 adult patients at a US clinic network was used, with per-visit clinically assigned ICD-10 diagnosis codes and de-identified demographic and geographic metadata. Patient audio was extracted from each doctor-patient conversation, and spectral, voice quality, and prosodic features were computed. Eleven binary classification tasks were defined, aligned with a respiratory triage cascade (e.g., acute respiratory versus acute non-respiratory illness, and lower versus upper respiratory tract infection). An acoustic model (feed-forward network) was trained independently for each task using patient-stratified five-fold cross-validation and evaluated on a held-out test set. Each task's model was also compared against six non-acoustic baselines using a single demographic, geographic, or temporal variable. The 11 trained classifiers were composed into a hierarchical cascade and illustrated as case studies on selected patients. Results. Test-set AUC across the 11 tasks ranged from 0.602 (95% CI: 0.588-0.614) to 0.745 (95% CI: 0.742-0.748), with a mean expected calibration error of 0.018. Six of eleven binaries outperformed all confounder baselines. Four binaries showed median within-stratum AUC of 0.62-0.70 when the confounder was held fixed, indicating acoustic discrimination beyond what the confounder alone explains. 
The exception was the pneumonia versus non-pneumonia lower respiratory tract infection binary, which failed against the patient-city confounder baseline, plausibly reflecting a clinic-level difference in ICD-10 coding. Conclusion. Conversational primary care audio carries acoustic signal that discriminates clinically meaningful respiratory contrasts. Absolute performance is moderate, but the conditions are stricter than prior work: conversational speech and differential-diagnosis contrasts among sick patients. This pilot study is a baseline for voice-based clinical AI moving beyond sick-versus-healthy detection toward differential-diagnosis panels and a proof-of-concept for hierarchical reasoning.
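
For readers unfamiliar with the calibration metric reported above, here is a minimal sketch of expected calibration error for a binary task. The abstract does not state the binning scheme, so the equal-width bins and the default of 10 bins are assumptions, and the example inputs are made up.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary labels (0/1) and predicted P(label = 1)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's |confidence - observed frequency| gap by its share of samples.
        ece += (len(members) / total) * abs(mean_conf - pos_rate)
    return ece


# Made-up predictions: the first set matches observed frequencies, the second does not.
well_calibrated = expected_calibration_error(
    [0.1] * 10 + [0.9] * 10, [1] + [0] * 9 + [1] * 9 + [0]
)
miscalibrated = expected_calibration_error([0.8] * 5, [1] * 5)
print(well_calibrated, miscalibrated)
```

A value near zero means predicted probabilities track observed frequencies closely, independent of how well the classifier discriminates.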

05.
arXiv (CS.AI) 2026-06-12

Physics-Guided Spatiotemporal Learning for Coastal Wave Peak Period Estimation from Video

arXiv:2606.13302v1. Wave parameters in the nearshore are crucial for coastal engineering, shoreline protection, marine hazard assessment, and coastal management for climate resilience. Traditional monitoring systems such as buoys and radar platforms offer accurate monitoring but can have high installation and maintenance expenses and limited spatial coverage. Passive ocean monitoring using video has been achieved by leveraging deep learning; however, many methods are not physically interpretable, feasible, or validated for oceanography. In this work, a Physics-Guided Deep Spatiotemporal Learning Framework is proposed for direct estimation of nearshore wave peak periods from passive coastal video streams. The framework combines automated temporal-variance-based region-of-interest detection, multi-stage Sim-to-Real transfer learning, and physics-informed regularization to enhance predictive accuracy and physical consistency. A variety of spatiotemporal architectures were assessed, including transformer-based and recurrent-convolutional ones, alongside synthetic pretraining, silver-label adaptation, and expert fine-tuning. The results show that transformer-based architectures performed best in terms of instantaneous prediction accuracy, while lightweight recurrent-convolutional architectures achieved higher temporal stability and operational oceanographic skill. Ablation studies also demonstrated the benefits of physics-guided regularization in terms of trend-following consistency and fewer physically implausible predictions. Explainability auditing also helped to focus attention on hydrodynamically active surf-zone regions and showed good agreement with the physically derived wave propagation behavior. In general, the proposed framework shows the promise of physics-guided video-based deep learning systems for long-term coastal wave monitoring that are cost-efficient and operationally feasible.

06.
arXiv (CS.CV) 2026-06-19

Relighting as a Probe of Visual Priors via Augmented Latent Intrinsics

Image-to-image relighting requires representations that separate illumination from scene properties while preserving dense geometry, material, and photometric cues. We use this task as a probe of visual priors: unlike recognition tasks that reward invariance, relighting tests whether visual features retain the information needed for light transfer. Through a controlled generative relighting framework, we find that strong semantic encoders can degrade relighting quality, exposing a semantic–photometric trade-off between abstraction and physical fidelity. We introduce Augmented Latent Intrinsics (ALI), which balances this trade-off by fusing dense, pixel-aligned visual features into a latent-intrinsic relighting model and refining it with self-supervision on unlabeled real image pairs. ALI improves relighting quality, especially on glossy, metallic, and transparent materials, and demonstrates that generative relighting is an effective tool for quantifying what visual encoders encode about the physical world.

07.
medRxiv (Medicine) 2026-06-17

MedAgent: A Retrieval-Augmented Clinical Decision Support Agent with Verifiable Evidence Grounding for Evidence-Based Medicine

Evidence-based medicine demands clinical answers that are not only fluent and medically plausible, but also anchored in traceable evidence, tailored to patient-specific clinical questions, sensitive to the hierarchy of evidence, and respectful of clinical safety boundaries. While general-purpose large language models (LLMs) exhibit strong medical language generation ability, they tend to lean on parametric memory, underuse retrieved evidence, hallucinate citations, conflate evidence levels, and draw conclusions that are not fully supported by the underlying literature. Such limitations pose particular risks in clinical decision support, where answer reliability, evidence traceability, and reasoning consistency are paramount. To address these issues, we present MedAgent, an evidence-based medical agent trained through an end-to-end pipeline that integrates supervised fine-tuning (SFT) cold start, reward modeling, and Group Relative Policy Optimization (GRPO). The agent is designed to execute a structured workflow encompassing clinical question understanding, PICO extraction, evidence retrieval, evidence stratification, citation-grounded answer generation, and quality evaluation. Specifically, a Qwen2.5-14B-Instruct backbone is first cold-started on 200 human-verified agent trajectories, equipping it with tool invocation, PICO parsing, structured response generation, and citation faithfulness. Next, a Qwen2.5-7B reward model is trained on 2,099 pairwise preference samples to provide semantic-level quality signals for evidence-based responses. Finally, GRPO reinforcement learning is conducted in a retrieval-augmented agent environment, where every rollout involves real evidence retrieval and is scored jointly by rule-based rewards and reward-model signals.
To avoid over-reliance on training rewards, we further construct an independent evidence-based medical evaluation benchmark, MedTrustBench, which contains 200 clinical questions spanning 10 specialties and four difficulty levels. Each question is annotated with standardized PICO elements and rubric-based scoring criteria. The benchmark includes 1,187 rubrics across seven dimensions: question relevance, evidence hierarchy, evidence quality and timeliness, evidence-answer consistency, completeness and depth, logical rigor, and medical terminology. Under an identical RAG pipeline, retrieval tool, retrieval configuration, and evaluation protocol, MedAgentv17 attains 78.6 points, outperforming GPT-4.1 (75.3) and approaching GPT-5.4 (80.3). These results show that a 14B domain-aligned model can surpass strong general-purpose baselines on specialized evidence-based medical reasoning, while delivering practical advantages in cost, privacy, controllability, and hospital-oriented private deployment. The model and associated datasets are publicly released at https://www.modelscope.cn/profile/InfoxmedModel

08.
arXiv (CS.LG) 2026-06-12

Towards One-for-All Anomaly Detection for Tabular Data

arXiv:2603.14407v2. Tabular anomaly detection (TAD) aims to identify samples that deviate from the majority in tabular data and is critical in many real-world applications. However, existing methods follow a "one model for one dataset" (OFO) paradigm, which relies on dataset-specific training and thus incurs high computational cost and yields limited generalization to unseen domains. To address these limitations, we propose OFA-TAD, a generalist one-for-all (OFA) TAD framework that only requires one-time training on multiple source datasets and can generalize to unseen datasets from diverse domains on-the-fly. To realize one-for-all tabular anomaly detection, OFA-TAD extracts neighbor-distance patterns as transferable cues, and introduces multi-view neighbor-distance representations from multiple transformation-induced metric spaces to mitigate the transformation sensitivity of distance profiles. To adaptively combine multi-view distance evidence, a Mixture-of-Experts (MoE) scoring network is employed for view-specific anomaly scoring and entropy-regularized gated fusion, with a multi-strategy anomaly synthesis mechanism to support training under the one-class constraint. Extensive experiments on 34 datasets from 14 domains demonstrate that OFA-TAD achieves superior anomaly detection performance and strong cross-domain generalizability under the strict OFA setting. The source code is available at https://github.com/Shiy-Li/OFA-TAD.
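
To make the idea of a neighbor-distance pattern concrete, the sketch below computes, for each sample, the sorted distances to its k nearest neighbors. This is only an assumed, simplified reading of the term; the abstract does not specify OFA-TAD's actual features, and `neighbor_distance_profile` and the toy points are hypothetical.

```python
import math


def neighbor_distance_profile(points, index, k):
    """Sorted Euclidean distances from points[index] to its k nearest other points."""
    target = points[index]
    dists = sorted(
        math.dist(target, p) for j, p in enumerate(points) if j != index
    )
    return dists[:k]


# Toy data: four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
profiles = [neighbor_distance_profile(points, i, k=2) for i in range(len(points))]
for point, profile in zip(points, profiles):
    print(point, [round(d, 3) for d in profile])
```

The profile depends only on distances, not on column meanings, which is one plausible reason such cues could transfer across datasets; the isolated point stands out through its much larger neighbor distances.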

09.
arXiv (CS.AI) 2026-06-16

NVMOS: Non-Verbal Vocalization Quality Assessment in Speech

arXiv:2606.15888v1. Non-verbal vocalizations (NVs), such as laughter, sighs, and coughs, are important acoustic cues for emotion and intent. Existing speech quality assessment methods typically focus on overall naturalness, while non-verbal TTS evaluations mainly examine whether a target NV appears with the correct type and position. However, the perceptual quality of NV events themselves remains underexplored. To address this gap, we construct an NV-MOS dataset containing outputs from multiple NV-TTS systems and naturally occurring NV samples, with ratings collected from three acoustic experts on a perceptual quality scale. We further analyze audio-capable multimodal large language models such as Gemini and find clear inconsistencies between their scores and expert ratings. These results suggest that general-purpose multimodal models cannot reliably replace human judgments for NV quality assessment. We then propose NVMOS, to our knowledge the first model that can reliably predict the perceptual quality of NV events in speech. Experimental results show that, with a local NV-event focusing module, NVMOS reaches expert-level or stronger agreement with human MOS.

10.
arXiv (quant-ph) 2026-06-17

Variational Quantum Eigensolver-Based Quantum Bootstrap Embedding for Molecules

arXiv:2606.17095v1. Simulating strongly correlated molecular systems on near-term quantum hardware remains challenging due to modern hardware's limited quantum volume and moderate-fidelity qubits. One potential way to circumvent this challenge is through bootstrap embedding (BE). Bootstrap embedding breaks molecules into smaller fragments that are then embedded into the "bath" of other fragments in an iterative way. Bootstrap embedding is appealing for quantum simulation because fragmenting the system reduces the qubit requirements for any given fragment. In this work, we develop a quantum bootstrap embedding (QBE) workflow that uses variational quantum eigensolver (VQE) fragment solvers and study the algorithmic choices that determine the overall VQE-QBE algorithm's success. To improve efficiency, we introduce FastAdaptVQE, a sparse matrix-accelerated form of the adaptive variational quantum eigensolver (ADAPT-VQE) that replaces symbolic commutator evaluation with direct statevector linear algebra, and MatrixFreeAdaptVQE, a matrix-free extension that removes the sparse-matrix memory bottleneck that appears when treating larger fragments. We also modify the ADAPT-VQE operator selection step by replacing the purely greedy choice with a look-ahead strategy. Benchmarks on $H_4$ and $F_2$ reach chemical accuracy, within 1 kcal/mol of bootstrap embedding results using a full configuration interaction (FCI) solver. These results show that combining QBE with VQE can accurately calculate energies of molecular systems. This research lays the foundation for extending energy calculations to larger molecular systems and quantum materials on near-term quantum hardware.

11.
arXiv (CS.CL) 2026-06-18

Output Vector Editing for Memorization Mitigation in Large Language Models

Large language models memorize and reproduce sequences from their training data, creating privacy, copyright, and security risks. Existing neuron-level mitigation methods equate editing with zeroing out neuron activations, but the activation only controls whether a neuron engages; the output vector is what writes to the residual stream and, through superposition, encodes multiple features. We propose output vector editing, a constrained-optimization weight edit that locates a small set of MLP neurons responsible for a memorized continuation and minimally modifies their output vectors to introduce a distractor in vocabulary space, redirecting their residual-stream contributions while leaving activations unchanged. Evaluating on four models from 360M to 7B parameters (SmolLM-360M, OLMo-1B, OLMo-7B, Llama2-7B), we center on OLMo-7B (whose open weights and pretraining corpus enable systematic mining) and mine 6831 memorized sequences, achieving up to 87.9% suppression. The 2.7$\times$ gap over zero ablation on the same located neurons shows the suppression comes from the output-vector edit, not localization alone. Four edit modes span a spectrum from aggressive suppression to minimal redirection; in ensemble they cover 96.5% of memorized sequences, while our recommended single-mode configuration reaches 81.5% with no catastrophic locality failures. We further identify a mechanistic boundary at ~14% of sequences unreachable by MLP-only editing; while these failures are not attention-driven overall, ablating the top contributing attention heads recovers 60–64% of them, with stronger recovery on continuations that copy tokens from the prefix, positioning attention as a complementary fallback rather than a primary mechanism. Edit mode ordering and the success-locality trade-off transfer across all four models, with success rates scaling with model size rather than family.

12.
arXiv (CS.LG) 2026-06-11

Efficient Multinomial Logistic Bandit via Frequent Directions

arXiv:2606.11968v1. This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $\tilde{\mathcal{O}}(Kd\sqrt{T})$, but still requires $\mathcal{O}(K^3d^3)$ time and $\mathcal{O}(K^2d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $\mathcal{O}(Kd(m+K)^2)$ and space complexity $\mathcal{O}(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
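
The frequent directions technique named above can be sketched generically as follows. This is the standard streaming sketch with a doubled buffer, not EOFD-MLogB itself: how the paper applies it to the accumulated Hessian is not shown in the abstract, and the function names and random test matrix are assumptions for illustration.

```python
import numpy as np


def _shrink(buffer, sketch_size):
    """Compress the buffer so that at most `sketch_size` rows remain nonzero."""
    _, s, vt = np.linalg.svd(buffer, full_matrices=False)
    # Subtract the (sketch_size + 1)-th largest squared singular value from all of them.
    delta = s[sketch_size] ** 2 if len(s) > sketch_size else 0.0
    shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    out = np.zeros_like(buffer)
    out[: len(s)] = shrunk[:, None] * vt
    return out


def frequent_directions(rows, sketch_size):
    """Stream the rows of A into a (sketch_size x d) sketch B with B^T B close to A^T A."""
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    buffer = np.zeros((2 * sketch_size, d))
    next_free = 0
    for row in rows:
        if next_free == 2 * sketch_size:
            buffer = _shrink(buffer, sketch_size)
            next_free = sketch_size  # rows from this index on are zero again
        buffer[next_free] = row
        next_free += 1
    return _shrink(buffer, sketch_size)[:sketch_size]


rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30))
m = 10  # sketch size, playing the role of m in the abstract
B = frequent_directions(A, m)
error = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / m
print(B.shape, error <= bound)
```

Each shrink step removes at most `delta` of energy in any direction while removing at least `(m + 1) * delta` of total squared Frobenius mass, which is why the spectral error stays below the squared Frobenius norm of A divided by m.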

13.
arXiv (CS.AI) 2026-06-16

ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition

arXiv:2606.16595v1. Zero-shot cross-lingual phoneme recognition is often hindered by the fragility of direct acoustic-to-symbol mapping, which is susceptible to language-specific variations. Echoing joint-embedding predictive architecture (JEPA) work in vision, we propose ArtNet, a framework that explores a structured feature prediction task based on articulatory features to enhance acoustic robustness. Specifically, ArtNet integrates an articulatory predictor, designed to extract universal articulatory representations from self-supervised learning (SSL) features, with a variational information bottleneck (VIB) to suppress language-specific variations. Experiments on seven unseen languages demonstrate that ArtNet, particularly when synergized with the proposed vector-space inventory alignment (VSIA) strategy, significantly outperforms competitive baselines, achieving a 20.56% relative reduction in phoneme error rate (PER) and 7.01% in phoneme feature error rate (PFER).

15.
bioRxiv (Bioinfo) 2026-06-18

Predicting optimal growth temperatures of bacteria using learned structural information from a single protein

Temperature is a fundamental determinant of bacterial physiology and ecology. Optimal growth temperature (OGT) is highly variable across species, contributing to differences in where and when species are most likely to thrive. Although the OGTs for most bacteria remain unknown, the increasing availability of genomes from uncultivated and cultivated taxa has made it advantageous to build genomic, cultivation-independent models to infer OGT. However, pre-existing genomic models often lack the generalizability and mechanistic grounding required for robust inferences of OGT. We propose a novel framework for predicting bacterial OGT which uses learned protein structural signatures of thermal adaptation. We hypothesize that biophysical tradeoffs which dictate enzymatic functions across variable temperatures provide a more robust empirical basis for OGT prediction than broad genomic features. Our OGT-predicting model, ROSEATE, is based on a single gene, adenylate kinase (ADK), that encodes a ubiquitous enzyme essential for energy homeostasis. ROSEATE uses high-dimensional latent space encoding via MSA Transformer, a protein language model which embeds ADKs in a manner which preserves biophysical information about embedded proteins. We show that the accuracy of the ROSEATE model is on par with other genome-based models, that the model has a high degree of phylogenetic generalizability, and that the ESM embeddings effectively capture key temperature-adaptive enzyme characteristics derived from AlphaFold structures. Because ROSEATE is based on analyses of a single ubiquitous protein, it can be used with metagenomic data to infer the community-level variation in bacterial OGTs. We demonstrate this feature of ROSEATE by reconstructing ADK sequences from over 500 environmental and host-associated metagenomes, successfully distinguishing community-wide thermal preferences across diverse habitats, from polar oceans to mammalian guts.
By transitioning from genomic proxies to informationally dense protein structural features, this work provides an efficient, interpretable tool for predicting bacterial OGTs across taxa and whole communities.

16.
arXiv (CS.CV) 2026-06-16

Learning a Sampling-Free Variational DNN Plugin from Tiny Training Sets to Refine OOD Segmentation With Uncertainty Estimation

Deep neural networks (DNNs) frequently fail to generalize to out-of-distribution (OOD) medical images because of variations in scanners and acquisition protocols. Retraining DNN models to address these distribution shifts is often impractical due to the high cost of acquiring and annotating new medical datasets. To address this, we introduce VarDeepPCA, a novel lightweight variational DNN framework designed to restore/refine degraded segmentation maps by leveraging intrinsic geometric priors. Unlike existing approaches that require target-domain data or extensive pre-training, our VarDeepPCA explicitly learns a distribution of valid anatomical geometries using only small in-distribution (ID) datasets. Theoretically, our novel variational learning framework leverages a reinterpretation of the softmax mapping to implicitly perform exact distribution modeling, thereby enabling computationally efficient, sampling-free learning and inference. This also enables VarDeepPCA to provide uncertainty estimates associated with its restored segmentation maps. We empirically validate our framework across 4 distinct clinical applications, using 14 publicly available datasets, involving segmentation of the myocardium, neuroretinal rim, prostate, and fetal head. Comparisons against 15 existing methods demonstrate that VarDeepPCA consistently restores segmentation maps produced by the existing methods on OOD data to (i) significantly improve anatomical plausibility of geometries and clinical utility of the segmentations, and (ii) significantly reduce errors, without needing any more training data than that used by existing methods.

17.
arXiv (math.PR) 2026-06-19

Critical parameters of germ-monotone families of branching random walks

arXiv:2602.21062v2. We introduce a broad class of families of branching random walks on a countable set $X$, which we refer to as germ-monotone branching random walks (GMBRWs). The processes in each family are parametrized by a positive parameter $\lambda>0$, which controls the overall reproductive speed, and they are monotonically increasing in $\lambda$ with respect to the germ order, a notion that extends classical stochastic domination. This framework encompasses a wide range of models, including classical continuous-time branching random walks, as well as discrete-time counterparts of certain non-Markovian processes such as ageing branching random walks. We define a general notion of critical parameter $\lambda(A)$ associated with each subset $A \subseteq X$, which serves as a threshold separating almost sure extinction in $A$ from positive probability of survival in $A$. This unifies and extends the classical global and local critical parameters $\lambda_w$ and $\lambda_s$, which can be recovered as special cases. We then investigate how modifications of the reproduction laws, either on a finite set or on a more general subset of $X$, affect these critical parameters. Our results extend earlier contributions in the literature.

18.
arXiv (math.PR) 2026-06-18

The FBSDE approach to sine-Gordon up to $6\pi$

arXiv:2401.13648v3. We develop a stochastic analysis of the sine-Gordon Euclidean quantum field $(\cos (\beta \varphi))_2$ on the full space up to the second threshold, i.e. for $\beta^2 < 6 \pi$. The basis of our method is a forward-backward stochastic differential equation (FBSDE) for a decomposition $(X_t)_{t \geqslant 0}$ of the interacting Euclidean field $X_{\infty}$ along a scale parameter $t \geqslant 0$. This FBSDE describes the optimiser of the stochastic control representation of the Euclidean QFT introduced by Barashkov and one of the authors. We show that the FBSDE provides a description of the interacting field without cut-offs and that it can be used effectively to study the sine-Gordon measure to obtain results about large deviations, integrability, decay of correlations for local observables, singularity with respect to the free field, Osterwalder-Schrader axioms and other properties.

19.
arXiv (quant-ph) 2026-06-11

A semi-definite programming formulation of the device-dependent guessing probability

arXiv:2606.12079v1. In quantum mechanics, a measurement applied to a state in general produces some amount of intrinsic randomness. This is not only a fundamental feature of the theory, but is also at the basis of any quantum process to generate random numbers. The simplest of such processes consists of a single, fully characterized measurement acting on a single, fully characterized state. Unfortunately, no general method to estimate the intrinsic randomness produced in such setups is known. In this work, we address this issue by presenting a semidefinite programming formulation of the maximum probability with which an adversary, Eve, can guess the outcomes of characterized but untrusted prepare-and-measure setups. We then present several applications of this construction. First, we apply our method to a variety of specific setups, allowing us both to benchmark the approach and, more importantly, to determine the exact amount of certifiable randomness in scenarios where only upper bounds were previously available. Then, we show that the presence of entanglement between the device preparing the state and the measurement strictly increases Eve's predictive power, already in the most elementary setup of a binary measurement acting on a qubit state.

20.
arXiv (CS.CV) 2026-06-19

Does Head Pose Correction Improve Biometric Facial Recognition?

Biometric facial recognition models often demonstrate significant decreases in accuracy when processing real-world images, often characterized by poor quality, non-frontal subject poses, and subject occlusions. We investigate whether targeted, AI-driven, head-pose correction and image restoration can improve recognition accuracy. Using a model-agnostic, large-scale, forensic-evaluation pipeline, we assess the impact of three restoration approaches: 3D reconstruction (NextFace), 2D frontalization (CFR-GAN), and feature enhancement (CodeFormer). We find that naive application of these techniques substantially degrades facial recognition accuracy. However, we also find that selective application of CFR-GAN combined with CodeFormer yields meaningful improvements.

21.
arXiv (CS.AI) 2026-06-17

Beyond the Sampled Token: Preserving Candidate Support in RLVR

arXiv:2510.14807v3. We revisit exploration collapse in reinforcement learning with verifiable rewards (RLVR), from the perspective of the candidate distribution for next-token prediction. We formally show that as probability concentrates on the top-$1$ candidate, the expected number of distinct responses collapses to one regardless of the sampling budget $K$. This theoretical implication is further verified by our empirical tracking of top-$N$ candidate probabilities during training, where the top-$1$ candidate progressively dominates while plausible alternatives are suppressed. These findings suggest a key desideratum for effective exploration: preserving non-negligible probability mass on the top-$N$ candidates. To this end, we propose Candidate-aware Support Preservation (CaSP), with two complementary designs. Specifically, CaSP redistributes positive gradients among top-$N$ candidates for correct responses, and applies a stronger penalty to the top-$1$ candidate for incorrect responses. Unlike many exploration-oriented methods that improve pass@$K$ at the cost of pass@1, CaSP improves pass@$K$ across the full $K$ spectrum. These gains generalize to 6 math, 2 logical-reasoning, and 2 coding benchmarks, and scale to 32B-parameter models and sampling budgets up to $K=1024$, positioning it as a principled, candidate-level approach for RLVR exploration.
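
The collapse claim above can be illustrated with a small calculation. As a simplifying assumption (not the paper's formal setting), treat each response as one i.i.d. draw from a categorical distribution; the expected number of distinct outcomes among K draws is then the sum over candidates of 1 - (1 - p_i)^K. The helper names below are hypothetical.

```python
def expected_distinct(probs, k):
    """Expected number of distinct outcomes among k i.i.d. draws from `probs`."""
    # Candidate i appears at least once with probability 1 - (1 - p_i)^k.
    return sum(1.0 - (1.0 - p) ** k for p in probs)


def concentrated(top1, n_candidates):
    """Put `top1` mass on one candidate and spread the rest evenly over the others."""
    rest = (1.0 - top1) / (n_candidates - 1)
    return [top1] + [rest] * (n_candidates - 1)


for top1 in (0.5, 0.9, 0.99, 0.999999):
    value = expected_distinct(concentrated(top1, n_candidates=8), k=64)
    print(f"top-1 mass {top1}: expected distinct outcomes = {value:.4f}")
```

For any fixed K, every term except the top-1 term goes to zero as the top-1 mass approaches one, so a larger sampling budget cannot restore diversity once the alternatives have lost their probability mass.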

22.
arXiv (CS.AI) 2026-06-11

Artificial Intelligence in Ship Finance: Applications, Opportunities, and a Case Study in AI-Augmented Loan Origination

arXiv:2606.11238v1. Ship finance is a data-intensive and document-heavy segment of asset-based lending, requiring the integration of financial, technical, contractual, and regulatory information from heterogeneous and largely unstructured sources. Increasing environmental regulation and ESG reporting requirements are adding further complexity to underwriting and loan-origination processes. Recent advances in artificial intelligence (AI), particularly large language models (LLMs), create new opportunities for processing and analysing such information. This paper reviews potential applications of AI in ship finance, with a particular focus on LLM-based systems for document comprehension, information extraction, and workflow automation. We present ShipFinance.ai, a modular agentic architecture to support loan application workflows in ship finance. The proposed system combines an LLM-based extraction module, financial analysis components, external maritime data services, and a controlled document-generation module with a chatbot interface to support the preparation of standardized financing applications. The paper discusses the key challenges of using such models in production. We argue that AI-assisted systems can support maritime finance professionals in managing increasingly complex information and reporting requirements.

23.
arXiv (CS.CL) 2026-06-16

It's About Time: Temporal References in Emergent Communication

Emergent communication enables agents to develop bespoke languages that improve communication efficiency. Despite the known importance of temporal structure in natural language, there is no existing evidence of temporal references in emergent communication. This paper addresses this gap by exploring how agents communicate about temporal relationships. We analyse three potential factors for the emergence of temporal references: environmental, external, and architectural. Our experiments demonstrate that altering the loss function is insufficient for temporal references to emerge; rather, architectural changes are necessary. A minimal change in agent architecture, using a different batching method, allows the emergence of temporal references. This modified design is compared with the standard architecture in a temporal referential games environment, which emphasises temporal relationships. The analysis shows that over 95% of the agents with the modified batching method develop temporal references, without changes to their loss function. We consider temporal referencing necessary for future improvements to the agents' communication efficiency, enabling future agents to use a coding closer to optimal than purely compositional languages allow. These insights provide the basis for incorporating temporal references into other emergent communication settings, and for investigating other aspects of language.

24.
arXiv (CS.CL) 2026-06-16

CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures

Formalizing complex reasoning from natural text is one of the central challenges in computational linguistics. It requires systems to understand not just keywords but also the context and complex reasoning embedded in a text. Current Argument Mining (AM) techniques identify basic claims and premises, yet they often struggle to capture the richer structural information required by advanced schemas such as the Carneades Argumentation Framework (CAF), which incorporates features such as premise types, proof standards, and argument schemes. We address this limitation by introducing CAF-Gen, an automated multi-agent framework designed to enrich shallow argument structures into CAF-compliant argument models. CAF-Gen employs an iterative Creator-Reviewer pipeline in which a creator agent's output is validated by a critical agent to ensure structural integrity. This multi-agent collaboration is crucial for mitigating the structural instability typical of single-pass generative models. Our experiments demonstrate that the iterative feedback loop improves the quality of the resulting data and achieves strong alignment with the original annotations, while producing structurally richer models. Our findings show that the multi-agent system can overcome the limitations of single-pass generation, providing a robust methodology for the automated modeling of formal argumentation.

25.
arXiv (quant-ph) 2026-06-12

Certifying Nonclassical Proper-Time Histories with a Quantum Clock

arXiv:2606.12755v1. Quantum clocks can acquire relativistic phases from motional or gravitational proper-time differences, but reduced clock dephasing alone does not certify nonclassical proper-time histories. We formulate this distinction as a channel-certification problem. First, we show that any two-level single-time dephasing signal, including one generated by an effective quantum proper-time label, admits a classical random proper-time representation. We then define the convex set of classical mixtures of experimentally specified proper-time histories and prove a Choi-rank separation criterion for conditioned coherent history recombination. A two-branch Ramsey protocol gives explicit bright- and dark-port population witnesses outside this classical set. The certification is operational and relative to the specified history set: it rules out classical mixtures of the same implemented proper-time histories, not arbitrary classical protocols with different histories or controls.