Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

arXiv:2606.20502v1 Announce Type: cross Abstract: Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manually curated Linux kernel samples spanning 74 CWEs. The framework enforces a strict temporal split (pre-2025 historical set / post-cutoff leakage-free set), preserves context-aware vulnerable–patched pairs, and introduces two diagnostic metrics: the Directional Failure Index (DFI) and Hierarchical Distance and Direction (HDD). We evaluate eight vanilla LLMs and 15 LoRA fine-tuned variants across non-targeted detection, targeted detection, and CWE classification. Our analysis yields two key results. First, data contamination provides no measurable advantage. Function-level analysis shows that 84% of nominally contaminated samples carry no usable memorization signal: vulnerable functions are absent or cross-mapped across datasets, and ~31% of contaminated samples carry CWE misclassification. Second, backbone directional priors dominate fine-tuning. Models exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist from historical to post-cutoff data and resist correction. Fine-tuning shifts the output threshold without changing the decision policy. This is calibration without comprehension: output distributions adapt to training data while the underlying security reasoning remains absent. The weakest backbone at binary detection (DeepSeek-R1) gains the most in coarse CWE classification, revealing that detection and understanding are decoupled capabilities. The best detection score reaches only 52.1% (+2.1 pp above chance); exact CWE ranking remains below 1.3% Top-1 accuracy, confirming that current LLMs lack reliable security reasoning for systems software, regardless of fine-tuning strategy.

02.
arXiv (CS.AI) 2026-06-15

Refusal Beyond a Single Direction: A Preliminary Comparison of Diff-in-Means and INLP

arXiv:2606.13720v1 Announce Type: new Abstract: Arditi et al. (2024) has shown that refusal in safety fine-tuned chat models is mediated by a single linear direction in the residual stream, recoverable by a difference-in-means (DiM) of harmful and harmless activations. We compare DiM-based interventions (activation addition and directional ablation) with two interventions derived from Iterative Nullspace Projection (INLP) – nullspace projection and counterfactual flipping – on five open-weight chat models, asking whether INLP can match DiM at steering refusal and whether its richer parameterisation yields more tweakable interventions. INLP counterfactual flipping is competitive with DiM directional ablation on refusal suppression, while nullspace projection is consistently weaker. Restricting INLP to the leading directions of the extracted subspace preserves most of the suppression effect at near-baseline perplexity, giving a tunable capability. Geometrically, the two INLP interventions land in qualitatively different regions of activation space: nullspace projection collapses transformed activations between the harmful and harmless clusters, while counterfactual flipping moves them into the opposite cluster, suggesting that the model encodes the absence of a concept differently from its opposite – an intriguing distinction that warrants further investigation in future work.

03.
arXiv (CS.CL) 2026-06-19

CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models

We present CombEval, a dynamic benchmark for evaluating combinatorial counting in large language models. CombEval represents each problem as a typed Cofola specification over entities, combinatorial objects, object dependencies, and constraints, enabling controlled generation of natural-language counting problems with exact solver-verified answers. Unlike static collections, CombEval supports systematic variation of object type, entity scale, constraint count, and reasoning depth. We evaluate 11 LLMs under direct and code-augmented settings and find that models remain brittle on ordered objects, indistinguishable elements, relatively positional constraints, and nested object dependencies. Error analysis further identifies failures in constraint interpretation and counting principles. CombEval provides a diagnostic testbed for studying when and why LLMs fail at combinatorial reasoning. The code and generated benchmark suites are publicly available at \url{https://github.com/YuxuZhou-CN/combination-problem-generation}.

04.
arXiv (CS.CV) 2026-06-12

Mana: Dexterous Manipulation of Articulated Tools

Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty of learning functional grasping and manipulation policies. We present Mana (Manipulation Animator), a general sim-to-real framework that reinterprets dexterous manipulation as an animation problem. Inspired by computer animation, Mana employs a coarse-to-fine pipeline that transforms procedurally-generated grasp keyframes into manipulation trajectories through motion planning and reinforcement learning. The data generation process is largely automatic, requiring only a few mouse clicks to specify functional affordances (

05.
arXiv (CS.CV) 2026-06-11

Bridging Day and Night: Unsupervised Cross-Domain Re-Identification with Synergistic Prompt and Prototype Learning

Cross-domain day-night re-identification (ReID) is fundamentally challenged by the substantial visual appearance discrepancies between daytime and nighttime scenes. Existing fully supervised methods rely heavily on labor-intensive annotations, which are costly and exhibit limited generalization across domains. In this work, we investigate unsupervised day-night ReID and propose a novel framework that synergistically combines prompt learning and prototype-based representation learning to associate identities across domains without requiring manual labels. Our approach follows a progressive two-stage training strategy. In the first stage, we exploit the vision-language model to generate instance-specific textual prompts in an annotation-free manner. We employ an instance-level alignment mechanism to embed visual features and textual prompts into a unified semantic space, aligning unlabeled day/night images with learnable prompts via instance-aware dynamic-bias adaptation. In the second stage, we construct domain-specific prototype memory banks and introduce two complementary modules: i) an intra-domain identity association module to enhance feature discriminability within each domain, and ii) a cross-domain prototype matching module to reliably identify positive and negative prototype pairs, thereby establishing robust identity correspondences across day and night. Extensive experiments on public benchmarks validate the effectiveness of our method. Under the unsupervised setting, our framework attains Rank-1 accuracy comparable to state-of-the-art fully supervised methods.

06.
Nature (Science) 2026-06-23

Europe must seize the moment to lead on free and open science

作者: 未知作者

An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent. An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent.

07.
medRxiv (Medicine) 2026-06-11

Malaria Risk among Internally Mobile Individuals and Heterogeneous Mobility Patterns in Two Hypoendemic Communities: Implications for Malaria Elimination in the Peruvian Amazon.

Background: Human mobility is increasingly recognized as a key factor influencing malaria transmission dynamics, particularly in low-transmission settings approaching elimination. This study aimed to assess mobility patterns and their association with malaria risk in two hypoendemic communities in the Peruvian Amazon. Method: A longitudinal study was conducted in the communities of Libertad and Urcomirano (Mazan River basin). Monthly population screenings were combined with weekly active and passive case detection. A total of 678 individuals were enrolled. Mobility patterns were assessed through structured questionnaires, and social network analysis was used to characterize travel connections. Log-binomial regression analysis was applied to identify risk factors associated with malaria infection. Result: Internally, mobile individuals in Libertad showed a higher malaria incidence (>32.47 cases per 1,000 person-months) than those in Urcomirano (

08.
arXiv (CS.LG) 2026-06-17

Continuous-time Optimal Stopping through Deep Reinforcement Learning

arXiv:2606.17545v1 Announce Type: new Abstract: Simulation based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal expected reward, whereas on a very fine grid, approximation errors accumulate through the backward recursion. To remove this limitation, we develop a new reinforcement-learning inspired algorithm that enables us to learn the exercise rule at arbitrarily fine time resolution. Our CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping) algorithm utilizes an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. Starting from a coarse time grid, we progressively increase the frequency of stopping opportunities, while in parallel training the ADNN to refine its timing-value estimates. We moreover design an adaptive sampling strategy that gradually concentrates training effort near the stopping boundary. Benchmarked results show that CARLOS delivers higher prices than existing Bermudan solvers, approaching the American upper bound, and achieves high computational efficiency relative to non-RL comparators.

09.
arXiv (CS.AI) 2026-06-18

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

arXiv:2606.18820v1 Announce Type: cross Abstract: Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information–action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information–action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

10.
PLOS Computational Biology 2026-05-29

A prototype-augmented graph representation learning framework for identifying brain disorder-associated genes and facilitating drug repurposing

作者:

by Jiafang Li, Yifei Li, Siying Lin, Jiahua Rao, Huiying Zhao Many genetic loci were identified as associated with neuropsychiatric disorders and neurodegenerative disorders by Genome-wide association studies (GWAS). How these loci impact these diseases is unclear. Advances in deep-learning approaches and multi-omics data have the potential to link GWAS findings with disease mechanisms. Here, we proposed the Multi-omics Graph Transformer Network (MOGT), a semi-supervised graph neural network that leverages graph representation learning to model biological networks derived from multi-omics data to predict disease-associated genes. MOGT outperforms the current approaches in disease gene prediction for two psychiatric disorders and three neurodegenerative/neurological diseases. High-risk genes (HRGs) for Parkinson’s disease (PD) predicted by MOGT were used to drug discovery by integrating with the CMAP database. Finally, 10 drugs were identified as potential candidates. Among them, the effect of drug UK-356618 was experimentally verified in a primary neuron model, showing that UK-356618 reversed the abnormal expression of PD-associated genes and improved the cell-level phenotypes of PD. Together, these results indicate that MOGT can be used to identify HRGs for brain disorders, and these predicted HRGs provide high-level insights into the mechanisms and treatments of brain disorders.

11.
arXiv (CS.CL) 2026-06-16

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but remains largely unstructured, limiting its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs. Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighing to address token imbalance and class skew. Across multiple model sizes, PVMinerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and outperforms baseline LLM trained with existing preference optimization methods. Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2

12.
arXiv (CS.LG) 2026-06-16

Transformers Learn the Mestre-Nagao Heuristic

arXiv:2606.15036v1 Announce Type: new Abstract: We train a two-layer transformer encoder to classify rational elliptic curves $E/\mathbb{Q}$ of conductor $\leq 10000$ as either rank 0 or rank 1 from the first 128 normalized Frobenius traces. We achieve >99% accuracy on both classes, and accuracy is essentially unchanged on test curves with no isogeny or quadratic-twist relative in the training set. We then apply techniques from mechanistic interpretability such as attention analysis, linear probing, activation patching, logit attribution, and neuron-level circuit analysis to reverse-engineer the algorithm the (centroid in function space) model learned. We find that a sparse circuit of 20 out of 512 layer-1 MLP neurons is sufficient for rank prediction under a linear probe with an AUROC of 0.992 at plateau, implementing a push-pull detector architecture of rank-0 and rank-1 detectors with a one-sided readout. However, we notice that the model has sub-optimal readout problems indicating a mismatch in rank-order between the readout pathway and the discriminative circuit. Critically, the learned input weights of the top discriminating neuron match the Mestre-Nagao sum heuristic weights $\log(p)/(p\cdot \log{B})$ with a Spearman coefficient $r = 0.997$ and Pearson coefficient $r = 0.952$: the model has learnt a result from analytic number theory from the Frobenius trace data alone. We additionally find that all 50 independently trained models concentrate CLS attention on prime positions at 2-50$\times$ the rate of composite positions. The CLS embedding encodes $\log{L(E,1)}$ with $R^2 = 0.962\pm 0.011$ across the 50 models (after controlling for the conductor). Activation patching analysis reveals that attention weights are dissociated from causal information flow. Additionally, the 50 solutions from training are near-identical in function space (with pairwise agreement $>$98.8%) despite large weight space barriers.

13.
arXiv (CS.CV) 2026-06-19

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense, temporally consistent masks from inexpensive supervision, including weak annotations (points, scribbles) and semi-supervision with few densely labeled frames. This setting is clinically valuable but challenging due to weak contrast, ambiguous boundaries, motion blur, and specular highlights, compounded by sparse pixel-level guidance. While SAM2 can generate dense masks from sparse inputs, direct pseudo-labeling often yields geometry-degraded masks with boundary leakage, underutilizes temporal consistency, and ignores reliability. To address these issues, we propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided reliability-aware temporal mask evolution. ARTEMIS initializes coarse masks from available supervision: SAM2 converts points/scribbles, while dense labels serve as reliable anchors. A debate-and-judge vision-language agent selects reliable temporal anchors under weak supervision, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter using temporal reliability-aware robust learning, incorporating reliability-guided reference selection, a Reference Prototype Transport Module, and reliability-aware robust loss. These components assess mask reliability, evolve anchors over time, transport target identity across frames, and down-weight noisy supervision instead of discarding difficult samples. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate that ARTEMIS achieves state-of-the-art performance. Code will be released at https://github.com/wangtong627/ARTEMIS.

14.
arXiv (CS.CV) 2026-06-18

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. By modelling correlation and intent from the historical and future trajectories of the pedestrian, it will usually result in a multimodal (i.e. multiple modes) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gesture, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module to model the future trajectory distributions on two modes, crossing and non-crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic, which can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

15.
arXiv (CS.AI) 2026-06-16

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

arXiv:2604.13085v2 Announce Type: replace-cross Abstract: Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolidation in continual reinforcement learning. AMC is conceptually inspired by the qualitative structure of synaptic tagging and capture (STC) theory, the idea that memories transition through discrete stability phases, but makes no claim to model the underlying molecular or synaptic mechanisms. AMC models memory as a continuous crystallization process in which experiences migrate from plastic to stable states according to a multi-objective utility signal. The framework introduces a three-phase memory hierarchy (Liquid–Glass–Crystal) governed by an Itô stochastic differential equation (SDE) whose population-level behavior is captured by an explicit Fokker–Planck equation admitting a closed-form Beta stationary distribution. We provide proofs of: (i) well-posedness and global convergence of the crystallization SDE to a unique Beta stationary distribution; (ii) exponential convergence of individual crystallization states to their fixed points, with explicit rates and variance bounds; and (iii) end-to-end Q-learning error bounds and matching memory-capacity lower bounds that link SDE parameters directly to agent performance. Empirical evaluation on Meta-World MT50, Atari 20-game sequential learning, and MuJoCo continual locomotion consistently shows improvements in forward transfer (+34–43\% over the strongest baseline), reductions in catastrophic forgetting (67–80\%), and a 62\% decrease in memory footprint.

16.
arXiv (CS.CL) 2026-06-19

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

17.
bioRxiv (Bioinfo) 2026-06-13

Virus-human protein-protein interactions predict viral phenotypes

Viral phenotypes such as host and tissue tropism are critical determinants of viral infection and transmission. Inferring viral phenotypes presents unique challenges compared to cellular organisms, as viruses rely entirely on host machinery for replication and survival. Current methods for predicting viral phenotypes mainly rely on viral genomic data, often overlooking host-related information. Here, we evaluated the utility of predicted virus-human protein-protein interactions (PPIs) in inferring diverse viral phenotypes using machine-learning algorithms. For predicting human infectivity, a PPI-based machine learning model outperformed both virus genomic and protein sequence-based models that used large language model embeddings. It also surpassed previous methods that incorporated both viral and host genomic data. The human proteins identified by the model were significantly enriched in functions related to viral infection and immune response. In predicting various phenotypes of human RNA viruses, PPI-based models performed better than virus sequence-based models in forecasting virulence, human transmissibility and transmission routes, while showing comparable performance to genomic sequence-based models in predicting tissue tropism. Finally, we demonstrated that a PPI-based model could distinguish high-risk HPV genotypes from low-risk ones. Proteins associated with high-risk HPV were involved in apoptosis and immune regulation, whereas those linked to low-risk HPV were enriched in telomere maintenance and DNA repair. Collectively, this study is the first to demonstrate the value of predicted virus-human PPIs in inferring viral phenotypes, thereby enhancing our understanding of the molecular mechanisms underlying these phenotypes. It also provides effective tools for risk assessment of emerging viruses, contributing to improved pandemic preparedness.

19.
arXiv (CS.LG) 2026-06-15

Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns

arXiv:2604.04611v2 Announce Type: replace Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect ``dynamic'' free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.

20.
arXiv (CS.LG) 2026-06-16

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

arXiv:2606.07622v2 Announce Type: replace Abstract: Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

21.
arXiv (CS.CL) 2026-06-19

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output - here, the category assigned to a text - depends heavily on the wording of the prompt. While literature on prompt engineering is expanding, few studies focus on classification tasks, and even fewer address domains like psychology, where constructs have precise, theory-driven definitions that may not be well represented in pre-training data. We present an empirical framework for optimizing LLM performance for identifying constructs in texts via prompt engineering. We experimentally evaluate five prompting strategies – codebook-guided empirical prompt selection, automatic prompt engineering, persona prompting, chain-of-thought reasoning, and explanatory prompting - with zero-shot and few-shot classification. We find that persona, chain-of-thought, and explanations do not fully address performance loss accompanying a badly worded prompt. Instead, the most influential features of a prompt are the construct definition, task framing, and, to a lesser extent, the examples provided. Across three constructs and two models, the classifications most aligned with expert judgments resulted from a few-shot prompt combining codebook-guided empirical prompt selection with automatic prompt engineering. Based on our findings, we recommend that researchers generate and evaluate as many prompt variants as feasible, whether human-crafted, automatically generated, or ideally both, and select prompts and examples based on empirical performance in a training dataset, validating the final approach in a holdout set. This procedure offers a practical, systematic, and theory-driven method for optimizing LLM prompts in settings where alignment with expert judgment is critical.

22.
arXiv (CS.CV) 2026-06-16

Sustainable Face Recognition on Low-Power Devices with VQ-VAE Embeddings

Face recognition has become a cornerstone of modern AI applications, yet conventional approaches often rely on computationally intensive models deployed in cloud environments, leading to increased network traffic, high energy consumption, and a heavy carbon footprint. This work introduces a sustainable, edge-deployable face recognition framework based on Vector-Quantized Variational Autoencoders (VQ-VAE), which generates compact and semantically rich latent representations of facial images. By leveraging the compression capacity and reconstruction quality of VQ-VAE embeddings on the edge and combining them with the power of pre-trained face embeddings in a knowledge distillation setup, our system achieves comparable accuracy to state-of-the-art face embedding models while significantly reducing memory and computation requirements on the edge, making it suitable for low-power edge devices. The integration of VQ-VAE compression minimizes network overhead while keeping the matching accuracy high by retaining only the most informative facial features in the latent space. As a result, the reconstructed images preserve the key identity characteristics, improving the robustness and overall performance of the face embeddings.

23.
arXiv (CS.AI) 2026-06-18

Large-Scale OD Matrix Estimation with A Deep Learning Method

arXiv:2310.05753v2 Announce Type: replace Abstract: The estimation of origin-destination (OD) matrices is a crucial aspect of Intelligent Transport Systems (ITS). It involves adjusting an initial OD matrix by regressing the current observations like traffic counts of road sections (e.g., using least squares). However, the OD estimation problem lacks sufficient constraints and is mathematically underdetermined. To alleviate this problem, some researchers incorporate a prior OD matrix as a target in the regression to provide more structural constraints. However, this approach is highly dependent on the existing prior matrix, which may be outdated. Others add structural constraints through sensor data, such as vehicle trajectory and speed, which can reflect more current structural constraints in real-time. Our proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization. This approach combines the advantages of both deep learning and numerical optimization algorithms. The neural network(NN) learns to infer structural constraints from probe traffic flows, eliminating dependence on prior information and providing real-time performance. Additionally, due to the generalization capability of NN, this method is economical in engineering. We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset. Subsequently, we verified the stability of our method on real traffic data. Our experiments provided confirmation of the benefits of combining NN and numerical optimization.

24.
arXiv (quant-ph) 2026-06-17

Intrinsic Pointer Basis and Irreversible Classicality from Coherence Contraction

arXiv:2604.23304v4 Announce Type: replace Abstract: This work analyzes an operational route to classical behavior for reduced quantum states using the intrinsic reference basis (IRB). Relative to a fixed physical conjugation, the IRB separates intrinsic populations from a real antisymmetric cohesion sector. A globally bounded cohesion index is defined and its exponential contraction is proved for phase-free dephasing dynamics aligned with the IRB; for general aligned dephasing, the corresponding modulus-based coherence functional contracts at the same computable rates. The results provide distance bounds to the IRB-diagonal description and a logarithmic upper bound on the time required to reach a prescribed experimental tolerance. The IRB projectors constitute state-derived candidate pointer sectors, and they become dynamically stable pointer sectors when the effective dephasing generator is aligned with them and damps the relevant inter-sector coherences. Degenerate population sectors lead naturally to block-classicality and protected intra-block coherence. In a two-level active sector, the cohesion index equals fringe visibility, giving a direct interferometric test of the contraction law. The construction is independent of any spacetime- or unification-emergence hypothesis and is intended as a channel-level complement to environment-induced einselection.

25.
arXiv (CS.LG) 2026-06-12

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

arXiv:2606.13501v1 Announce Type: cross Abstract: Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration throughout its lifetime. However, DiT workloads exhibit substantial heterogeneity across requests, execution stages, and system conditions, making static parallelism inefficient and often leading to poor GPU utilization and degraded service quality. This paper argues that DiT serving should treat GPU parallelism as a first-class schedulable resource. We present GF-DiT, a policy-programmable runtime for elastic DiT serving that dynamically adapts the parallelism of running requests according to workload demands and service objectives. GF-DiT introduces an asynchronous execution abstraction that decomposes requests into independently schedulable trajectory tasks and enables online GPU reallocation. To make elastic parallelism practical, GF-DiT further proposes group-free collectives, a lightweight communication abstraction that supports low-overhead online formation and reconfiguration of arbitrary execution groups. We implement GF-DiT in vLLM-Omni and evaluate it on representative image and video diffusion workloads. Compared with fixed-pipeline execution with static parallelism, GF-DiT improves throughput by up to 6.01$\times$, reduces mean latency by up to 95%, lowers SLO violation rates by up to 90%, and reduces communication-group setup overhead from 778 ms to approximately 60 $\mu$s.