Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-12

Adaptive generative moment matching networks for improved learning of dependence structures

arXiv:2508.21531v2 Announce Type: replace-cross Abstract: An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and improved learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While training time remains similar, adaptively training GMMNs (AGMMNs) significantly increases training performance, which is shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs and parametric copula models is also demonstrated in terms of three applications. First, convergence rates of estimators based on quasi-random versus pseudo-random samples from copulas are investigated in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications demonstrate the improved training of AGMMNs for a copula model implied by the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs indeed translates to an improved model prediction.

02.
bioRxiv (Bioinfo) 2026-06-13

MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts

Authors:

De novo protein binder design has been dominated by structure-based pipelines that require known three-dimensional target conformations and consume substantial compute and generation time per design, limiting their throughput and accessibility for routine large-scale binder exploration. Sequence-only generative models promise a faster and lighter alternative, yet existing systems remain uniformly dense and frequently reintroduce structural computation at inference, undermining the core advantages they were intended to deliver. Across the broader language modelling community, transformers have meanwhile transitioned from fully dense designs to sparse Mixture-of-Experts architectures that decouple capacity from per-token compute, a shift that has yet to reach sequence-only protein binder generation. We present MoE-Bind, an autoregressive protein binder generator that, for the first time in this domain, combines Multi-head Latent Attention with a sparse Mixture-of-Experts feed-forward network and is evaluated under two independent structure predictors, Boltz-2 and AlphaFold2-Multimer. Despite activating less than half the per-token parameters of compute-matched dense baselines, MoE-Bind matches or exceeds them on full-length receptor-conditioned binder generation on a leakage-free Docking Benchmark 5.0 evaluation, transfers without peptide-specific training to short-peptide design, and reduces training and inference compute by a large margin. Routing analysis on generated binders reveals interpretable expert specialization at both the individual amino acid and biochemical group level, a structured expert-token alignment not previously reported for natural-language MoE models. These results show that sparse architectural design, rather than scale, can deliver fast, structure-free, and interpretable protein binder generation.

03.
arXiv (quant-ph) 2026-06-19

Topological Codes Based on Space Groups

arXiv:2606.20548v1 Announce Type: new Abstract: Topological codes form one of the most important classes of stabilizer codes. Most existing algebraic constructions and analyses of topological codes assume translation invariance. Here we show that topological codes can arise in more general settings by incorporating point group operations. The central construction is a class of Calderbank-Shor-Steane (CSS) codes called space-group codes, whose check operators are built from group-algebra templates over space groups that combine translations with point-group operations. We develop methods for analyzing topological properties of space-group codes using ring-modules and their invariant theory. At first glance, space-group codes might appear to complicate practical implementation; however, we find that they can exhibit greater locality than previous codes based purely on translations. Our framework thus extends the landscape of topological codes and opens up a broader design space for the co-design of topological codes with quantum computing platforms.

04.
arXiv (CS.AI) 2026-06-17

Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal Semantics

arXiv:2602.02881v2 Announce Type: replace-cross Abstract: This paper articulates a long-term research vision for formal methods at the intersection with artificial intelligence, outlining multiple conceptual and technical dimensions and reporting on our ongoing work toward realising this vision. It advances a forward-looking perspective on the next generation of formal methods based on the integration of automated contract synthesis, semantic artifact reuse, and refinement-based theory. We argue that future verification systems must builds towards individual correctness proofs toward a cumulative, knowledge-driven paradigm in which specifications, contracts, and proofs are continuously synthesised and transferred across systems. To support this shift, we outline a hybrid framework combining large language models with graph-based representations to enable scalable semantic matching and principled reuse of verification artifacts. Learning-based components provide semantic guidance across heterogeneous notations and abstraction levels, while symbolic matching ensures formal soundness. Grounded in compositional reasoning, this vision points toward verification ecosystems that evolve systematically, leveraging past verification efforts to accelerate future assurance.

05.
arXiv (CS.LG) 2026-06-15

Federated Learning for Feature Generalization with Convex Constraints

arXiv:2606.14416v1 Announce Type: new Abstract: Federated learning (FL) often struggles with generalization due to heterogeneous client data. Local models are prone to overfitting their local data distributions, and even transferable features can be distorted during aggregation. To address these challenges, we propose FedCONST, an approach that adaptively modulates update magnitudes based on the parameter strength of the global model. This prevents over-emphasizing well-learned parameters while reinforcing underdeveloped ones. Specifically, FedCONST employs linear convex constraints to ensure training stability and preserve locally learned generalization capabilities during aggregation. A Gradient Signal to Noise Ratio (GSNR) analysis further validates the effectiveness of FedCONST in enhancing feature transferability and robustness. As a result, FedCONST effectively aligns local and global objectives, mitigating overfitting and promoting stronger generalization across diverse FL environments, achieving state-of-the-art performance.

07.
arXiv (CS.CL) 2026-06-12

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models provide one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demonstrations and rarely encounter the instruments, transparent liquids, or fixed protocol workflows found in scientific laboratories. Closing this gap requires both laboratory-specific supervision and a unified learning framework that can accommodate the diverse robot embodiments used to execute experimental protocols. We therefore identify data and embodiment as central bottlenecks alongside model design. To address the data side, we build RoboGenesis, a simulation-based workflow and data engine that composes configured laboratory workflows from atomic skills, validates and filters rollouts, and exports structured demonstrations across supported robot profiles. On the policy side, we present LabVLA, trained with a two-stage recipe: FAST action token pretraining first makes the Qwen3-VL-4B-Instruct backbone action aware before any continuous control is learned, and flow matching posttraining then attaches a DiT action expert under knowledge insulation. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among all evaluated baselines under both in-distribution and out-of-distribution settings.

08.
arXiv (CS.AI) 2026-06-12

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.

09.
arXiv (CS.AI) 2026-06-19

Movement Primitives in Robotics: A Comprehensive Survey

arXiv:2601.02379v2 Announce Type: replace-cross Abstract: Biological systems exhibit a continuous stream of movements, consisting of sequential segments, that allow them to perform complex tasks in a creative and versatile fashion. This observation has led researchers towards identifying elementary building blocks of motion known as movement primitives, which are well-suited for generating motor commands in autonomous systems, such as robots. In this survey, we provide an encyclopedic overview of movement primitive approaches and applications in chronological order. Concretely, we present movement primitive frameworks as a way of representing robotic control trajectories acquired through human demonstrations. Within the area of robotics, movement primitives can encode basic motions at the trajectory level, such as how a robot would grasp a cup or the sequence of motions necessary to toss a ball. Furthermore, movement primitives have been developed with the desirable analytical properties of a spring-damper system, probabilistic coupling of multiple demonstrations, using neural networks in high-dimensional systems, and more, to address difficult challenges in robotics. Although movement primitives have widespread application to a variety of fields, the goal of this survey is to inform practitioners on the use of these frameworks in the context of robotics. Specifically, we aim to (i) present a systematic review of major movement primitive frameworks and examine their strengths and weaknesses; (ii) highlight applications that have successfully made use of movement primitives; and (iii) examine open questions and discuss practical challenges when applying movement primitives in robotics.

10.
arXiv (CS.AI) 2026-06-19

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

arXiv:2606.20363v1 Announce Type: new Abstract: Explicit skill libraries make computer-using agents easier to inspect, but it remains unclear whether such libraries can be mined from interaction data in a way that improves downstream policies. We study this question through a three-stage pipeline that segments GUI trajectories, clusters segments into candidate skills, and trains a skill-aware policy from the resulting annotations. The mined clusters are readable on the source benchmark: five of eight clusters have at least 0.95 purity against InteraSkill Workflows labels. However, readability does not imply transfer. GRPO improves IW skill-step accuracy only from 18.5\% to 20.5\%, leaves BrowseComp+ essentially unchanged, and underperforms trivial frequency priors on key source-domain metrics. We therefore present the method as a diagnostic study: trajectory mining can expose inspectable skill structure, but the current boundary detector, orderless segment representation, and offline reward model are insufficient for reliable cross-domain policy improvement.

11.
arXiv (quant-ph) 2026-06-12

A Quantum Algorithm for Random Number Generation

arXiv:2606.13034v1 Announce Type: new Abstract: We present a quantum algorithm for random number generation that achieves a provable quadratic speedup over classical Markov chain mixing, building on the Diaconis-Shahshahani Fourier analysis of the top-to-random card shuffle. The algorithm integrates three quantum primitives into a unified mixing circuit: the Quantum Fourier Transform (QFT), which diagonalizes the Markov transition operator; controlled phase rotations, which encode the shuffle eigenvalue spectrum; and the Grover diffusion operator, which acts as a quantum analogue of the Aldous-Diaconis strong uniform stopping time by reflecting amplitudes about their mean at each iteration. For an n-qubit register, the mixing time is O(\sqrt{n \log n}) iterations. Extending to m qudits of local dimension d reduces this to O(\sqrt{\log_d N}) iterations, where N = d^m, compared to the classical O(n \log n) bound. The qudit formulation further reduces QFT circuit depth from O(\log^2 N) to O(\log_d^2 N) gates per layer by encoding the same N-state space using m = \log_d N subsystems instead of \log_2 N qubits. We validate both variants on IBM superconducting hardware.

12.
arXiv (CS.LG) 2026-06-12

Interpretable Factor Decomposition for Decision Intelligence in Large-Scale Financial Markets: Evidence from China's A-Share Market

arXiv:2606.12843v1 Announce Type: new Abstract: We present an interpretable machine learning pipeline to decompose Cross-Sectional Equity Return Predictability into auditable factor contribution. We apply an XGBoost model with TreeSHAP attribution and conduct stress testing on 3632 Chinese A-share stocks from 2009 until 2019. Using 60-month, rolling windows over 55 months of out-of-sample data, XGBoost obtains a mean AUC of 0.547 and +2.38%/month (Newey-West t = 5.94; Annualized Sharpe 2.23) long-short spread for the top vs bottom quintiles. This alpha is persistent after adjusting for the Carhart four-factor model (+2.31%/month; t = 7.48). SHAP Decomposition indicates that behavioral signals (turnover and momentum) account for 58.2% of predictive attribution compared to 10.7% for valuation ratios, on average, across 55 industry groups. Ablation analysis serves to cross-validate this ranking and provides evidence that SHAP and ablation diverge in a manner that highlights feature substitutability structure that is largely invisible to either method used in isolation.

13.
arXiv (CS.LG) 2026-06-19

On the QUEST for Uncertainty Quantification via Highest Density Regions

arXiv:2606.19569v1 Announce Type: new Abstract: Uncertainty quantification (UQ) is essential for reliable decision-making in safety-critical applications in probabilistic machine learning. For regression problems, dominant scalar UQ approaches - notably, those based on proper scoring rules - measure uncertainty via pointwise predictive risk. This can lead to counterintuitive results when the target statistic is not the conditional expectation. We propose an alternative framework, in which uncertainty is characterised by the volume of the most probable subset of a distribution's support. QUEST (Quantifying Uncertainty via highest dEnSiTy regions) is a novel approach to UQ based on the concentration of Lebesgue measure at a distribution's peak(s), evaluated at one or more values of a robustness parameter $\alpha$. We establish connections between our measures and classical statistics from information theory and economics. We show that, unlike popular alternatives based on proper scoring rules, QUEST measures of epistemic and aleatoric uncertainty satisfy a set of axioms adapted from the UQ literature, including monotonicity under distributional spread and invariance to location shifts. Selective prediction benchmarks confirm that QUEST performs favourably against standard measures such as variance and differential entropy.

14.
arXiv (CS.LG) 2026-06-11

A Riemannian Approach to Low-Rank Optimal Transport

arXiv:2606.12120v1 Announce Type: new Abstract: Low-rank optimal transport (OT) mitigates the quadratic scaling of classical solvers, yet existing approaches rely heavily on first-order mirror-descent updates that require careful hyperparameter tuning and ignore the optimization landscape's curvature. To address these limitations, we propose a unified Riemannian geometric framework for low-rank OT, modeling balanced and unbalanced rank-$r$ positive factored couplings as novel smooth embedded submanifolds of the positive orthant. By equipping these manifolds with the Fisher-Rao product metric, we derive tractable formulations for Riemannian projectors, retractions, and Hessian-vector products. Our cost-agnostic framework seamlessly extends to linear OT, Gromov-Wasserstein (GW), fused GW, and their unbalanced counterparts. For balanced OT, our geometric ingredients are computed via efficient conjugate-gradient and iterative Bregman updates. For the unbalanced OT, our operations elegantly reduce to closed-form scalings, completely eliminating inner iterative loops. In both regimes, per-iteration complexity scales linearly with dataset size, and we provide a rank-sufficiency certificate for global optimality verification. Extensive experiments across a range of problem sizes demonstrate that our regularization-free first- and second-order solvers achieve faster convergence and superior performance over existing state-of-the-art low-rank OT solvers.

16.
arXiv (CS.CL) 2026-06-19

Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts

The rapid increase in scientific publications leads to the fact that manual study screening in systematic literature reviews (SLRs) is increasingly resource consuming, inefficient, and inconsistent. Classifying studies that clearly report health-related quality-of-life results, such as EQ-5D data, requires a high level of clinical interpretation and poses challenges for human reviewers. This study investigates the use of Google's Gemini and Gemma large language models (LLMs) in automating EQ-5D detection in the PubMed biomedical database based only on published abstracts. A multi-phase framework is proposed that integrates few-shot prompting, weight ensembling aggregation, and a soft stacking meta-classifier. Nine LLMs are evaluated on a dataset of PubMed studies manually labeled by two experts regarding EQ-5D reporting. The weighted ensemble of gemini-2.5-pro, gemma-3-12b, and gemma-3-27b obtained a 0.74 weighted F1-score and 0.74 accuracy, exceeding individually attained results. The ensembling of top-performing models improved the balance between precision and recall compared to individual models, while the soft stacking approach provided greater reliability and interpretability. Feature analysis shows that the probability results from the models are important in guiding the final predictions. The findings suggest that an ensemble-based LLM setup is a reliable and scalable approach for automating screening in biomedical research.

17.
arXiv (CS.CL) 2026-06-16

How Far Can Machine Translation Quality Take You? Extrinsic Discourse Evaluation in Goal-Oriented Setups

Existing machine translation (MT) metrics and discourse-focused evaluations primarily assess translation quality intrinsically, without measuring the downstream consequences of translation errors. In this work, we focus on extrinsic discourse evaluation of machine translation under two distinct regimes: static and interactive. Under the static regime, we propose an entity counting task as a probe of referential consistency in discourse. We show that high intrinsic MT quality does not reliably predict downstream discourse success and strong MT systems still produce referential inconsistencies. For the interactive regime, we study the goal-oriented multi-agent Welfare Diplomacy game as a probe of long-horizon communication and coordination. We find that interaction-specific translation failures impact downstream coordination. Our results highlight goal-oriented environments as a viable framework for discourse-sensitive extrinsic MT evaluation.

18.
arXiv (CS.AI) 2026-06-17

Knowledge Reutilization in Meta-Reinforcement Learning

arXiv:2606.18132v1 Announce Type: new Abstract: Meta-reinforcement learning enables fast adaptation by extracting shared structure from related tasks, but existing end-to-end methods often couple task inference with embodiment-specific control. This coupling can obscure non-parametric task semantics, reduce sample efficiency, and limit cross-agent reuse. We propose a meta-knowledge reutilization framework that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents. The framework uses a Bayesian non-parametric prior to organize latent task modes and a high-level policy to generate task-level magnitude guidance. To bridge reusable task knowledge with different embodiments, we introduce a semantic-magnitude interface and a lightweight temporal adaptor, which convert frozen meta-knowledge into temporally aligned subgoals for embodiment-specific low-level controllers. Experiments on multiple locomotion agents show that our framework reduces final-step tracking error by 94.75% – 99.79% compared with recent state-of-the-art baselines and achieves comparable deployment performance with about 23.8% of their interaction data.

19.
arXiv (CS.AI) 2026-06-11

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

arXiv:2605.23243v2 Announce Type: replace-cross Abstract: We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection (VulnLLM-R, across C/Java/Python) and black-box web application security testing (five production-style applications with 118 ground-truth vulnerabilities across 20+ CWE families, which we will open-source). We test six frontier models (GPT-5.4, Codex~5.3, Claude Opus~4.6, Sonnet~4.6, Gemini~3.1~Pro and Gemini~3~Flash) and two domain-specialized models across four testing paradigms. Our findings are sobering: (1)~every frontier model produces 10-50% false positive rates in white-box detection, systematically over-predicting vulnerabilities; (2)~in black-box testing, frontier models achieve only 4-8% ground-truth coverage, improving to just 10-19% even with external security tools (Playwright MCP, Burp Suite MCP); (3)~structured penetration-testing methodology encoded in domain-specialized agents raises per-family detection above 50%, demonstrating that methodology, not scale, is the primary lever; and (4)~a domain-specialized defense model achieves the highest precision (0.904) and lowest false positive rate (9.7%) among all models, on a single GPU. We identify the absence of structured security testing traces end-to-end request/response sequences, failure-heavy data, and multi-step attack chains as the fundamental training data bottleneck, and propose self-play security testing as a data generation strategy. Our results make the case for vertical foundation models purpose-built for cybersecurity.

20.
arXiv (CS.CV) 2026-06-12

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

Pixel-bin image sensors are becoming the default choice for smartphone cameras due to their resolution vs light-gathering trade-off. However, their larger inter-color separation compared to the Bayer color filter array (CFA) makes them challenging to demosaic. Furthermore, existing deep learning-based demosaicing methods are CFA-specific, requiring multiple individual models that take up precious onboard resources and demand larger development and maintenance efforts. In this work, we propose a modular unified architecture for demosaicing various pixel-bin sensors that provides higher image quality while being extensible and lightweight. Additionally, to enable plug-and-play operation, we introduce a learning-free CFA-identification module to detect the CFA type of raw data accurately.

21.
arXiv (quant-ph) 2026-06-19

All-valid-state HOBO encoding for constrained combinatorial optimization on NISQ devices

arXiv:2606.20017v1 Announce Type: new Abstract: Continued advancements in quantum computing have stimulated growing interest in translating quantum technologies into real-world applications. Consequently, the investigation of practically motivated NP-hard problems is of significant value. This study investigates the performance of a variational quantum eigensolver (VQE) in addressing the traveling salesperson problem (TSP) through noiseless simulations representative of noisy intermediate-scale quantum (NISQ) devices using higher-order binary optimization (HOBO) encodings. We construct a HOBO Hamiltonian with an efficient binary representation and propose an all-valid-state HOBO (AVS-HOBO) scheme based on cyclic mapping that eliminates one penalty term and reuses states that would otherwise be invalid. Using TSP instances of up to 20 cities, we compare the original HOBO and AVS-HOBO encodings from multiple perspectives, including the energy convergence behavior and the approximation, tour-length, and feasibility ratios. In addition to simulations, we perform computations on real quantum hardware with different device architectures, where we not only compare the performances of different chips but also investigate the effects of different error-mitigation methods on actual quantum machines. The results indicate that AVS-HOBO encoding enhances the practical reliability of VQE on NISQ devices and improves scalability for larger TSP instances, with broader applicability to constrained quantum optimization problems.

22.
arXiv (CS.LG) 2026-06-19

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

arXiv:2606.20559v1 Announce Type: cross Abstract: Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume complementary knowledge across viewpoints, modalities, and foundation model representations, yet remain deployable from egocentric video alone. To this end, we introduce a hierarchical multi-teacher distillation framework that produces UNIEGO, a unified egocentric encoder trained with nine teachers spanning ego-exo viewpoints, RGB, depth, and skeleton modalities, and four foundation models. Rather than distilling directly from heterogeneous teachers whose incompatible architectures and feature geometries induce conflicting gradients, our framework interposes a layer of representation-specific Proxy models that translate diverse teacher knowledge into a homogeneous egocentric space. A second distillation stage, Selective Proxy Distillation (SPD), then adaptively selects, for each training sample, the subset of proxies that are both correct and confident, distilling exclusively from reliable supervision and suppressing erroneous signals. SPD is further stabilized by initializing UNIEGO as a learned convex combination of proxy parameters, placing the unified model in a well-conditioned region of the loss landscape before distillation begins. UNIEGO achieves state-of-the-art performance across three egocentric video understanding tasks - action recognition, video retrieval, and action segmentation on three challenging ego-exo benchmarks, outperforming naive multi-teacher distillation baselines and demonstrating that structured, proxy-mediated knowledge transfer yields richer and more discriminative egocentric representations.

24.
arXiv (CS.CL) 2026-06-19

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Effective Automated Essay Scoring (AES) are expected to support both reliable assessment and actionable instructional feedback. However, existing approaches often treat scoring and feedback as separate components: neural scoring models provide limited interpretability, while Large Language Model (LLM)-based feedback is typically insensitive to learners proficiency levels. To address this fragmentation, this work proposes PsyScore, a psychometrically-aware framework that integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer that incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling the precise estimation of student ability while maintaining psychometric interpretability, a ZPD-Scaffolded Feedback Generator, which conditions multi-agent feedback strategies on the diagnosed ability parameter to adapt instructional focus across different proficiency levels, and a Multi-Perspective Feedback Evaluation Strategy that assesses feedback quality via pairwise preference judgements and student revision simulations. Experiments on the ASAP++ dataset demonstrate that PsyScore achieves competitive scoring performance while providing more pedagogically aligned feedback.

25.
arXiv (CS.CV) 2026-06-16

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Dexterous interaction with articulated objects is important for household, assistive, and humanoid manipulation, where multi-finger hands can provide compliant contact patterns beyond parallel-jaw grasping. However, articulated-object manipulation differs from static-object manipulation: the target part cannot be directly actuated, and its motion must emerge through sustained physical hand–handle contact. This makes the transition from object-centric articulated generation to hand-driven dexterous hand–object interaction non-trivial, since geometric trajectory replay or open-loop execution does not model the contact dynamics required to move the articulated part. Moreover, policies trained only for task completion under fixed dynamics can overfit nominal contact loads, especially without tactile or force feedback, and may degrade when the contact load changes. To address these challenges, we present DragMesh-2, a contact-driven framework for dexterous interaction with articulated objects that extends articulated interaction from object-centric generation to hand-driven dexterous hand–object interaction, where articulated motion must arise through physical contact. We further propose PICA, a physically informed contact-aware training mechanism that injects physical signals into policy learning without tactile or force feedback, improving robustness and task success under changing contact loads. Finally, we conduct systematic evaluation across multiple damping conditions and articulated-object categories to study robustness under contact-load variation, and provide a pure-geometry dexterous interaction resource to support future loco-manipulation and humanoid hand–object interaction research. Across seven GAPartNet objects, DragMesh-2 achieves stronger robustness under contact-load variation than the compared methods while maintaining high task success across damping conditions.