Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-15

Quantifying and detecting quantum-state texture

arXiv:2604.07257v2 Announce Type: replace Abstract: Quantum-state texture is a recently proposed quantum resource that characterizes the inhomogeneity of a quantum state's matrix element distribution in the computational basis, enriching our understanding of quantum state structure. To expand its quantification toolkit and establish detection methods, in this article, we investigate the resource theory of texture from both quantitative and detection perspectives. First, we construct a texture measure $\mathcal{T}^{GR}_{\alpha,z}(\rho)$ based on the $\alpha$-$z$ Rényi relative entropy and present some of its inherent properties. Second, we analyze the mathematical relationships between several existing texture measures, revealing connections among different quantifiers. Finally, drawing on the witness concept from other resource theories, we systematically introduce texture witnesses into the texture theory and provide examples of texture witnesses with special properties.

02.
arXiv (CS.CL) 2026-06-11

APEX: Automated Prompt Engineering eXpert with Dynamic Data Selection

Large Language Models are highly sensitive to prompt formulation, necessitating automatic prompt optimization to unlock their full potential. While evolutionary algorithms have emerged as the dominant paradigm, they suffer from a critical bottleneck: data efficiency. Current methods treat the development dataset as a static benchmark, wasting significant compute budget on uninformative data. In this work, we introduce APEX (Automatic Prompt Engineering eXpert), a novel framework that optimizes the data usage alongside the prompt search. APEX dynamically stratifies the dataset into Easy, Hard, and Mixed tiers based on the optimization lineage. By prioritizing the Mixed tier, which identifies the data where the LLM has mixed performance, we identify two high-leverage subsets: the addressable frontier for generating informative mutations and the rank-sensitive frontier for distinguishing candidate quality. We evaluate APEX across three diverse benchmarks: IFBench, SimpleQA Verified, and FACTS Grounding. Under a fixed budget of 5,000 evaluation calls, due to its data efficiency, APEX outperforms the initial prompt by an average of 11.2% on Gemini 2.5 Flash and 6.8% on Gemma 3 27B, demonstrating that a data-centric approach is key to efficient and effective prompt optimization.

03.
arXiv (CS.AI) 2026-06-18

AI Sovereignty as National Learning Capacity: A Human-Centered Learning Mechanics Viewpoint on France, the United States, and China

arXiv:2606.00729v2 Announce Type: replace Abstract: Artificial intelligence in France is often discussed through separate dimensions such as investment, compute, regulation, employment, sovereignty, and education. This viewpoint paper proposes a unified interpretation: France can be analyzed as a national AI learning system. Building on Human-Centered Learning Mechanics (HCLM), we use HCLM not as a validated econometric model, but as a conceptual and diagnostic lens for interpreting national AI development as a balance between information injection, absorptive capacity, and institutional dissipation. Information injection includes compute, data, talent, research, capital, industrial deployment, and policy experimentation. Institutional dissipation refers to avoidable frictions such as administrative overload, coordination failures, energy constraints, regulatory uncertainty, talent mobility pressures, and weak industrial absorption. Regulation is not treated as mere friction: adaptive governance, trusted data spaces, and safety-oriented standards may increase long-term learning capacity by improving legitimacy, interoperability, and social trust. The central claim is not that a country follows neural-network equations, but that AI sovereignty depends on how effectively it converts distributed information into absorbed, coordinated, and socially legitimate capability. The paper connects HCLM with neural scaling laws, endogenous growth theory, creative destruction, absorptive capacity, and coordination mechanisms. It offers a formal heuristic, policy indicators, illustrative scenarios, and implications for France. The numerical results are diagnostic scenarios, not econometric estimates or official rankings. The proposed viewpoint reframes AI policy as the governance of an open, strategic, non-equilibrium learning system that should be tested with historical and cross-country data.

04.
arXiv (CS.CL) 2026-06-16

Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence

While LLMs have substantially advanced text-to-code synthesis, many real programming tasks specify intent through visual artifacts such as screenshots, charts, documents, vector drawings, videos, and interactive states. These tasks require models to connect visual perception to executable programs, because correctness depends not only on syntax but also on layout, geometry, data semantics, editability, interaction behavior, and domain-specific constraints that apply after execution. This survey examines Multimodal Code Intelligence, covering systems that generate, edit, refine, execute, or reason with code under visually grounded inputs and outputs. We first formulate the field by the role that code plays in each task, distinguishing code as a rendered artifact, an editable symbolic structure, a scientific representation, an intermediate reasoning trace, or an executable policy or tool interface. We then organize benchmarks and methods into four domains: Graphical User Interface, Scientific Visualization, Structured Graphics, and Frontier Tasks and Frameworks. This taxonomy connects mature artifact-generation problems to emerging agentic and unified settings and allows us to compare how different tasks treat evidence of correctness. Looking ahead, we argue that future research may benefit from four verification-centered directions. Multi-signal validation can combine complementary evidence of correctness, multi-state verification can test behavior across execution trajectories, cross-task transfer testing can probe reusable visual-code skills, and verifiable agent traces can reveal whether agent actions are grounded in visual evidence. Together, these directions may move multimodal code generation from single-output imitation toward evidence-grounded executable systems.

05.
arXiv (CS.CV) 2026-06-17

Similarity-based representation factorization for revealing interpretable dimensions in representational data

The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representations are often studied and compared through similarities between stimuli, current methods provide only limited access to the dimensions that shape these representations and are often limited in interpretability. To overcome these challenges, here we introduce Similarity-Based Representation Factorization (SRF), a general computational method for recovering low-dimensional, non-negative, interpretable embeddings from similarity matrices derived from measured data. Across simulations and many neural, behavioral, and computational datasets, SRF recovers interpretable dimensions from diverse forms of representational data, even for very sparsely sampled, incomplete data. The dimensions derived from these datasets match those obtained by task-specific models, predict independent behavioral properties, improve exploratory analysis, and offer higher power for confirmatory hypothesis testing than comparing similarity matrices. Together, these results establish SRF as a general-purpose method with broad applications for uncovering, understanding, and using the dimensions underlying representations.

06.
arXiv (CS.AI) 2026-06-12

Interaction-Centered Intelligence: Toward an Interaction-Based Theory of Human-AI Co-Creation

arXiv:2606.00807v2 Announce Type: replace Abstract: Traditional artificial intelligence has largely conceptualized intelligence as isolated computation occurring within bounded agents. Across classical AI, machine learning, and many generative systems, the dominant unit of analysis remains the individual model or autonomous system evaluated through outputs, benchmarks, prediction accuracy, or optimization performance. While these approaches have produced major advances, they often under-theorize the role of interaction in the emergence of intelligence, creativity, meaning, and adaptive behavior. This paper proposes interaction as the primary unit of analysis for co-creative AI and interaction-centered intelligence more broadly. Drawing from distributed cognition, embodied cognition, enaction, participatory sense-making, human-computer interaction, and computational creativity, the paper traces a historical progression toward increasingly relational accounts of intelligence. Building upon prior work in Creative Sense-Making, quantified co-creation, and co-creative systems such as the Drawing Apprentice and AI Drawing Partner, it argues that intelligence emerges through evolving interaction dynamics among agents, environments, and socio-technical systems rather than solely through internal computation. The paper introduces Interaction-Centered Intelligence as a framework for understanding human-AI co-creation, collaborative emergence, adaptive participation, and interactional dynamics. Rather than evaluating intelligence solely through generated outputs, the framework emphasizes interaction trajectories, coordination patterns, participatory engagement, adaptive regulation, and interactional drift unfolding through time. Implications for explainable co-creative AI, hybrid intelligence, enactive AI, and future human-AI systems are discussed.

07.
arXiv (quant-ph) 2026-06-19

Charge-Conjugation Violation and Population Asymmetry in Bipartite Fermionic Lattices

arXiv:2606.06138v2 Announce Type: replace-cross Abstract: Charge conjugation violation (CCV) is a central concept in particle physics and appears also for quasiparticles in quantum many-body systems, which typically relies on an embedded external symmetry breaking to the underlying system. An open question is how an intrinsic CCV mechanism could emerge and what its macroscopic consequences would be. We establish sublattice kinks in bipartite fermionic lattices as a concrete setup showing intrinsic CCV. The intrinsic CCV of the sublattice kink is based on the graph-topological nature of the underlying Hamiltonian, with no explicit symmetry breaking taking place. It leads to a population asymmetry of different configurations and imprints a hidden leaf-like structure in the eigenenergy spectrum. The population asymmetry also leads to an imbalanced sublattice-kink production triggered by the vacuum-instability in the quench dynamics. Our work demonstrates the graph topology as the microscopic origin of intrinsic CCV, with the population asymmetry as the macroscopic consequence, of which the proposed setup is highly amenable to experimental implementation via cold-atom quantum simulators.

08.
arXiv (CS.AI) 2026-06-17

Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators

arXiv:2606.17104v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accelerators claim advantages for LLM inference, yet it remains unclear under which conditions such accelerators outperform GPUs in practice. Recent inference systems decompose execution into Prefill and Decode phases, which exhibit distinct computational characteristics and latency metrics, commonly captured by time to first token (TTFT) and time per output token (TPOT). This paper presents a phase-aware evaluation of LLM inference performance across GPUs and emerging AI accelerators using a common model, Llama2-7B. By separately measuring Prefill and Decode performance, we reveal that accelerator advantages differ by phase and metric. Our results show that GPUs consistently excel in the compute-intensive Prefill phase, while GroqRack achieves significantly lower TPOT during Decode (batching not currently supported). However, GPUs regain an advantage in Decode throughput as batch size increases. These findings demonstrate that each platform exhibits distinct phase-dependent strengths. We further analyze heterogeneous Prefill/Decode disaggregation across different accelerator platforms, identifying performance gains and the workload and network conditions under which such gains are realized.

09.
Nature (Science) 2026-06-09

A unicellular relative links aggregative multicellularity to animal origins

作者:

How animals evolved complex multicellularity from their unicellular ancestors remains unanswered. Unicellular relatives of animals exhibit simple multicellularity through clonal division, formation of multinucleate coenocytes, or aggregation. 1 Therefore, animal multicellularity may have evolved from one (or a combination) of these behaviours. Aggregation has classically been dismissed as a means to complex multicellularity. 2 However, aggregation occurs in many extant animal cells and has also been recently described in three close unicellular relatives of animals (the choanoflagellates Salpingoeca rosetta and Choanoeca flexa, and the filasterean Capsaspora owczarzaki). 3-5 It is unclear whether aggregation in these species is derived or ancestral, and its relevance for animal origins remains unknown. To fill this gap, we investigated whether an additional close unicellular relative of animals can undergo aggregation. We discovered that the marine free-living bacterivorous filasterean Ministeria vibrans 6 forms homogeneous aggregates with reproducible kinetics that have long-term stability, and that improved feeding and mating may be evolutionary drivers of this aggregation. Notably, we found that homologs of many animal multicellularity genes involved in cell adhesion, signalling, and transcriptional regulation were deployed during the aggregation process, indicating that they may have been used for aggregation in the unicellular ancestors of animals before being co-opted into animal multicellular development. Thus, our results imply that aggregative multicellularity was key to the development of the multicellular animal genetic toolkit.

10.
medRxiv (Medicine) 2026-06-15

High Demand, Low Possession: Dilemmas and Strategies for Research Capability Cultivation in Clinical Medicine Postgraduates

Most previous studies have examined medical postgraduate research training from a single dimension, lacking a full-chain analysis that integrates capability demand, actual possession, obstacles, and output. Consequently, the measurement of capability gaps and the analysis of underlying training model deficiencies remain insufficient. To address this gap, we administered a self-designed multidimensional questionnaire to 86 clinical medicine postgraduates at a medical school, covering research cognition, interest, capability demand and possession, participation pathways, difficulties, and outputs. The aim was to systematically characterize the current situation, identify problems, and propose optimization strategies. Over 90% of participants expressed interest in research, yet only 1.16% self-rated as very knowledgeable. The largest demand-possess gap was for writing and publication (86.05% vs. 16.28%), followed by independent research capability (75.58% vs. 11.63%). A total of 59.30% cited lack of foundational knowledge, making experiments very difficult, as the greatest challenge, and 66.28% had no research achievements. The primary source of research topics was supervisor assignment (54.65%), with only 4.65% choosing topics independently. No statistically significant differences were found across grades or training types (P > 0.05). These findings reveal a structural high demand, low possession gap in medical postgraduate research training, with early research experience deficit and a passive research model as key constraining factors. Accordingly, an integrated bachelor-postgraduate progressive research competency training system is proposed.

11.
arXiv (CS.CL) 2026-06-16

AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels

AthDGC ("Athens-PROIEL") is an open, end-to-end workflow and dataset. It is, to the best of our knowledge, the first openly licensed dependency-parsed treebank of Greek that spans eight diachronic periods, namely Archaic, Classical, Koine, Late Antique, Byzantine, Late Byzantine, Early Modern, and Modern Greek, under a single PROIEL XML 2.0 schema, with verse-level cross-alignment of the New Testament to Latin (Vulgate), Gothic (Wulfila), Old Church Slavonic (Marianus), and Classical Armenian. AthDGC builds on the PROIEL Treebank Family (Haug and Johndal 2008; Eckhoff et al. 2018), which established the schema and the Koine-Greek reference set for the project. Annotation uses the Stanford Stanza PROIEL-trained workflow; sentence-level alignment uses LaBSE, a multilingual sentence-embedding model; word-level alignment uses multilingual-BERT attention through the AwesomeAlign procedure. The v0.4 release provides curated samples and the open-source toolkit; the full annotated corpus partitions remain under v0.5 audit on the Greek national HPC. Quantitative scale, per-witness verse counts, and per-period annotated-row counts are reported in the v0.5 release notes, after the audit pass completes. Concept DOI: 10.5281/zenodo.20439182.

12.
arXiv (CS.LG) 2026-06-16

Constraining the outputs of ReLU neural networks

arXiv:2508.03867v2 Announce Type: replace-cross Abstract: We introduce a class of algebraic varieties naturally associated with ReLU neural networks, arising from the piecewise linear structure of their outputs across activation regions in input space, and the piecewise multilinear structure in parameter space. By analyzing the rank constraints on the network outputs within each activation region, we derive polynomial equations that characterize the functions representable by the network. We further investigate conditions under which these varieties attain their expected dimension, providing insight into the expressive and structural properties of ReLU networks.

13.
arXiv (CS.CL) 2026-06-17

Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

Two recent studies (Jones et al. (2026); Zeng et al. (2026)) reach apparently contradictory conclusions about whether LVLMs can coordinate on efficient referring expressions. We control for task differences between the studies while directly comparing their prompting styles. We replicate the finding that models can coordinate efficient referring expressions when explicitly prompted to do so, suggesting that other task differences are not responsible for divergent results. However, we also find that the same models fail to infer the need for communicative efficiency from a more implicit prompt, highlighting critical differences between how humans and AI systems communicate.

14.
arXiv (CS.LG) 2026-06-16

Drivers, Receivers, and Dynamic Linkages: The Directed Structure of SDG Interdependence, 2000–2024

arXiv:2601.20875v2 Announce Type: replace-cross Abstract: Governments with limited fiscal and administrative capacity need to know which Sustainable Development Goals (SDGs) propagate progress through the goal system and how quickly. We map the directed interdependence structure of all seventeen goals using a balanced panel of 114 countries observed annually from 2000 to 2024. The goal series are persistent, trending, and cross-sectionally dependent, so we apply two estimators matched to this regime: a Dumitrescu-Hurlin panel Granger non-causality test, run on first-differenced series, to recover the directed interaction network, and panel local projections with Driscoll-Kraay standard errors to measure the dynamic magnitude of 31 theory-derived indicator linkages. Of 272 directed goal pairs, 84 linkages survive false-discovery control (40 synergies, 44 trade-offs; network density 0.31). Synergies and trade-offs occur at comparable strength, so no single goal behaves as a universal accelerator, and the goal-level hierarchy itself is fragile. Driver-receiver rankings correlate weakly across lag orders and centrality metrics, and under a country bootstrap only two roles are distinguishable from zero: peace and strong institutions as the clearest net receiver, and poverty reduction as the most probable effect-size-weighted driver. The supported linkages are dynamic, accruing over four to five years: sanitation and poverty improvements are the strongest predictors of lower child mortality, and the education-child-health association is corroborated in independent World Development Indicators data across 183 countries. These results caution against rankings-based accelerator policy and support adaptive portfolios built on supported, time-lagged linkages monitored through constituent indicators.

15.
arXiv (CS.CL) 2026-06-16

Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction

Grammatical Error Correction (GEC) involves detecting and correcting the wrong usage of grammar. While large language models (LLMs) with in-context learning (ICL) capabilities have shown significant progress on various natural language processing (NLP) tasks, their few-shot performance on GEC remains suboptimal. This is mainly due to the challenge of retrieving suitable in-context demonstrations that capture error patterns instead of semantic similarity. In this paper, we demonstrate that LLMs can inherently capture information related to grammatical errors through their internal states. From these states, we extract the Grammatical Error Representation (GER), an informative and semantically neutral encoding of grammatical errors. Our novel GER-based retrieval method significantly boosts performance in ICL settings on multilingual GEC datasets, improving the precision of correction. For high-resource languages, our results on 8B-sized open-source models match those of closed-source models such as Deepseek2.5 and GPT-4o-mini. For low-resource languages, our $F_{0.5}$ scores surpass the baseline by up to a factor of 1.20. This method provides a more precise and resource-efficient solution for multilingual GEC, offering a promising direction for interpretable GEC research.

16.
arXiv (CS.CL) 2026-06-11

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather than targeting benchmark maximization, this work proposes a principled pipeline that decomposes knowledge synthesis into three coordinated phases: retrieval, curation, and composition, each governed by explicit intermediate representations and handoff contracts. Inspired by Agentic Context Engineering (ACE), the system introduces temporal-semantic reranking, bounded contradiction reconciliation, and citation-preserving composition as core architectural primitives. Competition results show that NightFeats surpasses proprietary baselines including Claude-SonnetV2 and Nova-Pro on LLM-as-a-Judge and Human Likert evaluations, confirming that architectural transparency and verifiable evidence grounding are better aligned with human preferences than systems optimizing narrowly for automatic similarity metrics.

17.
medRxiv (Medicine) 2026-06-22

A Randomized, Controlled, Double Blind Clinical Study to Evaluate Use of Hydron Alkaline Ionised Water (HAIW) in Healthy Participants

Background and Objectives: Alkaline Ionized Water (AIW) is considered among the highest quality healthy drinking water worldwide and is widely discussed for its various health benefits. Hydron Alkaline Ionized Water (HAIW) is produced through electrolysis, resulting in a stable pH of approximately 9.5 with a negative Oxidation Reduction Potential (ORP), making it an antioxidant beverage. The objective of this study was to evaluate the safety of HAIW and its effects on digestion, sleep, energy, and overall quality of life in healthy participants compared to Packaged Drinking Water (PDW). Materials and Methods: A randomized, controlled, double blind, prospective clinical study was conducted in which a total of 24 healthy participants between the age group of 21 to 40 years were randomized in a 1:1 ratio to either HAIW Group or Packaged Drinking Water Group with equal gender distribution. Participants were hospitalized for 7 days and asked to consume at least 3 litres of the assigned water daily. Primary outcomes were safety-related laboratory parameters and adverse event monitoring. Secondary outcomes included assessment of digestion (appetite, digestion, bowel habits), urine parameters, sleep quality, freshness after waking, fatigue, energy/stamina/strength, quality of life, and global assessment Results: All 24 participants completed the study with no dropouts. Baseline demographics were comparable between the two groups. Assessment of primary safety-related laboratory parameters including Complete Blood count, liver function tests, renal function tests, blood sugar, Electrocardiogram and serum electrolytes showed non-significant change from baseline to 7 days and remained within normal limits in both groups, with non-significant difference between groups (p>0.05). HAIW showed significantly better improvement in appetite, digestion, and bowel habits from Day 2 onwards compared to Packaged drinking water. Sleep quality and freshness after waking up showed significant improvement from Day 3 and Day 2 respectively in the HAIW and PDW group, with significantly better improvement in HAIW group. Fatigue scores showed significant reduction at Day 6 and 7 in both groups with non-significant difference between groups. A total of 5 adverse events were reported (3 in HAIW, 2 in PDW), all unrelated to study products and were mild in nature. Global assessment showed excellent to good overall safety and tolerability in both groups. Conclusion: HAIW was well tolerated by all participants without any adverse effects. All laboratory safety parameters remained within normal range. HAIW demonstrated significant improvements in digestive function (appetite, digestion, bowel habits), sleep quality, and freshness after waking as compared to PDW. The study concludes that HAIW can be safely consumed. HAIW improves digestive and sleep-related functions.

18.
arXiv (CS.AI) 2026-06-15

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

arXiv:2606.13753v1 Announce Type: cross Abstract: Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, others observe grokking with no fixed norm at all. We settle this by intervening on the norm during training rather than only observing it. Under free training with weight decay, networks grok when the weight norm reaches a value Wc that varies little across seeds and learning rates (CV 1 to 2 percent) and grows with the modular base as a power law. When we instead clamp the norm to a fixed multiple rho of Wc and hold it there, the network still groks, but the delay follows T_grok proportional to exp(alpha rho). One exponent, alpha near 7.5, fits this delay across four moduli (R^2 = 0.996). Over the swept ranges the held norm moves the delay by about 19x and the learning rate by only about 2x, and holding the norm above Wc slows grokking rather than preventing it. A final LayerNorm removes the dependence by decoupling weight scale from the network function; without it the exponential law returns. This pinned-norm delay is the exponential counterpart to the logarithmic delay predicted for a freely contracting norm.

19.
arXiv (quant-ph) 2026-06-12

Exotic critical states as fractional Fermi seas in the one-dimensional Bose gas

arXiv:2602.17656v2 Announce Type: replace-cross Abstract: Critical quantum field theories occupy a central position in modern theoretical physics for their inherent universality stemming from long-range correlations. As an example, the Tomonaga-Luttinger liquid (TLL) describes a wealth of one-dimensional quantum systems at low temperatures. Its behavior is deeply rooted in the emergence of an effective Fermi sea, leading to power-law correlations and Friedel oscillations. A promising direction to realize systems exhibiting novel universal behavior beyond TLL is through the generalization of the underlying Fermi sea. In this Letter, we show that fractional Fermi seas with reduced occupancy arise in an integrable Bose gas driven out of equilibrium by cyclic changes in interactions from repulsive to attractive. The correlation functions feature signatures of criticality incompatible with a conventional TLL, suggesting a novel critical phase. Our predictions, based on Generalized Hydrodynamics, are directly relevant to cold atoms.

20.
arXiv (CS.AI) 2026-06-16

MR-GVNO: A Geometry-Aware Variational Physics-Informed Neural Operator for Mindlin-Reissner Plates on Irregular Domains

arXiv:2606.16624v1 Announce Type: new Abstract: Plate and shell structures are widely used in engineering, making rapid response prediction under varying geometries, materials, and loads highly desirable. However, conventional finite element methods require repeated modeling and solution, resulting in high computational costs. This study proposes a geometry-aware variational neural operator for Mindlin-Reissner plate problems, termed MR-GVNO. The method uses boundary point clouds to represent irregular geometries and employs separate encoders for spatially varying material fields, pressure loads, and scalar physical parameters. A cross-attention mechanism integrates these inputs with query point information to predict transverse deflections and rotations at arbitrary locations. MR-GVNO is trained without labeled solution data using a variational physics-informed loss derived from the discretized total potential energy. It directly processes irregular point clouds and allows different physical fields to be discretized independently, avoiding interpolation onto a common grid. Numerical experiments on single-hole, double-hole, and L-shaped plates demonstrate accurate response prediction under homogeneous and heterogeneous materials and uniform and random loads. The model also achieves millisecond-level full-field inference and favorable cross-geometry generalization.

21.
arXiv (CS.CL) 2026-06-16

SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG

Fixed-length chunking in Retrieval-Augmented Generation (RAG) often leads to boundary fragmentation, where critical evidence is split across segments, degrading retrieval recall. While static windowing and parent retrieval improve recall, they introduce significant token overhead. We propose SCAR (Semantic Continuity-Aware Retrieval), an adaptive retrieval policy that selectively expands neighboring chunks by weighing query-neighbor relevance against a structural continuity penalty. SCAR uses a relative expansion threshold tied to each retrieved chunk's own query-relevance, yielding an approximately scale-invariant decision rule that transfers across embedding models without recalibration. Across four diverse corpora (RFC, GDPR, a 10-K report, and a Merger agreement; N=320 queries; 160 boundary-fragmented), SCAR achieves 92.8% recall on boundary-fragmented queries with only 7.84 chunks, a 22.9% reduction compared to static windowing (10.16 chunks). Paired bootstrap tests (B=10,000) confirm the chunk reduction is highly significant (p

22.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transforms web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

23.
arXiv (CS.AI) 2026-06-19

Mitigating Legibility Tax with Decoupled Prover-Verifier Games

arXiv:2602.23248v2 Announce Type: replace Abstract: As large language models become increasingly capable, it is critical that their outputs can be easily checked by less capable systems. Prover-verifier games can be used to improve checkability of model outputs, but display a degradation in accuracy compared to a baseline trained only to maximize correctness – a phenonemon named legibility tax. We propose a solution by decoupling the correctness from the checkability condition and instead training a "translator" model that turns a fixed solver model's solution into a checkable form. This allows us to first train the solver to maximize correctness, and then train the translator to translate the solver into a checkable form while retaining the solver's answer. To accommodate this new objective of translation, we formulate a decoupled prover-verifier game (DPVG) where the equilibria correspond to faithful and checkable translators.

24.
arXiv (CS.AI) 2026-06-12

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.

25.
arXiv (CS.AI) 2026-06-17

From Noise to Order: Learning to Rank via Denoising Diffusion

arXiv:2602.11453v3 Announce Type: replace-cross Abstract: Learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. We propose an alternative denoising diffusion-based generative approach to LTR that instead models the full joint distribution over features and relevance labels. While in discriminative LTR, an over-parameterized ranking model may find different ways to fit the training data, we posit that candidate solutions that can explain the full data distribution under the generative setting maybe better at estimating relevance. Thus, we propose DiffusionRank that extends TabDiff, an existing diffusion model for tabular datasets, to create generative alternatives to classical discriminative pointwise and pairwise LTR objectives. Our work demonstrates improvements from DiffusionRank over discriminative counterparts on four standard LTR datasets and points to a rich space for future exploration to leverage ongoing advancements in deep generative models for LTR. Our code is publicly available at https://github.com/sadjadeb/DiffusionRank.