Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-19

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

arXiv:2606.20183v1

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansätze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the effective dimension $d_{\mathrm{eff}}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\mathrm{eff}}$; in an overfitting regime, contracting $d_{\mathrm{eff}}$ acts as ridge-like regularization. We analyze the mechanism through an exact decomposition of the depolarized kernel, $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\mathrm{eff}}(K_p)\to1$ as $p\to1$; a contraction result (and its boundary) for amplitude damping; a kernel-machine capacity bound; and a capacity/alignment risk decomposition. The monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family, the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision and at up to $12$ qubits, not as evidence for $d_{\mathrm{eff}}$. Amplitude damping contracts $d_{\mathrm{eff}}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the sign of the effect flips between the overfitting and underfitting regimes; and noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
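The depolarized-kernel decomposition above can be checked numerically in a few lines. The sketch makes three assumptions. The kernel is the fidelity kernel $K(x,x')=\mathrm{Tr}[\rho(x)\rho(x')]$. The noise is a global depolarizing channel $\rho \mapsto (1-p)\rho + pI/D$ on a $D$-dimensional Hilbert space. Random pure states stand in for encoded images. Under these assumptions, expanding the trace gives $(1-p)^2K + 2p(1-p)/D + p^2/D = (1-p)^2K + p(2-p)/D$. The participation ratio $(\mathrm{tr}\,K)^2/\mathrm{tr}(K^2)$ serves as a stand-in effective-dimension measure; the paper's exact definition of $d_{\mathrm{eff}}$ may differ.

```python
import random


def random_pure_state(dim, rng):
    """Density matrix |psi><psi| of a random pure state (stand-in for an encoded image)."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(a) ** 2 for a in amps) ** 0.5
    amps = [a / norm for a in amps]
    return [[amps[i] * amps[j].conjugate() for j in range(dim)] for i in range(dim)]


def depolarize(rho, p):
    """Global depolarizing channel (assumed noise model): rho -> (1 - p) * rho + p * I / D."""
    dim = len(rho)
    return [[(1 - p) * rho[i][j] + (p / dim if i == j else 0.0) for j in range(dim)]
            for i in range(dim)]


def fidelity_kernel(rho, sigma):
    """Tr(rho sigma), which is real for Hermitian inputs."""
    dim = len(rho)
    return sum(rho[i][j] * sigma[j][i] for i in range(dim) for j in range(dim)).real


def decomposition_error(states, p):
    """Max |Tr(rho_p sigma_p) - [(1-p)^2 K + p(2-p)/D]| over all pairs of states."""
    dim = len(states[0])
    worst = 0.0
    for a in states:
        for b in states:
            direct = fidelity_kernel(depolarize(a, p), depolarize(b, p))
            formula = (1 - p) ** 2 * fidelity_kernel(a, b) + p * (2 - p) / dim
            worst = max(worst, abs(direct - formula))
    return worst


def depolarized_gram(states, p):
    """Gram matrix K_p built from the closed-form decomposition."""
    dim = len(states[0])
    return [[(1 - p) ** 2 * fidelity_kernel(a, b) + p * (2 - p) / dim for b in states]
            for a in states]


def participation_ratio(K):
    """(tr K)^2 / tr(K^2): a simple effective-dimension proxy for a symmetric PSD matrix."""
    n = len(K)
    trace = sum(K[i][i] for i in range(n))
    trace_sq = sum(K[i][j] ** 2 for i in range(n) for j in range(n))  # = tr(K^2) if symmetric
    return trace ** 2 / trace_sq


rng = random.Random(0)
states = [random_pure_state(8, rng) for _ in range(6)]  # D = 8, i.e. 3 qubits
print(decomposition_error(states, p=0.3))  # at floating-point rounding level
for p in (0.0, 0.5, 0.9, 1.0):
    # At p = 1 the kernel is the rank-one matrix (1/D) 1 1^T, so the proxy equals 1.
    print(p, round(participation_ratio(depolarized_gram(states, p)), 3))
```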

02.
Nature (Science) 2026-06-10

The Amazon can be saved — with concerted action inside and outside Brazil

As deforestation in the Amazon falls, fresh evidence shows that the rainforest can withstand global warming, but only if there is a worldwide effort to stop cutting it down.

03.
arXiv (CS.AI) 2026-06-19

Analyzing the Narration Gap in LLM-Solver Loops

arXiv:2606.19588v1

Formal tools such as SAT and SMT solvers are increasingly embedded in language-model reasoning pipelines when a safety- or security-critical question can be formulated in logic. Unlike chain-of-thought reasoning, whose steps are sampled from the model distribution without formal guarantees, a solver produces a sound and independently verifiable answer. However, this soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied formalization and decision, but not narration, the step that turns a formal tool's output into the answer the user sees. To fill this narration gap, we first model the LLM-solver loop as a verified decision procedure. We then evaluate five open-source models under prompt injection and find that certificate gating makes the solver verdict sound, while an adversary can still invert a verified conclusion across phrasings and channels. We also study mitigation through a hardened prompt, which significantly reduces injection but cannot eliminate it and remains vulnerable to adaptive attacks. Combining formal analysis and empirical studies, we show that in the LLM-solver loop, robustness does not extend to the answer the user finally reads.

04.
PLOS Computational Biology 2026-06-12

A new method for augmenting short time series, with application to pain events in sickle cell disease

Authors: Kumar Utkarsh, Nirmish R. Shah, Tanvi Banerjee, Daniel M. Abrams

Researchers across different fields, including but not limited to ecology, biology, and healthcare, often face the challenge of sparse data. Such sparsity can lead to uncertainties, estimation difficulties, and potential biases in modeling. Here we introduce a novel data augmentation method that combines multiple sparse time series datasets when they share similar statistical properties, thereby improving parameter estimation and model selection reliability. We demonstrate the effectiveness of this approach through validation studies comparing Hawkes and Poisson processes, followed by application to subjective pain dynamics in patients with sickle cell disease (SCD), a condition affecting millions worldwide, particularly those of African, Mediterranean, Middle Eastern, and Indian descent.
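The validation studies compare Hawkes and Poisson processes. As background only (this is not the authors' augmentation method), the sketch computes two log-likelihoods on one observation window $[0,T]$. The first is for a homogeneous Poisson process at its maximum-likelihood rate. The second is for a Hawkes process with exponential kernel $\lambda(t)=\mu+\sum_{t_i<t}\alpha e^{-\beta(t-t_i)}$. The event times and Hawkes parameters are made up for illustration. In practice $(\mu,\alpha,\beta)$ would be fitted by maximizing the likelihood before comparing models, for example by AIC.

```python
import math


def poisson_loglik(times, T):
    """Log-likelihood of a homogeneous Poisson process at its MLE rate n / T."""
    n = len(times)
    rate = n / T
    return n * math.log(rate) - rate * T


def hawkes_loglik(times, T, mu, alpha, beta):
    """Exponential-kernel Hawkes log-likelihood on [0, T]; `times` must be sorted."""
    loglik = 0.0
    excitation = 0.0  # sum_{t_j < t_i} alpha * exp(-beta * (t_i - t_j)), updated recursively
    prev = None
    for t in times:
        if prev is not None:
            excitation = (excitation + alpha) * math.exp(-beta * (t - prev))
        loglik += math.log(mu + excitation)
        prev = t
    # Compensator: integral of lambda(t) over [0, T].
    compensator = mu * T + (alpha / beta) * sum(1 - math.exp(-beta * (T - t)) for t in times)
    return loglik - compensator


def aic(loglik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * loglik


times = [0.5, 0.7, 0.8, 3.1, 3.2, 6.0]  # hypothetical pain-event times (clustered)
T = 8.0
print("Poisson AIC:", aic(poisson_loglik(times, T), 1))
print("Hawkes AIC:", aic(hawkes_loglik(times, T, mu=0.4, alpha=0.5, beta=2.0), 3))
```

With `alpha = 0` and `mu = n / T`, the Hawkes likelihood reduces exactly to the Poisson one, which is a useful sanity check.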

05.
arXiv (CS.CL) 2026-06-16

Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns. We propose DiSan (Disentangled Sanitization), a privacy-preserving sanitization framework and a built-in component of Intern-Shannon for multi-agent collaboration. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text. Experiments show that identifier-level masking is insufficient: masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. By contrast, DiSan reduces answer-level PII exposure 20-fold while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and it lowers stylometric attribution on Enron by 73.2% under TF-IDF and 70.6% under a neural probe.

06.
arXiv (CS.LG) 2026-06-19

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

arXiv:2509.15822v3

Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) with a fixed number $K$ of communities is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature proving that non-trivial community recovery is indeed possible in the SBM above the KS threshold. Failure of low-degree polynomials (LDP) below the KS threshold was also proven, as long as $K\ll \sqrt{n}$, where $n$ is the number of nodes in the observed graph. When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence supporting their conjecture:
(1) We prove that, for any graph density, LDP fail to recover communities below the threshold postulated by Chin et al. (2025).
(2) We prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime considered by Chin et al. (2025) but also in moderately sparse regimes, by counting occurrences of specific motifs inspired by the LDP analysis.
In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the non-backtracking operator, is optimal only in the sparse regime. More complex motifs based on the blow-up of a cycle must be considered in denser regimes.
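For orientation, consider the symmetric SBM with $K$ communities, intra-community edge probability $a/n$ and inter-community edge probability $b/n$. Its classical Kesten-Stigum condition is $(a-b)^2 > K\,(a+(K-1)b)$. The helpers below evaluate only this classical condition, plus a flag for the $K \ge \sqrt{n}$ regime discussed in the abstract. The new threshold postulated by Chin et al. (2025) is not reproduced here.

```python
import math


def ks_snr(a, b, K):
    """Signal-to-noise ratio (a - b)^2 / (K * (a + (K - 1) * b)) for the symmetric SBM."""
    return (a - b) ** 2 / (K * (a + (K - 1) * b))


def above_ks(a, b, K):
    """True if the SBM with edge probabilities a/n (within) and b/n (across) is above KS."""
    return ks_snr(a, b, K) > 1


def many_communities(n, K):
    """True in the regime K >= sqrt(n), where recovery below KS can become possible."""
    return K >= math.sqrt(n)


print(above_ks(5, 1, 2), above_ks(3, 1, 2), many_communities(10_000, 100))
```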

07.
arXiv (quant-ph) 2026-06-24

Improved State Readout in NV Centers using Regression Models and Rabi Driving

arXiv:2606.23454v2

Readout of state populations in nitrogen-vacancy (NV) centers from fluorescence measurements at room temperature is routinely achieved via contrast-based calibration. The fidelities of this conventional approach are limited because it reduces the dynamical fluorescence behaviour of the NV center to a scalar value and calculates the population of each possible state independently. To address these limitations, we use regression models trained on experimental data to map fluorescence signals onto ideal simulated populations. We further enhance the informational content of the fluorescence signals by measuring during induced Rabi oscillations. Our results demonstrate that including these dynamical signals significantly reduces state-readout errors across multiple tested models. Notably, linear ridge regression performs nearly on par with a non-linear kernel-based model, showing that simple models already capture the relevant mapping between the enhanced fluorescence signals and the underlying state populations. This data-driven approach provides a robust alternative that achieves higher fidelities than conventional calibration in our setting, paving the way for high-fidelity state readout in solid-state quantum registers.
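Here is a minimal sketch of a linear ridge-regression readout of the kind described. Each row of `X` is a feature vector derived from a fluorescence trace, for example photon counts in a few time bins during Rabi driving, and each target is a state population. The data are synthetic, and the four-bin layout is an assumption. The closed form $w=(X^\top X+\lambda I)^{-1}X^\top y$ is solved with plain Gaussian elimination so the example has no dependencies. A real pipeline would use a library solver and fit one model per population, or a multi-output model.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Closed-form ridge weights w = (X^T X + lam I)^{-1} X^T y (no intercept)."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    Xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Synthetic stand-in: 4 time-bin features per trace, population linear in the bins plus noise.
rng = random.Random(1)
true_w = [0.8, -0.3, 0.5, 0.1]
X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(200)]
y = [predict(true_w, x) + rng.gauss(0, 0.01) for x in X]
w = ridge_fit(X, y, lam=1e-3)
print([round(v, 3) for v in w])
```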

08.
arXiv (CS.CL) 2026-06-19

Code-Switching Reveals Language Anchoring in Multilingual LLMs

Multilingual Large Language Models (MLLMs) are increasingly expected to handle Code-Switched (CS) inputs, yet mixing languages frequently degrades performance relative to source- or target-language monolingual counterparts. To understand this degradation, we use grammar-forced CS as a controlled diagnostic setting for locating CS representations relative to their source and target counterparts. We introduce Anchor Bias, a geometric measure that quantifies language anchoring: whether a CS hidden state lies closer to its source-language or target-language counterpart. Across diverse MLLMs, Anchor Bias reveals a consistent grammar-frame effect: source-framed CS stays source-anchored, whereas target-framed CS shifts target-ward and shows larger Question Answering (QA) degradation. Motivated by this representational pattern, we propose CANVAS (Contextual Anchor-based Neural Vector Alignment Steering), an inference-time intervention that extracts a source-side canvas from the input and softly steers target-language hidden states toward the source anchor during prefill. CANVAS consistently recovers QA F1 across MLLMs and CS conditions, showing that internal anchoring signals provide an actionable target for mitigating CS inference failures.
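The abstract does not give the formula for Anchor Bias. One hypothetical geometric version, assumed here purely for illustration, compares the cosine similarity of a code-switched hidden state to its source and target counterparts. The sketch also shows a generic soft steering step $h \leftarrow h + \alpha(a - h)$ toward an anchor $a$. That step is only in the spirit of CANVAS and is not its actual update rule.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def anchor_bias(h_cs, h_src, h_tgt):
    """Hypothetical measure: > 0 means source-anchored, < 0 means target-anchored."""
    return cosine(h_cs, h_src) - cosine(h_cs, h_tgt)


def soft_steer(h, anchor, alpha):
    """Move h a fraction alpha (0 <= alpha <= 1) of the way toward the anchor."""
    return [x + alpha * (a - x) for x, a in zip(h, anchor)]


h_src, h_tgt = [1.0, 0.0], [0.0, 1.0]
print(anchor_bias([0.9, 0.1], h_src, h_tgt))  # positive: closer to the source state
print(soft_steer([0.0, 1.0], h_src, 0.5))
```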

09.
arXiv (CS.AI) 2026-06-24

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

arXiv:2605.08245v4

Vision-Language Models (VLMs) increasingly power high-stakes applications, from medical imaging to autonomous systems, yet they routinely hallucinate, confidently describing content not present in the input. We investigate the root causes of these failures with a mechanistic analysis of decoder-based VLMs. We trace them to a geometric over-alignment: to bridge the modality gap required by attention mechanisms, decoder-based VLMs over-align visual embeddings with the text manifold, injecting a statistical linguistic bias that systematically overshadows fine-grained visual evidence. While prior work either aggressively closes this gap or suppresses hallucinations through expensive black-box decoding strategies, none addresses the underlying geometric cause. We provide the first quantitative characterization of this over-alignment, demonstrating that linguistic bias concentrates in the top principal components of a universal, dataset-agnostic text subspace. Building on this insight, we propose two complementary remedies, a training-free inference strategy and a bias-aware fine-tuning paradigm, both of which explicitly project this subspace out of visual representations. Our methods significantly reduce hallucinations on the POPE, CHAIR, and AMBER benchmarks and improve CLAIR scores on long-form captioning tasks, with the training-free variant adding no computational overhead over the base model.
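Projecting a subspace out of a representation, as both remedies do, is the operation $v \mapsto v - UU^\top v$ for an orthonormal basis $U$ of the subspace. In the sketch, the basis directions are hypothetical inputs; in the paper they would be top principal components of the text subspace. They are first orthonormalized with Gram-Schmidt, so the projection stays valid even if the inputs are not exactly orthonormal.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt; drops directions that are (numerically) linearly dependent."""
    basis = []
    for d in directions:
        w = list(d)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(v, basis):
    """Remove the component of v lying in span(basis); basis must be orthonormal."""
    out = list(v)
    for q in basis:
        c = dot(v, q)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out


basis = orthonormalize([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])  # hypothetical "text" directions
visual = [3.0, 2.0, 5.0]                                   # hypothetical visual embedding
print(project_out(visual, basis))  # only the component outside the span survives
```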

10.
arXiv (CS.AI) 2026-06-18

Synthetic Resonance: A Framework for Growth-Oriented Human-AI Relationships

arXiv:2606.18265v1

As human relationships with artificial intelligence systems become increasingly frequent and sustained, existing language and theory fail to capture the nature of these affiliations accurately. Common descriptors such as mutual understanding, connection, or friendship risk anthropomorphizing systems that lack subjective experience, while dominant frameworks tend to reduce AI to either a tool or a threat. In this paper, I introduce the concept of synthetic resonance as an integrative framework for understanding human-AI relationships. Synthetic resonance describes how relationships that humans define as meaningful can emerge between a human and an AI system without the need to attribute shared feelings or mutual awareness. I argue that synthetic resonance is best understood as a structured, dynamic pattern of interaction that can produce a sense of relationship without the presence of a second experiencing subject. By clarifying this distinction, the concept of synthetic resonance offers a more precise way of conceptualizing human-AI relationships and highlights their potential value and ethical implications. I also call for more research that tests the processes and outcomes of synthetic resonance.

11.
arXiv (math.PR) 2026-06-12

Non-commutative Law of iterated logarithm

arXiv:2509.22037v2

We prove optimal non-commutative analogues of the classical Law of the Iterated Logarithm (LIL) for both martingales and sequences of independent (non-commutative) random variables. The classical martingale version was established by Stout [Sto70b] and the independent case by Hartman and Wintner [HW41]. Our approach relies on a key exponential inequality, essentially due to Randrianantoanina [Ran24], that improves on the one from Junge and Zeng [JZ15]. It allows us to derive an optimal non-commutative Stout-type LIL just as in [Zen15]; from that martingale result, we then deduce a non-commutative Hartman-Wintner-type LIL for independent sequences of random variables.
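For reference, here is the classical Hartman-Wintner law that the paper generalizes. Let $X_1, X_2, \dots$ be i.i.d. real random variables with mean zero and finite variance $\sigma^2$. Then their partial sums satisfy the statement below. The paper's non-commutative versions replace these scalar random variables with operators, and their exact form is not reproduced here.

```latex
S_n = X_1 + \cdots + X_n, \qquad
\limsup_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = \sigma
\quad\text{and}\quad
\liminf_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = -\sigma
\quad\text{almost surely.}
```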

12.
arXiv (CS.CL) 2026-06-19

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship

Large language models (LLMs) increasingly review and revise text, including their own. A documented self-preference bias (models favoring their own generations when acting as judges) raises the question of whether models also resist valid corrections to their own writing. We test this in a setting where "valid" is decided not by another model but by a deterministic verifier: instruction-following revision on IFEval. A model writes a draft; the official IFEval checker confirms that the draft violates a constraint and that a candidate edit fixes it; the model then accepts or rejects that edit either as the genuine in-context author or as a fresh model that sees the draft neutrally. Across four mid-tier model families and 85 author-versus-fresh comparisons, we find no detectable self-preference: authors reject verified-good fixes to their own drafts at essentially the same rate as fresh models judging the same drafts (gap -5.1 pp, 95% CI [-12.9, +2.7]). A self-skepticism hint from a smaller pilot did not replicate at scale. The one robust observation is qualitative: when authors do reject a verified-good fix, 97% of their stated reasons are flaw-catching rather than preference-based, a finding about the character of rejections rather than an elevated rejection rate. Effects smaller than ~13 pp cannot be excluded at this sample size.

13.
medRxiv (Medicine) 2026-06-24

Cortisol Stress Response is Associated with Iron Status in Pregnancy

Background: Iron deficiency (ID) affects up to 40% of pregnant women in the third trimester, even in highly resourced and iron-supplemented populations, with adverse consequences for maternal health and long-term offspring development. Psychological stress may compromise iron status through hypothalamic-pituitary-adrenocortical (HPA) axis dysregulation and inflammation, but no study has directly examined cortisol in relation to iron status across human pregnancy. Objective: This longitudinal study examined associations between HPA function and maternal iron status across pregnancy and tested whether IL-6 and CRP mediated the relationship between cortisol and ferritin across gestation. Methods: One hundred sixty-eight pregnant Black women with Medicaid insurance completed up to four laboratory assessments across pregnancy. Salivary cortisol was measured before and in response to the Trier Social Stress Test, yielding basal and reactive cortisol indices. Serum ferritin, IL-6, and CRP were collected at each visit. Trimester-specific regression models examined cortisol reactivity in relation to ferritin; linear mixed-effects models with moderated mediation tested whether basal cortisol predicted ferritin via inflammation. Results: Higher cortisol reactivity was associated with lower ferritin specifically in the third trimester (standardized β = -0.197, p = .004). Higher basal cortisol predicted a steeper IL-6 rise across gestation (p = .002), and IL-6 was positively associated with ferritin (b = 0.236, p = .006), consistent with inflammatory iron sequestration. The indirect effect of basal cortisol on ferritin via IL-6 was statistically significant, and higher basal cortisol was negatively associated with cortisol reactivity in the third trimester. No pathway was observed through CRP.
Conclusion: Greater cortisol reactivity predicted lower third-trimester ferritin, a pattern that suggests cumulative iron depletion, atypically sustained HPA reactivity in late pregnancy, or both. To our knowledge, this is the first prospective study linking cortisol reactivity to iron status across human pregnancy, identifying maternal stress physiology as a novel target for understanding and addressing gestational iron deficiency.

14.
arXiv (CS.LG) 2026-06-24

A Fast and Effective Method for Euclidean Anticlustering: The Assignment-Based-Anticlustering Algorithm

arXiv:2601.06351v2

Anticlustering is an NP-hard combinatorial optimization problem that consists of partitioning a set of objects into equal-sized groups, called anticlusters, such that objects in the same anticluster are as dissimilar as possible and thereby representative of the entire set. Here we study the case where the dissimilarity metric is the squared Euclidean distance between the respective feature vectors. Applications of Euclidean anticlustering include social studies, cross-validation, creating mini-batches for stochastic gradient descent, and finding balanced K-cut partitions. In particular, machine-learning applications such as mini-batch generation involve million-scale datasets and very large values of K, making scalable anticlustering algorithms essential. We propose a new algorithm, the Assignment-Based Anticlustering (ABA) algorithm, that scales to instances with millions of objects and hundreds of thousands of anticlusters within seconds to minutes, far beyond what existing anticlustering methods can manage. Through an extensive computational study, we demonstrate that our algorithm outperforms existing anticlustering methods in both solution quality and running time. This also holds for anticlustering with categories. For the related problem of balanced K-cut partitioning, our algorithm is superior to the well-known METIS method. The code of our algorithm is available on GitHub.
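The Euclidean anticlustering objective can be written down concretely. For a group $G$ with centroid $\mu_G$, the sum of squared distances over unordered pairs satisfies $\sum_{\{i,j\}\subseteq G}\lVert x_i-x_j\rVert^2 = |G|\sum_{i\in G}\lVert x_i-\mu_G\rVert^2$. So with equal group sizes, maximizing within-group dissimilarity is equivalent to keeping every group centroid close to the overall centroid. The sketch checks that identity and builds a deliberately simple baseline partition: sort points by distance from the global centroid, then deal them out in snake order. This baseline is an illustration only; it is not the ABA algorithm.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def pairwise_objective(groups):
    """Sum over groups of squared Euclidean distances over unordered pairs (to maximize)."""
    total = 0.0
    for g in groups:
        for i in range(len(g)):
            for j in range(i + 1, len(g)):
                total += sq_dist(g[i], g[j])
    return total


def centroid_objective(groups):
    """Same quantity via sum_{i<j} ||xi - xj||^2 = |G| * sum_i ||xi - mu_G||^2."""
    total = 0.0
    for g in groups:
        mu = centroid(g)
        total += len(g) * sum(sq_dist(p, mu) for p in g)
    return total


def snake_baseline(points, k):
    """Simple baseline, NOT the ABA algorithm: sort by distance to the global centroid
    and deal points to k groups in snake order (0..k-1, then k-1..0, ...)."""
    assert len(points) % k == 0, "equal-sized groups require n divisible by k"
    mu = centroid(points)
    order = sorted(points, key=lambda p: sq_dist(p, mu), reverse=True)
    groups = [[] for _ in range(k)]
    for idx, p in enumerate(order):
        rnd, pos = divmod(idx, k)
        groups[pos if rnd % 2 == 0 else k - 1 - pos].append(p)
    return groups
```

The pairwise form costs $O(|G|^2)$ per group, while the centroid form is linear in $|G|$, which is one reason centroid-based reformulations matter at million-object scale.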

15.
Nature (Science) 2026-06-24

The mutational landscape of STING-induced immunity

Stimulator of interferon genes (STING) is an evolutionarily conserved immune signalling protein with key roles in host defence, cancer, senescence and inflammation [1–3]. Downstream of STING, type I interferon, inflammatory cytokine signalling and non-canonical autophagy are governed by a multilayered mechanism integrating ligand-induced structural transitions, protein–protein interactions and coordinated intracellular trafficking [4–13]. Despite its central role in immunity and its relevance as a therapeutic target [14], the sequence elements that govern STING (in)activation in cells remain incompletely understood. Here we developed a massively parallel assay to systematically chart the sequence-function landscape of STING. Profiling thousands of single amino-acid variants, we identified structural and functional determinants that shape the immunostimulatory capacity of STING and its ability to translate ligand recognition into distinct signalling outputs. Cryogenic electron microscopy structures of select hyperactive STING variants revealed new regulatory principles dictating the conformational transition of STING from inactive to signalling-competent states. Mutational effects are widespread across the functional landscape and can sensitize STING towards the natural ligand 2′3′-cGAMP [15–18] or decouple interferon induction from non-canonical autophagy, demonstrating a diversity of possible responses that can be accessed through single point substitutions. Finally, our data showed the clinical and evolutionary relevance of naturally occurring STING protein variants. Collectively, these findings define molecular principles that tune STING activity and chart the landscape of its functional potential across immune contexts.

16.
arXiv (CS.AI) 2026-06-24

DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects

arXiv:2606.24779v1

Birth defects are a major cause of fetal loss, neonatal morbidity and long-term disability. In the subset with suspected genetic etiologies, exome and genome sequencing have moved many cases from variant detection to post-sequencing interpretation: clinicians must rank patient-specific candidate variants under incomplete fetal or infant phenotypes and heterogeneous evidence from population genetics, variant-effect prediction, gene-disease validity, phenotype ontologies, cellular and pathway context, protein structure and clinical literature. We present DeepBD, a grounded agentic workflow for variant prioritization and diagnostic interpretation of genetic birth defects. DeepBD organizes the workflow into LLM-assisted case structuring, a pretrained evidence engine, specialist evidence modules and a grounded diagnostic review layer. The evidence engine learns patient-specific variant scores from structured rule evidence, sequence and variant-effect representations and phenotype-conditioned biological context, whereas specialist modules and the agentic layer provide tool-based refinement, candidate-pool review and diagnosis-oriented synthesis from ranked candidates. Developed using an in-house fetal and infant cohort comprising 18,622 cases, DeepBD achieved Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines evaluated on Exomiser-derived top-20 candidate variants. Ablation and overlap analyses show that rule evidence, mechanistic context, and specialist refinement provide complementary signals. These findings support a grounded agentic workflow that separates evidence integration, tool-based refinement, and LLM-assisted diagnostic review for retrospective variant prioritization in genetic birth defects.
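In a solved-case benchmark like this one, Recall@k is typically the fraction of cases whose confirmed causal variant appears among the top k ranked candidates. The sketch assumes that definition with one causal variant per case; the paper may handle multi-variant diagnoses differently. The ranks are made up, and a rank of `None` marks a causal variant missing from the candidate list.

```python
def recall_at_k(ranks, k):
    """Fraction of cases whose causal variant is ranked within the top k (ranks are 1-based)."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


ranks = [1, 1, 2, 4, None, 1, 12, 3]  # hypothetical per-case ranks of the causal variant
for k in (1, 3, 5, 10):
    print(f"Recall@{k} = {recall_at_k(ranks, k):.3f}")
```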

17.
medRxiv (Medicine) 2026-06-11

Wealth-Related Inequalities in Cesarean Section Utilization Among Facility-Based Births in Bangladesh: Evidence from Public and Private Healthcare Facilities

Background: Bangladesh has experienced a rapid increase in cesarean section (CS) utilization over the past two decades. While previous studies have documented socioeconomic disparities in CS use, evidence on how wealth-related inequalities differ between public and private healthcare facilities remains limited. This study assessed the magnitude and drivers of socioeconomic inequality in CS utilization among facility-based births in Bangladesh. Methods: We analyzed data from 3,008 facility-based births reported in the 2022 Bangladesh Demographic and Health Survey (BDHS). Survey-weighted multivariable logistic regression was used to identify factors associated with CS utilization. Wealth-related inequality was assessed using concentration curves and the Erreygers-corrected concentration index (ECCI). Regression-based decomposition of the standard concentration index was performed to quantify the contribution of socioeconomic, demographic, and healthcare-related factors to observed inequalities, overall and separately for public and private facilities. Results: Overall, 71.2% of facility-based births were delivered by CS, with substantially higher prevalence in private facilities (84.2%) than in public facilities (35.9%). Women delivering in private facilities had markedly higher odds of CS than those delivering in public facilities (adjusted odds ratio [AOR]: 9.07; 95% confidence interval [CI]: 7.17-11.47). Significant pro-rich inequality was observed overall (ECCI: 0.154; 95% CI: 0.117-0.191), with inequality substantially greater in public facilities (ECCI: 0.189; 95% CI: 0.114-0.264) than in private facilities (ECCI: 0.049; 95% CI: 0.014-0.084). Decomposition analysis showed that household wealth was the dominant contributor to inequality, particularly the richest wealth quintile, accounting for 81.5% of overall inequality, 63.8% in public facilities, and 109.7% in private facilities.
Conclusions: Wealth-related inequalities in CS utilization remain substantial in Bangladesh despite widespread use of the procedure. Although pro-rich inequality exists across both sectors, inequality is considerably greater in public facilities and is driven by different mechanisms across facility types. Policies should simultaneously improve equitable access to medically necessary CS and reduce unnecessary procedures, particularly within the private sector.
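For an outcome bounded in $[a_h, b_h]$ with mean $\mu$, the Erreygers-corrected index is $E = \frac{4\mu}{b_h - a_h}\,C$. Here the standard concentration index is $C = \frac{2}{\mu}\operatorname{cov}(h, r)$, and $r$ is each woman's fractional rank in the wealth distribution. For a binary outcome such as CS (yes/no), $b_h - a_h = 1$. The sketch is unweighted and deliberately ignores the BDHS survey weights and design, which a real analysis like the paper's must include. The toy data are hypothetical, and ties in wealth are broken by input order rather than averaged.

```python
def fractional_ranks(wealth):
    """Fractional rank (position - 0.5) / n in ascending wealth order."""
    n = len(wealth)
    order = sorted(range(n), key=lambda i: wealth[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = (position - 0.5) / n
    return ranks


def concentration_index(outcome, wealth):
    """Standard concentration index C = 2 * cov(h, r) / mean(h)."""
    n = len(outcome)
    r = fractional_ranks(wealth)
    mu = sum(outcome) / n
    r_bar = sum(r) / n
    cov = sum((h - mu) * (ri - r_bar) for h, ri in zip(outcome, r)) / n
    return 2 * cov / mu


def erreygers_index(outcome, wealth, lower=0.0, upper=1.0):
    """Erreygers correction E = 4 * mean / (upper - lower) * C; positive means pro-rich."""
    mu = sum(outcome) / len(outcome)
    return 4 * mu / (upper - lower) * concentration_index(outcome, wealth)


cs = [0, 0, 1, 1]          # hypothetical: only the two richest women had a CS
wealth = [1, 2, 3, 4]
print(erreygers_index(cs, wealth))
```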

18.
arXiv (CS.AI) 2026-06-18

Skill-Guided Continuation Distillation for GUI Agents

arXiv:2606.18890v1

Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, that is, states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills, extracted from both successful and failed rollouts, consist of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30% range to over 50%, demonstrating its effectiveness and generality.

19.
arXiv (CS.LG) 2026-06-24

Robust and Fast Training via Per-Sample Clipping

arXiv:2605.02701v2

We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex optimization problems under heavy-tailed gradient noise. Moreover, we establish high-probability convergence guarantees that match the in-expectation rates up to polylogarithmic factors in the failure probability. We complement our theoretical results with multiple numerical experiments. In particular, we demonstrate that PS-Clip-SGD outperforms both vanilla SGD with momentum and standard gradient clipping when training AlexNet on the CIFAR-100 dataset, even after accounting for the additional computational time caused by per-sample clipping. We also empirically show that, in the presence of gradient accumulation, applying clipping at the mini-batch level can improve training performance while incurring virtually no additional computational cost. This finding is particularly interesting, as it contradicts the common practice of applying clipping only after all accumulation steps have been completed.
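Standard (mini-batch) clipping and per-sample clipping are easiest to compare side by side. In the sketch, gradients are plain Python lists and `clip_level` is an illustrative threshold. The per-sample estimator clips each sample's gradient to norm at most `clip_level` before averaging, so a single heavy-tailed outlier contributes at most `clip_level / batch_size` (in norm) to the averaged gradient. This follows the estimator as the abstract describes it. The paper's step-size schedule and any momentum are omitted.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def clip(v, clip_level):
    """Scale v so that its Euclidean norm is at most clip_level."""
    n = norm(v)
    if n <= clip_level:
        return list(v)
    return [x * clip_level / n for x in v]


def mean(vectors):
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]


def batch_clipped_grad(per_sample_grads, clip_level):
    """Standard clipping: average first, then clip the mini-batch gradient."""
    return clip(mean(per_sample_grads), clip_level)


def per_sample_clipped_grad(per_sample_grads, clip_level):
    """Per-sample clipping: clip every per-sample gradient, then average."""
    return mean([clip(g, clip_level) for g in per_sample_grads])


def sgd_step(params, grad, lr):
    return [p - lr * g for p, g in zip(params, grad)]


grads = [[0.1, -0.2], [0.2, 0.1], [-0.1, 0.0], [100.0, 0.0]]  # last sample is an outlier
print(batch_clipped_grad(grads, clip_level=1.0))       # direction dominated by the outlier
print(per_sample_clipped_grad(grads, clip_level=1.0))  # outlier's influence is capped
print(sgd_step([0.0, 0.0], per_sample_clipped_grad(grads, 1.0), lr=0.1))
```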

20.
arXiv (CS.AI) 2026-06-17

Unlocking LLM Code Correction with Iterative Feedback Loops

arXiv:2606.17514v1

Large Language Models have shown remarkable capabilities in code generation. However, most existing evaluations focus only on single-attempt accuracy and overlook the iterative refinement process that is central to real-world programming. This study presents a systematic investigation of LLMs' ability to correct their own code through execution feedback. Using real-world programming problems across four models and two major programming languages, it evaluates performance within an iterative refinement framework in which LLMs receive compiler error messages and test-case feedback after each attempt. The study introduces metrics to evaluate code failures, analyzes rectification patterns, and compares the effectiveness of reasoning and non-reasoning models, offering actionable insights into both the understanding and the practical application of feedback loops in LLM-driven code generation systems. Results show that reasoning models consistently improve over iterations, substantially outperforming non-reasoning models in leveraging feedback, while syntactic and runtime errors are far more tractable than logical or algorithmic failures.
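The evaluation protocol can be sketched as a loop: generate, execute, feed errors back, retry. Everything here is hypothetical. `generate` stands in for an LLM call that receives the previous attempt's feedback. `run_tests` stands in for compiling and running the code against test cases. The attempt budget is an assumed parameter. The toy harness uses `exec` on trusted stub code only; a real pipeline would need sandboxed execution.

```python
def refine(generate, run_tests, max_attempts=5):
    """Return (solution, attempts_used) for the first passing attempt, or (None, max_attempts)."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        passed, feedback = run_tests(code)  # feedback: compiler errors or failing test cases
        if passed:
            return code, attempt
    return None, max_attempts


def toy_generate(feedback):
    """Stub 'model': off-by-one bug on the first try, corrected once feedback arrives."""
    if feedback is None:
        return "def add(a, b):\n    return a + b + 1\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run_tests(code):
    """Stub test harness (trusted code only): compile, then check a few cases."""
    namespace = {}
    try:
        exec(code, namespace)
    except SyntaxError as err:
        return False, f"SyntaxError: {err}"
    cases = [(1, 2), (0, 0), (-3, 5)]
    failures = [(a, b) for a, b in cases if namespace["add"](a, b) != a + b]
    if failures:
        return False, f"failing inputs: {failures}"
    return True, None


solution, attempts = refine(toy_generate, toy_run_tests)
print(attempts)  # the stub passes on its second attempt
```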

21.
arXiv (CS.CL) 2026-06-11

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and to prevent unintended information leakage. However, compared with English and other languages, research on sensitive personal information in Japanese has been limited. In this study, we focus on sensitive personal data defined as special care-required personal information (SCPI) under Japan's Act on the Protection of Personal Information (APPI). We construct an SCPI dataset using LLM-based annotation and train machine learning models to detect SCPI in text rapidly. The resulting SCPI classifier effectively identifies information related to SCPI. This study is the first to explore SCPI detection in Japanese text corpora, and it highlights the challenges of accurate detection.

22.
arXiv (quant-ph) 2026-06-19

Optimizing resource allocation for accuracy in noisy variational quantum algorithms

arXiv:2606.20153v1 Announce Type: new Abstract: For quantum algorithms to achieve their full potential, we need methodologies to optimize them, such as reaching a given output accuracy with minimal resource costs. Here, we develop such a methodology for a class of Noisy Intermediate-Scale Quantum (NISQ) algorithms. We leverage simulations of a Variational Quantum Eigensolver (VQE) to propose a phenomenological model of such algorithms that captures the complex relationship between algorithmic accuracy, algorithmic resource costs, and the noise that exists in realistic quantum hardware. To this end, we define the algorithmic resource cost as the total number of quantum gate operations in the algorithm; minimizing this cost typically makes the algorithm faster and more energy-efficient. We consider the subtle trade-off between quantum circuit size (small circuits are too imprecise, but large ones are too noisy) and the number of iterations of that circuit needed for the full algorithm to converge sufficiently. Using a noise-metric-resource methodology, we identify the sweet spot (of circuit size versus iterations) that minimizes the algorithmic resource cost for a desired algorithm accuracy. The methodology also gives the circuit size that maximizes algorithm accuracy for a fixed resource cost. Our methodology provides a practical guideline for near-term deployment of variational algorithms on realistic noisy hardware, including hardware that uses error mitigation.
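
The circuit-size versus iterations trade-off can be illustrated with a toy cost model. This sketch is not the paper's phenomenological model: the error-floor form `A/g + B*g` (small circuits imprecise, large circuits noisy), the `C/sqrt(k)` convergence term, and all constants are hypothetical assumptions chosen only to show how an interior sweet spot arises when total cost is gate count per circuit times iterations.

```python
import math
from typing import Optional

# Hypothetical constants for illustration only; not fitted values from the paper.
A_EXPRESSIVITY = 50.0  # error from an under-expressive (too small) circuit
B_NOISE = 1e-3         # error accumulated per gate from hardware noise
C_CONVERGE = 1.0       # scale of the assumed 1/sqrt(iterations) convergence term


def error_floor(gates: int) -> float:
    """Best error reachable with this circuit size: imprecise when small, noisy when large."""
    return A_EXPRESSIVITY / gates + B_NOISE * gates


def iterations_needed(gates: int, target: float) -> Optional[int]:
    """Smallest k with floor + C/sqrt(k) <= target, or None if the target is unreachable."""
    gap = target - error_floor(gates)
    if gap <= 0:
        return None
    return math.ceil((C_CONVERGE / gap) ** 2)


def total_gate_cost(gates: int, target: float) -> Optional[int]:
    """Total gate operations = gates per circuit x number of iterations."""
    k = iterations_needed(gates, target)
    return None if k is None else gates * k


def sweet_spot(target: float, sizes=range(10, 2001, 10)) -> tuple[int, int]:
    """Grid-search the circuit size that minimizes total gate operations for a target error."""
    costs = {g: total_gate_cost(g, target) for g in sizes}
    feasible = {g: c for g, c in costs.items() if c is not None}
    best = min(feasible, key=feasible.get)
    return best, feasible[best]


gates, cost = sweet_spot(target=0.5)
print(f"circuit size {gates}, total gate operations {cost}")
```

Under these assumed forms, sizes near either edge of the feasible range need enormous iteration counts, so the minimum lands in the interior; the dual question in the abstract (best accuracy for a fixed budget) could be answered by searching the same grid with the cost fixed instead.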

23.
arXiv (quant-ph) 2026-06-19

Charge-Conjugation Violation and Population Asymmetry in Bipartite Fermionic Lattices

arXiv:2606.06138v2 Announce Type: replace-cross Abstract: Charge conjugation violation (CCV) is a central concept in particle physics and also appears for quasiparticles in quantum many-body systems, where it typically relies on external symmetry breaking embedded in the underlying system. An open question is how an intrinsic CCV mechanism could emerge and what its macroscopic consequences would be. We establish sublattice kinks in bipartite fermionic lattices as a concrete setup exhibiting intrinsic CCV. The intrinsic CCV of the sublattice kink stems from the graph-topological nature of the underlying Hamiltonian, with no explicit symmetry breaking. It leads to a population asymmetry between different configurations and imprints a hidden leaf-like structure on the eigenenergy spectrum. The population asymmetry also leads to imbalanced sublattice-kink production triggered by vacuum instability in the quench dynamics. Our work identifies graph topology as the microscopic origin of intrinsic CCV, with population asymmetry as its macroscopic consequence, and the proposed setup is highly amenable to experimental implementation with cold-atom quantum simulators.
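
As background only (this is not the paper's sublattice-kink construction): on a bipartite lattice with nearest-neighbor hopping, the sublattice operator S (+1 on one sublattice, -1 on the other) anticommutes with the Hamiltonian, so the spectrum is symmetric under E to -E, and a graph whose sublattices differ in size by |N_A - N_B| hosts at least that many zero-energy modes. Both properties are fixed by the graph structure alone. A minimal NumPy sketch on an open chain, assuming uniform hopping t = 1 as an illustrative choice:

```python
import numpy as np


def chain_hamiltonian(n_sites: int, t: float = 1.0) -> np.ndarray:
    """Open chain with nearest-neighbor hopping -t (a simple bipartite graph)."""
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites - 1):
        h[i, i + 1] = h[i + 1, i] = -t
    return h


def sublattice_operator(n_sites: int) -> np.ndarray:
    """+1 on even sites (sublattice A), -1 on odd sites (sublattice B)."""
    return np.diag([(-1.0) ** i for i in range(n_sites)])


n = 7  # odd length: N_A = 4, N_B = 3
H = chain_hamiltonian(n)
S = sublattice_operator(n)
energies = np.linalg.eigvalsh(H)
zero_modes = int(np.sum(np.abs(energies) < 1e-9))

print(np.allclose(S @ H @ S, -H))                      # True: S anticommutes with H
print(np.allclose(np.sort(energies), np.sort(-energies)))  # True: spectrum symmetric about 0
print(zero_modes)                                      # 1 = N_A - N_B
```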

24.
arXiv (CS.CV) 2026-06-16

Implementation of Licensed Plate Detection and Noise Removal in Image Processing

A car license plate recognition system is an image processing technology used to identify vehicles by capturing their license plates. The technology is also known as automatic number-plate recognition, automatic vehicle identification, or optical character recognition for cars. In Malaysia, the number of vehicles is increasing rapidly, and the large number of vehicles on the road has created considerable demand for car license plate recognition systems. Such systems can be deployed in electronic parking payment systems, highway toll collection systems, traffic surveillance systems, and as police enforcement tools. Additionally, car license plate recognition technology has the potential to be combined with techniques from other fields, such as biology and aerospace, to solve specialized problems.

25.
arXiv (math.PR) 2026-06-15

Secondary terms for first moments of Selmer groups of twists of elliptic curves over global function fields

arXiv:2606.14274v1 Announce Type: cross Abstract: Let $E$ be a non-isotrivial elliptic curve over a global function field $\mathbb{F}_q(t)$ of characteristic coprime to $2$ and $3$. Under some explicit conditions, we determine the secondary terms for the first moments of prime Selmer groups of cyclic prime twist families of $E$ over $\mathbb{F}_q(t)$.