Human--LLM Collaboration Is Transforming Complexity Metrics in Scientific Texts

📄 arXiv:2606.27052 · 📥 PDF · 2026-06-25 · nlin.AO

Authors: R. Alexander Bentley, Blai Vidiella, Damian J. Ruck, Senjuti Dutta, Kai Li, Sergi Valverde

🕰 Orloj analysis

Total score: 6.8

Consistency: 6.5

Quality: 6.0

AD relevance: ⭐⭐

This paper examines how large language models (LLMs) affect the emergent properties of scientific texts, based on an analysis of millions of arXiv abstracts. The authors identify a rise in an LLM-associated style index after early 2023 and observe subtle changes in complexity metrics, such as increased turnover among the most frequent words and a flattening of the relationship between LLM style and complexity.

💡 The study provides timely empirical evidence of subtle changes in the linguistic ecosystem caused by LLMs, a valuable contribution to understanding the dynamics of human-AI interaction in text.

Categories: INF-7 MET-2 EMG-1 EMG-2

✓ falsifiable, modest_claims, large_dataset, timely_and_relevant_topic

⚠ No explicit mention of code/data availability, Lack of explicit statistical rigor details (e.g., error bars, p-values) in abstract, Limitations of natural experiment not discussed

✗ Domain mismatch (not a physics paper), LLM-associated style index definition unclear in abstract

📄 Abstract

While human language has long been studied as a complex system, Large Language Models (LLMs) are rapidly becoming contributors to its dynamics. Because LLMs are trained on human language use, their effects on the broader human-AI linguistic ecosystem are likely subtle at first. As their use becomes more widespread, however, LLMs may alter emergent properties of language, particularly as models increasingly train on mixed human-LLM textual data. Here, we draw on complexity science to look for subtle LLM effects in millions of arXiv abstracts from 2010 to 2025. The year 2023, when LLMs rapidly became widely used, serves as a landmark in a natural experiment. While we find a sharp increase in a composite LLM-associated style index after early 2023, we observe only subtle changes in the exponents of Zipf's law and Heaps' law. More compelling, however, are two subtle changes in complexity metrics that emerge from 2023 onward. First, turnover among top-ranked words increases sharply. Second, the positive relationship between the LLM-associated style index and three complexity metrics--vocabulary size and the exponents of Heaps' and Zipf's laws--becomes flatter after 2022. Together, these patterns are consistent with changes in the emergent properties of scientific text in a mixed human-AI linguistic ecosystem.
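
The three kinds of metric named in the abstract (the Zipf exponent, the Heaps exponent, and turnover among top-ranked words) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the plain least-squares log-log fits, and the top-k definition of turnover are all assumptions made for illustration, and the inputs are assumed to be already-tokenized lists of words.

```python
import math
from collections import Counter


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (hypothetical helper)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_exponent(tokens):
    """Estimate alpha in f(r) ~ r**(-alpha) by regressing log frequency on log rank.

    Assumption: a simple log-log fit over all ranks; needs at least two distinct words.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    return -_slope(xs, ys)


def heaps_exponent(tokens):
    """Estimate beta in V(n) ~ n**beta from vocabulary growth along the token stream.

    Assumption: every prefix length n contributes one point to the log-log fit.
    """
    seen = set()
    xs, ys = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        xs.append(math.log(n))
        ys.append(math.log(len(seen)))
    return _slope(xs, ys)


def top_turnover(tokens_a, tokens_b, k):
    """Count words in the top-k of period B that were absent from the top-k of period A."""
    top_a = {w for w, _ in Counter(tokens_a).most_common(k)}
    top_b = {w for w, _ in Counter(tokens_b).most_common(k)}
    return len(top_b - top_a)
```

Under these assumptions, each function would be applied to the pooled tokens of one time window (for example, one year of abstracts), and `top_turnover` to two consecutive windows. Least-squares fits on log-log axes are a convenient but biased way to estimate power-law exponents, so the values are best read as rough indicators rather than precise estimates.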
