Wikipedia:Responsibly using large language models
This is an essay. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article or a Wikipedia policy, as it has not been reviewed by the community.
Per Wikipedia:Writing articles with large language models (LLMs), the use of LLMs to generate or rewrite article content is prohibited.[a] However, if used responsibly, LLMs can still serve purposes outside of content generation. This essay seeks to walk editors through their responsibilities under this content guideline and the usual core content policies.
Editors should only employ LLM assistance if they can do so competently. This essay therefore also sets out some of the relevant pitfalls to be aware of, rooted in established policies and guidelines that have consensus within the Wikipedia community. It is not intended to prescribe a course of action beyond adhering to Wikipedia's policies and guidelines.
Research
Finding sources
Large language models such as ChatGPT have seen increasing use as search engines and are able to rewrite a conversational prompt into a keyword-based web query, returning links cited to their answers.[1][2] However, empirical research has suggested that at least half of LLMs' responses are not fully supported by the online sources they cite.[3]
Therefore, to ensure that you are not adding content that contains original research, you should read these sources yourself and write the content yourself so that it reflects them as accurately as possible. Per the guideline above, you should not use the LLM-written summary cited to the source, and you may choose to ignore it completely per the Direction and self-education section below. Given that Wikipedia policy requires encyclopedic content to represent all majority and significant minority views, editors may still need to read widely around a subject to ensure that what they end up writing is neutral. An LLM may not recommend the best sources available, which can introduce bias into an article. LLMs can also be used to find sources across languages, though editors should at least be "reasonably certain" about the accuracy of any translated material. Editors should ensure that they read and verify these sources themselves, following the guideline on LLM-assisted translation if used.
Direction and self-education
Building an encyclopedia can entail having to write authoritatively about vast and complex topics. Given that Wikipedia is a project driven by volunteers who aren't necessarily experts in the subjects they write about, it may sometimes be alluring to turn to an LLM to clarify a key concept or a piece of peripheral context to help you understand and accurately paraphrase a source.
Empirical research on the use of LLMs in education has produced mixed results.[4] Some studies have found that LLMs improve students' engagement, while others highlight their potential to provide inaccurate or biased information that can mislead students or instill cognitive biases.[5] Other research has found that, even when aided by real-time links, learning through LLMs can lead an individual to develop a shallower understanding of a topic compared to traditional web search.[6] It is relevant to note that the Wikipedia community considers content generated by LLMs to be "generally unreliable".
In the course of their research, editors may turn to an LLM to help orientate them around the topic they're intending to write about, or to analyse an article for improvements and opportunities for expansion. Even if you consult the best sources available and use an LLM only as an initial guide, LLMs will still sometimes produce original syntheses or hallucinations. If these mistakes are not caught, they can significantly skew the direction of your research and undermine your understanding of a topic. Inaccurately paraphrasing a source such that it no longer reflects the meaning of the original text contravenes Wikipedia's core policy prohibiting original research. If you do not have the time to do the appropriate due diligence of checking for and correcting the mistakes LLMs often make (or you don't think white hair would suit you), then it may be best to stick to human-driven methods of research.
Source-based LLMs
Source-based LLMs such as NotebookLM allow users to upload PDFs or webpage links, with the LLM then being able to use and cite the source(s) to respond to conversational prompts.[7] Users can copy-paste the cited text and search for it in the uploaded source[b] to find, or re-find, material relevant to their query. Editors intending to use these LLMs to orientate them around a source or to find relevant information should be aware that they will still sometimes synthesise or hallucinate information,[8][c] meaning that they possess many of the same pitfalls as LLM-assisted web research. Editors should not use the LLM-written response and should ensure that they read the source(s) themselves and write any content such that it reflects them as accurately as possible.
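For editors comfortable with scripting, the copy-and-search step described above can be sketched in Python. This is only an illustration, not part of any Wikipedia guideline or of NotebookLM itself: the function names `normalise` and `quote_in_source` are hypothetical, and the sketch assumes you have already extracted the source's plain text by some other means. It checks whether a passage an LLM attributes to a source actually appears there, ignoring differences in line breaks, letter case and curly versus straight quotation marks.

```python
import re
import unicodedata

# Map curly quotation marks to straight ones so that copy-pasted
# text matches regardless of typographic style.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalise(text: str) -> str:
    """Unify Unicode forms and quote styles, collapse whitespace, fold case."""
    text = unicodedata.normalize("NFKC", text).translate(_QUOTE_MAP)
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_in_source(quote: str, source_text: str) -> bool:
    """Return True if the quoted passage occurs in the source text.

    Hypothetical helper: a True result only shows that the passage exists
    in the source, not that the LLM's summary of it is accurate.
    """
    needle = normalise(quote)
    return bool(needle) and needle in normalise(source_text)


if __name__ == "__main__":
    source = "The model was\nevaluated on  three benchmarks."
    print(quote_in_source("evaluated on three benchmarks", source))
    print(quote_in_source("evaluated on four benchmarks", source))
```

A failed match suggests the cited passage may have been altered or invented, though it can also result from extraction quirks in the source text. A successful match does not replace reading the source: it only tells you where to start reading.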
Notes
- ↑ Exempted from this are basic copy-editing and LLM-assisted translation, though editors should familiarise themselves with the relevant guidelines before attempting.
- ↑ If you're not using a PDF viewer with built-in search, you can use the following keyboard shortcuts:
- ↑ Compared to traditional LLMs like ChatGPT or Copilot, LLMs based on retrieval-augmented generation are significantly less prone to hallucinations, though they may still occur.[9]
References
- ↑ "How ChatGPT is changing the way we search". Adobe Express. 7 July 2025.
- ↑ "ChatGPT search". OpenAI. Retrieved 23 March 2026.
- ↑ Wu et al. 2025, p. 1.
- ↑ Wang & Fan 2025, pp. 1, 3.
- ↑ Wang & Fan 2025, p. 3.
- ↑ Melumad & Yun 2025, p. 1.
- Melumad, Shiri (19 November 2025). "Learning with AI falls short compared to old‑fashioned web search". The Conversation.
- ↑ Caplan, Jeremy (31 December 2025). "The complete guide to NotebookLM". Yahoo Tech.
- ↑ Reyna 2025, abstract.
- ↑ Reyna 2025, p. 2.
Bibliography
- Melumad, Shiri; Yun, Jin Ho (2025). "Experimental evidence of the effects of large language models versus web search on depth of learning". PNAS Nexus. 4 (10) pgaf316.
- Reyna, Jorge (2025). The Potential of Google NotebookLM for Teaching and Learning. eLearn 2025 Conference. Bangkok.
- Wang, Jin; Fan, Wenxiang (2025). "The effect of ChatGPT on students' learning performance, learning perception, and higher-order thinking: insights from a meta-analysis". Humanities and Social Sciences Communications. 12 621.
- Wu, Kevin; Wu, Eric; Wei, Kevin; Zhang, Angela; Casasola, Allison; Nguyen, Teresa; Riantawan, Sith; Shi, Patricia; Ho, Daniel; Zhou, James (2025). "An automated framework for assessing how well LLMs cite relevant medical references". Nature Communications. 16 3615.