Wikipedia talk:Large language models/Archive 1
This is an archive of past discussions about Wikipedia:Large language models. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
I like this draft!
I like this draft! It's thoughtful and strikes a good balance. Great work. Llightex (talk) 03:39, 14 December 2022 (UTC)
Feedback / Notes
- I think this is a good starting draft. In addition to this text, the dos and don'ts should be mentioned in very concise terms, perhaps as a box of some form. I see some text at User:JPxG/LLM_demonstration that can be pulled in, imo. Ktin (talk) 03:55, 16 January 2023 (UTC)
- Also, I think we should consider adding a tag of some form to the talk page indicating that the page, or a subset of it, was created using AI-generated output from an LLM. Ktin (talk) 20:15, 16 January 2023 (UTC)
- See below. The T&C for OpenAI imply to me (IANAL) that each edit that introduces LLM content will require an edit summary saying it has done so. Mike Turnbull (talk) 11:48, 17 January 2023 (UTC)
- Yes, at the minimum we should be doing that. But, in addition, in the talk page we should be adding a template saying that a significant chunk (threshold to be decided) of the article has been generated by an AI. Today, if there are articles generated during certain edit-a-thons or through the Wikischools project we already add a template to the talk page. We should do something similar here as well. Ktin (talk) 19:39, 21 January 2023 (UTC)
- Wouldn't it be nice if we kept track of the model (version)/prompt/(seed) of the generated text as a reference? This could generate a curated set of facts. — Preceding unsigned comment added by Derpfake (talk • contribs) 21:34, 8 February 2023 (UTC)
- @Derpfake Many of these models are deliberately set up so that some prompts give different answers if used repeatedly: they have an element of randomness with respect to a given seed text. In that respect, they behave differently from most algorithms and there is no "set of facts" one could rely on. Mike Turnbull (talk) 21:56, 8 February 2023 (UTC)
- That is why this seed is part of the reference. And that is why I call it curated. Underneath there does not need to be a consistent body of knowledge, but the output itself might be worth citing (like with any human ;)) Derpfake (talk) 22:04, 8 February 2023 (UTC)
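- (A hypothetical citation format along the lines Derpfake describes might look like the sketch below; the template name "cite LLM" and its parameters are invented for illustration and do not correspond to any existing template.)
{{cite LLM <!-- hypothetical template, for illustration only -->
 |provider = OpenAI
 |model    = GPT-3
 |version  = text-davinci-003
 |prompt   = Summarize the history of the Eiffel Tower in two paragraphs
 |seed     = 42 <!-- sampling seed, assuming the interface exposes one -->
 |date     = 8 February 2023
}}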
Copyrights
The "Copyrights" section, as currently written, provides no useful, or at least no usable, information. What exactly is the copyright status of texts produced by LLMs? Do there exist sources treating this question, and if yes, is there a reasonable degree of consensus among them? Nsk92 (talk) 18:12, 16 January 2023 (UTC)
- @Nsk92, I believe they are fine to use, i.e. the person who gave the prompt holds copyright, unless of course the LLM is regurgitating material from elsewhere. — Qwerfjkltalk 18:21, 16 January 2023 (UTC)
- What is your opinion based on? I did a little bit of google searching and the question appears to be murky. E.g. this article from Forbes[1] quotes a lawyer, Margaret Esquenet, saying: "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training." Nsk92 (talk) 20:08, 16 January 2023 (UTC)
- See c:Template:PD-algorithm; that template's perspective is that it is PD in the US, where the servers are hosted. 🐶 EpicPupper (he/him | talk) 02:55, 17 January 2023 (UTC)
- The OpenAI FAQ at this URL says "OpenAI will not claim copyright over content generated by the API for you or your end users. Please see our Terms of Use for additional details." These say "As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output." The T&C's only other reference to copyright is to provide guidance for anyone who believes that their existing copyright has been infringed by the output generated.
- It is also relevant that the T&C say "You may not...(v) represent that output from the Services was human-generated when it is not;" That means, I think, that any Wikipedia editor who has used an LLM to generate content for an edit must include an edit summary saying that they have done so. That's going to stretch our WP:AGF guidance, I think. Mike Turnbull (talk) 11:43, 17 January 2023 (UTC)
- Thanks, Alalch_E. for adding this stipulation to the specific guidelines. Mike Turnbull (talk) 18:01, 18 January 2023 (UTC)
I have added a section on the relationship between LLMs and copyright, adapted from posts I've made elsewhere. I think the key points are that, while it is not the case that all LLM output is copyrighted, the potential for unfree content exists. Like with human writers, if you ask them (intentionally or unintentionally) to write copyrighted content, they are capable of doing so, and this is something to watch out for. Some more specific guidance should be given, although I am busy today and cannot be arsed to write it right now. jp×g 21:38, 22 January 2023 (UTC)
- Just noting here (maybe you didn't see): I had also added something about copyright today: Special:Diff/1135127144 —Alalch E. 21:46, 22 January 2023 (UTC)
- If the legal situation is unclear due to lack of precedent, then it is impossible to say whether LLM-derived text can be in compliance with our basic policies or not. That seems rather undesirable, to put it mildly. XOR'easter (talk) 18:19, 23 January 2023 (UTC)
Devin Stone ("LegalEagle" on YouTube) released a video on this topic, but the only clear answer is on whether or not a computer program can hold a copyright (US courts have ruled that a human must hold copyright). The answers to (a) does the output have sufficient creativity to qualify for copyright? and (b) is the output a derivative work of the training text? remain unclear until there are court cases or new laws are passed to establish guidance. (The video also discusses the legal issues of using copyrighted training text.) isaacl (talk) 17:55, 27 January 2023 (UTC)
@Nsk92: A lot of the issues were covered in the Village Pump (policy) discussion on chatbots. Here's that section again, for convenience, as the discussion will likely get archived and removed from VPP soon. As to what the copyright status of Chat-GPT output is, well, that was inconclusive. — The Transhumanist 09:23, 30 January 2023 (UTC)
- @The Transhumanist: Can you just link to it (and update the link to the VPP archive when it's moved)? 24,000 bytes of text is a gigantic amount to paste into here. jp×g 09:58, 30 January 2023 (UTC)
- No problem. See below... — The Transhumanist 10:21, 30 January 2023 (UTC)
Copyrights discussion concerning chatbot output, at Village pump (policy)
This subject has been discussed extensively at Wikipedia:Village pump (policy)#Copyright status, identifying many issues. — The Transhumanist 10:13, 30 January 2023 (UTC)
The policy draft should provide full disclosure concerning copyrights and an editor's assertion when pressing "Publish page" or "Publish changes"
The policy draft appears to advocate the addition of chatbot output to Wikipedia:
LLM output should be used only by competent editors who review pasted LLM output before pressing "publish page".
This appears to include generated text (the main use of chatbots). But, we don't know what the inherent copyrights of such output are. Nobody does, yet.
Only the owners of a creative work can place it under open license. When editors edit Wikipedia and press the "Publish page" or "Publish changes" button, they are in essence asserting that they own the copyrights to the material they have typed in or inserted (except for content that falls under fair use) and agree to place it under open license. But, in the case of chatbot output, they don't know if they own the copyrights or not, because the legal issue is up in the air. Can they legally license material that they don't even know that they own?
And since nobody knows if they own the output, should we be making a policy that directs, or appears to direct, editors to add material generated by a chatbot?
I'm for going for it, and having the policy direct exactly that with full disclosure.
The problem is that the policy doesn't make it plainly clear that the copyrights of chatbot-generated text are uncertain, and that, when you press the "Publish page" or "Publish changes" button, you are declaring that you have the right to place the material under open license.
Some editors will be comfortable with doing that, and others won't. But, they should all understand the issue before pressing "Publish page" or "Publish changes".
And that goes for the community as well. Because, the draft will eventually be going before the community for approval, and it should be very clear, with no hidden or obfuscated issues. — The Transhumanist 09:31, 30 January 2023 (UTC)
- I disagree that anything is obfuscated. The draft makes clear the following: Does LLM output inherently violate copyright law -- no (that is what is asserted, see the example with Apple and iPhone photos); Is attribution to the LLM provider needed -- unlikely; Is LLM output capable of violating copyright law -- yes. This is more clear than saying "it's uncertain". —Alalch E. 10:19, 30 January 2023 (UTC)
- @Alalch E.: If an issue isn't in there that should be, then it is a hidden issue. The policy does not yet address the ownership of chatbot output copyrights and the assertion editors make when they press the "Publish page" or "Publish changes" button. Do editors have the right to place chatbot output under open license? Because, that's what they are asserting when they press the button. It should be made clear that that isn't clear. — The Transhumanist 10:58, 30 January 2023 (UTC)
The problem seems to be that the copyright status is currently not clear. This is problematic for our policy because we can neither ban nor endorse it based on copyright grounds. One solution would be to put the responsibility on the editor, maybe by using something like the following text:
By adding the text to Wikipedia, it is published under the Creative Commons license and the GNU Free Documentation License. It is the editor's responsibility to familiarize themself both with the copyright and sharing policies of their AI-provider as well as the general legislation on the copyright status of AI-generated texts to ensure that their addition does not infringe anyone's copyrights. Any text found to be in violation of someone's copyright will be removed.
At this stage, making big contributions based on AI-generated texts is a little like playing with fire. Depending on how these issues are eventually resolved, many contributions may have to be undone later. Phlsph7 (talk) 11:11, 30 January 2023 (UTC)
- If the machine outputted a verbatim copy or a close paraphrase of a non-free text, or created a derivative work such as an abridgement, then just publishing such content infringes copyrights. If the machine outputted something which cannot infringe on anyone's copyright, then it's something that no one is currently able to assert copyright to, neither the Wikipedia "author" nor the entity which owns the model; but they can try to require attribution, which they can't enforce legally, only de facto by controlling access within their domain. But if the Wikipedia editor substantially originated the content in the sense that they had an idea about what they want to create, and assisted themselves with the machine, adjusting and reviewing the output to fit their idea, they probably own the rights. So either no one owns the rights or the editor owns the rights (of all the parts which are not a copyright violation). —Alalch E. 11:36, 30 January 2023 (UTC)
- You seem to be saying that no one owns the copyright, the editor owns the copyright, or someone else does. This statement seems uncontroversial to me. Do you think we should warn editors of the third possibility? Phlsph7 (talk) 12:33, 30 January 2023 (UTC)
- This is currently on the page: "If you want to import text that you have found elsewhere or that you have co-authored with others (including LLMs), you can only do so if it is available under terms that are compatible with the CC BY-SA license. ... Apart from the possibility that saving an LLM output may cause verbatim non-free content to be carried over to the article, these models can produce derivative works. For example, an LLM can rephrase a copyrighted text using fewer, the same, or more words than the original – editors should mind the distinction between a summary and an abridgement." I'm unable to make up my mind at the moment about whether adopting your text would make it better. —Alalch E. 12:43, 30 January 2023 (UTC)
- Thanks for quoting the passage. Unfortunately, it is not very explicit on what all of this means in terms of copyright violations, for example, in relation to derivative works or to minding "the distinction between a summary and an abridgement". In its current form, some parts of the copyright section read more like an essay than like a policy. The main point of my suggestion is relatively simple: put the responsibility on the editor and make it clear to them that this is a complex issue and there could be problems. Phlsph7 (talk) 13:41, 30 January 2023 (UTC)
- Regarding the Wikipedia community, it's the transparency of the "playing with fire" issue that concerns me. The policy draft should be clear on this risk to Wikipedia, so that when it goes before the community for their approval, they will be able to take that risk into consideration in making their decision. As Phlsph7 put it: "At this stage, making big contributions based on AI-generated texts is a little like playing with fire. Depending on how these issues are eventually resolved, many contributions may have to be undone later." — The Transhumanist 02:41, 31 January 2023 (UTC)
- It seems to me that The Transhumanist and I are on the same page that the policy should be more explicit on potential copyright problems. The current section on copyright starts with "Publishing LLM output on Wikipedia does not inherently violate copyright law". I take this to mean that: "not every single LLM output is a copyright violation". This seems correct, but the focus is clearly wrong. We should not reassure editors that some outputs do not violate copyright but warn them that some outputs may violate copyright and that it can be difficult to assess. I suggest the following as a replacement for our current copyright section:
AI-generated texts are a rapidly evolving field and it is not yet fully understood whether their copyright status is compatible with the CC BY-SA license and the GNU license used for text published on Wikipedia. Editors should use extreme caution when adding significant portions of AI-generated texts, either verbatim or user-revised. It is their responsibility to ensure that their addition does not infringe anyone's copyrights. They have to familiarize themselves both with the copyright and sharing policies of their AI-provider as well as the general legislation on the copyright status of AI-generated texts. Any addition found to be in violation of someone's copyright will be removed.
- Such a warning seems to be in tune with other points raised in the current draft, which prohibits copy-pasting LLM output directly and warns against adding copy-edited texts. Phlsph7 (talk) 07:49, 31 January 2023 (UTC)
Potential widespread copyright violations and ongoing lawsuits
- @Alalch E.: In response to this edit: I don't think it's clear at this stage that LLM outputs are public domain.
- From [2]:
There’s no issue around personal use of ChatGPT as a conversational assistant. And the rules around using ChatGPT to generate term papers seem pretty clear (don’t even think about it). But when it comes to applying AI-generated prose in content intended for wider distribution — say marketing materials, white papers, or even articles — the legalities get a little murky. When it comes to intellectual property, the model for ChatGPT “is trained on a corpus of created works and it is still unclear what the legal precedent may be for reuse of this content, if it was derived from the intellectual property of others,” according to Bern Elliot, analyst at Gartner.
- Or from [3]:
According to the ICML, the rise of publicly accessible AI language models like ChatGPT — a general purpose AI chatbot that launched on the web last November — represents an “exciting” development that nevertheless comes with “unanticipated consequences [and] unanswered questions.” The ICML says these include questions about who owns the output of such systems (they are trained on public data, which is usually collected without consent and sometimes regurgitate this information verbatim) and whether text and images generated by AI should be “considered novel or mere derivatives of existing work.”
- See also Wikipedia:Village_pump_(policy)#Copyright_status and the ongoing lawsuits mentioned here. Because of these uncertainties, I think it would be a good idea to mention that it's not yet clear what the copyright status is and whether it's compatible with Wikipedia. Phlsph7 (talk) 18:44, 1 February 2023 (UTC)
- From the same Forbes article:
As a result of the human authorship standard, "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training."
So in the absence of evidence to the contrary, it's public domain or a derivative work, or a more blatant copyright violation (something proven to be possible but not mentioned by the quoted expert). We only want edits that are not and do not contain derivative works and that don't copy verbatim from or closely paraphrase sources to the extent that it's a copyright violation. It doesn't say that publishing the output inherently violates copyright because the output belongs to, say, OpenAI. I completely agree with JPxG's detailed analysis of this, which is the exact reasoning that underpins the current wording (Special:PermanentLink/1135137436). —Alalch E. 19:28, 1 February 2023 (UTC)
- For example, you can feed a relatively short copyrighted article into ChatGPT and tell it to reword it using twice as many words while not changing the meaning. That would definitely be a derivative work, and simply posting that on Wikipedia and doing nothing else would definitely violate copyrights. —Alalch E. 19:31, 1 February 2023 (UTC)
- @Alalch E.: Have you considered the possibility that the issue of derivative works does not just mean "excessively close paraphrase" for a few individual outputs but is a really widespread issue since all the outputs are based on the training set and are therefore "derived" from it? To me it seems that's what the articles and some of the lawsuits are about. I don't think that JPxG's analysis addresses this point. But it could be compatible with JPxG's claim that not every single output is a copyright violation (for example, because not every work in the training set is under copyright). Phlsph7 (talk) 20:36, 1 February 2023 (UTC)
- @Phlsph7: I have considered this possibility, and so did JPxG, very much so, when he wrote:
"Whether artificial neural networks are capable of producing original intellectual output is less of a legal issue and more of a philosophical/anthropological one. It should be noted that human brains are themselves neural networks; much has been said, in a variety of fields, on the subject of whether humans create original works versus whether they merely juxtapose or recombine motifs and concepts that they're exposed to through participation in society. While interesting (and humbling), these discussions are unrelated to whether neural networks which have been exposed to copyrighted material in the course of their existence are capable of later creating original works under the purview of intellectual property law: they are (and if this were not the case, a large majority of creative work would be illegal -- good luck finding a band where none of the musicians have ever heard a copyrighted song before)."
A derivative work is not a "derivation of works". It's a derivative of a work. If you take ideas and information from multiple works and synthesize a new work, as long as it is not an assembly of derivative (or copied) works side by side, as in a compilation, but a relatively homogeneous amalgam, that's a new work. Otherwise nothing would be an original work. —Alalch E. 20:45, 1 February 2023 (UTC)
- @Alalch E.: Thanks for your explanation based on the distinction between synthesis and compilation and also for taking the time to look up JPxG's analogy between humans and LLMs. However, I'm not sure that your argument is successful. In the case of AI-generated images and code based on training sets, I think it's not successful. From [4]:
The artists — Sarah Andersen, Kelly McKernan, and Karla Ortiz — allege that these organizations have infringed the rights of “millions of artists” by training their AI tools on five billion images scraped from the web “without the consent of the original artists.” ... Butterick and Saveri are currently suing Microsoft, GitHub, and OpenAI in a similar case involving the AI programming model CoPilot, which is trained on lines of code collected from the web. ... Whether or not these systems infringe on copyright law is a complicated question which experts say will need to be settled in the courts.
- The general point from this and the other articles seems to be: if an AI learns from a training set containing copyrighted works then it could be the case that it violates those copyrights (even if it synthesizes them instead of reproducing them in a superficially changed form). The underlying issue seems to concern whether training AI on copyrighted works falls under fair use:
The creators of AI art tools generally argue that the training of this software on copyrighted data is covered (in the US at least) by fair use doctrine.
This issue is also discussed at [5] for Copilot, and a direct comparison is drawn to OpenAI: "Microsoft and OpenAI are far from alone in scraping copyrighted material from the web to train AI systems for profit. Many text-to-image AI, like the open-source program Stable Diffusion, were created in exactly the same way. The firms behind these programs insist that their use of this data is covered in the US by fair use doctrine. But legal experts say this is far from settled law and that litigation like Butterick's class action lawsuit could upend the tenuously defined status quo."
The fact that there are several ongoing lawsuits means that this is not some distant maybe but a real possibility. I'm sorry if my previous explanation in terms of "deriving" works was confusing. I hope this makes the issue clearer. Phlsph7 (talk) 03:58, 2 February 2023 (UTC)
Summary of who owns chatbot output copyrights
Let me see if I have this straight. When a chatbot produces output, ownership is as follows:
A) If the chatbot output includes a copy of someone else's work, the copyright of that portion belongs to that 3rd party.
B) If the chatbot output includes a derivative work of someone else's work, the copyright of that portion belongs to that 3rd party.
C) If the chatbot output or a portion thereof is not a copy or derivative work, its copyright ownership is not legally established. The possibilities include being:
- 1) Part of the public domain, as works originated by non-human entities such as animals (See Monkey selfie copyright dispute).
- 2) Owned by the chatbot owner. Publishers of software applications generally do not own the output of their products, such as word processors and spreadsheets, and the applications are treated as tools in this respect. The difference here is that chatbots are much more autonomous, and so, the issue is unclear. OpenAI has assigned all its claims to rights over its chatbot output to the user, thus removing themselves from the debate.
- 3) Owned by the user. Creative input is required to own output, but is a prompt enough input to be considered creative input? How about a series of prompts? Even this is uncertain, legally.
- 4) Owned by nobody. I don't see how that can be the case. Wouldn't that default to the public domain?
Please correct, expand upon, and clarify the above, as necessary. Thank you, — The Transhumanist 02:28, 31 January 2023 (UTC)
- Just wanted to add that for B, derivative work has a very limited definition (e.g. close paraphrasing). This wouldn't apply to most content where models "synthesize" texts and create a new one.
- As well, for C, 4 indeed doesn't exist (would be public domain). Wikimedia's current perspective is that computer-generated text is in the public domain (see Commons template). This WIPO article says that the US, Australia, and the EU consider such texts PD. Hong Kong SAR (China), India, Ireland, New Zealand, and the UK assign copyright to the programmer. The US would largely be the most relevant jurisdiction though, considering it's where the servers are hosted. If you'd like, WMF Legal could be contacted for a preliminary opinion (if not already). You've also noted the OpenAI position, which is relevant for all jurisdictions.
- Thanks for your Village Pump posts, which have produced extensive, useful community input!
- Best, EpicPupper (talk) 03:19, 31 January 2023 (UTC)
- Maybe you're seeing some semantic difference, but I wouldn't say "owned by nobody" doesn't exist; that is by definition public domain. As I mentioned earlier, there needs to be sufficient creativity in a given work in order for it to qualify for copyright. It's unknown at this point how the United States courts will interpret existing copyright law regarding when a program's output can be considered to meet the required standard. I don't think there's a need to seek legal advice at this point, because the case law or regulated standards aren't there yet. isaacl (talk) 04:25, 31 January 2023 (UTC)
- @EpicPupper: I wouldn't be so sure about neglecting B. From [6]:
As a result of the human authorship standard, "under U.S. current law, an AI-created work is likely either (1) a public domain work immediately upon creation and without a copyright owner capable of asserting rights or (2) a derivative work of the materials the AI tool was exposed to during training"
In relation to AI-generated images based on training sets containing copyrighted images, there are already lawsuits on these grounds; see [7] and [8]. See also [9] for a lawsuit against GitHub Copilot claiming they violated open-source licenses. Copilot is an LLM for creating code. Phlsph7 (talk) 07:36, 31 January 2023 (UTC)
- Thanks, it is worth noting the changing legal landscape. I'll note that your link mentions the "level of similarity between any particular work in the training set and the AI work" as a factor for consideration; presumably, then, texts very different from training data (not close paraphrasing) might not be considered derivative works, in the same sense that me reading a book and writing an article on its topic isn't one. EpicPupper (talk) 16:23, 31 January 2023 (UTC)
BERT
I removed BERT from the page with the justification that it can't generate text; this isn't technically true, as it can, it's just exceedingly difficult and produces very poor results, especially with regards to wikitext. 🐶 EpicPupper (he/him | talk) 03:11, 17 January 2023 (UTC)
Reference verifiability +
One thing that occasionally happens with human editors, and that I think LLMs are at higher risk of: including a reference for a statement that seems relevant and appropriate, but upon reading doesn't support the statement being made. These sorts of errors are quite hard to spot since they require checking the cited source and comparing it against the sentence being supported. The guidelines currently include the direction to cite sources that "directly support the material being presented", but it might be worth being more explicit by reminding the user to read all refs they're citing (roughly equivalent to if a lecturer had a student write the article and uploaded it on their behalf)? T.Shafee(Evo&Evo)talk 06:59, 17 January 2023 (UTC)
- I agree. In my experience LLMs are *really* bad at providing references, mostly just making them up or using inappropriate ones. Alaexis¿question? 08:06, 17 January 2023 (UTC)
- Yep. One more way in which the effort needed to fix LLM text so that it might be halfway acceptable here is no easier than writing the text oneself in the first place. XOR'easter (talk) 18:01, 23 January 2023 (UTC)
Editor skill level
I like the way this is coming together. The only thing that jumps out at me is the section that starts with "LLM output may only be used by editors who have a relatively high level of skill", which doesn't quite jibe with the way we do things on the "encyclopedia that anyone can edit". Yes, competence is required, but it's not the norm to restrict a task to editors of a certain skill level unless permissions are involved. It's also unclear how this would be enforced. Is there a way this could be worded as more of a suggestion that gets the point across? –dlthewave ☎ 20:26, 20 January 2023 (UTC)
- @Dlthewave: This is from more than a week ago, but yeah, your suggestion was acted upon, and the wording has since been improved. The idea was/is that "Editors should have substantial prior experience doing the same or a more advanced task without LLM assistance". Sincerely—Alalch E. 21:54, 29 January 2023 (UTC)
Feedback
I think our policy on LLMs should be consistent with use of machine translation, which also has the potential to generate huge amounts of dubious-quality text. I agree with this page and think it could become half-explanatory supplement and half-new policy/guideline. I would support a new rule requiring edit summaries to note whenever LLM output text is introduced to any page on Wikipedia. However, I think copyright is a bigger question mark and if we are to permit LLM output then we should offer guidance as to how a user is supposed to check that the text is not a copyright violation.
Note that the content translation tool requires XC rights, so it's not unprecedented that we would limit LLM use to volunteers who are somehow "experienced". — Bilorv (talk) 12:44, 21 January 2023 (UTC)
Attribution
OpenAI's Sharing & Publication Policy asks users to "Indicate that the content is AI-generated in a way no user could reasonably miss or misunderstand."
This is a good practice for all AI content and may be worth adding to our policy.
To that end, mentioning AI in the edit summary is insufficient. I would suggest a template similar to Template:Source-attribution with wording similar to OpenAI's recommendation:
"The author generated this text in part with GPT-3, OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication."
I'm not well versed in creating templates, so it would be great if someone could take this on. –dlthewave ☎ 14:00, 22 January 2023 (UTC)
- @Dlthewave: I have created Template:GPT-3. There is no inline functionality as with Template:Source-attribution because it would not suit the purpose. —Alalch E. 21:21, 22 January 2023 (UTC)
- @JPxG: What do you think about this template? —Alalch E. 21:37, 22 January 2023 (UTC)
- I think it looks good, although I am not quite sure where it would go. I remember seeing some very old template along the same lines, from many years ago, to indicate that an article used text from one of the old public-domain Britannicas. Maybe we could do something like that. As regards the actual template formatting, it would probably be better to have a generic title that took parameters for models (and version/settings information as optional parameters), like {{LLM content|GPT-J-6B|version=Romeo Alpha Heptagon|temperature=0.74|prompt="write a wikipedia article in the style of stephen glass, 4k, octane render, f5.6, iso 400, featured on artstation"}} or something. We could have different boilerplate for different models, and provide the model settings as a refnote or something. At least that is how I would do it if I weren't phoneposting. jp×g 00:08, 23 January 2023 (UTC)
- JPxG, that's exactly what I had in mind: Parameters for the AI model with the prompt in a note or reference. Should we make it a point to mention "artificial intelligence" which might be more recognizable than LLM? The template could go either at the top of the article or beginning of the section if only part of it is AI generated, I've seen it done this way with public domain US government sources and it seems to work well. –dlthewave ☎ 00:28, 23 January 2023 (UTC)
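- A minimal sketch of what the body of such a parameterized banner might look like in wikitext, assuming the existing {{Ambox}} message-box meta-template; the parameter names (the model as the first positional parameter, plus optional version and prompt) are placeholders, not a settled design:
{{Ambox
| type = notice
| text = This article incorporates text generated by {{{1|a large language model}}}{{#if:{{{version|}}}|&#32;(version {{{version|}}})}}. <!-- model name and optional version -->The generated draft was reviewed, edited, and revised by Wikipedia editors, who take responsibility for its content.{{#if:{{{prompt|}}}|&#32;The prompt used is recorded on the talk page.}} <!-- optional prompt note -->
}}
Invoked as, say, {{LLM content|GPT-3|version=text-davinci-003}}, it would render a notice along the lines of the OpenAI wording quoted above.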
- I was thinking something similar. This should go on the talk page of the article, imo. We today have templates that indicate that a page was worked on as part of an edit-a-thon or as part of the WP:WPSCHOOLS project. Ktin (talk) 00:11, 27 January 2023 (UTC)
- We wouldn't need an attribution template if we did the right thing and forbade sweeping the output of a bullshit generation machine into an encyclopedia. XOR'easter (talk) 17:27, 23 January 2023 (UTC)
- It will be harder to deal with the problem if a blanket ban is instituted; the good thing about attribution and declaring in an edit summary is that we can identify such edits. —Alalch E. 17:39, 23 January 2023 (UTC)
- The only ethical response to an edit that is attributed to an LLM is reversion. XOR'easter (talk) 17:55, 23 January 2023 (UTC)
- If an edit is marked as LLM-assisted, and we see an experienced editor making it – for example, to add a missing lead, and the edit seems entirely fine at a glance, the situation in that instance would indeed be fine. If we were to see an inexperienced editor doing the same, we would revert. —Alalch E. 19:36, 23 January 2023 (UTC)
- Sorry, but I can't agree that that "would indeed be fine". Writing a good lede takes care. Verifying that the output from Son of ELIZA was actually edited into something that suits the purpose can't be done "at a glance". (This presumes that using a text which is a derivative of the LLM output is actually legal, which has yet to be established.) And who decides when an editor is "experienced"? For that matter, why does "experience" matter here at all? NPOV violations don't become acceptable just because an editor has been around the block. Nor do original synthesis, turning biographies into attack pages... XOR'easter (talk) 20:39, 23 January 2023 (UTC)
- I think it's legal at least for OpenAI's models based on everything that was said in the ongoing discussions, and what's currently in the draft. The issue is more whether attribution is really necessary. The issue of an LLM output creating a potential copyright violation by copying something from a source too verbatim is the usual issue of copyvio, and different from whether mere use of an LLM as something someone owns is inherently problematic for copyright. In the example of generating a summary of what's already in a Wikipedia article there is no meaningful risk of copying non-free content. When I say "experienced editor", just imagine someone whom you would consider an experienced editor in a positive sense (not "experienced" at consistently making subpar edits). Such an editor would use an LLM to generate a summary of the article to speed himself up and would, of course, make the needed manual adjustments before publishing. He would be required to mark the edit as LLM-assisted nevertheless, which he would do. It would be relatively easy for others to check if the newly added lead captures the most important content points. Just sticking to this example, but the implications are broader. —Alalch E. 21:30, 23 January 2023 (UTC)
When I say "experienced editor", just imagine someone whom you would consider an experienced editor in a positive sense (not "experienced" at consistently making subpar edits). Such an editor would use an LLM to generate a summary of the article...
I cannot imagine any of the experienced editors I know using an LLM to summarize an article at all. The experienced editors in the corner of the project that I inhabit start their comments on the topic by saying things like, "Beyond the propensity of these things to just make shit up..." [10]. XOR'easter (talk) 23:29, 23 January 2023 (UTC)
- I respect your principled stance but, pragmatically speaking, we need to settle on some hypothetical justifiable application of an LLM just to give ourselves headroom not to implement a blanket ban, because a blanket ban is not compatible with editors declaring, and we want them to declare so we can locate such edits. It's inconsistent to ask them to declare if it's entirely banned. Merely declaring would not by itself mean that the edits are fine and not disruptive. Kind of similar to paid editing. —Alalch E. 00:45, 24 January 2023 (UTC)
- Taking a hard line now is the pragmatic move. If we allow editors to declare, then all the undeclared instances will slip by even more easily, because looking for a declaration will become a shortcut to evaluating acceptability, and we will be sending the signal that piping a bullshit machine into an encyclopedia is, in principle, fine, as long as you're polite about it. XOR'easter (talk) 15:40, 24 January 2023 (UTC)
@XOR'easter: I see that you are of a strong opinion on this, although I am inclined to disagree with the main thrust of it. The example you linked was a tweet where someone deliberately instructed it to give false output. I don't think this really demonstrates anything, other than "if you type words into a computer, the screen will say those words, and they do not magically become true by being on the screen". This seems like a fairly pedestrian observation which is tautologically true in almost any circumstance. I could type "2+2=5" onto this very talk page, and there it would be: a completely wrong statement. But who cares? I could also call up my friend and say "listen, Bob, I don't have time to explain: I need you to say that two plus two is five". Does this demonstrate that phones are untrustworthy? Well, sure: somebody speaking words into a phone doesn't make them true. But it is nonetheless possible for words to come out of a phone and have some utility or value. jp×g 10:12, 25 January 2023 (UTC)
- That's one example of many where the output is bad. ChatGPT has repeatedly been shown to invent fake sources, for example. It breaks the chain of attribution. A website full of LLM output would necessarily be an unreliable source; copying text from such a site into Wikipedia would be plagiarism of an unreliable source, and even "fixing" the text before insertion would be highly dubious on legal grounds. Like writing an article by closely paraphrasing an advertisement, it's poor conduct on multiple levels. If one thinks that an LLM is "just a tool", then demanding people disclose their use could arguably be an assumption of bad faith, as noted above. One could argue that it is a requirement for editors to belittle themselves. Do we require editors to disclose whether they ran a spellcheck before saving? Whether they composed text in Emacs or vi before adding it here? Whether they got the creative juices flowing with a shot of Jack Daniels? If an LLM is just a writing aid, then there's not even a slippery slope here: requiring disclosure of one is exactly on the same level as requiring disclosure of any of these other things. If, on the other hand, they are more problematic than other tools, then considering the reasons they are so, is this draft guideline adequately stringent? Regarding OpenAI products specifically, their terms and conditions require that "The published content is attributed to your name or company", which would mandate in-text attribution of Wikipedia content to Wikipedia editors, which is just not how we do things. XOR'easter (talk) 14:27, 25 January 2023 (UTC)
- @XOR'easter: Okay, I understand what you are talking about a little better now. I think I agree with most of this. Personally, I would be happy (and I suspect you might also be happy) if, regardless of whatever other policies, we had something like this: "If you use a language model to generate an entire article in one go, including the inline citations and the reference titles, and then paste it directly into a redlink, [something bad happens]". I don't know what "something bad" means in this circumstance: maybe the article is a candidate for speedy deletion, maybe it is instantly draftified and put at the back of the AFC queue, maybe the user is blocked unless they can demonstrate that the claims are true and that the references are real. I do think it's important to stop people from doing this, at least with currently available models. What do you think of it? jp×g 23:29, 25 January 2023 (UTC)
- Should be speedied as more work to verify than it's worth. Kind of a WP:TNT situation. I'd like a G category for CSD for this. — rsjaffe 🗣️ 01:05, 26 January 2023 (UTC)
- @Rsjaffe: I've added some language to this effect (i.e. raw LLM output pasted into the article box can be speedied as G3) -- let me know if it's prominent enough. I am thinking maybe it should be moved up further? jp×g 05:42, 26 January 2023 (UTC)
- XOR'easter: Someone actually experimented along the lines of my writing-the-lead thought experiment: User:DraconicDark/ChatGPT. The prompts could have been a lot better I believe. Honestly I see some potential here. —Alalch E. 01:31, 26 January 2023 (UTC)
Burden of Verification
As a new page patroller and a person who has dug up a bunch of articles using LLM generation, I have a serious issue with using LLM for article text. Since LLMs "make up stuff" frequently, each statement needs to be verified. This is a new burden for patrollers and others trying to verify things, as we tend to "test verify", checking to make sure references are reasonably connected to the article. It takes a long time to completely verify each statement.
Secondly, some of these LLM articles are using off-line references. As far as I know, every LLM is being trained on electronic sources.
Thirdly, the confabulation LLMs engage in makes it easy for an editor to believe that a statement the model makes must be true.
Therefore, my proposal is that "Every statement in LLM-generated text must be validated by footnoting to an online reference that is readily-available to reviewers. Statements not verifiable in an online reference may be deleted and cannot be re-added without a verifiable reference." — rsjaffe 🗣️ 16:59, 25 January 2023 (UTC)
- I think this is mostly covered by WP:V, although I agree that someone typing "write a Wikipedia article about ______" into a prompt window will produce output that ranges from "completely useless" to "actively harmful to the project". I will write something like this into the page. jp×g 22:48, 25 January 2023 (UTC)
A better title is needed
If this is going to become a policy or guideline, we are going to need a better title than 'large language models', which is a technical term that relatively few people will be familiar with, and is entirely non-descriptive. 'AI-generated content' would be clearer, though maybe someone can come up with something better. AndyTheGrump (talk) 23:02, 25 January 2023 (UTC)
- Agree completely. —DIYeditor (talk) 10:37, 26 January 2023 (UTC)
- @AndyTheGrump: The considerations here are partly for the sake of precision -- "AI" is an extremely nebulous buzzword that can (in both theory and practice) mean basically anything. This goes from fully unsupervised learning with neural networks to a bunch of if-then statements... I wrote an AI for a video game once in about 10 bytes of code, where it would detect your position and change the enemy's acceleration to match, and I'll be darned if it didn't look like it was being piloted by a real player.
- Along with the ambiguity of the buzzword comes a good deal of philosophical and political baggage, as well. We may recall in the news a few months ago there was a big kerfuffle about whether language models were "sentient" (whatever that means -- I don't think anybody involved with this was reading Descartes, much to their detriment, and everybody else's). I don't think we need to be taking a side on this issue, or at least not yet.
- Lastly, the term "AI" is used within the field mostly as a catch-all term and for marketing purposes, like "space-age" or "cutting-edge" or "advanced": note that OpenAI's name has "AI" in it, but the research publications refer to "generative pre-trained transformers" as a type of "language model". jp×g 12:32, 26 January 2023 (UTC)
- We need a title that describes the subject matter for non-technical Wikipedia contributors. 'Large language models' doesn't. And if people were ever to use 'bunch of if-then-statements AI' in article creation (which would seem unlikely) the proposed guideline/policy ought to cover that too. It really doesn't matter what the algorithms being used are, the objections to their use still apply. The issue is the output, not the algorithm. AndyTheGrump (talk) 12:52, 26 January 2023 (UTC)
- My suggestion for a title would be "Computer-assisted text generation" but I don't think it matters very much as we can create lots of other shortcuts to the guidance, which will be in WP space. We already have WP:AI, which is about something entirely different and WP:Artificial intelligence, for example. The latter was recently created and duplicates part of what is now being drafted. Mike Turnbull (talk) 13:01, 26 January 2023 (UTC)
- I like that and would suggest that WP:BOT and WP:CHATBOT also point to it. — rsjaffe 🗣️ 20:21, 26 January 2023 (UTC)
- But the first shortcut already covers the actual Wikipedia bots, which are helpful for the most part. WP:AIGENERATED would be a good shortcut. 2001:448A:304F:52BA:B834:10F7:8013:7F11 (talk) 00:11, 27 January 2023 (UTC)
- "Computer-assisted text generation" includes what I just typed now, considering the spell check and possible auto-complete or auto-correct at play. That said, maybe we should include that under the umbrella of this policy/guideline. —DIYeditor (talk) 22:43, 27 January 2023 (UTC)
What benefit is sanctioning any use of AI?
How does it benefit Wikipedia to permit any use of AI to generate articles even by competent editors? What purpose does LLM text serve? Is a "competent editor" not able to type out what they want to say? Does it save any time to have to verify every single word vs. to generate it from one's own mind with reference to sources? Or is this just to allow for experimentation? I'm not against it at all, I find machine learning fascinating and think it is the way of the future - I'm just not sure I see any benefit to it right now.
I do think each edit using LLM generated text should be required to carry a special tag along the lines of 2017 wikitext editor, Reply, and Source. —DIYeditor (talk) 10:35, 26 January 2023 (UTC)
- There are about a dozen demonstrations linked from the last section; personally, I don't think there is any reason to have them generate entire pages (they are unbelievably bad at this). Other things, like formatting and identifying potential areas for article improvement, worked pretty well. jp×g 12:20, 26 January 2023 (UTC)
- No benefit and plenty of harm for new text generation. May be useful for editing certain things. — rsjaffe 🗣️ 20:23, 26 January 2023 (UTC)
not entirely good
Personally, I don't agree with a blanket statement that "even experienced editors will not have entirely good results when assisting themselves with an LLM in [specific situations]". The ending clause to the sentence, "...the extra effort is not worth it compared to purely human editing", seems to agree that it is possible for the result to be acceptable, just with an undue amount of effort. Perhaps some of the instructions can be simplified to something like "treat machine-generated text as if it came from an unknowledgeable editor: every aspect must be verified and cited to sources as appropriate". On a side note, I find the use of "entirely good" to be awkward. Perhaps something like "flawless" can be used. isaacl (talk) 17:35, 27 January 2023 (UTC)
- I'd like to act on your suggestion, but I'm just not in the right state of mind right now; feel free to make these changes yourself in the meantime. —Alalch E. 00:50, 28 January 2023 (UTC)
- I'm a bit wary of making changes without knowing whether any others will object (though of course a specific edit might make it easier for them to form their opinions). If there is anyone who thinks these changes may be helpful or has concerns, can you please provide feedback? isaacl (talk) 17:27, 28 January 2023 (UTC)
- I did some rewording to avoid the awkward phrase "entirely good" and made some changes to the introduction for the section listing things for which LLMs are not a good fit. isaacl (talk) 05:13, 29 January 2023 (UTC)
- Yeah, that's better, thank you. —Alalch E. 05:58, 29 January 2023 (UTC)
Removal of suspected machine-generated text
I don't like singling out suspected machine-generated text for summary removal. Any edits suspected of being made without adequate verification are subject to removal, in accordance with the "be bold" guidance and verifiability policy. I prefer not listing out special cases as it can give the impression that non-listed scenarios are not subject to this guidance. isaacl (talk) 17:42, 27 January 2023 (UTC)
- The goal here really is to create something that should discourage large additions at once. —Alalch E. 06:11, 29 January 2023 (UTC)
- By large additions, do you mean to one article? Or many articles (which would harken to the mass creation RfC which is still in progress)? Is this something that needs to be addressed regardless of how the changes were created? isaacl (talk) 06:16, 29 January 2023 (UTC)
- This was originally written by North8000 (including the phrase "en masse"), but from the original context of the surrounding discussion, and the apparent aim, the meaning was to deal with an influx of LLM-generated material (I say "an" but it's not really speculative at this point), which very often comes in larger chunks ("en masse" doesn't depict that well); this has to do with the rationale already on the page about how it's tiresome and depressing to check these kinds of edits. If the addition is small (as in 1-3 average sentences), I believe that the claims could be easily reviewed from the standpoint of verifiability and copyedited; but if there is a lot, even with references (and LLMs can output a lot), the references, if not bogus (well, when that's detected, the whole thing obviously has to go), have probably been inserted after the fact by the human while the prose was machine-generated; this will cause certain incongruences which may be extra annoying to detect but could actually be pretty serious errors. So the idea is that edits should come in chunks at a normal pace similar to what we're used to. They must not come in very quick succession (MEATBOT is also mentioned, and is important). It isn't just about verifiability/OR; it's also about copyright, and plausibly about NPOV. —Alalch E. 06:45, 29 January 2023 (UTC)
- I think concerns about rate of submission are better dealt with by the mass creation discussion. Problems with reviewing throughput matching submission throughput exist independently of how the content was created. (Note a countervailing viewpoint during that discussion is that a reviewing backlog doesn't matter if the submitted content is of a sufficient quality.) isaacl (talk) 16:40, 29 January 2023 (UTC)
- I added a pointer to this discussion at Wikipedia talk:Arbitration Committee/Requests for comment/Article creation at scale#Machine-generated text. True enough that the RfC is about mass article creation so only covers one aspect of the concern being discussed here. All the same, I think mass changes to an article should be throttled based on volume concerns, without having to speculate on how the edit was created. In practice, large edits to articles do get reverted in accordance with the bold, revert, discuss cycle, when reviewers feel that it would be better to review the changes in smaller parts. isaacl (talk) 16:52, 29 January 2023 (UTC)
- @Isaacl: Good now? —Alalch E. 12:16, 30 January 2023 (UTC)
- Not really; I'm suggesting this should be handled more generally for any large edit that reviewers want to break down to review, without having to hypothesize on how the edit was created. isaacl (talk) 17:32, 30 January 2023 (UTC)
- If there are no sources, you are right that it doesn't matter how the edit originated. But if there are at least some sources in a large addition, wholesale reversion could be seen as unconstructive. And, with LLM-assisted edits, it should not be seen as unconstructive. When a human writes three long paragraphs of prose about a topic they're interested in, while referring to some materials, even if they are not citing everything as they should, the idea of what they are going to write is formed as they read the materials and do research (not always the case, granted). But with LLMs the generated prose comes first, and then a human may selectively add citations to make it somewhat verifiable, and it takes a lot of work to make everything fit together. Someone could doubt that this work was done, and they may revert, which could, but should not, then be undone by saying "don't revert, just make incremental improvements to what I added yourself, it's a collaborative project" etc., which could soon develop into an unpleasant dispute. Allowing for reverting more summarily is a mechanism to avoid such disputes, and to put the burden on the adder of the material to establish valid proof of work by making their own incremental edits, supported by descriptive summaries. So this is intended to clarify that WP:BURDEN falls even more strongly than usual on the editor who adds material: not just by providing (some) references, but by demonstrating that there's a human process behind the entire change. —Alalch E. 17:49, 30 January 2023 (UTC)
- I think it's a digression to worry about how the edit was generated. If the edit has characteristics of being unproductive and is larger than can be comfortably reviewed, editors today will revert it and request that it be broken down into more easily reviewed parts. There are drawbacks to this (it introduces significant inertia to pages, which some may also see as a benefit, depending on what level of quality you think the current page has), but it's a balance between the needs of those reviewing changes and not overly impeding new changes. If someone is consistently spamming Wikipedia with text which they have not personally reviewed to be compliant with policy, then our behavioural policies should be used to deal with it. It doesn't matter why they're not following policy.
- Side note: ultimately, the problem with dealing with large edits is that, in spite of Wikipedia's ideal of a crowd-sourced encyclopedia, writing large paragraphs of text in a group doesn't work very well. Crowd-sourcing is good at incrementally editing an existing base text. This parallels what happens in the real world when a group writes a document: it gets divvied up into sections for different people to write, and then the result is reviewed together. isaacl (talk) 18:28, 30 January 2023 (UTC)
- I agree more than not. How would you change the Wikipedia:Large language models#Verification section then? —Alalch E. 18:47, 30 January 2023 (UTC)
- My apologies; I failed to note that you removed the section to which I was primarily objecting, so I'm mostly OK with the changes. I think the sentences you added, while being a good suggestion, might be a little too prescriptive ("if A, then do B"). I suggest trying to prevent problems with the initial edit, perhaps with something like
Instead of making one large edit, consider breaking down your planned changes into multiple edits, and make them one at a time, leaving a period between each to allow for review.
isaacl (talk) 04:58, 31 January 2023 (UTC)
writing code
I disagree with suggesting that writing programming code is a good fit for the use of large language models. Small changes in code can produce very different results. Replicating the solution to a well-known problem can be done pretty easily with machine-generated code. But the chances of a small mistake for a new problem are quite high. isaacl (talk) 18:14, 27 January 2023 (UTC)
- I had moved this to "Things that LLMs are not a good fit for" and was reverted. Similar to the vaguely plausible AI-generated articles that we've seen, I just don't trust it not to generate code that looks good and seems to work yet has some unnoticed flaw. I think the best policy would be to prohibit use by default and allow exceptions for specific use cases where it has been vetted and shown to be consistently reliable, similar to our bot policy. –dlthewave ☎ 19:31, 27 January 2023 (UTC)
Frankly, I think "don't make disruptive changes to high-use templates without being prepared to revert" and "don't execute code if you have no idea what it does" are such fundamentally basic principles of programming that they shouldn't need to be mentioned at all, and the only reason I wrote them out was to take an abundance of caution. If someone doesn't understand these things, there is no force in the universe strong enough to protect them from breaking things (except maybe a power outage).
As an example of the tool's potential for constructive work, look at the diffs here and here, which instantly fixed a couple of rather large bugs (mind you, the feature that caused these bugs to crash the module was written and tested in a sandbox by two human programmers, me and Mr. Stradivarius). While I am not very well-versed in Lua syntax specifically, it is pretty obvious what is going on in <code>if type(self.data.authors) == "table" then return table.concat(self.data.authors, ", ")</code>. jp×g 07:24, 28 January 2023 (UTC)
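For readers less familiar with Lua, here is a minimal sketch of the kind of type guard shown in those diffs; the table and field names are hypothetical and this is not the actual module's code, just an illustration of the pattern under discussion:
<syntaxhighlight lang="lua">
-- Illustrative sketch only (hypothetical names): guard a field before
-- concatenating it. Without the type() check, table.concat would raise
-- an error whenever data.authors is nil or a plain string.
local p = {}

function p.formatAuthors(data)
	if type(data.authors) == "table" then
		return table.concat(data.authors, ", ")
	end
	return nil
end

return p
</syntaxhighlight>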
- To me the rough analogy is a grammar checker: it's good at giving suggestions about existing text, though it's not necessarily always correct. I think it overstates matters to say that writing code from scratch is a good fit in the general case. (Regarding the specific diffs: checking that data exists before trying to return it is a fairly common error that various code verifications tools can identify; no need for neural networks to flag it.) isaacl (talk) 17:19, 28 January 2023 (UTC)
- I'd be a lot more comfortable with guidance suggesting that machine-generated code can produce useful results for mechanical boilerplate tasks. Without qualifications, I think it's too strong of a recommendation to say that machine-generated code can "work great", or that it may be a good fit. The exact same concerns described under "writing articles from scratch" are concerns with writing code from scratch. isaacl (talk) 05:23, 29 January 2023 (UTC)
I propose changing the text to parallel the analogous paragraph for articles (starting with "Asking an LLM for feedback on an existing article"). I suggest something like: "Asking an LLM for feedback on a template or module: LLMs can provide guidance on standard coding patterns. Caution should be exercised, though, as subtle differences can alter the code's behaviour significantly."
isaacl (talk) 22:12, 30 January 2023
- If there are no further comments, I plan to make the change, and of course feedback can be given then. isaacl (talk) 20:51, 31 January 2023 (UTC)
Regarding this edit: I disagree with restoring the earlier text, as per my previous comments. (I don't have an issue with moving the text to another subsection.) I don't agree with a recommendation that using an LLM to write new code is warranted at present. isaacl (talk) 16:50, 1 February 2023 (UTC)
- Isaacl, I'm in agreement that we shouldn't be recommending this yet. Maybe this should be part of a larger discussion, but I think we're seeing a conflict between the aspirational (what a well-developed LLM could be used for in the future) and the practical (what current models are capable of doing well). In my opinion we should stick to the latter and mirror the bot policy in prohibiting/discouraging by default and only approving specific use cases that have been demonstrated to be reliable. –dlthewave ☎ 18:30, 1 February 2023 (UTC)
- I strenuously disagree here: I have provided specific citations to diffs where LLM output fixed bugs, and nobody has given a single instance of them creating bugs. The concern here is entirely hypothetical, and nobody has given an explanation of how the risks are in any way different from a human doing the same thing. How is it more likely to happen? How is it more damaging? "Do not put sweeping changes to widely-used software into production without testing it" is a fundamental, common-sense principle. "You would die if you drank coffee with cyanide in it" is an argument against cyanide, not against coffee. jp×g 01:18, 2 February 2023 (UTC)
- As I explained, the bug fix examples you gave can be done (and are done) with code correctness tools, including checks built into compilers for various languages. They are not special fixes that need an LLM to resolve. My proposed changes are in line with the examples you gave: using an LLM to review code. Extrapolating from a couple of examples isn't a great idea, in any case, and I think it is reasonable to warn caution. isaacl (talk) 01:38, 2 February 2023 (UTC)
- I don't think extrapolating from zero examples is a better idea, and the section I wrote consists almost entirely of repeated urging of caution, four separate times: "you should make sure you understand what it's doing before you execute it", "bugs and errors can cause unintended behavior", "Common sense is required", and "you should not put large chunks of code into production if you haven't tested them beforehand, don't understand how they work, or aren't prepared to quickly reverse your changes". Is there some other part of this you're objecting to, like "LLMs can write code that works great, often without any subsequent modification"? This is not a critical part of the section and I am fine with it being revised or removed. jp×g 02:04, 2 February 2023 (UTC)
- Yes, I've already stated my objection to this and made a proposal that removed it. My version doesn't have four warnings, just one urging caution. In the submission I made, I kept a version of your last warning. isaacl (talk) 02:29, 2 February 2023 (UTC)
- I don't want to speak for Isaac but I think the distinction here is between writing code and debugging/linting existing code. The latter has existed for a long time, the former is bad news.
- Regarding the claim that no one has given an instance of LLM output creating bugs: at least one study found that about 40% of auto-generated code from GitHub Copilot (same company as ChatGPT) contained errors and/or security vulnerabilities. Gnomingstuff (talk) 17:36, 3 February 2023 (UTC)
- While LLMs aren't yet unequivocally good at anything, coding is probably what they're best at. So I see no issue with the current phrasing.
- Further, I disagree that LLMs are good at "providing guidance on standard coding patterns" (very imprecise). They're (very roughly) good at outputting ready-to-run code if you input some desired specs, or input buggy code and ask it to fix it. The current phrasing reflects that well. Whether this is risky or not is a separate issue. But I strongly doubt that they're any good at explaining why their code is better, or why some code is bad, or giving any guidance. DFlhb (talk) 08:42, 3 February 2023 (UTC)
- I didn't mean that LLMs could review and explain code, but that you could give it code and ask it to resolve any issues, as would be done in a human code review. I disagree that coding is what LLMs are best at; I think they are better at generating human language text where there is a greater scope for the intended results, and thus more flexibility in following the language correlations wherever they may lead. I propose that the text is changed to the following:
Asking an LLM for feedback on a template or module: LLMs can be given existing code and review it for inconsistencies with standard coding patterns. Caution should be exercised, though, as subtle differences can alter the code's behaviour significantly. You should not put code into production if you haven't tested it beforehand, don't understand how it works, or aren't prepared to quickly reverse your changes.
isaacl (talk) 15:22, 3 February 2023 (UTC)
- @Isaacl, I managed to get ChatGPT to add a feature to a module; see Special:Diff/1138239911. My prompt was "I want to rewrite it so it can check multiple templates, and return the first one's parameter value." (I had input the code in an earlier prompt). This change seems to have no problems; I haven't tested it fully. — Qwerfjkltalk 18:11, 8 February 2023 (UTC)
- The introduction of a loop to loop over multiple templates is a pretty standard coding pattern. What does interest me is the elimination of the <code>allcases()</code> function. Technically, that is a change in behaviour from the original code. However, if I'm understanding correctly, the implementation of the function is too broad in the current code: only the case of the first character of the template name ought to be ignored, while parameter names are case-sensitive. I wonder what prompted the program to eliminate the function? It is an example of how domain-specific knowledge is needed to code this correctly, and how the output must be reviewed carefully. I'm not sure if this specific example is a net reduction in effort for a coder. Plus there are additional tidy-up steps in this case that add more work: when adding functionality, it's often desirable to avoid certain types of cosmetic changes to make the change easier to review. (Alternatively, the cosmetic changes could be isolated into a separate edit.) Changing single quotes to double quotes is unnecessary and would ideally be reverted or isolated. Also, the replacement of method calls on the string objects is unnecessary, and I think should be reverted. isaacl (talk) 17:36, 10 February 2023 (UTC)
- I just realized the sandbox and the module are not synced, and so the cosmetic differences and the removal of the <code>allcases()</code> function are due to that. Here is a clean diff between the production module and the sandbox. Deleted discussion of a change that was due to the request made to the code generator. isaacl (talk) 18:46, 11 February 2023 (UTC)
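To make the pattern under discussion concrete, here is a rough, self-contained Lua sketch of "check several candidate templates and return the first parameter value found"; the function and argument names are hypothetical, and this is not the production module's code:
<syntaxhighlight lang="lua">
-- Illustrative sketch only (hypothetical names): loop over several
-- candidate templates and return the first value found for a parameter.
local p = {}

-- templates: an array of tables, each mapping parameter names to values
-- paramName: the parameter to look up
-- Returns the first non-empty value found, or nil if no template has one.
function p.firstParamValue(templates, paramName)
	for _, args in ipairs(templates) do
		local value = args[paramName]
		if value ~= nil and value ~= "" then
			return value
		end
	end
	return nil
end

return p
</syntaxhighlight>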
Related discussions
- Wikipedia:Administrators' noticeboard/Incidents#Artificial-Info22 using AI to produce articles
- Wikipedia talk:Criteria for speedy deletion#New "G" variety for articles totally consisting of LLM text
- Wikipedia:Miscellany for deletion/Draft:Social Security in the United States of America (USA)
And a new maintenance tag / article cleanup template:
- Template:AI generated
- Template talk:AI generated - some discussion here about whether to encourage deletion via the tag's text or not
Templates for AI-generated articles
{{AI generated}} for passing editors to put on LLM-generated articles and {{AI generated notification}} for creators to put on the talk page of LLM-generated articles. The latter needs much more work. — Trey Maturin™ 20:58, 27 January 2023 (UTC)
- See the discussion above regarding naming this proposed policy/guideline. We should not be using technical jargon in templates. 'Large language model' means precisely nothing to most people. Explain what the problem is, in words that people understand. AndyTheGrump (talk) 21:08, 27 January 2023 (UTC)
- {{sofixit}} — Trey Maturin™ 21:31, 27 January 2023 (UTC)
- The problem goes deeper than just the template and the title. The entire proposal is full of the same jargon. It is unnecessary and counterproductive. AndyTheGrump (talk) 21:52, 27 January 2023 (UTC)
- {{sofixit}} — Trey Maturin™ 22:00, 27 January 2023 (UTC)
- Are you suggesting I take it upon myself to move the article to an alternate title and rewrite it substantially without prior agreement, while a significant number of other contributors are working on it? AndyTheGrump (talk) 23:17, 27 January 2023 (UTC)
- It would seem an option more likely to achieve anything at all compared to carping and expecting others to make major but nebulous changes for you, so… yeah, kinda. — Trey Maturin™ 23:57, 27 January 2023 (UTC)
- As someone who is working on it, I'm not a priori opposed to bold renaming and changes. While I like the current name because it is correct, I'm interested in discovering a name that on aggregate provides an increase in everyone's satisfaction, but I'm unsure about what the options are. Several people have said that "AI" is a bit of a buzzword; this is not such a simple topic and terminology that's a bit more exact may be good, dunno. —Alalch E. 00:17, 28 January 2023 (UTC)
- I added an example for LLM ("ChatGPT"). Should we also include "AIs" as an example? — rsjaffe 🗣️ 22:06, 27 January 2023 (UTC)
- I'm also suggesting a different icon that's more "chatty". See and comment at Template_talk:AI_generated#Different_Icon?. — rsjaffe 🗣️ 22:37, 27 January 2023 (UTC)
Wording
"The creator has words like "AI", "neural", "deep" and similar in their username" why "Deep"? Deep has many meanings outside of AI so I'm a bit confused as to why this might be a red flag to being AI generated. In fact, Deepfriedokra has "deep" in their username. Obviously I know this isn't "if their username contains X they are guaranteed to be using this" but I'm confused as to why "Deep" is mentioned here. ― Blaze WolfTalkBlaze Wolf#6545 21:07, 27 January 2023 (UTC)
- Because of "deep learning". But, you're probably right that it's too frequent so I'll remove it. —Alalch E. 21:39, 27 January 2023 (UTC)
Requested move 28 January 2023
- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
It was proposed in this section that Wikipedia:Large language models be renamed and moved to Wikipedia:AI-generated textual content.
Wikipedia:Large language models → Wikipedia:AI-generated textual content – 'Large language models' is jargon that relatively few contributors will be familiar with. Policies and guidelines need simple descriptive names, and the exact algorithm used to create the material isn't relevant. AndyTheGrump (talk) 00:18, 28 January 2023 (UTC)
Survey
- Comment as proposer, per above. We also need to reduce unnecessary jargon from the draft body - the issue is the output not the algorithm, and we are doing ourselves no favours by implying otherwise. AndyTheGrump (talk) 00:18, 28 January 2023 (UTC)
- Meh I agree that LLM is over the heads of many people. However, AI is not a great term either, even though it's frequently used. "AI" is overselling what these things typically are. I'm not sure what the best title is. In any event, we'll need redirects from related concepts. — rsjaffe 🗣️ 00:22, 28 January 2023 (UTC)
- "overselling" is a bold term; large language models are indeed artificial intelligence models... EpicPupper (talk) 03:20, 31 January 2023 (UTC)
- Comment I thought about doing this but as an RFC with the alternate Wikipedia:Computer-assisted text generation proposed by Michael_D._Turnbull as an option alongside the existing title. If you would be amenable, would you consider an RFC with multiple choices and closing this Requested move? I probably support either your title or Michael's but I'm not sure which is better at this point. I think either probably should include a discussion of autocorrection and autocomplete. —DIYeditor (talk) 00:23, 28 January 2023 (UTC)
- AndyTheGrump has pointed out that what I suggested is not an accepted use of RFCs. Even so, I think we need to attract a wider audience for input not just in this requested move, but for this emerging guideline on the whole, and I really hope we can get some experts in the relevant field(s). We have very few people providing input in this discussion right now and I think it is very important for Wikipedia to get it right.
- I'll repeat that I think this requested move was premature and will take considerably more discussion, rather than being offered a single choice of a new title. —DIYeditor (talk) 23:00, 30 January 2023 (UTC)
- Also Procedural comment the move has to be to Wikipedia:AI-generated textual content not AI-generated textual content. Again asking for a close of this and change to RFC. —DIYeditor (talk) 00:27, 28 January 2023 (UTC)
- Ok, I think everyone can see that 'Wikipedia:..' was intended. As for an RfC, see WP:RFCNOT, which explicitly states that they aren't used for page moves. AndyTheGrump (talk) 02:01, 28 January 2023 (UTC)
- Didn't know that about RFCs. So the closer of this should not use any automated closing tool (are they used?) and should assume what I agree is obvious, that the title given is not the intended target? If so can we just change the text here, if it makes no difference otherwise? —DIYeditor (talk) 02:10, 28 January 2023 (UTC)
- Ok, I think everyone can see that 'Wikipedia:..' was intended. As for an RfC, see WP:RFCNOT, which explicitly states that they aren't used for page moves. AndyTheGrump (talk) 02:01, 28 January 2023 (UTC)
- Oppose: While I think the current title is fine, I would not be opposed to moving it, if everyone thinks that it should be located somewhere else. However, I am opposed to using "AI", per what I said above: it is a buzzword of unclear meaning. I think it is important for something like the title of a policy to be accurate. Something like "machine-generated" or "computer-generated" may be more appropriate, but this raises other problems: this page doesn't just cover generation of text, but also modification and copyediting. Moreover, there is a lot of "machine-generated text" which poses zero problem (I've written more than one computer program to do stuff like convert a text file into a wikitable). Here, we are specifically talking about a particular type of neural network (and not "AI" in the broader sense). jp×g 00:29, 28 January 2023 (UTC)
- "Generation" arguably includes modification and copyediting doesn't it? —DIYeditor (talk) 00:32, 28 January 2023 (UTC)
- Also, OCR or anything else like that (any automated task) potentially blends in with the same concepts. OCR software will increasingly use language models. I see this policy/guideline as an umbrella that would include OCR, LLM/ChatGPT as such, autocorrection, autocomplete, machine translation, etc., and be as forward-looking and inclusive as possible. —DIYeditor (talk) 00:42, 28 January 2023 (UTC)
- How about just going with "AI Generated Content"? Drop the “textual”. Ktin (talk) 01:50, 28 January 2023 (UTC)
- There are other modalities beyond text. I think a policy that covered all of the text models, plus all of the image and audio models (DALL-E, Midjourney, Stable Diffusion, Craiyon, Riffusion, etc) would be unmanageably enormous. jp×g 03:09, 28 January 2023 (UTC)
- In my opinion I think we should outright ban any use of AI generated visual or audio content unless the AI content in question is a subtopic of the article. So we should be able to use such content in the case that it's needed as an example of content produced by those models, or if a specific piece of AI content is a subject of controversy or something, but it should be disallowed for any user to, for example, ask such an AI to produce "a photograph of George Washington" and place such an image on the George Washington article. silvia (BlankpopsiclesilviaASHs4) (inquire within) 12:30, 28 January 2023 (UTC)
- @BlankpopsiclesilviaASHs4: I don't think this is appropriate or even feasible. Any current or future photo editing software may incorporate "AI". In fact, I think you will find current high-end smartphone cameras incorporate "AI" image processing - this is the prime use of "AI cores" and such on phones. What you are asking is impossible to enforce without subjective and arbitrary determinations that distinguish between fundamentally identical software. All these recently added photos of things on Wikipedia are actually not straight photographs but an AI's idea of what the photograph was supposed to show. Nothing is completely generated by AI, so where do we draw the line? We need to come up with best practices and standards that can be applied now and going forward without a blanket ban. —DIYeditor (talk) 22:55, 30 January 2023 (UTC)
- I don't mean a normal photo taken by an iPhone that automatically applied white balance or filters to make the photo look ostensibly better. I mean that we should not allow someone to just ask a computer to make up a photo and then submit that to Wikipedia as representative of the subject. This is why I chose George Washington as an example, it should be obvious that no photos exist of him since cameras didn't exist then, and if someone uploaded a fabricated photo made with an AI model, that would be an easy target for deletion. Although I acknowledge that not all examples would be so obvious as that, I still believe that any AI generated image that was created by an AI with no human involvement should not be included, and deleted if any such image is identified, just the same as we seem to have agreed that people aren't allowed to blindly paste a text output. This does not mean deleting photos that have been slightly retouched by a CPU, just the same as a piece of text with typos automatically corrected by a word processor is not the same as an AI making things up.
- And sure, such a policy might be sorta unenforceable, technically, cause someone might just upload a picture and not tell anyone that an AI model made it whole cloth without a human pointing a lens at a subject, but that doesn't mean we shouldn't still have the policy. You might as well say that the Deletion policy is unenforceable cause we haven't identified and deleted every single article that shouldn't be included, or that the hoax policy is unenforceable because we haven't made 100 percent certain that every bit of untrue information has been scrubbed from the encyclopedia. silvia (BlankpopsiclesilviaASHs4) (inquire within) 22:40, 4 February 2023 (UTC)
- What percentage "made by AI" is acceptable then? —DIYeditor (talk) 22:43, 4 February 2023 (UTC)
- I think this is a really semantic argument, because there's no practical difference between pointing a phone camera at a subject and pointing an SLR camera at that same subject. A human still had to frame the image, consider the context of the shooting environment and the position and condition of the subject, and decide that they are happy with the image before pressing the button. Whereas if you type "give me an image of this person or thing" into an AI model, you are forfeiting all of those decisions in favor of letting a computer make them for you, and potentially make mistakes in how it represents the subject.
- I would be particularly concerned if people started using these AI models to generate images of living persons of whom (good) freely available photos don't exist. I can imagine there are very few people who would be pleased to see a computer-fabricated photograph of themselves on their Wikipedia page, and that could become a serious issue if the computer decided to depict them in a manner that they found to misrepresent them. silvia (BlankpopsiclesilviaASHs4) (inquire within) 22:58, 4 February 2023 (UTC)
- We never know how much or in what way the AI processing of a phone image has affected it vs. the raw data or what a film camera would've taken. Then comes the issue of photo processing software. It will increasingly use AI, today and going forward. Who is ensuring that AI "enhancement" or "sharpening" done by someone uploading an image to Wikipedia is accurate? There is a continuum between "purely" AI generated (no such thing really, it always took human direction and input, and visual source material) and a "true" photograph. That's why I ask where you draw the line. I don't think this is a Ship of Theseus, Sorites paradox or continuum fallacy situation. If you use DALL-E to make an image it is not pulling it out of thin air, it's based on real images. What it is doing is not completely different from what an AI sharpening algorithm does. If we are going to ban DALL-E, why are we not also in the same stroke going to ban AI image sharpening? —DIYeditor (talk) 23:27, 4 February 2023 (UTC)
- What percentage "made by AI" is acceptable then? —DIYeditor (talk) 22:43, 4 February 2023 (UTC)
- @BlankpopsiclesilviaASHs4: I don't think this is appropriate or even feasible. Any current or future photo editing software may incorporate "AI". In fact, I think you will find current high-end smartphone cameras incorporate "AI" image processing - this is the prime use of "AI cores" and such on phones. What you are asking is impossible to enforce without subjective and arbitrary determinations that distinguish between fundamentally identical software. All these recently added photos of things on Wikipedia are actually not straight photographs but and AI's idea of what the photograph was supposed to show. Nothing is completely generated by AI, so where do we draw the line? We need to come up with best practices and standards that can be applied now and going forward without a blanket ban. —DIYeditor (talk) 22:55, 30 January 2023 (UTC)
- Hadn't seen this, but I now agree that AI-generated text and AI-generated images pose sufficiently different problems that they belong in separate policies. DFlhb (talk) 23:03, 30 January 2023 (UTC)
- Move to Wikipedia:Computer-generated editing or something similar which addresses the above stated concerns regarding ambiguity. silvia (BlankpopsiclesilviaASHs4) (inquire within) 08:28, 28 January 2023 (UTC)
- Support AndyTheGrump's proposal, but with "textual" replaced with "text". I appreciate the concerns about the "hype" surrounding the word AI (as well as the desire not to feed any prejudice against hypothetical future "real" AI), but the primary audience for this is brand new editors, who think they're helping us by generating new articles with ChatGPT. They've never heard of large language models, and they won't know this page is about their behavior if we name it obscurely. I'll add that we should have a second page, detailing how to use LLMs properly (copyediting, formatting, tables, all subject to manual review), which I believe JPxG and others are working on. DFlhb (talk) 17:47, 28 January 2023 (UTC)
- What do you think about keeping this title more exact and making a derived new-user-facing page like WP:ENC, with AI in the title? — Preceding unsigned comment added by Alalch E. (talk • contribs) 19:05, 28 January 2023 (UTC)
- That does address my concern about new-user discoverability.
- The main thing we still haven't discussed is this page's intended scope: it's written like a guideline, and it's primarily about using LLMs to generate text from scratch. AndyTheGrump's proposal is the most natural title for that. If it'll be a how-to, and be under the current title, then it should provide tips for all use-cases where it's possible to use LLMs productively (with supervision), and list effective prompts. People will use LLMs, and the better they know how to use them, the less damage they'll make.
- I'll note that we also need to address non-LLM AIs, like Meta's Side. So this page will end up as just one of a series of AI-related "WP:"-namespace pages: we'll have one on article generation, one on image generation, and at least one on using AI for article maintenance (like Side's successors). Having "AI" in the name of all of them, and avoiding referencing specific technologies by naming these pages: "AI-generated text", "AI-generated images", etc, would be both more consistent and more future-proof.
Or, instead of function-specific guidelines, we could minimize redundancy and have a single guideline for all "AI", and a range of text/image/maintenance AI how-tos. Food for thought? - (BTW, I hadn't noticed this was the page JPxG had created, which I had in mind in my above reply; I indeed thought he was going for a how-to). DFlhb (talk) 21:15, 28 January 2023 (UTC); struck combined-policy idea; these AI media types are different, and will evolve at different paces, so they belong in distinct policies. 23:05, 30 January 2023 (UTC)
- I removed the thing this morning; my idea (kind of a goofy one) was something like, prior to this being ratified as an actual guideline/policy, the page could serve as a kind of general summary of what people tend to think of as consensus and best practices. The idea is to open an RfC at some point, although the page is evolving quite rapidly at this point, so I think we might not be quite there yet. jp×g 23:51, 28 January 2023 (UTC)
- What do you think about keeping this title more exact and making a derived new-user-facing page like WP:ENC, with AI in the tile? — Preceding unsigned comment added by Alalch E. (talk • contribs) 19:05, 28 January 2023 (UTC)
- Don’t care. These types of proposals, inviting editors to debate in detail how many angels can dance on the head of a pin (16, for future reference), are designed to get everybody to stop doing anything at all about anything anywhere until the proposers are ready themselves to deal with the underlying issues. That might be in ten minutes, that might be in ten years. But until then we should all be distracted with something shiny and unimportant to give them headspace. Yeah, no. — Trey Maturin™ 20:35, 28 January 2023 (UTC)
- Oppose - "Textual content" is unnecessarily wordy. "Text" is more concise. "Chatbot" is the most common term being used for LLM-agents these days, so that would be the most familiar to editors reading the title of the policy. Therefore consider "Chatbot-generated text" for the title. — The Transhumanist 04:49, 30 January 2023 (UTC)
- Chatbot means something different, it's merely a related idea. A chatbot is an application of an LLM. A chatbot could also use some other paradigm. —DIYeditor (talk) 03:10, 31 January 2023 (UTC)
- Support Though I think the simpler Wikipedia:AI-generated content would be even better.--Pharos (talk) 18:07, 30 January 2023 (UTC)
- Oppose at this time because this requested move was premature and there are too many possible titles that have been suggested. !voting on this move or closing it is just a mess right now. Since we are not allowed to use an RFC for this, a discussion of multiple possible titles needs to occur before a requested move suggesting a single destination. If I misunderstand how a requested move should work, please let me know.
- Support changing the title and most of the text. "AI generated text" would probably be the simplest title. "Large language models" has two problems: first, too many people won't understand it; and second, we don't want to exclude other techniques if and when they raise the exact same issue. While there has been some quibbling over the use of "AI", I think it is the simplest and clearest way to communicate the intended meaning to readers. Alsee (talk) 13:30, 31 January 2023 (UTC)
- Oppose. Per WP:PAG, policy and guideline pages should avoid dumbed-down language. If and when other things raise the same issue, we can broaden the title. Until then, this is specifically about LLMs. —Alalch E. 17:45, 1 February 2023 (UTC)
- Oppose: "AI" is an inaccurate buzzword. I would however support moving to Wikipedia:Computer-generated content or something clearer to non-expert readers than "LLM" — OwenBlacker (he/him; Talk; please {{ping}} me in replies) 23:17, 5 February 2023 (UTC)
Discussion
I don't think adequate discussion has taken place on this topic yet, nor was the prior suggestion of Wikipedia:Computer-assisted text generation properly considered - the discussion above at Wikipedia talk:Large language models#A better title is needed was not concluded.
I concur with others that "AI" may be a misleading and inaccurate label to use. I'll also repeat my belief that along with ChatGPT et al in and of themselves, OCR, machine translation, autocorrect and autocomplete all relate to similar ideas and should be treated under an umbrella guideline/policy, because these other things may utilize language models as well, and going forward will inevitably be linked. I understand that Wikipedia:Content translation tool already exists and I don't think we should supplant that, just craft something that incorporates a general principle. One may say, well, right now there is no problem with "autocomplete" but I think it has a direct relationship and needs to be understood for this topic. —DIYeditor (talk) 02:22, 28 January 2023 (UTC)
It should probably be noted that "large language model" is the actual term used to describe these models in current research as well as journalism (even Gizmodo and WIRED have picked up on this). The phrase is new because the technology is new, but I don't think that using a more familiar phrase is a good reason to falsely describe something (Skype is VoIP, not a telephone; F-15s are airplanes and not zeppelins). jp×g 10:45, 28 January 2023 (UTC)
- Sorry, but the suggestion that describing LLMs as 'artificial intelligence' is 'false' is complete BS. OpenAI (clue in the name?), the developers of ChatGPT, describe it as such themselves. [11] AndyTheGrump (talk) 12:16, 28 January 2023 (UTC)
- I don't really trust the developers of this technology, who have a financial stake in the growth of its use, to describe it accurately. silvia (BlankpopsiclesilviaASHs4) (inquire within) 12:38, 28 January 2023 (UTC)
- I would in fact trust them to describe it as inaccurately as they can get away with. That's just how marketing works. XOR'easter (talk) 14:26, 28 January 2023 (UTC)
- If these are not AI, what would true AI be? Artificial intelligence != human intelligence. —DIYeditor (talk) 17:44, 28 January 2023 (UTC)
- They describe it as "AI" in marketing copy, but their papers are about transformer-based models; like I said, though, companies generally do not describe their products in NPOV terms. We describe cars as "midsize sedans" and not "sporty-yet-practical solutions", large-scale employee terminations as "layoffs" and not "right-sizing", etc. It remains to be seen what "AI" actually means, because the definition of the term changes wildly when advances are made in the field. jp×g 23:51, 28 January 2023 (UTC)
- Chat-GPT is an automated assistant, and it composes text in an automated fashion. Maybe call the policy... "Automated composition generators"? "Automated text generators"? "Text automation"? "Automated writers"? — The Transhumanist 05:03, 30 January 2023 (UTC)
My initial thought is inline with DIYeditor's comment: I think the policy ought to deal with machine-generated text more broadly. Guidance for specific subcategories can then be broken out separately (perhaps to separate pages). isaacl (talk) 17:34, 28 January 2023 (UTC)
Yup - keep it simple: “AI Generated Content”. I understand that there are advances in AI that have audio and video coming into play soon, if not already. Nevertheless, start with “content” as an aggregate and then break out any specific nuances for audio and video as you progress. Ktin (talk) 17:43, 28 January 2023 (UTC)
- How about "Computer Generated Content", which is a basic description for everything described in this discussion, including autocorrect. We could then have redirects from "AI Generated Content", "LLM generated...", "ChatGPT generated..." etc. The terminology is a dog's breakfast currently, but that doesn't mean we have to overtly endorse misleading terms like "AI", which is part of the reason we're here: the overselling of the intelligence of current content generators. — rsjaffe 🗣️ 23:55, 5 February 2023 (UTC)
- I like this, but would it include graphic/photographic (and audio) content as well per my discussion with BlankpopsiclesilviaASHs4 in the survey section above? This proposal has gone into considerable detail about LLMs in particular and it would grow to quite some length to include things like DALL-E and AI image sharpening, image restoration and so on.
- Maybe Wikipedia:Computer generated content would work as a new umbrella guideline for all computer generated content, briefly giving some principles that would cover machine translation, LLM, autocomplete/correct/suggest/etc., AI image enhancement, and so on, pretty much just saying that the human editor is completely responsible for any content they submit, for ensuring it is accurate, and for ensuring that any claims made about it (in the case of ancillary material) are accurate. I can see that this rabbit hole may be deeper than it first appeared to me, but I think we do need to confront the broader issues, not just what relates to each of them individually. Then a page like WP:LLM could serve as a specific and detailed reference, if necessary, being updated as necessary to reflect what the umbrella article says. —DIYeditor (talk) 05:58, 6 February 2023 (UTC)
- @Rsjaffe: I've gone ahead with a WP:BOLD draft of an umbrella policy for all Wikipedia:Computer generated content. —DIYeditor (talk) 07:07, 6 February 2023 (UTC)
List of tells?
It's certainly useful to have a list of "tells" for people who are patrolling new pages and draftspace. However, I feel like we might just be providing... basically a detailed how-to guide for putting ChatGPT barf into Wikipedia without detection (per WP:BEANS). It's also occurred to me that we might also want to be sparse on specific details of the GPT output detectors.
Also, the practice of detecting LLM output is a pretty broad subject area that we could write a lot about, so it might do better as its own separate page anyway. What do you all think of this? Pinging @Alalch E.: jp×g 00:34, 28 January 2023 (UTC)
- I was having the same thoughts for the last few hours. Almost all of it is what we've already seen (the "AI" username guy for example), but you are correct that there is a bit of WP:BEANS there. Maybe remove entirely, or leave as invisible text, for the time being. A separate page might be good too. Please do what you think is best. —Alalch E. 00:43, 28 January 2023 (UTC)
- WP:BEANS, so hide it, but the info is useful to some reviewers, e.g., New Page Patrollers. — rsjaffe 🗣️ 00:53, 28 January 2023 (UTC)
- Hmm. This isn't quite the same as COI/UPE tells or sockpuppet tells. I think this is probably more like Google Translate tells or something. I don't think it'd be particularly dangerous to list tells like "ChatGPT tends to use fictitious references". If some evil-doer read that, and wants to go to the effort to change all the citations to real journal articles instead of fake journal articles... the chances of that seems a bit low to me. –Novem Linguae (talk) 03:30, 28 January 2023 (UTC)
- That's a hilariously bizarre claim, in my opinion. If they edit the text to get rid of the "tells", ie the bad AI parts, that's good. Then they've accomplished exactly what is wanted from us, the people using the text creators to properly change and use the text in an appropriate and well written manner with proper sourcing. We should give as many "tells" as possible to get more people to fix their submissions. SilverserenC 03:38, 28 January 2023 (UTC)
- Well, I will refer to a preserved copy here, User:JPxG/LLM_dungeon/Factors_affect_brain_activity. In this case, all the references were fake, there were no inline citations, and the body text was random unsourced jibber-jabber. I don't think that removing the "Conclusion" paragraph would have made this article usable, it would have just made it harder to detect. I also think that linking to the detector demo in the beginning of that section would cause us some pain as well: they aren't that robust, and even minor rephrasing can defeat them completely. Note that some of these detectors, like https://contentatscale.ai/ai-content-detector/ are explicitly designed for the purpose of pasting LLM content in and giving you advice on which words to change around to make it less detectable. The sidebar speaks for itself:
- Want undetectable AI content?
- Our platform is the only one of it's kind that allows you to upload up to 100 keywords and get back 100 entire human quality blog posts (title to conclusion) without any human intervention. All the while, bypassing AI detection as it's the most human-like AI content ever produced.
- Our proprietary system uses a mix of 3 AI engines, NLP and semantic analysis algorithms, crawls Google, and parses all the top ranking content to put it all together.
- This isn't an AI writing assistant, this is a human level long-form blog post producing machine!
- Much to think about... jp×g 07:39, 28 January 2023 (UTC)
- Moved to the cleanup template's documentation. —Alalch E. 17:41, 28 January 2023 (UTC)
- One tell I've noticed, though it's unlikely to be there forever, is the use of "In conclusion,..." which ChatGPT insists on appending to all of the Wikipedia articles it tries to write, although actual Wikipedia articles never use this phrase.--Pharos (talk) 18:22, 30 January 2023 (UTC)
- Yeah that's one of the things that was removed from the page. —Alalch E. 21:27, 30 January 2023 (UTC)
Reliability of sources that use LLMs
While this proposal primarily covers editors using LLMs to write article content, one thing we may not have considered yet (not as far as I've observed, anyway) is the possibility of editors citing sources that publish articles generated by LLMs. We may already have a policy about this (I'm not sure where, if we do have one), but should there also be some sort of an acknowledgement of this facet of the issue here? And if an information source is known to be using LLMs to generate content, should we consider it no longer a reliable source? I suppose that'd depend on whether the source(s) in question signpost such articles, and on how much we trust them to do so accurately. In any case it's something possibly worth thinking about here. silvia (BlankpopsiclesilviaASHs4) (inquire within) 17:11, 28 January 2023 (UTC)
- It depends on how they're using them, as you said. If they're just being used for structural writing and there is still an actual author involved, then there's no issue. If they're using LLMs to write entire articles, then there's a problem. If the source is properly marking those whole articles as being LLM written in some way, then that's fine, we'll just note (such as in WP:RSP) that those particular article series aren't reliable, but the rest of the articles the source puts out are fine. The main problem is if a source is using LLMs to make entire articles and is giving no indication at all which articles those are, with the worst case scenario being if they put actual author names in the bylines, yet the authors weren't actually involved in writing said text. SilverserenC 17:15, 28 January 2023 (UTC)