Wikipedia talk:Wikipedia Signpost/2023-02-20/Essay


I've found that ChatGPT can be good at writing articles. I asked it to write an article on The Crew Motorfest (out of curiosity, not to actually write the article, as it had already been written and published) and the article came out pretty decent, with only 1 inaccuracy that I found. I asked it to source the article, however, and it came up with BS sources, so it isn't perfect. ― Blaze Wolf^Talk_{Blaze Wolf#6545} 18:15, 20 February 2023 (UTC)
Also, mind posting a link to the AI detector you used? ― Blaze Wolf^Talk_{Blaze Wolf#6545} 18:16, 20 February 2023 (UTC)

The links are in the "Finding More" section; the primary one is https://openai-openai-detector.hf.space/ ☆ Bri (talk) 18:21, 20 February 2023 (UTC)

Thanks! I didn't see a "Finding More" section, but I could just be dumb. ― Blaze Wolf^Talk_{Blaze Wolf#6545} 18:23, 20 February 2023 (UTC)

ChatGPT creates plausible-sounding bullshit. In cases where it has a lot of very similar sources to draw from, such as mostly-empty space-filler articles about an upcoming racing video game (for which it would have about a thousand examples) it can generate something low on nonsense. For something more unique, the bullshit quota is higher. In all cases, though, you can't tell what's bullshit without checking it line by line, because it's all plausible-sounding. Similarly, the sources will always be nonsense, because it isn't generating text based on specific sources, it's generating plausible-sounding reference text bullshit, with no connection to anything. --Pres N 19:29, 20 February 2023 (UTC)

Yes, I'm not trying to argue that we should be using ChatGPT (because frankly no one should), simply that it isn't 100% bad all of the time. ― Blaze Wolf^Talk_{Blaze Wolf#6545} 19:31, 20 February 2023 (UTC)

In fact, I have encountered situations where it likes to hallucinate no matter what I tell it (I asked it a few things regarding Splatoon and it kept thinking the special gauge was the amount of ink the weapon had, which is not true whatsoever). ― Blaze Wolf^Talk_{Blaze Wolf#6545} 19:33, 20 February 2023 (UTC)

One of the data sources for ChatGPT is Wikipedia, so if you ask it to write about something already in Wikipedia, there’s a likelihood that it will select correct information for its output. — rsjaffe 🗣️ 22:24, 20 February 2023 (UTC)

WP:Randy in Boise can also make good contributions most of the time, but the few times he's wrong still make him a net negative. AI seems to be a long way from getting past this level of ability. Daß Wölf 20:24, 24 February 2023 (UTC)

"I test for articles typically using https://openai-openai-detector.hf.space/" - this and various other currently available "ChatGPT detectors" (including OpenAI's own) are highly unreliable. https://openai-openai-detector.hf.space/ actually already says on the tin that it is a detector for GPT-2 (released in 2019 and very different from ChatGPT). Given the article's focus on the dangers of misinformation, it's a bit sad and ironic that the Signpost is itself providing such dubious recommendations here without any caveats.

Regards, HaeB (talk) 11:12, 21 February 2023 (UTC)

The article glosses over a lot of the issues regarding detection. It was just a brief intro. I emphasized in the article that I was using a very insensitive method of finding LLM-generated text. There were a few reasons I went about things as described there (and to note: I no longer rely solely on the GPT-2 detector): 1) at the time I started, other detectors available were very opaque as to how they were constructed; 2) the nature of the output, even though the models are different, has many similar characteristics, so a GPT-2 detector would have some sensitivity and specificity; 3) I intentionally minimized false positives, as those irritate article contributors, by doing a vigorous pre-screen of the text. As to point two, note that at least one of the recommended detectors (https://gptzero.me/) is not based on the GPT model, but rather on the text output characteristics. As to point three, I used the authors' feedback as an indicator of the false positive rate: getting no complaints after a lot of tags is a decent indicator that the false positive rate is low. — rsjaffe 🗣️ 18:53, 21 February 2023 (UTC)

Good to hear that you are proceeding diligently when patrolling new articles (and to be clear, this is very important work and it's good to call attention to this issue). But the part with the tool recommendations did not include any caveats about false positives, and should not have been published in this form.

"the nature of the output, even though the models are different, has many similar characteristics, so a GPT-2 detector would have some sensitivity and specificity" - what research is this claim based on? (I mean, of course any detection method has "some sensitivity and specificity"; the question is whether they are good enough.)

"is not based on the GPT model, but rather on the text output characteristics" - it seems that there is some fundamental confusion here between the model that is doing the detection and the model whose output is being detected (and/or the features of its output). https://openai-openai-detector.hf.space/ is also not using "the GPT model" (there are many actually) to detect GPT-2 output, but RoBERTa instead.

Regards, HaeB (talk) 06:57, 24 February 2023 (UTC)
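
To make the "good enough" question above concrete, here is a minimal sketch of how a detector's sensitivity and specificity translate into the share of flagged articles that are actually LLM-generated. All numbers are hypothetical and chosen only for illustration; none of them are measurements of any detector mentioned in this thread, and the function name is made up for this example.

```python
def flagged_precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of flagged texts that are truly LLM-generated (Bayes' rule).

    sensitivity: P(flagged | LLM-generated)
    specificity: P(not flagged | human-written)
    prevalence:  share of all screened texts that are LLM-generated
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: a detector that catches 60% of LLM text, wrongly flags
# 5% of human text, applied where 2% of new articles are LLM-generated.
precision = flagged_precision(sensitivity=0.60, specificity=0.95, prevalence=0.02)
print(f"{precision:.2f}")  # about 0.20
```

Under these assumed figures, roughly four out of five flags would land on human-written text, which is why a pre-screen that raises the prevalence among the texts actually tested can matter as much as the detector itself.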

I'm not convinced that this is a problem that isn't already addressed by the vast number of policies and guidelines on this site. If artificial intelligence ever becomes capable of generating Wikipedia articles that are verifiable, written from a neutral point of view and devoid of original research, then I'm all for it. Until then, the usual system of separating cruft from quality will continue. It's possible that garbage will be generated faster than ever, but that seems like a technical issue rather than a policy one. ~T P W 19:46, 21 February 2023 (UTC)
Read the proposed policy. I'd look at the LLM policy more as an explanation of how text generation fits into current policies rather than as setting new precedent. The problem is that most people do not understand the policy issues raised by LLMs. The proposed policy explains them. — rsjaffe 🗣️ 20:29, 21 February 2023 (UTC)
If AI is integrated into Wikipedia to write articles and whatnot, we should have some sort of Pending Changes Protection when AI is used so we could double-check the accuracy of the article(s). Helloheart 00:49, 22 February 2023 (UTC)
If robots can reliably write articles of ordinary WP quality, then there's no need for WP. People who want to know something can just ask the robot and get an answer tailored to the asker's known preferences and knowledge. Jim.henderson (talk) 20:46, 22 February 2023 (UTC)
Wouldn't that make for a bubble, though? Here's an article to prove whatever you already believe! Adam Cuerden (talk) (Has about 8.2% of all FPs. Currently celebrating his 600th FP!) 03:14, 23 February 2023 (UTC)

How do I know I'm smart? My computer friend always tells me I'm right, that's how. Tyrants have suffered bad advice from ego-stroking yes-men forever; now everyone can be Æthelred the Unready. Jim.henderson (talk) 02:49, 26 February 2023 (UTC)
Hallucination is a major problem with these models; anyone not verifying each and every thing they use AI tools for should be immediately sanctioned. There was an interesting discussion to that effect on the Village pump, I believe, even before these tools came into vogue. Gotitbro (talk) 13:10, 1 March 2023 (UTC)
Good and timely article. I also tried out ChatGPT to see what it can do. It's documented on this page. My overall conclusion was that "it appears that us Wikipedia volunteers aren't out of a job just yet". Schwede 66 22:23, 7 March 2023 (UTC)
After reading another article about how ChatGPT is coming for all our jobs, I signed up and asked it (3.5 I presume) "how many neutrons are in a liter of water" (yes, it insisted on that spelling of litre). It wrote out a four or five paragraph reply explaining exactly how it arrived at the figure of 556 neutrons. Typing the same into the integrated engine in Bing resulted in the claim that a litre of water has no neutrons. 4.0 did somewhat better, off by only two orders of magnitude. So I think the bit about "AI-generated text is not reliably correct" needs to be bolded. Maury Markowitz (talk) 20:34, 18 March 2023 (UTC)
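
For reference, the expected order of magnitude for the question above can be checked with a short calculation. This is a minimal sketch, assuming pure water at about 1000 g per litre, a molar mass of 18.015 g/mol, and only the most common isotopes (hydrogen-1 with no neutrons, oxygen-16 with eight); the function name and parameters are hypothetical, and heavier isotopes are ignored.

```python
AVOGADRO = 6.02214076e23       # molecules per mole (exact by SI definition)
MOLAR_MASS_H2O = 18.015        # g/mol, approximate
NEUTRONS_PER_MOLECULE = 8      # assumption: O-16 contributes 8, H-1 contributes 0


def neutrons_in_water(volume_litres: float, density_g_per_litre: float = 1000.0) -> float:
    """Estimate the number of neutrons in a volume of water."""
    moles = volume_litres * density_g_per_litre / MOLAR_MASS_H2O
    molecules = moles * AVOGADRO
    return molecules * NEUTRONS_PER_MOLECULE


print(f"{neutrons_in_water(1.0):.2e}")  # about 2.67e+26
```

Under these assumptions the answer is on the order of 10^26, so a reply of 556 neutrons is off by more than twenty orders of magnitude.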
