Wikipedia talk:WikiProject AI Cleanup

This is the talk page for discussing WikiProject AI Cleanup and anything related to its purposes and tasks.

Archives: 1. Auto-archiving period: 90 days.

To help centralise discussions and keep related topics together, all non-archive subpages of this talk page redirect here.

This page has been mentioned by multiple media organizations:

Maiberg, Emanuel (9 October 2024). "The Editors Protecting Wikipedia from AI Hoaxes". 404 Media. Retrieved 9 October 2024.
Nine, Adrianna (9 October 2024). "People Are Stuffing Wikipedia with AI-Generated Garbage". ExtremeTech. Retrieved 10 October 2024.
Harrison Dupré, Maggie (10 October 2024). "Wikipedia Declares War on AI Slop". The Byte. Retrieved 10 October 2024.

I wanted to share a helpful tip for spotting AI generated articles on Wikipedia

If you search for several buzzwords associated with ChatGPT and limit the results to Wikipedia, the search brings up articles with AI-generated text. For example, I searched for "vibrant" "unique" "tapestry" "dynamic" site:en.wikipedia.org and found some (mostly) low-effort articles. I'm actually surprised that most of these are articles about cultures (see Culture of Indonesia or Culture of Qatar). 95.18.76.205 (talk) 01:54, 2 September 2024 (UTC)

Thanks! That matches Wikipedia:WikiProject AI Cleanup/AI Catchphrases. Feel free to add any new buzzwords you find! Chaotic Enby (talk · contribs) 02:00, 2 September 2024 (UTC)
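
The search tip above can also be applied offline to text that has already been fetched. The following is a minimal sketch, assuming the four buzzwords quoted in the search are a reasonable starting list; the names `BUZZWORDS`, `buzzword_hits` and `matches_all` are hypothetical, the sample sentence is invented, and a match is only a hint that a page deserves a human look, not proof of LLM authorship.

```python
import re

# Illustrative list taken from the search quoted above; this is an assumption,
# not the project's full catchphrase list.
BUZZWORDS = ("vibrant", "unique", "tapestry", "dynamic")


def buzzword_hits(text, buzzwords=BUZZWORDS):
    """Map each buzzword to its whole-word, case-insensitive count in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {b: words.count(b) for b in buzzwords}


def matches_all(text, buzzwords=BUZZWORDS):
    """Roughly mimic the quoted search: True only if every buzzword occurs."""
    return all(count > 0 for count in buzzword_hits(text, buzzwords).values())


# Invented sample sentence, not taken from any real article.
sample = "A vibrant and unique tapestry of traditions shapes the dynamic culture of the region."
print(buzzword_hits(sample))
print(matches_all(sample))
```

Requiring every word to appear keeps the filter strict, as in the original search; counting hits per word instead would allow ranking pages by how many catchphrases they contain.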

A new WMF thing

Y'all might be interested in m:Future Audiences/Experiment:Add a Fact. Charlotte (Queen of Hearts • talk) 21:46, 26 September 2024 (UTC)

Is it possible to specifically tell LLM-written text from encyclopedically written articles?

The WikiProject page says "Automatic AI detectors like GPTZero are unreliable and should not be used." However, those detectors produce so many false positives because LLM-written text overlaps stylistically with human-written text in general. Wikipedia doesn't seek to cover the full breadth of human writing, only a very narrow strand (encyclopedic writing) that is very far from natural conversation. Is it possible to train a model specifically on (high-quality) Wikipedia text vs. average LLM output? Any false positive would likely be unencyclopedic and need fixing regardless. MatriceJacobine (talk) 13:29, 10 October 2024 (UTC)

That would definitely be a possibility, as the two styles are different enough to be reliably distinguished most of the time. If we can build a good corpus of both (output of the most common LLMs on Wikipedia-related prompts on one side, and Wikipedia articles on the other), which should definitely be feasible, we could indeed train such a detector. I'd be more than happy to help work on this! Chaotic Enby (talk · contribs) 14:50, 10 October 2024 (UTC)

That is entirely possible. A corpus of both "genuine" articles and articles generated by LLMs would be better, though, as the writing style of, for example, ChatGPT can still vary depending on the prompt. Someone should collect and archive articles known with certainty to be generated by language models and open-source the collection so the community can contribute. 92.105.144.184 (talk) 15:10, 10 October 2024 (UTC)

We do have Wikipedia:WikiProject AI Cleanup/List of uses of ChatGPT at Wikipedia and User:JPxG/LLM dungeon, which could serve as a baseline, although they are still quite small for a corpus. A way to scale it would be to find the kind of prompts being used and use variations of them to generate more samples. Chaotic Enby (talk · contribs) 15:24, 10 October 2024 (UTC)
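
As a rough illustration of the two-corpus idea discussed in this thread, the sketch below trains a small word-level naive Bayes classifier on text labelled "wiki" versus "llm". Nobody in the discussion proposed naive Bayes specifically; it is swapped in here only because it fits in a few lines of standard-library Python. The class `NaiveBayes`, the function `tokenize` and the four training sentences are all hypothetical placeholders, invented for the example rather than taken from real articles or real LLM output, and a usable detector would need a far larger corpus and proper evaluation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalised log-probability of each label for text."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokens:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented placeholder corpus (assumption): two "encyclopedic" and two "LLM-style" sentences.
texts = [
    "The city is the capital of the province and had a population of 120,000 at the 2011 census.",
    "The species was first described in 1853 and is found in lowland forests.",
    "The city boasts a vibrant tapestry of cultures and a rich, dynamic heritage.",
    "This unique species stands as a testament to the vibrant diversity of nature.",
]
labels = ["wiki", "wiki", "llm", "llm"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("A vibrant tapestry of dynamic traditions."))
print(model.predict("The town had a population of 4,000 at the 2011 census."))
```

The same `fit` call would accept a larger labelled corpus, for instance samples generated from prompt variations as suggested above, without changes to the code.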
