This is a draft of a potential Signpost article, and should not be interpreted as a finished piece. Its content is subject to review by the editorial team and ultimately by JPxG, the editor in chief. Please do not link to this draft as it is unfinished and the URL will change upon publication. If you would like to contribute and are familiar with the requirements of a Signpost article, feel free to be bold in making improvements!
For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its "Extended Thinking" version) to find an error in "Today's featured article". In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.
This experiment was inspired by an October 2 Twitter thread by Noam Brown (a research scientist at OpenAI):[remarks 1]
My new hobby is asking GPT-5 Thinking to find errors in every @Wikipedia page I read. Interestingly, almost every page I checked has at least one error.
This caught my interest as a Wikipedian with over 50,000 edits since 2003 (a large part of which has consisted of fixing factual errors and other problems, rather than adding new information). I decided to test whether Brown's bold "almost every page" observation would hold up for the most difficult dataset I could think of: Featured articles are considered among Wikipedia's very best. They have been awarded the community's highest quality rating, after a review for accuracy, neutrality, completeness, and style. Currently only 0.1% of articles have achieved this status. Every day, one of them is selected to be showcased on Wikipedia's main page as "Today's featured article" (TFA), leading to additional scrutiny.
I used the exact same prompt as in one of Brown's tweets:
Find at least one error in this Wikipedia page: [link to article]
I ran this query towards the end of each TFA's time on the main page, or shortly afterwards. I verified all the alleged errors myself to the best of my ability (often spending considerable time familiarizing myself with the topic and the relevant sources first), and provide my verdict on each below. Whenever I agreed that something was an error, I fixed it in the article (with a few exceptions, explained below), without taking up ChatGPT's offer to provide the correction itself (or, in the few cases where it had already suggested a corrected wording, e.g. for the Oct 31 article, without using that suggestion).
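For readers who want to try a similar setup themselves, the daily lookup is easy to script. The sketch below is my own illustration, not the workflow actually used in the experiment (the queries were run manually in the ChatGPT interface): it builds the URL of the Wikimedia REST API's "featured content" feed, whose JSON response includes the day's TFA, and assembles the exact prompt quoted above. Actually fetching the feed and sending the prompt to a model are left out.

```python
from datetime import date

# Wikimedia REST API "featured content" feed; the JSON response's "tfa"
# key holds Today's Featured Article for the requested date.
FEED_TEMPLATE = "https://en.wikipedia.org/api/rest_v1/feed/featured/{:%Y/%m/%d}"

def tfa_feed_url(day: date) -> str:
    """Build the featured-content feed URL for a given day."""
    return FEED_TEMPLATE.format(day)

def build_prompt(article_url: str) -> str:
    """The exact prompt used throughout the experiment."""
    return f"Find at least one error in this Wikipedia page: {article_url}"

# Example for the first day of the experiment:
print(tfa_feed_url(date(2025, 10, 5)))
# → https://en.wikipedia.org/api/rest_v1/feed/featured/2025/10/05
```

The function names (`tfa_feed_url`, `build_prompt`) are my own; only the feed endpoint and the prompt wording come from the article.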
Detailed results are available in the table below. To summarize:
Subjectively, I found a few of the Implemented ones (see e.g. Oct 14, Oct 26) to be a bit nitpicky, even if valid. But otherwise I considered the vast majority of them serious and important to correct.
The idea to use LLMs to fact-check Wikipedia articles is not new. A few examples of related efforts:
A neural network-based system called Side, designed to identify Wikipedia citations that are unlikely to support their claims (Signpost coverage). However, it never made it beyond the proof-of-concept stage.
great promise as an aid to editors. Another editor used it to correct a 19-year-old error on Wikipedia, and I myself used it to correct one that had been present on Wikipedia for 20 years.)
A systematic error analysis (i.e. identifying causes either for the mistakes ChatGPT identified in the Wikipedia articles or for ChatGPT's mistaken claims about such mistakes) wasn't the objective here. But I'll offer a few casual, tentative observations.
Various mistaken error reports by ChatGPT appear to have had to do with:
Further insights might be obtained by inspecting the "Thinking" traces and the full list of sources consulted that are included in each ChatGPT answer (see e.g. this example for October 24, one of the cases where its correction suggestion went awry - click "Thought" for the former and "Sources" for the latter). The table below only includes the sources that ChatGPT included as citations in its final answer.
Apparent causes for some of the identified Wikipedia errors:
| "Find at least one error in this Wikipedia page:" | Response by ChatGPT 5 Extended thinking[remarks 2] | My verdict | Error analysis i.e. observations on why either Wikipedia or ChatGPT might have erred here (incomplete and somewhat speculative) |
|---|---|---|---|
Here are two clear errors on that page:
Authoritative sources list the tower as 67 floors (CTBUH) or 66 stories (NYC City Planning/LPC). Even the article’s own lead calls it “a 67-story… skyscraper,” so the “70” in the infobox is inconsistent with both the text and external references.
It currently reads: “$7 million (equivalent to about $ million in )”—the conversion template is empty. The underlying figure is '$7 million', per the LPC designation report, but the inflation-adjusted value and year are missing. (Also likely off: the infobox lists 24 elevators, while CTBUH lists 18 elevator cars—worth double-checking and aligning the definition used.) |
|
Error #1 (floor count) was the result of an apparent vandalism edit from 2020. I also reverted an edit by the same IP in a different article, where it had likewise persisted for over five years. | |
Here’s one clear error:
(If you want a second fix: the article says the Llano de Chajnantor Observatory is “less than 20 km south” of Licancabur; using published coordinates for Licancabur (22.83°S, 67.88°W) and the ALMA/APEX site on the Chajnantor plateau (≈23°00′21″S, 67°45′33″W) puts it roughly 25 km away, not <20 km.) Smithsonian Global Volcanism Program; apex-telescope.org |
Re "one clear error": Re "a second fix": |
Re "one clear error": The erroneous (and conceivably politically sensitive) claim that "Licancabur is located in Bolivia" formed the alt text for the map in the infobox. It did not appear in the article's source wikitext, but rather seems to have been auto-generated via the infobox template and country map used in the article - a setup that makes sense for increasing accessibility without adding extra work for editors, but in this case was detrimental to the article's accuracy.
| |
| Terraria |
Here’s a clear one:
(If you want a second fix: the infobox credits Whitney Spinks as Producer, but official credits list her as Executive Producer.) mobygames.com |
"Wrong citations for the sales figure.": Re "a second fix": |
|
Here are two fixable issues I spotted on the current page:
If you want, I can propose exact replacement text for those lines. |
1. "People “Sexiest Man Alive” ranking": After reviewing relevant People articles available online ([6][7]) I believe ChatGPT is likely correct that the magazine did not assign an official "number two". But on the other hand, the HuffPo article cited in the Wikipedia article clearly supports the statement, so appears to be a WP:VNT case where a generally reliable source contains an error. 2. “Alaskan Nets” phrasing |
||
Here are (at least) two fixable errors on that page:
(Optional cleanup) The singles list shows “Excuse Me Mr.” as released June 21, 1997, which was the commercial Japan release; the song was serviced to U.S. radio earlier, on August 21, 1996. If the section intends first release dates, that line should reflect the 1996 U.S. radio release (and/or clarify territories). |
1. 2. Re "(Optional cleanup)": |
||
| Georg Karo |
Here’s one clear factual error on that page:
(Another fix you could make while you’re there: the honours line calls his German award a “Knight Commander’s Cross.” The official notice in the Bundesanzeiger shows he received the Großes Verdienstkreuz mit Stern in July 1952, i.e., “Grand Cross of Merit with Star,” not a “Knight Commander’s Cross.”) |
"one clear factual error" "Another fix": |
|
|
2021 World Figure Skating Championships |
Here’s one clear inaccuracy:
Fix: change that clause to something like: “with their free skate to a Queen medley (‘Bohemian Rhapsody’ / ‘We Are the Champions’).” Olympics; Popsugar |
||
| Here are two concrete issues I found (as of today, Oct 13, 2025):
In the Location section it says the tell is “less than 1 mi inland.” Using the site’s published coordinates (31.475817, 34.404631) and the mouth of Wadi Gaza on the coast (~31.46389, 34.37583), the straight-line distance to the shoreline is roughly 2–3 km (≈1.2–1.9 mi), i.e., more than a mile. Pleiades+1 The infobox lists Ayman Hassouna under “Archaeologists,” which reads as if he excavated the site in 1999–2000. The published excavation reports name Pierre de Miroschedji and Moain Sadeq as the excavation directors for the 1999 salvage and the 2000 season; Hassouna’s work at Tell es-Sakan was much later, as field director for GAZAMAP survey/monitoring in 2022–2023, not part of the original excavations. So the infobox is misleading as written. The BAS Library+2 Want me to draft a crisp correction note you can paste on the article’s talk page? |
Location: The Wikipedia article's "less than 1 mi" appears correct, based on a look at the map. Infobox: |
Location: ChatGPT clearly picked a wrong reference point for calculating the distance to the coast. | |
|
Fourpence (British coin) |
Error: In the section “Reaction, later issues and colonial striking,” the article says the silver threepence “was the same diameter as the groat, and the two coins could not co-exist.”
Why this is wrong: The threepence began being issued for general circulation in 1845, and groats continued to be struck for circulation through 1855—so they did co-exist for roughly a decade (even though the size clash caused confusion and helped doom the groat). royalmintmuseum.org.uk; Museums Victoria Collections Minimal fix: Change to something like: “The threepence, introduced for general circulation in 1845, was the same diameter as the groat; although both circulated concurrently until 1855, the overlap led to confusion and the groat’s discontinuation.” royalmintmuseum.org.uk; Museums Victoria Collections |
A bit nitpicky, arguably approaching the borderline between factual error and merely awkward phrasing. But ChatGPT is correct that the statement is factually wrong if taken in a concrete sense. (NB: The source it provided for its suggested "minimal fix" did not support the modified claim.) |
|
| Jozo Tomasevich |
Here’s a clear one:
(Extra fix you might consider)
|
* “Early life…” section : The cited source says that while the peninsula "had no roads for vehicular traffic" until 1946, "During the French occupation at the beginning of the nineteenth century, some primitive roads were built, but these could accommodate pack animals only."
|
|
| Error: The lead says: “Through Corymore Productions … Lansbury assumed ownership of the series [Murder, She Wrote] …” — that’s incorrect.
Why it’s wrong: Murder, She Wrote was produced and is owned/distributed by Universal/NBCUniversal, not by Lansbury personally. Corymore Productions (Lansbury & Peter Shaw’s company) co-produced the show in its later seasons, and Lansbury served as executive producer—but that’s not ownership. What to change: Replace “assumed ownership of the series” with something like: Sources: Universal’s control of the property is shown by (a) Universal Pictures developing the Murder, She Wrote feature film, and (b) NBCUniversal’s distribution/branding of the series and channels: Deadline reporting Universal’s film project (Dec. 16, 2024) Deadline; NBCUniversal/Universal marketing the complete-series release and FAST channels for Murder, She Wrote (Universal Pictures Home Entertainment page; Deadline on NBCU–Roku channels) Uphe; Deadline. The series has long been produced and distributed by Universal Television (see history/coverage) csmonitor.com. |
The "assumed ownership" had been added in 2008 by an IP editor in a wording that suggests they had something like creative control by Lansbury in mind, rather than copyright and/or trademark ownership by her and her husband's company. | ||
Here are two fixable issues:
If you’d like, I can draft the exact wikitext changes next. |
Reception: Uilleann pipes: |
||
| Ulfcytel |
Here’s a clean factual slip you can fix:
If you’d like, I can scan for more. |
This error was caused by a somewhat unintuitive parameter in the convert template, which also means it is not easy to spot for editors in the source wikitext (snapshot of the rendered page before the correction: [23]). | |
Here are two concrete errors you can fix:
If you’d like, I can propose specific wording changes for those spots. |
1.: 2.: |
||
|
Illinois Public Access Opinion 16-006 |
Here are two concrete errors on that page:
If you’d like, I can suggest precise wording fixes for those spots. |
1. Writer: ChatGPT's description is correct, but a footnote in the article already clarifies that "The opinion was signed on Madigan's behalf by Michael J. Luke, counsel to the Attorney General." 2. formatting: |
2. ChatGPT's erroneous complaint about a non-missing hyphen might possibly be due to a parsing error on OpenAI's side, or an "r's in strawberry"-type LLM issue. |
|
Manchester United F.C. 9–0 Ipswich Town F.C. |
Here’s a clean, verifiable one:
Geography error (and inconsistency within the article): The lead says the match was “played at Old Trafford, Manchester,” and the page header calls it “a 1995 football match in Manchester, England.” But Old Trafford stadium is not in the City of Manchester—it’s in the Old Trafford area of Stretford, in the Metropolitan Borough of Trafford, Greater Manchester. The infobox partly acknowledges this (“City: Trafford, Greater Manchester”), but even that label is wrong because Trafford isn’t a city; it’s a metropolitan borough. The lead and infobox should read “Old Trafford, Trafford, Greater Manchester” (or “Old Trafford, Stretford (Trafford), Greater Manchester”). Wikipedia; Wikipedia; Wikipedia Suggested fixes:
|
(although not using ChatGPT's exact suggestion, e.g. because this infobox only supports a "city" field. As an aside, I'm not sure whether ChatGPT's additional explanation that the stadium is in "the Old Trafford area of Stretford, in the Metropolitan Borough of Trafford" is correct regarding Stretford. But in any case its main point is correct that the stadium is not located in the city of Manchester.) |
|
| Here’s a clean one:
ADG award year is wrong. The article says the episode won at the “2023 Art Directors Guild Awards.” The win was at the 2024 (28th) ADG Awards, in Half-Hour Single-Camera Series for “Reservation Dogs: ‘Deer Lady’,” credited to production designer Brandon Tonner-Connolly. See the ADG’s official winners page and the winners announcement. Art Directors GuildADG Deadline (Optional minor nit: several listings put the runtime at 29 minutes rather than 30—e.g., TV Guide—though stream runtimes can vary slightly.) TVGuide.com |
ADG award year: ChatGPT is correct that using 2023 (the year in which the awarded work was released) instead of 2024 (the year when the awarding took place) is inconsistent both with the ADG's own website and with third-party reliable sources (e.g. the source cited in the Wikipedia article for this very statement called it "the 2024 ADG Awards"). That said, it's the approach long taken by English Wikipedia (example), i.e. not an issue with this article per se, and arguably also a bit of a judgment call. I would support correcting it, but it would need a wider discussion and changes in a considerable number of articles, going beyond the scope of this project. runtime: |
||
Here are two clear issues on that page:
(Any one of the above suffices as an error; I included both a factual date mistake and stale record claims.) |
Re 1. #Date error: I would agree with ChatGPT that the year when such a scientific discovery was published in peer-reviewed form (here: 1998) seems more appropriate to highlight in such a case than the year when the work that resulted in that discovery began. But the Wikipedia sentence is correct as written and supported by one of the cited sources ("one of the two research teams, the Supernova Cosmology Project, initiated a decade earlier in 1988"). Re 2. #Out-of-date records: |
||
|
Red-billed quelea |
Here are clear, source-checkable errors on that page and how to fix them:
If you’d like, I can draft exact replacement sentences for those lines so you can paste them into the article. |
Re Réunion introduction year: ChatGPT is definitely wrong in asserting that
Re misspelled plant names: |
|
Here are (at least) two concrete errors on that page:
Optional clean-up: the infobox link labeled “Original Text” points to a Wikisource page that itself says the text is taken from Davies (1856), An English Chronicle…—a chronicle extract, not the enrolled statute/roll. If the label is meant literally, it should point to the Parliament Rolls (PROME/Rotuli Parliamentorum) or be relabeled to reflect the chronicle source. Wikisource How to fix:
If you want, I can draft the exact wikitext changes next. |
Re Wrong “Commencement” date in the infobox: Re Internal date inconsistency (24 vs 25/31 October) + a miscount: Re Optional clean-up: |
||
Here’s one clear, sourced error on the page:
If you want a second one to fix:
(Minor precision note you may also consider): the grid note says the title changed to “Magic Carpet” in January 1933; the masthead/title used in sources is “The Magic Carpet Magazine” (often even “Oriental Stories Combined With the Magic Carpet Magazine” on that first issue). onlinebooks.library.upenn.edu; ISFDB |
"Minor precision note": |
Bibliographic table: Apparently a parsing issue on OpenAI's side
| |
| One clear error: the article repeatedly misspells the Quietus critic Amy Pettifer as “Pettifier.” Her surname is Pettifer (one “f,” no extra “i”), as shown on her author page and on The Quietus’ review of 1989. The Quietus; The Quietus
If you’d like, I can draft a tidy edit note and correction for the page. |
|||
Illustration of Mathos by Poirson, 1890. Siege of Tunis (Mercenary War)
Here are two concrete problems on that page:
If you’d like, I can draft precise, sourced wording to fix those two passages. |
1. Leaders captured at the Saw: 2. Aftermath dating and target towns:
|
No. 2 might be an example of FUTON bias (ChatGPT not having access to the Hoyos 2007 book) combined with over-reliance on Polybius. Interestingly, this answer did not include any citations (although the reasoning trace shows that it retrieved an online version of the Polybius text it quotes from). |
| Error (infobox): It lists Clark’s service in the Confederate House of Representatives as June 10, 1864 – May 10, 1865.
Why this is wrong: The 2nd Confederate Congress met in two sessions—May 2–June 14, 1864 and Nov 7, 1864–Mar 18, 1865—and adjourned sine die on March 18, 1865. Clark is recorded as serving only in the second session, so his House term should end March 18, 1865 (and his start should align with that session, not June 10, 1864). Wikipedia+1 (Sources: the “2nd Confederate States Congress” page lists the session dates and shows “John Bullock Clark Sr., 3rd Missouri — session 2”; the “Confederate States Congress” page notes the House’s final adjournment on March 18, 1865.) |
"his House term should end March 18, 1865": This is what the article itself already said elsewhere ( He held this position from June 10, 1864, to March 18, 1865}). "his start should align with that session, not June 10, 1864" |
The "May 10, 1865" error had been present in the infobox for almost 13 years (introduced in an edit rather misleadingly described as "Formatting infobox/succession boxes."). | |
Cartoon in Punch, 1858. Bradford sweets poisoning
One clear, checkable error:
Distance to Shipley. The article says Neal sent Archer to “a druggist in the town of Shipley five miles (eight kilometres) away.” Shipley (and specifically Baildon Bridge in Shipley, where Hodgson’s shop was) is only about three miles (~5 km) from central Bradford. Multiple sources give ~3 mi straight-line (with ~5 mi as a driving distance today): see Travelmath’s straight-line vs driving figures for Bradford↔Shipley, the Baildon page noting Baildon is 3 mi north of Bradford city centre, and a detailed local history write-up placing Hodgson’s pharmacy “three miles (4.8 km) … at Baildon Bridge in Shipley.” Travelmath; en.wikipedia.org; en.wikipedia.org Suggested fix: change “five miles (eight kilometres)” to “about three miles (five kilometres)” (optionally adding that it’s roughly 4–5 mi by road). |
||
|
Gott der Herr ist Sonn und Schild, BWV 79 |
One clear error is in the lead: What Bach actually did was reuse three specific movements from BWV 79 across two Missae breves—not “most” of the cantata, and not “most” of either Mass:
This mapping is documented by Bach Digital and other scholarly notes; it totals three of six BWV 79 movements (i.e., not “most”). bach-digital.de; The Bach Choir of Bethlehem; emmanuelmusic.org Suggested fix: replace “most of the music for two of his four short masses” with: “Bach later parodied three movements from BWV 79 in two of his short masses: the opening chorus and the duet in the Missa in G major (BWV 236), and the alto aria in the Missa in A major (BWV 234).” bach-digital.de; The Bach Choir of Bethlehem; emmanuelmusic.org |
I consider this one a bit nitpicky. But ChatGPT has a point that "most" for three out of six movements can be seen as a bit misleading (and in any case this sentence had apparently been added to the start of the article while overlooking that the topic had already been covered with more precision later in the lead section). |
|
Here are two solid, source-checkable errors on that page:
Optional cleanup note: In the “Taxonomy and evolution” lead, the etymology sentence is missing the glosses (it currently reads “albi meaning [blank] and nucha meaning [blank]”). You could complete it as “albi (‘white’) + nucha (‘nape’).” (No external source needed for Latin basics, but many field guides use that derivation.) |
1. Diet: 2. "Synonym year": |
Re "Optional cleanup note": This looks like a parsing fail on OpenAI's side involving the {{gloss}} template. (The source wikitext reads | |
Here’s a clear one:
(If you want another tidy-up: the article calls his father “Ferdinand Walters,” but ADB gives the full name Arthur Ferdinand Walters—worth aligning to the source.) adb.anu.edu.au |
1928–1962: "another tidy-up": |
1962 had been changed without explanation to 1963 in Special:Diff/1054517782 four years earlier, quite likely a vandalism edit.
| |
| Nizaa language |
Error: The lead claims Nizaa is “the only Bantoid language that allows multiple verbal suffixes on one verb.”
Why this is wrong: Bantu languages (e.g., Swahili, Bemba, Chichewa) routinely stack multiple verbal (derivational) suffixes—this is a well-documented hallmark of Bantu morphology. And Bantu is a subbranch of Southern Bantoid, i.e., within Bantoid. So Nizaa cannot be the only Bantoid language with such stacking. roa.rutgers.edu; jzeller.de; britannica.com Suggested fix: Change to something like: “Nizaa is notable among North Bantoid/Mambiloid languages for allowing multiple verbal suffixes on a single verb,” and cite an appropriate source for the North-Bantoid/Mambiloid rarity rather than using “only Bantoid.” |
The cited source used a different definition of "Bantoid language" than English Wikipedia currently does (see article talk page for the messy details). | |
Reanne Evans, the first woman in the tournament. 2019 Champion of Champions
Error found:
In the lead, the article says Neil Robertson “required four snookers in the penultimate frame” of the final. Contemporary match reports say he needed one snooker to tie frame 18, forced a re-spotted black, won that frame, and then took the decider. snooker.org; wpbsa.com; RTÉ [42] Suggested fix: Change “required four snookers” to “required a snooker (and won the frame on a re-spotted black)”. |
The error had been introduced a week before the article was featured on the main page, in a "Copy edit" that changed "having required foul shots in the penultimate frame" to "having required four snookers in the penultimate frame", apparently misreading "foul" as "four". |
One clear error is in the lead: The officer who led the posse was Sir Richard Walsh, the (High) Sheriff of Worcestershire (the county), not the sheriff of the city of Worcester. Contemporary and reputable accounts identify him as the Sheriff/High Sheriff of Worcestershire, and even the body of the same article later uses the correct title (“Richard Walsh (Sheriff of Worcestershire) … besieged Holbeche House”). ourwarwickshire.org.uk; gutenberg.org; Wikipedia Suggested fix: change “Sheriff of Worcester” to “Sheriff of Worcestershire.” |
This error appears to have been present for almost 16 years [45]. |
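Two of the disputed claims in the table (the Licancabur and Tell es-Sakan rows) turn on straight-line distances computed from published coordinates, and such checks are easy to reproduce. The sketch below is my own illustration, not part of the experiment: it applies the standard haversine (great-circle) formula to the Licancabur and Chajnantor coordinates quoted in the table; the result should be treated as approximate.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (haversine formula)."""
    R = 6371.0  # mean Earth radius in km
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * R * asin(sqrt(a))

# Licancabur summit vs. the Chajnantor plateau site,
# using the coordinates quoted in the table above (south/west as negative):
d = haversine_km(-22.83, -67.88, -23.0058, -67.7592)
print(round(d, 1))  # about 23 km — in any case more than "less than 20 km"
```

With these inputs the distance comes out in the low twenties of kilometres, supporting the direction of ChatGPT's objection even though its own "roughly 25 km" figure is slightly higher; small coordinate differences shift the result by a kilometre or two.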
I also tried this out once on German Wikipedia, for the "article of the day" of October 19, 2025 (with the prompt translated into German, slightly modified). This uncovered a major error (two figures being off by a factor of 1000) which had been present in the article for over 11 years and survived FA review (Exzellenzkandidatur).
|
might be the beginning of the end for Wikipedia, in a since-deleted tweet that was followed by some more nuanced statements.
Discuss this story