The Signpost
Single-page Edition
WP:POST/1
20 February 2023

News and notes
Terms of Use update, Steward elections, and Wikipedia back in Pakistan
In the media
Arbitrators open case after article alleges Wikipedia "intentionally distorts" Holocaust coverage
Disinformation report
The "largest con in corporate history"?
Essay
Machine-written articles: a new challenge for Wikipedia
Tips and tricks
All about writing at DYK
Featured content
Eden, lost.
Gallery
Love is in the air
Traffic report
Superbowl? Pfft. Give me some Bollywood! Yours sincerely, the world
From the archives
5, 10, and 15 years ago: Let's (not) delete the Main Page!
Cobwebs
Editorial: The loss of the moral high ground
Humour
The RfA Candidate's Song
 



2023-02-20

Superbowl? Pfft. Give me some Bollywood! Yours sincerely, the world

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Benmite, Max BuddyRoo, and SSSB.

Come, Mr. DJ, song pon de replay (February 12 to 18)

Rank Article Views Notes/about
1 Rihanna 4,527,454 After five years, the greatest thing to ever come out of Barbados returned to the stage in a 13-minute concert in Glendale, Arizona. Sure, sometimes she took advantage of the playback rather than actually sing. But between the sequence of hits (plus some Kanye West covers) and that impeccable stage presence, with the only special guest being the one in her womb (only 9 months after she gave birth!), the world's reaction to Rihanna was "I just can't refuse it, like the way you do this, keep on rockin' to it, please don't stop the, please don't stop the music!"
2 Raquel Welch 2,729,570 An icon of beauty, immortalized in a best-selling poster wearing the fur bikini from One Million Years B.C. (you know the image; it's even been used in a prison escape), who died at 82.
3 Valentine's Day 2,707,459 The holiday for lovers, which this year could be celebrated like it was 1998, given Titanic was back in theaters!
4 Patrick Mahomes 2,318,975 Ah yes, before and after Riri's concert there were some guys passing around an oval thing. This guy led the team that won the game, the Kansas City Chiefs.
5 Ant-Man and the Wasp: Quantumania 2,030,180 Marvel Cinematic Universe: Phase Five has begun with size-shifters Scott Lang and Hope Van Dyne fighting Kang the Conqueror, bound to become the newest big bad whose defeat will require the Avengers. And it seems that, once again, every new MCU movie will inspire rifts among the fanbase, with a mixed response mostly finding that a microscopic epic was not as fun as the character's preceding solo outings, which focused on the comedy. Still, expect big box office numbers and a few more appearances on this list.
6 ASAP Rocky 1,532,735 #1's 'surprise guest' turned out to be another baby, and it's this rapper's fault.
7 ChatGPT 1,464,968 Wonder if I should make an account for this still much-discussed chatbot, if only to see what it would write on the subjects that enter the Report? (for certain it would not try to pass music lyrics off as commentary)
8 Pathaan (film) 1,455,834 India is still raving about this Bollywood thriller, now the country's fifth highest-grossing movie ever.
9 The Last of Us (TV series) 1,398,207 Knowing it was hard to compete with #1 (OK, technically #4, but more people wanted to see her), HBO kept the Sunday broadcast of this post-apocalyptic video game adaptation while releasing the episode two days earlier on streaming.
10 Travis Kelce 1,275,119 One of #4's teammates, who defeated his older brother in the big game.

Living together, growing together, just being together (February 5 to 11)

Rank Article Views Notes/about
1 Pathaan (film) 2,115,794 Along with making all the crore, Bollywood's biggest hit of the year has also managed to stay atop this list for three straight weeks. Pulling off a fourth seems improbable, though, with Super Bowl LVII right around the corner.
2 ChatGPT 1,871,580 No big new developments when it comes to this all-in-one chatbot that can do just about everything but become human – but let's not jinx it. Still, it has generated more chatter than even it could produce itself, hence its steadily rising position on this list.
3 The Last of Us (TV series) 1,658,357 HBO's TV adaptation of what is commonly believed to be one of the greatest, if not the greatest, video games of all time is now on its way to being hailed as one of the greatest series of all time, if IMDb ratings are to be believed. The week had two episodes because HBO chose to move one up to Friday rather than take its chances against the Super Bowl. I haven't gotten around to watching it myself, but it has a bittersweet gay romance and Anna Torv – two of my favorite things – so I should probably get on that.
4 Kiara Advani 1,538,943 This popular Indian actress, who's starred in two movies on the list of highest-grossing Indian films, got married to #9 this week.
5 2023 Turkey–Syria earthquake 1,492,216 Syria has already been devastated by over a decade of civil war, and things worsened when neighboring Turkey was struck by the deadliest earthquake since the 2010 Haiti earthquake, leaving widespread damage and tens of thousands of casualties in both countries, not helped by a winter storm covering the rubble in snow.
6 Pedro Pascal 1,284,539 Stars in #3 as Joel, the protagonist of both the series and its source material. And also hosted Saturday Night Live.
7 Burt Bacharach 1,272,831 Bacharach's might be a name you don’t recognize if you’re below a certain age, but rest assured that even if you haven’t heard of him, you’ve heard him – that is, you’ve heard one of the countless hit songs he wrote from the ‘60s all the way to the ‘90s that were performed countless times by countless artists before his death this week at age 94.
8 Kim Petras 1,241,199 Pop singer Petras made what I suppose you could call history at the Grammys if you consider Best Pop Duo Performance to be the most essential award of the night. Petras became the first trans person to win that award, but not, as many online might deceive you into believing, the first trans person or even trans woman to win a Grammy. That honor belongs to electronic music pioneer Wendy Carlos, who won three Grammys all the way back in 1970 for her debut album Switched-On Bach.

Petras may not have the same creds as Carlos, but she made sure to thank the trans women who came before her to make her win possible. Absent from her victory speech, though, was an acknowledgment of her continued work with and fervent support of Dr. Luke, the 2010s hitmaker accused of sexually assaulting fellow pop star Kesha. Talk about unholy.

9 Sidharth Malhotra 1,105,868 Got married to #4, whom he met while filming Shershaah in 2021 – where, in a case of life imitating art, they were playing lovers.
10 Deaths in 2023 972,921 Here's one of #7's songs you might've heard (even Spider-Man likes that one):
But there's one thing I know
The blues they send to meet me
Won't defeat me, it won't be long
Till happiness steps up to greet me

God I love the sweet taste of India (January 29 to February 4)

Rank Article Views Notes/about
1 Pathaan (film) 3,980,500 This Bollywood film continues to make waves in India, as it's already one of the highest-grossing Indian films of all time, bringing both it and #10 into the Top 25.
2 The Last of Us (TV series) 1,983,197 The HBO show based on #7 continues to make the top 2 in the report. And this week's episode revealed Ron Swanson was one of the survivors of the apocalypse!
3 ChatGPT 1,788,228 The chatbot AI is on the list again, amid an epidemic of kids using it to cheat on their schoolwork. Recently, someone made an app to detect GPT plagiarism, so we will see where it ends up in the coming weeks. Other popular uses include a site that mimics characters (and seemingly makes them gain sentience), and (failed) attempts to pass the bar exam.
4 Royal Rumble (2023) 1,773,416 The WWE PPV event made the report again this week, with positive reception, and a YouTuber involved in a crypto scam.
5 Annie Wersching 1,694,774 An actress best known for TV work like 24, Bosch and Runaways (and who was also part of #7's cast), who died at 45 of cancer.
6 Pamela Anderson 1,293,055 Last year Hulu's Pam & Tommy dramatized how this sex symbol's sex tape was leaked. Now Netflix has released Pamela: A Love Story, a documentary miniseries in which Pam recalls her life story, serving as a companion piece to her autobiography Love, Pamela.
7 The Last of Us 1,037,352 #2's source material, originally released in 2013 for the PlayStation 3.
8 Deaths in 2023 1,024,280 We were caught up and lost in all of our vices
In your pose as the dust settled around us...
9 Cindy Williams 990,960 Laverne & Shirley are reunited in the afterlife: five years after Penny Marshall, death came for Cindy Williams, who before Happy Days and that spin-off appeared in the movies American Graffiti and The Conversation.
10 List of highest-grossing Indian films 979,151 While the global list has Avatar: The Way of Water entering the top 5, this country-specific one sees #1 already in the top 10.

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
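For readers curious how such a mobile-share check might look in practice, below is a minimal sketch in Python against the public Wikimedia Pageviews REST API. It is illustrative only – the article titles, date range and exact thresholds are placeholders, and this is not the Report's actual tooling.

import requests

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
HEADERS = {"User-Agent": "mobile-share-check/0.1 (example script)"}

def views(article, access, start, end):
    # Sum daily pageviews by real users for one access method.
    url = f"{API}/en.wikipedia/{access}/user/{article}/daily/{start}/{end}"
    r = requests.get(url, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return sum(item["views"] for item in r.json()["items"])

def mobile_share(article, start="20230212", end="20230218"):
    # Fraction of user pageviews that came from the mobile site or the apps.
    desktop = views(article, "desktop", start, end)
    mobile = views(article, "mobile-web", start, end) + views(article, "mobile-app", start, end)
    return mobile / (mobile + desktop)

for title in ["Rihanna", "ChatGPT"]:
    share = mobile_share(title)
    flag = " <-- likely automated" if share < 0.05 or share > 0.95 else ""
    print(f"{title}: {share:.1%} mobile{flag}")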


2023-02-20

Arbitrators open case after article alleges Wikipedia "intentionally distorts" Holocaust coverage

Wikipedians rebut paper alleging "intentional distortion" of Holocaust history

Jan Grabowski of the University of Ottawa, one of the two authors of the paper

An essay published on 9 February 2023 in The Journal of Holocaust Research and reported on by Haaretz and Polish daily Wyborcza, as well as San Diego Jewish World and Ynet, alleges that Wikipedia engages in "intentional distortion of the history of the Holocaust". The abstract of the essay, written by Jan Grabowski of the Department of History at the University of Ottawa and Shira Klein of the Department of History at Chapman University in Orange, California, says:

This essay uncovers the systematic, intentional distortion of Holocaust history on the English-language Wikipedia, the world’s largest encyclopedia. In the last decade, a group of committed Wikipedia editors have been promoting a skewed version of history on Wikipedia, one touted by right-wing Polish nationalists, which whitewashes the role of Polish society in the Holocaust and bolsters stereotypes about Jews. Due to this group's zealous handiwork, Wikipedia's articles on the Holocaust in Poland minimize Polish antisemitism, exaggerate the Poles' role in saving Jews, insinuate that most Jews supported Communism and conspired with Communists to betray Poles (Żydokomuna or Judeo–Bolshevism), blame Jews for their own persecution, and inflate Jewish collaboration with the Nazis. To explain how distortionist editors have succeeded in imposing this narrative, despite the efforts of opposing editors to correct it, we employ an innovative methodology. We examine 25 public-facing Wikipedia articles and nearly 300 of Wikipedia’s back pages, including talk pages, noticeboards, and arbitration cases. We complement these with interviews of editors in the field and statistical data gleaned through Wikipedia's tool suites. This essay contributes to the study of Holocaust memory, revealing the digital mechanisms by which ideological zeal, prejudice, and bias trump reason and historical accuracy. More broadly, we break new ground in the field of the digital humanities, modelling an in-depth examination of how Wikipedia editors negotiate and manufacture information for the rest of the world to consume.

On 13 February 2023, Wikipedia's primary disciplinary body, the Arbitration Committee, took the unusual step of initiating a case request sua sponte in response to the essay, "invoking its jurisdiction over all matters previously heard and exercising its authority to revisit any proceeding at any time at its sole discretion." The topic area – including many of the edits and behaviours discussed by Grabowski and Klein – has been the subject of multiple arbitration proceedings before, from the 2009 Eastern European mailing list case to the 2019 Antisemitism in Poland case.

On 15 February 2023, Wyborcza (the Polish newspaper that carried Grabowski and Klein's summary of their essay) published a rebuttal by Piotr Konieczny of the Department of Media and Social Informatics at Hanyang University, one of the Wikipedians (User:Piotrus) named by Grabowski and Klein. Piotr said the essay contained many assertions of fact that were not borne out by edit histories recorded on Wikipedia, as well as instances of selective quoting. For those who don't subscribe to Wyborcza – the paper is paywalled – the text of the rebuttal is available here. A longer, English-language response by Piotr raising some of the same issues is here. Volunteer Marek, another editor named in the essay, has also published a multi-part response in English on his Substack.

Watch out for an independent review of the paper in the upcoming issue of the Signpost's monthly "Recent research" section. In the meantime, see also previous Signpost coverage of similar complaints raised in 2019. – AK

"Why we should be wary of Wikipedia"

Investigative journalist Russ Baker

This is the premise of an article series investigative journalist Russ Baker kicked off on 6 February 2023 with a piece on his whowhatwhy.org website.

Baker is a veteran reporter who has written for top publications like The New Yorker and The Washington Post. He has tussled with the Church of Scientology. In 2005, he won the Deadline Club award for his exclusive reporting on George W. Bush's military record. Baker was among the first to cast doubt on Colin Powell's now-infamous presentation on Iraq at the United Nations – at the time a very unpopular stance – and among the first to make Americans aware of the impending genocide in Rwanda. But presently, he is concerned about Wikipedia's biographies.

Baker notes that discussions of bias on Wikipedia have generally focused on its alleged "white, American" bias as well as its alleged "leftist" bias:

But none of these critiques really get at what I'm talking about: how professional or amateur "hit men" can infiltrate Wikipedia and go after individuals and ruin them in the public eye.

Some years ago, when I began researching this, I found very little online about this phenomenon, despite the fact that I knew a fair number of individuals who had been victims of the practice. Now that I look again, I still see no sign that this problem is being addressed or even vestigially discussed.

What this means is, nobody is minding the store to make sure that we don’t end up in some type of artificial informational construct that edits the facts about powerful actors and institutions to conform to a subjective agenda instead of reality.

Baker feels that anti-establishment actors like himself are particularly likely to suffer, given that Wikipedia uses the mainstream media as its arbiter, and these media are in many ways an integral part of the establishment.

The fact is, anyone who is out there "making trouble" for the system doesn't stand a chance. Why? Because it would take a relentless, inhuman vigilance to battle those persistent and tidal forces bent on controlling the narrative … And most of us don't have the time, expertise, or energy to do that.

Also, because, for someone to make their case, they have to prove that good things have been said about them … by the establishment.

In other words, if The Washington Post likes you, then you appear in a positive light on Wikipedia. If it doesn't, then what the public sees on the platform is a person or entity it should apparently not like or trust.

In the second part (published a week later) Baker looks at his own Wikipedia biography, which he says is highly selective in a way that is unfavourable to him.

And yes, I received several scathing reviews from establishment organs, but my Wikipedia page never quoted any of the good ones I received — and from prominent people. In fact, they implied there were none. Here’s what you won’t find on Wikipedia:

  • One of the most important books of the past ten years. — Gore Vidal
  • An investigative gem filled with juicy revelations. — Sydney Schanberg, Pulitzer Prize winner, The New York Times
  • A tour de force… Family of Secrets has made me rethink even those events I witnessed with my own eyes. — Dan Rather
  • Russ Baker's work stands out for its fierce independence, fact-based reporting, and concern for what matters most to our democracy… A lot of us look to Russ to tell us what we didn't know. — Bill Moyers
  • This is the book people will be mining for years to come. — David Margolick, Newsweek and Vanity Fair

There is no reason to believe these quotes aren't genuine – Schanberg, for example, joined Baker for readings of the book in question. The problem is that these are "Praise for ..." quotes from a marketing blurb rather than quotes taken from published reviews. Wikipedians would generally avoid citing marketing materials, and look for independently published reviews in the press. So, is Baker merely whining?

Well, no. Reading the fairly sympathetic Boston magazine article quoted in Baker's biography, it's hard to escape the notion that editors selected quotes to construct a narrative completely at odds with the overall tenor of the cited article. The Boston article concludes by asking, in light of important stories broken by Baker in the past, "which is more dangerous, listening to Russ Baker, or ignoring him?" – AK

"Share profits with authors!"

OpenAI, the creator of the generative pre-trained transformer (of which ChatGPT is one), reportedly pays its clickworkers in India and Africa less than $2 an hour

This is the provocative title of an article in Germany's Der Tagesspiegel newspaper, opining that generative models like ChatGPT that create text, images and music are committing "data theft" and leaving creators "naked". The article discusses the unsung contributions of the many:

Wikipedia authors, book authors, illustrators, editors, photographers. Their work creates the raw materials that then enter an industrial process: the training data used to feed the AI.

Tech companies like Google and Amazon have used free and open internet content like the English-language Wikipedia as a quarry for years, without giving the authors or organisations a share. The paltry sums that Google and Amazon donate to the Wikimedia Foundation are dwarfed by the economic benefit these corporations derive from the online encyclopaedia.

Now, it has always been an inherent flaw of the commons idea that profit-oriented actors are as welcome to benefit from non-profit work as the general public. The "tragedy of the commons" dilemma is well known from economics. One cannot forbid Amazon from training its voice assistant Alexa with Wikipedia texts – or Wikipedia would have to jettison its foundational principles overnight.

But the relentlessness with which tech companies graze the digital commons and use it to feed their own business models raises the question under what circumstances commons will continue to be produced in the future. Who will maintain Wikipedia articles if they are used for commercialised search queries or answer modules? Who will still write books if language models glue together set pieces into third-rate novels and publishers use them to fill their portfolios?

Der Tagesspiegel proposes a compensation system that would give authors an appropriate share of AI systems' profits, citing Germany's long-established VG Wort (cf. Authors' Licensing and Collecting Society) as an example. This has authors registering and then being routinely compensated with fees collected from re-users of their works, according to a complex allocation formula.

After all, no one would have a street artist paint their portrait and then, after taking a digital picture of it and editing it with an AI-based Instafilter, tell the painter, without paying, "Thanks a lot, that was fun!" Respect for art is also expressed through decent payment.

(VG Wort has previously already indicated that Wikipedia would be eligible for payments under its existing system. However, in a 2011 poll the German Wikipedia community overwhelmingly voted against participating in the scheme as a website, although some individual Wikipedia editors were collecting payments separately for "their" articles, amounting to 300 Euros in one case. See previous Signpost coverage: "German Wikipedians reject author payments scheme".)

The Tagesspiegel article ends by noting that OpenAI, the developer of ChatGPT, employs thousands of clickworkers in Uganda, Kenya and India, who label potentially offensive text (including violent or sexual text) to help train the models. In Kenya, where the average wage is about $18 per day, these workers are paid less than $2 an hour. – AK

Wikipedia blocked and unblocked in Pakistan

As discussed in this issue's News and notes, Wikipedia is back in Pakistan after a fairly brief block. The Prime Minister found that "the unintended consequences of this blanket ban outweigh its benefits", and formed a new committee to look at technical measures for selectively blocking specific objectionable content. However, back in 2015 Wikipedia switched to HTTPS, specifically to make it more difficult for ISPs and other men-in-the-middle to know which part of Wikipedia you're browsing. Numerous reports were made on the subject, including Dawn (again here), Bloomberg News, ABC News, Al Jazeera, and NPR. – AC

Wikimedia Foundation vs. NSA

The Washington Examiner reports that next week, the United States Supreme Court justices will decide whether to take up the longstanding case between the Wikimedia Foundation and the National Security Agency. "To this day, no public court has determined whether upstream surveillance complies with the Constitution. If the government can obtain dismissal here, it will have every incentive to make overstated or exaggerated claims of secrecy to close the courthouse doors on suits like Wikimedia's – suits seeking accountability for government overreach or abuse in the name of national security," the article's author, Bob Goodlatte, states. – AK

UPDATE: The Supreme Court denied the Wikimedia Foundation's petition on 21 February 2023, marking the end of the case. – AK

In brief

Former Wikimedia CEO Katherine Maher
  • "Major search engines to alter results with AI": Former Wikimedia CEO Katherine Maher spoke in a five-minute interview to ABC News (Australia) about the likely impact of tools like GPT on search engines and Wikipedia.
  • Google Bard AI trained in part on Wikipedia: Search Engine Journal reports that Google's Bard AI is 12.5% based on English Wikipedia. Another 12.5% comes from Google's C4 Dataset ("Colossal Clean Crawled Corpus"), which apparently also includes Wikipedia as one of its main sources. The other 75% come "from the internet" but their precise origins are "murky", the article says – though the author makes some educated guesses.
  • Santos is funny: especially the disgraced US parliamentarian's official web link to Wikipedia, according to the Indiana Daily Student. The link just goes to the 118th United States Congress, which only mentions that Santos represents New York's 3rd district. The honorable representative might also want to link here or to previous coverage in The Signpost. Where's Randy Rainbow when we need him?
  • Wikipedia and its "outsize influence on judicial reasoning": Legal Futures (UK) says that the "widespread use of online source Wikipedia by senior judges could mean fake information spreading, leading to bad judgments, an update of research first revealed last year has warned".
  • Political "scrubbing": In a "flashback" article published on the occasion of the appointment of Jeff Zients as the White House's new chief of staff, Fox News recaps a 2020 report by Politico about how consulting firm Saguaro Strategies had "scrubbed" politically damaging information from the Wikipedia page about him (see also "In the media" from December 28, 2020).
  • Jimmy Wales comments on Online Safety Bill: IT Pro quotes comments by Jimmy Wales on the Online Safety Bill, a proposed UK law that was covered extensively in our previous issue (see Special report).
  • You got something against Bigfoot?: Slate says you do, you cryptid-phobe.
  • American Physical Society partnership: APS News says Wikipedia Has a Problem That Physicists Can Help Solve – a gender gap problem addressed through the Wiki Scientist Program. More about the program is at the WikiEdu blog.
  • Cry foul: A "former controversial umpire" for Major League Baseball may be editing his own Wikipedia page, according to The Sporting News [1]. The Signpost can confirm that a user account has been blocked and talkpage access revoked following a legal threat on the account's talkpage.



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next week's edition in the Newsroom or leave a tip on the suggestions page.




2023-02-20

Machine-written articles: a new challenge for Wikipedia


Interview with ChatGPT

Artificial Intelligence.

This interaction with ChatGPT was recorded on January 29, 2023

Author: Are any Wikipedia articles written by ChatGPT?

ChatGPT: No, ChatGPT is not capable of writing articles on Wikipedia. It can only generate text based on the input it receives and its training data.

Author: What should I do with a Wikipedia article someone wrote using ChatGPT?

ChatGPT: Articles generated by language models like ChatGPT are not suitable for use on Wikipedia as they are not written by humans and lack context and accuracy. If you have come across such an article on Wikipedia, you should report it to the Wikipedia community for review and potential removal, as Wikipedia has strict policies regarding the sources and reliability of the information on the platform.

My Experience

Near the end of January, I ran across some odd-sounding articles in draftspace. All the articles had the same structure and layout, even though they were about different things. The articles also had impeccable English, though the author's comments did not. Suspecting something was up, I found an artificial intelligence (AI) article detector, which pronounced all these articles machine-generated with greater than 99.9% confidence. Not knowing what to do, I went to the calm environs of the Administrators' noticeboard for incidents. That discussion is worth reading, as it shows a bunch of knowledgeable Wikipedians struggling and debating how to deal with these articles. Were they hoaxes? Copyright violations? Good for publication? How does generated text fit within Wikipedia's requirements for articles?

Policies?

As it turns out, an effort was already underway to develop a policy regarding articles written by ChatGPT and its relatives: Wikipedia:Large language models. There, and in its associated talk page, you can see the reasoning related to these articles. In short, AI-generated text is not reliably correct, may not have a neutral point of view, needs verification, can occasionally violate copyright, and can downright lie. This is all in its inherent nature. It is fed information from a large corpus of text, much of which would not meet Wikipedia's sourcing and neutrality criteria, and it synthesizes its output without regard as to whether the text maps to a real source. To quote the ChatGPT general FAQ: "These models were trained on vast amounts of data from the internet written by humans, including conversations, so the responses it provides may sound human-like. It is important to keep in mind that this is a direct result of the system's design (i.e. maximizing the similarity between outputs and the dataset the models were trained on) and that such outputs may be inaccurate, untruthful, and otherwise misleading at times."

Finding More

I then started going through recent drafts and new articles looking for text reminiscent of the text I had seen in the first articles I identified. It didn't take long to find more. The current limiting factor is that I don't have the tools for rapidly reviewing Drafts as I do for new articles as a new page patroller, and I don't have the resources (including time and patience) needed to do this consistently and daily.

I've put the {{AI-generated}} template on those articles and had not one author disagree with the finding. You can search for the template with hastemplate:AI-generated in the Wikipedia search box. Expand the search to Drafts to see the drafts so marked. About sixty articles have been tagged. Several of the previously tagged articles have either been deleted or de-tagged once the generated text was replaced with real text, but many remain. You can then read those remaining examples and get a feel for AI-generated articles.

Editor's note: If you are reading this in the future and none of these are available, some representative drafts have been retained more permanently at User:JPxG/LLM dungeon.
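If you would rather query programmatically than through the search box, the same hastemplate: search can be run against the MediaWiki API. The short Python sketch below is purely illustrative (namespace 0 is articles, 118 is drafts; the result limit is arbitrary):

import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": 'hastemplate:"AI-generated"',
    "srnamespace": "0|118",  # 0 = articles, 118 = drafts
    "srlimit": 50,
    "format": "json",
}
resp = requests.get(API, params=params, headers={"User-Agent": "ai-tag-search/0.1 (example)"}, timeout=30)
resp.raise_for_status()
for hit in resp.json()["query"]["search"]:
    print(hit["title"])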

I have been conservative in identifying articles: only testing articles that had a similar appearance, and only tagging when the detector reported greater than 99% confidence that the text was machine-generated. I am sure I've missed many more articles. I was looking for typical phrases like "in conclusion" starting the last paragraph, use of the article's title repetitively without abbreviation or variation, and consistent sentence and paragraph length. A more sophisticated AI user would use better prompts to the AI software and produce harder-to-detect output; the ones I've found were typically produced by asking ChatGPT something like "Write a Wikipedia article about XXXX."
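As a rough illustration only – this is not a script I actually run, and the thresholds are arbitrary – those tells could be screened for automatically along these lines:

import re
from statistics import mean, pstdev

def screen_for_tells(title, text):
    # Flags the heuristics described above: a summary-style closing paragraph,
    # heavy repetition of the title, and unusually uniform sentence lengths.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    flags = []
    if paragraphs and paragraphs[-1].lower().startswith(("in conclusion", "in summary", "overall")):
        flags.append("summary-style closing paragraph")
    if paragraphs and text.lower().count(title.lower()) >= len(paragraphs):
        flags.append("title repeated at least once per paragraph")
    if len(lengths) > 3 and pstdev(lengths) < 0.25 * mean(lengths):
        flags.append("very uniform sentence lengths")
    return flags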

I test for articles typically using https://openai-openai-detector.hf.space/, though other sites exist, including https://detector.dng.ai/, https://gptzero.me/, https://platform.openai.com/ai-text-classifier, and https://contentatscale.ai/ai-content-detector/. Before testing, I remove headings, inline references, and other text and markup that appears to have been added after text generation, as those can confuse the analyzer.
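A minimal sketch of that cleanup step for raw wikitext is below; the regular expressions are only illustrative and cruder than a proper parser such as mwparserfromhell, but they show the idea of stripping everything that was added after generation. The cleaned text can then be pasted into any of the detector sites listed above.

import re

def strip_wikitext(wikitext):
    # Remove the markup that is typically added after generation, leaving bare prose.
    text = re.sub(r"<ref[^>/]*/>", "", wikitext)                   # self-closing references
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)    # inline references
    text = re.sub(r"^=+[^=]+=+\s*$", "", text, flags=re.M)         # section headings
    text = re.sub(r"\{\{.*?\}\}", "", text, flags=re.S)            # templates (non-nested)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[link|label]] -> label
    text = re.sub(r"'{2,}", "", text)                              # bold/italic markup
    return re.sub(r"\n{3,}", "\n\n", text).strip()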

Problem?

Is this a real problem? I believe it is. Many of the articles sound reasonable but may have serious errors. The conversation at the administrators' noticeboard includes an analysis of a generated article on geckos. The article contains a lot of specific plausible-sounding information (e.g., size range), much of which is wrong.

ChatGPT will even provide references if requested, but those references are synthesized from its input text and, while sounding correct, usually do not point to real articles. For example, when I asked it for references on an article ChatGPT wrote for me on Sabethes cyaneus (a mosquito), one of the references was "Sabethes cyaneus" (Encyclopedia of Life): https://eol.org/pages/133674. That page does exist, but is for Clavaria flavopurpurea, a fungus. Another reference it provided also had a link, but the link pointed to an article about a mink, and the reference itself was fictional.

There are efforts to improve these programs, and I am sure that eventually they will be successful. Currently, however, those efforts fall well short. One such effort, "Elicit" (elicit.org), only searches research papers and summarizes them. I asked Elicit "What are the characteristics of Sabethes cyaneus?" It summarized one reference as "Sabethes cyaneus is a species of frog."

These false but plausible answers are an inherent property of the current models; they are called "hallucinations".

Even once these programs improve, there will still be significant concerns limiting the direct use of generated text, such as:

  1. Did the program only use sources acceptable to Wikipedia?
  2. Is the information up to date?
  3. Can the program identify correct references? (One of the biggest hurdles, as the inherent nature of current models is that the output is not linked to a specific source.)
  4. Will the program avoid hallucinating?

However, programs like ChatGPT are great for generating ideas for articles and helping to "mock up" a good article, if the user takes a sophisticated view of the output, using it more for inspiration than for a source of truth. The proposed policy Wikipedia:Large language models has additional information on how these programs can be used to improve Wikipedia.



2023-02-20

Terms of Use update, Steward elections, and Wikipedia back in Pakistan

Terms of Use update

The logo of the Wikimedia Foundation

Consultations that should lead to an update of the Wikimedia Foundation's Terms of Use will begin as early as this Tuesday, February 21, according to WMF Legal. The discussions will take place on Meta-Wiki.

The updates are necessary in order to:

Other possible changes include strengthening enforcement of the requirement that paid editors must declare their employers, clients, and affiliations. S

Results of the Universal Code of Conduct Enforcement Guidelines vote

Wikimedia Foundation Board Member Shani Evenstein announced the results of the vote on the Wikimedia-l mailing list on 13 February 2023:

Today the results of the Universal Code of Conduct Enforcement Guidelines vote were tallied. We are pleased to report the results show that the Enforcement Guidelines are strongly supported by the community, with 76% of participants voting in support of the Enforcement Guidelines.

3,097 voters from 146 Wikimedia communities participated. Statistics for the vote are available. A more detailed summary of comments submitted during the vote will be published soon.

Next, the results and comments collected during this vote will be submitted to the Board of Trustees for their review. It is expected that the Board of Trustees review process will complete in March 2023, at which time another update will be published. AK

Steward elections and confirmations underway

Stewards are a global group of users with complete access to the wiki interface on all public Wikimedia wikis

The Steward elections are underway. New candidates for Stewardship are:

The Steward confirmations – an annual performance evaluation by the community that determines whether or not existing Stewards will retain their status for another year – are also currently underway.

Voting for the election and confirmation processes will remain open until 26 February 2023, 21:00 (UTC). AK

Prime Minister of Pakistan intervenes to unblock Wikipedia

Man in his seventies wearing a great jacket, blue shirt and dark tie
Shehbaz Sharif, the Prime Minister of Pakistan, decided that Pakistan had thrown out the baby with the bathwater and ordered access to Wikipedia restored

Now you see it, now you don't – Wikipedia had hardly been blocked by Pakistan before it was unblocked.

Shehbaz Sharif, the Prime Minister of Pakistan, stepped in and convened a committee composed of the Minister for Law and Justice, the Minister for Economic Affairs and Political Affairs and the Minister for Information and Broadcasting to examine the matter.

The committee took the view that –

Wikipedia was a useful site/portal which supported dissemination of knowledge and information for the general public, students and the academia. Blocking the site in its entirety was not a suitable measure to restrict access to some objectionable contents/sacrilegious matter on it. The unintended consequences of this blanket ban, therefore, outweigh its benefits.

The government subsequently barred the Pakistan Telecommunication Authority (PTA) from blocking any website without first consulting the Ministry of Information Technology and Telecommunication.

Wikimedia Foundation General Counsel Stephen LaPorte said on the Wikimedia-l mailing list:

We received news yesterday that the Pakistan Telecommunications Agency was directed to restore access to Wikimedia projects, including Wikipedia, in Pakistan. Our internal traffic reports confirm that the block has been lifted and we've updated our public statement.

We're happy that the people of Pakistan can continue to use Wikipedia to access knowledge and share their experiences and culture with the world. These situations are a reminder of the importance of the work of our movement and the vital role our communities play in building the largest collection of open knowledge in history. It also underscores the need to protect people's right to access free knowledge and participate in contributing to its growth globally. Thank you for your support. We'll continue to keep you updated if there are any new developments.

AK, R

For the media coverage of this event, see this issue's in the media.

AI takeoff begins: the first wave of bullshit

A bull with broad horns standing in a field, regarding the photographer.
Another BS generator without a sense of what is real and what is fake

Last issue, we reported on AI-generated articles on Wikipedia. It was a bit of fun. Find the AI-generated article! The impostor is sus!

If only it had remained that way: English Wikipedia editor Rory Jaffe (rsjaffe) has found a number of them, including Draft:Cow Farts – not actual bull shit, but pretty close. Here's what he says about his discovery technique.

I've been looking for "tells" before testing [with a classifier tool that can detect AI generated text with some accuracy]. The text is more "lifeless" than human text typically is, tends to have uniform length sentences, and may have a paragraph at the end that is a summary paragraph. The text is also unlikely to have in-line references. This doesn't catch all of them but tends to be a high-likelihood way of finding them. Some of these AI-generated articles also have AI-generated references. The references are almost always fake: the AI confabulates reasonable-sounding references!

Articles or drafts tagged by Jaffe and reviewed by The Signpost were created as early as 17 December 2022 (Draft:Brisbane Lord Mayor YAC). The topics tagged by Jaffe and others range from the whimsical and neologism-ic (Draft:Twinkanic Awareness, Draft:Sharks should be saved from humans) to the plausibly- and obviously-promotional, and appear to be created by many editors, including single-purpose accounts that may be using them for promotion. Take Draft:Dachshund harness, "an essential accessory for owners of this unique breed of dog", which has a list of references at the end related to shopping for such a device. Given affiliate marketing exists, this could be promotional. Or it could be purely accidental. We don't know. Some of the rest may be deleted by the time you read this – a few examples are retained permanently at User:JPxG/LLM dungeon.

The Signpost asked Jaffe "Do you have a personal reaction to the quality of the AI generated articles? Briefly, are you for or against this method of creation?" He told us this:

Strongly against. These language generators are good at stringing together reasonable-sounding text, but:

1. Don't rely solely on what Wikipedia recognizes as reliable sources.

2. Don't have a sense of what is real and what is fake.

3. Will fabricate information to fill in gaps.

4. Cannot identify where any specific bit of "information" came from.

So the text can fail WP:RS and WP:V, yet may look very convincing.

However, they're good at helping to break "writer's block", by giving an example of how to write something. So I see having some text generated by ChatGPT would be helpful as a writing prompt, but without using any of the generated text directly—just looking at it and then writing a well-researched article similar to the prompt.

More on Jaffe's experience and conclusions on this topic can be found at the Essay in this issue.

We asked ChatGPT for a commentary as well. It said:

It's interesting to see that people are now using AI to generate articles on Wikipedia, but it's also concerning as this could lead to a proliferation of inaccurate or fake information on the platform. It's good to see that editors like Rory Jaffe are using techniques to identify AI-generated text, but it's clear that the AI-generated articles are becoming more sophisticated in their structure and references, making it harder to detect them.

It's important for Wikipedia to maintain its high standards of accuracy and reliability, so it's crucial that the platform continues to monitor and regulate the use of AI-generated content. While AI can be a valuable tool for assisting with research and writing, it should never be a substitute for human expertise and critical thinking.

B, AC

Wikimedia Enterprise financial report

Wikimedia Enterprise logo, a stylised blue/violet/red squirrel looking right, with the words WIKIMEDIA ENTERPRISE written below in black
The Wikimedia Enterprise project sells API services to Google and provides them free of charge to the Internet Archive. The team has not released information on any additional customers acquired since those initial arrangements were announced.

Wikimedia Enterprise has published its first detailed financial report, complete with a summary of product updates.

Revenue in the 2022 calendar year (from subscription and professional services) totaled $3.12 million, vs. expenses of $3.18 million ($1.06 million in cost of services and $2.12 million in operating expenses), for a net loss of $60,000.

The Enterprise team points out that the present report for the 2022 calendar year should be considered a "beta" edition. The Wikimedia Foundation normally operates on a financial year starting in July and finishing the following June. But as January 1, 2022 marked the official start of commercial operation of the Enterprise project, this first report covers the 2022 calendar year. For future financial reports the Enterprise team intends to align with the normal financial reporting schedule of the rest of the Wikimedia Foundation. The next report will be published in late 2023.

The report adds that:

As the LLC is wholly owned by the Wikimedia Foundation, all of the financial information presented here will also be included within the Wikimedia Foundation's audited financial statements and will be in the next Wikimedia Foundation "Form 990" filing as it relates to fiscal year 2021–2022, and future years.

The Enterprise team says it is very happy with its first year in business and looks forward to growth and profitability this year:

In 2023, our second year of operations, with the addition of new customers and also new features, both revenue and expenses are expected to increase – but revenue growth is expected to outpace expenses. By comparison to other startup commercial API projects, to reach this stage within one year of operations is extremely rapid progress. The governance of these and all other commercial customer relationships is consistent with how the Wikimedia Foundation treats large corporate donations.

No information has been released on how many paying and non-paying customers Wikimedia Enterprise has at the moment. Corresponding inquiries on Meta-Wiki have been answered as follows:

As stated in the original press release, Google and the Internet Archive are indeed the first to receive paid and free access (respectively) but we have not publicized the subsequent customers (paid or free) who have signed-up to the service. ... Maintaining a public and comprehensive list of paying and free/trial customers would look like advertising/promotion of those customers, and also introduce a new privacy (and potentially security) problem: i.e. in the same way that it would be inappropriate to make a public list of "all people who have used the Wikidata Query Service this month" (for example) – it goes against our privacy culture. Nonetheless, we do intend to be making "use case" blog posts – which will describe how some users (either general categories or individual cases with their permission) are benefiting from the service in the real-world.

Video recording of the 10 February Zoom meeting

The Enterprise API features are designed with large commercial reusers in mind – primarily search engines and voice assistants, although education (question-and-answer modules), finance and the fast-growing AI sector represent additional market opportunities the team may explore.

But other kinds of reusers, including individual volunteer Wikimedians, can also benefit. There are several ways to access the datasets at no cost – some requiring no registration at all – including a free Wikimedia Enterprise account available via the project's homepage. Ongoing updates for the community can be found on the project's MediaWiki page.

The publication of the financial report was followed by a public Zoom meeting on 10 February, a recording of which is available. Click the picture on the right or go to the Wikimedia Enterprise talk page on Meta-Wiki, which also includes a list of questions discussed during the meeting and their approximate timecodes. The file description in Commons even includes an automated transcript. – AK

Tides Advocacy

In response to multiple inquiries, the Wikimedia Foundation has posted a breakdown of how and when Tides Advocacy spent the $4.223 million Annual Plan Grant portion of the $8.723 million the organisation received from the WMF in the 2019–2020 financial year. – AK

Brief notes

Episode 8 of Wikimove's podcast series covers movement strategy and features three guests from Africa.
  • WIKIMOVE: The WIKIMOVE podcast project, hosted by Wikimedia Deutschland's Nicole Ebber and Nikki Zeuner, has published Episode 8 of its English-language podcast series. This episode's topic is "Movement Strategy" – how the strategy came about, what the envisioned future looks like, and how this relates to knowledge equity and global governance. Guests this time round are Sharon Naa Deedei Tagoe, a volunteer at the Open Foundation West Africa who joined the Wikimedia movement last year, Likambo Isaac Kangu, a South Sudanese refugee living in Uganda who is the diaspora coordinator for the User Group South Sudan in the refugee settlements in Uganda, and Ruby Damenshie-Brown, Senior Program Officer at the Open Foundation West Africa, who's been a Wikimedian since 2019. The episode is also available as a YouTube video with subtitles and as a transcript. A newsletter of the Open Foundation West Africa is available here. – AK
  • This Month in GLAM: The January 2023 Newsletter is now available to read on Wikimedia Outreach.
  • Ombuds Commission: The composition of the 10-member Ombuds Commission has been announced.
  • AffCom liaison: The Wikimedia Foundation Board of Trustees' Community Affairs Committee elected Lorenzo Losa as an additional Affiliations Committee liaison for the Board.
  • Speedy deletion criterion F10 removed: Following a Request for Comment, speedy deletion criterion F10 (Useless non-media files) was deprecated in favour of WP:PROD and marked obsolete.
  • Global bans: One Japanese newbie account was globally banned after making legal threats on the Japanese Wikipedia.
  • Milestones: The following Wikimedia projects reached milestones this fortnight: Ladin Wikipedia reached 100,000 articles, Inari Sami Wikipedia reached 5,000 articles, Luganda Wikipedia reached 2,000 articles, and Chichewa Wikipedia reached 1,000 articles.
  • Articles for Improvement: This week's Article for Improvement is Centrism. Please be bold in helping to improve this article! Next week's article (beginning 20 February 2023) will be Textbook.



2023-02-20

The RfA Candidate's Song

Back by popular demand: this little ditty, taken from Wikipedia:Songs about Wikipedia/The RfA Candidate's Song, was recorded with considerable panache and musicianship back in 2006 – which means there's probably a whole new generation of editors who've never heard it. Enjoy! Lyrics below.


RfA: A self-nomination (parody) (sung to the tune of the Major-General's Song)


I am the very model of a modern Wikipedian:
I've knowledge of Greek drama, both comedic and tragedian;
I know historic battlefields from Ancient Rome to World War Two,
and neutral is the mode in which I always keep my point of view!
I'm very well acquainted, too, with Polynese ethnology.
I understand Samoan lore and Mā-ori mythology.
From mountain peaks of Suriname to passes of the Great Divide,
I see to it that everything is quite correctly wikified!
He sees to it that everything is quite correctly wikified!
He sees to it that everything is quite correctly wikified!
He sees to it that everything is quite correctly wiki-wikified!
Some people think my entries are little more than trivia —
that episode of Barney or that woodwind of Bolivia
and yet, in matters great and small, high and low and median,
I am the very model of a modern Wikipedian.
In short, in matters great and small, high and low and median,
He is the very model of a modern Wikipedian!
Black and white picture of a general with a mop
I must become a wiki admin-candidate
I SIGN all of my comments; and give summaries for each edit;
I never BITE; I never POINT; I follow wiki etiquette;
I assume good faith when wiki-flamed, and wear a thickened wiki-skin.
For I am the very model of a proper wiki-citizen:
I'm often found at village pumps discussing rules and policy,
and judging notability while over at the AfD,
In matters wiki-sophic, I'm no exopede inclusionist,
but an incremento-eventuo-darwikian-delusionist!
an incremento-eventuo-darwikian-delusionist!
an incremento-eventuo-darwikian-delusionist!
an incremento-eventuo-darwikian-delusion-usionist!
I am the very model of a modern Wikipedian.
I wrote this wiki-parody to ease some wiki-tedium.
When dealing with my colleagues I'm a friendly wiki-editor,
but the sight of trollishness makes me a vandal-stalking predator;
When dealing with his colleagues he's a friendly wiki-editor,
but when provoked he can become a vandal-stalking predator!
Reverts and warnings I have made while on the recent change patrol,
I play some cat-and-mouse with them, but now I want to block-a-mole,
and so at last I must become a wiki admin-candidate:
since I'm my only fan, I have no choice but to self-nominate.
As prep for this I've learned about all bans and blocks and redirects,
now I can tell you anything on SEMIs and full page protects,
and when you hear all that I know of Wikipede sysop-ery,
you’ll say no better candidate has ever sought the moppery
you’ll say no better candidate has ever sought the moppery.
you’ll say no better candidate has ever sought the moppery.
you’ll say no better candidate has ever sought the moppa-moppery.
Now the adminship, as Jimbo says, is really no big deal at all:
many fail and so might I (my role in portal talk's been small).
But still in matters great and small, high and low and median
I am the very model of a modern Wikipedian.
In fact, in matters great and small, high and low and median,
He is the very model of a modern Wikipedian!


If articles have been updated, you may need to refresh the single-page edition.

















Wikipedia:Wikipedia Signpost/Single/2023-02-20