The Signpost
Single-page Edition
WP:POST/1
28 November 2022

News and notes
English Wikipedia editors: "We don't need no stinking banners"
In the media
"The most beautiful story on the Internet"
Interview
Lisa Seitz-Gruwell on WMF fundraising in the wake of big banner ad RfC
Opinion
Privacy on Wikipedia in the cyberpunk future
Disinformation report
Missed and Dissed
Op-Ed
Diminishing returns for article quality
Book review
Writing the Revolution
Technology report
Galactic dreams, encyclopedic reality
Essay
The Six Million FP Man
Tips and tricks
(Wiki)break stuff
Recent research
Study deems COVID-19 editors smart and cool, questions of clarity and utility for WMF's proposed "Knowledge Integrity Risk Observatory"
Featured content
A great month for featured articles
Obituary
A tribute to Michael Gäbler
Concept
The relevance of legal certainty to the English Wikipedia
Traffic report
Musical deaths, murders, Princess Di's nominative determinism, and sports
From the archives
Five, ten, and fifteen years ago
CommonsComix
Joker's trick
 

Wikipedia:Wikipedia Signpost/2022-11-28/From the editors

2022-11-28

Musical deaths, murders, Princess Di's nominative determinism, and sports

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, YttriumShrew, SSSB and an anonymous user.

Rain drop, drop top (October 30 to November 5)

Rank Article Views Notes/about
1 Takeoff (rapper) 3,156,575 Another addition to the list of murdered hip hop musicians, as one third of Migos was shot dead at age 28 outside a Houston bowling alley. (Police reports indicate Takeoff was not the intended victim of the shot that killed him.)
2 ICC Men's T20 World Cup 2,642,554 Cricket may not make sense to some people, but its popularity is evident from the continued high views of this page due to #4.
3 Aaron Carter 1,820,734 Another death of a relatively young musician, namely, the brother of Backstreet Boy Nick Carter. Aaron had followed him as a singer. He was found drowned in his bathtub at the age of 34.
4 2022 ICC Men's T20 World Cup 1,465,153 The pool stages for this tournament are now done and dusted. In Group 1, New Zealand topped the table, becoming the first team to qualify for the semifinals, and were followed by England, who overtook Australia after defeating Sri Lanka in the group's final match. In a crowded Group 2, India qualified after South Africa were knocked out by the Netherlands, and were joined by Pakistan after their defeat of Bangladesh. The groups have been marked by several upsets courtesy of Ireland, Zimbabwe, and the Netherlands.
5 Halloween 1,428,521 SPOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOKY!
6 Charles Cullen 1,372,687 The latest true crime flick to hit Netflix is The Good Nurse, a well-reviewed drama starring Eddie Redmayne (pictured) as this murderous nurse, and Jessica Chastain as the colleague who turned him in.
7 Migos 1,312,960 #1's three-man rap group, who in 2017 scored a chart-topper with "Bad and Boujee".
8 Jeffrey Dahmer 1,297,686 Mama they're in love with a criminal
And this kind of love is not logical...
9 Quavo 1,215,376 Another member of Migos, #7. He and his nephew Takeoff, #1, had released an album less than a month ago. Quavo witnessed Takeoff's fatal shooting.
10 Elon Musk 1,112,744 Musk's takeover of Twitter keeps him in the news. In an effort to make the company profitable, he conducted mass layoffs, and users have been asked to pay $8 a month for premium services.

Is this the feeling I need to walk with? (November 6 to 12)

Rank Article Views Notes/about
1 Aaron Carter 4,200,862 Aaron Carter started singing as a child; at 10 he was already releasing an album thanks to his older brother (#10) being a Backstreet Boy. Carter would serve as an opening act for both them and Britney Spears. In the last few years he followed the same derailed path as many former child stars (the "Personal Life" section here includes subsections on Legal issues, Health and Controversies, along with the fact he opened an OnlyFans account to sell naked pictures!). He died at 34, being found lifeless in a bathtub. He left behind one son and six studio albums. (The last one was originally scheduled for release on his birthday in December. At his death, the producers suddenly released it – supposedly as an homage – without first asking permission from Carter's management.)
2 ICC Men's T20 World Cup 3,087,853 This week saw the semi-finals (although not technically the final, which I am watching, and which I am holding back on commenting on until next week). The first saw Pakistan comfortably defeat New Zealand, and the second saw England breeze past India. This set up a final between England and Pakistan, inspiring a lot of comparisons to the 1992 Cricket World Cup, mostly from hopeful Pakistan fans. My team was knocked out, but the Black Ferns did win, so I can't complain too much.
3 Black Panther: Wakanda Forever 2,168,881 The last Marvel Cinematic Universe movie of the year and Phase 4 – only a Disney+ special left! – returns to the Afrofuturism of Wakanda as it is attacked by mermen led by Namor. The unexpected death of Chadwick Boseman looms hard over Wakanda Forever. The script was changed to try to compensate for the lack of main character T'Challa by giving plots to just about everyone, leading to a bloated running time. Still, the emotional tributes and the expected action, cool visuals, and jokes earned a warm response from reviewers and audiences. It should be making lots of money at the box office.
4 John Fetterman 1,770,289 The lieutenant governor of Pennsylvania briefly became Democrats' favorite politician after winning Pennsylvania's open Senate seat. Fetterman ran a campaign perhaps defined most by his prolific use of memes, with policies such as universal healthcare, the decriminalisation of cannabis, and raising the minimum wage. He almost didn't survive the campaign after suffering a stroke in May, which significantly impaired his speech processing. This led to an ugly debate by Republicans over his fitness for office.
5 2022 United States elections 1,638,777 The fancy name for the midterms, which people presumably went to as a summary page for all the elections taking place.
6 Ron DeSantis 1,452,290 The Florida governor won reelection by a significant margin, and has Donald Trump worried that the Republican Party might prefer DeSantis run for president in 2024.
7 2022 ICC Men's T20 World Cup 1,213,300 I'm not exactly sure why this article has been getting much lower viewership than the article about the general event (#2). I would guess it's from people looking up past results, but I'm really not sure.
8 Mohamed Al-Fayed 1,094,593 Season 5 of The Crown is here. Given that it features the divorce of the current Charles III and Lady Di, the person who viewers sought the most was her next in-law, an Egyptian businessman played in the show by Salim Daw.
9 2022 FIFA World Cup 1,089,960 No matter if it's in a questionable location that even forced the usual mid-year scheduling to be delayed to November-December, football's greatest event opens on the 21st, with teams already having issued the uniforms and squads that will appear in Qatar.
10 Nick Carter (singer) 974,178 #1's career was kickstarted by his older brother being one of the Backstreet Boys. Nick released a mourning statement that acknowledged he was heartbroken in spite of a tumultuous relationship with Aaron due to his brother's drug addiction and mental illness. He cried at a London concert that featured a tribute the day after Aaron's death.

Arabian football 'neath Arabian moons (November 13 to 19)

Rank Article Views Notes/about
1 2022 FIFA World Cup 2,061,296 So... we're really doing this, are we? The biggest football (soccer to a small number of you) tournament – and the biggest tournament in the world – starts on November 20. Normally, the World Cup would be in June-July, but in the Qatari summer, as one journalist put it, "you could fry an egg on [his] head". But the biggest talking point is off the pitch: not only Qatar's abysmal human rights record, but also the fact they allegedly got the World Cup after bribing officials.
2 Black Panther: Wakanda Forever 2,006,318 Marvel's return to Afrofuturism was well-received and made half a billion dollars worldwide. In the meantime, star Letitia Wright is trying to recover her reputation.
3 Mohamed Al-Fayed 1,323,833 Of all the characters in season 5 of The Crown, these two seem to have inspired the most interest, perhaps as they are much less well known than the series' other characters. #3 is an Egyptian billionaire (played by Salim Daw), and #4 is his son (played by Khalid Abdalla), who was in a relationship with Diana, Princess of Wales at the time of their deaths in a car crash.
4 Dodi Fayed 1,148,453
5 Sam Bankman-Fried 1,038,577 The cryptocurrency exchange FTX, which had been worth billions of dollars, suddenly collapsed after a news article about some shady stuff they were doing sparked an investor panic and run on the bank. This inspired a lot of views for its founder and now-ex-CEO. The arena of the Miami Heat will be forced to change its name, and the bets are on whether something like this will also happen with the Lakers' one.
6 ICC Men's T20 World Cup 973,287 The Barmy Army rejoiced as England won the final, defeating Pakistan fairly comfortably, featuring brilliant bowling from Sam Curran and masterful control from Ben Stokes. Pakistan had hoped for a '92 repeat, but ended up disappointed.
7 Deaths in 2022 939,040 I see it in your eyes, take one look and die
The only thing you see, you know it's gonna be
The Ace of Spades! The Ace of Spades!
8 Christina Applegate 915,477 A former child star with plenty of comedies to her name, including her breakout role in Married... with Children, Applegate is "in" due to both her latest work, as Netflix released the final season of Dead to Me, and a recognition of her career with a star on the Hollywood Walk of Fame. Its unveiling was Applegate's first public appearance since she revealed a multiple sclerosis diagnosis last year.
9 FIFA World Cup 849,195 #1 marks the 22nd edition of football's greatest tournament, the second in Asia after 2002.
10 Elizabeth Holmes 828,044 Ten months after the fraudster who promised a clinical revolution with Theranos was found guilty, she has been sentenced to 11.25 years in prison. And because there always seem to be complications, Holmes is pregnant.

Had we but world enough and time... (November 20 to 26)

Rank Article Views Notes/about
1 2022 FIFA World Cup 9,405,304 After a delay, to the chagrin of those who dislike football or call it soccer, the most popular sport in the world is dominating. The group stage of the tournament has had it all: lopsided massacres (England 6-2 Iran, France 4-1 Australia, Spain 7-0 Costa Rica – all three defeated teams won in round 2, because this game is unpredictable), hilarious upsets (Saudi Arabia and Japan defeating Argentina and Germany!), pretty goals, and unfortunately, more boring 0-0 draws than one would ever want.
2 FIFA World Cup 3,960,859
3 Jason David Frank 2,346,767 The actor behind Tommy Oliver, one of the longest-running members of the Power Rangers, was found dead at just 49. It also marks the fourth time a Red Ranger, who is usually the leader, ended up in something shocking, as the original one was arrested for fraud, another killed his roommate with a katana and a third was convicted for domestic assault before killing himself.
4 Qatar 2,004,830 Back to the World Cup: the current hosts, filled with oil money but whose national team is clearly not ready for prime time; the next edition, spread all across North America and bloated from 32 to 48 teams; the Portuguese wunderkind who became the first player to score goals in five editions; and the last edition, whose host country is currently banned by FIFA for what they have done to Ukraine.
5 2026 FIFA World Cup 1,920,068
6 Cristiano Ronaldo 1,773,080
7 2018 FIFA World Cup 1,618,719
8 Wednesday (TV series) 1,298,420 Let's celebrate that Jeffrey Dahmer is off this list, and the room for "imbalanced people on Netflix" is filled by a fictional and comedic case, Wednesday Addams, played by Jenna Ortega. Christina Ricci, who revived the role in the 90s, has a cameo.
9 List of FIFA World Cup finals 1,254,460 All bets are on for who will enter the 22nd of those one week before Christmas.
10 Enner Valencia 1,210,321 The leading goalscorer of #1 so far is this Ecuadorian who scored both goals against the hosts (#4) and the one who tied the game with the Netherlands. He also scored thrice in 2014, making an impressive 6 goals in 5 World Cup games!

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
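The mobile-share rule above can be sketched in a few lines of Python. This is only an illustration, not the Report's actual tooling: the function names, the dictionary keys and the exact cutoffs (6% and 94%, picked from the "5–6%" and "94–95%" ranges quoted above) are all assumptions.

```python
def is_likely_automated(mobile_views, total_views, low=0.06, high=0.94):
    """Flag an article whose mobile share of views is suspiciously extreme.

    The cutoffs are assumed values taken from the ranges quoted in the text.
    """
    if total_views <= 0:
        return False
    share = mobile_views / total_views
    return share <= low or share >= high


def filter_report(entries, low=0.06, high=0.94):
    """Keep only entries whose mobile share falls between the two cutoffs.

    Each entry is a dict with hypothetical keys "title", "mobile" and "total".
    """
    return [
        entry for entry in entries
        if not is_likely_automated(entry["mobile"], entry["total"], low, high)
    ]
```

An entry with 10 mobile views out of 1,000 would be dropped, while one with 500 out of 1,000 would stay.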


2022-11-28

"The most beautiful story on the Internet"

Could ads turn Wikipedians into Facebook content moderators?

At least Scrooge paid his workers something.

In a well-timed coincidence for this issue, an Australian Broadcasting Corporation Op-Ed by Nicholas Agar asked "Could ads turn Wikipedians into Facebook content moderators?". Agar, a professor of ethics, notes that with the wrong policies around content monetization, the Wikimedia Foundation could "turn Wikipedia into just another tech business using its vast store of data to pursue profit". He recommends the Foundation "ask for help, not money"... – B, J

Internet search in Russia

The BBC investigated how well internet search engines were working in Russia. Yandex has 65% of the market in Russia, followed by Google with 35%. To make it appear that search requests originated in Russia, the BBC used a virtual private network (VPN) to view results from both search engines on controversial topics. They also used the VPN to give the origin as the UK for Google requests. All requests were typed in Russian. For example, they searched for Bucha, the Ukrainian town where hundreds of civilians were killed during the current war. Yandex search results predominantly gave links to sites following the Russian government's viewpoint. "Glimpses of independent reporting only occasionally appeared in Yandex search results with links to Wikipedia articles or YouTube." Google searches originating in Russia were a bit better, and Google searches originating in the UK gave a full range of viewpoints, even with the search requests typed in Russian. – S

Maybe not so altruistic after all

Cash donations preferred from now on

Many have told the tale of the dramatic flameout of Sam Bankman-Fried's cryptocurrency exchange FTX and sister company Alameda Research, following a series of boneheaded moves that require a couple of whiteboards to explain in full detail. Suffice it to say that there was a bunch of money, and now there isn't.

In a Washington Post article titled "The do-gooder movement that shielded Sam Bankman-Fried from scrutiny", Nitasha Tiku claims that his lost fortune may have been built – at least in part – on his connections in the effective altruism (EA) community. Bankman-Fried's net worth was estimated at $15.6 billion in early November. The bankruptcy of his cryptocurrency firms, and the devaluation of his own securities, is expected to leave him with a net worth estimated at jack shit, and one million unpaid creditors. Yowza! Tiku went further to say that there was an "EA group devoted to writing Wikipedia articles about EA"; it's unclear whether this refers to off-wiki coordination, or merely to the existence of a legitimate EA WikiProject on the English Wikipedia. – S, B, JPxG

Toaster hoax

The BBC has published an in-depth article and radio programme about the Alan MacMasters toaster hoax, featuring interviews with the protagonists as well as Heather Ford (see Book review in this issue). "How did this hoaxer get away with it for so long? And how did an eagle-eyed 15-year-old eventually manage to expose his deception?" (See also prior Signpost coverage in August's In the media, titled "Alan MacMasters did not invent the electric toaster".) – B

In brief

Almost as much fun as building an encyclopedia together
Sportsball: Some things never change. But some do: the fans didn't vandalize encyclopedias so much a century ago.
A study summarized by Mongabay suggests people notice seasonal changes in nature, like bird migration, and turn to Wikipedia to understand them.



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next week's edition in the Newsroom or leave a tip on the suggestions page.



2022-11-28

Galactic dreams, encyclopedic reality

"AI" is a silly buzzword that I try to avoid whenever possible. First of all, it is poorly defined, and second of all, the definition is constantly changing for advertising and political reasons. If you want an example of this, look at this image, which illustrates our own article on "AI": it was generated using a single line of code in Mathematica. Simply put, the "AI effect" is that "AI" is always defined as "using computers to do things computers aren't currently good at", and once they're able to do it, people stop calling it "AI". If we just say the actual thing that most "AI" is – currently, neural networks for the most part – we will find the issue easier to approach. In fact, we have already approached it: the Objective Revision Evaluation Service has been running fine for several years.

With that said, here is some silly stuff that happened with a generative NLP model:

Meta, formerly Facebook, released their "Galactica" project this month, a big model accompanied by a long paper. Said paper boasted some impressive accomplishments, with benchmark performance surpassing current SoTA models like GPT-3, PaLM and Chinchilla – Jesus, those links aren't even blue yet, this field moves fast – on a variety of interesting tasks like equation solving, chemical modeling and general scientific knowledge. This is all very good and very cool. Why is there a bunch of drama over it? Probably some explanation of how it works is appropriate.

While we have made ample use of large language models in the Signpost, including two long articles in this August's issue which turned out pretty darn well, there is a certain art to using them to do actual writing: they are not mysterious pixie dust that magically understands your intentions and synthesizes information from nowhere. For the most part, all they do is predict the next token (i.e. a letter or a word) in a sequence – really, that's it – after having been exposed to vast amounts of text to get an idea of which tokens are likely to come after which other tokens. If you want to get an idea of how this works on a more basic level, I wrote a gigantic technical wall of text at GPT-2. Anyway, the fact that it can form coherent sentences, paragraphs, poems, arguments, and treatises is purely a side effect of text completion (which has some rather interesting implications for human brain architecture, but that is beside the point right now). The important thing to know is that they just figure out what the next thing is going to be. If you type in "The reason Richard Nixon decided to invade Canada is because", the LLM will dutifully start explaining the implications of Canada being invaded by the USA in 1971. It's not going to go look up a bunch of sources and see whether that's true or not. It will just do what you're asking it to, which is to say some stuff.
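To make "predict the next token" concrete, here is a toy sketch in Python. It is a word-level bigram counter, vastly simpler than GPT-2 or Galactica and not how those models are implemented; the function names and the training text are made up for illustration. What it shares with the big models is the point at hand: it continues a prompt with whatever usually came next in its training text, true or not.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_new_tokens=5):
    """Greedily append the most frequent next word, one token at a time."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        if not tokens:
            break
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Toy, deliberately false training text (an assumption for illustration).
corpus = "nixon invaded canada . nixon invaded canada . nixon visited china"
model = train_bigram(corpus)
print(complete(model, "Nixon", max_new_tokens=2))  # prints "nixon invaded canada"
```

The model has no notion of whether Nixon invaded Canada; it only knows that "invaded" followed "nixon" more often than "visited" did.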

This would have been a great thing to explain on the demo page, but for some reason it was decided that the best way to showcase this prowess would be to throw a text box up on the Internet, encouraging users to type in whatever and generate large amounts of text, including scientific papers, essays... and Wikipedia articles.

So we made a request for an article about The Signpost in the three days the demo was up. The writing was quite impressive, and indeed was indistinguishable from a human's output. You could learn a lot from something like this! The problem is that we were learning a bunch of nonsense: for example, we apparently started out as a print publication. Unfortunately, we didn't save the damn thing, because we didn't think they were going to take everything down three days after putting it up. The outlaws at Wikipediocracy did, so you can see an archived copy of their own attempt at a Galactica self-portrait, which is full of howlers (compare to their article over here).


Ars Technica later wrote a scathing review of the demo, noting several issues. A little digging into their sources found a Twitter user who managed to get Galactica to write papers on the benefits of eating crushed glass. The results resembled the basic appearance of valid sources while containing claims like "Crushed glass is a source of dietary silicon, which is important for bone and connective tissue health"; a generated review paper described all the studies that show feeding pigs crushed glass is great for improving weight gain and reducing mortality. Of course, if there were health benefits of eating crushed glass, this is probably what papers about it would look like, but as it stands, the utility of such text is dubious. The same goes for articles on the "benefits of antisemitism", which mrgreene1977 wisely did not quote from, but one can imagine what kind of tokens would come after what kind of other tokens.

Will Douglas Heaven's article for MIT Technology Review "Why Meta's latest large language model survived only three days online" leads with the statement, "Galactica was supposed to help scientists. Instead, it mindlessly spat out biased and incorrect nonsense", and things get worse from there. Apparently, the algorithm was prone to backing up its points (like a wiki article about spacefaring Soviet bears) with fake citations, sometimes from real scientists working in the field in question. Lovely! Well worth reading, with far too many great examples in there to quote, and even more if you follow their suggestion to look at Gary Marcus's blog post on it.

In their defense, the Galacticans did note, at the bottom of a long explanation of how much the website rules:

But even when used correctly, it had problems. The MIT Technology Review report links to an attempt by Michael Black, director at the Max Planck Institute for Intelligent Systems, to get Galactica to write on subjects he knew well; he ended up thinking Galactica was dangerous: "Galactica generates text that's grammatical and feels real. This text will slip into real scientific submissions. It will be realistic but wrong or biased. It will be hard to detect. It will influence how people think." He instead suggests that those who want to do science should "stick with Wikipedia".

Perhaps it would be best to give the last, rather spiteful word to Yann LeCun, Meta's chief AI scientist: "Galactica demo is offline for now. It’s no longer possible to have some fun by casually misusing it. Happy?"

What does it mean for us?

Most of the issues and controversies we run into with ML models follow a familiar pattern: some researcher decides that "Wikipedia" is an interesting application for a new model, and creates some bizarre contraption that serves basically no purpose for editors. Nobody wants more geostubs! But this is not a problem with the underlying technology.

The field of machine learning is growing extremely quickly, both in terms of engineering (the implementation of models) and in terms of science (the development of vastly more powerful models). Anyone who has an opinion about these things is simply going to be wrong about anything a few months from now. These models will only grow in importance, and I think that any editor who does not try to read as much about them as possible and keep abreast of developments is doing themselves a disservice. Not wanting to be a man of talk and no action, I wrote the article GPT-2 (while its successor model is more relevant to current developments, it has identical architecture to the old one, and if you read about GPT-2 you will understand GPT-3).

Moreover, we have already been tackling the issue of neural nets on our own terms: the Objective Revision Evaluation Service has been running fine for several years. It seems to me that, if we were to approach these technologies with open minds, it could be possible to resolve some of our most stubborn problems, and bring ourselves into the future with style and aplomb. I mean, anything is possible. For all we know, the Signpost might start putting out print editions.

J, AC, B, S



2022-11-28

The Six Million FP Man

Sometimes, we all reach milestones in our time at Wikipedia. Sometimes you reach 100 featured articles. Sometimes you get elected to ArbCom. Sometimes you hit 600 featured pictures, which, as far as I can tell, is more than anyone else has ever achieved, about 8.2% of all featured pictures, and the result of fifteen years of work.

And sometimes, no one else cares about this fact.[1] So how does one write an article about oneself while not appearing completely vain and self-promotional? Well, one doesn't, but let's do it anyway because it'll be at least a couple years until the next milestone.

Option one: Select some of your favourites

Why not make a gallery of your favourite restorations, showing off how much work you put into these? For example, you could go to your user page and copy over the conveniently pre-formatted list you made, that shows before and after!

BEFORE AFTER

It's a good start! But maybe some sort of animation too?

Animation of a small section of an image before and after restoration, with a certain amount of degradation from conversion to GIF.

...Perfect!

Option two: How about a history?

You could describe how you got into your field of editing. For example, I got into image restoration through an image that I don't even count as one of my "official" list of featured pictures anymore (I do my official count based on the ones featured on Adam Cuerden, which ignores or gives half-value to anything I didn't work hard enough on, leaves out a lot of my very early works, and definitely ignores anything I just nominated). It's an illustration to the play The Princess by W. S. Gilbert. It's not the biggest restoration, nor the most impressive original, but if you look roughly under the "T" of "THE ILLUSTRATED LONDON NEWS" you'll see a very obvious white line that shouldn't be there. I spent hours fixing that in Microsoft Paint. 2007 was a very different time. I got better from there.

By 2009, I was scanning my own books, and doing rather impressive images from Gustave Doré. Would I do it differently now? Well, I'd probably fix up the border a bit, but it's not bad. It's also so large that I couldn't upload the original file, because Commons wasn't configured to allow anything as large as a lossless file of that type has to be:

2010 saw the stitching together of the poster of Utopia Limited we saw earlier. 2012 saw this incredibly difficult Battle of Spottsylvania image, which is also about the time I started to get a bit more confident with colour:

In 2016 I went to Wikimania in Esino Lario, met Rosie Stephenson-Goodknight, and got introduced to Women in Red. This was the point I realised that there was rather a gender bias in my contributions, and I began work to improve things. It wasn't that I hadn't done images of women before, but they were a sometimes food, and images of women should be more of a main course. Here's a selection of my favourite images of women I brought to featured pictures after joining Women in Red, in no particular order because Wikipedia galleries work best if you space out landscape images with as many portrait orientation ones as possible:

I was originally planning for Ulmar to be my 600th featured picture. However, the vagaries of "Does Featured Picture Candidates have enough participation for things to pass?" said no, which leads us into our next technique of shameless self-promotion dealing with the issue at hand.

Option three: Talk about the thing that pushed you over the top

One could discuss the thing that pushed you over the top, and how it relates to your history in Wikipedia. While I don't talk about it much, I have eight featured articles; my first, from October 2006, was W. S. Gilbert, and that really got me into Wikipedia as a whole.[2]

So, when choosing something significant to my Wikipedia career....

I had been looking for a high-resolution picture of him for, well, over a decade, probably. I stumbled upon the Digital Public Library of America, decided to give it a go, and found this, of all places, in the University of Minnesota library collections. But then, I suppose it's always going to be somewhere a little unexpected if you checked everywhere you expected. I think this is one of my featured pictures where zooming in is necessary to really tell the work done, but having an image of him that can be safely zoomed in to about a foot wide or so is probably going to be very helpful to a lot of Gilbert and Sullivan societies out there.

Oh, and to answer the obvious question, Arthur Sullivan is, if anything, harder to find an image of than Gilbert. I mean, I did, he's Featured Picture number 601, but it wasn't easy to find.

Was kind of odd, though: I found him in a collection I thought I knew very well already. Which just goes to show you, I suppose. Anyway, he will hopefully be joining many more in the next months and years. See you for Number 700!

Notes

  1. ^ Editor's note: 🤔
  2. ^ There's a bit of a mess of old account names due to anonymity vs. real names, for a while, shifting over to anonymity after some harassment on here. This was before most of the current policies around that kind of thing were finalised.



2022-11-28

Privacy on Wikipedia in the cyberpunk future

Ladsgroup has actively edited Wikipedia since 2006. On the Persian Wikipedia, he's a bureaucrat, oversighter and CheckUser. He currently works as a Staff Database Architect for the WMF. As a volunteer he helps build tools for CheckUsers. This article was written in his role as a volunteer and any opinions expressed do not necessarily reflect the opinions of The Signpost, the WMF, or of other Wikipedians.

When you edit Wikipedia, it will be public. We all know that. But do you know what it actually entails?


Can others tell if I have multiple accounts (such as sockpuppets)?

Some trusted users called CheckUsers are able to see your IP address and user agent, meaning they may know where you live, or maybe where you are studying or where you work. They don't disclose such information, and its use is subject to a really strong policy. However, that's not the only way you can be identified.

The way you use language is unique to you; it's like a fingerprint. There are bodies of research on that. With simple natural language processing tools, you can extract discussions from Wikipedia and link accounts that have similar linguistic fingerprints.

Figure (two panels, "socks" and "not socks"): an example of two socks in Persian Wikipedia and two users that are not, analysed using a simple NLP system.

What does this mean? It means people will be able to find, guess or confirm their suspicions on other accounts you have. They will be able to link multiple accounts without needing access to private data that could reveal where you live or work.
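For a rough feel of how a linguistic fingerprint can be compared, here is a minimal sketch in Python. It is not the tool discussed in this article, whose internals are not described here; it assumes a common, simple stand-in technique (character trigram counts compared by cosine similarity), and the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Build a crude "fingerprint": counts of character n-grams in the text."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Return a score from 0.0 (nothing shared) to 1.0 (identical profile)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In use, one would build a profile from all talk-page comments by each account and compare the profiles pairwise; unusually high scores would only be a hint for a human to look closer, not proof. Real systems use far richer features than this.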

Who can analyze my edits?

Wikimedia projects are public: the license means that all information hosted on them can be reused for any purposes whatsoever, and the privacy policy allows for analysis of edits or other information publicly shared for any reason.

That means anyone with resources or knowledge can analyze data trends in your edit history, such as when you edit, what words you use, and what articles you have edited. As technology has advanced, so have tools for analyzing trends in user data. They range from things as basic as edit counters to anti-abuse machine learning systems as complex as ORES and some anti-vandal bots. Academics have begun utilizing public data to develop models for combatting abuse on Wikipedia using machine learning and artificial intelligence systems, and volunteer developers have created systems that utilize natural language processing in order to help identify malicious actors.
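As a rough illustration of how little is needed for the simplest of these analyses, the sketch below counts edits per hour of day from a list of public edit timestamps. The timestamp format and the function names are assumptions for this example, not part of any existing tool.

```python
from collections import Counter
from datetime import datetime

# Assumed input format: UTC timestamps such as "2022-11-28T18:05:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def edits_per_hour(timestamps: list[str]) -> list[int]:
    """Return a 24-element list: the number of edits made in each UTC hour."""
    hours = Counter(datetime.strptime(ts, TIMESTAMP_FORMAT).hour for ts in timestamps)
    return [hours[h] for h in range(24)]


def busiest_hours(timestamps: list[str], top: int = 3) -> list[int]:
    """Return up to `top` UTC hours with the most edits, busiest first."""
    counts = edits_per_hour(timestamps)
    ranked = sorted(range(24), key=lambda h: counts[h], reverse=True)
    return [h for h in ranked[:top] if counts[h] > 0]
```

A consistent pattern of active hours can hint at a time zone or a daily routine, which is exactly the kind of inference the privacy policy warns about.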

As with anything, these technologies can be abused. That's one of the risks of an open project: an oppressive government or a big company can invest in such tools and download Wikimedia dumps. They can even go further and cross-check the data with social media posts. While this is not likely in most cases, in areas of the world where free speech is limited, you should be conscious of what information you share on Wikipedia and other Wikimedia projects.

Besides external entities, volunteers have been building such tools to help Checkusers do their job better, with the potential to limit access to private data. The tool that produced the graphs shown above is already being used on several wikis, but the developer makes it available only to the Checkusers of each wiki. The tool doesn't just give a number; it builds plots and graphs to make decision-making easier.

Can we ban using AI tools?

Legally, there's nothing we can do to stop external entities from using this data – it's enshrined in our license and privacy policy[1] that it's free to use for whatever purpose people see fit.

Because of this, restrictions on the use of natural language processing or other automated or AI abuse detection systems that do not directly edit Wikimedia projects are not possible. Communities could amend their local policies to prohibit blocks based on such technologies or to prohibit consideration of such analysis when deciding whether or not there is cause to use the CheckUser tool. Local projects cannot, however, prevent use of natural language processing or other tools completely because of the nature of the license and the current privacy policy.

Notes

  1. ^ From the Privacy policy: "You should be aware that specific data made public by you or aggregated data that is made public by us can be used by anyone for analysis and to infer further information, such as which country a user is from, political affiliation and gender."



2022-11-28

English Wikipedia editors: "We don't need no stinking banners"

Auditing the fundraising banners

Last month, we reported on discontent with fundraising on Wikipedia. It all came to a head this month, as a "Request for Comment" survey with wide participation rejected the current plans for the fundraising campaign. Luckily for us, given this all happened three days before publication, the closing admin, Joe Roe, provided a thoughtful, nuanced summary of the dispute and decision:


Maryana Iskander, Chief Executive Officer of the Wikimedia Foundation, gave a detailed response to this, quoted below:

I'm sure we'll have an update of some sort next month as well. It's a bit inevitable once the fundraising campaign starts. Hopefully, though, it'll be entirely positive. – AC

WMF releases Fundraising Report and audited financial statements for 2021–2022 year

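In related news, the Wikimedia Foundation this month published its Fundraising Report and its audited Consolidated Financial Statements for the 2021–2022 financial year.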

Note that the Wikimedia Foundation's financial year runs from July 1 to June 30.

Fundraising Report

In 2021–2022, the Wikimedia Foundation took in $165,232,309 from over 13 million individual donations, an increase of more than $10 million over the year prior. $58 million, or 35.1% of the donations total, was brought in by banner campaigns on Wikipedia. The breakdown was as follows:

2021–2022: Donations breakdown.

For comparison, the donations total in 2020–2021 was $154,763,121 raised from over 7.7 million donors (a different way of counting was used this year), with banner campaigns bringing in $57.3 million, or 37% of the total.

As in 2020–2021, the Wikimedia Foundation ran a fundraising campaign in India this financial year (see previous Signpost coverage; note that while the 2021 Indian fundraising campaign was cancelled, the 2020 campaign was not held in the spring but in August, thus falling into the 2020–2021 financial year).

Consolidated Financial Statements

The Financial Statements showed an unusual situation: for the first time in its history, the Wikimedia Foundation reported negative investment income, at –$12 million. Investment income had been positive at +$4.4 million in 2020–2021 and +$5.5 million in 2019–2020. At the time of writing, the Wikimedia Foundation had not responded to questions about the precise circumstances responsible for the negative result.

  • Total support and revenue was $155 million (a decrease of $8 million compared to the year prior, with the negative investment result cancelling out the increase in donations).
  • Total expenses were $146 million (an increase of $34 million, or 30.5%, over the year prior). Some key expenditure items:
    • Salaries and wages rose to $88 million (an increase of $20 million, or 30%, over the year prior).
    • Professional service expenses: $17 million.
    • Awards and grants: $15 million.
    • Other operating expenses: $12 million.
    • Internet hosting: $2.7 million.
  • Net assets at end of year increased by $8 million to $239 million (net assets increased by $51 million in the year prior). Interestingly, the third-quarter (January–March 2022) Tuning Session presentation published by Finance & Administration in May 2022 still forecast a net asset increase of $25.9 million for the year.

For the 2022–2023 financial year, the Annual Plan envisages both income and expenditure rising to $175 million. Compared to the year prior, that represents a planned increase in revenue of $20 million and a planned increase in expenses of $29 million (20%, more than twice the rate of inflation; total expenses in 2021–2022 were $146 million).
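For readers who want to check the percentages quoted in this section, here is a short sketch that re-derives them from the dollar figures given above (in millions). The helper names are our own, and because most inputs are rounded, the results can differ slightly from the reported values.

```python
def share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


def growth(increase: float, new_total: float) -> float:
    """Percentage growth, given the increase and the resulting new total."""
    return 100 * increase / (new_total - increase)


# Banner campaigns as a share of total donations (figures in $ millions).
banner_share_2022 = share(58, 165.232309)
banner_share_2021 = share(57.3, 154.763121)

# Year-on-year growth in 2021-2022, from the rounded figures in the list above.
salary_growth = growth(20, 88)
expense_growth = growth(34, 146)

# Planned growth in expenses for 2022-2023: from $146 million to $175 million.
planned_expense_growth = share(175 - 146, 146)

for label, value in [
    ("Banner share 2021-2022", banner_share_2022),
    ("Banner share 2020-2021", banner_share_2021),
    ("Salary growth", salary_growth),
    ("Expense growth", expense_growth),
    ("Planned expense growth", planned_expense_growth),
]:
    print(f"{label}: {value:.1f}%")
```

With these rounded inputs, expense growth comes out a little below the reported 30.5%, presumably because the reported figure was calculated from the unrounded amounts in the financial statements.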

According to the minutes of the June 2022 Wikimedia Foundation board meeting, WMF board members and executives looking ahead at the 2022–2023 financial year now underway anticipated "moderate growth in terms of staffing. Next year, the fundraising team will be increasing targets in each of their major streams, with a particular focus in Major Gifts." – AK

Brief notes

The results of the Wikimedia Summit 2022 participant feedback survey are available on Commons.

















