The Signpost

Image: Edmund J. Sullivan, illustration to The Rubaiyat of Omar Khayyam, First Version, Quatrain 51 (cropped). Public domain.
Cobwebs

Counting to a billion — manuscripts don't burn

We covered Wikipedia's billionth edit at News and notes in 2021 and somehow this contribution from other editors got lost in the shuffle. Here for your enjoyment is what was developed back then.
The original lead-in and story follow.

Is it possible for a Wikipedia article to predate Wikipedia itself? Read on and find out. A clue is in this image.
The Signpost's editors calculate that English Wikipedia's billionth edit will occur on January 15 [2021], plus or minus five days. But what does this mean exactly?

A robust discussion happened in the newsroom that we wanted to share with our readers. It turns out that the history of the Wikimedia software, lost and partially restored databases from the early days, the Nupedia fork, and other Wiki-arcana – plus the international date line! – intrude when one wants to determine exactly which edit was the billionth.

Iridescent

The pedant in me is obliged to spoil the party by pointing out that Wikipedia's billionth edit has almost certainly already been and gone unnoticed and unremarked. Special:Diff/1000000000 is just going to be "the billionth edit made on the current software", since when we switched from UseModWiki to MediaWiki the counts were reset (Special:Diff/1 is completely unremarkable). Because the UseModWiki logs have been lost, we have no idea how many edits were made using it.

Smallbones

Thanks for the info. We'll have to say something like "the billionth edit since January 25, 2002". One question: the URL that results from clicking Special:Diff/1 is https://en.wikipedia.org/w/index.php?diff=1&oldid=294750. Does the "294750" mean anything? Could about 300,000 edits have been made in the very first year?

WereSpielChequers

I thought that a lot of the early edits were recovered from an archive and reloaded. But presumably some deleted stuff was lost.

Iridescent

I doubt there are any exact answers since we know some of the early edits have been completely lost; one also needs to account for the fact that although we take 15 January 2001 as a "start date", that's just the day the Wikipedia/Nupedia fork happened and the two became separate entities. The articles on Nupedia were just ported across to Wikipedia (compare the last edit to "The Donegal Fiddle Tradition" on Nupedia with the first version of Donegal fiddle tradition on Wikipedia) but the Nupedia edit histories weren't preserved in the transfer. IIRC Graham87 has done some work to try to reconstruct the early days, but I would think too much history has been lost to ever be able to be more specific than "lots of edits". (Bear in mind also that prior to the Wikidata migration, things like changes to the interwiki language links also show up in the history as "edits", as do edits made to pages in other languages that were then transwikied for translation. For instance you (Smallbones) have officially made quite a few edits to German Wikipedia, all of which go towards de-wiki's edit count, even though it doesn't appear you've ever touched that project in reality.)

WereSpielChequers

I know that there are edits on the German, Greek and perhaps other Wikipedias that are transwikied copies of edits from EN Wikipedia. Most of my edits on DE were made here and subsequently copied over when someone translated an article into German and copied all its history. But that only affects our billion edits calculation if people have imported article histories to here when they have translated articles into English, and as far as I'm aware we haven't done that (arguably for attribution we should).

Mercator style map of the world broken into time zones, with Chamorro Time Zone highlighted
Due to the International Date Line, Guam (timezone highlighted) is often a calendar day ahead of the rest of the United States.
Bri

I think all the wiki-spelunking (or is it archaeology?) above is a useful addition to a piece we write about the so-called billionth edit. Pedantic or not, it is newsworthy and will prove interesting to many.

Furthering the pedantry, History of Wikipedia says "The first portable MediaWiki software went live on 25 January" 2002, but Special:Diff/1 is timestamped 14:25, 26 January 2002 UTC (which was also 26 January in the U.S. [except Guam where it was 27 January, take that pedants]). I wonder why?
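The date-line point can be checked with a few lines of Python. This is only an illustrative sketch: the fixed UTC offsets are an assumption made for simplicity (none of these zones is on daylight saving time in January), and the zone labels are ours rather than anything from the discussion.

```python
from datetime import datetime, timedelta, timezone

# Timestamp of Special:Diff/1 as quoted above: 14:25, 26 January 2002 (UTC).
first_edit_utc = datetime(2002, 1, 26, 14, 25, tzinfo=timezone.utc)

# Fixed offsets are an assumption for illustration only.
zones = {
    "US Eastern (UTC-5)": timezone(timedelta(hours=-5)),
    "US Pacific (UTC-8)": timezone(timedelta(hours=-8)),
    "Guam, Chamorro Time (UTC+10)": timezone(timedelta(hours=10)),
}

# Convert the same instant into each local wall-clock time.
local_times = {name: first_edit_utc.astimezone(tz) for name, tz in zones.items()}

for name, local in local_times.items():
    print(f"{name}: {local:%d %B %Y, %H:%M}")
```

Under these assumptions the edit falls on 26 January in the mainland U.S. zones and just after midnight on 27 January in Guam. It does not explain the gap between the 25 January upgrade date and the 26 January timestamp.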

Graham87

I think we're only out by a few hundred thousand edits, at the very most, perhaps closer to 150,000 or 200,000. We have relatively exact figures for the number of edits from 15 January to 17 August 2001 and from 20 November to 20 December, which total 88,837 edits. The 17 August 2001 database dump, which contains every edit from Wikipedia's founding until that date, contains 57,982 edits per a line count of one of the log files, rc.log (each edit is stored on its own line). The Nostalgia Wikipedia contains a snapshot of all edits up to 20 December 2001, and as far as I understand it contains a complete archive of edits between 20 November and 20 December of that year, because edits from that time weren't automatically removed. It contains 95,330 edits, but that includes edits made by the conversion script in 2005 among other things. I did a quick and dirty database query to list all timestamps in the database, earliest first, and from there I found out that the Nostalgia Wikipedia contains 88,040 edits from 2001, 30,855 of which were made in the last month of the database and therefore form a complete archive. (We can't do this on the current Wikipedia database since, as explained at Wikipedia:Usemod article histories, the final edit made to each page wasn't imported when the UseModWiki edits were added in 2002).

Another thing that would whack out the count a bit is that edits deleted before Wikipedia was upgraded to MediaWiki 1.5 lost their revision ID numbers and got new ones when they were undeleted. According to an old Bugzilla thread, there were 511,728 of those. Some of those got undeleted/re-imported, most notably at Wikipedia:Historical archive/Sandbox, which has over 20,000 revisions. There are also quite a few other revisions that have also been imported from the Nostalgia Wikipedia (I'd say 50,000 or so).
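For readers who want to check the arithmetic in Graham87's figures, here is a minimal sketch. The numbers are the ones quoted above; the `count_edits` helper and the three-line sample log are hypothetical stand-ins for a one-edit-per-line file such as rc.log, not the actual tooling used.

```python
import io

def count_edits(log):
    """Count edits in a log that stores each edit on its own line."""
    return sum(1 for _ in log)

# Figures quoted in the discussion above.
edits_to_17_aug_2001 = 57_982   # line count of rc.log in the 17 August 2001 dump
nostalgia_2001_edits = 88_040   # edits dated 2001 in the Nostalgia Wikipedia
nostalgia_last_month = 30_855   # 20 November to 20 December 2001 (complete)

# The two periods with relatively exact counts.
known_total = edits_to_17_aug_2001 + nostalgia_last_month
print(known_total)  # 88837

# Edits from before 20 November 2001 that survive in the Nostalgia snapshot.
nostalgia_earlier = nostalgia_2001_edits - nostalgia_last_month
print(nostalgia_earlier)  # 57185

# Hypothetical sample log, used only to show the line-count approach.
sample_log = io.StringIO("edit one\nedit two\nedit three\n")
sample_count = count_edits(sample_log)
```

The sum matches the 88,837 total given above. The edits between 17 August and 20 November 2001 are the part for which no exact count is quoted.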

Smallbones

I agree with Bri on this. The story that is emerging here will be worth much more to many of our readers than just the date of the "billionth". That billionth-date article would after all just be a number and a date, a round-number milestone with maybe a bit of nostalgia, but not much more. The emerging story however has got a lot more: an origin story, some mystery, there's even some controversy regarding the last Nupedia edit (see Talk:Donegal fiddle tradition). That article became a featured article, with very few edits after Larry Sanger wrote it on Nupedia, and it's still a good article. But it seems that Sanger was complaining that it was a copyright violation. So who wants to write it up? It could run around December 20 if we really want to emphasize the history, or in late January if we just want to accompany the billionth-date article. Either way it would be an important part of our 20th birthday celebration.

HaeB

See also Wikipedia:Wikipedia Signpost/2010-04-19/News and notes#Briefly: On Friday, 16 April 2010 the Wikimedia projects passed a total of 1 billion edits, as measured by the edit counter.

Graham87

Re the discrepancy pointed out by Bri between the edit with ID 1 and the date given for the software upgrade, I don't exactly know the answer, but it says it was upgraded on the 25th of January at Wikipedia:Magnus Manske Day, the mailing list thread linked from there, and this mailing list thread. I seem to recall that there were some weird issues in the early days with time zones, but I thought they were about some timestamps being in Pacific Standard Time instead of UTC, which wouldn't make sense at all here. Also see more info about early timestamp bugs at User:Conversion script.

Smallbones

OK, the one billionth (since whenever) is for the English-language Wikipedia (see Wikipedia:Time Between Edits). The 4.5 billion from the edit counter is for all WMF projects. This is getting complicated, but that should add to the mystery. It is over my head, though...

Afterword

by Smallbones

Three years ago, as the then-editor-in-chief, I declined the above article (which was essentially copied from an internal Signpost talk page). I really liked the original submission, and am glad Bri has dug it up again. It addressed topics I had long wondered about. It mainly just got lost in the run-up to a very important issue of The Signpost that marked the 20th anniversary of Wikipedia. But I did have one problem: it was a long swirling article that I thought ultimately went nowhere. What was the meaning of the article? What conclusions might the reader draw from it?

After three years, I think I've figured out the meaning. It's about the record of all edits on Wikipedia, and Mikhail Bulgakov's dictum "Manuscripts don't burn". Even the records of a Signpost talk page about a declined submission don't disappear or get burned. Even when the submission is about disappearing edits!

An engraved illustration of a woman lying over a book
The Moving Finger writes; and, having writ,
Moves on: nor all thy Piety nor Wit
Shall lure it back to cancel half a Line,
Nor all thy Tears wash out a Word of it
- Quatrain 51 of Edward FitzGerald's translation of the Rubáiyát of Omar Khayyám

So what did Bulgakov mean by "manuscripts don't burn"? He lived in Moscow, and was Stalin's favorite writer during the 1930s, a difficult place and time for anybody who expressed original thoughts. It was especially difficult for Bulgakov, as other writers attempted to censor him or cut him down to size. He meant that writing is engraved in the mind of the writer, and perhaps in the minds of anybody who has read it or helped edit it, even when they'd prefer to forget it. If you've ever had a draft submission disappear, you'll find that it's remarkably true: the neurons and synapses have a nearly permanent record that comes to life once you start rewriting it.

I first learned of the idea that creative works won't disappear from a quatrain of the Rubáiyát of Omar Khayyám, which was inspired by the story of Belshazzar's feast in the Book of Daniel. Bulgakov just added the aide-memoire "manuscripts don't burn."

The opposite of a non-flammable manuscript is a memory hole, as George Orwell first described in his novel Nineteen Eighty-Four. Orwell's memory holes were appurtenances, bored into the wall of the Ministry of Truth, which sucked up any questionable piece of paper and sent it to a furnace. Wikipedia does have a sort of memory hole, which sucked up maybe 100,000 edits from 2001 or 2002. That's out of the current total of about 1,226,000,000 (or roughly 0.01%). Almost all the rest are non-flammable manuscripts that can be looked up on Wikipedia. That's what the above submission means. And that's worth remembering.


Discuss this story

(Almost) All the edits are there. However, many are only available to gatekeepers. And many more are not easily findable, or useable. We really need a finding aid. All the best: Rich Farmbrough 14:48, 4 July 2024 (UTC).

@Rich Farmbrough: - yes, I left out a few things about non-flammable manuscripts on Wikipedia. Many are not easily accessible to non-admins, some are not easily accessible to admins, and I think a very few are burned intentionally, perhaps for legal reasons. But there are at least a couple of exceptions to these exceptions, e.g. some controversial things on Wiki are intentionally saved by Wikipedians before deletion. So, deleted articles are hard to access for non-admins. But I've been told that I can ask an admin to send me a copy of a deleted article if I have a "good enough reason" that's not just a "fishing expedition." But there are off-wiki records as well, e.g. archive.org. If you have an article url or name, you can see what they have on the Wayback Machine. There are data dumps from long ago. There's an exception for "bonafide researchers" (I should really check out this exception).
Probably the biggest category of missing stuff is deleted articles, but AFD records are easily available so at least you can find out why they were AFDed, when, and how many times, quite often with a description of the sources.
But nobody said that "remembering everything" is easy.
I think a lot of people would just be happy with a good indexing system, even if it is incomplete. Basic Wikipedia search does some of that, and indexing is the type of thing that will get better in the future, even for old records.
Smallbones(smalltalk) 17:21, 4 July 2024 (UTC)
@Smallbones: I think your first paragraph refers to the "researcher" user access level? ☆ Bri (talk) 17:24, 5 July 2024 (UTC)
@Bri: An access level so infrequently used, there are between 3 and 0 people who currently hold it, and less than 10 who have ever held it!
(WP:RESEARCHER and Special:ListUsers/researcher both agree that there appear to be 0 users with that access level, currently, though it's possible I'm just not allowed to see them. Whereas meta:Research:Special API permissions/Log lists 9 people who've ever been granted that right. Six were 2011 summer-program participants with short-term access long since revoked. The other three reportedly have indefinite rights since 2010, 2015-06, and 2015-09. But they may be performing their research somewhere other than enWiki. At least one of them, FaFlo, mentions working with Wikimedia Germany in the RENDER research project. #WhateverThatIs) FeRDNYC (talk) 13:38, 6 July 2024 (UTC)

But nobody said that "remembering everything" is easy.
— Smallbones

Heck, you won't even catch me saying it's desirable. (Even Wikipedia embraces the right to be forgotten... to a limited extent.) FeRDNYC (talk) 01:13, 7 July 2024 (UTC)
  • I'm not saying that this should be the case, only that it often is. It's a bit of a warning - if somebody writes something on the internet, it can show up anytime - probably at the worst possible time. So don't write something on the internet if you'd be embarrassed seeing it in a prominent place. I guess I've been thinking about this lately because in the last 12 months I ran into a huge database (from a tipper who may not have known what it was). It wasn't all usable, but much of it could be verified. My immediate reaction was "why did anybody put this on the internet?" I can imagine some of their reasons, but I can imagine many more why they would never want anybody to see it. So it is worthwhile letting Wikipedia editors know that Wikipedia tries to keep everything they write here, and it pretty much works. Smallbones(smalltalk) 00:40, 8 July 2024 (UTC)
  • Also, the developers and sysadmins have always made clear that they do not guarantee the long-term availability of deleted revisions. In practice, it looks like the oldest visible/restorable deleted revision is from the end of 2004 ([1]) (I checked, and that revision is still visible and could presumably be undeleted), so many of those from the early days are lost as well, and it's possible more could be. And oversight really did use to be a "hard delete"; it was only later on that it was made into the "superdelete" that we know it as today. Seraphimblade Talk to me 20:33, 8 July 2024 (UTC)

Wikipedia:Wikipedia Signpost/2024-07-04/Cobwebs