The Wikipedia Signpost



Volume 4, Issue 33, 11 August 2008


Study: Wikipedia's growth may indicate unlimited potential
Board of Trustees fills Nominating Committee for new members
Greenspun illustration project moves to first phase
WikiWorld: "George Stroumboulopoulos"
News and notes: Wikipedian dies
Dispatches: Reviewing free images
Features and admins
Bugs, Repairs, and Internal Operational News
The Report on Lengthy Litigation


Study: Wikipedia's growth may indicate unlimited potential

According to a new study, Wikipedia has a pattern of growth that may indicate unlimited potential.

In a study published in the August issue of Communications of the ACM entitled "The Collaborative Organization of Knowledge" (abstract; working draft), computer scientists Diomidis Spinellis and Panagiotis Louridas analyze the relationship between references to non-existent articles (redlinks) and the creation of new articles.

The study, based on the February 2006 dump of English Wikipedia, finds that the link rate from complete (i.e., non-stub) articles to incomplete (non-existent or stub) articles remained nearly constant between 2003 and 2006 (about 1.8 incomplete articles linked from every complete article). A long-term trend in either direction, according to the authors, would indicate an unsustainable growth pattern. If the average number of redlinks per article is increasing, it means that Wikipedia is becoming diffuse and will become less useful as more and more of the terms in the average article are not covered. If the average number is decreasing, it suggests that Wikipedia's growth will slow or stop as the number of links to uncreated articles approaches zero. The stable redlink ratio suggests that Wikipedia is a scale-free network, in principle capable of unlimited growth.
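As a rough illustration of the measure described above, the following minimal sketch computes the average number of distinct incomplete articles linked from each complete article. It is one possible reading of the ratio, not the authors' method: the function name, the "complete"/"stub" labels and the toy data are all hypothetical, and any link target missing from the article table is assumed to be a redlink.

```python
def incomplete_link_rate(articles, links):
    """Average number of distinct incomplete targets per complete article.

    articles: dict mapping title -> "complete" or "stub" (hypothetical labels)
    links: iterable of (source_title, target_title) pairs
    A target that is absent from `articles` is treated as a redlink.
    """
    complete = {title for title, status in articles.items() if status == "complete"}
    if not complete:
        return 0.0
    # Collect distinct (source, target) pairs so repeated links count once.
    qualifying = set()
    for source, target in links:
        if source in complete and articles.get(target) != "complete":
            qualifying.add((source, target))
    return len(qualifying) / len(complete)


# Toy data (invented for illustration only).
articles = {"A": "complete", "B": "complete", "C": "stub"}
links = [("A", "C"), ("A", "D"), ("B", "D"), ("B", "A"), ("C", "E")]

# A links to a stub (C) and a redlink (D); B links to a redlink (D).
# The link from stub C is ignored because its source is not complete.
rate = incomplete_link_rate(articles, links)  # 3 qualifying links / 2 complete articles
```

Under this reading, a value that stays roughly constant as the article table grows corresponds to the stable pattern the study reports.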

The study also notes that most new articles were created within the first month that they were referenced in another article. Furthermore, only 3% of new articles were created by the same user who created the first link to that article (whether as a redlink or a bluelink). This implies that the connection between redlinks and new articles is a collaborative one, and that adding redlinks actually spurs others to create new articles.

The statistics were re-run with a more recent dump (from January 3, 2008), with results that "don't appear to differ from the ones based on the study's 2006 data set", according to Spinellis (User:Diomidis Spinellis). Wikipedia's growth rate peaked in late 2006, and it declined slightly in 2007 and in the first 7 months of 2008. According to the updated statistics, the incomplete:complete ratio has been dropping gradually since early 2006, and was less than 1.4 in January 2008. However, Spinellis argues that "As long as the ratio is above 1.0, growth as we know it should continue."

Earlier studies

A 2006 study, "Preferential attachment in the growth of social networks: The case of Wikipedia", showed that Wikipedia's early growth (through June 2004) demonstrated preferential attachment: highly-linked articles were more likely to be the target of new links. According to the authors, this indicates one of two things: either Wikipedia editors failed to take full advantage of the wiki model to create a more balanced network, or preferential attachment to highly-linked articles results from "the intrinsic organization of the underlying knowledge". The former case would indicate that Wikipedia's structure cannot overcome the "bounded rationality" of its contributors, each of whom may have limited knowledge beyond his/her area of activity.

The new study is consistent with a "bounded rationality" model, since the creation of new articles depends significantly on the topics editors choose to link to from existing articles. However, it also suggests a possible mechanism for achieving more balanced coverage, as less-covered areas will contain more redlinks, leading to more coverage and even more redlinks.

In contrast to many of the academic studies of Wikipedia, long-term observers within the community have tended to analyze Wikipedia's growth trends in terms of changing content conventions and social dynamics. For example, in a series of blog posts from 2007 ("Wikipedia Plateau?", "Unwanted: New articles in Wikipedia", and "Two Million English Wikipedia articles! Celebrate?") Andrew Lih (User:Fuzheado) examined some of the community factors limiting new article creation. An analysis of article creation and deletion logs by User:Dragons flight from late 2007 showed that for every three articles created, one article was deleted.

A more complete picture of how the size and activity level of the English Wikipedia community has evolved in recent months and years should be available once Erik Zachte updates his statistics website with a recent dump. Zachte was recently hired as a Data Analyst by the Wikimedia Foundation.



Board of Trustees fills Nominating Committee for new members

On 8 August, Wikimedia Board of Trustees Chair Michael Snow announced the filling of a Nominating Committee, responsible for nominating candidates to fill the four "specific expert" seats on the Board, two of which are currently occupied (and whose terms expire on December 31, 2008). The Committee's six members are:

Snow describes how the Committee will handle the search over the next few months:

Initially, once the committee has been briefed, it will work on identifying the types of expertise that are most needed and developing criteria to evaluate candidates. At that point, we will more actively solicit suggestions for potential board members, including recommendations from the community. But if you have input that would be useful as we look to strengthen the board's skills, feel free to contact us at any stage of the process.

The Nominating Committee will make suggestions for the four positions. One of those "specific expert" seats has been held by Jan-Bart de Vreede since December 2006, while another seat was filled in April, with the appointment of Stu West, the Board's treasurer and a former senior executive at Yahoo!, TiVo and JPMorgan Chase. The other two seats were not filled upon the Board's April restructuring, with the plan that they would be filled either in late 2008 or early 2009.

The positions are open to both community members and outsiders; however, the stated goal of the Board is to fill the seats with candidates who have an expertise in an area useful for supervision of the Foundation. The positions of de Vreede and West are not guaranteed, as all four seats will open up at the end of the year, and the Board could choose not to bring back one or both of the incumbent Trustees.

According to the question-and-answer page regarding the Board's restructuring in April, the Committee will make their suggestions by October 15, though it's unclear whether that guideline still applies. The Board will then have the option to appoint anyone listed on the Committee's list of candidates.

The one-year terms will begin on January 1, 2009, and end on December 31, 2009. In the unlikely event that the Board should make an appointment prior to the end of the year, the appointment would expire on December 31, 2008 along with all other "specific expert" seats, but could be renewed for another year.



Greenspun illustration project moves to first phase

The Philip Greenspun illustration project, first authorized in September 2007 and announced in November, has reached its first phase of collecting illustrations in exchange for payments to contributors. In this phase, with 50 image requests, nearly US$2,000 may be distributed, mainly in increments of $40.

The first round was announced by Brianna Laugher, who has been in charge of the project. Of the 50 images, 48 will offer $40 payments, while two (grape and apochromatic lens) will pay out $15.

The illustrations are tracked using a bug tracker on the toolserver. The tracker is otherwise used mainly for bugs with toolserver-based tools, but a queue was added specifically for illustrations in order to track each request separately and efficiently. For each image, a volunteer comes forward and is assigned to that image; upon uploading the image, Laugher and other volunteers ensure that the image illustrates the concept correctly. Once all requirements are satisfied, payment will be made to the volunteer.

For Round 1, the list of requests includes such wide-ranging illustrations as Abney effect, Jellyfish and a request for an animation of Vomiting.

Laugher apologized for the nine-month wait between the announcement and the start of the project, noting that she had made a few mistakes in underestimating the volume and complexity of requests, and did not delegate tasks enough. WAS 4.250 replied, "In short, you are human."

Round 1 runs through October 10. As of press time, 19 of the 50 images had been assigned, and of those 19, three were "in review".



WikiWorld: "George Stroumboulopoulos"

This comic originally appeared on October 22, 2007.

This week's WikiWorld comic uses text from "George Stroumboulopoulos" and "The Hour". The comic is released under the Creative Commons Attribution ShareAlike 2.5 license for use on Wikipedia and elsewhere.



News and notes

Wikipedian Jeffpw passes away

Wikipedian Jeffpw passed away on 7 August. Jeff, a resident of Amsterdam, made over 9,000 edits since joining Wikipedia in May 2005. He was active in the LGBT studies WikiProject, and was largely responsible for bringing James Robert Baker to featured article status, and bringing Daniel Rodriguez to good article status.

Over the past few weeks, Jeff had been coping with the death of his husband, Isaac, something that friends within the Wikipedia community had been helping him with. Jeff's death was brought to the attention of the community by his sister, Debbie, who left a note on Jeff's talk page, thanking his friends:

Hello, I'm Debbie. Many of you have given your support and condolences to Jeffpw over the last many weeks since he lost Isaac. I'm Jeff's sister (and know nothing about Wikipedia, except that my brother enjoyed the site, the friendship and support).

I hope I'm going about this in the right way, if not, I'm so sorry.

My brother died yesterday- I suppose of a broken heart. But he received so much compassion from all of you, and you all made these last days (almost) bearable to him. I am so grateful to you and HE was so grateful. He intended to acknowledge each condolence sent to him individually, but since he can't, I thought I'd let you know.

The world will have a little less color without him. I love him and miss him already.

This statement was confirmed by checkuser evidence.

In Jeff's memory, a user sub-page has been created, containing condolences from the community. Jeff was 46 years old.

2,500,000th article

On Monday, the 2.5 millionth English article was created. No official attempt was made to determine what the milestone article was, although it has been suggested that Joe Connor, one of 35 pre-written stubs added by Wizardman, might be the milestone article.

Tech job openings

Three new job openings at the Wikimedia Foundation were advertised this week, all within the technology department. The jobs are:

All positions are open until August 27.

Briefly



Dispatches: Reviewing free images

Wikipedia's best articles are often enhanced by images. Indeed, the featured article criteria ask for "images and other media where appropriate" and that, as for the use of all images in Wikipedia, they should have "acceptable copyright status. Non-free images or media must satisfy the criteria for inclusion of non-free content and be labeled accordingly." Similarly, the good article criteria require that images be "tagged with their copyright status" and that valid fair use rationales be provided for non-free content.

Images on Wikipedia are classified as either "free" or "non-free":

This dispatch discusses free images, and explains how to ascertain whether or not an image is actually free. A future Dispatch will cover the use of non-free images.

Although all Wikipedia content is expected to have acceptable copyright status, featured article candidates receive particular scrutiny for compliance with the image usage policy. Examining image licenses is not always straightforward. Ultimately, it is a matter of confirming that a copyright tag is present and that the information provided is sufficient to corroborate the tag that has been selected.

Copyright is a legal protection granting the creator of an original work – for our purposes here, an image – exclusive rights to that work. These rights prevent others from copying, redistributing or modifying the image without the author's permission. Copyright is generated automatically on the creation of such a work.

Copyright holders may choose to relinquish some or all of their rights, for example, by licensing their image so that others may copy, redistribute, or modify it without seeking permission. Such licenses are typically called "copyleft" licenses – a play on the word "copyright". Copyleft images are still under copyright; their creators have merely waived some, but not all, of the protection that copyright affords them.

Commonly used licenses include:

Public domain

Works in the public domain are not owned, controlled or otherwise restricted by any person, entity or law in a given jurisdiction. A public domain image may be freely used, altered and published by the public at large without condition.

Generally, an image enters the public domain when it is no longer eligible for copyright protection, usually a certain number of years after its first publication or after its creator's death. The length of time before copyright protection lapses varies greatly from country to country. Because the Wikimedia Foundation servers are located in Florida, images used on Wikipedia must be in the public domain in the United States.[1] Non-US images hosted on Wikipedia are not required to be public domain in their country of origin provided that they are public domain in the United States. Images hosted on the Wikimedia Commons, by contrast, must be public domain in both the United States and their country of origin; compliance with Commons policy, however, does not figure in the FA or GA criteria.

Copyright terms in the US vary according to several conditions. The most common encountered on Wikipedia are:

An image may also be voluntarily released to the public domain by its copyright holder or, in certain cases, may not be eligible for protection in the first place.

Reviewing images

Article reviewers generally need to take into account three aspects:

Policy-mandated elements

Wikipedia's image usage policy requires all images to have three pieces of information:

  1. A copyright tag
  2. A verifiable source
  3. An image summary

1. A copyright tag is a template, typically rectangular and appearing towards the bottom of an image page. The tag indicates the image's license or, if public domain, the reason the image is no longer eligible for copyright protection. The {{GFDL}} copyright tag, for example, appears as follows:

2. A verifiable source can take the form of a simple weblink, a citation for the published work from which the image was scanned, or the name of the author and a method of contacting them. The format and location of sourcing information on an image description page may vary. Optimally, images will use the {{Information}} template, which provides organized source and summary information. This template is not mandatory, however, and the information may be "hidden" within template boilerplate (example), if present at all.

3. An image summary provides the "necessary details to support the use of the image copyright tag". WP:IUP recommends the following:

Source

After confirming the presence of the three required elements, reviewers should also examine the source provided. Like prose quotations or statistics, images should have verifiable and reliable sourcing. By their very nature, image copyright tags (especially those claiming public domain) are "material challenged or likely to be challenged" and, consequently, subject to Wikipedia's verifiability policy (WP:V) and the necessity of utilizing reliable sources (WP:RS).

Consider, for example, the following copyright tag:

An image employing this copyright tag would be expected to have a reliable source explicitly indicating the author's date of death or dating the image such that no reasonable scenario would contradict the claim (e.g. the author of a painting dated 1740 could not possibly have been dead less than 100 years).
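The dating argument above is simple arithmetic, and a minimal sketch may make it concrete. The function names and the 100-year ceiling on how long an author could outlive a dated work are assumptions made for illustration; they are not part of Wikipedia policy or of any copyright statute, and the result is a sanity check rather than a legal determination.

```python
def latest_possible_death_year(work_year, max_remaining_lifespan=100):
    """Latest year the author could plausibly have died.

    max_remaining_lifespan is an illustrative assumption: the author is
    taken to have lived at most this many years after creating the work.
    """
    return work_year + max_remaining_lifespan


def death_date_claim_is_safe(work_year, review_year, required_years_dead=100,
                             max_remaining_lifespan=100):
    """True if even the latest plausible death year satisfies the tag's term."""
    latest_death = latest_possible_death_year(work_year, max_remaining_lifespan)
    return review_year - latest_death >= required_years_dead


# The painting dated 1740, reviewed in 2008: latest plausible death is 1840,
# which is 168 years before the review, so the claim holds without a death date.
painting_ok = death_date_claim_is_safe(1740, 2008)

# A work dated 1900 cannot be settled this way; a sourced death date is needed.
needs_source = not death_date_claim_is_safe(1900, 2008)
```

When the check fails, the claim is not necessarily wrong; it only means the image date alone cannot support the tag, so a reliable source for the author's date of death is required.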

The following are examples of correctly formatted, verifiable and reliable sourcing:

WP:V notes that "the appropriateness of any source always depends on the context". A Geocities site, for example, claiming that an image is public domain will probably not be considered sufficiently reliable to support the claim. Institutional and research sites (e.g. libraries, museums and archival sites such as the Library of Congress) are generally the most reliable.

Copyright law is often nuanced and esoteric; consequently, there are many concepts of which image authors and uploaders may not be aware. "Derivative works" and "freedom of panorama", two such concepts, can be counter-intuitive and, as such, are a common cause of unintentional copyright violations.

Derivative works
This image of the Fountain of the Great Lakes is a derivative work. The copyright status of the fountain/statue and the photograph itself need to be considered.

A derivative work is a copy, translation or alteration of an existing work – for example, a scan of a page in a book or a picture of a stuffed animal. The Wikimedia Commons' derivative works guideline contains an example situation which explains the dilemma such images pose to Wikipedia:

By taking a picture with a copyrighted cartoon character on a t-shirt as its main subject, for example, the photographer creates a new, copyrighted work (the photograph), but the rights of the cartoon character's creator still affect the resulting photograph. Such a photograph could not be published without the consent of both copyright holders: the photographer and the cartoonist.

Wikipedians or external sources may believe in good faith that a scan, photograph, or screenshot that they have made is an entirely original work, thinking that, because they themselves made the scan or took the photograph, the resulting image is "self-made" and, thus, "free". This is not necessarily the case. Reviewers should consider whether the subject of the image is under copyright – a consideration independent of the copyright status of the image itself.[2]

Although not mandatory, derivative images will, ideally, have summaries identifying the copyright status of both the image and its subject. The image to the right, for example, contains a secondary copyright tag for the fountain/statue. In its case, the image as a whole is "free" and acceptable on Wikipedia, as the subject is demonstrably in the public domain. Alternatively, consider an image of a Batman action figure. Although the image itself could have any copyleft license, the image as a whole would still not be acceptable on Wikipedia, as the figure has not been published with a "free" license.

Freedom of panorama

Freedom of panorama is a copyright law provision that allows for photographs of works (e.g. buildings and sculptures) permanently installed in public places to be freely published, even if the works are still under copyright. Although such an image is still a derivative work (i.e. a translation of an existing work), it does not infringe the rights of the work's author in countries with freedom of panorama. In other countries, however, the derivative image requires consent of the subject's author to be freely licensed.

The United States does not have freedom of panorama, although pictures of buildings are exempt.[3] Hence "self-made" images of publicly-situated works in the United States require consent of the subject's author, as described above. This revision of an image depicting Jaume Plensa's Crown Fountain in Chicago, for example, is incorrectly tagged. As a photograph taken in a country without freedom of panorama (the USA), it would require the permission of the fountain's creator for it to be published with a CC or GFDL license.

Examples

Self-made

Unless an image is deliberately employing pointillism, the appearance of dots when the image is magnified may be a cause for concern.

"Self made" images are generally those which are uploaded by their authors (i.e. Wikipedian-created images). In addition to checking for the policy-mandated elements, it is helpful to consider several aspects pertaining to provenance:

Reviewing images requires common sense. Consideration of provenance is an art, not a science, and the above notes should not necessarily be used as a "checklist". Whereas any one of these considerations may be meaningless by itself, a combination of issues may bring the validity of an image into question. A talk page note to the uploader asking for clarification or a Google images search, for example, may be appropriate or necessary to be more confident that the image is indeed "self-made".

Good image
In full compliance with Wikipedia image policy and properly licensed, the good people of Rhinebeck, New York are able to enjoy a sunny day.

The "self-made" image pictured to the right (as of this version) is in full compliance with Wikipedia policy and properly licensed.

  1. Is there a copyright tag? Yes, it asserts that Daniel Case has released the image as GFDL version 1.2.
  2. Is there a verifiable source? Yes, it asserts "self-made"; the uploader matches the author and a link to the author/uploader's profile is included.
  3. Is there an image summary (i.e. the "necessary details to support ... the image copyright tag")? Yes, the image has a complete {{information}} template.
  4. Does the provenance check out?
    • The image is high resolution (1,929 x 1,284 pixels – on the image description page, look below the image itself or in the "Dimensions" field of the file history).
    • The image contains camera metadata (on the image description page, under the metadata header).
    • The image does not appear to be posed, to have been taken in a studio, or to possess other "professional" traits that would raise red flags.
    • The image is dated September 29, 2007 (i.e. well after the claimed license – GFDL version 1.2 – came into existence).

Flawed image
The Japanese Tea Garden is far more lush than the image's summary.

The "self-made" image pictured to the right (as of this version) is not in compliance with Wikipedia image policy.

  1. Is there a copyright tag? Yes, it is using the {{GFDL-self}} copyright tag (note that, unlike the GFDL example above, this "self" variant begins with "I, the creator of this work").
  2. Is there a verifiable source? No, there is no explicit assertion of authorship, and, accordingly, no means of contacting the author.
  3. Is there an image summary (i.e. the "necessary details to support ... the image copyright tag")? No, the image only includes a description of the image's subject.
    • The image summary is essentially non-existent and, consequently, lacks necessary details. The copyright tag implies the uploader is the "I" in the copyright tag, but explicit indication is needed. Compare with the information present in the example above.
  4. Does the provenance check out?
    • The image is mid-resolution (at 800 x 600 pixels, it is just under 0.5 megapixels). Although this is a higher resolution than most web images, it is lower than expected and is also a common computer screen resolution (i.e. what one might find at a computer wallpaper archive site).
    • The image does not contain camera metadata.

The verifiable source and image summary elements can, in many "self-made" cases, be reasonably treated as one thing. The uploader (i.e. presumed author) would really only need to add a statement to the effect of "Author: J. Ash Bowie" to the summary to resolve the issue.
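The four questions applied to the two examples above can be sketched as a small routine that collects concerns for a reviewer to weigh. This is only an illustration: the `ImagePage` fields, the function names and the list of common screen sizes are hypothetical, nothing here queries a real image description page, and, as noted earlier, a single flag means little by itself.

```python
from dataclasses import dataclass

# Hypothetical set of common screen resolutions; not taken from any policy page.
COMMON_SCREEN_SIZES = {(800, 600), (1024, 768), (1280, 1024)}


@dataclass
class ImagePage:
    """Facts a reviewer has already read off an image description page."""
    has_copyright_tag: bool
    has_verifiable_source: bool
    has_summary: bool
    width: int
    height: int
    has_camera_metadata: bool


def megapixels(page):
    """Image size in megapixels (width times height, divided by one million)."""
    return page.width * page.height / 1_000_000


def review_concerns(page):
    """Return a list of concerns; an empty list means nothing was flagged."""
    concerns = []
    # The three policy-mandated elements.
    if not page.has_copyright_tag:
        concerns.append("missing copyright tag")
    if not page.has_verifiable_source:
        concerns.append("missing verifiable source")
    if not page.has_summary:
        concerns.append("missing image summary")
    # Provenance signals: prompts for a closer look, not proof of a problem.
    if (page.width, page.height) in COMMON_SCREEN_SIZES:
        concerns.append("dimensions match a common screen resolution")
    if not page.has_camera_metadata:
        concerns.append("no camera metadata")
    return concerns


# The two "self-made" examples, encoded from the descriptions above.
good = ImagePage(True, True, True, 1929, 1284, True)
flawed = ImagePage(True, False, False, 800, 600, False)

good_concerns = review_concerns(good)      # nothing flagged
flawed_concerns = review_concerns(flawed)  # four concerns to follow up on
```

Returning a list rather than a pass/fail verdict keeps the judgment with the reviewer, who can decide whether the combination of issues warrants a note to the uploader.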

Already published

Already published images are those which have been obtained from external websites, published works or are otherwise not the authorship of the uploading Wikipedian. Provenance considerations for these images include:

Good image
Hard work and diligence like that exhibited by Mackintosh and Spencer-Smith yield soundly-sourced images.

The image pictured to the right (as of this version) is in full compliance with Wikipedia policy and properly licensed.

  1. Is there a copyright tag? Yes, it uses the {{PD-US}} copyright tag.
  2. Is there a verifiable source? Yes, a full citation of the published source from which the image originated has been provided.
  3. Is there an image summary (i.e. the "necessary details to support ... the image copyright tag")? Yes, the citation contains the publication date (1920), which supports the copyright tag's assertion of first publication before January 1, 1923.
  4. Does the provenance check out?
    • The image has expected technical qualities. The image is black and white and generally appears to be old.
    • The image has reasonable subject matter. Mackintosh and Spencer-Smith were indeed in Antarctica before 1920 (the reported publication date).

Flawed image
Lacking a verifiable source and image summary, Emperor Valerian is humiliated in the ensuing chaos.

The image on the right (as of this version) is not in compliance with Wikipedia image policy.

  1. Is there a copyright tag? Yes, it is using the {{PD-art}} copyright tag (claiming the image is in the public domain because the author has been dead more than 70 years).
  2. Is there a verifiable source? No, a source (e.g. web link or published source) has not been provided.
  3. Is there an image summary (i.e. the "necessary details to support ... the image copyright tag")? No, the information provided is not adequately supported. The names of the uploader and asserted author do not match, which indicates the image is not self-made and, thus, there exists an external (non-Wikipedia) source that needs to be cited.
    • Without a source, we cannot confirm that the asserted author (Hans Holbein the Younger) is indeed the original author.

Although this is likely public domain, verifiability, not truth, is the threshold for inclusion. Without a source confirming the author, this image could just as easily be a contemporary work.

Common misconceptions

Notes

  1. ^ WP:IUP: "U.S. law governs whether a Wikipedia image is in the public domain"
  2. ^ Copyright law contains a provision and exception for "article[s] having an intrinsic utilitarian function". A picture of a car or a chair, for example, would not be problematic. See the Commons guideline for elaboration.
  3. ^ 17 USC 120(a)
  4. ^ United States Copyright Office (2006). Copyright Office Basics. Retrieved August 1, 2008.

See also



Features and admins

Administrators

Two users were granted admin status via the Requests for Adminship process this week: WilliamH (nom) and Nev1 (nom).

Bots

Ten bots or bot tasks were approved to begin operating this week: AdminStatsBot (task request), ArkyBot (task request), Polbot (task request), Danielfolsom2 (task request), MelonBot (task request), MelonBot (task request), タチコマ robot (task request), DinoBot2 (task request), Erwin85Bot (task request), and Plasticbot (task request).

Four articles were promoted to featured status this week: Robert Sterling Yard (nom), Robert F. Kennedy assassination (nom), Greater Crested Tern (nom), and Proxima Centauri (nom).

Five lists were promoted to featured status last week: List of UEFA Cup winners (nom), List of Ontario birds (nom), 2008 WWE Draft (nom), List of Nine Inch Nails awards (nom), and NBA All-Defensive Team (nom).

Two topics were promoted to featured status this week: State touring routes in Warren County, New York (nom) and Smallville (season 1) (nom).

No portals were promoted to featured status this week.

The following featured articles were displayed this week on the Main Page as Today's featured article: Ann Arbor, Michigan, William Wilberforce, History of timekeeping devices, Yao Ming, PowerBook 100, Matthew Brettingham, and Parapsychology.

No articles were delisted this week.

No lists were delisted this week.

No topics were delisted this week.

The following featured pictures were displayed this week on the Main Page as picture of the day: Felbrigge Psalter, Khonds, Vietnam Veterans Memorial blueprint, The Spinning Dancer, Paris, Pasture Day Moth, and Flowers of the Asteraceae family.

Two sounds were featured this week: The Lost Chord (nom) and Israel in Egypt (nom).

No featured pictures were demoted this week.

Six pictures were promoted to featured status this week and are shown below.



Bugs, Repairs, and Internal Operational News

This is a summary of recent technology and site configuration changes that affect the English Wikipedia. Note that not all changes described here are necessarily live as of press time; the English Wikipedia is currently running version 1.44.0-wmf.8 (f08e6b3), and changes to the software with a version number higher than that will not yet be active. Configuration changes and changes to interface messages, however, become active immediately.

Fixed bugs

New features

Configuration changes

Other technology news

Ongoing news



The Report on Lengthy Litigation

The Arbitration Committee opened one new case this week, and did not close any cases, leaving four currently open. There is also a motion open on a previous case.

Motion

New case

Evidence phase

Voting phase

Motion to close


















