The Signpost

[Image: LOD_Cloud_-_2024-12-31.png, by John P. McCrae, CC BY 4.0]
Technology report

Wikidata Graph Split and how we address major challenges

Disclosure: I have a conflict of interest to favor all technology which I describe below, as I develop it as Wikimedian in Residence at the University of Virginia School of Data Science.

TL;DR summary – Wikidata has had a crisis since 2015, and in hindsight I wish we had talked about it sooner. More generally, I think that our Wikimedia Movement has a systemic problem of failing to identify and address our challenges. Comment below if you recognize missteps here in other Wikimedia systems.

If we had a problem, then would we talk about it?

About 1/3 of Wikidata items have always been metadata for scholarly articles from the WikiCite project, and now this is split from the main Wikidata graph.
The Linked Open Data cloud shows how open datasets link to other datasets. Since at least 2007 Wikimedia has been the most reused data resource. Consequently, any research institution which indexes its scholarly metadata in Wikidata is much more visible.

On 20 January 2026, the Wikimedia Foundation finalized the split of Wikidata into two collections of data, or "graphs". This Wikidata Graph Split affects the hundreds of regular contributors and thousands of regular tool users in the WikiCite community, who see value in curating a Wikimedia citation database. Since 2015, WikiCite's popularity has exceeded the limits of Wikidata, or broken Wikidata. Consequently, Wikidata has turned away new users, institutional partnerships, financial investments, and major content contribution projects, because our infrastructure lacks the capacity to accept data uploads that are small by contemporary standards. All of us Wikipedia editors understand that there are technical limitations throughout the Wikimedia projects, and to me Wikipedia's commitment to free and open-source software is endearing.

But in the case of Wikidata's limits, the problematic part was that since 2015, we tolerated uncertainty about if and when Wikidata's capacity would increase. We turned away users and projects for 10 years, and failed to signal a crisis and emergency. While I can understand Wikimedia governance planning fixes on a schedule in the context of our scarce resources, I want confidence that we have shared understanding of our challenges, and to reduce long-term uncertainty about if and when our tools will function as expected. If we had a major problem with a Wikimedia platform, then do we have the community infrastructure to talk about it?

My feeling is that our Wikidata challenge was not technical, but rather was about interpersonal relationships. For the future, I want confidence and trust that when we Wikimedia editors have major challenges, then we have a community governance system to recognize and discuss them. Look here with me at the circumstances which have slowed Wikidata growth for some years, and be hopeful with me about the success plan to fix things by summer 2027 when the Wikimedia Foundation will migrate Wikidata's backend to a new SPARQL engine.

Why anyone should care about WikiCite or Scholia

Scholia is a scholarly profiling service using Wikidata and affected by the split. Findings from this 2025 user survey included that users are enthusiastic to browse scientific research through Scholia as a Wikimedia research service.
Wikimedia annual plans all prioritize investing in the recruitment of more Wikipedia users. At the same time, we have gone many years without discussing Wikidata's limits as a major barrier to growth.
Scholia profiles for people visualize their scholarly publications, topics of works, co-authors, software use.

WikiCite is important for the Wikimedia community because it has been among the most popular Wikidata projects in terms of user count, content produced, investment attracted, university partnerships, active discussions, count of non-editor users, and stirring of passion. Universities are in the business of doing research, but lack an easy way to list their own researchers and own research publications. Only some universities can afford subscriptions to scholarly profiling services such as Web of Science or Scopus, but the WikiCite community seeks to provide this for free, to everyone, by using Wikidata to match citation metadata to researchers, institutions, and topics. The WikiCite project attracts contributors because it is easy to imagine a Wikipedia-aligned scholarly profiling service becoming fundamental to global research infrastructure.

WikiCite is the project to curate scholarly metadata in Wikidata. It includes the editing project, the community of editors and conferences, and outreach efforts through which institutions contribute their data, such as the WikiProject Program for Cooperative Cataloging project which recruited 50 universities to index their research in Wikidata. Only a handful of projects in the Wikimedia Movement have hundreds of editors and a portfolio of institutional partnerships. Editors come to WikiCite for multiple reasons, but the project has a unique draw: universities index their faculty and research publications in Wikidata both for Wikimedia community curation and because that indexing is a good investment, as it surfaces the university's research output as linked open data in all other Internet services and AI which index research.

Scholia is a friendly web interface for accessing WikiCite collections. It is friendly in the sense that it has more than 400 scholarly queries already formatted, for example, list of a researcher's publications, list of people and research at a university, or profile of research on a topic. This sort of service is "scholarly profiling", and to sort this data, one needs the "scholarly graph of metadata" as Linked Open Data connecting topics to scholarly articles to authors to their institutions, co-authors, software, datasets, grants, and everything else. Scholia and WikiCite are the Wikimedia projects for scholarly profiling, and alternatives to services including Google Scholar, Web of Science, or OpenAlex. I am part of the Scholia team, and I am biased to favor it, but I think the WikiCite approach to connecting Wikimedia projects to a global scholarly database is one of the best and most popular project ideas that the Wikimedia Movement has developed. The WikiCite community includes a base of power users who also find value in this approach, as communicated in our 2025 survey of Scholia.

Exceeding the limits of Wikidata

In May 2024, The Signpost shared my story that "Wikidata would soon split as the sheer volume of information overloads the infrastructure". Disclosure, again: I am a Wikimedian in Residence who develops Wikidata content as a university researcher, so please note that I have an employer conflict of interest in this op-ed and in Wikidata's perpetual growth.

The split divided WikiCite content, which was 1/3 of the content of Wikidata, from everything else in Wikidata. The Wikimedia Foundation and Wikimedia community actually did discuss this, a lot. I really appreciate the Wikimedia Foundation staff who did me many favors by meeting with me monthly since 2024 by video, by email, at conferences, and through referrals. Copied from the 2024 Signpost article, here again are the major discussion reports. The insight to gain from these reports is that there was long-term recognition of a major challenge, while all along Wikidata remained at reduced growth with no planned year in which we would increase capacity. No one did anything incorrectly, and delaying the decision always made sense at the time.

I see parts of the Wikimedia Movement that invest heavily in growing the editor community, and other parts of the Wikimedia community where I feel that technical challenges are incompatible with editor recruitment. In my view, Wikidata has been closed and in limbo for 10 years, but no community group ever organized to make a leadership statement of when Wikidata might update, and how we should make multi-year plans. There were thousands of hours of user time spent talking about the problem. We were unable to establish a governance plan to evaluate the cost of delay versus the scheduling of a decision. The worst part of this to me was that each year, there was the misunderstanding that someone was about to fix the problem, and that Wikidata service would expand. If this is a one-off in the Wikimedia Movement, then that might be tolerable, but I expect that if we had more robust community governance, then we might have a public ranked list of Wikimedia greatest challenges, and some estimate of the costs of decisions to address those challenges or delay.

Wikidata Graph Split

The Wikidata Query Service Split and its Impact on the Scholarly Graph (Q137374886) is documentation for institutions which need an explanation of the split.
Wikimedia servers use Grafana to track resource use. Here, the Wikidata Query Service has normal usage in November 2025 – January 2026.
Now that scholarly content is split into its own graph, it is hard to access. Use, which had been too high to manage, has dropped to perhaps none at all in November 2025 – January 2026.

I lack insight here, but now that Wikidata is split into two graphs, I am unaware of any individual or institutional users of the scholarly graph, which was supposed to be the solution to sustain Wikimedia community access to this content.

To clarify, Wikidata has two familiar parts: Wikibase, where users edit Wikidata; and Blazegraph, which hosts the query service. Wikibase is the data-oriented variation of MediaWiki; it is what most people think of when they are familiar with Wikidata, as it is the wiki for editing data. Wikidata's Wikibase is not split. The other part of Wikidata is its query engine, and that is split.

One of the splits is the Wikidata Query Service, now minus scholarly articles after the split.

After the graph split, now there is the scholarly graph, which is an endpoint containing only citation metadata.

This is jumping ahead a bit, but the Scholia team found the scholarly graph unusable, and migrated the full graph to a Qlever query engine. Anyone wanting to query a single graph can do so at

While WikiCite is a major Wikidata project, Wikidata is such a large platform that most Wikidata users do not curate citations, and will not notice the Wikidata Graph Split. For those who do want citation data through the Wikidata Query Service, the Wikimedia platform solution is to write a two-part query which seeks some data from the Wikidata main graph, then gets citation data from the Wikidata scholarly graph. In practice, this is too difficult. If there is a user community for the Wikimedia-hosted scholarly graph, then I have not yet seen their projects; please, someone link to them in the comments section of this article.
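
To make the two-part query concrete, here is a minimal sketch of what such a query might look like when built in Python. It is an illustration under stated assumptions, not the documented Wikimedia recipe: the `wdsubgraph:` prefix IRI and the `scholarly_articles` subgraph name are assumed and should be checked against the current graph split documentation, and the helper name `build_federated_query` is hypothetical. The properties used are P31 (instance of), P50 (author), and P1476 (title).

```python
# Sketch: build a two-part (federated) SPARQL query after the graph split.
# ASSUMPTION: the prefix IRI and subgraph name below are illustrative and
# should be verified against the current graph split documentation.
SCHOLARLY_SUBGRAPH_PREFIX = "https://query.wikidata.org/subgraph/"
SCHOLARLY_SUBGRAPH_NAME = "scholarly_articles"


def build_federated_query(author_qid: str, limit: int = 10) -> str:
    """Return SPARQL that checks an author in the main graph, then
    fetches that author's articles from the scholarly graph."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wdsubgraph: <{SCHOLARLY_SUBGRAPH_PREFIX}>

SELECT ?article ?title WHERE {{
  VALUES ?author {{ wd:{author_qid} }}
  # Part 1: answered by the main graph (people, institutions, topics).
  ?author wdt:P31 wd:Q5 .
  # Part 2: delegated to the scholarly graph (citation metadata).
  SERVICE wdsubgraph:{SCHOLARLY_SUBGRAPH_NAME} {{
    ?article wdt:P50 ?author ;
             wdt:P1476 ?title .
  }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Print the query text; paste it into a query service to try it.
    print(build_federated_query("Q42"))
```

The point of the sketch is the shape of the query: the author has to know which graph holds which triples and wrap the scholarly part in a `SERVICE` block, which a query against a single unsplit graph would not need.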

The Scholia team hosts virtual hackathons where anyone can put issues or problems in queue for the volunteer developer team to address in the next round. The April, November, and December events from 2025 all have documentation on what volunteers had to organize to prepare for the January 2026 graph split. There is a list of affected tools, some of which have updates. The Scholia team created Wikidata Query Service graph split documentation to describe how anyone should respond to the Wikidata graph split. It is extraordinary that volunteers put these events and this labor together, but it is also common across Wikimedia projects for volunteers to organize responses and adaptations to keep tools functional when the Wikimedia Foundation changes its platform.

Blazegraph migration

Scholia 2026 Compliance with SPARQL 1.1 (Q138233208) reports that Scholia is updated to prefer standard-compliant SPARQL 1.1 in the Qlever SPARQL engine over the older, customized Wikidata SPARQL for Blazegraph.

The thing that everyone should know about Wikidata and Blazegraph is that Amazon acqui-hired everyone at the Blazegraph nonprofit organization, so it has not had a major update since 2015. Wikidata has been in trouble since that time in 2015.

Wikidata was established in 2012 as the linked data complement to Wikipedia's prose, and was part of our strategy to keep Wikimedia projects technologically advanced. The software backend of Wikidata's query service is the scrappy Blazegraph, which is free and open-source software. At the time Wikidata adopted it, Blazegraph already had its own independence, development team, and funding to sustain it. While no one can buy or close open-source software, companies can hire every developer and expert on the software. Amazon acquired the Blazegraph team soon after Wikidata had committed to Blazegraph as its SPARQL engine for queries. Amazon Neptune is based on the open Blazegraph software, but is itself proprietary. Consequently, Wikidata's SPARQL engine backend has not had a significant update since Wikidata established its SPARQL endpoint in 2015.

While the Wikidata graph split relieves the Wikimedia Foundation servers of the intense computation required by a larger dataset, the graph split is not intended as a solution, but just as a way to delay the crash by 2 years, assuming that we also keep restricting data imports and deterring expected use. Blazegraph is now abandoned technology and inferior to alternatives. The planned solution to ready Wikidata for next-generation editing is to migrate Wikidata's SPARQL engine to another database by summer 2027.

In September 2025, the Wikimedia Foundation announced a schedule for a Wikidata Query Service backend update. It is good news for Wikidata editors that there is a newly appointed Wikidata Platform WMF staff team doing these changes. Everyone should support them and wish them all success. They are available to meet during scheduled office hours.

Another major change which is timely now is that when Wikidata migrates to a new SPARQL engine, we could update to standard SPARQL 1.1. The Wikidata Query Service has been using an older version of SPARQL customized only for Wikidata. The Wikidata version of SPARQL is easy to use, especially for managing multiple languages, but using customized SPARQL also has drawbacks. One drawback is that if we migrate to another system, then we must either redesign the customization or require that every single Wikidata tool and query be updated to standard SPARQL. The previously mentioned list of tools affected by the graph split may be small in comparison to the changes needed if we migrate to standard SPARQL.

We in the Scholia team migrated to an option which uses standard SPARQL by modifying about 400 queries.
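
To illustrate the kind of change involved, here is a minimal sketch in Python of a check that flags Wikidata-specific constructs in a query, alongside one query written both ways. This is not Scholia's actual migration tooling, which the text above does not describe; the function name `find_nonstandard_features` and the pattern list are hypothetical and deliberately incomplete. The example contrasts the Wikidata label service with a plain `rdfs:label` lookup filtered by language, which is one common standard-SPARQL substitute.

```python
import re

# Constructs that are Wikidata Query Service / Blazegraph extensions rather
# than part of SPARQL 1.1. ASSUMPTION: this list is illustrative only.
NONSTANDARD_PATTERNS = {
    "label service": re.compile(r"SERVICE\s+wikibase:label", re.IGNORECASE),
    "Blazegraph service parameter": re.compile(r"\bbd:serviceParam\b"),
    "Blazegraph query hint": re.compile(r"\bhint:\w+"),
    "MediaWiki API service": re.compile(r"SERVICE\s+wikibase:mwapi", re.IGNORECASE),
}


def find_nonstandard_features(query: str) -> list[str]:
    """Return the names of Wikidata-specific constructs found in the query."""
    return [name for name, pattern in NONSTANDARD_PATTERNS.items()
            if pattern.search(query)]


# The customized style: labels come from the Wikidata label service.
BLAZEGRAPH_STYLE = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

# A standard SPARQL 1.1 equivalent: fetch the English label explicitly.
STANDARD_STYLE = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  OPTIONAL { ?item rdfs:label ?itemLabel . FILTER(LANG(?itemLabel) = "en") }
}
"""

if __name__ == "__main__":
    print(find_nonstandard_features(BLAZEGRAPH_STYLE))
    print(find_nonstandard_features(STANDARD_STYLE))
```

The standard form is more verbose and loses the label service's automatic language fallback, which is part of why the customized version has been popular.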

Selection of next-generation SPARQL engine

Benchmarking SPARQL Engines on Wikidata Queries (Q137374978) reports Wikimedia community-supported testing of various Blazegraph replacements.

There is an exciting competition happening right now to decide the next SPARQL engine for Wikidata. The Wikimedia Foundation has selected two candidates: Qlever and Virtuoso. If all goes well, we should have a revived Wikidata by mid 2027 with greatly expanded capability for processing data and inviting institutional partnerships. Both of these options have 10–100× the capacity of Blazegraph, and are viable alternatives. Other candidates have already been disqualified after earlier testing.

The Scholia team has already made a commitment to Qlever. To avoid federated queries, there is a single Wikidata graph containing everything at https://qlever.scholia.wiki/ , and hosted by the Qlever team at the University of Freiburg. Virtuoso is a great candidate also and both should be tested; I am just sharing how things turned out.

Wikimedian Peter F. Patel-Schneider has been benchmarking various engines with 7 different competition benchmarking query sets, each of which is a large dataset designed to stress the systems with queries. In mid-February 2026 the Wikidata Platform team posted their WDQS Triple Store Evaluation using 3 of the simpler of those 7 datasets, and published their own benchmarking results. Communication between the Wikimedia communities and the new Wikidata Platform team is starting and ongoing. Wikimedia Switzerland has been supporting Wikimedia community engagement in the transition process, including by sponsoring research in this report and by hosting WikiCite 2025.
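
For readers unfamiliar with what benchmarking a query engine involves, the following Python sketch shows the basic idea: run each query in a set several times against an engine and record the median wall-clock time. It is not the method used in either benchmarking effort described above. The `benchmark` function and the `fake_engine` stand-in are hypothetical; a real run would replace `fake_engine` with a function that sends the query to a SPARQL endpoint.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def benchmark(run_query: Callable[[str], object],
              queries: Iterable[str],
              repeats: int = 3) -> Dict[str, float]:
    """Return the median wall-clock seconds taken by each query."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, float] = {}
    for query in queries:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)  # result is discarded; only time is measured
            timings.append(time.perf_counter() - start)
        results[query] = statistics.median(timings)
    return results


def fake_engine(query: str) -> int:
    # HYPOTHETICAL stand-in for a real SPARQL client call.
    return len(query)


if __name__ == "__main__":
    query_set = [
        "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
        "ASK { ?s ?p ?o }",
    ]
    for query, seconds in benchmark(fake_engine, query_set).items():
        print(f"{seconds:.6f}s  {query}")
```

Passing the engine in as a function means the same query set can be timed against each candidate engine without changing the harness, which is the property that makes results comparable.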

How we talk about challenges

The solution that I want for the graph split, and for many other existing Wikimedia Movement challenges, is simply to be able to see that there is some group of Wikimedians somewhere who have active communication about our challenges. I want public communication from leaders who acknowledge challenges and who have the social standing to publicly discuss possible solutions. I want to see that someone is piloting the ship upon which we all sail, and which no one would replace if it ever failed and sank. For lots of issues at the intersection of technical development and social controversy – data management, software development, response to AI, adapting to changes in political technology regulation – I would like to see Wikimedia user leadership in development, and instead I get anxious about all the communication disfluency that we experience. Ten thousand of us or so participated in the 2018–2020 Wikimedia Movement Strategy, which had the goal of improving our governance infrastructure such that if we ever had a major problem, then we would quickly identify it and discuss it without fear. The Wikidata Graph Split is not the story here. The story here is that so much in the Wikimedia Movement is fragile, and that when we have major challenges, networks like WikiCite are unable to create chains of decision making to address them.

I appreciate all the effort that Wikimedia Foundation staff put into collaborating with the WikiCite community for the transition. The Wikimedia community is extraordinary for community participation in all levels of governance. The challenges we have are normal for Internet tech platform development anywhere, and are the way that user communities experience software updates.

What you can do

Happy Valentine's Day, everyone love one another

Participate in on-wiki conversations to make decisions.

  • If you want to talk with the Wikidata Platform team, then there are migration office hours.
  • Wikidata is currently having its boldest discussion on notability criteria. Is WikiCite in scope? What about locations in OpenStreetMap? Should we graph split biographies? Can we do WikiCite, but for Internet Archive holdings instead of scholarly publications? Is it finally time to import all proteins and all astronomical objects?
  • Also comment on mass editing policy, and other Wikidata requests for comment.
  • The Wikimedia Foundation and Wikimedia Deutschland agreed off-wiki that even after migration from Blazegraph, the split graphs will not be rejoined, even if the new platform has capacity. There is no public discussion about this, and I want one. Please comment below or message me privately if you want to help arrange some public discussion for this.
  • The Wikimedia Foundation operates Wikidata's API, and Wikimedia Deutschland operates everything else Wikidata. They share power and money with each other. I do not know anyone in authority for Wikidata issues at either place, but right now is Valentine's Day time of year and I think they could be better pals. If anyone can, get interviews with representatives from both and get them to say publicly that each one wants the other to perpetually have all the power and control and money that they currently do. If either objects, then get them to talk it through.
  • Please sign to support meta:WikiCite (3), which is a proposal to establish WikiCite, the citation database, as an official Wikimedia project.



Discuss this story


The sad thing is that if you look at the financial information for the Wikimedia Foundation, you'll see that the foundation has plenty of money to throw at problems. Specifically, between net assets of the organization, and money in a separate endowment fund, there is at least $300 million for such things as increasing data storage and processing capacity, things that can and should be done well before a crisis arrives. -- John Broughton (♫♫) 18:00, 17 February 2026 (UTC)

"The thing that everyone should know about Wikidata and Blazegraph is that Amazon acqui-hired everyone at the Blazegraph nonprofit organization, so it has not had a major update since 2015. Wikidata has been in trouble since that time in 2015." I think this was a major failure of vision on the part of WMF. Nothing was stopping WMF from taking over the project. They could have hired people to work specifically on it. Open source is not just a place to get software without paying for it. You're expected to contribute back to make it meet your needs. We have hundreds of people working on MediaWiki. Blazegraph was, as far as I can tell based on GitHub stats, developed by basically just two people. WMF could have hired some folks to replace Systap. Bawolff (talk) 18:24, 17 February 2026 (UTC)

"Both of these options have 10–100× the capacity of Blazegraph." I think an "allegedly" deserves to be added here. Blazegraph, according to marketing material, also supports significantly higher capacity than we are at currently. It's easy to claim theoretical high capacity when nobody puts it to the test. Bawolff (talk) 18:29, 17 February 2026 (UTC)

Wikipedia:Wikipedia Signpost/2026-02-17/Technology_report