The battle to destroy Wikipedia's biggest sockpuppet army
Wikipedia editor and self-professed “bird geek” DocTree spends most of his time on the world’s largest encyclopedia editing the pages for long-dead ornithologists. So it was somewhat unusual when, in August 2012, he found himself working on the page for “CyberSafe,” a high-tech digital encryption company based out of Middlesex, England, with a pronounced dearth of ornithological relevance.
Someone on Wikipedia had nominated the page for deletion, and DocTree, who sometimes participates in deletion discussions on topics that fall outside his interests, decided to pitch in.
There are a number of possible reasons for a Wikipedia page to be deleted, but the most common justification is that it lacks “notability.” This is a loose standard that essentially asks the question: Is this subject important enough for a Wikipedia article? One metric editors use is the set of citations at the bottom of the page. The idea is that if a subject has been reported on thoroughly by reputable sources, preferably a seasoned news organization or publisher, then it probably deserves an article.
At first glance, the CyberSafe page seemed to meet Wikipedia’s notability requirements. Every fact was backed up with citations to multiple news outlets. Then DocTree dutifully clicked on the links. The facade quickly crumbled.
“None of the references really dealt with CyberSafe,” DocTree told the Daily Dot. “The sources dealt with Internet security in general, but not CyberSafe.”
Whoever had created the page had done so with the assumption that most people wouldn’t bother actually clicking on the citations.
That was the first sign that something was off, but things got weirder. Numerous people showed up to defend CyberSafe, to argue that it shouldn’t be deleted. If you checked their editing history, however, you found either nothing at all or a series of edits to pages that fit the profile of CyberSafe—small companies or individuals who likely didn’t warrant Wikipedia pages. These CyberSafe defenders made very similar arguments, almost as if they were written by the same person.
“It was all smoke and mirrors,” said DocTree. “When I saw how similar the arguments were, it just didn’t look right.”
He submitted all five user accounts for a sockpuppet investigation, which checks to see if a single person is using multiple accounts to promote an agenda. Wikipedia’s own definition describes a sockpuppet as “an online identity used for purposes of deception.” Using sockpuppets is a major sin on Wikipedia because it fundamentally undermines the encyclopedia’s credibility: The community ought to be self-governing, but if one user controls an army of automatons who parrot his or her opinion in every discussion, how can you trust any decision?
DocTree had no idea his routine investigation was about to uncover the largest sockpuppet network in Wikipedia history.
Only a few people on Wikipedia are trusted with sockpuppet investigations. Like forensic investigations in the real world, they tend to reveal the most sensitive information. In this case, that means a Wikipedia user's IP address, the unique number that identifies the computer network you use to connect to the Internet. For any logged-in Wikipedia user, IP addresses are usually hidden.
The Wikimedia Foundation, the nonprofit that oversees Wikipedia and its sister sites, entrusts a small team of admins with the responsibility, but only after vetting them. The organization then runs a tool called CheckUser. This checks the IP addresses of all the accounts, but it also inspects a number of other markers you leave behind whenever you visit a website. A trained expert can tell what browser you viewed the page with, what operating system your computer runs on, and even whether you’ve updated to the latest version of Adobe Flash Player.
In a sockpuppet investigation, the goal of the CheckUser is to determine whether multiple Wikipedia editors were writing not only from the same IP address, but from the same exact computer.
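To make the idea concrete, here is a minimal sketch of that kind of comparison. It is not the actual CheckUser tool, whose internals are not described here. The record format, the account names, and the function names are all hypothetical, and the "fingerprint" is assumed to be nothing more than an IP address paired with a browser and operating system string.

```python
from collections import defaultdict
from typing import NamedTuple


class EditRecord(NamedTuple):
    """One logged edit. Hypothetical format, not CheckUser's real data."""
    account: str
    ip: str
    user_agent: str  # assumed to encode browser and operating system


def group_by_fingerprint(records):
    """Map each (ip, user_agent) pair to the set of accounts seen using it."""
    groups = defaultdict(set)
    for rec in records:
        groups[(rec.ip, rec.user_agent)].add(rec.account)
    return groups


def shared_fingerprints(records):
    """Keep only the fingerprints that more than one account has used."""
    return {
        fingerprint: accounts
        for fingerprint, accounts in group_by_fingerprint(records).items()
        if len(accounts) > 1
    }


# Illustrative data only; the IP addresses come from documentation ranges.
records = [
    EditRecord("AccountA", "203.0.113.7", "Firefox/23.0; Windows NT 6.1"),
    EditRecord("AccountB", "203.0.113.7", "Firefox/23.0; Windows NT 6.1"),
    EditRecord("AccountC", "203.0.113.7", "Chrome/29.0; Mac OS X 10.8"),
    EditRecord("AccountD", "198.51.100.4", "Firefox/23.0; Windows NT 6.1"),
]

suspects = shared_fingerprints(records)
```

In this toy data, AccountA and AccountB share both an IP address and a browser profile, so they are flagged together. AccountC shares only the IP address, which could simply mean a shared office or home network, and AccountD shares only the browser profile. A match like this is a lead for a human investigator rather than proof.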
After the initial investigation, the CheckUser concluded two things: that these accounts were, indeed, sockpuppets, and that the problem extended far beyond the five accounts DocTree had identified. From there, it was up to a cadre of dedicated Wikipedia editors and admins to hunt down the larger sockpuppet network, following a breadcrumb trail of pages and edits to identify new potential offenders and turn them over to CheckUser for confirmation.
The talk page laying out the entire investigation is fascinating, not just because it gives you insight into how these investigations are carried out, but also because it lets you appreciate the sheer magnitude of the legwork involved.
“I spent more time than I'd like to admit gathering evidence, writing the long-term abuse report, and nominating articles for deletion—easily several days,” Wikipedia editor Rybec told me via email.
Of everyone, administrator Dennis Brown probably spent the most time ferreting out sockpuppets. “I literally put in 12 hours yesterday, verifying/tagging/blocking 199 socks, and I have about 100 more to go, plus a few dozen unlisted socks to research,” he wrote. “I'm a bit crispy around the edges at the moment, and I'm probably 25% done with the work at best.”
Entries in the investigation page stretch from Aug. 14, 2012 to Sept. 20, 2013. If you scroll down you can see how the CheckUser made its way through the unearthed accounts and stamped each one “confirmed,” “technically indistinguishable,” “likely,” or “inconclusive.” The vast majority of the accounts came back “confirmed.”
At one point, one of the users accused of sockpuppetry, Bioengineer+attorney, appeared in the talk page to defend himself.
“Quite a warm welcome to a new Wikipedian!” he wrote.
“How about these facts. I am an [attorney]. I am a BSE in engineering. How about you? In a court this is called a preponderance of the evidence in favor of defendant. All I've learned is that (1) new users are unwelcome, and (2) the protocol is to attack if someone disagrees. Noted. I don't want to wear out the welcome mat!.”
The investigators were unmoved. “Odd, I worked in the field, and in a civil case the burden of demonstrating a preponderance is upon the plaintiff, whereas in a criminal case (the proper analogy) the threshold is entirely different,” Brown responded. “Very odd choice of phrasing, coming from a lawyer.”
By September of this year, the investigation talk page included over 900 edits from more than 50 authors. It had unearthed 323 user accounts as confirmed sockpuppets with an additional 84 suspected. The only other known sockpuppet network of this size and scope was the case of Bambifan101, a still-ongoing investigation that located 236 suspected and 249 confirmed accounts. In other words, this was one of the largest—if not the largest—discovered sockpuppet networks in Wikipedia history.
Nearly all the accounts uncovered by the investigation had a few things in common: Most of the pages created were about companies and living persons; the pages were generally positive and promotional in nature; they often cited articles that were written on websites that anybody could contribute to. The oldest account associated with the sockpuppet network was called "Morning277" and had been active since November 2008. Morning277 had a busy life on Wikipedia, making more than 6,000 edits.
In some cases the citations were misleading: an article might cite CNN, but the source turned out to be CNN’s iReport, a website that allows citizen journalists to upload unvetted content. Most damningly, the users would edit these pages, often simultaneously, without ever using the talk pages to communicate their intentions.
“For more than one person to be working on an article at the same time and never have a disagreement, that’s rare,” DocTree told me. “Almost always, if there are legitimate edits, there will be discussion on the talk page.”
In addition to banning all the confirmed sockpuppet accounts, administrators also deleted many of the pages that were created by the Morning277 network. In a matter of days, hundreds of pages comprising thousands of words that had likely taken untold hours to compose blinked out of existence.
Who would go through all this trouble to concoct such an elaborate system of deception?
There are a number of reasons why a user might create fake accounts, but given the promotional nature of the edits and their subjects—mostly small companies, many of which were based in Silicon Valley—it seemed obvious to all that the Morning277 network was made up of paid editors who had been hired by these companies to create pages for them.
Wikipedia has had a long, uneasy relationship with paid contributors. Many purists believe that a Wikipedia page’s subject, or anyone paid by that subject, has no business editing that page because that person’s objectivity is compromised.
In 2006 a Wikipedia user named Gregory Kohs launched MyWikiBiz, a company that promised to create and edit your Wikipedia page for a fee. Shortly after MyWikiBiz’s debut, Jimmy Wales, the cofounder of Wikipedia, personally stepped in and banned Kohs’s account. Kohs told a reporter that Wales called him and said the paid editing was “antithetical” to Wikipedia’s mission.
Eventually, much of the community came to accept that someone mentioned on Wikipedia has a legitimate reason for wanting to be portrayed accurately and fairly. To the surprise of many, Wales weighed in on the subject in 2012 and seemed to revise his previous views on paid editing.
"My position is relatively simple,” he wrote. “I am opposed to people who are paid advocates being allowed to edit in article space at all, and extremely supportive of paid advocates being given other helpful paths to assist in our work usefully and ethically."
This meant that, while paid editors couldn’t contribute to an article directly, they could reach out to editors and make an argument for why a page deserved to exist or how it could be modified to meet the encyclopedia’s standards of neutrality. Most importantly, all paid work needed to be disclosed.
Whatever your opinion on paid editors, Morning277’s actions did not meet even a minimum level of disclosure.
Of the thousands of words written on the investigation page, not one pointed to the culprit. The administrators I spoke to refused to even speculate on the sockpuppet’s true identity. This is unsurprising given the community’s devotion to user privacy, even in cases of extreme abuse. But it wasn’t difficult to determine who was responsible. I simply had to email the companies that were featured in the deleted pages.
Of the few dozen companies I emailed for this article, four got back to me. All requested I keep their names out, and all told the same story: They hired a company called Wiki-PR to make pages for them.
Wiki-PR is no secret. Wikipedia admins have been aware of the company for some time. It openly boasts of its service on its website, which says that a “staff of 45 Wikipedia editors and admins helps you build a page that stands up to the scrutiny of Wikipedia’s community rules and guidelines.”
It claims a roster of 12,000 clients and offers them this ironic warning: “Don’t get caught in a PR debacle editing your own page.”
The company's “leadership” section lists two cofounders: COO Darius Fisher and CEO Jordan French. Both, according to LinkedIn, graduated from Vanderbilt University. Fisher currently lives in San Francisco while French resides in Austin, Texas.
Though the page claims the company respects “the [Wikipedia] community and its rules against promoting and advertising,” it later states that “we’ll both directly edit your page using our network of established Wikipedia editors and admins,” a direct flouting of Jimmy Wales’s “bright line.”
Perhaps the most shocking claim on the Wiki-PR site is that the firm employs admins. Wikipedia’s privileged few, admins possess special rights and powers they use to keep other editors in line. They can restrict editing access to a page (often when a page is being vandalized or is extremely controversial), ban users, and delete pages. Wikipedia admins (who, like almost every other Wikipedia user, are volunteers) are often thought of as the site’s sacred guardians, committed to neutrality and fairness, able to wade into the most controversial and divisive entries and deliver impartial judgement.
If Wiki-PR’s claims are true, that means there may be “sleeper agents” among Wikipedia’s most powerful users, a revelation that would likely send chills down the spine of any devoted Wikipedian.
Of the four Wiki-PR clients I interviewed, all found out about the company through its aggressive email marketing.
“We received an email,” one client told me. “It said, ‘We notice you don’t have a Wikipedia page. If you’re interested, email us.’”
It’s a simple sales pitch, but an effective one.
“It was good timing because I was thinking we needed to create a Wikipedia page,” another company told me. “They said they would write it and they’d keep it up to date.”
The former clients said they paid between $500 and $1,000 to have the page created, then an additional $50 a month afterwards for “monitoring.” Basically, Wiki-PR promised to track changes to their pages and resurrect them if any got deleted. If a client didn’t meet Wikipedia’s “notability” standard, Wiki-PR offered to generate articles about that client.
“They had seven articles written about us,” a client said. “People contacted us, sent some interview questions that I did with my CEO, and then they created our page.”
The clients all noticed when their pages went down, but they weren’t savvy enough on Wikipedia to know exactly why. Most just assumed theirs was an isolated incident.
“In July or so, I was giving a presentation and went to click on our page and it wasn’t there,” one told me. “And so I emailed the CEO [of Wiki-PR] kind of asking about it, and their reasoning was obviously a lie.”
Wiki-PR told the concerned clients that the pages failed to meet notability requirements, or that an activist admin had targeted them. In most instances, a Wiki-PR representative promised that the page would be back up shortly.
“Their timeline has been ridiculous,” said a client. “They said it would be up in five to 10 days, and now it’s two months later and it’s still not up.”
At no point were the clients told the real reason their pages had gone down: that the pages were allegedly created by sockpuppets. All claimed ignorance of Wikipedia’s rules and said that Wiki-PR had bamboozled them.
“It seemed to me they were [adhering to Wikipedia’s guidelines],” a client said. “That was the impression that I got. They didn’t say they were black hat or that they were sneaking around.”
But the clients were equally annoyed with Wikipedia itself.
“I know [Wikipedia has] editors where they read stuff to make sure it’s valid,” one said.
“That should be the gatekeeper, not who published it … Wikipedia seems like it’d be better off with more people contributing. It’s not like we put a page up that lied about us or had false claims.”
It's unclear at this point how, with the increased scrutiny, Wiki-PR will be able to deliver on its promise to resurrect the deleted Wikipedia pages. There have already been cases where admins have swatted down these attempts. According to the clients I spoke to, the company has promised to refund their money if it fails to publish their articles. But given the large number of client pages that were deleted, repaying all those clients would be a hefty expense.
For dedicated Wikipedia editors and admins, the Morning277 investigation outcome was likely a testament to the community’s devotion to purity and neutrality. But while reporting this article I couldn’t help comparing the sockpuppet discovery to a large drug bust: it might take out a major kingpin, but at the end of the day it’s a relatively minor victory in what is an otherwise losing war on drugs.
According to Alexa, Wikipedia is the sixth most-trafficked website on the Web. It’s the first listing in a Google search for every topic from major corporations to celebrities to all manner of controversial topics. If biased, for-hire authors have infiltrated the encyclopedia to a broader extent, we should all be worried. Wikipedia is the primary source of knowledge on the Internet.
I asked DocTree, the editor who started this all, whether the investigation had effectively squashed most of the undisclosed, conflicted edits on the site.
“It’s just the tip of the iceberg,” he said after a brief pause.
“A sockpuppet investigation is not my thing,” DocTree added. “I would much prefer to just stay in the background, edit Wikipedia, and stick with my articles on old and dead ornithologists.”
Update 10/21/13: In a post on the foundation's official blog, Wikimedia chief Sue Gardner has released a statement on Wiki-PR and the information reported in this article:
The Wikimedia Foundation takes this issue seriously and has been following it closely.
With a half a billion readers, Wikipedia is an important informational resource for people all over the world. Our readers know Wikipedia’s not perfect, but they also know that it has their best interests at heart, and is never trying to sell them a product or propagandize them in any way. Our goal is to provide neutral, reliable information for our readers, and anything that threatens that is a serious problem. We are actively examining this situation and exploring our options.
In the wake of the investigation, editors have expressed shock and dismay. We understand their reaction and share their concerns. We are grateful to the editors who’ve been doing the difficult, painstaking work of trying to figure out what’s happening here.
Editing-for-pay has been a divisive topic inside Wikipedia for many years, particularly when the edits to articles are promotional in nature. Unlike a university professor editing Wikipedia articles in their area of expertise, paid editing for promotional purposes, or paid advocacy editing as we call it, is extremely problematic. We consider it a “black hat” practice. Paid advocacy editing violates the core principles that have made Wikipedia so valuable for so many people.
What is clear to everyone is that all material on Wikipedia needs to adhere to Wikipedia’s editorial policies, including those on neutrality and verifiability. It is also clear that companies that engage in unethical practices on Wikipedia risk seriously damaging their own reputations. In general, companies engaging in self-promotional activities on Wikipedia have come under heavy criticism from the press and the general public, with their actions widely viewed as inconsistent with Wikipedia’s educational mission.
The Wikimedia Foundation is closely monitoring this ongoing investigation and we are currently assessing all the options at our disposal. We will have more to say in the coming weeks.
Photos by Andrew Malone / flickr & Kristina Alexanderson / flickr (remix by Jason Reed)