Leaked Documents Reveal How Google Search Gatekeeps the Internet

Google Search is often referred to as the gateway to the Internet; it is the first stop on most people’s journey to information online. But Google doesn’t say much about how it organizes the Internet, turning Search into a giant black box that dictates what we know and what we don’t. This week’s 2,500-page leak, first reported by search engine optimization (SEO) veteran Rand Fishkin, gave the world insight into the 26-year-old mystery of Google Search.

“I think the biggest takeaway is that what Google’s public representatives say and what Google’s search engine does are two different things,” Fishkin said in a statement emailed to Gizmodo.

These documents provide a more detailed understanding of how Google Search controls the information we consume. Getting the right web page onto your computer is not a passive task: thousands of editorial decisions are made on your behalf by a secretive group of Googlers. For SEO, an industry that lives and dies by Google’s algorithms, the leaked documents are an earthquake. It’s as if NFL referees rewrote the rules of football in the middle of the season and you only found out while playing in the Super Bowl.

Several SEO experts tell Gizmodo that the leak lists 14,000 ranking features that, at the very least, lay out a blueprint for how Google organizes everything on the web. Some of these factors include Google’s determination of a website’s authority on a given topic, the size of the website, and the number of clicks a webpage receives. Google has previously denied that it uses some of these features to rank pages in Search, but the company has confirmed that these documents are real, albeit imperfect in nature.

“We would caution against making inaccurate assumptions about search based on out-of-context, outdated, or incomplete information,” a Google spokesperson said in an email to Gizmodo. “We have shared extensive information about how search works and the types of factors our systems weigh, while working to protect the integrity of our results from manipulation.”

Out of Google’s “cautiousness,” the company won’t confirm what is or isn’t correct in these documents. Google says it’s incorrect to assume this is exhaustive information about Search, and tells Gizmodo that giving away too much information could empower bad actors. Ultimately, we don’t know what goes into determining these factors or what weight, if any, Google Search gives to each.

“We’re just looking at different variables that they’re considering,” said Mike King, an SEO expert who was one of the first to analyze the leak, in an interview with Gizmodo. “It is the detail of which [Google] browsing websites.”

This leak was first noticed by Erfan Azimi, an SEO practitioner, who found the API documentation publicly available on GitHub. It is not clear whether these documents were indeed “leaked” or were somehow published by Google in a quiet corner of the web, perhaps by accident. Azimi sought to publicize the documents by taking them to Fishkin last week, and Fishkin asked King to help make sense of them.

King notes that one ranking feature, “homepagePagerankNs,” suggests that the reputation of a website’s homepage can support everything the site posts. Fishkin writes that the leak refers to a system called NavBoost (first mentioned by Google’s vice president of search, Pandu Nayak, in his testimony to the Department of Justice) that allegedly measures clicks to improve Google Search rankings. Many in the SEO industry take these documents as confirmation of what the industry has long suspected: a website deemed popular by Google may receive higher search rankings for a query, even though a lesser-known site may have better information.

In recent months, several small publishers have seen their Google Search traffic disappear. When Nilay Patel of The Verge asked Google CEO Sundar Pichai about this last week, Pichai said that it was not clear “whether this is a uniform trend.” A ranking feature that King points to appears to place these small sites in a single category.

“They have a feature called ‘smallPersonalSite’ and we don’t know how it’s used, of course, but it’s an indication that [Google] wants to know if these are smaller sites,” King said. “With so many of these small sites being crushed right now, it just goes to show that [Google] doesn’t do anything to make up for what the big brand signals are.”

It should be noted that Pichai later mentioned in that interview with The Verge that in other cases Google has directed more traffic to small sites. These ranking features may show the levers that Google can pull. As more and more national media organizations license their content to appear on ChatGPT, Google Search also appears to be favoring larger publishers. Taken together, this could have a flattening effect, compressing what most people hear into mainstream media organizations only.

The ripple effects of these leaked Google documents were widely felt. Kristen Ruby, CEO of Ruby Media Group, who has worked in digital PR and SEO for more than 15 years, tells Gizmodo that she received a creepy text on Monday night: “Google will break tomorrow.”

Ruby quickly discovered the leak and noticed two ranking features that stood out to her: “isElectionAuthority” and “isCovidLocalAuthority.” These features appear to be Google’s way of ranking the trustworthiness of a web page for providing correct information about elections and COVID-19, respectively. In 2019, Ruby wrote extensively about how Google’s measure of trustworthy web pages (which Google calls EEAT, for Experience, Expertise, Authoritativeness, and Trustworthiness) is inherently political. She notes that Google’s measurement of these factors tends to skew along political lines.

“I find it problematic that Google doesn’t provide context for critical elements in the data like ‘isElectionAuthority’ or ‘isCovidLocalAuthority.’ How does Google determine authority on these critical domains?” Ruby said in an emailed statement. “I shouldn’t have to guess what the answer is. Google should come and tell me what the answer is.”

Although Google is a business with a right to proprietary information, Ruby argues that Google has an obligation to answer questions about these ranking features that shape the world around us. King and Fishkin also noted “isCovidLocalAuthority” and “isElectionAuthority” in their write-ups of the leak, both emphasizing the importance of search engines in surfacing quality information.

“I think it’s really important that they provide that kind of recognition of the information, because like it or not, Google is actually a public service,” King said. “I probably get a lot of flak for saying this, but we think of it as the go-to source for how you get information on the web.”

How Google ranks the information in these examples is a microcosm of the entire Search ecosystem. On any given day, there are millions of questions about what information to amplify and what to withhold. While Google and several other tech companies have long tried to portray their products as opinion-free algorithms, these ranking features show that’s not quite the case. There are many more examples of ranking features revealed in the 2,500-page leak.

Searching for answers in Google’s algorithm

Since Google won’t elaborate on these documents, telling Gizmodo that giving away too much information could empower bad actors, SEO experts are left to make sense of them on behalf of everyone who uses Google Search. Several of the 14,000 ranking features identified in the past week are things Google has, over the years, explicitly said it doesn’t use.

In a 2016 video, a Google Search representative stated, “We have no website authority score.” In a 2015 interview, another Google employee said, “Using clicks directly in ranking would be a mistake.” It’s hard to square those comments with the leaked documents and Google’s response.

“This response is a perfect example of why people don’t like or trust Google,” Fishkin said. “This is a statement that doesn’t address the leak, provides no value, and may have been written by an AI trained on the most soulless corporate messaging of the last decade.”

In the age of AI answers, Ruby notes that the way Google ranks web pages is more important than ever. Instead of a series of links to different points of view, you may get only one “correct” answer thanks to Google’s new AI Overviews. However, we’ve seen 10-year-old Reddit posts given strange amounts of authority, with some users being told to put glue on their pizza. How Google chooses authority is increasingly important, as the top result may now be the only one with a voice.

“We’re changing gears. We’re moving from one search engine to another,” Ruby said. “AI is having a huge impact on search results.”

Ultimately, it’s hard to tell what Google is really doing with these ranking features. It is clear that Google created these classifiers and potentially more to rank websites on the Internet. These rankings obviously require judgment, adding further evidence that Google Search is not an objective experience, but rather a series of editorial decisions made by people at Google.
