A HUGE Google Search document leak reveals the inner workings of the ranking algorithm

A set of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content.

What happened. Thousands of leaked internal documents that appear to come from Google’s internal Content API Warehouse were shared with SparkToro co-founder Rand Fishkin earlier this month.

  • Read on to find out what we learned from Fishkin, as well as from Michael King, CEO of iPullRank, who also reviewed the documents (and plans to provide additional analysis to Search Engine Land soon).

Why we care. This leak gives us insight into how Google’s ranking algorithm works, which is invaluable to SEOs who can work out what it all means. In 2023, a leak gave us an unprecedented look at Yandex Search’s ranking factors, and it was one of the biggest stories of the year.

This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.

What’s inside. Here’s what we know about the leaked documents from Fishkin and King:

  • Current: The documentation indicates that this information is accurate as of March.
  • Ranking features: The API documentation covers 2,596 modules with 14,014 attributes.
  • Weight: The docs don’t specify how any of the features are weighted for ranking – only that they exist.
  • Twiddlers: These are reranking functions that “can adjust a document’s information retrieval score or change a document’s ranking,” according to King.
  • Downgrades: Content may be downgraded for a variety of reasons, such as:
    • A link that doesn’t match the target site.
    • SERP signals indicating user dissatisfaction.
    • Product reviews.
    • Location.
    • Exact match domains.
    • Porn.
  • Change history: Apparently, Google keeps a copy of every version of every page it has ever indexed, which means it can “remember” every change ever made to a page. However, when analyzing links, Google only uses the last 20 changes to a URL.
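To make the idea of a twiddler concrete, here is a minimal Python sketch of a reranking pass. It is not Google’s code: the Doc fields, the flag names, the demotion factors, and the rerank function are all hypothetical, invented for illustration. The point is only the shape of the mechanism: after initial retrieval, each twiddler returns a multiplier for a document’s score (1.0 leaves it alone, less than 1.0 demotes it), and the results are re-sorted. Remember that the docs don’t say how any factor is weighted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Doc:
    url: str
    ir_score: float                               # hypothetical score from initial retrieval
    flags: Set[str] = field(default_factory=set)  # hypothetical labels, e.g. {"exact_match_domain"}


# In this sketch a "twiddler" is any function mapping a document to a score
# multiplier: 1.0 leaves it alone, below 1.0 demotes, above 1.0 boosts.
Twiddler = Callable[[Doc], float]


def exact_match_domain_demotion(doc: Doc) -> float:
    # Hypothetical factor; the leak does not reveal real weights.
    return 0.8 if "exact_match_domain" in doc.flags else 1.0


def dissatisfaction_demotion(doc: Doc) -> float:
    # Hypothetical factor for SERP signals suggesting unhappy users.
    return 0.7 if "serp_dissatisfaction" in doc.flags else 1.0


def rerank(docs: List[Doc], twiddlers: List[Twiddler]) -> List[Doc]:
    """Apply each twiddler's multiplier to every doc's score, then re-sort."""
    def final_score(doc: Doc) -> float:
        score = doc.ir_score
        for twiddle in twiddlers:
            score *= twiddle(doc)
        return score

    # sorted() returns a new list, so the caller's list is left untouched.
    return sorted(docs, key=final_score, reverse=True)


if __name__ == "__main__":
    results = [
        Doc("https://a.example/", 1.0, {"exact_match_domain"}),
        Doc("https://b.example/", 0.9),
    ]
    reranked = rerank(results, [exact_match_domain_demotion, dissatisfaction_demotion])
    print([d.url for d in reranked])  # ['https://b.example/', 'https://a.example/']
```

With the hypothetical demotions applied, the exact match domain result falls from a score of 1.0 to 0.8 and drops below an undemoted page that started at 0.9.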

Links matter. Shocking, I know. This leak confirms that the diversity and relevance of links remain key, and PageRank is still very much alive within Google’s ranking functions: the PageRank of a website’s home page is taken into account for each document.
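For anyone who hasn’t seen it, the classic PageRank idea can be sketched in a few lines of Python. This is the textbook power-iteration version from Brin and Page’s original paper, using the commonly cited damping factor of 0.85. It is an illustration only, not whatever variant Google runs today: the leak shows PageRank is still in use, not how it is computed. The link graph and page names below are made up.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Textbook PageRank via power iteration.

    `links` maps each page to the pages it links to. Pages with no outlinks
    ("dangling" pages) spread their rank evenly across all pages, so the
    scores always sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    pages = sorted(nodes)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along every outlink.
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    site = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
    ranks = pagerank(site)
    print({page: round(score, 3) for page, score in ranks.items()})
```

In this toy site, “home” scores highest because both other pages link to it, a pattern typical of internal linking on real sites.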

Successful clicks matter. This shouldn’t come as a shock, but if you want to rank well, the leaked documents make it clear that you need to keep creating great content and user experiences. Google uses a variety of click metrics, including badClicks, goodClicks, lastLongestClicks, and unsquashedClicks. As King said:

  • “[Y]ou need to drive more successful clicks using a wider set of queries and earn more variety of links if you want to continue to rank. Conceptually, it makes sense because a very strong piece of content will do that. Focusing on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.”

Documents and testimony from the US v. Google antitrust trial confirmed that Google uses clicks in rankings.

Brand matters. Fishkin’s big takeaway from the leak is that brand matters more than anything else:

  • “If there was one universal piece of advice I had for marketers looking to massively improve their organic search rankings and traffic, it would be: ‘Build a prominent, popular, well-recognized brand in your space outside of Google search.'”

Authors matter. Google stores author information associated with content and tries to determine whether an entity is the author of a document.

Site authority. Google uses something called “siteAuthority”.

Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for search rankings.

Whitelists. Several modules indicate that Google whitelists certain domains related to elections and COVID: isElectionAuthority and isCovidLocalAuthority. That said, we have long known that Google (and Bing) maintain “exception lists” for cases where “specific algorithms inadvertently affect websites”.

The leaker. Erfan Azimi posted a video claiming responsibility for the leak.
