Google Leak May 2024: In-House Documents on Search Ranking Create an SEO Thunderstorm

A recent Google leak has set the SEO world abuzz. A trove of internal documents detailing how Google ranks search results was accidentally published online. This material, describing an older version of Google’s Content Warehouse API, offers a rare peek into the inner workings of Google Search.

The documents were unintentionally committed to a publicly accessible Google-owned repository on GitHub on March 13, likely by an in-house bot. An Apache 2.0 open source licence was attached to the commit, as is standard for Google’s public documentation. A follow-up commit on May 7 attempted to retract the leak, but it was too late: SEO experts had already taken note.

Key Takeaways from the Google Leak

These leaked documents don’t contain any actual code but detail how Google’s Content Warehouse API works, including numerous references to internal systems. While there’s a similarly named public Google Cloud API, these files delve much, much deeper into the search algorithm.

Across the 2,500-plus pages, there are over 14,000 attributes associated with the API, though it remains unclear which of them are actually used as ranking signals and how heavily each is weighted. Even so, SEO professionals find the documents noteworthy for their insights into Google’s ranking priorities.

Contradictions and SEO Implications

The leaked documents contradict several public statements made by Google’s representatives over the years. Most notably, they indicate that:

– Click-centric user signals are employed.

– Subdomains are considered separately in rankings.

– There is a sandbox for newer websites.

– A domain’s age is collected and considered.

Rand Fishkin of SparkToro highlights these contradictions, emphasising that Google’s public denials of these factors are at odds with the leaked information. Michael King of iPullRank points to the revelation of a “siteAuthority” score, which contradicts Google’s previous statements that it does not have a website authority score.

Granular Insights

Click Metrics: Different types of clicks (good, bad, etc.) influence webpage rankings. During the US v. Google antitrust trial, Google acknowledged using click metrics as a ranking factor.

Chrome Views: Google uses data from websites viewed in Chrome as a quality signal, with the parameter ChromeInTotal being part of the API. Maybe that’s why it uses so much RAM?

Content Signals: Factors such as content freshness, authorship, page relevance to the site’s central focus, alignment between page title and content, and the average weighted font size of terms in the document body are considered in search rankings.

Amending Your SEO Strategy

The SEO community is busy discussing how these revelations can be incorporated into existing strategies. Understanding that click-centric signals play a significant role could lead to a renewed focus on improving user engagement metrics. Likewise, recognising that subdomains are considered separately in rankings might influence how websites are structured and managed.

The apparent existence of a sandbox for newer websites suggests that SEO practitioners need to be patient and strategic when launching new sites, understanding that initial performance might be artificially suppressed. The revelation about domain age could lead to a reevaluation of how domain history affects SEO strategy, especially for businesses considering rebranding or acquiring aged domains.


SEO Industry Impact

Beyond SEO professionals, this leak might have broader implications for the tech industry and digital marketing as a whole. It raises questions about the transparency and reliability of Google’s public statements, both of which appear to be seriously lacking. For webmasters and businesses, this newfound information could alter how they approach their online presence and marketing strategies.

This incident also highlights the potential risks associated with automated systems and the importance of securing internal documentation. It serves as a reminder for companies to regularly audit their repositories and automation processes to prevent similar leaks.
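As a rough illustration of what such an audit could look like, the Python sketch below walks a repository checkout and flags text files containing phrases that suggest internal-only material. The marker phrases and the function name find_flagged_files are hypothetical, chosen purely for this example. Nothing here reflects how Google's tooling works, and a real audit would rely on an organisation's own classification labels.

```python
import re
from pathlib import Path

# Hypothetical marker phrases, assumed for illustration only.
INTERNAL_MARKERS = [
    re.compile(r"internal[- ]only", re.IGNORECASE),
    re.compile(r"do not (publish|distribute)", re.IGNORECASE),
]


def find_flagged_files(root, markers=INTERNAL_MARKERS):
    """Return sorted relative paths of text files under root matching any marker."""
    root = Path(root)
    flagged = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if any(marker.search(text) for marker in markers):
            flagged.append(path.relative_to(root).as_posix())
    return sorted(flagged)
```

A check like this could run before an automated publishing job pushes a commit, blocking the push whenever the returned list is not empty.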


This Google leak has provided valuable insights for SEO professionals, revealing the complex and multifaceted nature of Google’s search ranking algorithms. While the documents describe an older version of the API, they shed light on what Google deems important for search relevancy, sparking renewed discussions and strategies within the SEO community. As the industry digests the leaked material, its longer-term impact on SEO practices and digital marketing strategies should become clearer over the next year or so.