Google has officially announced that from 1 September 2019 it will no longer support the noindex directive in robots.txt files.

In the announcement, it said that “in the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019. For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options.”

What is the function of the noindex and disallow directives?

The robots.txt file and noindex directive give webmasters the power to tell Google which pages it should crawl and which pages it should index, and therefore display in the search results.

Noindex: tells Google not to include your page(s) in search results
Disallow: tells Google not to crawl your page(s)
Nofollow: tells Google not to follow the links on your page

Noindex (as an HTML tag on the page) and disallow can’t be combined: the disallow blocks the page from being crawled, so search engines never fetch it and never discover the tag telling them not to index it.
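
To make that conflict concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the /private/ path and the example.com URLs are hypothetical, and this parser is only a stand-in for how a compliant crawler behaves; it is not Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a compliant crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


private_ok = is_crawlable(ROBOTS_TXT, "https://example.com/private/page.html")
blog_ok = is_crawlable(ROBOTS_TXT, "https://example.com/blog/post.html")
print(private_ok, blog_ok)  # False True
```

Because the page under /private/ is never fetched, a noindex tag in its HTML would go unread.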

Noindex (in robots.txt) combined with disallow was how webmasters could prevent certain content from being both crawled and indexed. With the new update, SEOs will only be able to use disallow, and only for content that they don’t want to be crawled and indexed before it goes live. For content that’s been published for a while, there are a number of alternative options.
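
One way to check whether a site still relies on the retired rule is to scan its robots.txt for noindex lines. The following is a rough sketch rather than an official tool: the function name find_unsupported_rules, the sample file and its paths are all hypothetical, and the list of rules to flag defaults to noindex only.

```python
def find_unsupported_rules(robots_txt: str, unsupported=("noindex",)):
    """Return (line_number, line) pairs whose field name is in `unsupported`."""
    hits = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and padding
        if ":" not in line:
            continue  # blank or malformed line, nothing to check
        field = line.split(":", 1)[0].strip().lower()
        if field in unsupported:
            hits.append((number, line))
    return hits


# Hypothetical robots.txt content, for illustration only.
sample = """\
User-agent: *
Disallow: /checkout/
Noindex: /thank-you/
"""
print(find_unsupported_rules(sample))  # [(3, 'Noindex: /thank-you/')]
```

Any line it reports would need to be replaced with one of the alternatives.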

What are the alternatives?

Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
404 and 410 HTTP status codes: Both status codes mean that the page doesn’t exist, so the URLs will be dropped from Google's index once they're crawled and processed.
Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google's index.
Disallow in robots.txt: Search engines can only index pages that they know exist, so blocking the page from being crawled usually means its content won’t be indexed.
Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google's search results.
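
As a rough illustration of the first two alternatives, the sketch below shows what a server might return for different pages. X-Robots-Tag is the HTTP header form of the robots meta tag; the function name response_for and the path sets are hypothetical, and a real site would wire this logic into its own web framework.

```python
# Hypothetical path sets, for illustration only.
REMOVED = {"/old-campaign/"}  # pages that are gone for good
NOINDEX = {"/thank-you/"}     # live pages to keep out of the index

# HTML equivalent of the header below, placed in the page's <head>.
NOINDEX_META = '<meta name="robots" content="noindex">'


def response_for(path: str):
    """Return the (status_code, headers) a server might send for `path`."""
    if path in REMOVED:
        return 410, {}  # 410 Gone: the page no longer exists
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if path in NOINDEX:
        headers["X-Robots-Tag"] = "noindex"  # crawlable, but not indexable
    return 200, headers


print(response_for("/thank-you/"))
print(response_for("/old-campaign/"))
```

Note that the noindex page still has to be crawlable, otherwise the header is never seen.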

What’s changing?

As the Robots Exclusion Protocol has never been an official standard, there were no definitive guidelines on how to keep it up to date or on which syntax must be followed. Every major search engine has adopted robots.txt as a crawling directive, and from now on it will be standardised. The first proposed changes came earlier this week:

"Requirements Language" section will be removed
Robots.txt now accepts all URL-based protocols
Google follows at least five redirect hops; if no robots.txt is found, Google treats it as a 404 for the robots.txt
If the robots.txt is unreachable for more than 30 days (5XX status code), the last cached copy of the robots.txt is used; if no cached copy is available, Google assumes there are no crawl restrictions
Google treats unsuccessful requests or incomplete data as a server error
"Records" are now called "lines" or "rules"
Google doesn't support the handling of fields with simple errors or typos (for example, "useragent" instead of "user-agent")
Google currently enforces a size limit of 500 kibibytes (KiB), and ignores content after that limit
Updated formal syntax to be valid Augmented Backus-Naur Form (ABNF) per RFC5234 and to cover for UTF-8 characters in the robots.txt
Updated the definition of "groups"
Removed references to the deprecated Ajax Crawling Scheme.
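
Two of the points above, the 500 KiB size limit and the fallback after a long outage, can be sketched in a few lines of Python. This only illustrates the behaviour as described in the list; the function names are hypothetical, and an empty result is used here as a stand-in for "no crawl restrictions".

```python
from typing import Optional

MAX_ROBOTS_BYTES = 500 * 1024  # 500 KiB, the limit quoted above


def truncate_to_limit(body: bytes, limit: int = MAX_ROBOTS_BYTES) -> bytes:
    """Keep only the part of robots.txt that falls within the size limit."""
    return body[:limit]


def rules_after_long_outage(cached_copy: Optional[bytes]) -> bytes:
    """Rules applied once robots.txt has returned 5XX for more than 30 days.

    Falls back to the last cached copy; with no cached copy, the empty
    result stands for 'no crawl restrictions'.
    """
    if cached_copy is not None:
        return truncate_to_limit(cached_copy)
    return b""


oversized = b"a" * (600 * 1024)
print(len(truncate_to_limit(oversized)))  # 512000
```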

Google also released its robots.txt parser as an open source project alongside this announcement. The change should not come as a surprise, as Google has been advising against the directive for years; back in 2015, John Mueller said “you probably shouldn't use the noindex in the robots.txt file”.

You can get more information on robots.txt and robots meta tags in this video from John Mueller.


About the author

Lucia Navarro

SEO Performance Director
