New Googlebot Crawler Configurations Now Detect Locale-adaptive Content

Viewing sites that adjust their content based on language and perceived location would be easier if separate URLs could be created for each locale. However, some sites cannot have separate URLs for each country. Consequently, a number of websites use locale-adaptive techniques, in which a webpage returns different content depending on the visitor’s preferred language and perceived location.

For quite some time, Google did not address this issue: the Googlebot crawler’s default IP addresses were based only in the United States, and the crawler requested pages without setting an Accept-Language HTTP request header. This could result in incomplete crawling, indexing, and ranking of locale-adaptive content across the web. Google has now addressed this issue.

On January 28, 2015, Google announced that it had updated its crawling method with new locale-aware crawl configurations. According to Qin Yin, Software Engineer for Search Infrastructure, and Pierre Far, Webmaster Trends Analyst, the locale-aware Googlebot crawler will be able to completely crawl, index, and rank websites detected to be using locale-adaptive techniques.

A website is considered locale-adaptive if the Googlebot crawler detects the following signals and hints, as explained in Google’s Webmaster Tools Help page:

Serving different content on the same URL based on the user’s perceived country (geolocation)

Serving different content on the same URL based on the Accept-Language field set by the user’s browser in the HTTP request header

Completely blocking access to requests from specific countries
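
To make the second signal concrete, here is a minimal sketch of how a locale-adaptive server might pick a language from the Accept-Language header. The function names, the list of supported languages, and the fallback to English are all assumptions made for illustration; they are not taken from Google's announcement or from any particular CMS.

```python
# Hypothetical set of languages the site has content for.
SUPPORTED_LANGUAGES = ("en", "de", "fr")
DEFAULT_LANGUAGE = "en"  # assumed fallback when nothing matches


def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_language(header, supported=SUPPORTED_LANGUAGES, default=DEFAULT_LANGUAGE):
    """Pick the language to serve for one request; header may be None or empty."""
    if not header:
        # A request with no Accept-Language header gets the default content.
        return default
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "de-de" falls back to "de"
        if primary in supported:
            return primary
    return default
```

Under these assumptions, a request with no Accept-Language header always receives the default language, which is why a crawler that never sends the header would only ever see one variant of the page.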

Now, if a website is detected to have locale-adaptive content, the Googlebot crawler will automatically perform locale-aware crawling on the pages using one or both of the following configurations, as described in the announcement:

Geo-distributed crawling, where Googlebot uses IP addresses that appear to come from outside the USA, in addition to the IP addresses that appear to come from the USA that it already uses.

Language-dependent crawling, where Googlebot crawls with an Accept-Language HTTP header in the request.
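
Webmasters who want a rough idea of what language-dependent crawling might see can compare a page fetched with and without an Accept-Language header. The sketch below uses Python's standard urllib; the helper names are hypothetical, and it does not reproduce Googlebot's actual user agent, IP addresses, or header values.

```python
import urllib.request


def build_request(url, language=None):
    """Build a GET request, optionally carrying an Accept-Language header."""
    headers = {}
    if language:
        headers["Accept-Language"] = language
    return urllib.request.Request(url, headers=headers)


def fetch_body(url, language=None, timeout=10):
    """Fetch the page and return its raw body (performs a real network call)."""
    with urllib.request.urlopen(build_request(url, language), timeout=timeout) as response:
        return response.read()


def varies_by_language(url, languages, fetch=fetch_body):
    """Return True if any language yields a body different from the header-less fetch."""
    baseline = fetch(url, None)
    return any(fetch(url, language) != baseline for language in languages)
```

A call such as varies_by_language("https://example.com/", ["de", "fr"]) would then hint at whether the page is language-adaptive. Pages with dynamic elements such as timestamps or rotating ads can differ between requests for unrelated reasons, so a True result is only a hint.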

For example, suppose a website blocks users located in the United States but permits those in Australia. With the new geo-distributed crawling configuration, the website’s server will block a Googlebot crawler coming from the U.S. but let one coming from Australia crawl the pages.
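
The United States and Australia scenario can be modeled in a few lines. This is a toy sketch: the blocklist, the 403 status code, and the function names are assumptions, and how a real server maps an IP address to a country is outside its scope.

```python
# Hypothetical blocklist: this site refuses visitors who appear to be in the US.
BLOCKED_COUNTRIES = {"US"}


def status_for(country_code):
    """Return the HTTP status the server would send to a visitor from this country."""
    return 403 if country_code.upper() in BLOCKED_COUNTRIES else 200


def crawlable_from(origins):
    """Return the crawl origins (country codes) that would receive the page."""
    return [country for country in origins if status_for(country) == 200]
```

With crawling from U.S. addresses only, crawlable_from(["US"]) returns an empty list and the page is never seen. Once an Australian origin is added, crawlable_from(["US", "AU"]) returns ["AU"] and the content becomes reachable.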

Qin and Far note that these configurations are enabled automatically for pages detected to be locale-adaptive, so webmasters may notice changes in how Google crawls and shows their website in the search results even without modifying their CMS or server settings.

Despite the new locale-aware crawling method, Google still recommends that webmasters use separate URLs for each locale, as stated in the announcement:

Note that these new configurations do not alter our recommendation to use separate URLs with rel=alternate hreflang annotations for each locale. We continue to support and recommend using separate URLs as they are still the best way for users to interact and share your content, and also to maximize indexing and better ranking of all variants of your content.
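
For sites that follow this recommendation, the rel=alternate hreflang annotations are a set of link elements listing every locale variant of a page. The sketch below generates them from a mapping of language codes to URLs. The function name and the example.com URLs are placeholders, and the optional x-default entry is an extra that the quoted announcement does not mention.

```python
from html import escape


def hreflang_links(variants, default_url=None):
    """Build <link rel="alternate" hreflang=...> tags for a page's locale variants.

    variants maps an hreflang value (for example "en-us") to that variant's URL.
    default_url, if given, is emitted as an x-default entry.
    """
    lines = []
    for lang, url in variants.items():
        lines.append(
            '<link rel="alternate" hreflang="%s" href="%s" />'
            % (escape(lang, quote=True), escape(url, quote=True))
        )
    if default_url:
        lines.append(
            '<link rel="alternate" hreflang="x-default" href="%s" />'
            % escape(default_url, quote=True)
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    print(hreflang_links(
        {"en-us": "https://example.com/us/", "en-au": "https://example.com/au/"},
        default_url="https://example.com/",
    ))
```

The same set of tags would normally appear on every variant of the page, so that each variant points to all the others as well as to itself.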

Furthermore, Google said webmasters should ensure that their website configuration supports locale-aware crawling so that Googlebot can properly crawl and index their pages.

To learn more about locale-aware crawling, visit Google’s Webmaster Tools Help page.

For questions and feedback, visit the internationalization Webmaster Help Forum.


Marketing Digest Writing Team

The Marketing Digest Writing Team provides the content you need to keep you well-informed on the latest developments and trends in the digital marketing industry.