Google Indexing API tests with normal URLs, which have NEITHER job posting NOR livestream structured data

Tobias Willmann
5 min read · Apr 15, 2019


I was curious to see what Googlebot, and maybe even indexation, does when the URLs sent to Google’s Indexing API have NEITHER job posting NOR livestream structured data: https://developers.google.com/search/apis/indexing-api/v3/using-api?hl=en

The API isn’t designed for that use case. From Google’s documentation:

You can use the Indexing API to tell Google to update or remove pages from the Google index. The requests must specify the location of a web page. You can also get the status of notifications that you have sent to Google. Currently, the Indexing API can only be used to crawl pages with either job posting or livestream structured data.

But let’s try:

Setup

This is a useful starting point for Node.js and the Google Indexing API:
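As a rough sketch of the pieces involved: the endpoint and OAuth scope below come from Google’s Indexing API documentation, while the buildNotification helper is a name of my own for building the request body.

```javascript
// Endpoint and OAuth scope taken from Google's Indexing API documentation.
const PUBLISH_ENDPOINT = 'https://indexing.googleapis.com/v3/urlNotifications:publish';
const INDEXING_SCOPE = 'https://www.googleapis.com/auth/indexing';

// Hypothetical helper (name is mine): builds the JSON body of a publish call.
function buildNotification(url, type) {
  const valid = ['URL_UPDATED', 'URL_DELETED'];
  if (!valid.includes(type)) {
    throw new Error('type must be URL_UPDATED or URL_DELETED');
  }
  return { url, type };
}

console.log(buildNotification('http://schlaf-tracking.de/oura-ring/', 'URL_UPDATED'));
```

The service-account JWT client (authorized for INDEXING_SCOPE) then signs the actual POST to PUBLISH_ENDPOINT.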

In addition, I moved the whole thing to an AWS Lambda.

There is an AWS Lambda, which triggers the Google Indexing API with URL_UPDATED or URL_DELETED: https://developers.google.com/search/apis/indexing-api/v3/reference/indexing/rest/v3/urlNotifications#UrlNotification.

The Lambda is connected to a DynamoDB table that stores everything (more or less the request and response). That is useful because the Indexing API only returns the latestUpdate, so you need to store what happened before the latest update if you want to compare with server log files. It may also be useful later for handling the pushes sent to the API…
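A minimal sketch of that logging idea, assuming a table keyed by URL and push timestamp (the table and attribute names are my own, not from the original setup):

```javascript
// Hypothetical log-item builder (attribute names are assumptions).
// Keeping a timestamp per push preserves the history that the
// Indexing API itself does not return (it only has latestUpdate).
function buildLogItem(url, type, apiResponse) {
  return {
    url,                                   // partition key
    pushedAt: new Date().toISOString(),    // sort key, so every push is kept
    type,                                  // 'URL_UPDATED' or 'URL_DELETED'
    response: JSON.stringify(apiResponse), // raw API answer for later comparison
  };
}

// With the AWS SDK DocumentClient this would be stored roughly as:
// docClient.put({ TableName: 'indexing-api-log', Item: buildLogItem(...) })
```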

An API Gateway makes it possible to invoke the Lambda and pass e.g. URLs to it.
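A sketch of what such a handler could look like, assuming an API Gateway proxy integration that passes the URL and notification type as query-string parameters (the parameter names are assumptions; the real Indexing API call is left out):

```javascript
// Hypothetical Lambda handler sketch. In a real deployment this would be
// exported as module.exports.handler and would call the Indexing API.
const handler = async (event) => {
  const params = event.queryStringParameters || {};
  if (!params.url) {
    return { statusCode: 400, body: JSON.stringify({ error: 'missing url parameter' }) };
  }
  // Default to URL_UPDATED; only an explicit URL_DELETED switches the type.
  const type = params.type === 'URL_DELETED' ? 'URL_DELETED' : 'URL_UPDATED';
  // const apiResponse = await urlUpdate(params.url, type); // real push, omitted here
  return { statusCode: 200, body: JSON.stringify({ url: params.url, type }) };
};
```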

URL_UPDATED

As described at https://developers.google.com/search/apis/indexing-api/v3/using-api?hl=en, you can trigger a URL update e.g. like this:

const rp = require('request-promise')
// jwtClient: a googleapis JWT client created from a service-account key,
// authorized for the https://www.googleapis.com/auth/indexing scope

const urlUpdate = async (crawlurl, type) => {
  const tokens = await jwtClient.authorize()
  const options = {
    url: 'https://indexing.googleapis.com/v3/urlNotifications:publish',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    auth: { bearer: tokens.access_token },
    json: {
      url: crawlurl,
      type: type, // 'URL_UPDATED' or 'URL_DELETED'
    }
  };
  return rp(options)
}

So let’s see what happens to an old test website.

These are the logs filtered for Googlebot before the test:

On 14.04.2019, up to 18:00, it had 4 Googlebot Smartphone crawls + 1 Googlebot crawl.

Launched the test with two Indexing API requests:

Pushed at 2019-04-14T18:06:40.982Z to http://schlaf-tracking.de/oura-ring/

Pushed at 2019-04-14T18:09:27.649Z to http://schlaf-tracking.de/nokia-sleep/

Googlebot visited the two URLs just a few minutes later.

Within the next hour I sent 30 more requests for these 2 URLs… no additional Googlebot hit showed up in the logs.

So I tested more URLs, plus random URL parameters appended to URLs I had already sent to the API. Googlebot visited all of these, too, within 5 minutes.

Next day:

In my setup, the metadata is requested first via https://developers.google.com/search/apis/indexing-api/v3/reference/indexing/rest/v3/UrlNotificationMetadata. Afterwards, URL_UPDATED or URL_DELETED is called. The feedback from my Lambda API looks like this:
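The metadata lookup is a plain GET with the URL passed as a query parameter. Here is a sketch in the same request-promise style as the urlUpdate function above (the metadataOptions helper name is mine; the endpoint is from Google’s getMetadata reference):

```javascript
// Builds request-promise options for the getMetadata endpoint:
// GET https://indexing.googleapis.com/v3/urlNotifications/metadata?url=...
function metadataOptions(crawlurl, accessToken) {
  return {
    url: 'https://indexing.googleapis.com/v3/urlNotifications/metadata' +
         '?url=' + encodeURIComponent(crawlurl),
    method: 'GET',
    auth: { bearer: accessToken }, // token from jwtClient.authorize()
    json: true,
  };
}

console.log(metadataOptions('http://schlaf-tracking.de/oura-ring/', 'token').url);
```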

It was possible to trigger Googlebot again the next morning, with the same URLs.

URL_DELETED

When I pushed URL_DELETED for URLs which had been pushed as URL_UPDATED just a few minutes before, Googlebot did not visit: no bot within 15 minutes after pushing the delete, for any of the test URLs. Here is an example of pushing URL_DELETED seconds after URL_UPDATED:

If you push a “new” URL (one which was not pushed to the Indexing API before), Googlebot visits instantly.

Pushed delete:

Googlebot checking the site, about a minute later:

First learnings:

  • It looks like you can trigger Googlebot with the Indexing API, even for URLs without job posting or livestream structured data.
  • Googlebot always visited the URLs within the first 10 minutes after using the API.
  • The first time I triggered the API with a new URL, Googlebot was sent almost instantly. With repeated API calls for the same URL, it was not willing to crawl again within the first hour. After some hours (I don’t know exactly; maybe 8 hours) pushing the same URL works again.
  • Parameter URLs seem to be handled like new URLs, even if canonicalized.
  • If you send URL_UPDATED and shortly afterwards URL_DELETED, Googlebot does not visit. It seems to be the same “blocked time-frame” no matter which type of API call you make.
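If you wanted to automate pushes, the observed cool-down could be encoded as a simple check (the 8-hour threshold is only my observation from this test, not a documented API limit, and the helper name is mine):

```javascript
// Observed cool-down of roughly 8 hours between pushes for the same URL
// (an observation from this test, not a documented limit).
const COOLDOWN_MS = 8 * 60 * 60 * 1000;

// Returns true if a URL may be pushed again, given the ISO timestamp of
// its last push (e.g. the pushedAt value stored in DynamoDB).
function shouldPushAgain(lastPushedAtIso, now = new Date()) {
  if (!lastPushedAtIso) return true; // never pushed before
  const last = new Date(lastPushedAtIso);
  return now - last >= COOLDOWN_MS;
}

console.log(shouldPushAgain('2019-04-14T18:06:40.982Z', new Date('2019-04-15T08:00:00Z'))); // true
```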

Indexation

So far I cannot see any change in the Google SERPs.

In particular, the deletion hasn’t happened yet. 10 h and counting for deindexation…

Test 2: Set meta=noindex first and trigger API afterwards

Set meta=noindex at 19:00

Triggered Googlebot with a URL_DELETED Indexing API call immediately afterwards

Googlebot visited shortly after using the API, as expected

Still indexed at 19:30

Test Results:

NO impact on indexing or deindexing if you don’t have job posting or livestream markup.

Googlebot user agents I saw:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Tool used for log file analysis:

https://www.screamingfrog.co.uk/log-file-analyser/
