Google Indexing API tests with normal URLs, which have NEITHER job posting NOR livestream structured data
I was curious what Googlebot does, and whether indexation changes, when you send URLs that have NEITHER job posting NOR livestream structured data to Google’s Indexing API: https://developers.google.com/search/apis/indexing-api/v3/using-api?hl=en
The API isn’t designed for that use case:
You can use the Indexing API to tell Google to update or remove pages from the Google index. The requests must specify the location of a web page. You can also get the status of notifications that you have sent to Google. Currently, the Indexing API can only be used to crawl pages with either job posting or livestream structured data.
But let’s try:
Setup
This is a useful starting point for Node.js and the Google Indexing API:
In addition, I moved the whole thing to an AWS Lambda.
The AWS Lambda triggers the Google Indexing API with URL_UPDATED or URL_DELETED: https://developers.google.com/search/apis/indexing-api/v3/reference/indexing/rest/v3/urlNotifications#UrlNotification
The Lambda is connected to a DynamoDB table that stores more or less every request and response. That is useful because the Indexing API only returns the latest update (latestUpdate), so you need to store the earlier notifications yourself if you want to compare them with server log files. It may also be useful later for managing pushes to the API…
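To make the logging idea concrete, here is a minimal sketch of how one API call could be shaped into a record before it is written to DynamoDB. The helper name buildLogItem, the attribute names and the key layout are assumptions for illustration, not the schema actually used in this test; the sample response is illustrative as well.

```javascript
// Hypothetical helper: turns one Indexing API call into a flat record.
// Attribute names and the key layout are assumptions, not the real table schema.
function buildLogItem(url, type, response, now = new Date()) {
  return {
    url,                                // assumed partition key
    pushedAt: now.toISOString(),        // assumed sort key, one record per push
    type,                               // 'URL_UPDATED' or 'URL_DELETED'
    response: JSON.stringify(response), // raw API answer, kept for later comparison
  };
}

// Illustrative sample data only.
const item = buildLogItem(
  'http://schlaf-tracking.de/oura-ring/',
  'URL_UPDATED',
  { urlNotificationMetadata: { url: 'http://schlaf-tracking.de/oura-ring/' } },
  new Date('2019-04-14T18:06:40.982Z')
);
console.log(item.pushedAt); // 2019-04-14T18:06:40.982Z
```

Keeping one record per push, rather than overwriting per URL, is what makes the later comparison with server log files possible.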
The API Gateway makes it possible to invoke the Lambda and pass e.g. URLs to it.
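As a rough sketch of the API Gateway side, the Lambda could validate the incoming request like this. It assumes the "Lambda proxy integration", where the request body arrives as a JSON string in event.body; the function name parseRequest and the body fields url and type are hypothetical.

```javascript
// Assumes API Gateway Lambda proxy integration (body is a JSON string).
// The body fields "url" and "type" are hypothetical names for this sketch.
const ALLOWED_TYPES = ['URL_UPDATED', 'URL_DELETED'];

function parseRequest(event) {
  const body = JSON.parse(event.body || '{}');
  const type = body.type || 'URL_UPDATED'; // default to an update push
  if (typeof body.url !== 'string' || !/^https?:\/\//.test(body.url)) {
    throw new Error('url must be an absolute http(s) URL');
  }
  if (!ALLOWED_TYPES.includes(type)) {
    throw new Error(`unsupported type: ${type}`);
  }
  return { url: body.url, type };
}

console.log(parseRequest({
  body: JSON.stringify({ url: 'http://schlaf-tracking.de/nokia-sleep/', type: 'URL_DELETED' }),
}));
```

Rejecting anything other than the two notification types early avoids spending Indexing API quota on malformed requests.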
URL_UPDATED
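As described here https://developers.google.com/search/apis/indexing-api/v3/using-api?hl=en you can trigger a URL update, e.g. like this:
// Assumes the request-promise and googleapis packages and a service account
// key file at ./service_account.json (the path is an assumption).
const rp = require('request-promise')
const { google } = require('googleapis')
const key = require('./service_account.json')

const jwtClient = new google.auth.JWT(
  key.client_email,
  null,
  key.private_key,
  ['https://www.googleapis.com/auth/indexing'],
  null
)

// Publish a notification for crawlurl; type is 'URL_UPDATED' or 'URL_DELETED'.
const urlUpdate = async (crawlurl, type = 'URL_UPDATED') => {
  const tokens = await jwtClient.authorize()
  const options = {
    url: 'https://indexing.googleapis.com/v3/urlNotifications:publish',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    auth: { bearer: tokens.access_token },
    json: { url: crawlurl, type },
  }
  return rp(options)
}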
So let’s see what happens to an old test website.
These are the logs, filtered for Googlebot, before the test:
On 14.04.2019, up to 18:00, the site had 4 Googlebot Smartphone crawls and 1 Googlebot crawl.
Launched the test with two Indexing API requests:
Pushed at 2019-04-14T18:06:40.982Z to http://schlaf-tracking.de/oura-ring/
Pushed at 2019-04-14T18:09:27.649Z to http://schlaf-tracking.de/nokia-sleep/
Googlebot visited the two URLs just a few minutes later.
Within the next hour I sent 30 more requests for these 2 URLs… No additional Googlebot hit showed up in the logs.
So I tested more URLs, as well as random URL parameters appended to URLs I had sent to the API before. Googlebot visited all of these too, within 5 minutes.
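The parameter variants can be generated with a small helper like the following. This is only a sketch: the function name withRandomParam and the parameter name test are made up for illustration, and the post does not say which parameters were actually used.

```javascript
// Hypothetical helper: appends a random query parameter to a URL that was
// already pushed, so the Indexing API sees a "new" URL.
function withRandomParam(url, name = 'test') {
  const u = new URL(url); // URL is a global in current Node.js versions
  const value = Math.random().toString(36).slice(2, 10) || '0';
  u.searchParams.set(name, value);
  return u.toString();
}

// Prints the original URL with a random ?test=... value appended.
console.log(withRandomParam('http://schlaf-tracking.de/oura-ring/'));
```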
Next day:
In my setup the metadata is requested first, see https://developers.google.com/search/apis/indexing-api/v3/reference/indexing/rest/v3/UrlNotificationMetadata. Afterwards URL_UPDATED or URL_DELETED is called. The response of my Lambda API looks like this:
It was possible to trigger Googlebot again the next morning, with the same URLs.
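For reference, the metadata lookup mentioned above is a GET request to the urlNotifications/metadata endpoint with the URL-encoded page address as the url query parameter. A minimal sketch of the request options, in the same request-promise style as the publish call; the function name metadataOptions is an assumption, and the access token is expected to come from the same service account authorization:

```javascript
// Hypothetical helper: builds request-promise options for the metadata lookup.
// accessToken is assumed to come from the same jwtClient.authorize() call
// that is used for publishing.
function metadataOptions(crawlurl, accessToken) {
  return {
    url: 'https://indexing.googleapis.com/v3/urlNotifications/metadata?url=' +
      encodeURIComponent(crawlurl),
    method: 'GET',
    auth: { bearer: accessToken },
    json: true, // parse the JSON response body
  };
}

console.log(metadataOptions('http://schlaf-tracking.de/oura-ring/', 'TOKEN').url);
// https://indexing.googleapis.com/v3/urlNotifications/metadata?url=http%3A%2F%2Fschlaf-tracking.de%2Foura-ring%2F
```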
URL_DELETED
When I pushed URL_DELETED for URLs that had been pushed with URL_UPDATED just a few minutes before, Googlebot did not visit: no bot hit within 15 minutes after pushing the delete, for any of the test URLs. This is an example of pushing URL_DELETED seconds after URL_UPDATED:
If you push a “new” URL (one that was not pushed to the Indexing API before), Googlebot visits instantly.
Pushed delete:
Googlebot checking the site, about a minute later:
First learnings:
- It looks like you can trigger Googlebot with the Indexing API, even for URLs without job posting or livestream structured data.
- Googlebot always visited the URLs within the first 10 minutes after using the API.
- The first time I triggered the API with a new URL, Googlebot was sent almost instantly. With repeated API calls for the same URL, it was not willing to crawl again within the first hour. After some hours (I don’t know exactly, maybe 8 hours), pushing the same URL works again.
- Parameter URLs seem to be handled like new URLs, even if canonicalized.
- If you send URL_UPDATED and shortly after URL_DELETED, Googlebot does not visit. It seems to be the same “blocked time-frame” no matter which type of API call you make.
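If the “blocked time-frame” holds, the stored push history could be used to skip pushes that are unlikely to trigger a crawl. The sketch below assumes a cooldown of 8 hours, which is only the rough guess from the learnings above, not a documented limit; the function name shouldPush is hypothetical.

```javascript
// Assumption: ~8 hours, taken from the rough guess above, not from any docs.
const COOLDOWN_MS = 8 * 60 * 60 * 1000;

// lastPushedAt: ISO timestamp of the previous push for this URL, or null.
function shouldPush(lastPushedAt, now = Date.now(), cooldownMs = COOLDOWN_MS) {
  if (lastPushedAt == null) return true; // never pushed: behaves like a new URL
  return now - new Date(lastPushedAt).getTime() >= cooldownMs;
}

const firstPush = '2019-04-14T18:06:40.982Z';
console.log(shouldPush(firstPush, Date.parse('2019-04-14T19:06:40.982Z'))); // false
console.log(shouldPush(firstPush, Date.parse('2019-04-15T06:00:00.000Z'))); // true
```

Since parameter URLs appear to be treated as new URLs, the check would have to run on the exact URL string that was pushed.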
Indexation
So far I cannot see any change in the Google SERPs.
In particular, the deletion hasn’t happened yet: 10 h and counting for deindexation…
Test 2: Set meta=noindex first and trigger API afterwards
Set meta=noindex at 19:00
Triggered Googlebot with a URL_DELETED Indexing API call immediately afterwards
Googlebot visited as expected shortly after using the API
Still indexed at 19:30
Test Results:
NO impact on indexing or deindexing if you don’t have job posting or livestream markup
Googlebot user agents I have seen:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
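When filtering server logs for these hits, the two user agents above can be told apart with a simple check. This is a sketch that only looks at the user agent string; the function name classifyGooglebot is hypothetical, and because user agents can be spoofed it does not prove that a request really came from Google.

```javascript
// Hypothetical helper: classifies a user agent string from a log line.
// Returns 'smartphone', 'desktop', or null if it is not a Googlebot UA.
function classifyGooglebot(userAgent) {
  if (!/Googlebot\/2\.1/.test(userAgent)) return null;
  return /Android|Mobile/.test(userAgent) ? 'smartphone' : 'desktop';
}

const desktopUa = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const mobileUa = 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 ' +
  '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

console.log(classifyGooglebot(desktopUa)); // desktop
console.log(classifyGooglebot(mobileUa));  // smartphone
```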