Some Google Discover data learnings

Tobias Willmann
Aug 19, 2021

Part 1: Better understand which topics performed in Discover, using NLP entity detection and Google Search Performance keywords

Sadly, Google Discover has no keywords, just URLs. This makes the Google Discover report hard and time-consuming to read. Going through all those URLs, guessing what each article was about (by checking the URL), clicking on some articles (to really see them in the browser)… puuuh.

So our idea was to pull in additional data sources that provide keywords, to make the report quicker to read.

Google Search Performance Top keywords:

Copy the top search keywords of a URL into your Discover report. For example, this article https://www.blick.ch/sport/olympia/tokio2020/blamage-lassen-sich-abschiessen-deutsche-pruegeln-auf-olympia-handballer-ein-id16724687.html had rankings in Google Search for “handball”, which is quite a good label keyword for the article in the Google Discover report.
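Copying the top search keywords onto Discover rows boils down to a join on the page URL. Here is a minimal sketch, assuming the search data is already available as (page, query, clicks) tuples and each Discover report row is a dict with a "page" key; the function names, the column name "search_keywords" and the choice of ranking by clicks are all hypothetical.

```python
from collections import defaultdict


def top_keywords_per_page(search_rows, n=3):
    """Return the n highest-click queries for each page.

    search_rows: iterable of (page, query, clicks) tuples (assumed input shape).
    Ties on clicks are broken alphabetically so the result is deterministic.
    """
    by_page = defaultdict(list)
    for page, query, clicks in search_rows:
        by_page[page].append((clicks, query))
    return {
        page: [q for _, q in sorted(rows, key=lambda r: (-r[0], r[1]))[:n]]
        for page, rows in by_page.items()
    }


def label_discover_rows(discover_rows, keywords):
    """Add a hypothetical 'search_keywords' column to each Discover row.

    Pages without any search data get an empty string.
    """
    return [
        {**row, "search_keywords": ", ".join(keywords.get(row["page"], []))}
        for row in discover_rows
    ]
```

Pages that only get Discover traffic and no search traffic simply stay unlabeled, which is where the second data source comes in.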

NLP entity detection:

If you apply NLP entity detection to the text of an article, you get some “keywords”, too. For example, https://www.blick.ch/sport/olympia/tokio2020/absicht-oder-nicht-marathonlaeufer-sorgt-fuer-unfassbaren-olympia-moment-id16736044.html had the detectable entity “morhad amdouni” (he’s a marathon runner).
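The post doesn't say which NLP service was used. As an assumption, here is a sketch that reads the JSON response of Google Cloud Natural Language's analyzeEntities method, where each item in "entities" carries a "name", a "type" and a "salience" score. The min_salience and limit defaults are hypothetical tuning values, and the sample response in the test is hand-written, not real output.

```python
def top_entities(response, min_salience=0.05, limit=5):
    """Extract lowercase entity names from an analyzeEntities-style response.

    Entities below min_salience are dropped; the rest are sorted by salience
    (highest first), de-duplicated by lowercase name and capped at limit.
    """
    entities = response.get("entities", [])
    kept = [e for e in entities if e.get("salience", 0.0) >= min_salience]
    kept.sort(key=lambda e: e.get("salience", 0.0), reverse=True)
    names = []
    for entity in kept:
        name = entity["name"].lower()
        if name not in names:
            names.append(name)
    return names[:limit]
```

Lowercasing keeps the entity labels comparable with search queries, which are usually lowercase too.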

Learnings from adding keywords from these two additional data sources

#1 For Google Search Performance keywords you need to use the GSC API

The export gives you either keywords or pages, not both together. To label the pages in the Google Discover report, you need each page together with its top keywords.

rowLimit: 25000 plus pagination with the startRow setting are super useful here: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query#request-body

I don’t see any way to do this with exports from GSC without spending hours downloading.
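A sketch of that pagination, assuming the searchanalytics.query request body with "page" and "query" as dimensions so that every row pairs a URL with a keyword. The fetch argument is a hypothetical wrapper that sends one request body and returns the response's "rows" list; with google-api-python-client it could be something like lambda body: service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute().get("rows", []), where service and SITE_URL are yours to set up.

```python
def build_query_body(start_date, end_date, start_row=0, row_limit=25000):
    """Request body for searchanalytics.query, keyed by page AND query."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["page", "query"],
        "rowLimit": row_limit,
        "startRow": start_row,
    }


def fetch_all_rows(fetch, start_date, end_date, row_limit=25000):
    """Advance startRow by row_limit until a page shorter than row_limit arrives.

    fetch: hypothetical callable taking a request body and returning a list
    of rows, each shaped like {"keys": [page, query], "clicks": ..., ...}.
    """
    all_rows = []
    start_row = 0
    while True:
        rows = fetch(build_query_body(start_date, end_date, start_row, row_limit))
        all_rows.extend(rows)
        if len(rows) < row_limit:
            return all_rows
        start_row += row_limit
```

Keeping the HTTP call behind fetch makes the paging logic testable without credentials; if the total is an exact multiple of row_limit, the loop simply makes one extra request that returns nothing.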

#2 You need strong filters for Search Performance data

You don’t want brand keywords, general news searches or all those typos. If you want to use search data for labeling, keep in mind that it reflects what users were searching for, and that’s not always a good label.

We built these filters manually.
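What such hand-built filters might look like as code: a sketch under the assumption that rows are (page, query, clicks) tuples. The patterns below are hypothetical placeholders, not the filters actually used, and the click threshold is only a rough proxy for dropping rare typos.

```python
import re

# Hypothetical filter lists; real ones have to be curated by hand.
BRAND_PATTERNS = [r"\bblick\b"]
GENERIC_NEWS_PATTERNS = [r"\bnews\b", r"\baktuell\b", r"\bheute\b"]

EXCLUDE = re.compile("|".join(BRAND_PATTERNS + GENERIC_NEWS_PATTERNS), re.IGNORECASE)


def filter_search_rows(rows, min_clicks=5):
    """Drop brand and generic news queries, plus low-click queries (often typos)."""
    return [
        (page, query, clicks)
        for page, query, clicks in rows
        if clicks >= min_clicks and not EXCLUDE.search(query)
    ]
```

A click threshold won't catch frequent misspellings, so a manual review of the remaining queries is still worth it.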

#3 Search Performance keywords are not as “clean” as wanted

For example, for this article https://www.blick.ch/people-tv/international/ihr-vater-hat-800-millionen-auf-dem-konto-dr-dres-tochter-muss-im-auto-leben-id16729203.html we had rankings for “dr dre tochter” and “dr.dre tochter” (dr. dre daughter) but not for “dr. dre”. If you want to analyze thousands of articles you wrote within a year, you no longer care about a single story about dr. dre’s daughter. You care about all stories related to dr. dre.

Search data has a lot of long-tail keywords; for labeling Discover articles you probably want short-head terms.

The same applies to https://www.blick.ch/fr/sport/tennis/lynette-et-robert-federer-nous-navons-jamais-vu-roger-comme-une-future-star-id16735092.html: search mostly shows combined terms.

You could apply NLP entity detection to the search terms. Sometimes this yields something more short-head.
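Besides NLP, a simpler heuristic (swapped in here for illustration, not what the post describes) is to normalize queries so that “dr.dre tochter” and “dr dre tochter” collapse into one, and then count in how many different articles each 1–2 word chunk occurs. Chunks shared by many articles are short-head candidates. The function names and the min_pages threshold are hypothetical.

```python
import re
from collections import Counter


def normalize(query):
    """Lowercase and turn punctuation into spaces, so 'dr.dre' == 'dr dre'."""
    return " ".join(re.sub(r"[^\w]+", " ", query.lower()).split())


def ngrams(words, max_n=2):
    """Yield all 1..max_n word n-grams of a word list."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def head_terms(queries_by_page, min_pages=2):
    """Return (n-gram, page count) pairs for n-grams found on >= min_pages pages."""
    counts = Counter()
    for queries in queries_by_page.values():
        grams = set()  # count each n-gram at most once per page
        for query in queries:
            grams.update(ngrams(normalize(query).split()))
        counts.update(grams)
    return [(gram, c) for gram, c in counts.most_common() if c >= min_pages]
```

Generic words like “olympia” will also surface as head terms, so the output still needs the same kind of filtering as the raw queries.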

#4 NLP will detect unimportant entities, too

If the entity detection algorithm detects “morhad amdouni” as the keyword/entity/topic for this https://www.blick.ch/sport/olympia/tokio2020/absicht-oder-nicht-marathonlaeufer-sorgt-fuer-unfassbaren-olympia-moment-id16736044.html article, that’s of course correct. But “morhad amdouni” isn’t really famous among our readers, so this keyword is just noise in the report. Search data works much better for labeling this article: with “marathon olympia getränke” (marathon olympia drinks) or “marathon olympia wasser” (marathon olympia water) as top searches, the article is described much better.

The same happens here: https://www.blick.ch/schweiz/sahara-luft-aus-algerien-bund-erklaert-die-hitzewelle-und-wann-sie-vorbei-ist-id16722093.html, where NLP detects “Sahara” but the article is about the weather in Switzerland being influenced by winds coming from the Sahara.
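One hypothetical way to reduce this noise for a year-long analysis is to keep only entities that occur across several articles, on the assumption that names appearing once are rarely what readers care about. This is an illustrative heuristic, not something the post reports using, and it also drops genuinely new names on their first appearance.

```python
from collections import Counter


def drop_rare_entities(entities_by_page, min_pages=3):
    """Keep only entities that occur in at least min_pages different articles.

    entities_by_page: dict mapping page URL -> list of entity names.
    """
    page_counts = Counter()
    for entities in entities_by_page.values():
        page_counts.update(set(entities))  # count each entity once per page
    return {
        page: [e for e in entities if page_counts[e] >= min_pages]
        for page, entities in entities_by_page.items()
    }
```

A misleading entity like “sahara” in a Swiss weather story would survive this filter if it appears often, so it doesn't replace the search data check.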

#5 Multiple data sources combined work best

Of course! As mentioned above, NLP and Search Performance data both have their benefits and problems when generating keywords for Google Discover. Both data sources need to be cleaned up and are still not 100% useful on their own. Combining them helps.

For this article https://www.blick.ch/people-tv/international/angelina-jolie-gondelt-im-luxus-zug-durch-die-schweiz-mordsspass-im-orient-express-id16727792.htm both search and NLP entity detection list “angelina jolie”. That’s a good label for the article if we want to analyze months of data.

Still, the details from both sources are super useful. The “orient express” hint from search helps to understand that this article appeals to train enthusiasts. The “schweiz” (Switzerland) term from NLP helps to understand that this article worked great because “angelina jolie” was in Switzerland, which makes it more relevant for our Swiss readers.

Perfect alignment is rare, but it is a strong signal.
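Combining the two sources in that spirit can be sketched as follows: entities that also show up inside a search term become the strong label, and everything else is kept as supporting detail. The plain substring match is a naive assumption (a short entity like “dre” would also match “dresden”), and the output keys are hypothetical.

```python
def combine_labels(search_terms, entities):
    """Split keywords into strong labels (found by both sources) and details."""
    search = [t.lower() for t in search_terms]
    nlp = [e.lower() for e in entities]
    # An entity counts as confirmed if it appears inside any search term.
    strong = [e for e in nlp if any(e in t for t in search)]
    return {
        "strong": strong,
        "search_details": [t for t in search if not any(e in t for e in strong)],
        "nlp_details": [e for e in nlp if e not in strong],
    }
```

With the Angelina Jolie example, search terms ["angelina jolie", "orient express"] and entities ["Angelina Jolie", "Schweiz"] give “angelina jolie” as the strong label, “orient express” as a search detail and “schweiz” as an NLP detail.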

Part 2: Merging multiple Google Discover reports

There is no Google Discover API yet (hopefully soon), so you have to work with CSV exports. To go beyond the 1000 rows of data in a single export, you need to combine multiple exports.
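Combining exports mostly means reading every CSV and remembering which export each row came from. A minimal sketch, assuming each export is passed as a (label, lines) pair, where the label is something like the export's date range and lines is an open file or any iterable of CSV lines. The column names in the test ("Page", "Clicks", "Impressions") are an assumption about the export layout, and the numbers there are made up.

```python
import csv


def load_exports(exports):
    """Combine several Discover CSV exports into one list of row dicts.

    exports: iterable of (label, lines) pairs. Each row gets an extra
    'export' key so the same URL in different exports stays distinguishable.
    """
    rows = []
    for label, lines in exports:
        for row in csv.DictReader(lines):
            row["export"] = label
            rows.append(row)
    return rows
```

For files on disk you could pass ("2021-08-01", open(path, newline="", encoding="utf-8-sig")); the utf-8-sig encoding strips a byte-order mark if the export has one.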

Learnings from merging Google Discover reports

#1 Shorten the date range until each export is below 1000 rows of data, so you get all the data

If a 7-day export has fewer than 1000 rows of data, you’re probably fine combining weeks. If a 1-day export still has more than 1000 rows, you need to use additional filters to get everything.
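A quick sanity check along those lines: flag every export whose row count hit the cap, because it is probably truncated and needs a shorter date range or extra filters. This sketch assumes rows are dicts carrying an "export" label per source file; the function name is hypothetical.

```python
from collections import Counter

EXPORT_ROW_CAP = 1000  # the per-export row limit described above


def truncated_exports(rows, cap=EXPORT_ROW_CAP):
    """Return sorted export labels whose row count reached the cap."""
    counts = Counter(row["export"] for row in rows)
    return sorted(label for label, n in counts.items() if n >= cap)
```

An export with exactly 1000 rows might be complete, but there is no way to tell from the file alone, so it is safer to treat it as incomplete.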

#2 For us, there are almost no evergreen or seasonal articles that work again automatically (without re-optimization) in Discover

If you have CSV exports of single days or weeks, you can see in how many exports (i.e., on how many days or in how many weeks) a URL was listed.

In the combined day exports, you can see that most Discover articles rank for 2–4 days.

In the combined week exports, 99.5% of our articles were ranking for 1–2 weeks.
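Counting how many exports each URL appears in, and turning that into the share of URLs per count, could look like this. It's a sketch that assumes rows are dicts with a "Page" column and an "export" label per file; note that it counts exports, not consecutive days or weeks.

```python
from collections import Counter


def exports_per_url(rows):
    """Number of distinct exports (days or weeks) each URL appears in."""
    seen = {}
    for row in rows:
        seen.setdefault(row["Page"], set()).add(row["export"])
    return {url: len(labels) for url, labels in seen.items()}


def lifetime_distribution(rows):
    """Share of URLs per number of exports they appear in, e.g. {1: 0.5, 2: 0.5}."""
    per_url = exports_per_url(rows)
    if not per_url:
        return {}
    dist = Counter(per_url.values())
    total = len(per_url)
    return {n: dist[n] / total for n in sorted(dist)}
```

Summing the shares for 1 and 2 on week exports gives the kind of “ranking 1–2 weeks” figure quoted above.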

As mentioned in many other SEO blogs, we highly doubt that “8 Essential Tips for Food Photography Props and Lighting” (which is shown on Google’s Discover info page https://blog.google/products/search/introducing-google-discover/) can rank for longer than a few days, even if it’s evergreen.

You either need to write news articles or really recycle older ones. If you recycle, a proper revision works best.

#3 Metrics like articles in Discover per day, average daily impressions per article or median daily clicks per article are interesting to monitor, too

Is your Discover performance better because more articles rank, or because fewer articles rank longer and better?
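These metrics can be computed from combined single-day exports. A sketch, assuming each row is a dict with the day as its "export" label and "Clicks" and "Impressions" as integer strings (as read from CSV), with one row per URL per day.

```python
from collections import defaultdict
from statistics import mean, median


def daily_metrics(rows):
    """Per-day article count, average impressions and median clicks per article."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row["export"]].append(row)
    metrics = {}
    for day, day_rows in sorted(by_day.items()):
        metrics[day] = {
            "articles": len(day_rows),  # assumes one row per URL per day
            "avg_impressions": mean(int(r["Impressions"]) for r in day_rows),
            "median_clicks": median(int(r["Clicks"]) for r in day_rows),
        }
    return metrics
```

The median is less affected by a single viral article than the mean, which helps when you compare “more articles” against “fewer but stronger articles”.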
