How to scrape Google Autocomplete with Excel
If you have read something on this blog in the past, you should already know I like scraping with Excel (here I managed to scrape Twitter followers); if not, well, I guess you’ll learn it now.
Without wasting time on what web scraping is, and supposing you already know why scraping Google Autocomplete would be useful for SEO and not only, I’m going straight to show you how to scrape it without coding just using Excel combined with:
- SEO tools for Excel
- Regular Expressions – here an useful cheatsheet
- Xpath – here a very complex explanation that we can sum up in Xpath is a query language for selecting nodes from an XML document
Before we start, an important statement you can find here on Google’s Official Webmaster blog:
In the interest of maintaining the integrity of autocomplete as part of Search, we will be restricting unauthorized access to the unpublished autocomplete API as of August 10th, 2015
We all know Google don’t like scrapers, even if they do it:
.@mattcutts I think I have spotted one, Matt. Note the similarities in the content text: pic.twitter.com/uHux3rK57f
— dan barker (@danbarker) 27 febbraio 2014
So pay attention to what you do since you might easily have your IP temporarily or permanently banned from using the search engine.
That said, let’s go.
Step 0 – Excel set up
Build your Excel file like mine:
In cell A2 you’ll type your query; in cell B2 the language code.
Please note I’m using an Italian version of Excel, I’ll try to use single quotes (‘) and commas (,) for the English version. If something doesn’t work try using double quotes (“) and semicolon (;).
Step 1 – Where to get the data from
This is very simple, you can send GET requests to this address
http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=yourquery
Here we have to notice 3 things:
- output=toolbar: this query-string specifies you want results in XML format
- hl=en: this asks for results in English language
- q=yourquery: of course you have to change yourquery with an actual query (words concatenated using plus (+) symbol – for example my+query+is+silly
Step 2 – Downloading data
Just open your Excel file and use this function in cell D2, available with SEO tools for Excel:
=DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2)
It downloads the entire content of the page and you’ll get an answer like this (here I’m using poker as my query):
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion><suggestion data="pokeradar"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker hands"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker rules"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker face"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerstars"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerus"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker games"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerap"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker table"/></CompleteSuggestion>
</toplevel>
Step 3 – Getting rid of XML syntax
We want to extract the text from that XML, so we start deleting the code we don’t need:
=RegexpReplace(
RegexpReplace(
DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2),
'((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?()?',
''
),
'/>',
';')
We get something like this:
"poker hands";"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"
Isn’t it cool? Yes, but we want to separate each item of the list…
Step 4 – Separating items
In cell C2 you use LEFT and FIND functions:
=RegexReplace(LEFT(D2,FIND(D2,';',1)),'"','')
We’re finding the first semicolon in the string and keeping what’s at its left, also deleting the double quotes; in C2 you’ll have:
poker hands
Step 5 – Iterative extraction
Once we have extracted the first item, we remove it from the string so to obtain a string containing the last 9 terms; in cell D3 type:
=RIGHT(D2,LENGHT(D2)-FIND(';',D2,1))
With this formula we are finding the first semicolon and keeping only what is at it’s right; in D3 you should have:
"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"
So now in cell C3 you can extract the first term of this string (the second of the original one) by typing:
=LEFT(D3,FIND(D3,';',1))
From now on just drag down C3 and D3 till row 11 and you’ll have in cells C3-C11 each of items.
If you have done everything correctly, this magic will happen:
Happy scraping (and if you want to thank me I’ll accept links and shares).
3 COMMENTS
Caro Signor Giuseppe
volevo farle i miei complimenti per l’arguzia con cui affrontata il problema “scraping google suggestions” semplice ed efficacie.
io programmo in basic “visual basic 2010 per la precisione”(che immagino per lei saranno “mattoncini lego”.. 🙂
e vorrei chiederle se secondo lei potrei riuscire anche io a fare scraping passando per una windows form.
ho navigato quasi ininterrottamente per giorni alla ricrca di un modo ed ho quasi perso le speranze, ma vista la sua competenza ho pensato di domandarle un consiglio.
attualmente posso salvare la richiesta(le prime parole chiave) in un file txt
e per me sarebbe il massimo riuscire a salvare le 20 o 30 risposte in un altro file txt.
purtroppo non conosco nulla della tecnologia SEO 🙁
secondo lei quale sarebbe la strada piu giusa da seguire per ottenere una macchinetta utile allo scopo?
ci tenevo a precisarle che userei un timer per limitare le richieste in modo da non danneggiare google e non farmi bannare( il che mi preoccupa molto:).
la ringrazio molto per la sua attenzione.
Vanny
Hi Giuseppe,
Still not so sure on how these scrapped data is useful.
Other than scrapping for users – which you have talked about in a previous post, what exactly would the kind of data above be useful for? Keywords?
hi, thanks for your tutorial. can you send me the excel file?