
CIA – Crawler, Index & Algorithm



We will gradually publish some Search Engine Optimisation basics for those of you who are starting off on the topic, and for those of you who are about to hire an SEO professional and want to get some background knowledge to help assess their credentials.

How do websites get listed on Search Engine Result Pages (SERPs)?

You've got to make it easy for the CIA to access your files

To begin with we will discuss an overview of how search engines find a webpage, what they do when they find the webpage, and how the search engines decide whether to and where to place a link to the webpage in the search engine's result pages (SERPs).

Got to be Google

Not everybody uses Google. There are many other search engines to choose from, and search engine popularity can vary hugely depending on the part of the world you are in. However, today in 2015 Google still dominates, with Google Search holding almost 90% of the worldwide market share.

[Chart: the top five search engines globally as of September 1, 2016 (OPUS Cambodia)]

So, we are going to focus on how Google Search works. Its method of finding and listing websites is more or less the same one employed by many other, less popular search engines.

the Crawler

(Also known as a web spider, robot or bot, and in Google's case the Googlebot)

A web crawler follows links to webpages, PDFs, images and other files on the Internet. Once the crawler accesses a file that it finds it scans the code of that file and decides which parts to store in a database called the index.

If the file that the crawler has scanned contains links to other files (e.g. webpages) then the crawler can follow those links and similarly scan and assess the content of those files. On the other hand, if no websites or other Internet documents link to your website the crawler will not be able to find your website to add to the index.

The Googlebot can regularly recrawl webpages that it has already submitted to the index in case any changes have occurred to the contents of those webpages. If it finds changes it will then update the index.

And on the crawler goes, continually finding files, scanning their data, recording parts of the data in the index and moving on through further links.
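The find-scan-follow loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not how the Googlebot is built: the page-fetching function is passed in by the caller, and real crawlers also handle robots.txt, politeness delays, error handling and much more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page's HTML."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: fetch a page, store its content, follow its links."""
    queue, seen, stored = [start_url], set(), {}
    while queue and len(stored) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue                # never scan the same page twice
        seen.add(url)
        html = fetch(url)           # download the page's HTML
        stored[url] = html          # keep the content for indexing later
        parser = LinkExtractor(url)
        parser.feed(html)
        queue.extend(parser.links)  # follow the links we just found
    return stored
```

Notice that if no page in the queue links to a given URL, the crawler never reaches it, which is exactly why a website with no inbound links cannot be found and indexed.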

the Index

The massive database to which the crawler submits content that it finds is called the index.

The index contains an alphabetical list of words and terms that the search engine's users look for. Each index entry contains lists of the files which contain the text of that entry. This mapping of search queries to files allows for swift access to files containing the words being searched for.
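This structure, a mapping from each word to the list of files containing it, is often called an inverted index. A minimal Python sketch shows why lookups are so fast: finding every document for a query word is a single dictionary access, with no need to rescan the documents themselves. (This is a toy for illustration; Google's actual index is vastly more sophisticated.)

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()
```

For example, querying two words simply intersects the two stored sets of document ids.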

Rather than presenting you with the full list in traditional ways, such as alphabetically or chronologically, modern search engines do their best to place the most relevant results first.

To do this Google uses a complex algorithm.

the Algorithm

Google's 'secret sauce'

The least understood of these three key parts of Google Search is the algorithm.

Once the content of your website has been added to Google's index, the search engine then has the ability to list your webpages in search results.

To do this Google uses a secret algorithm. The algorithm weighs a large array of factors (over 200, according to Google) to determine which results are most relevant to the user, and in what order to present them.

While we, as SEO professionals, do not know for certain the exact factors the algorithm uses or their relative importance, we do know that Google regularly modifies both. Google makes these changes in order to improve the quality of the results that it gives to its users. However, these changes can have a negative effect on our webpages' positions in Google's search results.

We aim to counter such negative effects by regularly monitoring the effects of the changes we make to our webpages, along with external changes such as links to our webpages from third-party webpages. Through this continual analysis and adjustment we gain an understanding of some of the factors that Google uses to assess our webpages' suitability for specific search results.

When Google makes one of its changes to the algorithm we monitor the effect of that change on the positions of our webpages in those search results, and adapt our behaviour to try and improve those positions.

Simply put, we aim to be assessed positively by the algorithm so that it places us in favourable positions on Google's search engine results.

Remember to optimise your information for the CIA

So, this concludes a basic overview of how many search engines, and Google in particular, learn about and assess websites in order to list them in search results.

Over the coming months we will publish more articles about the basics of search and Search Engine Optimisation (SEO), as well as more specific and advanced tips for those of you out there who have already taken steps to optimise your webpages and content but want to improve your understanding and methods.

Just remember to aim to have your files found, assessed and prioritised by the CIA!
