Learn more about search engine spiders
What are search engine spiders?
You may already have come across the term 'spider', especially on internet marketing websites or in articles, where spiders are
also called 'robots', 'bots', 'agents' or simply 'web crawlers'. The name reflects what they do. Search engine spiders are
automated software programs or scripts that crawl the web by following the hyperlinks found on web pages, gathering data and
storing it in the search engines' huge databases for indexing. Spiders also have limits: they cannot read text embedded in images
or Flash, and they may struggle with frames, JavaScript and drop-down menus that depend on scripts. Older spiders often stopped
crawling at dynamic URLs, although the major search engines handle these much better today. Not every spider is a search engine
spider: there are many kinds, and some are harmful or created for malicious purposes.
How do search engine spiders work?
To understand how search engine spiders work, picture the whole internet as a spider's web: billions of individual pages
connected to each other by hyperlinks. Spiders follow these links, read the content of each page and store the data in huge
databases, where other software programs, known as algorithms, process what has been retrieved.
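
The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any
particular search engine is implemented: the function name crawl, the class LinkExtractor and the three-page TOY_WEB are
hypothetical, and the pages are read from a dictionary rather than downloaded over HTTP.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Breadth-first crawl of `pages`, a dict mapping URL -> HTML.

    Returns the URLs in the order they were visited. A real spider would
    fetch each page over the network instead of reading it from a dict.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # the link points to a page that cannot be fetched
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical three-page web, used only for illustration.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}

print(crawl(TOY_WEB, "/home"))  # ['/home', '/about', '/blog']
```

The set of already seen URLs is what keeps the spider from going round in circles when pages link back to each other.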
Many people think they have to submit a newly built website to the search engines. That is one way to get indexed, but if a
website that is already indexed, preferably one relevant to your niche, links to yours, the spiders will find and index your
site automatically. This is why it is important to increase your link popularity: the more quality links you have, the better
your chances of a high ranking and of always having fresh data in the search engines' indexes.
By checking your server logs or traffic statistics you can see how often spiders visit your pages, and you can identify them
by their user agent names, such as 'Googlebot' or 'Yahoo! Slurp'.
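
As a rough sketch of such a log check, the Python snippet below counts visits per spider by looking for a name in the user agent
field. It assumes the common 'combined' access log format, in which the user agent is the last quoted field; the KNOWN_BOTS list,
the function name count_bot_visits and the sample log lines (with documentation-only IP addresses) are made up for illustration.
Keep in mind that a user agent string can be faked, so a match is a hint rather than proof.

```python
from collections import Counter

# Substrings to look for in the user agent field. This short list is an
# illustrative assumption, not a complete registry of crawlers.
KNOWN_BOTS = ("Googlebot", "Slurp", "bingbot")


def count_bot_visits(log_lines):
    """Count log lines per known bot, matched by user agent substring."""
    counts = Counter()
    for line in log_lines:
        # In the combined log format the user agent is the last quoted field.
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue  # malformed line without a quoted user agent
        user_agent = parts[1].lower()
        for bot in KNOWN_BOTS:
            if bot.lower() in user_agent:
                counts[bot] += 1
                break
    return counts


# Invented sample lines; the IP addresses are reserved documentation ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /about HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.10 - - [10/Oct/2023:14:02:44 +0000] "GET /blog HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for bot, visits in count_bot_visits(SAMPLE_LOG).items():
    print(bot, visits)
```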
There are, of course, many robots that cannot be identified, and some of them are even human-powered.
After the spiders have retrieved the data, advanced algorithms evaluate and score the information so that when a searcher
enters a query, the search engine lists the most relevant results for the user's needs.
Another important point is that search engine spiders can be controlled with a robots.txt file, in which you set rules for which
pages spiders should crawl or ignore and, for spiders that support it, how often they may request pages.
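
To show what such rules look like and how a well-behaved spider might check them, here is a small sketch using Python's standard
urllib.robotparser module. The robots.txt content, the spider name 'ExampleBot' and the example.com URLs are assumptions chosen
for illustration. Note that Crawl-delay is a non-standard extension that only some spiders honor, and that following robots.txt
is voluntary in general.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory for every spider and ask
# for a 10 second pause between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a made-up spider name; it falls under the "*" rules.
print(rules.can_fetch("ExampleBot", "https://example.com/index.html"))         # True
print(rules.can_fetch("ExampleBot", "https://example.com/private/data.html"))  # False
print(rules.crawl_delay("ExampleBot"))                                         # 10
```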
How do search engine spiders read your pages?
As soon as a spider reaches a website, it starts reading the visible text content as well as metadata such as the title and
description. In the body of the page the robot reads the text content, image alt attributes, headings, comments and other
attributes, and not least the anchor text of hyperlinks leading to other internal pages or to other websites.
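
The following Python sketch illustrates the kind of extraction described above, using the standard html.parser module. It is a
simplified assumption of what a spider might collect, not the method of any real search engine: the class PageReader, the helper
read_page and the sample page are hypothetical, nested tags such as a link inside a heading are not handled, and the ordinary
body text is left out to keep the example short.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects title, meta description, headings, image alt text and anchor text."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = []
        self.alt_texts = []
        self.links = []        # (href, anchor text) pairs
        self._current = None   # tag whose text is currently being collected
        self._buffer = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "title" or tag == "a" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.title = text
        elif tag == "a":
            self.links.append((self._href, text))
        else:
            self.headings.append(text)
        self._current = None


def read_page(html):
    """Feed one HTML document to a fresh PageReader and return it."""
    reader = PageReader()
    reader.feed(html)
    return reader


# Invented sample page, used only for illustration.
SAMPLE_PAGE = """<html><head><title>Garden Tools</title>
<meta name="description" content="Hand tools for small gardens.">
</head><body><h1>Spades and Forks</h1>
<img src="spade.jpg" alt="Steel spade">
<p>Read our <a href="/guide">buying guide</a>.</p>
</body></html>"""

page = read_page(SAMPLE_PAGE)
print(page.title, "|", page.description)
print(page.headings, page.alt_texts, page.links)
```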
Other scripts then analyze the retrieved data against many factors before deciding what the website is about and how valuable
it is. The evaluation methods differ from one search engine to another.
Search engines update their data on a regular basis. Once you are indexed, search engine robots will keep revisiting your site
to see whether there is fresh content and will index it again, so that searchers always get the most current data.
In general, the more popular and active your website is, the more often most search engine spiders will revisit it.
Where your web pages are hosted matters a great deal: if the site goes down too often, you risk no longer being re-indexed and
being removed from the search engines' indexes. Usually, though, if spiders cannot access your website they will try again
later, when it is accessible.