What is a robots.txt file?

If you have pages or directories that you do not want search engines to index, you can list them in a robots.txt file and place the file on your server. When a search engine spider visits your site, it reads the file and follows the instructions.

The robots.txt file need not exist, but if it does it must be named "robots.txt" (all lowercase) and must be written in plain ASCII text.

It must be in the root directory of the web site, as spiders will not look for it anywhere else.
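
Because the file always sits in the root directory, its address can be worked out from any page address on the same site. A minimal Python sketch using the standard library's urllib.parse; the function name robots_url and the example.com address are illustrative assumptions, not part of any tool mentioned here:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address where a spider would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address used only for illustration.
print(robots_url("http://www.example.com/products/list.html?page=2"))
```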

Note:
If you do not have a robots.txt file in the root directory of your web site, you may find a large number of 404 errors in your web stats. This is because bots or spiders requested the file and it was not available.

To create a robots.txt file
• Create a plain text file using a text editor (if you use a word processor or HTML editor, make sure you save the file as plain text), using the required coding as shown in the examples below
• Save the file as robots.txt
• Upload the robots.txt file to the root directory using your FTP software in ASCII mode
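
If you would rather generate the file than type it by hand, the steps above could be sketched in Python roughly as follows. The helper names build_robots_txt and save_robots_txt are hypothetical, and uploading the saved file to your server remains a separate step:

```python
from pathlib import Path


def build_robots_txt(groups):
    """Build robots.txt text from (user_agent, [disallowed paths]) pairs."""
    blocks = []
    for user_agent, paths in groups:
        lines = [f"User-agent: {user_agent}"]
        # An empty path list becomes a bare "Disallow:" (nothing is disallowed).
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # Separate each group with a blank line.
    return "\n\n".join(blocks) + "\n"


def save_robots_txt(text, directory="."):
    """Save the text as robots.txt in plain ASCII; non-ASCII text raises an error."""
    path = Path(directory) / "robots.txt"
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(text)
    return path
```

For example, save_robots_txt(build_robots_txt([("*", ["/cgi-bin/"])])) would write a two-line file into the current directory, ready to upload.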

Examples
To exclude all robots from parts of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
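
One way to check rules like these before uploading them is Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the spider name ExampleBot and the example.com addresses are made up for illustration, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
"""


def is_allowed(rules, user_agent, url):
    """Return True if the rules let the given spider fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# A path under a disallowed directory is blocked for every spider.
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/cgi-bin/form.cgi"))
# Anything not listed stays open.
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/index.html"))
```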


To exclude a specific spider from parts of the server
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /secure/
Disallow: /products/
Disallow: /misc/sitestats/

This indicates that nothing is disallowed and the spider can follow all links
User-agent: *
Disallow:

To allow a single robot complete access and exclude all others
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
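
To see how a parser treats a named spider differently from everyone else, the rules can be fed to Python's standard urllib.robotparser. This is a sketch under assumptions: the rules here name the spider by its bare token "Googlebot", the helper name can_spider_fetch and the example.com address are illustrative, and other parsers may match user-agent names differently.

```python
from urllib.robotparser import RobotFileParser

# One named spider gets full access; every other spider is shut out.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_spider_fetch(rules, user_agent, url):
    """Return True if the rules let this particular spider fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.example.com/products/index.html"
print(can_spider_fetch(RULES, "Googlebot", page))  # named spider
print(can_spider_fetch(RULES, "Scooter", page))    # falls under "*"
```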


This would prevent your entire web site from being indexed
User-agent: *
Disallow: /


Spider User-agents
Alta Vista : Scooter
Infoseek : InfoSeek Sidewinder Ultraseek Mozilla
Lycos : Lycos_Spider_(T-Rex)
Google : Googlebot/1.0
Inktomi : Slurp Slurp.so


The reasons for excluding files from some or all spiders could include privacy, log files, or pages optimised for a particular search engine that you would not want indexed by other search engines.

You can also add the Robots meta tag to the head of your web page to tell spiders what to index and what not to index.

<html>
<head>
<title>What Is A Robots text File</title>
<meta name="description" content="If you have pages or directories which you do not want to be indexed by the search engines you can add this information to the robots.txt file and place the file on the server">
<meta name="robots" content="index, follow">
</head>
<body>
<!-- page content goes here -->
</body>
</html>

The Robots meta tag has the following options:
Indexes the page and follows links
<meta name="robots" content="index, follow">

Does not index the page, but follows links
<meta name="robots" content="noindex, follow">


Indexes the page, but does not follow links
<meta name="robots" content="index, nofollow">


Neither indexes the page nor follows links
<meta name="robots" content="noindex, nofollow">


You can use one of these tags on each page, according to your requirements for that page.
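
To check which option a page actually carries, the tag can be read back with Python's standard html.parser module. A rough sketch: the class name RobotsMetaParser and the function robots_directives are hypothetical, and treating a missing tag as "index, follow" is an assumption based on the usual default behaviour of spiders.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from a <meta name="robots"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.directives = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex, follow" into {"noindex", "follow"}.
            self.directives = {
                item.strip().lower() for item in content.split(",") if item.strip()
            }


def robots_directives(html):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if parser.directives is None:
        # Assumption: with no robots meta tag, spiders index and follow.
        return {"index", "follow"}
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(page)))
```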
