What is a robots.txt file?
If you have pages or directories that you do not want search engines to index, you can list them in a robots.txt file and place the file on the server. When a search engine spider visits your site, it reads the file and follows the instructions.

The robots.txt file need not exist, but if it does, it must be called "robots.txt" and must be written in ASCII. It must be in the root directory of the web site, as spiders will not look for it anywhere else.

Note: If you do not have a robots.txt file in the root directory of your web site, you may find a large number of 404 errors in your web stats. This is because bots or spiders requested the file and it was not available.

To create a robots.txt file

• Create a plain text file using a text editor or HTML editor, with the required directives as in the examples below. If you use a word processor, save the file as plain text.
• Save the file as robots.txt
• Upload the robots.txt file to the root directory using your FTP software in ASCII mode

Examples

To exclude all robots from parts of the server

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /misc/sitestats/

To exclude a specific spider from parts of the server

    User-agent: slurp.so/
    Disallow: /cgi-bin/
    Disallow: /secure/
    Disallow: /products/
    Disallow: /misc/sitestats/

This indicates that nothing is disallowed and the spider can follow all links

    User-agent: *
    Disallow:

To allow a single robot complete access and exclude all others

    User-agent: Googlebot/1.0
    Disallow:

    User-agent: *
    Disallow: /

This would prevent your entire web site from being indexed

    User-agent: *
    Disallow: /

Spider User-agents

    Alta Vista : Scooter
    Infoseek   : InfoSeek Sidewinder Ultraseek Mozilla
    Lycos      : Lycos_Spider_(T-Rex)
    Google     : Googlebot/1.0
    Inktomi    : Slurp Slurp.so

Reasons for excluding files from some or all spiders include privacy, log files, or pages optimised for one particular search engine that you would not want other search engines to index.

You can also add the Robots meta tag to the head of your web page to instruct spiders what to index and what not to index:

    <html>
    <head>
    <title>What Is A Robots text File</title>
    <meta name="description" content="If you have pages or directories which you do not want to be indexed by the search engines you can add this information to the robot txt file and place the file on the server">
    <meta name="robots" content="index, follow">
    </head>
    <body>
    </body>
    </html>

The Robots meta tag has the following options

Indexes the page and follows links

    <meta name="robots" content="index, follow">

Does not index the page, but follows links

    <meta name="robots" content="noindex, follow">

Indexes the page, but does not follow links

    <meta name="robots" content="index, nofollow">

Neither indexes the page nor follows links

    <meta name="robots" content="noindex, nofollow">

You can use one of these tags on specific pages according to your requirements for each page.
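Before uploading a robots.txt file, it can help to check how a parser reads your rules. The following is a minimal sketch using Python's standard urllib.robotparser module. It tests the first example above (exclude all robots from parts of the server). The helper name is_allowed, the spider name ExampleBot and the host www.example.com are assumptions made for illustration only, and real spiders may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the "exclude all robots from parts of the server" example.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper for this sketch, not part of the
    standard library.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot and www.example.com are placeholder names.
    for path in ("/index.html", "/cgi-bin/form.cgi", "/misc/sitestats/report.html"):
        allowed = is_allowed(RULES, "ExampleBot", "http://www.example.com" + path)
        print(path, "allowed" if allowed else "disallowed")
```

Paths under /cgi-bin/ and /misc/sitestats/ should be reported as disallowed and everything else as allowed. To check a different rule set, pass its text as the first argument.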