The robots.txt file

The robots.txt file lets you restrict access to your site, or parts of it, by telling search engine robots which areas not to crawl. Most of the time it is used to prevent duplicate content (e.g. on a forum you do not want printthread.php indexed as well as the thread itself) or to keep out parts of the forum that add no value in search results (e.g. the registration page). The other advantage of a robots.txt file is that it reduces load on your server by stopping search engine crawlers from wasting your bandwidth on unnecessary sections of the site or forum.

The robots.txt file is easily made with any text editor that can save a plain .txt file. The most basic file is:

User-agent: *
Disallow: /

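This tells all crawlers to stay out of the entire site. To let them crawl everything instead, leave the Disallow value empty (Disallow: with nothing after it).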
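The effect of the Disallow value can be checked without touching a live site. Here is a minimal sketch using Python's standard urllib.robotparser module; the helper name allowed and the example.tld URL are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks every path; an empty "Disallow:" blocks nothing.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(allowed(BLOCK_ALL, "http://www.example.tld/index.php"))  # False
print(allowed(ALLOW_ALL, "http://www.example.tld/index.php"))  # True
```

A single slash is the whole difference between a site that is fully open to crawlers and one that is fully closed, so it is worth testing the file before uploading it.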

The file should be placed in the root directory of the website. You can check whether a site actually has a robots.txt by adding /robots.txt to its domain, for example http://www.forumdr.com/robots.txt
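Because the file always lives at the root, its address can be worked out from any page URL on the site. The sketch below shows one way to do that in Python; robots_url and has_robots_txt are hypothetical helper names, and the page URL is made up for the example. The second helper makes a real network request, so it is defined but not called here.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Build the root-level robots.txt address for the site hosting `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(page_url: str, timeout: float = 10.0) -> bool:
    """Return True if the site answers the robots.txt request with HTTP 200."""
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors such as 404, DNS failures and timeouts.
        return False


print(robots_url("http://www.forumdr.com/some/page.php?t=1"))
# http://www.forumdr.com/robots.txt
```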

For vBulletin, I generally use this:

User-agent: *
Disallow: /admincp/
Disallow: /announcement.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /faq.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /misc.php
Disallow: /modcp/
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php

Sitemap: http://www.example.tld/sitemap.xml

This is what is most commonly recommended for a vBulletin site, with two exceptions. I do not restrict the search engines from the members.php pages, because I want them indexed and use them as a source of traffic. I also include the sitemap.xml location in the robots.txt (see this recommendation; even Google includes a pointer to its own sitemap in its robots.txt file!).
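Before uploading a rule set like this, it can help to confirm that it blocks and allows what you expect. The following is a rough sketch using Python's standard urllib.robotparser. It uses only a handful of the rules from the list above to stay short, and the query strings and the showthread.php URL are assumptions for illustration. The site_maps() method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the vBulletin rule set, for illustration only.
RULES = """\
User-agent: *
Disallow: /admincp/
Disallow: /printthread.php
Disallow: /register.php
Disallow: /search.php

Sitemap: http://www.example.tld/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

BASE = "http://www.example.tld"
for path in ("/printthread.php?t=42", "/register.php",
             "/showthread.php?t=42", "/members.php"):
    verdict = "allowed" if parser.can_fetch("*", BASE + path) else "blocked"
    print(f"{path}: {verdict}")

# Sitemap lines are collected separately from the crawl rules.
print(parser.site_maps())
```

The two disallowed scripts should come back as blocked, while the thread and member pages stay crawlable because no rule matches them.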

For more on robots.txt, see the Wikipedia entry, Google’s explanation and Bing’s explanation. Matt Cutts of Google also has a video on whether robots.txt actually works.

