Introduction to Web Robots, ROBOTS.TXT, and RoboGen
Search engines such as Excite and
AltaVista use web spiders, also known as robots, to create the indexes
for their search databases. These robots traverse HTML trees by
loading pages and following hyperlinks, and they report the text and/or
meta-tag information used to build search indexes. ROBOTS.TXT is a file
that spiders consult for information on how the site is to be cataloged.
It is an ASCII text file that sits in the document root of the server.
It defines which documents and/or directories conforming spiders are
forbidden to index.
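To make the file format concrete, here is a minimal sketch of how a conforming spider might consult a robots.txt file before loading a page. It uses Python's standard `urllib.robotparser` module. The file contents, the `ExampleBot` and `SomeSpider` names, the `is_allowed` helper, and the example.com URLs are all assumptions made up for illustration; they are not taken from RoboGen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The paths and the "ExampleBot"
# user-agent name are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to load url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A spider with no record of its own falls back to the "*" record.
print(is_allowed("SomeSpider", "http://www.example.com/index.html"))
print(is_allowed("SomeSpider", "http://www.example.com/cgi-bin/search"))
# ExampleBot has its own record, which disallows the whole site.
print(is_allowed("ExampleBot", "http://www.example.com/index.html"))
```

Each record starts with one or more `User-agent` lines followed by `Disallow` lines, and records are separated by a blank line. A real spider would download the file from the server's document root rather than parse an embedded string.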
The robot exclusion protocol was
introduced by Martijn Koster in 1994 to deal with problems that had been
arising due to the increasing popularity of the Internet and the toll web
spiders were taking on system resources. Some of the problems were
caused by robots rapid-firing requests, that is, loading pages in rapid
succession. Other problems included robots indexing information deep
in directory trees, indexing temporary information, and even accessing CGI scripts.
The robot exclusion protocol was quickly adopted by webmasters and web
robot makers as a way to organize and control the indexing process.
Since then, the size of the Internet
has increased dramatically and millions of people are using it. The
number of web robots crawling the web is greater than before and it is
more important than ever for all web sites to have a properly created and
maintained ROBOTS.TXT file.
With RoboGen you create robot exclusion
files by selecting All Robots or a specific user-agent and adding documents
and/or directories by entering the path names manually or by selecting
them using FTP. Once all the restrictions and directives are set,
you can save the robots.txt file to your hard drive or upload it directly
to your server.
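The kind of output described above can be sketched in a few lines of Python. This is not RoboGen's implementation, only an illustration of the file it produces. The `build_robots_txt` function, the `ExampleBot` name, and the paths are hypothetical.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as robots.txt text.

    Use "*" as the user-agent to address all robots.
    """
    records = []
    for user_agent, paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        # An empty Disallow value means nothing is off limits for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        records.append("\n".join(lines))
    # Records are separated by a single blank line.
    return "\n\n".join(records) + "\n"


# Hypothetical selections: two directories hidden from all robots,
# and one named robot that is given no restrictions.
text = build_robots_txt({"*": ["/cgi-bin/", "/tmp/"], "ExampleBot": []})
print(text)
```

Saving the result is then a matter of writing `text` to a file named robots.txt and placing it in the server's document root.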
It is important to remember that
robot exclusion files are not a security measure. Some robots will
simply ignore the file and others may purposely load the documents that
the file marks as disallowed. This means that robot exclusion files
are really only useful for controlling what appears in search engines.
RoboGen Product Overview
RoboGen is a visual editor
for Robot Exclusion Files. It allows the user to quickly and easily
create the ROBOTS.TXT files required to instruct web search engines which
parts of a web site are not to be indexed and made searchable by the general
web public. RoboGen does this by letting the user log
on to an FTP server and then select the documents and directories that
are not to be made searchable.
For more information on RoboGen
and robot exclusion files, see Introduction
to Web Robots, ROBOTS.TXT, and RoboGen.
Feature Overview
- Create Robot Exclusion Files by selecting documents and directories.
- Log into FTP servers and even upload robots.txt from RoboGen.
- Manage information for multiple servers.
- Database of over 180 known user-agents and 10 major search engines.
- Ability to edit the robot database and add additional user-agents.
- RoboGen recognizes common document roots when logging into FTP servers and suggests that they be set as the document root.
For a detailed overview of RoboGen's
features, see the RoboGen Feature Comparison
Chart.
Pricing
Base Cost:
- $24.95 for a single license.
Shipping and Handling:
- Additional $5.75 for CD-ROM delivery.
- $0.00 for Download Only delivery.