robots.txt File Structure

A Brief Summary of the robots.txt File

Jun 22, 2023

Here's a summary of the robots.txt file:

Purpose: The robots.txt file informs web robots about the website's crawling preferences and any specific instructions regarding access to certain pages or directories.

User-agent: This directive names the web robot or crawler to which the rules that follow apply. For example, "User-agent: Googlebot" targets Google's search crawler.

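Disallow: This directive tells crawlers which parts of the website they should not crawl. It lists directories or files that are off-limits, matched as path prefixes. For instance, "Disallow: /private" tells robots not to request any URL whose path begins with "/private". Note that it controls crawling rather than indexing: a disallowed URL can still show up in search results if other pages link to it.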
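As a rough illustration of how "User-agent" and "Disallow" work together, the sketch below feeds a two-line robots.txt to Python's standard urllib.robotparser module. The file contents, the example.com URLs, and the variable names are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is blocked from any path that starts with /private.
blocked = parser.can_fetch("Googlebot", "https://example.com/private/report.html")

# Paths outside the disallowed prefix stay crawlable.
allowed = parser.can_fetch("Googlebot", "https://example.com/about.html")

# No group names this crawler, so no rule restricts it.
other_bot = parser.can_fetch("Bingbot", "https://example.com/private/report.html")

print(blocked, allowed, other_bot)
```

Because the rule is a prefix match, a path such as /private-notes.html would be blocked as well; writing "Disallow: /private/" with a trailing slash limits the rule to the directory.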

Allow: This directive carves out exceptions to a "Disallow" rule. It grants access to specific directories or files inside an otherwise disallowed section.
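A minimal sketch of an "Allow" exception, again using Python's urllib.robotparser. The paths and the "ExampleBot" name are hypothetical. One caveat: Python's parser applies the first rule that matches, while Google documents that the most specific (longest) matching path wins. Listing the "Allow" line first keeps both interpretations in agreement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ but reopen one folder inside it.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Allow rule matches first, so this file may be fetched.
press_kit_ok = parser.can_fetch(
    "ExampleBot", "https://example.com/private/press-kit/logo.png"
)

# Only the Disallow rule matches here, so the URL is off-limits.
drafts_ok = parser.can_fetch(
    "ExampleBot", "https://example.com/private/drafts/plan.html"
)

print(press_kit_ok, drafts_ok)
```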

Sitemap: This directive points to the XML sitemap file that contains a list of URLs on the website. It helps crawlers discover and index web pages more efficiently.
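For illustration, a crawler written in Python could read the "Sitemap" entries as sketched below. This assumes Python 3.8 or newer, where urllib.robotparser provides site_maps(); the sitemap URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one Sitemap line. The Sitemap
# directive stands on its own and is not tied to any User-agent group.
ROBOTS_TXT = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if none are listed.
sitemaps = parser.site_maps()

print(sitemaps)
```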

Crawl-delay: This directive suggests a delay (in seconds) between successive requests made by a web crawler. It can reduce server load by limiting the crawling rate. It is not part of the core standard, and not every crawler honors it; Googlebot, for example, ignores it.
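A polite crawler might look up the suggested delay before each request. The sketch below shows one way to do that with urllib.robotparser; the 10-second value, the "ExampleBot" name, and the fallback of 0 are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to wait 10 seconds.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the value for the matching group,
# or None when the group sets no delay.
delay = parser.crawl_delay("ExampleBot")

# Fall back to no pause when the site does not ask for one.
pause_seconds = delay if delay is not None else 0

print(pause_seconds)
```

A real crawler would then call time.sleep(pause_seconds) between requests to the same site.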

User-agent: * (wildcard): When the asterisk (*) is used as the user-agent, it applies to all web robots. This is useful when setting general rules that apply universally.
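The wildcard group acts as a fallback. A crawler that finds a group naming it specifically follows that group and ignores the "*" group. The sketch below shows this with urllib.robotparser; the rules and the "OtherBot" name are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a general group plus a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private

User-agent: Googlebot
Disallow: /drafts
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url_private = "https://example.com/private/report.html"
url_drafts = "https://example.com/drafts/post.html"

# A crawler with no group of its own falls back to the wildcard rules.
other_private = parser.can_fetch("OtherBot", url_private)

# Googlebot has its own group, so the wildcard rules do not apply to it.
google_private = parser.can_fetch("Googlebot", url_private)
google_drafts = parser.can_fetch("Googlebot", url_drafts)

print(other_private, google_private, google_drafts)
```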

AASAN || Tech by Wajid Khan
