Discover more from AASAN || Tech by Wajid Khan
Robots.txt File Structure
A Brief Summary of the robots.txt File
Here's a summary of the robots.txt file:
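Purpose: The robots.txt file sits at the root of a site (for example, https://example.com/robots.txt). It tells web robots which pages or directories they may crawl and carries any other crawling instructions the site owner wants to publish. Compliance is voluntary: well-behaved crawlers follow it, but the file does not block access on its own.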
User-agent: This directive names the web robot or crawler that the rules following it apply to. For example, "User-agent: Googlebot" targets Google's search crawler.
Disallow: This directive lists the paths that crawlers should not request. Each value is a path prefix, so "Disallow: /private" tells robots not to access the "private" directory or any other URL whose path begins with /private. Disallow controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.
Allow: This directive creates exceptions to "Disallow", granting access to specific directories or files inside a disallowed section. When an Allow rule and a Disallow rule both match a URL, RFC 9309 and Google apply the more specific (longer) rule.
Sitemap: This directive gives the absolute URL of the site's XML sitemap, which lists the site's URLs and helps crawlers discover pages more efficiently. It applies regardless of which User-agent group it appears in.
Crawl-delay: This directive asks crawlers to wait a given number of seconds between successive requests, which can reduce server load. It is not part of the robots.txt standard, and support varies: Bing honors it, while Googlebot ignores it.
User-agent: * (wildcard): An asterisk as the user-agent value makes the group apply to every web robot, which is useful for setting general rules. A crawler that also finds a group naming it specifically follows that group instead of the * group.
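Python's standard library includes a robots.txt parser, urllib.robotparser, which makes it easy to see how these directives interact. The sketch below parses a hypothetical file for example.com; the crawler name MyCrawler and every path and URL are made up for illustration. One caveat: this parser applies the first matching Allow/Disallow line in file order rather than the most specific one, so the Allow line is placed before the broader Disallow. The site_maps() method requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com, exercising each directive described above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private

User-agent: *
Allow: /private/press/
Disallow: /private
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching over HTTP

page = "https://example.com/private/press/launch.html"

# Googlebot has its own group, so the "*" group (and its Allow line) does not apply to it.
print(rp.can_fetch("Googlebot", page))  # False
# Any other crawler falls back to the "*" group, where Allow carves out /private/press/.
print(rp.can_fetch("MyCrawler", page))  # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/notes.html"))  # False
# Disallow values are path prefixes, so this file is blocked as well.
print(rp.can_fetch("MyCrawler", "https://example.com/private-notes.html"))  # False

print(rp.crawl_delay("MyCrawler"))  # 10
print(rp.crawl_delay("Googlebot"))  # None (the Googlebot group sets no Crawl-delay)
print(rp.site_maps())               # ['https://example.com/sitemap.xml']
```

In a real crawler you would call rp.set_url("https://example.com/robots.txt") and rp.read() instead of parse(), then check can_fetch() before each request.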
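When both an Allow and a Disallow line match the same URL, RFC 9309 (the robots.txt standard) and Google resolve the conflict by specificity: the matching rule with the longest path wins, and Allow wins a tie, so the order of the lines does not matter. Below is a minimal sketch of that precedence. It assumes plain prefix rules taken from a single user-agent group and ignores the "*" and "$" wildcards that Google also supports; is_allowed is a hypothetical helper, not a library function.

```python
# Minimal sketch of longest-match precedence between Allow and Disallow.
# Assumptions: rules come from one user-agent group, and the "*" and "$"
# wildcards are NOT supported (values are treated as plain path prefixes).

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs, e.g. ("disallow", "/private").
    The longest matching prefix wins; on a tie, Allow wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            # An empty value (e.g. "Disallow:") imposes no restriction, so skip it.
            continue
        if path.startswith(prefix):  # case-sensitive prefix match
            is_allow = directive.lower() == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# Hypothetical rules: Disallow is listed first, yet the longer Allow still wins.
rules = [
    ("disallow", "/private"),
    ("allow", "/private/press/"),
]

print(is_allowed(rules, "/private/press/launch.html"))  # True
print(is_allowed(rules, "/private/notes.html"))         # False
print(is_allowed(rules, "/blog/"))                      # True
```

Comparing prefix lengths rather than line positions is what makes the result independent of how the file is ordered.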