The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform visiting web robots which areas of the website should not be processed or scanned.
Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.
A robots.txt file is a text file that is read by search engine spiders and follows a strict syntax. These spiders are also called robots – hence the name – and the file's syntax is strict simply because it has to be computer-readable. That means there is no room for error here – something is either a 1 or a 0.
The robots.txt file is a simple text file placed on your web server which tells web crawlers like Googlebot whether or not they should access a file.
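As an illustration of that syntax, a minimal robots.txt might look like the following sketch. The paths, the sitemap URL, and the crawler name ExampleBot are placeholders, not recommendations for any real site:

    # Allow all crawlers everywhere except the /private/ directory
    User-agent: *
    Disallow: /private/

    # Block a hypothetical crawler named ExampleBot from the whole site
    User-agent: ExampleBot
    Disallow: /

    Sitemap: https://www.example.com/sitemap.xml

Each record starts with one or more User-agent lines naming the crawler it applies to (with * as a wildcard), followed by Disallow lines listing path prefixes that crawler should not fetch; an empty Disallow value excludes nothing. Lines beginning with # are comments, and the Sitemap directive is a later, widely supported extension rather than part of the original 1994 standard.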
The /robots.txt is a de-facto standard, and is not owned by any standards body. There are two historical descriptions: the original 1994 document, A Standard for Robot Exclusion, and a 1997 Internet Draft specification, A Method for Web Robots Control.
Google Robots.txt Parser and Matcher Library: this repository contains Google's robots.txt parser and matcher as a C++ library (compliant with C++11).
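A minimal sketch of checking one URL against a robots.txt body with that library. It assumes the repository's robots.h header and a googlebot::RobotsMatcher class exposing OneAgentAllowedByRobots; verify the exact names and signatures in robots.h before relying on them:

    // Sketch: is a given URL fetchable for a given user agent?
    #include <iostream>
    #include <string>

    #include "robots.h"  // from the google/robotstxt repository (assumed include path)

    int main() {
      // robots.txt content as it would be fetched from https://example.com/robots.txt
      const std::string robots_txt =
          "User-agent: *\n"
          "Disallow: /private/\n";

      const std::string user_agent = "Googlebot";
      const std::string url = "https://example.com/private/page.html";

      googlebot::RobotsMatcher matcher;
      const bool allowed =
          matcher.OneAgentAllowedByRobots(robots_txt, user_agent, url);

      std::cout << (allowed ? "allowed" : "disallowed") << std::endl;
      return 0;
    }

With the rules above, this check would report "disallowed", since /private/page.html falls under the Disallow: /private/ prefix for all user agents.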