Robots.txt example
Html / November 13, 2021
Website owners use the /robots.txt file to give instructions about their site to bots from search engines such as Google, Yahoo, Bing, etc.
The system is simple: before a robot visits a URL such as https://www.ejemplode.com/, it first fetches https://www.ejemplode.com/robots.txt and reads its content. Through the robots.txt content, bots are instructed not to crawl or access certain files. The rules can be specific, for example keeping certain bots out while letting others in.
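The lookup described above can be sketched in Python with the standard library's urllib.robotparser module. This is only an illustration, not how any particular search engine implements it: the helper names robots_url_for and is_allowed, the ExampleBot user agent, and the /private/ rule are hypothetical, and the rules are parsed from a list of lines instead of being downloaded so that the sketch runs offline.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Return the robots.txt URL a polite bot would check before page_url."""
    # A leading slash makes urljoin resolve against the site root.
    return urljoin(page_url, "/robots.txt")


def is_allowed(robots_lines, user_agent, page_url):
    """Parse robots.txt lines and report whether user_agent may fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, page_url)


# Hypothetical rules, standing in for the downloaded file.
SAMPLE_RULES = ["User-agent: *", "Disallow: /private/"]

print(robots_url_for("https://www.ejemplode.com/some/page.html"))
print(is_allowed(SAMPLE_RULES, "ExampleBot", "https://www.ejemplode.com/"))
print(is_allowed(SAMPLE_RULES, "ExampleBot",
                 "https://www.ejemplode.com/private/notes.html"))
```

A real crawler would fetch the file from the URL returned by robots_url_for (RobotFileParser offers set_url and read for that) and cache the result instead of downloading it before every request.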
There are two important considerations when using /robots.txt:
- Robots can ignore your robots.txt. This is especially true of malware bots that scan the web for vulnerabilities and of email address harvesters used to send spam.
- The robots.txt file is publicly available. Anyone can see the content of your robots.txt.
So don't use robots.txt to hide information. Rather, use it to ask well-behaved bots to stay away from certain content on your site.
Here are several example robots.txt files with their explanations.
Code:
User-agent: *
Disallow: /
This code blocks all robots from every part of the site. The User-agent line specifies which robots the rules apply to; here the asterisk matches all of them. The Disallow line gives a path they must not access. Since every path on the site starts with /, Disallow: / shuts robots out of the whole site.
Code:
User-agent: *
Disallow:
On the other hand, if we leave Disallow empty, nothing is blocked: robots can access any content.
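The difference between the two files above can be checked with Python's standard urllib.robotparser module. This is a rough sketch under stated assumptions: the can_fetch helper, the ExampleBot name, and the page URL are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(rules_text, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # first example: block everything
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # second example: block nothing

# Hypothetical page and bot name, used only for this check.
URL = "https://www.ejemplode.com/any/page.html"

blocked = can_fetch(BLOCK_ALL, "ExampleBot", URL)  # "/" is a prefix of every path
allowed = can_fetch(ALLOW_ALL, "ExampleBot", URL)  # empty Disallow matches nothing

print("Disallow: /  ->", blocked)
print("Disallow:    ->", allowed)
```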
Now another example:
Code:
User-agent: *
Disallow: /contact.html
Disallow: /file.html
This code keeps all robots away from contact.html and file.html. The rest of the site remains accessible.
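One detail worth seeing in practice is that Disallow values are matched as path prefixes, so a rule for /contact.html also covers any path that begins with that string. The sketch below uses Python's standard urllib.robotparser; the allowed helper, the ExampleBot name, and the extra paths /index.html and /contact.html.old are hypothetical and only serve the demonstration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /contact.html
Disallow: /file.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, user_agent="ExampleBot"):
    """Report whether user_agent may fetch the given path on the example site."""
    return parser.can_fetch(user_agent, "https://www.ejemplode.com" + path)


# Map each path to True (may be fetched) or False (disallowed).
paths = ["/contact.html", "/file.html", "/index.html", "/contact.html.old"]
results = {path: allowed(path) for path in paths}

for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```

Here /index.html stays accessible, while /contact.html.old is blocked because it starts with the disallowed prefix /contact.html.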
This last example keeps every robot out of the site except for the Google bot:
Code:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
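Per-bot rules like these can be tried out with Python's standard urllib.robotparser. The sketch assumes that Google's crawler identifies itself with the token Googlebot; the build_parser helper, the ExampleBot name, and the page URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: Google's crawler uses the user-agent token "Googlebot".
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(rules_text):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)
PAGE = "https://www.ejemplode.com/index.html"  # hypothetical page

googlebot_ok = parser.can_fetch("Googlebot", PAGE)   # matches its own group
otherbot_ok = parser.can_fetch("ExampleBot", PAGE)   # falls back to the * group

print("Googlebot:", googlebot_ok)
print("ExampleBot:", otherbot_ok)
```

A bot uses the group that names it when one exists and falls back to the * group otherwise, which is why the empty Disallow applies to Googlebot even though the catch-all group blocks everything.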