Time to give search engines more instructions on where to crawl
Today we’ll do some traffic management using a text file called robots.txt, which tells search engine crawlers which pages or files they can or can’t request from your site. The main purpose of this step is to avoid overloading your site with requests.
In simple terms, when you create this file, you specify which directories of your website web crawlers can and can’t access, using the “Allow” and “Disallow” directives. Googlebot always reads robots.txt before crawling a website, so the file is essential when you want search engines to focus on the right content on your site.
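Once you decide that a section of your site shouldn’t be crawled, add a “Disallow” rule to the file. It looks like this: “User-agent: *” on one line, followed by “Disallow: /this-directory/” on the next, where /this-directory/ stands for the path you want to block. Note that this step doesn’t hide your page from Google. It only reduces the chance that the page gets indexed, and a blocked URL can still appear in search results if other pages link to it.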
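To see how a crawler might read such a rule, here is a minimal sketch that uses Python's standard urllib.robotparser module. The /private/ path, the example.com URLs, and the is_allowed helper are made up for illustration. Real crawlers, Googlebot included, may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; swap in pages from your own site.
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/blog/",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Running the same check against your own rules before you upload the file is a cheap way to catch a “Disallow” line that blocks more than you intended.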
Some more things you should know about robots.txt files:
- The file must be named “robots.txt,” all in lowercase (not Robots.txt or robots.TXT).
- Hosted platforms like Wix, Drupal, or Blogger often don’t let you edit your robots.txt file directly. Instead, you can use a search settings page (or something similar) to tell search engines how to crawl your website.
- A quick way to check whether Google has crawled and indexed a page is to search for the page URL in Google.
- Not all crawlers follow the instructions in your robots.txt file, so if you want to hide data, you need to password-protect it on your server instead (anything listed in robots.txt is public).
Your tasks for today:
- Check if you have a robots.txt file. Type in your root domain, then add /robots.txt to the end of the URL, and see what comes out.
- Use Google’s robots.txt Tester (available from your Google Search Console account) to check whether your file blocks and allows the URLs you expect.
- If you don’t have a robots.txt file, you need to create one. You can use any text editor that creates UTF-8 text files (don’t use word processors) or a robots.txt generator. The tricky part is to place the file at the root of your website. If you’re not sure about the process, it’s better to contact your web hosting service provider.
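If you would rather generate the file than type it by hand, the sketch below shows one way to do it in Python. The helper names build_robots_txt and write_robots_txt, and the /private/ path, are hypothetical. The script only writes a lowercase, UTF-8 encoded robots.txt into a folder you choose. Uploading it to the root of your site still depends on your hosting setup.

```python
import tempfile
from pathlib import Path


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt content that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def write_robots_txt(site_root, disallowed_paths):
    """Write the rules to <site_root>/robots.txt as UTF-8 and return the path."""
    target = Path(site_root) / "robots.txt"  # the name must be all lowercase
    target.write_text(build_robots_txt(disallowed_paths), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a throwaway folder; point site_root at your real web root instead.
    with tempfile.TemporaryDirectory() as site_root:
        path = write_robots_txt(site_root, ["/private/"])
        print(path.read_text(encoding="utf-8"))
```

Once the file is uploaded, you can confirm it is in the right place by opening your root domain followed by /robots.txt in a browser.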
We know it’s a lot for one day, but if your robots.txt file blocks access to your website, you want to fix it as soon as possible. If you have more questions about how to manage your robots.txt file, we’re here to help. See you tomorrow!
It's your turn now
We really hope you're enjoying our challenge.
And now we’d like to hear from you.
Which of our tips are you going to try out today?
Or maybe you'd like to optimize our blogs?
Either way, let us know by leaving a comment below.