It is good that I am already using a self-hosted WordPress blog. There is another reason to be happy: I can alter my robots.txt. For blogs hosted on WordPress.com or Blogspot.com, I am afraid this is not applicable to you guys and gals.
This post aims to educate readers on how to implement robots.txt for SEO purposes.
How do I use a robots.txt file to control access to my site?
A robots.txt file provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
You need a robots.txt file only if your site includes content that you don’t want search engines to index. If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).
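This checking step can be seen in action with Python's standard-library robots.txt parser. Below is a minimal sketch of what a well-behaved bot does before fetching a page; the rules and the example.com URLs are illustrative placeholders, not a real site's file:

```python
from urllib.robotparser import RobotFileParser

# A small set of rules in the same style as a WordPress robots.txt
rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-content/uploads/
Disallow: /wp-content/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite bot asks before crawling each URL:
print(parser.can_fetch("*", "http://example.com/wp-admin/options.php"))       # → False
print(parser.can_fetch("*", "http://example.com/wp-content/uploads/pic.jpg")) # → True
print(parser.can_fetch("*", "http://example.com/my-post/"))                   # → True
```

Note that this stdlib parser applies rules in file order (first match wins), which is why the Allow line is placed before the broader Disallow here; Googlebot instead picks the most specific matching rule.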
So this is it: robots.txt is used to restrict bots from indexing one's site. If you are a regular reader here, you know that I have been having some problems with those unusual strings showing up in my Google search engine results. That is why I studied this matter, and I later found out that this is a must-do for SEO purposes.
Here is how my robots.txt is made. I just changed it yesterday, and just 4 hours ago it was crawled by Google smoothly.
Sitemap: https://www.techathand.net/sitemap.xml.gz

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /page/*/
Disallow: /feed/
Disallow: /?wpcf7=json*
Disallow: /*/feed/rss/
Disallow: /*/feed/
Disallow: /trackback/
Disallow: /*/trackback/
Disallow: /category/
Disallow: /2007/*/
Disallow: /2008/*/
Disallow: /cgi-bin/
Allow: /wp-content/uploads/
Update: I have removed Disallow: /2007/*/ & Disallow: /2008/*/, thanks to Marhgil.
User-agent: * means that the rules apply to all search engine bots.
The robots.txt shown above tells bots not to index the WordPress core files and directories, pages, feeds, ?wpcf7 requests, trackbacks (URLs only), categories, and the 2007 & 2008 archives, while allowing indexing of my uploads.
In SEO, it is good for a certain page to be unique whenever the search engine bot visits. Duplicates come from scrapers and copiers, and even from your own domain.
You may ask why I restricted the feeds, pages, trackbacks, categories, and the 2007 & 2008 archives. This is because they are all archives and are seen by bots as duplicate pages.
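A quick sketch of how rules like these treat typical WordPress archive URLs, again using the stdlib parser. Only a subset of the rules is used, with literal prefixes only, since the stdlib parser does not understand the * wildcards that Googlebot supports; the sample paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Archive-style URLs are blocked; an ordinary post URL is not.
for path in ["/feed/", "/category/seo/", "/trackback/", "/my-first-post/"]:
    verdict = "allowed" if parser.can_fetch("*", "https://www.techathand.net" + path) else "blocked"
    print(path, verdict)
```

The run shows /feed/, /category/seo/, and /trackback/ as blocked and /my-first-post/ as allowed, which is exactly the point: the bot skips the duplicate archive views and indexes only the unique post.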
Go to your favorite blog and check its robots.txt, and you will see how the experts in the field are directing the bots. No wonder they are always on top.
But how? Use this syntax: [domain name]/robots.txt, e.g. http://www.abc.com/robots.txt
I saw lots of variations yesterday, and you will see lots of them too.
Combine their strategies and make your own version that is applicable to your blog. (You will not be able to access a robots.txt if the site blocks it using .htaccess.)
If you want your site to be search engine friendly, you have to direct the bots to the parts of your blog that need to be visited. By doing this you avoid creating duplicate content on the web, which is what pushes your posts into Google's supplemental index. This is a must-do, especially if you monetize your site. So if you are thinking of moving to a self-hosted blog, check how to do it in an SEO-friendly way.