It is good that I am already using a self-hosted WordPress blog. There is another reason to be happy: I can alter my robots.txt. For blogs hosted on WordPress.com or Blogspot.com, I am afraid this is not applicable to you guys and gals.
This post aims to educate readers on how to implement robots.txt for SEO purposes.
How do I use a robots.txt file to control access to my site?
A robots.txt file provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
You need a robots.txt file only if your site includes content that you don’t want search engines to index. If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).
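This checking step can be seen in action with Python's standard-library robots.txt parser. Below is a minimal sketch of what a well-behaved bot does before fetching a page; the rules and the example.com URLs are illustrative placeholders, not a real site's file:

```python
from urllib.robotparser import RobotFileParser

# A small set of rules in the same style as a WordPress robots.txt
rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-content/uploads/
Disallow: /wp-content/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite bot asks before crawling each URL:
print(parser.can_fetch("*", "http://example.com/wp-admin/options.php"))       # → False
print(parser.can_fetch("*", "http://example.com/wp-content/uploads/pic.jpg")) # → True
print(parser.can_fetch("*", "http://example.com/my-post/"))                   # → True
```

Note that this stdlib parser applies rules in file order (first match wins), which is why the Allow line is placed before the broader Disallow here; Googlebot instead picks the most specific matching rule.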
So this is it: robots.txt is used to restrict bots from indexing one's site. If you are a regular reader here, you know that I have been having some problems with those unusual strings showing up in my Google search engine results. That is why I studied this matter, and I later found out that this is a must-do for SEO purposes.
Here is how my robots.txt is made. I just changed it yesterday, and just 4 hours ago it was crawled by Google smoothly.
Sitemap: https://www.techathand.net/sitemap.xml.gz

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /page/*/
Disallow: /feed/
Disallow: /?wpcf7=json*
Disallow: /*/feed/rss/
Disallow: /*/feed/
Disallow: /trackback/
Disallow: /*/trackback/
Disallow: /category/
Disallow: /2007/*/
Disallow: /2008/*/
Disallow: /cgi-bin/
Allow: /wp-content/uploads/
Update: I have removed Disallow: /2007/*/ & Disallow: /2008/*/, thanks to Marhgil.
User-agent: * means that the rules apply to all search engine bots.
The robots.txt shown above tells bots not to index the WordPress core files and directories, pages, feeds, ?wpcf7 requests, trackbacks (URLs only), categories, and the 2007 & 2008 archives, while allowing indexing of my uploads.
In SEO, it is good for a certain page to be unique whenever the search engine bot visits. Duplicates come from scrapers and copiers, and even from your own domain.
You may ask why I restricted the feeds, pages, trackbacks, categories, and the 2007 & 2008 archives. This is because they are all archives and are seen by bots as duplicate pages.
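A quick sketch of how rules like these treat typical WordPress archive URLs, again using the stdlib parser. Only a subset of the rules is used, with literal prefixes only, since the stdlib parser does not understand the * wildcards that Googlebot supports; the sample paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Archive-style URLs are blocked; an ordinary post URL is not.
for path in ["/feed/", "/category/seo/", "/trackback/", "/my-first-post/"]:
    verdict = "allowed" if parser.can_fetch("*", "https://www.techathand.net" + path) else "blocked"
    print(path, verdict)
```

The run shows /feed/, /category/seo/, and /trackback/ as blocked and /my-first-post/ as allowed, which is exactly the point: the bot skips the duplicate archive views and indexes only the unique post.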
Go to your favorite blog and check its robots.txt, and you will see how the experts in the field are directing the bots. No wonder they are always on top.
But how? Use this syntax: [domain name]/robots.txt, e.g. http://www.abc.com/robots.txt
I saw lots of variations yesterday, and you will see lots of them too.
Combine their strategies and make your own version that is applicable to your blog. (You will not be able to access a robots.txt if the site blocks it using .htaccess.)
If you want your site to be search engine friendly, you have to direct the bots to the parts of your blog that need to be visited. By doing this you avoid creating duplicate content on the web, which is what pushes your posts into Google's supplemental index. This is a must-do, especially if you monetize your site. So if you are thinking of moving to a self-hosted blog, check how to do it in an SEO-friendly way.