Did you know that Google has its own robots.txt? A robots.txt file is a plain text file that asks search engine crawlers not to access certain files or directories on a site. So I checked it out, and this is what I saw in Google's robots.txt:
User-agent: zombies
Disallow: /brains

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Allow: /news?btcid=
Disallow: /news?btcid=*&
Allow: /news?btaid=
Disallow: /news?btaid=*&
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/ig
Disallow: /m/lcb
Disallow: /m/news?
Disallow: /m/setnewsprefs?
Disallow: /m/search?
Disallow: /m/trends
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Allow: /booksrightsholders
Disallow: /patents?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /notebook/search?
Disallow: /music
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_c
Disallow: /translate_suggestion
Disallow: /s2/profiles/me
Allow: /s2/profiles
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /tbproxy/
Disallow: /MerchantSearchBeta/
Disallow: /ime/
Disallow: /websites?
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /hosted/life/
Disallow: /newspapers?
Disallow: /search2001/search?
Disallow: /ppob/?
Disallow: /ppob?

Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
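If you want to see how a crawler might read rules like these, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It only loads a trimmed excerpt of the listing above, and the `MyCrawler` agent name and the `build_parser` and `check` helpers are my own assumptions for illustration. Also note that Python's parser applies the first matching rule in file order, so other crawlers may resolve overlapping Allow and Disallow lines differently.

```python
from urllib.robotparser import RobotFileParser

# A trimmed excerpt of the rules shown above (not the full file).
RULES = """\
User-agent: zombies
Disallow: /brains

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
"""


def build_parser(text):
    """Parse robots.txt text held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser, agent, path):
    """Return True if this parser would let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


parser = build_parser(RULES)

# "MyCrawler" is a hypothetical user agent that falls under the "*" group.
for agent, path in [
    ("zombies", "/brains"),
    ("MyCrawler", "/brains"),
    ("MyCrawler", "/search"),
    ("MyCrawler", "/searchhistory/"),
]:
    verdict = "allowed" if check(parser, agent, path) else "blocked"
    print(f"{agent:10} {path:18} {verdict}")
```

To check a live site instead, the same class offers `set_url()` and `read()` to download the file before calling `can_fetch()`.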
Try to learn how Google puts its robots.txt together, and you may want to implement something similar on your own site. I have made a tutorial for robots.txt, so just check it out from the link. I just wanted to share what I found today while browsing the net.