Newbie Question: Can people please share .htaccess rules for blocking backlink crawlers?

Discussion in 'SEO and Marketing' started by nanexo, Jan 1, 2016.


  1. nanexo

    nanexo Member

    Joined: Dec 17, 2015 | Messages: 28 | Likes Received: 6
    There are many versions out there, some more effective than others.

    Please share your current ones so we can see which is most effective; perhaps someone can then combine them into one master .htaccess.
     
  2. cardine

    cardine Administrator Staff Member

    Joined: Dec 9, 2015 | Messages: 1,064 | Likes Received: 1,026
    Here's a good starter one (for robots.txt):

    I would be careful about using something like this across your entire PBN, though, because an identical file on every site could be used as a footprint. If you do use it, consider removing a couple of random entries on each blog so the files aren't all identical to each other.
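    A minimal sketch of the "remove a couple of random entries per blog" idea, assuming the list is a robots.txt-style list of crawler names. The bot names are borrowed from radda's list further down the thread; the site names, the `drop` count and the `seed` string are hypothetical. Seeding the RNG with the site name keeps each site's file stable between runs while different sites get different subsets.

```python
from __future__ import annotations

import random

# Bot names taken from radda's list in this thread; swap in your own list.
AGENTS = ["rogerbot", "exabot", "MJ12bot", "dotbot", "gigabot", "AhrefsBot", "sitebot"]


def robots_txt_for_site(site: str, agents: list[str], drop: int = 2, seed: str = "pbn") -> str:
    """Return a robots.txt that disallows every agent except `drop` random ones.

    The RNG is seeded with `seed` plus the site name, so each site always gets
    the same file, but different sites get different subsets.
    """
    if not 0 <= drop < len(agents):
        raise ValueError("drop must be between 0 and len(agents) - 1")
    rng = random.Random(f"{seed}:{site}")
    kept = rng.sample(agents, len(agents) - drop)
    kept.sort(key=agents.index)  # keep the original order for readability
    # One "User-agent / Disallow: /" group per kept bot, separated by blank lines.
    return "\n".join(f"User-agent: {agent}\nDisallow: /\n" for agent in kept)


if __name__ == "__main__":
    for site in ["blog-one.example", "blog-two.example"]:  # hypothetical sites
        print(f"# {site}")
        print(robots_txt_for_site(site, AGENTS))
```

    Dropping entries trades a little coverage for less uniformity; `drop=0` reproduces the full list.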
     
    nanexo likes this.
  3. radda

    radda Member

    Joined: Jan 3, 2016 | Messages: 11 | Likes Received: 11
    # Deny requests from these IP ranges (CIDR notation) and allow everyone else.
    # Apache 2.2 access syntax; on Apache 2.4 these directives need mod_access_compat.
    Order Allow,Deny
    Deny from 23.20.0.0/14
    Deny from 46.51.0.0/16
    Deny from 46.137.0.0/16
    Deny from 50.16.0.0/14
    Deny from 50.112.0.0/16
    Deny from 54.72.0.0/14
    Deny from 54.79.0.0/16
    Deny from 54.80.0.0/13
    Deny from 54.184.0.0/13
    Deny from 54.193.0.0/16
    Deny from 54.194.0.0/15
    Deny from 54.196.0.0/14
    Deny from 54.200.0.0/13
    Deny from 54.208.0.0/13
    Deny from 54.216.0.0/14
    Deny from 54.220.0.0/15
    Deny from 54.224.0.0/14
    Deny from 54.228.0.0/15
    Deny from 54.232.0.0/14
    Deny from 54.236.0.0/15
    Deny from 54.238.0.0/16
    Deny from 54.241.0.0/16
    Deny from 54.242.0.0/15
    Deny from 54.244.0.0/14
    Deny from 54.248.0.0/13
    Deny from 67.202.0.0/16
    Deny from 72.44.0.0/16
    Deny from 75.101.0.0/16
    Deny from 79.125.0.0/16
    Deny from 96.127.0.0/16
    Deny from 103.4.8.0/21
    Deny from 107.20.0.0/14
    Deny from 122.248.0.0/16
    Deny from 174.129.0.0/16
    Deny from 175.41.0.0/16
    Deny from 176.32.0.0/16
    Deny from 176.34.0.0/16
    Deny from 177.71.0.0/16
    Deny from 184.72.0.0/15
    Deny from 184.169.0.0/16
    Deny from 204.236.0.0/16
    Deny from 212.74.41.0/24
    Deny from 216.145.16.0/24
    Allow from all

    # Tag requests whose User-Agent contains one of these crawler names
    # (case-insensitive regex match).
    SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
    SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
    # Deny tagged requests for GET, POST and HEAD.
    <Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>

    My concern with trying to block every single bot out there is that you may accidentally block a search engine bot or some other good bot that you don't want blocked. So instead I focus on blocking the major link-spying bots using a combination of IP and user-agent blocking. I haven't updated the IP ranges in a while, but it still seems to work fine. I've been using the above .htaccess on a lot of sites for over a year, and in that time the sites have stayed out of link-spying tools and haven't had any deindexing issues.
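    The false-positive concern can be checked before deploying. A minimal sketch, assuming the seven names from the `SetEnvIfNoCase` lines above: it reproduces their case-insensitive, unanchored match with Python's `re` module (Apache itself uses PCRE, so treat this as an approximation) and runs it against a few sample User-Agent strings. The samples are illustrative and may not be exactly what these crawlers send today.

```python
import re

# A pattern such as ".*rogerbot.*" matches the same strings as an unanchored
# search for "rogerbot", so plain substrings are enough here.
BAD_BOT_PATTERNS = ["rogerbot", "exabot", "mj12bot", "dotbot", "gigabot", "ahrefsbot", "sitebot"]
BAD_BOT_RE = re.compile("|".join(map(re.escape, BAD_BOT_PATTERNS)), re.IGNORECASE)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if this User-Agent would be tagged bad_bot by the rules."""
    return BAD_BOT_RE.search(user_agent) is not None


# Illustrative User-Agent strings mapped to the result we want.
SAMPLES = {
    "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)": True,
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)": True,
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": False,
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)": False,
}

if __name__ == "__main__":
    for ua, expected in SAMPLES.items():
        status = "ok" if is_bad_bot(ua) == expected else "MISMATCH"
        print(f"{status}: {ua}")
```

    Adding the User-Agents of any good bots you care about to `SAMPLES` gives a quick regression check whenever the pattern list changes.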
     
    cardine and nanexo like this.
  4. SmokeTree

    SmokeTree Member

    Joined: Dec 30, 2015 | Messages: 5 | Likes Received: 8
    I'd definitely recommend blocking IP ranges/addresses in iptables rather than in .htaccess. For one thing, lots of entries in .htaccess can slow things down, because that file is read on every request. Also, .htaccess rules only apply to requests handled by the web server; those same IP addresses can still reach other ports (SSH, MySQL if it's open, etc.).
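    A minimal sketch of moving a list like radda's into iptables, assuming the entries use Apache's partial-IP notation (`54.80.` meaning 54.80.0.0/16). Python's standard `ipaddress` module turns each prefix into a network, and `collapse_addresses` drops duplicates and merges overlapping or adjacent ranges, which also shrinks the rule count. The chain (`INPUT`) and target (`DROP`) are assumptions; the script only prints commands, so review them before running anything as root.

```python
from __future__ import annotations

import ipaddress


def prefix_to_network(prefix: str) -> ipaddress.IPv4Network:
    """Turn Apache partial-IP syntax ("54.80." or "216.145.16.") into a network."""
    octets = prefix.strip().rstrip(".").split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"not a partial IPv4 address: {prefix!r}")
    padded = octets + ["0"] * (4 - len(octets))
    # Each given octet fixes 8 bits of the prefix length.
    return ipaddress.IPv4Network(f"{'.'.join(padded)}/{8 * len(octets)}")


def iptables_rules(prefixes: list[str], chain: str = "INPUT") -> list[str]:
    """Merge duplicate/overlapping/adjacent prefixes and emit one DROP rule each."""
    networks = ipaddress.collapse_addresses(prefix_to_network(p) for p in prefixes)
    return [f"iptables -A {chain} -s {net} -j DROP" for net in networks]


if __name__ == "__main__":
    # A few entries copied from radda's list, duplicates included.
    sample = ["54.80.", "54.81.", "54.243.", "54.243.130.", "216.145.16.", "216.145.16."]
    for rule in iptables_rules(sample):
        print(rule)
```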
     
    radda likes this.
  5. radda

    radda Member

    Joined: Jan 3, 2016 | Messages: 11 | Likes Received: 11
    Thanks for the advice. I'm still pretty inexperienced when it comes to running a server.
     
  6. cardine

    cardine Administrator Staff Member

    Joined: Dec 9, 2015 | Messages: 1,064 | Likes Received: 1,026
    iptables is better than .htaccess. The one I posted was for robots.txt, which is probably the best way to do it if you think the scrapers in question will obey it. Otherwise, iptables is your best bet.
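    To see what a robots.txt actually asks of each crawler, Python's standard `urllib.robotparser` can evaluate it. A minimal sketch with a hypothetical two-bot file; it only shows what a compliant crawler would do, which is exactly the limitation mentioned above. Pass the crawler's short name (e.g. `AhrefsBot`): the parser matches on the part of the string before the first `/`, so a full browser-style User-Agent string will not match.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two link crawlers and nobody else.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for bot in ["AhrefsBot", "MJ12bot", "Googlebot"]:
        allowed = rp.can_fetch(bot, "https://example.com/some-post/")
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```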
     
  7. nanexo

    nanexo Member

    Joined: Dec 17, 2015 | Messages: 28 | Likes Received: 6
    Good stuff. I think that's a great option if you run multiple servers/VPSes with root access. That said, most PBNs are spread across multiple shared hosts to avoid footprints. How do you tackle that with a dedicated server or VPS? You would then only have IPs and hosting from a few companies, or the cost would go up a lot because you'd have to buy many dedicated servers/VPSes instead of shared hosting.

    I don't think any of the major crawlers (Ahrefs, Majestic, etc.) obey robots.txt, though I'm not sure. Isn't it better to block them in .htaccess by both user agent and IP, since their IPs change but their user agents should stay much the same?


    @radda I'd be interested to know how you came up with that IP list. I'm not disputing it, just intrigued.
     
    Last edited: Jan 4, 2016
  8. cardine

    cardine Administrator Staff Member

    Joined: Dec 9, 2015 | Messages: 1,064 | Likes Received: 1,026
    If you have WHM access, you should be able to use iptables. Otherwise, you will probably have to use .htaccess.

    Yes, it should be pretty similar. Here is a good site for generating .htaccess code based on a user agent list (like the one that I posted above).
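    The generator site isn't identified in the thread, so this is not its output; it is a hypothetical stand-in showing the same idea. It turns a list of user-agent substrings into a `SetEnvIfNoCase` + `<Limit>` block in the Apache 2.2 style radda posted (on Apache 2.4, `Order`/`Allow`/`Deny` need `mod_access_compat`). The function name and the `bad_bot` variable are assumptions.

```python
from __future__ import annotations

import re


def htaccess_ua_block(agents: list[str], env_var: str = "bad_bot") -> str:
    """Render SetEnvIfNoCase rules plus a <Limit> block denying tagged requests."""
    lines = []
    for agent in agents:
        if '"' in agent:
            raise ValueError(f"agent name may not contain a double quote: {agent!r}")
        # re.escape stops characters such as "." from acting as regex wildcards;
        # the quotes let Apache accept names that contain spaces.
        lines.append(f'SetEnvIfNoCase User-Agent "{re.escape(agent)}" {env_var}')
    lines += [
        "<Limit GET POST HEAD>",
        "Order Allow,Deny",
        "Allow from all",
        f"Deny from env={env_var}",
        "</Limit>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_ua_block(["rogerbot", "mj12bot", "ahrefsbot"]))
```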
     
    radda likes this.
  9. radda

    radda Member

    Joined: Jan 3, 2016 | Messages: 11 | Likes Received: 11
    I agree with you. I don't bother with robots.txt, since bots can simply choose to ignore it, and often do. Robots.txt can also be a footprint, as cardine pointed out, since anyone can view its contents. It's unlikely to be a footprint that causes problems, but it's possible.

    The IP addresses I posted above came mostly from the SpyderSpanker WordPress add-on. I simply converted them to .htaccess format so I wouldn't have to load an extra add-on and so the list could be used on non-WordPress sites.
     
    cardine likes this.
  10. Criks

    Criks Member

    Joined: Jan 14, 2016 | Messages: 32 | Likes Received: 12
    Might be a bit off-topic here, but in case you are using WordPress, try SpiderSpanker :)
     
  11. nanexo

    nanexo Member

    Joined:
    Dec 17, 2015
    Messages:
    28
    Likes Received:
    6