Blocking Semalt Web Crawler
The Semalt.com web crawler is useless. The services they claim to provide (if you pay) offer no better analysis of your website than the free tools you're already using. Semalt also runs a persistent web crawler that inflates your web statistics, ignores the robots.txt protocol, and even ignores their own Website Removal Tool after just a week. Any one of those issues suggests an illegitimate service, so having all of them certainly warrants blocking the Semalt.com web crawler from accessing your website.
For those of us running Apache, it's really simple to do using the .htaccess file. Just add the following mod_rewrite rules and the Semalt.com web crawler will be denied access to your website with a 403 Forbidden response.
# Block bots by referer that ignore robots file
# https://www.computertechtips.net/192/blocking-semalt-web-crawler/
RewriteEngine on
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]
The .htaccess file is one of those really sensitive files where a single wrong character can cause the whole file to fail. Since Apache processes it for every request to your website (human or crawler), a broken .htaccess means your whole site will fail to load. So be extra careful with your .htaccess file, and make sure you back it up before ever amending it.
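One low-risk workflow is to back up the file first, then spot-check the result from the command line with curl. The paths and domain below are placeholders; substitute your own document root and site address.

```shell
# Illustrative workflow -- adjust the path and domain for your own server.
cd /var/www/example.com/htdocs          # hypothetical document root
cp .htaccess .htaccess.bak              # keep a known-good copy before editing

# ...edit .htaccess, then confirm the site still responds normally:
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/

# ...and that a request claiming a semalt.com referer is now refused:
curl -s -o /dev/null -w "%{http_code}\n" -e "https://semalt.com/" https://example.com/
```

Once the rule is active, the first request should return 200 and the second 403, since the `[F]` flag tells mod_rewrite to respond with 403 Forbidden. If anything returns 500 instead, restore the backup and recheck your edit.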
For those not on a shared web hosting account, such as those using dedicated or virtual dedicated hosting, you have greater access to the server configuration, including the httpd.conf file. In that case, you should make site-wide adjustments there instead of in the HyperText Access file: Apache reads httpd.conf once at startup, while .htaccess files are re-read on every single request, which is slower.
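If you do go the httpd.conf route, the same three rewrite lines can live inside a Directory block. The sketch below assumes a typical virtual-host layout; the ServerName, DocumentRoot, and paths are placeholders, not values from any particular setup.

```
# In httpd.conf (or a file it Includes) -- not in .htaccess
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot "/var/www/example.com/htdocs"

    <Directory "/var/www/example.com/htdocs">
        RewriteEngine on
        RewriteCond %{HTTP_REFERER} semalt\.com [NC]
        RewriteRule .* - [F]
    </Directory>
</VirtualHost>
```

Unlike .htaccess, changes here only take effect after the server is reloaded, so run `apachectl configtest` to catch syntax errors and then reload Apache. The upside is that the configuration is parsed once rather than on every request.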
For deeper reading, please see the Apache documentation on mod_rewrite and .htaccess files.
I have intentionally not linked to Semalt’s Website Removal Tool because I don’t believe illegitimate services should benefit from inbound links.