|Multiple bots, spiders, and crawlers hitting the site at the same time can have a DDoS-like effect: CPU fully consumed and high RAM load.|
Most of the advice is to set up a robots.txt file that blocks spiders from accessing your website's content, with rules such as:

```
User-agent: Baidu
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
```

However, when I viewed the Apache logs, bad bots such as the Baidu spider kept ignoring the robots.txt rules and consumed most of the server's bandwidth. After some research, I advised the library to sign up with Cloudflare (it is FREE, by the way) for protection.
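Before blocking anything, it helps to measure how much traffic a crawler actually generates. A minimal sketch of counting user-agent hits with grep; the sample log below is a stand-in for a real Apache access log such as /var/log/apache2/access.log (the path and log entries are assumptions for illustration):

```shell
#!/bin/sh
# Create a tiny sample access log (stand-in for /var/log/apache2/access.log)
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"
1.2.3.4 - - [01/Jan/2024:00:00:03 +0000] "GET /news HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
EOF

# Count requests whose user agent claims to be Baiduspider
grep -c "Baiduspider" /tmp/sample_access.log   # prints 2 for this sample log
```

Run the same grep against your real access log to see how a crawler's request count compares with legitimate traffic.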
The following steps block Baidu Spider from accessing your website.
- Sign up with Cloudflare.
- Point your domain's nameservers to Cloudflare's nameservers.
- Proxy your subdomain through Cloudflare (the "orange cloud" setting).
- Using the Cloudflare Firewall, create the following rule: Field = "User Agent", Operator = "Contains", Value = "Baiduspider/2.0". Click Save.
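In Cloudflare's expression editor, the rule above corresponds to a filter expression like the following (the UI builder generates this for you; shown here only for reference, with the rule's action set to Block):

```
(http.user_agent contains "Baiduspider/2.0")
```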
|Firewall events show Baidu Spider successfully blocked.|
|Details of the firewall events: Baidu has been successfully blocked.|
|CPU usage is now below 10%.|
|Bots can be combined using OR, so I use only one rule to block these bots, which means I still have another 4 rules available.|
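Combining several bots into one rule looks like this in Cloudflare's expression syntax (the bot names other than Baiduspider are illustrative assumptions; substitute whichever user agents appear in your own logs):

```
(http.user_agent contains "Baiduspider") or
(http.user_agent contains "YandexBot") or
(http.user_agent contains "MJ12bot")
```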
I am not an expert in web security; this was just research and trial and error. I hope this article helps those who face similar problems.