I like tech, but AI is really spooky.
Having said that, AI is making a mess of web traffic, among other things.
How are people handling the massive surges in "users" that are apparently AI crawler bots and whatever else, acting like DDoS attacks without (I assume) being intentionally malicious? The whole thing is a real headache.
There are many facets to this topic, one being the proliferation of AI into everything. Search is an obvious one: the top result is now an AI readout, which is killing clicks further down the SERP and changing user behaviours, already causing revenue loss, I am sure.
Removing the need to think. Making knowledge access so frictionless that broad-scale cognitive atrophy occurs, or is the end result, potentially further concentrating knowledge into smaller, more powerful control groups (due to the horsepower currently required). It’s eating our energy supply too. Essentially we are at the Borg stage of the game.
As everyone should be rather familiar with by now, it goes well beyond web traffic, but web traffic is the entry point here because that is NodeBB’s bread and butter.
I flagged this in multiple places, years back at this point, with multiple warnings. Adoption was done in haste, the reaction was negative (as usual), and there was no due regard for the Pandora’s tsunami people thought they could surf. The adoption of AI into anything and everything is what has allowed the AI bot swarms to proliferate. It’s a vicious cycle, and maybe worse.
I could see and rationalise years ago that AI itself is a parasitic technology that works at a net deficit; it functions at a permanent net negative (I understand many in tech do not see it like this; feel free to posit your rebuttal). Why this is not glaringly obvious is another point, but a fundamental one at that. There are other fundamental points; this is one that gets less airtime, afaict.
Well, have at it if you think you can add value to help the whole: server tips, what you do to mitigate the loads, how you protect your content from the LLM plunder bots, etc. Try to stay on point in web traffic terms, as hard as that is given the vastness of the consequences and implications before us. And Merry Christmas while I’m here! :santa: :christmas_tree:
CDNs like Cloudflare seem to mitigate things to some degree. I self-host a little instance of MediaWiki. Before putting it behind Cloudflare I was getting a firehose of requests; now it’s just Cloudflare’s caching thingy doing its thing.
I’m not sure it does anything about AI crawlers other than shifting the load off the end server and onto the CDN, so it’s probably not stopping the clankers from stealing your data.
Using Cloudflare has its own host of issues though, namely concentrating a bunch of stuff behind a single point of failure, as was seen a few weeks ago.
@anchorite Cloudflare now has a specific AI-crawler feature with allow/block toggles for the various bots on the free tier. I’m not sure if that feature itself gains anything in the paid tiers, but you can write a lot more security rules with them.
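On the paid side you could presumably also hand-roll this sort of thing as a custom rule. Something like the below should block verified AI crawlers while leaving search crawlers alone — a rough sketch only; the `cf.verified_bot_category` field and its exact value strings are from memory, so check the Cloudflare docs before relying on it:

```
Rule: Block verified AI crawlers
Action: Block
Expression: (cf.verified_bot_category eq "AI Crawler")
```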
@omega nice! I shall give it a try… Wonder how well it works with federation
@Julian Where might interruption of federation occur? Or, how and where does the federation interlinky (technical term) magic happen?
@Julian To note, it uses up one of the 5 free WAF/security rule slots.
Having looked into it a bit more, there seem to be more bot-control rules you can invoke with the paid tiers.
On the free tier you have:

- The known-good-bots rule, `(cf.client.bot)`, which evaluates true/false (sketched below).
- The AI crawler control (managed via its own menu option), which nicely groups the various types of AI crawlers that you can allow/block.

This is 2 slots used; you need to use your remaining 3 wisely.
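For reference, the good-bots slot as I have it looks roughly like this (a sketch; the action naming is from memory of the dashboard, so double-check):

```
Rule: Allow good bots
Action: Skip (remaining custom rules)
Expression: (cf.client.bot)
```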
It’s amazing how much traffic you’re going to block with a wildcard approach (if applicable) to blocking all `.php` requests, though I have had to play around with the rule order a bit to let the good bots in, and I’m not totally sure the order the rules fire in is 100% robust. It might be 99.x% robust. I would like to hear others’ thoughts/experiences on this. A rough sketch of the rule is below.
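Something along these lines (a sketch; I believe `ends_with()` exists in the CF rules language, but `http.request.uri.path contains ".php"` should work too):

```
Rule: Block .php probes
Action: Block
Expression: (ends_with(http.request.uri.path, ".php"))
```

It has to sit after the good-bots Skip rule in the firing order, hence my fiddling.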
I would urgently encourage anyone who is using CF to look at the security analytics and check the top countries overall. China and Singapore are the top offenders in this user-surge wave. Incredible volumes of multi-second "users" can swarm out of nowhere and really eat up server resources. You may find the only solution is a total block; this would be my starting point.
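The country block itself is simple enough (a sketch; `ip.geoip.country` is the field name as I remember it, taking two-letter ISO codes):

```
Rule: Block CN/SG surge
Action: Block (or Interactive Challenge)
Expression: (ip.geoip.country in {"CN" "SG"})
```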
However, an interactive challenge may be as good, or nearly as good (but I am too paranoid to be totally convinced; I tested it, was still not sure, and reverted to some hard blocks).
I think the managed challenge may be out of its depth here when dealing with origin offenders.
It is very tricky. However, drilling down using the “filter” option in CF has been a really useful visual aid. I never had to do as much looking until this China/Singapore wave started ramping up.
Log search is a paid feature. You can, however, apply various combined filters to the analytics.
@omega I’ll have to look into that.
This is a good rundown, from a WP site perspective, of the wave the net is up against, but it applies to all.
I had already discovered that the interactive challenge had a similar effect on the traffic to blocking, using CF. It’s taken a few re-jigs and tweaks to make the whole thing manageable since this became problematic in early November.
https://martech.zone/block-china-and-singapore-bot-traffic-using-cloudflare/
The same big sources of disruptive traffic are hitting everyone.
Surprise surprise, Singapore has been a hub for AI investment for a good number of years.
I sincerely and urgently bring this to everyone’s attention: this is, I believe, a very good summary (link below) of what the net has just faced and what may be to come. I’m not too worried about the cyber-war what-ifs, but about keeping the service up, how to protect your service, and retaining function and access for said users.
It aligns with my own web traffic experience (and thus the reason for this topic), and with anecdotes I’ve read across Reddit (e.g. the link in a previous post) and other platforms about others’ very recent traffic trials. The bottom line being: everyone seems to have been affected by this since at least November:
https://restoringthemind.com/china-singapore-bot-surge-raises-global-cyber-alarms/
Could this current wave be only a preemptive stress test and survey before a monumental attack?
At this new level/wave, it is breaking how the open internet operates.
@Julian I had to wait 5–10 mins before being able to post. Coincidence, or is the NodeBB community having hosting/server connection issues and under pressure? Maybe from the same waves??
Meanwhile, it’s a very common issue all around. Below is a Nov 29th Reddit discussion, but I can see this issue being raised earlier in the year: September, October, and maybe even earlier.
https://www.reddit.com/r/SEO/comments/1p9irqq/how_did_you_deal_with_the_chinasingapore_bot_or/
@omega Setting up Anubis is probably going to be the only path forward (especially if you don’t want to use CF’s built-in tooling).
You want to let search crawlers through but stop AI crawlers. It’s a tough game of cat and mouse.
@Julian Yeah, Anubis does look good, but the free CF tools and rules plugged in a needed solution ASAP when this got going earlier in the year.
Here is the 2025 Bad Bot Report from Imperva; it makes for some interesting reading! :grimacing: Net traffic has tipped over the 50% bot mark for the first time in a decade.
https://cpl.thalesgroup.com/sites/default/files/content/campaigns/badbot/2025-Bad-Bot-Report.pdf
Having found this topic:
https://community.nodebb.org/topic/19021/any-protection-against-ai-crawlers-and-ai-learning-bots/
Thanks to user shaknunic for introducing Anubis as one potential mitigation solution.
More info here:
> In an era where data is the new gold and AI is the new gatekeeper, safeguarding digital sovereignty has become more critical than ever. The rise of large-scale AI scraping has left countless small websites and independent developers struggling to keep their services online. Anubis was born from the urgent need to protect these digital voices from being drowned in a sea of automated traffic. As part of the broader mission of the AI & Data Foundation, this tool is not just a utility, it’s a statement that the web belongs to everyone, not just those with the most aggressive crawlers.
>
> https://medevel.com/anubis-ai/