I like tech, but AI is really spooky.
Having said that, AI is making a mess of web traffic, among other things.
How are people handling the massive surges in "users" that are apparently AI crawler bots and whatever else, acting like DDoS attacks without (I assume) being intentionally malicious? The whole thing is a real headache.
There are many facets to this topic, one being the proliferation of AI into everything. Search is an obvious one: the top result is now an AI readout, which is killing clicks further down the SERP and changing user behaviours, already causing revenue loss, I am sure.
Removing the need to think. Making knowledge access so frictionless that broad-scale cognitive atrophy occurs, or is the end result, potentially further concentrating knowledge into smaller, more powerful control groups (due to the horsepower currently required). It’s eating our energy supply too. Essentially we are at the Borg stage of the game.
As everyone should be rather familiar with by now, it goes well beyond web traffic, but web traffic is the entry point here because that is NodeBB’s bread and butter.
I flagged this in multiple places, years back at this point, with multiple warnings. Adoption was done in haste, the reaction was negative (as usual), and there was no due regard for the Pandora’s tsunami people thought they could surf. The adoption of AI into anything and everything is what has allowed the AI bot swarms to proliferate. It’s a vicious cycle, and maybe worse.
I could see and rationalise years ago that AI itself is a parasitic technology that works at a net deficit; it functions at a permanent net negative (I understand many in tech do not see it like this; feel free to posit your rebuttal). Why this is not glaringly obvious is another point, but a fundamental one at that. There are other fundamental points; this is one that gets less airtime, afaict.
Well, have at it if you think you can add value to help the whole: server tips, what you do to mitigate the loads, how you protect your content from the LLM plunder bots, etc. Try to stay on point in web traffic terms, as hard as that is given the vastness of the consequences and implications before us. And Merry Christmas while I’m here! :santa: :christmas_tree:
CDNs like Cloudflare seem to mitigate things to some degree. I self-host a little instance of MediaWiki. Before putting it behind Cloudflare I was getting a firehose of requests; now it’s just Cloudflare’s caching thingy doing its thing.
I’m not sure it does anything about AI crawlers other than shifting the load off the end server and onto the CDN, so it’s probably not stopping the clankers from stealing your data.
Using Cloudflare has its own host of issues though, namely concentrating a bunch of stuff behind a single point of failure, as was seen a few weeks ago.
@anchorite Cloudflare now has a specific AI-crawler feature with allow/block toggles for the various bots on the free tier. I’m not sure if that feature itself gains anything in the paid tiers, but you can write a lot more security rules with them.
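On the paid side you could presumably also hand-roll this sort of thing as a custom rule. Something like the below should block verified AI crawlers while leaving search crawlers alone — a rough sketch only; the `cf.verified_bot_category` field and its exact value strings are from memory, so check the Cloudflare docs before relying on it:

```
Rule: Block verified AI crawlers
Action: Block
Expression: (cf.verified_bot_category eq "AI Crawler")
```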
@omega nice! I shall give it a try… Wonder how well it works with federation
@Julian Where might interruption of federation occur? Or, how and where does the federation interlinky (technical term) magic happen?
@Julian To note, it uses up one of the 5 free WAF/security rule slots.
Having looked into it a bit more, there seem to be more bot-control rules you can invoke with the paid tiers.
On the free tier you have:

- The known-good-bots rule, `(cf.client.bot)`, which evaluates true/false (sketched below).
- The AI crawler control (managed via its own menu option), which nicely groups the various types of AI crawlers that you can allow/block.

This is 2 slots used; you need to use your remaining 3 wisely.
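For reference, the good-bots slot as I have it looks roughly like this (a sketch; the action naming is from memory of the dashboard, so double-check):

```
Rule: Allow good bots
Action: Skip (remaining custom rules)
Expression: (cf.client.bot)
```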
It’s amazing how much traffic you’re going to block with a wildcard approach (if applicable) to blocking all `.php` requests, though I have had to play around with the rule order a bit to let the good bots in, and I’m not totally sure the order the rules fire in is 100% robust. It might be 99.x% robust. I would like to hear others’ thoughts/experiences on this. A rough sketch of the rule is below.
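Something along these lines (a sketch; I believe `ends_with()` exists in the CF rules language, but `http.request.uri.path contains ".php"` should work too):

```
Rule: Block .php probes
Action: Block
Expression: (ends_with(http.request.uri.path, ".php"))
```

It has to sit after the good-bots Skip rule in the firing order, hence my fiddling.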
I would urgently encourage anyone who is using CF to look at the security analytics and check the top countries overall. China and Singapore are the top offenders in this user-surge wave. Incredible volumes of multi-second "users" can swarm out of nowhere and really eat up server resources. You may find the only solution is a total block; this would be my starting point.
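The country block itself is simple enough (a sketch; `ip.geoip.country` is the field name as I remember it, taking two-letter ISO codes):

```
Rule: Block CN/SG surge
Action: Block (or Interactive Challenge)
Expression: (ip.geoip.country in {"CN" "SG"})
```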
However, an interactive challenge may be as good, or nearly as good (but I am too paranoid to be totally convinced; I tested it, was still not sure, and reverted to some hard blocks).
I think the managed challenge may be out of its depth here when dealing with origin offenders.
It is very tricky. However, drilling down using the “filter” option in CF has been a really useful visual aid. I never had to do as much looking until this China/Singapore wave started ramping up.
Log search is a paid feature. You can, however, apply various combined filters to the analytics.
@omega I’ll have to look into that.
This is a good rundown, from a WP site perspective, of the wave the net is up against, but it applies to all.
I had already discovered that the interactive challenge had a similar effect on the traffic to blocking, using CF. It’s taken a few re-jigs and tweaks to make the whole thing manageable since this became problematic in early November.
https://martech.zone/block-china-and-singapore-bot-traffic-using-cloudflare/
The same big sources of disruptive traffic are hitting everyone.
Surprise surprise, Singapore has been a hub for AI investment for a good number of years.
I sincerely and urgently bring this to everyone’s attention: this is, I believe, a very good summary (link below) of what the net has just faced and what may be to come. I’m not too worried about the cyber-war what-ifs, but about keeping the service up, how to protect your service, and retaining function and access for said users.
It aligns with my own web traffic experience (and thus the reason for this topic), and with anecdotes I’ve read across Reddit (e.g. the link in a previous post) and other platforms about others’ very recent traffic trials. The bottom line being: everyone seems to have been affected by this since at least November:
https://restoringthemind.com/china-singapore-bot-surge-raises-global-cyber-alarms/
Could this current wave be only a preemptive stress test and survey before a monumental attack?
At this new level/wave, it is breaking how the open internet operates.
@Julian I had to wait 5–10 mins before being able to post. Coincidence, or is the NodeBB community having hosting/server connection issues and under pressure? Maybe from the same waves??
Meanwhile, it’s a very common issue all around. Below is a Nov 29th Reddit discussion, but I can see this issue being raised earlier in the year: September, October, and maybe even earlier.
https://www.reddit.com/r/SEO/comments/1p9irqq/how_did_you_deal_with_the_chinasingapore_bot_or/
@omega Setting up Anubis is probably going to be the only path forward (especially if you don’t want to use CF’s built-in tooling).
You want to let search crawlers through but stop AI crawlers. It’s a tough game of cat and mouse.
@Julian Yeah, Anubis does look good, but the free CF tools and rules plugged in a needed solution ASAP when this got going earlier in the year.
Here is the 2025 Bad Bot Report from Imperva; it makes for some interesting reading! :grimacing: Net traffic has tipped over the 50% bot mark for the first time in a decade.
https://cpl.thalesgroup.com/sites/default/files/content/campaigns/badbot/2025-Bad-Bot-Report.pdf
Having found this topic:
https://community.nodebb.org/topic/19021/any-protection-against-ai-crawlers-and-ai-learning-bots/
Thanks to user shaknunic for introducing Anubis as one potential mitigation solution.
More info here:
> In an era where data is the new gold and AI is the new gatekeeper, safeguarding digital sovereignty has become more critical than ever. The rise of large-scale AI scraping has left countless small websites and independent developers struggling to keep their services online. Anubis was born from the urgent need to protect these digital voices from being drowned in a sea of automated traffic. As part of the broader mission of the AI & Data Foundation, this tool is not just a utility, it’s a statement that the web belongs to everyone, not just those with the most aggressive crawlers.
>
> https://medevel.com/anubis-ai/