TLDR: Kajabi Analytics showed increased 404 Not Found stats with no explanation at the time. I created a proxy script to monitor requests to my own Kajabi website and found the culprit. I am also testing a set of custom firewall rules as a result. It is still best to read the whole article.
About a week ago on October 22, one of my clients showed me some mysterious 404 spikes in their Kajabi analytics.
That same week, a couple of other Kajabi users also reported increased 404 landings on their websites.
Going back to my client, I scanned all the links on their website, checked their sitemap.xml, cross-checked the websites that refer to their pages, and clicked through their ongoing ads, BUT I did not find any broken links that could cause those massive 404 landings.
I did see something that seemed fishy, though.
This is a screenshot of Google Analytics' most visited links. The third on the list did not occur in any of the Kajabi pages, sitemap, or website referrers for this website. It also results in a 404 landing.
I've seen this pattern before on other websites I've managed. It's a signature pattern of bots that try different URL combinations to probe for vulnerabilities.
But the stats do not add up. For the same period, the total should be somewhere around 21,000, yet that third link only totaled 3,280. Other non-existent links are being requested but do not trigger the Google Analytics tracking script or image pixel.
It's time for Ghostbusters!
Tracing the Culprit
You should know that monitoring HTTP activity on a Kajabi website is a challenge.
First, custom domains are linked to Kajabi's endpoint via CNAME and therefore all HTTP requests are directed to Kajabi's endpoint. It's Kajabi who will have all the data regarding these requests, hence we have Kajabi Analytics.
But Kajabi Analytics only shows the request count, not the exact request URL. It also does not show the referrer of the request, unlike analytics software such as Google Analytics or Matomo.
These two pieces of information are important for knowing the nature of the culprit. The URL tells us what exactly is being accessed, what seems to be the purpose of the access, and whether the URL exists on the Kajabi website.
The referrer gives us a clue about how a visitor got to the URL in the first place. Was it referred from one of your ads? Was it coming from another website that links to your page? Or is there no referrer, which means the URL was accessed directly?

Beyond tracing our 404 issues, the referrer also gives us another source of truth for conversion tracking. Say your tracking script sends a page view event together with query parameters such as utm_campaign and utm_source, plus cookie data where referrers are saved, if any. If your website software records the actual referrer captured during HTTP requests to the server, then you can compare and validate the referrer data sent by your tracking script against the referrer data captured by the server.
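To make that comparison concrete, here is a hypothetical helper (my own illustration, not part of Kajabi or any tracking library) that checks whether the referrer reported by a tracking script agrees with the Referer header the server captured for the same page view:

```javascript
// Hypothetical cross-check between the referrer your tracking script reports
// and the Referer header the server captured for the same page view.
function referrerMatches(scriptReferrer, serverReferrer) {
  // Both absent: consistent (a direct access, no referrer at all).
  if (!scriptReferrer && !serverReferrer) return true;
  // Only one side has a referrer: the data sources disagree.
  if (!scriptReferrer || !serverReferrer) return false;
  // Compare by origin so differing paths or trailing slashes
  // don't raise false alarms.
  return new URL(scriptReferrer).origin === new URL(serverReferrer).origin;
}
```

Comparing origins rather than full URLs is a deliberate choice here: tracking scripts and servers often record slightly different paths for the same referring site.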
In this image, the URL being accessed by the browser is the URL for this blog you are reading. But this particular request does not come from a user typing the URL in the browser. It was referred by the Kajabi Admin editor when I was editing this blog post. I know this because the referrer header shows "https://app.kajabi.com/" which is the host domain of Kajabi's admin interface.
Another image here, this time the homepage of my website was accessed by Microsoft's Bing Search Engine Bot as shown by the user agent. Note that there is no referrer because it was accessed directly.
Cloudflare Workers and Pipedream Do The Trick
Okay, you may be wondering how I got this data when my custom domain is CNAME'd to Kajabi's endpoint. Here's how I did it.
To know exactly which request headers are being sent to my custom domain, I created a Cloudflare Worker that serves as a proxy between the requester and Kajabi's endpoint. Since my custom domain's DNS settings are managed in Cloudflare, setting up a Cloudflare Worker is possible.
For this tracing, I used the following script named 404tracer:
Highlighted in that image are three pieces of information being sent to my monitoring server in Pipedream: the URL, the request headers, and the request object. These are enough for me to determine the nature of the 404 requests to my website.
Then I routed all the requests to my Kajabi custom domain to the 404tracer worker.
This does not break my Kajabi custom domain integration. Whenever a device (or bot) accesses my website, the request is routed through the 404tracer worker, which sends the information to my Pipedream monitoring and then returns the response to the device. It serves as a proxy.
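The actual 404tracer script is shown in the screenshot; for readers who can't view it, here is a minimal sketch of what such a proxy worker could look like. KAJABI_ENDPOINT and PIPEDREAM_URL are placeholder values (not the real ones), and the logging payload is simplified:

```javascript
// Minimal sketch of a 404tracer-style Cloudflare Worker (illustrative only).
// KAJABI_ENDPOINT and PIPEDREAM_URL are placeholders, not the real values.
const KAJABI_ENDPOINT = 'ssl.kajabi.com';
const PIPEDREAM_URL = 'https://example.m.pipedream.net';

// Collect the pieces of information we care about: URL, method, and headers.
function buildLogPayload(request) {
  return {
    url: request.url,
    method: request.method,
    headers: Object.fromEntries(request.headers),
  };
}

async function handleRequest(event) {
  const request = event.request;

  // Rewrite the hostname so the request is forwarded to Kajabi's endpoint.
  const upstreamUrl = new URL(request.url);
  upstreamUrl.hostname = KAJABI_ENDPOINT;
  const response = await fetch(new Request(upstreamUrl.toString(), request));

  // Fire-and-forget: ship the request details to Pipedream for inspection,
  // without delaying the response to the visitor.
  event.waitUntil(fetch(PIPEDREAM_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildLogPayload(request)),
  }));

  // Return Kajabi's response to the visitor unchanged.
  return response;
}

// Register the worker (guarded so the snippet also parses outside Workers).
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event));
  });
}
```

The key design point is `event.waitUntil`: the logging call to Pipedream runs after the response is returned, so the visitor never waits on the monitoring hop.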
The Cloudflare Worker takes precedence over requests made to our custom domains, much like how Cloudflare's Page Rules work. But unlike Page Rules, Workers can do a lot more processing on the requests made to your custom domain.
This is how the logs look in Pipedream.
Why not just use Google Analytics?
Bots usually just download the HTML code, sniff through it for patterns, check whether there are exposed passwords or email addresses, then move on to the next link in their attack list. Since they rarely execute JavaScript, the Google Analytics tracking script never fires for most of their requests.
I let the 404tracer do its job for more than a day so I could get a representative sample of all the devices, bots, and IP addresses that tried to access my website. Take note that my website is a low-traffic site, which makes it easy to spot malicious access and spikes in requests from the same malicious IP addresses.
I have collected more than 500 page visits. To understand it better, I grouped the URLs based on the type of access:
Let me explain further.
- Kajabi Thumbnail View - When you access the list of Kajabi pages in your Kajabi Admin, Kajabi creates a thumbnail view of each page. Kajabi uses the path "__404" to simulate 404 Not Found landings. This type of access is safe and needed by Kajabi.
- Non-Existent Files - Bots usually scan different parts of a website looking for sitemaps and other URL list files. One of the most common files bots access is robots.txt. Kajabi supplies robots.txt and sitemap.xml, so those do not result in a 404. However, some of the files bots commonly request do not apply to Kajabi, among them ads.txt, allowurl.txt, and sellers.json, which are all related to ensuring ads are served properly. Although this type of access is relatively benign for Kajabi websites, it is annoying because it pollutes our Kajabi Analytics.
- Old Blog Posts - My custom domain was previously hosted on WordPress, from 2010 to 2013. I already set up 301 permanent redirects for these links, but ten years later, search engine bots are still crawling my old blog posts. These are considered safe and can be handled with a Cloudflare Page Rule redirect.
- Password Sniffing - These are brute-force attacks from bots that probe poorly configured servers for files containing passwords, keys, and tokens that can be used for further attacks. These are standard attack vectors that bots use against all kinds of websites.
- Scans for PHP Vulnerabilities - These attacks check for vulnerable PHP files that can be used to upload other malicious files to the server. They do not affect Kajabi because Kajabi does not use PHP in its codebase.
- WordPress Attacks - Funny to see these as well, but bots indiscriminately attack a lot of servers, domains, and IP addresses. Given that WordPress is the most used CMS in the world, bots have a set of attacks specific to it. These will not affect Kajabi, nor do they pose any threat to it.
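As a rough sketch of how the collected URLs can be bucketed into these categories, a small classifier along these lines works. The patterns are illustrative and simplified, not the exact rules I used (old blog post URLs are site-specific, so that category is left out):

```javascript
// Illustrative classifier for logged request paths, based on the
// categories above. The patterns are simplified examples, not a
// complete or exact rule set.
function classifyRequest(path) {
  // Kajabi's own thumbnail renderer uses the "__404" path.
  if (path.includes('__404')) return 'kajabi-thumbnail-view';
  // Ad-related files that don't exist on Kajabi sites.
  if (/^\/(ads\.txt|allowurl\.txt|sellers\.json)$/.test(path)) return 'non-existent-files';
  // Probes for credentials, keys, backups, and database dumps.
  if (/\.(env|pem|key|sql|bak)$/.test(path)) return 'password-sniffing';
  // WordPress-specific attack endpoints.
  if (path.includes('wp-') || path.includes('xmlrpc')) return 'wordpress-attack';
  // Generic scans for exploitable PHP files.
  if (path.endsWith('.php')) return 'php-vulnerability-scan';
  return 'other';
}
```

Note the ordering: the WordPress check runs before the generic `.php` check, so a path like `/wp-login.php` is counted as a WordPress attack rather than a generic PHP scan.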
Let me remind you again that these attacks are not specific to Kajabi, and they do not pose a threat because Kajabi is designed differently from the web servers those attacks target.
It's just annoying to see them reflected in the page visit counters in Kajabi Analytics.
Why is Kajabi not blocking them?
Good question. At the time of this writing, Kajabi is not blocking any of these. As I said, these attacks are benign given Kajabi's different architecture.
Also, by default, most websites do not block these either. Even Cloudflare's DDoS rules and default firewall rules do not block them. It's up to the user to selectively block certain URLs without disrupting legitimate access to the website.
In a recent Facebook Group comment, Kajabi Team discussed their possible solution regarding this:
I suggest creating a set of firewall rules that block malicious access to Kajabi websites. As you can see in the logs I collected, none of the attacks are specific to Kajabi, so we can safely block them without disrupting our Kajabi websites.
I created a firewall rule that I am testing right now. It's a custom Cloudflare Firewall rule, but it can serve as a guide for creating firewall rules on other DNS hosting systems (assuming their firewall sits in front of any CNAME'd subdomain).
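Purely as an illustration (these are not necessarily the rules in my repository), a Cloudflare firewall expression targeting the WordPress, PHP, and password-sniffing patterns from my logs might look like this, with the rule action set to Block:

```
(http.request.uri.path contains "wp-login") or
(http.request.uri.path contains "wp-admin") or
(http.request.uri.path contains "xmlrpc.php") or
(http.request.uri.path contains ".env") or
(http.request.uri.path contains "phpinfo")
```

Because these are substring matches, a legitimate page whose URL happens to contain one of these strings would also be blocked, so test any rule like this before relying on it.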
You can check the firewall configuration here: https://github.com/jasongodev/kajabi-cloudflare-firewall-rules
However, we can only implement these firewall rules on custom domains. We don't have any control over the mykajabi.com subdomains; it's up to Kajabi to do their part.
Next In Part 2
Now that we know the reason for the increased 404 landing stats in Kajabi, I will detail the results of the custom firewall rules I made to prevent this unnecessary access to our Kajabi websites.