Traffic analysis

DyadStats acts as a pre-router filter that can block unwanted visitors, and it also helps show how a website is performing.

When I first started looking at traffic analysis, information about the keywords people used when searching Google was passed along when results links were clicked. This meant you could process your log files and see how people were finding your sites.
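The log-crunching idea can be sketched in a few lines. This is a hypothetical illustration, not the actual tool: it pulls the Referer field out of an Apache "combined" format access log and counts the Google search keywords (which, back then, travelled in the ?q= parameter of the referring URL).

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Matches the request, status, size and quoted Referer field of an
# Apache "combined" log line.
REFERER_RE = re.compile(r'"[^"]*" \d+ \S+ "(?P<referer>[^"]*)"')

def keywords_from_log(lines):
    """Count Google search keywords found in Referer headers."""
    counts = Counter()
    for line in lines:
        m = REFERER_RE.search(line)
        if not m:
            continue
        ref = urlparse(m.group("referer"))
        if "google." in ref.netloc:
            q = parse_qs(ref.query).get("q")
            if q:
                counts[q[0].lower()] += 1
    return counts

sample = ['1.2.3.4 - - [10/Oct/2010:13:55:36 +0000] "GET /widgets HTTP/1.1" '
          '200 2326 "http://www.google.com/search?q=blue+widgets" "Mozilla/5.0"']
print(keywords_from_log(sample))  # Counter({'blue widgets': 1})
```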

For an e-commerce website you really want to identify the search terms that lead to sales, and log file crunching is too blunt a tool for that. When shopping around online, people commonly search for the product they're interested in, check out a few different websites, and later return to their chosen favourite - so the search and click trail that leads to the final transaction might have no connection to the original search.

Addressing this was the primary function of my first mini-tracker. By issuing a simple numeric id to each visitor using a cookie, I could see a history of site access - the keywords used on all the visits leading to a sale, not just the last one.

Google spoils the fun

I think it was around October 2011 when Google took measures to deliberately prevent keyword info being sent with each click of its organic/natural/unpaid listings. Organic search was my thing and this change rather took the wind out of my SEO sails - at least as far as developing keyword analysis tools was concerned.

I kept the core system embedded in my projects by default though. The issuing of ids was useful for creating shopping carts and other modules that need to remember things - 'save state' and all that. Because it was part of the main system code rather than a JavaScript plugin, it could examine requests from User Agents that were not JavaScript enabled - bots and scrapers - and because it logged info to a database, traffic reports could be generated on demand without the need to parse log files.
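The database-backed logging side can be sketched like this - SQLite here purely for illustration. Every request gets a row, JavaScript-enabled or not, and a traffic report becomes a query rather than a log-file parse.

```python
import sqlite3

# In-memory database for the sketch; the real system would use a
# persistent one.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE hits
              (visitor_id INTEGER, path TEXT, user_agent TEXT)""")

def log_hit(visitor_id, path, user_agent):
    """Record one request - browsers and bots alike."""
    db.execute("INSERT INTO hits VALUES (?, ?, ?)",
               (visitor_id, path, user_agent))

log_hit(1, "/", "Mozilla/5.0")
log_hit(2, "/robots.txt", "ExampleBot/1.0")  # non-JS bots get logged too

# An on-demand report: hits per User Agent.
for ua, n in db.execute(
        "SELECT user_agent, COUNT(*) FROM hits GROUP BY user_agent"):
    print(ua, n)
```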

I quickly started to see just how much web traffic is generated by bots, and added ways to filter and block requests associated with IP addresses or User Agents that were clearly trying to gain unauthorised access to systems.

Syncing feeling

The system could be plugged into websites on a server by server basis. If a new spambot User Agent was seen scraping a site, its profile could be added to the DyadStats database and every site on that server would benefit from the new info automatically. Keeping the data synchronised across multiple servers was more complicated: details of different User Agents and IP addresses could be added to different copies of the system and then clash.

Synchronising data is not new to me, but there's no tangible gain in spending much time developing something that essentially nobody sees.

However, as part of operation 'Review and Update Everything' I recently gave it some love. To deal with multi-server data sync, new profiles are now added only to the primary database, and the process of updating the data across collections of servers in different data centres is completely automatic.

Exciting times!

Ok, so this area of web development tends not to be the ice-breaker topic at parties. It’s dull - I know. Perhaps an animated chart might help?

Wait. One more thing. If you've read this far you deserve a 'one more thing' thing. "SlugBlock". Yes, I just made the name up, but it's a thing. It's one of the recent features of DyadStats that gives it some legitimacy in terms of "Not a complete waste of time".

User Agents can be faked and IP addresses masked, but URLs are URLs and query strings are query strings, and nefarious ne'er-do-wells looking for low-hanging fruit will often scan for unsecured admin URLs, or have a go at passing parameters that could cause database connections to fail and expose their access credentials - or their entire contents. DyadStats now blocks requests that include dodgy query strings (e.g. ?expose=yoursecrets) or slugs (URL sections, e.g. /slug-a/slug-b/).
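A minimal sketch of the SlugBlock idea (the rule values below are invented examples, not DyadStats' actual lists): reject any request whose path contains a blocked slug or whose query string contains a blocked parameter name.

```python
from urllib.parse import urlparse

BLOCKED_SLUGS = {"wp-admin", "phpmyadmin"}       # example rules only
BLOCKED_PARAMS = {"expose", "union", "select"}   # example rules only

def is_blocked(url):
    """True when the URL's slugs or query parameters hit the blocklists."""
    parts = urlparse(url)
    slugs = {s.lower() for s in parts.path.split("/") if s}
    params = {p.split("=")[0].lower() for p in parts.query.split("&") if p}
    return bool(slugs & BLOCKED_SLUGS or params & BLOCKED_PARAMS)

print(is_blocked("/shop/widgets?page=2"))        # False - a normal request
print(is_blocked("/wp-admin/setup.php"))         # True  - blocked slug
print(is_blocked("/search?expose=yoursecrets"))  # True  - blocked parameter
```

A blocked request would then get the "silent nothing" response rather than an error page that confirms anything exists.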

If you know what you're doing, it's unlikely this form of vulnerability scanning poses a threat, but it's nice to block these requests and return a silent nothing - and rules that block known issues today might just block the cleverer ones of tomorrow.