Wednesday, October 5, 2011

Node.js-based real-time web tracking

This all started out of a need: a need for a statistics service that did not suck and that could handle our quite intense traffic (we peak at 1500 req/sec). Google Analytics failed miserably, GetClicky failed a little less miserably, and the other options were quite simply too expensive for our taste.

Enter Node.js and a little bit of pre-alpha code known as Hummingbird. I'll let Hummingbird introduce itself:

"Hummingbird lets you see how visitors are interacting with your website in real time. And by “real time” we don’t mean it refreshes every 5 minutes—WebSockets enable Hummingbird to update 20 times per second. Hummingbird is built on top of Node.js, a new javascript web toolkit that can handle large amounts of traffic and many concurrent users."

Sounds good to me, where do I sign up? Well, as is customary these days, the project is on GitHub and is installed by first cloning the repo and then running "npm install" in the hummingbird subdirectory. Of course, you need a recent Node.js core and MongoDB, plus libgeoip-dev 1.4.7+ if you want IP geolocation to work (Hummingbird has a nice little map display in the demo setup). Unfortunately there is not much documentation (yet), so be prepared for a lot of debugging and guesswork if you want it to do anything besides the graph / total / map that's included in the demo.
Note: On Debian Lenny, libgeoip is at 1.4.4, which means you'll have to install it from lenny-backports to get a new enough version (1.4.7+). Instructions for installing from backports can be found here: http://backports-master.debian.org/Instructions/

So how does it work? Well, the core is really a pixel tracker: including a 1x1 pixel served by Hummingbird in your content registers a view, in effect saving a little bit of data to a MongoDB collection. Hummingbird also runs a dashboard (if enabled) that can be viewed over HTTP. The dashboard uses WebSockets for rapid updates and binds the updates to jQuery elements on the dashboard page.
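
To make the pixel-tracker idea concrete, here is a minimal sketch of how such an endpoint could look in plain Node.js. This is not Hummingbird's actual code: the names parseHit and createTracker, the /tracking.gif path and the record layout are all made up for illustration, and an in-memory array stands in for the MongoDB collection.

```javascript
// Hypothetical sketch of a tracking-pixel handler; not Hummingbird's actual code.

// A 1x1 transparent GIF, base64-encoded.
const PIXEL = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
  'base64'
);

// Turn a request URL such as "/tracking.gif?event=cart_add&u=%2Fhome"
// into a plain record. The query string is kept under `params`.
function parseHit(requestUrl, now = Date.now()) {
  // The base URL is only there so that a relative URL can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  return {
    timestamp: now,
    path: url.pathname,
    params: Object.fromEntries(url.searchParams),
  };
}

// Build a request handler that records each hit and answers with the pixel.
// `store` is anything with a push() method; an array stands in for MongoDB here.
function createTracker(store) {
  return function handle(req, res) {
    store.push(parseHit(req.url));
    res.writeHead(200, {
      'Content-Type': 'image/gif',
      'Content-Length': PIXEL.length,
      'Cache-Control': 'no-store', // every view should reach the server
    });
    res.end(PIXEL);
  };
}

// Possible wiring (the port number is arbitrary):
//   const hits = [];
//   require('http').createServer(createTracker(hits)).listen(8000);
```

A page would then embed something like an img tag pointing at /tracking.gif?event=cart_add, and every page view ends up as one record in the store.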

This is about as far as I have gotten right now. The lack of documentation makes it a bit hard to develop more widgets for the dashboard, but that's nothing a little trial and error won't fix :)

Update:
I managed to get another piece of the puzzle working: a custom widget for the dashboard page. I started with the backend connection, combining the code from cart_adds.js and total_views.js to get a starting point for an event-driven counter widget. I followed up by creating a very rudimentary widget endpoint that picks up whatever is processed on the backend by hooking into the event system. So any tracking hit that has a specific event attached to it (basically part of the query string for the tracking pixel) shows up on the dashboard in a widget that reacts to new hits by incrementing a value or otherwise adjusting its style or layout. Nothing new really, but still a valuable tool for our editors, who are constantly chasing those extra page views.
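
For the curious, the general shape of an event-driven counter can be sketched like this. It is a simplified illustration and not the widget code itself: the EventCounter class, the attachWidget helper and the 'cart_add' event name are all assumptions, and the update travels in-process through Node's EventEmitter where the real dashboard would push it over a WebSocket.

```javascript
// Hypothetical sketch of an event-driven counter; not Hummingbird's metric API.
const { EventEmitter } = require('events');

// Backend side: counts hits whose `event` query parameter matches `eventName`.
class EventCounter extends EventEmitter {
  constructor(eventName) {
    super();
    this.eventName = eventName;
    this.total = 0;
  }

  // Feed the query parameters of every tracked hit through here.
  // Returns true if the hit was counted, false if it was ignored.
  record(params) {
    if (params.event !== this.eventName) return false;
    this.total += 1;
    this.emit('update', { event: this.eventName, total: this.total });
    return true;
  }
}

// Dashboard side: a widget reacts to each update by re-rendering its value.
// `render` is a callback; in a browser it could set the text of a jQuery element.
function attachWidget(counter, render) {
  counter.on('update', (message) => render(message.total));
}

// Usage
const cartAdds = new EventCounter('cart_add');
attachWidget(cartAdds, (total) => console.log(`cart adds: ${total}`));
cartAdds.record({ event: 'cart_add' });  // logs "cart adds: 1"
cartAdds.record({ event: 'page_view' }); // ignored, nothing is logged
```

Keeping the counting and the rendering apart like this means the same counter can feed several widgets, each with its own idea of how to display a new hit.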

I'll update again once I have a better implementation of it all.