In-browser analytics with large client-side datasets is now feasible and will form an important new trend for the web.
Today, webpages typically load and display tens of records at a time. I believe loading many thousands of records at once will become increasingly common, with the aggregation and filtering then performed in the browser.
The Current Situation
Many websites consist of simple drilldowns. For example, on my online banking site I see a page with a summary of my recent transactions and links that load a new page with more detailed information, e.g. summary page -> individual statement page -> individual transaction page.
More recently there has been a growing trend towards single-page applications - however, the underlying services often still reflect the old model. For example, a RESTful API for the banking example could consist of linked summary resources, statement resources and transaction resources, with each drilldown causing another round trip to the server.
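To make that cost concrete, here is a rough sketch of what the drilldown looks like from the browser. The endpoint paths and the linked `href` fields are hypothetical, not any real banking API - the point is simply that each level is another request/response cycle:

```javascript
// Each level of the drilldown is another round trip to the server.
fetch('/api/accounts/42/summary')                      // hypothetical endpoint
  .then(function(res) { return res.json(); })
  .then(function(summary) {
    // ...user clicks a statement link...
    return fetch(summary.statements[0].href);          // second round trip
  })
  .then(function(res) { return res.json(); })
  .then(function(statement) {
    // ...user clicks an individual transaction...
    return fetch(statement.transactions[0].href);      // third round trip
  });
```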
This simple drilldown approach often makes sense and will never be fully replaced. However, I believe there is an alternative approach that is beginning to make more sense: load all the data and perform the necessary aggregation and display filtering on the client. For example, this crossfilter demo loads 250,000 rows of data and allows interactive filtering and drilldown with instantaneous results once the data has been loaded.
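As a rough sketch of what this style looks like in code - assuming a `records` array already loaded into the page with date and amount fields, and crossfilter plus D3 available - the core is only a few lines:

```javascript
// Index the full dataset in memory - crossfilter copes with hundreds of thousands of rows.
var transactions = crossfilter(records);

// Dimensions are the columns we want to filter and group on.
var byDate   = transactions.dimension(function(d) { return d.date; });    // assumes d.date is a Date
var byAmount = transactions.dimension(function(d) { return d.amount; });

// A group holding total spend per month, re-computed automatically as filters change.
var spendPerMonth = byDate.group(d3.time.month)
                          .reduceSum(function(d) { return d.amount; });

// Filtering is just a method call - no server round trip.
byAmount.filter([0, 100]);             // only transactions under 100
console.log(spendPerMonth.top(12));    // the twelve biggest months, recalculated instantly
```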
Why Change?
The simple answer is that in-browser analytics with large client-side datasets can produce a better user experience.
In my online banking example, if all transactions were loaded on the client then a monthly statement becomes just one arbitrary view of the data. If I want to see what transactions I made on a 10-day vacation, I can do it instantly. Likewise, how much money have I ever spent in Starbucks? A near-instant answer is available.
In short - by having all the data available in-browser, every interaction with the data becomes quicker, and certain interactions that were previously impossible become possible. Consider watching your data change dynamically as you move a slider over a date range, compared with the typical interaction of choosing a date and pressing submit.
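Carrying on the crossfilter sketch from earlier, both of these interactions boil down to a filter or a scan over data already sitting in memory. The `dateSlider` control, `redrawCharts` function and record fields below are placeholders:

```javascript
// Date-range slider: re-filter and redraw on every movement - no submit button needed.
dateSlider.on('change', function(range) {
  byDate.filter([range.start, range.end]);   // e.g. the ten days of the vacation
  redrawCharts();                            // groups re-aggregate in milliseconds
});

// "How much have I ever spent in Starbucks?" - a single pass over the loaded rows.
var starbucksTotal = records
  .filter(function(d) { return /starbucks/i.test(d.description); })
  .reduce(function(sum, d) { return sum + d.amount; }, 0);
```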
Why Now?
Until recently, in-browser analytics faced many technical obstacles; however, a number of changes have now made this approach viable:
- Increased Connection Speeds. Downloading a megabyte of data can be completed in seconds on broadband connections, and a single megabyte of compressed data can store hundreds of thousands of rows. Utilising Local Storage with incremental updates can also help reduce load times (and provide an offline capability for mobile users) - a sketch of this follows the list. In addition, from personal experience, users are often happy to accept an initial delay of a few seconds if they know that all interactions are lightning fast afterwards.
- Increased JavaScript Performance. The ongoing browser speed wars have brought about huge speedups - http://whyeye.org/browsers/history-of-javascript-performance-chrome/ - in addition to the gains due to Moore's law. Increased use of HTML5's Web Workers will even allow web apps to start exploiting multiple cores (see the worker sketch after this list). In short - modern browsers are now really, really fast.
- Better Browser Visualisation. Frameworks like D3, Highcharts and Raphael can now routinely handle very large datasets - see: http://www.highcharts.com/stock/demo/data-grouping.
- Better Rich Browser Applications. The filtering and display of large datasets requires complex, performant UI frameworks. In addition, loading a large dataset only really makes sense if you are going to perform multiple interactions on it - hence the need for a single-page app. The recent explosion of rich JavaScript application frameworks has provided the necessary capabilities here - http://blog.stevensanderson.com/2012/08/01/rich-javascript-applications-the-seven-frameworks-throne-of-js-2012/
- Better Data Handling. Traditionally, all heavy analytics were performed server-side and the results exposed via web services. Now there is a growing ecosystem of libraries for handling large datasets client-side - see crossfilter, gauss and jstat.
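To expand on the connection-speed point, a minimal sketch of incremental loading with Local Storage could look like the following. The `/api/transactions?since=...` endpoint is hypothetical, and Local Storage's small quota (around 5MB in most browsers) means genuinely large datasets would need IndexedDB instead:

```javascript
// Load cached rows from localStorage, then fetch only the rows newer than the last one seen.
function loadTransactions() {
  var cached = JSON.parse(localStorage.getItem('transactions') || '[]');
  var lastSeen = cached.length ? cached[cached.length - 1].date : '1970-01-01';

  return fetch('/api/transactions?since=' + encodeURIComponent(lastSeen))
    .then(function(res) { return res.json(); })
    .then(function(newRows) {
      var all = cached.concat(newRows);
      localStorage.setItem('transactions', JSON.stringify(all));  // incremental update
      return all;                                                 // ready to hand to crossfilter
    });
}
```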
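And on the multiple-cores point, here is a sketch of pushing heavy aggregation onto a Web Worker so the UI thread stays responsive. The worker file name, `drawChart` and the record fields are again placeholders:

```javascript
// main.js - hand the dataset to a worker and draw when the aggregation comes back.
var worker = new Worker('aggregate-worker.js');
worker.onmessage = function(e) { drawChart(e.data); };
worker.postMessage(records);

// aggregate-worker.js - runs off the UI thread, potentially on another core.
self.onmessage = function(e) {
  var totals = {};
  e.data.forEach(function(d) {
    totals[d.category] = (totals[d.category] || 0) + d.amount;
  });
  self.postMessage(totals);
};
```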
Why Haven't We Seen More of This?
This approach is currently quite niche. Many of the components required are still relatively immature and limited to modern browsers and fast connections. Therefore I think in-browser analytics may initially find its most fertile ground in internal webapps. In my current role I've focussed on creating analytics websites for internal use only, and a managed browser ecosystem (Chrome only) plus fast internal networks make handling large datasets client-side a lot easier.
There is also a mindset change required. Try telling your fellow developers that your webapp loads a whole year's worth of financial records (over 200,000 rows) and then performs statistical analysis on them in JavaScript. This will lead to lots of raised eyebrows. The users don't care though - they just appreciate the fast interface.
The approach is also best suited to datasets that lend themselves to statistical summaries. For example, loading 50,000 book reviews in one go is unlikely to be useful, as the user will never read the vast majority of them. This limits its domain somewhat.
What Next?
Join the revolution! Build some In-Browser Analytic Applications (IBAA? - this really needs a better acronym...)