In-memory processing is becoming a business necessity, much as collecting and processing ever-increasing data sets (a.k.a. Big Data) has gone from being just another technology to a business “must have” over the last five years. The two trends are intertwined in interesting ways. Let me explain…
1. Storing Necessitates Processing
For many companies, the initial foray into Big Data was all about storing the data, followed by rudimentary processing that mostly produced trivial analytics on log files, purchase histories, and similar data (which is what 90% of analytics still do today, if you ask people on the “inside”). As the amount of stored data kept growing (along with the associated direct and indirect costs), IT departments came under increasing pressure to extract deeper, more actionable (i.e. operational) insights and more meaningful results from the collected data. That meant more processing, a lot more.
2. World Is Rapidly Becoming Real Time
As the pressure for more processing grows, we are facing yet another evolution. It would be an understatement to say that one of the most radical shifts in IT today is the torrent-like move to “now” data processing, i.e. processing live data streams and existing working sets in real time. Ask yourself: “Do you know of any business that would choose NOT to make its IT systems real time if batch/ETL and real-time processing cost the same?” The answer is no.
In the age of real-time ad serving, hyper-local advertising, instant sentiment analysis, 24/7 financial trading, global arbitrage, operational BI/analytics, instant mobile applications of rapidly growing processing complexity, geo-based merchant platforms, and many, many other systems in place today, what business would deliberately lock itself out of these advances, new business opportunities, or competitive advantages?
Instant, real-time data processing is the reality today and a massive force to reckon with in the coming years. Businesses that lag behind and rely on data processing that makes customers or systems “wait” for their answers will simply be swept away.
3. In-Memory Processing Is The Only Answer
This sounds rather bullish, but it is a technological reality. There is no other technology in the foreseeable future (that we know of) that would provide enough processing performance to deal with the ever-increasing amount of data we need to process. Consider this fact: RAM access is up to 10,000,000 (!) times faster than access to disk, the next storage layer where we can store data (and where we have been storing data for the last 25 years; before that we used tapes…). There is simply nothing else commercially available today or in the near future that approaches this performance differential.
We simply have to adjust to what Gartner calls a new tenet of data processing: “RAM is a new disk, and disk is a new tape”.
4. RAM Pricing Dropping 30% Every 18 Months
The economics of in-memory processing are finally reaching the point of wide adoption:
- 1GB of RAM now costs less than $1.
- A rack with 10 blades, 50 processing cores, and a total of 1TB of RAM can be purchased today for less than $50,000, a price almost 10 times lower than 10 years ago.
For a $500,000 investment, a company can have 10TB of RAM (along with the associated CPU power and slower disk storage) for in-memory processing of a working set of data. 10TB is considered a typical working-set size for most of today's large Big Data installations, and keeping it in memory enables real-time, sub-second processing of this data set.
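The pricing arithmetic in this section can be checked with a small Python sketch. It assumes the decline compounds at a steady 30% every 18 months and reuses the rack figure quoted above ($50,000 per 1TB rack); the function names are hypothetical and the model is deliberately simplistic, not a forecast.

```python
# Sketch of the RAM pricing arithmetic from this section.
# Assumptions: a steady, compounded 30% price drop every 18 months,
# and identical racks of 1TB RAM at $50,000 each (figures quoted above).


def remaining_price_fraction(months: float,
                             drop_per_period: float = 0.30,
                             period_months: float = 18.0) -> float:
    """Fraction of the original price left after `months` of compounded drops."""
    periods = months / period_months
    return (1.0 - drop_per_period) ** periods


def cluster_cost(target_ram_tb: int, rack_ram_tb: int = 1,
                 rack_price_usd: int = 50_000) -> int:
    """Cost of enough identical racks to hold `target_ram_tb` of RAM."""
    racks = -(-target_ram_tb // rack_ram_tb)  # ceiling division
    return racks * rack_price_usd


if __name__ == "__main__":
    fraction = remaining_price_fraction(120)  # 10 years = 120 months
    print(f"After 10 years: {fraction:.3f} of the original price "
          f"(about {1 / fraction:.1f}x cheaper)")
    print(f"10TB of RAM: ${cluster_cost(10):,}")
```

Compounding a 30% drop every 18 months over ten years leaves roughly 0.093 of the original price, about an order of magnitude, which is in line with the "almost 10 times lower" rack price above; ten 1TB racks give the $500,000 figure for 10TB.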
5. Software and Hardware Availability
Finally, hardware and software are catching up to enable massively parallel in-memory processing of large data sets. Consider these facts:
- Typical commodity hardware today has 8-24 physical cores (e.g. Dell's R410 and R610 rack servers, which cost in the $2,500-4,000 range with 64GB of RAM). Physical parallelism is essential for making effective use of local RAM.
- 64-bit CPUs (found in almost any new consumer laptop today) can in theory address up to 16 exabytes of memory from a single CPU, far more than any realistic working set.
- Most operating systems (such as modern Linux and Windows) provide robust support for advanced parallelization, as well as for the necessary application development ecosystems (Java and .NET).
- A new type of middleware, developed specifically for in-memory processing, has been introduced and has matured over the last couple of years. GridGain, SAP HANA, and Oracle Coherence all provide sophisticated in-memory processing capabilities.
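The 16-exabyte figure follows directly from the width of a 64-bit address. Counting in binary units (1 EiB = 2^60 bytes), and comparing with the 48-bit virtual addresses that current x86-64 processors actually implement:

```latex
2^{64}\ \text{bytes} = 2^{4} \cdot 2^{60}\ \text{bytes} = 16\ \text{EiB} \approx 1.8 \times 10^{19}\ \text{bytes}
\qquad
2^{48}\ \text{bytes} = 2^{8} \cdot 2^{40}\ \text{bytes} = 256\ \text{TiB}
```

Either way, the address space is not the limiting factor for in-memory processing; the practical ceiling for a single server is the physical memory it can hold.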