Economics of In-Memory Computing

A few real-life facts about the economics of In-Memory Computing that I’ve recently observed when talking to our customers and users; they put the concept into simple perspective…

Let’s just say you’re one of the few dozen startups that sprang up just last week building a social analytics platform, and Twitter’s firehose is your main source of streaming data. A few facts:

  • Twitter currently produces ~177 million tweets per day globally.
  • Let’s assume that you can store a tweet and all its meta information in roughly 512 bytes.
  • Your typical “working set” is 2 weeks of tweets.

A simple calculation shows that in order to keep this data in memory you’ll need a cluster with a total of ~1TB of RAM.

In other words, you can keep 2 weeks of all tweets in memory as long as you have an in-memory data grid with 1TB of total capacity.

Now – how much does this cost today? You can easily buy a new single Xeon-based blade with 64GB of RAM for ~$1,500 (on eBay and elsewhere), which brings the hardware cost for this cluster to roughly $30,000 (around 20 servers).

Let me say it again: you can buy a new cluster with 1TB of RAM capacity for ~$30K today (in the US).
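Here is the back-of-the-envelope math as a quick Scala sketch. The inputs are just the rough assumptions listed above (tweets per day, bytes per tweet, RAM and price per blade), and TB here means decimal terabytes (10^12 bytes):

```scala
// Back-of-the-envelope sizing for the 2-week Twitter working set.
// All inputs are the rough assumptions from the post, not measured figures.
object InMemoryEconomics extends App {
  val tweetsPerDay  = 177e6   // ~177 million tweets per day
  val days          = 14      // 2-week working set
  val bytesPerTweet = 512     // tweet + meta information

  val workingSetBytes = tweetsPerDay * days * bytesPerTweet
  println(f"Working set: ~${workingSetBytes / 1e12}%.2f TB")   // ~1.27 TB

  val ramPerBladeBytes = 64e9   // 64GB of RAM per Xeon blade
  val pricePerBlade    = 1500   // ~$1,500 per blade (eBay and elsewhere)

  val blades = math.ceil(workingSetBytes / ramPerBladeBytes).toInt
  println(s"Blades needed: $blades")                       // 20
  println(s"Hardware cost: ~$$${blades * pricePerBlade}")  // ~$30000
}
```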

It will probably take about a day to set up and properly configure GridGain on this cluster, and you’ll have a full-featured in-memory platform with 1TB of RAM data capacity and ~20 Xeon servers’ worth of parallel compute capacity.

Given the fact that modern RAM access is up to 10,000,000 times faster than disk access – you can start analyzing your 2 weeks of Twitter data in real time, in no time at all!

That’s the power of in-memory computing…

“Modern distributed HPC with Scala”, June 5, NYC Scala Meetup

We’ll be talking about using GridGain for highly distributed HPC programming with Scala.

As one of the main examples, we will walk through a real-time word-counting program over constantly changing text and compare it with Hadoop’s word-count example. You will see some cool features of GridGain such as Auto-Discovery, Streaming MapReduce, Zero Deployment, Distributed Data Partitioning, and In-Memory SQL Queries — all coded live in the presentation. Towards the end we will have an extensive Q&A session and cover some interesting real-life use cases.
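For a flavor of what the example computes, here is a minimal, single-JVM Scala sketch of the word-counting logic itself. It deliberately uses no GridGain APIs (the talk shows how GridGain distributes, partitions, and queries this across the cluster), and the `StreamingWordCount` object and its method names are purely illustrative:

```scala
import scala.collection.mutable

// Purely illustrative, single-JVM sketch of the word-count logic that the
// talk distributes with GridGain's streaming MapReduce. Names are hypothetical.
object StreamingWordCount {
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // Called for every new chunk of "constantly changing text" as it streams in.
  def ingest(chunk: String): Unit =
    chunk.toLowerCase.split("\\W+").filter(_.nonEmpty).foreach { w =>
      counts(w) += 1
    }

  // Current top-N words; in the live demo this becomes an in-memory SQL query.
  def topN(n: Int): Seq[(String, Long)] =
    counts.toSeq.sortBy(-_._2).take(n)
}

object Demo extends App {
  StreamingWordCount.ingest("to be or not to be")
  StreamingWordCount.ingest("to see or not to see")
  println(StreamingWordCount.topN(3))  // (to,4) first; words tied at 2 follow in arbitrary order
}
```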

All details are here: http://www.meetup.com/ny-scala/events/63655442/