GridGain 3.5 – Preview Of What’s Coming

With GridGain 3.5 just around the corner, I want to take some time to preview some of the changes that are coming.

Availability
We expect to push the GA release of GridGain 3.5 Enterprise Edition in the next 30-45 days. The Community Edition will be released later on, unless some of the changes are back-ported by that time, in which case we’ll release it the same day.

Deprecations
As we noted in our monthly newsletter, GridGain 3.5 is the first release since GridGain 2.0 came out in which we will be removing most of the deprecated APIs. In most cases the deprecations were for non-uniform naming that we wanted to fix, but there will be some API changes as well. Specifically, JEXL-based predicates and the Collision SPI will be changed.

Some of the older 3rd party integrations that we no longer support or can recommend will also be removed (including JBoss Cache, JGroups communication, Mule and some of the JMS providers). In all cases, our updated TCP/IP discovery and communication SPIs provide much superior functionality, performance and stability for GridGain.

For all our Enterprise Clients we will provide close hands-on support in cases when non-trivial changes are required during migration.

Performance Improvements
One of the key themes for GridGain 3.5 is cumulative performance and stability improvements. This last quarter we’ve worked with several customers who used GridGain in rather edge scenarios, which allowed us to really fine-tune its performance characteristics. We’ll be introducing a number of new configuration properties as well as design changes to GridGain, such as changes to the Collision and SwapSpace SPIs.

It is interesting to see how edge cases start popping up when our software is being used in real-time scenarios on 1000s of nodes. Achieving linear scalability becomes progressively harder in these cases, but I’m proud that GridGain is bucking this trend.

New Additions
We’ll also introduce enhancements to our current APIs.

First of all, we’ll clean up the monadic support in Scalar, our Scala-based DSL, so that Scalar projections can be used in standard Scala for-comprehensions.

We are also adding API-level support for affinity-based co-location. You can already use simple annotations to provide automatic affinity-based co-location, but the new APIs will make it all but effortless. You will be able to co-locate any closure or task with any set of keys in the in-memory data grid in a single call. I’ll detail it in a later blog post.
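
To make the idea of affinity-based co-location a bit more concrete, here is a minimal sketch in plain Java. It is not the GridGain API: the class `AffinitySketch`, the methods `nodeFor` and `groupByNode`, and the node names are all hypothetical, and the hash-modulo mapping is just an assumption chosen for brevity. It only shows the principle that a key deterministically maps to one node, so work on that key can be sent to the node that holds it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AffinitySketch {
    /** Maps a key to the node that owns it (hypothetical hash-modulo affinity). */
    public static String nodeFor(Object key, List<String> nodes) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int idx = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(idx);
    }

    /** Groups keys by owning node, so one closure per node can run next to its keys. */
    public static Map<String, List<Object>> groupByNode(List<?> keys, List<String> nodes) {
        Map<String, List<Object>> byNode = new LinkedHashMap<>();
        for (Object key : keys) {
            String node = nodeFor(key, nodes);
            byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        // Hypothetical node names, for illustration only.
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C");
        List<Integer> keys = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(groupByNode(keys, nodes));
    }
}
```

In a real grid the mapping has to survive nodes joining and leaving, which plain hash-modulo does not handle well; the sketch ignores that on purpose.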

Overall, GridGain 3.5 is looking to be an important milestone for the GridGain ecosystem.

Enjoy!

Real-Time – A New Era of Cloud Applications

There’s a significant shift that has been happening in the last 12 months for many, if not all, BigData and BigCompute cloud applications – a shift to real-time processing.

This shift is nothing short of a tectonic change, and it is disrupting many of the software design approaches in use today.

Now, when we talk about real-time processing we, of course, mean near real-time (nR/T), since nothing can be truly real-time in the JVM world. Essentially, anything that can be processed within a reasonable user response time expectation (typically no longer than a couple of seconds) can be considered real-time for enterprise applications.

…Many analysts first got a hunch of this change when Google decided to drop a batch-oriented MapReduce design in favor of a more real-time approach in their search implementations, with what they call Streaming MapReduce. Facebook followed earlier this year by dumping Hadoop-like processing in favor of a different design that would finally allow them to tackle real-time performance.

Now, why all the fuss?

Fundamentally, the answer is pretty simple. First, just look around at the devices and services you use every day: your TV, your iPhone or Android, Google or Bing, Facebook and Twitter, eBay and Amazon… Apart from slow internet connections, when was the last time you needed to wait for 10 or even 5 seconds to get your result?

Your TV switches programs instantly, Google and Bing return search results within a few seconds at most, almost all of the apps on iPhone and Android work in real-time (or so it seems), eBay processes your bids seemingly in real-time, and Amazon can instantly put up suggestions for you to purchase. So as everyday users of these devices and services we are accustomed to instant response or… the real-time capabilities of these services.

However, when we apply the same expectation to today’s enterprise and business applications the picture is very different. And while delays in consumer devices and services mostly lead to frustration, delays in business applications often lead to broken business processes and significant revenue loss. Here are just a few real-life examples we at GridGain have witnessed:

In the insurance industry many complex products currently cannot be priced or quoted on the spot (i.e. while the customer is on the phone) because they require compute- and data-intensive processing that is usually done overnight. Sales reps have to hang up on the customer and promise to call back with the numbers the next day (or worse, send them in a letter).

Up to 30% of customers are lost due to this awkward process.

In investment banks and hedge funds automated or algorithmic trading is often done on models that are regenerated overnight or even less frequently, typically as part of pre-trade activity. Options and futures are prime examples… If market conditions change beyond the model’s pre-trade parameters, the auto-trading may be stopped altogether since the models are no longer valid, hence the loss of revenue. What’s even worse is that less-than-critical deviations in the market are not accounted for in rigid models, so revenue is still lost even if trading continues.

Quite simply, the inability to maintain complex quantitative financial models live in real-time is the main reason for this obvious hole in the otherwise highly effective financial world.

But how do you implement complex business algorithms in real-time?

The answer is the ability to massively parallelize the business algorithm in such a way that its processing happens entirely in memory and can linearly scale up (and down) on demand. 


There are three axiomatic principles that you need to follow to achieve that:

  • You have to be able to parallelize the computational side of your algorithm
  • You have to be able to parallelize (or partition) the in-memory storage of the data your algorithm needs
  • You have to be able to co-locate the computations with the data they need

A few important notes:

  1. It is critically important that your task supports algorithmic parallelization. Not all tasks can be parallelized and therefore not all tasks can be optimized for real-time processing. However, many typical business tasks can be split into multiple sub-tasks executing in parallel, and are therefore parallelizable.
  2. Data has to be partitioned and stored in-memory. Any outside call to get data from NoSQL, file systems like HDFS or traditional SQL storage renders any real-time attempt useless. This is one of the most critical elements and it is often overlooked. In other words, at no time should the processing of a sub-task escape the boundaries of the local JVM it is executing on.
  3. Co-location of the computation and data (a.k.a. affinity-based routing, referring to the fact that there’s an obvious affinity between the computation and the data this computation needs) is the main mechanism to ensure that there is no noise data exchange between nodes in the cloud when a real-time task is being processed. As we noted above, such noise exchange will violate the rule of not escaping the local JVM during the processing, thus making real-time processing impossible.
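
The three principles can be sketched on a single JVM, with threads standing in for nodes. This is only an illustration under stated assumptions, not GridGain code: the class `ColocatedSumSketch` and its methods are hypothetical names, each `HashMap` partition plays the role of one node's in-memory store, and the "business algorithm" is a trivial sum. Each sub-task reads only its own partition, which mirrors the rule of never leaving the local JVM, and only the small partial results are combined at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ColocatedSumSketch {
    /** Principle 2: partition the data by key hash; partition i stands in for node i's memory. */
    public static List<Map<String, Integer>> partition(Map<String, Integer> data, int parts) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            result.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            int idx = Math.floorMod(e.getKey().hashCode(), parts);
            result.get(idx).put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Principles 1 and 3: one sub-task per partition, each touching only its local data. */
    public static long parallelSum(List<Map<String, Integer>> partitions) {
        // Assumes at least one partition; a fixed pool cannot have zero threads.
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (Map<String, Integer> local : partitions) {
                Callable<Long> subTask = () -> {
                    long sum = 0;
                    for (int v : local.values()) {
                        sum += v; // reads only the co-located partition
                    }
                    return sum;
                };
                partials.add(pool.submit(subTask));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // reduce step: only partial results are exchanged
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("sub-task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 1);
        data.put("b", 2);
        data.put("c", 3);
        data.put("d", 4);
        System.out.println(parallelSum(partition(data, 3)));
    }
}
```

On a real cluster the partitions live on different machines and the sub-tasks are routed to them, but the shape of the computation stays the same: split, compute next to the data, then reduce.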

We at GridGain have been working on real-time BigData and BigCompute processing for several years now. These ideas led us to develop the first middleware that natively combines both a Compute Grid and an In-Memory Data Grid in one product, making it an ideal middleware for building real-time cloud applications.

Using GridGain you can easily build systems that span 100s and 1000s of nodes while maintaining all necessary data cached in-memory and all computational processing fully parallelized and co-located.

GridGain 3.2 Hits the GA!

We’ve just released GridGain 3.2, the latest stable release of GridGain’s High Performance Cloud Computing platform, which allows anyone to easily develop, scale and manage compute- and data-intensive JVM-based applications using an integrated Compute and In-Memory Data Grid.

Some of the key new features and enhancements:

  • Improvements in stability and performance for In-Memory Data Grid
  • Bug fixes in Distributed Data Structures
  • Enhanced distributed closure execution with zero deployment
  • Improved documentation and examples

Enjoy!