Dennis Ritchie – RIP

I got Dennis Ritchie’s (and Brian Kernighan’s) book “The C Programming Language” when I was 15. I remember it quite well because I got it in English (it hadn’t been translated into Russian yet) and it took me a while to read it (often with a dictionary). It took me a lot longer to learn C and put it into practice through coding… I had an AT&T 7300 Unix workstation at home that I was hacking on every day after school. Graphics were pretty pale compared to the Amiga/Atari (which, interestingly, were all based on the same Motorola 68000 family of CISC processors) – but it was a genuine Unix environment and I believe it’s what got me hooked ever since.

I think that book, along with “The Unix Programming Environment” by Rob Pike (again with Brian Kernighan), were the two most influential books in my early years – if not overall. Somehow they opened up the inner beauty and an almost mathematical elegance of Unix and its main language – C.

From the age of 15 until my early 20s I was writing programs exclusively in C/C++ on Unix – thanks to the wonderful language created by Dennis Ritchie and his equally wonderful book.

It’s sad to see the legend go…

PDC Theorem – The Underlying Principle of Real-Time Distributed Processing

A few months ago I blogged about real-time processing in enterprise systems of all stripes, and since then I’ve talked quite a bit about the underlying principle of such processing. I call it the PDC theorem, as in “Processing, Data, and Co-Location”.

The idea behind PDC is pretty simple. Real-time response in a highly distributed system is not achievable unless the following 3 rules are followed:

  • Processing must be distributable for in-memory computation
  • Data storage must be distributable (i.e. partitioned) for in-memory storage
  • Co-location must be ensured between processing and data units to provide locality of remote operations
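To make the three rules concrete, here is a minimal single-process sketch in Python. It is only an illustration under stated assumptions, not how any particular grid product implements this: the dictionaries stand in for the in-memory stores of separate nodes, threads stand in for the nodes themselves, and names such as `NUM_PARTITIONS`, `partition_for` and `distributed_sum` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_PARTITIONS = 4  # hypothetical cluster size, chosen for illustration


def partition_for(key: str) -> int:
    """Rule 2: deterministically map a key to the partition that stores it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Each dict stands in for the in-memory store of one node.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def put(key: str, value: int) -> None:
    """Store a value in the partition that owns the key."""
    partitions[partition_for(key)][key] = value


def local_sum(partition_id: int) -> int:
    """Rule 3: a sub-task that reads only the data of its own partition."""
    return sum(partitions[partition_id].values())


def distributed_sum() -> int:
    """Rule 1: split the job into one sub-task per partition, then combine."""
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        return sum(pool.map(local_sum, range(NUM_PARTITIONS)))


for i in range(100):
    put(f"order-{i}", i)

total = distributed_sum()
```

Each sub-task only ever touches its own partition, so in a real cluster the only data crossing the network would be the small partial sums.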

A few important notes:

  • We, of course, are talking about business or perceptual real-time (a.k.a. Near Real-Time or nRT) and not about hardware real-time. Perceptual real-time response is not well defined, but you can conceptually think of it as the time the user of the system is willing to wait for a response that he or she expects right away… In most cases it means a few seconds or less. In rarer cases, like FOREX trading, for example, real-time would mean microseconds.
  • It is critically important that your processing supports algorithmic parallelization. Not all tasks can be parallelized and therefore not all tasks can be optimized for real-time processing. However, many of the typical business and social graph tasks can be split into multiple sub-tasks executing in parallel – and therefore are trivially parallelizable.
  • Data have to be partitioned and stored in-memory. Any outside calls to get data from NoSQL storage, file systems like HDFS, or traditional SQL storage render any real-time attempts useless in most cases. This is one of the most critical design elements and it is often overlooked. In other words – at no time should the remote processing escape the boundaries of the local JVM it is executing on.
  • Co-location of the processing and data (a.k.a. affinity-based routing, referring to the fact that there should be an affinity between the computation and the data this computation needs) is the main mechanism to ensure that there is no unnecessary “noise” data transfer between the remote nodes where a task is being processed. Such unnecessary data transfer violates the locality principle of remote operations, often making real-time processing unachievable.
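The effect of affinity-based routing on noise data transfer can be sketched by simply counting how many tasks land on a node that does not own their data. This is a toy model with assumed names (`NODES`, `owner_of`, `count_remote_fetches`) and a round-robin router picked only as a contrast; it does not measure a real cluster.

```python
import zlib

NODES = 3  # hypothetical number of nodes


def owner_of(key: str) -> int:
    """Node that holds the data for this key (hash partitioning, assumed)."""
    return zlib.crc32(key.encode("utf-8")) % NODES


def count_remote_fetches(keys, route) -> int:
    """Count tasks routed to a node that does not own the data they need."""
    remote = 0
    for index, key in enumerate(keys):
        executing_node = route(index, key)
        if executing_node != owner_of(key):
            remote += 1  # the data would have to be shipped across the network
    return remote


def round_robin(index: int, key: str) -> int:
    """Routing that ignores data placement."""
    return index % NODES


def affinity(index: int, key: str) -> int:
    """Affinity-based routing: run the task where its data lives."""
    return owner_of(key)


keys = [f"user-{i}" for i in range(1000)]
noise_without_affinity = count_remote_fetches(keys, round_robin)
noise_with_affinity = count_remote_fetches(keys, affinity)
```

With affinity routing the count is zero by construction. With placement-agnostic routing a large share of tasks would have to pull their data from another node first.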

It’s also quite obvious that the PDC theorem doesn’t guarantee real-time processing – it merely states that these three rules are necessary but not sufficient on their own for a real-time response. Latencies of the atomic remote operations will often dictate whether or not real-time response is achievable in practice.
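As a back-of-the-envelope illustration of why latency still matters once the three rules are satisfied, consider a task that needs a number of sequential remote operations. The function name and all numbers below are hypothetical, picked only to show the arithmetic.

```python
def meets_budget(sequential_remote_ops: int,
                 latency_per_op_ms: float,
                 budget_ms: float) -> bool:
    """True if the summed latency of sequential remote ops fits the budget."""
    return sequential_remote_ops * latency_per_op_ms <= budget_ms


# Hypothetical figures: 20 sequential remote operations at 2 ms each (40 ms total).
fits_one_second = meets_budget(20, 2.0, 1000.0)  # perceptual real-time budget
fits_ten_millis = meets_budget(20, 2.0, 10.0)    # a much tighter budget
```

The same design comfortably fits a one-second budget and misses a 10 ms one, even though nothing about partitioning or co-location has changed.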

It is interesting to note that the combination of the PDC and CAP theorems really defines the fundamentals of high-performance distributed programming today.

Presenting at Silicon Valley Code Camp 2011 – Sunday 1pm

I think this is the 4th year in a row that GridGain will be presenting at Silicon Valley Code Camp. With over 400 tracks – this is one of the largest events in the Valley (and zero travel for me, which is a huge bonus!).

My talk “Distributed Programming with Scala and GridGain” will be held on Sunday at 1pm. The presentation will be in a new format – zero slides & 100% live coding. As always – nothing is prepared and you’ll see everything that goes into creating several cool Scala-based highly distributed applications from scratch.

If you are around, come see my presentation – it’s always fun.