Wednesday, January 10, 2007

Distributed Caching != JavaSpaces

I have been talking about distributed caching and JavaSpaces lately.

While you can use JavaSpaces as a caching technology because a JavaSpace keeps stateful objects (Entries) in memory and can be persistent (i.e., survive a failure), it isn't necessarily the best alternative.

Conversely, while you could use a distributed cache as a service orchestration engine because it has an eventing model in it & you could hack master/worker on top of it, it probably isn't the best choice.

Used together, however, a good distributed cache and a good JavaSpaces implementation may be a strong combination.

There certainly is overlap & you have to be fairly deliberate in figuring out which tool you want to use for certain things, but it seems achievable.

A distributed cache is likely the best place to store reference data, hard-to-query data (e.g., from a mainframe), user session data, etc., whereas a JavaSpace is probably the best place to store transient conversational service state.

It is the combination of these technologies that allows you to avoid the dreaded work-in-process database.

More importantly, both technologies use POJOs rather than XML, so you can avoid a lot of mapping complexity (certainly not all) within the core of a large application.

Anyway, I still have a lot to learn, but am beginning to settle on this distinction.


Holger Hoffstätte said...

Excellent post! Most people who equate caches with JavaSpaces seem to miss the obvious, namely that their respective query capabilities are so fundamentally different. Most caches have by nature a 1:1 mapping; this puts a *fundamental* constraint on the client since in order to retrieve an item you need to _know the key_. Sounds like a small detail but unfortunately makes a world of difference, because in many scenarios you do not know the exact key; you know one or more query attributes.
I find it interesting to see that Tangosol have added query capability to a previously purely 1:1 associative model; essentially this is a concession to JavaSpaces and the recognition that a pure key/value mapping is often just not good enough.
Management summary: use caches when you know what to look for; use JavaSpaces when you need queries.
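
To make the key-versus-query distinction concrete, here is a minimal sketch in plain Java. It does not use the real JavaSpaces API or any cache product; `Order`, `ToySpace`, and `matches` are hypothetical names invented for illustration. The only things borrowed from JavaSpaces are the conventions that a template's null fields act as wildcards and that a take removes the matched entry. The cache is stood in for by an ordinary `HashMap`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class LookupSketch {

    // Hypothetical entry type; public fields mimic the JavaSpaces Entry convention.
    public static class Order {
        public String id;
        public String customer;
        public String status;

        public Order(String id, String customer, String status) {
            this.id = id;
            this.customer = customer;
            this.status = status;
        }
    }

    // Template matching: a null field in the template is a wildcard,
    // a non-null field must equal the candidate's field.
    public static boolean matches(Order template, Order candidate) {
        return (template.id == null || template.id.equals(candidate.id))
            && (template.customer == null || template.customer.equals(candidate.customer))
            && (template.status == null || template.status.equals(candidate.status));
    }

    // Toy, single-threaded stand-in for a space (assumption: no leases,
    // transactions, or blocking, all of which a real space would offer).
    public static class ToySpace {
        private final List<Order> entries = new ArrayList<>();

        public void write(Order order) {
            entries.add(order);
        }

        // Removes and returns the first entry matching the template, or null if none match.
        public Order take(Order template) {
            Iterator<Order> it = entries.iterator();
            while (it.hasNext()) {
                Order candidate = it.next();
                if (matches(template, candidate)) {
                    it.remove();
                    return candidate;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // Cache style: the caller must already know the exact key.
        Map<String, Order> cache = new HashMap<>();
        cache.put("o-1", new Order("o-1", "acme", "NEW"));
        Order byKey = cache.get("o-1");

        // Space style: the caller knows only some attributes.
        ToySpace space = new ToySpace();
        space.write(new Order("o-1", "acme", "NEW"));
        space.write(new Order("o-2", "globex", "SHIPPED"));
        Order shipped = space.take(new Order(null, null, "SHIPPED"));

        System.out.println(byKey.customer + " " + shipped.id); // prints "acme o-2"
    }
}
```

The linear scan is only there to keep the sketch short; it says nothing about how a real JavaSpaces implementation or a query-capable cache indexes its data.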

Anonymous said...

In your opinion, in the open source world, what is the best distributed cache available?

By best I mean something with high performance, able to cache several hundred megabytes of data.