Panic From Fuzzy: distributed caching

Showing posts with label distributed caching.

Tuesday, May 15, 2007

Emerging from the Google OSS Grid Haze

Looks like GridGain is emerging from their semi-silence:

GridGain Systems announced today availability of the first public release of GridGain project, an enterprise open source grid computing platform for Java. This release culminates 18 months of development and presents a software product with unique set of grid computing features, open source LGPL licensing and clear focus on high-performance Java-based development.

"Almost two years ago we set out to develop a technology that would change the enterprise Java grid computing landscape in much the same way Spring and JBoss have changed the J2EE market through simplification and focusing on a developer. Today we are releasing our project that is based on proven professional open source model, business and community friendly LGPL licensing, and the host of unique grid computing features", said Nikita Ivanov, founder of GridGain project.

Tangosol (now owned by Oracle) and others gave us the Java "Data Grid". Perhaps GridGain can give us the "Compute Grid" part. This is what we struggled with when looking at data grid/caching technologies - you still needed the compute part somewhere. Tools like Tangosol excelled at relieving J2EE performance problems caused by data latency. But that is only half of the problem in terms of application grids. You still need to be able to easily add compute nodes in order to scale.

Without a compute grid, you need to rely on things like JMS or other bits of J2EE. This is of course fine in an evolutionary approach - it just doesn't seem to fit right in a new large system (at least for me).

Could this be an application grid without the Jini/JavaSpaces baggage? Don't know - certainly borrows some Jini features (code mobility). They know how to document things and have a roadmap and bits so that is a start ;) Message to Apache River: see I'm not kidding, the clock is going tick, tick, tick - the future is coming.

This should be interesting to watch.

Saturday, March 24, 2007

Oracle buys Tangosol

Wow - that was fast.

This was discussed a while back on this TheServerSide thread.

I sure hope Oracle invests in the product and doesn't slow bleed it or marginalize it. Of course that never happens.

I stand by my comment from February (link above) on why commercial proprietary software is at the bottom of my preference list.

. . . Back to GigaSpaces and Tangosol. They both have compelling technology, but both are small vendors. Where will this technology go? Will someone buy it to kill it? Will they buy it to grow it? Will they remain independent? Who knows. If it were open, well it doesn't quite matter. I can happily pay them now and perhaps someone else later - or no one if I can support it myself.

Monday, February 05, 2007

Shallow vs. Deep Entry Models & Mapping

Ok so guess what - JavaSpaces isn't a silver bullet! Shocker.

I am fond of talking of two big problems I have had in the past with ESB style architectures: excessive mapping and issues managing state (active/transient state vs. steady state/long-lived storage).

The mapping you deal with in an ESB includes object to XML (OX) and object to relational (OR) and various object to object mappings. Update (few hours later) - forgot one: And if it is a message-centric ESB, active/transient state to your destination names (e.g., hierarchical Topic names).

It isn't surprising, but I'm finding that all of the same object to object mapping is still there with JavaSpaces + an object to object equivalent of OX.

The ESB systems I have worked on in the past had a canonical message format. All messages that bounced around the ESB conformed to this model. Some of the messages were small validations of particular data (e.g., address scrub). The main work flows, however, were large XML documents loosely following the ACORD XML standard (insurance industry XML guideline).

With JavaSpaces, the patterns you read about use distributed data structures. Instead of having a large Entry in the space you have many that are related to each other. You do this for transactional reasons, serialization reasons, flexibility reasons etc. Some of the objects I deal with are indeed very large. Doing a take on a large object only to navigate the object graph and flip a bit and do another write isn't performant or flexible.

So the "shallow" model (lots of small interrelated Entries) works great for UI input, address validation etc.

But there are also areas of the system (some key work flows) that would benefit from a canonical form - the "deep" Entry model. Rather than having each worker gather all the Entries up to make sense of them, you do this once and then write the deep Entry to the space and let various workers use it to walk it through a work flow.

But now you have to render it back to your UI that expects a shallow representation.
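To make the shallow/deep round trip concrete, here is a minimal sketch in plain Java. Every name in it (InsuredEntry, AddressEntry, DeepPolicy, assemble, flattenAddresses) is made up for illustration, and it uses ordinary objects rather than real JavaSpaces Entries, so treat it as a picture of the mapping work and not as anything from an actual system.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all class and method names here are hypothetical.
class EntryMappingDemo {

    // Shallow model: small entries related only by a shared policyId.
    static class InsuredEntry {
        public String policyId;
        public String name;
        InsuredEntry(String policyId, String name) {
            this.policyId = policyId;
            this.name = name;
        }
    }

    static class AddressEntry {
        public String policyId;
        public String city;
        AddressEntry(String policyId, String city) {
            this.policyId = policyId;
            this.city = city;
        }
    }

    // Deep model: one canonical object graph that a work flow can walk.
    static class DeepPolicy {
        public String policyId;
        public String insuredName;
        public List<String> cities = new ArrayList<>();
    }

    // Shallow -> deep: gather the related entries once, up front.
    static DeepPolicy assemble(InsuredEntry insured, List<AddressEntry> addresses) {
        DeepPolicy deep = new DeepPolicy();
        deep.policyId = insured.policyId;
        deep.insuredName = insured.name;
        for (AddressEntry address : addresses) {
            if (insured.policyId.equals(address.policyId)) {
                deep.cities.add(address.city);
            }
        }
        return deep;
    }

    // Deep -> shallow: render back to the small entries the UI expects.
    static List<AddressEntry> flattenAddresses(DeepPolicy deep) {
        List<AddressEntry> shallow = new ArrayList<>();
        for (String city : deep.cities) {
            shallow.add(new AddressEntry(deep.policyId, city));
        }
        return shallow;
    }

    public static void main(String[] args) {
        InsuredEntry insured = new InsuredEntry("P-1", "Pat");
        List<AddressEntry> addresses = new ArrayList<>();
        addresses.add(new AddressEntry("P-1", "Chicago"));
        addresses.add(new AddressEntry("P-2", "Denver")); // belongs to another policy
        DeepPolicy deep = assemble(insured, addresses);
        System.out.println(deep.insuredName + " " + deep.cities);
        System.out.println(flattenAddresses(deep).size() + " shallow address entry");
    }
}
```

Both directions are plain object to object code, which is easy to unit test, but it is still code somebody has to write and maintain.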

So the deeper I get the more I realize that JavaSpaces does not make the mapping angst I have encountered go away. The mapping is still there. It is certainly improved because it is object to object mapping that is easy to write unit tests for etc., but it is still there. Update (few hours later): And there isn't any active/transient state mapping to destination names so at least one is eliminated entirely :)

I believe these same issues are also present with a distributed caching solution.

Do I have this right or am I missing something?

Saturday, February 03, 2007

People are Talking

People are Talking. Talking about JavaSpaces & distributed caching.

As my co-worker Ed would say, the future is coming!! And so when are the big boys going to notice that the future is coming? Hmmmmm.

Perhaps this technology will be in a few Plan B MRDs this year.

Did Cameron really say this:

Our reason for considering the introduction of JavaSpaces into Coherence is to allow programmers to use the spaces model and the JavaSpaces API to code parts of those transactions. While JavaSpaces is not an effective data management API (i.e. it's not good for replacing a database), it is an effective data processing API (i.e. it can easily be used for computational processing in a grid). So Coherence can certainly bring those two concepts together (data managed by a database and processed using a spaces approach).

Peace,

Cameron Purdy Tangosol Coherence: The Java Data Grid

Yes he did! He authenticated it with the "Peace" key!

Should be interesting to watch - perhaps embedding Blitz? Why don't you OSS Coherence while you are at it? People still need support. Look at JBoss.

Friday, January 26, 2007

Best TheServerSide Post EVER

This is the best TheServerSide post/thread I have ever seen. I have been reading TheServerSide for years and years. Not as much the last couple of years, but I still check in on it once every couple of weeks or so.

On a side note, we spoke with Cameron Purdy briefly today. We really wanted to tell him that we were looking at NCache, but I didn't have the guts.

I did, however, open and close the call with "peace". I just couldn't resist. I'm sure he gets that a lot - oh well, he made his choices on that a long time ago ;)

Wednesday, January 10, 2007

Distributed Caching != JavaSpaces

I have been talking about distributed caching and JavaSpaces lately.

While you can use JavaSpaces as a caching technology because a JavaSpace keeps stateful objects (Entries) in memory and can be persistent (i.e., survive a failure), it isn't necessarily the best alternative.

Conversely, while you could use a distributed cache as a service orchestration engine because it has an eventing model in it & you could hack master/worker on top of it, it probably isn't the best choice.

The combination of a good distributed cache and a good JavaSpaces implementation, however, may be a good combination.

There certainly is overlap & you have to be fairly deliberate in figuring out which tool you want to use for certain things, but it seems achievable.

A distributed cache is likely the best place to store reference data, hard-to-query data (e.g., from a mainframe), user session data, etc., whereas a JavaSpace is probably the best place to store transient conversational service state.
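For the caching half of that split, a rough single-JVM sketch of the read-through idea looks like this. It is not any vendor's API: ReadThroughCache and the loader are hypothetical names, the "mainframe" is just a lambda that counts how often it is called, and a real distributed cache adds partitioning, replication, and expiry on top.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative only: a local stand-in for the read-through behavior of a cache.
class ReadThroughCacheDemo {

    static class ReadThroughCache<K, V> {
        private final Map<K, V> store = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // the slow, hard-to-query backend

        ReadThroughCache(Function<K, V> loader) {
            this.loader = loader;
        }

        // The first get for a key calls the backend; later gets are served from memory.
        V get(K key) {
            return store.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        ReadThroughCache<String, String> cache = new ReadThroughCache<>(code -> {
            backendCalls.incrementAndGet(); // pretend this is an expensive mainframe lookup
            return "description of " + code;
        });

        cache.get("ZIP-60601");
        cache.get("ZIP-60601");
        System.out.println("backend calls: " + backendCalls.get());
    }
}
```

The point of the sketch is only that reference data gets fetched once and then read many times, which is a different usage pattern from the write/take churn of conversational state in a space.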

It is the combination of these technologies that allows you to avoid the dreaded work in process database.

More importantly, both technologies use Java POJOs rather than XML so you can avoid a lot of mapping complexity (certainly not all) within the core of a large application.

Anyway, I still have a lot to learn, but am beginning to settle on this distinction.

Saturday, January 06, 2007

Simplicity & Integrating with Less Code

I have been on a distributed caching & JavaSpaces kick lately due to the reasons listed here.

I was talking to my co-worker Erik yesterday and we concurred that one of our biggest goals is to come up with an architecture that is as simple as possible and requires the least amount of code.

Coming from my experience with SOA of various flavors and EDA, I have been on many projects that wrote way too much code. The more code, the more defects. This is what is so appealing to me about JavaSpaces and potentially distributed caching (e.g., memcached, Tangosol Coherence), or some of both. In the core of an application it is possible to use these layers as your service orchestration tier as well as your transient data store. This is very appealing as you skip a significant amount of OR and OX mapping, which saves heaps of time and avoids heaps of defects.

The JavaSpaces API itself is extremely seductive. Lifted from here:

write: Places one copy of an entry into a space. If called multiple times with the same entry, then multiple copies of the entry are written into the space.

read: Takes an entry that is used as a template and returns a copy of an object in the space that matches the template. If no matching objects are in the space, then read may wait a user-specified amount of time until a matching entry arrives in the space.

take: Works like read, except that the matching entry is removed from the space and returned as the result of the take.

notify: Takes a template and an object and asks the space to notify the object whenever entries that match the template are added to the space. This notification mechanism, which is built on Jini's distributed event model, is useful when you want to interact with a space using a reactive style of programming.

snapshot: Provides a method of minimizing the serialization that occurs whenever entries or templates are used; you can use snapshot to optimize certain entry usage patterns in your applications. We will cover this method in detail in a later article.

With the addition of JavaSpace05, there are collection-based (i.e., bulk) read, take, and write operations.
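To get a feel for write, read, and take without installing Jini, here is a toy, single-JVM imitation. It is emphatically not the real JavaSpace interface: ToySpace and Task are hypothetical names, there are no leases, transactions, remote calls, or entry copies, and notify and snapshot are left out. It only mimics the matching rule, where a null field in the template acts as a wildcard.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Toy stand-in for a space. NOT the Jini API: no leases, transactions, or remoting.
class ToySpaceDemo {

    // Hypothetical entry type with public fields, in the style of a JavaSpaces Entry.
    static class Task {
        public String kind;
        public Integer id;
        Task(String kind, Integer id) {
            this.kind = kind;
            this.id = id;
        }
    }

    static class ToySpace {
        private final List<Object> entries = new ArrayList<>();

        // write: every call stores another entry, even if the content is the same.
        synchronized void write(Object entry) {
            entries.add(entry);
            notifyAll(); // wake up any blocked read/take
        }

        // read: return a matching entry and leave it in the space.
        synchronized Object read(Object template, long timeoutMillis) {
            return find(template, timeoutMillis, false);
        }

        // take: like read, but the matching entry is removed.
        synchronized Object take(Object template, long timeoutMillis) {
            return find(template, timeoutMillis, true);
        }

        // Called with the lock held. Waits up to timeoutMillis for a match, else returns null.
        private Object find(Object template, long timeoutMillis, boolean remove) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (true) {
                for (Iterator<Object> it = entries.iterator(); it.hasNext();) {
                    Object candidate = it.next();
                    if (matches(template, candidate)) {
                        if (remove) it.remove();
                        return candidate;
                    }
                }
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) return null;
                try {
                    wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }

        // A null template field is a wildcard; every non-null field must be equal.
        private static boolean matches(Object template, Object candidate) {
            if (!template.getClass().isInstance(candidate)) return false;
            try {
                for (Field field : template.getClass().getFields()) {
                    Object wanted = field.get(template);
                    if (wanted != null && !Objects.equals(wanted, field.get(candidate))) {
                        return false;
                    }
                }
                return true;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.write(new Task("scrub-address", 1));
        space.write(new Task("rate-policy", 2));

        Task template = new Task("rate-policy", null); // id is a wildcard
        Task seen = (Task) space.read(template, 0);     // still in the space afterwards
        Task taken = (Task) space.take(template, 0);    // removed from the space
        Object gone = space.take(template, 0);          // nothing left to match
        System.out.println(seen.id + " " + taken.id + " " + gone);
    }
}
```

Even in toy form you can see the appeal: the caller never names a queue or a key, it just describes what it wants.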

The integration patterns possible with this API are very powerful. Master/Worker, distributed, highly scalable data structures, etc.
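Master/Worker is simple enough to sketch with nothing but java.util.concurrent. In this version two in-memory queues stand in for the space (an assumption made purely so the example runs anywhere), and MasterWorkerDemo, sumOfSquares, and the STOP sentinel are made-up names. With a real space the master would write task entries and take result entries instead.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: in-memory queues play the role a space would play.
class MasterWorkerDemo {
    private static final int STOP = -1; // sentinel telling a worker to quit

    static int sumOfSquares(int taskCount, int workerCount) {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            // Workers: take a task, compute, write a result. They know nothing about the master.
            for (int w = 0; w < workerCount; w++) {
                workers.execute(() -> {
                    try {
                        while (true) {
                            int n = tasks.take();
                            if (n == STOP) return;
                            results.add(n * n);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Master: write all tasks, then collect exactly one result per task.
            for (int n = 1; n <= taskCount; n++) tasks.add(n);
            int total = 0;
            for (int i = 0; i < taskCount; i++) total += results.take();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } finally {
            // One STOP per worker so every worker thread exits.
            for (int w = 0; w < workerCount; w++) tasks.add(STOP);
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4, 1)); // 1 + 4 + 9 + 16
        System.out.println(sumOfSquares(4, 3)); // same answer with more workers
    }
}
```

The master code does not change when the worker count does, which is the property that makes the pattern attractive for scaling out.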

It appears that with certain distributed caching vendors, you can achieve some semblance of these patterns, although I am not convinced of that yet, and the associative nature of JavaSpaces (i.e., you find things in a space by handing it a template object filled in with what you are looking for) is just amazingly seductive.

Love it or hate it, JavaSpaces comes with Jini. To a newbie like myself, Jini takes some getting used to. The whole mobile code bit tends to make people who have lived through J2EE classloader hell nervous. But there are those who say it works fine in Jini. It is a different paradigm and this makes people very nervous.

Maybe what is needed is a mix of J2EE (specifically servlet, JMS, JMX MBeans), distributed caching, and JavaSpaces. Maybe you can use JavaSpaces for both caching and service orchestration. Maybe you should use all of Jini. Maybe there is something else.

From previous lessons learned around persistence in EDA, I am 100% convinced that we need some sort of flexible transient data store (no work in progress database - I beg you!). I see that today as being some form of a persistent distributed cache that requires very little if any mapping code. I also have seen the power of asynchronous integration and SEDA and am not about to give up on it. JavaSpaces and Master/Worker appear to give me that. What other ways are there? Maybe things that I would have considered anathema a year ago, like sending objects through JMS for the eventing layer, aren't really that awful? I do know that I do not want XML at the core of a brand new system again - again, way too many defects with mapping etc. Too slow, too lame. Sure, at the periphery to integrate with services that require it, but never again in the core of the system.

Lots of thinking results in lots of blather . . .

Bottom line, I want this to be as simple as it can be and I want to write as little code as possible. Oh and it should scale like the dickens, be simple to specialize, maintain, etc.
