Monday, April 24, 2006

EDA Lessons Learned - Persistence

See EDA Lessons Learned for the list

Our EDA is Stream Event Processing based.

There are *lots and lots* of events. It blows your mind to listen to all of the events at once (just subscribe to # in SonicMQ). It is like watching the Matrix. Most of the events aren't terribly meaningful to a typical subscriber. Many of the events are only used in the source system in "event workflows".

EDA is highly distributed. Our system has many different processes and machines publishing and subscribing to events. Persistence is very easy to get wrong in an environment like this. The first lesson is to put enough data in each event so that subscribers of the event do not need to gather more data from the source. The second is to be careful about table and row locks on inserts and updates. If you are not careful, you'll have N processes with N threads on N machines trying to hit the same data source.
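To make the first lesson concrete, here is a minimal sketch in Java. The event type, its fields, and the subscriber are all hypothetical names invented for illustration, not anything from our system. The point is only that the subscriber builds its output from the event payload alone and never calls back to the source.

```java
import java.math.BigDecimal;

// Hypothetical event: every name and field here is illustrative.
// It carries everything a typical subscriber is assumed to need.
final class OrderShippedEvent {
    final String orderId;
    final String customerEmail;
    final String carrier;
    final String trackingNumber;
    final BigDecimal orderTotal;

    OrderShippedEvent(String orderId, String customerEmail, String carrier,
                      String trackingNumber, BigDecimal orderTotal) {
        this.orderId = orderId;
        this.customerEmail = customerEmail;
        this.carrier = carrier;
        this.trackingNumber = trackingNumber;
        this.orderTotal = orderTotal;
    }
}

// Hypothetical subscriber: it reads only the event, so it adds no
// load (and no lock contention) on the publishing system's data source.
final class ShippingNotifier {
    static String buildNotification(OrderShippedEvent e) {
        return "To " + e.customerEmail + ": order " + e.orderId
                + " shipped via " + e.carrier
                + " (tracking " + e.trackingNumber + "), total " + e.orderTotal;
    }
}
```

The trade-off is bigger messages, so "enough data" is a judgment call per event type rather than a rule to copy every column into every event.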

For us, a lot of these ordinary events (the ones that are not notable to downstream services) are used in event workflows. State is important - you have to put it somewhere. The key is to think very carefully about where you put it and not screw it up.

Here is a list of some general thoughts on the subject:

  • As mentioned above, put the appropriate amount of data in each event so that subscribers do not need to gather more data from the sending system
  • If you are using persistent messaging & durable subscriptions as your EDA backbone, trust it! You do not need to persist state in each service that handles an event. If you do, you are going to regret it
  • XML shredding is expensive and results in a high defect rate (on the way in and out)
  • Separate transient (i.e., part of an event workflow), terminal state (i.e., completed event workflow), and reporting databases (ODS, data warehouse, OLAP)
  • Beware of O/R tools
    1. They work OK in experienced hands, but caching is difficult when multiple services in different JVMs each keep a separate cache, and mistakes can be hard to fix
    2. Use proper granularity
    3. Emitted SQL not always performant
    4. Often ends up being more complicated than JDBC
  • Consider an XML database, a high-performance reliable file system, or a caching solution for transient data
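
The point about separating transient and terminal state can be sketched roughly as follows. This is only an illustration under loud assumptions: the class and method names are hypothetical, and two in-memory maps stand in for what would really be a cache or file store for in-flight workflows and a proper database for completed ones.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: in-flight workflow state and completed workflow
// state live in separate stores, so churn on one never locks the other.
final class WorkflowStateStores {
    // Stand-in for a cache or fast file store holding in-flight workflows.
    private final Map<String, String> transientState = new ConcurrentHashMap<>();
    // Stand-in for the database holding completed workflows.
    private final Map<String, String> terminalState = new ConcurrentHashMap<>();

    // Record the latest step of a workflow that is still running.
    void recordStep(String workflowId, String step) {
        transientState.put(workflowId, step);
    }

    // On completion, drop the transient entry and write the outcome
    // once to the terminal store.
    void complete(String workflowId, String outcome) {
        transientState.remove(workflowId);
        terminalState.put(workflowId, outcome);
    }

    Optional<String> currentStep(String workflowId) {
        return Optional.ofNullable(transientState.get(workflowId));
    }

    Optional<String> outcome(String workflowId) {
        return Optional.ofNullable(terminalState.get(workflowId));
    }

    int inFlightCount() {
        return transientState.size();
    }
}
```

A reporting database would be fed from the terminal store only, which keeps heavy reporting queries away from the stores the event workflows depend on.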


John J Wright said...

Hi Mike,

My name is John Wright and I find your blog very helpful and informative. We have similar backgrounds. I recently started working for CapeClear Software and am interested in your EDA experience but have found all of your entries to be very thought provoking.


Cameron Purdy said...

I've had very similar experiences to what you describe.


fuzzy said...

Hey thanks John - glad to hear that it is useful.