Saturday, April 29, 2006

EDA Lessons Learned - Logging

See EDA Lessons Learned for the list

This is a small one, but better to get this right from the beginning - we didn't, but are almost done fixing it. And this isn't really EDA specific - it's applicable to any large scale distributed programming effort. Our problem was that various developers logged different things. Most used log4j, some didn't. Some configured it one way, others another. Some logged everything to INFO - some used DEBUG properly ... What you want is uniform logging.

We have approximately 50 different services. Each service is typically an event source and sink. Some are just event sources, others just sinks.

Our system has an admin GUI that lets you see the last 10 events (configurable). The runtime just keeps various buffers for incoming and outgoing events. When a user clicks on a particular service, he sees the incoming/outgoing events.

But logging to log files is a different issue. Obviously, this is where you go looking when there is a problem in the system.

Due to privacy / regulatory requirements, we have to be careful what we log. When there is an error, however, you want to know as much as possible.

Here are some thoughts on how to log. As I said, better to do this in development than to dedicate a large portion of a maintenance release (like we did) to getting it right:

  • Write a brief doc / WIKI entry on logging policy. Get agreement on what categories to log to and when (e.g., DEBUG, INFO, ERROR)
  • Service entry/exit logging
    1. In the original event source, generate a UUID. Put this in the event header
    2. Log the UUID, event type, status (if applicable), and other appropriate meta data
  • If you must log the payload of the event when there is an error, scrub the private information (e.g., SSN, DOB, etc.)
  • If you are using error queues, you have no reason to log the event body. Just log the stack trace, event meta data, and event history. The error message should also contain the stack trace, event meta data, and event history. If you have an Error Queue Admin tool like we do - it will protect the sensitive information
  • Keep the log config files out of the artifacts (e.g., .jar) that you deploy so that if there is panic in prod, you can turn on DEBUG by editing the config
Like I said, this isn't rocket science, but I figured I would point it out. You'll save yourself some angst down the road by spending a couple hours getting consensus from the beginning.
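The entry/exit logging and scrubbing points above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the class name, the service and event names, and the two regex patterns are all made up for the example, and it builds plain strings rather than calling log4j so that it runs on its own.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Sketch of uniform entry/exit logging with an event id and payload scrubbing. */
final class EventLogSketch {

    // Hypothetical patterns; a real logging policy would have to agree on these.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern DOB = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    /** Called once, in the original event source; the id then travels in the event header. */
    static String newEventId() {
        return UUID.randomUUID().toString();
    }

    /** Builds the one-line entry/exit record: service, id, event type, and status. */
    static String entryExitLine(String phase, String service, String eventId,
                                String eventType, String status) {
        return phase + " service=" + service + " eventId=" + eventId
                + " type=" + eventType + " status=" + status;
    }

    /** Masks private values before a payload is written to a log on error. */
    static String scrub(String payload) {
        String masked = SSN.matcher(payload).replaceAll("***-**-****");
        return DOB.matcher(masked).replaceAll("****-**-**");
    }

    public static void main(String[] args) {
        String id = newEventId();
        System.out.println(entryExitLine("ENTRY", "QuoteService", id, "QuoteRequested", "NEW"));
        System.out.println(scrub("<ssn>123-45-6789</ssn><dob>1970-01-31</dob>"));
    }
}
```

Because every service logs the same id that the source put in the header, a grep for that id across the log files gives you the path of one event through the system.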

Friday, April 28, 2006

The Big Boys

Saw IBM's big bet on SOA via Simon Tilkov.

To quote the article:

Clearly SOA is key, if not the core, to IBM's software strategy. We will of course have to wait and see if this "shock and awe" approach to dominating a market will work - not to mention solve the customer problems. I do wonder how 31 separate products (so far!) can really deliver the simplicity and agility that is meant to be at the heart of SOA. More importantly, SOA is about providing a solution, not selling an even more complex collection of products.

I don't really have anything to add to that. I tried to, but I just keep staring at my screen muttering things.

Thursday, April 27, 2006

EDA Lessons Learned - Choose Topics over Queues

See EDA Lessons Learned for the list

If you are using JMS as the backbone of your EDA, choose Topics (i.e., pub/sub) for 99% of your Destinations. I guess that is pretty obvious - I guess it is more applicable to the Messaging-Centric ESB world.

Exceptions where I think Queues are appropriate:

  • Error Queues
  • Existing integrations that have queues (I'd still route it through a Topic before it got there)

There are other times where Queues are appropriate, but just triple check that it is appropriate and if in doubt, use a Topic instead.

Topics have a lot of benefits over Queues:

  • Facilitate event driven architecture
  • Not Point-to-Point (you have a prayer against the n(n-1) problem, where n = # of systems to integrate)
  • Very flexible if used in an event workflow - can fan out/fan in easily
  • Ease of debugging (can snoop on a Topic)
  • Can do 1:M Request/Reply (one request, get N responses)
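
To put a number on the n(n-1) point in the list above: if every system may need to send to every other system, point-to-point integration needs one directed link per ordered pair of systems, whereas with Topics each system only needs its connection to the broker. The value n = 50 below is just an illustrative figure.

```latex
\[
  n(n-1)\,\Big|_{n=50} \;=\; 50 \cdot 49 \;=\; 2450
  \quad\text{directed point-to-point links, versus } n = 50 \text{ broker connections.}
\]
```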

Also, use hierarchical Topic names. Hierarchical Topic names are great because they:

  1. Help to avoid dependence on message selectors (slower)
  2. Organize things
  3. Enable listening to a pattern - for example:
    If you have Topics in this pattern (several different "EVENTTYPE" values): COMPANYNAME.DIVISION.PRODUCTLINE.PRODUCT.NOUN.EVENTTYPE
    Say you are interested in all of the event types ... you can listen to them like this: COMPANYNAME.DIVISION.PRODUCTLINE.PRODUCT.NOUN.*
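
To make the pattern idea concrete, here is a small stand-alone matcher. It is only an illustration of the semantics, not how any broker implements it: it assumes `*` matches exactly one level and `#` matches all remaining levels (and it only supports `#` as the last segment). The ACME topic names are invented for the example; check your provider's documentation for its real wildcard rules.

```java
/** Illustrative matcher for dot-separated hierarchical topic names. */
final class TopicPatternSketch {

    /**
     * Returns true if the topic matches the pattern.
     * Assumed semantics: "*" = exactly one level, "#" = all remaining levels.
     */
    static boolean matches(String pattern, String topic) {
        String[] p = pattern.split("\\.");
        String[] t = topic.split("\\.");
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("#")) {
                return true; // everything from here down is accepted
            }
            if (i >= t.length) {
                return false; // pattern is deeper than the topic
            }
            if (!p[i].equals("*") && !p[i].equals(t[i])) {
                return false; // literal level differs
            }
        }
        return p.length == t.length; // no unmatched trailing topic levels
    }

    public static void main(String[] args) {
        String pattern = "ACME.RETAIL.AUTO.POLICY.QUOTE.*";
        System.out.println(matches(pattern, "ACME.RETAIL.AUTO.POLICY.QUOTE.CREATED"));
        System.out.println(matches(pattern, "ACME.RETAIL.AUTO.POLICY.CLAIM.CREATED"));
    }
}
```

The point is that the subscriber picks up a new EVENTTYPE under the same NOUN without any change on its side, and without a message selector.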

Unless you don't need it, use durable subscriptions with your Topics - they work great and will bail you out of some hairy situations. While durable subscriptions were originally intended for momentary connection problems, they can be used to help bridge the EDA & batch world. You can register a durable subscription and then just connect to it at night for instance. You accumulate events all day on the durable subscription and then just drain it as part of the batch run.
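
The batch bridge described above can be pictured with a toy in-memory model. This is not JMS code (in JMS itself you would create the subscriber with `Session.createDurableSubscriber`); the class and event names are hypothetical and the model only shows the accumulate-by-day, drain-by-night behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a durable subscription used to feed a nightly batch run. */
final class DurableSubscriptionSketch {

    // Events retained on behalf of the subscription while nobody is connected.
    private final Deque<String> retained = new ArrayDeque<>();

    /** The broker keeps each published event for the subscription, connected or not. */
    void publish(String event) {
        retained.addLast(event);
    }

    /** The batch job connects and drains everything accumulated, in publish order. */
    List<String> drain() {
        List<String> batch = new ArrayList<>(retained);
        retained.clear();
        return batch;
    }

    public static void main(String[] args) {
        DurableSubscriptionSketch subscription = new DurableSubscriptionSketch();
        subscription.publish("PolicyIssued-1");
        subscription.publish("PolicyIssued-2");
        System.out.println("nightly batch size: " + subscription.drain().size());
    }
}
```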

To be fair on my Topic preference, I do take advantage of a SonicMQ feature that makes a Topic like a Queue in terms of the ability to have competing consumers (i.e., different connections (can be on different hosts) competing to drain the queue) (Shared Subscriptions). Other vendors do similar things. It is a pity really that Sonic's "Shared Subscriptions" isn't in the JMS spec.

Wednesday, April 26, 2006

SonicMQ 7

Yesterday I watched / listened to an early access web cast on SonicMQ 7.

I have been using SonicMQ for about 5 years. Rock solid - just works. And when it doesn't, their support people are very helpful - they dig into your problem and crush it. If the problem is a defect - they fix it. If the support person can't figure it out, they work with engineering. If they still can't figure it out, they get engineering on the phone. And they keep following up with you. Really can't say enough about the success I have had with the product.

Anyway, there are some improvements that will help me that I am looking forward to. We use pub/sub, durable subscriptions, and Sonic's "shared subscription" feature extensively. They apparently have some fixes/enhancements that will prevent messages from getting trapped on brokers when bad things happen. I have had this happen when consumers of a Topic on a node in a cluster get horked.

They also are introducing (don't know exact name) "MultiPublishers" and "MultiSubscribers". Apparently, this feature was driven by wall street customers who have thousands of Topics. It takes 100ms to register a subscription at start up. This adds up when you have thousands of topics in use. So, my understanding is that with "MultiSubscribers" and "MultiPublishers" you can add Topics to a Set and register them all at once.

They also apparently tricked out the "Continuous Availability Architecture" (CAA) further. This is wicked cool and I don't know anyone else who does it. Basically, you can have a client publishing to BrokerA, kill -9 BrokerA, and the client just fails over to "BrokerB" and continues publishing ... even within a transaction. And this takes 10 minutes to set up. You don't need exact hardware, bios settings, etc. (i.e., hardware based failover). You just configure your setup that way. Another thing that is huge when fault tolerance really matters (more and more these days). You can recover within 10 seconds vs. 10 minutes. Wicked cool.

Anyway, I'm not the type to spaz out about a product like this - I am a huge skeptic from years of being let down by integration companies. Sonic just makes things stupid simple like they should be - so you can get on with SOA & EDA.

Update 17-JUL-06 - Now that it is possible to determine my employer by reviewing my blog posts, I must follow my employer's blog policy.

This is not an endorsement by Liberty Mutual, but my personal view. Vendors are NOT free to quote with attribution.

Tuesday, April 25, 2006

EDA Lessons Learned - Service Contracts & Event Stream Design

See EDA Lessons Learned for the list

The contracts between services are the most important thing there is in either SOA or EDA IMHO. You get this right and your future is bright, get it wrong and it is dim. Pretty much that simple.

I came to SOA/EDA from CORBA, J2EE, and DCOM projects. I thought that IDL was the only way to fly. When WSDL came out, I treated it like any other IDL. I was one of the people who endured significant pain with WSDL when it first came out because I was such a believer in "contract first" design/development.

The first time I started muttering "what is the contract?" when discussing OO design, service design, anything software related really was when I was mimicking one of my mentors (Dave Read when we worked at eXcelon). Just because I knew he was wicked smart and he said that a lot, I made sure I knew what he was talking about and added it to my repertoire.

So I am still a huge fan of contract first development, but it isn't always practical - especially with EDA. For one thing if you wanted to use WSDL today with EDA, you really couldn't as WS-* doesn't support pub/sub yet and WS-Notification & WS-Eventing are being merged into WS-EventNotification. I tend to think they will be too rigid when they are updated, but we'll just have to wait and see. Anyway, just because you don't have an IDL, doesn't mean you shouldn't think about contracts.

Service contracts in EDA aren't usually request/reply like they typically are in SOA. It is easier to get your hands around contracts with request/reply. With event streams, you have N services listening to an event and possibly publishing N events to N destinations. A service may listen to many destinations or a pattern. A service may receive XML documents conforming to different schemas. So you pretty much have to let the whole IDL thing go ... unless you just allow "any" or whatever it is in XSD (which is pointless).

This isn't very concise, but for me, contracts in EDA are:

  • Event input payload(s)
  • Service input destinations (set/patterns)
  • Event output payload(s)
  • Service output destinations (set)

Here are some more details on my opinions on service contracts and event stream design and EDA:

  • Devote appropriate time to service contract/event stream design
    1. Have several sets of experienced eyes review
    2. Review a number of times - allow for "soak time" as my friend Sarge would say
  • The goal is low coupling and high cohesion
  • Anticipate event stream flow changes
  • Consider persistence when thinking about contracts
  • Get the granularity right (narrow vs. coarse) - the truth is somewhere in the middle
  • UML sequence diagrams with embellishments (persistence, state change, etc.) and annotations work very well for technical design and review
  • Great website/book - Enterprise Integration Patterns

Monday, April 24, 2006

EDA Lessons Learned - Persistence

See EDA Lessons Learned for the list

Our EDA is Stream Event Processing based.

There are *lots and lots* of events. It blows your mind to listen to all of the events at once (just subscribe to # in SonicMQ). It is like watching the Matrix. Most of the events aren't terribly meaningful to a typical subscriber. Many of the events are only used in the source system in "event workflows".

EDA is highly distributed. Our system has many different processes and machines publishing and subscribing to events. Persistence is very easy to get wrong in an environment like this. The first lesson is to put enough data in each event so that subscribers of the event do not need to gather more data from the source. Also, in terms of inserts and updates, you have to be careful of table / row locks etc. If you are not careful, you'll have N processes with N threads on N machines trying to hit the same data source.

For us, a lot of these "ordinary / non-notable to downstream service events" are used in event workflows. State is important - you have to put it somewhere. The key is to think very carefully about where you put it and not screw it up.

Here is a list of some general thoughts on the subject:

  • As mentioned above, put the appropriate amount of data in each event so that subscribers do not need to gather more data from the sending system
  • If you are using persistent messaging & durable subscriptions as your EDA backbone, trust it! You do not need to persist state in each service that handles an event. If you do, you are going to regret it
  • XML shredding is expensive and results in a high defect rate (on the way in and out)
  • Separate transient (i.e., part of event workflow), terminal state (i.e., completed event workflow), and reporting databases (ODS, data warehouse, OLAP)
  • Beware of O/R tools
    1. Work ok in experienced hands, but caching is difficult if multiple services in different JVMs have separate caches; mistakes can be hard to fix
    2. Use proper granularity
    3. Emitted SQL not always performant
    4. Often ends up being more complicated than JDBC
  • Consider XML database, high performance reliable file system, or caching solution for transient data

Sunday, April 23, 2006

EDA Lessons Learned - Error Queue Administration

See EDA Lessons Learned for the list

When a service is processing an event and a non-recoverable error is encountered, the poisoned event/message is routed to a named error queue (i.e., not the dead message queue).
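
A rough sketch of that routing, under some loud assumptions: the error queue is just an in-memory deque here (a real one would be a JMS Queue), every exception is treated as non-recoverable, and the class, field, and event names are invented for the illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of routing a poisoned event to a named error queue with its diagnostics. */
final class ErrorQueueSketch {

    /** The business logic of a service; may fail on a bad event. */
    interface Handler {
        void handle(String eventBody) throws Exception;
    }

    // Stand-in for the named error queue.
    private final Deque<Map<String, String>> errorQueue = new ArrayDeque<>();

    /** Runs the handler; on failure, parks the event plus stack trace and meta data. */
    boolean process(String eventId, String eventType, String body, Handler handler) {
        try {
            handler.handle(body);
            return true;
        } catch (Exception e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            Map<String, String> error = new LinkedHashMap<>();
            error.put("eventId", eventId);
            error.put("eventType", eventType);
            error.put("stackTrace", trace.toString());
            error.put("body", body); // stays on the queue; it is not written to the log
            errorQueue.addLast(error);
            return false;
        }
    }

    Deque<Map<String, String>> errorQueue() {
        return errorQueue;
    }

    public static void main(String[] args) {
        ErrorQueueSketch service = new ErrorQueueSketch();
        service.process("id-1", "QuoteRequested", "<quote/>",
                body -> { throw new IllegalStateException("bad data"); });
        System.out.println("errors parked: " + service.errorQueue().size());
    }
}
```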

It is painful and error prone to handle error queues manually (e.g., command line programs & email requests).

A web based error queue administration tool is very helpful. Due to regulatory requirements, we have authorization requirements on who can see the error queues (i.e., read only) and who can act on the error queue (i.e., edit access). All actions must be logged etc.

Some useful features:

  • View all error queues & error count
  • View queue details (list all errors)
  • View error detail (lists event meta data, e.g., event history, stack trace (root cause of the error), original headers, properties, event body, etc.)
  • Redrive event
  • Edit & Redrive event
  • Delete event

EDA Lessons Learned - Event History

See EDA Lessons Learned for the list

Event Driven Architecture is about as loosely coupled as it gets.

One event can result in the generation of other events that can result in still more events ad infinitum. This is how our system works - there are many events. For example, one user click is 1 event, but it very often results in 10+ other events. The "event type" typically stays the same, but different attributes on the event change (e.g., the status). Also, the event is re-published to different destinations.

When there are problems in the system (e.g., an event is routed to an error queue), it is very useful to know where the event came from. Event history is also very useful in bringing new developers up to speed on how the system works. In a system that has many different types of events and many different services processing events, it is very easy to get confused. Also, documentation doesn't always keep up with the system. Event history is a form of system self documentation ;)

We record basic history on events. Each time an event is generated, history is appended to it. We track information like service name, service type, destination name, destination type, and host name. We are using JMS so we record the history in a message property.
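
Something along these lines, as a sketch only: the `Map` stands in for the string properties of a JMS message, and the property name, the separators, and the service/destination names are hypothetical, not the format we actually use.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of appending one history entry per hop to a message property. */
final class EventHistorySketch {

    static final String HISTORY_PROPERTY = "eventHistory"; // hypothetical property name

    /** Appends service and destination details for this hop to the history property. */
    static void appendHistory(Map<String, String> properties, String serviceName,
                              String serviceType, String destinationName,
                              String destinationType, String hostName) {
        String hop = serviceName + "," + serviceType + "," + destinationName
                + "," + destinationType + "," + hostName;
        String existing = properties.get(HISTORY_PROPERTY);
        // Hops are kept in order, separated by "|", oldest first.
        properties.put(HISTORY_PROPERTY, existing == null ? hop : existing + "|" + hop);
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        appendHistory(properties, "QuoteService", "publisher",
                "ACME.QUOTE.CREATED", "topic", "host-a");
        appendHistory(properties, "RatingService", "subscriber",
                "ACME.QUOTE.RATED", "topic", "host-b");
        System.out.println(properties.get(HISTORY_PROPERTY));
    }
}
```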

OSS CEP

Looks pretty good ... OSS CEP: Esper

EDA Lessons Learned

I put together a presentation at work on some lessons my team learned on EDA/ESB related topics a few months ago. I have made a couple friends recently via my blog. They have expressed an interest in this information. While I can't ship off the original presentation because:
  1. it would bore them to tears
  2. it contains proprietary information
the guts of it can be shared as they are standard architecture issues that anyone dealing with EDA/ESB will likely encounter. I'm curious what other people have experienced. Hopefully I'll learn something from sharing :) This list is by no means exhaustive; I might add to it as I think of other topics.

So I'll be blogging on these topics the next few weeks. I'll update this post with links as I complete the topics (no particular order).

Here are some of the topics I'll cover:

  1. Persistence
  2. Service Contracts & Event Stream Design
  3. Choose Topics over Queues
  4. Canonical Message Format
  5. Scaffolding
  6. Transactions
  7. Performance
  8. Build/SCM
  9. XML Aware Analysts
  10. Batch vs EDA
  11. Content Based Routing
  12. EDA and Object Oriented Design
  13. Shared Code
  14. Logging
  15. Config Files
  16. Error Queue Administration
  17. Event History
  18. Aggregation

WS-DeepBreathing

Saw this by Dave Podner on Richard Monson-Haefel's blog.

Pretty funny. I'd add a #6, however ...

6. Cut your losses and move on

Saturday, April 22, 2006

Message-Centric vs. Service-Centric

Stumbled across this on Dave Orchard's blog. It talks about leaky abstractions in SOAP. Also discusses how SOAP hides the underlying protocol. This is the main reason that REST or POX/HTTP appeals to me more than SOAP - they say you have to embrace the network and not hide it. SOAP just brings too much unnecessary complexity to the table for my taste. It truly is the doorknob to hell.

When I was working on "Aspira" - the EDA (Event Driven Architecture) bus that never was, we dealt with this by intentionally forbidding any:any transport. You couldn't for instance have an HTTP endpoint talk directly to a Socket endpoint. Every transport was designed to have an adapter. That adapter would deal with the transport in its native form and get the event to the persistent / trusted / consistent messaging layer as fast as possible. In short, it was "Message-Centric" rather than "Service-Centric". Decent write-up on the differences by CapeClear advocating the opposite of what I think.

The nice thing about the "Message-Centric" approach is most of your services end up being vanilla - they use one of the standard service types, which deal directly with the messaging layer for their inputs and outputs. They get the benefits of persistent messaging, clustering, consistent security, fail over, durable subscriptions, standard error handling (e.g., rollback and pause, route to error queue). Yes, they do hide the network, but that is ok, because network problems have been accounted for. And the services don't care about the payload - so long as it is XML. The result is your service impls are stupid simple. They just do their little bit.

Dave Orchard's post prescribes some proposals for fixing the problems he sees. Sounded good enough for me. I just think the REST or POX/HTTP approach is the better way to go. WSDL/SOAP were intended to ease pain, but sadly, I think they make things more complicated than they need to be. I think it is far simpler to just embrace various protocols directly and stay in control. Yeah, at the outset, it might feel a bit more complicated, but in the long run, you save yourself pints of blood.

Anyone care to talk me out of the "Message-Centric" approach?

Friday, April 21, 2006

Be Afraid

This post by Andrew Townley is really well written. Scared the crap out of me.

Anne Thomas Manes says JAX-WS will fix this and everyone will use the attribute approach.

Like Yoda and Andrew say, "Once you start down the dark path, forever will it dominate your destiny".

Update - plea for help
Can someone from the future please send me an email and let me know how WS-* turns out? I am sick of worrying about it. Thanks in advance for your help!

Thursday, April 20, 2006

Sonic Software - Actional in Seattle

I went up to Seattle today to listen to Sonic Software and Actional (Progress/Sonic recently acquired Actional) talk about SOA/Event Driven Architecture best practices and maturity models and such.

I had to get up at the butt-crack of dawn to get there in time. It is a 3 hour drive from Portland to Seattle - the meeting started at 8:30. Not fun. So early that I forgot my belt. Good thing is I look particularly handsome without a belt unlike most people.

I also participated in a forum talking about lessons learned and SOA challenges. It was interesting to hear other points of view, etc.

My company uses SonicMQ as our JMS provider. One of the better ones out there. Coolest thing they do is "shared subscriptions". Basically lets you scale a subscription horizontally. You just use their clever little shared subscription syntax when you register the subscription. Then, multiple connections on N hosts are considered one logical entity and they share the burden of processing the messages for that entity. Basically, you get the benefit of competing consumers on the Topic. It is just like it is a Queue, but you can have other shared subscriptions and normal subscriptions listening to the same topic at the same time.
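
Here is a toy model of that behavior, to show what "one logical entity" means. It is not Sonic's API or syntax: the class and subscription names are made up, and the round-robin dispatch within a group is an assumption for the sketch (a real broker may balance differently).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one Topic with normal and shared subscriptions. */
final class SharedSubscriptionSketch {

    /** One logical subscription: a normal one has one member, a shared one has several. */
    private static final class Subscription {
        final List<List<String>> memberInboxes = new ArrayList<>();
        int next = 0;
    }

    private final Map<String, Subscription> subscriptions = new LinkedHashMap<>();

    /** Joins the named subscription and returns this consumer's inbox. */
    List<String> subscribe(String subscriptionName) {
        Subscription s = subscriptions.get(subscriptionName);
        if (s == null) {
            s = new Subscription();
            subscriptions.put(subscriptionName, s);
        }
        List<String> inbox = new ArrayList<>();
        s.memberInboxes.add(inbox);
        return inbox;
    }

    /** Every subscription sees the message once; within a shared one, a single member gets it. */
    void publish(String message) {
        for (Subscription s : subscriptions.values()) {
            // Assumed dispatch policy: simple round-robin across the members.
            List<String> inbox = s.memberInboxes.get(s.next % s.memberInboxes.size());
            inbox.add(message);
            s.next++;
        }
    }

    public static void main(String[] args) {
        SharedSubscriptionSketch topic = new SharedSubscriptionSketch();
        List<String> workerA = topic.subscribe("workers");
        List<String> workerB = topic.subscribe("workers");
        List<String> audit = topic.subscribe("audit");
        for (int i = 1; i <= 4; i++) {
            topic.publish("m" + i);
        }
        System.out.println(workerA + " " + workerB + " " + audit);
    }
}
```

The two "workers" split the load like competing consumers on a Queue, while "audit" still gets its own copy of everything, which is what a plain Queue can't give you.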

I didn't know much about Actional until recently. I had lunch with Sonic's new CTO, Dan Foody and some other Sonic guys. Dan was the CTO at Actional. They have some cool products. Using their products, you can monitor a SOA/EDA deployment end to end. It traces transactions through all your services. Apparently they do this by some byte code magic where they listen at all the appropriate times. They append headers to HTTP/XML requests, JMS, CORBA, RMI, etc. that allows them to trace the logical transaction where it goes on the SOA/EDA. Wicked cool.

We do this by hand currently - as events bounce around, we scribble which hosts/Destinations they were routed on, which services processed them, etc. to an "event history" message property. This information is really useful when events end up on error queues etc. Also nice for debugging and bringing new developers up to speed on the system. Self documenting and what not. We just do this for JMS though ... we don't have the same visibility for HTTP based services. Anyway, the Actional stuff seems pretty useful. I want to give it a spin.

Update 17-JUL-06 - Now that it is possible to determine my employer by reviewing my blog posts, I must follow my employer's blog policy.

This is not an endorsement by Liberty Mutual, but my personal view. Vendors are NOT free to quote with attribution.

Wednesday, April 19, 2006

Sad but true

I came across this over the weekend.

Very sad, but so often true.

Monday, April 17, 2006

Just how much REST - 'Web Style' are we talking about here?

##########
Update I posted the gist of this thread to the SOA and REST Yahoo discussion groups. Here are the threads:
Service Oriented Architecture
REST
##########

After about a week of quiet, the panic is back on REST ("Web Style" according to Tim Bray which I think is better) vs WS-*.

Tim Bray takes the gloves off:

The End of SOA

Important, I Think

He is preaching to the converted when it comes to "Web Style" vs WS-* for me. Perhaps the acronym "SOA" will eventually meet its demise, but, we'll just have to come up with something else. Sadly, "Web Style" doesn't cut it for every type of service. We've had some flavor of "SOA" since ARPAnet. We like services. And I'm sure there'll be something after "Web Style".

But I have to wonder, just how much Web Style do we need?

Which types of integration is it appropriate for? I'm specifically talking about internal integration within a company. Web Style all the way for integrating between companies IMHO.

I have my own shiny object -- it's messaging (e.g., JMS). While I don't think it should be used for everything, it works very well for providing the "plumbing" for complex integration. Yes, JMS is just a Java thing, but good JMS providers have C, C++, COM, .NET, & HTTP APIs so it is platform independent enough to get the job done. I'd love to see a truly platform independent reliable messaging, but as the REST guys say re:HTTP, JMS is a technology that works and is here today.

I like Web Style as opposed to WS-* for the class of service that should be over the web. I just don't see either being that great for extensive internal integration compared to something like JMS that provides things like clustering, fail over, guaranteed delivery, error queues, and durable subscriptions.

As part of the future state architecture planning I am participating in, we are trying to identify the high level types of integration that are common for us. I think they are common for most large companies. Here they are:

Update 18-APR-06: Added some prose around each integration type.

  1. Sync Request/Reply

    ServiceA.Thread1 sends request and waits for response from ServiceB.ThreadX.

  2. Async Request/Reply

    ServiceX.Thread1 makes request. ServiceX.ThreadX receives response.

  3. Async 1:1 (fire/forget)

    ServiceA.Thread1 publishes message and does not wait. ServiceB.Thread1 receives message. ServiceA.Thread1 may or may not be aware of ServiceB.Thread1 (depends on protocol used).

  4. Async 1:M (pub/sub)

    ServiceA.Thread1 publishes message. ServiceB:ServiceN receive message. ServiceA.Thread1 unaware of ServiceB:ServiceN.

  5. Async 1:M Request/Reply (pub/sub Request/Reply)

    ServiceA.Thread1 publishes message. ServiceB:ServiceN receive message and send response. ServiceA.Thread1 receives N responses. ServiceA.Thread1 unaware of ServiceB:ServiceN - just knows that N responses were returned.

  6. Async M:1 (event stream subscription)

    ServiceB:ServiceN publishes message. ServiceA.Thread1:ThreadN receive messages. ServiceB:ServiceN unaware of ServiceA.Thread1:ThreadN

  7. Batch Feed

    ProcessA sends file to ProcessB

I think Web Style is perfectly fine for a certain class of Request/Reply services. For example looking up existing customer information. It definitely wouldn't be my first choice, however, for something transactional like creating an order. Not saying I'd never do it, just wouldn't be my first choice. My problem with using Web Style at all for heavy integration is the coupling and complexity that starts to add up when you have a lot of services all doing Request/Reply. Each client has to handle errors and retry logic. Certainly not rocket science, but it is more than I want to deal with.

I think that Batch Feeds could also be done via Web Style. They don't tend to be coupled sequentially the way I described Request/Reply above.

Once you get to anything async, however, Web Style quickly falls down for me. Let's start with "Async Request/Reply". An example is requesting a report from a third party that takes processing time (say 1 day). You make the request, and want to get the reply async. With Web Style, people often end up making the request for the report, getting an id back, and then polling at an interval for the results for the id. You add logic so that you don't poll until you think the report is ready, etc. This is complexity. Again, not rocket science, but lines of code with defects in them to be sure. To be fair, you could have the report service call you back which isn't as bad - again, however, there is non-trivial complexity here when you deal with the error cases. If you are writing one service, it is no big deal - it's when there are more than a few where you get a cumulative amount of complexity that starts to get your attention.

For "Async 1:1", you can do that with Web Style like Request/Reply. You do have to contend with error cases, however. And "out of the box" you don't get guaranteed delivery. You can't send the message persistently so that you know that the send being ack'd means something. You can certainly do this by hand - and it isn't that hard, but again, this stuff tends to add up and every time you roll your own guaranteed delivery, there are going to be defects. Not the type of stuff I want to deal with.

Alright, so now we are getting to the harder stuff - "Async 1:M", "Async 1:M Request/Reply" and "Async M:1". This is where I really struggle with Web Style. I've tried to challenge myself to find a way to do this with Web Style, but haven't gotten really far. Admittedly, however, I tired fairly quickly - please let me know if you know of something.

How would you do "Async 1:M"? This is classic pub/sub. A client sends 1 message and N subscribers listen to the message. The client has no idea who is listening. I don't know of any way to do this w/o writing a ton of code.

How about "Async 1:M Request/Reply"? This is when you send one message and get N responses. This is pub/sub request reply.

Last one on my list is "Async M:1". This is 1 service listening to N destinations / Topics for events.

Don't get me wrong. I'm all for Web Style. The sooner SOAP and WS-* are deprecated, the better, IMHO. I just don't see Web Style as a silver bullet for integration any more than anything else is. Not saying that Tim is saying that - just think that there are lots of cases in integration where Web Style OR WS-* isn't the best choice.

Sunday, April 16, 2006

SCA - Pretty bold

I heard about SCA when it came out. I was on a bit of an anti standard tear then and I was tuned out a bit in general. I've started looking at it more closely because I am looking at IBM ESB 7.0 a little bit and I am addicted to the Internet and I read a lot.

David Chappell has a good post on it.

Anyway, SCA is way better than JBI. For starters, it is platform independent and IBM and BEA are both behind it. Man it must suck to be Sun these days. I agree with David that SCA is not w/o politics. Basically, IBM and BEA are coming to wrestle the JCP process out of their hands and start to take control of the future. Makes sense in a lot of ways. Anyway, I don't really care about that crap.

SCA is like M$ WCF/Indigo but for Java. Good stuff, now we are making some progress. There is already an Apache impl underway (Apache Tuscany).

So I looked at the SCA Whitepaper. Didn't see anything that made me writhe on the floor like when I looked at JBI the first time.

Also, happily WSDL/SOAP doesn't appear to be mandated on the wire. Yippee! I'd also agree with David, that this doesn't look real good for J2EE / app centric model. Thank goodness, there is a way out. There is a REST binding as well, but I doubt it will meet the requirements of purists. Really more of a POX binding. Good enough for me. I wouldn't be surprised if this is how WSDL/SOAP eventually gets deprecated. Everyone will just not use WSDL/SOAP when there is the REST/POX binding and they'll slow boil it to death.

The spec is really young, but looks promising. They had the word "pub/sub" in there so I'm happy. Not included in this version of the spec though. Perhaps I'll have to try to contribute.

Probably a couple years before it starts becoming mainstream, but color me hopeful. Wow, maybe I'll have to stop being so full of panic now. I doubt that will happen.

Update (after Easter naivete wore off): Marc Fleury at JBoss no likey SCA. If JBI didn't suck, IMHO, he'd have a better argument. But it does - Java specific SOA doesn't really get you far - at least in my world. Jeez, how did I miss this whole thing last Winter? I was tuned out, but not that tuned out.

Dangit, I probably am going to end up recommending the same thing .. really sad.

Other applicable links:

Macehiter Ward-Dutton

James Strachan

James Governor

Saturday, April 15, 2006

Future state

I am on a small distributed team planning the future state integration architecture for a $6B business. It isn't as glamorous as it might sound (replace glamorous with painful depending on your outlook). It is just a small part of my job. Most of my time is spent dealing with present day integration and maintenance of past integrations. Plus, we all know how integration architecture standards go -- they are rarely followed when project deadlines loom, etc. But, I have learned a lot about integrating systems during my career and I have strong opinions so I'm going to give it my best effort. I don't think it will go 100% the way I think it should (nor should it), but I hope that I can at least save some poor sap some agony by keeping things as simple as possible, but not simpler.

Should be interesting to see where this goes.

I'm trying to remain objective and not just be knee jerk anti WS-* and anti anything EJB (i.e., ESBs built upon app servers). It is hard to a) keep my biases in check (established from in the trenches angst) and b) compete with the industry echo chamber that says WS-* is good and you need an app server for every Java process.

I wonder if I'll end up recommending the same thing I said the last time I participated in planning a SOA integration strategy in 2002. How depressing would that be? There better be better options 4 years later.

The previous company I did this for couldn't be more different than the current one. For starters, it was a $700M business and a high tech company. The integration challenges were just Java vs. .NET & Unix vs. Windows + a bunch of COTS packages (e.g., SAP, Siebel, Onyx). The current company has every platform ever invented - just like most major financial services companies. We have Unix, Linux, AS/400 (iSeries), MVS (z/OS), VSE (z/VSE - been around for 40 years), just to name the ones you might have heard of.

I don't know where this will go. Perhaps we'll just punt. It would be easy enough to lay something out that a) looked respectable and b) would sit on the file server unaccessed.

I do know one thing - right now is a very bad time to place bets on future state. Sadly, there is a lot of uncertainty right now IMHO in integration land. I would have thought things would have come further in 4 years, but here we are. We have WS-* & SOAP, Service Centric ESBs, Message Centric ESBs, App Server Centric Don't-Write-Us-Off ESBs, EAI Vendor Centric Repackaged as Not-Sucking-Anymore ESBs, WS-* vs. REST, JBI (quickest spec to die in history? Six months ago I set the over/under at 1 year, and I am still comfortable as the house), SCA (Service Component Architecture), and vendors drafting standard after standard and having to sing Kumbaya before the whole house of cards falls down (dramatic?).

SOA is here to stay, that is for sure, but SOA is just an integration style. We've had that style since before I was born (don't tell my boss), but it is finally mainstream. You'd think there would be some consensus in 2006 around the web service impl of SOA, but the battle lines are just being drawn, it seems. There is more uncertainty now than ever IMHO.

So anyway, I'll likely be posting about this future state bit-o-panic a bit the next few months.

Oh yeah, so what was my recommendation last time? I suggested that we avoid SOAP based web services and find the best JMS provider we could (Microsoft still doesn't really do messaging in 2006), one that would a) scale, b) cluster, c) fail over, d) notify/monitor, e) secure clients and destinations simply, and f) support Java, C, C++, COM, .NET, & HTTP clients. On top of that, I suggested we maintain a simple Canonical Message Format (de-crufted OAGIS at the time), build a lightweight ESB layer on top of the JMS provider, and then write services against the ESB layer's API.

WS-DeathStar

I want a T-shirt dangit.

WS-DeathStar

Tuesday, April 11, 2006

Crappy Release - events to the rescue

This past weekend we had a monthly release of an auto insurance system I spend a lot of my time working on.

It didn't go very well. The week leading up to the official code freeze uncovered some critical defects. Then we found some post code freeze. The team pulled it together and got the system in good enough shape to release.

The release was deployed to the production environment without incident on Saturday. I multitasked, showing Multnomah Falls to my aunt and uncle, who were in town from Sun Valley, ID, and occasionally glancing at the CrackBerry. We got through Monday with only one problem and it was "just" the monitoring software. So we'll fly blind a few days not knowing what Mercury Biz Activity Monitor says our performance is in Atlanta vs. California etc. Big deal right? My boss says we missed a Choice Point outage so actually it is somewhat of a big deal, but anyway ...

But today, the wheels fell off. 2 major problems were discovered.

This is going to be a total pain the rest of the week to resolve, but, the good thing is, the users of the system have no idea this is going on. The only reason they don't is because the system uses an event driven architecture. Yeah ok, they would have had the same result if the features in question were async, but we get bailed out by this a lot because most of what we do is async - event driven.

Typically, in this system, recoverable errors are rolled back to the ESB/messaging layer, where they are paused and then resent. Non-recoverable errors are simply routed to an appropriate error queue. We have an application that manages the error queues. Pretty standard stuff - browse all queues, browse specific queue, browse message, delete, re-send, edit-resend. This type of error handling gets it done quite nicely for 99.99% of our errors. But today the critical defect was causing threads to hang and events to hang up in 2 of the services. Hung threads don't roll back and don't route the poisoned message to an error queue.
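To make the two error paths concrete, here is a minimal in-memory sketch in Java. It is not our actual ESB code and it does not use a real JMS client. The class names, the RecoverableException marker type, the event names, and the cap on redelivery attempts are all assumptions made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical marker for errors that are worth retrying (e.g., a timeout). */
class RecoverableException extends RuntimeException {
    RecoverableException(String message) { super(message); }
}

/** In-memory stand-in for the messaging layer. Illustration only. */
class ErrorRoutingSketch {
    final Deque<String> inbound = new ArrayDeque<>();
    final List<String> processed = new ArrayList<>();
    final List<String> errorQueue = new ArrayList<>();
    private final Map<String, Integer> attempts = new HashMap<>();
    private final int maxAttempts; // assumption: give up after this many tries

    ErrorRoutingSketch(int maxAttempts) { this.maxAttempts = maxAttempts; }

    /** Delivers every queued event to the handler and applies the two error rules. */
    void drain(Consumer<String> handler) {
        while (!inbound.isEmpty()) {
            String event = inbound.pollFirst();
            try {
                handler.accept(event);
                processed.add(event);
            } catch (RecoverableException e) {
                // "Roll back": hand the event back for redelivery.
                // A real broker would pause before resending.
                int tries = attempts.merge(event, 1, Integer::sum);
                if (tries < maxAttempts) {
                    inbound.addLast(event);
                } else {
                    errorQueue.add(event);
                }
            } catch (RuntimeException e) {
                // Non-recoverable: route straight to the error queue.
                errorQueue.add(event);
            }
        }
    }

    /** Small scenario: one good event, one flaky event, one poisoned event. */
    static ErrorRoutingSketch demo() {
        ErrorRoutingSketch bus = new ErrorRoutingSketch(3);
        bus.inbound.add("ok");
        bus.inbound.add("flaky");
        bus.inbound.add("poison");
        Set<String> failedOnce = new HashSet<>();
        bus.drain(event -> {
            if (event.equals("poison")) {
                throw new IllegalStateException("bad payload");
            }
            if (event.equals("flaky") && failedOnce.add(event)) {
                throw new RecoverableException("temporary failure");
            }
        });
        return bus;
    }

    public static void main(String[] args) {
        ErrorRoutingSketch bus = demo();
        System.out.println("processed:   " + bus.processed);
        System.out.println("error queue: " + bus.errorQueue);
    }
}
```

Note what the sketch cannot show: a handler that never returns. If handler.accept hangs, neither catch block runs, which is exactly the hole we fell into with the hung threads.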

So .. what to do besides panic?

Luckily, as I stated above, events and more specifically guaranteed delivery & durable subscriptions came to the rescue anyway. From examining the thread dumps of one of the services, it looked like the service just woke up on the wrong side of the bed. So we restarted it. Whatever pissed it off before had passed ... it happily drained its durable subscription. Because its threads had been hung, it had not acked any of its events, so they were all automatically resent to the service. Now what if this wasn't event driven? What if it used web services (i.e., JAX-RPC or JAX-WS) ... you would be hosed ... yes that is exactly right. Or, your poor developers would have to account for this stuff in each of your service impls. Good luck getting that right.
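Here is a rough Java sketch of why the restart worked. It is a toy in-memory model, not a JMS provider: the DurableSubscriptionSketch class, its method names, and the event names are hypothetical. The only behavior it tries to capture is that the broker keeps every event until the subscriber acks it, so anything delivered to a hung service is simply delivered again on reconnect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a durable subscription with explicit acknowledgement. */
class DurableSubscriptionSketch {
    // Events stay here, in publish order, until the subscriber acks them.
    private final Map<Long, String> unacked = new LinkedHashMap<>();
    private long nextId = 1;

    /** Broker side: store the event for this subscriber and return its id. */
    long publish(String event) {
        long id = nextId++;
        unacked.put(id, event);
        return id;
    }

    /** Subscriber (re)connects: everything not yet acked is delivered again. */
    Map<Long, String> connect() {
        return new LinkedHashMap<>(unacked);
    }

    /** Subscriber confirms it finished processing the event. */
    void ack(long id) {
        unacked.remove(id);
    }

    int pending() {
        return unacked.size();
    }

    public static void main(String[] args) {
        DurableSubscriptionSketch sub = new DurableSubscriptionSketch();
        sub.publish("policy-created"); // hypothetical event names
        sub.publish("policy-rated");

        // First run: events are delivered, but the threads hang before any ack.
        sub.connect();
        System.out.println("pending after hang: " + sub.pending());

        // After the restart: the same events come back, and this time they are acked.
        List<String> handled = new ArrayList<>();
        for (Map.Entry<Long, String> delivery : sub.connect().entrySet()) {
            handled.add(delivery.getValue());
            sub.ack(delivery.getKey());
        }
        System.out.println("handled after restart: " + handled);
        System.out.println("pending after restart: " + sub.pending());
    }
}
```

With a request/response web service call there is no equivalent of the unacked map on the other side of the wire, so each caller would have to build its own retry bookkeeping.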

<sarcasm text="Remember, web services make interop / integration easier. You don't need all the complexity of any integration products ... just roll it all yourself. Interop is easy. You just need web services. The tool support is great."/>

Sadly, we did not get so lucky on our second problem. This one is going to leave a mark for some people for most of the week I imagine. Looks like we have to drain a durable subscription and purge out some poisoned messages and then redrive the good ones. And likely some other misery ... but without an event driven architecture and more specifically the goodness that guaranteed delivery and durable subscriptions provide, we'd be totally f'd.

Sunday, April 02, 2006

SOAP is the backbone - REST/POX is the eject button

Read a good post by Clemens Vasters, REST/POX with WCF: Version 2, Part 1: Foreword, via Stefan Tilkov.

Here are my thoughts on it.

Nice write up. I actually miss the old interop problems pre WS-*. I used to curse DCOM/Java interop (it really was horrible), but it is no worse than what we have today with WS-* IMHO. I understand how we got here … I think it is understandable. It is just really unfortunate. Perhaps if the XML Schema writers had focused on the 80/20 rule, & the WSDL folks toned it down a bit (oh how I miss the simplicity of CORBA IDL), we wouldn’t be in the complexity hell that we are in today, but maybe interop is just this hard regardless. I for one have to say that I’m pretty disappointed with where things are today, and don’t see a bright future for the course we are on. I just don’t buy the part you wrote about “…SOAP for building the backbone of it” in terms of interop. Yes, in theory it should be, but I think the complexity of it all, the industry politics, and the endless specification churn point to a pretty bleak future for SOAP and WS-*. I mean it is April 2006 – shouldn’t things be further along than they are now?

I think that sadly, WS-* is fatally flawed. Perhaps I am just jaded from trying to actually use it for interop or have a mental block when I stare at SOAP messages with oodles of headers. I know that HTTP Headers and SOAP Headers are at the end of the day essentially the same thing, but to me, it is just fundamentally wrong to have the headers in the XML document. To me, the XML document is the application developer’s domain – sticking that stuff there just invites panic.

Also, perhaps I am not informed, but are there any success stories out there on interop and WS-ReliableMessaging, AtomicTransactions, etc.? I’ve found it hard enough to make complex documents interoperate between .NET and Java using vanilla WSDL.

The stuff you guys are doing to make Indigo POX friendly is laudable. At least developers will have an eject button once they give up on building their “backbone”.