Industrie Toulouse: Component Relational Mapping?

Those who follow this weblog for stories from the software development side (instead of the gut-level screams of anguish on the political side) know that Object-Relational systems have been a minor fixation of mine. A few weeks ago, I wrote up my own evaluation criteria covering what I was looking for. I wrote this up because there are a lot of O-R solutions available for Python, but very few that came close to fitting my situation. This is fine - there are different designs for different needs. But now, it's a few weeks later and I'm nipple deep in a project where we're trying out some new things in this area. This post will cover the design decisions and where the current successes (and potential issues) lie.

I've cut back on my desires for pure object-relational mapping. One of the reasons for this is that making the leap to object-relational, particularly with regard to existing applications, is a substantial one. It's also a big leap if you are using an application and persistence framework that puts its own expectations on the behavior of objects. Also, there may well be very good reasons for wanting to use a relational database, and the use of an O-R mapping layer may impede some of the RDBMS advantages.

Knowing the issues that I faced with wanting to go to a pure O-R system, I decided to reevaluate the situation. It came down to this:

Zope's SQL Methods are great... for reads. They enable the complex and adaptive queries that I often need in an application.
Maintaining insert and update statements, particularly during development, sucks. Lately I've often had the pattern of wrapping a Python script/method around a Zope SQL Method. The Python method would do a tiny bit of extra data preparation, but usually it would just pass the data through. On INSERT statements, the Python method would add the extra step of calling out to a UID generator service to create primary keys. Often, adding just one new column to the database would require touching at least four different scripts just to get the data into the database.
Our data entry ('admin') screens have often been weak. Much of the UI work (forms, validation, etc) has often been focused on the public side of an application. We needed those same form and data validation capabilities to be in the admin screens. Because, like the previous bullet, manually entering new HTML fields to respond to a new column just sucks.

With these bullets in mind, I then started to evaluate potential architectures. And I don't quite know how it all happened, but during my regular HBO Sunday night viewing on May 4, this new design just hit me. I believe most of the influences can be traced back to the designs and design patterns behind Zope 3 and Ape, along with some Martin Fowler writings (which are strong influences behind Ape as well). A simple overview of the architecture follows:

Simple Architectural Overview

So first, we have the AdminView. This is a Zope 3 "view component" inspired object which contains one or more Page Templates and methods to respond to input from those pages. The "Form" is built dynamically (typically at instantiation time, but in development mode it's rebuilt constantly) from a series of Field Definitions. This is similar to Zope 3's use of Schemas, which not only help define a data/attribute interface but are also used to generate management screens. For most purposes, adding a new column that doesn't require any extra domain logic now involves adding a single field definition to these forms.
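To make the "one field definition per column" idea concrete, here is a minimal sketch in plain Python. It does not use Formulator or Zope 3 Schemas; `FieldDefinition`, `build_form`, and the shoe fields are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical stand-in for one field definition (not Formulator's API)."""
    name: str              # form field name, assumed to match the column name
    kind: str              # e.g. 'string', 'float', 'date'
    required: bool = False
    title: str = ""


def build_form(definitions):
    """Build an ordered mapping of field name -> definition."""
    form = {}
    for definition in definitions:
        if definition.name in form:
            raise ValueError(f"duplicate field: {definition.name}")
        form[definition.name] = definition
    return form


SHOE_FIELDS = [
    FieldDefinition("name", "string", required=True, title="Name"),
    FieldDefinition("price", "float", required=True, title="Price"),
]

form = build_form(SHOE_FIELDS)

# A new column that needs no extra domain logic is one more definition.
form_with_color = build_form(
    SHOE_FIELDS + [FieldDefinition("color", "string", title="Color")]
)
```

Rebuilding the form from the list each time, as in development mode, is just another call to `build_form`.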

When the data from the form is submitted, the validation services of Formulator (the form engine being used) kick in. These help weed out potential errors from bad data, and also ensure that all incoming data that needs to go to the database is in the correct format (i.e., a float is a float is a float, a date is a date is a date, etc.). Further post-Formulator validation may also happen to handle multi-field situations. Then we ask for a gateway.
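The two validation passes described above might look roughly like the following. This is a plain-Python sketch, not Formulator's actual API: `ValidationError`, `validate`, `check_date_range`, and the field names are all assumptions made for the example.

```python
from datetime import datetime


class ValidationError(Exception):
    """Raised when submitted data cannot be cleaned (hypothetical)."""


def to_float(raw):
    try:
        return float(raw)
    except (TypeError, ValueError):
        raise ValidationError("not a number") from None


def to_date(raw):
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        raise ValidationError("not a YYYY-MM-DD date") from None


CONVERTERS = {"float": to_float, "date": to_date}


def validate(fields, raw):
    """Per-field pass: coerce each raw value to the type its field declares."""
    cleaned, errors = {}, {}
    for name, kind in fields.items():
        try:
            cleaned[name] = CONVERTERS[kind](raw.get(name))
        except ValidationError as exc:
            errors[name] = str(exc)
    if errors:
        raise ValidationError(errors)
    return cleaned


def check_date_range(cleaned):
    """Multi-field pass: runs only after every single field is clean."""
    if cleaned["on_sale_until"] < cleaned["on_sale_from"]:
        raise ValidationError("sale ends before it starts")
    return cleaned


FIELDS = {"price": "float", "on_sale_from": "date", "on_sale_until": "date"}
submitted = {
    "price": "39.99",
    "on_sale_from": "2003-05-04",
    "on_sale_until": "2003-06-01",
}
cleaned = check_date_range(validate(FIELDS, submitted))
```

By the time a gateway sees `cleaned`, every value already has the Python type the database layer expects.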

The AdminView is usually intimately involved with its form(s), but only loosely involved with the gateway. Gateways are looked up in registries, with the hope that gateways to other storage solutions can be swapped in for the same schema. By my understanding, this is similar to how Ape works. It's also similar to the Table Data Gateway, Row Data Gateway, and general Gateway patterns from Fowler's Patterns of Enterprise Application Architecture book. [note: I say similar because I myself don't own a copy of this book...yet]. So, a registry is asked for a gateway, usually in the form of: agate = core.getGatewayFor(self, 'SomeGateway'). The registry also returns the gateway wrapped in the context of the calling object. This enables use of one of Zope's most powerful features, Acquisition, allowing the gateway to access elements in the system near the caller. Typically this is to allow access to an already established database connection object, but it could allow access to other service objects as well.
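Here is a rough sketch of the registry lookup. Only the call shape `core.getGatewayFor(self, 'SomeGateway')` comes from the real system; `GatewayRegistry`, `ContextBoundGateway`, `Node`, and the `db_connection` attribute are hypothetical, and the parent-chain walk is only a crude stand-in for what Zope's Acquisition does, not the real mechanism.

```python
class GatewayRegistry:
    """Maps gateway names to factories (hypothetical sketch)."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def getGatewayFor(self, context, name):
        try:
            factory = self._factories[name]
        except KeyError:
            raise LookupError(f"no gateway registered as {name!r}") from None
        # Hand back the gateway bound to the object that asked for it.
        return factory(context)


class ContextBoundGateway:
    """Finds services by walking up from the caller (stand-in for Acquisition)."""

    def __init__(self, context):
        self.context = context

    @property
    def connection(self):
        obj = self.context
        while obj is not None:
            if hasattr(obj, "db_connection"):
                return obj.db_connection
            obj = getattr(obj, "parent", None)
        raise AttributeError("no db_connection found near the caller")


class Node:
    """Tiny containment tree so the example has something to walk."""

    def __init__(self, parent=None, **attrs):
        self.parent = parent
        self.__dict__.update(attrs)


core = GatewayRegistry()
core.register("SomeGateway", ContextBoundGateway)

site = Node(db_connection="connection-object")  # connection lives high in the tree
view = Node(parent=site)                        # the AdminView sits below it

agate = core.getGatewayFor(view, "SomeGateway")
```

Swapping storage back ends then means registering a different factory under the same name; the view's code does not change.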

Each gateway determines its own destiny. By that I mean, each gateway is different. Some only save data, others can save and read and clean data. Typically on a save operation a gateway will determine whether this is a new object to be inserted into the database, or whether it's going to be an update (yay!). Then it will clean the data passed into the save to ensure that the datatypes are formatted properly for the database (i.e., sql-quoting strings, formatting DateTime types, wrapping certain values in database functions like password()). When that's all prepared, a single method handle_op(...) is called with the operation to perform, the table name, the id column name, and the data to use for the operation (typically a dictionary of values to insert or update). An optional "wherespec" query may be passed in as well. handle_op then dispatches to an appropriate SQL method to dynamically generate the SQL statement required. The number one benefit of this - I don't have to update INSERT and UPDATE statements to deal with different column combinations! Another benefit is that certain gateways will be reusable in other situations across the application. My hope is that everywhere data manipulation needs to happen, these gateways will be able to do the job. This then leaves SQL methods with the primary responsibility of querying and accessing data, which they're very good at.
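The core trick, generating the INSERT or UPDATE from whatever dictionary arrives, can be sketched like this. It is not the actual implementation: the real handle_op dispatches to Zope SQL Methods and sql-quotes values, whereas this sketch assumes sqlite3 with bound parameters, a `conn` argument, a `wherespec` that is a trusted SQL fragment with no parameters, and the hypothetical helpers `build_statement` and `save`.

```python
import re
import sqlite3

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name):
    """Table and column names cannot be bound as parameters, so check them."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_statement(op, table, id_column, data, wherespec=None):
    """Generate SQL text and parameters from the keys present in `data`."""
    table, id_column = _ident(table), _ident(id_column)
    columns = [_ident(c) for c in data]
    if op == "insert":
        marks = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"
        return sql, [data[c] for c in columns]
    if op == "update":
        targets = [c for c in columns if c != id_column]
        if not targets:
            raise ValueError("nothing to update")
        assignments = ", ".join(f"{c} = ?" for c in targets)
        params = [data[c] for c in targets]
        if wherespec is None:
            wherespec = f"{id_column} = ?"
            params.append(data[id_column])
        return f"UPDATE {table} SET {assignments} WHERE {wherespec}", params
    raise ValueError(f"unknown operation: {op!r}")


def handle_op(conn, op, table, id_column, data, wherespec=None):
    sql, params = build_statement(op, table, id_column, data, wherespec)
    conn.execute(sql, params)


def save(conn, table, id_column, data):
    """Decide insert vs. update by checking whether the row already exists."""
    found = conn.execute(
        f"SELECT 1 FROM {_ident(table)} WHERE {_ident(id_column)} = ?",
        (data[id_column],),
    ).fetchone()
    handle_op(conn, "update" if found else "insert", table, id_column, data)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id TEXT PRIMARY KEY, name TEXT, price REAL)")
save(conn, "shoes", "id", {"id": "s1", "name": "Loafer", "price": 49.99})  # insert
save(conn, "shoes", "id", {"id": "s1", "price": 39.99})                    # update
row = conn.execute("SELECT id, name, price FROM shoes").fetchone()
```

Note that the second `save` touches only `price`: the statement follows the dictionary, so a new column never means editing a hand-written UPDATE.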

This is, of course, a simplified view of the overall architecture. But I must say that so far, it's been successful. It's responded well to the database schema changes that have come up already during development - in many cases dropping the column from the database and removing the mapped Field Definition has been enough. We already have a well-componentized system under CVS control, something that seemed impossible a year ago.

Something else I have learned (or at least come to accept) in recent months is that the data/model/business object layer of an application needs very little intelligence. It's all the components that work on that model that can get complex. It was in trying to figure out how to combine the complex business logic AND data into a single object in an O-R system that worked with Zope that I kept running into walls. By moving the business logic that needs to manipulate data out into separate components, it all becomes much easier. A shoe is a shoe is a shoe. It might know how to update its own price, but it shouldn't know the complex pricing rules that may exist governing it. Another subsystem or component can handle those rules, and then get around to saying "hey shoe, your price is now $39.99". That is - if the shoe needs to know its price at all. (wink) It's all so obvious, and it's all been in my head for years. But only recently has the logjam seemed to clear enough to actually get something done.
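The shoe example, sketched in a few lines of Python. `Shoe`, `ClearancePricing`, and the 20% discount are invented for illustration; the point is only where the rule lives.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Shoe:
    """Dumb model object: it holds data and can be told its new price."""
    name: str
    price: Decimal

    def set_price(self, price):
        self.price = price


class ClearancePricing:
    """Separate component that owns the pricing rule (hypothetical)."""

    def __init__(self, discount):
        self.discount = discount

    def apply(self, shoe):
        new_price = (shoe.price * (1 - self.discount)).quantize(Decimal("0.01"))
        # "hey shoe, your price is now ..."
        shoe.set_price(new_price)


shoe = Shoe("Loafer", Decimal("49.99"))
ClearancePricing(Decimal("0.20")).apply(shoe)
```

Replacing the rule means replacing `ClearancePricing`; the `Shoe` class never changes.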