Industrie Toulouse: Evaluating Object-Relational Migration

Now that there are two well-designed frameworks that abstract persistence AND work with Zope (Ape 0.6, Modeling 0.9pre4), I can start evaluating their usefulness for migrating existing applications. Two projects that I am working on could benefit from moving to such a system. In this post, I just want to lay out some of the problems/issues/questions I am looking at when evaluating such a system. It's important to keep in mind that this applies to existing applications AND data. For a new application with little or no existing data, the evaluation would be a bit easier. With that in mind, let's begin.

1. Does it work with Zope?
This is an important one for me. It's not just that my existing applications are written in Zope; it's that Zope as a platform offers me many things that I take for granted, not the least of which is the built-in transaction management that I've been using since early 1997 when I first used Bobo. It's important that an O-R solution be able to tie in to Zope's transaction management, threads, connections, etc.

2. Does it keep the SQL out of the App?
EOF (Enterprise Objects Framework, part of WebObjects), Modeling (which bases its design on EOF), and Ape all abstract the storage away from the business objects. Other Python frameworks like SQLObject and PyDO bring some SQL to the object by injecting words like 'table' and 'column' (and sometimes things like 'varchar') into a class definition. This can often be quick and easy to work with, but does not fit the design that I (personally) prefer. In the middle (pun not initially intended) is Webware's MiddleKit. This evaluation point is important to me because, in programming the application, I don't want to think about how the data is stored. I care about that in a different way. I want the business objects to be business objects. It's a testament to Ape that it can be put into Zope, storing objects either as normal-looking files on the file system OR as rows in MySQL or Postgres, without modifying ANY existing Zope code, even though the way a Python Script, an Image, a Page Template, or anything else it has mappers for gets stored is completely different in each case. This is actually a testament to the ZODB system which, like MySQL, was engineered to have different storage solutions. Ape just gets in there at an incredible level. It also leads to another (smaller) wish - no generated Python code.

Note that insisting on this requirement has a significant impact on existing code.
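To make the distinction concrete, here is a minimal sketch of the style I prefer, using only the standard library's sqlite3 module. The `RECEIPT_MAPPING` dictionary and the `store` and `load_all` helpers are hypothetical names made up for illustration. They are not the API or model format of Ape, Modeling, or EOF. The point is only that the `Receipt` class never mentions a table or a column.

```python
import sqlite3


class Receipt:
    """Plain business object: no table or column names in sight."""

    def __init__(self, number, total):
        self.number = number
        self.total = total


# Hypothetical mapping kept apart from the class (not any framework's format).
RECEIPT_MAPPING = {
    "class": Receipt,
    "table": "receipt",
    "columns": {"number": "number", "total": "total"},  # attribute -> column
}


def store(conn, obj, mapping):
    """Insert obj using only what the mapping says about storage."""
    attrs = list(mapping["columns"])
    cols = ", ".join(mapping["columns"][a] for a in attrs)
    marks = ", ".join("?" for _ in attrs)
    conn.execute(
        f"INSERT INTO {mapping['table']} ({cols}) VALUES ({marks})",
        [getattr(obj, a) for a in attrs],
    )


def load_all(conn, mapping):
    """Rebuild business objects from rows, again driven by the mapping."""
    attrs = list(mapping["columns"])
    cols = ", ".join(mapping["columns"][a] for a in attrs)
    rows = conn.execute(f"SELECT {cols} FROM {mapping['table']}")
    return [mapping["class"](**dict(zip(attrs, row))) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE receipt (number INTEGER, total REAL)")
    store(conn, Receipt(1, 19.5), RECEIPT_MAPPING)
    print([(r.number, r.total) for r in load_all(conn, RECEIPT_MAPPING)])
```

Switching the back end would mean changing the mapping and the two helpers, not the class.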

3. Querying/Fetching
This is where Ape is weaker, but considering that the ZODB has no built-in query language it's a bit understandable. This is also where the design of an adaptable persistence layer shows itself. Apple's Enterprise Objects Framework excels at this, since it's meant to store against any back end (mapping to relational tables is just the most common - but storage systems exist for LDAP directories and flat files as well), and that back end should be able to be switched with no impact on an application's design. Modeling looks like it's trying to follow suit. SQL is very expressive in its query model, but it allows for read queries that don't necessarily map to objects that are easily writable (joins, joins, joins, and more joins contribute to this problem). This is where it's especially difficult to migrate a direct-SQL application to an O-R mapped one - clients (page templates, scripts, other code) and other system components are expecting a certain data layout.

It's also preferable that a query spanning multiple tables not return a single combined object if that data is expected to be modified in any way in the present transaction. Thus, when fetching Receipts, which reference a Purchaser table, and wanting only the Receipts where the Purchaser's last name is 'Shell', you should be able to perform such an operation, but get back just the root Receipt object(s). (Naturally, only the ones that match.) I believe EOF/Modeling allow for this and yield the results I'm expecting (though I can't explain how right now). And, naturally, it should do this in as few SQL calls as possible. This all leads to the next point:
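A rough sketch of that Receipt/Purchaser fetch, again with plain sqlite3. The table layout, the `receipts_for_last_name` function, and the `demo_connection` helper are assumptions invented for this example. This is not how EOF or Modeling express a fetch. The join is used only to filter, only receipt columns are selected, and everything happens in one SQL call.

```python
import sqlite3


class Receipt:
    def __init__(self, receipt_id, total, purchaser_id):
        self.receipt_id = receipt_id
        self.total = total
        self.purchaser_id = purchaser_id


def receipts_for_last_name(conn, last_name):
    """Return root Receipt objects whose purchaser has the given last name.

    Each result corresponds to exactly one receipt row, so it stays
    straightforward to write back later.
    """
    rows = conn.execute(
        "SELECT r.id, r.total, r.purchaser_id "
        "FROM receipt r JOIN purchaser p ON p.id = r.purchaser_id "
        "WHERE p.last_name = ? ORDER BY r.id",
        (last_name,),
    )
    return [Receipt(*row) for row in rows]


def demo_connection():
    """Build a throwaway in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchaser (id INTEGER PRIMARY KEY, last_name TEXT);
        CREATE TABLE receipt (id INTEGER PRIMARY KEY, total REAL,
                              purchaser_id INTEGER REFERENCES purchaser(id));
        INSERT INTO purchaser VALUES (1, 'Shell'), (2, 'Jones');
        INSERT INTO receipt VALUES (1, 10.0, 1), (2, 5.0, 2), (3, 7.5, 1);
        """
    )
    return conn


if __name__ == "__main__":
    found = receipts_for_last_name(demo_connection(), "Shell")
    print([r.receipt_id for r in found])
```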

4. References
I don't just mean relationships between tables, but keeping the right relationships between objects when they're loaded into memory. You want to ensure that if Receipt A and Receipt B both point to Purchaser Bob in the database, that the following results happen (in the same transaction, without extraneous database reads):

    in: A.purchaser.name
    out: 'Bob'
   
    in: B.purchaser.name
    out: 'Bob'

    in: A.purchaser.name = 'Robert'
    
    in: B.purchaser.name
    out: 'Robert'

EOF and Modeling deal with this using EditingContexts (see "Ensuring unicity of an object"). SQLObject also offers this.
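The behavior above can be sketched with a small per-transaction cache, often called an identity map. The `Session` class below is hypothetical and is not the EditingContext API of EOF or Modeling, nor SQLObject's cache. It only illustrates the idea that a row is turned into an object once and every later reference gets that same object.

```python
import sqlite3


class Purchaser:
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


class Receipt:
    def __init__(self, pk, purchaser):
        self.pk = pk
        self.purchaser = purchaser


class Session:
    """Hypothetical per-transaction cache keyed by primary key."""

    def __init__(self, conn):
        self.conn = conn
        self._purchasers = {}      # primary key -> the one Purchaser instance
        self.purchaser_reads = 0   # counts actual database reads

    def purchaser(self, pk):
        if pk not in self._purchasers:
            self.purchaser_reads += 1
            row = self.conn.execute(
                "SELECT id, name FROM purchaser WHERE id = ?", (pk,)
            ).fetchone()
            self._purchasers[pk] = Purchaser(*row)
        return self._purchasers[pk]

    def receipt(self, pk):
        row = self.conn.execute(
            "SELECT id, purchaser_id FROM receipt WHERE id = ?", (pk,)
        ).fetchone()
        return Receipt(row[0], self.purchaser(row[1]))


def demo_connection():
    """Build a throwaway in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchaser (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE receipt (id INTEGER PRIMARY KEY, purchaser_id INTEGER);
        INSERT INTO purchaser VALUES (1, 'Bob');
        INSERT INTO receipt VALUES (1, 1), (2, 1);
        """
    )
    return conn
```

With this in place, loading receipts 1 and 2 through the same `Session` gives two Receipt objects that share one Purchaser, so renaming it through A is visible through B without a second read of the purchaser row.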

However, Zope/ZODB + Ape is a little funnier. The ZODB really seems to prefer having only one persistent reference to an object. Relationships between objects, thus, have always felt like a strong point in relational databases, and a rather weak point in object databases. This is a problem that has plagued some Zope 2 developers (myself included). Even in Zope 3, complex relationships are handled by application-level relationship services. I wonder if such relationships can be (or are) handled via weakrefs. I think that Ape, even in its Zope 2 implementation, might allow designing gateways that can "do the right thing". This wouldn't be a problem (as much) under a different application design, but in my apps, sometimes a Receipt is the root object that I might be working from, sometimes it might be a Person (referring to receipts, which might refer back to that person). There are few root objects, but a lot of relations - even between the potential root objects. So being able to fetch the right ones, without pulling any duplicates (different database reads of the same row of the same table), and with relative efficiency, is important.

Now, in favor of one of the applications that I have to deal with, a lot of the really complex reads are already abstracted out into special Executable Components (they were previously Python Scripts, but they've been abstracted into classes with configurable run-time behavior). These executables are basically mapped one-to-one with a particular page template, preloading a lot of the data needed so that the amount of logic (and number of queries) on the target template stays at a bare minimum. These executables may be relatively easy to convert to special Gateway objects for Ape, or they may be partly moved to a Gateway object and partly kept as they are - only modified to talk to the persistence system instead of firing off SQL directly.

Wrapup...for now
The Jazz are losing, the beer's settling in, it's time to wrap this up...for now. I'm really looking forward to being able to use Ape. Actually, we're already looking at using it to be able to copy code out to CVS thanks to Ape's file system storage (which nicely represents objects as naturally as possible, while still finding ways of storing properties and extra persistent data). But the real clincher for me would be to get a lot of the redundant-feeling create/read/update/delete SQL (and scripts that may show up marginally in front of it) out of the application and into a mapping layer, so that the application can focus on doing its job better with fewer worries about the data. Since Ape is pretty open about how you can write Schema objects, hopefully that means we can bring Formulator into the equation and be able to automatically generate management and administrative screens and do all that cozy little validation that Formulator provides, similar to what Zope 3's Schema/Form system gives. But it is going to be a fair amount of work to move existing straight-SQL code over.