Development Blog

 Friday, June 01, 2007

OK, let me start out by saying that I'm hardly an expert on the subject of ORMs. I've only really used NHibernate, and I've still got plenty to learn about it. That said, I do know a bit about being lazy...

When working in any real application you quickly realize that actually exercising lazy loading is something that should probably be avoided at all costs. You want to fetch everything you need to handle a request in one go. Scott Bellware recently struggled with this issue as we all have, and the general consensus in the comments is that you should load what you need and not lazy load anything (unless of course that lazy load happens in a branch less often traveled). Udi Dahan provided a link to a rather elegant solution to the problem that I think I'm definitely going to explore.

Even with Udi's solution, you're still required to keep the fetching strategy in sync with the domain. This isn't an incredibly difficult thing to do, but it requires discipline and effort. I've been thinking for a while about this and a few other possible performance tweaks that can make ORMs easier to use and even a bit quicker than they are today.

Let's talk about another performance issue briefly. Say you want to display an email list containing all of your users. The list should display just their username and email address, nothing more. The easy way to fetch the data would be a "from UserEntity". But is that the most efficient? What if UserEntity has 10 fields? 20? 30? Is it worth explicitly querying for "select u.Username, u.Email from UserEntity u"? Let's find out. Being lazy, I simply borrowed James Avery's Northwind NHibernate example, ported it to NHibernate 1.2, and added a new mapping and class called TrimmedOrder. The class is exactly the same, but the mapping only contains the Freight and ShipName properties. I also commented out the many-to-ones in both Order and TrimmedOrder. You can get the source here.
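To make the comparison concrete, here's roughly what the two fetches look like (a sketch, not code from the sample; the open session and transaction handling are assumed):

```csharp
// Hedged sketch: full-entity fetch vs. the trimmed-mapping fetch.
// "session" is an open NHibernate ISession; entity names follow
// the Northwind example described above.
using (ITransaction tx = session.BeginTransaction())
{
    // Fetches every mapped column of Order
    IList<Order> full = session
        .CreateQuery("from Order")
        .List<Order>();

    // TrimmedOrder maps only Freight and ShipName, so NHibernate
    // selects just those two columns (plus the id)
    IList<TrimmedOrder> trimmed = session
        .CreateQuery("from TrimmedOrder")
        .List<TrimmedOrder>();

    tx.Commit();
}
```

Both queries hit the same Orders table; only the mapping, and therefore the generated select list, differs.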

The rest was just a few simple tests. One for reading, and one for writing. Here are the results:

              Order       TrimmedOrder   Performance Gain
  Read Test   18.62 ms    11.66 ms       1.6 times faster
  Write Test  227.13 ms   161.24 ms      1.41 times faster

Notes: The Read Test was averaged over 50 iterations and the Write test was averaged over 20 iterations. The db was queried for all orders once before anything was tested because I found the first query was always slower, but after that all queries seemed level. In the writing test, I wrote random values to the two properties mapped by TrimmedOrder. The Write Test values were adjusted by subtracting the read test values so that only the write itself was timed (in theory). 
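The measurement scheme described above looks something like this (a sketch with names of my own choosing, not the actual test code):

```csharp
// Sketch: one discarded warm-up query, then a Stopwatch averaged
// over the iterations; the write-only figure is the measured write
// average minus the read average.
Stopwatch watch = new Stopwatch();
long totalMs = 0;
const int iterations = 50;

// Discard the first query -- it was consistently slower
session.CreateQuery("from Order").List<Order>();

for (int i = 0; i < iterations; i++)
{
    watch.Reset();
    watch.Start();
    session.CreateQuery("from Order").List<Order>();
    watch.Stop();
    totalMs += watch.ElapsedMilliseconds;
}
double readAverage = (double)totalMs / iterations;
// The write test is averaged the same way, then adjusted:
// writeOnly = writeAverage - readAverage;
```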

Now, these gains aren't anything to write home about, but they are significant. Some of you may wonder why the Write Test sees any gain at all, given that the database calls would be exactly the same: NHibernate doesn't update columns that didn't change. Well, without profiling, I'm willing to bet the reason is simply that NHibernate has to discover which fields changed while flushing by testing them against their old values.

So now we have two problems. First, we have to know exactly what collections the domain needs ahead of time, or we end up lazy loading. Second, unless we use projection queries, we're not getting the best performance we can get. And since NHibernate doesn't currently support deferred (lazy) property loading, projection queries return object[]s rather than partially initialized entities, which means that as far as I know you cannot use projection queries to do updates. So we just can't solve the write problem without additional mappings or something else hackish.
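Here's the object[] limitation in concrete terms (a sketch; column order follows the select list):

```csharp
// A projection query returns raw rows, not entities the session
// tracks, so there's nothing for NHibernate to flush updates from.
IList rows = session
    .CreateQuery("select o.Freight, o.ShipName from Order o")
    .List();

foreach (object[] row in rows)
{
    decimal freight = (decimal)row[0];
    string shipName = (string)row[1];
    // Changing these locals updates nothing -- the session never
    // associated them with a persistent Order instance.
}
```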

What do we do? Yes, I'm finally nearing the point of this post. What if we taught the ORM to learn? Bear with me here. Let's go back to the email list example. If we were using Udi's Fetching Strategy approach, we'd do something like this:

IList<UserEntity> users = Repository<IEmailListService>.FindAll();

Which would ultimately get a FetchingStrategy associated with this service. Now, what if every object fetched by this FetchingStrategy were proxied and reported back to the FetchingStrategy about how it was used? The FetchingStrategy would remember how its objects were used, so the next time it was invoked it would return only what was used last time. It would keep learning and refining until it simply returned all the fields and collections that were ever accessed via that particular FetchingStrategy. If it ever returned too little, it would just lazy load whatever was missing and make a note to provide that field next time (this would be configurable, of course, so that a miss could instead load everything rather than hitting the db several times). This means the first time your FetchingStrategy was invoked it would be "slower", but almost every time after that it would have adapted and improved... all for mostly free. And yes, I know that Udi's FetchingStrategy would just return an HQL query and live closer to the domain objects; mine would live closer to NHibernate, and it would probably be responsible for actually querying from NHibernate (so it's probably really a Dao, but you get my point).
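A minimal sketch of the learning part, independent of NHibernate (every name here is hypothetical; none of these types exist in NHibernate): proxies call back on property access, and the strategy's accumulated history becomes the fetch plan for the next invocation.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of an adaptive strategy. Entity proxies call
// NotifyAccess from their property getters; the strategy turns its
// accumulated history into the next fetch plan.
public class AdaptiveFetchingStrategy
{
    // Every property ever accessed via objects this strategy fetched
    private readonly HashSet<string> observed = new HashSet<string>();

    public void NotifyAccess(string propertyName)
    {
        observed.Add(propertyName);
    }

    // First invocation: no history, so fetch everything.
    // Later invocations: project only what was actually used.
    public string BuildHql(string entityName)
    {
        if (observed.Count == 0)
            return "from " + entityName;

        StringBuilder hql = new StringBuilder("select ");
        bool first = true;
        foreach (string property in observed)
        {
            if (!first) hql.Append(", ");
            hql.Append("e.").Append(property);
            first = false;
        }
        return hql.Append(" from ").Append(entityName).Append(" e").ToString();
    }
}
```

The missing (and hard) pieces are exactly what the rest of this post is about: wiring the proxies in, handing partially initialized entities back out, and deciding what a cache miss does.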

There are probably a few other things this could help with too, such as automatically marking queries as readonly so as to avoid an unnecessary and costly flush (unfortunately this isn't currently supported anyway).

Caveats? Plenty I'm sure. Here's some I can think of:

  • There's the fact that it's not implemented. Jacob and I started to work on it and quickly decided it would be best to get lazy loaded properties hashed out first. It's probably a decent ways off the NHibernate radar, but maybe we can change that.
  • Then there's the proxy element. The Entities are often proxied normally to lazy load collections and such, so I don't think adding in reporting back to the FetchingStrategy would be a huge burden.
  • Complexity... yeah, this is complicated, but so are ORMs and software :)
  • Increased startup cost (kind of). You could always persist the strategies to help alleviate this a little...
  • Instance variables, access strategies, etc. Much like lazy loaded collections, restrictions would apply to lazy loaded properties and such. You'd probably need to just always load things that are field accessed, and you'd certainly have to avoid accessing the instance variables that back deferred properties.
  • Objects fetched by multiple strategies. This is a big one. There will be tons of questions when it comes time to solve this problem. If I access property A after an object is fetched from two different strategies, do I notify both strategies? Will data be fetched twice if both strategies need it?
  • Caching complications. Most of these would come from the lazy loaded properties, so that's something that'd have to be tackled eventually.

Now I'm just waiting for Gavin King or Sergey Koshcheyev to come tell me why this is a horrible idea, but until then, what do you all think?