Disclaimer The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.
© Copyright 2013, Eleutian Technology
E-mail
OK, let me start out by saying that I'm hardly an expert on the subject of ORMs. I've only really used NHibernate, and I've still got plenty to learn about it. That said, I do know a bit about being lazy...
When working in any real application you quickly realize that actually exercising lazy loading is something that should probably be avoided at all costs. You want to fetch everything you need to handle a request in one go. Scott Bellware recently struggled with this issue as we all have, and the general consensus in the comments is that you should load what you need and not lazy load anything (unless of course that lazy load happens in a branch less often traveled). Udi Dahan provided a link to a rather elegant solution to the problem that I think I'm definitely going to explore.
Even with Udi's solution, you're still required to keep the fetching strategy in sync with the domain. This isn't an incredibly difficult thing to do, but it requires discipline and effort. I've been thinking for a while about this and a few other possible performance tweaks that can make ORMs easier to use and even a bit quicker than they are today.
Let's talk about another performance issue briefly. Say you want to Display an email list containing all of your users. The list should display just their username and email address, nothing more. The easy way to do fetch the data would be to do a "from UserEntity". But is that the most efficient? What if UserEntity has 10 fields? 20? 30? Is it worth explicitly querying for "u.Username u.Email from UserEntity u"? Let's find out. Being lazy, I simply borrowed James Avery's Northwind NHibernate example, ported it to NHibernate 1.2 and added a new mapping and class called TrimmedOrder. The class is exactly the same, but the mapping only contains the Freight and ShipName properties. I also commented out the many-to-ones in both Order and TrimmedOrder. You can get the source here.
The rest was just a few simple tests. One for reading, and one for writing. Here are the results:
Notes: The Read Test was averaged over 50 iterations and the Write test was averaged over 20 iterations. The db was queried for all orders once before anything was tested because I found the first query was always slower, but after that all queries seemed level. In the writing test, I wrote random values to the two properties mapped by TrimmedOrder. The Write Test values were adjusted by subtracting the read test values so that only the write itself was timed (in theory).
Now, these gains aren't anything to write home about, but they are significant. Some of you may wonder why the Write Test sees any gain given that the database calls would be exactly the same, as NHibernate doesn't update columns that didn't change. Well, without profiling, I'm willing to bet that the reason is simply that NHibernate has discover which fields changed while flushing by testing them against their old values.
So now we have two problems: we have to know exactly what collections the domain needs ahead of time or we lazy load and unless we use projection queries, we're not getting the best performance we can get. Also, NHibernate doesn't currently support deferred (lazy) property loading, so projection queries return object[]'s rather than partially initialized entities, which means that as far as I know, you cannot use projection queries to do updates, so we just can't solve the Write problem without additional mappings or something else hackish.
What do we do? Yes, I'm finally nearing the point of this post. What if we taught the ORM to learn? Bear with me here. Let's go back to the email list example. If we were using Udi's Fetching Strategy approach, we'd do something like this:
IList<UserEntity> users = Repository<IEmailListService>.FindAll();
Which would ultimately get a FetchingStrategy associated with this service. Now what if every object fetched by this FetchingStrategy was proxied and would communicate back to the FetchingStrategy about its usages... and the FetchingStrategy remembered how its objects were used, so the next time it got invoked it would only return what was used last time... and it would keep learning, and refining so that it would simply return all the fields and collections that were ever accessed via that particular FetchingStrategy. If ever it returned too little, it would just lazy load what needed to be loaded and make note that it should provide that field next time (of course this would be configurable so that it would just load everything on a miss so you're not hitting the db several times in these cases). This would mean the first time your FetchingStrategy was invoked it would be "slower" but most every time after that it will have adapted and improved... all for mostly free. And yes, I know that Udi's FetchingStrategy would just return an hql query and live closer to the domain objects, but mine would live closer to NHibernate and it would probably be responsible for actually querying from NHibernate (so it's probably really a Dao, but you get my point.)
There are probably a few other things this could help with too, such as marking queries as readonly automatically so as to avoid an unnecessary and costly flush (unfortunately this is not currently supported any ways).
Caveats? Plenty I'm sure. Here's some I can think of:
Now I'm just waiting for Gavin King or Sergey Koshcheyev to come tell me why this is a horrible idea, but until then, what do you all think?