Development Blog

 Saturday, August 25, 2007

First I want to admit something somewhat embarrassing: I don't know enough about databases. Until coming here, I've always had a DBA who could handle that. I've started to realize now how important it is for all developers to have a much better understanding of databases than I do. So I picked up this book. I was really looking to better understand optimization, especially knowing when to create indexes and how to optimize queries. Though the book only had a few chapters on the subject, I felt like it greatly increased my understanding of it.

A few months ago I wrote about Adaptive Fetching Strategies. At that time I had little knowledge about covering indexes and how they improve performance. Oh well, it's just more reason something like that should be implemented. Before we even get to that however, NHibernate needs to have a few features added to it in order to be useful in scenarios where performance is important.

Remember folks, SELECT * is bad. When it comes to index coverage, NHibernate is effectively doing a SELECT * whenever you ask for an object. Yes, it specifies columns, so it doesn't have all the problems associated with SELECT *, but it is still decidedly less performant than querying for only what you want. Compound this with the extra overhead the unneeded columns add during a flush, and you've got a pretty compelling argument to only query for what you need.

OK, so you've decided that you only want to query for what you need and you're using NHibernate. Well, you can do that... kind of. Not much differently than you could with plain ADO.NET and DataSets, though. NHibernate doesn't support lazy field initialization. This means that if you query for only the username and email address from your user table, you don't get back a User. You get back a List of object[]'s. Arguably a list of object[]'s is less functional than a DataSet. Combine this with the fact that you're querying using HQL, which has a subset of the functionality, predictability, and therefore optimizability of real SQL, and you start to see a big hole in this whole ORM thing... if you want to optimize your pages. Yes, HQL does make queries prettier and more in the domain, but it would be nice if it supported things like "on".
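To make that concrete, here's a minimal sketch of what a projection query gets you today (NHibernate 1.2-era API; the entity and property names are just illustrative):

  // A projection query hands back raw rows, not User entities.
  IList rows = session
    .CreateQuery("select u.Username, u.Email from User u")
    .List();

  foreach (object[] row in rows)
  {
    string username = (string) row[0];
    string email = (string) row[1];
    // No User entity here--just values to shuffle into a display model.
  }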

Obviously optimizing too early is evil, but we've had several pages now where we've needed to optimize, and we've just resorted to querying for a table of values. No longer are we querying for objects; that's just not performant enough, especially in display scenarios where you're not making changes to anything. NHibernate lacks the ability to do read-only queries, so you end up with that (very) expensive flush unless you change your flushing strategy to manual (awkward) or you detach your objects from the session (also awkward). Both workarounds are sketched below.
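For reference, roughly what those two awkward workarounds look like (a sketch assuming an open ISession and a loaded list of User objects):

  // Option 1: switch the session to manual flushing so the read-only
  // page never triggers a flush.
  session.FlushMode = FlushMode.Never;

  // Option 2: detach each loaded object so it's no longer dirty-checked.
  foreach (User user in users)
  {
    session.Evict(user);
  }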

So in short, I feel NHibernate (and any ORM for that matter) needs the following features to really be optimization friendly:

  • Lazy field initialization
  • Querying for partial objects: select u(Username, Email) from User u
  • Read-only queries that do not get flushed.
  • Join qualifiers (on in T-SQL)

And yeah, I know it's open source and I could just do it myself, but I have nightmares about that codebase, and I hardly have the time to implement such large features. All I have the time to do is complain and wish :)

by Aaron on Saturday, August 25, 2007 11:28:29 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [0]  |  Trackback
 Saturday, August 18, 2007

There's a good back and forth going on about something I've been thinking about for quite a while. Jacob Proffitt is basically claiming that DI's primary benefit is mockability in unit tests and that we should all but ditch it in favor of using TypeMock to mock our tests. Meanwhile, Ayende and Nate Kohari have been defending DI with reasons like "getting low coupling between our objects" (Ayende) and "simply put, dependency injection makes your code easier to change" (Nate).

Well, I think I'm just going to have to agree with both of them... to an extent. I agree that DI promotes loose coupling, but I disagree that TypeMock is too powerful (even though I said just that in our podcast with Hanselman, I'm allowed to change my mind). I think that DI has its place, and is not a Silver Bullet. There have been a number of times when I was refactoring a class, pulling out a method object, or moving something to a sub-service (in my attempts to adhere to the Single Responsibility Principle), that I've questioned whether or not that new class warrants all of the following:

  1. Having an interface.
  2. Being added to the container and being injected.
  3. Writing new tests for just that, even though tests were already passing and will continue to pass for its parent object with the same coverage %.
  4. Being mocked out of the original tests.

The problem with doing #4 with a framework like Rhino Mocks is that it requires you to do #3 (obviously) as well as #1 and #2. You can't mock a class unless its methods are virtual or it's behind an interface, and that mock has to be injected (the ceremony is sketched below). With TypeMock I can do #3 and #4 without worrying about adding my new class to the container. So why not add it to the container? Well, because a great deal of the time You Aren't Gonna Need It. I think once one other class takes a dependency on that service you have enough justification to control its creation in one place by adding it to the container. However, a great deal of the time these one-off objects you create to increase readability and maintainability just end up either changing the way you test your objects, in that you're testing more than one class at a time, or leaving you with an interface and constructor argument explosion solely so that you can mock them in tests. In situations like that, Jacob is right--the only real benefit is mockability. Maybe we should consider using something like TypeMock in these situations? Just because something is powerful doesn't mean it's evil. We just need to exercise more caution and use it only when it's warranted.
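Here's that ceremony in miniature (all names are made up for illustration): to mock the new one-off collaborator with Rhino Mocks, it grows an interface (#1) and a constructor argument for injection (#2), even if nothing else will ever depend on it.

  // Hypothetical one-off service pulled out for readability.
  public interface IPriceCalculator
  {
    decimal Calculate(Order order);
  }

  public class OrderService
  {
    private readonly IPriceCalculator _calculator;

    // The constructor grows an argument solely so tests can inject a mock.
    public OrderService(IPriceCalculator calculator)
    {
      _calculator = calculator;
    }
  }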

That said, I've never actually used TypeMock and this is all purely conjecture based on what I've read about it. I just think that we can learn something from the Ruby guys and from people with the viewpoints of Eli and Jacob, as long as everyone realizes that neither camp is producing Silver Bullets.

by Aaron on Saturday, August 18, 2007 12:11:35 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback
 Sunday, August 05, 2007

Jacob posted about the AutoMockingContainer several months ago. At that time we didn't really use it; it was just kind of an implementation of an idea. Well, we've finally started using it in some side projects (ReSharper.TestDrive, for example), and I must say... wow. It is most definitely the way to instantiate your subject under test most of the time. Why?

  1. It decouples your tests from your constructors. This means that if you have multiple TestFixtures for a class and you want to add a new service to your constructor, you don't have to change a thing in your tests.
  2. It simplifies your tests. Things are just cleaner when you're not having to create all your mock services to inject into your subject under test.
  3. It helps reinforce good mock usage. The default mock strategy is dynamic mocks. You can override that if you want to, but most tests should (in my opinion) be written with dynamic mocks. As Dave talks about, you only really want to set actual expectations on zero or one mock at a time. Everything else should be more stub-like.

I've started to use a base class for all my tests. Let's take a look at ReSharper.TestDrive's test base class:

  public abstract class AutoMockingTests 
  {
    private MockRepository _mocks;
    private AutoMockingContainer _container;

    protected MockRepository Mocks
    {
      get { return _mocks; }
    }

    protected AutoMockingContainer Container
    {
      get { return _container; }
    }

    [SetUp]
    public void BaseSetup()
    {
      _mocks = new MockRepository();
      _container = new AutoMockingContainer(_mocks);
      _container.Initialize();
      Setup();
    }

    public abstract void Setup();

    public T Create<T>()
    {
      return _container.Create<T>();
    }

    public T Mock<T>() where T : class
    {
      return _container.Get<T>();
    }

    public void Provide<TService, TImplementation>()
    {
      _container.AddComponent(typeof(TImplementation).FullName, typeof(TService), typeof(TImplementation));
    }

    public void Provide<TService>(object instance)
    {
      _container.Kernel.AddComponentInstance(instance.GetType().FullName, typeof(TService), instance);
    }
  }

So what's a test look like with this base class? Let's borrow Dave's example.

  public class SearchPresenterTests : AutoMockingTests
  {
    private SearchPresenter _presenter;
    private SearchResultDTO _fakeResults;

    public override void Setup()
    {
      this._fakeResults = new SearchResultDTO();
      this._presenter = Create<SearchPresenter>();
    }

    [Test]
    public void Can_search_for_customers_by_number_of_orders()
    {
      using (Mocks.Record())
      {
        Expect
         .Call(Mock<ISearchService>().GetCustomersByOrderCount(42))
         .Return(this._fakeResults);
      }

      using (Mocks.Playback())
      {
        _presenter.SearchByOrderCount(42);
      }
    }

    [Test]
    public void Search_results_are_displayed_to_the_user()
    {
      using (Mocks.Record())
      {
        // Assuming the presenter's view is behind an ISearchView that the
        // container hands back as a mock (the original example used a
        // separately created mock view).
        Mock<ISearchView>().SearchResults = _fakeResults;
        SetupResult
         .For(Mock<ISearchService>().GetCustomersByOrderCount(42))
         .Return(_fakeResults);
      }

      using (Mocks.Playback())
      {
        _presenter.SearchByOrderCount(42);
      }
    }
  }

Not bad, eh? You can do some more complicated things too. Let's say all your presenters take a hub service called PresenterService. Rather than mocking it and its child services and setting up expectations for each of the children, you can just use the real one and do this:

      Provide<IPresenterService>(Create<PresenterService>());
      this._presenter = Create<SearchPresenter>();

Now you can refer to all your hub's child services with the Mock<T>() method.
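For example, if PresenterService exposes a navigation child service (INavigationService and its method are hypothetical names here), a test can stub it directly inside its record block:

      SetupResult
       .For(Mock<INavigationService>().GetCurrentPage())
       .Return("search");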

Ok, so if you made it this far you probably want to check it out for yourself. Thanks to Ayende, the AMC is now part of Rhino.Tools, so you can check it out (svn co https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk rhino-tools) and build it yourself, or just grab the current trunk build with all the dependencies here. Hope Oren doesn't mind me building and linking this... ;)

by Aaron on Sunday, August 05, 2007 11:47:22 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [7]  |  Trackback
 Monday, July 30, 2007

Writing ReSharper.TestDrive was kind of an exercise in tiny classes for me. I didn't TDD the whole thing because so much of it was just experimenting with ReSharper's mostly undocumented API and EnvDTE, though I did TDD a good portion of it after my initial spike. After I got a working prototype implemented, I spent a lot of time refactoring it into tiny classes that, for the most part, follow the Single Responsibility Principle.

As this is the most code I've ever thrown out into the public at any one time and it was a bit of an experiment for me, I wanted to take this chance to ask the community to review my code. If you have the time, feel free to look over the code and tell me what you think. Likes/dislikes/hates/loves anything is fair game--feel free to rip on it. Maybe we'll get some interesting discussion out of it.

Source

by Aaron on Monday, July 30, 2007 9:54:52 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [2]  |  Trackback

Note: You must have ReSharper 3.0.1 in order for this to work.

When you're doing TDD you'll create two classes every time you need one: one for the class itself and one for its tests. ReSharper makes it a little bit easier by allowing you to write your tests and then Alt+Enter on your class under test to create it. Unfortunately it will create the class in your test project and not your project under test. It may actually create it in that same file (I don't remember), which means you have to Move to File. Then you have to drag it over to the project under test and/or change the namespace. Pretty obnoxious for something we have to do so often.

So... I decided to write a ReSharper plugin to do just that. It'll also create tests from a class under test (just in case you cheated and created your class first). Heck, it'll even create all the folders you need.

This current version makes a few assumptions about your structure and it's not configurable at all unless you actually hit the code. Here are the assumptions it makes:

  • The tests for Project.Foo live in Project.Foo.Tests.
  • Test classes have the "Tests" suffix.
  • Test classes live in the same namespace as the classes under test.
  • The tests for ClassFoo are in ClassFooTests.

I lied when I said it wasn't configurable at all. After you've used it for the first time it will have created ReSharper file templates that you can edit to customize what is generated when you create a test or a class under test. Just go to ReSharper>Options>Templates>FileTemplates>User Templates>TestDrive.

To install it just extract it somewhere and run install.cmd, or just copy the dll to your %APPDATA%\JetBrains\ReSharper\v3.0\vs8.0\Plugins\Resharper.TestDrive (obviously you'll need ReSharper installed).

To use it, just have the cursor somewhere within a class that doesn't have a test, or a test that doesn't have a class under test and hit Alt+Enter, select Create X Tests... or create X... and hit enter. In order for the light bulb/alt+enter to show up it will need to be able to find an associated project to the one you're in (Sample.Project.Tests<->Sample.Project).

I've got more plans for this but I wanted to get it out there to see what you all thought. Oh, and in case you're wondering, Bunker is just a really light nearly feature-free IoC container that Jacob wrote in a day for another project we're working on.

Binaries
Source

by Aaron on Monday, July 30, 2007 9:14:15 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback
 Monday, July 16, 2007

So I recently started using the Early Access edition of ThoughtWorks' Mingle. Let me start out by saying I'm impressed. Mingle is a very clean, very powerful and, most importantly, very flexible approach to project management. We all have our own approaches to managing projects, and Mingle can accommodate many of them. I've spent a few days over the last week configuring our Mingle project to map to our still-developing process. It hasn't been entirely smooth sailing, but it's slightly less than an R1 product so I won't complain... too much :) Here's a list of my thoughts thus far:

  1. Installation - I've done an install on both Windows and Linux. Both were relatively painless. The Linux install documentation was missing a few steps, but that has since been fixed. I wish that Mingle could run on IIS and/or Apache. It's a bit annoying that I have to run it on a different port from my primary web server, though Jacob tells me we can proxy a port with Apache, so that'd be nice.
  2. Creating a new project - The existing templates are nice to allow you to see what Mingle is capable of. I wouldn't recommend creating your real project using one of the existing templates as it leaves you with a lot of undeletable properties you don't need. Start out by creating three test projects, one using each of the three templates and play around with them until you know what you want to go into yours. Then create yours from a Blank Project.
  3. Card properties - It's really nice to be able to specify exactly what properties you want on each of your cards. The property creation UI could use some help, though; it'd be nice if we could specify the values we want at the same time we're creating the property--but that's just a minor nitpick.
  4. Wiki - Mingle allows you to create custom wiki pages that have nifty things like bar graphs and pie charts and tables. My biggest complaint about the wiki is the lack of documentation. Not all of the samples in the documentation seem to work and the syntax is a little confusing... plus, if you type something wrong, the error messages you get back are less than useless. Fortunately I've managed to get most of the things I want in our overview wiki, but it took much longer than it should have. Also, there's no way at the moment to do a burndown chart, which would certainly be a nice thing to have for a lot of the agile methodologies out there.
  5. List view - Mingle has some really simple filtering options for its two views. You can basically choose a value for each of your properties (or any) and add tags you'd like to filter by as well. Once you have a view you like, you can save that view. If you really like the view you can add it as a tab at the top for the whole team to see. Unfortunately there are some limitations: there's no way to give an OR for two property values, and there's no way to specify that you only want to see cards with a property that has a null value. I'm hoping they will add these things eventually. There's also no way to specify "Current User" in the view... which makes views like "My Bugs" out of reach. Also, there's currently a max page size of 25, so if you want to do batch operations you can only do them 25 at a time. That's pretty annoying. In the meantime I can get *most* of my views the way I want them, but not all of them.
  6. Grid view - OK, this is pretty cool. You can pick a property to group by, and pick a property to color by. Of course you can specify the same filters as you can in the list (so the same limitations). What's nice, though, is you can drag those cards around into different values for that grouped property. You can move a Story Card from New to Open just by dragging it. Pretty slick if you ask me. This allows you to make a number of cool views. You can create an Assignment view that lets you drag unassigned cards to ready and willing developers. You can create an Iteration planning view that allows you to drag cards around to different Iterations. Or even an Estimation view so you can drag your cards to different story point values. If you need more detail on a card, you can always just click it. It'll pop up a nice little window with more information so you don't have to navigate away from the page. All in all, pretty fancy. It would be nice if the lanes were colored... if you have a long list it'd be easier to tell where you're dragging to when you're at the bottom. Oh, and where's the sort? I should be able to sort the cards in the grid.
  7. Transitions - These are cool too. They allow you to add buttons to cards that meet specific preconditions. Clicking that button will set the properties you specify in the transition. Pretty slick for adding a bit of workflow to your process. Unfortunately there are a few things missing from these as well. You can't specify "Current User" in either the preconditions or the set area. This means you can't do things like automatically set the Approved By field when something is Approved. You also can't say that only the user a Card is Assigned To can Resolve it. Just adding that feature would make it so much more powerful. The other thing that is lacking is that the only way to "trigger" these transitions is to actually click on the transition button. This means that dragging things around your swim lanes won't trigger them. Yes, that complicates things, but it'd be nice to have some sort of trigger criteria... or even a view that shows "transitionable" cards that you can drag to transition them. Obviously I'm rambling now, but I think this is a cool idea that can be expanded upon.
  8. SVN Integration - Right now we use Trac as a wiki and to explore our SVN revisions. Mingle's SVN integration looks pretty slick but is kind of broken. Take a look at this diff to the right. See all that white? Why is that white? It's not blank lines when you do an svn diff... where's my context? I don't know if this is a bug or what, but it's annoying. Other than that, Mingle's SVN integration is reasonable. I like the fact that you can define keywords so that when you add things like "#245" or "card 245" in your commit log it will automatically link that commit to the specified card.
  9. Support - Eh... It seems the best point of contact with support is either email (which I haven't had a high enough priority issue to try) or their forum. The forum is painfully slow most of the day (read: unusable, but who knows if that's a routing issue between me and them or just their forum server/software) and doesn't have a ton of activity. A few of the bugs I reported were acknowledged via email so that's good, but who knows how seriously the suggestions are taken. There's only one guy moderating the forum so I'm sure he's busy, but it would be nice to see a bit more activity and feedback.
  10. Overall impressions - It's slick. It's flexible. It's... a little slow. Loading our Bug Board view just now took 5.71 seconds and 51 requests. Eek. Granted, I'm not local right now, but it's not much faster when I'm on my LAN. That said, we're using it. We moved all our bugs off of FogBugz and we're going to give Mingle a shot for now. It's free for the first 5 users, so why not?
  11. Wish list -
    • The greatest thing about FogBugz is its email integration. It can check a mail account you specify and automatically create tickets for messages in that inbox. You can reply to the senders of that mail via the FogBugz interface, and you can receive more correspondence through email from the original sender. It's pretty slick and great for support. Right now we're just using FogBugz for this purpose: handling support requests. If something is upgraded to a bug we'll move it to Mingle... but the copy/paste operation will probably get tedious, so it'd be nice to have this functionality built into Mingle as well.
    • I want to be able to show/hide properties based on another property such as Type. If my Type is Bug, I should see Bug Status and not Story Status. Right now all cards have all properties even if they aren't relevant. It's fine that they have them, because I could change the type, but I don't want to see them.
    • I also want to create Backlogs--Priority Queues. Having just a property with a fixed set of numbers for Priority is annoying at best. It's much more natural to keep things in a sorted order. I want to be able to create a new backlog with a specific filter and then drag my cards into the order that I want, and then sort by that order. That way I can pop things off of that stack in order, and as new cards come in I can insert them. I'm very tempted to write this feature on my own...
    • Triggers--I mentioned this before, but it's worth mentioning again. I'd like to be able to specify what happens when I drag a card from one lane to another in a Grid view. I'd also like to be able to gate that dragging... something can't be dragged to Ready for Development until it has an Estimate set. Now that I think more about it, I think the latter would be more useful. I want to be able to specify constraints on the Cards that will be enforced by all views and editing techniques.

Alright, that's more than enough rambling for now. I'd strongly urge you all to go take a look at Mingle. It's got a ton of potential and I think it's definitely headed in the right direction. Remember, it's free for Open Source and it's free for the first five users of a Commercial project.

by Aaron on Monday, July 16, 2007 6:56:44 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback
 Tuesday, June 26, 2007

Generally when I'm testing something that has dependencies my test fixture looks something like this:

  [TestFixture]
  public class FooTests
  {
    private MockRepository _mocks;
    private Foo _foo;
    private IBar _bar;

    [SetUp]
    public void Setup()
    {
      _mocks = new MockRepository();
      _bar = _mocks.DynamicMock<IBar>();
      _foo = new Foo(_bar);
    }

    [Test]
    public void Test()
    {
      using (_mocks.Unordered())
      {
        SetupResult.For(_bar.ProvideService()).Return(1);
      }
      _mocks.ReplayAll();

      Assert.IsTrue(_foo.DoSomething());
      _mocks.VerifyAll();
    }
  }

Granted, I'm starting to use our AutoMocking Container more and more, but that's not the point of this post. As an aside, Ayende has some more examples on its usage and has added it to Rhino.Tools.

Anyways, back to the subject at hand. The above works great if nothing really happens in your Foo constructor. But what happens if part of the constructor is to call a method on IBar? Well surely you could just move the construction to the Test method:

    [Test]
    public void Test()
    {
      using (_mocks.Unordered())
      {
        _bar.DoSomeSetup();
        SetupResult.For(_bar.ProvideService()).Return(1);
      }
      _mocks.ReplayAll();
      _foo = new Foo(_bar);

      Assert.IsTrue(_foo.DoSomething());
      _mocks.VerifyAll();
    }

That works, but what if you have more than one test? Well, you could just do the same thing in every test, but that's an annoying amount of repeated code that doesn't add any value. You can't simply Extract Method on it because you'll most likely need to set up other expectations for other scenarios you're testing. So... now I do this:

    [Test]
    public void Test()
    {
      SetupMocks(delegate()
      {
        SetupResult.For(_bar.ProvideService()).Return(1);
      });

      Assert.IsTrue(_foo.DoSomething());
      _mocks.VerifyAll();
    }

    public void SetupMocks(Block block)
    {
      using (_mocks.Unordered())
      {
        _bar.DoSomeSetup();
        if (block != null) block();
      }
      _mocks.ReplayAll();
      _foo = new Foo(_bar);
    }

And this:

public delegate void Block();

Funny what a little Ruby exposure will do to you...

by Aaron on Tuesday, June 26, 2007 7:36:55 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [2]  |  Trackback
 Friday, June 01, 2007

Ayende responded to my last post asking "But, how do you correlate queries?". Alex James and I shot off a few ideas in comments on both posts. Here are a few of the issues with correlating the queries to a strategy (assuming a Consumer->Service->Query->ORM architecture):

  • A single query can be called by multiple services.
  • Each service can have different needs for that query and so would need a different strategy.
  • Furthermore, consumers of that service may have different needs from the entities returned by that service for that query, and so on.

So ultimately each consumer of the results of any query would need its own strategy. You could name them manually, with strings or types or something. Those names would need to be passed down starting from one of those levels. If you choose to name the strategies based on the query, you'd end up having an explosion of similar if not identical queries with differing strategies, and an explosion at the service level. If you name them at the service level, you would have to pass that name down to the Query and you'd still end up with several similar service methods. Note that none of this is different than if you optimized each query manually on your own. Different queries are different queries, it's just that in this case a query is really a query+strategy.

The other thing both Ayende and Alex suggested was using stack traces. That would make things automatic, but it's slow and a bit awkward.

So is there any way to make this better? Maybe. Jacob had a great idea about using attributes to name the strategies. What if we took that further? What if the Dao was proxied to look like this:

public class ProxyDao : RealUserDao
{
  public IList<UserEntity> QueryAllUsers()
  {
    using (session.PushStrategyContext("RealUserDao.QueryAllUsers"))
    {
      return base.QueryAllUsers();
    }
  }
}

Now you can proxy any layer you want with this, and the strategy will be automatically named without any stack trace magic. (I think proxy magic is A-OK :) ) This solution keeps getting more and more complicated... I think it would make sense to support the simple string naming from the API level and let the crazy people do the proxy stuff.
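The string naming from the API level might look something like this (purely hypothetical; PushStrategyContext is the same made-up method as in the proxy sketch above):

  // Hypothetical: name the strategy context explicitly at the call site.
  using (session.PushStrategyContext("EmailList.QueryAllUsers"))
  {
    return session.CreateQuery("from UserEntity").List<UserEntity>();
  }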

There are still other issues, but I'll post more in a bit... distracted by TV right now.

by Aaron on Friday, June 01, 2007 6:15:47 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [4]  |  Trackback

OK, let me start out by saying that I'm hardly an expert on the subject of ORMs. I've only really used NHibernate, and I've still got plenty to learn about it. That said, I do know a bit about being lazy...

When working in any real application you quickly realize that actually exercising lazy loading is something that should probably be avoided at all costs. You want to fetch everything you need to handle a request in one go. Scott Bellware recently struggled with this issue as we all have, and the general consensus in the comments is that you should load what you need and not lazy load anything (unless of course that lazy load happens in a branch less often traveled). Udi Dahan provided a link to a rather elegant solution to the problem that I think I'm definitely going to explore.

Even with Udi's solution, you're still required to keep the fetching strategy in sync with the domain. This isn't an incredibly difficult thing to do, but it requires discipline and effort. I've been thinking for a while about this and a few other possible performance tweaks that can make ORMs easier to use and even a bit quicker than they are today.

Let's talk about another performance issue briefly. Say you want to display an email list containing all of your users. The list should display just their username and email address, nothing more. The easy way to fetch the data would be to do a "from UserEntity". But is that the most efficient? What if UserEntity has 10 fields? 20? 30? Is it worth explicitly querying for "select u.Username, u.Email from UserEntity u"? Let's find out. Being lazy, I simply borrowed James Avery's Northwind NHibernate example, ported it to NHibernate 1.2 and added a new mapping and class called TrimmedOrder. The class is exactly the same, but the mapping only contains the Freight and ShipName properties. I also commented out the many-to-ones in both Order and TrimmedOrder. You can get the source here.
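For reference, here's roughly what the two sides of the comparison fetch (a sketch using NHibernate 1.2's generic List<T>()):

  // Full entity: every mapped column comes back and gets dirty-checked.
  IList<Order> orders =
    session.CreateQuery("from Order").List<Order>();

  // Trimmed mapping: same Orders table, but only the Freight and
  // ShipName properties are mapped.
  IList<TrimmedOrder> trimmedOrders =
    session.CreateQuery("from TrimmedOrder").List<TrimmedOrder>();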

The rest was just a few simple tests. One for reading, and one for writing. Here are the results:

              Order      TrimmedOrder  Performance Gain
  Read Test   18.62 ms   11.66 ms      1.6 times faster
  Write Test  227.13 ms  161.24 ms     1.41 times faster

Notes: The Read Test was averaged over 50 iterations and the Write Test was averaged over 20 iterations. The db was queried for all orders once before anything was tested because I found the first query was always slower; after that all queries seemed level. In the write test, I wrote random values to the two properties mapped by TrimmedOrder. The Write Test values were adjusted by subtracting the Read Test values so that only the write itself was timed (in theory).

Now, these gains aren't anything to write home about, but they are significant. Some of you may wonder why the Write Test sees any gain, given that the database calls would be exactly the same, as NHibernate doesn't update columns that didn't change. Well, without profiling, I'm willing to bet that the reason is simply that NHibernate has to discover which fields changed while flushing by testing them against their old values.

So now we have two problems: we have to know exactly what collections the domain needs ahead of time or we lazy load, and unless we use projection queries we're not getting the best performance we can get. Also, NHibernate doesn't currently support deferred (lazy) property loading, so projection queries return object[]'s rather than partially initialized entities. That means that, as far as I know, you cannot use projection queries to do updates, so we just can't solve the write problem without additional mappings or something else hackish.

What do we do? Yes, I'm finally nearing the point of this post. What if we taught the ORM to learn? Bear with me here. Let's go back to the email list example. If we were using Udi's Fetching Strategy approach, we'd do something like this:

IList<UserEntity> users = Repository<IEmailListService>.FindAll();

Which would ultimately get a FetchingStrategy associated with this service. Now, what if every object fetched by this FetchingStrategy was proxied and would communicate back to the FetchingStrategy about its usage... and the FetchingStrategy remembered how its objects were used, so the next time it got invoked it would only return what was used last time... and it would keep learning and refining, so that it would simply return all the fields and collections that were ever accessed via that particular FetchingStrategy. If it ever returned too little, it would just lazy load what needed to be loaded and make a note that it should provide that field next time (of course this would be configurable so that it would just load everything on a miss, so you're not hitting the db several times in these cases).

This would mean the first time your FetchingStrategy was invoked it would be "slower", but most every time after that it will have adapted and improved... all for mostly free. And yes, I know that Udi's FetchingStrategy would just return an HQL query and live closer to the domain objects, but mine would live closer to NHibernate and it would probably be responsible for actually querying from NHibernate (so it's probably really a Dao, but you get my point). A rough sketch of the bookkeeping follows.
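Entirely hypothetical, but something like this is the tracking I imagine (C# 2.0-style, since HashSet<T> isn't around yet):

  // Hypothetical sketch: entity proxies would call NoteAccess whenever a
  // property getter runs, and the strategy folds that into the set of
  // fields it eagerly fetches on its next invocation.
  public class AdaptiveFetchingStrategy
  {
    // Dictionary used as a poor man's set of field names.
    private readonly IDictionary<string, bool> _usedFields =
      new Dictionary<string, bool>();

    public void NoteAccess(string fieldName)
    {
      _usedFields[fieldName] = true;
    }

    // Everything ever touched through this strategy; on a miss we'd
    // either lazy load the field or load the whole entity, depending
    // on configuration.
    public ICollection<string> FieldsToFetch
    {
      get { return _usedFields.Keys; }
    }
  }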

There are probably a few other things this could help with too, such as marking queries as read-only automatically so as to avoid an unnecessary and costly flush (unfortunately this is not currently supported anyway).

Caveats? Plenty I'm sure. Here's some I can think of:

  • There's the fact that it's not implemented. Jacob and I started to work on it and quickly decided it would be best to get lazy loaded properties hashed out first. It's probably a decent ways off the NHibernate radar, but maybe we can change that.
  • Then there's the proxy element. The Entities are often proxied normally to lazy load collections and such, so I don't think adding in reporting back to the FetchingStrategy would be a huge burden.
  • Complexity... yeah, this is complicated, but so are ORMs and software :)
  • Increased startup cost (kind of). You could always persist the strategies to help alleviate this a little...
  • Instance variables, access strategies, etc. Much like lazy loaded collections, restrictions would apply to lazy loaded properties and such. You'd probably need to just always load things that are field accessed, and you'd certainly have to avoid accessing the instance variables that back deferred properties.
  • Objects fetched by multiple strategies. This is a big one. There will be tons of questions when it comes time to solve this problem. If I access property A after an object is fetched from two different strategies, do I notify both strategies? Will data be fetched twice if both strategies need it?
  • Caching complications. Most of these would come from the lazy loaded properties, so that's something that'd have to be tackled eventually.

Now I'm just waiting for Gavin King or Sergey Koshcheyev to come tell me why this is a horrible idea, but until then, what do you all think?

by Aaron on Friday, June 01, 2007 9:12:39 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [5]  |  Trackback