Development Blog

 Saturday, September 29, 2007

First some code:

UserEntity user = New.User("bob").In(New.Company("ACME")).With(
  New.PurchaseOf("ProductA")
    .WithAcademicSlot(1).FilledBy(New.EnrollmentIn(101).ThatIsCanceled)
    .WithPlacementSlot(1).FilledBy(New.EnrollmentIn(1)));

This code actually creates and saves (with NHibernate) a dozen or so entities. We can use code like this to set up one-off sample data for tests in a way that's easy to read, understand and change. There is a decent amount of magic that goes into making this work and I wanted to talk about how I did it.

First we have a FixtureContext, which is just a hub for the DaoFactory, the current Session, and three helper classes, New, Current and Existing. New is the class responsible for the beginning of the syntax you see above. Each method on New returns a Creator. There's some magic in Creator, so here's the code:

public class Creator<T> : FixtureContextAware where T: class, new()
{
  private T _creation;
  
  protected T Creation
  {
    get { return _creation; }
    private set 
    {
      if (Current.Get<T>() != value)
      {
        Current.Push(value);
      }
      _creation = value; 
    }
  }

  public Creator(IFixtureContext context) : base(context)
  {
    Creation = new T();
  }

  public Creator(IFixtureContext context, T creation) : base(context)
  {
    Creation = creation;
  }

  public static implicit operator T(Creator<T> creator)
  {
    if (creator._creation == null) throw new Exception(
      String.Format("Creation of {0} is null, it probably shouldn't be.", typeof(T)));
    creator.Current.Pop<T>();
    return creator._creation;
  }
}

The first thing is that Creator is a subclass of FixtureContextAware, which is just a helper base class that provides access to FixtureContext's children. Next, there are a few references to Current, which is simply a collection of stacks of entities. It lets Creators refer to other entities that are being created, so those entities don't have to be passed around. This is better explained with an example. In the beginning example you see New.EnrollmentIn(101). An enrollment requires a User to be created, so because there is a User in this creation context, we can do this:

public CourseEnrollmentCreator(IFixtureContext context, short number) : base(context)
{
  Creation.User = Current.User;
  Creation.Course = Existing.Course(number);
  Creation.StartDate = DateTime.Now;
  Creation.EndDate = DateTime.Now.AddMonths(1);

  Session.Save(Creation);
}
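The Current class itself isn't shown here. As a rough sketch of the idea only, assuming one stack per entity type, it could look something like this. The class name CurrentEntities and its members are hypothetical, and typed shortcuts such as Current.User would be thin wrappers over Get<T>():

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the "Current" helper: one stack of entities per type.
public class CurrentEntities
{
    private readonly Dictionary<Type, Stack<object>> _stacks =
        new Dictionary<Type, Stack<object>>();

    // Make an entity the innermost "current" instance of its type.
    public void Push<T>(T entity) where T : class
    {
        Stack<object> stack;
        if (!_stacks.TryGetValue(typeof(T), out stack))
        {
            stack = new Stack<object>();
            _stacks[typeof(T)] = stack;
        }
        stack.Push(entity);
    }

    // Return the innermost entity of type T, or null if none is being created.
    public T Get<T>() where T : class
    {
        Stack<object> stack;
        if (_stacks.TryGetValue(typeof(T), out stack) && stack.Count > 0)
        {
            return (T)stack.Peek();
        }
        return null;
    }

    // Remove and return the innermost entity of type T (null if there is none).
    public T Pop<T>() where T : class
    {
        Stack<object> stack;
        if (_stacks.TryGetValue(typeof(T), out stack) && stack.Count > 0)
        {
            return (T)stack.Pop();
        }
        return null;
    }
}
```

Because the stacks are keyed by type, a nested Creator sees the innermost entity of each type without anything being passed to it explicitly.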

The next thing is that the creation itself is stored as Creation in the Creator. This can either be new'd up or can be passed in to the constructor.

The coolest part (at least in my opinion) is the implicit operator. This allows you to do things like: UserEntity user = New.User().Foo(), where each of those methods returns a UserCreator, but at the end of all of it the Creator is implicitly cast to a UserEntity, the thing actually being created. This also serves as an excellent time to pop the entity from the Current stack.
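To see the implicit operator trick in isolation, here is a stripped-down sketch with no FixtureContext or NHibernate involved. Person and PersonBuilder are hypothetical names used only for illustration:

```csharp
using System;

public class Person
{
    public string Name { get; set; }
    public string Company { get; set; }
}

// Hypothetical builder: every fluent method returns the builder, and the
// implicit operator hands back the finished entity at the assignment.
public class PersonBuilder
{
    private readonly Person _person = new Person();

    public PersonBuilder(string name)
    {
        _person.Name = name;
    }

    public PersonBuilder In(string company)
    {
        _person.Company = company;
        return this;
    }

    // Runs when a PersonBuilder is assigned to (or passed as) a Person.
    public static implicit operator Person(PersonBuilder builder)
    {
        if (builder == null) throw new ArgumentNullException("builder");
        return builder._person;
    }
}
```

With this in place, Person bob = new PersonBuilder("bob").In("ACME"); compiles and yields the entity. Note that the conversion only fires when the target type is stated: var bob = ... would leave you holding the builder.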

Next we have the Existing class. This is essentially just a wrapper for your Daos/Repositories so you can fetch things that are already in your database (like Existing.Course(number) or Existing.User("bob")).

With this simple framework in place, the next step is to start writing your domain specific Creators. Here's an example:

public class PurchaseCreator : Creator<PurchasedProductEntity>
{
  protected PurchaseCreator(IFixtureContext context, PurchasedProductEntity creation) : base(context, creation)
  {
  }

  public PurchaseCreator(IFixtureContext context, string productName) : base(context)
  {
    Creation.Product = New.Product(productName);
    // Initialize Purchase...
  }

  public PurchaseCreatorWithSlot WithAcademicSlot(int credits)
  {
    // Create and add slot...
   
    return new PurchaseCreatorWithSlot(Context, this, slot);
  }

  public PurchaseCreatorWithSlot WithPlacementSlot(int credits)
  {
    // Create and add slot...

    return new PurchaseCreatorWithSlot(Context, this, slot);
  }

  public PurchaseCreator WithSessions(int credits)
  {
    // Create and add sessions...

    return this;
  }
}

public class PurchaseCreatorWithSlot : PurchaseCreator
{
  private readonly PurchasedCourseEnrollmentSlotEntity _slot;

  public PurchaseCreatorWithSlot(IFixtureContext context, PurchasedProductEntity creation, PurchasedCourseEnrollmentSlotEntity slot) : base(context, creation)
  {
    _slot = slot;
  }

  public PurchaseCreatorWithSlot FilledBy(CourseEnrollmentEntity enrollment)
  {
    // Fill slot...

    return this;
  }
}

Then we add a PurchaseOf(string productName) method to New that will create and return a new PurchaseCreator. We can also add a PurchaseOf(ProductEntity product) and pass a New.Product("Product") in instead. You can make it as granular or magic as you want. Also, notice that WithAcademicSlot and WithPlacementSlot return a subclass of PurchaseCreator that adds another method. Using techniques like this you can make some very verbose and context-sensitive fixtures.

We also make our HibernateTests base class FixtureContextAware so that we can use nice syntax in our tests.

Here is the source for the basic framework. Let me know what you think. 

by Aaron on Saturday, September 29, 2007 3:48:54 PM (Pacific Standard Time, UTC-08:00)
 Monday, August 27, 2007

James Kovacs replied to one of my many NHibernate Optimization Ramblings:

In the first example, how do you know that most customers are unimportant until you fetch them from the database? You have an additional problem that generally NHibernate sessions are short - especially in web apps or web services. So when do you reset your fetching strategy. What if one portion of your code uses different customer properties than another? The adaptive fetcher needs to do a lot of analysis of your post-fetch code paths or monitor the behaviour of the application as it executes. As it stands, NHibernate has a lot of options besides adaptive queries, which I believe are better including projections (since you as a programmer know the data you need) and lazy-loading of properties - both collections and objects. There are probably others. We're talking about saving milliseconds on DB queries when a round-trip to the DB is at least an order of magnitude greater. I personally feel that adaptive queries would require a lot of work for little gain. I call YAGNI.

This is probably my fault, but his first question tells me he doesn't quite understand what I'm trying to propose. It doesn't matter if customers are important or unimportant. Once both code paths are hit and the strategy fully adapts, it is adapted and it will fetch a superset of the fields that are required for that query in the context in which it is called. There's no reason to ever reset that strategy unless the code changes.

Context is another important aspect of the adaptive queries, and I'm not sure how I'd implement it. At the moment I'm thinking of something along the lines of scopes (nested or single level), so that for each scope/query combination there would be a strategy. That's the answer to his second issue. The only analysis it needs to do is pay attention to which properties are hit on the entities it fetched, by proxying those entities. That's it. No instrumentation, parsing, or any other crazy stuff.

 

using (Query.Scope("Print Customer Stuff"))
{
  customers = LoadCustomers();
  ...
}
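Query.Scope doesn't exist anywhere yet. As a minimal sketch of how such a scope could work, assuming a simple stack of scope names where Dispose leaves the scope, something like this would do. Every name here is hypothetical, and none of it is NHibernate API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: Query.Scope is not an existing NHibernate API.
public static class Query
{
    // Innermost scope is on top. A real version would keep one stack per
    // thread or per request; this sketch uses a single static stack.
    private static readonly Stack<string> Scopes = new Stack<string>();

    // Name of the innermost active scope, or null outside any scope.
    public static string CurrentScope
    {
        get { return Scopes.Count > 0 ? Scopes.Peek() : null; }
    }

    // Enter a named scope; disposing the returned object leaves it.
    public static IDisposable Scope(string name)
    {
        Scopes.Push(name);
        return new ScopeHandle();
    }

    private sealed class ScopeHandle : IDisposable
    {
        private bool _disposed;

        public void Dispose()
        {
            if (_disposed) return; // tolerate double disposal
            _disposed = true;
            Scopes.Pop();
        }
    }
}
```

A query executed inside the using block could then look up its strategy by the pair (Query.CurrentScope, query), which is the scope/query combination described above.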

As for the YAGNI assertion, I understand why you'd call YAGNI on the 2-12ms standard savings I showed in my tests, but You Already Do Need It at times (we use projections for just this). It's just that this would be automatic, would require less maintenance, and you wouldn't have to choose to do it. It would be free savings and require less manual optimization. If someone further down your chain decides they need to log customer.Name, you don't have to climb back up, find the original query and add it there. With projections at least you'd know you'd need to add it, but you'd have to change the query and change your DTO (anonymous types will help with this... I guess).

My point is, you don't need an inversion of control container, it's just easier. You don't need auto mocking containers for tests, it's just easier and you don't have to change your code when you change your constructor. You don't even need mocks, you could write those by hand too. Well, with adaptive queries you don't have to change your code when you decide to access another field... or even another collection....

You also have to look further than field adapting. There's the potential to adapt collection initialization as well. No more select N+1's for those devs that don't pay attention, they'd just go away like magic. And yes, I realize hand optimized queries will generally prevail, but adaptively optimized queries will be free.

Today, Jacob had another great idea for a use of adaptive queries. We often do this in our MonoRail actions:

public void ShowCourseEnrollments([EntityParameter] User user)
{
  foreach (CourseEnrollment ce in user.CourseEnrollments)
  {
    // Do something with ce.Course.Number
  }
}

This probably requires a bit of explanation. Basically, when someone goes to the url /Controller/ShowCourseEnrollments.rails?user=1, MR's databinding (and our EntityParameter binder) will do an NHibernate session.Load<User>(1). At this time, the db hasn't been hit. As soon as we start enumerating user.CourseEnrollments, we're select N+1ing. Furthermore, we could potentially be doing another fetch for ce.Course. The solution to this is to either change your mapping to always fetch these things (bad idea anyone?) or to do something like this:

user = UserDao.FetchUserWithEnrollments(user.Id);

Well, what if adaptive queries kicked in at the databind, and instead of adding that method to your dao, you just got what you needed? Sure, YAGNI, but You Are Gonna Want It... if I or someone ever implements it.

by Aaron on Monday, August 27, 2007 9:11:52 PM (Pacific Standard Time, UTC-08:00)
 Friday, June 01, 2007

Ayende responded to my last post asking "But, how do you correlate queries?". Alex James and I shot off a few ideas in comments on both posts. Here are a few of the issues with correlating the queries to a strategy (assuming a Consumer->Service->Query->ORM architecture):

  • A single query can be called by multiple services.
  • Each service can have different needs for that query and so would need a different strategy.
  • Furthermore, consumers of that service may have different needs from the entities returned by that service (and, in turn, by that query), and so on.

So ultimately each consumer of the results of any query would need its own strategy. You could name them manually, with strings or types or something. Those names would need to be passed down starting from one of those levels. If you choose to name the strategies based on the query, you'd end up having an explosion of similar if not identical queries with differing strategies, and an explosion at the service level. If you name them at the service level, you would have to pass that name down to the Query and you'd still end up with several similar service methods. Note that none of this is different than if you optimized each query manually on your own. Different queries are different queries, it's just that in this case a query is really a query+strategy.

The other thing both Ayende and Alex suggested was using stack traces. That would make things automatic, but it's slow and a bit awkward.

So is there any way to make this better? Maybe. Jacob had a great idea about using attributes to name the strategies. What if we took that further? What if the Dao was proxied to look like this:

public class ProxyDao : RealUserDao
{
  // Assumes RealUserDao.QueryAllUsers is declared virtual so the proxy can override it.
  public override IList<UserEntity> QueryAllUsers()
  {
    using (session.PushStrategyContext("RealUserDao.QueryAllUsers"))
    {
      return base.QueryAllUsers();
    }
  }
}

Now you can proxy any layer you want with this and the strategy will be automatically named without any stack trace magic. (I think proxy magic is A-OK :) ) This solution keeps getting more and more complicated... I think it would make sense to support the simple string naming from the api level and let the crazy people do the proxy stuff.

There are still other issues, however, but I'll post more in a bit... distracted by TV right now.

by Aaron on Friday, June 01, 2007 6:15:47 PM (Pacific Standard Time, UTC-08:00)

OK, let me start out by saying that I'm hardly an expert on the subject of ORMs. I've only really used NHibernate, and I've still got plenty to learn about it. That said, I do know a bit about being lazy...

When working in any real application you quickly realize that actually exercising lazy loading is something that should probably be avoided at all costs. You want to fetch everything you need to handle a request in one go. Scott Bellware recently struggled with this issue as we all have, and the general consensus in the comments is that you should load what you need and not lazy load anything (unless of course that lazy load happens in a branch less often traveled). Udi Dahan provided a link to a rather elegant solution to the problem that I think I'm definitely going to explore.

Even with Udi's solution, you're still required to keep the fetching strategy in sync with the domain. This isn't an incredibly difficult thing to do, but it requires discipline and effort. I've been thinking for a while about this and a few other possible performance tweaks that can make ORMs easier to use and even a bit quicker than they are today.

Let's talk about another performance issue briefly. Say you want to display an email list containing all of your users. The list should display just their username and email address, nothing more. The easy way to fetch the data would be to do a "from UserEntity". But is that the most efficient? What if UserEntity has 10 fields? 20? 30? Is it worth explicitly querying for "select u.Username, u.Email from UserEntity u"? Let's find out. Being lazy, I simply borrowed James Avery's Northwind NHibernate example, ported it to NHibernate 1.2 and added a new mapping and class called TrimmedOrder. The class is exactly the same, but the mapping only contains the Freight and ShipName properties. I also commented out the many-to-ones in both Order and TrimmedOrder. You can get the source here.

The rest was just a few simple tests. One for reading, and one for writing. Here are the results:

             Order       TrimmedOrder   Performance Gain
Read Test    18.62 ms    11.66 ms       1.6 times faster
Write Test   227.13 ms   161.24 ms      1.41 times faster

Notes: The Read Test was averaged over 50 iterations and the Write test was averaged over 20 iterations. The db was queried for all orders once before anything was tested because I found the first query was always slower, but after that all queries seemed level. In the writing test, I wrote random values to the two properties mapped by TrimmedOrder. The Write Test values were adjusted by subtracting the read test values so that only the write itself was timed (in theory). 
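The measurement approach in these notes can be sketched roughly as follows. This is not the actual test code from the download; the Benchmark class and its method are hypothetical, and the warm-up here runs once per measurement rather than once overall:

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper mirroring the method described in the notes.
public static class Benchmark
{
    // Run the action once untimed (warm-up), then return the average
    // elapsed milliseconds over the given number of timed iterations.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up: the first query was consistently slower

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Under this sketch, the adjusted write figure would be the average of a read-then-write action minus the average of the read-only action.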

Now, these gains aren't anything to write home about, but they are significant. Some of you may wonder why the Write Test sees any gain given that the database calls would be exactly the same, as NHibernate doesn't update columns that didn't change. Well, without profiling, I'm willing to bet that the reason is simply that NHibernate has to discover which fields changed while flushing by testing them against their old values.
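To make that guess concrete, here is a rough sketch of snapshot-based dirty checking. It is not NHibernate's code; the DirtyChecker class and the dictionary-of-values representation are assumptions made purely for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of snapshot-based dirty checking; not NHibernate's code.
public static class DirtyChecker
{
    // Compare every mapped property against its loaded snapshot and return
    // the names that changed. The work grows with the number of mapped
    // properties, even when only one of them was modified.
    public static List<string> FindDirty(
        IDictionary<string, object> snapshot,
        IDictionary<string, object> current)
    {
        List<string> dirty = new List<string>();
        foreach (KeyValuePair<string, object> pair in snapshot)
        {
            object now;
            current.TryGetValue(pair.Key, out now);
            if (!object.Equals(pair.Value, now))
            {
                dirty.Add(pair.Key);
            }
        }
        return dirty;
    }
}
```

If flushing works anything like this, an entity mapped with two properties needs two comparisons per instance, while the full Order mapping needs one per mapped column, which would fit the gap measured above.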

So now we have two problems: we either have to know exactly what collections the domain needs ahead of time or fall back on lazy loading, and unless we use projection queries, we're not getting the best performance we can get. Also, NHibernate doesn't currently support deferred (lazy) property loading, so projection queries return object[]s rather than partially initialized entities. As far as I know, that means you cannot use projection queries to do updates, so we just can't solve the Write problem without additional mappings or something else hackish.

What do we do? Yes, I'm finally nearing the point of this post. What if we taught the ORM to learn? Bear with me here. Let's go back to the email list example. If we were using Udi's Fetching Strategy approach, we'd do something like this:

IList<UserEntity> users = Repository<IEmailListService>.FindAll();

Which would ultimately get a FetchingStrategy associated with this service. Now what if every object fetched by this FetchingStrategy was proxied and would communicate back to the FetchingStrategy about its usages... and the FetchingStrategy remembered how its objects were used, so the next time it got invoked it would only return what was used last time... and it would keep learning, and refining so that it would simply return all the fields and collections that were ever accessed via that particular FetchingStrategy. If ever it returned too little, it would just lazy load what needed to be loaded and make note that it should provide that field next time (of course this would be configurable so that it would just load everything on a miss so you're not hitting the db several times in these cases). This would mean the first time your FetchingStrategy was invoked it would be "slower" but most every time after that it will have adapted and improved... all for mostly free. And yes, I know that Udi's FetchingStrategy would just return an hql query and live closer to the domain objects, but mine would live closer to NHibernate and it would probably be responsible for actually querying from NHibernate (so it's probably really a Dao, but you get my point.)
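The learning part of this idea can be sketched without any ORM at all. In the sketch below the proxy is written by hand, whereas a real implementation would generate it; AdaptiveFetchingStrategy, User and TrackedUser are hypothetical names, and nothing here talks to NHibernate:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: remembers every property touched on entities it fetched.
public class AdaptiveFetchingStrategy
{
    private readonly HashSet<string> _used = new HashSet<string>();

    // Called by proxies whenever a property getter runs.
    public void RecordAccess(string propertyName)
    {
        _used.Add(propertyName);
    }

    // Properties to select on the next fetch. Empty means nothing has been
    // learned yet, in which case a caller would fall back to loading everything.
    public ICollection<string> PropertiesToFetch
    {
        get { return _used; }
    }
}

public class User
{
    public virtual string Username { get; set; }
    public virtual string Email { get; set; }
    public virtual string Address { get; set; }
}

// Hand-written stand-in for a generated proxy: reports reads to the strategy.
public class TrackedUser : User
{
    private readonly AdaptiveFetchingStrategy _strategy;

    public TrackedUser(AdaptiveFetchingStrategy strategy)
    {
        _strategy = strategy;
    }

    public override string Username
    {
        get { _strategy.RecordAccess("Username"); return base.Username; }
        set { base.Username = value; }
    }

    public override string Email
    {
        get { _strategy.RecordAccess("Email"); return base.Email; }
        set { base.Email = value; }
    }

    public override string Address
    {
        get { _strategy.RecordAccess("Address"); return base.Address; }
        set { base.Address = value; }
    }
}
```

After the email list has rendered once, PropertiesToFetch would hold just Username and Email, and the next fetch could select only those two columns.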

There are probably a few other things this could help with too, such as marking queries as readonly automatically so as to avoid an unnecessary and costly flush (unfortunately this is not currently supported anyway).

Caveats? Plenty I'm sure. Here's some I can think of:

  • There's the fact that it's not implemented. Jacob and I started to work on it and quickly decided it would be best to get lazy loaded properties hashed out first. It's probably a decent ways off the NHibernate radar, but maybe we can change that.
  • Then there's the proxy element. The Entities are often proxied normally to lazy load collections and such, so I don't think adding in reporting back to the FetchingStrategy would be a huge burden.
  • Complexity... yeah, this is complicated, but so are ORMs and software :)
  • Increased startup cost (kind of). You could always persist the strategies to help alleviate this a little...
  • Instance variables, access strategies, etc. Much like lazy loaded collections, restrictions would apply to lazy loaded properties and such. You'd probably need to just always load things that are field accessed, and you'd certainly have to avoid accessing the instance variables that back deferred properties.
  • Objects fetched by multiple strategies. This is a big one. There will be tons of questions when it comes time to solve this problem. If I access property A after an object is fetched from two different strategies, do I notify both strategies? Will data be fetched twice if both strategies need it?
  • Caching complications. Most of these would come from the lazy loaded properties, so that's something that'd have to be tackled eventually.

Now I'm just waiting for Gavin King or Sergey Koshcheyev to come tell me why this is a horrible idea, but until then, what do you all think?

by Aaron on Friday, June 01, 2007 9:12:39 AM (Pacific Standard Time, UTC-08:00)