Development Blog

 Saturday, September 29, 2007

First some code:

UserEntity user = New.User("bob").In(New.Company("ACME")).With(
  New.PurchaseOf("ProductA")
    .WithAcademicSlot(1).FilledBy(New.EnrollmentIn(101).ThatIsCanceled)
    .WithPlacementSlot(1).FilledBy(New.EnrollmentIn(1)));

This code actually creates and saves (with NHibernate) about 12 or so entities. We can use code like this to set up one off sample data for tests in a way that's easy to read, understand and change. There is a decent amount of magic that goes into making this work and I wanted to talk about how I did it.

First we have a FixtureContext, which is just a hub for the DaoFactory, the current Session, and three helper classes, New, Current and Existing. New is the class responsible for the beginning of the syntax you see above. Each method on New returns a Creator. There's some magic in Creator, so here's the code:

public class Creator<T> : FixtureContextAware where T: class, new()
{
  private T _creation;
  
  protected T Creation
  {
    get { return _creation; }
    private set 
    {
      if (Current.Get<T>() != value)
      {
        Current.Push(value);
      }
      _creation = value; 
    }
  }

  public Creator(IFixtureContext context) : base(context)
  {
    Creation = new T();
  }

  public Creator(IFixtureContext context, T creation) : base(context)
  {
    Creation = creation;
  }

  public static implicit operator T(Creator<T> creator)
  {
    if (creator._creation == null) throw new Exception(
      String.Format("Creation of {0} is null, it probably shouldn't be.", typeof(T)));
    creator.Current.Pop<T>();
    return creator._creation;
  }
}

The first thing is that Creator is a subclass of FixtureContextAware, which is just a helper base class that provides access to FixtureContext's children. Next there are a few references to Current which is simply a collection of stacks of entities so that Creators can refer to other entities that are being created so they don't have to be passed around. This is better explained with an example. In the beginning example you see New.EnrollmentIn(101). An enrollment requires a User to be created, so because there is a User in this creation context, we can do this:

public CourseEnrollmentCreator(IFixtureContext context, short number) : base(context)
{
  Creation.User = Current.User;
  Creation.Course = Existing.Course(number);
  Creation.StartDate = DateTime.Now;
  Creation.EndDate = DateTime.Now.AddMonths(1);

  Session.Save(Creation);
}

The next thing is that the creation itself is stored as Creation in the Creator. This can either be new'd up or can be passed in to the constructor.

The coolest part (at least in my opinion) is the implicit operator. This allows you do to do things like: UserEntity user = New.User().Foo(), where each of those methods returns a UserCreator, but at the end of all of it the Creator is implicitly cast to a UserEntity, the thing actually being created. This also serves as an excellent time to pop the entity from the Current stack.

Next we have the Existing class. This is essentially just a wrapper for your Daos/Repositories so you can fetch things that are already in your database (like Existing.Course(number) or Existing.User("bob")).

With this simple framework in place, the next step is to start writing your domain specific Creators. Here's an example:

public class PurchaseCreator : Creator<PurchasedProductEntity>
{
  protected PurchaseCreator(IFixtureContext context, PurchasedProductEntity creation) : base(context, creation)
  {
  }

  public PurchaseCreator(IFixtureContext context, string productName) : base(context)
  {
    Creation.Product = New.Product(productName);
    // Initialize Purchase...
  }

  public PurchaseCreatorWithSlot WithAcademicSlot(int credits)
  {
    // Create and add slot...
   
    return new PurchaseCreatorWithSlot(Context, this, slot);
  }

  public PurchaseCreatorWithSlot WithPlacementSlot(int credits)
  {
    // Create and add slot...

    return new PurchaseCreatorWithSlot(Context, this, slot);
  }

  public PurchaseCreator WithSessions(int credits)
  {
    // Create and add sessions...

    return this;
  }
}

public class PurchaseCreatorWithSlot : PurchaseCreator
{
  private readonly PurchasedCourseEnrollmentSlotEntity _slot;

  public PurchaseCreatorWithSlot(IFixtureContext context, PurchasedProductEntity creation, PurchasedCourseEnrollmentSlotEntity slot) : base(context, creation)
  {
    _slot = slot;
  }

  public PurchaseCreatorWithSlot FilledBy(CourseEnrollmentEntity enrollment)
  {
    // Fill slot...
  }
}

Then we add a PurchaseOf(string productName) method to New that will create and return a new PurchaseCreator. We can also add a PurchaseOf(ProductEntity product) and pass a New.Product("Product") in instead. You can make it as granular or magic as you want. Also, Notice that WithAcademicSlot and WithPlacementSlot return a subclass of PurchaseCreator that adds another method. Using techniques like this you can make some very verbose and context sensitive fixtures.

We also make our HibernateTests base class FixtureContextAware so that we can use nice syntax in our tests.

Here is the source for the basic framework. Let me know what you think. 

by Aaron on Saturday, September 29, 2007 3:48:54 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback
 Monday, August 27, 2007

James Kovacs replied to one of my many NHibernate Optimization Ramblings:

In the first example, how do you know that most customers are unimportant until you fetch them from the database? You have an additional problem that generally NHibernate sessions are short - especially in web apps or web services. So when do you reset your fetching strategy. What if one portion of your code uses different customer properties than another? The adaptive fetcher needs to do a lot of analysis of your post-fetch code paths or monitor the behaviour of the application as it executes. As it stands, NHibernate has a lot of options besides adaptive queries, which I believe are better including projections (since you as a programmer know the data you need) and lazy-loading of properties - both collections and objects. There are probably others. We're talking about saving milliseconds on DB queries when a round-trip to the DB is at least an order of magnitude greater. I personally feel that adaptive queries would require a lot of work for little gain. I call YAGNI.

This is probably my fault, but his first question tells me he doesn't quite understand what I'm trying to propose. It doesn't matter if customers are important or unimportant. Once the both code paths are hit and the strategy fully adapts, it is adapted and it will fetch a superset of fields that are required for that query in the context it is called. There's no reason to ever reset that strategy unless the code changes. 

Context is another important aspect of the adaptive queries, and I'm not sure how I'd implement it. At the moment I'm thinking that something along the line of scopes (nested or single level) so for each scope/query combination there would be a strategy. That's the answer to his second issue. The only analysis it needs to do is it needs to pay attention to what properties are hit on the entities it fetched by proxying that entity. That's it. No instrumentation, parsing, or any other crazy stuff.

 

using (Query.Scope("Print Customer Stuff"))
{
  customers = LoadCustomers();
  ...
}

As for the YAGNI assertion, I understand why you'd call YAGNI on the 2-12ms standard savings I showed in my tests, but You Already Do Need It at times (we use projections for just this) it's just that this would be automatic and require less maintenance and you wouldn't have to choose to do it. It would be free savings and require less manual optimization. If someone further down your chain decides they need to log customer.Name, you don't have to climb back up, find the original query and add it there. With projections at least you'd know you'd need to add it, but you'd have to change the query and change your DTO (anonymous types will help with this... I guess).

My point is, You don't need an inversion of control container, it's just easier. You don't need auto mocking containers for tests, it's just easier and you don't have to change your code when you change your constructor. You don't even need Mocks, you could write those by hand too. Well, with adaptive queries you don't have to change your code when you decide to access another field... or even another collection....

You also have to look further than field adapting. There's the potential to adapt collection initialization as well. No more select N+1's for those devs that don't pay attention, they'd just go away like magic. And yes, I realize hand optimized queries will generally prevail, but adaptively optimized queries will be free.

Today, Jacob had another great idea for a use of adaptive queries. We often do this in our MonoRail actions:

public void ShowCourseEnrollments([EntityParameter] User user)
{
  foreach (CourseEnrollment ce in user.CourseEnrollments)
  {
    // Do something with ce.Course.Number
  }
}

This probably requires a bit of explanation. Basically, when someone goes to the url /Controller/ShowCourseEnrollments.rails?user=1, MR's databinding (and our EntityParameter binder) will do an NHibernate session.Load<User>(1). At this time, the db hasn't been hit. As soon as we start enumerating user.CourseEnrollments, we're select N+1ing. Furthermore, we could potentially be doing another fetch for ce.Course. The solution to this is to either change your mapping to always fetch these things (bad idea anyone?) or to do something like this:

user = UserDao.FetchUserWithEnrollments(user.Id);

Well, what if adaptive queries kicked in at the databind, and instead of adding that method to your dao, you just got what you needed? Sure, YAGNI, but You Are Gonna Want It... if I or someone ever implements it.

by Aaron on Monday, August 27, 2007 9:11:52 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [0]  |  Trackback
 Sunday, August 26, 2007

Ayende posted a great comment with some questions about adaptive fetching. Here are his questions and my responses:

Let us assume this:
customers = LoadCustomers();
for customer in customers:
if customer.IsImportant:
print customer.Birthday
else:
print customer.CurrentCharge
What would the adaptive fetching do in this case?
Assume that you have started with mostly unimportant customers and then moved to important customers?
The amount of queries that would be generated is prohibitive.

In this scenario, adaptive fetching would generate a query that pulled IsImportant and CurrentCharge for unimportant customers. As soon as a single Important customer ran through this code, the query would change to fetch IsImportant, CurrentCharge and Birthday. It would also immediately lazy load *all* missing properties from the original query for all customers originally queried, and continue to track accessed properties. That's only one additional query for each differing codepath, and that's only the first time its hit. From that point on, until the query was reset (version upgrade, app restart if it's not persisted, manually, etc), then you would have all you needed for all codepaths.

Another problem that you have here is that you do a query like:
"select u.Name, u.Email from Customer" and the query actually returns you Name,Email, Address, Photo.
That violates the law of least surprise fairly drastically.

I don't see adaptive queries having that syntax. I'm thinking more along the lines of "select ??? from customer"  or "select what  i need from customer", etc. Specifying what you're looking for puts you right back into projections. If you want to be specific, use projections.

Finally, I think that a much easier solution than trying to eek a few more milliseconds from a DB query is not to go to the DB at all. Utilize NH's caching abilities, and you'll get a significant performance benefit for little cost.

I agree that most of the time you'd get more benefit from caching than squeezing some time out of the db. I just think it gets us one step closer to making mapping between objects and the db more friendly to both sides of the map, and gives developers a tool that works automatically "for free". I'm sure it'd also make DBAs happier that their devs aren't just doing select * all the time.

Now, if you still want adaptive fetching, the best way to get it is to help build the HQL Parser, which would generate a human workable AST.

Agreed. I guess if we work on rewriting chunks of NHibernate the code would become manageable eh? So yeah, I'll stop blathering and try and start contributing.

by Aaron on Sunday, August 26, 2007 10:44:50 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback
 Saturday, August 25, 2007

Oren replied to my original post about features I feel that NHibernate is missing. I think he may have misinterpreted some of my wants so I'll run through and reply in order.

Lazy Field Initialization

Obviously grabbing each field as it was accessed would be ludicrous. Lazy field initialization is simply something that needs to be there in order for partial object queries to work properly. It's the same idea as deciding whether or not to fetch a child collection. You query for it if you need it, you don't if you don't, and if you end up using the child collection when you didn't fetch it, it will get lazy loaded. I'm of the opinion that lazy loads are a smell, and if you hit them in anything but a border case, you're doing something wrong and you should be fetching what you need. 

I'm fine with specifying defaults on fields (don't load Photo unless I ask for it explicitly), but Lazy Field Initialization really would shine in partial queries.

Partial object queries

UserSummaryDetails is not User. Let's say I have a ReminderService that has a SendMailTo method that takes a User and a message. If all that method needs is that User's name and email address, why should I allow NHibernate to query for the entire user? It's just wasteful. Furthermore, it'd be rather tedious to have multiple Value Objects like UserSummaryDetails in your codebase just to support the different scenarios you want.

If you add Lazy Field Initialization and real partial object queries to the mix, you'd be able to query for a group of Users that just have email and name populated, and if you happen to need address, it'd requery for all of the user's addresses. You can even take it further and requery for all of the fields missing from the original user query. NHibernate should also probably slap you on the wrist (by logging it). I think I'm going to find a way to start logging all lazy loads so that we can investigate them.

Using objects like UserSummaryDetails eliminates a good chunk of the usefulness of the domain. That is,  you can no longer use the methods on the User object itself or any domain services that would consume that object.

So really what I want is the syntax I mentioned in my original post, which would just reuse NHibernate's field access strategy to set the appropriate fields. Any fields that were not set at that time would be flagged for lazy loading. On flush, only those fields that were set would be compared. Combine this with Adaptive Fetching Strategies and you've got a very powerful, self optimizing and flexible query mechanism that leaves you free to use your Domain as you wish.

Read Only Queries

Maybe it would make sense to give two sessions to each Dao, one of them that handled read only queries and one that handled read write queries. That's a good idea, but I still think it's something that should be built in (Hibernate does it). 

Join Qualifiers

I don't remember what the query was, but there was a reporting query I was trying to do involving multiple left joins that I just couldn't get to work with HQL. I was able to do it in SQL no problem by qualifying one of the joins, but every attempt in HQL hit a wall. Good point about having to support multiple db's. It's still something I'd like to see eventually.

by Aaron on Saturday, August 25, 2007 4:48:04 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [8]  |  Trackback

sqldavidson First I want to admit something somewhat embarrassing: I don't know enough about databases. Until coming here, I've always had a DBA that could handle that. I've started to realize now how important it is for all developers to have a much better understanding about databases than I do. So I picked up this book. I was really looking to better understand optimization, especially knowing when to create indexes and optimizing queries. Though the book only had a few chapters on the subject, I felt like it greatly increased my understanding on the subject.

A few months ago I wrote about Adaptive Fetching Strategies. At that time I had little knowledge about covering indexes and how they improve performance. Oh well, it's just more reason something like that should be implemented. Before we even get to that however, NHibernate needs to have a few features added to it in order to be useful in scenarios where performance is important.

Remember folks, SELECT * is bad. NHibernate is for effectively always doing a SELECT * when you're asking for an object when it comes to index coverage. Yes, it specifies columns so it doesn't have all the problems associated with SELECT *, but it is still decidedly less performant than querying for only what you want. Compound this with the extra overhead the unneeded columns add during a flush, and you've got a pretty compelling argument to only query for what you need.

OK, so you've decided that you only want to query for what you need and you're using NHibernate. Well, you can do that... kind of. Not much differently than you can if you just used ADO.NET and DataSets though. NHibernate doesn't support lazy field initialization. This means that if you query for only username and email address from your user table, you don't get back a User. You get back a List of object[]'s. Arguably a list of object[]'s is less functional than a DataSet. Combine this with the fact that you're querying using HQL, which has a subset of functionality, predictability, and therefore optimizability of real SQL, and you start to see a big hole in whole ORM thing... if you want to optimize your pages. Yes, HQL does make writing queries more pretty and more in the domain, but it would be nice if it supported things like "on".

Obviously optimizing too early is evil, but we've had several pages now where we've needed to optimize, and just resorted to querying for a table of values. No longer are we querying for objects, that's just not performant enough. Especially in display scenarios where you're not making changes to anything as NHibernate lacks the ability to do read-only queries, so you end up with that (very) expensive flush unless you change your flushing strategy to be manual (awkward) or you detach your objects from the session (also awkward).

So in short, I feel NHibernate (and any ORM for that matter) needs the following features to really be optimization friendly:

  • Lazy field initialization
  • Querying for partial objects: select u(Username, Email) from User u
  • Read-only queries that do not get flushed.
  • Join qualifiers (on in T-SQL)

And yeah, I know it's open source and I could just do it myself, but I have nightmares about that codebase, and I hardly have the time to implement such large features. All I have the time to do is complain and wish :)

by Aaron on Saturday, August 25, 2007 11:28:29 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [0]  |  Trackback
 Friday, June 01, 2007

Ayende responded to my last post asking "But, how do you correlate queries?". Alex James and I shot off a few ideas in comments on both posts. Here are a few of the issues with correlating the queries to a strategy (assuming a Consumer->Service->Query->ORM architecture):

  • A single query can be called by multiple services.
  • Each service can have different needs for that query and so would need a different strategy.
  • Furthermore, consumers of that service may have different needs from the entities returned by that service returned by that query and so on.

So ultimately each consumer of the results of any query would need its own strategy. You could name them manually, with strings or types or something. Those names would need to be passed down starting from one of those levels. If you choose to name the strategies based on the query, you'd end up having an explosion of similar if not identical queries with differing strategies, and an explosion at the service level. If you name them at the service level, you would have to pass that name down to the Query and you'd still end up with several similar service methods. Note that none of this is different than if you optimized each query manually on your own. Different queries are different queries, it's just that in this case a query is really a query+strategy.

The other thing both Ayende and Alex suggested was using Stack traces. That would make things automatic, but it's slow and a bit awkward.

So is there any way to make this better? Maybe. Jacob had a great idea about using attributes to name the strategies. What if we took that further? What if the Dao was proxied to look like this:

public class ProxyDao : RealUserDao
{
  public IList<UserEntity> QueryAllUsers()
  {
    using (session.PushStrategyContext("RealUserDao.QueryAllUsers"))
    {
      return base.QueryAllUsers()
    }
  }
}

Now you can proxy any layer you want with this and the strategy will be automatically named without any stack trace magic. (I think proxy magic is A-OK :) ) This solution keeps getting more and more complicated... I think it would make sense to support the simple string naming from the api level and let the crazy people do the proxy stuff.

There's still other issues however, but I'll post more in a bit.. distracted by tv right now.

by Aaron on Friday, June 01, 2007 6:15:47 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [4]  |  Trackback

OK, let me start out by saying that I'm hardly an expert on the subject of ORMs. I've only really used NHibernate, and I've still got plenty to learn about it. That said, I do know a bit about being lazy...

When working in any real application you quickly realize that actually exercising lazy loading is something that should probably be avoided at all costs. You want to fetch everything you need to handle a request in one go. Scott Bellware recently struggled with this issue as we all have, and the general consensus in the comments is that you should load what you need and not lazy load anything (unless of course that lazy load happens in a branch less often traveled). Udi Dahan provided a link to a rather elegant solution to the problem that I think I'm definitely going to explore.

Even with Udi's solution, you're still required to keep the fetching strategy in sync with the domain. This isn't an incredibly difficult thing to do, but it requires discipline and effort. I've been thinking for a while about this and a few other possible performance tweaks that can make ORMs easier to use and even a bit quicker than they are today.

Let's talk about another performance issue briefly. Say you want to Display an email list containing all of your users. The list should display just their username and email address, nothing more. The easy way to do fetch the data would be to do a "from UserEntity". But is that the most efficient? What if UserEntity has 10 fields? 20? 30? Is it worth explicitly querying for "u.Username u.Email from UserEntity u"? Let's find out. Being lazy, I simply borrowed James Avery's Northwind NHibernate example, ported it to NHibernate 1.2 and added a new mapping and class called TrimmedOrder. The class is exactly the same, but the mapping only contains the Freight and ShipName properties. I also commented out the many-to-ones in both Order and TrimmedOrder. You can get the source here.

The rest was just a few simple tests. One for reading, and one for writing. Here are the results:

  Order TrimmedOrder Performance Gain
Read Test 18.62 ms 11.66 ms 1.6 times faster
Write Test 227.13 ms 161.24 ms 1.41 times faster

Notes: The Read Test was averaged over 50 iterations and the Write test was averaged over 20 iterations. The db was queried for all orders once before anything was tested because I found the first query was always slower, but after that all queries seemed level. In the writing test, I wrote random values to the two properties mapped by TrimmedOrder. The Write Test values were adjusted by subtracting the read test values so that only the write itself was timed (in theory). 

Now, these gains aren't anything to write home about, but they are significant. Some of you may wonder why the Write Test sees any gain given that the database calls would be exactly the same, as NHibernate doesn't update columns that didn't change. Well, without profiling, I'm willing to bet that the reason is simply that NHibernate has discover which fields changed while flushing by testing them against their old values.

So now we have two problems: we have to know exactly what collections the domain needs ahead of time or we lazy load and unless we use projection queries, we're not getting the best performance we can get. Also, NHibernate doesn't currently support deferred (lazy) property loading, so projection queries return object[]'s rather than partially initialized entities, which means that as far as I know, you cannot use projection queries to do updates, so we just can't solve the Write problem without additional mappings or something else hackish.

What do we do? Yes, I'm finally nearing the point of this post. What if we taught the ORM to learn? Bear with me here. Let's go back to the email list example. If we were using Udi's Fetching Strategy approach, we'd do something like this:

IList<UserEntity> users = Repository<IEmailListService>.FindAll();

Which would ultimately get a FetchingStrategy associated with this service. Now what if every object fetched by this FetchingStrategy was proxied and would communicate back to the FetchingStrategy about its usages... and the FetchingStrategy remembered how its objects were used, so the next time it got invoked it would only return what was used last time... and it would keep learning, and refining so that it would simply return all the fields and collections that were ever accessed via that particular FetchingStrategy. If ever it returned too little, it would just lazy load what needed to be loaded and make note that it should provide that field next time (of course this would be configurable so that it would just load everything on a miss so you're not hitting the db several times in these cases). This would mean the first time your FetchingStrategy was invoked it would be "slower" but most every time after that it will have adapted and improved... all for mostly free. And yes, I know that Udi's FetchingStrategy would just return an hql query and live closer to the domain objects, but mine would live closer to NHibernate and it would probably be responsible for actually querying from NHibernate (so it's probably really a Dao, but you get my point.)

There are probably a few other things this could help with too, such as marking queries as readonly automatically so as to avoid an unnecessary and costly flush (unfortunately this is not currently supported any ways).

Caveats? Plenty I'm sure. Here's some I can think of:

  • There's the fact that it's not implemented. Jacob and I started to work on it and quickly decided it would be best to get lazy loaded properties hashed out first. It's probably a decent ways off the NHibernate radar, but maybe we can change that.
  • Then there's the proxy element. The Entities are often proxied normally to lazy load collections and such, so I don't think adding in reporting back to the FetchingStrategy would be a huge burden.
  • Complexity... yeah, this is complicated, but so are ORMs and software :)
  • Increased startup cost (kind of). You could always persist the strategies to help alleviate this a little...
  • Instance variables, access strategies, etc. Much like lazy loaded collections, restrictions would apply to lazy loaded properties and such. You'd probably need to just always load things that are field accessed, and you'd certainly have to avoid accessing the instance variables that back deferred properties.
  • Objects fetched by multiple strategies. This is a big one. There will be tons of questions when it comes time to solve this problem. If I access property A after an object is fetched from two different strategies, do I notify both strategies? Will data be fetched twice if both strategies need it?
  • Caching complications. Most of these would come from the lazy loaded properties, so that's something that'd have to be tackled eventually.

Now I'm just waiting for Gavin King or Sergey Koshcheyev to come tell me why this is a horrible idea, but until then, what do you all think?

by Aaron on Friday, June 01, 2007 9:12:39 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [5]  |  Trackback
 Tuesday, March 27, 2007
using NUnit.Framework;

[TestFixture]
public class OperatorExplicitTests
{
  public interface IFoo { }
  public sealed class Foo : IFoo { }

  public class Bar : IFoo
  {
    private Foo _foo = new Foo();

    public static explicit operator Foo(Bar bar)
    {
      return bar._foo;
    }
  }

  [Test] // Passes
  public void TestCastFromClass()
  {
    Bar bar = new Bar();
    Foo foo = (Foo)bar;
  }

  [Test] // Passes
  public void TestCastFromInterfaceToNormal()
  {
    IFoo bar = new Bar();
    Foo foo = (Foo)(Bar)bar;
  }

  [Test] // Throws System.InvalidCastException: Unable to cast object of type 'Bar' to type 'Foo'.
  public void TestCastFromInterface()
  {
    IFoo bar = new Bar();
    Foo foo = (Foo)bar;
  }

}

Ugh, see what happens when you try to be sneaky? Is this a bug? A limitation?

If you're curious why I care, we have a custom implentation of IDbCommand that wraps a SqlCommand and this code expects it to be a SqlCommand... no fooling it I guess :/

by Aaron on Tuesday, March 27, 2007 11:15:48 PM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [1]  |  Trackback