Optimizing Entity Framework Using View-Backed Entities

The Problem

I was profiling a Web application built on Entity Framework 6 and MVC 5, using the excellent Glimpse. I found that a page with three lists of five entities each was causing over a hundred query executions, eventually loading a huge object graph with hundreds of entities. I could eliminate the round trips using Include(), but that still left me loading way too much data when all I needed was aggregate/summary data.

The problem was that the aggregates I needed were complex and involved calculated properties, some of which were based on aggregates of navigation collection properties: a parent had sums of its children's properties, which in turn had sums of their children's properties, and in some cases parents had properties that were calculated partly based on aggregates of children's properties. You can see how this quickly spun out of control.

My requirements were that the solution had to perform better, at returning the same data, while allowing me to use standard entity framework, code first, with migrations. My solution was to calculate this data on the server side, using entities backed by views that did the joining, grouping, and aggregation.

The Solution

First I set about creating the views themselves, using DB-side logic to perform the aggregation and math I needed to get the statistics I needed for each entity. I basically went through the existing calculated properties and replicated their behavior in SQL, returning the results as columns in the view. I created one view for each type that I would be optimizing this way, plus one additional view that held aggregations shared between the other views (in order to keep the views DRY).

I then created the entities that would represent the views, using unit tests to ensure that the properties now calculated on the server matched expected values the same way that the original, app-calculated properties did. Creating entities backed by views is fairly straightforward; they behave just like tables, but obviously can't be modified - I made the property setters protected to enforce this at compile time. Because my View includes an entry for every "real" entity, any query against the entity type can be cast to the View-backed type and it will pull full statistics (there is no possibility of an entity existing in the base table but not in the view).

A Small Misstep

At first I tried to create a one to one association between the now bare entity type and the view type holding the aggregate statistics. The only ID I had for the view was the ID of the raw entity it was connected to. This turned out to be easier said than done - entity framework expects that, in a one to one relationship, it will be managing the ID at one end of the relationship; in my case, the ID's at both ends were DB-generated, even though they were guaranteed to match (since the ID in the view was pulled directly from the ID in the entity table).

I ended up abandoning the one-to-one mapping idea after a couple days' struggle, instead opting to map the statistics objects as subclasses of the real types in a table per type structure. This turned out to be relatively easy to accomplish - I added a table attribute to the sub type, giving the name of the view, and it was off to the races. I went through updating references to the calculated fields throughout LINQ queries, views, and unit tests. The unit and integration tests proved very helpful in validating the output of the views and offering confidence in the changes.

The Results

I then ran my benchmarks again and found that pages that had required over a hundred queries to generate now used only ten to twenty, and were rendering in half to a third the time - a one to two hundred percent improvement, using views designed purely to mimic the existing functionality - I hadn't even gone about optimizing them for performance yet!

After benchmarking, it looks even better (times are in milliseconds, min/avg/max):

  EF + LINQ EF + Views
3 lists of 5 entities (3 types) 360/785/1675 60/105/675
2 lists of 6 entities (1 type) 325/790/1935 90/140/740
1 entity's details + 1 list of 50 entities 465/975/2685 90/140/650

These tests were conducted by running Apache JMeter on my own machine against the application running on Microsoft Azure, across a sampling of 500 requests per page per run. That's a phenomenal 450 to 650 percent improvement across the board on the most intensive pages in the application, and has them all responding to 100% of requests in under 1 second. The performance gap will only widen as data sets grow; using views will make the scaling much more linear.

I'm very pleased with the performance improvement I've gotten. Calculating fields on the app side works for prototyping, but it just can't meet the efficiency requirements of a production application. View-backed entities came to the rescue in a big way. Give it a try!