Repositories and Entity Framework

One of the deadly sins for a developer is writing code without real purpose. It's an easy trap to fall into. Even with the fantastic frameworks available to us today, there's always a layer of cruft, the setup code every application needs, that must be added. Over time, adding it becomes rote: something you just do, without thinking about why you're doing it. One of those things is creating a data access layer (DAL) class library with a handful of repositories.

It doesn't help that this behavior is reinforced by a shocking number of books, blog posts, and tutorials, many of them directly from Microsoft. When I was a bright-eyed newb C# developer, I, like many others, pored over these resources and began building my own repositories likewise. However, it didn't take long before I started to feel something wasn't quite right. Looking at my repo code, virtually every method simply proxied to a strikingly similar method in Entity Framework. My unit of work classes quickly grew out of control with a series of virtually identical fields and properties, differing only in the entity type involved and a slightly different name. At the time, my programming experience was mostly with Ruby and Python, and coming from frameworks like Ruby on Rails and Django, I found this setup extremely heavy, finicky, and brittle.
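To make that proxying concrete, here is a minimal sketch. It's written in Python rather than C# so it runs without Entity Framework: `FakeDbSet` is a hypothetical in-memory stand-in for EF's `DbSet<T>`, and `ProductRepository` and its method names are illustrative, not anyone's real API. Notice that every repository method is a one-line forward to something the set already does.

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str


class FakeDbSet:
    """Hypothetical in-memory stand-in for an ORM entity set such as EF's DbSet<T>."""

    def __init__(self):
        self._rows = {}

    def find(self, key):
        return self._rows.get(key)

    def add(self, entity):
        self._rows[entity.id] = entity

    def remove(self, entity):
        self._rows.pop(entity.id, None)

    def to_list(self):
        return list(self._rows.values())


class ProductRepository:
    """A typical pass-through repository: every method just forwards to the set."""

    def __init__(self, products):
        self._products = products

    def get_by_id(self, key):
        return self._products.find(key)

    def insert(self, product):
        self._products.add(product)

    def delete(self, product):
        self._products.remove(product)

    def get_all(self):
        return self._products.to_list()


products = FakeDbSet()
repo = ProductRepository(products)
repo.insert(Product(1, "Widget"))
print(repo.get_by_id(1) is products.find(1))  # True: same object, nothing added
```

Multiply this by every entity in the model and you get the pile of near-identical classes described above.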

As my skills in C# grew, I began trying to smooth over some of this uneasiness. I eventually created an abstraction that I referred to as a "truly generic repository", in contrast to the "generic repository" used as a base for each of the ridiculous number of concrete repository implementations I was creating for each app. I even wrote a series of articles here on my blog detailing this setup, which you can peruse if you want. (Frankly, you should just hold off for a bit and not go look at those; in a few paragraphs I'll detail why it's crap.)

This was an improvement: I no longer had multiple repositories, and my one repository essentially merged with my unit of work. With this truly generic repository, you could work with any entity in a particular context without having to create and instantiate a separate class for each. Since there were no longer multiple repositories to coordinate, the separate unit of work class went away. It was fantastic, until it wasn't.
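The general shape of such a "truly generic repository" can be sketched roughly as follows. This is a simplified illustration of the idea, not the code from those earlier articles; it's in Python so it runs standalone, and the dictionaries are hypothetical stand-ins for the database and for change tracking. One class handles any entity type and doubles as the unit of work, which, not coincidentally, is close to what EF's own `DbContext` already offers through `Set<TEntity>()` and `SaveChanges()`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class Customer:
    id: int
    name: str


@dataclass
class Order:
    id: int
    customer_id: int


class GenericRepository:
    """One repository for every entity type, doubling as the unit of work."""

    def __init__(self) -> None:
        # Stand-in for the database: entity type -> {primary key -> entity}.
        self._committed: Dict[type, Dict[int, Any]] = {}
        # Stand-in for change tracking: entities staged but not yet saved.
        self._pending: List[Any] = []

    def get(self, entity_type: Type[T], key: int) -> Optional[T]:
        return self._committed.get(entity_type, {}).get(key)

    def add(self, entity: Any) -> None:
        # Stage the change; it is not visible to get() until save() runs.
        self._pending.append(entity)

    def save(self) -> int:
        # Commit every staged entity at once and report how many were written.
        for entity in self._pending:
            self._committed.setdefault(type(entity), {})[entity.id] = entity
        written = len(self._pending)
        self._pending.clear()
        return written


repo = GenericRepository()
repo.add(Customer(1, "Ada"))
repo.add(Order(10, customer_id=1))
print(repo.get(Customer, 1))  # None, because nothing has been saved yet
print(repo.save())            # 2
print(repo.get(Order, 10))    # Order(id=10, customer_id=1)
```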

As I started working on more advanced applications, I began to find failings. Interestingly, all of them were related to managing Entity Framework under the hood. There were issues with its change tracking, issues with its object graph, and performance issues caused by the mere fact that the repo wasn't letting Entity Framework optimize things as well as it could on its own. That's when it began to hit me very clearly: the whole approach of trying to put some layer over Entity Framework was wrong. That moment was like sunshine breaking through a crack in the curtains of my mind. Which brings me full circle, back to the start of this post.

The repository pattern, like all design patterns, exists to solve a specific problem. What is that problem? Back in the heady days before ORMs, MVC, and the myriad other acronyms, you had spaghetti code. Particularly with Web Forms, you'd usually have a single file with HTML, C# or VB code, and SQL queries all interspersed throughout. If you were a good developer, you might shuffle the bulk of this into the "code behind", but even there you'd often still have elaborate machinations for building SQL query strings. This kind of code proved awful for a host of reasons. It was brittle: easy to break and hard to debug. It was nearly impossible to maintain. It was immensely inefficient. So, what is a friendly neighborhood developer to do? Well, you add an abstraction. You factor all that SQL construction out into a separate class: your repository. And, since this thing was rather useful and could work with multiple different types of things, that class went into a library. And as long as you're putting it there, you might as well start moving all your database stuff there too. The DAL was born.
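As a rough illustration of that original, legitimate use, here is a repository over hand-written SQL. It's a Python sketch rather than C#, with the standard library's `sqlite3` module standing in for ADO.NET; `ProductRepository` and its methods are hypothetical names. The point is that callers ask for products in domain terms and never touch a SQL string.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    id: int
    name: str
    price: float


class ProductRepository:
    """Keeps all product SQL in one place, out of the page/UI code."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get_by_id(self, product_id: int) -> Optional[Product]:
        row = self._conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        return Product(*row) if row else None

    def find_cheaper_than(self, max_price: float) -> List[Product]:
        rows = self._conn.execute(
            "SELECT id, name, price FROM products WHERE price < ? ORDER BY price",
            (max_price,),
        ).fetchall()
        return [Product(*r) for r in rows]

    def add(self, product: Product) -> None:
        # Parameterized query: no string concatenation, no SQL injection.
        self._conn.execute(
            "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
            (product.id, product.name, product.price),
        )
        self._conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
repo = ProductRepository(conn)
repo.add(Product(1, "Widget", 9.99))
repo.add(Product(2, "Gadget", 24.50))
print(repo.find_cheaper_than(20.0))  # [Product(id=1, name='Widget', price=9.99)]
```

Unlike a repository layered on top of an ORM, every method here does real work that nothing else in the stack would do for you.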

It's not that this pattern is incorrect in any way. If you need to abstract a bunch of low-level data store access code away, the repository pattern will serve you as well today as it ever has. The problem comes with the introduction of ORMs. An ORM, or object-relational mapper, is by its very definition the repository pattern taken to its natural conclusion. The entire concept of a database is abstracted away, replaced with pure code. All the messy work of doing CRUD operations is neatly tucked under the hood, and if you wanted to, you could forget SQL was even a thing (though a good developer never would).

More to the point, these ORMs almost always employ the repository and unit of work patterns themselves, because of course they would. They're patterns for a reason, and since there's low-level data store work to do, they're there to save the day. None of this is really controversial or even new information. What is controversial, and what will likely buy me more than a few flames in the comments below, is this: an ORM is your data layer.

Really stop and think about it. If you were to set out today to create an application that needed database access, and you were told that you could only use something like ADO.NET, what would you do? You'd create a class library. You'd create a generic abstract repository class. You'd create concrete derivations of that class. You'd create a unit of work that possessed properties for each of these concrete derivations. If you were really industrious, you might go on to implement some sort of change tracking system, model an object graph, provide support for programmatic joins across repositories, etc. Eventually, if you spent enough time, energy, and effort, you'd end up with something that looked very much like Entity Framework. And you would definitely have reinvented the wheel.
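Even a bare-bones version of that hand-rolled stack takes a fair amount of ceremony. The sketch below is in Python, with the standard library's `sqlite3` module standing in for ADO.NET; all class and method names are hypothetical. It gives you a unit of work with one property per repository and a shared transaction, and it still has no change tracking, no object graph, and no cross-repository joins.

```python
import sqlite3


class CustomerRepository:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def add(self, customer_id: int, name: str) -> None:
        self._conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))


class OrderRepository:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def add(self, order_id: int, customer_id: int) -> None:
        self._conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


class UnitOfWork:
    """Hand-rolled unit of work: one property per repository, one shared transaction."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self.customers = CustomerRepository(conn)
        self.orders = OrderRepository(conn)

    def commit(self) -> None:
        self._conn.commit()

    def rollback(self) -> None:
        self._conn.rollback()


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);"
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);"
)
uow = UnitOfWork(conn)
uow.customers.add(1, "Ada")
uow.orders.add(10, 1)
uow.commit()              # both inserts land together
uow.orders.add(11, 1)
uow.rollback()            # the uncommitted order is discarded
print(uow.orders.count())  # 1
```

Every new entity means another repository class and another property on `UnitOfWork`, which is exactly the growth described earlier.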

If you choose to use an ORM like Entity Framework, you are simply opting to use a DAL class library created by a third party instead of one you created yourself. When you really think about that, it's more than a little mind-blowing, though it shouldn't be. Most developers have no issue dropping in third-party libraries for functionality they need. When was the last time you coded a routing framework or a view templating system? Why, then, do we feel the need to hand-code data access?

This, then, is normally the point where someone chimes in to ask the perennial question: "But what if I want to replace Entity Framework with something else later? Isn't it better to abstract the dependency away?" To which I'll reply: "Yes, if you actually abstract it away." That is the rub, because wrapping something like EF in a repository does not. Your app code is still littered with LINQ queries and the like that likely won't work if you switch ORMs, whether or not they sit behind a repository. You still have a hard dependency on EF. You still have to do things like set up your EF context to be injected into your repo. In short, changing ORMs is still going to require app code changes.

Additionally, this is mostly a moot question anyway, as I can almost guarantee that you will never actually change ORMs. No matter what you do, switching out something like an ORM is going to require massive code changes. Whether those changes happen in your application or in a class library makes little difference. The ROI of such changes will typically be extremely low: Entity Framework isn't going anywhere, so what's the rationale for switching to something like, say, NHibernate? Because you like it better? Unless you can make a strong business case for the change, it simply will not be a business priority, and if it's not a business priority, it's not happening.

Finally, and this is really the death blow to the whole pro-repository argument: you can get abstraction via other methods, and those methods generally do a better job than the repository pattern at truly abstracting EF while providing additional benefits besides. CQRS (command query responsibility segregation), the service layer pattern, and especially microservices all offer enormous additional benefits to your application and truly abstract away the entire data dependency.
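A rough sketch of the service layer idea, in Python so it runs standalone, with every name hypothetical: application code such as `checkout` depends on an `OrderService` contract expressed in business operations, not on entity sets or queries. The implementation behind it (here an in-memory one; in a real app perhaps one that uses the EF context directly inside) can change without touching callers, because no query or ORM type crosses the boundary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Order:
    id: int
    customer_id: int
    items: List[str] = field(default_factory=list)


class OrderService(Protocol):
    """What the app depends on: business operations, not data access."""

    def place_order(self, customer_id: int, items: List[str]) -> int: ...

    def orders_for(self, customer_id: int) -> List[Order]: ...


class InMemoryOrderService:
    """One hypothetical implementation; another could use an ORM internally."""

    def __init__(self) -> None:
        self._orders: Dict[int, Order] = {}
        self._next_id = 1

    def place_order(self, customer_id: int, items: List[str]) -> int:
        if not items:
            raise ValueError("an order needs at least one item")
        order = Order(self._next_id, customer_id, list(items))
        self._orders[order.id] = order
        self._next_id += 1
        return order.id

    def orders_for(self, customer_id: int) -> List[Order]:
        return [o for o in self._orders.values() if o.customer_id == customer_id]


def checkout(service: OrderService, customer_id: int, cart: List[str]) -> str:
    # App code sees only the service contract: no queries, no context, no ORM types.
    order_id = service.place_order(customer_id, cart)
    return f"Order {order_id} placed"


svc = InMemoryOrderService()
print(checkout(svc, customer_id=7, cart=["widget", "gadget"]))  # Order 1 placed
```

Because the contract is defined by what the app needs to do rather than by tables or entity sets, swapping the data technology stays inside the implementation.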

The long and short of it: if you're using an ORM like Entity Framework, the repository pattern is just wrong. Either use the ORM directly in your app code, or use a pattern that does something truly beneficial for your app. It's time to stop doing things just because that's the way you've always done them. Think about why you're actually doing something, and consider whether it still makes sense. The repository pattern does not.
