Data Access Layers... Who Needs Them?


REVELATION TECHNOLOGIES
July 7, 2004

What's a Data Access Layer and Why Do I Need One?

Not everyone needs a data access layer (DAL). There are times when a DAL can save you development time and effort and increase reliability and maintainability. And then there are times when a DAL will only slow you down and bloat your code. This article will explain the usefulness of a DAL and help you decide whether you really need one and which type is right for you.

If all your program does is collect and display data, chances are you don't need much of a data access layer. You're probably connecting your UI component directly to your database. For example, if you're using .NET technologies, your DataGrid is bound to a DataSet, which updates your database. If that's all you're doing, that's probably the best way to do it. If, however, you need to manipulate data, do calculations or otherwise use object oriented techniques, you could probably benefit from a data access layer.

Object Oriented vs. Relational Design

You can write object oriented code in many different languages including C#, VB.NET, C++ and Java. These languages all have one thing in common: they allow you to model your programs after real-world metaphors. For example, a retail store may have real-world business objects such as Customers, Catalogs and Products. Each of these objects can be represented in software as a class. Classes define properties and methods. Properties are the characteristics or data that represent a particular instance of the class. A Customer class may define properties like account number, name, address, credit card number, etc. Each instance of the customer class may have different values for each of these properties. A Customer class may also define methods like buyProduct, viewOrders, etc. These are the actions that a customer object may take. These do not change from one instance to another. When it comes time to save the information about a customer only the properties need to be saved. It is these properties that make customers unique.

When you design your relational database you typically use a similar technique. You find the main business objects (e.g., Customer, Catalog, Product, etc.) and create a table for each. Within each table you create a column for each property (e.g., account number, name, address, credit card number, etc). This becomes the first draft of your logical data model. If it ended here, things would be easy. There would be a one-to-one correlation between your classes and your tables. Sometimes this happens, but not always. After applying some performance optimizations, normalization and other techniques you end up with your physical data model. This physical (actual) data model may differ somewhat from the logical model that you started with. It is the physical model that your code must talk to.

The Job of the Data Access Layer

Having object oriented code on one side and a relational database on the other, your job is to somehow connect the two so that data (properties) that exist in the objects can be saved to the database tables. Likewise, data in the tables needs to be loaded into objects in code. This is what the data access layer does. It provides the communication between the code and the database.

So what's the big deal? Don't all these languages (or supporting platforms) provide data access mechanisms? Yes they do. If you do your coding in the Microsoft® arena you use ADO or ADO.NET. If you're in the Java realm you use JDBC. If you're writing C++ for UNIX you may be using Embedded SQL for C (ESQL-C) or something similar. Other languages and platforms have their own mechanisms as well. But these all share one common theme: you have to write SQL. Well, that's not so bad. Loading an object from the database requires a SELECT statement. Saving requires an INSERT the first time and an UPDATE thereafter for any given object. Deleting an object requires a DELETE. So each class you define requires four SQL statements. You can easily handle all your data access needs by writing these four statements for each class you create. This will be your data access layer. It doesn't really need to be its own physical layer. It can simply be four methods on each class.

How Code Generators Can Help

Four methods per class may not seem like a lot. But given a program with dozens of classes, this can become tedious, not to mention error prone. If you copy and paste from one class to another you will invariably forget to change something somewhere and spend a few hours debugging it. To reduce errors many programmers seek the aid of DAL code generators. Code generators-whether home grown or purchased-are a good way of creating a lot of error-free "boiler plate" code. What these generally do is examine the database and generate a class file for each table they find. The class has a property for each column in the table. The class also contains methods to load, add, update and delete the object. After the code generator has generated these class files you are free to modify them as you see fit to make them fit your business model.

There are a few problems that can arise, however, when working with generated code. First, as mentioned earlier, your business model may not match one-to-one with your physical database model. You may need to make some modifications to your generated classes to adapt them to your needs. These classes also will not have any business-specific methods in them such as viewOrders() or buyProduct(). You will need to add these yourself after they are generated. Second, if your database model changes (you add or remove a column, change a data type, etc.) your generated class file is now out-of-sync. Depending on how sophisticated your code generator is, it may or may not be able to apply the database changes while preserving your changes. You may need to regenerate the code and manually reapply your changes. This can become tedious and error prone. To avoid this situation you may choose to implement your business layer separate from your data access layer. In other words, let your data access code be just that. These generated classes remain as-is. You don't modify them. They simply represent the data. Instead, you add your business logic to a separate set of classes that interact with these data classes. This also allows you to map the business objects to the data objects any special way needed. Now if the database changes you simply regenerate the entire data layer, and then modify the business layer wherever necessary. That's much easier and cleaner.

You can see how DAL code generators can save you a lot of development time, shield you from having to know SQL and provide reliable code. There are only two things that could make them better. First, if you could get the flexibility offered by separate business and data access layers with only one layer, and second if you didn't need to run a code generator every time the database changed. Let's discuss each of these.

Problems With Code Generators

As noted above, the "best" way to use a code generated DAL is to leave the generated classes as-is and build a separate business layer on top of it. This allows you to regenerate the data classes any time you wish without affecting the business objects. If you change your data model you can regenerate and recompile your data access layer independently. The business objects then need to be modified to make use of the new or changed data. But these can be changed independently, leaving less modified code to be tested. The downside to this is that you now have two layers of objects instead of one. Rather than the business object simply getting the data it needs it must instantiate and communicate with another object that does the data access. More code means bigger and harder to maintain.

The other downside is the fact that you must regenerate and recompile code whenever the data model changes. Newly compiled code needs to be tested. If your business layer doesn't need the changes (maybe your DBA changed a data type of a column your program is not using) you don't need to regenerate the DAL. But then you run the risk of forgetting this change and later building functionality on top of a DAL that is not in sync with the database. So every time the database changes-whether you need the change or not-you should regenerate and recompile (and re-test) the DAL. That's a lot of extra work.

Alternatives to Code Generators

If you're willing to live with these inconveniences then a generated DAL is just what you need. If you want a DAL that is not the product of a code generator and you don't want to write and maintain all the data access code manually you can benefit from a product by Revelation Technologies called Dali.

Dali is an intelligent data access layer. Dali analyzes your database structure and class model at runtime and determines how your classes' properties map to your database tables' columns. Dali does not require a traditional data access layer with classes mapping directly to tables. Dali is intelligent in that it makes assumptions and searches for hints in your business layer code to determine how your business objects relate to database tables. There is no code generation required and only minor code changes, if any, needed to whatever existing business classes you already have. Simple methods like Load() and Save() do all the work. If that sounds too good to be true, you really need to try it out for yourself. Complete documentation as well as a trial version of Dali are available from Revelation Technologies' website at www.revtechnologies.com. Dali is currently only available for the Microsoft® .NET Framework.

Conclusion

Whether or not you need a data access layer depends mostly on how you use your data. If you simply gather and display data, a DAL may be overkill. If objects are your thing a DAL can save you a lot of time. If you work in a changing environment, a DAL that does not require code generation can save you a lot of unnecessary work.