Sunday, July 27, 2014

Object-Relational Mapping (ORM)

ORM stands for Object-Relational Mapping. ORM is an attempt to bridge the object world and the relational world so that the two can talk to each other easily. Almost every non-trivial application has a database behind it, and Java applications are no exception. In fact, if we look closely at an application, we will see that it is more or less modeled around its data model. Among database technologies, relational databases are the clear winner; other database technologies have come and gone. The relational model of data management was introduced by E.F. Codd in 1970.

An analogy for the relational model can be drawn with spreadsheets. Each sheet represents a table, and the columns in the sheet represent the table's attributes. Each instance of data is represented by a row. Data in different sheets is connected by referring to a data point using the sheet number, column number, and row number. This corresponds to what database technology calls a foreign key relationship. In fact, most GUI tools for databases show the data in a spreadsheet format.

To interact with the database, Structured Query Language (SQL) has emerged as the standard. The SQL standard is maintained by ANSI and ISO. However, vendors still ship proprietary variations of it. SQL provides two kinds of statements:

  • Data Definition Language (DDL): Provides ways to create and alter tables.
  • Data Manipulation Language (DML): Provides ways to manipulate and retrieve data. It includes inserting, updating, and deleting data.

To interact with the database, an application has to issue SQL to it. How SQL is issued is proprietary to each database: each has its own API for this, and those APIs might be written in different languages. For example, a database written in C might expose a C-based API. Since independence from any one database is considered a virtue for an application, it would be a lot of work for an application developer to learn the interface of each database and code against it. To solve this kind of problem, Java provides the JDBC API.

JDBC is the most popular way of connecting to databases in Java. It is an interface-based API, with the implementation for each database supplied by that database's driver. Though JDBC is very popular, it is inherently relational in nature. The basic problem is a conceptual mismatch between relational technology and object-oriented technology. Since Java is an object-oriented language, this mismatch has to be dealt with. It is known as the object-relational mismatch (also called the object-relational impedance mismatch), and ORM tries to solve it.

Let's look at the kinds of mismatch involved:

Inheritance

Java supports inheritance. For example, we might have a User class from which the Student and Teacher classes are derived.

User

public class User{
   private String name;

   //Setters and getters
}

Student

public class Student extends User{

    private double percentage;
   
    //Setter and Getter
}

Teacher

public class Teacher extends User{
    private int experienceYears;

    //Setters and Getters
}

Now think for a moment about how you would map these classes to a table structure. ORM frameworks adopt different strategies to solve this, which are covered in the Hibernate section.
With inheritance also comes a mismatch in terms of polymorphism. A reference of type User can refer to a Student or a Teacher object, and a list might contain a mix of Teacher, Student, and User objects. How do you build such a list by querying the database? The ORM framework has to somehow work out whether each row belongs to a User, a Student, or a Teacher.
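
To make the polymorphism problem concrete, here is a minimal sketch of one possible strategy: a single table with a discriminator column. The column names USER_TYPE, PERCENTAGE, and EXPERIENCE_YEARS and the UserRowMapper helper are assumptions made up for this illustration, a row is modeled as a plain Map, and the classes use package-private fields instead of setters and getters for brevity. This is not how any particular framework implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified versions of the classes above; fields are package-private for brevity.
class User {
    String name;
}

class Student extends User {
    double percentage;
}

class Teacher extends User {
    int experienceYears;
}

class UserRowMapper {
    // Turns one row of a hypothetical single USER table into the matching subclass.
    // USER_TYPE is an assumed discriminator column; the original schema does not define it.
    static User fromRow(Map<String, Object> row) {
        String type = (String) row.get("USER_TYPE");
        User user;
        if ("STUDENT".equals(type)) {
            Student student = new Student();
            student.percentage = (Double) row.get("PERCENTAGE");
            user = student;
        } else if ("TEACHER".equals(type)) {
            Teacher teacher = new Teacher();
            teacher.experienceYears = (Integer) row.get("EXPERIENCE_YEARS");
            user = teacher;
        } else {
            user = new User();
        }
        user.name = (String) row.get("NAME");
        return user;
    }

    // Builds a mixed list of User, Student, and Teacher objects from query results.
    static List<User> fromRows(List<Map<String, Object>> rows) {
        List<User> users = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            users.add(fromRow(row));
        }
        return users;
    }
}
```

The caller only ever sees a List of User; the discriminator value decides which concrete class each row becomes.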

Granularity

The granularity problem arises when the number of classes does not match the number of tables they map to. For example, let's say we have a User class which has an Address object:

public class User{
   private String name;
   private Address address;

   //Setters and getters
}

Address

public class Address{
    private String city;
    private String country;

    //Setters and getters
}

The table structure for User is:

Table USER:

 NAME
 CITY
 COUNTRY

There is one table, but the data sits in two objects. The same problem can occur the other way round, where you have two tables and one class containing all the data points. ORM frameworks have to take care of this mismatch between the number of tables and the number of classes mapped to them.
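
As a rough sketch of what that hand-written mapping looks like for the USER table above, the code below splits one row into a User and an Address and flattens them back again. The UserTableMapper class is a hypothetical helper for this illustration, the row is modeled as a plain Map, and the fields are package-private to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Address {
    String city;
    String country;
}

class User {
    String name;
    Address address;
}

class UserTableMapper {
    // One USER row (NAME, CITY, COUNTRY) becomes two objects.
    static User fromRow(Map<String, Object> row) {
        Address address = new Address();
        address.city = (String) row.get("CITY");
        address.country = (String) row.get("COUNTRY");

        User user = new User();
        user.name = (String) row.get("NAME");
        user.address = address;
        return user;
    }

    // Two objects are flattened back into the columns of one row.
    static Map<String, Object> toRow(User user) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("NAME", user.name);
        row.put("CITY", user.address == null ? null : user.address.city);
        row.put("COUNTRY", user.address == null ? null : user.address.country);
        return row;
    }
}
```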

Identity and Equality

In the relational model, identity is defined by the primary key. Given a primary key, you will always retrieve the same row, no matter how many clients are retrieving it. With the right isolation level, all of them will see the same data. In Java, the situation is more complex. The same row might be represented by more than one object in the Java layer. For example, the User row with primary key 1 might be loaded as separate objects in different threads. The question then is which object holds the right data. And if every thread tries to save its copy, who wins? A similar problem arises with equality.

In Java, the default is reference equality: two references are equal only if they point to the same object. The corollary is that two objects representing the same row will compare as different. To solve this we have to implement the equals method (and hashCode along with it), but that is not always trivial. ORM solutions have to provide a way to maintain the correct notion of equality and identity. The frameworks usually ask you to explicitly map the primary key to an attribute.
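
A small sketch of the difference, assuming equality is based on an id attribute mapped to the primary key. This is only one possible choice, it ignores objects that have not been saved yet beyond treating them as unequal, and it is not a recommendation from any framework.

```java
import java.util.Objects;

class User {
    private final Long id;     // assumed to be mapped to the primary key column
    private final String name;

    User(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Two User objects are equal when they represent the same row,
    // that is, when both carry the same non-null primary key.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof User)) {
            return false;
        }
        User that = (User) other;
        return id != null && id.equals(that.id);
    }

    // Must agree with equals: equal objects produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}
```

Two objects loaded separately for row 1 are still different references, so `==` is false, but `equals` now reports that they stand for the same row.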

Association

In Java, a relationship is expressed as an association, that is, an object reference. For example, User holds a reference to Address. In tables, however, the relationship is expressed with a foreign key. Java also has a notion of directionality. For example, you can reach the Address from the User but not the other way round. To navigate the relationship from both sides, you have to put a reference on both sides.

User

public class User{
    private Address address;
    //Setters and Getters
}

Address

public class Address{
    private User user;

    //setters and getters
}

However, there is no notion of directionality in the relational world. The foreign key can be placed at either end and the relationship is established. With SQL you can navigate from either end using a join. For example, with the foreign key placed on the User side, the tables will be:

Table ADDRESS

 ADDRESS_ID
 CITY
 COUNTRY

Table USER

 USER_ID
 NAME
 ADDRESS_ID (FK)

ORM solutions have to deal with these associations while setting and getting the data from the database.
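
The sketch below shows the two views side by side, under some assumptions: the assignAddress and addressForeignKey methods are hypothetical helpers invented here, and the fields are package-private for brevity. On the Java side both references have to be kept in sync by hand, while on the table side a single ADDRESS_ID value in the USER row carries the whole relationship.

```java
class Address {
    Long addressId;   // maps to ADDRESS.ADDRESS_ID
    String city;
    String country;
    User user;        // back-reference, needed only for Address -> User navigation
}

class User {
    Long userId;      // maps to USER.USER_ID
    String name;
    Address address;  // User -> Address navigation

    // Keeps both sides of the bidirectional association consistent.
    void assignAddress(Address newAddress) {
        if (this.address != null) {
            this.address.user = null;   // detach the previous address
        }
        this.address = newAddress;
        if (newAddress != null) {
            newAddress.user = this;
        }
    }

    // The value that would be written to the USER.ADDRESS_ID foreign key column.
    Long addressForeignKey() {
        return address == null ? null : address.addressId;
    }
}
```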

Type Systems

The object-oriented and relational worlds have different type systems. For example, in the relational world a string column can be constrained in size, whereas on the Java side a String reference can point to a string of any length that memory allows. Dates and times are also handled differently. For example, some databases distinguish between date, time, and timestamp, where the timestamp contains both date and time. In Java, java.util.Date wraps a long value that represents both the date and the time. So when you fetch a date or a time from the relational world, how do you map it to a Java type such as Date or Calendar?
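
Here is a small sketch of the conversions involved. It assumes Java 8 or later, so that the java.time classes and the conversion methods on java.sql.Date and java.sql.Timestamp are available; the TypeMapping class and its requireFits helper are hypothetical names for this illustration. The length check counts Java characters, which is a simplification, since some databases measure column size in bytes.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

class TypeMapping {
    // A SQL DATE column carries a date without a time of day.
    static Date toSqlDate(LocalDate date) {
        return Date.valueOf(date);
    }

    static LocalDate fromSqlDate(Date date) {
        return date.toLocalDate();
    }

    // A SQL TIMESTAMP column carries both date and time.
    static Timestamp toSqlTimestamp(LocalDateTime dateTime) {
        return Timestamp.valueOf(dateTime);
    }

    static LocalDateTime fromSqlTimestamp(Timestamp timestamp) {
        return timestamp.toLocalDateTime();
    }

    // Mimics a size-constrained string column such as VARCHAR(maxLength).
    static String requireFits(String value, int maxLength) {
        if (value != null && value.length() > maxLength) {
            throw new IllegalArgumentException(
                "value has " + value.length() + " characters, column allows " + maxLength);
        }
        return value;
    }
}
```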

Databases are Different

There are many popular databases on the market. Although SQL is a standard, SQL support still varies from database to database, and each one has its own vendor extensions and flavors. Though this is not really an ORM issue, it matters for database portability, if that is something you are looking for. Ensuring database portability with plain JDBC is usually a Herculean task if you also want to use a database's individual features for performance or other reasons. ORM frameworks attempt to handle most of the important databases transparently. They also have extension mechanisms if you wish to support another database.
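
Pagination is a common place where the flavors differ, so here is a minimal sketch of the idea behind such an extension mechanism. The PagingDialect interface and its two implementations are hypothetical names invented for this example and are not any framework's API. The LIMIT/OFFSET form is the one accepted by databases such as MySQL and PostgreSQL, and the OFFSET/FETCH form is the SQL:2008 style accepted by, for example, Oracle 12c and later.

```java
// Hypothetical abstraction: each database flavor knows how to page a query.
interface PagingDialect {
    String paginate(String query, int limit, int offset);
}

// LIMIT ... OFFSET ... syntax, as used by MySQL and PostgreSQL.
class LimitOffsetDialect implements PagingDialect {
    @Override
    public String paginate(String query, int limit, int offset) {
        return query + " LIMIT " + limit + " OFFSET " + offset;
    }
}

// OFFSET ... FETCH NEXT ... syntax from the SQL:2008 standard.
class OffsetFetchDialect implements PagingDialect {
    @Override
    public String paginate(String query, int limit, int offset) {
        return query + " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
    }
}
```

The application code asks for "10 rows starting at row 20" in one way, and the dialect chosen for the target database produces the matching SQL.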

ORM frameworks adopt different strategies to solve these kinds of mismatches. They strive to preserve object-world concepts and shield developers from the relational world by taking care of the mappings. This should not be taken as an excuse not to learn relational concepts. On the contrary, to be a good user of ORM frameworks, one should understand how the mapping works and what its implications are. The number one issue in using ORM frameworks is performance, and most of the time it is caused by not understanding how the framework maps to the relational world and by not having a good grasp of relational concepts.

3 comments:

  1. Thanks for nice explanation of basic ORM torments.

    Unfortunately, reference (identity) equality cannot be replaced with object content equality (based on equals), because
    1. two different objects may temporarily have the same content but change later, so that equality might be intermittent;
    2. two objects may contain the same data because the application restricts its view of the problem and (unintentionally) rejects some object properties that differ.

    Replies
    1. That's true... for the same reason we have to be careful about implementing the equals method. One thing Hibernate does smartly is to make sure that only one copy of a managed object is kept in the session. But again, there could be scenarios where we have multiple copies representing the same row. That's where we get into versioning, locking, etc.

  2. I don't understand the part of association (y)
