Writing Better Code -- Keepin' it Cohesive

Matthew Cochran
1y
40.8k
0
0

Article

Part I. Cohesion

When we talk about a method being highly cohesive we mean that it performs one (and only one) piece of logic. Another way to look at it is each unit of code (class, method, etc) should have a single purpose that is easy to understand from a glance. This is a long-standing principle of software engineering (the Single Responsibility Principle). Methods with low cohesion perform multiple procedures and it can be said that they have have multiple purposes.

When we talk about a class being highly cohesive, we mean the class encapsulates one (and only one) piece of business logic as well. For instance, a highly cohesive middle tier would have classes that clearly seperate abstract data transport classes, their instantiation, database functionality and business logic. If we have one monolithic class that does everything our code most likely be considered to have very low cohesion.

We can also talk about the cohesiveness of packages (or assemblies in the CLR world). Each assembly should encapuslate one (and only one) business domain. If we have an assembly that does exposes logging, sending order confirmation emails, performs accounting calculations and washes our dirty dishes it is not very cohesive and is probably going to be a pain to maintain.

Let's look at an example of a huge method with very low cohesion that performs a series of logical sub-steps in an overall process of retrieving data from the database and building data transport objects. My personal rule of thumb is that if a method does not fit on the screen and I have to scroll to see the whole thing, it is probably too long and needs to be refactored.

Here are the steps:

Instantiating a database connection
Instantiating a database command
Opening the connection
Executing the command
Reading the results
Getting the ordinals from the data reader
Getting the record values and Instantiating new instance of object

As you glance through the code below, notice how difficult it is to completely absorb how this method works at first glance. If you give up, just scroll to the end of this code block where we'll start the refactoring.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    // instantiate the connection
    using(SqlConnection connection = newSqlConnection(ConfigurationManager.ConnectionStrings[CONNECTION_STRING_KEY].ConnectionString))
    using(SqlCommand command = new SqlCommand("spGetEmployee", connection)) // instantiate the command
    {
        command.CommandType = CommandType.StoredProcedure;
        // try to open the connection
        try
        {
            connection.Open();
        }
        catch (InvalidOperationException ex)
        {
            // log exception
            throw;
        }
        catch (SqlException ex)
        {
            // log exception
            throw;
        }
        catch (Exception ex)
        {
            // log exception
            throw;
        }

        // execute the reader & read the results
        using(SqlDataReader reader = command.ExecuteReader())
        {
            Int32 m_NameOrdinal, m_DeptOrdinal;

            // get the ordinals
            try
            {
                m_NameOrdinal = reader.GetOrdinal(NAME);
                m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
            }
            catch(System.IndexOutOfRangeException ex)
            {
                // log exception
                throw;
            }

            Collection<Employee> result = new Collection<Employee>();

            // read the results
            while(reader.Read())
            {
                try
               {
                    Employee emp =new Employee();
                    emp.Name = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_NameOrdinal);
                    emp.Department = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_DeptOrdinal);
                    result.Add(emp);
                }
                catch (IndexOutOfRangeException ex)
                {
                    // Log the exception
                    throw;// rethrow or handle gracefully here
                }
                catch (Exception ex)
                {
                    // Log the general exception
                    throw;// rethrow or handle gracefully here
                }
            }

            return result;
        }
    }
}

Whew! That was painful. All we really want to do is understand what the code does quickly and be able to dig in further if we want. If we wanted to test any one step of our process it would be difficult because everything is thrown in the same pot.

Example of High Cohesion

Below, we have a method with the same functionality, but constructed in a highly cohesive manner with no implementation logic in it at all... the one (and only one) purpose of this method is orchestrating a bunch of reusable method calls. As a result we will have less repeated code in the project because there is a lot of logic in each sub-step that is reusable. It has the same functionality as our non-cohesive method... Do you think it is easier to quickly understand?

public IEnumerable<Employee> GetEmployeesFromDb_HighCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    using (SqlDataReader reader = ExecuteReader(command, connection))
    {
        return new MapperFactory()
            .BuildEmployeeMapper()
            .ReadAll(reader);
    }
}

Part II. The Journey: Refactoring Code From Low to High Cohesion.

The first step in refactoring code with low cohesion is to identify what and where the logic steps are. The first thing we'll start looking at is object instantiation. We can refactor all code responsible for instantiating objects into builder methods which often makes a tremendous difference.

The first instantiation we come to is the SqlConnection object. Let's build a factory for our SqlConnection objects. This will have immediate reusable payoffs because we'll be building a new connection every time we need to hit the database.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    // instantiate the connection
    using(SqlConnection connection = newSqlConnection(ConfigurationManager.ConnectionStrings[CONNECTION_STRING_KEY].ConnectionString))
    using(SqlCommand command = new SqlCommand("spGetEmployee", connection)) // instantiate the command
    {
        ... code continues here...

We select the instantiation code and use Visual Studio's (2008) Refactoring tool to extract the method. After our refactoring, we have a very cohesive method that can be reused every time we need a database connection and so we'll have less repeated code in our project. As you can see, with a little formatting, this method is very cohesive and easy to understand at a glance, so if our fellow developers are drilling down from our main method to this one (or we are revieweing code we wrote ourselves a while ago), there should be no confusion as to what the method is actually doing.

As an aside: If you don't have Studio08, check out http://www.jetbrains.com/resharper/ for a really nice plug-in with some great refactorings for Studio05. Their current version for Studio08 does not support LINQ syntax and so is a little messy if you are using the new language features in 3.5, but there is a new release coming in March of this year that I'm looking forward to that is supposed to support LINQ syntax in Studio08.

To get back on subject...This is now what our builder method looks like (with a few line breaks added for extra measure)...

private SqlConnection BuildConnection()
{
    return newSqlConnection(
        ConfigurationManager
            .ConnectionStrings[CONNECTION_STRING_KEY]
            .ConnectionString);
}

And our main method is now a bit simpler.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using(SqlConnection connection = BuildConnection())
    using(SqlCommand command = new SqlCommand("spGetEmployee", connection)) // instantiate the command
    {
        //... code continues here...

The next place we are instantiating an object and can refactor to a builder method is when we build the SqlCommand.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using(SqlCommand command = new SqlCommand("spGetEmployee", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        //-- code continues here --

Following the same process as last time, we'll select the construction section of the code and extract it to a method and we end up with the following:

private static SqlCommand BuildCommand(SqlConnection connection)
{
    return newSqlCommand("spGetEmployee", connection);
}

Which is better, but still can be improved. What we want is a reusable method with one step in our process fully encapsulated. If you think about it, the stored procedure name really should be passed in so we have a more generic "BuildCommand()" method, so we'll make the sproc name a parameter

private static SqlCommand BuildCommand(String cmdText, SqlConnection connection)
{
    return newSqlCommand(cmdText, connection);
}

And we'll modify our source method so it now looks like...

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        //--code continues here--

Better... but still there is room for improvement. The CommandType can really be thought of as a part of building a command and should probably be moved into the builder method and exposed as a parameter.

private static SqlCommand BuildCommand(String cmdText, CommandType cmdType,SqlConnection connection)
{
    SqlCommand cmd =new SqlCommand(cmdText, connection);
    cmd.CommandType = cmdType;
    return cmd;
}

And our calling code is getting much easier to read because BuildCommand() does everything required to build the SqlCommand object.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    {
        // try to open the connection
        try
        {
//--code continues here---

While we're at it.... If we know that we may need parameters at some point in our project, we can preemptively make one further alteration to our method so we can pass them in when they are needed.

private static SqlCommand BuildCommand(String pCommandText, CommandType pCommandType,SqlConnection pSqlConnection,params SqlParameter[] pSqlParameters)
{
    SqlCommand cmd =new SqlCommand(pCommandText, pSqlConnection);
    cmd.CommandType = pCommandType;

    if (null != pSqlParameters && pSqlParameters.Length > 0)
        cmd.Parameters.AddRange(pSqlParameters);

    return cmd;
}

So now we have a very reusable and easy to understand builder method for any SqlCommand object with parameters.

The next object we are instantiating is the SqlReader. Right now, instantiating our reader consists of opening the connection to the db and executing the command. So what we're going to do is extract a method that builds the reader and move the step to safely open the connection to our new method.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    {
        // try to open the connection
        try
        {
             connection.Open();
     }
        catch (InvalidOperationException ex)
        {
             // log exception
             throw;
        }
        catch (SqlException ex)
        {
             // log exception
             throw;
        }
        catch (Exception ex)
        {
             // log exception
             throw;
        }

        // execute the reader & read the results
        using(SqlDataReader reader = command.ExecuteReader())
        {
            Int32 m_NameOrdinal, m_DeptOrdinal;

            // get the ordinals
            try
            {
                m_NameOrdinal = reader.GetOrdinal(NAME);

//-- code continues --

We select the code that instantiates our reader as seen below.

When we extract the method we get the following code.

private static SqlDataReader ExecuteReader(SqlCommand command)
{
    return command.ExecuteReader();
}

But part of executing the reader is actually opening the connection as well so we'll add the connection to our new method's calling parameters and move the code that opens the connection into the new method.

private SqlDataReader ExecuteReader(SqlCommand command, IDbConnection connection)
{
    try
    {
        connection.Open();
        return command.ExecuteReader();
    }
    catch (InvalidOperationException ex)
    {
     // log exception
        throw;
    }
    catch (SqlException ex)
    {
        // log exception
        throw;
    }
    catch (Exception ex)
    {
        // log exception
        throw;
    }
}

One of the nice side effects of this refactoring is that in our calling code we can now remove our nested "usings" and now stack them instead which makes the code much more readable

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    using (SqlDataReader reader = ExecuteReader(command))
    {
        Int32 m_NameOrdinal, m_DeptOrdinal;

        // get the ordinals
        try
        {
            m_NameOrdinal = reader.GetOrdinal(NAME);
            m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
        }
        catch (System.IndexOutOfRangeException ex)
        {
            // log exception
            throw;
        }

        Collection<Employee> result = new Collection<Employee>();

        // read the results
        while (reader.Read())
        {
            try
            {
                Employee emp =new Employee();
                emp.Name = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_NameOrdinal);
                emp.Department = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_DeptOrdinal);
                result.Add(emp);
            }
            catch (IndexOutOfRangeException ex)
            {
                // Log the exception
                throw;// rethrow or handle gracefully here
            }
            catch (Exception ex)
            {
                // Log the general exception
                throw;// rethrow or handle gracefully here
            }
        }
        return result;
    }
}

Now we are done moving all our instantiation code into builder methods, but we still have a bit of a mess left that would be hard to understand at a glance. One of the inconsistencies is that our method contains both orchestration code and performs actual work as well so it is actually performing more than one logical "step" which means we still have low cohesion. The remaining code in the using block performs the last three steps of our process from the list at the beginning of this article:

Reading results
Getting result ordinals
Getting record values and Instantiating new instance of object

What seems to jump out is that this process (reading results, getting the ordinals, and instantiating an object based on the results) will most likely be used over and over and so I'm sure we are all thinking the same thing... "Hey, this should be encapsulated for reuse!".

So lets first get the code out of our primary method so it can be considered cohesive by extracting our remaining code into a separate method as we did when we were creating build methods.

We'll call our new method "GetEmployees".

So now our main method looks great. It's very readable at a glance and any we won't make us or our fellow developers go cross-eyed as we are looking though our code.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    using (SqlDataReader reader = ExecuteReader(command))
    {
        return GetEmployees(reader);
    }
}

But now, we've basically just swept a smaller mess under the rug and have to go back and clean it up.

private static Collection<Employee> GetEmployees(SqlDataReader reader)
{
    Int32 m_NameOrdinal, m_DeptOrdinal;

    // get the ordinals
    try
    {
        m_NameOrdinal = reader.GetOrdinal(NAME);
        m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
    }
    catch (System.IndexOutOfRangeException ex)
    {
        // log exception
        throw;
    }

    Collection<Employee> result = new Collection<Employee>();

    // read the results
    while (reader.Read())
    {
        try
        {
            Employee emp =new Employee();
            emp.Name = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_NameOrdinal);
            emp.Department = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_DeptOrdinal);
            result.Add(emp);
        }
        catch (IndexOutOfRangeException ex)
        {
            // Log the exception
            throw;// rethrow or handle gracefully here
        }
        catch (Exception ex)
        {
            // Log the general exception
            throw;// rethrow or handle gracefully here
        }
    }
    return result;
}

This refactoring will be a bit more difficult because we can't just extract chunks of logic into their own methods at this point. Let look at factoring our code into a EmployeeFactory object that will be responsible for instantiating Employees, again pulling instantiation code into a separate location, but this time we'll be using a factory object instead of a builder method. First, we'll make the factory object and move all of our code over.

During our move, we have the opportunity to alter our code to couple to a more abstract version of theSqlDataReader (IDataReader). Whenever we have opportunities like this we should jump on them because a major factor determining if we end up with flexible code is the degree to which our classes are loosely coupled. We have more loosely coupled code when we code to abstractions which is almost always a good thing unless we need some specific functionality provided by a concrete class we are consuming.

public class EmployeeFactory
{
    private static Collection<Employee> GetEmployees(IDataReader reader)
    {
        Int32 m_NameOrdinal, m_DeptOrdinal;

        // get the ordinals
        try
        {
            m_NameOrdinal = reader.GetOrdinal(NAME);
            m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
        }

        catch (System.IndexOutOfRangeException ex)
        {
            // log exception
            throw;
        }

        Collection<Employee> result = new Collection<Employee>();

        // read the results
        while (reader.Read())
        {
            try
            {
                Employee emp = new Employee();
                emp.Name = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_NameOrdinal);
                emp.Department = reader.IsDBNull(m_NameOrdinal) ?String.Empty : reader.GetString(m_DeptOrdinal);
                result.Add(emp);
            }
            catch (IndexOutOfRangeException ex)
            {
                // Log the exception
                throw;// rethrow or handle gracefully here
            }
            catch (Exception ex)
            {
                // Log the general exception
                throw;// rethrow or handle gracefully here
            }
        }

        return result;
    }
}

So now we have to update our calling code to instantiate our factory and call our builder method.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    using (SqlDataReader reader = ExecuteReader(command))
    {
        return new EmployeeFactory().GetEmployees(reader);
    }
}

At this point I'd like to point out that at the end of each refactoring we should make sure our project builds and all of our unit tests pass or we can end up with a really big mess.

So now we'll look at refactoring our employee factory into a more generic, reusable class that will be responsible for mapping any type of data transport object. To get started, let's look at what we need done from a higher, non-implementation specific perspective. Structuring our code into non-implementation specific logical groups and then handling the implementation of each logical group makes it much more readable and readability is key for maintainability.

class Factory<T>
{
    public IEnumerable<T> Map(IDataReader reader)
    {
        return ReadAll(reader);
    }
}

So we basically need to get read the reader and map the results. Can we do this in a generic way that would apply to all methods? Sure... what changes... getting the ordinals and instating the object will be different for each type the factory produces so we'll write stubs for these two things and we have our basic frame for a reusable build method (below).

public IEnumerable<T> ReadAll(IDataReader reader)
{
    if (null == reader)
        return new T[0];

    GetOrdinals(reader);

    Collection<T> result =new Collection<T>();

    while (reader.Read())
    {
        T item = Build(reader);

        if (null != item)
            result.Add(item);
    }
    return result;
}

So now we'll make our factory abstract with two abstract methods that will be type-specific implementations for each factory we build.

abstract class Factory<T>
{
    public IEnumerable<T> Map(IDataReader reader)
    {
        return ReadAll(reader);
    }

    protected abstract T Build(IDataRecord record);
    protected abstract void GetOrdinals(IDataReader reader);

    public IEnumerable<T> ReadAll(IDataReader reader)
    {
         if (null == reader)
            returnnew T[0];

        GetOrdinals(reader);

        Collection<T> result =new Collection<T>();

        while (reader.Read())
        {
            T item = Build(reader);

            if (null != item)
             result.Add(item);
        }

        return result;
    }
}

We'll now modify ourEmployeeFactory to be a super class of Factory<T> and handle the specifics of mapping an employee. Notice that with our refactoring each object has a single purpose and so we end up with highly cohesive code.

class EmployeeFactory:Factory<Employee>
{
    private static readonlyString
        NAME = "Name",
        DEPARTMENT = "Dept";

    private Int32
        m_NameOrdinal,
        m_DeptOrdinal;

    protected override Employee Build(IDataRecord record)
    {
        try
        {
            Employee emp =new Employee();

            emp.Name = record.IsDBNull(m_NameOrdinal) ?String.Empty : record.GetString(m_NameOrdinal);
            emp.Department = record.IsDBNull(m_NameOrdinal) ?String.Empty : record.GetString(m_DeptOrdinal);

            return emp;
        }
        catch (IndexOutOfRangeException ex)
        {
            // Log the exception
            throw;// rethrow or handle gracefully here
        }
        catch (Exception ex)
        {
            // Log the general exception
            throw;// rethrow or handle gracefully here
        }
    }

    protected override void GetOrdinals(IDataReader reader)
    {
        try
        {
         m_NameOrdinal = reader.GetOrdinal(NAME);
            m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
        }
        catch (System.IndexOutOfRangeException ex)
        {
            // log exception
            throw;
        }
    }
}

We have a few things to wrap up... although it's not necessary with our current solution having only one data transport object, we know that there will eventually be a bunch of different object types in our solution that we'll need to map. We probably want to build a factory for any mappers that we'll be needing and once again we will be thinking about moving instantiation responsibilities to builder methods. The cool thing about this is that we can encapsulate all the specifics of mapping our objects in the factory and make it private so the total number of classes that are exposed in our project are smaller and so there are fewer exposed classes to dig through and project will be a bit easier to maintain. When we have new mappings we need to use, we'll just build new mapper builder methods and be able to reuse our abstract Mapper<> class for each one. Granted, this could be abstracted further (into an abstract factory), but this step is good enough for now. We should keep in mind that we may have to do more refactoring later.

internal class MapperFactory
{
    #region Methods

    internal Mapper<Employee> BuildEmployeeMapper()
    {
        return new EmployeeMapper();
    }

    #endregion

    #region Private Members

    private class EmployeeMapper :Mapper<Employee>
    {
        #region Member Variables

        privateInt32
            m_NameOrdinal,
            m_DeptOrdinal;

        private const String
            NAME = "Name",
            DEPARTMENT = "Dept";

        #endregion

        protected override void GetOrdinals(IDataReader reader)
        {
            m_NameOrdinal = reader.GetOrdinal(NAME);
            m_DeptOrdinal = reader.GetOrdinal(DEPARTMENT);
        }

        protected override Employee Build(IDataRecord record)
        {
            try
            {
                Employee item =new Employee();
                item.Name = record.IsDBNull(m_NameOrdinal) ?String.Empty : record.GetString(m_NameOrdinal);
                item.Department = record.IsDBNull(m_NameOrdinal) ?String.Empty : record.GetString(m_DeptOrdinal);
                return item;
            }
            catch (IndexOutOfRangeException ex)
            {
                // Log the exception
                throw;// rethrow or handle gracefully here
            }
            catch (Exception ex)
            {
                // Log the general exception
                throw;// rethrow or handle gracefully here
            }
        }
    }
    #endregion
}

So now we are at the end of our journey and have a much more cohesive solution and our methods are very readable so another developer can have a basic understanding of our architecture from a quick glance and we can understand what our own code is doing after it has left our brains and we have to come back to it later.

public IEnumerable<Employee> GetEmployeesFromDb_LowCohesion()
{
    using (SqlConnection connection = BuildConnection())
    using (SqlCommand command = BuildCommand("spGetEmployee",CommandType.StoredProcedure, connection))
    using (SqlDataReader reader = ExecuteReader(command))
    {
        return new MapperFactory()
            .BuildEmployeeMapper()
            .ReadAll(reader);
    }
}

I hope you enjoyed walking through these refactorings with me. There are many more that could be applied because code is never perfect and there are always tradeoffs to consider with almost every single line of code. Keeping things cohesive has always had a big payoff in terms of readability, reuse, loose coupling and almost always results in code that is easier to maintain (as long as the methods are well named) so cohesion (or the Singe Responsibility Principle) is definitely worth keeping in the back of our minds as we write new code or are refactoring older code.

Until next time,

Happy coding