Cohesion is one of the most important concepts in software design. It lies at the heart of most good design principles and patterns, guiding separation of concerns and maintainability. The term cohesion (alongside coupling) was first introduced by Larry Constantine in the late 1960s as part of Structured Design and was later published in more detail by W. Stevens, G. Myers, and L. Constantine in 1974. Due to the increasing complexity and cost of software in the 1960s, 70s, and 80s, a great deal of research on software design and maintainability was carried out. While we can still find some of this material online today, it was produced in a pre-Internet era, and most of the work from this period is either lost or not readily available.
But before we dive into the details, let’s see some definitions.
Cohesion (noun): when the members of a group or society are united. Cohesive (adjective): united and working together effectively.
— the Cambridge Dictionary
In computer programming, cohesion is a measure of how closely the various responsibilities of a software module are interrelated and focused.
Cohesion is a sliding scale metric
A common mistake is to treat cohesion as a binary attribute instead of a sliding scale. In the original work of Stevens, Myers, and Constantine in the early 1970s, they defined seven levels of cohesion, which later became known as SMC Cohesion.
As the original articles were written a long time ago and were quite academic, let’s agree that when we say module we actually mean a class or a group of functions, and when we say processing elements we actually mean methods or functions.
- Coincidental (worst): The processing elements are grouped arbitrarily and have no meaningful relationship to one another. Ex: update a customer file, calculate a loan repayment, print a report. Coincidental cohesion is quite common in modules called Utils or Helpers.
- Logical: At the module level, the processing elements are grouped together because they belong to the same logical class of related functions. Each time the module is invoked, only one of the processing elements is executed. Ex: grouping together all I/O operations, all database operations, etc. At the processing-element level, the calling module passes a control flag that decides which behavior the processing element will perform. Ex: a flag indicating whether a discount should be calculated, whether a behavior should be skipped, etc.
- Temporal: The processing elements are related in time. They are grouped together because they are invoked together at a particular moment in the execution of a program, but they are not otherwise related to each other. A different business requirement may demand a different sequence or combination of processing elements. Ex: data persistence/validation, audit trail, email notifications, etc.
- Procedural: The processing elements are part of the same unit of business logic but do not share data; they are grouped together because they always follow a certain sequence of execution. Ex: validate a user, process a payment, trigger the inventory system to send purchase orders to suppliers, write logs.
- Communicational: The processing elements contribute to activities that use the same inputs or outputs. Ex: processing elements that take a shopping basket and calculate discounts, promotions, money saved, and delivery costs, and return the total price.
- Sequential: The processing elements are grouped together because the output of one processing element serves as input to another. Ex: formatting then validating data.
- Functional (best): All the processing elements of a module are essential to the performance of a single, well-defined task. Ex: parse an XML document, calculate the cost of an insurance policy based on the data provided.
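To make the contrast between the two ends of the ladder concrete, here is a minimal Python sketch (all names are hypothetical, not from the original papers). It shows a logically cohesive function driven by a control flag next to a functionally cohesive equivalent:

```python
# Logical cohesion: the caller passes a flag that selects which of two
# behaviors runs inside a single processing element.
def price(total_cents: int, apply_discount: bool) -> int:
    if apply_discount:
        return total_cents * 90 // 100  # 10% discount branch
    return total_cents                  # plain-price branch


# Functional cohesion: each function performs one well-defined task,
# and callers state their intent instead of passing a flag.
def full_price(total_cents: int) -> int:
    return total_cents


def discounted_price(total_cents: int, rate_percent: int = 10) -> int:
    return total_cents * (100 - rate_percent) // 100
```

Splitting the flag-driven function removes the control coupling between caller and callee: each call site now names the behavior it wants directly.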
If we adapt some of the ideas published by Meilir Page-Jones in The Practical Guide to Structured Systems Design (1980), we can derive a guideline for identifying levels of cohesion.
In pursuit of metrics
The SMC cohesion model was a big step forward in the 1970s, and many software professionals and academics tried to create software metrics capable of measuring degrees of cohesion so that they could design systems that were easier to maintain. The problem with the SMC cohesion levels is that they can be quite subjective and require personal judgment. I can think of a few code examples that could match more than one level on the SMC cohesion ladder. If we dig into the details of the examples above, you will see how easy it is to question which level an example belongs to. Due to its subjective nature, SMC cohesion could not be used effectively to derive reliable measurements.
Numerous articles and a few books were published from the late 1970s to the late 1990s exploring and expanding the notions of cohesion and coupling defined by the SMC model. One model that gained some acceptance was the Design-Level Cohesion (DLC) measure. DLC is very similar to SMC but has only six levels, with small variations in definitions and names. The main advantage of DLC is that it is better suited to automated measurement tools.
Before we dive into the DLC levels, let’s define a vocabulary:
- condition-control: a variable v2 has a condition-control dependency on a variable v1 when v1 is used in the predicate of a decision (if/then/else) that affects the value of v2.
- iteration-control: the same as above, but with v1 used in the predicate of a loop (while/for/etc.).
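Both dependencies can be seen in a short Python sketch (hypothetical names):

```python
def classify(v1: int) -> str:
    # condition-control: v1 appears in the predicate of an if/else
    # decision that determines the value assigned to v2.
    if v1 > 0:
        v2 = "positive"
    else:
        v2 = "non-positive"
    return v2


def running_total(limit: int) -> int:
    # iteration-control: `limit` appears in the loop predicate, so it
    # controls how many times `acc` is updated.
    acc = 0
    i = 0
    while i < limit:
        acc += i
        i += 1
    return acc
```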
Here are the DLC levels:
- Coincidental relation (R1): Two outputs o1 and o2 of a module have no dependency relationship with each other and no dependency on a common input.
- Conditional relation (R2): Two outputs have a condition-control dependency on a common input, or one of the two outputs has a condition-control dependency on the input and the other has an iteration-control dependency on it.
- Iterative relation (R3): Two outputs have an iteration-control dependency on a common input.
- Communicational relation (R4): Two outputs depend on a common input. One of the two outputs has a data dependency on the input, and the other may have either a control or a data dependency.
- Sequential relation (R5): One output depends on the other output.
- Functional relation (R6): The module has only one output.
These six relations form an ordinal scale on which R1 is the weakest level of cohesion and R6 the strongest. In the DLC metric, the level of cohesion of a module is determined by the relationships between its outputs and its processing elements.
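As an illustration (hypothetical code, not from the original papers), here is how two of the relations might look in practice:

```python
def stats(values: list[int]) -> tuple[int, int]:
    # R4 (communicational): both outputs have a data dependency on the
    # common input `values`, but not on each other.
    total = sum(values)
    largest = max(values)
    return total, largest


def normalize(values: list[int]) -> tuple[int, list[float]]:
    # R5 (sequential): the second output is computed from the first.
    total = sum(values)
    scaled = [v / total for v in values]
    return total, scaled
```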
The DLC relations map roughly onto the SMC levels: R1 corresponds to coincidental cohesion and R6 to functional cohesion, with the intermediate relations falling between them.
Depending on the type of software you are writing, you may need to compromise a bit. While we should always strive to have our code at the highest level of cohesion, sometimes it can make the code appear unnatural. There is a difference between ignoring design principles and not consciously following a design principle in a given context. I don’t write my code so that it meets all of the design principles, but I always try to have a good reason whenever I decide not to follow certain principles. That said, cohesion is one of the most important building blocks of software design and a good understanding of it is essential for writing well-designed code.
If you are building a framework, a very generic part of your code, or a data transformation, chances are that the majority of your modules and processing elements will sit at the sequential and functional levels. However, when writing business rules in a business application (that is, an application with business logic, user journeys, database access, etc.), there is a good chance that some of your modules and processing elements will sit at the communicational level, and some even at a lower level of cohesion. And that’s okay, as long as it was a conscious decision and the right thing to do in that context.
Some compare cohesion to the Single Responsibility Principle (SRP). Although SRP is a great design principle and is entirely based on cohesion, it has a fairly narrow and subjective scope.
Identifying responsibilities is not always easy. We need to develop a keen eye to detect minor variations in behavior. Unit testing of a module can potentially help us identify different behaviors, if the code was really test-driven, of course.
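As a sketch of what identifying responsibilities can look like (a hypothetical example, not from the original article): a class whose methods serve two different reasons to change can be split along those responsibilities.

```python
# Two responsibilities in one class: computing report data and
# formatting it for presentation.
class Report:
    def __init__(self, entries: list[int]):
        self.entries = entries

    def total(self) -> int:
        return sum(self.entries)

    def to_html(self) -> str:  # a second, unrelated reason to change
        return f"<p>Total: {self.total()}</p>"


# Split along responsibilities: each unit now has a single reason to change.
class ReportData:
    def __init__(self, entries: list[int]):
        self.entries = entries

    def total(self) -> int:
        return sum(self.entries)


def render_html(report: ReportData) -> str:
    return f"<p>Total: {report.total()}</p>"
```

After the split, a change to the presentation no longer touches the module that computes the data, and each unit is functionally cohesive on its own.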
The more cohesive your code, the more reusable, robust, and easy to maintain it will be.
References
- Rule-Based Approach to Computing Module Cohesion
- Software Complexity: Toward a Unified Theory of Coupling and Cohesion
- A Quantitative Framework for Software Restructuring
- Using Design Cohesion to Visualize, Quantify, and Restructure Software
- The Practical Guide to Structured Systems Design
- Systems Analysis and Design with UML
- Cohesion (computer science) – Wikipedia
- Single responsibility principle
This article first appeared on the Codurance Blog.