Data Structures and Algorithm (DSA) – Performance, Complexity And Big-O Notation

Hussain Patel
1w
44.1k
0
5

Article

Introduction

In this article, we will be talking about what performance of an algorithm is and how Complexity is used to measure performance, and also learn what Big-O notation is.

Background

When we write a program or an algorithm that will be used only once on a small amount of data and then discarded, we focus more on writing the code (program) in the easiest possible way we know, we debug it, test it and then use it in our application and then forget about it. However, when we need to write a program or an algorithm that will be used over a longer period of time and will be maintained by many people, we need to focus on factors like simplicity, clarity, and efficiency /performance of the program or algorithm. In this article, we will be focusing on the efficiency/performance of an algorithm. How is it measured? What is the complexity of the algorithm? We will also discuss Big-O Notation, which is a unit for expressing complexity.

So let’s start with Performance and see how performance is measured.

Performance of a program or an algorithm is measured using various factors, such as Time, Space, and Network.

Time: How much time does the program take to execute?
Space: How much memory or space the program is using while it's executed (also the physical space)
Network: if there are any network-related activities the program is doing, which may affect the performance.

Of all the factors mentioned above, Time are Space are usually considered important when measuring the performance of an algorithm. Example – How fast is your algorithm compared to another algorithm? How much space does your algorithm take in terms of memory?. When we talk about the time taken by the algorithm, we are actually talking about Complexity (Time Complexity). “The complexity of an algorithm is high when the algorithm takes more time to execute,” and more time means your algorithm has low performance.

Let’s discuss more about Complexity and understand what Time complexity is.

Definition

Complexity is used to measure performance. Or, Complexity is a measure of how the Resources (in this case, Time) change as the size of the problem gets larger.

Let’s understand this with an example: For small input of data, an algorithm might run fast and might not show any time difference, but as the input size increases, the program might take time to execute and become slow, and performance is low; this is where the complexity is measured. The Complexity of an algorithm indicates to us how performant the algorithm is.

So we can say that - Size of input determines complexity, which in turn determines how the algorithm behaves when the input size is very large. In computer terms, this is also called the worst-case performance of the algorithm.

Now that we know what complexity is, we can see how complexity is measured. This is done using Big–O notation. Let’s take an example.

Example

Suppose you have two algorithms and you want to compare them based on the performance, i.e., Time complexity. How do you do that?

One way is - You can practically run each of them for the same inputs and wait till they execute, and note the time difference. In some cases, this might not be feasible to practically run each algorithm, taking into consideration the time and resources required to execute them. In such a case, the other way is to prove this theoretically using mathematical terms – that is, using the Big–O notation.
Big–O notation is used to study the performance/complexity of an algorithm in theoretical terms. Big–O notation looks at the upper bound of an algorithm's performance; i.e., its worst-case behavior. Big–O notation also looks at algorithms' asymptotic behavior – what it means is the performance of the algorithm as the size of the input increases to very large. The size of the input is represented by n.

Definition

The Big–O notation is a unit to express complexity in terms of the size of the input that goes into an algorithm.

The following are the Big–O notation rules to figure out an algorithm's performance or asymptotic behavior,

Constant Time Complexity O(1)

If the time taken by the algorithm does not change and remains constant as the input size is increased, then the algorithm is said to have a complexity of O(1) – read as Order of 1, performance. The algorithm doesn’t depend on the size of the input. Example – Node added at the end of the linked list (assuming we have a tail pointer).

static void ConstantTimeComplexity(int n)
{
    int a = 100;
    int b = 200;

    int sum = a + n;
    int mul = b * n;

    Console.WriteLine("{0}, {1}", sum, mul);
}

O(log n)

An algorithm for input size n (where n is very large) breaks the problem into small chunks for each iteration. The algorithm is said to have a complexity of O (log n) - read as Order log n performance. Example: Binary Search Tree.

Linear Time Complexity O(n)

An algorithm for input size n (where n is very large) performs n steps, and the time taken by the algorithm changes linearly as the input size increases; then the algorithm is said to have complexity of O(n) – read as Order of n performance. The run time complexity is proportional to n.

Example

For loop to the size of the length of the array, as the array size increases, the time taken will proportionately increase.

static void LinearTimeComplexity(int[] arr1)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        Console.WriteLine("{0}", arr1[i]);
    }
}

O (n log n)

An algorithm for input size n (where n is very large), breaks the problem into small chunks for each invocation, and then takes each of the smallest chunks and stiches them back together, then the algorithm is said to have complexity of O(n log n) - read as Order n log n performance. Example: Quick Sort.

Quadratic: O(n2)

An algorithm for input size n (where n is very large) performs almost double (n2) the steps, and the time taken by the algorithm changes quadratically as the input size increases. The algorithm is said to have a complexity of O(n2) – read as Order n square performance. Example bubble Sort.

static void QuadraticTimeComplexity(int[] arr1)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        for (int j = 0; j < arr1.Length; j++)
        {
            Console.WriteLine("{0}", arr1[i] + arr1[j]);
        }
    }
}

Cubic: O (n3)
Very rare algorithms.
Exponential
O (n!) – Incredibly rare.

Note. Constants and numbers of lower order than N are not considered when expressing the Complexity of an algorithm. Example - O (n2 + n) = O (n2) as O (n) is very small compared to O (n2) and can be ignored.

Data Structures and Algorithm

Source – By Cmglee - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=50321072

From the graph shown on the left, we can quickly find out which algorithms of which complexity are faster,

Algorithms of complexity O (1) are fastest and take less time, while algorithms of complexity O(n!) are slower and take more time.

We can also write – the time taken by algorithms of complexity O (1) < O (log n) < O (n) < O (n2) < O (n!) and so on.

Additional Examples

What is the complexity of the code below?

static void MoreExamples1(int[] arr1, int[] arr2)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        Console.WriteLine(arr1[i]);
    }

    for (int j = 0; j < arr2.Length; j++)
    {
        Console.WriteLine(arr2[j]);
    }
}

In this case we have 2 for loops and each iterates through a different set of inputs (arr1 and arr2) and each input size is considered to be very large, so for the first for look the complexity is O (a) and for the second for loop the complexity is O (b) and as both for loops are independent the complexity is additive O (a) + O (b), we can write O (a + b) where a = arr1.Length and b = arr2.Length.

What is the complexity of the code below?

static void MoreExamples2(int[] arr1, int[] arr2)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        for (int j = 0; j < arr2.Length; j++)
        {
            if (arr1[i] > arr2[j])
            {
                Console.WriteLine(arr1[i] + " , " + arr2[j]);
            }
        }
    }
}

In this case we have two nested for loops and one if condition, so for each element i of arr1 the inner loop j goes through all the elements of arr2, (both arr1 and arr2 are different inputs and the input size is considered to be very large), and the if condition will be executed if the condition is satisfied. So, for the first for loop, the complexity is O (a), for the inner for loop, the complexity is O (b), and for the if condition, the complexity is O (1). As it’s a nested for loop, the complexity is multiplicative O (a) * O (b) * O(1), we can write O (a*b) or O (ab) where a = arr1.Length and b = arr2.Length.

Note. It’s not O (n2) as there are 2 different inputs, arr1 and arr2, of very large size.

What is the complexity of the code below?

static void MoreExamples3(int[] arr1, int[] arr2)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        for (int j = 0; j < arr2.Length; j++)
        {
            for (int k = 0; k < 100000; k++)
            {
                Console.WriteLine(arr1[i] + " , " + arr2[j]);
            }
        }
    }
}

In this case, we have three nested for loops, so for each element i of arr1 and loop j of arr2, the inner loop k goes through 100000 times (both arr1 and arr2 are different, and the input size is very large). For the first for loop, the complexity is O (a), for the second for loop, the complexity is O (b), and for the third for loop, the complexity is O (1) (constant time complexity and the input size is fixed – 100000 and would iterate fixed input). As it’s a nested for loop, we can write the complexity as O (a) * O (b) * O (1), we can ignore O (1) and write the complexity as O (a*b) or O (ab) where a = arr1.Length and b = arr2.Length.

Note. As mentioned, the constant time complexity O (1) for the constant 10000 can be ignored as it is very small compared to the input size of arr1 and arr2.

What is the complexity of the code below?

static void MoreExamples4(int[] arr1)
{
    for (int i = 0; i < arr1.Length; i++)
    {
        for (int j = 0; j < arr1.Length; j++)
        {
            Console.WriteLine(arr1[i] + ", " + arr1[j]);
        }
    }

    for (int k = 0; k < arr1.Length; k++)
    {
        Console.WriteLine(arr1[k]);
    }
}

In the above example, there are 2 sets of for loops - a nested for loop and a single for loop. For a nested for loop, the complexity is O (n2), and for a single for loop, the complexity is O (n). As both for loop sets are independent, the complexity is O (n2) + O (n).

Note. O (n) is much smaller than O (n2), and the maximum time the algorithm will take is to execute the nested for loop, so O (n) can be ignored, and we can say that the complexity for the above code is O (n2).

What’s the complexity of the code below?

static void MoreExamples5(int[] arr1)
{
    int i = 1;

    while (i < arr1.Length)
    {
        Console.WriteLine(arr1[i]);
        i = i * 2;
    }
}

In the above example, the loop variable is doubled in every iteration, which means the loop will execute faster than a normal loop that increments by 1. So the complexity of the above code can be written as O (log2 n), where n = arr1.Length. Let's see how we can derive this. In each iteration, the value of i is multiplied by 2, so in the first iteration, i = 1, then I is multiplied by 2, see below the calculation.

From the figure below, we can see that for every iteration, the value of i increases by powers of 2. Like 20, 21, 22, 23, and so on. You can also say that in each iteration, for some value of k, the value i will be 2k. For example, for the first iteration, if the value of K =0, the value of i will be 20, that is, i = 1.

Data Structures

So we can write a mathematical formula as for (K + 1)th iteration, i = 2K-1 * 2, that is, i = 2K.

At some point, there will be some number k which will be equal to N, that is, 2k >= n, and the iteration will stop and exit the loop. Now let’s do some derivation to find out how many iterations (K) it takes to double the value of i so that it exits the loop.

Thus, k = log2n. Since the loop will be iterated only for k iterations, the complexity of the above code is O(K), and so the complexity is O (log n).

Data Structures and Algorithm

Important point to note: When the input (n – where n is a very large size) is reduced by half or doubled in each iteration, your code will execute fast, and the loop will exit fast; the complexity of that piece of code is O (log n).

Adding link https://medium.com/@theworkingprogrammer/data-structures-and-algorithm-dsa-performance-complexity-and-big-o-notation-f8a697e5cbad

References

I have referenced a lot of books and training materials to understand the concepts, and writing this article would not be possible without the following and many other references.

Introduction to Algorithms - By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
Data Structures Algorithms Made-Easy - By Narasimha Karumanchi
From 0 to 1: Data Structures & Algorithms in Java - Loony Corn Special thanks to Janani for the beautiful explanation and examples.