How To Improve Execution Performance Of An Application

Be it a small, medium, or large enterprise application, the major non-functional requirement is good code performance. In some cases, companies hire a special team or outsource to a firm to improve the performance of their existing application.

Application performance is a major consideration for every application designer, and many design patterns exist with performance in mind. The challenge for the designer or programmer is how to improve it. Frankly, there is no single pattern that guarantees this; much depends on how the application uses memory.

In the move from C to managed platforms such as Java and .NET, language designers put more emphasis on memory management. I won't go deeper into that here, because I want to focus on how small design changes can improve performance.

As .NET programmers, we know that types are divided into value types and reference types. The data of a value type (int, float, etc.) is stored directly where the variable lives, which for a local variable is typically the stack, while the data of a reference type (String, Object, etc.) is stored on the heap and the variable holds only a reference to it.
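The distinction can be checked at run time through the `IsValueType` property of `System.Type`. The following is only a minimal sketch; the `Person` class and the `TypeKindDemo` helper are hypothetical names introduced for this illustration.

```csharp
using System;

// Hypothetical reference type used only for this illustration.
public class Person
{
    public string Name = string.Empty;
}

public static class TypeKindDemo
{
    // True for value types (int, decimal, ...), false for reference types.
    public static bool IsValue<T>() => typeof(T).IsValueType;

    public static void Main()
    {
        Console.WriteLine(IsValue<int>());     // True
        Console.WriteLine(IsValue<decimal>()); // True
        Console.WriteLine(IsValue<string>());  // False
        Console.WriteLine(IsValue<Person>());  // False

        // Assigning a value type copies the data itself.
        int a = 1;
        int b = a;
        b = 99;
        Console.WriteLine(a); // 1

        // Assigning a reference type copies only the reference,
        // so both variables point at the same object on the heap.
        Person p = new Person { Name = "Ann" };
        Person q = p;
        q.Name = "Bob";
        Console.WriteLine(p.Name); // Bob
    }
}
```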

From the processor's point of view, a running application deals with many variables of different types, some of them value types and some reference types. Every processor, from low end to high end, maintains its own caches to hold the data it uses most frequently while executing our application.

When many applications run in parallel, CPU time is shared among them according to the system's configuration. Reading data from RAM is costly for the processor. As a simplified illustration, suppose a read from RAM takes ~10 nanoseconds (the real figure varies with the hardware and is often higher). If an application is given 15 nanoseconds of CPU time, 10 nanoseconds go to fetching data from RAM and only the remaining 5 nanoseconds go to processing; the application must then wait for its next turn.

To reduce the time spent reading from RAM, every processor has dedicated caches at several levels. These caches are very fast: a read takes approximately 1 nanosecond (this varies from processor to processor), i.e., roughly 10X faster than a ~10 nanosecond read from dynamic RAM.
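To see why the share of reads served from the cache matters, the figures above can be plugged into a simple weighted average. This is a rough sketch under a deliberately simplified model (every read costs either the cache time or the RAM time, nothing else); the `AverageAccessNs` helper and the hit rates are assumptions chosen for illustration, not measurements.

```csharp
using System;

public static class AccessTimeDemo
{
    // Simplified model: a read costs cacheNs on a cache hit and ramNs on a miss.
    public static double AverageAccessNs(double hitRate, double cacheNs, double ramNs)
        => hitRate * cacheNs + (1.0 - hitRate) * ramNs;

    public static void Main()
    {
        // Illustrative hit rates only; 1 ns and 10 ns are the article's example figures.
        foreach (double hitRate in new[] { 0.5, 0.9, 0.99 })
        {
            double avg = AverageAccessNs(hitRate, cacheNs: 1.0, ramNs: 10.0);
            Console.WriteLine($"hit rate {hitRate}: ~{avg:F2} ns per read");
        }
    }
}
```

Under this model, the closer the hit rate gets to 1, the closer the average read time gets to the cache's 1 nanosecond.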

Most modern processors contain at least three levels of cache:

  • Level 1 Cache (L1)
    This is the primary cache and is usually accessed in a few cycles. L1 is the fastest of all the cache levels and is built into the processor. It is small, typically well under 100 KB per core. This cache uses high-speed SRAM (static RAM) instead of the slower and cheaper DRAM (dynamic RAM) used for main memory.

  • Level 2 Cache (L2)
    This is bigger than L1, typically from a few hundred KB (for example, 512 KB) up to a few MB. Access is a little slower than L1. It sits between L1 and main memory.

  • Level 3 Cache (L3)
    This is bigger than L2, typically several MB or more. Access is a little slower than L2. It sits between L2 and main memory. On modern processors it is located on the processor itself and is usually shared between cores; on older systems it could be found on the motherboard.

Depending on the processor, there may be other levels as well.

When the processor runs the application, it pulls the data it needs into its caches so that it can be read quickly and most of the CPU time goes to processing. Data moves into the cache in small contiguous blocks (cache lines), so values that sit next to each other in memory tend to arrive together. For value types, the variable or array element holds the data itself, so once the block is cached the processor can work on it without going back to main memory.

For reference types, the variable or array element holds only the address of the actual data on the heap. The processor must first read the address and then follow it to the object, which may sit anywhere in main memory; if that location is not already cached, it costs another trip to RAM. With reference types, therefore, we are not fully utilizing the power of the processor cache. The challenge for every developer or designer is to make the most of these high-speed caches, so that more CPU time goes to processing and fewer accesses go to main memory.

In C#, we have a type called struct. It is similar to a class, but a struct is a value type: a variable or array element of a struct type holds its data inline, which makes it friendlier to the processor cache.
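The inline storage can be observed without any timing at all, simply by copying one array slot into another. The sketch below is illustrative; `RefItem`, `ValItem`, and `ArrayLayoutDemo` are hypothetical names that are not part of the example program in this article.

```csharp
using System;

public class RefItem { public int Value; }   // hypothetical reference type
public struct ValItem { public int Value; }  // hypothetical value type

public static class ArrayLayoutDemo
{
    public static int WriteThroughCopiedClassSlot()
    {
        RefItem[] items = new RefItem[2];    // two null references, no objects yet
        items[0] = new RefItem { Value = 1 };
        items[1] = items[0];                 // copies the reference only
        items[1].Value = 42;
        return items[0].Value;               // both slots share one heap object
    }

    public static int WriteThroughCopiedStructSlot()
    {
        ValItem[] items = new ValItem[2];    // two zero-initialized values stored inline
        items[0].Value = 1;
        items[1] = items[0];                 // copies the data itself
        items[1].Value = 42;
        return items[0].Value;               // slot 0 keeps its own data
    }

    public static void Main()
    {
        Console.WriteLine(WriteThroughCopiedClassSlot());  // 42
        Console.WriteLine(WriteThroughCopiedStructSlot()); // 1
    }
}
```

Because the struct array carries the data in the array itself, a loop over it walks through neighbouring memory, whereas a loop over the class array has to follow each reference to wherever its object was allocated.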

Note
If you are new to structs, it is worth reading an introduction to C# structure types before continuing.

The following code gives a better understanding of how a struct type can improve performance compared with a class type.

    using System;
    using System.Diagnostics;

    // Reference type: each instance is a separate object on the heap.
    public class ClassEmpolyee
    {
        public string FirstName { get; set; } = string.Empty;
        public string LastName { get; set; } = string.Empty;
        public decimal Salary { get; set; } = 0;
    }

    // Value type: each instance is stored inline, e.g. directly inside an array.
    public struct StructEmpolyee
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public decimal Salary { get; set; }
    }

    public class PerfTest
    {
        public void StartPerfTest(int countOfEmployees)
        {
            Stopwatch stopWatch = Stopwatch.StartNew();

            // Class: the array holds references to separately allocated objects.
            ClassEmpolyee[] employeesAsClasses = new ClassEmpolyee[countOfEmployees];

            for (int i = 0; i < countOfEmployees; i++)
            {
                employeesAsClasses[i] = new ClassEmpolyee() { FirstName = "EmoFName " + i, LastName = "EmoLName " + i, Salary = 1000 * i };
            }

            // Update: read and write every object through its reference.
            for (int i = 0; i < countOfEmployees; i++)
            {
                employeesAsClasses[i].Salary += 2000;
            }

            long classTime = stopWatch.ElapsedMilliseconds;

            stopWatch.Restart();

            // Struct: the array holds the employee data itself, element after element.
            StructEmpolyee[] employeesAsStructs = new StructEmpolyee[countOfEmployees];
            for (int i = 0; i < countOfEmployees; i++)
            {
                employeesAsStructs[i] = new StructEmpolyee() { FirstName = "EmoFName " + i, LastName = "EmoLName " + i, Salary = 1000 * i };
            }

            // Update: read and write every element in place.
            for (int i = 0; i < countOfEmployees; i++)
            {
                employeesAsStructs[i].Salary += 2000;
            }

            long structTime = stopWatch.ElapsedMilliseconds;

            Console.WriteLine("Time Taken for " + countOfEmployees + " Objects creation by\nClass: " + classTime + "ms\nStruct: " + structTime + "ms\nDifference: ~" + (classTime - structTime) + "ms");
        }
    }

In the code above, we create 100K class objects and 100K struct objects, and then update each salary value to test the CPU's read and write performance. The following are the results:

[Image: results of the test run]

There is clearly a large performance difference here. However, structs come with some limitations:

  • If a struct declares many fields, each instance becomes large, and that can lead to worse performance, because the whole value is copied wherever it is assigned or passed, whereas a class object stays in one place on the heap and only its reference moves.
  • If you pass struct variables across classes or layers many times, performance can suffer, because each pass copies the complete data rather than just an address.
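The copying described in the second limitation can be seen in a small sketch, along with the `ref` and `in` parameter modifiers that C# offers for passing a struct without copying it (`in` requires C# 7.2 or later). The `Account` struct and the `PassingDemo` methods are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical struct used only for this illustration.
public struct Account
{
    public decimal Balance;
}

public static class PassingDemo
{
    // By value: the method works on its own full copy of the struct.
    public static void AddByValue(Account account, decimal amount)
    {
        account.Balance += amount; // changes the copy only
    }

    // By reference: no copy is made, and the caller's data is updated.
    public static void AddByRef(ref Account account, decimal amount)
    {
        account.Balance += amount;
    }

    // 'in' passes a read-only reference: no copy for the call, no mutation allowed.
    public static decimal ReadBalance(in Account account) => account.Balance;

    public static void Main()
    {
        var account = new Account { Balance = 100m };

        AddByValue(account, 50m);
        Console.WriteLine(account.Balance);          // 100

        AddByRef(ref account, 50m);
        Console.WriteLine(account.Balance);          // 150

        Console.WriteLine(ReadBalance(in account));  // 150
    }
}
```

Whether these modifiers are worth using depends on the size of the struct and how often it crosses method boundaries; for small structs the plain by-value call is usually fine.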

If you can live with the above limitations, structs can give you a significant performance gain, especially when looping over a large list of objects as in the example program above.

I hope this article gives you a good idea of how the processor handles our application's data while executing it, and how we can design our application to be processor friendly.

Thank you and happy coding.