How To Calculate The Mode Of Variables In R

Introduction

 
R provides many predefined functions for analysing data with statistics. Functions such as mean() and median() are available in every standard R installation; they take a vector as input and return the result. Base R does not, however, include a function for the statistical mode. The function called mode() returns the storage mode of an object (for example "numeric"), not its most frequent value. In this article, I will demonstrate how to calculate the mode of the observations in a variable of a dataset by writing a small user-defined function.
 

Calculating mode

 
The mode of a variable in a dataset is the observation that occurs more often than any other observation in that variable. R has no predefined function that calculates it. mode() exists, but it only reports how an object is stored, so mode(mtcars$mpg) returns "numeric" rather than the most frequent value of mpg.
 
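As a result, the following calls, which mirror the syntax of mean(), do not calculate the mode of a variable:
  • mode(dataset_name$variable_name) returns the storage mode of the variable, such as "numeric"
  • mode(dataset_name$variable_name, trim = 0.1) fails with an "unused argument" error
  • mode(dataset_name$variable_name, na.rm = TRUE) fails with an "unused argument" error
The trim and na.rm arguments belong to mean(); mode() accepts only a single argument.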
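A quick check in the R console makes the difference concrete. The snippet below is a minimal sketch: it prints what mode() returns for a few objects and captures the error raised when a mean()-style argument is passed to it.

```r
# mode() reports how an object is stored, not its most frequent value
print(mode(mtcars$mpg))   # "numeric"
print(mode(c("a", "b")))  # "character"
print(mode(TRUE))         # "logical"

# mode() takes a single argument, so mean()-style arguments raise an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(mode(mtcars$mpg, trim = 0.1),
                error = function(e) conditionMessage(e))
print(msg)                # an "unused argument" message
```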
To calculate the mode, I will use mtcars, one of the predefined datasets that ship with R, and find the mode of several of its variables. Printing mtcars shows its 32 cars and 11 variables,
  > mtcars
                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
  Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
  Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
  Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
  Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
  Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
  Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
  Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
  Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
  Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
  Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
  Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
  Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
  Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
  Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
  Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
  Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
  Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
  Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
  Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
  Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
  Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
  Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
  AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
  Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
  Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
  Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
  Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
  Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
  Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
  Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
  Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
  Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
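Next, we create a user-defined function, calcmode(), that calculates the mode of a vector. unique() lists the distinct values in the order they first appear, match() replaces every observation with the position of its value in that list, tabulate() counts how often each position occurs, and which.max() picks the position with the highest count,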
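  > calcmode <- function(a) {
  +   # distinct values of a, in the order they first appear
  +   vector <- unique(a)
  +   # count each distinct value and return the most frequent one;
  +   # which.max() returns the first maximum, so ties go to the value seen first
  +   vector[which.max(tabulate(match(a, vector)))]
  + }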
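To see these steps and the tie rule at work, here is the same function applied to two small vectors made up for illustration. Both contain two values that occur twice, and the result depends only on which of them appears first.

```r
calcmode <- function(a) {
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

x <- c(2, 3, 3, 5, 5)
unique(x)                       # 2 3 5
match(x, unique(x))             # 1 2 2 3 3
tabulate(match(x, unique(x)))   # 1 2 2
calcmode(x)                     # 3, the first of the tied values 3 and 5

calcmode(c(2, 5, 5, 3, 3))      # 5, because 5 now appears before 3
```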
Now we will calculate the mode of the variables of the mtcars dataset.
  > ds = mtcars
  > var <- calcmode(ds$mpg)
  > var
  [1] 21
In the above code, the mtcars dataset is assigned to the variable ds, and the user-defined calcmode() function is called with the mpg variable as its argument. Several mpg values, such as 21.0, 22.8 and 21.4, occur twice and no value occurs more often; calcmode() returns 21 because it is the first of these tied values in the data.
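Because calcmode() returns a single value even when several values share the highest count, it can be useful to list all of them. The sketch below uses a hypothetical helper, allmodes(), built on base R's table(); it assumes a numeric vector, since it converts the table's names back to numbers.

```r
# allmodes() is a hypothetical helper, not a base R function.
# It returns every value that shares the highest frequency in a numeric vector.
allmodes <- function(a) {
  counts <- table(a)   # frequency of each distinct value; NA is left out by default
  # table() stores the values as character names, so convert them back to numbers
  as.numeric(names(counts)[counts == max(counts)])
}

allmodes(mtcars$mpg)   # 10.4 15.2 19.2 21.0 21.4 22.8 30.4
allmodes(mtcars$hp)    # 110 175 180
```

Because table() sorts the distinct values, the tied modes come back in ascending order rather than in the order they appear in the data.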
  > ds = mtcars
  > var <- calcmode(ds$cyl)
  > var
  [1] 8
In the above code, calcmode() is called with the cyl variable of mtcars as its argument. Cars with 8 cylinders are the most common in the dataset (14 of the 32 cars), so the mode of cyl is 8.
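A frequency table is a convenient way to double-check a result like this. A minimal sketch using base R's table():

```r
# Count how many cars have 4, 6 and 8 cylinders
counts <- table(mtcars$cyl)
print(counts)                       # 11, 7 and 14 cars respectively

# The name of the largest count is the mode (table() stores it as text)
names(counts)[which.max(counts)]    # "8"
```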
  > ds = mtcars
  > var <- calcmode(ds$disp)
  > var
  [1] 275.8
In the above code, calcmode() is called with the disp variable as its argument. The value 275.8 is shared by the three Merc 450 models (SE, SL and SLC), which makes it the most frequent displacement.
  > ds = mtcars
  > var <- calcmode(ds$hp)
  > var
  [1] 110
In the above code, calcmode() is called with the hp variable as its argument. The values 110, 175 and 180 each occur three times in hp; calcmode() returns 110 because it appears first in the data.
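Instead of calling calcmode() once per variable, it can be applied to every column of mtcars in one step with sapply(). This is a sketch; for variables with ties, each result is simply the tied value that appears first in that column.

```r
calcmode <- function(a) {
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

# sapply() calls calcmode() on each column of the data frame
# and returns a named numeric vector with one mode per variable
modes <- sapply(mtcars, calcmode)
print(modes)
```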
 
We can also calculate the mode of a vector that we create ourselves,
  a <- c(96255603555, -319, -515)
  # calculating the mode
  var <- calcmode(a)
  print(var)
It will generate the following output,
  > a <- c(96255603555, -319, -515)
  > var <- calcmode(a)
  > print(var)
  [1] 55
Using the above code, we have created a numeric vector named a and calculated the mode of its values. The vector is passed as the argument to calcmode(), and the result is assigned to the variable var.
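calcmode() is not limited to numbers. Because it only compares values for equality, it also finds the mode of a character vector, where mean() cannot be used. The colours vector below is invented for illustration.

```r
calcmode <- function(a) {
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

colours <- c("red", "blue", "red", "green", "blue", "red")
calcmode(colours)                  # "red", which occurs three times

# mean() returns NA with a warning for character input
suppressWarnings(mean(colours))    # NA
```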
 

Trim argument

 
The trim argument comes from mean(): mean(x, trim = 0.1) sorts the observations and drops that fraction of them from each end before averaging. Neither mode() nor our calcmode() function has a trim argument, so a call such as calcmode(df1, trim = 0.1) fails with an "unused argument" error. To calculate the mode of the trimmed observations, we sort and trim the vector ourselves and pass the result to calcmode(), as follows,
  > df1 = mtcars$mpg
  > df1
   [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
  > n <- length(df1)
  > k <- floor(n * 0.3)
  > calc <- calcmode(sort(df1)[(k + 1):(n - k)])
  > calc
  [1] 19.2
With 32 observations and a trim proportion of 0.3, floor(32 * 0.3) = 9 values are dropped from each end of the sorted vector, which leaves the 14 middle values. Among them, 19.2, 21.0 and 21.4 each occur twice. Because the vector is sorted, the first of these tied values is also the smallest, so calcmode() returns 19.2. For comparison, the mode of the untrimmed mpg variable is 21,

  > calcmode(df1)
  [1] 21
 
Because calcmode() has no trim argument, trimming a vector that we create ourselves means sorting it and dropping values from each end before calling calcmode(), as follows,
  a <- c(9624320320, -319, -515)
  n <- length(a)
  k <- floor(n * 0.2)   # number of values dropped from each end
  var <- calcmode(sort(a)[(k + 1):(n - k)])
  print(var)
We have created a vector named a, sorted it, dropped floor(0.2 * n) values from each end (two values from each end when a has between 10 and 14 elements), and calculated the mode of the remaining values.
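The sorting and trimming steps can be wrapped in a small function so they do not have to be repeated. The sketch below uses a hypothetical helper name, trimmed_mode(). It follows mean()'s convention of dropping floor(n * trim) observations from each end and, as an assumption of this sketch, refuses vectors that contain missing values, because sort() would silently drop them.

```r
calcmode <- function(a) {
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

# trimmed_mode() is a hypothetical helper, not part of base R.
# trim is the fraction of observations dropped from each end, as in mean().
trimmed_mode <- function(a, trim = 0) {
  stopifnot(trim >= 0, trim < 0.5)
  if (anyNA(a)) stop("remove missing values before trimming")
  n <- length(a)
  k <- floor(n * trim)              # observations dropped from each end
  if (k > 0) {
    a <- sort(a)[(k + 1):(n - k)]   # keep only the middle observations
  }
  calcmode(a)
}

trimmed_mode(mtcars$mpg, trim = 0.3)   # 19.2
trimmed_mode(mtcars$mpg)               # 21, same as calcmode(mtcars$mpg)
```

Sorting happens only when something is actually dropped, so trim = 0 keeps calcmode()'s rule of returning the tied value that appears first in the data; once values are trimmed, ties go to the smallest value.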
 

Calculating mode by removing missing values

 
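Unlike mean(), calcmode() does not automatically return NA when a variable contains missing values. unique() and match() treat NA like any other value, so NA is simply counted, and calcmode() returns NA only if NA is the most frequent value (or the first to appear among several tied most frequent values).

To create a missing value in a variable, we can copy mtcars and overwrite one observation with NA,
  > data <- mtcars
  > data[2,4] = NA
  > df2 = data$hp
  > df2
   [1] 110  NA  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97 150 150 245 175  66  91 113 264 175 335 109
As we can see, the fourth column of the copy named data is hp, so the second observation of hp (the Mazda RX4 Wag) is now NA. Because NA occurs only once, calcmode() still returns an actual observation: 110 now occurs only twice, and 175 is the first of the values that occur three times.
  > calcmode(df2)
  [1] 175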
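Whether NA comes back as the mode depends only on how often it occurs. A small made-up vector in which NA is the most frequent value shows both cases:

```r
calcmode <- function(a) {
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

x <- c(4, NA, NA, 7)
calcmode(x)               # NA, because NA occurs twice and 4 and 7 once each
calcmode(x[!is.na(x)])    # 4, the first of the tied observations 4 and 7
```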

Removal of missing values

 
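Functions such as mean() have an na.rm argument; with na.rm = TRUE they remove missing values before the calculation. Neither mode() nor calcmode() accepts na.rm, so calcmode(df2, na.rm = TRUE) fails with an "unused argument" error. Instead, we remove the missing values ourselves with !is.na() and pass only the remaining observations to calcmode().

The code below removes the missing value from hp before calculating the mode,
  > rs2 = calcmode(df2[!is.na(df2)])
  > rs2
  [1] 175
The result is 175, the first of the values that occur three times in the remaining observations.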
For a vector that we create ourselves, the missing values can be removed with !is.na() before calling calcmode() in the same way,
  a <- c(96243212155, -319, -5, NA)
  # calculating the mode; NA is counted like any other value
  mode_a <- calcmode(a)
  print(mode_a)
  # removing NA values and then calculating the mode
  res1 <- calcmode(a[!is.na(a)])
  print(res1)
As we can see, a vector named a has been created that contains an NA value. Calling calcmode(a) counts NA as one of the values, so its result is NA only if NA is the most frequent value in a. Filtering with !is.na(a) before calling calcmode() guarantees that the mode is calculated from actual observations only.
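If missing values have to be removed often, the filtering step can be built into the function itself. The sketch below is a hypothetical variant, calcmode2(), whose na.rm argument imitates the argument of the same name in mean(); it is not a base R function.

```r
# calcmode2() is a hypothetical variant of calcmode(), not a base R function.
# With na.rm = TRUE, missing values are dropped before the values are counted.
calcmode2 <- function(a, na.rm = FALSE) {
  if (na.rm) {
    a <- a[!is.na(a)]
  }
  vector <- unique(a)
  vector[which.max(tabulate(match(a, vector)))]
}

hp <- mtcars$hp
hp[2] <- NA                             # same missing value as data[2,4] above
calcmode2(hp, na.rm = TRUE)             # 175
calcmode2(c(1, NA, NA))                 # NA, the most frequent value
calcmode2(c(1, NA, NA), na.rm = TRUE)   # 1
```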
 

Summary

 
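In this article, I demonstrated how to calculate the mode of the variables of a dataset in R. Because base R has no function for the statistical mode, we wrote a user-defined function, calcmode(), and applied it to variables of the mtcars dataset and to vectors, including trimmed vectors and vectors with missing values.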