How To Calculate The Median Of Variables In R

Introduction

 
R ships with many predefined functions for statistical analysis. Functions such as mean() and median() are available in a standard R installation without loading any additional packages. They take a vector as input and return the result. In this article, I will demonstrate how to calculate the median of observations in the variables of a dataset.
 

Calculating Median

 
The median of a variable in a dataset is its central-most value: the middle observation once the values are sorted, or the average of the two middle observations when the number of observations is even. R provides the predefined median() function, which can be used to calculate the median of any numeric variable in a dataset.
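To make the definition concrete, here is a minimal sketch that computes the median by hand and compares it with median(). The helper name manual_median and the two sample vectors are illustrative assumptions, and the helper assumes its input has no missing values.

```r
# Hypothetical helper: compute the median by sorting and picking the middle.
# Assumes x is a numeric vector without missing values.
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n == 0) return(NA_real_)
  mid <- n %/% 2
  if (n %% 2 == 1) {
    x[mid + 1]                  # odd count: the single middle value
  } else {
    (x[mid] + x[mid + 1]) / 2   # even count: average of the two middle values
  }
}

odd_values  <- c(7, 1, 5)       # sorted: 1 5 7
even_values <- c(7, 1, 5, 3)    # sorted: 1 3 5 7

manual_median(odd_values)       # 5
manual_median(even_values)      # 4

# The built-in function gives the same answers.
median(odd_values)
median(even_values)
```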
 
The median() function is commonly called in the following forms:
  • median(dataset_name$variable_name)
  • median(dataset_name$variable_name, trim = 0.1) (this call runs, but trim is an argument of mean(), not median(), so it has no effect on the result)
  • median(dataset_name$variable_name, na.rm = TRUE)
To calculate the median, I will use one of the predefined datasets that come with R. We will use the mtcars dataset and calculate the median of several of its variables.
    > ds = mtcars
    > ds
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Now we will calculate the median of variables of the mtcars dataset.
    > ds = mtcars
    > median(ds$mpg)
    [1] 19.2
The code above assigns the mtcars dataset to the variable ds and passes its mpg variable to the predefined median() function, which returns 19.2.
    > ds = mtcars
    > median(ds$cyl)
    [1] 6
Here the cyl variable of mtcars is passed to median() in the same way, and the result is 6.
    > ds = mtcars
    > median(ds$disp)
    [1] 196.3
For the disp variable, median() returns 196.3. Because mtcars has an even number of rows (32), this is the average of the two middle values of disp.
    > ds = mtcars
    > median(ds$hp)
    [1] 123
Finally, the hp variable is passed to median(), which returns 123.
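Calling median() once per variable becomes repetitive when a dataset has many columns. As a possible shortcut, the sketch below applies median() to every column of mtcars at once with sapply(); the variable name all_medians is an illustrative choice, and the approach assumes every column is numeric, which holds for mtcars.

```r
# Median of every column of the built-in mtcars data frame.
# sapply() applies median() to each column and returns a named numeric vector.
ds <- mtcars
all_medians <- sapply(ds, median)
print(all_medians)

# Individual results can be picked out by column name.
all_medians[["mpg"]]
all_medians[["hp"]]
```

The values for mpg, cyl, disp, and hp should match the individual calls shown above.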
 
We can also calculate the median of the vectors as in the following:
    a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, 15)
    # calculating the median
    vec <- median(a)
    print(vec)
It will generate the following output:
    > a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, 15)
    > vec <- median(a)
    > print(vec)
    [1] 9
Using the above code, we created a vector named a with 11 values. The vector is passed as the argument to median(), and the resulting median is assigned to the variable vec.
 

Trim argument

 
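The trim argument is often mentioned together with median(), but it belongs to mean(), where it drops a given fraction of the sorted observations from each end before averaging. median() does not define a trim argument. In current versions of R, an extra trim value is accepted and silently ignored, so a call like the following runs but returns the ordinary median:
    median(df1, trim = 0.1)
Let us call median() with the trim argument to confirm this: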
    > df1 = ds$mpg
    > df1
    [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
    > calc <- median(df1, trim = 0.3)
    > calc
    [1] 19.2
As we can see, the result is 19.2. No values were removed from the mpg variable: median() has no trim argument, so trim = 0.3 is ignored. The median obtained without the trim argument is identical:
    > median(df1)
    [1] 19.2
The same holds when a trim argument is passed along with a vector:
    a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, 15)
    res <- median(a, trim = 0.2)
    print(res)
It will generate the following output:
    > a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, 15)
    > res <- median(a, trim = 0.2)
    > res
    [1] 9
We created a vector named a and calculated its median. The trim argument, set here to 0.2, is ignored by median(), so the result equals median(a). To actually drop the smallest and largest 20% of the values before summarising, use mean(a, trim = 0.2), which returns a trimmed mean rather than a median.
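The sketch below illustrates where trim does apply and why it would not change a median anyway. It computes a trimmed mean with mean(), repeats the trimming by hand, and then compares medians before and after trimming. The variable names (kept, k, trimmed_mean) are illustrative, and the last call assumes R 3.4.0 or later, where median() accepts extra arguments.

```r
x <- mtcars$mpg                 # 32 observations
n <- length(x)

# mean() does have a trim argument: drop 10% of the sorted values from each end.
trimmed_mean <- mean(x, trim = 0.1)

# The same trimming done by hand, to show what trim = 0.1 means here.
k <- floor(n * 0.1)             # number of values removed from each end
kept <- sort(x)[(k + 1):(n - k)]
manual_trimmed_mean <- mean(kept)

# Removing the same number of values from both ends keeps the middle
# of the sorted data in place, so the median does not change.
median(x)
median(kept)

# An extra trim argument passed to median() is ignored
# (assumption: R 3.4.0 or later).
median(x, trim = 0.3)
```

Because trimming is symmetric, a "trimmed median" would equal the ordinary median; trimming matters only for statistics such as the mean that depend on the extreme values.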
 

Calculating Median by Removing Missing Values

 
If a variable contains missing values, median() returns NA by default.
To create a missing value in a variable, we can assign NA to one of its cells:
    > ds[2,4] = NA
    > df2 = ds$hp
    > df2
    [1] 110 NA 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66 52 65 97 150 150 245 175 66 91 113 264 175 335 109
As we can see, the second observation of the hp variable (the fourth column of ds) is now NA. Calculating the median of hp therefore returns NA.
    > median(df2)
    [1] NA

Removal of Missing Values

 
We can calculate the median of a variable that contains missing values by passing na.rm = TRUE to the median() function. Setting na.rm to TRUE tells the function to remove NA values before computing the result. R is case-sensitive, so the value must be written TRUE, not True.
 
The code below removes the missing value before calculating the median:
    > rs2 = median(df2, na.rm = TRUE)
    > rs2
    [1] 123
The same applies to a vector. Without na.rm, a vector that contains NA gives NA:
    a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, NA)
    # calculating the median
    med <- median(a)
    print(med)
The above code will return the following output:
    > a <- c(9, 6, 24, 32, 13, 5, 5, -3, 19, -5, NA)
    > med <- median(a)
    > print(med)
    [1] NA

Removing NA values and calculating the median

    Res1 <- median(a, na.rm = TRUE)
    print(Res1)
The above code will generate the following output:
    > Res1 <- median(a, na.rm = TRUE)
    > Res1
    [1] 7.5
As we can see, the vector named a contains an NA value, so calculating its median directly returns NA. After adding the parameter na.rm = TRUE, the NA is removed from the vector before the median is calculated, and a numeric result is returned.
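The steps above can be pulled together in one short sketch. It works on copies of mtcars so the original dataset is left untouched, counts the missing values first, and then shows how na.rm = TRUE can be forwarded through sapply() to handle every column at once. The names hp, df, and col_medians are illustrative choices.

```r
hp <- mtcars$hp
hp[2] <- NA                       # introduce one missing value

sum(is.na(hp))                    # how many values are missing
median(hp)                        # NA: the missing value propagates
median(hp, na.rm = TRUE)          # NA is dropped before computing

# Extra arguments given to sapply() are forwarded to median(),
# so na.rm = TRUE applies to every column.
df <- mtcars
df[2, "hp"] <- NA
col_medians <- sapply(df, median, na.rm = TRUE)
col_medians[["hp"]]
```

Checking the count of missing values before dropping them is a useful habit, since na.rm = TRUE hides how much data was excluded.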
 

Summary

 
In this article, I demonstrated how to calculate the median of the variables of a dataset using the median() function. The examples covered data frame columns, plain vectors, and variables with missing values, with code snippets for each case.