How To Calculate The Mean Of Variables In R

Introduction

 
R provides many built-in functions for the statistical analysis of data. Functions such as mean() and median() are part of base R and the standard packages that ship with it, so no extra installation is needed. (Base R has no built-in function for the statistical mode; the function mode() returns an object's type, such as "numeric".) These functions take a vector as input and return the result. In this article, I will demonstrate how to calculate the mean of the variables of a dataset.
 

Calculating mean

 
The mean of a variable in a dataset is the sum of all of its observations divided by the total number of observations. R has a built-in function, mean(), that calculates the mean of a numeric vector, such as a single variable (column) of a dataset.
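As a quick check of this definition, the sketch below computes the mean of the mpg variable by hand with sum() and length() and compares it with mean(). The variable names are illustrative only; all.equal() is used instead of == so that tiny floating-point differences do not matter.

```r
# mpg column from the built-in mtcars dataset
x <- mtcars$mpg

# Mean by definition: sum of the observations divided by their count
manual_mean <- sum(x) / length(x)

# Mean from the built-in function
builtin_mean <- mean(x)

print(manual_mean)
print(builtin_mean)

# all.equal() tolerates tiny floating-point differences
print(isTRUE(all.equal(manual_mean, builtin_mean)))
```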
 
The mean() function can be called in the following ways, where x is a numeric vector:
  • mean(x)
  • mean(x, trim = 0.1)
  • mean(x, na.rm = TRUE)
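Because x must be a numeric (or logical) vector, the examples in this article pass a single column such as df$mpg rather than a whole data frame. The sketch below, assuming a current version of R, illustrates the difference: calling mean() on mtcars itself produces a warning and returns NA, while passing one column works.

```r
df <- mtcars

# Passing the whole data frame: mean() warns
# "argument is not numeric or logical: returning NA" and returns NA
whole <- suppressWarnings(mean(df))
print(whole)

# Passing one column (a numeric vector) works as expected
one_col <- mean(df$mpg)
print(one_col)
```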
To calculate the mean, I will use mtcars, one of the example datasets that come with R, and calculate the means of different variables in it.
    > data = mtcars
    > data
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Now we will calculate the means of individual variables of the mtcars dataset.

    > df = mtcars
    > mean(df$mpg)
    [1] 20.09062

 

In the above code, the mtcars dataset is assigned to the variable df, and the mpg variable, accessed with df$mpg, is passed as the argument to the built-in mean() function.
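The $ operator is only one way to extract a column. The sketch below shows two other base R forms, df[["mpg"]] and with(), which return the same vector and therefore the same mean. The [[ ]] form is convenient when the column name is stored in a variable; var_name here is just an illustrative name.

```r
df <- mtcars

m1 <- mean(df$mpg)          # $ operator with a literal column name
m2 <- mean(df[["mpg"]])     # [[ ]] accepts the name as a string
var_name <- "mpg"
m3 <- mean(df[[var_name]])  # same, with the name held in a variable
m4 <- with(df, mean(mpg))   # evaluate the expression inside the data frame

print(c(m1, m2, m3, m4))
```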
    > mean(df$cyl)
    [1] 6.1875
In the above code, the cyl variable (df$cyl) is passed to mean() to calculate the average number of cylinders.
    > mean(df$disp)
    [1] 230.7219
Here the disp variable (df$disp) is passed to mean() to calculate the average displacement.
    > mean(df$hp)
    [1] 146.6875
Finally, the hp variable (df$hp) is passed to mean() to calculate the average horsepower.
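Calling mean() once per column quickly becomes repetitive. As a sketch of a shortcut, base R's colMeans() returns the mean of every column of a numeric data frame in one call, and sapply(df, mean) applies mean() to each column; both return a named numeric vector.

```r
df <- mtcars

# One mean per column, returned as a named numeric vector
all_means <- colMeans(df)
print(round(all_means, 4))

# Equivalent: apply mean() to each column of the data frame
all_means2 <- sapply(df, mean)

# Pick out individual variables by name
print(all_means[c("mpg", "cyl", "disp", "hp")])
```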
 
We can also calculate the mean of a vector directly:
    a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, 15)
    # calculating the mean
    vec <- mean(a)
    print(vec)
It will generate the following output:
    > a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, 15)
    > vec <- mean(a)
    > print(vec)
    [1] 11.54545
Using the above code, we have created a vector named a that contains 11 values and calculated their mean. The vector is passed as the argument to mean(), and the result is assigned to the variable vec.
 

Trim argument

 
The trim argument gives the fraction (between 0 and 0.5) of observations to remove from each end of the sorted values before the mean is calculated. This makes the mean less sensitive to extreme values.
    mean(df1, trim = 0.1)
Let us apply the trim argument to the mpg variable:
    > df1 = data$mpg
    > df1
     [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
    > calc <- mean(df1, trim = 0.3)
    > calc
    [1] 19.17857
As we can see, with trim = 0.3 the 32 observations of mpg are sorted and floor(32 * 0.3) = 9 values are removed from each end (the 9 smallest and the 9 largest) before the mean of the remaining 14 values is calculated.
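To make the trimming rule concrete, here is a sketch of a hypothetical helper, trimmed_mean(), that mirrors what mean(x, trim = t) does for a vector without missing values: sort the values, drop floor(n * t) observations from each end, and average the rest. It is for illustration only and assumes 0 <= t < 0.5; R's own implementation also handles edge cases such as trim >= 0.5.

```r
# Hypothetical helper for illustration; use mean(x, trim = t) in practice.
# Assumes x has no NA values and 0 <= t < 0.5.
trimmed_mean <- function(x, t) {
  n <- length(x)
  k <- floor(n * t)            # observations dropped from each end
  s <- sort(x)
  kept <- s[(k + 1):(n - k)]   # keep the middle n - 2k values
  mean(kept)
}

mpg <- mtcars$mpg
print(floor(length(mpg) * 0.3))   # values dropped from each end
print(trimmed_mean(mpg, 0.3))
print(mean(mpg, trim = 0.3))
```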
 
For comparison, the mean without the trim argument is:
    > mean(df1)
    [1] 20.09062
We can also use the trim argument with a vector:
    a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, 15)
    res <- mean(a, trim = 0.2)
    print(res)
It will generate the following output:
    > a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, 15)
    > res <- mean(a, trim = 0.2)
    > print(res)
    [1] 9.285714
We have created a vector named a and calculated its trimmed mean. With trim = 0.2 and 11 values, floor(11 * 0.2) = 2 values are removed from each end of the sorted vector (the two smallest and the two largest values), and the mean of the remaining 7 values is calculated.
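The sketch below defines an 11-value vector a, lists the values that survive a 20% trim, and shows one more documented detail of the trim argument: a value of 0.5 or more makes mean() return the median instead.

```r
a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, 15)

n <- length(a)
k <- floor(n * 0.2)              # 2 values dropped from each end
kept <- sort(a)[(k + 1):(n - k)] # the 7 values that remain
print(kept)

print(mean(a, trim = 0.2))       # same as mean(kept)

# A trim of 0.5 or more returns the median instead
print(mean(a, trim = 0.5))
print(median(a))
```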
 

Calculating mean by removing missing values

 
If a variable contains missing values (NA), mean() returns NA. To introduce a missing value into a variable, we can use the following code:
    > data[2,4] = NA
    > df2 = data$hp
    > df2
    [1] 110 NA 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66 52 65 97 150 150 245 175 66 91 113 264 175 335 109
As we can see, the hp variable (the fourth column of the dataset named data) now has a missing value (NA) as its second observation. Calculating the mean of hp therefore returns NA:
    > mean(df2)
    [1] NA
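Before deciding how to handle missing values, it helps to know how many there are and where they are. A minimal sketch using is.na(), which returns TRUE for each NA element, and anyNA(); it rebuilds the hp example on a copy of mtcars, so the built-in dataset itself is not modified.

```r
data <- mtcars
data[2, 4] <- NA      # column 4 is hp, as in the example above
df2 <- data$hp

print(anyNA(df2))            # TRUE if at least one value is NA
print(sum(is.na(df2)))       # how many values are missing
print(which(is.na(df2)))     # positions of the missing values
print(mean(df2))             # NA, because of the missing value
```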

Removal of missing values

 
We can calculate the mean of a variable while ignoring its missing values by setting the na.rm parameter of mean() to TRUE, which tells the function to remove NA values before computing the mean. Note that R's logical constant is written TRUE; True is not recognized.
 
The code below removes the missing value before calculating the mean:
    > rs2 = mean(df2, na.rm = TRUE)
    > rs2
    [1] 147.871
The same applies to a vector. First, create a vector that contains an NA value and calculate its mean:
    a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, NA)
    # calculating the mean
    res <- mean(a)
    print(res)
The above code will return the following output:
    > a <- c(9, 6, 2, 43, 21, 3, 55, -31, 9, -5, NA)
    > res <- mean(a)
    > print(res)
    [1] NA
Removing the NA value and calculating the mean:
    res1 <- mean(a, na.rm = TRUE)
    print(res1)
The above code will generate the following output:
    > res1 <- mean(a, na.rm = TRUE)
    > print(res1)
    [1] 11.2
As we can see, the vector named a contains an NA value, so mean() returns NA. Adding the parameter na.rm = TRUE removes the NA from the calculation, and the mean of the remaining 10 values is returned.
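Setting na.rm = TRUE is equivalent to dropping the NA values yourself and averaging what is left, and colMeans() accepts the same argument when a whole data frame contains missing values. A sketch, again assuming the hp value in row 2 has been set to NA on a copy of mtcars:

```r
data <- mtcars
data[2, "hp"] <- NA
hp <- data$hp

# Built-in removal of missing values
r1 <- mean(hp, na.rm = TRUE)

# Manual equivalent: keep only the non-missing elements
r2 <- mean(hp[!is.na(hp)])

print(c(r1, r2))

# colMeans() also accepts na.rm; without it the hp mean would be NA
col_means <- colMeans(data, na.rm = TRUE)
print(col_means[["hp"]])
```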
 

Summary

 
In this article, I demonstrated how to calculate the mean of the variables of a dataset in R with the mean() function, including how to compute a trimmed mean with the trim argument and how to ignore missing values with na.rm = TRUE.

