How To Use Readxl Package To Read Data In R

Introduction

 
R can read data from various types of files having data in different formats. One such data format is the format of files created using Microsoft Excel. Files saved in Microsoft Excel are generally in .xls or .xlsx format. Data available in files supporting .xls or .xlsx format can be read by R with the help of various packages provided in directory of R.
 
In this article we will be using readxl and XLConnect packages to read excel data from files available in .xls or .xlsx format. To importing data from Excel various techniques are available which require the installation of third‐party libraries as well. These libraries may not be available for all operating systems.
 
Therefore we can also use a package named XLConnect. To read and write files supporting .xls or .xlsx format, XLConnect package requires a Java library, therefore XLConnect can run on different kinds of operating systems.
 

Installing the package XLConnect

 
We need to install and load the package in the directory of R, only then we can read an Excel file supporting formats such as .xls an .xlsx. In order to use a particular file in R, we can create an object of a file name as well. If Java is installed in an operating system, then to run XLConnect package, there is no need to install and load any other library in R.
  1. > install.packages("XLConnect")  
The above code will generate the following output,
  1. > install.packages("XLConnect")  
  2. WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:  
  3.   
  4. https://cran.rstudio.com/bin/windows/Rtools/  
  5. Installing package into ‘R/win-library/4.0’  
  6. (as ‘lib’ is unspecified)  
  7. trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/XLConnect_1.0.1.zip'  
  8. Content type 'application/zip' length 22907452 bytes (21.8 MB)  
  9. downloaded 21.8 MB  
  10.   
  11. package ‘XLConnect’ successfully unpacked and MD5 sums checked  
Loading the XLConnect package
  1. > library("XLConnect")  
The above code will generate the following output,
  1. > library("XLConnect")  
  2. XLConnect 1.0.1 by Mirai Solutions GmbH [aut],  
  3.   Martin Studer [cre],  
  4.   The Apache Software Foundation [ctb, cph] (Apache POI),  
  5.   Graph Builder [ctb, cph] (Curvesapi Java library)  
  6. http://www.mirai-solutions.com  
  7. https://github.com/miraisolutions/xlconnect  
Once the package is loaded in the library, we can either load an entire excel file or load a single worksheet at a time using a function named readWorksheetFromFile(). We have to pass two arguments inside readWorksheetFromFile() function to read excel files in R.
 
These two arguments are,
  • File name with .xls or .xlsx format along with valid path of directory where the required file is stored.
  • Worksheet’s name or index position.
Using readWorksheetFromFile function
  1. > info <- readWorksheetFromFile("info.xlsx ", sheet = 1)  
  2. print(info)  
The above code will generate the following output,
  1. > info <- readWorksheetFromFile("info.xlsx ", sheet = 1)  
  2. print(info)  
  3.   age.          job  marital  eduation default balance housing loan contact day month duration campaign pdays previous poutcome  y  
  4. 1   58   management  married  tertiary      no    2143     yes   no unknown   5   may      261        1    -1        0  unknown  no  
  5. 2   44   technician   single secondary      no     143      no   no unknown   7   may      261        1    -1        0  unknown  no  
  6. 3   33 entrepreneur  married   primary      no      43     yes  yes unknown  10   may      261        1    -1        0  unknown yes  
  7. 4   47   management   single   unknown      no     250     yes   no unknown   4   may      261        1    -1        0  unknown  no  
  8. 5   33  blue-collar divorced  tertiary      no    2100     yes   no unknown   8   may      261        1    -1        0  unknown  no  
  9. 6   35     services  married  tertiary      no     213      no  yes unknown  25   may      261        1    -1        0  unknown yes  
  10. 7   28      retired  married secondary      no     214     yes   no unknown  19   may      261        1    -1        0  unknown  no  
  11. 8   42   technician   single   primary      no    2500      no  yes unknown   5   may      261        1    -1        0  unknown  no  
  12. 9   58   management  married  tertiary      no     230     yes   no unknown   9   may      261        1    -1        0  unknown yes  
To remove an Excel file and clean the working environment of R, we can use file.remove() function,
  1. > file.remove("movies_data.xlsx")  

readxl package

 
To use readxl package, we need to install readxl package first as follows,
  1. install.packages("readxl")  
The above code will generate the following output,
  1. > install.packages("readxl")  
  2. WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:  
  3.   
  4. https://cran.rstudio.com/bin/windows/Rtools/  
  5. Installing package into ‘R/win-library/4.0’  
  6. (as ‘lib’ is unspecified)  
  7. trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/readxl_1.3.1.zip'  
  8. Content type 'application/zip' length 1708917 bytes (1.6 MB)  
  9. downloaded 1.6 MB  
  10.   
  11. package ‘readxl’ successfully unpacked and MD5 sums checked  
  12. >  
To read data from excel files we can use readxl package to read data from .xls and .xlsx format files. The readxl package contains read_excel() function which can be used to read data from excel files.
 
We can also determine the number of sheets available in Excel file as follows,
  1. > library(readxl)  
  2. > excel_sheets('info.xlsx')  
Above code will generate the following output,
  1. > excel_sheets('info.xlsx')  
  2. [1"Book1"  
As we can see from above output, info.xlsx excel file contains just a single worksheet named Book1.
 
Using read_excel function to read data
  1. > df <-read_excel('info.xlsx')  
  2. print(df)  
The above code will generate the following output,
  1. > df <-read_excel('info.xlsx')  
  2. print(df)  
  3. # A tibble: 9 x 17  
  4.     age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  5.   <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
  6. 1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
  7. 2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
  8. 3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
  9. 4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
  10. 5    33 blue-collar  divorced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
  11. 6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
  12. 7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
  13. 8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
  14. 9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes    
  15. >  
As we can see, the above output generates a table instead of a data frame.
 
Reading excel data using position of worksheet
  1. > df <-read_excel('info.xlsx', sheet=1)  
  2. print(df1)  
Above code will generate the following output,
  1. > df <-read_excel('info.xlsx', sheet=1)  
  2. print(df1)  
  3.   
  4.   
  5. print(df1)  
  6. # A tibble: 9 x 17  
  7.     age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  8.   <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
  9. 1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
  10. 2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
  11. 3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
  12. 4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
  13. 5    33 blue-collar  divorced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
  14. 6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
  15. 7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
  16. 8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
  17. 9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes    
  18. >  
Reading excel data using name of the worksheet
  1. > df1 <-read_excel('info.xlsx', sheet='Book1')  
  2. print(df1)  
Above code will generate the following output,
  1. > df1 <-read_excel('info.xlsx', sheet='Book1')  
  2. print(df1)  
  3.   
  4. # A tibble: 9 x 17  
  5.     age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  6.   <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
  7. 1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
  8. 2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
  9. 3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
  10. 4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
  11. 5    33 blue-collar  divorced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
  12. 6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
  13. 7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
  14. 8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
  15. 9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes    
  16. >  

Summary

 
In this article I demonstrated how to use readxl and XLConnect package to read excel data from files available in .xls or .xlsx format. I explained how to install and load packages to read excel data. Proper coding snippets and outputs are also provided.