How To Use Readxl Package To Read Data In R

Abhishek Yadav
1y
14.7k
0
1

Article

Introduction

R can read data from various types of files having data in different formats. One such data format is the format of files created using Microsoft Excel. Files saved in Microsoft Excel are generally in .xls or .xlsx format. Data available in files supporting .xls or .xlsx format can be read by R with the help of various packages provided in the directory of R.

In this article, we will be using readxl and XLConnect packages to read Excel data from files available in .xls or .xlsx format. To import data from Excel various techniques are available which require the installation of third‐party libraries as well. These libraries may not be available for all operating systems.

Therefore we can also use a package named XLConnect. To read and write files supporting .xls or .xlsx format, the XLConnect package requires a Java library, therefore XLConnect can run on different kinds of operating systems.

Installing the package XLConnect

We need to install and load the package in the directory of R, only then we can read an Excel file supporting formats such as .xls and .xlsx. In order to use a particular file in R, we can create an object of a file name as well.If Java is installed in an operating system, then to run the XLConnect package, there is no need to install and load any other library in R.

install.packages("XLConnect")

The above code will generate the following output.

> install.packages("XLConnect")  
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:  
  
https://cran.rstudio.com/bin/windows/Rtools/  
Installing package into ‘R/win-library/4.0’  
(as ‘lib’ is unspecified)  
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/XLConnect_1.0.1.zip'  
Content type 'application/zip' length 22907452 bytes (21.8 MB)  
downloaded 21.8 MB  
  
package ‘XLConnect’ successfully unpacked and MD5 sums checked

Loading the XLConnect package

library("XLConnect")

The above code will generate the following output.

> library("XLConnect")  
XLConnect 1.0.1 by Mirai Solutions GmbH [aut],  
  Martin Studer [cre],  
  The Apache Software Foundation [ctb, cph] (Apache POI),  
  Graph Builder [ctb, cph] (Curvesapi Java library)  
http://www.mirai-solutions.com  
https://github.com/miraisolutions/xlconnect

Once the package is loaded in the library, we can either load an entire Excel file or load a single worksheet at a time using a function named readWorksheetFromFile(). We have to pass two arguments inside the readWorksheetFromFile() function to read Excel files in R.

These two arguments are

A file name with .xls or .xlsx format along with a valid path of the directory where the required file is stored.
Worksheet’s name or index position.

Using readWorksheetFromFile function

> info <- readWorksheetFromFile("info.xlsx ", sheet = 1)  
> print(info)

The above code will generate the following output.

> info <- readWorksheetFromFile("info.xlsx ", sheet = 1)  
> print(info)  
  age.          job  marital  eduation default balance housing loan contact day month duration campaign pdays previous poutcome  y  
1   58   management  married  tertiary      no    2143     yes   no unknown   5   may      261        1    -1        0  unknown  no  
2   44   technician   single secondary      no     143      no   no unknown   7   may      261        1    -1        0  unknown  no  
3   33 entrepreneur  married   primary      no      43     yes  yes unknown  10   may      261        1    -1        0  unknown yes  
4   47   management   single   unknown      no     250     yes   no unknown   4   may      261        1    -1        0  unknown  no  
5   33  blue-collar porced  tertiary      no    2100     yes   no unknown   8   may      261        1    -1        0  unknown  no  
6   35     services  married  tertiary      no     213      no  yes unknown  25   may      261        1    -1        0  unknown yes  
7   28      retired  married secondary      no     214     yes   no unknown  19   may      261        1    -1        0  unknown  no  
8   42   technician   single   primary      no    2500      no  yes unknown   5   may      261        1    -1        0  unknown  no  
9   58   management  married  tertiary      no     230     yes   no unknown   9   may      261        1    -1        0  unknown yes

To remove an Excel file and clean the working environment of R, we can use the file.remove() function.

> file.remove("movies_data.xlsx")

readxl package

To use readxl package, we need to install readxl package first as follows.

install.packages("readxl")

The above code will generate the following output.

> install.packages("readxl")  
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:  
  
https://cran.rstudio.com/bin/windows/Rtools/  
Installing package into ‘R/win-library/4.0’  
(as ‘lib’ is unspecified)  
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/readxl_1.3.1.zip'  
Content type 'application/zip' length 1708917 bytes (1.6 MB)  
downloaded 1.6 MB  
  
package ‘readxl’ successfully unpacked and MD5 sums checked  
>

To read data from Excel files we can use readxl package to read data from .xls and .xlsx format files. The readxl package contains the read_excel() function which can be used to read data from Excel files.

We can also determine the number of sheets available in the Excel file as follows.

> library(readxl)  
> excel_sheets('info.xlsx')

The above code will generate the following output.

> excel_sheets('info.xlsx')  
[1] "Book1"

As we can see from the above output, the info.xlsx Excel file contains just a single worksheet named Book1.

Using the read_excel function to read data

> df <-read_excel('info.xlsx')  
> print(df)

The above code will generate the following output.

> df <-read_excel('info.xlsx')  
> print(df)  
# A tibble: 9 x 17  
    age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
5    33 blue-collar  porced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes

As we can see, the above output generates a table instead of a data frame.

Reading Excel data using the position of the worksheet

> df <-read_excel('info.xlsx', sheet=1)  
> print(df1)

The above code will generate the following output.

> df <-read_excel('info.xlsx', sheet=1)  
> print(df1)  
  
  
> print(df1)  
# A tibble: 9 x 17  
    age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
5    33 blue-collar  porced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes    
>

Reading Excel data using the name of the worksheet

> df1 <-read_excel('info.xlsx', sheet='Book1')  
> print(df1)

The above code will generate the following output.

> df1 <-read_excel('info.xlsx', sheet='Book1')  
> print(df1)  
  
# A tibble: 9 x 17  
    age job          marital  eduation  default balance housing loan  contact   day month duration campaign pdays previous poutcome y      
  <dbl> <chr>        <chr>    <chr>     <chr>     <dbl> <chr>   <chr> <chr>   <dbl> <chr>    <dbl>    <dbl> <dbl>    <dbl> <chr>    <chr>  
1    58 management   married  tertiary  no         2143 yes     no    unknown     5 may        261        1    -1        0 unknown  no     
2    44 technician   single   secondary no          143 no      no    unknown     7 may        261        1    -1        0 unknown  no     
3    33 entrepreneur married  primary   no           43 yes     yes   unknown    10 may        261        1    -1        0 unknown  yes    
4    47 management   single   unknown   no          250 yes     no    unknown     4 may        261        1    -1        0 unknown  no     
5    33 blue-collar  porced tertiary  no         2100 yes     no    unknown     8 may        261        1    -1        0 unknown  no     
6    35 services     married  tertiary  no          213 no      yes   unknown    25 may        261        1    -1        0 unknown  yes    
7    28 retired      married  secondary no          214 yes     no    unknown    19 may        261        1    -1        0 unknown  no     
8    42 technician   single   primary   no         2500 no      yes   unknown     5 may        261        1    -1        0 unknown  no     
9    58 management   married  tertiary  no          230 yes     no    unknown     9 may        261        1    -1        0 unknown  yes

Summary

In this article, I demonstrated how to use readxl and XLConnect packages to read Excel data from files available in .xls or .xlsx format. I explained how to install and load packages to read Excel data. Proper coding snippets and outputs are also provided.