How To Read Tabular Data From CSV Files In R

Introduction

 
Many applications, including R and Microsoft Excel, can import and export data in tabular form. A CSV (comma-separated values) file stores a table as plain text: each line of the file is one row, and the values within a row are separated by commas.
 
In this article, I will discuss how to read comma-separated values into R and store them in a data frame.
 

Reading comma separated values

 
The read.csv function reads data from a comma-separated values (CSV) file. If the CSV file contains a header row, we can use the following syntax, because read.csv assumes header = TRUE by default,
  > df1 <- read.csv("filename")
To read data from a CSV file that does not include a header row, we pass the header argument and set it to FALSE, as follows,
  > df1 <- read.csv("filename", header = FALSE)
For example, importing a file named bank.csv that has a header row produces the following output (only the first 5 rows and 15 columns are shown here),
  > df1 = read.csv("bank.csv", header = TRUE)
  > df1
     age          job  marital education default balance housing loan contact day month duration campaign pdays previous
  1   59       admin.  married secondary      no    2343     yes   no unknown   5   may     1042        1    -1        0
  2   56       admin.  married secondary      no      45      no   no unknown   5   may     1467        1    -1        0
  3   41   technician  married secondary      no    1270     yes   no unknown   5   may     1389        1    -1        0
  4   55     services  married secondary      no    2476     yes   no unknown   5   may      579        1    -1        0
  5   54       admin.  married  tertiary      no     184      no   no unknown   5   may      673        2    -1        0
The top row of the output is the header, which consists of the column names.
 
The read.csv function reads the file and builds a data frame from it. A data frame is R's standard structure for tabular data, that is, data arranged in rows and columns.
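
As a minimal, self-contained sketch of this behaviour, the snippet below writes a tiny sample file to a temporary location and confirms that read.csv returns a data frame. The file contents and the names sample_path and sample_df are made up for illustration and are not part of bank.csv.

```r
# Write a small, made-up CSV file to a temporary location.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45",
             "41,technician,1270"),
           sample_path)

# read.csv treats the first line as the header by default.
sample_df <- read.csv(sample_path)

class(sample_df)  # "data.frame"
nrow(sample_df)   # 3 data rows (the header line is not counted)
ncol(sample_df)   # 3 columns
```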
 
By default, read.csv assumes that the CSV file contains a header row, so the header argument can be omitted,
  > data = read.csv("bank.csv")
  > data
     age          job  marital education default balance housing loan contact day month duration campaign pdays previous
  1   59       admin.  married secondary      no    2343     yes   no unknown   5   may     1042        1    -1        0
  2   56       admin.  married secondary      no      45      no   no unknown   5   may     1467        1    -1        0
  3   41   technician  married secondary      no    1270     yes   no unknown   5   may     1389        1    -1        0
  4   55     services  married secondary      no    2476     yes   no unknown   5   may      579        1    -1        0
  5   54       admin.  married  tertiary      no     184      no   no unknown   5   may      673        2    -1        0
  6   42   management   single  tertiary      no       0     yes  yes unknown   5   may      562        2    -1        0
  7   56   management  married  tertiary      no     830     yes  yes unknown   6   may     1201        1    -1        0
  8   60      retired divorced secondary      no     545     yes   no unknown   6   may     1030        1    -1        0
  9   37   technician  married secondary      no       1     yes   no unknown   6   may      608        1    -1        0
  10  28     services   single secondary      no    5090     yes   no unknown   6   may     1297        3    -1        0
  11  38       admin.   single secondary      no     100     yes   no unknown   7   may      786        1    -1        0
  12  30  blue-collar  married secondary      no     309     yes   no unknown   7   may     1574        2    -1        0
  13  29   management  married  tertiary      no     199     yes  yes unknown   7   may     1689        4    -1        0
  14  46  blue-collar   single  tertiary      no     460     yes   no unknown   7   may     1102        2    -1        0
  15  31   technician   single  tertiary      no     703     yes   no unknown   8   may      943        2    -1        0
  16  35   management divorced  tertiary      no    3837     yes   no unknown   8   may     1084        1    -1        0
  17  32  blue-collar   single   primary      no     611     yes   no unknown   8   may      541        3    -1        0
  18  49     services  married secondary      no      -8     yes   no unknown   8   may     1119        1    -1        0
  19  41       admin.  married secondary      no      55     yes   no unknown   8   may     1120        2    -1        0
  20  49       admin. divorced secondary      no     168     yes  yes unknown   8   may      513        1    -1        0
  21  28       admin. divorced secondary      no     785     yes   no unknown   8   may      442        2    -1        0
  22  43   management   single  tertiary      no    2067     yes   no unknown   8   may      756        1    -1        0
  23  43   management divorced  tertiary      no     388     yes   no unknown   8   may     2087        2    -1        0
  24  43  blue-collar  married   primary      no    -192     yes   no unknown   8   may     1120        2    -1        0
  25  37   unemployed   single secondary      no     381     yes   no unknown   8   may      985        2    -1        0
  26  35  blue-collar   single secondary      no      40     yes   no unknown   9   may      617        4    -1        0
  27  31   technician   single  tertiary      no      22     yes   no unknown   9   may      483        3    -1        0
  28  43  blue-collar   single secondary      no       3     yes   no unknown   9   may      929        3    -1        0
  29  31       admin.  married secondary      no     307     yes   no unknown   9   may      538        1    -1        0
  30  28  blue-collar   single secondary      no     759     yes   no unknown   9   may      710        1    -1        0
In the output above, the header row of the data frame holds the column names taken from the first line of the CSV file.
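
Because the header supplies the column names, individual columns can then be selected by name. The following is a small sketch using a made-up three-column file rather than bank.csv; sample_path and sample_df are illustrative names only.

```r
# Made-up sample file with a header line.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45"),
           sample_path)

sample_df <- read.csv(sample_path)

# Column names come from the first line of the file.
names(sample_df)     # "age" "job" "balance"

# Select a single column by name with $ or [[ ]].
sample_df$balance    # 2343 45
sample_df[["job"]]   # the job column
```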
 
If we do not want the first line to be treated as a header, we can pass the argument header = FALSE, and R assigns the default column names V1, V2, and so on,
  > df1 = read.csv("bank.csv", header = FALSE)
  > head(df1,20)
      V1          V2       V3        V4      V5      V6      V7   V8      V9 V10   V11      V12      V13   V14      V15
  1   59      admin.  married secondary      no    2343     yes   no unknown   5   may     1042        1    -1        0
  2   56      admin.  married secondary      no      45      no   no unknown   5   may     1467        1    -1        0
  3   41  technician  married secondary      no    1270     yes   no unknown   5   may     1389        1    -1        0
  4   55    services  married secondary      no    2476     yes   no unknown   5   may      579        1    -1        0
  5   54      admin.  married  tertiary      no     184      no   no unknown   5   may      673        2    -1        0
  6   42  management   single  tertiary      no       0     yes  yes unknown   5   may      562        2    -1        0
  7   56  management  married  tertiary      no     830     yes  yes unknown   6   may     1201        1    -1        0
  8   60     retired divorced secondary      no     545     yes   no unknown   6   may     1030        1    -1        0
  9   37  technician  married secondary      no       1     yes   no unknown   6   may      608        1    -1        0
  10  28    services   single secondary      no    5090     yes   no unknown   6   may     1297        3    -1        0
  11  38      admin.   single secondary      no     100     yes   no unknown   7   may      786        1    -1        0
  12  30 blue-collar  married secondary      no     309     yes   no unknown   7   may     1574        2    -1        0
  13  29  management  married  tertiary      no     199     yes  yes unknown   7   may     1689        4    -1        0
  14  46 blue-collar   single  tertiary      no     460     yes   no unknown   7   may     1102        2    -1        0
  15  31  technician   single  tertiary      no     703     yes   no unknown   8   may      943        2    -1        0
  16  35  management divorced  tertiary      no    3837     yes   no unknown   8   may     1084        1    -1        0
  17  32 blue-collar   single   primary      no     611     yes   no unknown   8   may      541        3    -1        0
  18  49    services  married secondary      no      -8     yes   no unknown   8   may     1119        1    -1        0
  19  41      admin.  married secondary      no      55     yes   no unknown   8   may     1120        2    -1        0
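
One thing to keep in mind is that header = FALSE does not skip anything. If the file does begin with a line of column names, that line is read as the first data row, and every column is then read as text because it mixes names with numbers. The sketch below illustrates this with two made-up files (sample_path and headerless_path are hypothetical names); it also shows the col.names argument, which supplies column names for a file that has no header line.

```r
# A made-up file that DOES have a header line.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("age,job",
             "59,admin.",
             "56,technician"),
           sample_path)

no_header <- read.csv(sample_path, header = FALSE)
names(no_header)  # "V1" "V2"
nrow(no_header)   # 3: the header line is counted as a data row
no_header$V1      # text values "age", "59", "56"
                  # (character in R >= 4.0, factor in earlier versions)

# A made-up file with no header line at all.
headerless_path <- tempfile(fileext = ".csv")
writeLines(c("59,admin.",
             "56,technician"),
           headerless_path)

# col.names gives the columns meaningful names instead of V1, V2.
named <- read.csv(headerless_path, header = FALSE,
                  col.names = c("age", "job"))
names(named)      # "age" "job"
named$age         # 59 56
```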

Structure of data frame

 
We can also inspect the structure of the imported data. To display the structure of a data frame, we can use the str function,
  > str(df1)
Here df1 is the name of the data frame.
 
Now let us look at the structure of the data frame read from bank.csv. The argument as.is = TRUE keeps text columns as character vectors instead of converting them to factors,
  > df <- read.csv("bank.csv", as.is=TRUE)
  > str(df)
  'data.frame':   11162 obs. of  17 variables:
   $ age      : int  59 56 41 55 54 42 56 60 37 28 ...
   $ job      : chr  "admin." "admin." "technician" "services" ...
   $ marital  : chr  "married" "married" "married" "married" ...
   $ education: chr  "secondary" "secondary" "secondary" "secondary" ...
   $ default  : chr  "no" "no" "no" "no" ...
   $ balance  : int  2343 45 1270 2476 184 0 830 545 1 5090 ...
   $ housing  : chr  "yes" "no" "yes" "yes" ...
   $ loan     : chr  "no" "no" "no" "no" ...
   $ contact  : chr  "unknown" "unknown" "unknown" "unknown" ...
   $ day      : int  5 5 5 5 5 5 6 6 6 6 ...
   $ month    : chr  "may" "may" "may" "may" ...
   $ duration : int  1042 1467 1389 579 673 562 1201 1030 608 1297 ...
   $ campaign : int  1 1 1 1 2 2 1 1 1 3 ...
   $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
   $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
   $ poutcome : chr  "unknown" "unknown" "unknown" "unknown" ...
   $ deposit  : chr  "yes" "yes" "yes" "yes" ...
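As we can see, str reports the number of observations (rows) and variables (columns), 11162 and 17 here, followed by the name, data type, and first few values of each variable. In this data frame the variables are of integer (int) and character (chr) type.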
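
To see how the column types depend on the arguments passed to read.csv, here is a brief sketch on a made-up two-column file (not bank.csv; all variable names are illustrative). It compares as.is = TRUE with stringsAsFactors = TRUE, which converts text columns to factors instead.

```r
# Made-up sample file with one numeric and one text column.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("age,job",
             "59,admin.",
             "56,technician"),
           sample_path)

# as.is = TRUE keeps text columns as character vectors.
as_text <- read.csv(sample_path, as.is = TRUE)
sapply(as_text, class)    # age: "integer", job: "character"

# stringsAsFactors = TRUE converts text columns to factors.
as_factor <- read.csv(sample_path, stringsAsFactors = TRUE)
sapply(as_factor, class)  # age: "integer", job: "factor"
levels(as_factor$job)     # "admin." "technician"
```

Character columns are usually easier to work with for free text, while factors suit columns with a small fixed set of categories, such as marital or education in bank.csv.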
 

Import values using the read.table function

 
We can also use the read.table function to import CSV files into R. Like read.csv, it stores the values it reads in a data frame,
  > df = read.table("bank.csv", header = TRUE)
  > head(df, 10)
     age.job.marital.education.default.balance.housing.loan.contact.day.month.duration.campaign.pdays.previous.poutcome.deposit
  1                                            59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes
  2                                               56,admin.,married,secondary,no,45,no,no,unknown,5,may,1467,1,-1,0,unknown,yes
  3                                        41,technician,married,secondary,no,1270,yes,no,unknown,5,may,1389,1,-1,0,unknown,yes
  4                                           55,services,married,secondary,no,2476,yes,no,unknown,5,may,579,1,-1,0,unknown,yes
  5                                                54,admin.,married,tertiary,no,184,no,no,unknown,5,may,673,2,-1,0,unknown,yes
  6                                             42,management,single,tertiary,no,0,yes,yes,unknown,5,may,562,2,-1,0,unknown,yes
  7                                         56,management,married,tertiary,no,830,yes,yes,unknown,6,may,1201,1,-1,0,unknown,yes
  8                                           60,retired,divorced,secondary,no,545,yes,no,unknown,6,may,1030,1,-1,0,unknown,yes
  9                                            37,technician,married,secondary,no,1,yes,no,unknown,6,may,608,1,-1,0,unknown,yes
  10                                          28,services,single,secondary,no,5090,yes,no,unknown,6,may,1297,3,-1,0,unknown,yes
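Here we also pass header = TRUE to read.table, because the file contains a header row and, unlike read.csv, read.table does not default to header = TRUE. Notice, however, that the output above has a single column that holds each whole line: read.table splits fields on whitespace by default, not on commas. To split the file into its 17 columns, pass the separator explicitly, for example read.table("bank.csv", header = TRUE, sep = ",").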
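
The effect of the separator can be checked on a small made-up file (sample_path and the other names below are illustrative, not taken from bank.csv). This is only a sketch of the difference between the default whitespace separator and sep = ",".

```r
# Made-up sample file with three comma-separated columns.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45"),
           sample_path)

# Default separator is whitespace, so each line becomes one field.
one_col <- read.table(sample_path, header = TRUE)
ncol(one_col)     # 1

# With sep = "," the fields are split on commas.
comma_sep <- read.table(sample_path, header = TRUE, sep = ",")
ncol(comma_sep)   # 3
names(comma_sep)  # "age" "job" "balance"
```

read.csv is a wrapper around read.table with defaults chosen for comma-separated files (header = TRUE, sep = ",", quote = "\"", fill = TRUE, comment.char = ""), which is why it is usually the more convenient choice for CSV input.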
 

Summary

 
In this article, I demonstrated how to read comma-separated values into R and store them in a data frame. I discussed how to read files with and without a header row, and how to inspect the structure of the result with str. Two functions can be used to import comma-separated data in R, read.csv and read.table, and code snippets with their outputs are provided for each.