How To Read Tabular Data From CSV Files In R

Abhishek Yadav

Introduction

Application programs such as R and Microsoft Excel support importing and exporting data in tabular format. A CSV file stores data as rows and columns, that is, as a table. It consists of several rows of data, and the individual values within each row are separated by commas.

In this article, I will discuss how to read comma-separated data values in R and store these values in a data frame.

Reading comma-separated values

The read.csv function can be used to read data from comma-separated values (CSV) files. If a CSV file contains a header row, then to read data from such files, we can use the following syntax.

df1 <- read.csv("filename")
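As a minimal, self-contained sketch of this call, the snippet below writes a tiny CSV with a header row to a temporary file and reads it back. The file contents and the names csv_path and df_demo are made up for illustration; they are not the bank.csv file used later in this article.

```r
# Write a tiny CSV with a header row to a temporary file (made-up sample data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,job,balance",
             "59,admin.,2343",
             "56,admin.,45",
             "41,technician,1270"),
           csv_path)

# header = TRUE is the default for read.csv, so the first line
# of the file becomes the column names of the data frame.
df_demo <- read.csv(csv_path)

print(names(df_demo))  # column names taken from the header row
print(nrow(df_demo))   # number of data rows, header excluded
```

Because the header line is consumed as column names, it does not count as a data row.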

To read data from a CSV file that does not include a header row, we can add the argument named header and set its value to FALSE as follows.

> df1 <- read.csv("filename", header=FALSE)

For example, reading the file bank.csv with header = TRUE generates the following output.

> df1 = read.csv("bank.csv", header = TRUE)  
> df1  
  age        job  marital  education  default  balance  housing  loan  contact  day  month  duration  campaign  pdays  previous
1  59     admin.  married  secondary       no     2343      yes    no  unknown    5    may      1042         1     -1         0
2  56     admin.  married  secondary       no       45       no    no  unknown    5    may      1467         1     -1         0
3  41 technician  married  secondary       no     1270      yes    no  unknown    5    may      1389         1     -1         0
4  55   services  married  secondary       no     2476      yes    no  unknown    5    may       579         1     -1         0
5  54     admin.  married   tertiary       no      184       no    no  unknown    5    may       673         2     -1         0

Here we imported data from a CSV file named bank.csv. The output shown above has 5 rows and 15 columns, and at the top there is a header row that consists of the names of the columns.

The read.csv function reads the CSV file and builds a data frame from it. A data frame is one of the ways R represents data arranged in rows and columns, that is, in the form of a table.
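To make the idea of a data frame more concrete, here is a small sketch that builds one from inline CSV text (using the text argument instead of a file) and inspects it. The sample values and the name df_demo are assumptions made for this illustration only.

```r
# Build a small data frame by reading CSV text directly (made-up sample data).
df_demo <- read.csv(text = "age,job,balance\n59,admin.,2343\n56,admin.,45")

print(class(df_demo))  # the object returned by read.csv is a data.frame
print(dim(df_demo))    # number of rows, then number of columns
print(df_demo$age)     # a single column, accessed by name
print(df_demo[2, ])    # a single row, accessed by position
```

Each column of a data frame is a vector of one type, which is why a table can mix numeric and text columns.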

The call in the code below omits the header argument. In that case, read.csv assumes by default (header = TRUE) that the CSV file contains a header row.

> data = read.csv("bank.csv")   
> data  
   age          job  marital education default balance housing loan contact day month duration campaign pdays previous  
1   59       admin.  married secondary      no    2343     yes   no unknown   5   may     1042        1    -1        0  
2   56       admin.  married secondary      no      45      no   no unknown   5   may     1467        1    -1        0  
3   41   technician  married secondary      no    1270     yes   no unknown   5   may     1389        1    -1        0  
4   55     services  married secondary      no    2476     yes   no unknown   5   may      579        1    -1        0  
5   54       admin.  married  tertiary      no     184      no   no unknown   5   may      673        2    -1        0  
6   42   management   single  tertiary      no       0     yes  yes unknown   5   may      562        2    -1        0  
7   56   management  married  tertiary      no     830     yes  yes unknown   6   may     1201        1    -1        0  
8   60      retired divorced secondary      no     545     yes   no unknown   6   may     1030        1    -1        0  
9   37   technician  married secondary      no       1     yes   no unknown   6   may      608        1    -1        0  
10  28     services   single secondary      no    5090     yes   no unknown   6   may     1297        3    -1        0  
11  38       admin.   single secondary      no     100     yes   no unknown   7   may      786        1    -1        0  
12  30  blue-collar  married secondary      no     309     yes   no unknown   7   may     1574        2    -1        0  
13  29   management  married  tertiary      no     199     yes  yes unknown   7   may     1689        4    -1        0  
14  46  blue-collar   single  tertiary      no     460     yes   no unknown   7   may     1102        2    -1        0  
15  31   technician   single  tertiary      no     703     yes   no unknown   8   may      943        2    -1        0  
16  35   management divorced  tertiary      no    3837     yes   no unknown   8   may     1084        1    -1        0  
17  32  blue-collar   single   primary      no     611     yes   no unknown   8   may      541        3    -1        0  
18  49     services  married secondary      no      -8     yes   no unknown   8   may     1119        1    -1        0  
19  41       admin.  married secondary      no      55     yes   no unknown   8   may     1120        2    -1        0  
20  49       admin. divorced secondary      no     168     yes  yes unknown   8   may      513        1    -1        0  
21  28       admin. divorced secondary      no     785     yes   no unknown   8   may      442        2    -1        0  
22  43   management   single  tertiary      no    2067     yes   no unknown   8   may      756        1    -1        0  
23  43   management divorced  tertiary      no     388     yes   no unknown   8   may     2087        2    -1        0  
24  43  blue-collar  married   primary      no    -192     yes   no unknown   8   may     1120        2    -1        0  
25  37   unemployed   single secondary      no     381     yes   no unknown   8   may      985        2    -1        0  
26  35  blue-collar   single secondary      no      40     yes   no unknown   9   may      617        4    -1        0  
27  31   technician   single  tertiary      no      22     yes   no unknown   9   may      483        3    -1        0  
28  43  blue-collar   single secondary      no       3     yes   no unknown   9   may      929        3    -1        0  
29  31       admin.  married secondary      no     307     yes   no unknown   9   may      538        1    -1        0  
30  28  blue-collar   single secondary      no     759     yes   no unknown   9   may      710        1    -1        0

From the above output, we can see that the data frame uses the column names from the header row of the CSV file as its own column names.

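If the first line of the file should not be used as a header, we can pass the argument header=FALSE, and R will generate default column names V1, V2, and so on. This option is meant for files whose first line is already data; if the file does have a header line, read.csv treats that line as an ordinary data row.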

> df1 = read.csv("bank.csv", header = F)  
> head(df1,20)  
    V1          V2       V3        V4      V5      V6      V7   V8      V9 V10   V11      V12      V13   V14      V15  
1   59      admin.  married secondary      no    2343     yes   no unknown   5   may     1042        1    -1        0  
2   56      admin.  married secondary      no      45      no   no unknown   5   may     1467        1    -1        0  
3   41  technician  married secondary      no    1270     yes   no unknown   5   may     1389        1    -1        0  
4   55    services  married secondary      no    2476     yes   no unknown   5   may      579        1    -1        0  
5   54      admin.  married  tertiary      no     184      no   no unknown   5   may      673        2    -1        0  
6   42  management   single  tertiary      no       0     yes  yes unknown   5   may      562        2    -1        0  
7   56  management  married  tertiary      no     830     yes  yes unknown   6   may     1201        1    -1        0  
8   60     retired divorced secondary      no     545     yes   no unknown   6   may     1030        1    -1        0  
9   37  technician  married secondary      no       1     yes   no unknown   6   may      608        1    -1        0  
10  28    services   single secondary      no    5090     yes   no unknown   6   may     1297        3    -1        0  
11  38      admin.   single secondary      no     100     yes   no unknown   7   may      786        1    -1        0  
12  30 blue-collar  married secondary      no     309     yes   no unknown   7   may     1574        2    -1        0  
13  29  management  married  tertiary      no     199     yes  yes unknown   7   may     1689        4    -1        0  
14  46 blue-collar   single  tertiary      no     460     yes   no unknown   7   may     1102        2    -1        0  
15  31  technician   single  tertiary      no     703     yes   no unknown   8   may      943        2    -1        0  
16  35  management divorced  tertiary      no    3837     yes   no unknown   8   may     1084        1    -1        0  
17  32 blue-collar   single   primary      no     611     yes   no unknown   8   may      541        3    -1        0  
18  49    services  married secondary      no      -8     yes   no unknown   8   may     1119        1    -1        0  
19  41      admin.  married secondary      no      55     yes   no unknown   8   may     1120        2    -1        0
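The behaviour of header=FALSE can be sketched with a few lines of inline CSV text. The sample values, the variable names, and the column names passed to col.names are hypothetical and only serve the illustration.

```r
# CSV text without a header line (made-up sample data).
headerless <- "59,admin.,2343\n56,admin.,45"

# No header line: R assigns the default names V1, V2, V3.
df_default <- read.csv(text = headerless, header = FALSE)
print(names(df_default))

# Supplying our own names through col.names (hypothetical names).
df_named <- read.csv(text = headerless, header = FALSE,
                     col.names = c("age", "job", "balance"))
print(names(df_named))

# If the text does have a header line, header = FALSE keeps it as a data row.
with_header <- paste0("age,job,balance\n", headerless)
df_wrong <- read.csv(text = with_header, header = FALSE)
print(nrow(df_wrong))  # one more row than the data itself
```

In the last case every column is read as text, because the header words are mixed in with the numbers.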

Structure of data frame

We can also take a look at the structure of data that has been imported. To display the structure of data, we can use the following syntax.

str(df1)

Here df1 is the name of the data frame.

Now I will discuss the structure of the data frame of the bank.csv file.

> df <- read.csv("bank.csv", as.is=TRUE)  
> str(df)  
'data.frame':   11162 obs. of  17 variables:  
 $ age      : int  59 56 41 55 54 42 56 60 37 28 ...  
 $ job      : chr  "admin." "admin." "technician" "services" ...  
 $ marital  : chr  "married" "married" "married" "married" ...  
 $ education: chr  "secondary" "secondary" "secondary" "secondary" ...  
 $ default  : chr  "no" "no" "no" "no" ...  
 $ balance  : int  2343 45 1270 2476 184 0 830 545 1 5090 ...  
 $ housing  : chr  "yes" "no" "yes" "yes" ...  
 $ loan     : chr  "no" "no" "no" "no" ...  
 $ contact  : chr  "unknown" "unknown" "unknown" "unknown" ...  
 $ day      : int  5 5 5 5 5 5 6 6 6 6 ...  
 $ month    : chr  "may" "may" "may" "may" ...  
 $ duration : int  1042 1467 1389 579 673 562 1201 1030 608 1297 ...  
 $ campaign : int  1 1 1 1 2 2 1 1 1 3 ...  
 $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...  
 $ previous : int  0 0 0 0 0 0 0 0 0 0 ...  
 $ poutcome : chr  "unknown" "unknown" "unknown" "unknown" ...  
 $ deposit  : chr  "yes" "yes" "yes" "yes" ...

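As we can see, the structure reports the number of observations (rows) and variables (columns) in the data frame, here 11162 observations of 17 variables, followed by the name, type, and first few values of each variable. The variables in this data frame are of integer and character data types.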
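The as.is=TRUE argument used above is what keeps the text columns as character vectors. A short sketch on made-up inline data shows the difference from stringsAsFactors=TRUE, which converts text columns to factors; the sample values and variable names are assumptions for illustration.

```r
# Made-up sample data with one numeric and two text columns.
csv_text <- "age,job,deposit\n59,admin.,yes\n41,technician,no"

# as.is = TRUE keeps text columns as character vectors.
df_chr <- read.csv(text = csv_text, as.is = TRUE)
str(df_chr)

# stringsAsFactors = TRUE converts text columns to factors instead.
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(df_fct)
```

Numeric columns such as age are read as integers in both cases; only the text columns differ.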

Import values using the read.table function

We can also use the read.table function to import values from CSV (comma-separated values) files in R. After the values are read from the CSV file, they are stored in a data frame.

> df = read.table("bank.csv", header = TRUE)  
>  head(df,10)  
   age.job.marital.education.default.balance.housing.loan.contact.day.month.duration.campaign.pdays.previous.poutcome.deposit  
1                                            59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes  
2                                               56,admin.,married,secondary,no,45,no,no,unknown,5,may,1467,1,-1,0,unknown,yes  
3                                        41,technician,married,secondary,no,1270,yes,no,unknown,5,may,1389,1,-1,0,unknown,yes  
4                                           55,services,married,secondary,no,2476,yes,no,unknown,5,may,579,1,-1,0,unknown,yes  
5                                                54,admin.,married,tertiary,no,184,no,no,unknown,5,may,673,2,-1,0,unknown,yes  
6                                             42,management,single,tertiary,no,0,yes,yes,unknown,5,may,562,2,-1,0,unknown,yes  
7                                         56,management,married,tertiary,no,830,yes,yes,unknown,6,may,1201,1,-1,0,unknown,yes  
8                                           60,retired,divorced,secondary,no,545,yes,no,unknown,6,may,1030,1,-1,0,unknown,yes  
9                                            37,technician,married,secondary,no,1,yes,no,unknown,6,may,608,1,-1,0,unknown,yes  
10                                          28,services,single,secondary,no,5090,yes,no,unknown,6,may,1297,3,-1,0,unknown,yes  

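Here we pass the argument header = TRUE to the read.table function because the file contains a header row; unlike read.csv, read.table does not assume a header row by default. Note that read.table also splits fields on whitespace by default, which is why every line in the output above ends up in a single column. Passing sep = "," as well makes read.table split each line on commas into separate columns.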
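The effect of the separator on read.table can be sketched with inline CSV text. The sample values and the names df_one and df_split are made up for this illustration.

```r
# Made-up sample data in CSV form.
csv_text <- "age,job,balance\n59,admin.,2343\n56,admin.,45"

# The default separator is whitespace, so each line lands in a single column.
df_one <- read.table(text = csv_text, header = TRUE)
print(ncol(df_one))

# With sep = ",", the fields are split into separate columns.
df_split <- read.table(text = csv_text, header = TRUE, sep = ",")
print(ncol(df_split))
print(names(df_split))
```

With sep = "," and header = TRUE, read.table returns the same kind of table that read.csv produces for this input.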

Summary

In this article, I demonstrated how to read comma-separated values in R and store them in a data frame. I also discussed how to read data with and without a header row. Two functions, read.csv and read.table, were used to import the data, and code snippets with their outputs were provided for each.