R Data Frame Operations - Merging/Joining Two Data Frames In R

Introduction

 
As we have learned previously in What is Data Frame in R and adding Rows/Columns to a Data Frame in R, now we shall learn to merge or join Data Frame in R. Here we shall learn about the different joins available in R to merge two Data Frames in R.
 
Let's start now.
 

Joining or Merging & Its Benefits

 
Merging or Joining is the process of combining two or more different results sets into one result set based on some of the unique columns/attributes and their values. It is required when we need related data into a single place for a particular entity or object. The result of a merge or join operation gives us a consolidated/overall idea about an object into one place and hence it makes us easier to understand and help us to make decisions faster.
 
For example, HR keeps the employee records, such as educational background and previous employment records, and MIS admin keeps records of the target and achievement of an employee. Similarly, Finance admin keeps the records of employee's expenses and investment on an employee like salary, allowances, etc. As we can see here the data is in different places. Once the company's owner, MD or CF, wants to view all these data into a single place at a time when we can get all these data by the joining or merging operation by employee id of a particular employee.
 
What will happen if we do not use join merge or join and here MD wants to know all the records? They need to appoint separately to each admin like Finance admin, HR admin, and so on to show the records of an employee. So, we can see here joining saves time and improves our productivity also.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
The above image clarifies how different departments are joined in an organization.
 

R Data Frame Join

 
To join R Data Frame we use merge() function. We can learn about the merge() function in R studio itself by just typing either ?merge() or simply help(merge). It will show its definition, uses and different arguments it takes in detail as shown below. To know more about how to get help in R you can follow one of my previous article Getting help in R. Additionally to know more about merge function in R you can follow the R documentation link merge() function.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
As we can see from the above images the basic syntax of a merge is as shown below.
  1. #Syntax of R Data Frame merge
  2. merge(x, y, ...)  
Here x and y are the two different data frames or the object to be merged into one.
 

Argument Of R Merge() Function

 
The argument of the merge() function has been denoted by (, ...) as shown in the below image.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
The important arguments of the merge() function have been listed in the table below.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

Creating Data Frame For Example

 
We need to create two different data frames which we will use in R Data Frame merge operations below. We will create those data frames as shown below in the image.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
To create the data frame of the name df.OrdersDetail, we will run the below R script in R studio. 
  1. #Creating vector of three columns  
  2. OrderID <- c("10308""10309""10310""10853""10950")  
  3. CustomerID <- c(237775580)  
  4. OrderDate <- c("18-06-2019""19-03-2019""20-01-2019""20-05-2018""20-03-2018")  
  5. #Using above vector to create data frame  
  6. df.OrdersDetail <- data.frame(OrderID, CustomerID, OrderDate)  
Similar to the above we will use the below R script to create the df.AddressDetails data frame.
  1. #Creating three vectors for the data frame creation df.AddressDetails                     
  2. CustomerID <- c(1237785)  
  3. CustomerName <- c("Alfreds Futterkiste""Ana Trujillo Emparedados C-245",  
  4.                   "Antonio Moreno Taquería""Zachariah E. Somers",  
  5.                   "Jonathan G. Steck")  
  6. Address <- c("Obere Str. 57""Avda. de la Constitución 2222",  
  7.              "Mataderos 2312""1181 Lawman Avenue""Mellingburgredder 82")  
  8. City <- c("Berlin""Alexandria""México D.F.""Alexandria","Herzogenaurach")  
  9. PostalCode <- c(12209502150231485057860)  
  10. Country <- c("Germany""Mexico""Mexico""USA""Germany")  
  11.   
  12. #Using the above vector to create the data frame - df.AddressDetails  
  13. df.AddressDetails <- data.frame(CustomerID, CustomerName, Address, City,  
  14.                                 PostalCode, Country)  
Now, as we have already created both data frames to be used for the merge operation. So, we will print both data frames to see their elements in R studio.
  1. #Printing Both Data Frame to Check elements
  2. df.OrdersDetail  
  3. df.AddressDetails  
When we run the above script it gives the output as shown in the below image which is similar to the above data frames we had decided to create earlier. Now we can use both the data frames easily whenever required.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

Types Of R Data Frame Join

 
There are five important join types available in R to join or merge R Data Frame shown below. 
  • Natural join
  • Full Outer join
  • Left Outer join
  • Right Outer join &
  • Cross join
Now we shall learn about each in detail one by one.
 

R Data Frame Natural Join

 
The Natural join in R Data Frame merge works similar to SQL Server inner join. The output of natural join is based on the common values which exist on both sides of the data frame or object used in the merge.
 
The argument of natural join is all = FALSE. Below syntax is used for R data frame natural join.
  1. #R Data Frame Natural join arguments  
  2. merge(<dfA>, <dfB>, by = ["<Column Name(s)>"], all = FALSE)

The below image demonstrates the output of the data frame's natural join in R.

 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
The below example show the R data frame natural join in R studio. It will give the output as shown in the image below.
  1. #Natural Join example  
  2. merge(df.OrdersDetail, df.AddressDetails, by="CustomerID", all = FALSE)  
Here the output is just only for CustomerID, 2 and 77 as it exists in both data frames.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

R Data Frame Full Outer Join

 
It is just opposite to the Natural join and works as full outer join in SQL Server. The argument of natural join is all = TRUE. Below syntax is used for R Data Frame Full Outer Join. 
  1. #R Data Frame Full outer join syntax  
  2. merge(<dfA>, <dfB>, by = ["<Column Name(s)>"], all = TRUE)  
The output of the data frame full outer join is the combination of both data frames' merged column common values and union of both data frames' uncommon values. The below image shows how R Data Frame full outer join works.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
Here is an example of R Data Frame Full Outer join in R studio.
  1. #Full Outer Join example  
  2. merge(df.OrdersDetail, df.AddressDetails, by="CustomerID", all = TRUE)  
The output will be as shown in the image below. Here we need to remember that we have joined both data frames based on the CustomerID. Hence for the CustomerID which value will not be on another side of each data frame for that CustomerID value of other columns in join output will be shown as NA.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

R Data Frame Left Outer Join

 
The result of the R data frame left outer join is all records of a left object/data frame and the matching records of the right object/data frame. Its output remains the same as the SQL Server left outer join.
 
The argument of R Data Frame's left outer is all = TRUE. The syntax of the data frame left outer join is as shown below.
  1. #R Data Frame left outer join syntax  
  2. merge(<dfA>, <dfB>, by = ["<Column Name(s)>"], all.dfA = TRUE)  
Below image demonstrate the R Data Frame left outer join.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
Below R script shows the output of the R data frame left outer join in R studio with the data frames created earlier.
  1. #Left Outer Join example  
  2. merge(x = df.OrdersDetail, y = df.AddressDetails, by.x = "CustomerID", all.x = TRUE)  
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

R Data Frame Right Outer Join

 
The R data frame Right Outer join works just opposite to the R data frame left outer join. Its output remains similar to the SQL Server right outer join as it returns all the rows of the right data frame and corresponding matching rows of the left data frame.
 
The argument of R data frame right outer join is all.y = TRUE. Below syntax is used for the R data frame right outer join.
  1. #Right outer join syntax  
  2. merge(x, y, by.y = ["<Column Name(s)>"], all.y = TRUE)  
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
Here is an example of the R data frame right outer join. 
  1. #Right Outer Join example  
  2. merge(x = df.OrdersDetail, y = df.AddressDetails, by.y = "CustomerID", all.y = TRUE)  
The output of the R data frame right outer join will look as shown below.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

R Data Frame Cross Join

 
To use the R data frame cross join we pass the argument by = NULL as shown below.
  1. #Cross Join example  
  2. merge(x = df.OrdersDetail, y = df.AddressDetails, by = NULL)  
Important Points To Remember For R Data Frame Join
 
Case Sensitive
 
As we know R is case sensitive we should remember that FALSE and TRUE should be in upper case. Also, the name of the column in by should be exactly as in the data frame.
The below example shows the error when you pass “CustomerID” to “CustomerId” in merge() function. Although the name is the same the case is different so it is giving the error.
 
As shown in the image below it is saying to pass a valid column as “CustomerId” column is not there. It should be “CustomerID”.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 
Name & Data Type of Key 
 
The name and number of the column should be the same in both the data frames to be merged into one otherwise it will show the below error.
Here as shown in the image below I have changed the name of the CustomerID column to the CustomerIDs which is not available in another data frame to which we are merging. So, it is giving the below error.
 
R Data Frame Operations - Merging / Joining Two Data Frames In R
 

Summary

 
In this article, we have learned and explored what is the join or merge and what are the benefits of the merge. We learned also different types of merged available in R to merge two data frames.
 
I hope you learned and enjoyed it. I look forward to seeing your feedback.


Similar Articles