Factors In R

Introduction

Factors are data objects used to categorize data and store it as levels. A factor can be built from either character or integer data; the distinct values become its levels. Factors are most useful for columns that have a limited number of unique values, and they are widely used in data analysis and statistical modeling.
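
As a minimal sketch of what "storing data under levels" means, the snippet below builds a factor from a small character vector and inspects its levels and the integer codes behind them. The vector sizes and its values are made up for illustration and are not part of the examples that follow.

```r
# A character vector with only three distinct values (hypothetical data).
sizes <- c("small", "large", "medium", "small", "large")

# factor() records each distinct value once as a level.
f <- factor(sizes)

# The levels are sorted alphabetically by default.
print(levels(f))

# Internally, each element is an integer code that points into the levels.
print(as.integer(f))

# Number of distinct categories.
print(nlevels(f))
```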

Creation of Factors

To create a factor in R, we call the factor() function with a vector as the input. We start with a plain character vector:
d <- c("East","West","East","North","North","East","West","West","West","East","North")
Let us now see the contents of the vector:
> d <- c("East","West","East","North","North","East","West","West","West","East","North")
> d
 [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West"
[10] "East"  "North"

To check whether d is a factor, we use the is.factor() function, as shown below:
is.factor(d)
The script returns the following:
> is.factor(d)
[1] FALSE

Object d is not a factor; it is a character vector. To get a factor, we call the factor() function and pass the vector to it.

This creates a factor from the vector:
# Applying the factor function.
factor_data <- factor(d)

Let us check whether factor_data is a factor:
is.factor(factor_data)
Executing this should give the following output:
> is.factor(factor_data)
[1] TRUE

The output shows that factor_data is a factor. We have successfully created a factor from a vector by calling the factor() function.
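
To make the difference between the vector and the factor more concrete, here is a short sketch that compares their classes and counts how often each level occurs. It reuses the vector d from above; the name counts is only an illustrative choice.

```r
d <- c("East","West","East","North","North","East","West","West","West","East","North")
factor_data <- factor(d)

# The original object is a character vector; the new one is a factor.
print(class(d))
print(class(factor_data))

# table() counts the number of elements in each level.
counts <- table(factor_data)
print(counts)
```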

We can also create a factor as a column of a data frame. In versions of R before 4.0.0, data.frame() treated any column of text data as categorical and converted it to a factor automatically. Since R 4.0.0, text columns stay as character vectors unless we pass stringsAsFactors = TRUE. Consider the example given below showing how this can be done:
# Creating the vectors for the data frame.
height <- c(140,152,164,137,166,157,112)
weight <- c(38,49,76,54,97,22,30)
gender <- c("male","male","female","female","male","female","male")

Creating the data frame
# stringsAsFactors = TRUE converts the text column to a factor in every R version.
input_data <- data.frame(height,weight,gender,stringsAsFactors = TRUE)
Let us view the contents of the data frame:
> input_data
  height weight gender
1    140     38   male
2    152     49   male
3    164     76 female
4    137     54 female
5    166     97   male
6    157     22 female
7    112     30   male

Let us check whether the column gender is a factor or not:
is.factor(input_data$gender)
It returns the following output:
> is.factor(input_data$gender)
[1] TRUE

Yes, the column is a factor.

We can now print the gender column to see the levels:
input_data$gender
The script will return the following output:
> input_data$gender
[1] male   male   female female male   female male
Levels: female male
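
Whether data.frame() converts text columns automatically depends on the R version, because the default of stringsAsFactors changed to FALSE in R 4.0.0. One version-independent option, sketched below, is to build the data frame with character columns and then convert only the column we want with factor().

```r
height <- c(140, 152, 164, 137, 166, 157, 112)
weight <- c(38, 49, 76, 54, 97, 22, 30)
gender <- c("male", "male", "female", "female", "male", "female", "male")

# Keep text as character so the result does not depend on the R version.
input_data <- data.frame(height, weight, gender, stringsAsFactors = FALSE)
before <- is.factor(input_data$gender)
print(before)

# Convert just the gender column to a factor.
input_data$gender <- factor(input_data$gender)
after <- is.factor(input_data$gender)
print(after)
print(levels(input_data$gender))
```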

The order of the levels contained in a factor can be changed by applying the factor function again while specifying the new order of the levels.

Consider the example given below:
d <- c("East","West","East","North","North","East","West","West","West","East","North")
Let us create the factor:
factor_data <- factor(d)
Let us display the factor data:
> factor_data
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
Let us now apply the factor() function with the required order of the levels:
new_order_data <- factor(factor_data,levels = c("East","West","North"))
Printing the new factor gives the following output:
> new_order_data
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East West North
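
The following sketch illustrates what reordering does under the hood: the values stay the same and only the integer codes change. It also shows a common pitfall, namely that assigning to levels() renames the levels instead of reordering them. The shortened vector and the names f, g and h are illustrative choices.

```r
f <- factor(c("East", "West", "East", "North"))   # levels: East North West

# Reorder the levels by calling factor() again with an explicit order.
g <- factor(f, levels = c("East", "West", "North"))

# The values are unchanged; only the underlying codes differ.
print(as.character(g))
print(as.integer(f))
print(as.integer(g))

# Pitfall: assigning to levels() relabels the existing codes,
# so "North" and "West" swap places in the data itself.
h <- f
levels(h) <- c("East", "West", "North")
print(as.character(h))
```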

In R, we can generate factor levels using the gl() function. It takes two integers: the first specifies the number of levels and the second specifies how many times each level is repeated.

The function has the syntax gl(n, k, labels)

The following parameters are used in the above syntax:
• n - This is an integer which defines the number of levels.
• k - This is an integer that specifies the number of replications of each level.
• labels - This is an optional vector of labels for the resulting factor levels.
Consider the example given below which shows how the function can be used:
vec <- gl(2, 3, labels = c("Texas","Seattle","Boston"))
Then we print the contents of the factor:
> vec
[1] Texas   Texas   Texas   Seattle Seattle Seattle
Levels: Texas Seattle Boston

Because n is 2, only the first two labels occur in the data; Boston is kept as a level that no element uses.
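
As a further sketch of how n and k interact, the snippet below generates three levels with two replications each, and then uses the length argument of gl() to repeat a two-level pattern. The names cities and alt and the labels "a" and "b" are illustrative.

```r
# Three levels, each repeated twice in a row.
cities <- gl(3, 2, labels = c("Texas", "Seattle", "Boston"))
print(cities)

# With k = 1 and length = 6, the two levels alternate until six elements exist.
alt <- gl(2, 1, length = 6, labels = c("a", "b"))
print(alt)
```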

We can also create a factor directly by passing the values to the factor() function. The following example demonstrates this:
x <- factor(c("Married","married","single","single"))
We can then print out the contents of the factor:
> x
[1] Married married single  single
Levels: married Married single

The elements of a factor can be accessed in the same way as those of a vector. We will use the factor x created above.

Let us access the 2nd element of the factor:
x[2]
The script will run as follows:
> x[2]
[1] married
Levels: married Married single

Let us access the 1st and the 3rd elements of the factor:
x[c(1, 3)]
It will return the following:
> x[c(1, 3)]
[1] Married single
Levels: married Married single

Let us access all the factor elements except for the 1st one:
x[-1]
It prints the following output:
> x[-1]
[1] married single  single
Levels: married Married single
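
Note that in each output above, the subset still lists all three levels, including levels that no longer occur in the selected elements. If that is not wanted, the unused levels can be removed, as in this short sketch using droplevels() or the drop argument of the subsetting operator. The names sub, trimmed and trimmed2 are illustrative.

```r
x <- factor(c("Married", "married", "single", "single"))

# Subsetting keeps every level of the original factor.
sub <- x[-1]
print(nlevels(sub))

# droplevels() removes levels that are no longer used.
trimmed <- droplevels(sub)
print(levels(trimmed))

# drop = TRUE does the same during subsetting.
trimmed2 <- x[-1, drop = TRUE]
print(levels(trimmed2))
```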

To modify the elements of a factor, we use simple reassignment. However, we can only assign values that are among its predefined levels.

Here is an example:
x[3] <- "married"
We have changed the value of the 3rd element from single to married.

The code should run as follows:
> x[3] <- "married"
> x
[1] Married married married single
Levels: married Married single

The above output shows that the change was made successfully. In our case, the factor has three levels: married, Married, and single. If we attempt to assign a value outside these levels, we get a warning message and the element is set to NA.

Here is an example:
x[3] <- "divorced"
This will run as follows:
> x[3] <- "divorced"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "divorced") :
  invalid factor level, NA generated
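
One way to avoid this warning, sketched below, is to extend the levels of the factor before assigning the new value. The second half of the snippet reproduces the NA case on a separate, hypothetical factor y, with the warning suppressed so the script runs quietly.

```r
x <- factor(c("Married", "married", "single", "single"))

# Add the new level first, then the assignment succeeds.
levels(x) <- c(levels(x), "divorced")
x[3] <- "divorced"
print(x)

# Without the extra level, the assignment produces NA (warning suppressed here).
y <- factor(c("married", "single"))
suppressWarnings(y[1] <- "divorced")
print(is.na(y[1]))
```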

Summary

In this article, I demonstrated how to create factors in R using the R console and how to perform various operations on a factor, such as accessing its elements by index, modifying its elements, and assigning a value that is not among its levels. Code snippets along with their output have been provided.
