How To Access Data Of Predefined Datasets In R

Introduction

 
Data can come from many sources. A standard R installation includes several built-in packages, and some of them ship predefined datasets that can be used directly for analysis. R can also read data from a broad range of external sources and in a large number of formats.
 
In this article, I will discuss how to access a single predefined dataset as well as how to list the datasets belonging to different packages in R. I will cover the syntax for listing the datasets of a single package, listing the datasets of all installed packages, and loading one specific dataset.
 

Accessing predefined datasets in R

 
A standard R installation comes with many predefined datasets, most of which are bundled in a package called datasets. This variety means that suitable example data is available for many kinds of projects and analysis techniques.
 
Further datasets are available in other R packages. The data() function, called without arguments, lists the datasets that are available in the packages currently loaded.
 
To list the datasets of the package datasets, we can use the syntax given below,
    > data()
 
The above syntax will give the following output,
    Data sets in package ‘datasets’:

    AirPassengers           Monthly Airline Passenger Numbers 1949-1960
    BJsales                 Sales Data with Leading Indicator
    BJsales.lead (BJsales)
                            Sales Data with Leading Indicator
    BOD                     Biochemical Oxygen Demand
    CO2                     Carbon Dioxide Uptake in Grass Plants
    ChickWeight             Weight versus age of chicks on different diets
    DNase                   Elisa assay of DNase
    EuStockMarkets          Daily Closing Prices of Major European Stock
                            Indices, 1991-1998
    Formaldehyde            Determination of Formaldehyde
    HairEyeColor            Hair and Eye Color of Statistics Students
    Harman23.cor            Harman Example 2.3
    Harman74.cor            Harman Example 7.4
    Indometh                Pharmacokinetics of Indomethacin
    InsectSprays            Effectiveness of Insect Sprays
    JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
    LakeHuron               Level of Lake Huron 1875-1972
    LifeCycleSavings        Intercountry Life-Cycle Savings Data
    Loblolly                Growth of Loblolly pine trees
    Nile                    Flow of the River Nile
    Orange                  Growth of Orange Trees
    OrchardSprays           Potency of Orchard Sprays
    PlantGrowth             Results from an Experiment on Plant Growth
    Puromycin               Reaction Velocity of an Enzymatic Reaction
    Seatbelts               Road Casualties in Great Britain 1969-84
    Theoph                  Pharmacokinetics of Theophylline
    Titanic                 Survival of passengers on the Titanic
    ToothGrowth             The Effect of Vitamin C on Tooth Growth in
                            Guinea Pigs
    UCBAdmissions           Student Admissions at UC Berkeley
    UKDriverDeaths          Road Casualties in Great Britain 1969-84
    UKgas                   UK Quarterly Gas Consumption
    USAccDeaths             Accidental Deaths in the US 1973-1978
    USArrests               Violent Crime Rates by US State
    USJudgeRatings          Lawyers' Ratings of State Judges in the US
                            Superior Court
    USPersonalExpenditure   Personal Expenditure Data
    UScitiesD               Distances Between European Cities and Between
                            US Cities
    VADeaths                Death Rates in Virginia (1940)
    WWWusage                Internet Usage per Minute
    WorldPhones             The World's Telephones
    ability.cov             Ability and Intelligence Tests
    airmiles                Passenger Miles on Commercial US Airlines,
                            1937-1960
    airquality              New York Air Quality Measurements
    anscombe                Anscombe's Quartet of 'Identical' Simple Linear
                            Regressions
    attenu                  The Joyner-Boore Attenuation Data
    attitude                The Chatterjee-Price Attitude Data
    austres                 Quarterly Time Series of the Number of
                            Australian Residents
    beaver1 (beavers)       Body Temperature Series of Two Beavers
    beaver2 (beavers)       Body Temperature Series of Two Beavers
    cars                    Speed and Stopping Distances of Cars
    chickwts                Chicken Weights by Feed Type
    co2                     Mauna Loa Atmospheric CO2 Concentration
    crimtab                 Student's 3000 Criminals Data
    discoveries             Yearly Numbers of Important Discoveries
    esoph                   Smoking, Alcohol and (O)esophageal Cancer
    euro                    Conversion Rates of Euro Currencies
    euro.cross (euro)       Conversion Rates of Euro Currencies
    eurodist                Distances Between European Cities and Between
                            US Cities
    faithful                Old Faithful Geyser Data
    fdeaths (UKLungDeaths)
                            Monthly Deaths from Lung Diseases in the UK
    freeny                  Freeny's Revenue Data
    freeny.x (freeny)       Freeny's Revenue Data
    freeny.y (freeny)       Freeny's Revenue Data
    infert                  Infertility after Spontaneous and Induced
                            Abortion
    iris                    Edgar Anderson's Iris Data
    iris3                   Edgar Anderson's Iris Data
    islands                 Areas of the World's Major Landmasses
    ldeaths (UKLungDeaths)
                            Monthly Deaths from Lung Diseases in the UK
    lh                      Luteinizing Hormone in Blood Samples
    longley                 Longley's Economic Regression Data
    lynx                    Annual Canadian Lynx trappings 1821-1934
    mdeaths (UKLungDeaths)
                            Monthly Deaths from Lung Diseases in the UK
    morley                  Michelson Speed of Light Data
    mtcars                  Motor Trend Car Road Tests
    nhtemp                  Average Yearly Temperatures in New Haven
    nottem                  Average Monthly Temperatures at Nottingham,
                            1920-1939
    npk                     Classical N, P, K Factorial Experiment
    occupationalStatus      Occupational Status of Fathers and their Sons
    precip                  Annual Precipitation in US Cities
    presidents              Quarterly Approval Ratings of US Presidents
    pressure                Vapor Pressure of Mercury as a Function of
                            Temperature
    quakes                  Locations of Earthquakes off Fiji
    randu                   Random Numbers from Congruential Generator
                            RANDU
    rivers                  Lengths of Major North American Rivers
    rock                    Measurements on Petroleum Rock Samples
    sleep                   Student's Sleep Data
    stack.loss (stackloss)
                            Brownlee's Stack Loss Plant Data
    stack.x (stackloss)     Brownlee's Stack Loss Plant Data
    stackloss               Brownlee's Stack Loss Plant Data
    state.abb (state)       US State Facts and Figures
    state.area (state)      US State Facts and Figures
    state.center (state)    US State Facts and Figures
    state.division (state)
                            US State Facts and Figures
    state.name (state)      US State Facts and Figures
    state.region (state)    US State Facts and Figures
    state.x77 (state)       US State Facts and Figures
    sunspot.month           Monthly Sunspot Data, from 1749 to "Present"
    sunspot.year            Yearly Sunspot Data, 1700-1988
    sunspots                Monthly Sunspot Numbers, 1749-1983
    swiss                   Swiss Fertility and Socioeconomic Indicators
                            (1888) Data
    treering                Yearly Treering Data, -6000-1979
    trees                   Diameter, Height and Volume for Black Cherry
                            Trees
    uspop                   Populations Recorded by the US Census
    volcano                 Topographic Information on Auckland's Maunga
                            Whau Volcano
    warpbreaks              The Number of Breaks in Yarn during Weaving
    women                   Average Heights and Weights for American Women
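
The listing printed by data() can also be captured as a regular R object, which is handy when the list is long. The following is a minimal sketch rather than the only way to do it: it relies on the results component of the object returned by data(), and the variable names info and catalog are chosen here purely for illustration.

```r
# Capture the listing for the 'datasets' package instead of printing it.
# The 'results' component is a character matrix with one row per data set.
info <- data(package = "datasets")$results

# The matrix has the columns "Package", "LibPath", "Item" and "Title"
colnames(info)

# Keep only the data set name and its one-line description
catalog <- as.data.frame(info[, c("Item", "Title")], stringsAsFactors = FALSE)
head(catalog)

# Search the descriptions for a keyword (the keyword is just an example)
catalog[grepl("Sunspot", catalog$Title), ]
```

Working with the matrix makes it possible to filter or count entries with ordinary data frame operations instead of scrolling through the printed list.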
The datasets available in every installed package can be listed using the following syntax,
    > data(package = .packages(all.available = TRUE))
Here .packages(all.available = TRUE) returns the names of all packages installed in the R library, whether or not they are currently loaded, so the command displays a complete list of the datasets contained in each of them.
 
The above syntax will give output like the following. The exact list depends on which packages are installed, and the listing is truncated here,
    Data sets in package ‘boot’:

    acme                    Monthly Excess Returns
    aids                    Delay in AIDS Reporting in England and Wales
    aircondit               Failures of Air-conditioning Equipment
    aircondit7              Failures of Air-conditioning Equipment
    amis                    Car Speeding and Warning Signs
    aml                     Remission Times for Acute Myelogenous Leukaemia
    beaver                  Beaver Body Temperature Data
    bigcity                 Population of U.S. Cities
    brambles                Spatial Location of Bramble Canes
    breslow                 Smoking Deaths Among Doctors
    calcium                 Calcium Uptake Data
    cane                    Sugar-cane Disease Data
    capability              Simulated Manufacturing Process Data
    catsM                   Weight Data for Domestic Cats
    cav                     Position of Muscle Caveolae
    cd4                     CD4 Counts for HIV-Positive Patients
    cd4.nested              Nested Bootstrap of cd4 data
    channing                Channing House Data
    city                    Population of U.S. Cities
    claridge                Genetic Links to Left-handedness
    cloth                   Number of Flaws in Cloth
    co.transfer             Carbon Monoxide Transfer
    coal                    Dates of Coal Mining Disasters
    darwin                  Darwin's Plant Height Differences
    dogs                    Cardiac Data for Domestic Dogs
    downs.bc                Incidence of Down's Syndrome in British
                            Columbia
    ducks                   Behavioral and Plumage Characteristics of
                            Hybrid Ducks
    fir                     Counts of Balsam-fir Seedlings
    frets                   Head Dimensions in Brothers
    grav                    Acceleration Due to Gravity
    gravity                 Acceleration Due to Gravity
    hirose                  Failure Time of PET Film
    islay                   Jura Quartzite Azimuths on Islay
    manaus                  Average Heights of the Rio Negro river at
                            Manaus
    melanoma                Survival from Malignant Melanoma
    motor                   Data from a Simulated Motorcycle Accident
    neuro                   Neurophysiological Point Process Data
    nitrofen                Toxicity of Nitrofen in Aquatic Systems
    nodal                   Nodal Involvement in Prostate Cancer
    nuclear                 Nuclear Power Station Construction Data
    paulsen                 Neurotransmission in Guinea Pig Brains
    poisons                 Animal Survival Times
    polar                   Pole Positions of New Caledonian Laterites
    remission               Cancer Remission and Cell Activity
    salinity                Water Salinity and River Discharge
    survival                Survival of Rats after Radiation Doses
    tau                     Tau Particle Decay Modes
    tuna                    Tuna Sighting Data
    urine                   Urine Analysis Data
    wool                    Australian Relative Wool Prices

    Data sets in package ‘cluster’:

    agriculture             European Union Agricultural Workforces
    animals                 Attributes of Animals
    chorSub                 Subset of C-horizon of Kola Data
    flower                  Flower Characteristics
    plantTraits             Plant Species Traits Data
    pluton                  Isotopic Composition Plutonium Batches
    ruspini                 Ruspini Data
    votes.repub             Votes for Republican Candidate in Presidential
                            Elections
    xclara                  Bivariate Data Set with 3 Clusters

    Data sets in package ‘datasets’:

    AirPassengers           Monthly Airline Passenger Numbers 1949-1960
    BJsales                 Sales Data with Leading Indicator
    BJsales.lead (BJsales)
                            Sales Data with Leading Indicator
    BOD                     Biochemical Oxygen Demand
    CO2                     Carbon Dioxide Uptake in Grass Plants
    ChickWeight             Weight versus age of chicks on different diets
    DNase                   Elisa assay of DNase
    EuStockMarkets          Daily Closing Prices of Major European Stock
                            Indices, 1991-1998
    Formaldehyde            Determination of Formaldehyde
    HairEyeColor            Hair and Eye Color of Statistics Students
    Harman23.cor            Harman Example 2.3
    Harman74.cor            Harman Example 7.4
    Indometh                Pharmacokinetics of Indomethacin
    InsectSprays            Effectiveness of Insect Sprays
    JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
    LakeHuron               Level of Lake Huron 1875-1972
    LifeCycleSavings        Intercountry Life-Cycle Savings Data
    Loblolly                Growth of Loblolly pine trees
    Nile                    Flow of the River Nile
    Orange                  Growth of Orange Trees
    OrchardSprays           Potency of Orchard Sprays
    PlantGrowth             Results from an Experiment on Plant Growth
    Puromycin               Reaction Velocity of an Enzymatic Reaction
    Seatbelts               Road Casualties in Great Britain 1969-84
    Theoph                  Pharmacokinetics of Theophylline
    Titanic                 Survival of passengers on the Titanic
    ToothGrowth             The Effect of Vitamin C on Tooth Growth in
                            Guinea Pigs
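
Because the combined listing across all installed packages can run to hundreds of lines, it may be easier to summarise it than to read it. The sketch below counts the datasets per package. It assumes the same results matrix returned by data(), and the names all_pkgs, all_sets and counts are illustrative only. The call is wrapped in suppressWarnings() because R may warn that datasets have been moved out of some base packages.

```r
# Names of every installed package, loaded or not
all_pkgs <- .packages(all.available = TRUE)

# One row per data set across all of those packages.
# Warnings (for example about data sets moved to 'datasets') are silenced.
all_sets <- suppressWarnings(data(package = all_pkgs)$results)

# Count the data sets contributed by each package, largest first
counts <- sort(table(all_sets[, "Package"]), decreasing = TRUE)
head(counts)
```

The result depends entirely on which packages are installed on the machine, so the counts will differ from one R installation to another.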
A specific dataset can be loaded with the data() function by passing the name of the dataset and, optionally, the package in which it is found, as follows,
    > data("cars", package = "datasets")
Once the cars dataset from the package ‘datasets’ is loaded, its first rows can be displayed with head(),
    > head(cars)
      speed dist
    1     4    2
    2     4   10
    3     7    4
    4     7   22
    5     8   16
    6     9   10
As we can see, the cars data frame contains two variables: the speed of the cars (speed) and their stopping distance (dist).
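
Beyond head(), a few more base R calls help to get a feel for a dataset. The following is a small sketch under the assumption that the datasets package is installed, as it is in a standard R installation. The names speed_dist and sandbox are made up for this example.

```r
# Data sets in 'datasets' are lazy-loaded, so they can also be
# referenced directly with the package prefix, without calling data()
speed_dist <- datasets::cars

str(speed_dist)      # structure: 50 observations of 2 numeric variables
summary(speed_dist)  # per-column summary statistics

# Load a copy into a separate environment instead of the global workspace,
# which avoids overwriting an existing object that is also called 'cars'
sandbox <- new.env()
data("cars", package = "datasets", envir = sandbox)
ls(sandbox)
```

Loading into a dedicated environment is optional, but it keeps the workspace tidy when many datasets are being explored at once.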
 

Summary

 
In this article, I demonstrated how to access a single dataset as well as how to list the datasets belonging to different packages in R. I discussed the syntax for listing the datasets of a single package, listing the datasets of all installed packages, and loading a specific dataset. Code snippets and their output are also provided.