Question

The R statement data(iris) loads the built-in R data set iris. This dataset has 150 rows and 5 columns. The 5 column names are
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width" and "Species"

Write R code to find the average values of Sepal.Length (the first column) for the three species "setosa", "versicolor", and "virginica" (the fifth column), respectively.

The returned results should be as follows:

    setosa versicolor  virginica
     5.006      5.936      6.588

Explanation / Answer

First load the data.

Use the head function to see the first few observations.

Using the $ operator, head can be applied to just the Species column.
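The three steps above might look like the following. This is a minimal sketch in base R, not necessarily the exact code the answer originally showed; the variable name first_species is made up for illustration.

```r
# Load the built-in iris data set into the workspace.
data(iris)

# Show the first six rows (head's default) of the whole data frame.
print(head(iris))

# Use $ to pick out the Species column and show its first six values.
first_species <- head(iris$Species)
print(first_species)
```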

To create a new data frame from the top half of iris, we index the iris data frame by row. Here 1:I(nrow(iris)/2) tells R to take the rows from the first row through the row that divides the data in half. Wrapping the division in I() makes R evaluate it as a unit before the sequence is built. The blank after the comma tells R we want all columns.
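A sketch of that subsetting step, assuming the object is called iris.top as in the text. The second assignment (iris.top2, a name introduced here only for comparison) uses ordinary parentheses, which should give the same grouping as I() in this context.

```r
data(iris)

# First half of the rows, all columns (note the blank after the comma).
iris.top <- iris[1:I(nrow(iris) / 2), ]

# Plain parentheses group the division in the same way.
iris.top2 <- iris[1:(nrow(iris) / 2), ]

# Compare the two results and count the rows that were kept.
print(identical(iris.top, iris.top2))
print(nrow(iris.top))
```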

Knowing the dimensions of your data frame makes manipulation much easier. dim is used to get the dimensions of iris.top. [1] selects the first element of dim's output, the number of rows in iris.top. [2] selects the second element, the number of columns in iris.top.
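The dim calls described above could be written as follows. The block rebuilds iris.top so that it runs on its own; n_rows and n_cols are illustrative names.

```r
data(iris)
iris.top <- iris[1:(nrow(iris) / 2), ]

# dim() returns a two-element vector: rows first, then columns.
print(dim(iris.top))

# Index the result to get one dimension at a time.
n_rows <- dim(iris.top)[1]
n_cols <- dim(iris.top)[2]
print(n_rows)
print(n_cols)
```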

The code above starts by using the data function to load iris into the global environment. If you are using RStudio, iris should appear in the Environment pane.

The head() function lets us examine the first few rows. iris has 150 rows and 5 columns, with Species as the class label. head() can also be applied to a single column to examine its first few values. Appending $Species to iris accesses the column labeled Species. Here iris$Species is a factor with three levels.

Another way to subset the data is to index specific rows and columns of the object. This takes the form object[N_{t}:N_{t+k}, M_{v}:M_{v+k}], where N indexes rows and M indexes columns. In iris.top, the index takes the first half of the rows, and since no columns are specified R takes all of them. The I() call returns its argument unchanged, so the division inside it is evaluated before the : operator is applied. Without some grouping, 1:nrow(iris)/2 would be read as (1:nrow(iris))/2, because : binds more tightly than /. If a simple multiply or divide ever gives an unexpected result inside a larger expression, wrap it in parentheses or I(). iris.top pulled out the top half of the data set, as the dim function confirms. The dim() function accepts a data frame and returns the number of rows and columns, in that order; indexing the result with [1] or [2], as in the example, gives the rows or the columns alone.

Why did iris.sub fail? The number of columns selected is greater than the number of columns available, so R stops with an error. To include more columns in this data set later, it is necessary to create them first and either merge them onto the existing data frame or create placeholder columns of 1s to multiply by later.
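The original iris.sub call is not shown, so the following is a hypothetical reconstruction: it asks for six columns when iris has only five, and catches the error so the script keeps running. The names sub_error, iris.wide, and Ones are assumptions made for this sketch.

```r
data(iris)

# Requesting columns 1 to 6 fails because iris has only 5 columns.
sub_error <- tryCatch(
  iris[1:75, 1:6],
  error = function(e) conditionMessage(e)
)
print(sub_error)

# One way to make room: add a placeholder column of 1s first.
iris.wide <- iris
iris.wide$Ones <- 1

# With six columns present, the same selection now succeeds.
iris.sub <- iris.wide[1:75, 1:6]
print(dim(iris.sub))
```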

Statistics and Aggregates

This section's code shows how to create some summary statistics that describe the iris dataset. The R function summary() reports the minimum, quartiles, median, mean, and maximum of each numeric column. This is similar to the MEANS procedure in SAS.
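As a sketch of these summaries, the block below calls summary() on the whole data frame and then uses tapply(), one of several base R options, to average Sepal.Length within each species as the question asks. The name species_means is illustrative.

```r
data(iris)

# Per-column summary: min, quartiles, median, mean, max for numeric
# columns, and counts per level for the Species factor.
print(summary(iris))

# Mean of Sepal.Length within each level of Species.
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(species_means)
```

The result is a named vector with one entry per species. If a data frame is preferred, aggregate(Sepal.Length ~ Species, data = iris, FUN = mean) computes the same means.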
