Compute the following summary statistics for each of the four variables in the dataset:

  • Count
  • Minimum
  • Maximum
  • Range
  • Mean
  • Median
  • Mode
  • Variance
  • Skewness
  • Kurtosis
  • Coefficient of variation
  • Trimean
  • Yule coefficient
  • First and third quartiles
  • Interquartile range
  • Percentiles: 1%, 2.5%, 5%, 10%, 20%, 80%, 95%, 97.5%, 99%

The following built-in functions from R's base packages can be used to compute several of these summary statistics:

  • nrow
  • min
  • max
  • mean
  • median
  • var
  • quantile
  • IQR

In addition, the following functions from the moments package can be used to compute the skewness and kurtosis of the data:

  • moments::skewness
  • moments::kurtosis

Double Colon Operator ::

The double colon operator, ::, accesses a specific function from a package without attaching the whole package, using the form package::function. For example, moments::skewness calls the skewness function from the moments package. The operator is not needed for functions in packages that R attaches at startup, such as base and stats.
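As a small illustration of the operator, the sketch below calls the same function with and without the package prefix. The vector x is made-up example data, and the moments call is guarded on the assumption that the package may not be installed.

```r
# Made-up example data.
x <- c(2, 4, 4, 7, 10)

m1 <- median(x)          # found on the search path (stats is attached at startup)
m2 <- stats::median(x)   # same function, explicitly qualified with its package

identical(m1, m2)

# The same pattern reaches a package that is installed but not attached.
# Guarded so the sketch still runs if moments is not installed.
if (requireNamespace("moments", quietly = TRUE)) {
  sk <- moments::skewness(x)
}
```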

Use the code below to compute the summary statistics.

Compute Summary Statistics

# Summary Statistics
# Assumes eda_data has four numeric columns (Var1 to Var4), that dplyr is
# loaded for pull(), and that getMode, midhinge, and yule are helper
# functions already defined in the script.
n <- nrow(eda_data)                                  # count of observations
min <- apply(eda_data, 2, min)                       # column-wise minimum
max <- apply(eda_data, 2, max)                       # column-wise maximum
range <- max - min
mean <- apply(eda_data, 2, mean)
median <- apply(eda_data, 2, median)
mode <- apply(eda_data, 2, getMode)
variance <- apply(eda_data, 2, var)
skewness <- apply(eda_data, 2, moments::skewness)
kurtosis <- apply(eda_data, 2, moments::kurtosis)
cv <- sqrt(variance) / mean                          # coefficient of variation
mh <- apply(eda_data, 2, midhinge)
trimean <- 0.5 * (mh + median)                       # average of midhinge and median
yule_coeff <- apply(eda_data, 2, yule)
iqr <- apply(eda_data, 2, IQR)

# Probabilities for the requested percentiles
pr <- c(0.010, 0.025, 0.050, 0.100, 0.200, 0.800, 0.950, 0.975, 0.990)

var1_pctile <- quantile(pull(eda_data, Var1), probs = pr)
var2_pctile <- quantile(pull(eda_data, Var2), probs = pr)
var3_pctile <- quantile(pull(eda_data, Var3), probs = pr)
var4_pctile <- quantile(pull(eda_data, Var4), probs = pr)
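The code above calls three helpers, getMode, midhinge, and yule, that are not part of base R and are not defined in this section. The sketch below shows one plausible way to write them, assuming the common textbook definitions (mode as the most frequent value, midhinge as the average of the first and third quartiles, and the Yule coefficient as the quartile-based skewness measure). The versions in the original script may differ.

```r
# Hypothetical helper definitions; the original script's versions may differ.

# Most frequent value; ties are broken by first appearance.
getMode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Midhinge: average of the first and third quartiles.
midhinge <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  mean(q)
}

# Yule coefficient: (Q1 + Q3 - 2 * median) / (Q3 - Q1).
yule <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)
  (q[1] + q[3] - 2 * q[2]) / (q[3] - q[1])
}
```

With these definitions, 0.5 * (midhinge + median) equals (Q1 + 2 * median + Q3) / 4, which is the usual form of the trimean.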

Explanation of the Summary Statistics Computations

The apply(X, MARGIN, FUN) function takes a matrix or data frame, X, and applies the function FUN along the specified margin: MARGIN = 1 applies FUN to each row and MARGIN = 2 applies it to each column. A data frame is first converted to a matrix, so its columns should all be of the same type (here, numeric). FUN can be a built-in or user-defined function.
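To make the role of MARGIN concrete, here is a minimal sketch on a small made-up data frame. The names toy, A, B, and spread are illustrative only and do not come from the dataset.

```r
# Small made-up data frame; column names are illustrative only.
toy <- data.frame(A = c(1, 2, 3), B = c(10, 20, 60))

col_means <- apply(toy, 2, mean)   # MARGIN = 2: one result per column
row_sums  <- apply(toy, 1, sum)    # MARGIN = 1: one result per row

# A user-defined function is applied in the same way.
spread <- function(v) max(v) - min(v)
col_spread <- apply(toy, 2, spread)

# Arguments placed after FUN are passed on to it.
col_q90 <- apply(toy, 2, quantile, probs = 0.9)
```

Here col_means holds one mean per column (2 and 30), while row_sums holds one sum per row (11, 22, and 63).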

The pull function, from the dplyr package, extracts a single column from a data frame as a vector.
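For comparison, the sketch below extracts a column with base R as well as with pull, and then computes the first and third quartiles for every column in one call. These quartiles appear in the list of required statistics but are not computed explicitly in the main code. The data frame toy is a made-up stand-in for eda_data, and the dplyr call is guarded on the assumption that the package may not be installed.

```r
# Made-up stand-in for eda_data; the real dataset is not reproduced here.
set.seed(1)
toy <- data.frame(Var1 = rnorm(50), Var2 = runif(50))

# Base R ways to extract one column as a vector.
v1_a <- toy$Var1
v1_b <- toy[["Var1"]]

# dplyr::pull returns the same column; guarded in case dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  v1_c <- dplyr::pull(toy, Var1)
}

# First and third quartiles for every column at once:
# rows are the quartiles, columns are the variables.
quartiles <- sapply(toy, quantile, probs = c(0.25, 0.75))
```

The same sapply pattern with probs = pr would give all nine percentiles for the four variables in a single matrix, instead of one quantile call per variable.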

R Libraries

R libraries (packages) are open source, so you can view and edit the underlying code. To view the source code for a function in RStudio, place your cursor on the function name and press F2.

After running the script you have created up to this point, the summary statistics appear in the Environment tab of the RStudio interface, where each one can be inspected. To print the values of a statistic for each variable in the Console, type its name (e.g. skewness) and press Enter.
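If you prefer to see several statistics side by side rather than one at a time, one option is to bind the per-variable vectors into a single table. This is a minimal sketch on made-up data. The names toy, stat_mean, stat_median, stat_var, and summary_table are illustrative and are not part of the task script.

```r
# Made-up data; the real script would use eda_data instead.
toy <- data.frame(Var1 = c(1, 2, 3, 4), Var2 = c(10, 20, 30, 80))

stat_mean   <- apply(toy, 2, mean)
stat_median <- apply(toy, 2, median)
stat_var    <- apply(toy, 2, var)

# One row per statistic, one column per variable.
summary_table <- rbind(Mean = stat_mean,
                       Median = stat_median,
                       Variance = stat_var)

print(round(summary_table, 3))
```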

Continue to Task 4. Data Visualization