Disclaimer: The United States Army Corps of Engineers has granted access to these data for instructional purposes only. Do not copy, forward, or release the information without United States Army Corps of Engineers approval.

If at any time you need a refresher on working with R commands or on how to use RStudio, feel free to look back at the R Introduction or RStudio Introduction pages.


Project Files

Download the initial project files here:

In this workshop, you will use the tools of exploratory data analysis (EDA) to investigate a multivariate dataset. You will have 75 minutes to develop summaries and visualizations of the data, identify issues that might make inference from the data more challenging, and make recommendations for further analysis. This workshop is open-ended and gives you an opportunity to try many different approaches to understanding the data that you have been provided. How you complete the following three basic tasks is up to you:

  1. Develop appropriate summaries for each of the variables in the dataset
  2. Create visualizations that help you understand the nature of each variable
  3. Provide recommendations for further analysis of the dataset

The dataset linked above contains 90 observations of 4 variables, listed in observation order (the second row was observed after the first row, and so on). The dataset is provided in an Excel workbook. You will use the R programming language and the RStudio integrated development environment (IDE) to analyze the data. There are no missing observations (so any zero values are real observations), and not all of the variables are strictly positive. Consider the behavior of each variable (column) by itself before investigating any possible relationship between the variables.
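
As a starting point, the properties described above (90 rows, 4 columns, no missing values, real zeros, some negative values) can be checked as soon as the data are loaded. The sketch below is only one possible way to do this. The file name `workshop_data.xlsx` and the column names `v1` to `v4` are hypothetical, and the data frame is a randomly generated stand-in with the same shape as the workshop dataset, not the real data. It assumes the `readxl` package is installed if you uncomment the loading lines.

```r
# To load the real workbook, something like the following should work
# (file name is hypothetical; replace it with the file you downloaded):
# library(readxl)
# dat <- as.data.frame(read_excel("workshop_data.xlsx"))

# Stand-in data with the same shape (90 observations of 4 variables).
# These values are simulated placeholders, NOT the workshop data.
set.seed(1)
dat <- data.frame(
  v1 = rnorm(90),          # can be negative
  v2 = rexp(90),           # strictly positive
  v3 = runif(90, -1, 1),   # can be negative
  v4 = rpois(90, 3)        # counts, may contain real zeros
)

dim(dat)                               # number of rows and columns
str(dat)                               # type of each column
colSums(is.na(dat))                    # missing values per column
sapply(dat, function(x) sum(x == 0))   # zeros per column (real observations)
sapply(dat, min)                       # which variables take negative values?

# Rows are in observation order, so the row index doubles as a time index.
obs_order <- seq_len(nrow(dat))
```

Keeping `obs_order` as a separate vector leaves the data frame at its original four columns, which makes later calls such as `summary(dat)` easier to read.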

Recall some of the questions you might ask of a dataset:

  • What do the data look like?
  • What is a typical value?
  • How much do data in a sample vary?
  • What is a good model for a set of data?
  • How different are two sets of data?
  • Is a dataset taken from a single population?
  • Were the samples taken independently?
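
Several of the questions above map directly onto base R functions. The following is a minimal sketch for a single variable, assuming a numeric vector `x` in observation order. Here `x` is a simulated placeholder rather than a column of the workshop dataset, and the choice of statistics and plots is one reasonable option among many.

```r
# Simulated placeholder series of 90 observations (NOT the workshop data).
set.seed(2)
x <- rnorm(90, mean = 10, sd = 2)

# What is a typical value?
typical <- c(mean = mean(x), median = median(x))

# How much do the data vary?
spread <- c(sd = sd(x), iqr = IQR(x), range = diff(range(x)))

# Were the samples taken independently?
# The lag-1 autocorrelation should be near zero for independent data.
r1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]
bound <- 2 / sqrt(length(x))   # rough 95% band under independence

# What do the data look like? Four views of the same variable.
op <- par(mfrow = c(2, 2))
plot(x, type = "b", main = "Run sequence",
     xlab = "Observation order", ylab = "x")          # drift or shifts?
plot(head(x, -1), tail(x, -1), main = "Lag plot",
     xlab = "x[i-1]", ylab = "x[i]")                  # structure = dependence
hist(x, main = "Histogram")                           # shape, modes, outliers
qqnorm(x); qqline(x)                                  # is a normal model plausible?
par(op)

typical
spread
c(lag1 = r1, bound = bound)
```

A lag-1 autocorrelation well outside `±bound`, or a visible trend in the run sequence plot, suggests the observations are not independent, which matters because the rows are in observation order. Two or more clusters in the histogram may indicate that the data do not come from a single population.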

The tools available to you for the workshop are:

You should spend 15 minutes on each of the following tasks:

Task 2. Getting Started in R

Task 3. Compute Summary Statistics

Task 4. Data Visualization

Task 5. Workshop Questions