Data Parsing

Workshop_DataParsing.zip

Overview

In this workshop you will complete a script that reads a DRM forecast file and outputs the forecast for July by project and forecast month. You will need to open the forecast file, read the data into lines, and parse the lines into the specified data structures. The output portion of the script is already complete and will print the expected results only if the data is parsed correctly.

The Input File

The input file is an actual DRM forecast file for the year 2008. Its contents are shown below.

FORECAST                2008    KA-F                           

FC_Mth  Month   FTPK    GARR    OAHE    FTRA    GAPT    SUX

Jan     01-2008 250     210     10      20      90      30
Jan     02-2008 300     300     80      40      100     70
Jan     03-2008 300     500     280     150     100     150
Jan     04-2008 325     530     250     80      90      160
Jan     05-2008 900     1000    280     120     160     275
Jan     06-2008 1300    2000    380     140     160     270
Jan     07-2008 650     1400    180     55      120     215
Feb     02-2008 300     260     50      20      100     70
Feb     03-2008 300     500     280     150     100     150
Feb     04-2008 325     530     250      80      90     160
Feb     05-2008 900     1000    280     120     160     275
Feb     06-2008 1300    2100    380     140     160     270
Feb     07-2008 680     1400    160     55      120     215
Mar     03-2008 300     500     250     160     200     220
Mar     04-2008 325     530     250     90      130     220
Mar     05-2008 900     1000    280     120     160     275
Mar     06-2008 1300    2100    380     140     160     270
Mar     07-2008 680     1400    160     55      120     215
Apr     04-2008 325     500     150     55      160     340
Apr     05-2008 900     1000    250     120     160     275
Apr     06-2008 1320    2120    330     140     160     270
Apr     07-2008 650     1420    140     55      120     215
May     05-2008 900     1000    200     90      185     320
May     06-2008 1320    2120    330     140     180     270
May     07-2008 650     1420    140     55      135     215
Jun     06-2008 1450    2300    400     150     200     330
Jun     07-2008 650     1420    140     55      135     215
Jul     07-2008 700     1815    300     100     160     280

The Output

Upon successful completion, the script should generate the following output. Notice that the projects are in alphabetical order and that only the forecast values for July, 2008 are included in the output.

July, 2008 DRM Forecast by Project and Forecast Month in KA-F

Project:     Jan     Feb     Mar     Apr     May     Jun     Jul

FTPK   :   650.0   680.0   680.0   650.0   650.0   650.0   700.0
FTRA   :    55.0    55.0    55.0    55.0    55.0    55.0   100.0
GAPT   :   120.0   120.0   120.0   120.0   135.0   135.0   160.0
GARR   :  1400.0  1400.0  1400.0  1420.0  1420.0  1420.0  1815.0
OAHE   :   180.0   160.0   160.0   140.0   140.0   140.0   300.0
SUX    :   215.0   215.0   215.0   215.0   215.0   215.0   280.0

Edit the Script in IDLE

1 import sys
2 fcst_filename = "fc2008.txt"
3 
4 with open(fcst_filename, "r") as f : 
5      lines = None # parse file into lines
6 
7 projects = [] 
8 columns = {} 
9 fcst_vals = {}
10 fcst_months = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul")
11 for i in range(1, len(lines)) :
12      if i == 1 :
13           projects = None # parse projects from line
14           # populate columns variable such that columns[project] = the position
15           # of that project in the projects list (e.g. columns['OAHE'] = 2)
16      else :
17           fields = None # split current line into fields
18           fcst_month, month, values = None # assign the fields to these variables
19           # make sure that the values variable is a list of valid floats
20           # make sure that fcst_month variable is contained in fcst_months variable
21           # assign the values to fcst_vals[fcst_month] only if the month is July
22 
23 print("\nJuly, 2008 DRM Forecast by Project and Forecast Month in KA-F\n") 
24 sys.stdout.write("Project:")
25 for fcst_month in fcst_months : sys.stdout.write("%8s" % fcst_month)
26 sys.stdout.write("\n\n")
27 projects.sort()
28 for project in projects :
29      sys.stdout.write("%-7s:" % project)
30      for fcst_month in fcst_months :
31           fcst_val = fcst_vals[fcst_month][columns[project]]
32           sys.stdout.write("%8.1f" % fcst_val)
33      sys.stdout.write("\n")
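
The output portion of the script relies on printf-style format specifiers to line up the columns. The short sketch below shows what the two specifiers used there produce; the sample values are taken from the expected output, and the variable name row is only for illustration.

```python
# %-7s left-justifies the project name in a 7-character field;
# %8.1f right-justifies a float in an 8-character field with one decimal place.
row = ("%-7s:" % "SUX") + ("%8.1f" % 215.0) + ("%8.1f" % 1815.0)
print(row)
```

Each value occupies exactly 8 characters, which is why the month headings written with "%8s" sit directly above the numbers.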

Read the Input File

Line 4 of the skeleton script uses the "with" statement to create a context manager for reading the input file. Edit line 5 (and possibly more lines) to read the file into the lines variable. The lines variable should be a list of strings with each string containing exactly one line of the file.
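
One possible approach is sketched below, assuming the file is small enough to read in a single call. So that the example runs on its own, it first writes a tiny stand-in file; the name sample_fc.txt and its shortened contents are hypothetical. Discarding blank lines is also an assumption: it keeps the project header at offset 1 even if the file contains empty lines.

```python
import os
import tempfile

# Hypothetical stand-in for fc2008.txt so the sketch is self-contained.
sample = "FORECAST 2008 KA-F\n\nFC_Mth Month FTPK GARR\n\nJan 07-2008 650 1400\n"
path = os.path.join(tempfile.mkdtemp(), "sample_fc.txt")
with open(path, "w") as f:
    f.write(sample)

with open(path, "r") as f:
    # splitlines() removes the newline at the end of each line;
    # the filter discards lines that are empty or whitespace only (an assumption).
    lines = [line for line in f.read().splitlines() if line.strip()]

print(lines)
```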

Parse the Projects from the Second Input Line

The loop that begins on line 11 starts at the second line (offset 1) of the input file, and line 12 tests for it. Modify the script to parse the project names into the projects variable. Its contents should be:

['FTPK', 'GARR', 'OAHE', 'FTRA', 'GAPT', 'SUX']

Since we don't output the projects in this order, we also need to identify each project's position in this list. Populate the columns variable such that columns[project] = its position in the list. For example, columns['OAHE'] should equal 2.
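
A minimal sketch of this step, using the header line from the input file as a literal string. The names header and position are illustrative and do not appear in the skeleton script.

```python
# Header line as it appears in the input file shown earlier.
header = "FC_Mth  Month   FTPK    GARR    OAHE    FTRA    GAPT    SUX"

# split() with no argument splits on any run of whitespace;
# the first two fields are the FC_Mth and Month labels, not projects.
projects = header.split()[2:]

# Map each project name to its position in the projects list.
columns = {project: position for position, project in enumerate(projects)}

print(projects)
print(columns["OAHE"])
```

Because columns is filled in before the output section calls projects.sort(), it still records each project's original column position after the list has been sorted.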

Parse Each Additional Line

All lines after the second line are processed in lines 17-21 of the skeleton script. For each of these input lines you should:

Split the line on whitespace and assign the sequence to the fields variable.
Assign the first two values of the fields sequence to the fcst_month and month variables.
Convert the remaining values of the fields sequence to floats and assign them to a sequence variable named values.
Verify that the fcst_month variable contains a value in the fcst_months tuple.
If (and only if) the value of the month variable represents July (07-2008 in this file), assign the values variable to the fcst_vals dictionary using the value of the fcst_month variable as the key.
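
The steps above can be sketched as follows for three data lines copied from the input file. This is one possible approach rather than the only solution: the data_lines list is a hypothetical stand-in for the lines read from the file, and recognizing July by the "07-" prefix of the month field is an assumption about the MM-YYYY format.

```python
fcst_months = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul")
fcst_vals = {}

# Three data lines copied from the input file shown earlier.
data_lines = [
    "Jan     06-2008 1300    2000    380     140     160     270",
    "Jan     07-2008 650     1400    180     55      120     215",
    "Feb     07-2008 680     1400    160     55      120     215",
]

for line in data_lines:
    fields = line.split()
    fcst_month, month = fields[0], fields[1]
    # float() raises ValueError if a field is not a valid number.
    values = [float(field) for field in fields[2:]]
    if fcst_month not in fcst_months:
        raise ValueError("unexpected forecast month: %s" % fcst_month)
    # Assumption: July is identified by the "07-" prefix of the month field.
    if month.startswith("07-"):
        fcst_vals[fcst_month] = values

print(fcst_vals)
```

Only the two July lines end up in fcst_vals; the June line is parsed and validated but not stored.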

If you are successful, running the script will produce the output shown above.