Data Parsing
Workshop_DataParsing.zip
Overview
In this workshop you will complete a script that reads a DRM forecast file and prints the July 2008 forecast for each project, as issued in each forecast month from January through July. You will need to open the forecast file, read its contents into a list of lines, and parse those lines into the specified data structures. The output portion of the script is already complete; it prints correctly only if the data has been parsed correctly.
The Input File
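The input file, fc2008.txt, is an actual DRM forecast file for the year 2008. Its first line is a title, and its second line holds the column headings: FC_Mth (the forecast month), Month (the month being forecast, written as MM-YYYY), and one column per project. Each remaining line holds the values that one forecast month gives for one month. The file's contents are shown below.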
FORECAST 2008 KA-F
FC_Mth Month FTPK GARR OAHE FTRA GAPT SUX
Jan 01-2008 250 210 10 20 90 30
Jan 02-2008 300 300 80 40 100 70
Jan 03-2008 300 500 280 150 100 150
Jan 04-2008 325 530 250 80 90 160
Jan 05-2008 900 1000 280 120 160 275
Jan 06-2008 1300 2000 380 140 160 270
Jan 07-2008 650 1400 180 55 120 215
Feb 02-2008 300 260 50 20 100 70
Feb 03-2008 300 500 280 150 100 150
Feb 04-2008 325 530 250 80 90 160
Feb 05-2008 900 1000 280 120 160 275
Feb 06-2008 1300 2100 380 140 160 270
Feb 07-2008 680 1400 160 55 120 215
Mar 03-2008 300 500 250 160 200 220
Mar 04-2008 325 530 250 90 130 220
Mar 05-2008 900 1000 280 120 160 275
Mar 06-2008 1300 2100 380 140 160 270
Mar 07-2008 680 1400 160 55 120 215
Apr 04-2008 325 500 150 55 160 340
Apr 05-2008 900 1000 250 120 160 275
Apr 06-2008 1320 2120 330 140 160 270
Apr 07-2008 650 1420 140 55 120 215
May 05-2008 900 1000 200 90 185 320
May 06-2008 1320 2120 330 140 180 270
May 07-2008 650 1420 140 55 135 215
Jun 06-2008 1450 2300 400 150 200 330
Jun 07-2008 650 1420 140 55 135 215
Jul 07-2008 700 1815 300 100 160 280
The Output
Upon successful completion, the script should print the following output. Notice that the projects are in alphabetical order and that only the forecast values for July 2008 are included: each column is one forecast month, and each value is the July 2008 figure that forecast gave for the project.
July, 2008 DRM Forecast by Project and Forecast Month in KA-F

Project:     Jan     Feb     Mar     Apr     May     Jun     Jul

FTPK   :   650.0   680.0   680.0   650.0   650.0   650.0   700.0
FTRA   :    55.0    55.0    55.0    55.0    55.0    55.0   100.0
GAPT   :   120.0   120.0   120.0   120.0   135.0   135.0   160.0
GARR   :  1400.0  1400.0  1400.0  1420.0  1420.0  1420.0  1815.0
OAHE   :   180.0   160.0   160.0   140.0   140.0   140.0   300.0
SUX    :   215.0   215.0   215.0   215.0   215.0   215.0   280.0
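The alphabetical order comes from sorting the project names just before printing, while each value is still fetched by the project's original column position. The short sketch below illustrates this using only the Jul forecast-month values from the input file; the rows list is introduced here just so the result is easy to inspect, and the format strings are the ones used in the script's output section.

```python
# July 2008 values from the "Jul" forecast line, in the file's column order.
fcst_vals = {"Jul": [700.0, 1815.0, 300.0, 100.0, 160.0, 280.0]}
projects = ["FTPK", "GARR", "OAHE", "FTRA", "GAPT", "SUX"]
columns = {"FTPK": 0, "GARR": 1, "OAHE": 2, "FTRA": 3, "GAPT": 4, "SUX": 5}

projects.sort()  # alphabetical order affects display only
rows = []
for project in projects:
    # columns[project] still holds the project's original column position
    value = fcst_vals["Jul"][columns[project]]
    rows.append("%-7s:%8.1f" % (project, value))

print("\n".join(rows))
```

Sorting a copy of the names, rather than the value lists, is what keeps each printed value paired with the right project.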
Edit the Script in IDLE
 1 import sys
 2 fcst_filename = "fc2008.txt"
 3
 4 with open(fcst_filename, "r") as f:
 5     lines = None  # parse file into lines
 6
 7 projects = []
 8 columns = {}
 9 fcst_vals = {}
10 fcst_months = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul")
11 for i in range(1, len(lines)):
12     if i == 1:
13         projects = None  # parse projects from line
14         # populate the columns variable so that columns[project] = the position
15         # of that project in the projects list (e.g. columns['OAHE'] = 2)
16     else:
17         fields = None  # split current line into fields
18         fcst_month, month, values = None  # assign the fields to these variables
19         # make sure that the values variable is a list of valid floats
20         # make sure that the fcst_month variable is contained in fcst_months
21         # assign the values to fcst_vals[fcst_month] only if the month is July
22
23 print("\nJuly, 2008 DRM Forecast by Project and Forecast Month in KA-F\n")
24 sys.stdout.write("Project:")
25 for fcst_month in fcst_months: sys.stdout.write("%8s" % fcst_month)
26 sys.stdout.write("\n\n")
27 projects.sort()
28 for project in projects:
29     sys.stdout.write("%-7s:" % project)
30     for fcst_month in fcst_months:
31         fcst_val = fcst_vals[fcst_month][columns[project]]
32         sys.stdout.write("%8.1f" % fcst_val)
33     sys.stdout.write("\n")
Read the Input File
Line 4 of the skeleton script opens the input file in a "with" statement, so the file is closed automatically when the block ends. Edit line 5 (and possibly more lines) to read the file into the lines variable. The lines variable should be a list of strings, each containing exactly one line of the file.
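A minimal sketch of reading a file into a list of lines, assuming it is acceptable to drop the newline characters. To stay self-contained, it writes the first two lines of the forecast file to a temporary file and reads that instead of fc2008.txt; the temporary file and the path variable are illustrative only and are not part of the workshop.

```python
import os
import tempfile

# Write a small sample file (the first two lines of the forecast file).
sample = "FORECAST 2008 KA-F\nFC_Mth Month FTPK GARR OAHE FTRA GAPT SUX\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Read the whole file, then split it into one string per line.
# splitlines() drops the trailing newline characters.
with open(path, "r") as f:
    lines = f.read().splitlines()

os.remove(path)
print(lines)  # ['FORECAST 2008 KA-F', 'FC_Mth Month FTPK GARR OAHE FTRA GAPT SUX']
```

f.readlines() would also give one string per line, but each string keeps its trailing "\n"; either form works with the whitespace splitting used in the later steps.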
Parse the Projects from the Second Input Line
The loop on line 11 starts at index 1, which skips the title line, and line 12 tests for the second line of the input file (index 1). Modify the script to parse the project names into the projects variable. Its contents should be:
['FTPK', 'GARR', 'OAHE', 'FTRA', 'GAPT', 'SUX']
The output lists the projects alphabetically (line 27 sorts the projects list before printing), but the values on each data line stay in the original column order, so the script also needs each project's position in this list. Populate the columns variable so that columns[project] is that position. For example, columns['OAHE'] should equal 2.
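One way to do this, sketched on the actual header line (copied here as a string rather than read from the file): split on whitespace, drop the first two headings, and number the remaining names with enumerate. The dictionary comprehension is one idiomatic choice; a plain loop that fills columns works just as well.

```python
# The second line of the input file, copied here as a string.
header = "FC_Mth Month FTPK GARR OAHE FTRA GAPT SUX"

# The first two headings label the FC_Mth and Month columns, not projects.
projects = header.split()[2:]

# Map each project name to its position in the (unsorted) projects list.
columns = {project: index for index, project in enumerate(projects)}

print(projects)         # ['FTPK', 'GARR', 'OAHE', 'FTRA', 'GAPT', 'SUX']
print(columns["OAHE"])  # 2
```

Because columns is built from the unsorted list, sorting projects later does not change which value each project points to.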
Parse Each Additional Line
All lines after the second line are handled by the else branch in lines 17-21 of the skeleton script. For each of these input lines you should:
- Split the line on whitespace and assign the resulting list to the fields variable.
- Assign the first two values of fields to the fcst_month and month variables.
- Convert the remaining values of fields to floats and assign the resulting list to the values variable.
- Verify that the value of fcst_month is one of the entries in the fcst_months tuple.
- If (and only if) the month variable is July 2008 (the string "07-2008"), store the values list in the fcst_vals dictionary, using the value of fcst_month as the key.
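The steps above can be sketched on three lines copied from the input file. This sketch assumes Python 3 (for the starred assignment) and raises ValueError when a forecast month is not in fcst_months; the workshop does not say what should happen in that case, so skipping the line would be an equally valid choice.

```python
fcst_months = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul")

# Three data lines copied from the input file.
data_lines = [
    "Jan 06-2008 1300 2000 380 140 160 270",
    "Jan 07-2008 650 1400 180 55 120 215",
    "Feb 07-2008 680 1400 160 55 120 215",
]

fcst_vals = {}
for line in data_lines:
    fields = line.split()              # split on any run of whitespace
    fcst_month, month, *rest = fields  # first two fields, then the project values
    values = [float(v) for v in rest]  # float() raises ValueError on a bad field
    if fcst_month not in fcst_months:  # assumption: treat an unknown month as an error
        raise ValueError("unexpected forecast month: %r" % fcst_month)
    if month == "07-2008":             # keep only the July 2008 forecasts
        fcst_vals[fcst_month] = values

print(fcst_vals)
# {'Jan': [650.0, 1400.0, 180.0, 55.0, 120.0, 215.0], 'Feb': [680.0, 1400.0, 160.0, 55.0, 120.0, 215.0]}
```

The 06-2008 line is parsed and checked but not stored, because only the July 2008 forecasts feed the output.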
If you are successful, running the script will produce the output shown above.