How to turn raw data like this…
Into visualizations like this…
When diving into a new data analysis project, ask yourself what, how, and who questions. What questions am I trying to answer with my data? How will I communicate the data? Who will utilize my data and see my end products? Do you plan to submit data to EPA’s Water Quality Exchange Portal, or create an interactive map in Water Reporter? The data decisions you make today should be informed by your questions about tomorrow. A few extra minutes of process planning can save you hours or even days later on.
To get you thinking about process, here are a few strategies that help me spend more time creating and less time pulling my hair out.
1) Documentation is King. Whether you are downloading files, cleaning data, or making charts, write it down! I use either a spreadsheet, Google document, or readme.txt file depending on the complexity, collaboration needs, and scope of the project. For every hour of work, I spend 15 minutes documenting. Good documentation helps you retrace steps, remember where you put that pesky file, and easily share work with collaborators. Whatever system you choose, just use it. Hearty investments in documentation early on will inevitably pay dividends later in your project.
2) Prep Your Space. Data analysis is just like cooking: preparation is key. Set up a file directory, close your extraneous tabs and programs, and set achievable goals and deadlines. If you rush through prepping ingredients and your space is messy, you’ll make more work for yourself and your end product will be of lower quality.
3) Name and Save. Mistakes and revisions are an inevitable part of data analysis. Using a consistent file and variable naming convention will help you keep track of your work and log your process as you go. I like to use the format ‘Name_Detail_v1’, e.g., Documentation_Draft_v1. After any significant alterations, I’ll save a new file titled Documentation_Draft_v2. The same convention extends to variables. Store files in a logical structure so you and others can navigate your project easily.
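If you work in Python, strategies 2 and 3 could be sketched roughly like this. The folder names and both function names are hypothetical, chosen only for illustration; the one thing taken from the text above is the ‘Name_Detail_v1’ pattern. Treat it as a starting point under those assumptions, not a finished tool.

```python
import re
from pathlib import Path

# Hypothetical folder layout; rename these to suit your own project.
SUBFOLDERS = ("data_raw", "data_clean", "charts", "docs")

# Matches names that end in _v followed by digits, e.g. Documentation_Draft_v1
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def scaffold_project(root):
    """Create the project folder, its subfolders, and a starter readme.txt."""
    root = Path(root)
    for name in SUBFOLDERS:
        # exist_ok=True makes it safe to run this more than once
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "readme.txt"
    if not readme.exists():
        readme.write_text("Project notes\n", encoding="utf-8")
    return root


def next_version_name(filename):
    """Return the filename with its _vN suffix bumped by one.

    A name without a version suffix gets _v1 appended.
    """
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match is None:
        return f"{path.stem}_v1{path.suffix}"
    bumped = int(match.group("num")) + 1
    return f"{match.group('stem')}_v{bumped}{path.suffix}"
```

Calling `next_version_name("Documentation_Draft_v1.docx")` gives `Documentation_Draft_v2.docx`, so each significant revision gets its own file instead of overwriting the last one. `scaffold_project` only creates folders and a readme that are missing, so it will not touch files you already have.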
Implementing a consistent and methodical data analysis process will help you leverage your data in Water Reporter, FieldDoc, and beyond. With these skills you can uncover interesting findings like relationships between water quality parameters, or the effect of a new policy on your local watershed health.
The Commons recently published a story about sanitary sewer overflows (SSOs) in Baltimore and their effect on local waterways. I’ll use this project as an example of my process.