In his blog, Simply Stats, Roger Peng outlines The Four Jobs of a Data Scientist: Scientist, Statistician, Systems Engineer, and Politician. How do these apply to me? Let’s take each one in turn.
The Scientist
Peng’s definition of scientist in this context is broad. It is someone who understands the science, designs experiments to collect data, and designs a system for analyzing it.
My time spent as a molecular biology researcher has given me an excellent foundation for the exploration aspect of data science. All good research starts with the statement “I don’t know,” immediately followed by “What if?” Curiosity, combined with the ability to extrapolate from known patterns and the willingness to diverge from them into new territory, is the beginning of that exploration. But it must be followed by the rigor of logical analysis, the ability to admit a hypothesis may not pan out, and the diligence of tedious data collection, literature review, and meta-analysis to get to the answer.
This is the fun part of the job for me. The blank slate of a new project, filled with possibilities, and the opportunity to design the project as a whole are exciting. Doing the deep dive and discovering the answer, especially the one you did not expect, is what drives me. Painting the big picture, and figuring out how all the pieces fit together to make it, is the most satisfying part of the job.
The Statistician
Peng describes this person as the one who applies the data analytic system to the data and produces the data analytic output.
If the scientist is the project manager, the statistician is the software developer. This is where it is all implemented. The pieces that have been assembled are turned into a well-oiled machine here, with math driving the engine. Precision is key, and the rigor and diligence I developed as a researcher are assets here. For me, this is a means to an end, but I do like the satisfaction of seeing the process move forward in this stage.
The Systems Engineer
Peng points out that there are two outcomes: the output meets expectations, or it doesn’t. When it doesn’t, it is the job of the systems engineer to diagnose the potential root causes of the anomaly. To extend the software development analogy, this is the tester. Admitting there are problems is not the highlight of a project for me, but I am well-versed in the methodical breakdown of testing code. It requires an understanding of how the system pieces fit together as well as what each one does. It also requires the skill of knowing when to ask for an outsider’s viewpoint because your own is missing a key element of the solution.
The Politician
Peng is again using a broad definition here. This is the person who takes the inherent conflicts of a project and negotiates to bring them into agreement.
Here is where my experience as a business owner sets me apart. An output that cannot be leveraged in a solution, perhaps because of budget or time constraints, lack of buy-in by key stakeholders, or gaps in personnel expertise, is of no use. The data and analysis are critical, but so is the real-world application. Having been a decision maker, I constantly bring the negotiator to the table when I design and implement a project. The desired result must be clear through every phase of the project if it is not to go off on time-consuming (and expensive) tangents.
Each of these roles is important, and the varied nature of the job is part of what makes being a data scientist challenging. It is also what makes the career fulfilling for me.