A machine learning project follows a sequence of steps, and the process is often iterative. This post outlines the Cross-Industry Standard Process for Data Mining (CRISP-DM), which has six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
It’s easy to think of a dataset as a defined unit. After all, it was created with specific criteria. But what do you do when you build a model from that data and a small subset of points just won’t behave? Perhaps they are outliers to be ignored. Or perhaps there’s something more. This is where data science gets interesting.
In his blog, Simply Stats, Roger Peng outlines The Four Jobs of a Data Scientist: Scientist, Statistician, Systems Engineer, and Politician. How do these apply to me? Let’s take each one in turn.