Musings in Data Science


A journey of pondering all aspects of the field of data science

Modeling Time Series Data

The ARIMA model


An Outline for Building Machine Learning Models

A machine learning project follows a sequence of steps, and the process is often iterative. This post outlines the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM), which has six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.


Looking Beyond the Dataset When Building Models

It’s easy to think of a dataset as a defined unit. After all, it was created with specific criteria. But what do you do when you build a model from that data and a small subset of points just won’t behave? Perhaps they are outliers to be ignored. Or perhaps there’s something more. This is where data science gets interesting.


The Four Roles of a Data Scientist

In his blog, Simply Stats, Roger Peng outlines The Four Jobs of a Data Scientist: Scientist, Statistician, Systems Engineer, and Politician. How do these apply to me? Let’s take each one in turn.