In a modern data-driven enterprise, there are three functions a data scientist can work on: data collection, data-driven product, or data-driven prediction. Each of their tasks falls into exactly one of these categories. I will use AlphaGo to explain why, because its clean structure makes it a useful thought vehicle.

AlphaGo implements a mathematical concept called a “Markov Decision Process” (MDP) to do its job and crush Go players around the world. The technical details are not relevant here; an MDP is essentially a formal framework for taking actions to achieve a goal. An MDP-based system like AlphaGo has three main components: State, Policy and Reward. These map onto the three parts of a data-driven enterprise:

State (data collection)

This is the part that describes the world with data: anything that could be relevant downstream must be described here. AlphaGo has a fairly easy job, as its entire world is a 19x19 grid with some black and white stones. A company, on the contrary, faces an incredibly difficult task: describing the messiness of the real world, with its users, environment, competitors and hard-to-describe human concepts (e.g., customer sentiment, which has to be captured through techniques such as sentiment analysis).

Data Science in this role performs a descriptive function. What are the available and recordable aspects of the world that can help downstream functions? What logs should be recorded, and how should they be transformed? If a KPI needs to be calculated, how should it be defined so that it is clear and understandable and its changes are actionable? What information should be recorded about a user in a user profile, and what shouldn’t? If external data needs to be acquired, how should it be harmonised with the company’s own data? The point is that this function is the gatekeeper: nothing downstream can depend on anything other than “State”.

Policy (data-driven product)

This is the big one for both AlphaGo and a company. The policy is the tool to change the world (as described by the State). It consists of a set of actions and a function (the “Policy”) mapping each State to the best action. AlphaGo’s job is easy on the actions: there are 19x19 intersections, and it can place a stone on an empty one (or pass), and that’s essentially it. Where that stone should go on each turn, however, is the million-GPU question, and DeepMind’s tremendous achievement was to answer it well enough to beat the best human players.
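To make the State/action/Policy split concrete, here is a minimal sketch of a Go-like board as the State, the empty intersections as the available actions, and a greedy Policy that picks the highest-scoring action. It is not how AlphaGo works: AlphaGo scores moves with trained neural networks and search. The `centre_preference` scoring function is a hypothetical stand-in, and Go rules such as suicide, ko and passing are deliberately left out.

```python
from typing import Callable, List, Optional, Tuple

BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2

Board = List[List[int]]  # the State: a 19x19 grid of EMPTY/BLACK/WHITE
Point = Tuple[int, int]  # an action: (row, column) where a stone is placed


def empty_board() -> Board:
    """Return a fresh 19x19 board with no stones on it."""
    return [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]


def candidate_actions(board: Board) -> List[Point]:
    """List every empty intersection.

    Simplification: real Go also forbids suicide and ko repetitions and
    allows passing; none of that is modelled here.
    """
    return [
        (r, c)
        for r in range(BOARD_SIZE)
        for c in range(BOARD_SIZE)
        if board[r][c] == EMPTY
    ]


def greedy_policy(board: Board, score: Callable[[Board, Point], float]) -> Optional[Point]:
    """Map a State to the candidate action with the highest score (None if the board is full)."""
    actions = candidate_actions(board)
    if not actions:
        return None
    # max() keeps the first action it sees among equally scored ones (row-major order).
    return max(actions, key=lambda point: score(board, point))


def centre_preference(board: Board, point: Point) -> float:
    """Hypothetical toy score: prefer points close to the centre of the board."""
    centre = BOARD_SIZE // 2
    r, c = point
    return -((r - centre) ** 2 + (c - centre) ** 2)


board = empty_board()
print(greedy_policy(board, centre_preference))  # (9, 9)
board[9][9] = BLACK
print(greedy_policy(board, centre_preference))  # (8, 9): first of the four tied neighbours
```

The Policy only ever sees the board, so anything not recorded in the State cannot influence the move. All of the difficulty lives in the `score` function, which is where AlphaGo's learned networks would go.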

Data Scientists working in this area focus on automation. They are part of a product team, and their tools are only used when the product requires them, usually when traditional software engineering cannot solve the problem. Whether they work on internal or external user-facing products, the job is to use the available data and select the best course of action automatically. They determine the State the company is in and the State it should be in after the action. This action can be anything from a recommender system selecting items, to a model providing yes/no answers, to a query selecting your friends’ friends in a social network. The point is that this function is automated: it must work without supervision, taking the world from one State to a desired next one.

Reward (data-driven prediction)

The final area is determining the value of being in one State or another. Just like “Policy”, this area works only with data from “State”, since you can only cook with what you recorded in your database. While the rules of Go provide a straightforward way of measuring reward (who wins once the final position is scored), AlphaGo also relies on a learned prediction of future reward from the current position, called the Value function.
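AlphaGo's Value function is a trained neural network. The sketch below uses a different, textbook technique instead: value iteration on a tiny, hand-written, deterministic MDP. It is only meant to show what "the value of being in a State" means, namely the best discounted sum of future Rewards reachable from that State. The two States, their actions, the rewards and the discount factor `gamma = 0.9` are all hypothetical.

```python
from typing import Dict, List, Tuple

State = str
Action = str


def value_iteration(
    states: List[State],
    actions: List[Action],
    transitions: Dict[Tuple[State, Action], State],
    rewards: Dict[Tuple[State, Action], float],
    gamma: float = 0.9,
    tol: float = 1e-9,
) -> Dict[State, float]:
    """Estimate V(s) = max_a [R(s, a) + gamma * V(next state)] for a deterministic MDP.

    Values are updated in place, sweep after sweep, until the largest change
    in one sweep falls below `tol`.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(rewards[(s, a)] + gamma * values[transitions[(s, a)]] for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical two-State example: engaging costs 1 when idle but earns 2 per step when active.
states = ["idle", "active"]
actions = ["wait", "engage"]
transitions = {
    ("idle", "wait"): "idle", ("idle", "engage"): "active",
    ("active", "wait"): "idle", ("active", "engage"): "active",
}
rewards = {
    ("idle", "wait"): 0.0, ("idle", "engage"): -1.0,
    ("active", "wait"): 0.0, ("active", "engage"): 2.0,
}
values = value_iteration(states, actions, transitions, rewards)
print({s: round(v, 3) for s, v in values.items()})  # {'idle': 17.0, 'active': 20.0}
```

The immediate Reward of engaging from "idle" is negative, yet its value is high because of what it leads to: 2 / (1 - 0.9) = 20 in "active", so -1 + 0.9 * 20 = 17 in "idle". This gap between immediate reward and long-run value is exactly what business forecasting in this area tries to estimate.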

Value functions are much closer to what Data Scientists work on in this area: predicting the business’s future health, or the reward for the company in imagined “States” (scenarios). This includes anything to do with forecasting KPIs, financial prediction, estimating the value of features yet to be implemented, and risk scenarios. The focus is on prediction and business decision support, not automation; the primary concern is the accuracy of the prediction.

A/B tests, for example, are not in this category: to run an A/B test, you need to be in production with both versions of the product, so you are already influencing the world with it. A/B tests are also inherently non-predictive and more of a measurement problem; therefore, their data concerns belong to the “State” area.

Differentiating between the “Policy” and “Reward” areas is crucial, as many of the techniques data scientists use are shared (e.g., statistical modelling) but the purpose is not. The distinction between the “State” and “Reward” areas is equally important: one is about being descriptive, the other about being predictive.

AlphaGo (and MDPs in general) are end-to-end systems, just like data-driven enterprises. Their three components map onto each other without gaps and provide a complete classification of the areas of Data Science.
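The end-to-end loop can be sketched in a few lines: read the State, let the Policy choose an action, let the world move to a new State, and collect the Reward. This is a minimal sketch assuming a deterministic toy MDP. The `ToyMDP` class, the "prospect"/"customer" States and their rewards are hypothetical illustrations, not part of AlphaGo or of any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = str
Action = str


@dataclass
class ToyMDP:
    """A deterministic toy MDP: each (State, action) pair has one next State and one Reward."""
    transitions: Dict[Tuple[State, Action], State]
    rewards: Dict[Tuple[State, Action], float]

    def step(self, state: State, action: Action) -> Tuple[State, float]:
        """Apply an action in a State; return the next State and the Reward earned."""
        key = (state, action)
        return self.transitions[key], self.rewards[key]


def rollout(mdp: ToyMDP, policy: Callable[[State], Action], start: State, steps: int) -> float:
    """Run the State -> Policy -> Reward loop for `steps` steps and sum the Rewards."""
    state, total = start, 0.0
    for _ in range(steps):
        action = policy(state)                   # Policy: choose an action from the current State
        state, reward = mdp.step(state, action)  # the world moves to a new State
        total += reward                          # Reward: score the outcome
    return total


# Hypothetical example: an offer costs 1 for a prospect but earns 2 per step from a customer.
mdp = ToyMDP(
    transitions={
        ("prospect", "ignore"): "prospect", ("prospect", "offer"): "customer",
        ("customer", "ignore"): "prospect", ("customer", "offer"): "customer",
    },
    rewards={
        ("prospect", "ignore"): 0.0, ("prospect", "offer"): -1.0,
        ("customer", "ignore"): 0.0, ("customer", "offer"): 2.0,
    },
)


def always_offer(state: State) -> Action:
    return "offer"


def never_offer(state: State) -> Action:
    return "ignore"


print(rollout(mdp, always_offer, "prospect", 3))  # -1 + 2 + 2 = 3.0
print(rollout(mdp, never_offer, "prospect", 3))   # 0.0
```

Each of the three areas owns one part of this loop. The dictionaries play the role of "State" (what is recorded), the policy functions play "Policy" (what is automated), and the summed rewards play "Reward" (what is evaluated).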


We saw that the three areas of a data-driven business map onto the mathematical framework underpinning decision-making, and we identified the key aspects of each of them:

  • The “State” area corresponds to “data collection”. It is concerned with describing the world with numbers and acts as the gatekeeper to the other two areas: if no data is recorded, no action or value can be calculated.
  • The “Policy” area corresponds to “data-driven product” and is concerned with automation. If a solution cannot be reliably automated, it is not feasible; and if traditional software engineering can automate it, it might not be data science at all.
  • The “Reward” area corresponds to “data-driven prediction” and is concerned with predictions and scenarios.

I hope this provides a clear mental framework for partitioning Data Science work and helps you in your future projects.