As with all hyped technologies, terminology gets inflated as we approach the top of the hype cycle. I am sure many of us fondly remember the days of “Big Data”, “Cloud” or “Agile”. This is a good time to draw up a list of terms and define them, so that the list can be used to see clearly what’s what. The name Data Science has become such an umbrella term in recent years that the only way to define it is: “Data Science is what a Data Scientist does”. Since this is not very useful, it’s better to describe the subfields based on critical factors such as the activities the actors perform, the tools they use and the business need they serve.

It’s important to note that these are not value judgements about which one is better than the other. Each of these functions is fundamentally important in its own role and addresses concerns that the others cannot. There is some crossover between these functions, as businesses allocate needs to employees regardless of titles and positions.

  • Business Analysis:
    • Concerned with supporting business decisions quantitatively or qualitatively with a wide range of tools. Quantitative results are generally one-offs and not repeated in an automated way.

  • Financial Analysis:
    • Same as Business Analysis, just with more Excel and more numbers. Supports financial decisions. Periodically repeated with great effort, but not automated.

  • Business Intelligence:
    • Concerned with maintaining a coherent view of the business in a quantitative way. Lots of SQL and calculating KPIs. Provides self-service capabilities for the business in the form of dashboarding systems. Works closely with Data Warehousing (DWH). Dashboards and views are updated automatically, but no decisions are made.

  • Data Science Analysis:
    • Relying on tools from BI and DWH, performs business-specific tasks and prediction. Lots of one-off modelling and analysis of A/B tests. Solutions that are deemed worthy are handed over to BI and DWH for automation or automated by the Data Science Analysts (DSAs) themselves. Lots of SQL and some Python.

  • Statistical Modelling:
    • If Statistics provides a well-understood mathematical tool to answer the question at hand, it is employed to draw that conclusion (usually in the form of a hypothesis test or a prediction). Lots of Python or R with statistical packages.
    • This term causes a lot of confusion. All of the functions on this list can employ these tools for their job. However, their use is limited by the circumstances of the problem: all of the conditions of applicability must be met, and even then the term means applying a small set of already existing tools to the problem.
    • When the term “Explainable” is mentioned, it is usually in the context of statistical modelling, as the techniques that follow on this list don’t have the mathematical rigour or insight to describe their decisions. Ethical and privacy-related questions should also lead back here.

  • Machine Learning:
    • If the business needs to make repeated, quantitative decisions in its product that cannot be expressed with traditional algorithmic methods, the product team can deploy ML to solve the problem. The focus is on automation, as the solution needs to run in a productionised system. Various engineering concerns regarding the correctness of the system need to be taken care of. Often uses and automates the tools of Statistical Modelling.

  • Deep Learning:
    • If a Machine Learning solution is needed, but the tools of Statistical Modelling are inadequate, Deep Learning should be applied. Deep Learning frameworks are flexible computational tools that can adapt to a wide variety of problems but with less mathematical rigour than Statistical Modelling. This is also highly automated in production, and two questions are of great concern: whether the system is correct according to specification and intent, and how it performs in unknown circumstances (see above: ethical questions and Statistical Modelling).
    • Most of the successes in recent years happened in DL (Deep Learning), as increased computing power made it possible to deploy into production solutions that are more complex than SM (Statistical Modelling) and capable of breakthrough performance. Image recognition, machine translation, speech recognition and synthesis are all in this category. One can argue that these exhibit signs of intelligence, but they rely on fairly well-understood mathematical models.

  • Artificial Intelligence:
    • Marketing term used for all of the above. (/jk)
    • This term is attached to everything under the sun and therefore essentially meaningless.
    • To judge if something is worth the title, one should focus on the second word of the term and seek signs of cognitive intelligence, matching or surpassing human skills. On my personal shortlist, only two systems deserve this title: DeepMind’s AlphaGo and OpenAI Five. Neither of these was built or applied in a commercial sense…
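
To make the hypothesis test mentioned under Statistical Modelling (and the A/B tests under Data Science Analysis) a bit more concrete, here is a minimal sketch in Python. It assumes a two-proportion z-test on conversion counts, which is only one common choice for this kind of question; the function name and all the numbers are made up for illustration and do not come from any real experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in variants A and B (hypothetical).
    n_a, n_b: number of users exposed to variants A and B (hypothetical).
    Returns the z statistic and the two-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 10% vs 15% conversion on 1000 users each.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

This also illustrates the “conditions of applicability” point: the normal approximation used here is only reasonable with large samples and independent users, so the tool answers the question only when those assumptions hold.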

Thanks for reading this list. I hope it helps to clarify the terminology, and please subscribe for further posts on the topic.