There is a lot of controversy around the definition of a Data Scientist.
Some think it means being a statistician, others think it means being a technologist, and still others have different requirements entirely.
I think the best definitions are more general and goal-based, and look something like these:
- Data Scientist
  1. Someone who specializes in collecting, massaging, and/or displaying data in order to tell a story that results in a positive outcome.
  2. Someone who can technically extract meaning from information in a way that enables decision makers to make better choices.
  3. Someone who can extract business value from data using mathematics and technology.
Importantly, this could be someone with three Ph.D.s in statistics, mathematics, and computer science, or a talented graphic designer with some decent Python skills.
The key is that they’re able to use data to illuminate how the world works and facilitate progress.
So you can break the definitions down into 49.6 different categories and sub-categories, or you can take this approach and focus on outcomes.
I think this approach is more resilient, especially given how quickly the field is changing.
- The definitions above assume both good faith and possession of requisite talent/skills. Manipulation and incompetence are not in scope.
- There’s a humorous alternative definition which says, “A data scientist is someone who’s better at statistics than any software engineer, and better at software engineering than any statistician.”