What should I learn to be a good data analyst?

Tags: General, Statistics, Machine Learning

Statistics? Machine learning? Probability theory? MS Excel? Programming? Visualization? So many things, so little time! Here is my view on the tool-set a good data hacker requires, in order of importance.

The Language

One can understand much of what is going on just by knowing the terms used by the statistical community (ANOVA, regression, KNN, permutation, and so on) while having some intuition for the different methodologies and the software that performs them. No heavy mathematics or programming is required. Sadly, this is no simple task, since there are so many different procedures, many of which have more than one name. To make things even worse, many communities have developed similar ideas under different names.


The pen-and-paper days of data analysis are over. If you cannot operate some statistical software (MS Excel, JMP, SPSS, SAS, …) and find your way around data formats (CSV, tab-delimited, JSON, …), there is no place for you in the business.
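To make "finding your way in data formats" concrete, here is a minimal sketch in Python that reads the same two records from CSV, tab-delimited and JSON text using only the standard library. The sample data, the column names (name, score) and the helper functions read_delimited and read_json are made up for illustration; real files will of course need more care with types, encodings and missing values.

```python
import csv
import io
import json

# Hypothetical sample data: the same two records in three formats.
CSV_TEXT = "name,score\nalice,3.5\nbob,4.0\n"
TSV_TEXT = "name\tscore\nalice\t3.5\nbob\t4.0\n"
JSON_TEXT = '[{"name": "alice", "score": 3.5}, {"name": "bob", "score": 4.0}]'


def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts, converting score to float."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Delimited files carry no type information, so every value arrives
    # as a string and must be converted explicitly.
    return [{"name": row["name"], "score": float(row["score"])} for row in reader]


def read_json(text):
    """Parse a JSON array of objects; JSON numbers are already typed."""
    return json.loads(text)


csv_rows = read_delimited(CSV_TEXT, ",")
tsv_rows = read_delimited(TSV_TEXT, "\t")
json_rows = read_json(JSON_TEXT)

# All three formats should yield the same in-memory records.
print(csv_rows == tsv_rows == json_rows)
```

The point of the sketch is that CSV and tab-delimited text differ only in the delimiter, while JSON carries its own structure and number types, which is why only the delimited reader needs an explicit conversion.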

Scientific Programming

Once things get serious, you might have ideas for an analysis or a visualization that is not implemented in your favorite analysis suite. It is then that mastery of a programming language becomes very handy, especially a language used for scientific programming, such as R, MATLAB, Python, Ruby and the like.

Statistical Mathematics

It is my (possibly controversial) opinion that one can get very far in data analysis without too much mathematics. On the other hand, to properly understand the tools you are using, knowledge of linear algebra, probability theory and differential calculus is a must.

Advanced Mathematics

After you have mastered the existing methodology and its underlying mathematics, you might feel that they do not quite capture the nature of the problem you are working on, and wish to develop some methodology of your own. You will not only need to know mathematics, you will need to understand it and have a good mathematical intuition: you will have to “feel” it. You might also want to augment your existing knowledge with some more probability theory, optimization, differential calculus, graph theory, measure theory and more.
