What should I learn to be a good data analyst?
Statistics? Machine learning? Probability theory? MS Excel? Programming? Visualization? So many things, so little time! Here is my view on the tool-set a good data hacker requires, in order of importance.
One can understand much of what is going on just by knowing the terms used by the statistical community (ANOVA, regression, KNN, permutation tests, etc.) while having some intuition for the different methodologies and the software that implements them. No heavy mathematics or programming is required. Sadly, this is no simple task, since there are so many different procedures, many of which go by more than one name. To make things even worse, several communities have developed similar ideas under different names.
The pen-and-paper days of data analysis are over. If you cannot operate some statistical software (MS Excel, JMP, SPSS, SAS, …) and find your way around data formats (CSV, tab-delimited, JSON, …), you have no place in the business.
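To make "finding your way around data formats" concrete, here is a minimal sketch in Python using only the standard library. The sample records, the column names (`name`, `height_cm`) and the helper `read_delimited` are made up for illustration; the point is only that the same table can arrive as CSV, tab-delimited text or JSON, and should end up in the same shape.

```python
import csv
import io
import json

# Hypothetical sample data, inlined so the example is self-contained.
CSV_TEXT = "name,height_cm\nalice,170\nbob,182\n"
TSV_TEXT = "name\theight_cm\nalice\t170\nbob\t182\n"
JSON_TEXT = '[{"name": "alice", "height_cm": 170}, {"name": "bob", "height_cm": 182}]'


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts, converting heights to int."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {"name": row["name"], "height_cm": int(row["height_cm"])}
        for row in reader
    ]


csv_rows = read_delimited(CSV_TEXT)
tsv_rows = read_delimited(TSV_TEXT, delimiter="\t")
json_rows = json.loads(JSON_TEXT)  # JSON already carries numeric types
```

Note that CSV and tab-delimited files store everything as text, so the numeric conversion has to be done explicitly, whereas JSON keeps numbers as numbers.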
Once things get serious, you might have ideas for an analysis or visualization that is not implemented in your favorite analysis suite. That is when mastery of a programming language becomes very handy, especially one of the scientific programming languages such as R, MATLAB, Python, Ruby and the like.
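As an illustration of the kind of analysis you may have to code yourself, here is a rough sketch of a two-sample permutation test for a difference in means, written in plain Python. The function name `permutation_test`, its parameters and the default number of permutations are my own choices for this example, not a reference implementation.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)
```

Calling `permutation_test(group_a, group_b)` with two lists of numbers returns a value in (0, 1]; small values suggest the observed difference would be unusual if the group labels were arbitrary.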
It is my (possibly controversial) opinion that one can get very far in data analysis without too much mathematics. On the other hand, to properly understand the tools you are using, knowledge of linear algebra, probability theory and differential calculus is a must.
After you have mastered the existing methodology and its underlying mathematics, you might feel that they do not quite capture the nature of the problem you are working on, and wish to develop some methodology of your own. You will need not only to know the mathematics but to understand it and have a good mathematical intuition: you will have to “feel” it. You might also want to augment your existing knowledge with some more probability theory, optimization, differential calculus, graph theory, measure theory and more.