In our recent contribution, just published in The American Statistician, we revisit the power analysis of the t-test.
Respondent-driven sampling (RDS) is an approach to sampling design and analysis that uses chain-referral to exploit the networks of social relationships connecting members of the target population. It is especially useful when sampling stigmatized groups, such as injection drug users, sex workers, and men who have sex with men. In our latest contribution, just published in Biometrics, Yakir Berchenko, Simon Frost, and I take a look at RDS and cast the sampling as a stochastic epidemic. This view allows us to analyze RDS using the likelihood framework, which was previously impossible. In particular, this allows us to debias population prevalence estimates, and estimate the population size! The likelihood framework also allows us to add Bayesian regularization and to debias risk estimates à la AIC or cross-validation, neither of which was possible without the sampling distribution.
Gave a guest lecture on dimensionality reduction at Amir Geva’s “Clustering and Unsupervised Computer Learning” graduate course. I tried to give a quick overview of the major dimensionality reduction algorithms. In particular, I like to present algorithms via the problem they aim to solve, and not via how they solve it.
In our recent contribution, just published in NeuroImage, we cast the popular Multi-Voxel Pattern Analysis (MVPA) framework in terms of hypothesis testing. We do so because MVPA is typically used for signal localization, i.e., the detection of “information encoding” regions.
Most machine learning algorithms are optimization problems. If they are not, they can often be cast as such. Optimization problems are notoriously hard to distribute. That is why machine learning from distributed BigData databases is so challenging.
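To make the "learning is optimization" point concrete, here is a minimal sketch, not taken from the post itself: fitting a straight line by running gradient descent on the mean squared error. The function name `fit_line`, the step size, the number of steps, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ a*x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; lr and steps are arbitrary choices
    that happen to converge on the toy data below.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit
        res = [a * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(res^2) with respect to a and b
        grad_a = 2.0 / n * sum(r * x for r, x in zip(res, xs))
        grad_b = 2.0 / n * sum(res)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data generated from y = 2x + 1, with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
```

Each step needs a sum over all the data. That sum is easy to split across machines, but doing it at every iteration means a round of communication per step, which is one way to see why distributing the optimization is costly.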
In their recent, high-impact PNAS publication, a Tel Aviv University research group led by Prof. Daphna Joel claims that no difference exists between male and female brains. This was a very high-profile study, as can be seen by the mentions in New Scientist, The Guardian, MedicalPress, IsraelScienceInfo, Daily Mail, The Jerusalem Post, CBC News, and many more.
Efrat is an MSc student in my group. She works on integrating advanced Multivariate Process Control capabilities in interactive dashboards. During her work she acquired an impressive expertise in interactive plotting with R and D3JS.
Now that I am a member of the Industrial Engineering Dept. at Ben Gurion University, I am naturally looking into statistical aspects of Industrial Engineering, and process control in particular. This being the case, I started teaching Quality Engineering. While preparing the course, I read the classical introductory literature and felt that it failed to convey the beauty of the field by focusing on too many little details. I thus went ahead and wrote my own book, which can be found online.
The term “Bayesian Statistics” is mentioned in any introductory statistics course and appears in countless papers and books, in many contexts and with many meanings. Since it carries different meanings for different authors, I will try to suggest several different interpretations I have encountered.
This week I attended the ICML 2015 conference in Lille, France. Here are some impressions…
If
your data does not fit on your hard disk, or
you want to do some ad-hoc distributed computations, or
you need 256 GB of RAM for fitting your model, or
you want your data to be accessible from anywhere in the world, or
you heard about “AWS” and want to know how it may help your statistical needs…
Then it is time to remind you of an old post of mine explaining how to set up an environment for data analysis with R in the AWS cloud.
One can easily be confused by the sea of methods and terms in machine learning. I find the endless terminology confusing and counterproductive. One might have a perfect understanding of a method “A”, yet be unaware that the new state-of-the-art algorithm, “B++”, is merely a small twist on the familiar “A”. I have spent hours trying to disambiguate terms, only to realize that a simple idea was occluded by terminology.
After thinking about it for a long time, I finally decided to create a blog of my own. Learning about Jekyll, with kramdown markdown for interpreting MathJax, made the difference, as I can now easily post with all the mathematical detail I want. Barry Clark’s guide really helped me get started. I can even edit RMarkdown in RStudio, outputting text and images, and automatically publish on GitHub. I have not tried this exciting pipeline yet, but it is documented by Andy South.