<p><em>johnros.github.io: Stats, R, and possibly beach volley.</em></p>

<h2>Ranting on MVPA</h2>
<p><em>2017-09-03 · <a href="http://johnros.github.io/mvpa-rant">permalink</a></em></p>

<p>The use of MVPA for signal detection/localization in neuroimaging has troubled me for a long time.
Somehow the community refuses to acknowledge that for the purpose of localization, multivariate tests (e.g. Hotelling’s <script type="math/tex">T^2</script>) are preferable.
Why are multivariate tests preferable to accuracy tests?</p>
<ol>
<li>They are more powerful.</li>
<li>They are easier to interpret.</li>
<li>They are easier to implement.</li>
<li>Because they are not cross-validated:
<ol>
<li>They are computationally faster.</li>
<li>They do not suffer biases in the cross validation scheme.</li>
</ol>
</li>
</ol>
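<p>To make the comparison concrete, here is a minimal sketch of Hotelling’s two-sample test in base R on simulated (assumed) data; the function name <code>hotelling.p</code> and all parameters are illustrative, not from any of the cited papers:</p>

```r
# Illustrative sketch: detect a multivariate signal with Hotelling's
# two-sample T^2 test, using an F reference distribution.
set.seed(1)
n <- 50; p <- 5
x0 <- matrix(rnorm(n * p), n, p)              # class 0: null distribution
x1 <- matrix(rnorm(n * p, mean = 0.5), n, p)  # class 1: shifted mean

hotelling.p <- function(a, b) {
  na <- nrow(a); nb <- nrow(b); p <- ncol(a)
  d <- colMeans(a) - colMeans(b)                                # mean difference
  S <- ((na - 1) * cov(a) + (nb - 1) * cov(b)) / (na + nb - 2)  # pooled covariance
  t2 <- (na * nb) / (na + nb) * drop(t(d) %*% solve(S, d))      # T^2 statistic
  f  <- t2 * (na + nb - p - 1) / (p * (na + nb - 2))            # F transform
  pf(f, p, na + nb - p - 1, lower.tail = FALSE)                 # p-value
}
hotelling.p(x0, x1)  # small p-value: the shift is detected
```

<p>The whole test is a closed-form computation: no folds, no classifier, no tuning.</p>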
<p>I read and referee papers where authors go to great lengths to interpret their “funky” results.
To them I say:
Your cross validation scheme is biased and your test statistic is leaving power on the table!
Please consult a statistician and replace your MVPA with a multivariate test.
For a more “scientific” explanation, read [1] and [2].</p>
<p>If you justify the use of the prediction accuracy because it is also an effect-size, then please acknowledge that <em>effect size</em> is a different problem than <em>localization</em> and read the multivariate effect size literature (e.g. [3]).</p>
<p>When would I really want to use the prediction accuracy as a test statistic?
When doing actual decoding and not localization, such as brain-computer interfaces.</p>
<hr />
<p>[1] Rosenblatt, Jonathan, Roee Gilron, and Roy Mukamel. “Better-Than-Chance Classification for Signal Detection.” arXiv preprint arXiv:1608.08873 (2016).</p>
<p>[2] Gilron, Roee, et al. “What’s in a pattern? Examining the type of signal multivariate analysis uncovers at the group level.” NeuroImage 146 (2017): 113-120.</p>
<p>[3] Olejnik, Stephen, and James Algina. “Measures of effect size for comparative studies: Applications, interpretations, and limitations.” Contemporary Educational Psychology 25.3 (2000): 241-286.</p>

<h2>A surprising result on the power of the t-test</h2>
<p><em>2017-08-30 · <a href="http://johnros.github.io/wilcoxon-power">permalink</a></em></p>

<p>In our recent contribution [1], just published in <a href="http://amstat.tandfonline.com/doi/full/10.1080/00031305.2017.1360795">The American Statistician</a>, we revisit the power analysis of the t-test.</p>
<p>The fundamental observation is that the t-test has been proposed, and studied, as a detector of <em>shift alternatives</em>.
By a shift alternative, a statistician means that if two populations differ, they differ by a location shift.
Put differently, it is assumed that the factor of interest has the effect of <em>shifting</em> the distribution.
For many phenomena, however, we would not expect an effect of shift-type.
Consider a clinical trial: if we expect a drug to affect only part of the population, we are no longer looking for a shift alternative, but rather, a <em>mixture alternative</em>.</p>
<p>We show that, for mixture alternatives, much of the folklore on the t-test no longer holds (nor should it).
We show that Wilcoxon’s signed-rank test may be more powerful than the t-test under a Gaussian null.
This is because Wilcoxon’s signed-rank test may capture the asymmetry in the mixture before the t-test captures the changed mean.</p>
<p>This has interesting implications.
A practitioner may be willing to give up some power and opt for Wilcoxon’s test, so as not to assume Gaussianity.
Our results show that this practitioner may not have lost any power, but may in fact have gained some.
Not because the null is non-Gaussian, but rather because the alternative is of mixture type.</p>
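<p>A small simulation illustrates the mechanism. This is an illustrative sketch with assumed parameters, not the paper’s exact setting: only a fraction of the population responds, so the alternative is a mixture, and the two tests can be compared head to head:</p>

```r
# Illustrative power simulation under a mixture alternative: a fraction
# `frac` of subjects respond with a shift of size `effect`, the rest do not.
set.seed(1)
power <- function(test, B = 1000, n = 40, frac = 0.1, effect = 3) {
  mean(replicate(B, {
    responder <- rbinom(n, 1, frac)     # which subjects respond
    x <- rnorm(n) + responder * effect  # mixture alternative
    test(x)$p.value < 0.05              # one-sample test of location zero
  }))
}
power(t.test)       # power of the one-sample t-test
power(wilcox.test)  # power of Wilcoxon's signed-rank test
```

<p>Which test wins depends on the mixing fraction and the effect size; the point is that under mixture alternatives the folklore ranking is no longer automatic.</p>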
<p>After so much has been said about the t-test, I am rather proud that we can still innovate on the matter.</p>
<p>[1] Rosenblatt, Jonathan D., and Yoav Benjamini. “On Mixture Alternatives and Wilcoxon’s Signed-Rank Test.” The American Statistician, August 1, 2017. doi:10.1080/00031305.2017.1360795.</p>

<h2>Sampling as an Epidemic Process</h2>
<p><em>2017-02-22 · <a href="http://johnros.github.io/rds">permalink</a></em></p>

<p><em>Respondent driven sampling</em> (RDS) is an approach to sampling design and analysis which utilizes the networks of social relationships that connect members of the target population, using chain-referral.
It is especially useful when sampling stigmatized groups, such as injection drug users, sex workers, and men who have sex with men.
In our latest contribution, just published in <a href="http://onlinelibrary.wiley.com/doi/10.1111/biom.12678/abstract">Biometrics</a>, <a href="https://scholar.google.co.il/citations?user=U3ykKLQAAAAJ&hl=en">Yakir Berchenko</a>, <a href="http://www.infectiousdisease.cam.ac.uk/directory/sdf22@cam.ac.uk">Simon Frost</a>, and I take a look at RDS and cast the sampling as a <strong>stochastic epidemic</strong>.
This view allows us to analyze RDS using the likelihood framework, which was previously impossible.
In particular, this allows us to debias population prevalence estimates, and estimate the population size!
The likelihood framework also allows us to add Bayesian regularization and to debias risk estimates à la AIC or cross-validation, none of which was previously possible without a sampling distribution.</p>
<p>I particularly like this project, because it is a real end-to-end statistical challenge with nice theory, computational considerations, and a deliverable R package:</p>
<ul>
<li>
<p>A widely applicable problem:
sampling hidden populations is both very important and a real challenge to classical sampling techniques.
RDS is also a potential tool to analyze “Facebook-samples”, which are becoming more prevalent.</p>
</li>
<li>
<p>The theory:
viewing the sampling as a stochastic epidemic, an idea due to Yakir, allows us to link the sampling literature to the vast corpus of knowledge on epidemics, software reliability, and counting processes.</p>
</li>
<li>
<p>A computational challenge:
The likelihood function implied by the stochastic epidemic is essentially a <a href="https://en.wikipedia.org/wiki/Stochastic_differential_equation">stochastic differential equation</a>.
The counting processes literature allowed us to state the likelihood directly, observe it is separable, and solve the maximum-likelihood problem efficiently.</p>
</li>
<li>
<p>An R package:
Our RDS estimator, with the numerical “tricks” above, is implemented in the <strong>chords</strong> package, available from <a href="https://CRAN.R-project.org/package=chords">CRAN</a>.</p>
</li>
</ul>

<h2>Intro to dimensionality reduction</h2>
<p><em>2017-01-02 · <a href="http://johnros.github.io/intro-to-dim-reduce">permalink</a></em></p>

<p>I gave a guest lecture on dimensionality reduction at <a href="http://www.ee.bgu.ac.il/~geva/">Amir Geva’s</a> “Clustering and Unsupervised Computer Learning” graduate course.
I tried to give a quick overview of major dimensionality reduction algorithms.
In particular, I like to present algorithms via the problem they are <strong>aimed</strong> to solve, and not via <strong>how</strong> they solve it.</p>
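<p>To illustrate the “problem first” view with one example: PCA can be introduced as the problem of finding the rank-<em>k</em> orthogonal projection that minimizes reconstruction error, for which <code>prcomp()</code> computes the solution. A sketch on simulated data:</p>

```r
# PCA framed by its target problem: among all rank-k orthogonal projections,
# minimize the reconstruction error ||X - X W W'||^2.
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5) %*% matrix(rnorm(25), 5, 5)  # correlated data
X <- scale(X, scale = FALSE)                                     # center columns
k <- 2
W <- prcomp(X)$rotation[, 1:k]   # top-k principal directions
X.hat <- X %*% W %*% t(W)        # rank-k reconstruction
mean((X - X.hat)^2)              # reconstruction error
```

<p>By the Eckart–Young theorem, no other rank-2 orthogonal projection achieves a smaller reconstruction error on this data.</p>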
<p><a href="https://github.com/johnros/dim_reduce/blob/master/dim_reduce.pdf">Class notes may be found here</a>.</p>

<h2>What is a pattern? MVPA cast as a hypothesis test</h2>
<p><em>2016-11-30 · <a href="http://johnros.github.io/what-is-a-pattern">permalink</a></em></p>

<p>In our recent contribution [1], just published in <a href="http://www.sciencedirect.com/science/article/pii/S1053811916306401">NeuroImage</a>, we cast the popular Multi-Voxel Pattern Analysis (MVPA) framework in terms of hypothesis testing.
We do so because MVPA is typically used for signal localization, i.e., the detection of “information encoding” regions.</p>
<p>Our major conclusion is that <strong>group MVPA tests a qualitatively different hypothesis than that tested in univariate analysis</strong>.
We show that, in regions detected with MVPA, subjects may actually have responded very differently.<br />
In particular, an “information encoding” region may be one where some subjects show an <strong>increase</strong> in blood oxygenation (BOLD), while others a <strong>decrease</strong>.</p>
<p>This is a surprising result: it means that the shift from analyzing one voxel at a time to several voxels at a time also entailed a redefinition of “what is an activation?”.
In particular, the MVPA definition of activation is much harder to interpret biologically.</p>
<p>Clearly, the choice of the null and alternative, i.e., the definition of signal, is case dependent and should be left to the neuroscientist’s best judgement.
It is our hope that our observation will facilitate such an informed choice.</p>
<p>En passant, we observe that recurring patterns between subjects imply that activation patterns are <strong>asymmetrically distributed</strong> about the null.
Following this observation, we borrow several measures of multivariate symmetry from the statistical literature.
These allow the researcher to quantify the degree of multivariate “agreement” between subjects, instead of committing a-priori to a particular notion of “agreement” to be tested.</p>
<p>[1] Gilron, Roee, Jonathan Rosenblatt, Oluwasanmi Koyejo, Russell A. Poldrack, and Roy Mukamel. “What’s in a Pattern? Examining the Type of Signal Multivariate Analysis Uncovers at the Group Level.” NeuroImage 146 (February 1, 2017): 113–20.</p>

<h2>Almost-embarrassingly-parallel algorithms for machine learning</h2>
<p><em>2016-06-12 · <a href="http://johnros.github.io/parallelized-learning">permalink</a></em></p>

<p>Most machine learning algorithms are optimization problems.
If they are not, they can often be cast as such.
Optimization problems are notoriously hard to distribute.
That is why machine learning from distributed BigData databases is so challenging.</p>
<p>If data is distributed along observations (and not variables), one simple algorithm is to learn your favorite model using the data on each machine, and then aggregate over machines.
If your favorite model is in a finite-dimensional parametric class, you can even aggregate by simple averaging over machines.</p>
<p>This averaging approach is known as <em>divide and conquer</em>, <em>one-shot averaging</em>, and <em>embarrassingly parallel learning</em>, among others.
It is attractive because of its low communication requirements and its simple implementation.
Indeed, it can be implemented over any distributed abstraction layer such as Spark, Hadoop, Condor, SGE, and more.
It can also be implemented on top of most popular distributed databases such as Amazon-Redshift and HP-Vertica.
It also covers a wide range of learning algorithms, such as Ordinary Least Squares, Generalized Linear Models, and Linear SVM.</p>
<p>In our latest contribution, just <a href="http://imaiai.oxfordjournals.org/content/early/2016/06/09/imaiai.iaw013.abstract?keytype=ref&ijkey=TbndI5rIDAxDEzz">published in Information and Inference, a Journal of the IMA</a>, we perform a statistical analysis of the error of such an algorithm and compare it with a non-distributed (centralized) solution.</p>
<p>Our findings can be summarized as follows:
When there are many more observations per machine than parameters to estimate, there is no (first-order) accuracy loss in distributing the data.
When the number of observations is not much greater than the number of parameters, there is indeed an accuracy loss. This loss is greater for non-linear models than for linear ones.</p>
<p>If it is unclear why accuracy is lost when averaging, think of linear regression.
The (squared) risk minimizer is <script type="math/tex">\beta^*=\Sigma^{-1} \alpha</script>, where <script type="math/tex">\Sigma= E[x x']</script> and <script type="math/tex">\alpha=E[x y]</script>.
The empirical risk minimizer, <script type="math/tex">\hat{\beta}=(X'X)^{-1} X'y</script>, is merely its empirical equivalent.
If rows of the <script type="math/tex">X</script> matrix are distributed over machines, which do not communicate, then instead of the full <script type="math/tex">(X'X)^{-1}</script> we can only compute machine-wise estimates.
It turns out that, even in this simple linear regression problem, aggregating the machine-wise estimates <script type="math/tex">\hat{\beta}</script>, e.g., by averaging, is less accurate than computing <script type="math/tex">\hat{\beta}</script> from the whole data.</p>
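<p>The scheme and the (second-order) accuracy gap are easy to demonstrate. A minimal sketch of split-and-average for OLS on simulated data, with “machines” mimicked by data shards; all names and parameters here are illustrative:</p>

```r
# Split-and-average (one-shot averaging) for OLS on simulated data:
# fit OLS on each machine's shard, then average the coefficient vectors.
set.seed(1)
n <- 10000; p <- 5; machines <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- rep(1, p)                          # true coefficients
y <- drop(X %*% beta + rnorm(n))

shard <- rep(1:machines, length.out = n)   # which machine holds each row
beta.split <- sapply(1:machines, function(m)
  lm.fit(X[shard == m, ], y[shard == m])$coefficients)  # machine-wise OLS
beta.avg  <- rowMeans(beta.split)                       # one-shot average
beta.full <- lm.fit(X, y)$coefficients                  # centralized OLS

max(abs(beta.avg - beta.full))  # tiny when n/machines >> p
```

<p>Here each shard holds many more observations than parameters, so the averaged and centralized estimates agree to first order, as the analysis predicts.</p>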
<p>The statistical analysis of the split-and-average algorithm has several implications:
It informs the practitioner which algorithms can be safely computed in parallel, and which need more attention.
Put differently: no learning algorithm is truly <strong>embarrassingly-parallel</strong>, but some are <strong>almost-embarrassingly-parallel</strong>.</p>
<p>Equipped with guarantees on the learning error, one can apply our results to compute the required number of machines that achieves a given error.
Since increasing the number of machines increases the error but reduces the run time, our results can also be seen as a <strong>learning accuracy-complexity curve</strong>.
Finally, the error decomposition for split-and-average algorithms also implies a Gaussian limit. Our results can thus be used also for inference and model selection.</p>
<p>To prove our results we mostly used the classical asymptotic statistics of <a href="https://en.wikipedia.org/wiki/Lucien_Le_Cam">Lucien Le Cam</a> and <a href="https://en.wikipedia.org/wiki/Peter_J._Huber">Peter Huber</a>.
We take particular pride in using classical statistical theory to analyze cutting-edge learning algorithms for BigData.</p>

<h2>Multivariate difference between male and female brains</h2>
<p><em>2016-02-18 · <a href="http://johnros.github.io/genders-and-brains">permalink</a></em></p>

<p>In their recent, high-impact, <a href="http://www.pnas.org/content/112/50/15468.abstract">PNAS publication</a>, a Tel Aviv University research group led by <a href="http://people.socsci.tau.ac.il/mu/daphnajoel/">Prof. Daphna Joel</a> claims that there is no difference between male and female brains.
This was a very high-profile study, as can be seen from the mentions in
<a href="https://www.newscientist.com/article/dn28582-scans-prove-theres-no-such-thing-as-a-male-or-female-brain/">New Scientist</a>,
<a href="https://www.theguardian.com/science/2015/dec/01/brain-sex-many-ways-to-be-male-and-female">TheGuardian</a>,
<a href="http://medicalxpress.com/news/2015-11-male-female-brain-valid-distinction.html">MedicalPress</a>,
<a href="http://www.israelscienceinfo.com/en/medecine/femmes-et-sciences-pour-luniversite-de-tel-aviv-les-cerveaux-feminins-et-masculins-sont-un-patchwork-de-caracteristiques/">IsraelScienceInfo</a>,
<a href="http://www.dailymail.co.uk/sciencetech/article-3340123/Male-vs-female-brain-Not-valid-distinction-study-says.html">DailyMail</a>,
<a href="http://www.jpost.com/Business-and-Innovation/Health-and-Science/TAU-neuroscientists-Brains-are-not-gendered-435882">TheJerusalemPost</a>,
<a href="http://www.cbc.ca/news/technology/brain-sex-differences-1.3344954">CBCNews</a>, and many more.</p>
<p>This publication contradicts much of the corpus of knowledge on brains and gender, and thus took the scientific community by surprise. How can this be?</p>
<p>In short and as put by Carl Sagan:
“<strong>Absence of evidence is not evidence of absence</strong>”.</p>
<p>Indeed, by performing many univariate analyses, the authors show that male and female brains do not exhibit any particular structural pattern, at least as recorded by MRI scans.
It is, however, quite possible for two multivariate data sets to be nicely separated, but not so in any of the “raw” univariate measurements.
The following figure is a toy example of a dataset which cannot be separated by any single (raw) variable, but certainly can when considering two variables simultaneously.</p>
<p><img src="../images/overlap.png" alt="Multivariate separability" /></p>
<p>I suspect this is what happened in the case of “Sex Beyond the Genitalia”. When I <a href="http://www.pnas.org/content/early/2016/03/15/1523961113.full?sid=71a90a9a-ec35-45a3-a11a-63d0fc116fa9">reanalyzed the same data</a>, the <strong>multivariate</strong> brain structures of males and females were different enough that gender could be inferred from the MRI data alone, with <script type="math/tex">\sim 80\%</script> accuracy(!).</p>
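<p>For readers unfamiliar with the procedure, here is a hypothetical sketch of cross-validated classification accuracy on simulated data (not the actual MRI data): each variable separates the groups only slightly, yet jointly they support above-chance prediction.</p>

```r
# Hypothetical sketch: 10-fold cross-validated LDA accuracy on simulated
# two-group data with a small mean shift in every variable.
library(MASS)  # for lda()
set.seed(1)
n <- 400; p <- 10
g <- factor(rep(c("a", "b"), each = n / 2))        # group labels
x <- matrix(rnorm(n * p), n, p) + 0.15 * (g == "b")  # small shift per variable
folds <- sample(rep(1:10, length.out = n))         # fold assignment
acc <- mean(sapply(1:10, function(k) {
  fit  <- lda(x[folds != k, ], grouping = g[folds != k])  # train off-fold
  pred <- predict(fit, x[folds == k, ])$class             # predict held-out fold
  mean(pred == g[folds == k])                             # fold accuracy
}))
acc  # out-of-sample accuracy, above the 50% chance level
```
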
<p>It also seems I was not the only one troubled by Joel et al.’s findings. Here are
<a href="http://www.pnas.org/content/early/2016/03/15/1525534113.full?sid=71a90a9a-ec35-45a3-a11a-63d0fc116fa9">Del Giudice et al.’s</a> comment,
<a href="http://www.pnas.org/content/early/2016/03/15/1523888113.full">Chekroud et al.’s</a>,
<a href="http://www.pnas.org/content/early/2016/03/07/1524418113.extract">Marek Glazerman’s</a>,
<a href="https://www.psychologytoday.com/blog/sexual-personalities/201512/statistical-abracadabra-making-sex-differences-disappear">David Schmidt’s</a>.</p>
<p>In <a href="http://www.pnas.org/content/early/2016/03/15/1600792113.full?sid=71a90a9a-ec35-45a3-a11a-63d0fc116fa9#ref-8">Joel’s reply to the critics</a> they no longer insist that
“<em>human brains do not belong to one of two distinct categories: male brain/female brain</em>”, but rather soften their claims:
“<em>it is unclear what the biological meaning of the new space is and in what sense brains that seem close in this space are more similar than brains that seem distant</em>”.</p>
<p>I agree. For the purpose of <strong>interpreting</strong> the dimensions in which males and females differ, some feature selection can be introduced.
I will leave that for future neuroimaging research.</p>
<p><strong>Edit</strong>(19.3.2016):
Here is the code that generated the above figure:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">mvtnorm</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">magrittr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">MASS</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">gridExtra</span><span class="p">)</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1e3</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">999</span><span class="p">)</span><span class="w">
</span><span class="n">X</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rmvnorm</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="w">
</span><span class="n">beta</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10</span><span class="o">*</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbinom</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="o">=</span><span class="n">plogis</span><span class="p">(</span><span class="n">X</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">beta</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as.factor</span><span class="w">
</span><span class="n">xy</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x</span><span class="m">.1</span><span class="o">=</span><span class="n">X</span><span class="p">[,</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">x</span><span class="m">.2</span><span class="o">=</span><span class="n">X</span><span class="p">[,</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">)</span><span class="w">
</span><span class="n">empty</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">plot.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.grid.major</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w"> </span><span class="n">panel.grid.minor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w"> </span><span class="n">panel.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w"> </span><span class="n">axis.title.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.title.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w"> </span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w"> </span><span class="n">axis.text.y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">())</span><span class="w">
</span><span class="c1"># scatterplot of x and y variables</span><span class="w">
</span><span class="n">scatter</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">xy</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="m">.1</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="m">.2</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scale_color_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"orange"</span><span class="p">,</span><span class="w">
</span><span class="s2">"purple"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">legend.justification</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w">
</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="c1"># marginal density of x - plot on top</span><span class="w">
</span><span class="n">plot_top</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">xy</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="m">.1</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_density</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"orange"</span><span class="p">,</span><span class="w"> </span><span class="s2">"purple"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w">
</span><span class="c1"># marginal density of y - plot on the right</span><span class="w">
</span><span class="n">plot_right</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">xy</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_density</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">scale_fill_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"orange"</span><span class="p">,</span><span class="w"> </span><span class="s2">"purple"</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"none"</span><span class="p">)</span><span class="w">
</span><span class="n">grid.arrange</span><span class="p">(</span><span class="n">plot_top</span><span class="p">,</span><span class="w"> </span><span class="n">empty</span><span class="p">,</span><span class="w"> </span><span class="n">scatter</span><span class="p">,</span><span class="w"> </span><span class="n">plot_right</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">widths</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="w">
</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">heights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>In their recent, high-impact, PNAS publications, a Tel Aviv University research group led by Prof. Daphna Joel claims that no difference exists between the male and the female brain. This was a very high-profile study, as can be seen from the mentions in The New Scientists, TheGuardian, MedicalPress, IsraelScienceInfo, DailyMail, TheJerusalemPost, CBCNews, and many more.Interactive Plotting with R2016-01-05T00:00:00+00:002016-01-05T00:00:00+00:00http://johnros.github.io/interactive-plot-r<p><a href="https://www.linkedin.com/in/efratvilenski">Efrat</a> is an M.Sc. student in my group.
She works on integrating advanced Multivariate Process Control capabilities in interactive dashboards.
During her work she acquired impressive expertise in interactive plotting with R and <a href="http://d3js.org/">D3JS</a>.</p>
<p>Yesterday, 2016-3-01, she gave a workshop on the topic for <a href="http://www.statistics.org.il/">The Israeli Statistical Association</a>.
About 60 participants came to the Google Campus in Tel Aviv to hear about R, plotting, and JavaScript.</p>
<p>Her slides, code, and links can be found <a href="http://efratvil.github.io/R-Israel-Jan-2015/links.html">here</a>.</p>Efrat is an M.Sc. student in my group. She works on integrating advanced Multivariate Process Control capabilities in interactive dashboards. During her work she acquired impressive expertise in interactive plotting with R and D3JS.Quality Engineering Class Notes2015-11-20T00:00:00+00:002015-11-20T00:00:00+00:00http://johnros.github.io/quality-engineering<p>Now that I am a member of the <a href="http://in.bgu.ac.il/engn/iem/Pages/default.aspx">Industrial Engineering Dept.</a> at Ben Gurion University, I am naturally looking into statistical aspects of Industrial Engineering, in particular process control.
This being the case, I started teaching Quality Engineering.
While preparing the course, I read the classical introductory literature and felt it failed to convey the beauty of the field by focusing on too many little details.
I thus went ahead and wrote my own book, which can be found <a href="https://github.com/johnros/qualityEngineering/blob/master/Class_notes/notes.pdf">online</a>.</p>
<p>How does it differ from the existing literature?</p>
<ul>
<li>Being an introductory text, it has a very wide scope of topics. The focus is on the underlying ideas and terminology; details are given in the references.</li>
<li>Topics covered: History of quality engineering, exploratory data analysis, process control charts, design of experiments, acceptance sampling, and reliability.</li>
<li>The design-of-experiments and reliability chapters have much wider scopes than typically found in quality engineering textbooks.</li>
<li>I read many books and papers while researching the literature, and I tried to provide the most recent and clearest references for each topic.</li>
</ul>
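<p>As a small taste of the process-control-chart material, here is a minimal sketch of the classic Shewhart x-bar chart. It is my own illustration rather than an excerpt from the notes: the data are simulated, and the usual c4 bias correction for the pooled standard deviation is skipped for brevity.</p>

```python
import numpy as np

# Illustrative Shewhart x-bar chart sketch (not taken from the class notes;
# the c4 bias correction for sigma_hat is omitted for brevity).
rng = np.random.default_rng(1)
subgroups = rng.normal(loc=10.0, scale=2.0, size=(25, 5))  # 25 in-control subgroups of size 5

xbar = subgroups.mean(axis=1)                       # subgroup means
center = xbar.mean()                                # center line
sigma_hat = subgroups.std(axis=1, ddof=1).mean()    # rough within-subgroup sigma estimate
n = subgroups.shape[1]                              # subgroup size

ucl = center + 3 * sigma_hat / np.sqrt(n)           # upper control limit
lcl = center - 3 * sigma_hat / np.sqrt(n)           # lower control limit
out_of_control = (xbar > ucl) | (xbar < lcl)        # flag subgroups outside the 3-sigma band
```

<p>The whole chart thus reduces to a center line and two symmetric limits; a point outside the band is a signal that the process may have drifted.</p>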
<p>I hope readers will find my notes useful.
Being experimental, they may still contain mistakes; I would be very grateful to anyone who informs me of errors they find.</p>Now that I am a member of the Industrial Engineering Dept. at Ben Gurion University, I am naturally looking into statistical aspects of Industrial Engineering, in particular process control. This being the case, I started teaching Quality Engineering. While preparing the course, I read the classical introductory literature and felt it failed to convey the beauty of the field by focusing on too many little details. I thus went ahead and wrote my own book, which can be found online.Disambiguating Bayesian Statistics2015-09-02T00:00:00+00:002015-09-02T00:00:00+00:00http://johnros.github.io/disambiguating-bayesian-statistics<p>The term “Bayesian Statistics” is mentioned in any introductory course to statistics and appears in countless papers and in books, in many contexts and with many meanings.
Since it carries different meanings to different authors, I will suggest several interpretations I have encountered.</p>
<p>First, I will classify several possible generative models according to the following <script type="math/tex">4</script> attributes:</p>
<ol>
<li>Is there a probability on the parameter space (a “prior”)?</li>
<li>Is the prior subjective (epistemic, beliefs) or objective (physical)?</li>
<li>Is the prior parametric or non-parametric?</li>
<li>Is the prior simple or composite?</li>
</ol>
<p>We consider these attributes for some common modelling approaches:</p>
<ul>
<li>Neyman-Pearson frequentist inference has <em>no prior</em> on the parameter space.
<strong>Example</strong>: MLE for the mean of a normal population.</li>
<li>A pure/subjective/de Finetti Bayesian has a <em>simple, subjective</em> prior. It may be parametric or not.
<strong>Example</strong>: <a href="https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation">MAP</a> estimate of the mean of a normal population.</li>
<li>A Semi-Bayesian (a.k.a. pseudo-Bayesian, semi-Empirical-Bayesian) is essentially a frequentist with a <em>composite, parametric, objective</em> prior.
Sometimes referred to as Empirical-Bayesian [1], albeit not in its original historical sense [2].
<strong>Example</strong>: Type-II Maximum Likelihood [2] or Restricted-Maximum-Likelihood estimation of the variance components in a mixed effects model.</li>
<li>Empirical Bayesian, in its original historical sense, has an <em>objective, non-parametric prior</em>, specified up to its first two moments [2].
It differs from the (original) Semi-Bayesian in that the prior is non-parametric.
<strong>Example</strong>: Sample coverage estimation: “What is the probability that the next sample will be of an unseen species?” [3].</li>
<li>A <em>parametric, composite, subjective</em> prior is something interesting.
It is hard to interpret since it implies that “my beliefs are sharp, but I don’t know what they are (yet?)”.
It is typically encountered as a mathematical regularization device.
<strong>Example</strong>: Ridge regression; the Gaussian prior on the coefficients can hardly be interpreted as a limiting frequency, so it is subjective. The variance of the prior is unspecified and usually estimated by cross-validation, making the prior composite.</li>
</ul>
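<p>To make the ridge-regression remark concrete, here is a minimal sketch of my own (not taken from the cited references): with Gaussian noise and an i.i.d. Gaussian prior on the coefficients, the posterior mode (MAP) is exactly the ridge estimator with penalty equal to the noise-to-prior variance ratio. All names and numbers below are illustrative.</p>

```python
import numpy as np

# Illustrative sketch: for y = X beta + eps with eps ~ N(0, sigma^2 I) and an
# i.i.d. Gaussian prior beta_j ~ N(0, tau^2), the MAP estimate is the ridge
# solution with penalty lambda = sigma^2 / tau^2.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma, tau = 1.0, 0.7
y = X @ beta_true + sigma * rng.normal(size=n)

lam = sigma**2 / tau**2
# Closed form of the MAP / ridge estimate: (X'X + lambda I)^{-1} X'y
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Stationarity check: beta_map satisfies the penalized normal equations,
# X'(y - X beta) = lambda * beta.
residual_score = X.T @ (y - X @ beta_map)
```

<p>As <script type="math/tex">\tau \to \infty</script> (a flat prior) the penalty vanishes and the estimate tends to ordinary least squares; the unspecified <script type="math/tex">\tau</script>, typically chosen by cross-validation, is precisely what makes the prior composite.</p>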
<h1 id="critique">Critique</h1>
<p>Q: Epistemic probability? Is there really such a thing as epistemic probability? Is it not just the limiting frequency of events in our accumulated experience?</p>
<p>A: If it were epistemic and universal (objective), then the distinction might be only a matter for philosophers. The fact that it is personal (subjective) has real practical implications.</p>
<hr />
<p>Q: Objective probabilities?! Except in non-parametric settings, I am always assuming the distribution of the population. Isn’t this subjective? Am I not subjectively assuming the differentiability of the CDF for density estimation? We are thus all subjective Bayesians. Yes, even Fisher!</p>
<p>A: I would say one cannot approach data completely assumption-free, so saying we are all subjective Bayesians might be true, but it is non-informative. I feel different “philosophies” are useful to convey what kind of argument we are making with the data, namely, the answers to the 4 questions above. Is my answer true but non-informative as well? :-)</p>
<hr />
<p>Q: If we acknowledge that “Empirical Bayes” has acquired a new meaning since its initial introduction, why bother with history? Why not use the composite epistemic meaning only?</p>
<p>A: Because there seems to be no agreed-upon “new meaning”. Some will use it for composite-epistemic priors, but some will use it for physical ones. The smallest common denominator is the fact that the prior is composite and not simple (whether it be parametric or not).</p>
<h1 id="references">References</h1>
<p>[1] E.L. Lehmann and George Casella, Theory of Point Estimation, 2nd ed. (Springer, 1998).<br />
[2] I. J. Good, “Introduction to Robbins (1955) An empirical Bayes approach to statistics,” Breakthroughs in Statistics: Foundations and basic theory (1992): 379.<br />
[3] Bradley Efron and Ronald Thisted, “Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know?,” Biometrika 63, no. 3 (December 1976): 435-447.<br />
[4] Fienberg, Stephen E. “When Did Bayesian Inference Become ‘Bayesian’?” Bayesian Analysis 1, no. 1 (March 2006): 1-40. doi:10.1214/06-BA101.</p>The term “Bayesian Statistics” is mentioned in any introductory course to statistics and appears in countless papers and in books, in many contexts and with many meanings. Since it carries different meanings to different authors, I will suggest several interpretations I have encountered.