A Modest Proposal for Peer Review Research
With the advent of online information sharing, and how inexpensive it has become, I propose that all peer-reviewed research should include (anonymized) raw data for each subject. At the very least, scatter plots of the individual data points for the main outcomes should be presented.
I frequently teach statistics, and one of the first things we discuss in that class is the "first purpose" of it all: before we can analyze data, we must summarize and present it in such a way that the "consumer" can readily glean information. In one classic stats text -- Triola -- this part is given the acronym CVDOT: C = Center, V = Variation, D = Distribution, O = Outliers and T = Time. So we go through the various ways we can convey the center of a data set, its variability, distribution, etc. In most of the studies we discuss here, data is presented as a mean value +/- either the standard deviation or the standard error (C +/- V in the acronym), and further statistical analysis compares these means between groups for statistically significant differences. If I have 20 subjects in a study, I can provide you with a table of all results sorted by randomly assigned subject number. This tells you very little. If all I do is sort the data in ascending or descending order, you can readily pick out the range and "center" of the data. If the values are fairly rounded, you might also be able to pick out the most frequent (modal) values. Outliers will jump off the page.
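To make the "just sort it" point concrete, here is a minimal standard-library sketch. The 20 values are invented for illustration (not from any study), and flagging outliers with Tukey's 1.5 x IQR fences is one common rule of thumb swapped in here, not something taken from the Triola text.

```python
import statistics

def cvdot_summary(values):
    """Summarize a sample the way a sorted list lets you eyeball it:
    range, center, most common values and outliers.

    Outliers use Tukey's 1.5 x IQR fences (an assumed rule of thumb).
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "sorted": data,
        "range": (data[0], data[-1]),
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "modes": statistics.multimode(data),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical results for 20 subjects, listed in (random) subject-number order.
by_subject = [12, 9, 15, 11, 10, 14, 9, 13, 10, 11,
              48, 12, 10, 9, 11, 13, 10, 12, 14, 10]

summary = cvdot_summary(by_subject)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this made-up sample the single subject at 48 pulls the mean up to 13.15 while the median stays at 11: the O in CVDOT tugging on the C.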
One huge problem with reporting and analyzing means is that -- especially with small samples -- the "D" and "O" in CVDOT can have a huge impact on both C and V. There's a tendency to read 10 +/- 5 as meaning that the values ranged from 5 to 15 and averaged 10. This is not true. If the distribution of the data set is normal, i.e. a histogram of the values looks like a symmetrical "bell curve", then what the standard deviation tells us is that about 68% of values fall within +/- 1 SD of the mean, 95% within +/- 2 SD, and 99.7% within +/- 3 SD. So when you see a value like 10 +/- 5 (mean +/- SD) for the blood concentration of X, something that can't even be negative, a normal distribution would place about 2.3% of values below zero (more than 2 SD under the mean). Houston, we have a problem: the data can't truly be normally distributed. Which is not to say that statistics are bunk, but they often "smooth" or "obscure" -- however unintentionally -- what's happening. And this is always something to keep in the back of your mind when evaluating studies, whether they confirm your biases or not.
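The 68/95/99.7 figures, and the "negative concentration" problem, can be checked with Python's standard-library `statistics.NormalDist`. This sketch assumes the reported 10 +/- 5 is mean +/- SD (not SE) and simply asks what a normal population with those parameters would predict.

```python
from statistics import NormalDist

def normal_coverage(k):
    """Fraction of a normal population within +/- k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mu=0, sigma=1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {normal_coverage(k):.1%}")

# If "10 +/- 5" (mean +/- SD) really described a normal population,
# this is the share of values it would place below zero.
concentration = NormalDist(mu=10, sigma=5)
print(f"predicted share below zero: {concentration.cdf(0):.1%}")
```

A nonzero share below zero for a quantity that cannot be negative is the tip-off that the normal model (and with it the intuitive reading of mean +/- SD) does not fit.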
Statistically significant differences, or correlations for that matter, do not de facto beget "significant" (meaningful) differences, and objectively different populations can easily be deemed statistically similar. For example, witness the data sets at right. The left set, 21 values from 1 through 21, is a classic uniformly distributed data set. It is also symmetrical about the median of 11, the physical center value. Next door there's an equally symmetrical, uniform distribution with similar mean and variability, but it's clearly different. The third column is the first data set with three outliers -- I basically doubled the top three values. This altered the mean a bit, but almost doubled the standard deviation. Lastly, the set on the right is identical to the first in the lower half, but above the median (which remains the same) the values increase in steps of 2. That gives a skewed population with about the same mean as the set immediately to its left (the one with three high "outliers"), but less variation by standard deviation.
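The figure isn't reproduced here, so this sketch rebuilds three of the four columns from the descriptions alone: set 1 is 1 through 21, set 3 doubles its top three values, and set 4 keeps the lower half and steps up by 2 above the median. The second column isn't described closely enough to rebuild, so it's left out, and these reconstructions may not match the original figure value for value.

```python
from statistics import mean, median, stdev

# Reconstructions from the text's descriptions (assumptions, not the original figure).
set1 = list(range(1, 22))                      # 1..21, uniform, symmetric about 11
set3 = set1[:-3] + [2 * x for x in set1[-3:]]  # top three values doubled: 38, 40, 42
set4 = set1[:11] + list(range(13, 32, 2))      # same lower half, then 13, 15, ..., 31

for name, data in [("set 1", set1), ("set 3", set3), ("set 4", set4)]:
    print(f"{name}: median={median(data)}, mean={mean(data):.2f}, sd={stdev(data):.2f}")
```

Under these assumptions all three sets share a median of 11, while the means come out near 11.0, 13.9 and 13.6 and the sample SDs near 6.2, 12.0 and 9.4.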
I could generate innumerable similar examples of samples or populations that appear similar by mean +/- SD but are notably different, and vice versa. Which is not to say that statistics are meaningless, just that we have to look closely at what is being compared. I might add that values that can change in both directions can further muddy the waters of comparison. In any case, none of these data sets cleared even the minimal hurdle of being significantly different from one another. Surprised? Here's a screenshot using this online stats tool for data set 1 vs. 3. This is for the lowest hurdle that can be applied -- 80% confidence, i.e. accepting a 1 in 5 chance of calling a difference real when it could be attributed to the natural fluctuations of random selection, otherwise known as *chance*. Even at that bar, these two data sets would be described as "similar" or "the same" or "not significantly different".
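The post doesn't say which test the online tool ran, so this sketch assumes a plain Student's two-sample t-test with pooled variance, applied to data sets 1 and 3 as described in the text (1 through 21, and the same with its top three values doubled). The 80% critical value is taken from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / std_err, na + nb - 2

# Data sets 1 and 3 as described in the text (a reconstruction, not the original figure).
set1 = list(range(1, 22))
set3 = set1[:-3] + [2 * x for x in set1[-3:]]

t_stat, df = pooled_t(set1, set3)

# Two-tailed critical t for 80% confidence (alpha = 0.20) at df = 40, from a standard t table.
T_CRIT_80_DF40 = 1.303
print(f"t = {t_stat:.2f}, df = {df}, significant at 80%: {abs(t_stat) > T_CRIT_80_DF40}")
```

With these values t comes out just under 1, well short of the roughly 1.30 needed even at 80% confidence, which agrees with the "not significantly different" verdict. The doubled outliers inflate the pooled variance at least as much as they shift the mean.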
Show me the data!!
Especially with smaller studies, ANYTHING where n<30 per group, show me the raw data. Why not? If I were a journal editor, I would require this. I don't know what goes through peer reviewers' heads these days, but it's a rare study where I don't look at a data summary, graph or table and wonder "what were the results for XYZ subgroup?"
What got me off on this tangent??
Well, the JAMA study that came out the other day, showing that low carb (LC) depressed energy expenditure the least during post-weight-loss maintenance. There's "red meat" in this study for everyone. The MAD crowd will hail the metabolic advantage; the Type A = Atkins Kills crowd will focus on the unfavorable cortisol and CRP levels. For anyone who hasn't heard yet, a small group (n=21) of still-obese, formerly more obese adults was put on three different weight-stabilizing diets (low fat, low glycemic index and low carb) for four weeks each, in random order, and average total energy expenditure (TEE) on low carb was about 300 cal/day more than on low fat. Yikes!! Anthony Colpo must be in a furious frenzy editing that Fat Loss Bible of his and questioning all of those metabolic ward studies. Somehow I doubt it.
I have a lot of thoughts on this study. I'll summarize a few and then show one scatter plot. And then perhaps I'll comment on others' analyses around the net, because I think this boils down to an erratic, short-term alteration in some subjects from a metabolic "jolt" to the system ... or we wouldn't have so many low carbers who mysteriously regain despite remaining low carb.
- If TEE differed and feeding was isocaloric but they were weight stable, howzzat?
- Was there ANY effort WHATSOEVER made to assess compliance?
- Protein intake varied significantly between the diets
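The first question is really an arithmetic one. A back-of-envelope sketch, assuming the ~300 cal/day gap held steadily across a four-week phase and using the common (and crude) 3,500 kcal per pound of fat rule of thumb:

```python
# Back-of-envelope only: a steady TEE gap accumulated over one diet phase.
TEE_GAP_KCAL_PER_DAY = 300   # approximate LC-vs-LF difference in mean TEE (from the post)
PHASE_DAYS = 4 * 7           # each diet was followed for four weeks
KCAL_PER_LB_FAT = 3500       # rough rule of thumb, not a measured value

cumulative_gap_kcal = TEE_GAP_KCAL_PER_DAY * PHASE_DAYS
implied_fat_lb = cumulative_gap_kcal / KCAL_PER_LB_FAT
print(f"{cumulative_gap_kcal} kcal over {PHASE_DAYS} days, roughly {implied_fat_lb:.1f} lb of fat")
```

Under those simplifying assumptions that's 8,400 kcal, on the order of 2.4 lb per phase, which is hard to square with matched intake and stable weight; hence the "howzzat?".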
At right are the scatter plots of REE (resting energy expenditure) and TEE for each individual ... with lines connecting each individual's dots. Oh ... the means increase, but notice how many individuals have angular trajectories -- in both directions. IOW, their EE on low fat and low carb was comparable, but on low GI it was either significantly higher OR lower. Huh? The scatter plots don't nullify the entire impact of the mean +/- SD, but they sure do mute (or is that moot) the impact!
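Spotting those "angular" trajectories is easy to express in code. This sketch is hypothetical throughout: the per-subject numbers are invented (not the study's data), the diet order and the 100 kcal/day threshold are assumptions, and "angular" here just means the low GI value lies above or below both of that subject's other two diets by more than the threshold.

```python
from statistics import mean

DIETS = ("low fat", "low GI", "low carb")

def angular_subjects(ee_by_subject, threshold):
    """Return subjects whose low GI EE sits above or below BOTH of their
    other two diets by more than `threshold` kcal/day."""
    flagged = []
    for subject, (lf, lgi, lc) in ee_by_subject.items():
        lo, hi = min(lf, lc), max(lf, lc)
        if lgi > hi + threshold or lgi < lo - threshold:
            flagged.append(subject)
    return flagged

# Invented TEE values (kcal/day) for four hypothetical subjects, ordered as in DIETS.
hypothetical_tee = {
    "A": [2800, 2810, 2820],  # nearly flat
    "B": [2700, 2950, 2710],  # spikes up on low GI
    "C": [2900, 2650, 2890],  # dips on low GI
    "D": [2600, 2700, 2800],  # steady climb
}

column_means = [mean(values[i] for values in hypothetical_tee.values()) for i in range(3)]
print(dict(zip(DIETS, column_means)))
print("angular:", angular_subjects(hypothetical_tee, threshold=100))
```

In this made-up example the means rise neatly from low fat to low GI to low carb, yet half the subjects zig-zag, which is exactly what a mean +/- SD table would hide and a connected scatter plot would show.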
And now for the bad news for the low carbers. This study showed that LC raised cortisol and CRP. Again, this may just be adaptation in action. But something about the scatter plot reminded me of a very early post in the history of the Asylum on this study. Interestingly enough, it compared a standard low fat vs. low carb diet implemented for 4 weeks. It was a weight loss trial, and predictably (in the short term), low carbers lost more weight. But CRP (a marker for inflammation) went up for the low carbers, as in the JAMA study. Now let's look at that scatter plot, shall we? This is Change-in-CRP vs. Initial-CRP. The triangles are the low carbers. CRP either stayed roughly the same or went up -- with little to no exception. OTOH, look at those squares -- not a single value cresting the zero-change baseline, only varying degrees of improvement.
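Reading a Change-vs-Initial plot comes down to the sign of each subject's change. A minimal sketch under stated assumptions: the CRP numbers are invented for illustration, and the 0.2 mg/L band for "stayed roughly the same" is an arbitrary choice, not anything from either study.

```python
from statistics import mean

def tally_changes(initial, final, tol):
    """Count subjects whose value rose, fell, or stayed within +/- tol of baseline,
    and return the mean change alongside the counts."""
    if len(initial) != len(final):
        raise ValueError("initial and final must be paired per subject")
    changes = [after - before for before, after in zip(initial, final)]
    counts = {
        "up": sum(c > tol for c in changes),
        "down": sum(c < -tol for c in changes),
        "flat": sum(abs(c) <= tol for c in changes),
    }
    return counts, mean(changes)

# Invented CRP values (mg/L) for two hypothetical groups of four; not the study's data.
lc_counts, lc_mean = tally_changes([1.0, 2.5, 0.8, 3.1], [1.4, 2.5, 1.6, 3.0], tol=0.2)
lf_counts, lf_mean = tally_changes([2.0, 4.2, 1.5, 3.3], [1.1, 2.9, 1.4, 2.0], tol=0.2)
print("LC:", lc_counts, f"mean change {lc_mean:+.2f}")
print("LF:", lf_counts, f"mean change {lf_mean:+.2f}")
```

The mean change says "up a bit" for one group and "down" for the other; the counts say something sharper: in one group nobody improved, in the other nobody got worse.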
The means in these two studies tell part of the story. Over 4 weeks, REE & TEE were reduced less, and even increased for some, on a LC diet. CRP also increased. But what the scatter plots give us is context. Not all participants had significant differences in EE on LC v. LF, and some had dramatically different responses to the intermediate diet. This individual variability should make one cautious about drawing any sort of conclusion from these results. In the CRP study, the mean values show an increase, but the scatter plot gives you the full story: in this sample, where CRP changed, it increased for LC and decreased for LF.
Show me the Scatter Plots!