A Modest Proposal for Peer Review Research

With online sharing of information now so widespread and inexpensive, I propose that all peer-reviewed research should include (anonymized) raw data for each subject.  At the very least, papers should present scatter plots of the individual data points for the main outcomes.

I frequently teach statistics, and one of the first things we discuss in that class is sort of the "first purpose" of it all.  Before we can analyze data, we must first summarize and present it in such a way that the "consumer" can readily glean information.  In one classic stats text, Triola, this part is given the acronym CVDOT:  C = Center, V = Variation, D = Distribution, O = Outliers and T = Time.  So we go through the various ways we can convey the center of a data set, its variability, distribution, etc.  In most of the studies we discuss here, data are presented as a mean value +/- either the standard deviation or the standard error (C +/- V in the acronym), and further statistical analysis compares these means between groups for statistically significant differences.  If I have 20 subjects in a study, I can provide you with a table of all results listed by randomly assigned subject number.  This tells you very little.  If all I do is sort the data ascending or descending, you can now readily pick out the range and "center" of the data.  If the values are rounded, you might even be able to pick out the most frequent or common values.  Outliers will jump off the page.  
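To make the sorting point concrete, here's a minimal Python sketch. The 20 values are made up (hypothetical, not from any study); the only point is that sorting exposes the range, the middle, the most common value and the outlier, and that a single outlier drags the mean away from the median.

```python
import statistics

# Hypothetical results for 20 subjects, listed in (random) subject-number order.
# These numbers are invented purely for illustration.
by_subject = [12, 9, 15, 11, 10, 48, 13, 9, 11, 14,
              10, 12, 11, 16, 9, 13, 11, 10, 12, 14]

# Sorting alone exposes the range, the middle and the outlier (48).
ordered = sorted(by_subject)
print(ordered)

summary = {
    "min": ordered[0],
    "max": ordered[-1],
    "median": statistics.median(ordered),  # center, barely affected by the outlier
    "mode": statistics.mode(ordered),      # most common value
    "mean": statistics.mean(ordered),      # center, pulled upward by the outlier
}
print(summary)
```

Here the median is 11.5 but the mean is 13.5, almost entirely because of one subject.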

One huge problem with reporting and analyzing means is that, especially with small samples, the "D" and "O" in CVDOT can have a huge impact on both C and V.  There's a tendency to read 10 +/- 5 to mean that the values ranged from 5 to 15 and averaged 10.  This is not true.  If the data are normally distributed, i.e. a histogram of the values looks like a symmetrical "bell curve", then what the standard deviation tells us is that about 68% of values fall within +/- 1 SD of the mean, 95% within +/- 2 SD, and 99.7% within +/- 3 SD.  So when you see a value like 10 +/- 5 for the blood concentration of X, a quantity that can't even be negative, Houston, we have a problem:  a normal distribution with that mean and SD would put about 2.3% of values below zero, so the data can't really be normally distributed.  Which is not to say that statistics are bunk, but they often "smooth" or "obscure", however unintentionally, what's happening.  And this is always something to keep in the back of your mind when evaluating studies, whether they confirm your biases or not.  
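A quick way to check the 68/95/99.7 figures, and what "10 +/- 5" would imply if the data really were normal, is the normal cumulative distribution function. This sketch assumes the +/- is a standard deviation (not a standard error) and uses only Python's math module; `normal_cdf` is a helper name introduced here for illustration.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Share of a normal distribution within +/- k SD of the mean.
within = {k: normal_cdf(k, 0, 1) - normal_cdf(-k, 0, 1) for k in (1, 2, 3)}
print({k: round(p, 3) for k, p in within.items()})  # {1: 0.683, 2: 0.954, 3: 0.997}

# If "10 +/- 5" described a normal distribution, this share of values
# would be negative, which is impossible for a blood concentration.
below_zero = normal_cdf(0, mean=10, sd=5)
print(round(below_zero, 3))  # 0.023
```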

Statistically significant differences, or correlations for that matter, do not automatically mean practically "significant" differences, and objectively different populations can easily be deemed statistically similar.  For example, witness the data sets at right.  The left set, the 21 values 1 through 21, is a classic uniformly distributed data set.  It is also symmetrical about its median of 11, the middle value.  Next door there's an equally symmetrical, uniform distribution with a similar mean and variability, but it's clearly different.  The third column is the same data set but with three outliers:  I basically doubled the top three values.  This altered the mean a bit, but almost doubled the standard deviation.  Lastly, the set on the right is identical in the lower half, but above the median (which stays the same) the values increase in steps of 2.  The result is a skewed population with about the same mean as the set immediately to its left (the one with three high-end "outliers"), but slightly less variation by standard deviation.  
I could generate innumerable similar examples of samples or populations that look alike by mean +/- SD but are notably different, and vice versa.  Which is not to say that statistics are meaningless, just that we have to look closely at what is being compared.  I might add that values that can change in both directions can further muddy the waters of comparison.  In any case, none of these data sets cleared even the minimal hurdle of being significantly different.  Surprised?  Here's a screenshot using this online stats tool for data set 1 vs. 3.  This is at a very low hurdle, 80% confidence (alpha = 0.20), i.e. accepting a 1 in 5 chance of calling a difference "significant" when it is really just the natural fluctuation of random selection, otherwise known as *chance*.  Even so, these two data sets would be described as "similar" or "the same" or "not significantly different".  
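For anyone who wants to replay the comparison, here's a rough Python sketch. The sets are my reading of the descriptions above, which is an assumption: set 3 is taken to be set 1 with its top three values doubled, and set 4 runs 1 through 11 and then 13, 15, ... 31. Set 2 isn't described precisely enough to rebuild, so it's left out. The sketch computes Welch's t statistic by hand and compares it to the two-tailed 80% critical value for about 30 degrees of freedom from a standard t table (1.310). The online tool may well have used the ordinary pooled t-test instead, but with equal group sizes the t statistic comes out the same.

```python
import statistics
from math import sqrt

# Rebuilt from the text's descriptions (an assumption, not the original figure).
set1 = list(range(1, 22))                           # 1..21, uniform, median 11
set3 = [2 * v if v >= 19 else v for v in set1]      # top three values doubled
set4 = list(range(1, 12)) + list(range(13, 32, 2))  # steps of 2 above the median

for name, data in (("set 1", set1), ("set 3", set3), ("set 4", set4)):
    print(name, round(statistics.mean(data), 2),
          round(statistics.stdev(data), 2), statistics.median(data))

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(b) - statistics.mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(set1, set3)
# Two-tailed critical value at alpha = 0.20 for ~30 df, from a standard t table.
T_CRIT_80 = 1.310
print(round(t, 2), round(df, 1), "significant" if abs(t) > T_CRIT_80 else "not significant")
```

Under these assumptions, set 3's standard deviation comes out roughly double set 1's (about 12.0 vs. 6.2), yet |t| is just under 1, well short of 1.310.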

Show me the data!!

Especially with smaller studies, ANYTHING where n<30 per group, show me the raw data.  Why not?  If I were a journal editor I would require this.  I don't know what goes through peer reviewers' heads these days, but it's a rare study where I don't look at a data summary, graph or table and wonder "what were the results for XYZ subgroup?"  

What got me off on this tangent??  

Well, the JAMA study that came out the other day showing that LC depressed energy expenditure the least during post-weight-loss maintenance.  There's "red meat" in this study for everyone.  The MAD crowd will hail the metabolic advantage, and the Type A = Atkins Kills crowd will focus on the unfavorable cortisol and CRP levels.  For anyone who hasn't heard yet, in this study a small group (n=21) of still-obese, formerly more-obese adults were put on three different weight-stabilizing diets for four weeks each, in random order, and TEE on low carb averaged about 300 cal/day more than on low fat.  Yikes!!  Anthony Colpo must be in a furious frenzy editing that Fat Loss Bible of his and questioning all of those metabolic ward studies.  Somehow I doubt it.  

I have a lot of thoughts on this study.  I'll summarize a few and then show one scatter plot.  And then perhaps I'll comment on others' analyses around the net, because I think this boils down to an erratic, short-term alteration in some subjects from a metabolic "jolt" to the system ... or we wouldn't have so many low carbers who mysteriously regain despite remaining low carb.
  • If TEE differed and feeding was isocaloric but they were weight stable, howzzat?
  • Was there ANY effort WHATSOEVER made to assess compliance?
  • Protein intake varied significantly
At right are the scatter plots for REE and TEE, with lines connecting the dots for each individual.  Oh ... the means increase, but notice that many individuals have angular trajectories, in both directions.  IOW, their EE on low fat and low carb was comparable, but on low GI it was either significantly higher OR lower.  Huh?  The scatter plots don't nullify the entire impact of the mean +/- SD, but they sure do mute (or is that moot) the impact!

And now for the bad news for the low carbers.  This study showed that LC raised cortisol and CRP.  Again, this may just be adaptation in action.  But something about the scatter plot reminded me of a very early post in the history of the Asylum on this study.  Interestingly enough, it compared a standard low-fat vs. low-carb diet implemented for 4 weeks.  It was a weight loss trial, and predictably (in the short term), the low carbers lost more weight.  But CRP (a marker for inflammation) went up for the low carbers, as in the JAMA study.  Now let's look at that scatter plot, shall we?  This is Change-in-CRP vs. Initial-CRP.  The triangles are the low carbers:  CRP either stayed roughly the same or went up, with little to no exception.  OTOH, look at those squares:  not a single value cresting baseline, only varying degrees of improvement.

The means in these two studies tell part of the story.  Over 4 weeks, REE and TEE are reduced less, and even increased for some, on an LC diet.  CRP is also increased.  But what the scatter plots give us is context.  Not all participants had significant differences in EE on LC vs. LF, and some had dramatically different responses to the intermediate diet.  This individual variability should make one cautious about drawing any sort of conclusion from these results.  In the CRP study, you see an increase in mean values, but the scatter plot gives you the full story:  in this sample, where CRP changed, it increased for LC and decreased for LF.  
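One simple way to put numbers on "the mean went up, but not for everyone" is to count each subject's direction of change instead of only averaging it. This is a minimal sketch: the paired values are hypothetical, `classify_changes` is a name made up for the example, and the tolerance for "no real change" is an arbitrary choice you'd have to justify for a real outcome like CRP or TEE.

```python
import statistics

def classify_changes(before, after, tolerance):
    """Count subjects whose value rose, fell, or stayed within +/- tolerance."""
    if len(before) != len(after):
        raise ValueError("need one before/after pair per subject")
    counts = {"up": 0, "down": 0, "flat": 0}
    for b, a in zip(before, after):
        change = a - b
        if change > tolerance:
            counts["up"] += 1
        elif change < -tolerance:
            counts["down"] += 1
        else:
            counts["flat"] += 1
    return counts

# Hypothetical paired values for six subjects (invented for illustration).
before = [1.0, 2.0, 0.5, 3.0, 1.5, 0.8]
after = [1.1, 3.5, 0.4, 5.0, 1.6, 2.0]

mean_change = statistics.mean(a - b for a, b in zip(after, before))
print(round(mean_change, 2))                  # 0.8
print(classify_changes(before, after, 0.25))  # {'up': 3, 'down': 0, 'flat': 3}
```

The average rises by 0.8, yet half of these hypothetical subjects barely moved, which is exactly what a mean +/- SD can't show.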

Show me the Scatter Plots!


P2ZR said…
When I first saw the title of this post, I thought, 'Wow, does she want papers to come with appendices 10x longer than the actual article?' It's so weird to those of us who work with simulations and/or voluminous real data to be reminded that in biology, RCTs (or uncontrolled 'experiments') comprise so few individuals. Like, what do you mean, there aren't 1,000 data points?!

You can see with the JAMA study that with n=21, it's already getting hard to visually distinguish the paths for each individual. They could partition the results into response type (X marker responding more favorably to LC or LF, etc.), but then run the risk of games of scientific telephone causing the partitioning criterion/algorithm to be ignored/miscommunicated. Similarly, they could also present plots that omit the outliers, but then risk the justification for such adjustments (even if it's just a simple formula identifying a threshold value) being ignored/miscommunicated. Headline writers and people with agendas are so adept at seeing what they want to see--making presentation of raw data a requirement might just invite more data manipulation.

(So we have the real data, and the shiny new denoised [how??, precisely, being lost amidst all the soundbites] data that is the 'good' info that is meant to be seen. Urgh.)

Maybe you'd need to require also that the untouched raw data be presented first, and in size at least as large as the data that is edited for whatever reason. And the editing mechanism presented before the edited data, and not as some size 6.5-font caption underneath. More annoying guidelines that make publishing such a pain in the arse ;p
CarbSane said…
I'm not proposing anything more than making the raw data available. For example, some studies publish just the "intent to treat" analysis, or carry forward the last data value for dropouts, etc. ... while others will do a "completer analysis". Yeah, scatter plots for even that n=21 are difficult to follow; I'm NOT suggesting that. But if I had a spreadsheet of all data for all 21 subjects, I could do my own analysis if I wanted to. And, for example, I'm curious what the ACTUAL weight loss was for the "low" but really "non" responders. Just call it "supplemental online material".
P2ZR said…
So you mean more like a csv file of the raw data that you can download and then import into your spreadsheet app of choice, so that you can create your own scatter plots accordingly? That's actually the norm in computational disciplines; there's often a sizable zip file of the data. And if the paper discusses an algorithm that the authors did implement (as opposed to discussing theoretically), the actual code for that is often available, too.
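Something like the following is all it takes once a per-subject CSV exists. It's a sketch with hypothetical column names (`subject`, `diet`, `tee_kcal`) and made-up numbers, using only Python's standard library; a real supplementary file would have its own layout.

```python
import csv
import io
import statistics
from collections import defaultdict

# Stand-in for a downloaded supplementary file; the column names and
# numbers are hypothetical, not taken from any study.
RAW_CSV = """subject,diet,tee_kcal
1,LF,2800
1,LC,3100
2,LF,2950
2,LC,2900
3,LF,2700
3,LC,3050
"""

def load_by_subject(text):
    """Return {subject: {diet: value}} from a long-format CSV."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        table[row["subject"]][row["diet"]] = float(row["tee_kcal"])
    return table

by_subject = load_by_subject(RAW_CSV)

# Within-subject difference (LC minus LF): the per-person view that a
# group mean +/- SD hides.
diffs = {s: d["LC"] - d["LF"] for s, d in by_subject.items()}
values = list(diffs.values())
print(diffs)  # {'1': 300.0, '2': -50.0, '3': 350.0}
print(statistics.mean(values), round(statistics.stdev(values), 1))  # 200.0 217.9
```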

More of an open-source philosophy, I suppose (even if the articles themselves aren't free without institutional access). And also probably because replicability is of relatively greater practical importance: it's *expected* that you'll go through crunching the data and poking 'under the hood' of the code because the interest is usually in picking up where the authors left off, or taking the authors' work in a new direction. (Which doesn't require changing up all the hardware one has access to, as might be required in a wet lab.)

I can see why you'd want to do your own analysis. If you had 10 comparable studies of ~20 subjects each, and only one of them had a 'john', whose all-important special REE is 15 SDs above all sample means...you might want to see what removing him does to the study he's found in ;)
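With individual data, checking for a 'john' is easy: drop each subject in turn and see how far the mean moves (a leave-one-out check). A minimal sketch with hypothetical REE values in kcal/day; `leave_one_out` and `most_influential` are names made up for this example.

```python
import statistics

def leave_one_out(values):
    """(index, mean, SD) of the sample with each single value removed in turn."""
    if len(values) < 3:
        raise ValueError("need at least 3 values to get an SD after dropping one")
    results = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        results.append((i, statistics.mean(rest), statistics.stdev(rest)))
    return results

def most_influential(values):
    """Index of the value whose removal shifts the mean the most."""
    full_mean = statistics.mean(values)
    return max(leave_one_out(values), key=lambda r: abs(r[1] - full_mean))[0]

# Hypothetical resting energy expenditure (kcal/day); the last subject is the "john".
ree = [1500, 1550, 1480, 1520, 1510, 2600]
print(statistics.mean(ree))              # about 1693
idx = most_influential(ree)
print(idx, leave_one_out(ree)[idx][1])   # 5 1512
```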
CarbSane said…
Yeah, just the data. This would be especially useful where "intent to treat" analyses are done (I think those should be banned! They are utterly useless to everyone). I'm reminded of the recent LC/LF comparison (http://carbsanity.blogspot.com/2012/05/lc-v-lf-diet-comparison-study-shows.html) where they did three analyses, two in the paper and a third in the online supplement. It would be nice to know how close to compliance the calorie or fat% compliers were, for example.

Providing such data even 10 years ago would have been prohibitive. But giving everyone with online access to the published study access to a file of individual data would be far less time-consuming than even fielding correspondence about it. And it would (hopefully) stop some of the misleading presentations of data a la that Lustig octreotide one.
Puddleg said…
One thing: NONE of the diets elevated CRP. They all decreased it significantly.
The VLC diet decreased CRP slightly less than the others; just enough to be "significant". But compared with the baseline SAD, all these diets were much of a muchness, a "much improved" muchness, in terms of the cortisol and CRP.

Colpo chewing the carpet on this one was funny, in a scary sort of way.