You hear it all the time -- the randomized controlled trial is the "gold standard" for clinical research. Unfortunately, just because something is an RCT doesn't make it a good study, not by a long shot. I blogged recently calling for an end to diet comparison RCTs because I don't believe this model is productive in identifying what a healthful, non-obesogenic diet for humans looks like. All of the foods that have been blamed have been around far longer than this problem has, so, frankly, it really isn't the food. There's no magical macronutrient ratio, no superfoods that will endow you with superhuman powers, and no foods that are inherently fattening or slimming ... not even butter and celery, but please, no buttered celery.
I really do believe that observational studies have far more value than clinical trials in this realm. I'm not talking about the ones that take a couple of hours' worth of self-reported intake and massage the data for obscure correlations to demonize some dietary factor. No, I'm talking about long term observations of traditional cultures. It seems that every culture ate different foods, and all were relatively free from the chronic diseases associated with the "First World". Pretty much every demonized food (not talking chips, dips and nips or scones and cones) is prevalent in some culture that is healthy despite(?) its consumption.
As Gary Taubes rightly (yes it happens!) points out in his recent NYT article, short term trials are never going to answer the question of what happens in the long term. Further, let's face it, there will no more be one perfect human diet agreed upon, than there will ever be consensus on the paleo diet.
When I lost a good chunk of weight using very low carb several years back, I felt pretty good (though frustrated with the plateauing). Several years prior, I had experienced some disconcerting side effects while following a similar diet, including a racing heart and basically feeling "skiddly" and "not right", sometimes with even mild exertion or none at all. So since I wasn't losing more weight, I went looking for reassurances that how I was eating was healthy for the long term. This is where I've gotten in trouble for my honest assessment of the only "evidence" there is out there: the advocates who claim to have been following a VLC diet for years and even decades. But there really isn't much more to go by. An essentially starch and sugar free diet with relatively high domestic animal fat and dairy fat does not emulate any, and I do mean any, traditional culture's diet.
Even the "long term" diet comparison studies are essentially useless, and this latest one pitting "paleo" vs. "low fat" was no exception. In a nutshell, the diets as implemented differed considerably from the diets as described, and there is a lot of ambiguity as to adherence to "the diets" themselves. This is nothing new. Other studies such as Dansinger, Foster, Shai, and many more have all suffered the same problems: low fat not really being low fat, reported calorie intakes not squaring with weight regain, veggie oils as margarine, etc. When celebrity followers of various diets die unexpectedly, and especially when the diet gurus themselves die of undisclosed causes, it makes for a rather scary landscape out there. We are inundated with before and after testimonials of amazing success in all things and boundless energy, only when you read the personal posts of these same people, they tell a different story. They report questionable health markers (to put it mildly) yet claim to have the answers to healthy living. It is all a bit too much. Rather than these contests in the guise of science, I'd rather see a study of 30 motivated people who will adhere (as much as can be expected) to their diets for several years while being verifiably monitored for health markers, incidence of disease, etc.
Randomized Controlled Trials
RCTs have their place. They are truly a gold standard for things like pharmaceutical research or limited dietary interventions. For example, let's use a hypothetical pain killer study involving aspirin, acetaminophen, ibuprofen and naproxen. One might be interested in comparing their effectiveness for a certain kind of pain (let's say post surgery), where the subjects rank pain levels on some sort of scale before and at some time after taking the medication.
So first, some definitions and explanations. I've probably blogged on this stuff in parts elsewhere, but heck, it's easier to just rattle it off again. RCT stands for Randomized Controlled Trial. Before we discuss the "R" and the "C", let's talk experiments and confounding variables.
Experiment: An experiment involves applying some "treatment" and measuring/assessing the effect of the treatment on some variable or variables. There is a designated primary outcome, in our pain relief study this would be reduction in some pain score, but there may be secondary outcomes as well, perhaps mood changes or cognition. Experiments differ from Observational Studies in that those only involve collecting data with no active intervention.
Confounding Variable or Confounder: In the context of an experiment, a confounder is some factor or variable other than the "treatment" that might be responsible for the observed effects. If a new heart drug is tested and an increase in heart attacks is seen, but there were a lot of smokers in the group, smoking might be a confounding variable: the smoking, and not the drug, could explain the results. Technically speaking, any factor can be a confounder, but usually when we use the term we mean the more likely ones.
Pre-Screening & Confounders: In most studies, the subjects are narrowed down to a relatively homogeneous group to avoid some obvious issues such as medications, complications of unrelated medical conditions, smoking, weight status, thyroid issues, age, etc. This helps cut down on the possible "silent confounders", but it also narrows the population for which inferences can be drawn from the results.
Controlled: Control in the context of an experiment refers to holding as many potential confounding variables as possible constant between two or more groups, so that ideally the only thing that differs between groups is the treatment. In our pain killer hypothetical, this is rather easy, as the treatments can be standardized to the same number of the same sized pills or capsules, and if the study is done in a post surgery recovery room, the environment can be kept quite consistent. The constancy of possible confounders should not be confused with compliance control. It is unfortunate that control is such a general word that it can be confused with control over implementation of the experimental protocol. This is an entirely different type of control, discussed in more depth here: A Matter of Control. I'm also not a big fan of describing a study as "uncontrolled" (see: http://scholar.google.com/scholar?q=uncontrolled+study), if only because it seems to imply useless chaos. Later in this post, I'll make the case for where such "lack of control" in the experimental sense may not be such a big deal after all, especially if "control" in the compliance sense is more stringent.
Randomizing: This indicates that the subjects are randomly assigned to at least one treatment group and one control group. In my pain killer hypothetical, these drugs have all been available OTC for quite some time, and some are advertised quite aggressively for superior pain relief. Therefore potential subjects may have a preferred drug and/or perception of the efficacy of these drugs. We wouldn't want subjects self-selecting their treatment. Random assignment to a group serves to "average out" potential confounders, even those that may not be considered to be important. I would say that height is likely not a factor in pain killer efficacy, but you never know. Rather than taking any chance on the investigator assigning taller people to one treatment on some subconscious level, if subjects are randomly assigned, the height of the groups should ... well ... "average out" to be roughly equal. With smaller sample sizes, "pure" randomizing may not be the best assurance of equal distribution, something that I've pointed out when discussing some studies where baseline characteristics differ between groups.
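The "averaging out" point, and the small-sample caveat, can be illustrated with a quick simulation. This is my own sketch, not from any study: subjects' heights are drawn from one population, randomly split into two groups, and we look at the leftover baseline gap at different group sizes.

```python
# A minimal sketch (illustration only): simulate random assignment to show
# how a baseline covariate like height "averages out" between two groups,
# and how well that works at different sample sizes.
import random
import statistics

random.seed(1)

def mean_height_gap(n_per_group):
    # Draw heights (cm) for all subjects from the same population ...
    heights = [random.gauss(170, 10) for _ in range(2 * n_per_group)]
    random.shuffle(heights)            # ... then randomly assign to groups
    group_a = heights[:n_per_group]
    group_b = heights[n_per_group:]
    return abs(statistics.mean(group_a) - statistics.mean(group_b))

# Average baseline imbalance across many simulated trials:
results = {}
for n in (10, 40, 400):
    results[n] = statistics.mean(mean_height_gap(n) for _ in range(1000))
    print(f"n = {n:3d} per group: mean baseline height gap ~ {results[n]:.2f} cm")
```

The leftover imbalance shrinks as groups get bigger, which is exactly why baseline differences in small randomized studies deserve scrutiny.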
Blinding: Blinding refers to knowledge on the part of the subject and/or investigators as to the treatment each subject receives. In a single blind study, the subjects do not know which treatment they receive; in a double blind study, neither subjects nor investigators know. The former is sufficient in cases where objective measures are involved: for example, a cholesterol lowering drug where cholesterol is measured. Cholesterol levels are what they are; there's no "I think" or "I feel" involved. If you have a subjective measure, double blinding is essentially required, and failure to do so renders results suspect. Since pain is a subjective measure (see here for example), the potential for bias on the part of the patient and the person administering the test is high.
The Hypothetical Pain Killer RCT
Recruit potential subjects at a clinic that does minor surgery. OK ... let's say bunions. Crap ... I just checked ... that can be more painful than I thought. Run with me on this one and let's presume this is a routine outpatient deal. I could prescreen patients at a clinic for eligibility according to pre-surgical medical records (however that would work with access, etc.), excluding those with known circulation issues, smokers, and those with diabetic neuropathy, to name a few. We might exclude those on medications entirely, or at least those whose medication schedule would be altered for surgery. This pool would then be offered free surgery in exchange for participation. Within ethical guidelines, they would know in advance that they would experience some post-op pain at baseline and again during monitoring, and the experts could dither over what level/duration of pain they would be allowed to opt out at. This probably selects a more pain-tolerant group, but they will all be so predisposed.
So we'd have 4 study groups receiving different meds and a control group receiving a placebo pill. Let's say 40 subjects in each group. Assignment would be random. The surgery would be performed, and some fixed time afterwards, when the local had worn off, the subjects' baseline pain would be evaluated. After that, they would receive treatment and be monitored for pain levels every two hours for the next 12. This would occur in a recovery room where the subject had a comfy recliner and access to a ton of entertainment, or could bring stuff from home. Assistance using the bathroom, meals provided and all that jazz. The dose/frequency would be established for each drug per "standard care" (e.g. if one is long acting and another short), and placebo pills would be added to the other regimens so that all subjects took identical appearing pills on the same schedule. A system of "packs" could be worked out so the person assessing pain could also administer the meds "blinded".
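The randomization-plus-"packs" scheme can be sketched in code. Everything here is hypothetical: the dose hours, group size, and dosing frequencies are made up for illustration. The point is just that padding every subject's schedule with placebo pills makes all regimens look identical from the outside.

```python
# Hypothetical sketch of the "packs" idea. Arm names, dose hours and group
# sizes are invented for illustration; only the unblinded key would know
# which pills in a pack are active.
import random

ARMS = ["aspirin", "acetaminophen", "ibuprofen", "naproxen", "placebo"]
DOSE_HOURS = {            # assumed active-dose times (hours post-baseline)
    "aspirin": [0, 4, 8],          # shorter acting: more frequent dosing
    "acetaminophen": [0, 6],
    "ibuprofen": [0, 6],
    "naproxen": [0, 12],           # longer acting: fewer doses
    "placebo": [],
}
ALL_HOURS = [0, 4, 6, 8, 12]       # union of every arm's dose times

def assign_and_pack(n_per_arm, seed=0):
    rng = random.Random(seed)
    subjects = list(range(n_per_arm * len(ARMS)))
    rng.shuffle(subjects)                   # random assignment to arms
    packs = {}
    for i, subject in enumerate(subjects):
        arm = ARMS[i // n_per_arm]
        # At every scheduled hour the subject takes *something*; placebo
        # fills in wherever that arm has no active dose.
        packs[subject] = {h: (arm if h in DOSE_HOURS[arm] else "placebo")
                          for h in ALL_HOURS}
    return packs

packs = assign_and_pack(40)
# Every subject's schedule covers the same hours, so all packs look alike.
assert all(sorted(p) == ALL_HOURS for p in packs.values())
```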
This is a pretty perfect type of experiment for the blinded RCT. Pretty easy, with a little creativity, to standardize things such that all other factors are "averaged" out of the equation and we just measure the effects of the drugs themselves. It's done all the time. It's a great model ... one might even say "gold standard". It can accommodate a few challenges.
Limitations for the Nutritional RCT
Let's cut to the chase and then backtrack here. The RCT model is of very limited usefulness in the context of nutrition. Food is not a pill, or an injection. Varying diet is not like 100 minutes in a dark room with The Ramones vs. Miley Cyrus blaring nonstop. There are too many other variables that get in the way most of the time.
A successful RCT with meaningful results would require a metabolic ward setting. In addition, at the very least we're talking Nutrisystem style meal delivery, with some attempt to match the tastes, textures and visual appeal of the meals between diets. This is rather impossible if we're talking Atkins induction vs. Ma Pi. In terms of diets per se, it seems relatively futile to randomize people to different extremes, as you are dooming some to failure no matter what. So what if either of the aforementioned diets can lead to miraculous results, when neither is a realistic plan for a person to sustain for the long term?
The RCT seems a more effective model for shorter term evaluation of particular dietary components. So, for example, the type of fat can be easily manipulated by using shakes, muffins or even working it in as supplement capsules. Ditto protein or sugars or different carbohydrate sources. Even drastic variations in macros can be accommodated by using shakes. The results of these studies are of limited value translated to any general population in free-living conditions, however. They are good for teasing out the sort-of quality vs. quantity questions.
The Paleo vs. Nordic Nutrition Recs Single-blind RCT
After baseline measurements, the women were randomized to a PD or a Nordic Nutrition Recommendations (NNR) diet for 24 months.
All study personnel (except the dieticians) were blinded to the dietary allocation of the participants.
Both diets were consumed ad libitum.
Each group took part in a total of 12 group sessions held by a trained study dietician (one dietician per diet) throughout the 24-month study period. The group sessions consisted of information on and cooking of the intervention diets, dietary effects on health, behavioral changes and group discussions. The subjects were given recipes and written instructions to facilitate the preparation of meals at home. Eight group sessions (four cooking classes and four follow-up sessions) were held during the first 6 months of the intervention. Additional group meetings were held at 9, 12, 18 and 24 months.
As a result, here was the compliance ...
[compliance table; values in ( ) are negative]
I believe this study is a perfect example of the futility of trying to fit dietary research into the blinded RCT model just for the sake of it.
Blinding - What's the Point? Think about this. You sign up for a study. You will be prescribed some diet to follow for two years. Do you think that in a free living situation, where you are selecting and preparing your own foods, you won't eventually realize what diet you're following? Never mind if you're the least bit curious ... In the end, how can you blind a diet? Unless you were to feed these people Nutrisystem style 24/7/365, it seems ludicrous to even bother.
Randomizing - Good Idea if it's a Competition! I remember listening to an interview with Dr. Dansinger, who did the first dietary RCT. It was a novel idea, and he discussed the challenges of randomizing people to different diets. The problem with all of those studies is that they were set up more as contests than anything else: which one produced more weight loss, which one lowered LDL the most, which one improved blood sugar best, etc. You'd think by the third one or so these researchers might catch on. Over and over (and over and over) again, tucked in the discussion somewhere, the following seems to always hold: adherence is the best predictor of success. The diet is irrelevant, and for most, losing a degree of excess weight precipitates the rest of the health improvements. When someone wants to lose weight or improve health (or both) in real life, they choose their way. It's still dang hard to enact changes; most will rapidly abandon what's not working and gravitate towards what they are most motivated to sustain. I think we would get more actionable information from these sorts of trials if we didn't set things up as a competition and allowed participants to self-select their diet. Who cares what unquantifiable biases this may throw into the mix -- there are all manner of those inherent in the process anyway. If someone chooses the diet, there is a greater chance they'll adhere to the program and stick with it for the duration of the study ... in other words, provide some sort of meaningful information to the rest of us!
Control? Sure, if it's a Contest! Speaking of self-selecting, why are we having these diet contest RCTs? If the low sat fat, high fruit, moderate carb, high protein diet used in these paleo studies produced dramatically better results in 10 trials, would this change anything? No. Most will not find a dairy, grain and legume free existence sustainable. Nor would the majority prefer 150g protein per 2000 calories. Ditto the spartan use of salt, dressing, etc. I would much rather see 20 dedicated ketogenic dieters submit to thorough health monitoring for a period of years to see what effect the diet has on biomarkers and metabolism than another study like this. But it would be uncontrolled!! So what, I say! We actually have proxy controls: the baseline anthropometrics of the subjects and the general population. This is the important stuff. Does the Atkins diet cause improvements or worsening of risk factors with time? What if I went raw vegan? What can I expect if I commit to such a lifestyle? Wouldn't we all much prefer that 100 vegans register for such a study, so that instead of internet anecdotes we can never confirm, shared by people with a financial interest and their entire livelihood wrapped up in advocating a particular diet, we'd have verifiable data?
Moving Forward ...
A year ago today, a NuSI-funded study got underway involving 600 people. I'm not sure the intervention part has yet started. Subjects were or will be randomized to one of two "extreme" diets:
Extremely low-fat diet: Reduces fat intake as much as possible (below 10%), with as much carbohydrate and protein intake as desired, and highly processed grains and sugars discouraged.
Extremely low-carbohydrate diet: Reduces carbohydrate intake as much as possible (below 10%), with as much fat and protein intake as desired, and highly processed grains and sugars discouraged.
The lead researcher is Dr. Gardner of Stanford who conducted one of the famous diet comparison studies: The A to Z Study. This one is popular in LC circles because, "A" stands for Atkins, and Atkins "won" the race.
Although adherence to the 4 sets of dietary guidelines varied within each treatment group and waned over time, especially for the Atkins and Ornish diets, we believe that the adherence levels obtained are a fair representation of studying the diets and variations in macronutrient intake under realistic conditions and, therefore, increase the external validity of the findings.
Ornish is supposed to be very low fat, and yet at 2 months participants were consuming over 20% fat and this increased to about 30% by 1 yr. The Atkins dieters, meanwhile, dipped below 20% carb at 2 months, but were up over 30% by study end.
Mean 12-month weight change was −4.7 kg ... for Atkins, −1.6 kg ... for Zone, −2.2 kg ... for LEARN, and −2.6 kg ... for Ornish and was significantly different for Atkins vs Zone.
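The quoted means lend themselves to a quick arithmetic sanity check; the only thing added in this sketch is a standard kilogram-to-pound conversion.

```python
# 12-month mean weight changes (kg) as quoted from the A TO Z study.
KG_TO_LB = 2.20462

changes_kg = {"Atkins": -4.7, "Zone": -1.6, "LEARN": -2.2, "Ornish": -2.6}

for diet in ("Zone", "LEARN", "Ornish"):
    gap_kg = changes_kg[diet] - changes_kg["Atkins"]  # Atkins' extra loss
    print(f"Atkins vs {diet}: {gap_kg:.1f} kg ({gap_kg * KG_TO_LB:.1f} lb)")
```

The Atkins vs. Ornish gap works out to 2.1 kg, i.e. under 5 lbs over a full year.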
To translate, the Atkins dieters had a net **statistically INsignificant** loss of 2.1 kg (< 5 lbs) more vs. Ornish at the one year mark. But you know, those studies haven't been done yet to give us the answers. According to NuSI, their new study is ...
... designed to test two very different dietary interventions in free-living subjects to elucidate the role of fat and carbohydrates on body fat and chronic disease risk factors. It expands on work by Gardner et al. reported in JAMA in 2007 – “Comparison of the Atkins, Zone, Ornish, and LEARN Diets for Change in Weight and Related Risk Factors Among Overweight Premenopausal Women” (commonly referred to as the “A TO Z” study). In this previous study, which compared four diets varying in macronutrient content, Gardner et al. reported that consumption of fat and carbohydrates converged over time toward a single common diet similar to the subjects’ pre-study diets.
By the end of the year-long study, only the subjects assigned the two diets most divergent in fat and carbohydrate (the Atkins diet and the Ornish diet) differed substantially in their intake of these macronutrients. The difference between these two groups in the A TO Z study remains one of the best examples of dietary differentiation in a free-living study, and the A TO Z study is currently the most read study on nutrition in JAMA.
So. The problem with A-to-Z was what? The prescribed diets weren't extreme enough? It doesn't appear so, though perhaps not to Gary Taubes' tastes. No, the problem was adherence -- especially in the two most extreme cases, Atkins & Ornish -- even sticking to those diets at all.
I wonder what the budget on this thing is. I wonder what better uses those dollars could have been put to. In 2016, when this study gets published, will we have more answers than we do today? Very highly doubtful.