Statistics ~ Sampling vs. Non-sampling Error

Thought I would post this so that I can link to it if/when I reference this in other posts.

There are two types of error in statistical studies. The first is inherent in the statistical process itself, and it is called sampling error. Unless a census is done (i.e., data is collected from every member or item in a population), selecting a sample introduces some error into the process. This is often described as error due to natural fluctuations in a random selection process. If you have 10,000 people, half male and half female, each person you pick has a 50% chance of being male and a 50% chance of being female. If you select 10 people from this group, however, there is also a reasonable probability of selecting, say, 7 males and 3 females. This is sampling error, and we can quantify it. If we select 100 people instead, the probability of getting a 70/30 split (a gender ratio dramatically different from the population's) is very small, but again this is quantifiable sampling error. When you see poll results with a +/-3% margin of error, they are reporting a measure of sampling error that depends solely on the sample size and the results. Sampling error decreases as sample size increases. In my gender example, as the sample size increases, samples deviating from the 50/50 split become less and less probable.
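For anyone who wants to check the gender example, here is a minimal Python sketch. It treats each pick as an independent 50/50 draw (a binomial model), which is a close approximation when sampling 10 or 100 people out of 10,000, and it uses the common normal-approximation formula for a 95% margin of error (z of about 1.96, worst case p = 0.5). The function names `prob_at_least` and `margin_of_error` are just made up for illustration.

```python
from math import comb, sqrt


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact binomial probability of k or more 'successes' (here, males) in n draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)


# Chance that a random sample is at least 70% male when the population is 50/50.
print(f"n=10:  P(7 or more males)  = {prob_at_least(7, 10):.3f}")
print(f"n=100: P(70 or more males) = {prob_at_least(70, 100):.2e}")

# The margin of error shrinks with the square root of the sample size.
for n in (100, 1067, 10000):
    print(f"n={n:>5}: +/-{100 * margin_of_error(n):.1f}%")
```

Under these assumptions a sample of about 1,067 people works out to roughly +/-3%, in the neighborhood of the figure you often see quoted for polls of around a thousand respondents. Because of the square root, quadrupling the sample size only halves the margin of error.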

Every other form of error in a study is lumped under the term "non-sampling" error. We can't put a number on this type of error; we can only make judgment calls. For example, an internet poll on AOL may not be representative of the country as a whole. One can do the stats on the number of respondents and the results and report a margin of error, but this "nuts and bolts" analysis tells us little about how accurately the poll represents the public. Statistically speaking, these polls will have low margins of error (they tend to draw many respondents), but they are usually taken with a grain of salt because AOL users are probably not representative of the public at large.
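A small simulation makes the AOL point concrete. This is a sketch with made-up numbers: a hypothetical population of 100,000 in which 50% hold some opinion overall, and a hypothetical "online" subgroup (20% of people) in which 70% hold it. A poll drawn only from the online subgroup reports a margin of error about as tight as a true random sample of the same size, yet its estimate lands far from the truth, because the margin of error only measures sampling error.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of 100,000 people. The first 20,000 are "online" users
# who hold the opinion 70% of the time; the rest hold it 45% of the time, so the
# expected overall share is 0.2 * 0.70 + 0.8 * 0.45 = 0.50.
population = []
for i in range(100_000):
    online = i < 20_000
    holds_opinion = random.random() < (0.70 if online else 0.45)
    population.append((online, holds_opinion))


def estimate(sample, z=1.96):
    """Return the sample proportion and its approximate 95% margin of error."""
    p = sum(holds for _, holds in sample) / len(sample)
    return p, z * sqrt(p * (1 - p) / len(sample))


true_share = sum(holds for _, holds in population) / len(population)
online_users = [person for person in population if person[0]]

random_sample = random.sample(population, 2000)  # every person equally likely
online_poll = random.sample(online_users, 2000)  # only online users can respond

for label, sample in (("random sample", random_sample), ("online-only poll", online_poll)):
    p, moe = estimate(sample)
    print(f"{label:>16}: {100 * p:.1f}% +/- {100 * moe:.1f}%  (true share {100 * true_share:.1f}%)")
```

Making the online poll bigger would shrink its margin of error further without moving its estimate any closer to the true share; only changing who gets sampled fixes that.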

In clinical trials and long-term studies, this is where attrition becomes crucial. First, attrition leads to smaller sample sizes, so differences between groups have to be larger to clear the statistical "bar" of significance. But attrition can also introduce that non-quantifiable non-sampling error. For example, suppose a weight loss study has more no-shows in Group A than in Group B, and according to the statistics A lost a statistically significantly greater amount of weight than B. Does this result demonstrate that treatment A is more effective? By the numbers, yes, but we need to remember that a higher attrition rate is usually associated with a program that is more difficult to adhere to. This has a two-fold effect on the results: non-adherents (who may well gain weight) are removed from the sample and the weight calculations, and the "survivors" are likely to be a more motivated, compliant group. Differing attrition rates (and/or missing data points from missed evaluations), even if the differences aren't statistically significant, may well introduce non-sampling error into the study.
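Here is a rough simulation of the weight loss scenario, again with invented numbers. Both hypothetical arms get the same true effect (an average change of -3 kg, SD 4 kg), and the only difference is who drops out: in Group A, people who would have gained weight drop out 80% of the time, while everyone else in both groups drops out 10% of the time. Analyzing only the completers makes A look better even though the two treatments are identical by construction. The "everyone" figures are only available because this is a simulation; in a real study the dropouts' outcomes are exactly what's missing.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is reproducible


def simulate_arm(n, drop_if_gaining, drop_otherwise):
    """Simulate one hypothetical study arm.

    Returns (outcomes for everyone randomized, outcomes for completers only).
    Outcomes are weight changes in kg; negative means weight was lost.
    """
    # Same true effect in every arm: mean change -3 kg, SD 4 kg (made-up values).
    outcomes = [random.gauss(-3.0, 4.0) for _ in range(n)]
    completers = []
    for change in outcomes:
        drop_prob = drop_if_gaining if change > 0 else drop_otherwise
        if random.random() >= drop_prob:  # this participant stays in the study
            completers.append(change)
    return outcomes, completers


# Group A: harder program, so people who would gain weight tend to quit.
all_a, done_a = simulate_arm(2000, drop_if_gaining=0.8, drop_otherwise=0.1)
# Group B: easier program, dropout unrelated to outcome.
all_b, done_b = simulate_arm(2000, drop_if_gaining=0.1, drop_otherwise=0.1)

for name, everyone, done in (("A", all_a, done_a), ("B", all_b, done_b)):
    print(
        f"Group {name}: {len(done)}/{len(everyone)} completed; "
        f"completers' mean change {mean(done):+.2f} kg; "
        f"everyone's mean change {mean(everyone):+.2f} kg"
    )
```

With these settings A's completers show a noticeably larger average loss than B's, while the averages over everyone randomized come out essentially the same. No amount of extra statistical rigor on the completers' data would reveal that on its own.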

The problem here is that we can only draw inferences and make wishy-washy statements. All the good study design and statistical analysis in the world cannot overcome the problems that introduce non-sampling error. Study authors should address these issues in their discussions when they arise; failing to do so is up there with failing to address glaring confounding variables. But the discerning consumer of information should be aware that these issues are rarely addressed in depth. In the end, it is up to each of us to make our own judgment call about the ultimate significance of a finding.