The two most dangerous words in the English language: “Studies show”.
Or just as valid: “Statistics show”.
Or perhaps: “Surveys show”.
I’m going to bounce back and forth between a critique of studies, statistics, and surveys because in a lot of cases an evaluation of each has the same basis.
There ARE differences, though, and I’ll point them out later.
BEGIN DISCLAIMER: This is NOT a rant AGAINST studies/statistics/surveys. Rather, it is a CAUTION that these things need to be scrutinized BEFORE accepting them as any kind of indication. Too often, posters present a link as if it were “proof” positive that their argument is true. Not so fast. Yes, I have used links to these things myself (though not that often), and I’m not suggesting that SOME are without merit. But some are just flat out junk science. END DISCLAIMER.
Each reader must scrutinize these things and decide for themselves how valid they are. And if you’re posting a link to one, do that scrutiny BEFORE you post it, or else you may end up with egg on your face.
I’m not talking about the link/source itself. All that challenge does is get you embroiled in a link war that has no end. “My link is better than your link”, “My source is better than your source”, or “My expert is better than your expert.” No resolution there.
READ it and base any criticism you have on the thing ITSELF. I’ll get into “how” to read them in a bit.
If you honestly believe the thing is valid, reading it CLOSELY ahead of time will prepare you to identify any weak points and respond to the challenge. You must read it closely because your challenger will. Your challenger will miss nothing. Expect it.
There are all kinds of studies, from market studies to sociology studies, to psychological studies, to climate studies (which of course Al Gore invented), to “studies of studies” (typically called “meta-analyses”), but they all claim to be “scientific”.
Most often when we think of “studies”, it’s in the medical field.
Unless a medical study has been repeated many, many times USING THE EXACT SAME PROTOCOL/METHOD, AND HAS HAD THE SAME RESULTS, it is dubious at best, and certainly is NOT proof of anything. It may be a “groundbreaking” study (IOW, the first of its kind), but any author worth his salt will specifically say that and caution the reader against considering it as established science. Sometimes posters fail to mention this, which is one reason you need to scrutinize these things.
One poster here doesn’t even cite studies, but rather argues a point on its own merits (our member from Central California who runs a trucking business). Challengers often criticize that method as not being “supported”, and use that as a lame excuse to avoid addressing the argument itself. They just throw out the argument on the flawed notion that a valid argument HAS to be accompanied by a link. But presenting a clean argument without a link is just as valid, and AFAIC is sometimes MORE valid.
A heavy reliance on “studies” leads to a dizzying argument that often steals the focus from the whole point itself.
“As every divided kingdom falls, so every mind divided between many studies confounds and saps itself.” – Leonardo da Vinci
It is necessary to take a long, hard look at a study before forming an opinion, but many (especially Liberals) will take a single casually examined study or poll and run with it, using it as “support” for their position. What you need to probe are elements like the design and protocol of the thing, whether or not those EXACT elements have been duplicated elsewhere with the same results, population size, population makeup, wording of conclusions, author caveats, publishing journal, peer comments, research sponsorship, etc.
Of course, you need to decide BEFORE you jump into this if it’s even worth the considerable amount of time you’ll be putting into it. Sometimes it’s not.
Now, the jargon used in a study is most often obscure and confusing. So, do you need a PhD in that field to understand it? Most emphatically, NO. A simple understanding of the English language, sometimes help from a quick Google search, common sense, and a good dose of patience and due diligence will allow you to decipher the lingo well enough to get the big picture.
For example, instead of a simple plain English, “We turned the lights on“, the tightly wound researcher might write it in the study as, “We turned the artificial illumination mode selector switch to the O-N position.” Tedious and sometimes pompous? Yes. But in defense of study authors, they are writing for their peers, not you.
Center your attention on the SUBJECT/VERB/OBJECT of a sentence. For example, in the sentence, “The development of the myelin sheath enables rapid synchronized communication across the neural systems responsible for higher order cognitive functioning”, take “development . . . . enables . . . . communication” and start with that plain English foundation. Then you can Google things like “myelin sheath” to fill in the blanks.
Unfortunately, the use of jargon allows for pseudo-intellectual authors AND posters to hide behind obfuscation, and they often DO.
CORRELATION DOES NOT IMPLY CAUSATION
“Correlations” are frequently abused and asserted as established facts. For example, let’s say a study on murder . . . “shows” . . . that people who eat blueberry muffins for breakfast commit murders more often than people who don’t eat blueberry muffins for breakfast. The study wasn’t designed to show that, though. Let’s say that the study was designed only to show that people who DIDN’T eat a certain number of calories for breakfast were more likely to become violent later in the day. But a passage in the study explains that some of the subjects were fed a breakfast of scrambled eggs, hash browns, bacon, AND BLUEBERRY MUFFINS. While the purpose of the study had absolutely nothing to do with whether murderers eat blueberry muffins or not, this little tidbit may next be seen in a New York Times headline:
“Study shows that blueberry muffins are the new ‘Breakfast of Murderers’.”
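If you want to see how easy it is to manufacture a “correlation” like that, here’s a quick toy simulation in Python. Every number in it is invented by me, and the variable names (low_calories, muffin, violent) are just hypothetical labels for the story above. A hidden third variable produces the muffin/murder link all by itself:

```python
import random

random.seed(42)

# Invented toy data: a hidden "low_calories" flag (the study's real subject)
# drives violence, and ALSO happens to drive who got served a muffin.
rows = []
for _ in range(10_000):
    low_calories = random.random() < 0.5                       # hidden confounder
    muffin = random.random() < (0.7 if low_calories else 0.2)
    violent = random.random() < (0.3 if low_calories else 0.05)
    rows.append((muffin, violent))

def violence_rate(ate_muffin):
    group = [v for m, v in rows if m == ate_muffin]
    return sum(group) / len(group)

print(f"violence rate among muffin eaters:  {violence_rate(True):.3f}")
print(f"violence rate among non-eaters:     {violence_rate(False):.3f}")
# The muffin eaters LOOK more violent, yet muffins cause nothing here --
# the hidden third variable manufactures the entire correlation.
```

Neither trait touches the other anywhere in that code, yet the rates come out lopsided. That’s all a headline writer needs.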
Some years ago, newspapers ran a sensationalist headline that said, “Harvard study shows that taking estrogen for hot flashes causes cancer.”
And, the study DID indeed draw that conclusion. Except the print media neglected to add that the study also pointed out that this was only for younger women who hadn’t had a hysterectomy.
Gynecologists were flooded with older hysterectomy patients who had read the article and insisted on being taken off of estrogen therapy because the article “said” they would get cancer (reinforced, of course, by their notion that a “Harvard Study” CAN’T be wrong . . . and the papers made “Harvard Study” prominent.) All that sensationalist headline did was create a lot of women who suffered hot flashes needlessly . . . and sold a lot of papers.
Sensationalism sells papers, and generates hits on web pages, but unless you read the study, you don’t have all the facts.
Researchers will frequently have a statement, often with the flavor of an afterthought, that says something like, “Further study controlling for some correlations needs to be done.” And in that same study, they’ll draw a conclusion that says something like this, “We have shown IN THIS STUDY that since there’s a correlation between . . . blah, blah, blah . . . that . . . blah, blah, blah . . . is the likely cause of . . . blah, blah, blah.”
Somewhere in that study should be a mathematical expression of the “significance” of that correlation. Sometimes it’s just simply stated in the conclusion as a “significant correlation”, which is pretty vague and doesn’t tell you much unless the study adhered to a strict scientific definition of “significance”. Some do and some don’t. That’s why you need to scrutinize the study.
For example, the size of the population studied has a lot to do with the mathematical expression of significance. The larger the population, the more RELIABLE the significance number, and vice versa. You can’t just take the study’s conclusion as fact.
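To see what I mean about population size, here’s a little Python sketch. It uses the Fisher z-transform as a rough normal approximation (a simplification, not the full machinery a statistician would use). The SAME correlation of 0.30 gets a wildly different “significance” number depending on how many subjects were in the study:

```python
import math

def corr_pvalue(r, n):
    """Rough two-sided p-value for a correlation r observed in n subjects,
    via the Fisher z-transform (a normal approximation -- a sketch only)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# The SAME correlation looks very different at different population sizes:
for n in (10, 50, 500):
    print(f"r = 0.30 with n = {n:3d} subjects  ->  p = {corr_pvalue(0.30, n):.4f}")
```

With 10 subjects, that 0.30 correlation is statistically nothing. With 500, it looks rock solid. Same number, very different reliability.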
THE FAMOUS “P-VALUE”
I’m not going to go into the jargon of mathematical representations of statistics . . . indeed, a poster who posts something like “P(θ | Y), which tells us what is known about θ given knowledge of the data, is called the posterior distribution of θ given Y, or the distribution of θ a posteriori. The quantity k is merely a ‘normalizing’ constant necessary to ensure that the posterior distribution P(θ | Y) integrates or sums to one” immediately raises a red flag for me.
That person is either a pseudo-intellectual impressed with him/herself and trying to impress you with his/her towering intellect, or is counting on obfuscation to fool you, bore you, or otherwise get you to throw your hands up in surrender and think, “I don’t understand a thing he/she is saying. It’s way over my head, so this person must know more than I do, so he/she is probably right . . . in any case, I can’t argue.”
That’s precisely the reaction they’re looking for, and most often get.
So let’s simplify a study and put it into plain English. Of course, by doing that, there are unavoidably going to be some generalizations that can be attacked by that same pseudo-intellectual as “technically” inaccurate. I’m not writing this for “technical” review, but rather to enable you to get your arms around the concept . . . which is exactly what this pseudo-intellectual DOESN’T want. He/she is the custodian of the tree of forbidden knowledge, and only he/she can know this stuff.
B.S.!!! You don’t even need to climb that tree. Just cut it down.
Again, you don’t need to be a nerd to get the general idea.
Let’s go back to that notion that a study may reveal that an event is related SIGNIFICANTLY to another event. Remember when I said, “Somewhere in that study should be a mathematical expression of the ‘significance’ of that correlation. Sometimes it’s just simply stated in the conclusion as a ‘significant correlation’, which is pretty vague and doesn’t tell you much unless the study adhered to a strict scientific definition of ‘significance’. Some do and some don’t”?
Well, the “p-value” is often presented as that “mathematical expression“. But what they don’t tell you is that this “p-value” only pertains to what researchers call the “null hypothesis”. That “null hypothesis” is another phrase shrouded in mystery for all but those that deal with it regularly.
So, let’s first put this “null hypothesis” stuff into plain English.
What is a “hypothesis”? It is simply a provisionally adopted supposition used to explain certain observations, and to guide in the investigation of others . . . hence, it’s frequently called a “working hypothesis”. It is a concept that is not yet verified but that if true might explain certain phenomena . . . or might not.
The hypothesis that chance alone is responsible for the results is called the “null hypothesis”.
For example . . . a certain drug may reduce the chance of having a heart attack. Possible null hypotheses are “this drug does not reduce the chances of having a heart attack” or “this drug has no effect on the chances of having a heart attack“. The test of the hypothesis consists of administering the drug to half of the people in a study group as a controlled experiment. If the data show a statistically significant change (measured as the “p-value”) in the people receiving the drug, the null hypothesis is rejected.
Notice in this example that the p-value says absolutely nothing about alternate hypotheses, only that there is . . . SOME significant relationship between this drug and not having a heart attack (that is, if the p-value is less than 0.05 . . . an arbitrarily chosen number that is called the “significance level” . . . and I’ll get back to that in a bit.) The p-value says that it’s not chance, BUT SAYS NOTHING ELSE. The p-value ONLY separates the results from statistical background noise.
Rejecting the hypothesis that a large paw print originated from a bear does not immediately prove the existence of Bigfoot.
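To make that drug example concrete, here’s a rough Python sketch of the standard two-proportion z-test (the trial numbers are completely made up by me for illustration). Notice that all the p-value does is reject “same rate in both groups” . . . it says nothing about WHY the rates differ:

```python
import math

def two_proportion_pvalue(x1, n1, x2, n2):
    """Two-sided p-value for the null hypothesis 'both groups have the
    same underlying rate', via the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Invented trial numbers: 30/500 heart attacks on placebo, 15/500 on the drug.
p = two_proportion_pvalue(30, 500, 15, 500)
print(f"p = {p:.4f}")
print("reject the null at 0.05" if p < 0.05 else "fail to reject the null")
```

That’s the entire output of the exercise: “probably not chance”. Whether it was the drug, the diet, or the blueberry muffins is a separate question the p-value never touches.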
OK, let’s get back to statistical evaluations of results for a second.
There are two methods of statistical evaluation, and they are very different from each other.
A fellow by the name of Ronald Fisher developed one method and that method is accepted in most scientific studies . . . and p-values are an integral part of what Mrs. Fisher’s son theorized. OTOH, a fellow by the name of Thomas Bayes developed another method . . . called Bayesian Statistics. The p-value is NOT an integral part of Bayesian Statistics.
Fisher was a neo-Darwinian; Richard Dawkins called him “the greatest biologist since Darwin”. Fisher was a proponent of evolutionary biology and, like Margaret Sanger, of Eugenics (he was instrumental in forming the University of Cambridge Eugenics Society, along with Horace Darwin, the son of Charles Darwin).
In the 1920s he wrote a book titled “Statistical Methods for Research Workers”, and in the 1930s he wrote a book titled “The Design of Experiments”. Both books have since become standard works used by many of today’s researchers and universities.
He was a strong opponent of Bayesian Statistics.
Thomas Bayes developed his methods in the 18th century, and they were later refined and extended by the mathematician Pierre-Simon Laplace.
My point is not to pit political philosophies against each other as the basis for selecting statistical methods (though it may be revealing that universities . . . typically dominated by liberals . . . generally use Fisher’s methods, though Bayesian Statistics is now rising from the ashes at some universities to challenge Fisher’s methods), but rather to highlight that the p-value is not all it’s trumped up to be.
(Of universities, Thomas Sowell said, “Few professors would dare to publish research or teach a course debunking the claims made in various ethnic, gender, or other ‘studies’ courses.“)
The p-value is frequently misunderstood and often presented by that pseudo-intellectual as “proof” that an argument is true. While it can be useful, particularly when it comes to rejecting or accepting the null hypothesis, in interpreting p-values one must also know other elements in the study, like the sample size.
For example, let’s take the coin toss. If I toss a coin 50 times, there will be a p-value. If I toss a coin a hundred times, there will be a p-value. But the reliability of the p-value for the 50 coin toss will not be as great as the reliability of the p-value for the hundred coin toss. (So says BAYESIAN STATISTICS, not necessarily Fisher’s brand.)
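Here’s the coin toss worked out in Python (a sketch I put together for illustration), using the textbook exact binomial test: sum the probability of every outcome at least as unlikely as the one you observed. Same 60% heads, very different p-values:

```python
from math import comb

def binom_two_sided_p(heads, n):
    """Exact two-sided binomial p-value against the null 'the coin is fair':
    sum the probability of every outcome at least as unlikely as the one seen."""
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    return sum(p for p in pmf if p <= pmf[heads])

# Same 60% heads rate, different numbers of tosses:
print(f"30 of  50 heads:  p = {binom_two_sided_p(30, 50):.3f}")
print(f"60 of 100 heads:  p = {binom_two_sided_p(60, 100):.3f}")
```

Doubling the tosses shrinks the p-value dramatically, even though the coin behaved exactly the same way in both experiments. The number of tosses matters as much as the result.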
Let me put it another way, and at the same time address what I said I’d come back to when I referred to the 0.05 value as being arbitrary. The generally accepted p-value threshold is 0.05 . . . by those who favor Fisher’s method, that is, but that number is designated by the researcher him/herself. Over 25% of studies using that p-value threshold fail to be duplicated. Some have suggested that a p-value threshold of 0.005 would reduce those false positives. “Very few studies that fail to replicate are based on P values of 0.005 or smaller,” says researcher Valen Johnson of Texas A&M University.
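You can watch those false positives pile up yourself with a little Python simulation (my own toy setup, with invented numbers): run 2,000 “studies” where the null hypothesis is TRUE by construction, and count how many clear the 0.05 bar versus the 0.005 bar anyway:

```python
import math
import random

random.seed(1)

def null_study_pvalue(n=50):
    """One hypothetical 'study' in which the null is TRUE: two groups drawn
    from the SAME distribution, compared with a simple z-test on the means."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

pvals = [null_study_pvalue() for _ in range(2000)]
false_positives = {cut: sum(p < cut for p in pvals) for cut in (0.05, 0.005)}
for cut, hits in false_positives.items():
    print(f"threshold {cut}: {hits} 'significant' findings out of 2000, all false")
```

Roughly one in twenty of these pure-chance studies “passes” at 0.05, and far fewer at 0.005, which is exactly the argument for the stricter threshold.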
So, at the end of the day, don’t buy into the p-value argument ALONE . . . there are quite a few other factors that must be considered IN ADDITION TO THE P-VALUE.
One disadvantage to the modern obsessive attention on p-values is the emphasis it places on statistical significance TO THE EXCLUSION OF CONFIRMATION BY REPEATED EXPERIMENTS. Often, a paper will be perceived as not needing duplication if it “passes” the p-value threshold.
Statistical hypothesis testing is misunderstood, overused and misused.
Given the problems of statistical induction, one must finally rely, as have the older sciences, on replication . . . but that’s not the trend. The alternative to significance testing is repeated testing, but that requires funding and resources that often are not forthcoming.
This is often seen as one of the two essential elements of a foundation that renders “proof” that a paper is factual. The other element of course is “p-value”. If a paper both passes the threshold of p-value AND peer review, it is often presented as “fact”.
Again, not so fast.
“Peer review” is seen as the Gold Standard (or sacred cow, if you will) of publishing. I wouldn’t disagree that it is something to be considered, and it is something that I always want to see (more on that in a bit), just like I want to see what p-value was calculated (AND what the null hypothesis was specifically), but neither peer review nor p-value represents the WHOLE picture.
Pseudo-intellectuals often hang their hat on p-value and peer review, as though that was all that was necessary to validate the study. Not so fast, again.
Let’s look at this “peer review” business a little closer than just the buzz phrase that the unwashed masses view as this sacred cow. I mean, when they hear that phrase . . . that’s it, period . . . no more scrutiny necessary. After all, if it’s been reviewed and “approved” by scientists that are much more knowledgeable in that field than me . . . who am I to question their judgment?
You guessed it . . . NOT SO FAST.
That is exactly why the pseudo-intellectuals shroud these things in mystery. THEY DON’T WANT YOU TO THINK YOU HAVE THE ABILITY TO SCRUTINIZE THESE THINGS.
First of all, the phrase “peer review” as it applies to studies is actually a misnomer. Those that examine studies for the accuracy and quality necessary for publishing in journals are known as “REFEREES”, and the study/paper is REFEREED, NOT PEER REVIEWED. The phrase “peer review” applies to the scrutiny necessary for FUNDING OF GRANTS.
If a scientist submits an application for a GRANT, to say, the National Science Foundation, or the National Institutes of Health, then that application is PEER REVIEWED. If a scientist submits a study/paper for publication in, say, the journal Nature, then that article submission is REFEREED.
But the phrase “peer review” has become recognized by the unwashed masses as what’s done to studies/papers in journals, so we’ll use “peer review” in that context (even though it’s not accurate . . . but I’m not going to stand on nitpicking technicalities).
Peer review became popular only in the past few decades, although it had been used going all the way back to the mid-1600s. That recent popularity is NOT based on any new notion that peer review enhances the credibility of a paper; rather, it relieves overworked journal editors of the burden of reviewing thousands of papers (see my fourth and last “TRIVIA” item below.)
Now let’s take a look at some of the trivia, flaws, and criticisms of peer review.
* TRIVIA: Watson and Crick’s breakthrough on DNA was NEVER subjected to peer review.
* TRIVIA: Many papers that have been cited in work that won Nobel Prizes were originally rejected by peer review.
* TRIVIA: Edward Jenner’s paper on vaccination for smallpox was rejected by some peer review people.
* TRIVIA: In 2013, some 10,952 papers were submitted to the journal Nature. In 1997, there were only 7,680 submissions.
* Reviewers seem biased in favor of authors from prestigious institutions (the “halo effect”). In a study in which papers that had been published in journals by authors from prestigious institutions were retyped and resubmitted with a non-prestigious affiliation indicated for the author, not only did peer reviewers mostly fail to recognize these previously published papers in their field, they recommended rejection.
* The chairman of the investigating committee of the Royal Society told a British newspaper in 2003, “We are all aware that some referees’ reports are not worth the paper they are written on. It’s also hard for a journal editor when reports come back that are contradictory, and it’s often down to a question of a value judgment whether something is published or not.”
* He also pointed out that peer review has been criticized for being used by the scientific establishment “to prevent unorthodox ideas, methods, and views, regardless of their merit“.
* In one study, researchers deliberately inserted errors into a manuscript, and referees did NOT detect some of them.
* The deputy editor of the Journal of the American Medical Association once said, “There seems to be no study too fragmented, no hypothesis too trivial, no literature too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”
* The editor of the British medical journal The Lancet once said: “The mistake, of course, is to have thought that peer review was any more than just a crude means of discovering the acceptability . . . not the validity . . . of a new finding. Editors and scientists alike insist on the pivotal importance of peer review. We portray peer review to the public as a quasi-sacred process that helps to make science our most objective truth teller. But we know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong.”
* Competitors are often chosen as peer reviewers. Might a competitor be inclined to unfavorably review a submission and then steal the idea for him/her self? The irresistible opportunity to put a spoke in a rival’s wheel?
* Peer review in journals assumes that the article reviewed has been honestly prepared, and the process is not designed to detect fraud. It assumes ALL scientists have integrity, IOW are not subject to human flaws. A peer reviewer must preserve scholarly integrity by rising above the three deadly sins of intellectual life: envy, favoritism, and the temptation to plagiarize.
Peer review is under reconsideration even within the heart of establishment scientific publishing.
But the most damaging criticism of peer review may be that which is exemplified by the cloning hoax of Hwang Woo Suk. (http://news.bbc.co.uk/2/hi/uk_news/magazine/5181008.stm)
Hwang submitted a paper to the journal Science which was later found to be hugely fraudulent. Of course, it passed peer review. It could NOT have been duplicated simply because the results were totally fabricated. In this case, DUPLICATION, NOT peer review, would have uncovered the hoax. To paraphrase what I said earlier, one disadvantage to the modern obsessive attention on PEER REVIEW is the emphasis it places on that peer review TO THE EXCLUSION OF CONFIRMATION BY REPEATED EXPERIMENTS. Often, a paper will be perceived as not needing duplication if it “passes” the PEER REVIEW threshold.
Of course, no one should expect a perfect system, or condemn peer review as a whole for its occasional failures, and I’m not doing that. Peer review is like democracy was to Churchill: “the worst form of government except all those other forms that have been tried from time to time.”
What I’m pointing out is, much like p-values, it ain’t all it’s trumped up to be. It forms ONLY part of the picture, and to hang your hat on it shows me that your analysis of a study is extremely flawed.
Oh . . . I almost forgot. I had said earlier that peer review was something I would like to see. Most journals maintain peer reviewers in anonymity, and the identity of a peer reviewer is a closely guarded secret, generally held ONLY by the journal chief editor. Peer reviewer identities are not normally published (there are exceptions). Consequently, one CANNOT normally see who a peer reviewer was, whether or not he/she is a competitor, and perhaps more importantly, WHERE he/she draws financial support from. The only thing you DO see is that the article got published, which means it passed peer review.
The anonymity of peer reviewers contributes to the “Oz-behind-the-curtain” effect: Reviewers that work anonymously have a greater opportunity to act arbitrarily. The REVIEWEE has no comparable curtain to stand behind. Basically, the REVIEWER can take potshots at the REVIEWEE with NO accountability.
At least one chart is likely to appear in a study . . . sometimes more than one. Very rarely is it simple, especially if it’s a line chart. Sometimes line charts even show three axes, x, y, and z, so that the representation of data points is three-dimensional.
If the chart meaning is not immediately apparent to you, move on to the text and come back later. Your goal is to just get your arms around the big picture, and frustration (which is what you will experience looking at some charts) can be a roadblock to that. Focus on that subject/verb/object stuff.
I said earlier that I would speak to the difference between STUDIES and POLLS/STATISTICS/SURVEYS. Here it is: Pollsters present RAW NUMBERS and fastidiously refrain from interpreting results while Study authors DO interpret results.
In a poll, interpretation is left to the talking heads in the media, and the reader.
Yes, too often they’re seen as certainties, and not the guesses they really are. Pollsters DON’T draw conclusions, they just sell their polls to the media talking heads. The talking heads then say something like, “Polls/Statistics show . . .” and then portray their conclusion as a fact.
Disraeli and Twain nailed it.
“There are three kinds of lies: lies, damned lies, and statistics.” – Benjamin Disraeli
“Facts are stubborn, but statistics are more pliable.” – Mark Twain
The only place I know of where statistics are of ANY value is sports.
Any evaluation of a poll/statistic/survey has to consider the manner in which questions were asked.
Does the media know how the question was asked exactly (more on that in a bit)? Noooooo.
Do they know if the respondents were being truthful? Nooooo.
The pollster’s “margin of error” is supposed to take care of the “truthful” part, but I think it’s more of a marketing tactic. I mean, who’s going to buy a poll where the margin of error is plus or minus, say, 50%? And the point of the whole thing is to SELL IT.
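For what it’s worth, here’s the textbook margin-of-error formula in Python (the sample sizes are invented by me). Notice that it’s pure sampling arithmetic . . . nothing in it measures whether respondents told the truth:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Standard 95% margin of error for a polled proportion; p = 0.5 is the
    worst case that pollsters usually quote. Pure sampling math."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(f"poll of {n:4d} people  ->  +/- {margin_of_error(n) * 100:.1f} points")
```

The formula only says how much random sampling could move the number. Lying respondents, loaded questions, and cherry-picked samples are all outside it, which is my point.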
Pollsters are very good at phrasing questions that will get the results necessary to SELL their polls . . . like the oft-used example, “Are you still beating your wife?”, with the result being maybe “39 percent say they beat their wives.” A poll like that can be honestly represented as accurate, but unless you know how the question was phrased, you can’t really evaluate it.
And pollsters are not compelled to reveal how the question was phrased. In the example above, they can say, “We asked 1000 Americans if they were beating their wife“. They can leave out the word “still” and yet consider that the question was honestly given to the media/reader . . . that’s what I mean when I say “Does the media know how the question was asked exactly?“.
Now, major polling organizations are not going to risk their reputations by inserting a bias . . . at least not one that is easily detected by the unwashed masses . . . which pretty much allows ANY poll except the most blatantly biased.
Finally, if these conclusions are indeed objective facts anyway, then we had a lot of contradictory “facts” election eve, when the conclusions the talking heads drew were all over the map . . . some concluded from the surveys that Romney would win (that would be the “Karl Rove school of poll interpretation”), and some concluded from the surveys that Romney would NOT win (sometimes the very same surveys were used to conclude just the opposite.)
That alone suggests to me that conclusions drawn from surveys are NOT necessarily . . . FACTS.
So when poll results are presented, much like Google results, I don’t consider them credible until scrutinized.
As I said in the opening, the point of all this is that SCRUTINY is necessary before a Study/Poll/Statistic/Survey can even be established as support for an argument. And YOU DO have the capability to evaluate these things. Don’t let that pseudo-intellectual “towering intellect” slip one through by making you throw your hands up in frustration. That’s exactly what they are hoping for.
(BTW, I’ve used the word “pseudo-intellectual” a lot here. What I mean by that is the individual who thinks they have everything figured out, including, but not limited to: life, religion, politics, education, and child rearing. Ironically, they have never quite experienced any of the aforementioned. It is someone who acts pretentiously and wishes to win an argument or impress, rather than modestly trying to find the truth . . . a focus on rhetoric over content. These people often show a superficial understanding of a subject. We have a few here at RO. They exhibit pre-pubescent behavior and frequently post links to studies, and are actually a dull-witted lot . . . universities are full of ‘em.)
NOTE: I’ve been holding on to my sig line: “Latest survey shows that 3 out of 4 people make up 75% of the world’s population.” for this very blog.