Famous Carlson Study Seriously Flawed

January 4, 2011

Results of a research study that skeptics have used for more than two decades to debunk and deride astrology “violated the demands of fairness and common norms of statistical analysis,” an article in the current issue of the Journal of Scientific Exploration concludes.

What’s more, in his critique, author Suitbert Ertel of the GEM Institute of Psychology in Gottingen, Germany, argues that the study, organized by researcher Shawn Carlson and known as “A Double Blind Study of Astrology,” reached the wrong conclusions through a faulty, piecemeal analytical approach.

A statistical expert, Ertel is Professor Emeritus at Gottingen University and the author of numerous research articles. The Tenacious Mars Effect, a book he co-authored with American Kenneth Irving, defended the statistical credibility of astrological effects reported by French statistician Michel Gauquelin.

The Carlson study is widely viewed by skeptic groups as the test that successfully demonstrated the inability of professional astrologers to correctly match individual birth charts with personality profiles generated by a standard psychological personality test. The three-part study also has been cited as evidence that individuals don’t appear to know themselves all that well, and can’t distinguish between their own and other birth charts (horoscopes) interpreted by professional astrologers.

The controversial study was published in the prestigious science journal Nature in 1985 and is referred to today by more than 400 internet pages listed on the Google search engine, more than any other research paper of its kind.

Carlson claimed that failed tests involving student volunteers and professional astrologers from the San Francisco Bay Area “produced a strong case against natal astrology as practiced by reputable astrologers.” Professor Ertel reanalyzed data from the study and found this claim to be “untenable.”

Astrologers participating in the high-profile study performed much better than is generally believed, he observed.

Test Vehicles Described

In one of the three tests conducted by Carlson, professional astrologers were asked to match the birth charts of student volunteers with personality profiles generated by the California Psychological Inventory (CPI) questionnaire. Birth charts were cast for 116 volunteer students, and each volunteer was asked to complete the lengthy CPI questionnaire.

The birth charts of the volunteers were then divided among 28 participating astrologers. For comparison purposes, astrologers were given three CPI personality profiles with every birth chart. One of the profiles was a match for the chart, and the other two were randomly selected profiles belonging to other volunteers in the pool.

Simply put, the idea was to determine if the astrologers would be able to collectively match the charts with the correct CPI profiles in a statistically meaningful way. As part of the assignment, the astrologers were asked to identify which profile was the best, second-best and third-best “fit” with the natal chart, and to rate how strongly they felt about their choices on a scale from one to 10.

A similar test was done to see if student test subjects could identify their own CPI personality profiles from a group that included their own and two others. A third test was devised to see if the student volunteers would perform equally well, or badly, when the personality profiles were prepared by professional astrologers.

Effectively, the design for all three tests was the same. But student volunteers were not asked to subjectively rank or indicate how strongly they felt about their choices.

It was Carlson’s conclusion that the astrologers failed to match horoscopes with the correct profiles. He also reported that the student volunteers were unable to identify their own CPI profiles, and fared no better when the written profiles were prepared by professional astrologers.

A failed or chance result would mean that subjects only picked the correct profiles about 33 percent of the time (once in every three tries), which is roughly the result Carlson reported for all three tests.
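The chance baseline described above can be made concrete with a short calculation. The following is a minimal sketch, not the analysis used by Carlson or Ertel: it computes the expected number of correct first choices under pure guessing and the exact binomial probability of doing at least as well as a given score. The function name and the counts in the example are hypothetical and chosen only for illustration; they are not figures from the study.

```python
from math import comb


def binomial_upper_tail(hits: int, trials: int, p_chance: float) -> float:
    """Probability of at least `hits` successes in `trials` independent
    tries when each try succeeds with probability `p_chance`."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical illustration only: these counts are NOT taken from the study.
trials = 90          # assumed number of chart-to-profile matches attempted
observed_hits = 40   # assumed number of correct first choices
p_chance = 1 / 3     # one correct profile among three

expected_hits = trials * p_chance
p_value = binomial_upper_tail(observed_hits, trials, p_chance)

print(f"Expected correct by chance: {expected_hits:.1f} of {trials}")
print(f"Probability of {observed_hits} or more correct by chance: {p_value:.4f}")
```

A small probability here would suggest a score that guessing alone rarely produces, while a score near the expected value is what a "failed or chance result" looks like.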

When the study was completed in 1983, Carlson was a PhD candidate in physics at the University of California with no special training or expertise in psychological testing. Concerns about the methods used and conclusions reached quickly surfaced. For example, Carlson was criticized for asking the astrologers to do something the CPI specifically warned against: interpreting scores without any prior experience using the test vehicle in the way it was intended to be used by trained professional psychologists.

Because the CPI results in a different profile for men and women, no psychologist would attempt to interpret the CPI without knowing the gender of the test subjects. However, the astrologers later claimed that Carlson deliberately withheld this information from them. Another concern focused on publication of the research paper in Nature’s Commentary section, the only articles section of the journal that is at the editor’s discretion. Content published here is not subjected to the peer review process.

Fairness Issues Raised

Ertel’s critical assessment of the study raised fairness issues, questioned the sample size, found flaws with the methodology used, and noted the possibility that careless data handling may have tainted the results. Even more damning, he said the researcher’s “piecemeal approach to statistical analysis” contributed to calculation errors.

The tests involving student volunteers were especially suspect. Ertel noted that half of the data ratings were so poor they could not be analyzed, and the half that could be analyzed was not carefully completed. More than one in three students failed to complete their assignments.

The astrologers were more motivated, but Ertel argues that a two-choice format (pair comparison) would have been the fairer way to conduct the tests “because this approach minimizes the complexity of the subject’s task while increasing the precision of results.”

At the time, astrologers protested that the profiles they were asked to compare were very similar. Ertel believes a fairer test of this kind would ask astrologers to compare the profiles of individuals known to have dissimilar personality traits.

He points out that Carlson initially intended to see if the astrologers could select the correct profile as either their first or second choice at a statistically higher than expected rate but “ignored his own protocol without giving reasons.” Based on his reanalysis of the data, Ertel confirmed that the astrologers were able to select the correct profile as either their first or second choice at a rate significantly better than expected by chance.
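Scoring a first-or-second choice changes the chance baseline: with three profiles, a purely random ranking places the correct one in the top two about two times in three, not one in three. The simulation below is a minimal sketch of that baseline only. It uses made-up random rankings, not study data, and the function name and labels are hypothetical.

```python
import random


def simulate_top_two_rate(n_trials: int, seed: int = 0) -> float:
    """Fraction of random rankings of three profiles in which the
    correct profile lands in first or second place."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        profiles = ["correct", "decoy_a", "decoy_b"]
        rng.shuffle(profiles)  # random order = first, second, third choice
        if "correct" in profiles[:2]:
            hits += 1
    return hits / n_trials


rate = simulate_top_two_rate(20_000)
print(f"Simulated chance rate for first-or-second choice: {rate:.3f}")
```

Any claim that astrologers beat chance on this measure therefore has to be judged against a baseline of roughly 67 percent, not 33 percent.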

With statistical significance, the astrologers also were able to identify which personality profile was the poorest “fit,” a finding that also contradicts Carlson’s original claim. And a statistical reanalysis of the one to 10 numerical ranking system the astrologers were asked to use to subjectively indicate how strongly they felt about their individual choices weighed in their favor as well.

In view of at least two significant test results, Carlson’s claim that we are now in a position to argue a surprisingly strong case against astrology as practiced by reputable astrologers isn’t justified, Ertel insists.

Ron
