
Was that last contraction 6 or more like 7 units of pain, Mrs Frazer?

I’m Trish, and I’m a quantophile. The potential of quantitative research to answer seemingly unanswerable questions about our innermost experiences is what first attracted me to psychology. And over a decade later, I’m still in love. But the rose tint is definitely starting to wear off as I see more and more examples of balanced judgement replaced with mindless quantification.

If you have had any formal training in research methods, you will have learned that all measurements should be both reliable and valid. Reliability concerns consistency of measurement. If Doctor 1 and Doctor 2 both administer a structured interview to assess your mental health, they should agree on whether you are clinically depressed, and at what level of severity. You can even use some statistical wizardry to put a number on how ‘reliable’ your measurement is. Your friends and classmates will gasp in wonder as numbers pour satisfyingly into your output file!
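The inter-rater agreement described above is often quantified with Cohen’s kappa, which corrects raw agreement for the agreement two raters would reach by chance alone. A minimal sketch in Python (the doctor ratings below are invented purely for illustration):

```python
# Cohen's kappa: chance-corrected agreement between two raters.
# The diagnoses here are made up for illustration only.
from collections import Counter

doctor1 = ["depressed", "not", "depressed", "depressed", "not", "not", "depressed", "not"]
doctor2 = ["depressed", "not", "depressed", "not", "not", "not", "depressed", "depressed"]

n = len(doctor1)

# Observed agreement: proportion of cases where the raters match.
observed = sum(a == b for a, b in zip(doctor1, doctor2)) / n

# Expected chance agreement, from each rater's marginal frequencies.
c1, c2 = Counter(doctor1), Counter(doctor2)
expected = sum(c1[k] * c2[k] for k in c1) / n**2

# Kappa: how far observed agreement exceeds chance, scaled to [0, 1].
kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f} expected={expected:.2f} kappa={kappa:.2f}")
```

With these invented ratings the doctors agree 75% of the time, but because half that agreement is expected by chance, kappa lands at a more modest 0.5 – which is exactly the point of correcting for chance.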

Establishing validity is a lot trickier than reliability. It taps into what our measurements and results mean. This can be particularly difficult when we try to measure a new construct such as “Mental Health Literacy”. For example, in a study by Aromaa, Tolvanen, Tuulari and Wahlbeck published in 2011, agreement with the statement “Antidepressants have plenty of side effects” was used to measure “personal stigma” in relation to depression. Endorsement of this statement was assumed to reflect a lack of “realistic” views about medication. And yet the Royal College of Psychiatrists’ position statement on antidepressants released earlier this year states that reactions to antidepressants can range from “an overall improvement in levels of depression and quality of life, to feeling the benefit of functioning better while suffering adverse side effects, to finding them ineffective with intolerable and harmful side effects”. So a lot hinges on what exactly is implied by the word “plenty”. It seems a pretty soft, subjective bedrock on which to build a firm, objective science of attitude-change intervention.

The Association for Psychological Science states that there are “more than 280 different scales for assessing depression” in current use. To paraphrase the (possibly apocryphal) remark attributed to Einstein: if the scale worked, one would be enough.

Even if we do manage to measure our constructs of interest accurately, there are still many pitfalls to beware of in their analysis and interpretation. Take, for example, the hallowed p value. You probably remember that a p value is the chance that your results were a ‘fluke’. It isn’t, though: it is the probability of obtaining data at least as extreme as yours if the null hypothesis were true. Nor does it tell you how important your results are, or anything about your effect size. The p value is so widely misunderstood that Haller and Krauss in 2002 administered a quiz testing six frequent misunderstandings of the p value and found that not only did 100% of the students sampled make at least one mistake, but so did 80% of the instructors.
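One way to see what the p value does and does not tell you is to simulate a world in which the null hypothesis is true. The sketch below (an assumed setup: two groups drawn from the same normal population, compared with a two-sided z test) shows that roughly 5% of such “experiments” still come out “significant” at p < .05. The 5% is a property of the procedure across many imaginary replications, not the chance that any one particular result was a fluke:

```python
import math
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def false_alarm_rate(n=30, trials=2000):
    """Simulate experiments where the null is TRUE (both groups come
    from Normal(0, 1)) and count how often a two-sided z test with
    known variance reports p < .05."""
    false_alarms = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        # z statistic for the difference in means (variance known to be 1)
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        # two-sided p value via the normal CDF, Phi(x) = (1 + erf(x/sqrt(2))) / 2
        p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        if p < 0.05:
            false_alarms += 1
    return false_alarms / trials

rate = false_alarm_rate()
print(f"Proportion of 'significant' results under a true null: {rate:.3f}")
```

The printed proportion should hover near 0.05, illustrating that a significance threshold controls the long-run rate of false alarms under the null rather than certifying any individual finding.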

[Image: the ‘Lego’ pain scale. Created by Brendan Powell Smith, http://www.TheBrickTestament.com; not endorsed by Lego]

Pain is a subjective, complex and mysterious experience, but one we must nonetheless strive to measure in some objective way for research to take place. We use tools like visual analogue scales, which allow us to rate pain numerically, by placing a mark on a line, or with smiley faces. These all tap into the relative and continuous nature of pain by asking us to relate our current pain to an absolute absence of pain or to “the worst pain imaginable”. This makes intuitive sense. But since pain does not correlate perfectly with tissue damage or any other objectively observable physical signal, research attempting to assess the validity of these pain scales usually relies on reliability assessments (people rate previous pain experiences similarly over time, a method vulnerable to all the biases of recall) and on how strongly ratings respond to pain relief. Since the pain assessment tools are often developed precisely in order to test pain relief strategies reliably, there is a certain circular logic at play, one I found very irritating while being asked repeatedly to rate my labour contractions on a scale of 1 to 10 to establish that my epidural had not been effective.

It is unrealistic to think we can get by currently without rating scales in psychology or medicine, but we should be aware that a tool that might help with research or provide a useful aggregate with which to compare groups may not be the most useful tool to connect with and understand the person in front of us. 

Sense About Science is publishing a Data Science Guide to help the public critically evaluate the sea of seemingly meaningful numbers we are bombarded with daily. Their advice, when looking at claims based on data analysis of any kind, is always to ask yourself:

  • Where does it come from?
  • What is being assumed?
  • Can it bear the weight being put on it?

Sound advice. So how does psychology as a discipline measure up against it? A systematic review of 433 scales reports that around 50% of them cited no evidence whatsoever to support their validity (see para 8). Like the Lego pain scale, it seems we are relying on face validity. I would love to hear your thoughts: is this a problem for psychology?

Baby Frazer, by the numbers

  • Birthweight: 4.33 kg
  • Overdue by: approximately 252 unusually long hours
  • Pain caused: 8–9 on the ‘Lego’ scale



Will the real Rosenhan pseudopatients please stand up?

If you have taken more than a passing interest in psychology, at any level, you have almost certainly come across the Rosenhan Experiment in which 8 pseudopatients claiming to hear a voice were admitted to psychiatric hospitals and diagnosed rapidly and fixedly with various psychotic conditions. Even if some other aspects of your course (like hand calculation of correlation coefficients for example) left you cold, I’m willing to bet this piqued your interest. As a lecturer in mental health I speak about this study to students at least 3-4 times each year, and I have yet to tire of it. If anything, my fascination deepens.

Most students are familiar with the main thrust of the study and its outcome, but the study is so familiar that few go and read the original paper in full, which is a pity. They miss some juicy details, like the ways in which the patients’ every behaviour was interpreted as pathological in the medical and nursing notes. “Patient engages in writing behavior” illustrates neatly how context is all-important in our perception and interpretation of behaviours as symptoms. The “oral-acquisitive nature of the syndrome” of schizophrenia was the supposed cause of patients loitering outside the canteen before meal times, on a ward where there was quite simply nothing else to do.

My favourite detail is that the great Rosenhan makes an error in describing an everyday statistical concept in this paper! There is hope for us all (I will post a lollipop to anyone who can find it).

Although the paper is largely descriptive, there are some interesting numbers in there too. For example, the mean number of minutes per day that the hospitalised patients had contact with psychologists, psychiatrists or physicians was 6.8. This figure includes the admission and discharge interviews as well as group and individual psychotherapy sessions. Collectively, the 8 patients were administered 2,100 pills. Despite all but one of the pseudopatients striving to be released after the first day, the length of hospitalisation ranged from 7 to 52 days, and many retained a diagnosis of “schizophrenia in remission” on discharge.

Publication of this paper in 1973 caused a lot of controversy, with psychiatrists defending the validity of their diagnostic systems and pointing out that many medical illnesses can be feigned without causing us to doubt their validity.

Psychologists have generally embraced the study as a powerful demonstration of the biasing effect and stickiness of diagnostic labels. For some it may be perceived to bolster the argument that using a medical model to research and treat mental health difficulties simply doesn’t work. That it is still being discussed, and that articles are still being published on how it is represented in textbooks, is testament to the magnitude of its influence in this field of psychology.

David Rosenhan passed away in 2012 after a long and successful career, finishing at Stanford. The pseudopatients (“three psychologists, a pediatrician, a psychiatrist, a painter, and a housewife”) have, to my knowledge, never been identified or come forward. I would love to hear from them, or from their friends or relatives if any have passed away. There was also a ninth patient, excluded because he violated the ground rules by giving false information, beyond the claim to hear a voice, in his admission interview.

So is the infamous Rosenhan Experiment a damning indictment of psychiatric diagnostic systems that still exist today, a merely historical embarrassment that serves to demonstrate how far we have come, or a gimmicky but unscientific reminder that doctors are human too, and can be fooled like the rest of us?

What do you think?

Update to this article: In November 2019 Susannah Cahalan published a must-read book (The Great Pretender) for anyone interested in this topic, suggesting aspects of the study were fraudulent. Thanks to the commenter who mentioned this book.

How can we address mental health stigma at work?

Interest in the stigma surrounding mental health difficulties has been increasing amongst researchers and health practitioners, and with good reason. Experiencing discrimination and negative attitudes as a result of mental health difficulties can lead to social isolation and reduce the chance of recovery. Those who have experienced psychosis have even been presented by the media as dangerous, with some sufferers describing the prejudice they have faced as worse than the symptoms themselves. This stigma can have a serious impact on all areas of life, as well as on business and employment, both for individuals and for the wider organisational culture.

 

Much effort has rightly focused on calls for funding to decrease stigma and improve attitudes. However, if the budget were to double, or even triple, tomorrow, would we know how to spend it? How much do we know about which specific attitudes are most harmful to well-being and recovery, and how to change them? Researcher John Read and clinical psychologist and voice-hearer Jacqui Dillon have questioned the efficacy of many well-intentioned campaigns to reduce stigma that are based on promoting the equivalence of mental illness with physical illness. The “illness like any other” approach can lead to decreased stigma around help-seeking, but can also lead to a reduction in the perceived potential for recovery and an increase in perceived dangerousness and unpredictability. One 1997 study (harking back to the dark days of the Milgram Experiment) even found that emphasising biomedical ‘illness’-type explanations led participants to administer more ‘electric shocks’ to a research confederate posing as someone who had experienced mental health difficulties. Still other researchers have found that concentrating on whether mental health difficulties are ‘real’ biological illnesses or not has no impact on stigma at all (as did our own online experiment).

 

And then we have the equity versus equality debate. Is our goal for employers and colleagues to be ‘blind’ to cognitive and emotional problems, the way we might talk about being ‘colour blind’ when it comes to race issues? Or is it more about providing the necessary supports to increase performance and fulfilment at work for individuals who might have specific needs? Mental health issues are covered under the term ‘disability’ in employment legislation in Ireland. Are those affected comfortable with the perception of their experiences as a disability?

 

What are your thoughts on what reduced mental health stigma should look like in employment? And what are your ideas on how we can get there?