The problem with predicting suicide
An introduction to testing metrics and to why no tool in existence can accurately predict suicide
Editor’s note: Hey! I’m the editor too! I’ve been under the weather for the past week, so writing was unfortunately something that had to be sacrificed. But I’m back and on the mend, and it’s time to get back to it! Sorry for the absence.
Suicide is unpredictable. That kind of spoils the ending, doesn’t it? But I want you to spend a moment to think about any characteristic, and the context in which it could or could not be a predictor of suicide.
I spent a few moments and thought of three well-established “predictors” of suicide.
Male sex - Being male or male-identifying seems to predict a 2-3 fold increase in the rate of suicide. But of course, most men don’t die of suicide, and it would be silly to think being male is a “risk factor” for suicide.
Mental Illness - seems to increase the risk by about 5-6 fold, but I suspect someone who has been through mental illness and is now feeling much better is much closer to their baseline risk. In fact, they could now be someone who helps others through crises and is at less risk than the average person because of their resilience and experience.
Being divorced/losing a romantic relationship - universally, we see this as predicting a 1.5 to 2.5-fold increased rate of suicide. However, if you are divorced or separated from an unhealthy, toxic relationship, I suspect your suicide risk decreases rather than increases.
These “imperfections” in our risk factors cause problems when it comes to predicting suicide, most of all because suicide, overall, is a rare event. In the United States, it occurs approximately 12 times per 100,000 people per year (0.012% per year), though the rate obviously varies by a number of demographic and risk parameters. What makes the suicide prediction problem unique is this very rarity, combined with a very important quality of any predictive test: the positive predictive value.
Test Metrics 101 - the LEFTY for Left-Handedness
These concepts can be really tricky to wrap your brain around, so let’s work on this together! We’ll develop a simple test called the LEFTY to predict left-handedness (which occurs in 10% of the population). After we develop it, we deploy it on 100 known left-handed and 100 known right-handed people, and we get the following properties for the LEFTY:
10% false positive rate - a false positive is a right-handed person that the LEFTY got wrong and said was left-handed.
1% false negative rate - a false negative is a left-handed person that the LEFTY got wrong and said was right-handed.
So we now have our sensitivity and specificity numbers. Of the 100 known left-handed people, the LEFTY correctly flagged 99 (true positives) and missed 1 (a false negative); of the 100 known right-handed people, it correctly cleared 90 (true negatives) and wrongly flagged 10 (false positives).
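If you like seeing the arithmetic spelled out, here’s a minimal Python sketch of that validation sample (the numbers are just the illustrative ones above, not a real instrument):

```python
# LEFTY validation sample: 100 known left-handers, 100 known right-handers
left_handed = 100
right_handed = 100

false_negative_rate = 0.01  # 1% of left-handers are missed
false_positive_rate = 0.10  # 10% of right-handers are wrongly flagged

true_positives = left_handed * (1 - false_negative_rate)    # 99
false_negatives = left_handed * false_negative_rate         # 1
true_negatives = right_handed * (1 - false_positive_rate)   # 90
false_positives = right_handed * false_positive_rate        # 10

sensitivity = true_positives / (true_positives + false_negatives)  # 0.99
specificity = true_negatives / (true_negatives + false_positives)  # 0.90
print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```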
Anyone who knows test metrics knows that a 99% sensitivity AND 90% specificity seems quite good! The LEFTY test seems really well designed for detecting left-handedness. So let’s deploy it in the real world. Remember, left-handedness is 10% of the population.
So we go to a school district with 1,000 students. If handedness is randomly distributed, statistics dictate that about 900 will be right-handed and 100 will be left-handed. And we know the sensitivity (99%) and specificity (90%) of the LEFTY. True to its form in the test environment, we get the following results: of the 100 left-handed students, 99 test positive (true positives) and 1 tests negative (a false negative); of the 900 right-handed students, 810 test negative (true negatives) and 90 test positive (false positives).
Yay! The sensitivity remains at 99% (only 1% false negatives) and the specificity at 90% (only 10% false positives)! Our LEFTY test has been validated in the real world!
BUT WAIT… do you see a problem here? There are almost as many false positives (90) as there are true positives (99)!
Enter the PREDICTIVE VALUES
And here lies the problem. Because the NEGATIVE condition (right-handedness) occurs in 90% of the population, even a small false positive rate produces a large number of false positives (true NEGATIVES that test positive on the LEFTY).
In this case, we have what is called the POSITIVE PREDICTIVE VALUE (the total number of true positives divided by the sum of the true and false positives), which is 99 ÷ ( 99 + 90 ) = 0.524, or 52.4%. We also have the NEGATIVE PREDICTIVE VALUE (the total number of true negatives divided by the sum of the true and false negatives), which is 810 ÷ ( 810 + 1 ) = 0.999, or 99.9%.
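Here’s the same arithmetic as a quick Python sketch, using the school-district numbers above (purely illustrative):

```python
# LEFTY deployed on 1,000 students: 100 left-handed, 900 right-handed
true_positives = 99    # left-handers correctly flagged
false_negatives = 1    # left-handers missed
true_negatives = 810   # right-handers correctly cleared
false_positives = 90   # right-handers wrongly flagged

ppv = true_positives / (true_positives + false_positives)   # 99 / 189
npv = true_negatives / (true_negatives + false_negatives)   # 810 / 811

print(f"Positive predictive value: {ppv:.1%}")  # 52.4%
print(f"Negative predictive value: {npv:.1%}")  # 99.9%
```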
What does this mean?
For the LEFTY in the real world, the specificity of 90% creates issues in prediction, as there is only a 52.4% chance that a positive LEFTY result correctly identifies left-handedness. On the other hand, a negative LEFTY test almost certainly (99.9%) indicates right-handedness. While 52.4% is certainly better than the 10% that random chance would give you, imagine the issue here practically. If you were to devote left-handed resources or left-handed specialty training or left-handed education to these LEFTY-assigned people, almost half of them would be right-handed! So yes, the LEFTY remains a pretty good test in terms of specificity and sensitivity, but ironically, it’s much better at EXCLUDING left-handedness than it is at predicting it.
Agh, so many numbers! My brain hurts.
OK, so what does this have to do with suicide?
Suicide is much much much much rarer than left-handedness, especially if we are to consider predicting it in the general population. In the general population, like I said, we are talking about 0.012% in a given year. This places a huge burden on our tests! We now have a situation in which the math gets phenomenally distorted. Let’s design a suicide test with FANTASTIC properties:
Tyler Black Incorporated releases the SUICID-O Test! The SUICID-O Test has a 99% Sensitivity and a 98% Specificity for predicting suicide within a year.
Sounds great, right? In fact, 99% sensitivity and 98% specificity is better than any predictive algorithm for suicide that has ever existed! However, the problem with the positive predictive value becomes abundantly clear: in a population of 100,000 people, about 12 will die by suicide in a given year and 99,988 will not. The SUICID-O correctly flags roughly 12 of those 12, but its 2% false positive rate also flags roughly 2,000 of the 99,988 who will not.
Even though the sensitivity and specificity held up, the base rate of not dying by suicide in a given year makes the 2% false positive rate SO MUCH MORE common than the true positive rate. In other words, for every 1 person that the INCREDIBLE SUICID-O correctly predicts, there will be about 168 people flagged who don’t die of suicide. This renders it mostly clinically useless. I suppose it could be useful if you wanted to direct, for example, public health messages at “high risk people.” But, say, if you hospitalized or treated everyone who tested positive, only 1 out of every 169 would actually have gone on to die by suicide; the other 168 out of 169 would be detained or treated for a year, for a problem they won’t have.
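Here’s that base-rate arithmetic as a short Python sketch. The SUICID-O is, of course, made up; the prevalence and test properties are simply the ones quoted above:

```python
def screening_numbers(prevalence, sensitivity, specificity, population=100_000):
    """Return (true positives, false positives, PPV) for a hypothetical screening test."""
    cases = population * prevalence             # people who will die by suicide this year
    non_cases = population - cases              # people who will not
    true_positives = cases * sensitivity
    false_positives = non_cases * (1 - specificity)
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv

# The fictional SUICID-O: 99% sensitivity, 98% specificity, base rate of 12 per 100,000 per year
tp, fp, ppv = screening_numbers(prevalence=0.00012, sensitivity=0.99, specificity=0.98)
print(f"True positives: {tp:.0f}, false positives: {fp:.0f}")               # ~12 vs ~2,000
print(f"PPV: {ppv:.2%} (about 1 true positive per {1/ppv:.0f} positives)")  # ~0.59%, ~1 in 169
```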
For those curious, in the real world the best I’ve seen a test do is 94% sensitivity and 64% specificity. In that scenario, out of 100,000 people, there would be roughly 11 true positives against roughly 36,000 false positives.
That’s right! The best tests we have show a positive predictive value of 0.03% in the general population; in other words, 1 in 3200 positives will be truly positive.
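And the same sketch with those real-world numbers plugged in (again, just illustrative arithmetic):

```python
prevalence, sensitivity, specificity = 0.00012, 0.94, 0.64
population = 100_000

true_positives = population * prevalence * sensitivity                # ~11
false_positives = population * (1 - prevalence) * (1 - specificity)   # ~36,000
ppv = true_positives / (true_positives + false_positives)

print(f"PPV: {ppv:.2%} (about 1 true positive per {1/ppv:.0f} positives)")  # ~0.03%, ~1 in 3,200
```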
That leads us to, you know, what I said:
Suicide is not predictable. Suicide prediction should never be your goal in suicide risk assessment.
Whatever new fancy tool you read about will throw all these numbers at you (like high sensitivity or specificity), and they will not mean anything to you clinically. As of the time of this writing, there is no clinical value to any suicide prediction tool in existence.
That doesn’t mean that we throw our hands up and do nothing, but it does give us an incredible opportunity: stop trying to do the impossible.
I don’t know what the future holds. Maybe AI and machine learning and the world’s best suicidologists can create something with true clinical utility… but for now… I just let prediction go. It’s not possible. I don’t stress about it.
In my day job as an emergency child and adolescent psychiatrist, I’ve assessed and managed thousands of kids in suicidal crisis! Why don’t I just quit trying to assess suicide risk? How do I still do this job and love it? Well, now that’s the cool part. Stay tuned for a future post: “Our predictions suck, but our approaches work!”