Weapons of "math" destruction and how to spot them

In "Weapons of Math Destruction", mathematician and writer Cathy O'Neil discusses the possible negative effects of widely used mathematical models. As a data scientist working on different models, I found it very useful to think about this concept. After reading the book, as I do with many things, I tried to come up with a systematic way of classifying a model as "harmful" or not. In this post I would like to share my thinking on "harmful" data science and a proposed way of spotting it.

A "weapon of math destruction" (shortened to "WMD") is a mathematical model that promises a lot, such as spotting people who are likely to commit a crime, but has harmful outcomes, such as the flagged people being interrogated. Such a model often reinforces inequality, for example when poor people are much more likely to be interrogated, and creates a positive feedback loop that keeps the disadvantaged at a disadvantage.

In my opinion, there are five questions that reveal whether a model is a WMD or not. These questions can easily be asked as a sanity check and can even be used to improve the model. In the rest of the post I will define the questions and develop the idea of a WMD with examples from the book.

1- Do data have predictive potential?

That is the first natural question: is it actually possible to predict the output using the data available? Somebody might ask a data scientist to create a model that predicts their soulmate using weather data. It sounds stupid, right? Because it is. There is basically no relation between the weather and your soulmate. You need better data for that task. However, for some tasks the right data is quite hard to gather, so modelers are tempted to use proxies instead.

O'Neil has a good example of this. In 2008, a San Francisco company called Cataphora marketed software that claimed to rate employees on different aspects, including the generation of ideas. At first sight, the motivation is to spot the best idea generators in order to keep them and hopefully make the company more innovative. Think about it for a second. How would you rate an idea? There is no such thing as a "good idea labeling system". Moreover, good ideas are very rare. What Cataphora used was a proxy, of course. Their hypothesis was that good ideas spread fast. To measure how well an idea spread, they used the messaging system of the client company and counted how many times certain groups of words were cut and pasted to be shared. If you wrote a post that many people shared, then the system would rate you as an idea generator.

In my opinion, this is a good example of how proxies can do harm. Firstly, the hypothesis is debatable. It is not only good ideas that spread fast; jokes go viral as well. So instead of finding idea generators, one might just as well end up finding the 9gag people. Secondly, not all good ideas spread fast: the idea can be hard or complex to share, or the person generating it might not be well positioned in the company network to spread his or her ideas. Using this model without understanding its limitations could easily harm the company using the product, since it can easily mislabel the best idea generators in the company.
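Before trusting a feature or a proxy, one cheap check is whether it carries any signal at all. The sketch below is only an illustration, not something from the book: it compares the correlation between a feature and the target with what you get after shuffling the target many times (a permutation test). The function names and parameters are my own assumptions. Note that passing this check says nothing about whether a proxy measures the right thing; viral jokes would pass it too.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(feature, target, n_permutations=1000, seed=0):
    """Share of shuffled targets that correlate with the feature at least
    as strongly as the real target does. Small values suggest real signal."""
    rng = random.Random(seed)
    observed = abs(pearson(feature, target))
    shuffled = list(target)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(feature, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator keeps the estimate away from exactly 0
    return (hits + 1) / (n_permutations + 1)
```

If the value returned for "weather vs. soulmate" is nowhere near small, that is a hint the data has no predictive potential for the task, at least not in a linear sense.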

2- Can we actually measure the success of the model in production?

Let's imagine a company is using a model to predict whether a job applicant is a good fit for them or not. Assume that the model has major flaws: it almost always rejects the best applicants and selects above-average ones. Assuming no human eye sees the rejected applications, there is little chance that the company will ever be aware that the best applicant was actually turned down. Thus there is no way to improve this system, simply because there is no feedback to the model.
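One possible way to get some feedback back into such a system, sketched here under my own assumptions (the function names and the 5% fraction are hypothetical), is to send a small random sample of the rejected applications to a human reviewer and estimate how often the model turned down a good applicant.

```python
import random


def pick_audit_sample(rejected_ids, fraction=0.05, seed=0):
    """Randomly choose a share of model-rejected applications for human review."""
    ids = list(rejected_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    # at least one application, never more than we have
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return rng.sample(ids, k)


def estimated_miss_rate(audit_labels):
    """audit_labels: booleans from the reviewer, True when a rejected
    applicant was judged to be a good fit after all."""
    if not audit_labels:
        raise ValueError("no audited applications to estimate from")
    return sum(audit_labels) / len(audit_labels)
```

The sample has to be random: if reviewers only look at borderline cases, the estimate says nothing about the applicants the model rejected with confidence.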

3- Would we notice if the model suddenly stopped working? If so, with how many casualties?

The world is changing, and some once-perfect models are bound to give bad results with time. Think about a model predicting demand for shoes. It needs to be retrained from time to time because of market shifts, such as fashion changing over time.
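For a model like the shoe-demand one, where the true value eventually becomes known, a very simple monitor can tell you when it is time to retrain. The following is a minimal sketch, assuming you recorded a baseline mean absolute error at training time; the class name, window size and tolerance are hypothetical choices, not a standard.

```python
from collections import deque


class DriftMonitor:
    """Flags when the recent average error grows well past a baseline."""

    def __init__(self, baseline_mae, window=30, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        # keeps only the most recent `window` absolute errors
        self.errors = deque(maxlen=window)

    def update(self, predicted, actual):
        """Record one prediction/outcome pair; return True if drift is suspected."""
        self.errors.append(abs(predicted - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough observations yet
        recent_mae = sum(self.errors) / len(self.errors)
        return recent_mae > self.tolerance * self.baseline_mae
```

This only works when the outcome can be observed at all, which is exactly why the question in this section depends on the previous one.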

This issue is closely related to the previous question. Especially when the output is not easily measurable, a model that stops working can have severe consequences. Moreover, in my opinion, non-technical people especially tend to treat the outputs of mathematical models as the word of God. One can easily argue against a human being, but how do you argue against an algorithm, especially if many people take it for granted?

A good example is the models used to predict recidivism risk, which attempt to calculate the risk of relapsing into criminal behavior. These models assist judges in adjusting the sentence to be given. The problem with such a model is that if explainability is not a priority, one could easily miss the point when the model drifts and no longer gives good outputs. And think about the casualties then. It might be possible to end up penalizing innocent people.

4- Is it equal? Does it create a positive feedback loop?

Inequality is a hard topic. Not everybody in society has an equal chance of thriving, that's for sure. As data scientists, we should make sure we are not adding another brick to this huge problem. The models we build shouldn't be biased against a group or class.

An example from the book is related to credit scores. This is a commonly used model in the United States that measures the creditworthiness of an individual. Naturally, poor people are much more likely to have lower credit scores. At this point there is no big issue, because the point of the model is to rate the likelihood of paying back debt. The problem starts when companies start hiring people based on their credit scores. This creates a vicious circle where the poor cannot get a job because of their credit scores and cannot improve their credit scores because they are unemployed.
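The vicious circle can be made concrete with a toy simulation. Everything in it is an assumption made up for illustration (the threshold, the monthly gain and loss, the rule that employment depends only on the score); it is not how real credit scoring works, it only shows how a small initial gap gets amplified once the score decides who gets a job.

```python
def simulate_credit_loop(initial_score, hiring_threshold=600, months=12,
                         employed_gain=10, unemployed_loss=10):
    """Toy model: each month a person is employed only if their score is at
    or above the hiring threshold; employment raises the score, unemployment
    lowers it. All numbers are arbitrary illustration values."""
    score = initial_score
    history = []
    for _ in range(months):
        employed = score >= hiring_threshold
        score += employed_gain if employed else -unemployed_loss
        history.append(score)
    return history


# Two people who start only 20 points apart, on either side of the threshold
below = simulate_credit_loop(590)
above = simulate_credit_loop(610)
print(below[-1], above[-1])
```

In this toy setup the two trajectories can only move apart, because the output of the model (the score) feeds back into the thing that determines the next score.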

5- Is the model transparent?

This last question actually sums up the ones above. As long as we are aware that there is a serious problem with a model, we can fix it. However, models are very rarely transparent. All the examples above are opaque in some way. This might be natural to some extent, since many models are built by private companies. However, banks are also private companies, and we still audit them to make sure that they don't abuse the markets and our money. (With GDPR, the EU took a step in this direction; in a future post I might write about the details of this regulation.)

In short, weapons of math destruction are much more common than we imagine. Everybody working with data should be aware of them. In this weekend-afternoon post, I tried to suggest some ways to detect them. Hopefully you enjoyed it and it made you think about the ethical aspects of mathematical modelling.

Finally, as you can tell by this point, I enjoyed the book "Weapons of Math Destruction" a lot. I would recommend it to you all.