Saturday, March 04, 2017

Big Data Big Concerns

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Cathy O’Neil
Allen Lane, 2016
pp. 259. Price £12.99

These are the days of big data. It seems to be the new fix for every problem we foresee, particularly in providing technology-enabled solutions to the world's most pressing problems. Big data will help you diagnose diseases, predict fraud, and reveal patterns in customer behaviour, and of course there is a whole host of free stuff you get in return for authorizing an app to use your personal information. What possibly started as the Google experience – the ease of searching in exchange for unobtrusive advertisements or pop-ups – has now become almost a lifestyle. While we merrily share data, the intrusion of commerce into our lives is subtle, and slowly we become unable to see where our private persona ends and the public persona takes over.

Cathy O’Neil has been there and done that. She has worked on big data and has seen from close quarters how the modelling happens and how the results are interpreted. She recognizes the importance of big data and the benefits it brings. At the same time, O’Neil sounds a warning bell about the indiscriminate use of big data models on real lives – how such analysis does not consider exceptions, and how, even when exceptions are found, they merely become data points for recalibrating the model: a human being, a life, reduced to a data point and left by the wayside as collateral in the larger journey of data becoming commerce. Hers is an important voice to hear at a time when the big advocates of the JanDhan-Aadhar-Mobile trinity are talking of India moving from a data-poor country to a data-rich one. We need to understand what “data rich” means and what it implies.

O’Neil discusses where data and patterns would be useful: certainly in baseball (or, for that matter, cricket), where you could use them to analyse the opposing team and shape your strategy. In the process you make the game more interesting, and nobody gets hurt. But what happens when the data you use turns out to be circular, leading to patterns akin to racial profiling in crime data? Herein lies the problem. What big data does is exactly what our minds do – create patterns based on past experience. These patterns keep the exceptions out as “errors”. But what happens to those exceptions in real life? Do they become victims of a predictive model? This is an important question to ask. It leads us to recognize that more and more “scientific” models will have an objective way of bringing people in, but no objective way of making exceptions. After all, each human being is unique and exceptional. While it is fine to make game-based predictions, how fair is it to take legal action over a suspected movement just because the machine told you so?

O’Neil brings in compelling human stories of people who became collateral damage in big-data-based performance models. Take the story of Sarah Wysocki and other teachers who were classified as failures because the district administration had used a sophisticated evaluation model. Firstly, firing her was an “error”. A large part of the evaluation rested on the difference between what her students scored when they came in and what they scored when they went out, and there was no objective way of telling whether they had come in with scores artificially inflated by previous teachers who had intervened to help the students score better. Secondly, the fact that her firing was an error was never even reported back for the system to learn from. Most big data models work as black boxes, without even the feedback necessary to train them. In any case, using Wysocki as a data point should itself be an ethical and moral problem.

Given that many tech-enabled start-ups are entering the fray to serve inclusive business – peer-to-peer lending, payday lending, cross-selling of third-party products – the scene is getting scary. There are companies building credit-behaviour models from data mined from Facebook and WhatsApp posts and geo-locations. Never before could Big Brother watch customers, and prey on them, so effectively. In this context it is important to read this book, look at the limitations of data, and seriously examine the ethical limits of the machine invading our lives and making decisions for us.

Cathy O’Neil’s book is in the same league as Michael Sandel’s work – though without that breadth or depth – in reminding us of the limits of commerce, bringing fairness to the fore, and asking difficult questions about whether the poor, and customers in general, are to be seen as data points or as living, breathing human beings. It should be a must-read for all the youngsters building “apps” to play with human behaviour, and for all the venture funders who encourage them. It is important that the funders stand up to the start-ups.
