AI: Catching Criminals Before the Act! (Part 1)
Movie plot in the making
This seems like a Hollywood movie plot: Big Brother is not only watching you, but predicting whether you will commit a crime in the future so you can be arrested before you even know you'll commit it. The idea is not new. (Tom Cruise starred in a movie whose central theme explored this future back in 2002.) But the technology and its application to the gritty world of crime are new.
When we read up on emerging technologies, the ideas we see often blur the lines between reality, science fiction, and even Harry Potter-esque fantasy. (For example, Microsoft recently applied to patent a wand for use in augmented reality applications.) According to the Financial Times, a major world power intends to make crime-predicting AI a reality, breathing life into a film producer’s blockbuster idea. This technology will likely be a huge hit with law enforcement authorities (with spill-over applications in the commercial context), but will it be a hit among private citizens and corporations? The intent of this series is to objectively examine the benefits and risks of behavioral predictive technologies applied in the criminal justice context. It is not meant to be critical of any national government; this site, like the emerging technologies it covers, is apolitical.
Part 1 of this series explores the world of AI within China and its plans to develop what it hopes will be a crystal ball for predicting one’s criminal future (if any). We will then look at the general machine learning principles underlying such technology and their major pitfalls.
Part 2 will look at the legal, privacy, and ethical issues raised by a crime-predicting AI system analyzing our future actions, with general assessments on the future of “behavioral predictive technologies”.
So strap in, this will be a fun one!
Gathering AI Storm
In July 2017, China announced its intention to become the world leader in AI by 2030. It put its money where its mouth is by earmarking over US$150 billion for its domestic AI industry. The announcement was made by the State Council, so it’s as serious an official announcement as it gets in China. Private industry players like Baidu have been leading advances in AI through their own investments and joint ventures. If any country can make behavioral predictive technologies a reality, it is China, given its abundant financial resources, industry influence, domestic hi-tech infrastructure and landscape, political will, and centralized, efficient manner of getting things done.
Equally ambitious is China’s plan to develop AI systems that can predict crimes before they happen. Li Meng, vice-minister of science and technology, said crime prediction would become an important use for AI technology. Domestic companies are helping to develop this capability through facial recognition and the monitoring of people’s movements (whether they visit a knife shop) and behaviors (whether they change clothing several times a day). “Crowd analysis” will identify abnormal patterns of behavior in crowds. If a person’s actions are suspicious (like changing clothing after visiting a knife shop on the way to a crowded bus stop), the matter is reported to the police so they can intervene. The intent is to use technology to assist crime prevention, which sounds good, in theory.
Prediction via Machine Learning
The key technology underlying crime prediction (or any application whose purpose is to predict behavior) is machine learning, which is not really new, having been in existence since 1959. A subset of the broader field of AI, machine learning can have its true potential better harnessed using advances in 21st-century computing power. Machine learning works by developing algorithms that draw inferences and make predictions from data. It is automated and can improve itself by learning over time. There are three key components to machine learning:
- data;
- model; and
- parameters.
Data is fed into the machine learning network, which then calculates values for the various parameters so that the resulting model can make accurate predictions based on the underlying data. There are two types of machine learning: supervised and unsupervised. Only the latter can identify general structures or patterns from unlabeled data. As such, unsupervised machine learning can be used to cluster data to establish a baseline for “normal” behavior so it can later identify instances falling outside that norm. For example, unsupervised machine learning may help assess whether someone is visiting a particular location (like a knife shop or a large transport hub such as a bus station) more frequently than usual. Because of its ability to recognize larger pattern sets, unsupervised learning would be the more relevant technology to incorporate into a crime-predicting AI system.
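To make the baseline idea concrete, here is a minimal sketch in Python. It is not a real clustering system; it simply learns an average visit frequency from invented data and flags counts far above it, the simplest form of the outlier detection described above. The function name, threshold, and data are illustrative assumptions, not anything from an actual deployed system.

```python
from statistics import mean, stdev

def baseline_outliers(visit_counts, threshold=2.0):
    """Flag counts more than `threshold` standard deviations
    above the population average (the learned 'baseline')."""
    mu = mean(visit_counts)
    sigma = stdev(visit_counts)
    return [c for c in visit_counts if c > mu + threshold * sigma]

# Hypothetical weekly knife-shop visit counts for a population;
# one person visits far more often than everyone else.
counts = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 12]
print(baseline_outliers(counts))  # → [12]
```

Note what the sketch does not capture: it flags the statistical anomaly, but says nothing about why the visits happened, which is exactly the gap discussed in the pitfalls below.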
Pitfalls of Machine Learning
Is machine learning perfect? Well, no. According to Dr. Zulfikar Ramzan, Chief Technology Officer of RSA (speaking before a packed room at the RSA Cybersecurity Conference in Singapore on July 27, 2017), machine learning suffers from serious pitfalls that have not been resolved by current technology.
First, Dr. Ramzan cautioned that machine learning works on a “garbage in, garbage out” principle: bad or irrelevant data will undercut the accuracy of any predictions. Technicians need to ask the right questions of the data being used, otherwise the system will be unable to predict the right answers. In a real-world context, an authority implementing behavioral predictive technologies would need to feed its machine learning system a significant amount of data about its citizens, going well beyond dates of birth, tax IDs, home addresses, or mobile numbers. Terabytes of personally identifiable data like video images (captured by real-time city-wide surveillance cameras), geolocations, and online traffic logs will need to be fed into machine learning networks. Not only that, this data must be continuously updated. Such tasks have been made possible by recent advances in computational power and connectivity speed.
Danger of False Positives
The second major pitfall of machine learning is the risk of false positives, aggravated by class imbalance. The goal of any behavioral predictive technology used in the criminal context is to consistently and correctly identify actual instances of criminal behavior (the true positive rate, or “TPR”) while avoiding incorrectly labeling innocent behavior as criminal (the false positive rate, or “FPR”). Programming the technology to aggressively label an activity criminal will result in a high TPR but also a high FPR. The real-world implication of a high FPR is that innocent people may be falsely arrested and thrown in jail. Adjusting the network to be conservative in labeling an activity criminal will result in a low FPR but also a low TPR, and what’s the point of pouring billions into the project if the network can only produce a low TPR? In statistics, the receiver operating characteristic (ROC) curve illustrates this tradeoff.
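The tradeoff can be seen with a toy calculation: sweep a decision threshold over risk scores and the TPR and FPR move together. The scores and threshold values below are invented purely for illustration; a real system's scores would come from its trained model.

```python
def rates(scores_innocent, scores_criminal, threshold):
    """TPR and FPR when flagging every score above `threshold`."""
    tpr = sum(s > threshold for s in scores_criminal) / len(scores_criminal)
    fpr = sum(s > threshold for s in scores_innocent) / len(scores_innocent)
    return tpr, fpr

# Hypothetical risk scores assigned by a model.
innocent = [0.1, 0.2, 0.3, 0.4, 0.6]
criminal = [0.5, 0.7, 0.8, 0.9]

# Aggressive (low) threshold: catches all criminals, but flags
# many innocents too.
print(rates(innocent, criminal, 0.25))  # → (1.0, 0.6)

# Conservative (high) threshold: few false alarms, but misses
# half the criminals.
print(rates(innocent, criminal, 0.75))  # → (0.5, 0.0)
```

Plotting TPR against FPR across every possible threshold would trace out exactly the ROC curve mentioned above.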
For example, suppose a behavioral predictive technology system is designed with a false positive rate of 0.1% and a true positive rate of 100%; these rates seem superb. Assume that for every 10,000 instances of innocent behavior, one has criminal intent. Based on these numbers, the system would generate 11 alarms (10,000 × 0.001 + 1), of which only one would be an accurate prediction of criminal intent. This has the real-world effect of sending the police on wild goose chases over 90% of the time (since 10 of the 11 alarms generated are false positives)! The people wrongly arrested based on faulty predictions would suffer unwarranted humiliation and abrogation of their civil rights, to say the least.
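The arithmetic above can be checked in a few lines of Python. The function is a hypothetical sketch of the base-rate calculation; the numbers are the ones from the example.

```python
def alarm_breakdown(n_innocent, n_criminal, fpr, tpr):
    """Expected alarms: false alarms from innocent behavior plus
    true alarms from actual criminal intent, and the resulting
    precision (fraction of alarms that are correct)."""
    false_alarms = n_innocent * fpr
    true_alarms = n_criminal * tpr
    precision = true_alarms / (false_alarms + true_alarms)
    return false_alarms, true_alarms, precision

# FPR = 0.1%, TPR = 100%, one criminal per 10,000 innocents.
fa, ta, prec = alarm_breakdown(10_000, 1, 0.001, 1.0)
print(f"{fa:.0f} false alarms, {ta:.0f} true alarm, "
      f"precision {prec:.1%}")  # → 10 false alarms, 1 true alarm, precision 9.1%
```

Even with a seemingly superb 0.1% FPR, the rarity of the target behavior drives precision down to about 9%, which is the base-rate problem in a nutshell.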
Establishing Metrics Is Challenging
Efforts to establish metrics by which behavioral predictive technology can evaluate the criminality of someone’s future behavior will be challenging for four reasons.
First, ultra-baseline acts do not necessarily equal criminal acts. Remember that unsupervised machine learning clusters data to establish a baseline for “normal” behavior so it can later identify instances falling beyond that norm. The system will flag someone, “Mr. A,” for going into the corner knife shop too frequently relative to the average baseline visits. However, Mr. A may have a legitimate reason for his ultra-baseline behavior: he may have a fetish for collecting knives, a romantic crush on one of the shop assistants, or he may be doing his due diligence before buying the entire shop from the current owner. Of course, Mr. A could be brought to the police station and interrogated to explain himself, but there is the risk that he may be coerced into confessing or saying something incriminating, especially if the local jurisdiction does not afford a right to counsel before police questioning. What turned out to be a false positive in the data room may very well send an innocent man to jail in real life.
Second, as the musical group the Doors observed, “people are strange when you’re a stranger.” For an AI trying to figure out people’s actions, human behavior can be stranger still. Humans do the most unexplainable things, and predicting whether those unexplainable things rise to the level of criminal acts will be difficult.
Third, people often change their minds and do something else at the last minute. Just because an algorithm predicts an action does not necessarily mean the action will transpire; the actor may decide not to carry through with the predicted behavior. In other words, a predicted action does not equate to the intent behind it.
The last factor relates to the ability of current technology to provide behavioral predictive systems with data accurate enough to even determine what “average” or baseline behavior is. For example, many users of ride-sharing apps get frustrated when their rides do not show up at their selected locations due to inaccurate geolocation data. Similar technical glitches arise when live video footage of suspected future criminals is too unclear or fuzzy. If behavioral predictive systems are fed faulty real-world data, their predictions will be faulty.
These pitfalls need to be worked out in the future before behavioral predictive technologies may be reliably and responsibly implemented.
Stay tuned for Part Two of this series, when we will examine the legal, privacy, and ethical issues raised by behavioral predictive technologies.