UW security researchers show that Google’s AI tool for video searching can be easily deceived
The UW electrical engineering research team includes Baicen Xiao, Radha Poovendran and Hossein Hosseini.
University of Washington researchers have shown that Google’s new tool that uses machine learning to automatically analyze and label video content can be deceived by inserting an image periodically and at a very low rate into videos. After they inserted an image of a car into a video about animals, for example, the system returned results suggesting the video was about an Audi.
Google recently released its Cloud Video Intelligence API to help developers build applications that can automatically recognize objects and search for content within videos. Automated video annotation would be a breakthrough technology, helping law enforcement efficiently search surveillance videos, sports fans instantly find the moment a goal was scored or video hosting sites weed out inappropriate content.
Google launched a demonstration website that allows anyone to select a video for annotation. The API quickly identifies the key objects within the video, detects scene changes and provides shot labels of the video events over time. The API website says the system can be used to “separate signal from noise, by retrieving relevant information at the video, shot or per frame” level.
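For developers curious to try the label-detection feature, the sketch below shows a minimal request using the google-cloud-videointelligence Python client. It is illustrative only: the bucket path is a placeholder, authentication setup is assumed, and the current client library postdates the version of the API the researchers tested.

# Minimal sketch: requesting segment-level labels from the Cloud Video
# Intelligence API. Assumes Google Cloud credentials are configured and
# that the video has been uploaded to a (placeholder) storage bucket.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://your-bucket/animals.mp4",  # placeholder URI
    }
)
result = operation.result(timeout=300)  # annotation runs asynchronously

# Print each video-level label with the confidence of its first segment.
for label in result.annotation_results[0].segment_label_annotations:
    print(f"{label.entity.description}: {label.segments[0].confidence:.2f}")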
In a new research paper, the UW electrical engineers and security researchers, including doctoral students Hossein Hosseini and Baicen Xiao and professor Radha Poovendran, demonstrated that the API can be deceived by slightly manipulating the videos. They showed that one can subtly modify a video by inserting an image into it, so that the system returns only the labels related to the inserted image.
The same research team recently showed that Google’s machine-learning-based platform designed to identify and weed out comments from internet trolls can be easily deceived by typos, misspelling offensive words or adding unnecessary punctuation.
“Machine learning systems are generally designed to yield the best performance in benign settings. But in real-world applications, these systems are susceptible to intelligent subversion or attacks,” said senior author Radha Poovendran, chair of the UW electrical engineering department and director of the Network Security Lab. “Designing systems that are robust and resilient to adversaries is critical as we move forward in adopting the AI products in everyday applications.”
As an example, a screenshot of the API’s output is shown below for a sample video named “animals.mp4,” which is provided by the API website. Google’s tool does indeed accurately identify the video labels.
The researchers then inserted the following image of an Audi car into the video once every two seconds. The modification is hardly visible, since the image is added only once every 50 video frames, for a frame rate of 25.
The following figure shows a screenshot of the API’s output for the manipulated video. As seen below, the Google tool believes with high confidence that the manipulated video is all about the car.
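The manipulation itself requires only basic video editing. The sketch below reproduces the pattern described above with OpenCV, replacing one frame out of every 50, which corresponds to one inserted image every two seconds at 25 frames per second. File names are placeholders, and this is an illustration of the technique, not the researchers’ actual code.

# Illustrative sketch of the frame-insertion manipulation using OpenCV.
import cv2

cap = cv2.VideoCapture("animals.mp4")  # placeholder input file
fps = cap.get(cv2.CAP_PROP_FPS)        # e.g., 25.0
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("manipulated.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

# Resize the inserted image once to match the video's frame size.
inserted = cv2.resize(cv2.imread("audi.jpg"), (width, height))

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Replace one frame out of every 50 with the inserted image; all
    # other frames pass through unchanged, so the edit is hardly visible.
    out.write(inserted if frame_idx % 50 == 0 else frame)
    frame_idx += 1

cap.release()
out.release()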
“Such vulnerability of the video annotation system seriously undermines its usability in real-world applications,” said lead author and UW electrical engineering doctoral student Hossein Hosseini. “It’s important to design the system such that it works equally well in adversarial scenarios.”
“Our Network Security Lab research typically works on the foundations and science of cybersecurity,” said Poovendran, the lead principal investigator of a recently awarded MURI grant, where adversarial machine learning is a significant component. “But our focus also includes developing robust and resilient systems for machine learning and reasoning systems that need to operate in adversarial environments for a broad range of applications.”
The research is funded by the National Science Foundation, Office of Naval Research and Army Research Office.