Machine Learning for Emergency Response
What happens when 911 is overwhelmed during a natural disaster? Funded by an NSF grant, our team of researchers built a system that recognizes people in need of rescue, based on private social media images and machine learning image classification models.
During Hurricane Harvey, 911 hotlines were overwhelmed — there is a clear need to innovate on current emergency response systems.
We created an image classifier, trained on signal data, that sorts private and public social media imagery into three categories: rescuer, volunteer rescuer, and rescuee.
I managed the processes between our fieldworkers and computational team. I was one of three model trainers, focusing on data processing, evaluation, and reporting.
Crisis Information Literature \\ Hurricane Harvey Metrics \\ Customer Journey Map
Pitch: The 911 hotline can be supported through alternative emergency response systems, such as a social media image classifier that uses machine learning to predict whether a user needs help, is an official rescuer, or is a volunteer rescuer.
Crisis information literature
Through a formal academic literature review, included in our IEEE publication (coming in December), we assessed the state of current crisis informatics as it relates to emergency response.
A large area of crisis informatics methodology focuses on collecting vast amounts of data from public social media APIs (particularly Twitter), with inclusion criteria based on a combination of keywords, date ranges, and other attributes.
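The inclusion criteria described above can be sketched as a simple filter. This is a minimal illustration, not the method of any particular study: the `Post` record, the keyword list, and the date range are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """A hypothetical, simplified public post record."""
    text: str
    posted_on: date


def meets_inclusion_criteria(post, keywords, start, end):
    """Return True if the post falls in the date range and mentions any keyword."""
    in_range = start <= post.posted_on <= end
    text = post.text.lower()
    has_keyword = any(keyword.lower() in text for keyword in keywords)
    return in_range and has_keyword


# Illustrative values only; not the criteria used in any cited study.
KEYWORDS = ["harvey", "flood", "rescue"]
START, END = date(2017, 8, 25), date(2017, 9, 5)

posts = [
    Post("Need rescue, water rising fast", date(2017, 8, 27)),
    Post("Great tacos downtown", date(2017, 8, 27)),
    Post("Remembering the flood a year on", date(2018, 8, 27)),
]
selected = [p for p in posts if meets_inclusion_criteria(p, KEYWORDS, START, END)]
```

A filter like this keeps anything that matches a keyword in the window, which is one reason keyword-based collections tend to contain a lot of noise.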
Though current state-of-the-art methods can classify whether content is relevant to a disaster, their accuracy leaves much room for improvement.
Little work has involved actually deploying teams in the field during a disaster to assess how social media is used, or collecting the data that circulates on private social media networks, particularly Facebook and Nextdoor, and that is more indicative of a disaster experience.
Few crisis informatics studies have trained on highly curated non-public social media data gathered from the field. The graphic below summarizes our literature review findings and illustrates why “signal” social media data (private messages) is the focus of our project.
Customer journey map
I later was able to use interview information to create a user journey map and competitive analysis for the purposes of explaining our research to a broader audience.
A person's journey from realizing they need help to actually receiving it during a hurricane is an emotional ride, made worse by 911 hotline failures.
Residents turn to social media and private texting, sharing multimedia about their most desperate needs.
Data Collection \\ Data Summary \\ Data Processing \\ Pre-test \\ Training \\ Evaluation
Data Collection: Post Harvey Fieldwork
Our classifier relies on signal data, gathered in Houston by field workers after Hurricane Harvey. This differs from other projects, which typically scrape data from the Twitter 1% spritzer.
Trained field workers visited Houston in the months following Hurricane Harvey. Their goal was to conduct interviews and gather training data for our model.
During each of these site visits, a member of the research team interviewed individuals following an approved Institutional Review Board (IRB) protocol, which involved the method of photo elicitation interview (PEI), in which respondents were asked to contribute their social media activity to the research team.
These included photos and videos taken during Hurricane Harvey, as well as text that respondents posted or received as part of their rescue experience or while conducting rescues.
When the respondent consented, comments were also captured in screenshots and shared with the research team. As interviews lasted from 15 minutes to more than an hour, there were multiple opportunities to collect these types of “private” data, which would be inaccessible to those acquiring data from public APIs.
Data collected in the form of screenshots were deposited in a central secure repository. Fieldworkers consisted of a team of trained graduate students and faculty at a public university, from multiple disciplines.
The Training Data
Field workers gathered multiple forms of media. Below is a summary of the types of media gathered from participants who were in one of three categories: rescuees, official rescuers, and volunteers.
After several pretests, we chose to focus our classifier on images (rather than images and text). We fed our images through Google Vision, which proved to be more reliable than having humans tag attributes in the images.
The first order of business was to redact personal information, such as names and profile pictures, in text and images. We then cleaned up the image dataset by cropping images to the relevant content (e.g. cropping out the cell phone).
Images were then run through Google Vision in a streamlined attribute-detection process. Google Vision identified attributes in each image (e.g. water, flood, and boat) and returned them as a structured JSON list. We then chose to focus the first machine learning models only on image data.
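To make the attribute step concrete, here is a minimal sketch of how a JSON list of detected attributes could be turned into model input. The response shape is an assumption modeled on the Vision REST API's label detection output, the scores are invented, and the binary encoding and vocabulary are hypothetical; the project's actual feature encoding is not described here.

```python
import json

# Assumed response shape, modeled on the Vision REST API's label detection
# output. The labels and scores are invented for illustration.
raw = '''
{"responses": [{"labelAnnotations": [
  {"description": "Water", "score": 0.97},
  {"description": "Flood", "score": 0.93},
  {"description": "Boat", "score": 0.71}
]}]}
'''


def extract_attributes(response_json, min_score=0.5):
    """Return lowercase label descriptions whose score is at least min_score."""
    data = json.loads(response_json)
    labels = []
    for response in data.get("responses", []):
        for annotation in response.get("labelAnnotations", []):
            if annotation.get("score", 0.0) >= min_score:
                labels.append(annotation["description"].lower())
    return labels


def to_feature_vector(attributes, vocabulary):
    """Encode attributes as a binary vector over a fixed vocabulary."""
    present = set(attributes)
    return [1 if term in present else 0 for term in vocabulary]


attributes = extract_attributes(raw)
vocabulary = ["boat", "car", "flood", "water"]  # hypothetical vocabulary
vector = to_feature_vector(attributes, vocabulary)
```

The `min_score` cutoff is a design choice in this sketch: raising it drops low-confidence attributes at the cost of sparser vectors.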
Pre-test: Human or machine labels?
Can a machine identify features in an image as well as a human? We didn't know, so we decided to test this. In addition to using Google Vision's machine coding to gather attributes, we tested whether humans would better label the images with attributes.
Similar to the Google Vision API, the human codebook allowed coders to record manifest attributes found in the media (e.g. car, house, and water), in addition to latent attributes (e.g. phenomenon, disaster), without any restrictions such as a predefined dictionary.
We found that the human coders provided fewer attributes, misinterpreted some attributes, and were potentially biased because they were aware they were coding images related to Hurricane Harvey. Based on the results of a pre-test comparing the model's accuracy when trained on machine versus human attributes, we chose to rely on Google Vision attributes.
Training
Using the images that we gathered and processed, we first created a model that can differentiate between signal and noise images. We also have preliminary findings on differentiating between our three categories of images: rescuee, official rescuer, and volunteer rescuer.
The next phase of the project involved the development of a classifier, whose aim was to classify content at scale from noisy data that was relevant to Hurricane Harvey. In other words, our methodology goes from signal-to-noise, rather than from noise-to-signal. By starting out with a high quality, fieldwork-elicited training data set, our hope was to develop a classifier with very high accuracy.
This model differentiates between relevant signal data and spurious noise data. The performance of the signal-to-noise (SNR) classifier is assessed by its stacked accuracy, built on 8 base classifiers, and by an F1 score. An F1 score is the harmonic mean of precision and recall, so it penalizes both false positives and false negatives.
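As a quick illustration of the metric, the sketch below computes an F1 score from true and predicted labels, combining precision and recall through their harmonic mean. The labels are toy values, not results from our classifier, and the function name is our own for this example.

```python
def f1_score(true_labels, predicted_labels, positive="signal"):
    """Compute F1 for the positive class from paired label sequences."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1  # correctly flagged as positive
        elif predicted == positive:
            fp += 1  # flagged as positive, actually negative
        elif truth == positive:
            fn += 1  # missed positive
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
truth = ["signal", "signal", "signal", "noise", "noise"]
predicted = ["signal", "signal", "noise", "signal", "noise"]
score = f1_score(truth, predicted)
```

Here two of three predicted signals are correct and two of three true signals are found, so precision and recall are both 2/3 and the F1 score is 2/3 as well.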
The classification accuracy achieved is high for all signal-to-noise data sets, though notably stacked accuracy falls as SNR approaches a 1:1 ratio (shown above). To back the high accuracy achieved in initial testing, we performed a second experiment to visually represent our data. The eight features gathered from the base classifiers were projected onto a 2-dimensional scatter plot using singular value decomposition. The resulting graph showed a clear linear separation between signal and noise data points, reinforcing the high accuracy of an SVM classification model.
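One way to build evaluation sets at different signal-to-noise ratios is sketched below. It assumes the ratio refers to the number of signal items relative to noise items in a set; the helper names, the item identifiers, and the toy classifier are hypothetical and stand in for real images and a trained model.

```python
import random


def build_mixed_set(signal_items, noise_items, signal_parts, noise_parts, seed=0):
    """Return shuffled (item, label) pairs at a signal_parts:noise_parts ratio.

    Uses every signal item and samples the matching number of noise items.
    """
    n_noise = len(signal_items) * noise_parts // signal_parts
    if n_noise > len(noise_items):
        raise ValueError("not enough noise items for the requested ratio")
    rng = random.Random(seed)
    sampled_noise = rng.sample(noise_items, n_noise)
    mixed = [(item, "signal") for item in signal_items]
    mixed += [(item, "noise") for item in sampled_noise]
    rng.shuffle(mixed)
    return mixed


def accuracy(classifier, labeled_items):
    """Fraction of items for which the classifier returns the true label."""
    correct = sum(1 for item, label in labeled_items if classifier(item) == label)
    return correct / len(labeled_items)


def toy_classifier(item):
    """Placeholder rule standing in for a trained model."""
    return "signal" if item.startswith("signal") else "noise"


signal = [f"signal_{i}" for i in range(10)]
noise = [f"noise_{i}" for i in range(100)]
mixed = build_mixed_set(signal, noise, signal_parts=1, noise_parts=5)
toy_accuracy = accuracy(toy_classifier, mixed)
```

Repeating this for several ratios and plotting accuracy against the ratio would give a curve of the kind described above.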
Preliminary results of an additional model trained to classify signal data by image type indicate a 99% stacked accuracy for a threshold between 0.00 and 0.005. This model differentiates between images representing rescuees (people who were in need of rescue at the time of Hurricane Harvey) and rescuers (people who took part in rescue efforts at the time of Hurricane Harvey).
There is still room for improvement in this classifier. By introducing different types of data and by testing it on noise data from other contexts, accuracy can be improved.
This is phase one of a crisis communication machine learning project. In ongoing research, we are integrating text data into the training model, with the goal of adding context to the natural disaster data. This will improve the model's classification ability when faced with confounding social media imagery (e.g. lakes, rivers, and weather reports). We plan to conduct further noisy data tests using confounding social media imagery pulled from Hurricane Harvey-related tweets.
This is an ongoing project! There are many improvements to be made on the classifier and there are also many ways this can be applied on a national level.
The model developed in our lab has potential application as an alternative emergency response system, highlighting groups or individuals who are potential rescuers or who are in need of rescuing. There are many useful applications for integrating this project into existing social media platforms. For example, in times of disaster, bots could continuously surf these social media sites and scrape data to be passed to the model. Once passed to the model, content from social media sites can be flagged as either signal or noise, and if signal, as rescuer or rescuee. This model would perform best as a feature on a social media platform such as Facebook. We would recommend embedding the tool in messaging or private groups so that it can use signal data.
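The two-stage flagging described above can be sketched as a small cascade. The function and stub names are hypothetical, and the stub rules are placeholders for trained models rather than anything our classifier actually does.

```python
def flag_content(attributes, is_signal, classify_role):
    """Return "noise", or the role label assigned to a signal item."""
    if not is_signal(attributes):
        return "noise"
    return classify_role(attributes)


def stub_is_signal(attributes):
    """Placeholder for the signal-versus-noise model."""
    return "flood" in attributes


def stub_classify_role(attributes):
    """Placeholder for the rescuer-versus-rescuee model."""
    return "rescuer" if "boat" in attributes else "rescuee"


# Each item is the set of attributes detected in one image.
incoming = [{"flood", "boat"}, {"lake", "sunset"}, {"flood", "roof"}]
flags = [flag_content(item, stub_is_signal, stub_classify_role) for item in incoming]
```

Running the noise filter first means the role model only ever sees content already judged relevant, which mirrors the order in which the two models were developed.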
To date, we have presented our research to Facebook, and there is potential to apply this classifier to Facebook groups or messengers, with the permission of the users.
Future work will concern the integration of this classifier with automated data collection directly from private and public social media streams.
This research is being presented at IEEE ICMLA in December and the published paper will be made available here soon.