Masked Contexts

About

Masked Contexts is an exploration of the COCO dataset and a dialogue with the photographers whose images were scraped by the dataset's authors without their knowledge or consent. Email correspondence is paired with the photographers' original images, the segmentation masks and captions those images have been transformed into, and the photographers' possible reappropriations of their photos. Together, these layers present a multi-layered glimpse into the process of converting personal images into universal pieces of big data.

Originally published by Microsoft, the COCO (Common Objects in Context) dataset contains 330,000 photos scraped from Flickr, many of them intimate family photos uploaded by amateur photographers for personal use. Image datasets like COCO are often used to train surveillance technologies, among other computer vision programs. In 2019, I began contacting Flickr users to inform them that their images had been scraped for this purpose, and documentation of those conversations is displayed on this site alongside the photographers' images. Parts of these conversations have been lost as a result of Flickr deleting some of my accounts.

Masked Contexts is a project by Noah Edelstein.

Process

While the COCO dataset provides metadata about each image, including its original Flickr URL, capture date, and license, no credit or reference is made to the original uploader.
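For illustration, the sketch below reads one image record out of a COCO annotation file. The field names are COCO's own; the file path and the values shown in the comment are assumptions for the example.

```python
import json

# Load one image record from a COCO annotation file (path is an
# assumption; the 2017 files are distributed at cocodataset.org).
with open("annotations/instances_val2017.json") as f:
    record = json.load(f)["images"][0]

print(record)
# A record carries technical metadata and a bare static-image URL,
# but nothing that names or links to the uploader. Roughly (values invented):
# {'license': 3, 'file_name': '000000391895.jpg',
#  'coco_url': 'http://images.cocodataset.org/val2017/000000391895.jpg',
#  'height': 360, 'width': 640, 'date_captured': '2013-11-14 11:18:45',
#  'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
#  'id': 391895}
```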

The dataset was reverse-engineered with a custom script to recover additional metadata, including the Flickr username associated with each image. In total, 365 of these Flickr users were contacted and informed that their photos were part of the dataset. They were also asked to provide the original context of their photos, as well as their feelings about their images being used in a dataset like COCO.
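The project's actual script isn't reproduced here, but one plausible route runs through the flickr_url field: the photo ID embedded in Flickr's static-image URLs can be passed to the public flickr.photos.getInfo API method, whose response names the owner. A minimal sketch under those assumptions (the API key is a placeholder):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_KEY = "YOUR_FLICKR_API_KEY"  # placeholder; requires a free Flickr key

def photo_id_from_static_url(flickr_url: str) -> str:
    # Static URLs look like .../8186/8119368305_4e622c8349_z.jpg,
    # where the first underscore-separated token is the photo ID.
    return flickr_url.rsplit("/", 1)[-1].split("_")[0]

def lookup_owner(flickr_url: str) -> dict:
    params = urlencode({
        "method": "flickr.photos.getInfo",
        "api_key": API_KEY,
        "photo_id": photo_id_from_static_url(flickr_url),
        "format": "json",
        "nojsoncallback": 1,
    })
    with urlopen(f"https://api.flickr.com/services/rest/?{params}") as resp:
        data = json.load(resp)
    return data["photo"]["owner"]  # includes 'username' and 'nsid'

# e.g. lookup_owner(record["flickr_url"])["username"]
```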

The COCO metadata also provides annotations for each image in two forms: masks and captions. Both were produced by workers hired through Amazon Mechanical Turk. The masks outline particular object types found in each image, while the captions are meant to objectively describe each image; five captions are provided per image. These captions are used to train computer vision programs to analyze and describe visual inputs. Masked Contexts showcases these annotations juxtaposed with the image authors' responses. The masks are additionally organized as a grid of meaningless visuals; hovering over a mask triggers its corresponding captions to be read aloud.
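Both annotation types can be read with pycocotools, the library COCO itself distributes. The sketch below loads the masks and the captions for a single image; the file paths are assumptions.

```python
from pycocotools.coco import COCO

coco_inst = COCO("annotations/instances_val2017.json")  # masks (assumed path)
coco_caps = COCO("annotations/captions_val2017.json")   # captions (assumed path)

img_id = coco_inst.getImgIds()[0]

# Each instance annotation outlines one object of a named category;
# annToMask() rasterizes its polygon into a binary height-by-width array.
for ann in coco_inst.loadAnns(coco_inst.getAnnIds(imgIds=img_id)):
    name = coco_inst.loadCats(ann["category_id"])[0]["name"]
    mask = coco_inst.annToMask(ann)
    print(name, int(mask.sum()), "pixels")

# The five captions written for the same image.
for ann in coco_caps.loadAnns(coco_caps.getAnnIds(imgIds=img_id)):
    print(ann["caption"])
```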