ABOUT mPROV

Thanks to rapid growth in sensors (embedded in phones, wearables, vehicles, and environments), the quantity and richness of health-related sensor data are exploding. By continuously capturing human health, behavior, and environmental risk factors at fine granularity, such data hold tremendous potential to advance science and to directly impact health, wellness, mobility/transportation, and energy. Health is a focal application area for sensors due to the confluence of rapidly dropping gene sequencing costs and the wide availability of health records, which together enable a “precision medicine” approach to discovery and care.

The need for computational models of human health and behavior, combined with the uncertainty and variable quality of sensor data collected in the mobile setting, motivates our proposed work to enable the study of such data within the computing/CISE research community, which has been slow to make progress on these research challenges. Major hurdles include lack of access to high-quality mobile sensor data, regulatory obligations in accessing and using mobile sensor data collected from humans, and a lack of metadata capture and access services for the provenance, quality, and integrity of the data and the inferences made from it. Doing research with mobile sensor data requires investigators to acquire sensors, design user studies, obtain IRB approval, recruit human subjects, conduct the studies, collect data, annotate and curate the data, develop computational models to process the data, and evaluate the models.

To overcome the hurdles in using mobile sensor data collected in the context of health and wellness, mProv will develop a data cyberinfrastructure that addresses the unique aspects of mobile sensor data to facilitate their use and analysis by researchers in computing, engineering, and other disciplines. This includes developing techniques for integrating metadata and data capture over mobile streaming data, and for propagating such data in order to enable reasoning about uncertainty and variability; runtime infrastructure and APIs for efficient sensor data acquisition and replay (integrated with human data capture); and mechanisms for managing privacy policies. To seed the entire effort with real data, we will use data collected in ongoing user studies in the MD2K Center of Excellence and collaborate with the Open Humans project to recruit individuals who are willing to donate their sensor data for research use without restriction. Finally, metadata for mobile sensors that collect data in users’ natural environments may itself be privacy sensitive. We will investigate mechanisms that protect the privacy of data contributors while still facilitating research with their data and associated metadata.