The Mathematics of Privacy and Synthetic Data

Speaker:

Thomas Strohmer, University of California, Davis

Date and Time:

Thursday, May 19, 2022 - 11:20am to 12:10pm

Location:

online

Abstract:

'Sharing is Caring', we are taught. However, in the Age of Surveillance Capitalism, we had better think twice about what we share. As data sharing is increasingly locking horns with data-privacy concerns, synthetic data are gaining traction as a potential solution to the aporetic conflict between privacy and utility. The goal of synthetic data is to preserve meaningful statistical information about the dataset, but without risk of exposing private information. Synthetic data are expected to have great potential in areas such as health care, where patient data are protected by privacy laws. But can we even construct synthetic data that are simultaneously private and accurate? And what do privacy and accuracy actually mean in this context? Trying to answer these questions leads to deep mathematical challenges, as the road to privacy is paved with NP-hard problems! I will introduce various mathematical concepts of privacy and utility and discuss associated privacy-utility tradeoffs. I will then present some of our recent breakthroughs in the NP-hard challenge of the computationally efficient creation of synthetic data that come with provable privacy and utility guarantees. I will describe applications and open problems. This is joint work with March Boedihardjo and Roman Vershynin.

The Fields Institute for
Research in Mathematical Sciences
