Towards Privacy-Preserving Data Cleaning
Data quality has become a pervasive challenge for organizations as they wrangle large, heterogeneous datasets to extract value. Given the proliferation of sensitive and confidential information, it is crucial to consider data privacy during the data cleaning process. For example, in medical database applications, different levels of privacy are enforced for different attributes. Attributes such as a patient's country or city of residence are less sensitive than the patient's prescribed medication. Traditional data cleaning techniques assume the data is openly accessible, without considering the differing levels of information sensitivity. In this talk, I will present our framework based on k-anonymity, which allows two parties to exchange information without violating the k-anonymity guarantee. The goal is to maximize data utility and consistency while minimizing the disclosure of sensitive values.
--
Fei Chiang is an Assistant Professor in the Department of Computing and Software, and has over 15 years of experience in data management spanning academic and industry roles, including serving as Associate Director of McMaster's MacData Institute. She leads the Data Science Research Group, which focuses on developing tools to facilitate data cleaning, improve data quality, and foster knowledge discovery. She is a Faculty Fellow at the IBM Centre for Advanced Studies, where she is the PI on a project to develop data quality metrics for IBM Watson Analytics. Her recent work has been featured in McMaster Research News and the SOSCIP 2017 Impact Report. She received her M.Math from the University of Waterloo, and her B.Sc. and PhD degrees from the University of Toronto, all in Computer Science.