Big data's dirty secret
"Let the data speak for themselves."
"We apply machine learning to the problem of..."
These two phrases are commonly heard these days. But what data, exactly, are we talking about, and what do we intend to do with it? What is ignored all too often is the quality of the data being used and how it affects the analyses being done. Are there holes in the data? Are there anomalies? Given how dirty data can be, a more apt phrase might be "Garbage in, garbage out." In this talk we will discuss some of the data problems we've encountered in financial data, and approaches that can be used to address them. Our particular focus will be on techniques we've employed to deal with missing data and bad data in credit default swap (CDS) spread histories.
Bio: Dr. Harvey J. Stein is Head of the Quantitative Risk Analytics Group at Bloomberg, responsible for all quantitative aspects of Bloomberg's risk analysis products. Dr. Stein is well known in the industry, having published and lectured on mortgage-backed security valuation, CVA calculations, interest rate and FX modeling, credit exposure calculations, financial regulation, and other subjects. Dr. Stein is also on the board of directors of the IAQF, an adjunct professor at Columbia University, a board member of the Rutgers University Mathematical Finance program and of the NYU Enterprise Learning program, and organizer of the IAQF/Thalesians financial seminar series. He received his BA in mathematics from WPI in 1982 and his PhD in mathematics from UC Berkeley in 1991.