Big Data Information Extraction: Extracting Product Information from Massive Free-Form Conversations
Information extraction has moved into the big data era, which brings unique challenges (massiveness and noisiness) and advantages (corpus statistics and data redundancy). Identifying product names in text is important in business settings for understanding marketing needs and revenue growth. We identify product names in massive free-form conversational text and classify (type) them against a product catalogue. We implement three approaches to the problem: supervised (with crowd-annotated training data), semi-supervised (using patterns), and unsupervised (with corpus statistics). First, our existing supervised models show that, in two specific domains, a deep learning model outperforms sequential graphical models and can achieve top results. We also present preliminary findings from our semi-supervised and unsupervised methods. While supervised methods can achieve top results, we need a method that can be repeated in another domain with little human effort and insight. We present a preliminary unsupervised method that leverages the vastness and redundancy of big data.
En-Shiun Annie Lee is the Lead Research Scientist at VerticalScope Inc., one of the most highly visited networks of online forums, consisting of approximately 1500 sites representing social communities. Dr. Lee holds a PhD from the Centre of Pattern Analysis and Machine Intelligence at the University of Waterloo and has over 12 years of experience in pattern recognition and data mining (academic and industry combined). Her passion for finding patterns in society and in nature in the big data era has led to dozens of publications in computational advertising, sentiment analysis, and sequence analysis. More notably, Dr. Lee developed unsupervised pattern-based algorithms using clustering and partitioning of raw data and a priori knowledge. Dr. Lee joined the Data Science team at VerticalScope as a Research Scientist in 2015, where she currently leads the in-house academic research and novel algorithm implementations. During her time at VerticalScope, Dr. Lee has advanced the project on sentiment analysis of product features, which generates key reporting insights for brand health and site health.