Jalal Mahmud, Manager of Speech and NLP Products, Master Inventor, IBM Almaden Research Center.
Title: Data Wrangling for Natural Language Processing
Abstract: Data wrangling is a crucial step for any NLP application. Engineers spend enormous amounts of time processing data before they can start building the machine learning models that form the backbone of most NLP projects. It is important to be able to deal with messy data, whether that means missing values, inconsistent formatting, malformed records, or nonsensical outliers. This tutorial demonstrates state-of-the-art and industry-standard data wrangling techniques for NLP applications.
Bio: Jalal Mahmud is a research scientist, master inventor and tech lead at IBM Almaden Research Center. He currently manages teams developing NLP and speech technologies for IBM Watson. Previously, he drove the development of IBM Watson Sentiment across multiple languages and accelerated innovation in several natural language understanding products, including Sentiment, Entity, Keyword and Categories. Before that, he served as a technical lead for several Watson products such as Personality Insights, Tone Analyzer and emotion modeling. He received his PhD in Computer Science from Stony Brook University in 2008. Dr. Mahmud has published 80+ papers in top-rated conferences and journals, has received several best paper nominations, and regularly serves on technical program committees and organizing committees for major international conferences. He is a prolific inventor with 51 issued patents. Dr. Mahmud is a senior member of ACM and an adjunct faculty member at the University of California, Santa Cruz.