Building Domain-Specific Language Models

N-gram model, RNN, LSTM, AllenNLP

In this liveProject, you’ll step into the role of a natural language processing data scientist working for Stack Exchange. Stack Exchange runs a network of question-and-answer sites on diverse topics ranging from programming to cooking. Your boss wants you to create language models that are tuned to the particular vocabulary of different Stack Exchange sites. Language is domain specific: for example, an insurance company’s documents will use very different terminology than a post on a social media site. Because of this, off-the-shelf NLP models trained on generic text can be inaccurate for specialized domains. Your goal is to build a language model capable of query completion, text generation, and sentence selection for the domain-specific language of Cross Validated, the Stack Exchange site for statistics and machine learning. Challenges you will tackle include preparing your datasets, building and evaluating word-based n-gram language models, and building a character-based language model with AllenNLP.
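
To give a feel for what a word-based n-gram model does in query completion, here is a minimal sketch of a bigram (2-gram) model in plain Python. It is not the project's solution: the function names `train_bigram_model` and `complete_query` and the three-sentence toy corpus are hypothetical, chosen only for illustration, and a real model would be trained on actual Cross Validated posts with proper tokenization and smoothing.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def complete_query(model, query, max_words=3):
    """Greedily extend the query with the most frequent next word."""
    tokens = query.lower().split()
    if not tokens:
        return ""
    for _ in range(max_words):
        candidates = model.get(tokens[-1])
        if not candidates:  # last word never seen in training: stop
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical toy corpus standing in for Cross Validated posts.
corpus = [
    "the standard deviation of the sample",
    "the standard error of the mean",
    "the standard deviation is large",
]
model = train_bigram_model(corpus)
print(complete_query(model, "compute the standard", max_words=1))
# prints "compute the standard deviation"
```

In this toy corpus, "deviation" follows "standard" twice and "error" only once, so the model completes the query with "deviation". A model trained on generic text would have no reason to prefer statistical terms, which is the motivation for domain-specific training.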

