NLP How To List - Part I - Language Model

A knowledge graph for common NLP tasks

Posted by Wanxin on Sunday, February 14, 2021

Over the past couple of years, the NLP field has developed quickly. There are now standard NLP tasks with go-to solutions. I have summarized them in the knowledge graph below by timeline (I LOVE knowledge graphs). It’s always exciting to see the whole journey and evolution of a field. Let’s go!

Language model

Language modeling is the task of predicting what word comes next. More formally, given a sequence of words, we want to compute the probability distribution of the next word. It measures how plausible a sequence is compared to the training text corpora. Such a system is called a Language Model.

1. Pre-deep learning

  • 1.1 Statistical Modeling

The very first approach is n-gram-based statistical modeling plus Bayes' rule. It collects co-occurrence counts from the training corpus and uses them to estimate the probability of each co-occurrence in a test sentence. This showed great results in the early days.

However, it has its limitations: the counts are highly sparse (most n-grams never appear in training), and storing them requires a lot of memory.
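The counting idea can be sketched in a few lines. This is a toy bigram (n = 2) model with maximum-likelihood estimates and no smoothing; the corpus and function names are made up for illustration:

```python
from collections import defaultdict

# Toy corpus; counting co-occurrences is the only "training" an n-gram model needs.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigram_counts = defaultdict(lambda: defaultdict(int))
unigram_counts = defaultdict(int)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1
    unigram_counts[w1] += 1

def bigram_prob(w1, w2):
    """P(w2 | w1) estimated as count(w1, w2) / count(w1), no smoothing."""
    if unigram_counts[w1] == 0:
        return 0.0  # the sparsity problem: an unseen history gives no estimate
    return bigram_counts[w1][w2] / unigram_counts[w1]

print(bigram_prob("sat", "on"))   # 1.0 — "sat" is always followed by "on" here
print(bigram_prob("the", "cat"))  # 0.25
```

Note how any word pair absent from training gets probability zero — this is exactly the sparsity limitation above, which smoothing techniques were invented to patch.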

2. Deep Learning Era

  • 2.1 Fixed Window Neural Network Model

We started to use word embeddings, but we still model with a fixed window (given n words, predict the next). The limitation is still the fixed "n".
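A minimal sketch of the fixed-window idea, with randomly initialized (untrained) parameters and made-up sizes: the n word embeddings are concatenated into one input vector, so the window size is baked into the weight shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, n, h = 10, 8, 3, 16  # vocab size, embedding dim, window size, hidden size

# Randomly initialized parameters (illustrative only; a real model learns these).
E = rng.normal(size=(V, d))      # embedding matrix
W = rng.normal(size=(n * d, h))  # hidden weights: input size is fixed at n * d
U = rng.normal(size=(h, V))      # hidden-to-vocab projection

def next_word_probs(window):
    """Fixed-window LM: concat n embeddings -> hidden layer -> softmax."""
    x = E[window].reshape(-1)        # concatenate the n word embeddings
    hid = np.tanh(x @ W)
    logits = hid @ U
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()           # probability distribution over the vocab

p = next_word_probs([1, 4, 7])  # must be exactly n = 3 word ids
```

Because `W` has shape `(n * d, h)`, feeding more or fewer than n words simply doesn't fit — that is the "n" limitation in concrete form.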

  • 2.2 RNN

Introducing an RNN gives us the flexibility to model sentences of different lengths.
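The contrast with the fixed window can be sketched as follows (again with random, untrained parameters and made-up sizes): the same step weights are reused at every position, so nothing in the shapes depends on sequence length:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h = 10, 8, 16  # vocab size, embedding dim, hidden size

# Randomly initialized parameters (illustrative; a real model learns these).
E  = rng.normal(size=(V, d))  # embedding matrix
Wx = rng.normal(size=(d, h))  # input-to-hidden weights
Wh = rng.normal(size=(h, h))  # hidden-to-hidden weights, reused every step
U  = rng.normal(size=(h, V))  # hidden-to-vocab projection

def next_word_probs(word_ids):
    """RNN LM: one hidden state carried across steps -> any length works."""
    hid = np.zeros(h)
    for w in word_ids:                   # one recurrence step per word
        hid = np.tanh(E[w] @ Wx + hid @ Wh)
    logits = hid @ U
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()               # softmax over the vocabulary

short = next_word_probs([1, 4])                 # 2-word context
long  = next_word_probs([1, 4, 7, 2, 9, 0])     # 6-word context, same model
```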

There are also standard RNN architectures to use on different tasks. Please check out the blog here for more details.

3. Evaluation

  • 3.1 Perplexity

The standard evaluation metric for Language models is perplexity. Perplexity is actually the exponential of the cross-entropy loss. The lower the perplexity/loss, the better the model is.
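The relationship between the two quantities is easy to verify numerically. The per-word probabilities below are made up for illustration:

```python
import math

# Hypothetical per-word probabilities a model assigned to a held-out sentence.
probs = [0.2, 0.1, 0.5, 0.25]

# Cross-entropy: average negative log-probability per word...
cross_entropy = -sum(math.log(p) for p in probs) / len(probs)
# ...and perplexity is its exponential.
perplexity = math.exp(cross_entropy)

print(round(perplexity, 3))  # 4.472 — roughly as uncertain as a uniform
                             # choice among ~4.5 words at each step
```

Equivalently, perplexity is the inverse geometric mean of the assigned probabilities, which is why a lower value means the model found the text less "surprising".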

  • 3.2 A subcomponent of many other NLP tasks

Other NLP tasks like sentiment analysis and machine translation all depend on the model having a good understanding of language. Therefore, language modeling is a subcomponent of many other NLP tasks, especially those involving generating text or estimating the probability of text. As a result, language models can also be evaluated through downstream tasks such as sentiment analysis.

  • NLP Tasks
    • Word Representation
      • one-hot vector
      • dictionary representation (WordNet)
      • one-hot vector + SVD-based dimensionality reduction
        • D-W Matrix (primarily used for word similarity)
          • LSA/HAL/COALS etc.
        • co-occurrence Matrix
      • context-based distributional semantics: word embeddings / vector representations
        • word2vec: predicting context (2013)
          • 2 frameworks:
            • skip-gram: center words –> outside words
            • CBOW : outside words –> center words
          • training variants:
            • negative sampling (for efficiency)
            • softmax normalization
            • gradient descent
        • GloVe: combining the two methods: count-based & prediction-loss iteration
        • Evaluation of word vectors
          • intrinsic
          • extrinsic
    • Dependency Parsing
      • Two linguistic structures for dependency parsing
        • Structure 1: constituency: context-free grammars (CFGs)
          • phrase structure: words –> nested constituents
        • Structure 2: dependency structure
          • head –> dependent: which words depend on which other words
      • Conventional feature representation
      • Neural Dependency Parser [2014]
        • concat (words + pos + dep) – NN – get vec representation
      • graph-based dependency parser with NN
    • Language Model
      • pre- deep learning
        • n-gram statistical modeling + Bayes' rule (occurrence-count based)
        • SLM (statistical LM for Information Retrieval)
      • Neural Network based
        • Fixed Window-based
        • RNN model
          • conditional models: for context domain only
          • RNN applications
            • part-of-speech tagging
            • sentence classification
            • speech recognition
      • Evaluation
        • perplexity
        • a subcomponent of many NLP tasks