RNNs

A knowledge graph for RNN

Posted by Wanxin on Saturday, February 13, 2021

RNN and variants

Some popular RNN architectures and techniques to try out.

Vanishing/Exploding Gradient

Vanishing gradients are a common problem for all neural architectures, especially deep ones. RNNs suffer from it particularly badly because the same recurrent weight is multiplied in at every timestep. The gradient can be viewed as a measure of the effect of the past on the future: if it becomes vanishingly small over long distances, the network cannot learn long-range dependencies. Exploding gradients are the same mechanism in the opposite direction, and they mainly cause numerical problems during training (overflowing updates, NaN losses).
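
As a tiny illustration of why this happens, backpropagation through time picks up one factor of the recurrent weight per step, so the gradient scale behaves roughly like w**T. The toy loop below (plain Python, with numbers I made up) shows how quickly that factor vanishes or explodes over 50 steps:

```python
# Toy illustration: backprop through time multiplies the gradient by the
# same recurrent weight at every step, so its scale behaves like w ** T.
for w in (0.5, 1.5):                   # recurrent weight below vs. above 1
    grad = 1.0
    for t in range(50):                # 50 timesteps
        grad *= w                      # one multiplication per step
    print(f"w={w}: gradient factor after 50 steps = {grad:.3e}")

# w=0.5 -> ~8.9e-16  (vanishes)
# w=1.5 -> ~6.4e+08  (explodes)
```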

To fix these two issues, we need to help the RNN preserve information over long timesteps. Common techniques that reduce the vanishing/exploding gradient problem include gradient clipping and skip connections.
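
As a concrete example, here is a minimal sketch of gradient clipping inside one training step. PyTorch is my own choice of framework for illustration (the post does not name one), and the model, data, and loss are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data just to demonstrate the clipping call.
model = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 100, 32)              # (batch, seq_len, features), dummy input

output, h_n = model(x)
loss = output.pow(2).mean()              # dummy loss, only to produce gradients

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their global norm is at most 1.0 before the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Clipping only caps exploding gradients; it does nothing for vanishing ones, which is where skip connections and the gated architectures below come in.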

In addition, there are popular neural architectures designed around these issues. The two most widely used solutions are LSTM and GRU.

LSTM

LSTM is the most popular of them. It was proposed by Hochreiter and Schmidhuber in 1997. LSTM is a good default choice when your data has long-range dependencies.

Want to read more? Check out the details here.
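
For a quick feel of the interface, here is a minimal PyTorch sketch of running an LSTM over a batch of sequences; the framework and all sizes are my own illustrative assumptions:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

x = torch.randn(8, 100, 32)          # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)         # LSTM keeps a hidden state AND a cell state

print(output.shape)  # torch.Size([8, 100, 64]) -- hidden state at every timestep
print(h_n.shape)     # torch.Size([1, 8, 64])   -- final hidden state
print(c_n.shape)     # torch.Size([1, 8, 64])   -- final cell state (the long-term "memory")
```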

GRU

GRU is a simpler alternative to LSTM. It was proposed by Cho et al. in 2014. GRU has fewer parameters, so it is faster to compute.

GRU can be a second choice when LSTM is not efficient enough for you.
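
A small sketch of that trade-off in PyTorch (sizes are again my own illustrative choices): with the same input and hidden sizes, the GRU ends up with roughly three quarters of the LSTM's parameters, because it has one fewer gate and no separate cell state.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM params:", count(lstm))   # 25088 -- four gates' worth of weights
print("GRU  params:", count(gru))    # 18816 -- only three gates, no cell state

x = torch.randn(8, 100, 32)          # (batch, seq_len, features)
output, h_n = gru(x)                 # GRU returns just a hidden state, no cell state
```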

Bidirectional RNNs

A bidirectional RNN runs one RNN forward and one backward over the sequence and concatenates their hidden states, so each position sees both left and right context. This only works when the full input sequence is available at prediction time, e.g. for sentiment analysis, and not for tasks like language modeling.

  • RNN
    • LSTM [1997]
      • for vanishing gradients
      • a good default choice
    • GRU [2014]
      • for vanishing gradients
      • quicker to compute and more efficient
    • Gradient Clipping
      • for exploding gradients
    • Bidirectional RNNs
      • when the full context is available
      • e.g. sentiment analysis
    • Multi-layer RNN / stacked RNN (see the sketch after this list)
      • for higher performance
      • encoder RNN: 2-4 layers
      • decoder RNN: 4 layers
      • deeper RNNs (>= 8 layers)
        • skip connections / dense connections needed
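
Putting the last two items together, here is a sketch of a stacked, bidirectional LSTM encoder in PyTorch; the layer count and sizes are my own illustrative choices, loosely following the 2-4 encoder layers noted above:

```python
import torch
import torch.nn as nn

encoder = nn.LSTM(
    input_size=32,
    hidden_size=64,
    num_layers=3,          # stacked / multi-layer RNN
    bidirectional=True,    # one forward and one backward pass over the sequence
    dropout=0.2,           # dropout between stacked layers
    batch_first=True,
)

x = torch.randn(8, 100, 32)               # (batch, seq_len, features)
output, (h_n, c_n) = encoder(x)

print(output.shape)  # torch.Size([8, 100, 128]) -- forward and backward states concatenated
print(h_n.shape)     # torch.Size([6, 8, 64])    -- num_layers * num_directions
```

Note that nn.LSTM does not add skip connections between its stacked layers, so a much deeper stack (8 or more layers, as in the last bullet) would need residual or dense connections wired in by hand.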