How do you find the most probable sequence of POS tags from a sequence of text?

  1. This problem can be solved with a Hidden Markov Model (HMM), where the POS tags are the hidden states and the words are the observations.
  2. Using an HMM involves estimating the transition probabilities (the probability of going from one POS tag to the next) and the emission/output probabilities (the probability of observing a word given a POS tag), as explained in the question "How do you train an HMM?".
  3. Once the HMM has been trained on a large enough tagged corpus, use the Viterbi algorithm to find the most probable sequence of tags for a new sentence.
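
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production tagger: the function names `train_hmm` and `viterbi`, the toy corpus, and the choice of add-one smoothing (the `alpha` parameter) are all assumptions made for the example. Probabilities are estimated by counting over tagged sentences, and decoding is done in log space to avoid numerical underflow.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate start, transition and emission log-probabilities by counting.

    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    tags, vocab = set(), set()
    start_c = defaultdict(float)
    trans_c = defaultdict(lambda: defaultdict(float))
    emit_c = defaultdict(lambda: defaultdict(float))

    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            tags.add(tag)
            vocab.add(word)
            emit_c[tag][word] += 1
            if prev is None:
                start_c[tag] += 1
            else:
                trans_c[prev][tag] += 1
            prev = tag

    tags = sorted(tags)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_prob(count, total, n_outcomes):
        return math.log((count + alpha) / (total + alpha * n_outcomes))

    start_total = sum(start_c.values())
    start = {t: log_prob(start_c[t], start_total, n_tags) for t in tags}

    trans = {}
    for p in tags:
        total = sum(trans_c[p].values())
        trans[p] = {t: log_prob(trans_c[p][t], total, n_tags) for t in tags}

    def emit(tag, word):
        total = sum(emit_c[tag].values())
        return log_prob(emit_c[tag].get(word, 0.0), total, n_words)

    return tags, start, trans, emit


def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for words under the trained HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best path ending in tag t at position i
    best = [{t: start[t] + emit(t, words[0]) for t in tags}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + trans[p][t])
            scores[t] = best[-1][prev] + trans[prev][t] + emit(t, word)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy tagged corpus (made up for illustration).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], *model))  # ['DET', 'NOUN', 'VERB']
```

Note that the sentence "the cat barks" never appears in the toy corpus; the tagger still recovers the expected tags because the transition and emission estimates generalise across sentences.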

Interview Tip: It is tempting to jump straight to "the Viterbi algorithm" as the answer. Viterbi is indeed used, but it is not the starting point: you first train an HMM, and then apply the Viterbi algorithm with the learnt transition and emission matrices to find the POS tags.
