What is speaker segmentation in speech recognition? How do you use it?

Speaker diarization, or speaker segmentation, is the process of automatically assigning a speaker identity to each segment of an audio file. Segmenting by speaker is useful in applications that need to understand who said what in a conversation. Speaker information is typically crucial for applications such as emotion detection, behavioural analysis or topic analysis of…

What is a language model? How do you create one? Why do you need one?

A language model is a probability distribution over sequences of words, P(w_1, …, w_m). It lets us measure the relative likelihood of different phrases, which is useful in many NLP tasks such as speech recognition, machine translation, POS tagging and parsing. Example: In any generative…
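As a rough illustration of "measuring the relative likelihood of different phrases", here is a minimal sketch of a bigram language model with add-one smoothing. The toy corpus, the function names and the choice of a bigram approximation are assumptions made for this example, not something the original answer prescribes.

```python
from collections import Counter
import math

def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence boundaries
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Approximate log P(w_1, ..., w_m) as a sum of smoothed bigram log-probabilities."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Toy corpus (hypothetical data, for illustration only)
corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat chased the dog"]
unigrams, bigrams = train_bigram_model(corpus)

fluent = log_prob("the cat sat on the rug", unigrams, bigrams)
scrambled = log_prob("rug the on sat cat the", unigrams, bigrams)
print(fluent > scrambled)  # the well-formed word order scores higher
```

A speech recognizer or translation system could use such scores to prefer the more plausible of several candidate word sequences.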

What are some knowledge graphs you know? What is different between them?

DBpedia: Entities and relationships are automatically extracted from Wikipedia. WordNet: A lexical database of the English language. It groups English words into synsets and provides various relationships between words in a synset. It is a knowledge base that tracks specific kinds of relationships such as synonymy, antonymy, hyponymy and so on. http://wordnetcode.princeton.edu/5papers.pdf YAGO: Also extracts knowledge from…

What is the state-of-the-art technique for machine translation?

Rule-based machine translation (older technique): Uses a dictionary between the words of the two languages, along with syntactic, semantic and morphological analysis of the source sentence, to define context. Linguistic rules are defined to translate a specific word in a given context into the target language. https://en.wikipedia.org/wiki/Rule-based_machine_translation Advantages of this approach: No requirement of parallel corpora…

How do you design a system that reads a natural language question and retrieves the closest FAQ answer?

There are multiple approaches to FAQ-based question answering. Keyword-based search (information retrieval approach): Tag each question with keywords. Extract keywords from the query and retrieve all relevant question-answer pairs. Easy to scale with appropriate indexes (reverse indexing). Lexical matching approach: Word-level overlap between query and question. These approaches might be harder to…
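The keyword and lexical-matching ideas above can be sketched together: an inverted index narrows the candidates, and word-level overlap ranks them. This is a minimal sketch under assumptions; the stopword list, the Jaccard overlap score, the function names and the sample FAQs are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Small hand-picked stopword list (an assumption; real systems use larger lists)
STOPWORDS = {"a", "an", "the", "is", "do", "i", "my", "how", "what", "to", "can"}

def keywords(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = (w.strip("?.,!") for w in text.lower().split())
    return {w for w in words if w and w not in STOPWORDS}

def build_index(faqs):
    """Inverted index: keyword -> ids of FAQ entries whose question contains it."""
    index = defaultdict(set)
    for faq_id, (question, _answer) in enumerate(faqs):
        for word in keywords(question):
            index[word].add(faq_id)
    return index

def closest_faq(query, faqs, index):
    """Return the (question, answer) pair with the highest word overlap, or None."""
    query_words = keywords(query)
    # Only FAQs sharing at least one keyword with the query are scored
    candidates = set()
    for word in query_words:
        candidates |= index.get(word, set())
    best, best_score = None, 0.0
    for faq_id in candidates:
        question_words = keywords(faqs[faq_id][0])
        # Jaccard similarity: shared words divided by all distinct words
        score = len(query_words & question_words) / len(query_words | question_words)
        if score > best_score:
            best, best_score = faqs[faq_id], score
    return best

# Hypothetical FAQ data
faqs = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("How can I change my email address?", "Edit it under account settings."),
    ("What is the refund policy?", "Refunds are available within 30 days."),
]
index = build_index(faqs)
result = closest_faq("I forgot my password, how to reset it?", faqs, index)
print(result)
```

Because matching is purely lexical, a query phrased with different words than the stored question (for example "credentials" instead of "password") would find nothing, which is the usual motivation for moving to semantic approaches.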

How do you deal with dataset imbalance in a problem like spam filtering?

Class imbalance is a very common problem when applying ML algorithms. Spam filtering is one such application where class imbalance is apparent: there are many more non-spam emails in a typical inbox than spam emails. The following approaches can be used to address the class imbalance problem. Designing an asymmetric cost function where the cost…
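One possible reading of an asymmetric cost function is a log loss in which each class carries its own weight, so that mistakes on one class are penalized more heavily than mistakes on the other. The sketch below assumes this interpretation; the function name, the weight values and the toy labels are hypothetical, and which class deserves the larger weight depends on which error is more costly in the application.

```python
import math

def weighted_log_loss(y_true, y_prob, spam_weight=5.0, ham_weight=1.0):
    """Binary cross-entropy with a separate weight per class.

    y_true: 1 for spam, 0 for non-spam; y_prob: predicted probability of spam.
    The default weights are illustrative assumptions, not recommended values.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        if y == 1:
            total += -spam_weight * math.log(p)      # cost of missing spam
        else:
            total += -ham_weight * math.log(1 - p)   # cost of flagging non-spam
    return total / len(y_true)

# Imbalanced toy data: 8 non-spam emails, 2 spam emails
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_ham = [0.1] * 10             # a model that always leans towards non-spam
better = [0.1] * 8 + [0.8, 0.7]     # a model that also catches the spam

unweighted = weighted_log_loss(labels, always_ham, spam_weight=1.0)
weighted = weighted_log_loss(labels, always_ham)
weighted_better = weighted_log_loss(labels, better)
print(unweighted, weighted, weighted_better)
```

With equal weights, a model that mostly predicts the majority class looks acceptable; raising the minority-class weight makes the same predictions noticeably more expensive, which pushes training towards models that handle the rare class.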