Machine Learning – Page 13 – Machine Learning Interviews

What are some knowledge graphs you know? What are the differences between them?

Posted on February 14, 2019 by MLNerds

DBpedia: Entities and relationships are automatically extracted from Wikipedia. WordNet: A lexical database of the English language. It groups English words into synsets (sets of synonyms) and provides various relationships between words and synsets. It is a knowledge base that tracks specific kinds of relationships such as synonymy, antonymy, hyponymy and so on. http://wordnetcode.princeton.edu/5papers.pdf YAGO: Also extracts knowledge from…

How do you design a system that reads a natural language question and retrieves the closest FAQ answer?

Posted on February 14, 2019 (updated February 21, 2019) by MLNerds

There are multiple approaches to FAQ-based question answering. Keyword-based search (information retrieval approach): tag each question with keywords, extract keywords from the query, and retrieve all relevant question-answer pairs. This is easy to scale with appropriate indexes (inverted indexing). Lexical matching approach: word-level overlap between query and question. These approaches might be harder to…
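The lexical matching approach mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: Jaccard overlap is one possible word-level overlap score (the post does not name a specific one), and the `FAQS` data, `tokenize`, `jaccard` and `closest_faq` names are hypothetical.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,?!'\"").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def jaccard(a, b):
    """Word-level overlap: |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_faq(query, faqs):
    """Return the stored FAQ question with the highest overlap with the query."""
    q = tokenize(query)
    return max(faqs, key=lambda question: jaccard(q, tokenize(question)))


# Hypothetical FAQ data, for illustration only.
FAQS = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How do I change my shipping address?": "Edit it under Account > Addresses.",
}

best = closest_faq("reset password", FAQS)
print(best, "->", FAQS[best])
```

A linear scan like this is fine for a small FAQ. For a large one, an inverted index from token to questions would avoid scoring every entry.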

How do you deal with dataset imbalance in a problem like spam filtering?

Posted on February 14, 2019 (updated April 4, 2019) by MLNerds

Class imbalance is a very common problem when applying ML algorithms. Spam filtering is one such application where class imbalance is apparent: a typical inbox contains many more non-spam emails than spam emails. The following approaches can be used to address the class imbalance problem. Designing an asymmetric cost function where the cost…

You have come up with a spam classifier. How do you measure accuracy?

Posted on February 14, 2019 by MLNerds

Spam filtering is a classification problem. In a classification problem, the following are the common metrics used to measure efficacy. True positives: those data points where the predicted outcome is spam and the document is actually spam. True negatives: those data points where the predicted outcome is not spam and the document is actually not…
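As a rough sketch of how these counts are obtained, the snippet below tallies true positives and true negatives together with the two remaining confusion-matrix cells (false positives and false negatives, using their standard definitions) and derives accuracy from them. The labels `"spam"` and `"ham"` and the function names are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count (tp, fp, tn, fn), treating `positive` as the spam class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted spam, actually spam
            else:
                fp += 1  # predicted spam, actually not spam
        else:
            if actual == positive:
                fn += 1  # predicted not spam, actually spam
            else:
                tn += 1  # predicted not spam, actually not spam
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "ham"]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```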

What are the different ways of representing documents?

Posted on February 14, 2019 by MLNerds

Bag of words: Commonly called BOW, this involves creating a vocabulary of words and representing the document as a count vector whose dimension equals the vocabulary size, with each dimension holding the number of times a specific word occurred in the document. Sometimes, TF-IDF is used to reduce the number of dimensions/features by…
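The count-vector construction described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, which the post does not specify, and the helper names `build_vocabulary` and `bow_vector` are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct word to a fixed index (sorted for reproducibility)."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def bow_vector(document, vocab):
    """Count vector with one dimension per vocabulary word."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = n
    return vector


# Hypothetical corpus, for illustration only.
docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocabulary(docs)
for doc in docs:
    print(doc, "->", bow_vector(doc, vocab))
```

With a realistic vocabulary these vectors are long and mostly zero, so practical implementations usually store them in a sparse format.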

What are popular ways of dimensionality reduction in NLP tasks? Do you think this is even important?

Posted on February 14, 2019 by MLNerds

A common representation is bag of words, which is very high dimensional given a large vocabulary size. Commonly used ways of dimensionality reduction in NLP: TF-IDF: term frequency, inverse document frequency. Word2Vec / GloVe: these have become very popular recently. They are obtained by leveraging word co-occurrence, through an encoder –…

You are building a natural language search box for a website. How do you accommodate spelling errors?

Posted on February 14, 2019 by MLNerds

If you have a dictionary of words, edit distance is the simplest way of handling this. However, sometimes corrections based on context make sense. For instance, suppose I type “bed color shoes”: these are all valid dictionary words, but a sensible model would come up with “red color shoes”. Using the language model to come…
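A minimal sketch of the dictionary-plus-edit-distance idea follows. It assumes Levenshtein distance (insertions, deletions, substitutions) as the edit distance, and the `suggest` helper and the tiny word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary):
    """Return the dictionary word closest to `word` by edit distance."""
    return min(dictionary, key=lambda candidate: edit_distance(word, candidate))


# Hypothetical dictionary, for illustration only.
WORDS = ["bed", "red", "color", "shoes"]
print(suggest("shoos", WORDS))
print(suggest("bed", WORDS))
```

Note that `suggest("bed", WORDS)` returns `"bed"` unchanged because it is already a dictionary word, which is exactly the limitation that context-based correction is meant to address.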

What are knowledge graphs? When would you need a knowledge graph over, say, a database to store information?

Posted on February 14, 2019 by MLNerds

A knowledge graph organizes real-world knowledge as entities and relationships between entities. Creating a knowledge graph often involves scraping / ingesting unstructured data and creating structure out of it by extracting entities and relationships automatically. Examples of knowledge graphs are WordNet, DBpedia, etc. A database is also a knowledge graph in some sense. Since…

What are the advantages and disadvantages of using Naive Bayes for spam detection?

Posted on February 14, 2019 by MLNerds

Disadvantages: Naive Bayes is based on the assumption that features are conditionally independent, which is not valid in many real-world scenarios. Hence it sometimes oversimplifies the problem by treating features as independent and gives subpar performance. Advantages: However, Naive Bayes is very efficient. It is a model you can train in…

How can you increase the recall of search query results (on a search engine or e-commerce site) without changing the algorithm?

Posted on February 9, 2019 (updated March 9, 2019) by MLNerds

Since we are not allowed to change the algorithm, we can only modify or augment the search query. (Note: we can change either the algorithm/model or the data; here we can only change the data, that is, the search query.) Modifying the query in a way that we get results relevant to…
