🗣️ Natural Language Processing (NLP) Interview Questions and Answers (2025)
Basic Level Questions
What is Natural Language Processing (NLP)?▶
NLP is a branch of AI that enables computers to understand, interpret, and generate human language.
What are the main applications of NLP?▶
Applications include machine translation, sentiment analysis, chatbots, speech recognition, and text summarization.
What is tokenization in NLP?▶
Tokenization is splitting text into smaller units called tokens, such as words or subwords, for processing.
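For example, a minimal word-level tokenizer can be sketched in a few lines of Python (a toy regex tokenizer only; real pipelines normally use library tokenizers or subword schemes such as BPE/WordPiece):

```python
import re

def simple_tokenize(text: str) -> list[str]:
    # Toy tokenizer: lowercase the text and pull out runs of word characters.
    return re.findall(r"[a-z0-9']+", text.lower())

print(simple_tokenize("Tokenization splits text into smaller units."))
# ['tokenization', 'splits', 'text', 'into', 'smaller', 'units']
```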
What are stemming and lemmatization?▶
Stemming crudely chops words down to a root form by stripping suffixes; lemmatization reduces words to their dictionary form (lemma), taking context and part of speech into account.
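A quick illustration with NLTK (assuming the package is installed and the WordNet data has been downloaded via nltk.download("wordnet")):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                  # 'studi'  -> crude suffix stripping
print(lemmatizer.lemmatize("studies"))          # 'study'  -> valid dictionary form
print(stemmer.stem("better"))                   # 'better' -> unchanged
print(lemmatizer.lemmatize("better", pos="a"))  # 'good'   -> uses the part of speech
```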
What is a corpus in NLP?▶
A corpus is a large structured set of texts used for training and evaluating NLP models.
What is part-of-speech tagging?▶
Assigning word classes (noun, verb, adjective, etc.) to each token in a sentence.
What are word embeddings?▶
Dense vector representations of words capturing semantic relationships between words.
What is an n-gram?▶
An n-gram is a contiguous sequence of n items (usually words) from text used to predict or analyze language.
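Generating n-grams is straightforward once the text is tokenized; this sketch simply slides a window of size n over the tokens:

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```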
What is the bag-of-words model?▶
A simple text representation method counting word frequencies without considering word order.
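A small sketch with scikit-learn's CountVectorizer (assuming a recent scikit-learn is installed; the same matrix could be built by hand with collections.Counter):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)        # document-term count matrix
print(vectorizer.get_feature_names_out())   # vocabulary; word order is discarded
print(bow.toarray())                        # per-document word counts
```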
What is sentiment analysis?▶
Determining the sentiment or emotion expressed in text, such as positive, negative, or neutral.
Intermediate Level Questions
What is a language model?▶
A model that estimates the probability of a sequence of words, helping in prediction and text generation.
What are the differences between statistical and neural NLP?▶
Statistical NLP uses probabilistic models and hand-engineered features; neural NLP uses deep learning to learn features automatically.
Explain TF-IDF.▶
Term Frequency-Inverse Document Frequency measures importance of a word in a document relative to a corpus.
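The plain (unsmoothed) definition can be computed directly; note that libraries such as scikit-learn apply slightly different smoothing and normalization variants:

```python
import math

docs = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "cat", "meowed"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)         # term frequency in this document
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    idf = math.log(len(docs) / df)          # inverse document frequency
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # > 0: "cat" is informative for document 0
print(tf_idf("the", docs[0], docs))  # 0.0: "the" appears in every document
```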
What is word2vec?▶
A neural embedding method that generates vector representations of words based on their context.
What is the difference between CBOW and Skip-Gram in word2vec?▶
CBOW predicts the current word from surrounding context; Skip-Gram predicts surrounding words from the current word.
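Both variants can be trained in gensim by toggling a single flag (a rough sketch assuming gensim 4.x is installed; a real run would use a much larger corpus):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "log"]]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)  # sg=0 -> CBOW
skip = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> Skip-Gram

print(cbow.wv["cat"].shape)                 # (50,) dense vector for "cat"
print(skip.wv.most_similar("cat", topn=2))  # nearest neighbours in embedding space
```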
What are transformers?▶
A deep learning architecture that relies on self-attention instead of recurrence, so all tokens in a sequence can be processed in parallel, which speeds up training.
Explain self-attention mechanism.▶
Self-attention calculates weights between inputs to focus on relevant parts of the sequence for better context understanding.
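A minimal single-head sketch in NumPy (using the input directly as queries, keys, and values; a real transformer first applies learned Q/K/V projections):

```python
import numpy as np

def self_attention(X):
    Q, K, V = X, X, X                               # identity projections, for brevity
    d_k = X.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # context-mixed token representations

X = np.random.randn(4, 8)       # 4 tokens, 8-dimensional embeddings
print(self_attention(X).shape)  # (4, 8)
```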
What is BERT?▶
Bidirectional Encoder Representations from Transformers; a pre-trained model capturing deep bidirectional context for NLP tasks.
What is fine-tuning in NLP?▶
Adjusting a pre-trained model on a specific task by training on task-specific data for improved accuracy.
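A typical fine-tuning loop with the Hugging Face libraries looks roughly like this (the model name and dataset below are illustrative choices; both libraries must be installed with download access):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # binary sentiment classification
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small subset
)
trainer.train()  # updates the pre-trained weights on the labeled task data
```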
Explain sequence-to-sequence models.▶
Models that map input sequences to output sequences, commonly used in machine translation and summarization.
What is Named Entity Recognition (NER)?▶
Identifying and classifying named entities such as persons, locations, and organizations in text.
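For example, with spaCy (assuming the library and its small English model are installed via pip install spacy and python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple was founded by Steve Jobs in California in 1976.")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Apple', 'ORG'), ('Steve Jobs', 'PERSON'), ('California', 'GPE'), ('1976', 'DATE')]
```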
What are attention weights?▶
Probabilities assigned to different tokens indicating their relevance in the current context.
What is transfer learning?▶
Using a model trained on one task as the starting point for a related task to improve learning efficiency.
Explain perplexity in language models.▶
A measure of how well a language model predicts a sequence; lower perplexity indicates better prediction.
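Given the probabilities a model assigned to each token, perplexity is the exponential of the average negative log-probability; a toy calculation with made-up probabilities:

```python
import math

token_probs = [0.2, 0.1, 0.4, 0.25]  # hypothetical per-token probabilities from a model

avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(math.exp(avg_nll))  # ~4.7; a lower value means the model was less "surprised"
```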
What is token masking?▶
Hiding specific tokens during training so the model learns to predict them (used in BERT).
What is the difference between an RNN and a Transformer?▶
RNNs process sequences sequentially; transformers process all tokens in parallel with self-attention.
What is BLEU score?▶
A metric for evaluating the quality of machine-translated text compared to human reference translations.
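NLTK ships a reference implementation (this computes a sentence-level score for illustration; corpus-level BLEU is what is usually reported):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]  # human reference translation(s)
candidate = ["the", "cat", "sat", "on", "the", "mat"]   # machine translation

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))  # n-gram overlap with the reference, between 0 and 1
```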
Explain beam search decoding.▶
A search algorithm keeping top candidate sequences during decoding to find the most likely output in sequence generation.
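A bare-bones sketch over a hypothetical next-token distribution (a stand-in for a real language model's softmax output), keeping the two best partial sequences at each step:

```python
import math

next_probs = {                      # toy model: last token -> next-token probabilities
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "a":   {"cat": 0.7, "end": 0.3},
    "cat": {"end": 1.0},
    "dog": {"end": 1.0},
}

def beam_search(start="<s>", beam_width=2, steps=3):
    beams = [([start], 0.0)]        # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the top `beam_width` partial sequences at each step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())  # the highest-scoring sequences found under the beam constraint
```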
What is language model pretraining?▶
Training a language model on large corpora to learn language structure before fine-tuning on specific tasks.
What is GPT?▶
Generative Pre-trained Transformer; a decoder-only transformer model pre-trained to predict the next token, making it well suited to text generation.
How do word embeddings handle polysemy?▶
Static embeddings such as word2vec assign a single vector per word, so different senses are conflated; contextual embeddings like BERT's produce a different representation for each occurrence based on its surrounding context.
Advanced Level Questions
Explain the Transformer architecture.▶
An architecture using multi-head self-attention and feed-forward networks to process sequences efficiently and capture long-range dependencies.
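PyTorch exposes these building blocks directly, so a small encoder stack can be instantiated in a few lines (assuming torch is installed):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)  # 2 stacked encoder blocks

x = torch.randn(1, 10, 64)  # (batch, tokens, embedding dimension)
print(encoder(x).shape)     # torch.Size([1, 10, 64]) - same shape, contextualized
```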
What is positional encoding?▶
A method to inject order information into token embeddings since transformers process input tokens in parallel.
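The sinusoidal scheme from the original Transformer paper can be written in a few lines of NumPy:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(d_model)[None, :]
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding                              # added to the token embeddings

print(sinusoidal_positions(seq_len=10, d_model=16).shape)  # (10, 16)
```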
Describe multi-head attention.▶
Dividing attention mechanisms into multiple heads to capture different representation subspaces simultaneously.
Explain masked language modeling.▶
Training method where randomly selected tokens are masked and the model learns to predict them (as used in BERT).
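A pre-trained masked language model can be queried directly through the Hugging Face fill-mask pipeline (assuming transformers is installed and the model can be downloaded):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The capital of France is [MASK].")[:3]:
    print(pred["token_str"], round(pred["score"], 3))  # top candidates for the masked token
```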
What are attention weights?▶
Numerical scores indicating the importance of one token to another in self-attention layers.
What is text summarization?▶
The task of generating a concise and meaningful summary of a longer text document.
Differentiate extractive and abstractive summarization.▶
Extractive selects key sentences verbatim; abstractive generates new sentences conveying meaning.
What is Named Entity Disambiguation?▶
Resolving ambiguity when multiple entities share the same name, by correctly identifying the intended entity in context.
How do you implement transfer learning with transformers?▶
Start with pre-trained weights and fine-tune the model on your specific NLP task dataset.
Explain BERT’s pretraining objectives.▶
Masked language modeling and next sentence prediction to learn contextual language understanding.
What are generative adversarial networks (GANs) in NLP?▶
GANs pit a generator network against a discriminator network in a competitive setting; in NLP they have been explored for generating realistic text, although the discrete nature of text makes them harder to train than in vision.
How are transformers used in question answering systems?▶
They encode questions and context to find answer spans with high accuracy using attention mechanisms.
What is zero-shot learning?▶
Predicting on tasks without task-specific training data by leveraging general knowledge encoded in large models.
How do you handle out-of-vocabulary (OOV) words in NLP?▶
Using subword tokenization methods, like Byte Pair Encoding (BPE), or character-level models to represent rare words.
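For instance, a WordPiece tokenizer splits a rare word into pieces it already knows, so nothing falls outside the vocabulary (assuming transformers is installed and the tokenizer can be downloaded):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece subwords
print(tokenizer.tokenize("unbelievability"))
# The rare word is broken into known subword pieces rather than mapped to an unknown token.
```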
What is attention masking?▶
Blocking certain positions, such as padding tokens or future tokens in autoregressive models, from being attended to, typically by setting their attention scores to a large negative value before the softmax.
What is the role of positional embeddings?▶
They encode the position of each token in the input sequence to capture order information.
Explain language model fine-tuning.▶
Adjusting a pretrained language model with labeled data to adapt it for a particular NLP task.
What are the challenges of NLP?▶
Ambiguity, context understanding, sarcasm and irony detection, and domain adaptation, among others.
What is text generation in NLP?▶
Generating coherent and contextually relevant text automatically by language models.
How do you evaluate NLP models?▶
Using metrics suited to the task, such as accuracy and F1-score for classification, or BLEU and ROUGE for generation.
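For a classification task the standard metrics are one scikit-learn call away (generation tasks would instead compare outputs to reference texts with BLEU or ROUGE):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # gold labels, e.g. positive/negative sentiment
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions
print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
print(f1_score(y_true, y_pred))        # harmonic mean of precision and recall
```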