🤖 Machine Learning Interview Questions & Answers (2025)
Basic Level Questions
What is Machine Learning?
Machine Learning is a branch of AI where systems learn patterns from data without being explicitly programmed.
Types of Machine Learning?
Supervised, Unsupervised, Semi-supervised, and Reinforcement Learning.
What is supervised learning?
Model trained on labeled data to predict outcomes for new, unseen inputs.
What is unsupervised learning?
Model trained on unlabeled data to find patterns like clusters or associations.
Define reinforcement learning.
An agent learns to make decisions by taking actions in an environment to maximize rewards.
What is overfitting?
When a model performs well on training data but poorly on new data due to memorizing noise instead of general patterns.
What is underfitting?
When a model is too simple to capture data patterns, resulting in poor performance on both training and test sets.
Examples of ML applications?
Spam detection, recommendation systems, fraud detection, speech recognition, medical diagnosis.
What is a model in ML?
A mathematical representation of learned patterns that can make predictions on new inputs.
Difference between AI and ML?
AI is the overall concept of intelligent machines; ML is a subset focused on learning from data.
Intermediate Level Questions
Explain decision trees.
A model splitting data into branches based on feature conditions to arrive at a decision or prediction.
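One way to picture a single split is to score candidate thresholds by how pure the resulting branches are. The sketch below assumes Gini impurity as the purity measure and a single numeric feature; the helper names `gini` and `best_split` are illustrative, not taken from any library.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send samples to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: small values belong to class "a", large values to class "b".
threshold, impurity = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

A full tree would apply the same search recursively to each branch until a stopping rule is met.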
What is Random Forest?
An ensemble learning method combining multiple decision trees to improve prediction accuracy and control overfitting.
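What is the bias-variance trade-off?
Balancing error from bias (overly simple assumptions that cause underfitting) against error from variance (sensitivity to fluctuations in the training data that causes overfitting) to find the optimal model complexity.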
What is logistic regression used for?
For binary classification using the logistic (sigmoid) function, and for multi-class classification via softmax or one-vs-rest extensions.
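To make the binary case concrete, here is a minimal sketch of how a linear score is turned into a probability and then a class label. The weights, bias, and the 0.5 threshold are placeholder assumptions; a real model would learn the weights from data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0
```

For example, `predict([1.0, -1.0], 0.0, [2.0, 1.0])` scores the input at z = 1 and therefore returns the positive class.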
Difference between classification and regression.
Classification predicts categories; regression predicts continuous values.
What is regularization?
A technique to prevent overfitting by adding a penalty on model complexity to the loss function (e.g., L1 or L2).
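The penalty idea can be sketched as a data-fit term plus a weighted penalty on the weights. This example assumes mean squared error as the base loss and a strength parameter named `lam`; both are illustrative choices.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(weights):
    """Sum of absolute weights (encourages sparse weights)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights (encourages small weights)."""
    return sum(w * w for w in weights)


def regularized_loss(y_true, y_pred, weights, lam=0.1, kind="l2"):
    """Base loss plus lam times the chosen penalty."""
    penalty = l1_penalty(weights) if kind == "l1" else l2_penalty(weights)
    return mse(y_true, y_pred) + lam * penalty
```

With a perfect fit and weights `[3.0, -4.0]`, the L2 version adds 0.1 * 25 and the L1 version adds 0.1 * 7, so larger weights cost more even when the predictions are identical.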
What is feature scaling?
Standardizing or normalizing input features to improve training stability and convergence speed.
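The two common variants can be sketched for a single feature column as follows. Both functions assume the column is not constant, since a constant column would cause a division by zero.

```python
import statistics


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30])
z_scores = standardize([1, 2, 3])
```

In practice the mean, standard deviation, minimum, and maximum are computed on the training set only and then reused for validation and test data.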
What is PCA?
Principal Component Analysis reduces dimensionality by transforming features into uncorrelated components capturing most variance.
What are support vector machines?
Classifiers that find the optimal hyperplane separating classes with maximum margin.
What is a neural network?
A network of interconnected nodes (neurons) arranged in layers, loosely inspired by the brain, that learns data patterns by adjusting connection weights.
Explain k-means clustering.
An unsupervised algorithm partitioning data into k clusters by minimizing intra-cluster variance.
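A minimal sketch of the usual alternating procedure (Lloyd's algorithm) on 2-D points is shown below. It assumes the caller supplies the starting centroids and a fixed iteration count; production implementations add smarter initialization and a convergence check.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

On this toy data the two centroids settle at the means of the two obvious groups after the first iteration.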
What is cross-validation?
A method to evaluate model performance by repeatedly training and testing on different data subsets (e.g., k-fold), so that every sample is used for both training and validation.
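The splitting logic behind k-fold cross-validation can be sketched as an index generator. This version assumes contiguous folds without shuffling; the function name `k_fold_indices` is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(5, 2))
```

A model would be trained on each `train` index set and scored on the matching `test` set, with the k scores averaged at the end.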
What is gradient descent?
An iterative optimization algorithm that reduces loss by updating weights in the direction opposite to the gradient.
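The update rule can be sketched on a one-dimensional toy problem. The example assumes the function f(x) = (x - 3)^2, whose gradient is 2(x - 3), and a fixed learning rate of 0.1.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient from a starting point."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

Each step shrinks the distance to the minimum at x = 3 by a factor of 0.8, so `minimum` ends up very close to 3. A learning rate that is too large would overshoot instead of converging.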
What are hyperparameters?
Settings such as learning rate, batch size, and number of epochs that are chosen before training starts and are not learned from data.
What is a confusion matrix?
A table of predicted versus actual classes (true/false positives and negatives) for classification problems, used to calculate precision, recall, and related metrics.
Define precision and recall.
Precision: proportion of predicted positives that are actually positive, TP / (TP + FP); Recall: proportion of actual positives that are correctly predicted, TP / (TP + FN).
What is F1-score?
The harmonic mean of precision and recall, useful for imbalanced datasets.
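The confusion matrix, precision, recall, and F1-score fit together as in the sketch below. It assumes binary labels with 1 as the positive class and returns 0.0 where a ratio would otherwise divide by zero, which is one convention among several.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and their harmonic mean (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here there are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so precision, recall, and F1 all come out to 0.75.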
What is a ROC curve?
A plot of true positive rate vs. false positive rate across classification thresholds, used to visualize performance trade-offs.
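The points of such a curve can be computed by sweeping a decision threshold over the model's scores. This sketch assumes binary labels (0 and 1), at least one example of each class, and a caller-supplied list of thresholds.

```python
def roc_points(y_true, scores, thresholds):
    """Return a list of (false_positive_rate, true_positive_rate) pairs."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    points = []
    for th in thresholds:
        # An example is predicted positive when its score reaches the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= th)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= th)
        points.append((fp / negatives, tp / positives))
    return points


curve = roc_points(
    y_true=[0, 0, 1, 1],
    scores=[0.1, 0.4, 0.35, 0.8],
    thresholds=[0.9, 0.5, 0.3, 0.0],
)
```

Lowering the threshold moves along the curve from (0, 0), where nothing is predicted positive, to (1, 1), where everything is.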
What is ensemble learning?
Combining multiple models to produce better predictive performance than a single model.
Advanced Level Questions
Explain XGBoost.
Extreme Gradient Boosting is an efficient, scalable implementation of gradient boosted decision trees, known for high performance in ML competitions.
What are bagging and boosting?
Bagging trains models independently (often in parallel) on bootstrap samples of the data and aggregates their predictions; boosting trains models sequentially, each correcting the errors of its predecessors.
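The two building blocks of bagging, resampling and aggregation, can be sketched as below. The example assumes sampling with replacement (bootstrap) and majority voting for classification; it does not train any models, and boosting is not shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Aggregate class predictions from several models by majority."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the resampling is repeatable
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]
```

Each entry of `samples` would be used to train one base model, and `majority_vote` would combine the models' predictions for a new input.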
What is stacking in ensemble learning?
Combining multiple base model predictions via a meta-model to improve accuracy.
What is the curse of dimensionality?
As the number of features grows, data becomes sparse in the feature space, which makes models prone to overfitting and computation slow.
What are word embeddings in ML?
Vector representations of words capturing semantic meaning, useful in NLP tasks.
Explain reinforcement learning algorithms.
Includes Q-learning, Deep Q-Networks, and Policy Gradient methods where agents learn optimal actions through rewards.
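As one concrete case, the tabular Q-learning update can be sketched in a few lines. The Q-table layout (a dict of dicts), the state and action names, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new Q-value."""
    # Value of the best action in the next state; 0.0 if it is terminal or unknown.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 2.0},
}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
```

The update moves Q("s0", "right") halfway toward the target 1.0 + 0.9 * 2.0 = 2.8, giving 1.4. Deep Q-Networks replace the table with a neural network that approximates these values.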
What is deep reinforcement learning?
Combines deep neural networks with reinforcement learning to make decisions in complex environments.
What is transfer learning in ML?
Using a model trained on one task as a starting point for a different but related task, reducing the required data and training time.
Difference between batch and online learning.
Batch learning trains on the whole dataset; online learning updates the model incrementally as new data arrives.
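The online side of this contrast can be sketched with a model that updates after every single example. The class name `OnlineLinearModel`, the method name `partial_fit`, the learning rate, and the noise-free stream y = 2x + 1 are all assumptions chosen for illustration.

```python
class OnlineLinearModel:
    """One-feature linear model trained one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, x, y):
        """Take one gradient step on the squared error of a single example."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()
# Simulate a data stream drawn from y = 2x + 1, seen one example at a time.
for _ in range(500):
    for x in (0.0, 1.0, 2.0, 3.0):
        model.partial_fit(x, 2.0 * x + 1.0)
```

The model never holds the full dataset in memory; a batch learner would instead compute its update from all examples at once.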
Explain model interpretability.
Understanding how a model arrives at its decisions, using techniques like SHAP and LIME for transparency and trust.
What is anomaly detection?
Identifying rare instances in data that differ significantly from the majority, e.g., fraud detection.
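A very simple baseline flags values that lie far from the mean in units of standard deviation. The sketch below assumes a single numeric feature, a non-constant sample, and an arbitrary threshold of 2.5; real systems often use more robust methods.

```python
import statistics


def zscore_outliers(values, threshold=2.5):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]


amounts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
outliers = zscore_outliers(amounts)
```

On this toy list only the value 100 is flagged. Note that the outlier itself inflates the mean and standard deviation, which is one reason a threshold of 3 would miss it in such a small sample.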
What is learning rate scheduling?
Adjusting the learning rate over epochs to improve convergence and avoid overshooting minima.
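Two common schedules can be sketched as plain functions of the epoch number. The drop factor, interval, and decay rate below are placeholder values, not recommendations.

```python
import math


def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


def exponential_decay(initial_lr, epoch, decay_rate=0.05):
    """Shrink the learning rate smoothly by exp(-decay_rate * epoch)."""
    return initial_lr * math.exp(-decay_rate * epoch)


schedule = [step_decay(0.1, epoch) for epoch in (0, 9, 10, 25)]
```

With these settings the step schedule stays at 0.1 for epochs 0 to 9, halves at epoch 10, and has halved twice by epoch 25.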
What are generative models?
Models that learn the distribution of data to generate new, similar samples, e.g., GANs, VAEs.
What is model deployment?
Process of integrating a trained ML model into a production environment for real-world predictions.
Explain federated learning.
A decentralized learning approach where models are trained across multiple devices without centralizing raw data.
What is adversarial ML?
The study of attacks in which inputs are deliberately manipulated to fool models (adversarial examples), and of defenses against them; it highlights security vulnerabilities.
What is real-time inference?
Making predictions with low latency as new data arrives, critical for applications like fraud detection.
Explain concept drift.
When the relationship between input features and the target variable changes over time, so a model trained on older data degrades and needs adaptation or retraining.
What is meta-learning?
“Learning to learn”: developing models that learn new tasks quickly with minimal data.