Top 50+ Machine learning interview questions and answers

If you are preparing for a machine learning engineer interview, these questions will help you get ready for companies such as TCS, Google, Facebook, and CTS. They are aimed at both freshers and experienced professionals. Machine learning work is commonly done in Python, R, Scala, or SAS.

Machine learning interview questions for beginners

What is machine learning  ?

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.


What are Different types of learning in machine learning ?

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Semi supervised learning

What are different applications of machine learning ?

  • Bioinformatics
  • Brain–machine interfaces
  • Computer Networks
  • Computer vision
  • Credit-card fraud detection
  • Financial market analysis
  • Handwriting recognition
  • Information retrieval
  • Insurance
  • Internet fraud detection
  • Medical diagnosis
  • Optimization
  • Recommender systems
  • Search engines
  • Sentiment analysis
  • Sequence mining
  • Speech recognition
  • Time series forecasting
  • User behavior analytics

What is Supervised learning ?

When the training data is labeled, that is, every example comes with the correct output, the task is called supervised learning.

What is Unsupervised learning ?

When the training data is not labeled, the task is called unsupervised learning. The algorithm has to find structure in the data, such as clusters, on its own.

What is Reinforcement learning ?

Reinforcement learning is a type of machine learning in which an agent learns how to behave in an environment by performing actions and observing the results.


What is Semi supervised learning ?

Semi-supervised learning is a class of machine learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data combined with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning and supervised learning.


What is Anomaly detection ?

Anomaly detection is the identification of data points, items, observations or events that do not conform to the expected pattern of a given group. These anomalies occur very infrequently but may signify a large and significant threat such as cyber intrusions or fraud.
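One simple way to illustrate the idea is a z-score rule: flag any value that lies far from the mean in units of standard deviation. The sketch below is only an illustration; the function name, the sample readings, and the threshold of 2.0 are assumptions, and real systems usually use more robust methods.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]
print(zscore_anomalies(readings, threshold=2.0))  # [95]
```

Note that a single extreme value inflates the standard deviation itself, which is why a lower threshold is used for this small sample.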


What is Bias, Variance and Trade-off ?

Bias is the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. Trade-off is tension between the error introduced by the bias and the variance.
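For squared-error loss, the trade-off is often written as a decomposition of the expected prediction error. The form below is the standard textbook version, assuming the data follow y = f(x) + ε with noise variance σ² (these symbols are not from the text above):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^2
  + \mathrm{Var}\big[\hat{f}(x)\big]
  + \sigma^2
```

Simple models tend to have a large bias term and a small variance term, complex models the opposite, and the σ² term cannot be reduced by any model.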


What are best algorithms for supervised learning ?

  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
  • KNN
  • Naive bayes

What are best algorithms for unsupervised learning ?

  • K-means
  • Hierarchical Clustering
  • t-SNE (used for visualization and dimensionality reduction rather than clustering)
  • DBSCAN Clustering
  • Principal Component Analysis (PCA)
  • Anomaly detection

What are best algorithms for Reinforcement learning ?

  • Q-Learning
  • SARSA – State-Action-Reward-State-Action
  • DQN – Deep Q Network
  • DDPG – Deep Deterministic Policy Gradient

What is regression ? When we will use this method ?

A technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables.

What is classification ? When we will use this method ?

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

What is clustering ? When we will use this method ?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups

What is regularization ?

Regularization is a technique, often introduced in the context of regression, that constrains or shrinks the coefficient estimates towards zero. In other words, it discourages learning a more complex or flexible model, so as to reduce the risk of overfitting.


What is Difference between l1 regularization and l2 regularization ?

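Both add a penalty term to the loss function so that the coefficients cannot grow large enough to fit the training data perfectly and overfit. The difference is in the penalty: L2 uses the sum of the squared weights, while L1 uses the sum of the absolute values of the weights. As a result, L1 tends to push some weights to exactly zero, while L2 shrinks them towards zero without eliminating them.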
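The two penalty terms can be sketched in a few lines of Python. L1 sums the absolute values of the weights and L2 sums their squares. The function names and the `lam` strength parameter are illustrative assumptions, not part of any library.

```python
def l1_penalty(weights, lam=1.0):
    """L1 (lasso-style) penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    """L2 (ridge-style) penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 3.0]
print(l1_penalty(weights))  # 5.5
print(l2_penalty(weights))  # 13.25
```

In training, the chosen penalty is added to the usual loss, so large weights make the total loss worse.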

What is ensemble learning ?

Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem.

What are ensemble techniques ?

Basic : max voting, averaging, weighted averages

Advanced : Stacking, Blending, Bagging, Boosting

What is bagging ?

Bagging is an abbreviation for “bootstrap aggregating”. It is a meta-algorithm that draws M bootstrap samples (sampled with replacement) from the initial dataset and trains a predictive model on each one. The final prediction is obtained by averaging the bootstrapped models (or by majority vote for classification), which usually reduces variance and yields better results.
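A minimal sketch of the procedure, assuming a simple least-squares line as the base model and a toy dataset (both are illustrative choices; in practice bagging is usually applied to high-variance models such as decision trees):

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # degenerate bootstrap sample: every x is the same
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def bagged_predict(points, x_new, n_models=25, seed=0):
    """Train one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        a, b = fit_line(sample)
        predictions.append(a * x_new + b)
    return sum(predictions) / len(predictions)

# Toy data lying on the line y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]
print(bagged_predict(data, x_new=4.0))
```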

What is boosting ?

Boosting is a machine learning ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.

What are the best bagging algorithms ?

  • Bagging Meta-estimator
  • Random Forest

What are the best boosting algorithms ?

  • AdaBoost
  • Gradient boosting algorithm (GBM)
  • Extreme gradient boosting (XGBoost)
  • LightGBM
  • CatBoost

What is dimensionality reduction ?

In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

What are best dimensionality reduction algorithms ?

  • Missing Value Ratio
  • Low Variance Filter
  • High Correlation Filter
  • Random Forest
  • Backward Feature Elimination
  • Forward Feature Selection
  • Factor Analysis
  • Principal Component Analysis
  • Independent Component Analysis
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • UMAP

What is recommendation system ?

A recommender system or a recommendation system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.


What are best techniques for recommendation system ?

  1. Content based filtering
  2. Collaborative filtering

What is Content based filtering ?

When a friend asks you for a book recommendation, it’s pretty natural to ask what kinds of books they like. From there, you could think of a few titles that are similar to the things they’ve liked in the past. This process, of recommending content based on its characteristics, is at the heart of content-based filtering, the technology behind Netflix and Pandora’s recommendation engines.
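The idea can be sketched by describing each item with a feature vector and recommending the item most similar to one the user already liked. The book titles, the genre features, and the function names below are hypothetical, and cosine similarity is one common choice of similarity measure, not the only one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical genre features per book: [mystery, romance, sci-fi]
books = {
    "Book A": [1, 0, 0],
    "Book B": [1, 1, 0],
    "Book C": [0, 0, 1],
}

def recommend(liked_title, catalog):
    """Return the title whose features are most similar to the liked item."""
    liked = catalog[liked_title]
    others = [(title, cosine_similarity(liked, features))
              for title, features in catalog.items() if title != liked_title]
    return max(others, key=lambda pair: pair[1])[0]

print(recommend("Book A", books))  # Book B
```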

What is collaborative filtering ?

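Collaborative filtering is a technique used by recommender systems. In the narrow sense, it predicts a user's interests by collecting preferences from many users, on the assumption that people who agreed in the past are likely to agree again. In the more general sense, it is the process of filtering for information or patterns using techniques that involve collaboration among multiple agents, viewpoints, or data sources.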

What is overfitting ?

Overfitting is a modeling error which occurs when a function is too closely fit to a limited set of data points. Overfitting the model generally takes the form of making an overly complex model to explain idiosyncrasies in the data under study.

In practice, an overfitted model works well on the training data but fails on test data.


How can you overcome overfitting ?

Cross-Validation : In its simplest form, cross-validation is a single round of validation in which one part of the data is held out for validation and the rest is used to train the model. To get a lower-variance estimate of performance, k-fold cross-validation with more folds is preferred.

Early Stopping : Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit.

Pruning : Pruning is used extensively while building CART models. It simply removes the nodes which add little predictive power for the problem at hand.

Regularization : Simply put, it adds a cost term to the objective function for bringing in more features. It therefore pushes the coefficients of many variables towards zero, which reduces the cost term.
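As an illustration of the cross-validation point, the sketch below splits sample indices into k folds so that every sample is used for validation exactly once. The function name is an assumption, and in practice the indices are usually shuffled first (skipped here to keep the output predictable).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_indices(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model is trained on each `train` split and scored on the matching `validation` split, and the k scores are averaged.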


What is underfitting ?

Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias.


How can you improve model performance ?

You can improve the model performance by using following methods

  1. Add more data
  2. Treat missing and Outlier values
  3. Feature Engineering
  4. Feature Selection
  5. Multiple algorithms
  6. Algorithm Tuning
  7. Ensemble methods
  8. Cross Validation


How can you deploy your model ?

In Python, a trained model can be saved (serialized) with the pickle or joblib libraries.

The saved model can then be loaded and served behind an API, for example one built with Flask.
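The save-and-load step can be sketched with the standard-library pickle module. The "model" here is a hypothetical stand-in (a dict of learned parameters) rather than a real trained estimator, and the `predict` helper is an assumption for illustration.

```python
import pickle

# Hypothetical stand-in for a trained model: parameters of y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 1.0}

def predict(params, x):
    """Apply the stored parameters to a new input."""
    return params["slope"] * x + params["intercept"]

# Serialize to bytes; pickle.dump(model, f) would write to an open binary file instead.
blob = pickle.dumps(model)

# In a serving process this load would typically happen once at startup.
restored = pickle.loads(blob)

print(predict(restored, 3.0))  # 7.0
```

An API endpoint would then call `predict` for each incoming request. Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.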


What is Time series modeling ?

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. … Time series forecasting is the use of a model to predict future values based on previously observed values.


  1. Time series using R
  2. Time Series using Python

What is Stationary data  and non stationary data ?

A stationary (time) series is one whose statistical properties such as the mean, variance and autocorrelation are all constant over time. Hence, a non-stationary series is one whose statistical properties change over time.


What are different types of Times series algorithms ?

  • Naive Approach
  • Simple average
  • Moving average
  • Single Exponential smoothing
  • Holt’s linear trend method
  • Holt-Winters seasonal method
  • ARIMA (autoregressive integrated moving average)
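Two of the simpler methods in this list, the moving average and single exponential smoothing, can be sketched as follows. The function names, the `sales` series, and the parameter values are illustrative assumptions.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(series, alpha):
    """Single exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return level

sales = [10, 12, 14, 16]
print(moving_average_forecast(sales, window=2))          # 15.0
print(exponential_smoothing_forecast(sales, alpha=0.5))  # 14.25
```

Both forecasts lag behind the upward trend in this series, which is the limitation that trend-aware methods such as Holt's are designed to address.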

What is difference between statistical model and machine learning model ?

A machine learning model is an algorithm that can learn from data without relying on rules-based programming.

A statistical model is a formalization of relationships between variables in the form of mathematical equations.


Metrics :

Regression metrics:

What is mean square error ?

In statistics, the mean squared error or mean squared deviation of an estimator measures the average of the squares of the errors—that is, the average squared difference between the estimated values and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss

What is root mean square error ?

The root-mean-square deviation or root-mean-square error is a frequently used measure of the differences between values predicted by a model or an estimator and the values observed.
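Both regression metrics can be computed directly from their definitions. This is a minimal sketch with made-up values; the function names are chosen for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 9.0, 9.0]
print(mean_squared_error(y_true, y_pred))       # 1.25
print(root_mean_squared_error(y_true, y_pred))
```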

What is R-Square ?

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The definition of R-squared is fairly straight-forward; it is the percentage of the response variable variation that is explained by a linear model.


R-squared = Explained variation / Total variation

For a least-squares linear model with an intercept, evaluated on its training data, R-squared is between 0 and 100%:

0% indicates that the model explains none of the variability of the response data around its mean.

100% indicates that the model explains all the variability of the response data around its mean. In general, the higher the R-squared, the better the model fits your data.
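A small sketch of the calculation, using the equivalent form 1 − (residual variation / total variation); for a least-squares fit with an intercept this equals explained variation over total variation. The function name and sample values are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_residual / ss_total

y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0  (perfect fit)
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (no better than the mean)
```

With this formula, predictions that are worse than simply predicting the mean give a negative value, which can happen on held-out data.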

Classification Metrics:

What is Confusion Matrix ?

It is a performance measurement for machine learning classification problems where the output can be two or more classes. For binary classification, it is a table with the 4 combinations of predicted and actual values: true positives, false positives, false negatives, and true negatives.
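For the binary case, the four cells can be counted directly from the actual and predicted labels. The sketch below uses made-up labels and an illustrative function name.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```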

What is Type I Error and Type II Error ?

In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis (also known as a “false positive” finding), while a type II error is the failure to reject a false null hypothesis (also known as a “false negative” finding). More simply stated, a type I error is to falsely infer the existence of something that is not there (confirming to common belief with false information), while a type II error is to falsely infer the absence of something that is present (going against the common belief with false information).


What is Difference between type 1 and type 2 error ?

Read more

What is Precision and Recall ?

In pattern recognition, information retrieval and binary classification, precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances
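In terms of true positives (TP), false positives (FP), and false negatives (FN), precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal sketch, with illustrative counts and function name:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 6 correct positive predictions, 2 false alarms, 4 missed positives.
print(precision_recall(tp=6, fp=2, fn=4))  # (0.75, 0.6)
```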

What is ROC curve ? when you will use it ?

ROC stands for Receiver Operating Characteristic. An ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. The area under an ROC curve is a measure of the usefulness of a test in general, where a greater area means a more useful test, so the areas under ROC curves are used to compare the usefulness of tests.
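The curve and its area can be sketched by sweeping a threshold over the predicted scores and recording the false positive rate and true positive rate at each step. The labels, scores, and function names are illustrative, and the sketch assumes both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by thresholding at each distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(y_true, scores)))  # about 0.889 (8/9)
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.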

What is F1 Score ? when you will use it ?

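The F1 score is one of the most popular performance metrics for classification, and it is available in scikit-learn as f1_score. In essence, the F1 score is the harmonic mean of precision and recall. It is useful when you need a single number that balances the two, for example on imbalanced classes where accuracy is misleading.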
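As a quick illustration of the harmonic mean, the sketch below computes the score from a given precision and recall. The function name is chosen to avoid confusion with scikit-learn's own `f1_score`, which takes labels rather than these two numbers.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_from_precision_recall(0.75, 0.6))  # about 0.667
```

Because it is a harmonic mean, the score is pulled towards the lower of the two values, so a model cannot get a high F1 with good precision but poor recall, or the reverse.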


Programming :

What is best programming language for machine learning Python or R or Spark or Sas ? Why ?

Python is a popular choice for machine learning, largely because of its ecosystem of libraries for preprocessing, modeling, and visualization.

What are dummy variables ?

Dummy variables (sometimes called indicator variables) are used in regression analysis and Latent Class Analysis. As implied by the name, these variables are artificial attributes, and they are used with two or more categories or levels. They are used when you want to work with categorical variables which have no quantifiable relationship with each other.

What is one-hot encoding ?

In digital circuits and machine learning, one-hot is a group of bits among which the legal combinations of values are only those with a single high bit and all the others low. A similar implementation in which all bits are ‘1’ except one ‘0’ is sometimes called one-cold
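For a categorical feature, this means each category gets its own column, and each row has a single 1 in the column of its category. A minimal sketch in plain Python, with an illustrative function name and sample values:

```python
def one_hot_encode(values):
    """Return (sorted categories, one row per value with a single 1)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```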


How can you handle missing data in your data set ?

In statistical language, if the number of the cases is less than 5% of the sample, then the researcher can drop them. In the case of multivariate analysis, if there is a larger number of missing values, then it can be better to drop those cases (rather than do imputation) and replace them.

How can you handle duplicate values in your data set ?

You can find duplicate rows in a pandas DataFrame with df.duplicated() and remove them with df.drop_duplicates().

How can you handle outliers values  in your data set ?

  1. Univariate method
  2. Multivariate method
  3. Minkowski error


What is data preprocessing ?

What are best libraries of data preprocessing ?

Numpy, Pandas

What are best libraries of machine learning ?

scikit-learn, statsmodels, TensorFlow, Keras, CatBoost, XGBoost, LightGBM

What are best libraries of data visualization ?

Matplotlib, seaborn

What is Rescaling ?

What is Feature scaling ?

What are different types of function in machine learning for Feature scaling ?

StandardScaler, MinMaxScaler, Normalizer
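To show what these three scalers compute, here is a plain-Python sketch of the underlying formulas rather than the scikit-learn classes themselves. The function names and sample values are assumptions. StandardScaler and MinMaxScaler work per feature (column), while Normalizer rescales each sample (row) to unit length.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Zero mean, unit variance (what StandardScaler does to one column)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes values are not all equal

def min_max_scale(values):
    """Rescale to the range [0, 1] (what MinMaxScaler does to one column)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi > lo

def unit_norm(row):
    """Scale one sample to Euclidean length 1 (what Normalizer does to one row)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]  # assumes a non-zero row

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
print(min_max_scale(heights))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(heights))
print(unit_norm([3.0, 4.0]))   # [0.6, 0.8]
```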


What is Data Normalization ?

It is also called feature scaling.

What’s the difference between Normalization and Standardization?

Machine learning Algorithms interview questions

When to apply L1 regression ?

When to apply L2 regression ?

What is difference between KNN and K Means ?

How to choose k value in KNN ?

How is the tree pruned in decision trees ?

How is the tree split in decision trees ?

How can you divide the data to train set and test set ? Is it 50: 50 or 70: 30 ?

How can you do sentiment analysis ?


What is PCA  and How it will work ?

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It’s often used to make data easy to explore and visualize. We can also use PCA to reduce dimensionality.

PCA is the simplest of the true eigenvector-based multivariate analyses.


In real job interviews, you will also get scenario-based questions about your machine learning projects.

  • What is the problem statement ?
  • How do you plan to handle it ?
  • How did you collect data from different sources ?
  • How did you do the preprocessing ?
  • How did you choose the machine learning model ?
  • How did you optimize that model ?
  • How did you deploy that model to production ?

The above questions will almost certainly come up, so make sure you practice by working on projects. That will give you an understanding of both the problem and the data.

Practice makes perfect. If you practice with different kinds of problems and datasets, you will be much better prepared for the interview.

Best of luck.

