For someone from a non-statistical background, the most confusing aspect of statistics is often the fundamental statistical tests and when to use each one. This blog post attempts to mark out the differences between the most common tests, explain the use of the null hypothesis in these tests, and outline the conditions under which a particular test should be used.

Null Hypothesis and Testing

Before we venture into the differences between the tests, we need a clear understanding of what a null hypothesis is. A null hypothesis proposes that no significant difference exists in a set of given observations. …
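To make the idea of a null hypothesis concrete, here is a minimal sketch of a permutation test written with only the Python standard library. It is an illustration chosen for its simplicity, not necessarily one of the tests the post goes on to discuss. The function name `permutation_test` and its parameters are hypothetical. The null hypothesis here is that both samples come from the same distribution, so shuffling the group labels should produce differences in means at least as large as the observed one fairly often.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Null hypothesis: both samples come from the same distribution,
    so the group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]
    group_b = [101, 102, 103, 104, 105]
    print(permutation_test(group_a, group_b, n_permutations=2000))
```

A small p-value suggests the observed difference would be unusual if the null hypothesis were true; a large one means the data give little reason to reject it.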

A detailed introduction to all the concepts prevalent in the world of Natural Language Processing

One of the most fascinating advancements in machine learning is the ability to teach a machine to understand human communication. This arm of machine learning is called Natural Language Processing.

This post attempts to explain the basics of Natural Language Processing and how rapid progress has been made in it with advances in deep learning and neural networks.

Before we dive further into this, it is necessary to understand the basics.

What is Language?

A language is basically a fixed vocabulary of words which is shared by a community of humans…

Image credit: Taken from a Google search; the source was not mentioned. Let me know if it is your image and I can add the credit.

Neural networks are among the most popular machine learning algorithms at present. Over time, neural networks have been shown to outperform other algorithms in accuracy and speed on many tasks. With variants such as CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks), AutoEncoders, Deep Learning, etc., neural networks are slowly becoming for data scientists and machine learning practitioners what linear regression once was for statisticians. It is thus imperative to have a fundamental understanding of what a neural network is, how it is made up, and what its reach and limitations are. …

In this article I will cover an interesting project I worked on recently, in which I proactively managed the size of a cluster based on load forecasting. I will go into the details of the inbuilt AWS capabilities available, why I chose to use FBProphet, and how to deploy the solution into production.

For the purpose of this document the word cluster refers to a collection of Amazon EC2 instances.

Inbuilt AWS Capabilities

AWS comes with two inbuilt capabilities for scaling clusters up and down: dynamic scaling and predictive scaling. …


Many articles have been written on the different algorithms in the field of Machine Learning, most of them dividing the learning space between supervised and unsupervised learning and explaining (sometimes in detail) the algorithms available in these spaces.

In this article, I have tried to break down a learning algorithm into its basic components: architecture, model, loss function, optimization, and regularization, which collectively can be understood as the building blocks of any discriminative supervised learning algorithm.
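As a rough illustration of how these building blocks fit together, the sketch below trains a one-feature linear model in plain Python. Everything in it is an assumption made for the example: the function name `fit_linear`, the choice of mean squared error as the loss, gradient descent as the optimizer, and an L2 penalty as the regularizer. It is not taken from the article itself.

```python
def fit_linear(xs, ys, lr=0.05, l2=0.0, epochs=2000):
    """Fit y ~ w * x + b by gradient descent (illustrative sketch).

    Architecture/model: a single linear unit, y_hat = w * x + b.
    Loss:               mean squared error.
    Regularization:     L2 penalty l2 * w**2 on the weight only.
    Optimization:       full-batch gradient descent with step size lr.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the regularized loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Optimization step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
    print(fit_linear(xs, ys))          # unregularized fit
    print(fit_linear(xs, ys, l2=1.0))  # the L2 penalty shrinks the weight
```

Swapping any one block (a different loss, a different optimizer, a deeper architecture) changes the algorithm while leaving the overall loop intact, which is the sense in which the components are separable.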


Architecture is a word that gained prominence with the rise of Neural Networks and Deep Learning. As one researches deep learning further, the architecture of a neural network gains as much importance as, if not more than, the optimization and regularization methods being used. This does not mean architecture was absent from earlier algorithms.

Recommender Systems are one of the most rapidly growing branches of A.I. and have become a part of our daily lives. From personalized ads to search results to recommendations of items bundled together, nearly every aspect of our lives is touched by recommender systems in one way or another.

This article attempts to decode the basic blocks on which such systems are built and to provide a foundation for enthusiasts who want to delve deeper into the world of recommender systems.

Recommender Systems

The goal of a Recommender System is to generate meaningful…

This article focuses on how to use a popular open source database, “InfluxDB”, along with Spark Structured Streaming to process, store, and visualize data in real time. Here, we will go over in detail how to set up a single-node instance of InfluxDB, how to extend Spark's ForeachWriter to write to InfluxDB, and what one needs to keep in mind while designing an InfluxDB database.

In the data world, one of the major things people want to see is how a metric progresses over time. This makes managing and handling time series data (simply meaning data whose values depend on time) a very important aspect of a Data Scientist’s life.

Many tools and databases have been developed around this idea of handling time series data efficiently. During my recent project, I got to explore one such very popular open source database called “InfluxDB”, and this post is about how to process real-time data with InfluxDB and Spark.


As from…

Image courtesy Carlos Muza @kmuza on unsplash

This post is a token of appreciation for the amazing open source community of Data Science, to which I owe a lot of what I have learned.

For the last few months, I have been working on a side project to develop a machine learning application on streaming data. It was a great learning experience with numerous challenges and many lessons, some of which I have tried to share here.

This post is focused on how to deploy machine learning models on streaming data and covers all 3 necessary areas of a successful production application: infrastructure, technology, and…

Photo by rawpixel on Unsplash

“What is Machine Learning, Data Science, or Artificial Intelligence?” is one of the most common questions I get from people. Be it newcomers, recruiters, or even people in leadership positions, this question puzzles everyone in its own way.

For beginners it takes the form of “How do I become a data scientist?” For leaders it becomes a question of whether it has a real business impact, and for people in the field it takes the form of “What should I call myself: a data scientist, a data engineer, or a data analyst?”

This post…

Image courtesy Owen Beard on Unsplash

Artificial Intelligence, or AI as we call it, has become a new buzzword in today’s world. It is a phenomenon that attracts many and leaves many mesmerized by its astonishing accomplishments. On one hand, people are awestruck by major feats such as self-driving cars, AlphaGo (which handsomely beat the world Go champion 4–1), autonomous machines, and the landing of Falcon 9; on the other, some are equally scared by the rise of machines against humans. The concern of AI ruling humans is not limited to the non-scientific community, but is equally shared by some of the…

vibhor nigam

Data Scientist
