Proactively Scaling a Cluster with FBProphet Forecasting

Vibhor Nigam · Published in Analytics Vidhya · Dec 26, 2020

In this article I will cover an interesting project I worked on recently, in which I was able to proactively manage the size of a cluster based on load forecasting. I will go into the details of the built-in AWS capabilities available, why I chose FBProphet, and how to deploy the solution into production.

For the purposes of this document, the word cluster refers to a collection of Amazon EC2 instances.

Built-in AWS Capabilities

AWS comes with two built-in capabilities for managing cluster size: dynamic scaling and predictive scaling. Let’s see what each of these means.

Dynamic Scaling: Dynamic scaling manages cluster size in response to live changes in resource utilization. One can set up separate thresholds for upscaling and downscaling, and dynamic scaling resizes the cluster whenever the respective threshold is breached. [1]

However, since this approach is reactive and cluster resizing does take some time, it ends up affecting job times and, subsequently, the customer experience.
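
For reference, setting up a dynamic scaling policy with boto3 might look roughly like the sketch below; the Auto Scaling group name and target value are illustrative assumptions, not part of the original setup.

import boto3

# A target-tracking policy that scales the group to keep average CPU
# near the target value; group name and target are placeholders
autoscaling = boto3.client('autoscaling')
autoscaling.put_scaling_policy(
    AutoScalingGroupName='my-cluster-asg',
    PolicyName='cpu-target-tracking',
    PolicyType='TargetTrackingScaling',
    TargetTrackingConfiguration={
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'ASGAverageCPUUtilization'
        },
        'TargetValue': 60.0,
    },
)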

Predictive Scaling: Predictive scaling, as promised by AWS, is supposed to utilize the last two weeks of resource-utilization data to forecast the next two days. These forecasts can then be used to proactively resize clusters when a higher load is expected, thereby alleviating the issue faced with dynamic scaling. [1]

However, the forecasts created by Amazon’s predictive scaling did not seem to be that accurate, which led me to explore other libraries that could be used for forecasting.

Forecasting with Facebook Prophet

Prophet, a forecasting library by Facebook, can be used to generate forecasts, which in turn can be used to proactively scale clusters. For the forecasting to work properly, it is also important to choose the correct metric.

AWS, through its CloudWatch API, exposes various metrics at specific time intervals, such as average CPU utilization, maximum CPU utilization, node count, and memory used. After evaluation, CPU utilization and node count were the metrics used for this task.
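
As a minimal sketch, pulling such metrics with boto3 might look like the following; the Auto Scaling group name is a placeholder.

import boto3
from datetime import datetime, timedelta

# Pull two weeks of CPU utilization at 5-minute granularity;
# the Auto Scaling group name is a placeholder
cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'my-cluster-asg'}],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average', 'Maximum'],
)
datapoints = response['Datapoints']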

Utilizing Facebook Prophet

Prophet is an extremely easy-to-use and effective time series library open-sourced by Facebook. How Prophet works is beyond the scope of this blog post, but those interested can read about it at https://peerj.com/preprints/3190.pdf

As far as the Prophet API is concerned, there are a few main things to keep in mind. Prophet allows for two main growth models, linear and logistic (details of which can be found in the paper).

Apart from these growth models, Prophet also makes it possible to introduce daily, weekly, and yearly seasonalities, whose periods and Fourier orders can be specified by a domain expert. With this approach, analysts who have the domain expertise can easily utilize Prophet to train the models.

One quirk is that for the Prophet model to work in Python, the training data must be provided in the form of a pandas DataFrame with two specific columns, ‘ds’ and ‘y’, where ‘ds’ represents the timestamp and ‘y’ represents the metric value to be forecasted.
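
For example, the CloudWatch datapoints fetched earlier could be reshaped into that format roughly as follows (a sketch, assuming the datapoints variable from the CloudWatch snippet above).

import pandas as pd

# Shape CloudWatch datapoints into Prophet's expected 'ds'/'y' format
df = pd.DataFrame({
    'ds': [p['Timestamp'] for p in datapoints],
    'y': [p['Average'] for p in datapoints],
}).sort_values('ds')

# Prophet expects timezone-naive timestamps
df['ds'] = pd.to_datetime(df['ds'], utc=True).dt.tz_localize(None)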

Included below is the sample code that was used to create the forecasts.

from fbprophet import Prophet

# Logistic growth requires a carrying-capacity column 'cap' in the
# training data; for CPU utilization this is at most 100 (percent)
df['cap'] = 100

# Define parameters of the model: logistic growth with custom
# daily and weekly seasonalities
prophet = (
    Prophet(growth='logistic')
    .add_seasonality(name='daily', period=1, fourier_order=specified_fourier_order_daily)
    .add_seasonality(name='weekly', period=7, fourier_order=specified_fourier_order_weekly)
)

# Fit the Prophet model on the training data
prophet.fit(df)

# Prepare a future DataFrame for the time period to forecast,
# to store the forecasted values
future_df = prophet.make_future_dataframe(freq='min', periods=time_period_to_forecast)
future_df['cap'] = 100

# Forecast the values for the specified time frame
forecast_df = prophet.predict(future_df)

It should be noted that forecast_df contains the fitted values for the training period as well as the forecasted values, and the forecasted values need to be extracted separately for further use.
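
A minimal sketch of that extraction, assuming df is the training DataFrame from above:

# Keep only the rows that lie beyond the end of the training data
future_only_df = forecast_df[forecast_df['ds'] > df['ds'].max()]
predicted_values = future_only_df[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]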

Once the model is trained, one can also plot the trends and seasonalities in the model by utilizing the plot_components functionality of Prophet.

fig2 = prophet.plot_components(forecast_df)
display(fig2)

This creates a plot similar to the one shown below. The actual plot will differ depending on the underlying behavior of the data.

Image credit: Facebook open source

One can also see how good or bad the fit of the Prophet model has been by plotting the forecast data frame.

# Code to plot the Prophet forecast
prophet.plot(forecast_df)

It generates a graph like the one shown in the image below.

Image credit: Facebook open source

In the image above, black dots are present until 2016; these represent the actual values. After that, the blue region represents the forecasted/predicted values, while the light blue shading above and below it represents the upper and lower limits, respectively.

Proactively Scaling AWS Cluster Size

Once we have the forecasted values, setting up the infrastructure to scale is quite straightforward. A job can be set up to forecast the needed metric for a day and store it at an S3 location. This can be set up as a cron job or using any other available scheduler.
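
A hedged sketch of such a daily job is shown below; the bucket, key, and run_forecast() helper are hypothetical names, not part of the original pipeline.

import boto3

def daily_forecast_job():
    # run_forecast() is a hypothetical wrapper around the Prophet
    # training/prediction code shown earlier
    forecast_df = run_forecast()
    s3 = boto3.client('s3')
    s3.put_object(
        Bucket='my-forecast-bucket',       # placeholder bucket
        Key='forecasts/cluster_load.csv',  # placeholder key
        Body=forecast_df.to_csv(index=False),
    )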

On a separate EC2 instance, or through a Lambda function, another script can be set up which pulls in cluster metrics in real time. AWS provides APIs to collect various metrics, for instance desired capacity (instances), current capacity (instances), etc.

Taken together with the forecasted metric, these can be put through logical checks to resize the cluster proactively and provide a much smoother experience to users.
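
Putting it together, the real-time script might look roughly like this sketch; the Auto Scaling group name and the forecasted node count are illustrative assumptions.

import boto3

autoscaling = boto3.client('autoscaling')

def scale_if_needed(asg_name, forecasted_node_count):
    # Compare the current desired capacity with the forecasted need
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])
    current = groups['AutoScalingGroups'][0]['DesiredCapacity']
    if forecasted_node_count != current:
        # Resize the cluster ahead of the expected load
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=asg_name,
            DesiredCapacity=forecasted_node_count,
            HonorCooldown=True,
        )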
