Countries across the globe are gearing up for Christmas, and what better way to celebrate than with a festive data science project?
Let’s forecast how popular YouTube searches for Mariah Carey’s “All I Want for Christmas” will be in the upcoming weeks.
We can get the data from Google Trends. We will use the last 5 years of data. The data is weekly, so each year contributes roughly 52 samples.
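As a minimal sketch of the loading step, assuming the export has already been reduced to two columns named week and popularity (the names used in the rest of this post; a raw Google Trends download may need extra header lines skipped and its columns renamed), the data can be read with pandas. The inline sample below is made up for illustration and stands in for the real file.

```python
import io

import pandas as pd

# Made-up sample standing in for the exported file (hypothetical values).
# With a real export you would pass the file path to pd.read_csv instead.
sample_csv = io.StringIO(
    "week,popularity\n"
    "2023-11-26,12\n"
    "2023-12-03,<1\n"
    "2023-12-10,41\n"
)

# The '<1' entries force the popularity column to be read as text.
df = pd.read_csv(sample_csv)
print(df.head())
```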
This is what the data looks like. The first thing we notice is that the popularity column contains some non-numeric values: '<1' marks a week in which the value was greater than zero but lower than one. We can simply replace those values with zero:
df['popularity'] = df['popularity'].replace('<1', 0)
Let’s check for missing values:
df.isna().sum()
We have no missing values! That’s great, we can continue.
Let’s check the data types of our features:
df.dtypes
Both of our features are object types… We need to convert week to datetime and popularity to numeric (integer or float):
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')
df['popularity'] = df['popularity'].astype(int)
We will also set week as the index of the dataframe:
df = df.set_index('week')
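The preparation steps so far can be gathered into one small helper. This is only a sketch: the function name prepare_trends and the toy values are made up for illustration, and it replaces '<1' with the string "0" rather than the integer 0 so the column stays a single type until the final cast.

```python
import pandas as pd


def prepare_trends(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a frame with string columns 'week' and 'popularity'."""
    out = raw.copy()
    # '<1' means "above zero but below one"; the post rounds it down to 0.
    out["popularity"] = out["popularity"].replace("<1", "0").astype(int)
    # Parse ISO dates and use them as the index.
    out["week"] = pd.to_datetime(out["week"], format="%Y-%m-%d")
    return out.set_index("week")


# Toy input with made-up values.
raw = pd.DataFrame(
    {
        "week": ["2023-12-03", "2023-12-10", "2023-12-17"],
        "popularity": ["<1", "41", "100"],
    }
)
clean = prepare_trends(raw)
print(clean)
```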
Let’s finally visualize the data:
import matplotlib.pyplot as plt
# Plot weekly popularity over time
df.popularity.plot(figsize=(12, 5))
plt.grid(True, alpha=0.5)
plt.xlabel('Date')
plt.ylabel('Popularity')
plt.show()
We will use pmdarima's auto_arima to train our SARIMA model. Why SARIMA? Because we can see a clear seasonal component in our data!
# Install pmdarima if you don't have it
!pip install pmdarima==2.0.3
# Import the library
from pmdarima.arima import auto_arima
Before proceeding, let’s split our data into train and test sets, keeping the first 90% of the weeks for training:
samples_train = int(df.shape[0] * 0.9)
train = df.iloc[:samples_train]
test = df.iloc[samples_train:]
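Because this is a time series, the split is chronological rather than shuffled: the model only sees weeks that come before the ones it is evaluated on. A small sketch on a made-up series of 260 weekly samples (5 years of 52 weeks, an assumption about the dataset size) shows the resulting sizes.

```python
import numpy as np
import pandas as pd

# Made-up weekly series: 5 years x 52 weeks = 260 samples.
index = pd.date_range("2019-01-06", periods=260, freq="W-SUN")
df = pd.DataFrame({"popularity": np.arange(260)}, index=index)
df.index.name = "week"

samples_train = int(df.shape[0] * 0.9)  # 234 of 260 samples
train = df.iloc[:samples_train]
test = df.iloc[samples_train:]

# Every training week precedes every test week.
print(len(train), len(test), train.index.max() < test.index.min())
```

With 260 samples, the test set covers the last 26 weeks, roughly the final half year.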
We have weekly data. From the previous graph, we can observe annual seasonality. Since a year has 52 weeks, we will select the seasonal period m as 52. Let’s train a SARIMA model using the train set.
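As a rough sketch of what this training step could look like with auto_arima: the helper name fit_sarima and the search options below are assumptions for illustration, not necessarily the configuration behind the results in this post.

```python
import pandas as pd


def fit_sarima(series: pd.Series, m: int = 52):
    """Search for a seasonal ARIMA model with pmdarima's auto_arima."""
    # Imported here so the helper can be defined without pmdarima installed.
    from pmdarima.arima import auto_arima

    return auto_arima(
        series,
        seasonal=True,          # include seasonal (P, D, Q) terms
        m=m,                    # seasonal period: 52 weekly samples per year
        stepwise=True,          # stepwise search instead of a full grid
        suppress_warnings=True,
        error_action="ignore",  # skip candidate orders that fail to fit
        trace=False,
    )
```

With the split above, calling fit_sarima(train['popularity']) returns a fitted model, and model.predict(n_periods=len(test)) produces a forecast for the test weeks. Expect the search to take a while, since a seasonal period of 52 makes each candidate model expensive to fit.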
You can check the full article in our newsletter.
This was the final prediction:
That means that this Christmas we can expect a popularity peak of around 60 on Google Trends' 0 to 100 scale, where 100 is the highest point of the period. It seems like the popularity of “All I Want for Christmas” has been decreasing year by year, right?
Enjoy it and Merry Christmas!