Time Series Analysis with Facebook Prophet Python and Cesium

Project Overview

The project builds a forecasting model for healthcare call data using Prophet, enhanced by features extracted with Cesium. The data is cleaned first. Cesium then extracts time series features (mean, standard deviation, and others) from the historical data, and the result is fed into Prophet for forecasting. After training, the model predicts future call volumes, which are visualized for interpretation. The plots show trends, seasonality, and uncertainty intervals, giving a comprehensive view of the forecast.

Prerequisites

Basic knowledge of time series analysis and forecasting.
Hands-on experience with Python and with data manipulation in pandas.
Familiarity with the Prophet model for time series forecasting.
Experience with Cesium for feature extraction from time series.
Knowledge of basic statistical features such as mean, standard deviation, and skewness.
Familiarity with data visualization using matplotlib and seaborn.
Python packages: pandas, prophet, cesium, matplotlib, seaborn, numpy, scipy, statsmodels.

Approach

We started by cleaning the healthcare call data and renaming the columns to ds (date) and y (value), the format the Prophet library expects. With this, we trained a baseline Prophet model. We then enriched the dataset with statistical features extracted with Cesium, such as mean, standard deviation, and skewness. The enriched dataset contains the original call data plus the new features, which are added to the model as regressors. Once a new Prophet model is fitted on this dataset, it can forecast call volumes for the next 12 months. Finally, the forecast is visualized so that the trend, seasonality, and other components learned by the model can be used to understand the factors affecting the prediction. In this way, the approach combines historical data with custom features, with the aim of improving accuracy.

Workflow and Methodology

Data Collection and Manipulation: Collect and clean the healthcare call data, making sure it can be analyzed as a time series.
Data Transformation: Rename the columns to ds and y to match the format Prophet expects.
Feature Extraction: Extract statistical features (mean, standard deviation, skewness) from the time series using Cesium.
Data Enrichment: Combine the extracted features with the original data to get an enriched dataset.
Model Training: Retrain the Prophet model on the enriched dataset, with the additional features as regressors.
Forecasting: Use the trained model to forecast call volumes for the next 12 months.
Visualization: Display the forecast results and visualize trends, seasonality, and uncertainty intervals.
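
As a rough sketch of the feature extraction and enrichment steps, the example below computes the three named statistics and attaches them as extra columns. Note that it uses pandas as a stand-in for Cesium so that it runs without extra dependencies. The synthetic values, the enrich helper, and the choice to broadcast whole-series statistics to every row are assumptions made for illustration, not details confirmed by the project.

```python
import pandas as pd

# Synthetic monthly series in Prophet's (ds, y) layout; values are made up.
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "y": [100.0, 120.0, 90.0, 130.0, 110.0, 140.0],
})

def enrich(frame):
    """Attach whole-series mean, std, and skewness as constant columns.

    Hypothetical helper: pandas stands in for Cesium here.
    """
    out = frame.copy()
    out["mean"] = frame["y"].mean()
    out["std"] = frame["y"].std()    # sample standard deviation (ddof=1)
    out["skew"] = frame["y"].skew()  # bias-corrected sample skewness
    return out

enriched = enrich(df)
print(enriched.head())
```

Each extra column could then be registered with Prophet's add_regressor method before fitting. The same columns would also need to be present in the future dataframe passed to predict.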

Data Collection and Preparation

Data Collection:

In this project, we collected the dataset from a public repository. For a real-world problem, similar datasets are available from public repositories such as Kaggle and the UCI Machine Learning Repository, or from company-specific data. The dataset is provided with this project so that you can follow along with the same data.

Data Preparation Workflow:

Import the healthcare call data into a pandas dataframe for analysis.
Convert the month column to datetime so that it is time series compatible.
Check for missing or null values in the dataset.
Clean up the dataset and keep only the relevant columns (month and Healthcare).
Rename the columns to ds (date) and y (value) so that Prophet can use them.
Verify that the dataset is ready for time series forecasting.
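
The preparation steps above can be sketched as follows. This is a minimal illustration on a small synthetic frame, not the project's actual data. The column names month and Healthcare come from the steps above; the values, the extra Telecom column, and the prepare_for_prophet helper are made up for the example.

```python
import pandas as pd

# Synthetic stand-in for the call data (values are made up for illustration).
raw = pd.DataFrame({
    "month": ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"],
    "Healthcare": [120, 135, None, 150],
    "Telecom": [80, 82, 79, 85],  # hypothetical extra column, dropped below
})

def prepare_for_prophet(frame, date_col="month", value_col="Healthcare"):
    """Return a two-column frame (ds, y) in the layout Prophet expects."""
    out = frame[[date_col, value_col]].copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.rename(columns={date_col: "ds", value_col: "y"})
    # Dropping rows with missing values is one choice; imputing is another.
    out = out.dropna(subset=["ds", "y"])
    return out.sort_values("ds").reset_index(drop=True)

# Count missing values per column before cleaning.
print(raw.isnull().sum())

prepared = prepare_for_prophet(raw)
print(prepared)
```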

Understanding the Code:

Here’s what is happening under the hood. Let’s go through it step by step:

Step 1:

Mounting Google Drive

Mount your Google Drive to access and save datasets, models, and other resources.

from google.colab import drive
drive.mount('/content/drive')

Management of Packages

In this code, we uninstall numpy, scipy, cesium, prophet, and seaborn and then reinstall them. Since no versions are pinned, pip fetches the latest release of each package that is compatible with the environment, so the notebook runs against up-to-date versions.

!pip uninstall -y numpy scipy cesium prophet seaborn
!pip install numpy
!pip install scipy
!pip install cesium
!pip install prophet
!pip install seaborn
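
Because the commands above install whatever versions are current at run time, it can help to record what was actually installed. The sketch below uses only the standard library; the installed_versions helper is a hypothetical name introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["numpy", "scipy", "cesium", "prophet", "seaborn"]

def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Saving this output alongside the results makes it easier to reproduce a run later, since an unpinned install may resolve to different versions over time.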

Import Library

This code imports the libraries used throughout the project: pandas for data manipulation, seaborn and matplotlib for plotting, Prophet for time series forecasting, cesium for feature extraction, and statsmodels for seasonal decomposition. It also imports Prophet's diagnostic functions, cross_validation and performance_metrics, for validating models.

import pandas as pd
import seaborn as sns
from prophet import Prophet
from cesium import featurize
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from prophet.diagnostics import cross_validation, performance_metrics

Step 2:

Loading Excel File

This code loads an Excel file from a predefined path. If it succeeds, it prints the message "Successfully loaded". If the file does not exist, the FileNotFoundError is caught and the missing path is reported; any other error is caught and its message is displayed.

excel_file_path = '/content/drive/MyDrive/Aionlinecourse_badhon/Project/Time Series Analysis with Facebook Prophet Python and Cesium/CallCenterData.xlsx'
try:
    # Read the workbook into a DataFrame
    df = pd.read_excel(excel_file_path)
    print("Successfully loaded")
except FileNotFoundError:
    print(f"Error: File not found at {excel_file_path}")
except Exception as e:
    # Catch anything else, such as a missing Excel engine or a corrupt file
    print(f"An error occurred: {e}")
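
One limitation of the pattern above is that df stays undefined when loading fails, so later cells would raise a NameError. A possible variation, sketched here with a hypothetical load_calls helper and a shortened path, returns None on failure so the caller can check before continuing.

```python
from pathlib import Path

import pandas as pd

def load_calls(path):
    """Return the Excel file as a DataFrame, or None if it cannot be loaded."""
    path = Path(path)
    if not path.exists():
        print(f"Error: File not found at {path}")
        return None
    try:
        return pd.read_excel(path)
    except Exception as e:  # e.g. a missing Excel engine or a corrupt file
        print(f"An error occurred: {e}")
        return None

# Path shortened for illustration; substitute the full Drive path in practice.
df = load_calls("CallCenterData.xlsx")
if df is None:
    print("Stopping: no data to analyze.")
```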

Previewing Data

This block of code displays the first few rows of the dataset to give a quick overview of its structure.