Build an Autoregressive and Moving Average Time Series Model

Welcome to time series analysis! In this project we take a close look at IoT sensor readings and learn how to model and predict them. The focus is on how Moving Average (MA) and Autoregressive (AR) models can be used to analyze sensor data. These models can help us find hidden patterns and predict future readings.

Project Overview

This project starts by cleaning and preparing the IoT sensor data so that we can analyze it. We then build several models: Moving Average models (MA(1), MA(2)) followed by Autoregressive models (AR(1), AR(2), AR(3), AR(4)). With these models, we can study the relationship between past and future values.

We then measure how well each model fits the data with Root Mean Squared Error (RMSE), a common way to gauge forecast accuracy. To add insight, we bring in visualizations such as autocorrelation plots and rolling average plots. They help us see how sensor readings behave over time and how different models perform. By the end of this project, we will have a good idea of which model fits the data best and can be used to predict from historical sensor data.

Prerequisites

  • Knowledge of time series analysis and some basic concepts, such as stationarity and autocorrelation.
  • Python and libraries including Pandas, NumPy, and Matplotlib.
  • Knowledge of machine learning models such as Moving Average (MA) and Autoregressive (AR) models.
  • Experience with model performance metric computation including Root Mean Squared Error (RMSE).
  • Knowledge of how to clean and preprocess data for time series analysis.
  • Visualization tools for time series, for example, autocorrelation plots and rolling averages.
  • Familiarity with the Augmented Dickey-Fuller (ADF) test for stationarity.

Approach

First, we clean up the IoT sensor data to prepare it for analysis and modeling. Then we dive into different time series models, starting with Moving Average (MA) models and then Autoregressive (AR) models with varying lags. These models aim to capture how future values of the data depend on past values. To check for stationarity we use the Augmented Dickey-Fuller (ADF) test, and to smooth the data we use rolling averages. For each model, we compute the Root Mean Squared Error (RMSE) so that we can evaluate the accuracy of its predictions. Autocorrelation plots show visually how each reading relates to earlier readings. Finally, we pick the most accurate model by RMSE and use it to make future predictions and gain insight into IoT sensor behavior.

Workflow and Methodology

  • Load and prepare the IoT sensor data for further analysis.
  • Apply the Augmented Dickey-Fuller (ADF) test for stationarity.
  • Construct various time series models (MA and AR) under different lags.
  • Train each of the models with the prepared data.
  • Compute a fitted value for each model.
  • Calculate the Root Mean Square Error (RMSE) for each model to evaluate model performance.
  • Visualize the autocorrelation plots and rolling average of data.
  • Compare the RMSE values of all models to identify the best-performing one.
  • Make predictions with the selected model.

Data Collection and Preparation

Data Collection:

In this project, we collected the dataset from a public repository. If you are looking to work on a real-world problem, you can get these kinds of datasets from publicly available repositories such as Kaggle, UCI Machine Learning Repository, or company-specific data. We will provide the dataset in this project so that you can work on the same dataset.

Data Preparation Workflow:

  • Import the dataset and inspect it for any missing values or inconsistencies.
  • Convert the date column to a proper timestamp format for time series analysis.
  • Set the time column as the index to allow for time-based operations.
  • Handle missing values by using methods like forward filling or backward filling.
  • Visualize the time series data to identify trends, seasonality, and noise.

Code Explanation

STEP 1:

Mounting of Google Drive

This code mounts your Google Drive into the Colab environment so that you can access files stored in your drive. Your Google Drive is made accessible under the /content/drive path.

from google.colab import drive
drive.mount('/content/drive')

Ignoring Warnings

This code will suppress all the warnings, thus preventing them from being displayed during execution. This ensures that the output is clean while running the program.

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

Required Library Installation

This code installs the required Python libraries: matplotlib for plotting, pandas for data manipulation, statsmodels for statistical modeling, seaborn for data visualization, scipy for scientific and mathematical computation, numpy for numerical work, and scikit-learn for machine learning.

!pip install matplotlib
!pip install pandas
!pip install statsmodels
!pip install seaborn
!pip install scipy
!pip install numpy
!pip install scikit-learn

Importing Required Libraries for Time Series Analysis

This code imports the libraries used for time series analysis, including pandas, numpy, statsmodels, and matplotlib. Together they provide seasonal decomposition, statistical tests, ARIMA modeling, and autocorrelation plots for time series data.

#importing all required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pylab
import scipy.stats
from pandas.plotting import autocorrelation_plot
from sklearn.metrics import mean_squared_error
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, kpss

STEP 2:

Loading Data and Checking Shape

This code loads the CSV file and makes a working copy, df, so that the raw data stays untouched. It then displays the dataset's shape to check the number of rows and columns.

#read the data
data = pd.read_csv('/content/drive/MyDrive/New 90 Projects/Project_15/Data-Chillers.csv')
df = data.copy()
df.shape

Previewing Data

This code displays the dataset's first few rows for a quick overview.

#checking first five rows of the data
df.head()

Dataset Info

The purpose of the given code is to provide a summary of the DataFrame df by displaying the number of records, names of the columns, types of columns, count of non-null values, and the size in memory.

# checking the structure of data
df.info()

Checking Missing Values

This code calculates the overall number of null values present in every column of the df Data Frame. This helps in identifying null values for further processing of the data.

#checking missing values
df.isnull().sum()

Convert the Date Column to Timestamp Format

This code converts the time column of the DataFrame df to timestamp format, using the date and time format '%d-%m-%Y %H:%M' (day, month, year, then hour and minute). This makes date-based operations easier during time series analysis.

#converting date column into timestamp format
df.time = pd.to_datetime(df.time, format='%d-%m-%Y %H:%M')
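
To make the format string concrete, here is a small sketch on a made-up timestamp string (the sample value is an assumption and is not taken from the dataset). It shows that '%d-%m-%Y %H:%M' reads the day first and the month second.

```python
import pandas as pd

# Hypothetical sample value in day-month-year order, not from the dataset
sample = pd.Series(["05-03-2021 14:30"])

parsed = pd.to_datetime(sample, format="%d-%m-%Y %H:%M")

# With this format the string is read as 5 March 2021, not 3 May 2021
print(parsed.iloc[0])
```

Stating the format explicitly avoids ambiguity for dates such as 05-03, which could otherwise be read month-first.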

This code extracts the maximum or latest date from the time column of the dataframe df so that one can infer the most recent record in the dataset.

#maximum (latest) date in the dataset
df['time'].max()

This code extracts the minimum or earliest date from the time column of the dataframe df so that one can infer the first record in the dataset.

#minimum (earliest) date in the dataset
df['time'].min()

The purpose of the given code is to provide a summary of the DataFrame df after the conversion by displaying the number of records, names of the columns, types of columns, count of non-null values, and the size in memory.

# checking the structure of data after converting datetime format
df.info()

Calculate Correlation with Other Features

This piece of code calculates the correlation matrix of the DataFrame df to show the pairwise correlation between its numeric features. It tells us how strongly each feature is correlated with the others.

#correlation with other features.
# numeric_only=True skips non-numeric columns such as time (requires pandas 1.5 or later)
df.corr(numeric_only=True)

STEP 3:

Plotting IoT Sensor Reading

This will plot the IOT_Sensor_Reading column from the DataFrame df as a time series. The plot is 20 inches wide by 5 inches high and has a title for visual clarity.

df.IOT_Sensor_Reading.plot(figsize=(20,5), title="IOT Sensor_Reading")
plt.show()

Plotting Error Present

This code plots the Error_Present column from the DataFrame df as a time series. The plot size is set to 20x5 and a title is added to make the plot easier to understand.

df.Error_Present.plot(figsize=(20,5), title="Error_Present")
plt.show()

This code plots the Sensor_2 column from the DataFrame df as a time series. The plot size is set to 20x5 and a title is added to make the plot easier to understand.

df.Sensor_2.plot(figsize=(20,5), title="Sensor_2")
plt.show()

This code plots the Sensor_Value column from the DataFrame df as a time series. The plot size is set to 20x5 and a title is added to make the plot easier to understand.

df.Sensor_Value.plot(figsize=(20,5), title="Sensor_Value")
plt.show()

QQ plot for IOT Sensor readings

This code is used to generate a QQ plot for IOT_Sensor_Reading to check if it is normally distributed using scipy.stats.probplot.

# The QQ plot
scipy.stats.probplot(df.IOT_Sensor_Reading, plot=pylab)
plt.title("QQ plot for IOT_Sensor_Reading")
pylab.show()

STEP 4:

Computing the Mean of IOT Sensor Readings

The mean value of the IOT_Sensor_Reading column in DataFrame df is computed by this code, which gives an idea of the sensor data's central tendency.

df['IOT_Sensor_Reading'].mean()

Computing the Minimum of IOT Sensor Readings

The minimum value of the IOT_Sensor_Reading column in DataFrame df is computed by this code, which gives an idea of the sensor data's lowest record.

df['IOT_Sensor_Reading'].min()

Computing the Maximum of IOT Sensor Readings

The maximum value of the IOT_Sensor_Reading column in DataFrame df is computed by this code, which gives an idea of the sensor data's highest record.

df['IOT_Sensor_Reading'].max()

Time of Day Categorization

The hour is extracted from the time column in DataFrame df and grouped into three labels: 'morning' (hours 0 to 11), 'noon' (12 to 16), and 'evening' (17 to 23). The labels are stored in a new column, time_of_day. Passing include_lowest=True keeps hour 0 in the 'morning' bin; without it, pd.cut would leave midnight readings unlabeled.

# Extract the hour of the day from the 'time' column
hour = df['time'].dt.hour
# Create a new column with labels for 'morning', 'noon', and 'evening'
df['time_of_day'] = pd.cut(hour, bins=[0, 11, 16, 23], labels=['morning', 'noon', 'evening'], include_lowest=True)
df.head()
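
The binning rule is easy to get wrong at the edges, so here is a minimal sketch on a handful of made-up hour values. It relies on the fact that pd.cut uses right-closed intervals by default, which means a first bin edge of 0 excludes hour 0 unless include_lowest=True is passed.

```python
import pandas as pd

hours = pd.Series([0, 6, 11, 12, 16, 17, 23])
bins = [0, 11, 16, 23]
labels = ["morning", "noon", "evening"]

# Default: intervals are (0, 11], (11, 16], (16, 23], so hour 0 is left out
default_cut = pd.cut(hours, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so hour 0 is kept
inclusive_cut = pd.cut(hours, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"hour": hours, "default": default_cut, "inclusive": inclusive_cut}))
```

In the default column, the row for hour 0 is NaN, which would silently drop midnight readings from any later groupby on time_of_day.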

Maximum IOT Sensor Readings across Various Times in a Day

The code groups the data by time_of_day and computes the highest IOT_Sensor_Reading for each part of the day ('morning', 'noon', 'evening'). It shows during which part of the day the maximum sensor reading occurs.

max_IOT_Sensor_Reading = df.groupby('time_of_day')['IOT_Sensor_Reading'].max()
max_IOT_Sensor_Reading

Minimum IOT Sensor Readings across Various Times in a Day

The code groups the data by time_of_day and computes the lowest IOT_Sensor_Reading for each part of the day ('morning', 'noon', 'evening'). It shows during which part of the day the minimum sensor reading occurs.

min_IOT_Sensor_Reading = df.groupby('time_of_day')['IOT_Sensor_Reading'].min()
min_IOT_Sensor_Reading

Average IOT Sensor Readings across Various Times in a Day

The code groups the data by time_of_day and computes the average IOT_Sensor_Reading for each part of the day ('morning', 'noon', 'evening'). It helps to understand the typical sensor reading during each part of the day.

avg_IOT_Sensor_Reading = df.groupby('time_of_day')['IOT_Sensor_Reading'].mean()
avg_IOT_Sensor_Reading

Extracting Weekday

It creates a new column called day_of_week in the DataFrame df, which extracts the day names, for example, Monday or Tuesday from the time column. This would help in analyzing the data for different days of the week.

df['day_of_week'] = df['time'].dt.day_name()

Finding Maximum IOT Sensor Readings by Day of the Week

This code groups by day_of_week and gets the maximum IOT_Sensor_Reading for each day. It identifies the highest sensor reading recorded on each day of the week.

max_IOT_Sensor_Reading = df.groupby('day_of_week')['IOT_Sensor_Reading'].max()
max_IOT_Sensor_Reading

Finding Minimum IOT Sensor Readings by Day of the Week

This code groups by day_of_week and gets the minimum IOT_Sensor_Reading for each day. It identifies the lowest sensor reading recorded on each day of the week.

min_IOT_Sensor_Reading = df.groupby('day_of_week')['IOT_Sensor_Reading'].min()
min_IOT_Sensor_Reading

Finding Average IOT Sensor Readings by Day of the Week

This code groups by day_of_week and gets the average of the IOT_Sensor_Reading for each day.

avg_IOT_Sensor_Reading = df.groupby('day_of_week')['IOT_Sensor_Reading'].mean()
avg_IOT_Sensor_Reading
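
The three per-day summaries above can also be computed in a single call. The following is a minimal sketch on made-up toy data (the values are assumptions for illustration); it uses the standard pandas agg method with a list of aggregation names.

```python
import pandas as pd

# Toy data standing in for the sensor readings (values are made up)
toy = pd.DataFrame({
    "day_of_week": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "IOT_Sensor_Reading": [10.0, 20.0, 30.0, 50.0],
})

# One groupby call returns max, min, and mean side by side
summary = toy.groupby("day_of_week")["IOT_Sensor_Reading"].agg(["max", "min", "mean"])
print(summary)
```

Having the three statistics in one table makes it easier to compare days at a glance.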

Understanding the Shape of the DataFrame

This code gives the shape of the DataFrame df, showing the number of rows and columns in it. It helps to understand the size and structure of the dataset.

df.shape

Setting the time as an index

This code sets the time column of the DataFrame df as the index, which is helpful for time series analysis and allows convenient time-based slicing.

#df.sort_values('time', inplace=True)
df.set_index('time', inplace=True)

Plotting IoT Sensor Reading

This will plot the IOT_Sensor_Reading column from the DataFrame df as a time series. This plot is 20 inches wide by 5 inches high and has an appropriate title for visual clarity.

df.IOT_Sensor_Reading.plot(figsize=(20,5), title="IOT Sensor_Reading")
plt.show()

STEP 5:

Resampling Data to Hourly Frequency

This code reindexes the DataFrame df to a frequency of one hour ('H'). It gives the data consistent hourly intervals; any hour missing from the original data is added as a row of missing values (NaN), which we fill in a later step.

df.asfreq('H')

This code assigns the hourly-frequency result back to df, since asfreq returns a new DataFrame rather than modifying df in place.

df = df.asfreq('H')

This code gives the shape of the DataFrame df, showing the number of rows and columns in it. It helps to understand the size and structure of the dataset.

df.shape
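
To see what asfreq does to gaps, here is a small sketch on a made-up three-row series with one hour missing. The timestamps and values are assumptions; pd.offsets.Hour() is used here as the offset object that corresponds to an hourly frequency.

```python
import pandas as pd

# Toy series with the 02:00 reading missing (timestamps and values are made up)
idx = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 03:00"])
toy = pd.DataFrame({"IOT_Sensor_Reading": [1.0, 2.0, 4.0]}, index=idx)

# Conform the index to a regular hourly grid
hourly = toy.asfreq(pd.offsets.Hour())

# The 02:00 row now exists, but its value is NaN until we fill it
print(hourly)
```

This is why the row count can grow after asfreq, and why a fill step is needed afterwards.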

Handling Missing Values

Different strategies are used to handle missing values in the DataFrame df: IOT_Sensor_Reading is forward filled, Error_Present and Sensor_2 are backward filled, and missing values in Sensor_Value are replaced with the mean of the available observations. These methods ensure the data is complete for processing and analysis.

# ffill()/bfill() replace the deprecated fillna(method=...) form
df.IOT_Sensor_Reading = df.IOT_Sensor_Reading.ffill()
df.Error_Present = df.Error_Present.bfill()
df.Sensor_2 = df.Sensor_2.bfill()
df.Sensor_Value = df.Sensor_Value.fillna(value=df.Sensor_Value.mean())
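
The difference between the three fill strategies is easiest to see side by side. This is a minimal sketch on a made-up four-value series, not on the project data.

```python
import numpy as np
import pandas as pd

# Toy series with a gap in the middle (values are made up)
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()               # carry the last known value forward
backward = s.bfill()              # pull the next known value backward
mean_filled = s.fillna(s.mean())  # mean of the observed values is 2.5

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "mean": mean_filled}))
```

Forward fill assumes the sensor held its last value, backward fill assumes it had already reached its next value, and mean fill ignores time order altogether.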

Decomposing Components of Time Series

This code decomposes the IOT_Sensor_Reading time series into trend, seasonal, and irregular components via an additive model with period 365. Decomposition helps identify underlying patterns in the data, such as trend and seasonality.

# Decompose the time series into trend, seasonal, and irregular components
decomp = seasonal_decompose(df['IOT_Sensor_Reading'], model='additive', period=365)
# Extract the components
trend = decomp.trend
seasonal = decomp.seasonal
irregular = decomp.resid

Time Series Decomposition and Plotting

The code decomposes the IOT_Sensor_Reading time series into its trend, seasonal, and residual components using an additive model. It then plots the original data, trend, seasonality, and residuals in a styled 4-panel figure for clearer understanding.

# Decompose the time series
decomposition = seasonal_decompose(df['IOT_Sensor_Reading'], model='additive')
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# Plot the components with improved styling
plt.figure(figsize=(12, 10))
plt.suptitle("Time Series Decomposition of IoT Sensor Reading", fontsize=16, fontweight='bold')
plt.subplot(411)
plt.plot(df['IOT_Sensor_Reading'], label='Original', color='blue')
plt.title('Original Time Series')
plt.legend(loc='upper left')
plt.subplot(412)
plt.plot(trend, label='Trend', color='orange')
plt.title('Trend Component')
plt.legend(loc='upper left')
plt.subplot(413)
plt.plot(seasonal, label='Seasonality', color='green')
plt.title('Seasonal Component')
plt.legend(loc='upper left')
plt.subplot(414)
plt.plot(residual, label='Residuals', color='red')
plt.title('Residual Component')
plt.legend(loc='upper left')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])  # Adjust layout to fit the main title
plt.show()

Irregular Plot of IOT Sensor Readings

This code generates a graph of the IOT_Sensor_Reading from the DataFrame df over time. It is drawn with red markers and lines, and has customized labels, a title, and a grid to make it easier to read. The x-axis labels are rotated for readability.

# df has 'time' as its index and 'IOT_Sensor_Reading' as the column to plot
plt.figure(figsize=(20, 6))  # Adjust figure size as needed
# Create the irregular plot
plt.plot(df.index, df['IOT_Sensor_Reading'], marker='o', linestyle='-', color='red')
# Customize the plot (optional)
plt.xlabel("Time")
plt.ylabel("IOT Sensor Reading")
plt.title("Irregular Plot of IOT Sensor Reading")
plt.grid(True)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()
plt.show()

STEP 6:

Selecting Specific Column

This code selects only the IOT_Sensor_Reading column from the DataFrame df, simplifying the dataset by keeping just the relevant feature.

df = df[['IOT_Sensor_Reading']]
df

The ADF Test

This code carries out the Augmented Dickey-Fuller (ADF) test on the IOT_Sensor_Reading time series to check for stationarity. The result includes the ADF statistic, p-value, and critical values, which help determine whether the time series is stationary.

# Perform the ADF test on the original series and display results
result = adfuller(df['IOT_Sensor_Reading'])
adf_statistic, p_value, _, _, critical_values, _ = result

Data Visualization and Analysis for Original Time Series

This code plots the IOT_Sensor_Reading data with improved styling. The output is a blue line plot with a title, axis labels, and gridlines to improve readability and understanding of the time series.

# Plot the original time series with enhanced visualization
plt.figure(figsize=(20, 4))
plt.plot(df['IOT_Sensor_Reading'], color='blue')
plt.title('Original Time Series - IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Sensor Reading')
plt.grid(True)
plt.show()

Presenting the results of ADF tests

This code displays the results of the Augmented Dickey-Fuller (ADF) test, including the value of the ADF statistic, the p-value, and critical values. These results help assess whether the time series is stationary, with lower p-values indicating stronger evidence against the null hypothesis of non-stationarity.

# Display ADF test results for the original series
print('ADF Test on Original Series')
print(f'ADF Statistic: {adf_statistic:.4f}')
print(f'p-value: {p_value:.4f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'{key}: {value:.4f}')

Differencing the Timeseries and Plotting

This code differences the IOT_Sensor_Reading time series to remove non-stationarity and then drops the NA values. Differencing subtracts the previous value from each data point, which removes a slowly changing level or trend. The differenced series is plotted in green, and the ADF test is then repeated on it.

# Difference the time series to remove non-stationarity and drop NA values
diff_data = df.diff().dropna()
# Plot the differenced time series with enhanced visualization
plt.figure(figsize=(20, 4))
plt.plot(diff_data, color='green')
plt.title('Differenced Time Series - IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Differenced Sensor Reading')
plt.grid(True)
plt.show()
# Perform the ADF test on the differenced series and display results
result_diff = adfuller(diff_data['IOT_Sensor_Reading'])
adf_statistic_diff, p_value_diff, _, _, critical_values_diff, _ = result_diff
print('\nADF Test on Differenced Series')
print(f'ADF Statistic: {adf_statistic_diff:.4f}')
print(f'p-value: {p_value_diff:.4f}')
print('Critical Values:')
for key, value in critical_values_diff.items():
    print(f'{key}: {value:.4f}')
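
To illustrate why differencing removes a trend, here is a minimal sketch on a made-up straight-line series. The series and its slope are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Toy series: a straight-line trend, y_t = 5 + 2 t (made-up values)
trend = pd.Series(5 + 2 * np.arange(6), dtype=float)

# First difference: y_t - y_{t-1}; the first element has no predecessor
diffed = trend.diff().dropna()

# The trend is gone: every difference equals the slope, 2
print(diffed.tolist())
```

The differenced series is one element shorter than the original, which is why dropna() follows diff().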

Executing KPSS Test on Original Series

This code runs the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test on the original IOT_Sensor_Reading series to check its stationarity. It returns the KPSS statistic, the p-value, and critical values, which help determine whether the series is stationary.

# Perform KPSS test on the original series
kpss_result = kpss(df['IOT_Sensor_Reading'], nlags="auto")
kpss_statistic, kpss_p_value, _, kpss_critical_values = kpss_result

Plotting the Original Time Series

This code visualizes the IOT_Sensor_Reading data with improved styling. It creates a plot with a red line, a title, axis labels, and grid lines so the time series can be inspected visually in its original state.

# Plot the original time series with enhanced visualization
plt.figure(figsize=(20, 4))
plt.plot(df['IOT_Sensor_Reading'], color='red')
plt.title('Original Time Series - IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Sensor Reading')
plt.grid(True)
plt.show()

Demonstrating Results of the KPSS Test

This code will print the KPSS test results of the original time series, including KPSS statistics, p-value, and critical values. A very low p-value would indicate evidence against the null hypothesis that the time series is stationary, hence, aiding in determining the stationarity of the original series.

# Display KPSS test results for the original series
print('KPSS Test on Original Series')
print(f'KPSS Statistic: {kpss_statistic:.4f}')
print(f'p-value: {kpss_p_value:.4f}')
print('Critical Values:')
for key, value in kpss_critical_values.items():
    print(f'{key}: {value:.4f}')

Differencing the Timeseries and Plotting

This code differences the IOT_Sensor_Reading time series to remove non-stationarity and then drops the NA values. Differencing subtracts the previous value from each data point. The differenced series is plotted in pink.

# Difference the time series to remove non-stationarity and drop NA values
diff_data = df.diff().dropna()
# Plot the differenced time series with enhanced visualization
plt.figure(figsize=(20, 4))
plt.plot(diff_data, color='#FF77B7')
plt.title('Differenced Time Series - IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Differenced Sensor Reading')
plt.grid(True)
plt.show()

KPSS Test on Differenced Series

This code runs the KPSS test on the differenced series from IOT_Sensor_Reading to test whether differencing has achieved stationarity in the series. The output from the script includes the KPSS statistic, the p-value, and the critical values; these results can be used as evidence to confirm that the differenced series is now stationary.

# Perform KPSS test on the differenced series
kpss_result_diff = kpss(diff_data['IOT_Sensor_Reading'], nlags="auto")
kpss_statistic_diff, kpss_p_value_diff, _, kpss_critical_values_diff = kpss_result_diff
# Display KPSS test results for the differenced series
print('\nKPSS Test on Differenced Series')
print(f'KPSS Statistic: {kpss_statistic_diff:.4f}')
print(f'p-value: {kpss_p_value_diff:.4f}')
print('Critical Values:')
for key, value in kpss_critical_values_diff.items():
    print(f'{key}: {value:.4f}')
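
Because the ADF and KPSS tests have opposite null hypotheses, their p-values need to be read together. The helper below is a hypothetical rule-of-thumb sketch (the function name, the wording of the verdicts, and the 0.05 threshold are assumptions, not part of statsmodels).

```python
def stationarity_verdict(adf_p_value, kpss_p_value, alpha=0.05):
    """Combine ADF and KPSS p-values into a rough verdict.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_rejects = adf_p_value < alpha    # evidence against a unit root
    kpss_rejects = kpss_p_value < alpha  # evidence against stationarity

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary"
    if adf_rejects and kpss_rejects:
        return "tests disagree: possibly difference-stationary"
    return "tests disagree: possibly trend-stationary"


# Made-up p-values for illustration
print(stationarity_verdict(adf_p_value=0.001, kpss_p_value=0.10))
print(stationarity_verdict(adf_p_value=0.60, kpss_p_value=0.01))
```

When the two tests disagree, the usual next step is to difference or detrend the series and run both tests again.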

Plotting the Autocorrelation Function (ACF)

This piece of code plots the autocorrelation function (ACF) of the IOT_Sensor_Reading series for the first 30 lags, with a 95% confidence band (alpha = 0.05). The plot is drawn in dodgerblue for better visibility when exploring how the time series correlates with its past values.

# Plot the ACF with distinct color adjustments for better visualization
plt.figure(figsize=(12, 5))
plot_acf(df['IOT_Sensor_Reading'], lags=30, zero=False, alpha=0.05, color='dodgerblue')
plt.xlabel('Lag', fontsize=12)
plt.ylabel('Autocorrelation', fontsize=12)
plt.title('Autocorrelation Function (ACF) for IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.grid(True)
plt.show()
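
To show what the ACF plot is built from, here is a minimal NumPy sketch of the usual sample autocorrelation estimator, applied to a made-up alternating series. The function name sample_acf is hypothetical, and library implementations may differ in details such as bias adjustment.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[: len(x) - k] * centered[k:]) / denom
        for k in range(max_lag + 1)
    ])

# Toy series that flips sign every step, so neighbours are negatively correlated
toy = np.array([1.0, -1.0] * 5)
acf = sample_acf(toy, max_lag=2)
print(acf)
```

Lag 0 is always 1 because a series is perfectly correlated with itself, which is why the plot above passes zero=False to leave it out.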

Partial Autocorrelation Function (PACF) plotting

This code plots the Partial Autocorrelation Function (PACF) of the IOT_Sensor_Reading time series for the first 30 lags, with a 95% confidence band (alpha=0.05). The plot uses the tomato color for better visualization. It shows the correlation between the series and each of its lagged values after removing the effect of the intermediate lags.

# Plot the PACF with distinct color adjustments for better visualization
plt.figure(figsize=(12, 5))
plot_pacf(df['IOT_Sensor_Reading'], lags=30, zero=False, alpha=0.05, color='tomato')
plt.xlabel('Lag', fontsize=12)
plt.ylabel('Partial Autocorrelation', fontsize=12)
plt.title('Partial Autocorrelation Function (PACF) for IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.grid(True)
plt.show()

Displaying the DataFrame

This code displays the DataFrame df, showing all the data currently stored in it.

df

Calculating the Autocovariance Matrix

This code uses numpy.cov to compute the covariance matrix of the columns of the DataFrame df. The matrix holds the covariance between every pair of columns. Because df now contains only the IOT_Sensor_Reading column, the result reduces to the variance of that series.

# Calculate the autocovariance matrix
autocov_matrix = np.cov(df, rowvar=False)
print("Autocovariance Matrix:")
print(autocov_matrix)
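
For a single series, covariances between the series and its own past can be obtained by adding lagged copies as extra columns. The following is a sketch of that idea on made-up data; the column names lag0, lag1, and lag2 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy series (made-up values)
s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])

# With a single column, np.cov returns just the variance of that column
variance_only = np.cov(s.to_frame(), rowvar=False)

# Adding lagged copies gives a matrix whose off-diagonals are lagged covariances
lagged = pd.concat({"lag0": s, "lag1": s.shift(1), "lag2": s.shift(2)}, axis=1).dropna()
autocov = np.cov(lagged, rowvar=False)

print(variance_only)
print(autocov)
```

Entry (0, 1) of the second matrix is the covariance between the series and its lag-1 copy, computed on the rows where both are available.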

Generating and Plotting White Noise

This Python code generates white noise with the same mean and standard deviation as the IOT_Sensor_Reading series and adds it to df as a new column, white_noise. It then plots the white noise in green to show its random fluctuation over time.

# Regenerate white noise and add it to the DataFrame correctly
white_noise = np.random.normal(loc=df['IOT_Sensor_Reading'].mean(), scale=df['IOT_Sensor_Reading'].std(), size=len(df))
df['white_noise'] = white_noise
# Plot the white noise with enhanced visualization
plt.figure(figsize=(20, 6))
plt.plot(df['white_noise'], color='green', linewidth=1.2)
plt.title('White Noise - IoT Sensor Reading', fontsize=14, fontweight='bold')
plt.xlabel('Time', fontsize=12)
plt.ylabel('Amplitude', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

Comparing White Noise and IoT Sensor Readings

Both the IOT_Sensor_Reading series and the generated white noise are plotted on the same graph against time for comparison. White noise is shown in green and sensor readings in red, with a legend and grid to help compare their patterns and fluctuations over time.

# Plotting white noise and IoT sensor readings together for comparison
plt.figure(figsize=(20, 6))
df['white_noise'].plot(label="White Noise", color='green', linewidth=1.2)
df['IOT_Sensor_Reading'].plot(label="IoT Sensor Reading", color='red', linewidth=1.2)
plt.title("White Noise vs IoT Sensor Reading Series", fontsize=12, fontweight='bold')
plt.xlabel("Time", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Plot the Autocorrelation of IoT Sensor Readings

This code plots the autocorrelation of IOT_Sensor_Reading in red. The plot helps judge the relationship between the series and its past values, with a grid and clear labels for easier analysis.

# Plot the autocorrelation with customized color for better visualization
plt.figure(figsize=(20, 4))
autocorrelation_plot(df['IOT_Sensor_Reading'], color='red')
plt.title("Autocorrelation Plot of IoT Sensor Reading", fontsize=16, fontweight='bold')
plt.xlabel("Lag", fontsize=12)
plt.ylabel("Autocorrelation", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Autocorrelation Plot for White Noise

This code plots the autocorrelation of the generated white noise series in purple. The plot should show that the noise is essentially uncorrelated at all lags. Grid lines and axis labels are added for readability.

# Plot the autocorrelation for the White Noise series
plt.figure(figsize=(20, 4))
autocorrelation_plot(df['white_noise'], color='purple')
plt.title("Autocorrelation Plot of White Noise", fontsize=12, fontweight='bold')
plt.xlabel("Lag", fontsize=12)
plt.ylabel("Autocorrelation", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
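
The claim that white noise is uncorrelated can also be checked numerically. This is a minimal sketch on freshly simulated noise (the seed, sample size, and standard normal parameters are arbitrary assumptions); it compares the lag-1 autocorrelation with the approximate 95% band for an uncorrelated series.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible; the seed value is arbitrary
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(loc=0.0, scale=1.0, size=5000))

# Lag-1 autocorrelation of white noise should be close to zero
lag1 = noise.autocorr(lag=1)

# Approximate 95% band for an uncorrelated series: +/- 1.96 / sqrt(n)
band = 1.96 / np.sqrt(len(noise))
print(f"lag-1 autocorrelation: {lag1:.4f}, 95% band: +/-{band:.4f}")
```

Values that stay inside the band at most lags are consistent with white noise; the sensor series, by contrast, is expected to leave the band at short lags.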

Deleting the White Noise Column

This code deletes the white_noise column from the DataFrame df, removing it from the dataset after it has been used or analyzed.

del df['white_noise']

Creating a Random Walk

This code generates a random walk beginning from 99. For each one of the 1900 iterations, the previous value gets either -1 or 1 added (chosen at random). The resultant sequence simulates a random process with the value fluctuating above and below with each step.

walk = [99]
for i in range(1900):
    # Create random noise
    noise = -1 if np.random.random() < 0.5 else 1
    walk.append(walk[-1] + noise)
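
The same random walk can be written without an explicit loop. This is an alternative sketch, not the original code: it draws all steps at once and accumulates them with a cumulative sum. The name walk_vec and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Draw 1900 steps of -1 or +1 with equal probability
steps = rng.choice([-1, 1], size=1900)

# Start at 99 and accumulate the steps: walk_vec[t] = 99 + sum of the first t steps
walk_vec = np.concatenate(([99], 99 + np.cumsum(steps)))

print(len(walk_vec), walk_vec[:5])
```

The cumulative sum makes the structure of a random walk explicit: each value is the previous value plus a new random step.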

STEP 7:

Visualizing Random Walk Simulations

Here, the random walk is plotted over time in blue. The plot has x- and y-axis labels, a title, and grid lines to make the random process easier to visualize.

plt.figure(figsize=(20, 4))
plt.plot(walk, color='blue')  # Added color 'blue'
plt.xlabel("Time")
plt.ylabel("Value")
plt.title("Random Walk Simulation")
plt.grid(True)
plt.show()

Autocorrelation Plot of Random Walk

This code plots the autocorrelation of the random walk sequence in blue, showing how the random walk values correlate with their own past values across lags.

# Plot the autocorrelation for the Random Walk series
plt.figure(figsize=(20, 4))
autocorrelation_plot(walk, color='blue')
plt.title("Autocorrelation Plot of Random Walk", fontsize=12, fontweight='bold')
plt.xlabel("Lag", fontsize=12)
plt.ylabel("Autocorrelation", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

Plotting IoT Sensor Readings Using a 10-Point Rolling Average

This code computes the 10-point rolling average of the IOT_Sensor_Reading series and stores it in a new rolling_av column. It then plots the original data in green together with the rolling average in red, which makes the trend and the smoothing effect easy to see. The plot has a title, axis labels, and a legend for clarity.

# Calculate the 10-point rolling average and store it in a new column 'rolling_av'
df['rolling_av'] = df['IOT_Sensor_Reading'].rolling(window=10).mean()
# Now you can plot the data:
plt.figure(figsize=(20, 6))
df['IOT_Sensor_Reading'].plot(label="Original Data", color='green', linewidth=2)
df['rolling_av'].plot(label="10-Point Rolling Average", color='red', linewidth=2)
plt.title("IoT Sensor Reading with 10-Point Rolling Average", fontsize=12, fontweight='bold')
plt.xlabel("Time", fontsize=12)
plt.ylabel("Reading", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend()
plt.show()
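One detail of rolling averages is easy to miss: the first window - 1 positions do not have enough points, so they come out as NaN. The sketch below shows this on a hypothetical toy series with a window of 3 instead of 10; the names `toy`, `rolled`, and `rolled_partial` are illustrative only.

```python
import pandas as pd

# Hypothetical toy readings, not taken from the project dataset
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# With window=3, the first two positions have too few points and are NaN
rolled = toy.rolling(window=3).mean()
print(rolled)

# min_periods=1 averages whatever is available instead of returning NaN
rolled_partial = toy.rolling(window=3, min_periods=1).mean()
print(rolled_partial)
```

With a 10-point window, the same logic means the first nine entries of the rolling average are NaN, so the smoothed line starts slightly later than the original data in the plot.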

STEP 8:

Building and Fitting the MA Model

This code builds a first-order Moving Average model, MA(1), using the ARIMA class with order (p=0, d=0, q=1) and fits it to the IOT_Sensor_Reading series. It then extracts the fitted values, computes the Root Mean Squared Error (RMSE) between the actual and fitted values, and prints the model summary together with the RMSE.

# Create MA model
order = (0, 0, 1)  # (p, d, q) order of the model
model_1 = ARIMA(df['IOT_Sensor_Reading'], order=order)
# Fit the model
model_1_fit = model_1.fit()
# Get the fitted values
fitted_values = model_1_fit.fittedvalues
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], fitted_values))
# Print model summary
print(model_1_fit.summary())
print(f"RMSE: {rmse}")
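To show what the RMSE measures, here is a small worked example computed by hand with NumPy. The arrays `actual` and `fitted` are hypothetical numbers, not sensor readings, and the result should match what `np.sqrt(mean_squared_error(...))` returns for the same inputs.

```python
import numpy as np

# Hypothetical values, chosen so the arithmetic is easy to follow
actual = np.array([10.0, 12.0, 11.0, 13.0])
fitted = np.array([11.0, 12.0, 9.0, 13.0])

residuals = actual - fitted                 # [-1, 0, 2, 0]
mse = np.mean(residuals ** 2)               # (1 + 0 + 4 + 0) / 4 = 1.25
rmse_manual = np.sqrt(mse)                  # sqrt(1.25)

print(rmse_manual)  # about 1.118
```

The RMSE is in the same units as the readings, so a value of about 1.118 here means the fitted values are off by roughly one unit on average, with larger errors weighted more heavily.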

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values in green and the fitted values from the MA(1) model in red. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data', color='green')
plt.plot(fitted_values, label='Fitted Values', color='red')
plt.title('Original Data vs. Fitted Values (MA Model)')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

Building and Fitting the MA(2) Model

This code builds a second-order Moving Average model, MA(2), using the ARIMA class with order (p=0, d=0, q=2) and fits it to the IOT_Sensor_Reading series. It then extracts the fitted values, computes the Root Mean Squared Error (RMSE) between the actual and fitted values, and prints the model summary.

# Create MA model
order = (0, 0, 2)  # (p, d, q) order of the model
model_2 = ARIMA(df['IOT_Sensor_Reading'], order=order)
# Fit the model
model_2_fit = model_2.fit()
# Get the fitted values
fitted_values = model_2_fit.fittedvalues
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], fitted_values))
# Print model summary
print(model_2_fit.summary())

Displaying RMSE of the Model

This code prints the Root Mean Squared Error (RMSE) of the MA(2) model, which measures the difference between the actual IOT_Sensor_Reading values and the fitted values.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values in blue and the fitted values from the MA(2) model in pink. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Get the fitted values from the model
fitted_values = model_2_fit.fittedvalues
# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data', color='blue')
plt.plot(fitted_values, label='Fitted Values', color='#FF77B7')
plt.title('Original Data vs. Fitted Values')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

Building and Fitting the MA(5) Model

This code builds a fifth-order Moving Average model, MA(5), using the ARIMA class with order (p=0, d=0, q=5) and fits it to the IOT_Sensor_Reading series. It then extracts the fitted values, computes the Root Mean Squared Error (RMSE) between the actual and fitted values, and prints the model summary.

# Create MA model
order = (0, 0, 5)  # (p, d, q) order of the model
model_3 = ARIMA(df['IOT_Sensor_Reading'], order=order)
# Fit the model
model_3_fit = model_3.fit()
# Get the fitted values
fitted_values = model_3_fit.fittedvalues
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], fitted_values))
# Print model summary
print(model_3_fit.summary())

Displaying the MA(5) Model RMSE

This code prints the Root Mean Squared Error (RMSE) of the MA(5) model, computed from the actual IOT_Sensor_Reading values and the fitted values. The lower the RMSE, the better the model fits the data.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values and the fitted values from the MA(5) model (in red). Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data', color='#4CC9FE')
plt.plot(fitted_values, label='Fitted Values', color='red')
plt.title('Original Data vs. Fitted Values (MA Model)')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

Building and Fitting the AR(1) Model

The following code creates an Autoregressive model of order 1 using the AutoReg class and fits it to the IOT_Sensor_Reading data. It then extracts the fitted values and computes the RMSE between the actual and fitted values. Because an AR(1) model needs one past observation to make a prediction, there is no fitted value for the first reading, so the first actual value is skipped before the comparison. Finally, the code prints the model summary.

# Create AR model
order = 1  # Order of the AR model
model_4 = AutoReg(df['IOT_Sensor_Reading'], lags=order)
# Fit the model
model_4_fit = model_4.fit()
# Get the fitted values (they start at the second observation)
fitted_values = model_4_fit.fittedvalues
# Skip the first `order` readings so actual and fitted values line up
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[order:], fitted_values))
# Print model summary
print(model_4_fit.summary())
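The idea behind an AR(1) fit can be sketched with plain least squares: regress each value on the one before it. The example below is a simplified illustration on a simulated series, not the AutoReg implementation; the coefficient 0.7, the seed, and all variable names are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical AR(1) series: y[t] = 0.7 * y[t-1] + noise
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Regress y[t] on a constant and y[t-1]; y[0] has no predecessor to predict it
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]
(intercept, phi), *_ = np.linalg.lstsq(X, target, rcond=None)

# One fitted value for each of y[1], ..., y[n-1]
fitted = intercept + phi * y[:-1]
rmse_ar1 = np.sqrt(np.mean((target - fitted) ** 2))

print(len(fitted))  # 499: one fewer than the series length
print(phi)          # estimate of the lag-1 coefficient
```

The fitted values pair with `y[1:]`, the series without its first observation. The same alignment applies to higher orders: an AR(p) fit has no fitted values for the first p observations.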

Displaying the AR(1) Model RMSE

This code prints the Root Mean Squared Error (RMSE) of the AR(1) model, computed from the actual IOT_Sensor_Reading values and the fitted values. The lower the RMSE, the better the model fits the data.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values in green and the fitted values from the AR(1) model in orange. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data', color='green')
plt.plot(fitted_values, label='Fitted Values', color='#FF8000')
plt.legend()
plt.title('First-Order AR Model')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Building and Fitting the AR(2) Model

The following code creates an Autoregressive model of order 2 using the AutoReg class and fits it to the IOT_Sensor_Reading data. It then extracts the fitted values and computes the RMSE between the actual and fitted values. An AR(2) model has no fitted values for the first two readings, so those two actual values are skipped before the comparison. Finally, the code prints the model summary.

# Specify the order of the AR model
order = 2  # Order of the AR model
# Create AR model
model_5 = AutoReg(df['IOT_Sensor_Reading'], lags=[1, 2])
# Fit the model
model_5_fit = model_5.fit()
# Get the fitted values (they start at the third observation)
fitted_values = model_5_fit.fittedvalues
# Skip the first `order` readings so actual and fitted values line up
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[order:], fitted_values))
# Print model summary
print(model_5_fit.summary())

Displaying the AR(2) Model RMSE

This code prints the Root Mean Squared Error (RMSE) of the AR(2) model, computed from the actual IOT_Sensor_Reading values and the fitted values. The lower the RMSE, the better the model fits the data.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values in green and the fitted values from the AR(2) model in red. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data', color='green')
plt.plot(fitted_values, label='Fitted Values', color='red')
plt.legend()
plt.title(f'AR({order}) Model')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Building and Fitting the AR(3) Model

The following code creates an Autoregressive model of order 3 using the AutoReg class and fits it to the IOT_Sensor_Reading data. It then extracts the fitted values and computes the RMSE between the actual and fitted values. An AR(3) model has no fitted values for the first three readings, so those three actual values are skipped before the comparison. Finally, the code prints the model summary.

# Specify the order of the AR model
order = 3  # Order of the AR model
# Create AR model
model_6 = AutoReg(df['IOT_Sensor_Reading'], lags=[1, 2, 3])
# Fit the model
model_6_fit = model_6.fit()
# Get the fitted values (they start at the fourth observation)
fitted_values = model_6_fit.fittedvalues
# Skip the first `order` readings so actual and fitted values line up
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[order:], fitted_values))
# Print model summary
print(model_6_fit.summary())

Displaying the AR(3) Model RMSE

This code prints the Root Mean Squared Error (RMSE) of the AR(3) model, computed from the actual IOT_Sensor_Reading values and the fitted values. The lower the RMSE, the better the model fits the data.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values and the fitted values from the AR(3) model, using Matplotlib's default colors. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data')
plt.plot(fitted_values, label='Fitted Values')
plt.legend()
plt.title(f'AR({order}) Model')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

Building and Fitting the AR(4) Model

The following code creates an Autoregressive model of order 4 using the AutoReg class and fits it to the IOT_Sensor_Reading data. It then extracts the fitted values and computes the RMSE between the actual and fitted values. An AR(4) model has no fitted values for the first four readings, so those four actual values are skipped before the comparison. Finally, the code prints the model summary.

# Specify the order of the AR model
order = 4  # Order of the AR model
# Create AR model
model_7 = AutoReg(df['IOT_Sensor_Reading'], lags=[1, 2, 3, 4])
# Fit the model
model_7_fit = model_7.fit()
# Get the fitted values (they start at the fifth observation)
fitted_values = model_7_fit.fittedvalues
# Skip the first `order` readings so actual and fitted values line up
rmse = np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[order:], fitted_values))
# Print model summary
print(model_7_fit.summary())

Displaying the AR(4) Model RMSE

This code prints the Root Mean Squared Error (RMSE) of the AR(4) model, computed from the actual IOT_Sensor_Reading values and the fitted values. The lower the RMSE, the better the model fits the data.

print(f"RMSE: {rmse}")

Plotting Original Data Compared to Fitted Values

This code plots the original IOT_Sensor_Reading values and the fitted values from the AR(4) model. Seeing the two series on the same axes gives a visual check of how closely the model follows the data over time. The plot includes a title, axis labels, and a legend.

# Plot the original data and the fitted values
plt.figure(figsize=(20, 6))
plt.plot(df['IOT_Sensor_Reading'], label='Original Data')
plt.plot(fitted_values, label='Fitted Values')
plt.legend()
plt.title(f'AR({order}) Model')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

STEP 9:

RMSE-Based Model Comparison

The code creates a dictionary that stores the RMSE values of all fitted models (MA(1), MA(2), MA(5), AR(1), AR(2), AR(3), AR(4)). For each AR model, the first p actual values are skipped, because an AR(p) model has no fitted values for them. The code then converts the dictionary into a DataFrame and sorts it by RMSE in ascending order. The resulting table makes it easy to compare the models; lower RMSE values indicate better fits.

# Create a dictionary to store the model names and their corresponding RMSE values
model_rmse = {
    "MA(1)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], model_1_fit.fittedvalues)),
    "MA(2)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], model_2_fit.fittedvalues)),
    "MA(5)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'], model_3_fit.fittedvalues)),
    "AR(1)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[1:], model_4_fit.fittedvalues)),
    "AR(2)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[2:], model_5_fit.fittedvalues)),
    "AR(3)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[3:], model_6_fit.fittedvalues)),
    "AR(4)": np.sqrt(mean_squared_error(df['IOT_Sensor_Reading'].iloc[4:], model_7_fit.fittedvalues)),
}
# Create a DataFrame from the dictionary
model_comparison = pd.DataFrame.from_dict(model_rmse, orient='index', columns=['RMSE'])
# Sort the DataFrame by RMSE in ascending order
model_comparison = model_comparison.sort_values('RMSE')
# Print the model comparison table
print(model_comparison)

Visualization of Model RMSE Comparison

This code draws a bar chart comparing the RMSE of the MA and AR models, with one colored bar per model. The RMSE value is printed on top of each bar to make the comparison easier. Grid lines are drawn along the y-axis, and the x-axis labels are centered.

# Enhanced visualization with different colors for each bar in the RMSE comparison plot
plt.figure(figsize=(16, 6))
bar_colors = ['#5DA5DA', '#FAA43A', '#60BD68', '#F17CB0', '#B2912F', '#B276B2', '#DECF3F']
plt.bar(model_comparison.index, model_comparison['RMSE'], color=bar_colors)
plt.xlabel("Model", fontsize=12)
plt.ylabel("Root Mean Squared Error (RMSE)", fontsize=12)
plt.title("Model Comparison based on RMSE", fontsize=12, fontweight='bold')
plt.xticks(ha='center')
plt.grid(axis='y', linestyle='--', alpha=0.5)
# Adding values on top of the bars for clarity
for idx, value in enumerate(model_comparison['RMSE']):
    plt.text(idx, value, f'{value:.6f}', ha='center', va='bottom', fontsize=12)
plt.show()

Conclusion

The purpose of this project was to show how time series analysis can be used to predict IoT sensor readings. To capture temporal dependencies and to explore trends, seasonality, and noise in the sensor data, we implemented Moving Average (MA) and Autoregressive (AR) models. We used the Augmented Dickey-Fuller (ADF) test to check that the data was stationary and suitable for forecasting, and we assessed model performance with the Root Mean Squared Error (RMSE). Autocorrelation and rolling average visualizations gave us clearer insight into the behavior of the data. The approach extends beyond IoT sensor data and has applications in predictive maintenance, energy consumption forecasting, and anomaly detection. This project lays the groundwork for more accurate, data-driven predictions and more statistically informed decisions in real-world environments.

Challenges New Coders Might Face

  • Challenge: Handling missing data
    Solution: Forward filling or interpolation can bridge gaps in the data smoothly without losing the underlying trend.

  • Challenge: Stationarity Issues
    Solution: Apply a transformation such as differencing or a log transform to make the time series stationary, or use an ARIMA model, which handles differencing through its d parameter.

  • Challenge: Overfitting Model
    Solution: Check the model residuals regularly with ACF plots, and choose p, d, and q using the AIC and BIC criteria to avoid overly complex models.

  • Challenge: Model Selection
    Solution: Compare candidate models on RMSE, log-likelihood, and residual analysis, and select the one that performs best on these metrics.
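As an illustration of the missing-data point above, the following sketch fills gaps in a short hypothetical series with forward filling and with linear interpolation. The values and the names `readings`, `filled_forward`, and `filled_linear` are made up for the example and are not part of the project dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two gaps
readings = pd.Series([20.0, np.nan, 22.0, np.nan, 26.0])

# Forward fill: repeat the last known value
filled_forward = readings.ffill()

# Linear interpolation: draw a straight line between the neighbours
filled_linear = readings.interpolate()

print(filled_forward.tolist())  # [20.0, 20.0, 22.0, 22.0, 26.0]
print(filled_linear.tolist())   # [20.0, 21.0, 22.0, 24.0, 26.0]
```

Forward filling keeps the series flat across a gap, while interpolation follows the local trend; which one is more appropriate depends on how the sensor behaves between readings.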

FAQ

Question 1: What is time series analysis in the context of IoT sensor readings?
Answer: Time series analysis studies data recorded at successive points in time in order to recognize patterns, trends, seasonality, and dependencies between observations. For IoT sensor readings, it is used to predict future values from past data.

Question 2: How do I manage missing values in time series data?
Answer: Missing values in time series data can be handled by forward filling, backward filling, or interpolation. Chosen with care, these methods fill the gaps while keeping the overall shape of the series, so the analysis and predictions are affected as little as possible.

Question 3: What is the use of the Augmented Dickey-Fuller (ADF) test?
Answer: The ADF test checks whether a time series is stationary. Running it before fitting ARIMA-style models is important, because these models assume stationarity. If the series is not stationary, it is transformed by differencing or a similar technique to make it suitable for modeling.

Question 4: Why is differencing necessary in time series analysis?
Answer: Differencing converts non-stationary data into a stationary series, which most forecasting methods require.

Question 5: How do I choose the best time series model?
Answer: To select the most appropriate time series model, compare the candidates using evaluation metrics such as Root Mean Squared Error (RMSE), AIC, or BIC. Autocorrelation plots and residual analysis also help in identifying the most accurate model.

Question 6: How can I make my time series data stationary?
Answer: You can make time series data stationary by applying transformations such as differencing and logarithmic scaling. Stationarity helps models like ARIMA produce accurate forecasts.
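The two transformations mentioned in the last answer can be sketched as follows. The example uses hypothetical data: a simulated random walk for differencing and a short series that grows by 10% per step for the log transform. All names and values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical non-stationary series: a random walk started at 99
demo_series = pd.Series(99 + np.cumsum(rng.choice([-1, 1], size=500)))

# First difference: y[t] - y[t-1]; the first entry has no predecessor and is dropped
diffed = demo_series.diff().dropna()

# Log transform followed by differencing, for a strictly positive series
positive = pd.Series([100.0, 110.0, 121.0, 133.1])
log_diffed = np.log(positive).diff().dropna()

print(diffed.abs().unique())         # differencing recovers the +/-1 steps
print(log_diffed.round(4).tolist())  # each entry is approximately log(1.1)
```

Differencing removes the wandering level of the random walk and leaves only the individual steps, while the log transform turns constant percentage growth into a constant difference.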
