COVID-19: Using AI to Predict Stock Market Movement

Dec 1, 2020

Written July 2020

With the spread of COVID-19, global stock markets have declined significantly. U.S. indices including the S&P 500, Dow Jones, and NASDAQ have dropped close to 30%, bringing them back to levels last seen in 2017.

GSPC, DJI, IXIC Index Values (Yahoo Finance, 2020/03/31)

From prior crashes, we know that large drops in the stock market result in excellent investment opportunities. But how do we know the right time to take a shot and buy some stocks 🤔?

This past week, I challenged myself to train an AI to predict the S&P 500’s movement based on data from past market crashes. If you are interested in the process/programming behind the AI, I will be breaking it down in the next section. Otherwise, you can skip to the end to see the final result!

Programming the AI (Python)

GitHub: https://github.com/Vedant-Gupta523/corona-ai

The Data Set

The data I used consists of the S&P 500 index value for the following crashes:

The Wall Street Crash (1929)
The 73–74 Market Crash (1973)
Black Monday (1987)
The Dot Com Bubble (2000)
The Financial Crisis (2007)
COVID-19 (2020)

For each of the above, I used values starting from 100 days before the pre-crash peak up until the index regained its initial value (the current value in the case of COVID-19).

Objective: Train a Support Vector Regressor (SVR) to predict the next index value given the 30 prior values.

Pre-processing The Data

I start by importing all of the libraries I will use. I then create a Pandas DataFrame out of my data set.

# Importing libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
import matplotlib.pyplot as plt

# Get data from csv file
dataset = pd.read_csv("data.csv")

A good machine learning practice is to scale the data we train on. This is because normalization helps weigh all of our features equally. In some cases, it can also help speed up the calculations our model performs!
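To make the scaling step concrete, here is a rough sketch of the arithmetic behind min-max scaling to the range 0 to 1: subtract the minimum, then divide by the range. The helper names `min_max_scale` and `min_max_unscale` and the sample prices are made up for illustration, and the sketch assumes the series is not constant (otherwise the range is zero).

```python
import numpy as np

def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max used."""
    values = np.asarray(values, dtype=float)
    low, high = values.min(), values.max()
    # Assumes high > low; a constant series would divide by zero
    return (values - low) / (high - low), low, high

def min_max_unscale(scaled, low, high):
    """Map scaled values back to the original units."""
    return np.asarray(scaled, dtype=float) * (high - low) + low

# Hypothetical index values, not taken from the data set
prices = [2000.0, 2500.0, 3000.0]
scaled, low, high = min_max_scale(prices)
restored = min_max_unscale(scaled, low, high)
```

Keeping the minimum and maximum around is what lets us convert scaled predictions back into index values later.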

With the data from the DataFrame, I create two lists: a scaled data set for training our model and an unscaled data set for graphing/visualization. Additionally, I break the scaled data set into our training data (all crashes excluding COVID-19) and our test data (COVID-19).

# Create scaled/unscaled datasets, divide into train and test data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_dataset = []
unscaled_dataset = []
for crash in list(dataset)[1:]:  # skip the first column
    # Crashes differ in length, so drop the NaN padding in each column
    values = dataset[crash].dropna().values.reshape(-1, 1)
    # The scaler is refit for every crash, so after the loop it holds
    # the min/max of the last column (COVID-19)
    scaled_dataset.append(scaler.fit_transform(values))
    unscaled_dataset.append(values)
train_data = scaled_dataset[:-1]
test_data = scaled_dataset[-1]
unscaled_test_data = unscaled_dataset[-1]

Next we separate our independent variable (prior 30 index values) from our dependent variable (next day’s index value). This is done for both our training set and our test set.

# Number of prior values (can be changed)
batch_size = 30

# Split data
x_train = []
x_test = []
y_train = []
y_unscaled_test = []
y_scaled_test = []

for crash in train_data:
    for i in range(batch_size, len(crash)):
        x_train.append(crash[i-batch_size:i, 0])
        y_train.append(crash[i, 0])

for i in range(batch_size, len(test_data)):
    x_test.append(test_data[i-batch_size:i, 0])
    y_unscaled_test.append(unscaled_test_data[i, 0])
    y_scaled_test.append(test_data[i, 0])
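If the windowing is hard to picture, the same idea can be sketched on a tiny series. The helper `make_windows` below is a hypothetical name of my own, and the numbers are toy values rather than index data; with a window of 3, each input is three consecutive values and the target is the value that follows.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (inputs, targets) pairs.

    Each input holds `window` consecutive values; its target is the
    value that comes immediately after them.
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for i in range(window, len(series)):
        inputs.append(series[i - window:i])
        targets.append(series[i])
    return np.array(inputs), np.array(targets)

# Toy series: windows [1, 2, 3] and [2, 3, 4] with targets 4 and 5
x, y = make_windows([1, 2, 3, 4, 5], window=3)
```

A series of length n produces n minus `window` training pairs, which is why each crash contributes fewer samples than it has data points.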

Finally, we train our SVR on the training data and make predictions 😎.

# Fitting SVR to the dataset
regressor = SVR(kernel="rbf")
regressor.fit(x_train, y_train)

# Making predictions on test data
y_pred = []
for test_case in x_test:
    y_pred.append(regressor.predict([test_case]))
# The scaler was last fit on the COVID-19 column, so this restores index units
y_pred = scaler.inverse_transform(y_pred)

# Making predictions beyond known data
# Assumed value: future_prediction_size is not defined in the original snippet
future_prediction_size = 30
x_future_test = x_test[-1][1:]
x_future_test = [np.append(x_future_test, y_scaled_test[-1])]
future_preds = []
for i in range(future_prediction_size):
    # Predict one step, then slide the window forward over the prediction
    future_preds.append(regressor.predict([x_future_test[i]]))
    x_future_test.append(np.append(x_future_test[i][1:], future_preds[i]))
future_preds = scaler.inverse_transform(future_preds)
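The "predictions beyond known data" loop is a form of recursive forecasting: predict one step, append the prediction to the window, drop the oldest value, and repeat. Here is a minimal sketch of that pattern on its own. The names `recursive_forecast` and `mean_predictor` are hypothetical, and the mean predictor is only a stand-in so the example runs without a trained model.

```python
import numpy as np

def recursive_forecast(predict_one, last_window, steps):
    """Forecast `steps` values by feeding each prediction back in."""
    window = np.asarray(last_window, dtype=float)
    forecasts = []
    for _ in range(steps):
        next_value = float(predict_one(window))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction
        window = np.append(window[1:], next_value)
    return forecasts

def mean_predictor(window):
    """Stand-in model: predicts the mean of the window."""
    return window.mean()

preds = recursive_forecast(mean_predictor, [1.0, 2.0, 3.0], steps=2)
```

With a fitted SVR, the stand-in could presumably be swapped for something like `lambda w: regressor.predict([w])[0]`. Note that from the second step onward, the model is already relying on its own output.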

Bonus: Graph the results

# Graphing predictions
plt.title("COVID-19 Crash Analysis")
plt.xlabel("Days from Crash")
plt.ylabel("S&P 500")
plt.plot([x for x in range(-99, len(y_pred) - 99)], y_pred, color = "orange")
plt.plot([x for x in range(-99, len(y_pred) - 99)], y_unscaled_test, linewidth=1)
plt.plot([x for x in range(len(y_pred) - 99, len(y_pred) - 99 + future_prediction_size)] , future_preds, color = "red")
plt.show()

The Results

Blue — Actual values, Orange — Predicted values based on prior 30 actual values, Red — Predicted values based on prior 30 predicted values

After running the code, the AI outputs the above graph with a one-month prediction. It suggests that the S&P 500 index will rise for the next two weeks and then decline over the following two weeks.

Disclaimer: I am not suggesting that these predictions will be accurate or even close. Making predictions with AI has both pros and cons:

Pros

AI is excellent at finding mathematical correlations in previous events and applying them to predict a new situation.

Cons

Stock prices are affected by countless factors which are next to impossible to predict with AI.
The further into the future we try to predict, the more inaccurate our predictions become. This is because predictions that are farther out start to be based on previously predicted values.
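One way to put a number on how far off the predictions are is to compare them against a naive baseline that simply repeats the previous day's value. The sketch below does this with mean absolute error. All numbers are made up for illustration and are not results from my model, and the helper name is my own.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Made-up index values for four consecutive days
series = [100.0, 102.0, 101.0, 105.0]
targets = series[1:]            # the values we try to predict
naive_preds = series[:-1]       # baseline: tomorrow equals today

# Hypothetical model predictions for the same three days
model_preds = [101.0, 102.0, 104.0]

naive_mae = mean_absolute_error(targets, naive_preds)
model_mae = mean_absolute_error(targets, model_preds)
```

If a model cannot beat the naive baseline on held-out data, its forecasts probably should not be trusted for anything further out.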

Conclusion

Despite staying at home, this past week has been an amazing learning experience. I look forward to taking up more programming projects while I have all this extra time on my hands 🙌.

If you have any questions or you just want to say hello, feel free to email me at vedantgupta523@gmail.com 😃.