Apple Stock price prediction using a stacked LSTM model

Aftab Gazali
5 min read · Apr 29, 2021


A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. RNNs are derived from feedforward neural networks.

Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can not only process single data points, but also entire sequences of data.

Please refer to the links below for an in-depth explanation of LSTMs.


The dataset used here is the Apple stock price, which you can download from Yahoo Finance; there you can change the time period of the dataset to suit your purpose. The model is trained on stock data from 3rd January 2017 to 31st December 2020. Our plan of attack is to take the stock price of 3 consecutive days and use it to predict the stock price on the 4th day.


There are 1006 rows in our dataset. We have divided it into a training sample (the first 65%) and a testing sample (the remaining 35%). The plot below shows the training sample in blue and the testing sample in green.

import matplotlib.pyplot as plt

plt.figure(figsize=(20, 8))
plt.plot(df.Close[:654], color='blue', label='Training data')
plt.plot(df.Close[654:1006], color='g', label='Testing data')
plt.xlabel('time')
plt.ylabel('Stock Price')
plt.title('Training and Testing Stock price')
plt.legend()

Data Preprocessing

1. Feature Scaling

Feature scaling is an important step of data preprocessing. If it is not performed, a machine learning algorithm tends to treat features with larger numeric values as more important and those with smaller values as less important, regardless of the unit of measurement. The scaler used here is the traditional MinMaxScaler, which transforms each feature by scaling it to a given range, here [0, 1].

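import numpy as np
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))
# Scale the closing prices into [0, 1]. The result is named df1 because the
# later steps refer to it by that name (assumed to be the scaled Close column).
df1 = sc.fit_transform(np.array(df.Close).reshape(-1, 1))
df1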
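To make the scaling step concrete, here is a minimal NumPy sketch of what min-max scaling to [0, 1] computes: each value x becomes (x - min) / (max - min). The prices are made-up illustration values, not rows from the Apple dataset, and the helper names `min_max_scale` and `min_max_invert` are hypothetical.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D array to [0, 1]; also return (min, max) so it can be undone."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), (lo, hi)

def min_max_invert(scaled, bounds):
    """Map scaled values back to the original units."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo

prices = np.array([100.0, 110.0, 125.0, 150.0])  # illustrative values only
scaled, bounds = min_max_scale(prices)
restored = min_max_invert(scaled, bounds)

print(scaled)    # smallest price maps to 0, largest to 1
print(restored)  # back in the original price units
```

The inverse step matters later: the model predicts in the scaled range, so predictions have to be mapped back before they can be read as prices. With scikit-learn, `MinMaxScaler.inverse_transform` plays that role.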

2. Splitting the Data

We split our dataset into 65% training data and 35% test data.

Since our dataset contains 1006 rows, this gives 654 rows for the training sample and the remaining 352 rows for the testing sample.

## Splitting the dataset into train and test sets
training_size = int(len(df1) * 0.65)
test_size = len(df1) - training_size
train_data, test_data = df1[0:training_size, :], df1[training_size:len(df1), :1]
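One property of this split is easy to miss: it is chronological, so the test set always lies after the training set in time and nothing is shuffled. A small sketch on a toy series of 20 values (the helper name `chronological_split` is hypothetical) shows the idea:

```python
import numpy as np

def chronological_split(values, train_fraction=0.65):
    """Split a time-ordered array into an earlier train part and a later test part."""
    split_at = int(len(values) * train_fraction)  # int() truncates toward zero
    return values[:split_at], values[split_at:]

# Toy series: the value doubles as the time index, shaped (n, 1) like the scaled data
series = np.arange(20).reshape(-1, 1)
train, test = chronological_split(series)

print(len(train), len(test))     # sizes of the two parts
print(train[-1, 0], test[0, 0])  # last training step comes right before the first test step
```

Shuffling before splitting would leak future prices into the training set, which is why a plain slice is used here.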

3. Preparing the Data for our model

Since we want to predict the stock price at a future time, we’ll predict the stock price at time t+1 from the stock prices up to time t. How far back the LSTM looks is set by the time step: the number of past steps the model uses as input for each prediction.

The time step can be any number, for example 3, 4, 50, or 60.

## We are going to take the previous 3 days' stock prices and predict
## the price on the 4th day, hence time_step=3
def create_dataset(dataset, time_step=3):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), 0]
        dataX.append(a)
        dataY.append(dataset[i + time_step, 0])
    return np.array(dataX), np.array(dataY)

time_step = 3  # 3 days of history per sample
X_train, y_train = create_dataset(train_data, time_step)
X_test, y_test = create_dataset(test_data, time_step)
print(X_train[0])
These are the (scaled) prices of the first 3 days.
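To see what the windowing does, here is the same function applied to a toy series of six values instead of the stock data. The numbers are illustrative only.

```python
import numpy as np

def create_dataset(dataset, time_step=3):
    """Turn a (n, 1) series into windows of `time_step` values and the value that follows each."""
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        dataX.append(dataset[i:(i + time_step), 0])
        dataY.append(dataset[i + time_step, 0])
    return np.array(dataX), np.array(dataY)

toy = np.array([10, 20, 30, 40, 50, 60], dtype=float).reshape(-1, 1)
X, y = create_dataset(toy, time_step=3)

print(X)  # each row is 3 consecutive values
print(y)  # the value that follows each row
```

Here X holds the windows [10, 20, 30] and [20, 30, 40], and y holds 40 and 50. Because of the `- 1` in the loop bound, the last possible window ([30, 40, 50] with target 60) is not generated, so a series of n rows yields n - time_step - 1 samples.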

In an LSTM network, the input to each layer needs to contain the following information:

1. The number of observations

2. The time steps

3. The numerical features

Therefore, compared with a classical network, we need to add a temporal dimension.

## Reshape input to be [samples, time steps, features], which is required for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
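As a small illustration of the reshape, the sketch below takes two toy windows of 3 time steps each and adds the trailing feature axis. The values are made up; only the shapes matter.

```python
import numpy as np

# Two samples, three time steps each, as produced by the windowing step
X = np.array([[0.1, 0.2, 0.3],
              [0.2, 0.3, 0.4]])

# Add a trailing axis of size 1: one numerical feature (the price) per time step
X_3d = X.reshape(X.shape[0], X.shape[1], 1)

print(X.shape)     # (samples, time steps)
print(X_3d.shape)  # (samples, time steps, features)
```

The data itself is unchanged: `X_3d[i, t, 0]` is the same number as `X[i, t]`. With more than one feature per day (for example open, high, low, close), the last axis would be larger than 1.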

Our model therefore has 650 training samples and 349 testing samples.

Building the Model

A Stacked LSTM architecture can be defined as an LSTM model comprised of multiple LSTM layers. An LSTM layer above provides a sequence output rather than a single value output to the LSTM layer below. Specifically, one output per input time step, rather than one output time step for all input time steps.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

After hyperparameter tuning, we build our model with 4 LSTM layers containing 50, 100, 150, and 200 units respectively. The first three layers set return_sequences to True so that each one passes its full output sequence (one output per time step) to the next LSTM layer, while the last layer returns only its final output. The model also contains dropout to reduce overfitting. We use the Adam optimizer and a mean_squared_error loss function.

## First layer
model = Sequential()
model.add(LSTM(units=50, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
## Second layer
model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.4))
## Third layer
model.add(LSTM(units=150, return_sequences=True))
model.add(Dropout(0.5))
## Fourth layer
model.add(LSTM(units=200, return_sequences=False))
model.add(Dropout(0.6))
## Output layer
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
model.fit(X_train, y_train, epochs=100, batch_size=64, validation_data=(X_test, y_test), verbose=1)
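As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes standard LSTM layers with bias terms and default settings, where each of the 4 gates has input weights, recurrent weights, and a bias; the helper names are hypothetical. Dropout layers add no parameters.

```python
def lstm_params(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weights plus biases."""
    return input_dim * units + units

layer_units = [50, 100, 150, 200]  # the four stacked LSTM layers
input_dim = 1                      # one feature (the scaled price) per time step

counts = []
for units in layer_units:
    counts.append(lstm_params(input_dim, units))
    input_dim = units              # the next layer consumes this layer's outputs

counts.append(dense_params(input_dim, 1))  # final Dense(1) output layer
total = sum(counts)

print(counts)
print(total)
```

If those assumptions hold, the per-layer counts should line up with the parameter column printed by `model.summary()`. Note that the time step does not appear in the formula: an LSTM reuses the same weights at every time step.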

Model Evaluation

We can evaluate our model’s accuracy by plotting its predicted price against the actual price.

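# Predictions on the test inputs (assumed: the original snippet does not
# show how test_predict is computed)
test_predict = model.predict(X_test)

plt.figure(figsize=(20, 6))
plt.plot(y_test, color='r', label='Real Price')
plt.plot(test_predict, color='g', label='Predicted Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()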

The red plot is the actual price for those 349 rows and the green plot is the predicted price for the same rows. We can see that the predicted price follows the actual price closely and matches it at some points. Our model also picks up sharp increases and decreases in the price. We can tune the model further by changing the hyperparameters, but that’s another story for another time.
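A plot gives a visual impression, and a single number can complement it. One common choice is the root mean squared error (RMSE) between actual and predicted values. The sketch below is not part of the original workflow: the arrays are made-up values standing in for the scaled test targets and the model's predictions, and `rmse` is a hypothetical helper.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two sequences of equal length."""
    # Flatten both inputs: Keras predictions typically have shape (n, 1) while the
    # targets have shape (n,), and subtracting those directly would broadcast to (n, n).
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

y_true = np.array([0.50, 0.55, 0.60, 0.58])          # illustrative scaled targets
y_pred = np.array([[0.52], [0.54], [0.57], [0.60]])  # illustrative predictions, shape (n, 1)

error = rmse(y_true, y_pred)
print(error)
```

An RMSE computed on scaled values is in scaled units. To express it in dollars, both arrays would first have to be mapped back to prices with the scaler's inverse transform.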