### 1. Introduction

### 2. Prediction Model

*W<sub>h</sub>* denotes the weight matrix used to update the hidden state, and *b<sub>h</sub>* denotes the bias of the hidden state. tanh is the hyperbolic tangent activation function, which limits its output to the range between −1 and 1. An RNN calculates the hidden state at the current time step, *h<sub>t</sub>*, by combining the current input *x<sub>t</sub>* with the hidden state of the previous time step, *h<sub>t−1</sub>*. *h<sub>t</sub>* is then passed on to the next time step to maintain continuous information. However, RNNs suffer from the vanishing gradient and exploding gradient problems, depending on the values of the weights.

To solve these problems, Hochreiter and Schmidhuber (1997) introduced a gate mechanism and proposed the LSTM, which learns data with long-term dependencies more effectively than RNNs. The original LSTM, however, had several limitations, including the absence of a forget gate, structural complexity, and high computational cost. Gers et al. (2000) proposed the improved version of the LSTM that has been used to date. Fig. 1(b) shows the structure of an LSTM, which consists of a forget gate, an input gate, an output gate, and a cell-state update; this structure is an enhancement over that of RNNs. The forget gate determines whether to preserve or delete information from the cell state of the previous time step, and its output is calculated by combining the current input with the hidden state of the previous time step. The input gate determines how much the cell state is updated at the current time step, and it consists of the output of the input gate and a candidate cell state. The cell state adds the current input to the past information to represent the overall state at the current time. The output gate determines the hidden state based on the cell state at the current time, and it is composed of two main elements: the output-gate activation and the determination of the hidden state. Finally, the hidden state at the current time is determined by an element-wise multiplication of the output-gate activation with the cell state passed through tanh. Because LSTMs have three gates and a cell state, they have the drawback of a complex structure, which increases training time and computational complexity.
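The gate computations described above can be sketched as a single LSTM forward step. This is a minimal illustration with assumed weight names (`W`, `U`, `b` keyed by gate), not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate:
    'f' forget, 'i' input, 'o' output, 'g' candidate cell state."""
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])      # candidate cell state
    c_t = f * c_prev + i * g       # keep part of the old cell state, add new content
    h_t = o * np.tanh(c_t)         # hidden state from output gate and cell state
    return h_t, c_t

# toy dimensions: n_input = 3, n_hidden = 2 (illustrative only)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((2, 3)) for k in 'fiog'}
U = {k: rng.standard_normal((2, 2)) for k in 'fiog'}
b = {k: np.zeros(2) for k in 'fiog'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(2), np.zeros(2), W, U, b)
```

Because the output gate is a sigmoid and the cell state passes through tanh, every component of the resulting hidden state lies strictly between −1 and 1.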
Furthermore, there is a risk of overfitting when the amount of data is small. To address these issues of LSTMs, Chung et al. (2014) proposed the GRU, which has only a reset gate, an update gate, and a hidden state. Fig. 1(c) shows the structure of a GRU, which consists of two gates and a hidden state. The reset gate determines whether the information from the previous hidden state is reflected in the calculation of the current hidden state, whereas the update gate determines how much of the previous hidden state is retained and how much is replaced by the new candidate hidden state. The candidate hidden state is determined by combining the current input with the previous hidden state scaled by the reset gate. Finally, the GRU determines the final hidden state by interpolating between the previous hidden state and the candidate hidden state according to the update gate: the previous hidden state is multiplied element-wise by the complement of the update gate, the candidate hidden state is multiplied element-wise by the update gate itself, and the two results are added.
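The GRU update just described can likewise be sketched as one forward step, again with assumed weight names rather than the paper's own code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts keyed by gate:
    'r' reset, 'z' update, 'h' candidate hidden state."""
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])              # reset gate
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])              # update gate
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    h_t = (1.0 - z) * h_prev + z * h_cand   # interpolate via the update gate
    return h_t

# toy dimensions: n_input = 3, n_hidden = 2 (illustrative only)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((2, 3)) for k in 'rzh'}
U = {k: rng.standard_normal((2, 2)) for k in 'rzh'}
b = {k: np.zeros(2) for k in 'rzh'}
h = gru_step(rng.standard_normal(3), np.zeros(2), W, U, b)
```

Note that the reset gate scales the previous hidden state before the candidate is computed, whereas the update gate blends the old and candidate states afterwards.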

*n<sub>input</sub>* denotes the number of input units that are fed into the model, and *n<sub>hidden</sub>* denotes the number of hidden-state units of the model. The number of learning parameters for each model is calculated by combining *n<sub>input</sub>* and *n<sub>hidden</sub>*, and this number varies according to the structure of the model. Comparing Eq. (2) with Eq. (3), it is observed that the GRU, a simplified version of the LSTM, has 25% fewer learning parameters than the LSTM and is computationally more efficient.
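Since Eqs. (2) and (3) are not reproduced here, the comparison can be sketched from the standard gate-based parameter counts (assuming the common formulation with one input-weight matrix, one recurrent-weight matrix, and one bias vector per gate, and no peephole connections); the sizes below are illustrative:

```python
def lstm_params(n_input, n_hidden):
    # 4 blocks (forget, input, output gates + candidate cell state),
    # each with input weights, recurrent weights, and a bias
    return 4 * (n_hidden * n_input + n_hidden * n_hidden + n_hidden)

def gru_params(n_input, n_hidden):
    # 3 blocks (reset and update gates + candidate hidden state)
    return 3 * (n_hidden * n_input + n_hidden * n_hidden + n_hidden)

n_input, n_hidden = 8, 64  # illustrative sizes
lstm_n = lstm_params(n_input, n_hidden)
gru_n = gru_params(n_input, n_hidden)
reduction = 1 - gru_n / lstm_n  # 3/4 of the LSTM count, i.e. 25% fewer
```

Because both counts share the same per-block factor, the 25% reduction holds regardless of the chosen *n<sub>input</sub>* and *n<sub>hidden</sub>*.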