### 1. Introduction

### 2. Collection of Metocean Data and Statistical Analysis

### 3. Machine Learning (ML) Methodology

$N$, $\hat{Y}_i$, and $Y_i$ represent the total number of data points in the dataset, the predicted value, and the measured value, respectively.
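The equation these symbols belong to is not reproduced here. Assuming it is the mean absolute error (MAE) used to compare the models in the later sections, a minimal sketch could look as follows. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not taken from the study.

```python
def mean_absolute_error(predicted, measured):
    """Assumed form: MAE = (1/N) * sum(|Y_hat_i - Y_i|) over all N data points."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if not predicted:
        raise ValueError("at least one data point is required")
    n = len(predicted)  # N: total number of data points
    return sum(abs(y_hat - y) for y_hat, y in zip(predicted, measured)) / n


if __name__ == "__main__":
    # Made-up wave heights in metres, for illustration only.
    predicted = [1.0, 2.0, 4.0]
    measured = [1.5, 2.0, 3.0]
    print(mean_absolute_error(predicted, measured))
```

Because the MAE is expressed in the same unit as the target, a value here would read directly as an average error in metres of significant wave height.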

### 3.1 Input Layer Selection

$p_i$ and $q_i$ represent individual values of the metocean data for calculating the correlation coefficient, while $\bar{p}$ and $\bar{q}$ are the average values of the selected metocean data. Here, $n$ denotes the number of metocean data points.
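The symbols above match the usual Pearson correlation coefficient. Assuming that form, the sketch below computes $r$ for two series and ranks candidate inputs by $|r|$ against a target. The names `pearson_r` and `rank_by_abs_correlation`, and the sample series, are hypothetical.

```python
import math


def pearson_r(p, q):
    """Assumed Pearson form:
    r = sum((p_i - p_bar)(q_i - q_bar)) / sqrt(sum((p_i - p_bar)^2) * sum((q_i - q_bar)^2))
    """
    n = len(p)  # n: number of data points
    if n != len(q) or n < 2:
        raise ValueError("series must have equal length and at least two points")
    p_bar = sum(p) / n
    q_bar = sum(q) / n
    cov = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))
    ss_p = sum((pi - p_bar) ** 2 for pi in p)
    ss_q = sum((qi - q_bar) ** 2 for qi in q)
    if ss_p == 0 or ss_q == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_p * ss_q)


def rank_by_abs_correlation(candidates, target):
    """Return (name, r) pairs sorted by |r| against the target, highest first."""
    scores = [(name, pearson_r(values, target)) for name, values in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Made-up series, for illustration only.
    target = [1.0, 2.0, 3.0, 4.0]
    candidates = {"var_a": [1.0, 3.0, 2.0, 4.0], "var_b": [1.0, 2.0, 3.0, 4.0]}
    for name, r in rank_by_abs_correlation(candidates, target):
        print(name, round(r, 3))
```

Ranking by the absolute value means that a strongly negative correlation counts as much as a strongly positive one.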

The significant wave height ($H_s$), the target variable, was most strongly correlated with the maximum wave height ($H_{\max}$), with a correlation coefficient of 0.97, followed by the significant wave period ($T_s$) and the maximum wave period ($T_{\max}$). However, because these wave data share the same characteristics as the significant wave height, they were not used as input data in this study. Instead, the remaining environmental variables were adopted to devise an ML model for predicting the significant wave height. Excluding the wave data, the variables ranked by the absolute value of the correlation coefficient, from highest to lowest, are wind speed, wind direction, current direction, and water temperature. Based on this result, the input variable conditions were divided into three categories (Table 2).

The current and wind directions are discontinuous at the 0°/360° boundary. To resolve this, each direction was converted from the polar coordinate system to Cartesian coordinates (x, y) using Eq. (6) before being adopted as an input variable (Table 2). The input variables were then standardized using the feature scaling method (Eq. (7)). Giving the input variables comparable distributions allows the gradient descent method to be applied effectively.

In Eq. (6), $\theta$ denotes the angle, while $X$ in Eq. (7) represents the input variable. In addition, $\mu$ and $\sigma$ represent the mean and standard deviation, respectively.

The input variables were grouped into categories of 3, 5, and 10 variables. The category with three input variables comprised the wind speed and the wind direction, the latter expressed as its (x, y) components. This choice reflects the finding that wind speed and wind direction are the most important factors for predicting waves in the FNN (Mohjoobi et al., 2008). Estimating the wave height from wind data alone is therefore considered a reasonable approach. The categories with 5 and 10 input variables were defined to identify the effect of the correlation coefficient between the output and input variables.
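The direction conversion (Eq. (6)) and the standardization (Eq. (7)) described above can be sketched as follows. Eqs. (6) and (7) are not reproduced here, so this sketch assumes $x = \cos\theta$, $y = \sin\theta$ and $X' = (X - \mu)/\sigma$ with the population standard deviation. The axis convention and the function names `direction_to_xy` and `standardize` are assumptions.

```python
import math


def direction_to_xy(theta_deg):
    """Map a direction in degrees to a point on the unit circle.

    Assumed convention: x = cos(theta), y = sin(theta).
    """
    theta = math.radians(theta_deg)
    return math.cos(theta), math.sin(theta)


def standardize(values):
    """Assumed form of feature scaling: (X - mu) / sigma, population sigma."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # 359 degrees and 1 degree are far apart as raw numbers but close as (x, y).
    print(direction_to_xy(359.0), direction_to_xy(1.0))
    # Made-up wind speeds in m/s, for illustration only.
    print(standardize([3.0, 5.0, 7.0]))
```

With this encoding, directions just below 360° and just above 0° map to neighbouring points, which removes the jump in the raw angle, and each direction contributes two input features instead of one.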

### 3.2 Hidden Layer Selection

### 4. Results of Significant Wave Height Predictions Using the ML Model

$r > 0.8$) with the significant wave height. In the FNN (W48), the MAE tends to increase as the number of input variables increases. However, for the LSTM (W48), the cases with 5 and 10 input variables yield the same MAE.

### 5. Conclusion

$r < 0.1$) were adopted, the MAE tended to increase. In the comparison of the FNN (W1) and the FNN (W48), which are both FNN models, the FNN (W48) exhibited a smaller MAE for the test set. In the comparison of the FNN (W48) and the LSTM (W48), two models with the same window size, the LSTM (W48) exhibited a slightly smaller MAE for the test set. However, when the MAE was compared by SS grade, the FNN (W48) with three input variables produced better results for grades SS3 through SS7, but not for SS2. In addition, the FNN (W48) was twice as fast as the LSTM (W48) in terms of computation time. Therefore, considering both the accuracy of the significant wave height predictions and the computation speed, the FNN (W48) was judged to be the more suitable ML model for predicting significant wave heights in the Korea Strait.

When predicting significant wave heights, selecting input variables based on correlation coefficients can produce good results in machine learning. In addition, the results indicate that well-performing prediction models can be built using only wind data (e.g., wind speed and wind direction). However, the prediction accuracy was slightly lower for high waves with a significant wave height of 4 m or higher. This lower performance is likely because high-wave data, which arise mainly from typhoons, are scarce. To address this problem, it is necessary to collect more high-wave data when typhoons occur or to develop an ML model that makes efficient use of limited high-wave data.

Finally, an ML model that predicts significant wave heights using only wind data can be applied in practice during sea trials, where the engine power is adjusted to account for the added resistance of the ship caused by waves according to the SS.

In the future, we plan to continue this research and improve the accuracy of the model for predicting significant wave heights in the Korea Strait by adopting, as input variables, data from other oceanographic buoys near the Korea Strait oceanographic buoy or hindcast data.