The observation points in the East Sea include four locations besides Gyeongpodae Beach, and three locations excluding Ulleungdo are beaches. In the South Sea, there are five locations besides Haeundae Beach. Moreover, in both the Jeju Coastal Sea and the West Sea, observation points are distributed across three locations. Data acquisition intervals varied across observation points (
Table 1). Using a water depth of 30 m as the threshold, observation locations in deeper water were classified as offshore, whereas those in shallower water were classified as nearshore (
Froehlich et al., 2017). At the beaches, data were measured at intervals of 5 min, except at Haeundae Beach, where the interval was 1 min. At observation points near islands, data were measured at intervals of 10 min. In the eastern part of the South Sea and the southern sea of Jeju, which are offshore seas, data were measured at intervals of 30 min. The data consist of measurements of the current speed (CS), current direction (CD), water temperature (WT), salinity, significant wave height (SWH), significant wave period (SWP), maximum wave height (MWH), maximum wave period (MWP), wave direction (WD), wind speed (WS), wind direction (WD), air temperature (AT), and air pressure (AP), taken from 2012 to 2021. To train and evaluate the model for predicting significant wave height and significant wave period, the holdout method was used to divide the dataset into training, validation, and test sets. For the East Sea, the South Sea, and the Jeju Coastal Sea, the training set covered the eight years from 2012 to 2019. For the West Sea, no data were provided for 2012 to 2014, so its training data consisted of measurements taken from 2015 onward. The validation set consisted of the data collected in 2020 and was used to evaluate the trained model and determine the optimized hyperparameters. The test set consisted of the data collected in 2021 and was used to perform the final evaluation of the model. To preserve temporal continuity and make full use of the data, training samples were generated by sliding a window forward one day at a time, for various window sizes. The wind direction and current direction were recorded as angles in polar coordinates. To remove the discontinuity between 0° and 360°, the angles were converted into Cartesian coordinates (x, y), which were used as input variables (
Park et al., 2021). The wave direction had a high proportion of missing values in all sea areas, with 37% in the East Sea, 23% in Jeju, 60% in the South Sea, and 88% in the West Sea. Hence, it was excluded from the input variables. Furthermore, the salinity data were not available for the years corresponding to the validation and test sets; hence, salinity was also excluded from the input variables.
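The preprocessing steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the record layout (timestamp, features), and the choice of x = cos θ and y = sin θ are assumptions made for illustration only, and the window length is left as a free parameter because the text states only that various window sizes were used.

```python
import math
from datetime import datetime


def direction_to_xy(deg):
    """Map a direction in degrees onto the unit circle.

    Assumption: x = cos(theta), y = sin(theta). The text does not state
    which axis convention was used.
    """
    rad = math.radians(deg)
    return math.cos(rad), math.sin(rad)


def split_by_year(records, val_year=2020, test_year=2021):
    """Holdout split by calendar year.

    `records` is an iterable of (datetime, features) pairs (hypothetical
    layout). Years before `val_year` go to the training set.
    """
    train, val, test = [], [], []
    for timestamp, features in records:
        if timestamp.year == test_year:
            test.append((timestamp, features))
        elif timestamp.year == val_year:
            val.append((timestamp, features))
        elif timestamp.year < val_year:
            train.append((timestamp, features))
    return train, val, test


def sliding_windows(series, window, stride):
    """Return every full window of `window` samples, advancing by `stride`.

    For a station sampled every 5 min, a one-day stride is 288 samples.
    """
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, stride)]
```

With this encoding, 359° and 1° map to neighbouring points on the unit circle, whereas the raw angles differ by 358, which is the discontinuity the conversion is meant to remove.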
Fig. 3 shows the analysis of the input data from Saengildo in the South Sea for 2021. For water temperature and air temperature, values below −50 °C or above 50 °C were deemed unrealistic and were removed as outliers. In addition, data from periods in which erroneous values (“0,” “NaN,” “-,” “99.99”) were recorded owing to problems with the observation equipment were removed. The water temperature values ranged between 7 °C and 30 °C, and the air temperature values ranged between −10 °C and 30 °C (
Figs. 3(a) and (b)).
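A minimal sketch of this cleaning rule is given below. It assumes that raw fields arrive as strings and that a record is discarded when any of its fields is invalid; the names `SENTINELS`, `parse_value`, `clean_temperature`, and `drop_invalid_rows` are hypothetical and do not come from the original processing code.

```python
import math

# Placeholder strings listed in the text as produced by equipment problems.
SENTINELS = {"0", "NaN", "-", "99.99"}


def parse_value(raw):
    """Return the field as a float, or None for a placeholder or bad field."""
    text = str(raw).strip()
    if text == "" or text in SENTINELS:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    return None if math.isnan(value) else value


def clean_temperature(raw, low=-50.0, high=50.0):
    """Keep a temperature only if it lies within [low, high] degrees Celsius."""
    value = parse_value(raw)
    if value is None or value < low or value > high:
        return None
    return value


def drop_invalid_rows(rows):
    """Keep only rows in which every raw field parses to a valid number.

    Assumption: one invalid field invalidates the whole record.
    """
    kept = []
    for row in rows:
        parsed = [parse_value(field) for field in row]
        if all(value is not None for value in parsed):
            kept.append(parsed)
    return kept
```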
Fig. 3(c) shows a rose diagram of the wind direction and wind speed, whereas
Fig. 3(d) shows a rose diagram of the current direction and current speed. True north (0°) is the reference direction in the rose diagrams, with east (90°), south (180°), and west (270°) following in the clockwise direction.
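The direction convention of the rose diagrams can be sketched as a binning step. The eight-sector layout and the function names below are assumptions for illustration; the number of sectors used in Fig. 3 is not stated in the text.

```python
from collections import Counter

# Sector labels listed clockwise starting from true north (0 degrees).
SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def compass_sector(deg, labels=SECTORS):
    """Return the sector whose centre is closest to `deg`.

    Directions are measured clockwise from true north, so 90 is east,
    180 is south, and 270 is west. Sectors are centred on their label,
    so 350 degrees still counts as north.
    """
    width = 360.0 / len(labels)
    index = int(((deg % 360.0) + width / 2) // width) % len(labels)
    return labels[index]


def sector_counts(directions, labels=SECTORS):
    """Count observations per sector, as a rose diagram would display them."""
    return Counter(compass_sector(d, labels) for d in directions)
```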
Fig. 3(e) shows the distribution of air pressures. There are no outliers, and the air pressure values are distributed between 1000 hPa and 1035 hPa. At Jungmun Beach, the current direction is distributed in the west and northeast directions, and the current speed is evenly distributed at over 0.7 cm/s. On the other hand, the wind direction is in the northwest and east directions, and the wind speed is uniformly distributed at 11 m/s or lower.
Table 2 shows the distribution of significant wave heights with intervals of 1 m and significant wave periods with intervals of 3 s. Significant wave heights within 1 m account for the majority, with a proportion of 90.55%, whereas waves over 3 m account for a very small proportion at 0.5%.
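A distribution of this kind can be computed with fixed-width bins, as in the sketch below. The function name and the half-open bin convention [lower, upper) are assumptions; Table 2 may treat bin edges or the highest class differently.

```python
from collections import Counter


def bin_proportions(values, width):
    """Return the percentage of values in each fixed-width bin.

    Bins are half-open, [k * width, (k + 1) * width), and only bins that
    contain at least one value are reported. Keys are (lower, upper) edges.
    """
    counts = Counter(int(v // width) for v in values)
    total = len(values)
    return {
        (k * width, (k + 1) * width): 100.0 * n / total
        for k, n in sorted(counts.items())
    }
```

For example, `bin_proportions(heights, 1.0)` groups significant wave heights into 1 m classes, and `bin_proportions(periods, 3.0)` groups significant wave periods into 3 s classes, where `heights` and `periods` are hypothetical lists of cleaned observations.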