1. Introduction
The increasing intensity and frequency of typhoons driven by climate change, including global warming, have increased the frequency and scale of coastal disasters on the Korean Peninsula. These disasters, which affect major national infrastructure and the coastal areas of large cities, are primarily caused by storm surges, inundation, coastal erosion, and swells (
KIOST, 2022). Long-term observations and analysis of waves, which are the primary external force behind coastal disasters, are essential for researching the prediction of these disasters and mitigating their resulting damage (
Shim and Min, 2007). Therefore, the long-term weather and ocean observation data from the Ieodo Ocean Research Station can be used for typhoon prediction. The station’s location in the open sea, where it remains unaffected by the land and lies on the paths of most typhoons passing through the Korean Peninsula, makes its data particularly valuable (
Mun et al., 2007). Therefore, the wave observation data from the Ieodo Ocean Research Station are highly significant for calculating the design conditions of coastal and offshore structures and for predicting wave behavior during extreme weather conditions (
Lee et al., 2007). Established in 2003, the Ieodo Ocean Research Station (IORS) is Korea’s first ocean research facility. The station is located 149 km southwest of Jeju Marado Island, the southernmost point of Korea, and is dedicated to observing oceanic, weather, and environmental factors (
Shim and Chun, 2004).
In Korea, wave data are currently gathered through direct observations using buoys or pressure-type wave gauges, as well as through remote observations using radar (
Jeong et al., 2018). Direct observation involves low installation costs and provides high accuracy, but the instruments risk damage or loss during severe weather such as typhoons and are difficult to maintain, making them unsuitable for installation in open-sea areas. In contrast, remote observation allows easy maintenance and stable long-term monitoring even in severe weather, but it provides less accurate results than direct observation. Given these observation conditions, the IORS, a jacket-type structure installed in the open sea 149 km from land, observes waves using the Wave and Current Radar (SM-050, hereafter “MWR”) from MIROS rather than direct observation methods.
As noted above, the MWR requires minimal maintenance and can observe waves in severe weather. However, because it relies on microwaves backscattered from the sea surface, it may yield exaggerated values in low-wave conditions with little wind (
Min et al., 2018). Studies have been conducted on the MWR at the Socheongcho Ocean Research Station in 2018 and at Dokdo Island in 2020 to overcome such drawbacks (
Min et al., 2018;
Jun et al., 2020).
Min et al. (2018) reanalyzed the wave data observed at the Socheongcho Ocean Research Station for approximately 2.5 months from January 2015 using MWR System Software (SW-002) v4.00. However, because the Socheongcho Ocean Research Station is located farther from typhoon trajectories, no high waves with a significant wave height of 3 m or more were observed. Consequently, the reliability of wave observations during extreme weather events could not be studied.
Jun et al. (2020) reassessed wave data spanning two years from October 2017, using an advanced version filter, SW-002 v4.10, on the raw observations from the MWR at Dokdo Island. The reliability of the data was enhanced by integrating the spike test algorithm (
OOI, 2012) from the Ocean Observatories Initiative (OOI) and the H-Ts quality control method, a novel approach for wave data quality control. After the three quality control methods were applied, the MWR wave observation data exhibited a certain level of reliability for significant wave heights. Nevertheless, errors remained, particularly for high wave events exceeding 3 m.
In this study, highly reliable wave observation data were regenerated for the MWR at the IORS, which holds significant value for typhoon research, by collecting the raw MWR observation data from the station spanning May 19, 2013, to September 8, 2021. The reliability of the raw MWR data was analyzed by comparing them with data from the Jeju Southern Ocean buoy, which is operated by the Korea Hydrographic and Oceanographic Agency and located near the IORS. The errors remaining in the raw data were addressed by reanalyzing the wave data, primarily using the MWR System Software (SW-002) filter used in previous studies. As reported elsewhere, applying the SW-002 filter alone, as is common when processing MWR wave observation data, does not ensure reliability for high wave events and significant wave periods. Therefore, this study proposes applying an artificial neural network (ANN) technique, which is particularly effective for identifying and reproducing nonlinear and potential correlations between input and output values, to the filtered MWR wave observation data. The correlations of the 40 MWR parameters and the IORS anemometer wind speed data with the buoy data were analyzed to select suitable input values for training the ANN effectively. Stratified sampling was then applied when segmenting the training and test data to enhance the applicability and generalization performance of the ANN. The optimization of the hyperparameters, including the learning rate, batch size, and ANN architecture, and the analysis of the results are also described.
2. Wave Radar Observation and Verification
2.1 Wave Radar Observation
The MWR at the IORS is installed on the southeast side of the roof deck, positioned approximately 34.8 m above the sea level, and has been conducting remote wave observations since 2003 (
Fig. 1). The MWR is remote observation equipment (
MIROS, 2011) for observing the gravity surface wave spectrum;
Table 1 lists its specifications. The MWR consists of six antennas that sequentially take observations every 30°, covering a half-circle with a range of 210 m from the antenna. Nevertheless, the observation is effectively omnidirectional (360°) because it captures both incoming and outgoing waves. Each antenna sequentially irradiates C-band (5.8 GHz) microwaves toward the sea surface at a 10° angle from the horizontal. The irradiated microwaves are then backscattered from the sea surface. The radar echo carries the velocity of the water particles as a Doppler shift and is modulated in amplitude and phase according to the speed of the ocean current. A strong radar echo is formed when the radar wavelength is twice the wavelength of the scattering sea-surface waves; since the radar wavelength is 5.17 cm (5.8 GHz), the signals are actually scattered by wind-generated capillary waves with a wavelength close to 2.6 cm. Therefore, observations should be taken when the wind speed is at least 3 m/s, because capillary waves are not generated without wind, which makes the sea surface difficult to observe. The backscattered radar echoes are collected for 128 seconds (2 Hz, 256 samples in total) and saved as a Doppler time series. One full observation of 180° takes between 12 and 15 minutes. Once it is complete, one wave spectrum is generated using the pulse Doppler method from the data collected by the six antennas. Thereafter, each newly completed antenna observation is combined with the data from the five preceding antennas, so a wave spectrum is generated every 2.5 minutes. The generated spectrum is saved as raw data in a DF025 file and sent to SW-002, the MIROS processing software, where a 2D spectrum (DF038) and wave parameters (DF037) are calculated in real time.
During the calculation process, the quality of the observation data can be enhanced using the filters embedded in the software. In this study, reliable wave data were reconstructed from past observations by manually setting the parameters and applying the SW-002 filters to the raw MWR data.
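The figures quoted in this subsection can be checked with a few lines of arithmetic. The sketch below assumes the standard first-order Bragg relation (scattering wavelength equals the radar wavelength divided by twice the cosine of the grazing angle), which the text does not state explicitly; the function names are illustrative only.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def radar_wavelength_cm(freq_hz):
    """Electromagnetic wavelength in centimetres for a given frequency."""
    return C / freq_hz * 100.0


def bragg_wavelength_cm(freq_hz, grazing_deg):
    """Sea-surface wavelength that resonates with the radar.

    Assumption: first-order Bragg scattering, lambda_radar / (2 cos(grazing)).
    """
    lam = radar_wavelength_cm(freq_hz)
    return lam / (2.0 * math.cos(math.radians(grazing_deg)))


def samples_per_record(duration_s, rate_hz):
    """Number of Doppler samples collected in one antenna record."""
    return int(round(duration_s * rate_hz))


radar_cm = radar_wavelength_cm(5.8e9)        # C-band, 5.8 GHz
bragg_cm = bragg_wavelength_cm(5.8e9, 10.0)  # 10 degree angle from the horizontal
n_samples = samples_per_record(128, 2)       # 128 s at 2 Hz
spectrum_interval_min = 15 / 6               # six antennas in a 15-minute sweep

print(f"radar wavelength: {radar_cm:.2f} cm, Bragg wavelength: {bragg_cm:.2f} cm")
print(f"samples per record: {n_samples}, spectrum every {spectrum_interval_min} min")
```

Under these assumptions the results agree with the 5.17 cm radar wavelength, the capillary wavelength close to 2.6 cm, the 256 samples, and the 2.5-minute spectrum interval given in the text.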
2.2 Wave Radar Accuracy Verification
This study verified the accuracy of the MWR wave observation data collected at the IORS over nine years (2013–2021) by comparison with the data from the Jeju Southern Ocean Buoy provided by the Korea Hydrographic and Oceanographic Agency (KOOFS, 2023). The Jeju Southern Ocean Buoy is a large-scale marine observation buoy moored approximately 168 km from the IORS (
Fig. 1), and its specifications are provided in
Table 1. The wave data collected by a wave sensor (MOSE-G100) installed on the buoy were processed every 30 minutes by a program, saved in a data logger, and transmitted to a management system in real time (
Datawell, 2009). For a more accurate comparison, data from a buoy located closer to the MWR would be ideal. However, the comparative analysis used the Jeju Southern Ocean Buoy 1 data because this is the nearest buoy and provides the only long-term buoy observations available for comparison with the IORS.
The ERA5 reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) were used to verify the suitability of using the Jeju Southern Ocean Buoy data for comparison. These data were used to derive the comparison results for significant wave heights and statistical measures between the Jeju Southern Ocean Buoy 1 data and the IORS data, as shown in
Fig. 2. The statistical measures included the Pearson correlation coefficient (R), root mean squared error (RMSE), and bias. Each statistical measure was derived using the following equations:
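$$R = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-x_i)^2}$$
$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}(y_i-x_i)$$
where x is the actual value (buoy), y is the comparison value (MWR), x̄ and ȳ are the means of the respective values, and n is the total number of data points.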
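As a minimal sketch, the three statistical measures named above (R, RMSE, and bias) could be computed as follows. The sign convention for bias (comparison value minus actual value, so a negative bias means the MWR underestimates the buoy) is an assumption inferred from how the text interprets a decreasing bias; the function names and the sample values are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between actual x and comparison y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean squared error of y against x."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / len(x))


def bias(x, y):
    """Mean difference y - x (assumed convention: MWR minus buoy)."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical significant wave heights in metres, for illustration only.
buoy = [1.0, 2.0, 3.0, 4.0]
mwr = [1.5, 2.5, 3.5, 4.5]
print(pearson_r(buoy, mwr), rmse(buoy, mwr), bias(buoy, mwr))
```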
The comparison showed that the RMSE and bias for the significant wave height of the ERA5 reanalysis data for the two stations were 0.56 m and 0.26 m, respectively, indicating the presence of some errors. These errors can be attributed to regional differences, such as the 168 km distance between the two stations and the roughly 90 m difference in water depth (127 m for the Jeju Southern Ocean Buoy and 40 m for the IORS), which is a limitation of this study. The correlation coefficient of the significant wave height was 0.87, indicating a strong correlation between the two stations. The wave characteristics of the two stations are not considerably different, as shown in
Fig. 2. Considering that the Jeju Southern Ocean Buoy is the only long-term observation station available for comparing the nine years of MWR raw data collected near the IORS, this study used the Jeju Southern Ocean Buoy data to enhance the reliability of the MWR regeneration data from the IORS.
The Jeju Southern Ocean Buoy wave data have been available since 2014, but the MWR raw data are unavailable for January to March 2014. Therefore, the comparison was conducted using wave data from March 13, 2014, to September 8, 2021. Comparing the MWR observation data with the buoy data requires both data sets to share identical time intervals. The MWR wave data, recorded at two- to three-minute intervals, would have to be converted to match the 30-minute intervals of the buoy data. However, converting nine years of data at that resolution during preprocessing is extremely time-consuming, so the comparison was conducted at one-hour intervals for both data sets. Because the MWR records do not fall exactly on one-hour boundaries, they were averaged into one-hour intervals to compare the significant wave height and period with the Jeju Southern Ocean Buoy data at the same interval (
Fig. 3).
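The hourly averaging described above can be sketched with the standard library alone. The binning convention used here (each record is assigned to the hour in which it starts) is an assumption, since the text does not specify it, and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_mean(records):
    """Average (timestamp, value) records into one-hour bins.

    Returns a dict mapping the start of each hour to the mean value
    of all records whose timestamp falls inside that hour.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical significant wave heights every 2.5 minutes for two hours.
start = datetime(2014, 3, 13, 0, 0)
sample = [
    (start + timedelta(seconds=150 * i), 1.0 if i < 24 else 2.0)
    for i in range(48)
]
hourly = hourly_mean(sample)
for hour, mean_hs in hourly.items():
    print(hour.isoformat(), mean_hs)
```

The same function would apply unchanged to the significant wave period series.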
As previously mentioned, the accuracy of MWR observation data diminishes in low-wave environments with little wind because the radar relies on backscattered microwaves. The accuracy of the MWR wave data was therefore assessed using wind speed data from the IORS anemometer provided by the Korea Institute of Ocean Science and Technology ocean research station (
Fig. 3(c)). The comparison showed that the observation data tended to be overestimated relative to the buoy data when the wind speed was 3 m/s or below, a trend also noted in previous studies. In addition, more spikes were evident in the significant wave period data than in the significant wave height data. However, there were fewer spikes than in the MWR data from Socheongcho or Dokdo in previous studies. This suggests that the data quality is better, likely because the wave and current radar is installed at a more suitable height above sea level. MIROS, the manufacturer of the MWR, recommends installing the wave and current radar 25 to 80 m above mean sea level, at a 10° angle from the sea surface, with an observation radius of 170 to 450 m from the radar (
MIROS, 2011). The suggested installation ranges are based on the radar scanning range of the sea surface, with a maximum footprint length of 75 m. Therefore, if the MWR is installed at 80 m or higher and the observations are made at the recommended 10° angle, the distance to the footprint becomes excessively long. This results in a lower reception rate of radar signals reflected off the sea, particularly when the wind speed is 3 m/s or below. The correlation between installation height and observation accuracy was examined by comparing the significant wave height and installation height of the MWRs analyzed in previous studies with those of MWRs installed at the IORS (
Table 2). As the installation height increased, the accuracy of the significant wave height decreased because the observation range (distance to the footprint) became longer. The MWR at the IORS was installed at a lower height than those at Socheongcho or Dokdo, so observations were conducted within an appropriate range, and the higher reception rate of reflected radar signals resulted in fewer spikes in the data.
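The link between installation height and distance to the footprint can be illustrated with simple geometry. This is a sketch under a flat-sea assumption that considers only the beam centre at the 10° angle; it is not the manufacturer's formula, so the values need not coincide with the radii MIROS quotes. Only the IORS height given in the text is used as a concrete input.

```python
import math


def footprint_distance_m(height_m, grazing_deg=10.0):
    """Horizontal distance from the radar to the beam-centre footprint.

    Assumption: flat sea surface, beam centre only.
    """
    return height_m / math.tan(math.radians(grazing_deg))


def slant_range_m(height_m, grazing_deg=10.0):
    """Line-of-sight distance from the antenna to the beam-centre footprint."""
    return height_m / math.sin(math.radians(grazing_deg))


iors_distance = footprint_distance_m(34.8)  # IORS roof deck height from the text
print(f"IORS (34.8 m): horizontal {iors_distance:.1f} m, "
      f"slant {slant_range_m(34.8):.1f} m")

# The distance grows linearly with height across the recommended 25-80 m range.
for h in (25.0, 50.0, 80.0):
    print(f"{h:5.1f} m -> {footprint_distance_m(h):6.1f} m")
```

Because the distance scales linearly with height at a fixed angle, doubling the installation height doubles the path the backscattered signal must travel, which is consistent with the lower reception rate described for higher installations.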
2.3 Application of Optimal Filter
The raw data obtained from the MWR (DF025) exhibit numerous spikes and errors because of the scarcity of backscattered microwaves from radar signals in low-wind environments. In a previous study (
Jun et al., 2020), the reliability of the generated wave data was enhanced using a filter from MIROS software, SW-002 (v4.10), to eliminate spikes. This study implemented the identical SW-002 filter used in a previous study on MWR observation data, the procedure of which is outlined as follows.
2.3.1 Energy level check
The spikes observed in the wave height time series in low-wind conditions were eliminated by comparing the wave spectrum with the power of a moving-averaged wave spectrum. Of the four filters, this one yielded the most notable improvement in the accuracy of the significant wave height and the quality of the wave period.
2.3.2 Long period noise removal
The spikes of low frequency that emerge in the wave spectrum during low wind and wave conditions were identified and removed before the wave spectrum calculation. The improvement in the significant wave height and period was minimal compared to that achieved by other filters.
2.3.3 Reduce noise frequency
Empirical algorithms were used to remove the low-frequency wave energy that leads to overestimating the wave height from the wave spectrum calculated under low wave energy conditions. Although the overestimated wave heights and periods are improved, there is an overall tendency to underestimate the observed wave heights.
2.3.4 Phillips check
The observed wave spectrum was compared with a theoretical wave spectrum; where the measured spectral density surpassed the threshold, it was flagged and the result was recorded as a status. Applying this filter had a limited impact on the wave data generated from the IORS MWR.
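The idea of comparing a measured spectrum with a theoretical one can be sketched as follows. The text does not document the spectrum form or threshold used inside SW-002, so this example assumes the classical Phillips saturation-range spectrum, S(f) = α g² (2π)⁻⁴ f⁻⁵ with α = 0.0081, and a hypothetical `factor` parameter for the threshold; it is an illustration of the technique, not the MIROS implementation.

```python
import math

G = 9.81        # gravitational acceleration in m/s^2
ALPHA = 0.0081  # classical Phillips constant (assumed here)


def phillips_spectrum(f_hz, alpha=ALPHA):
    """Phillips saturation spectrum in m^2/Hz at frequency f_hz."""
    return alpha * G ** 2 * (2.0 * math.pi) ** -4 * f_hz ** -5


def phillips_flags(freqs_hz, density, factor=1.0):
    """Flag frequency bins whose measured density exceeds the threshold.

    `factor` is a hypothetical tuning parameter scaling the theoretical limit.
    """
    return [s > factor * phillips_spectrum(f) for f, s in zip(freqs_hz, density)]


# Hypothetical measured densities (m^2/Hz) at 0.2 Hz, for illustration only.
flags = phillips_flags([0.2, 0.2], [0.5, 5.0])
print(phillips_spectrum(0.2), flags)
```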
Individually, the aforementioned filters have a limited impact on enhancing quality. In SW-002, users can apply each filter individually or multiple filters in combination, considering the specific characteristics of the respective sea area (
Jun et al., 2020). Therefore, this study aimed to enhance the quality of the MWR observation data by identifying an optimal filter for the Ieodo sea area. This was achieved by combining different filters applied to the same period of MWR raw data used in the comparative analysis against the buoy data, as shown in
Table 3. Case 1 assessed the quality improvement effect by applying all four filters, while Case 2 evaluated the effect of the remaining filters, excluding the “long-period noise removal” filter, which has minimal impact on controlling the period.
Case 3 examined the effect of excluding the “reduce noise frequency” filter, which tends to underestimate the wave heights.
Table 4 presents the statistical measures (correlation coefficient, RMSE, and bias) after applying the relevant filters in each case. In Case 1, where all four filters were applied, both the significant wave height and the significant wave period improved. However, the improvement in significant wave height was minimal compared to that in the significant wave period, and the decreasing bias indicated a tendency to underestimate the wave height. Comparing Case 1, where all filters were applied, with Case 2, which excluded the “long-period noise removal” filter, showed that each filter still exerts its effect when multiple filters are used together.
In addition, the “long-period noise removal” filter effectively removes low-frequency spikes in low-wave environments. Compared to Case 1, Case 3 had slightly lower quality in the wave period. However, excluding the “reduce noise frequency” filter, which tends to underestimate the wave height, improved the wave height quality.
Hence, this study chose Case 3 as the optimal filter because it exhibited the lowest RMSE among the filter combinations, with an RMSE of 0.48 for the significant wave height. This selection aimed to rectify the errors in observation data occurring in 3 m or higher wave environments. The significant wave height and period before and after applying the optimal filter were compared with the buoy data, and the outcomes are depicted in both a time series plot and a scatter plot (
Fig. 4). Overall, the wave height and period improved after applying the optimal filter. However, the improvement in wave height was marginal, and the wave period still exhibited numerous spikes. As described in Section 2.2, the effect of the optimal filter on wave height is not conspicuous, mainly because the MWR is installed at an appropriate height compared with the other stations, so the sea level displacement was already observed adequately and less room for improvement remained. Nevertheless, the accuracy declined considerably because the waves were underestimated in 5 m or higher wave conditions, which typically occur during extreme weather events such as typhoons. Applying filters alone is therefore limited in improving the quality of MWR wave observation data.
3. Quality Enhancement of Wave Radar Data with Artificial Neural Network
3.1 Artificial Neural Network
Different combinations of filters from the MWR software were tested to identify the optimal filter for the Ieodo sea area, which was then applied to the MWR data. Despite these efforts, numerous errors persist in the wave period, and the reliability of the observation data diminishes sharply in 3 m or higher wave conditions. An artificial neural network (ANN) is commonly used across diverse fields when a clear correlation between two physical quantities is not evident, because it can discern and replicate nonlinear and potential correlations between two values (
Kim et al., 2021;
Park et al., 2021;
Wei, 2021;
Yun et al., 2022). Thus, this study used the filtered MWR observation data as the input values and the Jeju Southern Ocean Buoy 1 data as the actual values. An ANN model was then developed and trained to learn the nonlinear relationship between these two data sets and thereby enhance the overall quality of the IORS observation data. ANNs are designed to mimic the activity of the human brain and are implemented as a network of nodes that replicates the electrical signal transmission of neurons (brain nerve cells). The architecture of an ANN primarily comprises input, hidden, and output layers, each containing numerous nodes. The input and output layers are single layers whose numbers of nodes correspond to the numbers of inputs and outputs, respectively. The hidden layers are situated between the input and output layers; their number and size are hyperparameters that the user must determine through trial and error, as the optimal architecture varies with the characteristics of the input and output data. In each node, a weighted sum is calculated from the weights, which indicate the importance of the signals from the previous nodes, and a bias, which serves as a threshold for deciding whether to output a signal. This value is then passed through an activation function, and the resulting output is passed to the next node as an input. This computation occurs sequentially across all connected nodes as the input passes through each layer of the ANN. The final output, derived from the series of computations from the input layer to the output layer, is compared with the target value via a loss function to calculate the loss. This loss is then used to update the weights and biases of each node through backpropagation.
Neural network training allows the network to learn the relationship between the input and target values autonomously while fine-tuning parameters, such as the weights and biases, to achieve highly accurate results. Training is repeated either for a user-specified number of iterations or until the model achieves the accuracy level set by the user.
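The per-node computation described in this section (weighted sum plus bias, followed by an activation function, then a loss against the target) can be written out in a few lines. The sketch below is a plain-Python illustration with hypothetical weights and inputs; it shows only the forward pass and the loss, not backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    # A small negative slope keeps a gradient for z < 0.
    return z if z > 0 else slope * z


def dense_forward(inputs, weights, biases, activation):
    """Output of one fully connected layer.

    weights[j][i] is the weight from input i to node j;
    biases[j] is the bias of node j.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mse(outputs, targets):
    """Mean squared error between network outputs and target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


# Hypothetical two-input, two-node layer.
hidden = dense_forward(
    [1.0, -2.0],
    [[0.5, 0.25], [-1.0, 0.5]],
    [0.1, 0.0],
    relu,
)
print(hidden, mse(hidden, [0.0, 0.0]))
```

Stacking several such layers, each feeding its output to the next, gives the layer-by-layer computation from the input layer to the output layer.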
3.2 ANN Data
This study designed an ANN model to learn potential nonlinear correlations between data and generate highly dependable wave data based on significant wave height and period information from MWR observation data. Initially, the node count of the output layer was configured as two to enable the simultaneous output of the significant wave height and significant wave period, enhancing the practicality of the wave data regeneration model. Subsequently, the correlation between the target values (significant wave height and significant wave period acquired from Jeju Southern Ocean Buoy 1) and approximately 40 observation parameters gathered by the MWR were examined to identify a significant input variable. The analysis revealed strong correlations in significant wave height and significant wave period of the MWR, as well as in wave skewness and wave steepness. This is because the dependence of the MWR on wind-induced sea level fluctuations for continuous observation is a significant indicator of the wave height, while the wave skewness and steepness serve as significant indicators of wave period. Considering that the reliability of MWR observation data decreases during periods of low wind speed, the correlation was analyzed with the buoy observation data using the wind direction and wind speed data of the IORS. Both were utilized as input variables when developing the ANN model owing to the strong correlation between wind speed and wave height. Furthermore, the input parameters with different scales used in this study were normalized to enhance reliability and ensure they were equally weighted in the ANN model. In this study, the input data for the ANN model consisted of significant wave height and period from the MWR, as well as wave steepness, wave skewness, and wind speed from the IORS. In contrast, the output data were set to show the significant wave height and period from the Jeju Southern Ocean Buoy. 
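Normalizing inputs with different scales could look like the sketch below. The text does not state which normalization was used, so z-score standardization is assumed here purely for illustration, with the statistics fitted on the training data and reused for the test data (a common practice, also an assumption). The sample values are hypothetical.

```python
import math


def zscore_fit(column):
    """Return the mean and population standard deviation of a column."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0.0:
        raise ValueError("constant column cannot be standardized")
    return mean, std


def zscore_apply(column, mean, std):
    """Scale a column using previously fitted statistics."""
    return [(v - mean) / std for v in column]


# Hypothetical wind speeds in m/s (training portion only).
train_wind = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std = zscore_fit(train_wind)
scaled = zscore_apply(train_wind, mean, std)
print(mean, std, scaled)
```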
The range for each data type matched that of the data used in the comparative analysis with the buoy data. The data were separated into training and testing sets at an 8:2 ratio, comprising 40,011 data points for training the ANN model and 10,002 data points for testing it (
Table 5). Stratified sampling was used to enhance the generalization performance of the model, recognizing that the performance of observation equipment is greatly affected by the energy environment and that the distribution rate of high wave events is extremely low, considering the actual nature of observed waves. Stratified sampling involves data-splitting to maintain the user-specified ratio (strata) when the data composition is unbalanced. Accordingly, training and test datasets maintain the original data’s characteristics, promoting model stability and applicability by effectively learning from a diverse range of data. The input values were divided into training and test datasets using stratified sampling based on the significant wave height and period of the buoy, representing actual values, and wind speed data, which significantly impacts MWR observations.
Fig. 5 shows the composition of the strata in each dataset, which was used to determine the optimal reference values so that the data distribution and the statistical measures (RMSE, correlation coefficient) against the buoy data are suitable.
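A stratified 8:2 split can be sketched as follows. For brevity this example stratifies on significant wave height only, whereas the study also uses wave period and wind speed; the stratum boundaries in `edges` and the sample heights are hypothetical.

```python
import random
from collections import defaultdict


def stratum_of(hs_m, edges=(1.0, 2.0, 3.0)):
    """Index of the wave-height stratum (hypothetical boundaries in metres)."""
    return sum(hs_m >= e for e in edges)


def stratified_split(samples, key, test_ratio=0.2, seed=0):
    """Split samples so each stratum keeps the same train/test ratio."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in samples:
        groups[key(s)].append(s)
    train, test = [], []
    for _, members in sorted(groups.items()):
        rng.shuffle(members)
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical imbalanced sample: few high-wave records, as in the text.
heights = [0.5] * 70 + [1.5] * 20 + [3.5] * 10
train, test = stratified_split(heights, stratum_of)
print(len(train), len(test), sum(h >= 3.0 for h in test))
```

With a purely random split, the rare high-wave records could end up almost entirely in one of the two sets; splitting within each stratum avoids that.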
3.3 ANN Optimization
PyTorch, an open-source Python machine learning library, was used to develop a deep neural network to enhance the quality of MWR wave data. The optimization and performance assessment of the ANN model during training was conducted in a workstation environment featuring an RTX 2080Ti GPU and an Intel Xeon 4210 CPU. Optimization involves finding the best combination of hyperparameters to enable a model to reach its target accuracy through iterative learning. These hyperparameters, such as hidden layer architecture, activation function, batch size, and learning rate, were determined empirically by a programmer.
A hidden layer is where various computations involving weighted sums and activation functions are performed sequentially upon receiving a signal from nodes in the input layer. Typically, as the number of layers deepens and the nodes increase, the performance of the model improves, enabling more accurate identification of data features.
On the other hand, the model complexity escalates with the depth of the hidden layers and the number of nodes, potentially leading to overfitting, where the model parameters become excessively tailored to the training data, resulting in diminished performance on the test data. In addition, the heightened computational workload prolongs the time required for a neural network to converge. Therefore, this study favored a neural network that converges quickly enough to be practical while maintaining accuracy. To determine the appropriate architecture, 300 different neural network models with varying complexities were designed and evaluated. These models sequentially incorporated two to seven hidden layers and varied the number of nodes in each layer from eight to 128. As a result, a model with five hidden layers, consisting of 128, 128, 64, 64, and eight nodes, was the most suitable for representing the dataset used in this study (
Fig. 6).
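To give a sense of the model size, the sketch below lists the layer shapes and counts the trainable parameters of a fully connected network with the hidden-layer widths given in this section. It assumes five inputs (MWR significant wave height and period, wave steepness, wave skewness, and wind speed) and two outputs; in PyTorch, each (inputs, outputs) pair would correspond to one `torch.nn.Linear` layer.

```python
def layer_shapes(n_inputs, hidden, n_outputs):
    """(fan_in, fan_out) pairs for consecutive fully connected layers."""
    sizes = [n_inputs, *hidden, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights plus biases over all layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Hidden widths from this section; five inputs and two outputs are assumed.
shapes = layer_shapes(5, [128, 128, 64, 64, 8], 2)
n_params = count_parameters(shapes)
print(shapes)
print(n_params)
```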
Subsequently, the optimal combination of hyperparameters for the best learning outcomes was determined for six parameters: activation function, learning rate, batch size, loss function, epoch, and optimizer function, as shown in
Table 6. The optimization process of the hyperparameters is as follows. First, nonlinearity is introduced into the neural network through activation functions, a critical factor in determining the network performance. Furthermore, a learning rate was chosen to adjust the extent of modification for each weight associated with the activation functions. In this study, learning rates of 0.001 and 0.0005 were each applied to four activation functions to assess their performance: the sigmoid and hyperbolic tangent functions, proposed in the early stages of neural network research; the ReLU (Rectified Linear Unit), which is widely used because it effectively mitigates the vanishing gradient problem frequently encountered with the sigmoid and hyperbolic tangent functions; and the Leaky ReLU, which resolves the issue of dying neurons in ReLU (
Table 6). The same activation function was used for the nodes in all hidden layers. However, a linear function was applied as the activation function of the final hidden layer so that the output layer can produce an arbitrary value, because the neural network in this study was designed for regression analysis to estimate the correlations between variables. As a result, R_2 was identified as the optimal hyperparameter combination in terms of accuracy and practicality, based on the number of epochs required for convergence and the loss observed in the test data (
Fig. 7). Furthermore, a mean squared error function was used as the loss function to measure the discrepancies between the output and target values of the model. The Adam optimizer was chosen to minimize the loss through gradient descent, which seeks the minimum value using the gradient (a derivative of the loss function). Adam is widely acknowledged in various studies for its adaptive learning rates, which adjust according to the fluctuations in curvature to find the minimum value while also providing momentum to the learning speed. After numerous trials, the batch size, which is the number of data points used for one weight update during learning, was set to 64. The number of epochs, each representing a full cycle through the training dataset, was unrestricted. However, the risk of overfitting was mitigated using an early stopping mechanism, which automatically terminates learning if performance does not improve within 20 epochs.
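The early stopping rule described above (stop when the loss has not improved within 20 epochs) can be sketched independently of any particular framework. In this illustration the per-epoch losses are supplied as a hypothetical sequence; in an actual training loop each value would come from evaluating the model after one pass over the training data.

```python
def train_with_early_stopping(epoch_losses, patience=20):
    """Iterate over per-epoch losses and stop once no improvement
    has been seen for `patience` consecutive epochs.

    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(epoch_losses):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_loss, epochs_run


# Hypothetical loss curve: improves for 10 epochs, then stagnates.
losses = [1.0 / (e + 1) for e in range(10)] + [0.5] * 100
result = train_with_early_stopping(losses, patience=20)
print(result)
```

In practice the model weights from the best epoch would also be saved so that they can be restored after training stops.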
3.4 Result of ANN application
The neural network was optimized using the optimal hyperparameters for the provided data. The performance of the optimized neural network on the MWR observation data was assessed by comparing the data before and after applying the neural network. The results are depicted in a scatter plot and statistical measures in
Fig. 8 and
Table 7, respectively. When the results after applying the ANN were compared with the data before applying the ANN (Case 3), there was a 0.02 increase in the correlation coefficient, a 0.06 decrease in RMSE, and a 0.1 decrease in bias for the significant wave height. Similarly, for the significant wave period, there was a 0.19 increase in the correlation coefficient, a 0.61 decrease in RMSE, and a 0.387 decrease in bias. As a result, while there is little discernible enhancement in significant wave height after applying the ANN, the overestimation of the significant wave period has been notably reduced by the ANN. This performance improvement is also visible in the scatter plot in
Fig. 8. The significant wave height remains largely unchanged, as a strong correlation was already apparent before implementing the ANN. On the other hand, the distribution of significant wave period data has become increasingly concentrated along the diagonal line. Overall, for high wave conditions with a 3 m or higher wave height, there was a 0.07 increase in the correlation coefficient, a 0.07 decrease in RMSE, and a 0.04 decrease in bias for the significant wave height, while there was a 0.09 increase in the correlation coefficient, a 0.19 decrease in RMSE, and a 0.14 increase in bias for the significant wave period (
Table 8). The reliability of the MWR observation data was improved using the ANN even under high wave conditions.
However, the regenerated wave data still contain some errors when waves of 3 m or higher occur. This is because only 7% of the dataset used for ANN training had wave heights above 3 m, leading the model to perform better in low wave environments than in high wave conditions. Therefore, to enhance the model performance under high wave conditions by addressing the imbalance in data distribution, it is essential to use data augmentation methods, such as the synthetic minority over-sampling technique with Gaussian noise (SMOGN), which can amplify data in minority sections, such as high waves, and reduce the normal data in majority sections while preserving the characteristics of the existing data. Nevertheless, there are few cases of applying these data augmentation techniques to marine data, so further research is needed to optimize them for this specific data type.
4. Conclusions
This study collected raw MWR data (DF025) and compared them with the observation data from the Jeju Southern Ocean Buoy, which is currently operated by the Korea Hydrographic and Oceanographic Agency, to enhance the quality of wave observation data from the IORS. Four types of filters included in the MIROS software (SW-002) were applied to eliminate spikes caused by backscattering microwaves under wind speeds of 3 m/s or lower. Because applying the filters individually had limited effectiveness in improving quality, the optimal approach was to combine three filters, excluding the “reduce noise frequency” filter, which tends to underestimate wave height. After applying this optimal filter combination, most spikes were eliminated, but the improvement in wave height was minimal, and numerous errors in wave height persisted. In addition, errors continued to occur frequently under wave conditions of 5 m or higher, particularly during extreme weather conditions. To address this limitation, this study applied an ANN designed to identify and reproduce the nonlinear and potential correlations between the two data sets. The data were separated into training and test sets using stratified sampling to ensure the generalization of the neural network. The MWR parameters that exhibited a strong correlation with the buoy data were chosen as input variables to facilitate efficient learning. The neural network architecture was designed through iterative training, adjusting the hyperparameters until the best outcome in terms of accuracy and practicality was achieved. Ultimately, the optimal learning outcomes were achieved with a neural network featuring five hidden layers, with node counts of 64, 64, 32, 16, and 8, respectively, a learning rate of 0.0005, and a batch size of 64.
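The layer structure described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation. The five inputs (Hs, Ts, Sk, St, and ws) and two outputs (Hs and Ts) follow Table 5. The ReLU activation is assumed because the R-2 setting shows the lowest loss in Table 6. The linear output layer, the weight initialization, and the sample input values are hypothetical.

```python
import random

# Five inputs, five hidden layers (64, 64, 32, 16, 8 nodes), and two outputs.
LAYER_SIZES = [5, 64, 64, 32, 16, 8, 2]

# Training settings reported in the text, kept here for reference only
# (this sketch performs a forward pass and does not train the network).
TRAINING_CONFIG = {"learning_rate": 0.0005, "batch_size": 64}


def init_params(sizes, seed=0):
    """Random weights and zero biases for each layer (assumed initialization)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output (both assumed)."""
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in x]
    return x


def count_params(sizes):
    """Total number of trainable weights and biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


params = init_params(LAYER_SIZES)
sample = [1.2, 6.5, 0.1, 0.03, 4.0]  # hypothetical [Hs, Ts, Sk, St, ws]
output = forward(params, sample)

print(len(output), count_params(LAYER_SIZES))  # prints: 2 7306
```

With these layer sizes, the network has 7,306 trainable parameters, which is small relative to the roughly 40,000 training samples listed in Table 5.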
Compared with the initial data, the learning outcomes showed a notable enhancement in the reliability of the wave period (correlation coefficient increased by 0.19 and RMSE decreased by 0.61), as well as for waves of 3 m or higher (correlation: 0.07 increase for wave height and 0.09 increase for wave period; RMSE: 0.07 decrease for wave height and 0.19 decrease for wave period). These findings suggest that the proposed ANN model significantly enhanced the reliability of MWR wave data observed at the IORS. Given the regional characteristics of the IORS, located along typhoon pathways, these results are anticipated to offer valuable insights into determining design waves for safeguarding coastal and offshore structures from high waves during extreme weather conditions. In addition, they are expected to serve as foundational data for improving wave prediction models. Moreover, implementing the proposed method with other MWR wave observation data beyond the IORS will help enhance the reliability of remote wave observation data. On the other hand, errors persist due to overfitting during ANN training, attributed to inadequate data distribution for waves of 3 m or higher. Therefore, various data augmentation techniques, such as SMOGN, are recommended to address the data imbalance issue. Further research is needed to identify suitable data augmentation techniques for wave data, such as those from the MWR, because no established precedent has demonstrated their efficacy with marine data. In addition, applying ANNs to wave direction data is worthwhile, considering their proven effectiveness in enhancing the significant wave height and period of remote wave observation data.
Conflict of Interest
Kideok Do serves as a member of the journal publication committee of the Journal of Ocean Engineering and Technology, but he had no role in the decision to publish this article. No potential conflict of interest relevant to this article was reported.
Funding
This study was partly supported by the National Research Foundation of Korea grant funded by the Korean government (NRF-2022R1I1A306559912) and Korea Institute of Marine Science & Technology Promotion (KIMST) funded by the Ministry of Oceans and Fisheries (20210607, Establishment of the Ocean Research Station in the Jurisdiction Zone and Convergence Research).
Fig. 1.
Fig. 2.
Comparison of the ERA5 significant wave height data at the Ieodo and Jeju Southern Ocean Buoy locations: (a) Time series of significant wave height at the Ieodo (red) and buoy (blue) locations; (b) Scatter plot of the significant wave height at the Ieodo and buoy locations with statistics.
Fig. 3.
Comparison of the MWR and Jeju Southern Ocean Buoy data: (a) Time series of significant wave height from the MWR (red) and buoy (blue); (b) Time series of significant wave period from the MWR (red) and buoy (blue); (c) Time series of wind speed from the IORS anemometer.
Fig. 4.
Comparison of the optimally filtered MWR data (Case 3, red) and buoy data (blue) with the unfiltered MWR data (white): (a) Time series of significant wave height; (b) Time series of significant wave period; (c) Scatter plot of significant wave height; (d) Scatter plot of significant wave period.
Fig. 5.
Composition of the training data (outer circle) and test data (inner circle) obtained by stratified sampling: (a) Composition stratified by buoy significant wave height; (b) Composition stratified by buoy significant wave period; (c) Composition stratified by IORS wind speed.
Fig. 6.
Optimal ANN model structure (five inputs, five hidden layers, and two outputs)
Fig. 7.
Comparison of Epoch (blue) and Loss (red) to select the optimal hyperparameter set
Fig. 8.
Comparison of the ANN result (red) and Optimal Filter data (blue): (a) scatter plots of significant wave height; (b) scatter plots of significant wave period.
Table 1.
Specifications of MWR and Jeju Southern Ocean Buoy (KHOA, 2012)
Specifications | MWR | Jeju Southern Ocean buoy
Size | Height 0.86 m × Width 0.9 m × Depth 0.7 m | Diameter 4.3 m × Height 3.2 m
Observation equipment | Six radar antennas (5.8 GHz) | MOSE-G1000 wave sensor (2 Hz)
Observation data | Wave height; wave period; wave direction | Wave height; wave period; wave direction
Range | 0–30 m; 3–30 s; 1°–360° | 0–9999 cm; 1–100 s; 1°–360°
Accuracy | 0.1 m; 0.1 s; 1° | 1 cm; 0.1 s; 1°
Update interval | 15 min (2.5 min per antenna) | 30 min
Table 2.
Comparison of the MWR installation location, observation range, and significant wave height correlation by observation station (KORS, 2022a; KORS, 2022b).
Observation station | Installation location | Observation range | Correlation (no filter)
Ieodo | Roof deck DL(+) 33.6 + 1.2 m | 210 m (135 + 75 m) | 0.81
Socheongcho | Roof deck DL(+) 37 + 0.8 m | 235 m (160 + 75 m) | 0.62
Dokdo | Dongdo (East island) DL(+) 89 m | 570 m (495 + 75 m) | 0.34
Table 3.
SW-002 Filter Combinations to Improve Wave Data Quality
Combinations of filters | Reduce noise frequency | Phillips check | Long period noise removal | Energy level check
No filter | Off | Off | Off | Off
Case 1 | On | On | On | On
Case 2 | On | On | Off | On
Case 3 | Off | On | On | On
Table 4.
Statistical values of significant wave height and significant wave period for each SW-002 filter combination.
Filter combination | Wave height correlation | Wave height RMSE | Wave height bias | Wave period correlation | Wave period RMSE | Wave period bias
No filter | 0.81 | 0.58 | −0.08 | 0.38 | 2.45 | 0.78
Case 1 | 0.88 | 0.50 | −0.21 | 0.69 | 1.23 | 0.31
Case 2 | 0.88 | 0.52 | −0.26 | 0.68 | 1.31 | 0.33
Case 3 | 0.88 | 0.48 | −0.12 | 0.62 | 1.42 | 0.38
Table 5.
Data used in the ANN model
Item | Data period | Input data | Output data | Label data | Training and test data
Value | 2014.03.14 – 2021.09.08 | Case 3 Hs, Ts, Sk, St; IORS ws | Hs, Ts | Jeju Southern Ocean buoy Hs, Ts | 40011 (8) : 10002 (2)
Table 6.
ANN model performance for different hyperparameter settings.
Hidden layer structure |
Loss function |
Optimizer |
Batch size |
Activation function |
Learning rate |
Abbreviation |
Loss |
Epoch |
|
MSE |
Adam |
64 |
Tanh |
0.001 |
T-1 |
0.2474 |
55 |
|
Tanh |
0.0005 |
T-2 |
0.2465 |
50 |
128 |
Sigmoid |
0.001 |
S-1 |
0.2507 |
117 |
128 |
Sigmoid |
0.0005 |
S-2 |
0.2486 |
100 |
64 |
64 |
ReLU |
0.001 |
R-1 |
0.2457 |
43 |
8 |
ReLU |
0.0005 |
R-2 |
0.2435 |
50 |
|
Leaky ReLU |
0.001 |
L-1 |
0.2462 |
40 |
|
Leaky ReLU |
0.0005 |
L-2 |
0.2452 |
33 |
Table 7.
Statistical values of significant wave height and wave period with ANN and Optimal Filter (Case 3)
Data | Wave height correlation | Wave height RMSE | Wave height bias | Wave period correlation | Wave period RMSE | Wave period bias
Case 3 | 0.88 | 0.47 | −0.11 | 0.65 | 1.39 | 0.39
ANN | 0.90 | 0.41 | −0.01 | 0.84 | 0.78 | −0.003
Table 8.
Statistical values of significant wave height and wave period under high wave conditions with ANN and Optimal Filter (Case 3)
Data | Wave height correlation | Wave height RMSE | Wave height bias | Wave period correlation | Wave period RMSE | Wave period bias
Case 3 (Hs ≥ 3 m) | 0.66 | 0.94 | −0.49 | 0.66 | 1.00 | 0.19
ANN (Hs ≥ 3 m) | 0.73 | 0.87 | −0.45 | 0.75 | 0.81 | −0.33
References
Datawell. (2009). Wave unit reference manual, Datawell BV Oceanographic Instruments, Netherlands, Datawell.
Jeong, W., Oh, S., Ryu, K., Back, J., & Choi, I. (2018). Establish of wave information network of Korea (WINK). Journal of Korean Society of Coastal and Ocean Engineers, 30(6), 326-336. https://doi.org/10.9765/KSCOE.2018.30.6.326
Jun, H., Min, Y., Jeong, J. Y., & Do, K. (2020). Measurement and quality control of MIROS Wave Radar data at Dokdo. Journal of Korean Society of Coastal and Ocean Engineers, 32(2), 135-145. https://doi.org/10.9765/kscoe.2020.32.2.135
Kim, H., Ahn, K., & Oh, C. (2021). Estimation of significant wave heights from X-band radar based on ANN using CNN rainfall classifier. Journal of Korean Society of Coastal and Ocean Engineers, 33(3), 101-109. http://doi.org/10.9765/KSCOE.2021.33.3.101
Min, Y., Jeong, J., Min, I., Kim, Y., Shim, J., & Do, K. (2018). Enhancement of wave radar observation data quality at the Socheongcho Ocean Research Station. Journal of Coastal Research, 85, 571-575. https://doi.org/10.2112/SI85-115.1
MIROS. (2011). Wave and current radar system handbook, Norway. MIROS.
Park, S., Shin, S., Jung, K., & Lee, B. (2021). Prediction of significant wave height in Korea strait using machine learning. Journal of Ocean Engineering and Technology, 35(5), 336-346. https://doi.org/10.26748/KSOE.2021.021
Shim, J., & Chun, I. (2004). Construction and operation of Ieodo Ocean Research Stations. The Magazine of the Korean Society of Civil Engineers, 52(4), 28-36.
Shim, J., & Min, I. (2007). Construction of IEODO Ocean Research Station and its data analysis. The 1st Proceedings of Ieodo Research, 56-65.
Yun, M., Kim, J., & Do, K. (2022). Estimation of wave-breaking index by learning nonlinear relation using multilayer neural network. Journal of Marine Science and Engineering, 10(1), 50. https://doi.org/10.3390/jmse10010050