The developed algorithm was determined to operate in real time, as it could classify task situations at a rate of 10 Hz when 100 ms of acoustic data were input. Audio data not used during the training process were recorded using a Sennheiser Profile microphone, and the classifications made by a human operator and by the trained models were compared. In this case, the microphone was configured to collect data in the PCM format at a sampling rate of 44,100 Hz, consistent with the settings used for collecting the training data. These results are illustrated in Fig. 9. Relative to the manual classification of operational situations (Fig. 9(a)), the predictions of the trained RNN model (Fig. 9(b)) achieved an accuracy of 84%, while those of the LSTM model (Fig. 9(c)) achieved an accuracy of 88%. The prediction process for both models required a total operation time of 5 ms, comprising 4 ms for extracting acoustic features and 1 ms for predicting the class, confirming that predictions could be performed at a maximum frequency of 200 Hz.
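As an illustration of the frame-based pipeline described above, the following Python sketch reads 100 ms blocks of 44,100 Hz PCM audio from a microphone and classifies each block, which corresponds to the stated 10 Hz rate; the MFCC front end, the extract_features helper, and the classifier.h5 model file are illustrative assumptions rather than the implementation used in this study.

```python
# Minimal sketch of the 10 Hz frame-based inference loop.
# The MFCC front end, extract_features, and "classifier.h5" are
# illustrative assumptions, not the paper's exact implementation.
import numpy as np
import sounddevice as sd
import librosa
import tensorflow as tf

SAMPLE_RATE = 44_100                      # PCM sampling rate used for recording
FRAME_SEC = 0.1                           # 100 ms of audio per classification
FRAME_LEN = int(SAMPLE_RATE * FRAME_SEC)
CLASSES = ["idling", "cutting", "hard cutting", "base"]

model = tf.keras.models.load_model("classifier.h5")   # hypothetical trained model

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Assumed acoustic front end: MFCCs over one 100 ms frame."""
    mfcc = librosa.feature.mfcc(y=frame, sr=SAMPLE_RATE, n_mfcc=13)
    return mfcc.T[np.newaxis, ...]        # shape (1, time_steps, n_mfcc)

def callback(indata, frames, time_info, status):
    feats = extract_features(indata[:, 0].astype(np.float32))
    probs = model.predict(feats, verbose=0)[0]
    print(CLASSES[int(np.argmax(probs))])

# blocksize = one 100 ms frame, so the callback fires roughly 10 times per second
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                    blocksize=FRAME_LEN, callback=callback):
    sd.sleep(10_000)                      # run for 10 s
```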
The learning process for both the RNN and LSTM models was represented by the average loss (Figs. 10(a) and 10(c)) and the average accuracy for the test set (Figs. 10(b) and 10(d)), plotted after each of the ten training epochs. The performance metrics for the models' training are presented in Tables 1 and 2 for the RNN and LSTM models, respectively.
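For reference, per-epoch curves of this kind can be logged as in the sketch below; the layer sizes, feature dimensions, and randomly generated stand-in data are assumptions, not the configuration of section 4.1.

```python
# Sketch of logging the average loss and test-set accuracy after each of
# ten training epochs (cf. Fig. 10). All sizes and data are placeholders.
import numpy as np
import tensorflow as tf

N_CLASSES = 4        # idling, cutting, hard cutting, base
N_FEATURES = 13      # assumed MFCC dimensionality
TIME_STEPS = 9       # assumed frames per 100 ms window

# Random stand-ins for the actual acoustic feature sequences and labels.
x_train = np.random.rand(1000, TIME_STEPS, N_FEATURES).astype("float32")
y_train = np.random.randint(0, N_CLASSES, 1000)
x_test = np.random.rand(200, TIME_STEPS, N_FEATURES).astype("float32")
y_test = np.random.randint(0, N_CLASSES, 200)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(TIME_STEPS, N_FEATURES)),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test), verbose=0)
for epoch, (loss, acc) in enumerate(zip(history.history["loss"],
                                        history.history["val_accuracy"]), 1):
    print(f"epoch {epoch:2d}  avg loss {loss:.4f}  test accuracy {acc:.4f}")
```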
In this study, data imbalance and the influence of the base class were prominently observed. First, the models exhibited the lowest performance for the hard cutting class, which had the smallest amount of corresponding data; the models therefore struggled to learn this class effectively. In contrast, for the cutting class, the precision, recall, and F1 score (the harmonic mean of precision and recall) were all high, owing to the abundance of training data, with 4,333 frames available for learning. Second, for the base class, the precision and recall were relatively low, at 78 and 66, respectively, for the LSTM model. This is likely because the base class did not belong to any of the three primary classes, leading to confusion with the other classes during prediction. The confusion matrix derived from the test set (Fig. 11) confirmed these predictions for the base class.
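These per-class metrics and the confusion matrix can be computed from the test-set labels and predictions with standard tooling, as in the hypothetical sketch below using scikit-learn; the label arrays shown are placeholders.

```python
# Sketch of the per-class evaluation: precision, recall, F1
# (F1 = 2 * precision * recall / (precision + recall)), and the confusion matrix.
# `y_true` and `y_pred` stand in for the test-set labels and model predictions.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASSES = ["idling", "cutting", "hard cutting", "base"]

y_true = np.random.randint(0, len(CLASSES), 500)   # placeholder labels
y_pred = np.random.randint(0, len(CLASSES), 500)   # placeholder predictions

print(classification_report(y_true, y_pred, target_names=CLASSES))
print(confusion_matrix(y_true, y_pred))            # rows: true class, cols: predicted
```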
As previously described in section 3.2, the LSTM model is known to demonstrate superior performance compared to the conventional RNN. To verify whether this holds for the actual learning results, we compared the two models using the same hyperparameters, model structure, and data as outlined in section 4.1. The comparison was based on the confusion matrices (Fig. 11) and on the precision, recall, and F1 score evaluation tables for the RNN (Fig. 11(a), Table 1) and the LSTM (Fig. 11(b), Table 2). The RNN displayed distinct characteristics compared with the LSTM model: the two models performed similarly on the idling and cutting classes, which had the largest amounts of training data, but for the hard cutting class, which is similar to the cutting class, the RNN showed lower precision and recall than the LSTM. This confirms that the RNN is not suitable for the classification task in this study.
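A sketch of how such a like-for-like comparison can be set up is given below, with only the recurrent cell differing between the two models; the layer width, feature shape, and optimizer are assumptions rather than the exact settings of section 4.1.

```python
# Sketch of building the RNN and LSTM classifiers with identical
# hyperparameters and structure so that only the recurrent cell differs.
# Layer width, feature shape, and optimizer settings are assumptions.
import tensorflow as tf

def build_model(cell: str, time_steps: int = 9, n_features: int = 13,
                units: int = 64, n_classes: int = 4) -> tf.keras.Model:
    recurrent = {"rnn": tf.keras.layers.SimpleRNN,
                 "lstm": tf.keras.layers.LSTM}[cell]
    model = tf.keras.Sequential([
        recurrent(units, input_shape=(time_steps, n_features)),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

rnn_model = build_model("rnn")    # conventional RNN (SimpleRNN cell)
lstm_model = build_model("lstm")  # LSTM cell; everything else identical
```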