A Study on the Application of Self-Supervised Learning to Taiwanese Sign Language Recognition: Sign Language Pretrained Transformer
Previous research on Taiwanese Sign Language (TSL) recognition relied on supervised
learning, which requires large amounts of labeled data and therefore limits the recognizable vocabulary size. To address this issue, we took inspiration from the masking concept in the language representation model BERT. Our idea is to randomly mask a certain number of frames in unlabeled TSL videos, so that the model learns the features of TSL by predicting the masked frames. Transfer learning is then applied to train the TSL recognition model. The resulting model achieved a recognizable vocabulary of 242 words with an accuracy of 94.8%.
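The sketch below illustrates the masked-frame pretraining idea in the style described above; it is not the authors' exact architecture. All names and dimensions (MaskedFramePretrainer, frame_dim, mask_ratio, the learned [MASK] embedding, the Transformer encoder size) are illustrative assumptions.

import torch
import torch.nn as nn

class MaskedFramePretrainer(nn.Module):
    def __init__(self, frame_dim=128, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(frame_dim, d_model)              # frame features -> model space
        self.mask_token = nn.Parameter(torch.zeros(d_model))   # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, frame_dim)               # reconstruct original features

    def forward(self, frames, mask_ratio=0.15):
        # frames: (batch, seq_len, frame_dim) features from unlabeled TSL videos
        x = self.proj(frames)
        mask = torch.rand(frames.shape[:2], device=frames.device) < mask_ratio
        # replace the selected frames with the [MASK] embedding
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        pred = self.head(self.encoder(x))
        # BERT-style objective: reconstruct only the masked positions
        return nn.functional.mse_loss(pred[mask], frames[mask])

# toy usage: one batch of 2 videos, 50 frames each
model = MaskedFramePretrainer()
loss = model(torch.randn(2, 50, 128))
loss.backward()

After pretraining in this fashion, the encoder weights would be reused (transfer learning) and fine-tuned with a classification head on the labeled TSL vocabulary.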
Moreover, there has been no prior research on TSL sentence translation. To address this gap, we
designed a TSL translation system built on the TSL recognition model. The system achieved 88% translation accuracy on 100 sentences, with a BLEU-4 score of 20.98. This research demonstrates that the self-supervised learning approach is effective for both TSL recognition and translation. With this method, the model requires fewer labeled samples to train, and the recognizable vocabulary is easier to expand.
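For reference, a corpus-level BLEU-4 score such as the one reported could be computed as sketched below; the sentences are made-up placeholders, not examples from the TSL dataset.

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["he", "goes", "to", "school"]]]   # one (or more) reference per sentence
hypotheses = [["he", "go", "to", "school"]]       # system output, tokenized

# BLEU-4 uses uniform weights over 1- to 4-grams; smoothing avoids zero scores
# when a short sentence has no 4-gram match.
score = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score * 100:.2f}")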