Taiwan
Now that computers are in widespread use, many people in Taiwan type Chinese with the "Zhuyin input method". A user who forgets to switch input methods, however, may type in English mode by mistake and produce unreadable text. For example, typing the five characters "今天天氣好" with the English input method active produces "rup wu0 wu0 fu4cl3". Text of this kind is very hard to understand and easily causes confusion in reading and communication. The purpose of this research is to study methods for translating the mixed English letters and numerical symbols typed without switching to the "Zhuyin input method" back into Chinese characters. This research uses Chinese sentences from the "PTT Chinese corpus" and the "Wikipedia Chinese database" to train GRU, BiGRU, LSTM, and Transformer models and to run the Viterbi algorithm, and compares the results with an existing conversion method, Google Input Tools. In terms of scores, the best machine learning models are the LSTM model and the Transformer model. Overall, results obtained with the PTT Chinese corpus are better than those obtained with the Wikipedia Chinese database. The Viterbi algorithm computed with the PTT Chinese corpus has the highest BLEU-4 score, and both its accuracy and its BLEU-4 score, 0.94 and 88.3 points respectively, are higher than those of Google Input Tools. This indicates that the Viterbi algorithm is the best of the methods compared in this research. The results of this work have a wide range of applications and can be used for online translation or for real-time translation in chat software. Anyone who frequently switches between the English and "Zhuyin input" methods may find this kind of translator useful. The program code used here is open source and available on GitHub.
Besides downloading and using it, users can train their own models or adjust the program to improve its results based on user feedback.
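
The conversion described above can be illustrated with a minimal sketch; it is not the released code. It assumes the standard Zhuyin keyboard layout, in which the space bar enters the first tone, and it uses a small hand-made candidate table and invented bigram probabilities in place of statistics estimated from the PTT or Wikipedia corpora. The names `KEY_TO_ZHUYIN`, `TONE_KEYS`, `CANDIDATES`, `BIGRAM`, `split_syllables`, and `viterbi` are hypothetical and chosen only for this example.

```python
import math

# Standard Zhuyin keyboard layout: QWERTY key -> Zhuyin symbol.
KEY_TO_ZHUYIN = {
    "1": "ㄅ", "q": "ㄆ", "a": "ㄇ", "z": "ㄈ", "2": "ㄉ", "w": "ㄊ",
    "s": "ㄋ", "x": "ㄌ", "e": "ㄍ", "d": "ㄎ", "c": "ㄏ", "r": "ㄐ",
    "f": "ㄑ", "v": "ㄒ", "5": "ㄓ", "t": "ㄔ", "g": "ㄕ", "b": "ㄖ",
    "y": "ㄗ", "h": "ㄘ", "n": "ㄙ", "u": "ㄧ", "j": "ㄨ", "m": "ㄩ",
    "8": "ㄚ", "i": "ㄛ", "k": "ㄜ", ",": "ㄝ", "9": "ㄞ", "o": "ㄟ",
    "l": "ㄠ", ".": "ㄡ", "0": "ㄢ", "p": "ㄣ", ";": "ㄤ", "/": "ㄥ",
    "-": "ㄦ",
}
# Tone keys end a syllable; the space bar is the (unmarked) first tone.
TONE_KEYS = {" ": "", "6": "ˊ", "3": "ˇ", "4": "ˋ", "7": "˙"}

# Toy data (assumption): candidate characters per syllable and a few
# invented bigram probabilities. Real values would come from a corpus.
CANDIDATES = {
    "ㄐㄧㄣ": ["今", "金"], "ㄊㄧㄢ": ["天", "添"],
    "ㄑㄧˋ": ["氣", "器"], "ㄏㄠˇ": ["好", "郝"],
}
BIGRAM = {
    ("<s>", "今"): 0.6, ("今", "天"): 0.9, ("天", "天"): 0.5,
    ("天", "氣"): 0.8, ("氣", "好"): 0.4,
}


def split_syllables(keys):
    """Turn mistyped keystrokes into Zhuyin syllables, one per tone key."""
    syllables, current = [], ""
    for ch in keys:
        if ch in TONE_KEYS:
            if current:
                syllables.append(current + TONE_KEYS[ch])
            current = ""
        else:
            current += KEY_TO_ZHUYIN[ch]  # KeyError on keys outside the layout
    if current:
        syllables.append(current)
    return syllables


def viterbi(syllables, floor=1e-3):
    """Pick the most probable character sequence under the bigram model."""
    # best[c] = (log-probability, text) of the best path ending in character c
    best = {"<s>": (0.0, "")}
    for syl in syllables:
        step = {}
        for cand in CANDIDATES[syl]:
            step[cand] = max(
                (score + math.log(BIGRAM.get((prev, cand), floor)), text + cand)
                for prev, (score, text) in best.items()
            )
        best = step
    return max(best.values())[1]


syllables = split_syllables("rup wu0 wu0 fu4cl3")
print(syllables)           # ['ㄐㄧㄣ', 'ㄊㄧㄢ', 'ㄊㄧㄢ', 'ㄑㄧˋ', 'ㄏㄠˇ']
print(viterbi(syllables))  # 今天天氣好
```

Because each step keeps only the best path ending in each candidate character, the work grows linearly with sentence length rather than with the number of possible character combinations, which is one reason a decoder of this kind is a natural fit for real-time use in chat software.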