TISF 2024


190032 Malware with Nowhere to Hide: Identifying Malware with Natural Language Processing Models
Taiwan

This study applies natural language processing techniques to build a model that identifies malicious software. Two datasets were used, one of PE (Portable Executable) files and one of ELF (Executable and Linkable Format) files. Each contains both benign and malicious files, and the malicious samples span a diverse range of malware families. The files were disassembled and preprocessed, and the resulting assembly language files were used as text data to train models that distinguish benign from malicious programs. The study compared the performance of several approaches: bag-of-words models, sequence models, BERT, and different n-gram settings.
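As a rough illustration of treating disassembly as text, the sketch below turns an assembly listing into a token sequence and then into n-grams. The abstract does not describe the actual preprocessing, so everything here is an assumption made for illustration: the listing format, the choice to keep only instruction mnemonics, and the helper names `tokenize_asm` and `ngrams`.

```python
def tokenize_asm(asm_text):
    """Return the mnemonic (first word) of each instruction line.

    Assumption: ';' starts a comment and a line ending in ':' is a label.
    Real disassembler output may need different rules.
    """
    tokens = []
    for line in asm_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):    # skip blank lines and labels
            continue
        tokens.append(line.split()[0].lower())
    return tokens


def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one n-gram string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical x86 snippet, used only as sample input.
sample = """
start:
    push ebp        ; save frame pointer
    mov ebp, esp
    xor eax, eax
    pop ebp
    ret
"""

tokens = tokenize_asm(sample)
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Keeping only mnemonics is one possible design choice; operands could also be kept or normalized, at the cost of a much larger vocabulary.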
The results indicate that the bag-of-words model performs best with multi-hot encoding, achieving an F1-score of 96.87% on the PE dataset. Among the sequence models, the transformer encoder with positional encoding yields the best results. When n-gram sizes are compared, the multi-hot bag-of-words model reaches its highest F1-score with 2-grams, and the TF-IDF bag-of-words model reaches its highest with 5-grams.
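To make the difference between the two bag-of-words encodings concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the toy token lists, the helper names, and the smoothed IDF formula are assumptions chosen for illustration. Multi-hot encoding records only whether a token (or n-gram) occurs, while TF-IDF weights each token by its frequency in the file and its rarity across the dataset.

```python
import math


def build_vocab(docs):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def multi_hot(doc, vocab):
    """1 if the token occurs in the document at all, else 0."""
    vec = [0] * len(vocab)
    for tok in doc:
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec


def tf_idf(doc, docs, vocab):
    """Term frequency times a smoothed inverse document frequency.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, one common variant.
    """
    n_docs = len(docs)
    vec = [0.0] * len(vocab)
    for tok in set(doc):
        if tok not in vocab:
            continue
        tf = doc.count(tok) / len(doc)
        df = sum(1 for d in docs if tok in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vec[vocab[tok]] = tf * idf
    return vec


# Hypothetical token sequences standing in for two disassembled files.
docs = [
    ["push", "mov", "mov", "ret"],
    ["push", "xor", "call", "ret"],
]
vocab = build_vocab(docs)
hot = multi_hot(docs[0], vocab)
weights = tf_idf(docs[0], docs, vocab)
print(vocab)
print(hot)
print(weights)
```

The same functions work on n-grams if each document is first converted to a list of n-gram strings, which is how the 2-gram and 5-gram comparisons could be set up.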