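Abstract:
This paper studies the classification of moral education texts in Shanghai primary and secondary school textbooks and proposes IoMET_BBA, a model built on the bidirectional encoder representations from transformers (BERT) pre-trained model, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism. Data are augmented with the synthetic minority oversampling technique (SMOTE) and exploratory data analysis (EDA). BERT generates semantic vectors rich in contextual information, BiLSTM extracts features, the attention mechanism supplies word weight information, and a fully connected layer performs the classification. Comparative experiments show that IoMET_BBA reaches an F1 score of 86.14%, outperforming the other models, and can accurately evaluate moral education texts in textbooks.
Key words: moral education indicators; Chinese text classification; bidirectional encoder representations from transformers (BERT) model; bidirectional long short-term memory (BiLSTM) network; attention mechanism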
DOI: 10.3969/J.ISSN.1000-5137.2024.02.005
Classification number: TP391.1
Funding: National Social Science Fund of China (13JZD046)
Text classification method for textbook moral education based on deep learning |
CHEN Haomiao, CHEN Junhua
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Abstract:
The classification of moral education texts in Shanghai primary and secondary school textbooks was studied, and the IoMET_BBA (indicators of moral education target based on BERT, BiLSTM and attention) model was proposed. The model builds on the bidirectional encoder representations from transformers (BERT) pre-trained model, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism. First, data augmentation was performed using the synthetic minority oversampling technique (SMOTE) and exploratory data analysis (EDA). Second, BERT was used to generate semantic vectors with rich contextual information. Third, BiLSTM was adopted to extract features, and the attention mechanism was applied to obtain word weight information. Finally, classification was performed through a fully connected layer. Comparative experiments showed that IoMET_BBA reached an F1 score of 86.14%, higher than that of the other models compared, indicating that the model can accurately evaluate moral education texts in textbooks.
Key words: moral education index; Chinese text classification; bidirectional encoder representations from transformers (BERT) model; bidirectional long short-term memory (BiLSTM) network; attention mechanism
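
The attention and classification steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the attention formulation used in IoMET_BBA, so this example assumes plain dot-product scoring against a context vector; the names `attention_pool`, `classify`, `context`, `W`, and `b` are hypothetical, and the small hand-written vectors stand in for BiLSTM outputs computed over BERT embeddings.

```python
import math


def attention_pool(hidden_states, context):
    """Weight each token vector by softmax(dot(h, context)) and sum them.

    hidden_states: list of per-token feature vectors (stand-ins for BiLSTM outputs).
    context: a query vector of the same dimension (an assumption of this sketch).
    Returns (weights, pooled): one weight per token and the weighted-sum vector.
    """
    # Dot-product relevance score for every token.
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden_states]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of token vectors gives one sentence-level vector.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


def classify(pooled, W, b):
    """Fully connected layer followed by argmax over class logits."""
    logits = [sum(w_d * x_d for w_d, x_d in zip(row, pooled)) + bias
              for row, bias in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)


# Three toy "token" vectors; the third aligns best with the context vector.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 1.0]
weights, pooled = attention_pool(hidden_states, context)

# Two-class toy classifier (weights chosen by hand for illustration only).
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
label = classify(pooled, W, b)
```

In this toy input the third token receives the largest weight because it has the highest dot product with `context`. In a trained model the context vector and the layer parameters would be learned rather than fixed by hand.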