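A news classification algorithm based on a second-order hidden Markov model
Sun Xuan, Li Luqun, Jiang Longquan
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Abstract:
This paper proposes a news classification algorithm based on a second-order hidden Markov model (HMM). The algorithm extracts category-indicative words from news content to form a feature word set. This feature word set serves as the observation sequence for the second-order HMM classifiers. The hidden states of a second-order HMM reflect differences in correlation among the words in a document, and each state represents a correlation level of the words occurring in the corpus. Experimental results show that the second-order HMM algorithm outperforms the k-nearest neighbor (kNN), naive Bayes, and support vector machine (SVM) algorithms in classification.
Keywords:  news classification  second-order hidden Markov model (HMM)  term frequency-inverse document frequency  χ² test  feature words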
DOI:10.3969/J.ISSN.1000-5137.2018.04.016
CLC number: TP391
News classification algorithm based on a second-order hidden Markov model
Sun Xuan, Li Luqun, Jiang Longquan
The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Abstract:
A novel algorithm based on a second-order hidden Markov model (HMM) is proposed for classifying news documents. It extracts category-indicative feature words from news content to form a feature set. The feature set serves as the observation sequence for the second-order HMM classifiers, whose hidden states reflect differences in correlation among the words in a document; each state represents a correlation level of the words occurring in the corpus. Experiments show that the proposed second-order HMM classifier outperforms the k-nearest neighbor (kNN), naive Bayes, and support vector machine (SVM) algorithms.
Key words:  news classification  second-order hidden Markov model (HMM)  term frequency-inverse document frequency  χ² test  feature words
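
The abstract does not give the model's parameterization, so the following is only a minimal sketch of how one second-order HMM per category could score a feature-word sequence, with the document assigned to the highest-scoring category. It uses the generic second-order forward recursion, in which each transition depends on the two preceding states. The names `second_order_forward`, `classify`, `pi`, `a1`, `a2`, and `b`, and the encoding of feature words as integer indices, are assumptions made for illustration and are not taken from the paper.

```python
def second_order_forward(obs, pi, a1, a2, b):
    """Likelihood P(obs | model) under a second-order HMM (illustrative).

    pi[i]       : P(s_1 = i)
    a1[i][j]    : P(s_2 = j | s_1 = i), used only for the first transition
    a2[i][j][k] : P(s_t = k | s_{t-2} = i, s_{t-1} = j)
    b[i][o]     : P(o_t = o | s_t = i), with o an integer feature-word index
    """
    n = len(pi)
    if not obs:
        return 1.0
    if len(obs) == 1:
        return sum(pi[i] * b[i][obs[0]] for i in range(n))
    # alpha[i][j] = P(o_1..o_t, s_{t-1} = i, s_t = j), initialised at t = 2
    alpha = [[pi[i] * b[i][obs[0]] * a1[i][j] * b[j][obs[1]]
              for j in range(n)] for i in range(n)]
    for o in obs[2:]:
        # Shift the state pair (i, j) -> (j, k), summing out the oldest state i
        alpha = [[sum(alpha[i][j] * a2[i][j][k] for i in range(n)) * b[k][o]
                  for k in range(n)] for j in range(n)]
    return sum(map(sum, alpha))


def classify(obs, models):
    """Return the label whose model gives obs the highest likelihood.

    models maps a category label to a (pi, a1, a2, b) tuple.
    """
    return max(models,
               key=lambda label: second_order_forward(obs, *models[label]))
```

For realistic document lengths the raw probabilities would underflow, so a practical version would work in log space or rescale `alpha` at each step.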