Funding: Scientific Research Innovation Project of the Shanghai Municipal Education Commission (09YZ154)
Research on Chinese web page SVM classifier based on information gain
PAN Zhengcai, CHEN Haiguang
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University
Abstract:
To improve the efficiency and accuracy of text classification for Chinese web pages, this paper addresses the shortcomings of existing feature dimensionality reduction methods and of the traditional information gain method. First, part-of-speech filtering and synonym merging are applied to the feature terms as an initial dimensionality reduction step. Next, an improved information gain method is proposed for computing feature weights. Finally, a Support Vector Machine (SVM) classifier is used to classify the Chinese web pages. Both theoretical analysis and experimental results show that the proposed method achieves better performance and classification results than the traditional method.
Keywords:  information gain method  part-of-speech filtering  synonym merging  feature weighting  Support Vector Machine
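
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of the traditional (textbook) information gain score for a single term, assuming each document is represented as a set of terms. It is not the improved method proposed in the paper, whose details are not given in the abstract. The names `entropy`, `information_gain`, `DOCS`, and `LABELS` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Traditional information gain of `term` with respect to the class labels.

    docs   -- list of sets of terms (one set per document); assumed format
    labels -- list of class labels, aligned with docs
    """
    n = len(docs)
    # Split the labels by whether the term is present in the document.
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]

    # IG(t) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)]
    h_class = entropy(Counter(labels).values())
    h_conditional = sum(
        (len(part) / n) * entropy(Counter(part).values())
        for part in (with_term, without_term)
        if part
    )
    return h_class - h_conditional


# Toy data, invented for illustration only (not from the paper).
DOCS = [
    {"football", "match"},
    {"football", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
LABELS = ["sports", "sports", "finance", "finance"]

if __name__ == "__main__":
    for term in ("football", "team"):
        print(term, information_gain(DOCS, LABELS, term))
```

In this toy data, "football" appears only in one class and therefore separates the classes perfectly, while "team" appears equally in both classes and carries no class information. Scores of this kind could serve as feature weights before training an SVM, although how the paper combines them with part-of-speech filtering and synonym merging is not stated in the abstract.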