Research on Chinese web page SVM classifier based on information gain

PAN Zhengcai, CHEN Haiguang

College of Information, Mechanical and Electrical Engineering, Shanghai Normal University

Abstract:

To improve the efficiency and accuracy of text classification, this paper addresses shortcomings of existing feature dimensionality reduction methods and of the traditional information gain method in classifying Chinese web pages. First, part-of-speech filtering and synonym merging are applied to the feature items as an initial dimensionality reduction. Then, an improved information gain method is proposed to compute the feature weights. Finally, a Support Vector Machine (SVM) classifier is used to classify the Chinese web pages. Both theoretical analysis and experimental results show that the proposed method achieves better performance and classification effectiveness than the traditional method.

Key words: information gain method; part-of-speech filtering; synonym merging; feature weighting; Support Vector Machine

Fund project: Research and Innovation Project of the Shanghai Municipal Education Commission (09YZ154)
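The abstract does not state the formula of its improved information gain method, so the sketch below shows only the traditional baseline it modifies, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), preceded by part-of-speech filtering and synonym merging as an initial reduction step. It is a minimal illustration under stated assumptions, not the authors' method: the kept POS tags (`KEEP_POS`), the synonym table (`SYNONYMS`), the helper names, and the toy corpus are all hypothetical, and the input is assumed to be already word-segmented and POS-tagged.

```python
import math
from collections import Counter

# ASSUMPTION: illustrative POS tag set and synonym table, not taken from the paper.
KEEP_POS = {"n", "v"}          # keep nouns and verbs only
SYNONYMS = {"电脑": "计算机"}   # merge each synonym into one canonical term


def reduce_features(tagged_doc):
    """Initial dimensionality reduction: POS filtering, then synonym merging.

    tagged_doc is a list of (word, pos_tag) pairs; returns the set of kept terms.
    """
    return {SYNONYMS.get(word, word) for word, pos in tagged_doc if pos in KEEP_POS}


def entropy(counts, total):
    """Shannon entropy in bits of a class distribution given raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Traditional IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    n_not = n - n_t
    ig = entropy(Counter(labels).values(), n)   # H(C)
    if n_t:
        ig -= (n_t / n) * entropy(with_t.values(), n_t)           # P(t) H(C|t)
    if n_not:
        ig -= (n_not / n) * entropy(without_t.values(), n_not)    # P(not t) H(C|not t)
    return ig


# Hypothetical toy corpus: pre-segmented, POS-tagged pages with class labels.
raw = [
    [("电脑", "n"), ("的", "u"), ("价格", "n")],
    [("计算机", "n"), ("很", "d"), ("快", "a")],
    [("球队", "n"), ("比赛", "v")],
    [("比赛", "v"), ("结果", "n")],
]
labels = ["tech", "tech", "sports", "sports"]

docs = [reduce_features(d) for d in raw]
vocab = sorted(set().union(*docs))
# Use each term's IG score as its weight, highest first.
weights = {t: information_gain(docs, labels, t) for t in vocab}
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(term, round(w, 3))
```

Here "电脑" is merged into "计算机", so that term occurs only in the "tech" pages and receives the maximum weight of 1 bit, while function words such as "的" never reach the vocabulary. In the paper the resulting weighted feature vectors are passed to an SVM classifier; that step and the authors' specific modification to IG are not reproduced here.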