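Session identification methods in web log mining
YUAN Yi, CHEN Haiguang
Shanghai Normal University
Abstract:
By analyzing and comparing traditional web session identification methods, this work improves on the currently most common approach, session identification based on a time threshold, and proposes a session identification method based on a dynamic threshold. The algorithm combines two dynamically computed quantities: the average time interval between request records in a session and the average size of the pages in the session. The threshold is adjusted dynamically according to the characteristics of the user and the web pages. In contrast to a traditional single a priori threshold, the method produces a dynamic threshold that depends on which user is visiting which pages, and so makes full use of user and web page information. Experiments confirm that the method identifies more user sessions, and that its session identification accuracy and recall are both higher than those of the traditional algorithm.
Keywords:  web mining  session identification  time threshold  data preprocessing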
Method of session identification in web log mining
YUAN Yi, CHEN Haiguang
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University
Abstract:
This paper analyzes and compares traditional web session identification methods and improves on the most commonly used one, which is based on a time threshold. It proposes a session identification method based on a dynamic threshold. The algorithm dynamically computes the average time interval between request records in a session and the average size of the pages in the session, and combines the two to adjust the threshold according to the characteristics of the user and the web pages. Compared with the traditional single a priori threshold, this method generates a dynamic threshold as different users access different pages, and makes full use of user and web page information. Experiments show that the method identifies more user sessions, and that its session identification accuracy and recall are higher than those of the traditional algorithm.
Key words:  web mining  session identification  threshold  data preprocessing
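
The contrast between a single a priori time threshold and a dynamic threshold can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything specific here is an assumption made for illustration: the `Request` record, the function names, the parameters `base`, `k`, `ref_size`, `floor` and `cap`, and the way the mean gap and mean page size are combined. It is not the authors' algorithm, only one plausible shape of the idea for the time-ordered requests of a single user.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    timestamp: float  # seconds; requests are assumed sorted by time
    size: int         # response size in bytes


def sessions_fixed(requests: List[Request], timeout: float = 1800.0) -> List[List[Request]]:
    """Baseline: split whenever the gap exceeds one fixed, a priori timeout."""
    sessions: List[List[Request]] = []
    for req in requests:
        if sessions and req.timestamp - sessions[-1][-1].timestamp <= timeout:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def dynamic_threshold(session: List[Request], base: float = 600.0, k: float = 3.0,
                      ref_size: float = 50_000.0, floor: float = 60.0,
                      cap: float = 1800.0) -> float:
    """Hypothetical threshold derived from the current session.

    Assumed rule: k times the mean inter-request gap, scaled up for larger
    pages (which plausibly take longer to read), clamped to [floor, cap].
    """
    if len(session) < 2:
        return base  # no gaps observed yet, fall back to a default
    gaps = [b.timestamp - a.timestamp for a, b in zip(session, session[1:])]
    mean_gap = sum(gaps) / len(gaps)
    mean_size = sum(r.size for r in session) / len(session)
    size_factor = 1.0 + mean_size / ref_size
    return min(cap, max(floor, k * mean_gap * size_factor))


def sessions_dynamic(requests: List[Request], **params: float) -> List[List[Request]]:
    """Split whenever the gap exceeds the threshold computed from the session so far."""
    sessions: List[List[Request]] = []
    for req in requests:
        if sessions:
            current = sessions[-1]
            gap = req.timestamp - current[-1].timestamp
            if gap <= dynamic_threshold(current, **params):
                current.append(req)
                continue
        sessions.append([req])
    return sessions


if __name__ == "__main__":
    reqs = [Request(t, 10_000) for t in (0, 10, 20, 1000, 1010)]
    print(len(sessions_fixed(reqs)), [len(s) for s in sessions_dynamic(reqs)])
```

With these assumed parameters, a user who clicks every 10 seconds and then pauses for about 16 minutes stays in one session under the fixed 30-minute timeout but is split into two sessions by the dynamic rule, which is the kind of behavior the abstract describes as identifying more sessions.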