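Abstract: |
By analyzing and comparing traditional web session identification methods, this paper improves the most commonly used one, which is based on a time threshold, and proposes a session identification method based on a dynamic threshold. The algorithm combines two dynamically computed quantities: the average time interval between request records in a session and the average size of the pages in that session. It uses them to adjust the threshold according to the characteristics of the user and the web pages. Unlike the traditional single a priori threshold, the method generates a different threshold for different users visiting different pages, making full use of user and web page information. Experiments show that the method identifies more user sessions, and that its session identification accuracy and recall are both higher than those of the traditional algorithm. |
Keywords: web mining; session identification; time threshold; data preprocessing |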
DOI: |
Classification number: |
Funding: |
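For reference, the traditional time-threshold method that the abstract sets out to improve can be sketched as follows. This is a minimal illustration that assumes the input is one user's request timestamps in seconds and that a single fixed a priori threshold applies. The function name `split_sessions_fixed` and the 1800-second (30-minute) default are illustrative choices and are not values taken from this paper.

```python
from typing import List


def split_sessions_fixed(timestamps: List[float], threshold: float = 1800.0) -> List[List[float]]:
    """Split one user's request timestamps (seconds) into sessions.

    A new session starts whenever the gap to the previous request exceeds
    the fixed threshold. The 1800 s default is an assumed, commonly used value.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):  # process requests in time order
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)  # gap small enough: same session
        else:
            sessions.append([t])  # first request or gap too large: new session
    return sessions


print(split_sessions_fixed([0, 600, 4000, 4100]))
```

Because every user and every page share the same threshold, a fast reader and a slow reader are treated identically. This is the limitation the dynamic-threshold approach targets.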
|
Method of session identification in web log mining |
YUAN Yi, CHEN Haiguang
|
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University
|
Abstract: |
By analyzing and comparing traditional web session identification methods, this paper improves the most commonly used one, which is based on a time threshold, and proposes a session identification method based on a dynamic threshold. The algorithm dynamically computes the average time interval between request records in a session and the average size of the pages in the session, and combines the two to adjust the threshold according to the characteristics of users and web pages. Compared with the traditional single a priori threshold, the method generates a threshold that adapts to different users accessing different pages, making full use of user and web page information. Experiments show that the method identifies more user sessions, and that both its session identification accuracy and its recall are higher than those of the traditional algorithm. |
Key words: web mining; session identification; time threshold; data preprocessing
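The abstract does not give the exact formula for the dynamic threshold. The following is therefore only a hypothetical sketch of one way to combine the two quantities it names: the running average time interval between requests in the current session, and the average page size in that session. Everything in the sketch is an assumption for illustration and is not the authors' method. This includes the rule `threshold = alpha * mean_gap * (prev_page_size / mean_page_size)`, clamped to `[t_min, t_max]`, the fallback to `default_threshold` before any gap has been observed, and the names `Request` and `split_sessions_dynamic`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    timestamp: float  # seconds; requests are assumed to belong to one user
    page_size: int    # size of the requested page, e.g. in bytes


def split_sessions_dynamic(requests: List[Request], default_threshold: float = 1800.0,
                           alpha: float = 3.0, t_min: float = 60.0,
                           t_max: float = 3600.0) -> List[List[Request]]:
    """Split time-ordered requests into sessions using a per-gap dynamic threshold.

    HYPOTHETICAL rule (not from the paper):
      threshold = alpha * mean_gap * (prev_page_size / mean_page_size),
    clamped to [t_min, t_max]. Until the session has at least one observed
    gap, the fixed default_threshold is used instead.
    """
    sessions: List[List[Request]] = []
    current: List[Request] = []
    gaps: List[float] = []  # gaps observed so far inside the current session
    for req in requests:
        if not current:  # very first request starts the first session
            current = [req]
            continue
        prev = current[-1]
        gap = req.timestamp - prev.timestamp
        if gaps:
            mean_gap = sum(gaps) / len(gaps)
            mean_size = sum(r.page_size for r in current) / len(current)
            # Larger-than-average previous page -> allow a longer pause.
            size_factor = prev.page_size / mean_size if mean_size > 0 else 1.0
            threshold = min(t_max, max(t_min, alpha * mean_gap * size_factor))
        else:
            threshold = default_threshold  # no session statistics yet
        if gap > threshold:
            sessions.append(current)  # close the session, start a new one
            current, gaps = [req], []
        else:
            current.append(req)
            gaps.append(gap)
    if current:
        sessions.append(current)
    return sessions


reqs = [Request(t, 1000) for t in (0, 10, 20, 30, 200, 210)]
print([[r.timestamp for r in s] for s in split_sessions_dynamic(reqs)])
```

In this toy run, a 170-second pause after a burst of 10-second gaps ends the session. The fixed 30-minute rule would not split there. Scaling by the previous page's size encodes the assumption that a larger page may take longer to read.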