Performance optimization research based on Hive

Wang Kang¹, Chen Haiguang¹, Li Dongjing²

1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China; 2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Abstract:

This paper studies Hive performance optimization from two aspects: MapReduce job scheduling and Hive performance tuning. For MapReduce, it starts from the programming model, analyzes the execution process, and tunes parameters on the map side and the reduce side. It then examines the Hive framework itself, covering partitioned tables and external tables, compression of common data files, and row-oriented versus column-oriented storage. The experimental results show that Snappy compression and the ORCFile/Parquet storage formats improve query efficiency for column-oriented queries and are well compatible with big data analysis platforms.

Key words: data warehouse; job optimization; performance optimization; compression; storage format

DOI: 10.3969/J.ISSN.1000-5137.2017.04.011

Classification number: TP301
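
The row-oriented versus column-oriented distinction mentioned in the abstract can be illustrated with a small sketch. This is not how Hive, ORCFile, or Parquet encode data: JSON stands in for a real file format, and the sample table, column names, and function names are all hypothetical. It only shows why a query that touches one column may read fewer bytes from a columnar layout.

```python
import json

# Hypothetical sample table (user_id, city, amount); invented for illustration.
ROWS = [
    (1, "Shanghai", 12.5),
    (2, "Nanjing", 7.0),
    (3, "Shanghai", 3.25),
    (4, "Beijing", 9.0),
]
COLUMNS = ("user_id", "city", "amount")


def to_row_layout(rows):
    """Row-oriented layout: each record is serialized as one unit."""
    return [json.dumps(row).encode("utf-8") for row in rows]


def to_column_layout(rows, columns):
    """Column-oriented layout: all values of one column are serialized together."""
    return {
        name: json.dumps([row[i] for row in rows]).encode("utf-8")
        for i, name in enumerate(columns)
    }


def sum_amount_row_layout(row_blobs):
    """Sum `amount` from the row layout; every record is read in full."""
    bytes_read = sum(len(blob) for blob in row_blobs)
    total = sum(json.loads(blob)[2] for blob in row_blobs)
    return total, bytes_read


def sum_amount_column_layout(column_blobs):
    """Sum `amount` from the column layout; only that column's bytes are read."""
    blob = column_blobs["amount"]
    return sum(json.loads(blob)), len(blob)


row_total, row_bytes = sum_amount_row_layout(to_row_layout(ROWS))
col_total, col_bytes = sum_amount_column_layout(to_column_layout(ROWS, COLUMNS))
print(f"row layout:    total={row_total}, bytes read={row_bytes}")
print(f"column layout: total={col_total}, bytes read={col_bytes}")
```

Both layouts return the same total, but the columnar version deserializes only the `amount` column. Real columnar formats add per-column encoding and compression on top of this idea, which the sketch does not attempt to model.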