Impala VS Spark

Impala vs Spark performance test

Impala version: 2.3 (CDH); 4 server nodes

Spark version: 1.6.2; 5 executors, each with 10 cores and 20 GB of memory
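The post does not say how the Spark executors were requested. As a minimal sketch, assuming Spark ran on YARN (the cluster manager for which `--num-executors` applies), the stated resources map onto standard `spark-submit` flags as shown below. `benchmark_job.py` is a hypothetical application name. The helper also totals what the five executors reserve.

```python
# Sketch: translate the stated Spark resources into spark-submit flags.
# Assumptions: YARN cluster manager; "benchmark_job.py" is a hypothetical app.

NUM_EXECUTORS = 5
CORES_PER_EXECUTOR = 10
MEMORY_PER_EXECUTOR_GB = 20


def spark_submit_args(app: str) -> list[str]:
    """Build the spark-submit argument list for the benchmark cluster."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(NUM_EXECUTORS),
        "--executor-cores", str(CORES_PER_EXECUTOR),
        "--executor-memory", f"{MEMORY_PER_EXECUTOR_GB}g",
        app,
    ]


def total_resources() -> tuple[int, int]:
    """Return (total executor cores, total executor memory in GB)."""
    return (NUM_EXECUTORS * CORES_PER_EXECUTOR,
            NUM_EXECUTORS * MEMORY_PER_EXECUTOR_GB)


if __name__ == "__main__":
    print(" ".join(spark_submit_args("benchmark_job.py")))
    print(total_resources())  # (50, 100)
```

The 100 GB total counts executor heap only; on YARN each executor container also reserves some memory overhead on top of `--executor-memory`.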

Table with 180 million rows

| SQL | Spark SQL (avg. of 3 runs) | Impala (avg. of 3 runs) |
|---|---|---|
| `select * from dadddcb8179f4345a53819a7f75abdfb where fk87561dc6 > 131097 limit 1000` | 1s | 0.9s |
| `select fkeefe9a40, fkeb26f85b, count(1) from dadddcb8179f4345a53819a7f75abdfb group by fkeefe9a40, fkeb26f85b limit 1000` | 3.5s | 5.1s |
| `select fkeefe9a40, fkeb26f85b, count(1) from dadddcb8179f4345a53819a7f75abdfb where fkeefe9a40 > '2015-03-06' and fkeefe9a40 < '2015-12-06' group by fkeefe9a40, fkeb26f85b limit 1000` | 3.5s | 5.2s |
| `select fkeefe9a40, fkca691145, sum(cast(fk87561dc6 as int)) as total from dadddcb8179f4345a53819a7f75abdfb where fkeb26f85b in ('4490','3588') group by fkeefe9a40, fkca691145 order by total desc limit 1000` | 3.9s | 3.2s |
| `select fkeefe9a40, fk1412e410, max(fk87561dc6), min(fk87561dc6) from dadddcb8179f4345a53819a7f75abdfb where fkeefe9a40 > '2015-03-06' group by fkeefe9a40, fk1412e410 limit 1000` | 9.5s | 13.5s |
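Every timing in these tables is the mean of three runs. The post does not show its measurement script. A minimal sketch of that methodology could look like the following. It assumes each engine is reached through some callable that executes one query and blocks until the results are fetched; `run_query` is hypothetical, for example a wrapper around a JDBC/ODBC cursor.

```python
import statistics
import time
from typing import Callable


def average_runtime(run_query: Callable[[str], object], sql: str, runs: int = 3) -> float:
    """Execute `sql` `runs` times and return the mean wall-clock time in seconds.

    `run_query` is a hypothetical callable that submits the query and
    blocks until all result rows have been fetched.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(sql)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Stand-in engine that sleeps 50 ms per query, so the mean is close to 0.05 s.
    fake_engine = lambda sql: time.sleep(0.05)
    print(f"{average_runtime(fake_engine, 'select 1'):.3f} s")
```

The first run is often slower because caches are still cold, so a plain mean of three mixes cold and warm timings. Some benchmarks discard a warm-up run for that reason.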

Table with 10 million rows

| SQL | Spark SQL (avg. of 3 runs) | Impala (avg. of 3 runs) |
|---|---|---|
| `select * from 6e0643254e9848f5b71c676649e336cb where fk153ddc37 > 1000 limit 1000` | 0.58s | 0.6s |
| `select fkfe37137d,fk49466f92,count(1) from 6e0643254e9848f5b71c676649e336cb group by fkfe37137d,fk49466f92 limit 1000` | 3.2s | 3.4s |
| `select fkfe37137d,fk49466f92,count(1) from 6e0643254e9848f5b71c676649e336cb where fkfe37137d > '2017-01-10' and fkfe37137d < '2017-03-20' group by fkfe37137d,fk49466f92 limit 1000` | 1.7s | 1.6s |
| `select fkfe37137d, fk49466f92, sum(cast(fk153ddc37 as int)) as total from 6e0643254e9848f5b71c676649e336cb where fkd2f91355 in ('e339058d8ddb7bd89a3e939e54372f38', 'eab700943c25bd7f522cb9c7a187e344') group by fkfe37137d, fk49466f92 order by total desc limit 1000` | 2.9s | 1.8s |
| `select fkfe37137d, fk1dba2b17, max(fk153ddc37), min(fk153ddc37) from 6e0643254e9848f5b71c676649e336cb where fkfe37137d > '2017-01-10' group by fkfe37137d, fk1dba2b17 limit 1000` | 2.3s | 1.7s |

Table with 1 million rows

| SQL | Spark SQL (avg. of 3 runs) | Impala (avg. of 3 runs) |
|---|---|---|
| `select * from o49ed7e906424d90a0ea2f11bea5e1db where fk0770b65e > 1000 limit 1000` | 0.35s | 0.14s |
| `select fk28a55b60,fk418dc234,count(1) from o49ed7e906424d90a0ea2f11bea5e1db group by fk28a55b60,fk418dc234 limit 1000` | 1.4s | 1.6s |
| `select fk28a55b60,fk418dc234,count(1) from o49ed7e906424d90a0ea2f11bea5e1db where fk28a55b60 > '2017-02-10' and fk28a55b60 < '2017-03-20' group by fk28a55b60,fk418dc234 limit 1000` | 0.6s | 1.1s |
| `select fk28a55b60, fk8b32095b, sum(cast(fk0770b65e as int)) as total from o49ed7e906424d90a0ea2f11bea5e1db where fk815f8898 in ('0','3') group by fk28a55b60, fk8b32095b order by total desc limit 1000` | 0.9s | 1.2s |
| `select fk28a55b60, fk418dc234, max(fk0770b65e), min(fk0770b65e) from o49ed7e906424d90a0ea2f11bea5e1db where fk28a55b60 > '2017-02-10' group by fk28a55b60, fk418dc234 limit 1000` | 1.7s | 1.6s |
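To put numbers on the comparison, the fifteen averaged timings reported in the three tables can be tallied directly. This sketch only re-enters the published figures; it makes no new measurements. It counts which engine was faster on each query and sums the time per table.

```python
# Averaged timings (seconds) copied from the three tables,
# as (Spark SQL, Impala) pairs in query order.
RESULTS = {
    "180M rows": [(1.0, 0.9), (3.5, 5.1), (3.5, 5.2), (3.9, 3.2), (9.5, 13.5)],
    "10M rows": [(0.58, 0.6), (3.2, 3.4), (1.7, 1.6), (2.9, 1.8), (2.3, 1.7)],
    "1M rows": [(0.35, 0.14), (1.4, 1.6), (0.6, 1.1), (0.9, 1.2), (1.7, 1.6)],
}


def tally(results):
    """Return per-engine win counts and per-table (Spark, Impala) summed times."""
    wins = {"spark": 0, "impala": 0, "tie": 0}
    totals = {}
    for table, pairs in results.items():
        for spark, impala in pairs:
            if spark < impala:
                wins["spark"] += 1
            elif impala < spark:
                wins["impala"] += 1
            else:
                wins["tie"] += 1
        totals[table] = (round(sum(s for s, _ in pairs), 2),
                         round(sum(i for _, i in pairs), 2))
    return wins, totals


if __name__ == "__main__":
    wins, totals = tally(RESULTS)
    print(wins)  # {'spark': 8, 'impala': 7, 'tie': 0}
    for table, (spark, impala) in totals.items():
        print(f"{table}: Spark SQL {spark}s, Impala {impala}s")
```

Spark SQL was faster on 8 of the 15 queries and Impala on 7. The summed time is lower for Spark SQL on the 180-million-row table (21.4s vs 27.9s) and the 1-million-row table (4.95s vs 5.64s), and lower for Impala on the 10-million-row table (9.1s vs 10.68s). A plain sum is dominated by the slowest queries, so it complements rather than replaces the per-query view.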

Conclusion:

Impala 2.3 and Spark 1.6 do not differ much in query performance. Going by the official claims, Impala 2.6 and Spark 2.0 should likewise end up roughly on par:

Spark 2.0 improved its large query performance by an average of 2.4X over Spark 1.6 (so upgrade!). Small query performance was already good and remained roughly the same.

Impala 2.6 is 2.8X as fast for large queries as version 2.3. Small query performance was already good and remained roughly the same.
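The "roughly on par" expectation can be illustrated with back-of-the-envelope arithmetic. This is only a naive sketch. It assumes the quoted large-query gains (2.4X for Spark 2.0 over 1.6, 2.8X for Impala 2.6 over 2.3) apply uniformly to this cluster and schema, which the quoted claims do not guarantee. It applies them to the slowest query in the 180-million-row table (9.5s on Spark SQL, 13.5s on Impala).

```python
SPARK_16_SECONDS = 9.5    # measured above, Spark 1.6.2
IMPALA_23_SECONDS = 13.5  # measured above, Impala 2.3
SPARK_20_SPEEDUP = 2.4    # quoted average large-query gain, Spark 2.0 vs 1.6
IMPALA_26_SPEEDUP = 2.8   # quoted large-query gain, Impala 2.6 vs 2.3


def project(seconds: float, speedup: float) -> float:
    """Naively scale a measured runtime by a claimed speedup factor."""
    return seconds / speedup


if __name__ == "__main__":
    spark = project(SPARK_16_SECONDS, SPARK_20_SPEEDUP)
    impala = project(IMPALA_23_SECONDS, IMPALA_26_SPEEDUP)
    print(f"Spark 2.0 ~ {spark:.2f}s, Impala 2.6 ~ {impala:.2f}s")
    print(f"gap before: {IMPALA_23_SECONDS / SPARK_16_SECONDS:.2f}x, "
          f"after: {impala / spark:.2f}x")
```

With these inputs the projection gives about 3.96s for Spark 2.0 and 4.82s for Impala 2.6. The gap narrows from 1.42x to 1.22x because Impala's claimed gain is larger. Only rerunning the queries on the newer versions would show whether that holds in practice.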
