Overview
Optimized Analytics Package (OAP) is an open source project, driven by Intel and the community, that optimizes Apache Spark's SQL engine, MLlib, and other components.
Why use OAP?
Apache Spark is powerful and well optimized in many respects, but it still faces several challenges in achieving higher performance:
- The JVM and the row-based computing engine prevent Spark from fully exploiting Intel hardware features such as AVX/AVX-512 and GPUs.
- The current implementation of key aspects such as memory management and shuffle does not take advantage of recent technology advances like PMem (persistent memory).
- The batch processing engine cannot meet the needs of queries with high performance requirements.
The OAP Project aims to optimize Spark in the areas above. In previous releases, it had six components: Gazelle Plugin, OAP MLlib, SQL DS Cache, PMem Spill, PMem Common, and PMem Shuffle.
Since version 1.4.0, OAP consists of three components: Gazelle Plugin, OAP MLlib, and CloudTik.
How to use OAP?
Guide
Please refer to the OAP project installation and developer guides below for instructions.
Components
For more detailed information, see the web page of each OAP Project component below.