Overview
OAP 1.2.1 has been updated to remove Log4j version 2.13.3 and may not include the latest functional and security updates. OAP 1.3 is targeted to be released in January 2022 and will include additional functional and/or security updates. Customers should update to the latest version as it becomes available.
Optimized Analytics Package (OAP) is an open source project, driven by Intel and the community, that optimizes Apache Spark in areas such as caching, shuffle, the SQL engine, and MLlib.
Why use OAP?
Apache Spark is powerful and well optimized in many respects, but it still faces several challenges in reaching higher levels of performance:
- The JVM and the row-based computing engine prevent Spark from being fully optimized for Intel hardware, for example AVX/AVX512 and GPUs.
- The current implementations of key components, such as memory management and shuffle, do not take into account recent technology advances such as persistent memory (PMem).
- The batch processing engine cannot meet the needs of queries with high performance requirements.
The OAP project aims to optimize Spark in the areas above. It currently has 7 components: Gazelle Plugin, SQL DS Cache, OAP MLlib, PMem Spill, PMem Common, PMem Shuffle and Remote Shuffle.
How to use OAP?
Guide
Please refer to the installation and developer guides for the whole OAP project below.
Components
More detailed information is available on the web page of each OAP project module below.