Developer Guide
This document supplements the main OAP Developer Guide with additional details for SQL Index and Data Source Cache. After following that guide, continue here for the SQL Index and Data Source Cache specifics.
Building
Building SQL DS Cache
Build with Apache Maven.
Before building, install PMem-Common locally:
git clone -b <tag-version> https://github.com/oap-project/pmem-common.git
cd pmem-common
mvn clean install -DskipTests
Build the SQL DS Cache package:
git clone -b <tag-version> https://github.com/oap-project/sql-ds-cache.git
cd sql-ds-cache
mvn clean -DskipTests package
Running Tests
Run all the tests:
mvn clean test
Run a specific test suite, for example OapDDLSuite:
mvn -DwildcardSuites=org.apache.spark.sql.execution.datasources.oap.OapDDLSuite test
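The ScalaTest Maven plugin also accepts a comma-separated list of packages or suites in wildcardSuites, so as a sketch you can run every suite under the oap package at once (adjust the package to what you want to cover):
mvn -DwildcardSuites=org.apache.spark.sql.execution.datasources.oap test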
NOTE: The log level of unit tests currently defaults to ERROR; override oap-cache/oap/src/test/resources/log4j.properties if you need more verbose output.
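For example, a minimal override of that file, assuming the standard log4j 1.x properties syntax, could raise the level to INFO on the console:
# sketch of an overridden log4j.properties for more verbose test logs
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n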
Building with Intel® Optane™ DC Persistent Memory Module
Prerequisites for building with PMem support
Install the required packages on the build system:
memkind installation
The memkind library depends on libnuma at runtime, so libnuma must already be present on every worker node (see the installation sketch after the build steps below). Build the latest memkind library from source:
git clone -b v1.10.1 https://github.com/memkind/memkind
cd memkind
./autogen.sh
./configure
make
make install
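Because memkind needs libnuma (and its headers) to build and run, make sure it is installed before the build above and on every worker node. A sketch for an RPM-based distribution (package names are assumptions; use your distribution's equivalents):
# install the libnuma runtime and development headers required by memkind
sudo yum install -y numactl numactl-devel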
vmemcache installation
To build the vmemcache library from source (using an RPM-based Linux distribution as an example):
git clone https://github.com/pmem/vmemcache
cd vmemcache
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCPACK_GENERATOR=rpm
make package
sudo rpm -i libvmemcache*.rpm
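If your nodes run a Debian-based distribution instead, a similar sketch (the deb generator value mirrors the rpm one above and is an assumption for your CMake/CPack setup) is:
# build and install a .deb package instead of an .rpm
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCPACK_GENERATOR=deb
make package
sudo dpkg -i libvmemcache*.deb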
Plasma installation
To use the optimized Plasma cache with OAP, you need the following components:
(1) libarrow.so, libplasma.so, libplasma_java.so: dynamic libraries used by the Plasma client.
(2) plasma-store-server: an executable that runs the Plasma cache service.
(3) arrow-plasma-3.0.0.jar: used when compiling OAP and also needed by the Spark runtime.
- .so files and binary file
Clone the code from the Arrow repo and run the following commands. This installs libplasma.so, libarrow.so, libplasma_java.so and plasma-store-server to your system path (/usr/lib64 by default). If you are using Spark in a cluster environment, you can copy these files to all nodes in your cluster as long as they run the same OS and distribution; otherwise, you need to compile them on each node.
cd /tmp
git clone https://github.com/oap-project/arrow.git
cd arrow && git checkout arrow-3.0.0-oap
cd cpp
mkdir release
cd release
#build libarrow, libplasma, libplasma_java
cmake -DCMAKE_INSTALL_PREFIX=/usr/ -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_TESTS=on -DARROW_PLASMA_JAVA_CLIENT=on -DARROW_PLASMA=on -DARROW_DEPENDENCY_SOURCE=BUNDLED ..
make -j$(nproc)
sudo make install -j$(nproc)
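To confirm the install step worked, a quick check (assuming the default /usr prefix used above; on some distributions the libraries land in /usr/lib rather than /usr/lib64) is:
# verify the shared libraries and the Plasma store binary are on the system path
ls /usr/lib64 | grep -E 'libarrow|libplasma'
which plasma-store-server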
- arrow-plasma-3.0.0.jar
Run the following command to install the Arrow jars into your local Maven repository. You also need to copy arrow-plasma-3.0.0.jar to the $SPARK_HOME/jars/ directory (see the example after the command below), because this jar is required when using the external cache.
cd /tmp/arrow/java
mvn clean -q -pl plasma -am -DskipTests install
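As an example of the copy step mentioned above (the local Maven repository path below follows the default layout and may differ on your machine):
# copy the Plasma client jar into Spark's classpath
cp ~/.m2/repository/org/apache/arrow/arrow-plasma/3.0.0/arrow-plasma-3.0.0.jar $SPARK_HOME/jars/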
Building the package
You need to add -Ppersistent-memory to build with PMem support. The noevict cache strategy also requires building with the -Ppersistent-memory parameter.
cd <path>/pmem-common
mvn clean install -Ppersistent-memory -DskipTests
cd <path>/sql-ds-cache
mvn clean -DskipTests package
For the vmemcache cache strategy, build with:
cd <path>/pmem-common
mvn clean install -Pvmemcache -DskipTests
cd <path>/sql-ds-cache
mvn clean -DskipTests package
To enable both of these cache strategies, build with:
cd <path>/pmem-common
mvn clean install -Ppersistent-memory -Pvmemcache -DskipTests
cd <path>/sql-ds-cache
mvn clean -DskipTests package
Enabling NUMA binding for PMem in Spark
Rebuilding Spark packages with NUMA binding patch
When using PMem as a cache medium, apply the NUMA binding patch numa-binding-spark-3.0.0.patch to the Spark source code for best performance.
- Download the Spark-3.0.0 source or clone it from GitHub.
- Apply this patch and rebuild the Spark package:
git apply numa-binding-spark-3.0.0.patch
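A sketch of the full download-patch-rebuild flow (the branch, patch path, Hadoop profile, and distribution name are assumptions; adjust them for your environment):
# fetch the Spark 3.0.0 sources, apply the NUMA patch, and rebuild a distribution
git clone -b v3.0.0 https://github.com/apache/spark.git
cd spark
git apply /path/to/numa-binding-spark-3.0.0.patch
./dev/make-distribution.sh --name numa-binding --tgz -Pyarn -Phadoop-2.7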
- Add these configuration items to the Spark configuration file $SPARK_HOME/conf/spark-defaults.conf to enable NUMA binding.
spark.yarn.numa.enabled true
NOTE: If you are using a customized Spark, you may need to resolve conflicts in the patch manually.