Objective:
This guide describes how to configure an Apache Spark environment so that Spline can collect lineage metadata and dispatch it to Octopai.
Prerequisites:
- Determine the version of Apache Spark currently in use.
- Identify the Scala version your Spark build uses (see the version check after this list).
- Acquire the external IP address provided by Octopai for metadata dispatch.
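If you are unsure of either version, one quick way to check (a sketch, assuming the Spark binaries are on your PATH) is:

spark-submit --version   # prints the Spark version and the Scala version it was built with

Alternatively, inside a running spark-shell session, spark.version and scala.util.Properties.versionNumberString return the same information.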
Configuration Steps:
- Open the environment where you run your Spark applications, i.e. the machine or session from which you submit JARs or run pySpark commands.
- Append the following parameters to your `spark-shell` command:
spark-shell \
--packages za.co.absa.spline.agent.spark:spark-<x.x>-spline-agent-bundle_<y.yy>:2.0.0 \
--conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
--conf "spark.spline.lineageDispatcher.http.producer.url=http://<external-IP>:9090/producer"
- Replace `<x.x>` with your Spark version (major.minor, for example `3.3`).
- Replace `<y.yy>` with your Scala binary version (for example `2.12`).
- Replace `<external-IP>` with the external IP address provided by Octopai, as in the example below.
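For illustration only, a fully substituted command might look like this (a sketch assuming Spark 3.3 with Scala 2.12 and the placeholder address 203.0.113.10; use the values that match your environment and the IP supplied by Octopai):

spark-shell \
--packages za.co.absa.spline.agent.spark:spark-3.3-spline-agent-bundle_2.12:2.0.0 \
--conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
--conf "spark.spline.lineageDispatcher.http.producer.url=http://203.0.113.10:9090/producer"

The same --packages and --conf options can be passed unchanged to spark-submit (for JAR applications) or to pyspark, since all three commands accept the same configuration flags.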
With these settings in place, your Spark applications will report execution metadata through Spline to the Octopai endpoint configured above.