
# Where's The Spark Session?

"The entry point into all functionality in Spark is the SparkSession class."
Spark's Official Getting Started

Most Geni functions for dataset creation (including reading data from different sources) use a Spark session in the background. For instance, `g/read-csv!` optionally takes a Spark session as its first argument. When no session is passed, Geni falls back to the default Spark session, which can be found here. The default is designed to optimise the out-of-the-box experience.

Note that the default Spark session is wrapped in a delay, so it is never instantiated unless one of these dataset-creation functions forces it.
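For illustration, a minimal sketch of both calling styles (the file path here is hypothetical):

```clojure
(require '[zero-one.geni.core :as g])

;; No session passed: Geni forces the default, delayed Spark session.
(g/read-csv! "data.csv")

;; A session may also be passed explicitly as the first argument:
;; (g/read-csv! spark "data.csv")
```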

## Creating A Spark Session

The following Scala Spark code:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("Basic Spark App")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
```

translates to:

```clojure
(require '[zero-one.geni.core :as g])

(g/create-spark-session
  {:master   "local"
   :app-name "Basic Spark App"
   :configs  {:spark.some.config.option "some-value"}})
```

It is also possible to specify `:log-level` and `:checkpoint-dir`, which are set at the SparkContext level. By default, Spark sets the log level to INFO; Geni sets it to WARN for a less verbose default REPL experience.
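As a sketch, these options slot into the same options map as above (the checkpoint directory and log level below are illustrative values, not defaults):

```clojure
(require '[zero-one.geni.core :as g])

(g/create-spark-session
  {:master         "local"
   :app-name       "Basic Spark App"
   :log-level      "ERROR"                  ;; override Geni's default of WARN
   :checkpoint-dir "/tmp/spark-checkpoints"}) ;; hypothetical directory
```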