SparkSession, SnappySession and SnappyStreamingContext
Create a SparkSession
SparkContext is the main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster and can be used to create RDDs, accumulators, and broadcast variables on that cluster.
SparkSession is the entry point for programming Spark with the Dataset and DataFrame API. A SparkSession can be created with the SparkSession.Builder API, as shown below:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
```
In environments where a SparkSession has already been created up front (for example, in a REPL or a notebook), use the builder to get hold of the existing session, as in this minimal Java sketch:
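```java
// getOrCreate() returns the already-running SparkSession when one
// exists, instead of creating a second session.
SparkSession spark = SparkSession
  .builder()
  .getOrCreate();
```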
Create a SnappySession
SnappySession is the main entry point for SnappyData extensions to Spark. A SnappySession extends Spark's SparkSession to work with Row and Column tables. Any DataFrame can be managed as a SnappyData table and any table can be accessed as a DataFrame.
To create a SnappySession, wrap an existing SparkContext, as in this minimal Java sketch (the application name and master URL are placeholders):
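```java
SparkSession spark = SparkSession
  .builder()
  .appName("SparkApp")
  .master("master_url")
  .getOrCreate();

// Wrap the session's underlying SparkContext in a SnappySession.
SnappySession snappy = new SnappySession(spark.sparkContext());
```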
Note
DataFrames created from a SnappySession and from a SparkSession cannot be used interchangeably.
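As an illustration of the DataFrame-table duality described above, this minimal Java sketch saves an existing DataFrame as a column table and reads it back through the same SnappySession (the `df` variable and the table name are placeholders):

```java
// Assumes `snappy` is a SnappySession and `df` is an existing Dataset<Row>.
// Manage the DataFrame as a SnappyData column table.
df.write().format("column").saveAsTable("EXAMPLE_TABLE");

// Access the table as a DataFrame through the same session.
Dataset<Row> tableDf = snappy.table("EXAMPLE_TABLE");
```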
Create a SnappyStreamingContext
SnappyStreamingContext is the entry point for SnappyData extensions to Spark Streaming; it extends Spark's StreamingContext.
To create a SnappyStreamingContext:
```java
// Create (or get) a SparkSession; "master_url" is a placeholder.
SparkSession spark = SparkSession
  .builder()
  .appName("SparkApp")
  .master("master_url")
  .getOrCreate();

// Obtain the underlying Java Spark context.
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

// Create a SnappyStreamingContext with a 500 ms batch interval.
Duration batchDuration = Milliseconds.apply(500);
JavaSnappyStreamingContext jsnsc = new JavaSnappyStreamingContext(jsc, batchDuration);
```
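Once streams are defined on the context, the standard Spark Streaming lifecycle applies; a minimal sketch, assuming JavaSnappyStreamingContext exposes the usual JavaStreamingContext lifecycle methods:

```java
// Start processing the defined streams and block until termination.
jsnsc.start();
jsnsc.awaitTermination();
```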
SnappyData can run in three different modes: Local mode, Embedded mode, and SnappyData Connector mode. Before proceeding, it is important that you understand these modes. For more information, see Affinity modes.
If you are using SnappyData in Local mode or Connector mode, you are responsible for creating the SnappySession yourself. In Embedded mode, applications typically submit jobs to SnappyData and do not explicitly create a SnappySession or SnappyStreamingContext. Jobs are the primary mechanism for interacting with SnappyData through the Spark API in Embedded mode. A job implements either the SnappySQLJob trait or, for streaming applications, the SnappyStreamingJob trait.
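In Embedded mode, a job class might look like the following rough Java sketch; the class name, table name, and query are illustrative, and the job and validation classes (JavaSnappySQLJob, SnappyJobValid, SnappyJobValidation) are assumed to be importable from org.apache.spark.sql, as in the SnappyData examples:

```java
import com.typesafe.config.Config;
import org.apache.spark.sql.JavaSnappySQLJob;
import org.apache.spark.sql.SnappyJobValid;
import org.apache.spark.sql.SnappyJobValidation;
import org.apache.spark.sql.SnappySession;

public class ExampleCountJob extends JavaSnappySQLJob {

  // Invoked by the SnappyData lead node; the SnappySession is
  // provided by the cluster, not created by the application.
  @Override
  public Object runSnappyJob(SnappySession snappy, Config jobConfig) {
    // EXAMPLE_TABLE is a placeholder table name.
    return snappy.sql("SELECT * FROM EXAMPLE_TABLE").count();
  }

  // Basic validation hook that accepts the job unconditionally.
  @Override
  public SnappyJobValidation isValidJob(SnappySession snappy, Config config) {
    return new SnappyJobValid();
  }
}
```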