Using Spark Scala APIs

Create a SnappySession

SnappySession extends SparkSession, so in addition to the standard Spark APIs you can mutate data and get much higher performance.

scala> val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
// Import the session's implicit conversions (needed for toDS() below)
scala> import snappy.implicits._

Create a dataset using the Spark APIs

scala> val ds = Seq((1,"a"), (2, "b"), (3, "c")).toDS()

Define a schema for the table

scala>  import org.apache.spark.sql.types._
scala>  val tableSchema = StructType(Array(StructField("CustKey", IntegerType, false),
          StructField("CustName", StringType, false)))
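
Before creating a table, the schema can be checked with the standard `StructType` accessors. The sketch below rebuilds the same schema so that it can be pasted on its own; it uses only Spark's `org.apache.spark.sql.types` classes and needs no session. The names `names` and `keyField` are introduced here purely for illustration.

```scala
import org.apache.spark.sql.types._

// Same schema as above: an integer key and a string name, both non-nullable.
val tableSchema = StructType(Array(
  StructField("CustKey", IntegerType, nullable = false),
  StructField("CustName", StringType, nullable = false)))

// Column names in declaration order.
val names = tableSchema.fieldNames

// Look up a single field by name to check its type and nullability.
val keyField = tableSchema("CustKey")

// Print a tree view of the schema to the console.
tableSchema.printTreeString()
```

Both fields are declared non-nullable, so every row inserted later is expected to supply a value for each column.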

Create a "column" table with a simple schema [Int, String] and default options

For detailed options, refer to the Row and Column Tables section.

// Column tables manage data in columnar form and offer superior performance for analytic-class queries.
scala>  snappy.createTable(tableName = "colTable",
          provider = "column", // Create a SnappyData Column table
          schema = tableSchema,
          options = Map.empty[String, String], // Map for options
          allowExisting = false)
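
The `options` map is where table-level settings go. The following is a hedged sketch only: the option names `PARTITION_BY` and `BUCKETS` are assumptions that should be verified against the Row and Column Tables section, and `colTablePartitioned` is a hypothetical table name chosen so that the example does not collide with `colTable`. It assumes the `snappy` session and `tableSchema` defined above.

```scala
// Assumed option names; verify them against the Row and Column Tables section.
val partitionOptions = Map(
  "PARTITION_BY" -> "CustKey", // column used to partition the data
  "BUCKETS" -> "8")            // number of buckets for the table

// Hypothetical table name, chosen to avoid clashing with "colTable".
snappy.createTable(tableName = "colTablePartitioned",
  provider = "column",
  schema = tableSchema,
  options = partitionOptions,
  allowExisting = false)

// Clean up the illustrative table.
snappy.dropTable("colTablePartitioned", ifExists = true)
```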

SnappyData's SnappySession extends SparkSession, so you can use all of Spark's APIs.

Insert the Dataset created earlier into the column table "colTable"

scala>  ds.write.insertInto("colTable")
// Check the total row count.
scala>  snappy.table("colTable").count
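
Since the table is returned as an ordinary DataFrame, the usual Spark operations should work on it. A minimal sketch, assuming the `snappy` session from above and that `colTable` holds only the three rows just inserted; the names `colDF`, `filtered`, and `viaSql` are illustrative, and the column names are written in upper case as in the update and delete calls later in this guide.

```scala
// Read the table back as a DataFrame.
val colDF = snappy.table("colTable")

// Standard DataFrame operations: filter, then sort.
val filtered = colDF.filter("CUSTKEY > 1").orderBy("CUSTKEY")
filtered.show()

// The same query expressed in SQL through the session.
val viaSql = snappy.sql("SELECT * FROM colTable WHERE CUSTKEY > 1 ORDER BY CUSTKEY")
viaSql.show()
```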

Create a row object using Spark's API and insert the row into the table

Unlike Spark DataFrames, SnappyData column tables are mutable, so you can insert new rows into a column table.

// Insert a new record
scala>  import org.apache.spark.sql.Row
scala>  snappy.insert("colTable", Row(10, "f"))
// Check the total row count after inserting the row
scala>  snappy.table("colTable").count

Create a "row" table with a simple schema [Int, String] and default options

For detailed options, refer to the Row and Column Tables section.

// Row tables are better suited when data changes constantly or access is selective (for example, lookups by key).
scala>  snappy.createTable(tableName = "rowTable",
          provider = "row",
          schema = tableSchema,
          options = Map.empty[String, String],
          allowExisting = false)

Insert the Dataset created earlier into the row table "rowTable"

scala>  ds.write.insertInto("rowTable")
// Check the row count
scala>  snappy.table("rowTable").count

Insert a new record

scala>  snappy.insert("rowTable", Row(4, "d"))
// Check the row count after inserting the row
scala>  snappy.table("rowTable").count

Change some data in the row table

// Update the name of the customer with custKey = 1
scala>  snappy.update(tableName = "rowTable", filterExpr = "CUSTKEY=1",
                newColumnValues = Row("d"), updateColumns = "CUSTNAME")

scala>  snappy.table("rowTable").orderBy("CUSTKEY").show

// Delete the row for customer with custKey = 1
scala>  snappy.delete(tableName = "rowTable", filterExpr = "CUSTKEY=1")
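
The same kind of changes can also be expressed in SQL through `snappy.sql`. This is a sketch under the assumption that the row table accepts standard `UPDATE` and `DELETE` statements and that the `snappy` session and `rowTable` from the steps above are still available; the key value 2, the name 'e', and the variable `remaining` are arbitrary illustrative choices.

```scala
// Update the name of the customer with key 2 (illustrative values).
snappy.sql("UPDATE rowTable SET CUSTNAME = 'e' WHERE CUSTKEY = 2")

// Check the result.
snappy.table("rowTable").orderBy("CUSTKEY").show()

// Delete that customer again.
snappy.sql("DELETE FROM rowTable WHERE CUSTKEY = 2")

// Remaining number of rows in the table.
val remaining = snappy.table("rowTable").count
```

Whether to use the `update` and `delete` methods or SQL strings is largely a matter of taste; the method calls keep values as typed `Row` objects, while SQL may be more familiar to readers coming from a database background.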

// Drop the existing tables
scala>  snappy.dropTable("rowTable", ifExists = true)
scala>  snappy.dropTable("colTable", ifExists = true)