SnappyData Spark Extension API Reference Guide¶
This guide gives details of Spark extension APIs that are provided by SnappyData. The following APIs are included:
SnappySession APIs | DataFrameWriter APIs | SnappySessionCatalog APIs |
---|---|---|
sql Query Using Cached Plan | putIntoPut Dataframe Content into Table | getKeyColumns Get Key Columns of SnappyData table |
sqlUncachedQuery Using Fresh Plan | deleteFromDelete DataFrame Content from Table | getKeyColumnsAndPositions Gets primary key or key columns with their position in the table. |
createTableCreate SnappyData Managed Table | ||
createTableCreate SnappyData Managed JDBC Table | ||
truncateTable Empty Contents of Table | ||
dropTable Drop SnappyData Table | ||
createSampleTableCreate Stratified Sample Table | ||
createApproxTSTopKCreate Structure to Query Top-K | ||
setCurrentSchemaSet Current Database/schema | ||
getCurrentSchemaGet Current Schema of Session | ||
insertInsert Row into an Existing Table | ||
putUpsert Row into an Existing Table | ||
updateUpdate all Rows in Table | ||
deleteDelete all Rows in Table | ||
queryApproxTSTopKFetch the TopK Entries |
SnappySession APIs¶
The following APIs are available for SnappySession.
- sql
- sqlUncached
- createTable
- truncateTable
- dropTable
- createSampleTable
- createApproxTSTopK
- setSchema
- getCurrentSchema
- insert
- put
- delete
- queryApproxTSTopK
sql¶
You can use this API to run a query with a cached plan for a given SQL.
Syntax
Parameters
Parameter | Description |
---|---|
sqlText | The SQL string required to execute. |
Returns | Dataframe |
Example
sqlUncached¶
You can use this API to run a query using a fresh plan for a given SQL String.
Syntax
Parameters
Parameter | Description |
---|---|
sqlText | The SQL string required to execute. |
Returns | Dataframe |
Example
createTable¶
Creates a SnappyData managed table. Any relation providers, that is the row, column etc., which are supported by SnappyData can be created here.
Syntax
createTable(
tableName: String,
provider: String,
schema: StructType,
options: Map[String, String],
allowExisting: Boolean)
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
provider | Provider name such as ‘ROW’, ‘COLUMN’' etc. |
schema | The table schema. |
options | Properties for table creation. For example, partition_by, buckets etc. |
allowExisting | When set to true, tables with the same name are ignored, else an AnalysisException is thrown stating that the table already exists. |
Returns | Dataframe |
Example
case class Data(col1: Int, col2: Int, col3: Int)
val props = Map.empty[String, String]
val data = Seq(Seq(1, 2, 3), Seq(7, 8, 9), Seq(9, 2, 3), Seq(4, 2, 3), Seq(5, 6, 7))
val rdd = sc.parallelize(data, data.length).map(s => new Data(s(0), s(1), s(2)))
val dataDF = snappySession.createDataFrame(rdd)
snappySession.createTable(tableName, "column", dataDF.schema, props)
createTable¶
Creates a SnappyData managed JDBC table which takes a free format DDL string. The DDL string should adhere to the syntax of the underlying JDBC store. SnappyData ships with an inbuilt JDBC store, which can be accessed by the data store of Row format. The options parameter can take connection details.
Syntax
Syntax:
createTable(
tableName: String,
provider: String,
schemaDDL: String,
options: Map[String, String],
allowExisting: Boolean)
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
provider | Provider name such as ‘ROW’, ‘COLUMN’' etc. |
schemaDDL | The table schema as a string interpreted by the provider. |
options | Properties for table creation. For example, partition_by, buckets etc. |
allowExisting |
Example
val props = Map(
"url" -> s"jdbc:derby:$path",
"driver" -> "org.apache.derby.jdbc.EmbeddedDriver",
"poolImpl" -> "tomcat",
"user" -> "app",
"password" -> "app"
)
val schemaDDL = "(OrderId INT NOT NULL PRIMARY KEY,ItemId INT, ITEMREF INT)"
snappySession.createTable("jdbcTable", "jdbc", schemaDDL, props)
truncateTable¶
Empties the contents of the table without deleting the catalog entry.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
ifExists | Attempt truncate only if the table exists. |
Returns | Dataframe |
Example
dropTable¶
Drop a SnappyData table created by a call to SnappySession.createTable, Catalog.createExternalTable or Dataset.createOrReplaceTempView.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
ifExists | Attempts drop only if the table exists. |
Returns | Unit |
Example
createSampleTable¶
Creates a stratified sample table.
Note
This API is not supported in the Smart Connector mode.
Syntax
createSampleTable(tableName: String,
baseTable: Option[String],
samplingOptions: Map[String, String],
allowExisting: Boolean)
Parameters
Parameter | Description |
---|---|
tableName | The qualified name of the table. |
baseTable | The base table of the sample table, if any. |
samplingOptions | sampling options such as QCS, reservoir size etc. |
allowExisting | When set to true, tables with the same name are ignored, else a table exist exception is shown. |
Returns | Dataframe |
Example
snappySession.createSampleTable("airline_sample", Some("airline"), Map("qcs" -> "UniqueCarrier ,Year_ ,Month_", "fraction" -> "0.05", "strataReservoirSize" -> "25", "buckets" -> "57"),
allowExisting = false)
createApproxTSTopK¶
Creates an approximate structure to query top-K with time series support.
Note
This API is not supported in the Smart Connector mode.
Syntax
createApproxTSTopK(topKName: String, baseTable: Option[String], keyColumnName: String, inputDataSchema: StructType, topkOptions: Map[String, String], allowExisting: Boolean = false)
Parameters
Parameter | Description |
---|---|
topKName | The qualified name of the top-K structure. |
baseTable | The base table of the top-K structure, if any. |
keyColumnName | Top-k key column for aggregation |
inputDataSchema | Schema of input data |
topkOptions | Extra options including the following:frequencyCol: column to use for top-k frequency count.epoch, timeInterval: start and interval for collecting samples.timeSeriesColumn: A column that accurately records timestamps for better handling of time range queries |
allowExisting | When set to true, tables with the same name are ignored, else a table exist exception is shown. |
Returns | Dataframe |
Example
setCurrentSchema¶
Sets the current database/schema.
Syntax
Parameters
Parameter | Description |
---|---|
schema | schema name which goes into the catalog. |
Returns | Unit |
Example
getCurrentSchema¶
Gets the current schema of the session.
Syntax
Example
Returns
String
insert¶
Inserts one or more row into an existing table.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Table name for the insert operation. |
Rows | List of rows to be inserted into the table. |
Returns | Int |
Example
put¶
Upserts one or more row into an existing table. Only works for row tables.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Table name for the put operation |
rows | List of rows to be put into the table. |
Returns | Int |
Example
update¶
Updates all the rows in the table that match passed filter expression. This works only for row tables.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Th table name which needs to be updated. |
filterExpr | SQL WHERE criteria to select rows that will be updated. |
newColumnValues | A single row containing all the updated column values. They MUST match the updateColumn: list passed. |
updateColumns | List of all column names that are updated. |
Returns | Int |
Example
delete¶
Deletes all the rows in the table that match passed filter expression. This works only for row tables.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
filterExpr | SQL WHERE criteria to select rows that will be updated. |
Returns | Int |
Example
queryApproxTSTopK¶
Fetches the topK entries in the Approx TopK synopsis for the specified time interval. The time interval specified here should not be less than the minimum time interval used when creating the TopK synopsis.
Note
This API is not supported in the Smart Connector mode.
Syntax
Parameters
Parameter | Description |
---|---|
topKName | The topK structure that is to be queried. |
startTime | Start time as string in the format yyyy-mm-dd hh:mm:ss. If passed as null, the oldest interval is considered as the start interval. |
endTime | End time as string in the format yyyy-mm-dd hh:mm:ss. If passed as null, the newest interval is considered as the last interval. |
k | Optional. The number of elements to be queried. This is to be passed only for stream summary |
Returns | Dataframe |
Example
DataFrameWriter APIs¶
The following APIs are available for DataFrameWriter:
putInto¶
Puts the content of the DataFrame into the specified table. It requires that the schema of the DataFrame is the same as the schema of the table. Column names are ignored while matching the schemas and put into operation is performed using position based resolution. If some rows are already present in the table, then they are updated. Also, the table on which putInto is implemented should have defined key columns, if its a column table. If it is a row table, then it should have defined a primary key.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
Returns | Unit |
Example
deleteFrom¶
The deleteFrom
API deletes all those records from given snappy table which exists in the input Dataframe. Existence of the record is checked by comparing the key columns (or the primary keys) values.
To use this API, key columns(for column table) or primary keys(for row tables) must be defined in the SnappyData table.
Also, the source DataFrame must contain all the key columns or primary keys (depending upon the type of snappy table). The column existence is checked using a case-insensitive match of column names. If the source DataFrame contains columns other than the key columns, it will be ignored by the deleteFrom
API.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
Returns | Unit |
Example
SnappySessionCatalog APIs¶
The following APIs are available for SnappySessionCatalog:
Note
These are developer APIs and are subject to change in the future.
getKeyColumns¶
Gets primary key or key columns of a SnappyData table.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
Returns | Sequence of key columns (for column tables) or sequence of primary keys (for row tables). |
Example |
getKeyColumnsAndPositions¶
Gets primary key or key columns of a SnappyData table along with their position in the table.
Syntax
Parameters
Parameter | Description |
---|---|
tableName | Name of the table. |
Returns | Sequence of scala.Tuple2 containing column and column's position in the table for each key columns (for column tables) or sequence of primary keys (for row tables). |
Example