How to Create Column Tables and Run Queries¶
Column tables organize and manage data in a columnar form such that modern day CPUs can traverse and run computations like a sum or an average fast (as the values are available in contiguous memory).
Refer to the Row and column tables documentation for the complete list of attributes for column tables.
Full source code, for example, to create and perform operations on column table can be found in CreateColumnTable.scala
Create a Column Table using DataFrame API¶
The code snippet below shows how to create a column table using DataFrame API.
Get a SnappySession:
val spark: SparkSession = SparkSession
.builder
.appName("CreateColumnTable")
.master("local[*]")
.getOrCreate
val snSession = new SnappySession(spark.sparkContext)
Define the table schema
val tableSchema = StructType(Array(StructField("C_CUSTKEY", IntegerType, false),
StructField("C_NAME", StringType, false),
StructField("C_ADDRESS", StringType, false),
StructField("C_NATIONKEY", IntegerType, false),
StructField("C_PHONE", StringType, false),
StructField("C_ACCTBAL", DecimalType(15, 2), false),
StructField("C_MKTSEGMENT", StringType, false),
StructField("C_COMMENT", StringType, false)
))
Create the table and load data from CSV
// props1 map specifies the properties for the table to be created
// "PARTITION_BY" attribute specifies partitioning key for CUSTOMER table(C_CUSTKEY)
val props1 = Map("PARTITION_BY" -> "C_CUSTKEY")
snSession.createTable("CUSTOMER", "column", tableSchema, props1)
val tableSchema = snSession.table("CUSTOMER").schema
// insert some data in it
// loading data in CUSTOMER table from a text file with delimited columns
val customerDF = snSession.read.schema(schema = tableSchema).csv("quickstart/src/main/resources/customer.csv")
customerDF.write.insertInto("CUSTOMER")
Create a Column Table using SQL¶
The same table can be created using SQL as shown below:
snSession.sql("CREATE TABLE CUSTOMER ( " +
"C_CUSTKEY INTEGER NOT NULL," +
"C_NAME VARCHAR(25) NOT NULL," +
"C_ADDRESS VARCHAR(40) NOT NULL," +
"C_NATIONKEY INTEGER NOT NULL," +
"C_PHONE VARCHAR(15) NOT NULL," +
"C_ACCTBAL DECIMAL(15,2) NOT NULL," +
"C_MKTSEGMENT VARCHAR(10) NOT NULL," +
"C_COMMENT VARCHAR(117) NOT NULL)" +
"USING COLUMN OPTIONS (PARTITION_BY 'C_CUSTKEY')")
You can execute selected queries on a column table, join the column table with other tables, and append data to it.