How to use Python to Create Tables and Run Queries¶

Developers can write programs in Python to use SnappyData features.

First create a SnappySession:

 from pyspark.sql.snappy import SnappySession
 from pyspark import SparkContext, SparkConf

 conf = SparkConf().setAppName(appName).setMaster(master)
 sc = SparkContext(conf=conf)
 snappy = SnappySession(sc)

Create table using SnappySession:

For complete list of table attributes refer the Programming Guide

# Creating partitioned table PARTSUPP using SQL
snappy.sql("DROP TABLE IF EXISTS PARTSUPP")
# "PARTITION_BY" attribute specifies partitioning key for PARTSUPP table(PS_PARTKEY),
snappy.sql("CREATE TABLE PARTSUPP ( " +
      "PS_PARTKEY     INTEGER NOT NULL PRIMARY KEY," +
      "PS_SUPPKEY     INTEGER NOT NULL," +
      "PS_AVAILQTY    INTEGER NOT NULL," +
      "PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL)" +
      "USING ROW OPTIONS (PARTITION_BY 'PS_PARTKEY' )")

Inserting data in table using INSERT query:

snappy.sql("INSERT INTO PARTSUPP VALUES(100, 1, 5000, 100)")
snappy.sql("INSERT INTO PARTSUPP VALUES(200, 2, 50, 10)")
snappy.sql("INSERT INTO PARTSUPP VALUES(300, 3, 1000, 20)")
snappy.sql("INSERT INTO PARTSUPP VALUES(400, 4, 200, 30)")
# Printing the contents of the PARTSUPP table
snappy.sql("SELECT * FROM PARTSUPP").show()

Update the data using SQL:

# Update the available quantity for PARTKEY 100
snappy.sql("UPDATE PARTSUPP SET PS_AVAILQTY = 50000 WHERE PS_PARTKEY = 100")
# Printing the contents of the PARTSUPP table after update
snappy.sql("SELECT * FROM PARTSUPP").show()

Delete records from the table:

# Delete the records for PARTKEY 400
snappy.sql("DELETE FROM PARTSUPP WHERE PS_PARTKEY = 400")
# Printing the contents of the PARTSUPP table after delete
snappy.sql("SELECT * FROM PARTSUPP").show()

Create table using API: This same table can be created by using createTable API. First create a schema and then create the table, and then mutate the table data using API:

# drop the table if it exists
snappy.dropTable('PARTSUPP', True)

schema = StructType([StructField('PS_PARTKEY', IntegerType(), False),
      StructField('PS_SUPPKEY', IntegerType(), False),
      StructField('PS_AVAILQTY', IntegerType(),False),
      StructField('PS_SUPPLYCOST', DecimalType(15, 2), False)
      ])

 # "PARTITION_BY" attribute specifies partitioning key for PARTSUPP table(PS_PARTKEY)
 snappy.createTable('PARTSUPP', 'row', schema, False, PARTITION_BY = 'PS_PARTKEY')

 # Inserting data in PARTSUPP table using DataFrame
tuples = [(100, 1, 5000, Decimal(100)), (200, 2, 50, Decimal(10)),
         (300, 3, 1000, Decimal(20)), (400, 4, 200, Decimal(30))]
rdd = sc.parallelize(tuples)
tuplesDF = snappy.createDataFrame(rdd, schema)
tuplesDF.write.insertInto("PARTSUPP")
#Printing the contents of the PARTSUPP table
snappy.sql("SELECT * FROM PARTSUPP").show()

# Update the available quantity for PARTKEY 100
snappy.update("PARTSUPP", "PS_PARTKEY =100", [50000], ["PS_AVAILQTY"])
# Printing the contents of the PARTSUPP table after update
snappy.sql("SELECT * FROM PARTSUPP").show()

# Delete the records for PARTKEY 400
snappy.delete("PARTSUPP", "PS_PARTKEY =400")
# Printing the contents of the PARTSUPP table after delete
snappy.sql("SELECT * FROM PARTSUPP").show()

The complete source code for the above example is in CreateTable.py

Related Topics:

Running Python Applications