Working with Data Sources¶
SnappyData relies on the Spark SQL Data Sources API to load data in parallel from a wide variety of sources. Any data source or database that Spark can load from or save to can also be accessed from within SnappyData.
There is built-in support for many data sources as well as data formats. Built-in data sources include Amazon S3, GCS (Google Cloud Storage), Azure Blob Store, local file systems, HDFS, the Hive metastore, relational databases via JDBC, TIBCO Data Virtualization, and Pivotal GemFire.
SnappyData supports the following data formats: CSV, Parquet, ORC, Avro, JSON, XML and Text.
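As a minimal sketch of the Data Sources API in use (the file path and table name below are hypothetical), the following Scala snippet reads a CSV file and saves it as a SnappyData column table:

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

// Build a SparkSession and wrap it in a SnappySession.
val spark = SparkSession.builder()
  .appName("LoadCsvExample")
  .getOrCreate()
val snappy = new SnappySession(spark.sparkContext)

// Read a CSV file using the built-in csv data source
// (path shown is hypothetical).
val df = snappy.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/customers.csv")

// Save the DataFrame as a SnappyData column table.
df.write.format("column").saveAsTable("CUSTOMERS")
```

The same `read`/`write` pattern applies to the other built-in formats, for example by replacing `csv` with `parquet` or `json`.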
You can also deploy other third-party connectors using the SQL DEPLOY command. Refer to Deployment of Third Party Connectors. You can likely find a Spark connector for your data source on the Spark Packages portal or through a web search.
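As a sketch, and assuming the DEPLOY PACKAGE syntax described in Deployment of Third Party Connectors, a connector can be pulled in from its Maven coordinates and then used like any other data source. The alias and coordinates below are only illustrative; it reuses the `snappy` session from the earlier example:

```scala
// Deploy a third-party connector from Maven coordinates.
// Substitute the connector that matches your data source.
snappy.sql(
  "deploy package cassandraConnector " +
  "'com.datastax.spark:spark-cassandra-connector_2.11:2.0.7'")
```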
- START HERE - for a quick overview of the concepts and some examples of loading data
- Data Loading examples using Spark SQL/Data Sources API
- Supported Data Formats
- Accessing Cloud Stores
- Connecting to External Hive Metastores
- Using the SnappyData Change Data Capture (CDC) Connector
- Using the SnappyData JDBC Streaming Connector