Working with Data Sources¶
SnappyData relies on the Spark SQL Data Sources API to load data in parallel from a wide variety of sources. Any data source or database that Spark can load from or save to can also be accessed from within SnappyData.
There is built-in support for many data sources as well as data formats. Built-in data sources include Amazon S3, GCS (Google Cloud Storage), Azure Blob Store, local file systems, HDFS, the Hive metastore, relational databases via JDBC, TIBCO Data Virtualization, and Pivotal GemFire.
SnappyData supports the following data formats: CSV, Parquet, ORC, Avro, JSON, XML and Text.
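As a minimal sketch of the Data Sources API in use (the file path and table name below are hypothetical), the following Scala snippet reads a CSV file and saves it as a SnappyData column table:

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

// Build a SparkSession and wrap it in a SnappySession.
val spark = SparkSession.builder()
  .appName("LoadCsvExample")
  .getOrCreate()
val snappy = new SnappySession(spark.sparkContext)

// Read a CSV file using the built-in csv data source
// (path shown is hypothetical).
val df = snappy.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/customers.csv")

// Save the DataFrame as a SnappyData column table.
df.write.format("column").saveAsTable("CUSTOMERS")
```

The same `read`/`write` pattern applies to the other built-in formats, for example by replacing `csv` with `parquet` or `json`.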
You can also deploy other third-party connectors using the SQL DEPLOY command. Refer to Deployment of Third Party Connectors. You can likely find a Spark connector for your data source on the Spark Packages portal or through a web search.
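As a sketch, and assuming the DEPLOY PACKAGE syntax described in Deployment of Third Party Connectors, a connector can be pulled in from its Maven coordinates and then used like any other data source. The alias and coordinates below are only illustrative; it reuses the `snappy` session from the earlier example:

```scala
// Deploy a third-party connector from Maven coordinates.
// Substitute the connector that matches your data source.
snappy.sql(
  "deploy package cassandraConnector " +
  "'com.datastax.spark:spark-cassandra-connector_2.11:2.0.7'")
```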
- START HERE - for a quick overview of the concepts and some examples of loading data
- Data Loading examples using Spark SQL/Data Sources API
- Supported Data Formats
- Accessing Cloud Stores
- Connecting to External Hive Metastores
- Using the SnappyData Change Data Capture (CDC) Connector
- Using the SnappyData JDBC Streaming Connector