Create SnappyData Cluster¶

In this section, you will get a quick tour to start a SnappyData cluster and try out the basic features and functionalities. The following items are covered in this guide:

Start SnappyData Cluster
Check SnappyData Cluster Status
Connect/Disconnect SnappyData Shell
Create Tables
Create Tables and Import Data Using quickstart Scripts
Create a Column Table using an External Table
Run Queries
Submit Snappy Jobs
Add Servers to Cluster
Rebalance Data on Servers
Stop Cluster

Start SnappyData Cluster¶

Navigate to the SnappyData product root directory to start the cluster. Run the ./sbin/snappy-start-all.sh script to start the SnappyData cluster on your single machine using default settings. This starts a lead node, a locator, and a data server.

$./sbin/snappy-start-all.sh

Logs generated in /home/xyz/<snappydata_install_dir>/work/localhost-locator-1/snappylocator.log
SnappyData Locator pid: 9086 status: running
  Distributed system now has 1 members.
  Started Thrift locator (Compact Protocol) on: localhost/127.0.0.1[1527]
Logs generated in /home/xyz/<snappydata_install_dir>/work/localhost-server-1/snappyserver.log
SnappyData Server pid: 9220 status: running
  Distributed system now has 2 members.
  Started Thrift server (Compact Protocol) on: localhost/127.0.0.1[1528]
Logs generated in /home/xyz/<snappydata_install_dir>/snappy/work/localhost-lead-1/snappyleader.log
SnappyData Leader pid: 9370 status: running
  Distributed system now has 3 members.
  Starting job server on: 0.0.0.0[8090]

You can connect to Snappy SQL shell and run the select id, kind, netservers from sys.members; query to view the cluster members.

    ./bin/snappy
    connect client '127.0.0.1:1527';
    select id, kind, netservers from sys.members;

    ID                                                                                                         |KIND                  |NETSERVERS
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    127.0.0.1(3688)<v1>:32604                                                                                  |datastore(normal)     |localhost/127.0.0.1[1528]
    127.0.0.1(3842)<v2>:25366                                                                                  |accessor(normal)      |                         
    127.0.0.1(3431:locator)<ec><v0>:41406                                                                      |locator(normal)       |localhost/127.0.0.1[1527]

In the output, the details of the cluster members are displayed. Here in the KIND column, the member corresponding to accessor is the lead node. In a cluster, you can connect as a client to any member by specifying localhost with the unique port number of the member (the one specified in the NETSERVERS column corresponding to each member). However, connecting to the locator provides basic load balancing by routing the connection request to an available server member.

Check SnappyData Cluster Status¶

You can check the status of a running cluster using the following command:

$ ./sbin/snappy-status-all.sh
SnappyData Locator pid: 9748 status: running
SnappyData Server pid: 9887 status: running
SnappyData Leader pid: 10468 status: running

Connect to Snappy SQL shell to perform various SQL operations.

Alternatively , you can access the SnappyData Monitoring Console by entering http:// <leadhost>:5050/dashboard/ in the web browser. For example, http://localhost:5050/dashboard/. <leadhost> is the hostname or IP of the lead node in your cluster which is provided in the conf/leads file. On the SnappyData Monitoring Console dashboards, after starting a cluster, you can check the status of each of the cluster member.

Connect/Disconnect to SnappyData Shell¶

After starting the SnappyData cluster, run these commands together to start the SnappyData shell:

./bin/snappy
connect client '127.0.0.1:1527';

Type exit; or press CTRL + C to disconnect the SnappyData Shell.

Create Tables¶

Create a simple table and insert a few rows. By default, if no options are provided, row replicated table is formed. However, you can create tables using the row or column option.

# Create a table named quicktable. A replicated row table is formed without/empty options by default.
CREATE TABLE quicktable (id int generated always as identity, item char(25));

# Create column table. A column table is created which is partitioned by default.
CREATE TABLE  quicktable_col (id int, item varchar(25)) using column options();

# Create partitioned Row table using the partition by options. A partitioned row table is created with partitioning scheme on the 'id' column. 
CREATE TABLE quicktable_row (id int generated always as identity, item char(25)) using row options(partition_by 'id');

# Insert one row into the table.
INSERT into quicktable values (default, 'widget');

# Insert one more row into the table.
INSERT into quicktable values (default, 'gadget');

# View the contents of the table.
select * from quicktable;
ID         |ITEM                     
-------------------------------------
2          |gadget                
1          |widget

2 rows selected

# SnappyData replicates the row tables that are created by default onto the data store members. 
# You can validate this using the following query:

select tablename, datapolicy from sys.systables where tablename='QUICKTABLE';

#The following output is displayed:

TABLENAME                                                                                                                       |DATAPOLICY              
---------------------------------------------------------------------------------------------------------------------------------------------------------
QUICKTABLE                                                                                                                      |PERSISTENT_REPLICATE

Create Tables and Import Data Using quickstart Scripts¶

SnappyData contains various quickstart scripts that can be used to run some basic functionalities. For example, you can run the create_and_load_column_table.sql script. This script first drops the table if it exists and then creates an external table named STAGING AIRLINE to load the formatted data from a parquet file. Then a column table is created with only the specified columns from this external table. Connect to SnappyData shell before running these scripts. These script files must be run from within the quickstart directory if you are providing the relative path as shown:

        ./bin/snappy
        connect client '127.0.0.1:1527';
        RUN 'quickstart/scripts/create_and_load_column_table.sql';

        # Use the following command to view the details of the external table. 
        describe staging_airline;

        # The following output is displayed:
        COLUMN_NAME         |TYPE_NAME|DECIMAL_DIGITS|NUM_PREC_RADIX|COLUMN_SIZE|COLUMN_DEF|CHAR_OCTET_LENGTH|IS_NULLABLE
        -----------------------------------------------------------------------------------------------------------------
        Year                |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        Month               |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        DayOfMonth          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        DayOfWeek           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        DepTime             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        CRSDepTime          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        ArrTime             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        CRSArrTime          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        UniqueCarrier       |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
        FlightNum           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        TailNum             |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
        ActualElapsedTime   |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        CRSElapsedTime      |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        AirTime             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        ArrDelay            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        DepDelay            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        Origin              |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
        Dest                |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
        Distance            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        TaxiIn              |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        TaxiOut             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        Cancelled           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        CancellationCode    |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
        Diverted            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        CarrierDelay        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        WeatherDelay        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        NASDelay            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        SecurityDelay       |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        LateAircraftDelay   |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
        ArrDelaySlot        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES

        # Use the following command to check the number of records in the staging_airline external table:

        select count(*) from staging_airline;
        count(1)            
        --------------------
        1000000

        1 row selected

You can also try the following:

Create and load a row table:

RUN './quickstart/scripts/create_and_load_row_table.sql';

View the status of the system:

RUN './quickstart/scripts/status_queries.sql';

Create a Column Table Using an External Table¶

Similarly as the quickstart scripts, you can try to create an external table named staging_airline to load the formatted data from a airlineParquetData file with inferSchema option as true. Later, you can create a column table named airline and pull data from the external table into this table. After pulling in the data, you can check the number of records in the table.

    CREATE EXTERNAL TABLE STAGING_AIRLINE USING parquet OPTIONS(path '../../quickstart/data/airlineParquetData', inferSchema 'true');

        CREATE TABLE AIRLINE2 USING column AS (SELECT * FROM STAGING_AIRLINE);

        describe airline2;
    COLUMN_NAME         |TYPE_NAME|DECIMAL_DIGITS|NUM_PREC_RADIX|COLUMN_SIZE|COLUMN_DEF|CHAR_OCTET_LENGTH|IS_NULLABLE
    -----------------------------------------------------------------------------------------------------------------
    YEAR                |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    MONTH               |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    DAYOFMONTH          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    DAYOFWEEK           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    DEPTIME             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CRSDEPTIME          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    ARRTIME             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CRSARRTIME          |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    UNIQUECARRIER       |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
    FLIGHTNUM           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    TAILNUM             |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
    ACTUALELAPSEDTIME   |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CRSELAPSEDTIME      |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    AIRTIME             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    ARRDELAY            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    DEPDELAY            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    ORIGIN              |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
    DEST                |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
    DISTANCE            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    TAXIIN              |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    TAXIOUT             |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CANCELLED           |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CANCELLATIONCODE    |VARCHAR  |NULL          |NULL          |32672      |NULL      |65344            |YES        
    DIVERTED            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    CARRIERDELAY        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    WEATHERDELAY        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    NASDELAY            |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    SECURITYDELAY       |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    LATEAIRCRAFTDELAY   |INTEGER  |0             |10            |10         |NULL      |NULL             |YES        
    ARRDELAYSLOT        |INTEGER  |0             |10            |10         |NULL      |NULL             |YES

    30 rows selected

    select count(*)
    > ;
    count(1)            
    --------------------
    1

    1 row selected

After running these queries, you can check the table details on the SnappyData Monitoring Console Dashboards. The details of the newly created tables are displayed in the Tables section.

External Table & Tables on Dashboard

Run Queries¶

You can try a couple of analytical queries as shown:

Query to find the average arrival delay.

select avg(arrdelay) from airline;

avg(ARRDELAY)         
----------------------
6.735443              

1 row selected

* Query for avg arrival delay of a specific airline.

select max(arrdelay) from airline where DEST = '';

max(ARRDELAY)
-------------
NULL         

1 row selected

Submit Snappy Jobs¶

Instructions to submit Snappy jobs are available here. There are sample jobs available for reference. You can import the examples into a separate independent gradle project and submit the jobs to the cluster or run within an IDE. Refer to the instructions here.

Add Servers into Cluster¶

You can add more than one server to a cluster. To add a new server, do the following:

Go to SnappyData home directory.cd <snappydata_install_dir>
Create a configuration file named servers in the conf folder in the the SnappyData home directory. To do so, you can copy the existing template files servers.template and rename it to servers as shown:cp -f conf/servers.template conf/servers
Open this file using a vi editor and add a hostname entry of the additional server, after the entry of the primary server, and save the file. For example, suppose there is an entry localhost in this file for the primary server. You can add an entry localhost below this entry for the additional server. The servers file should contain the hostnames of the nodes (one per line) where you intend to start the member.
From the SnappyData home directory, start the cluster again using the ./sbin/snappy-start-all.sh command. The new server gets started. Ignore the error messages of the other nodes that are already running. You can check details of the newly added member from the SnappyData Monitoring Console.

Rebalancing Data on Servers¶

Further, you can distribute the data among the servers in the cluster. This ensures that each server carries almost equal data. To balance the data equally on the servers, do the following:

Go to SnappyData home directory.cd <snappydata_install_dir>
Connect to snappy shell and obtain the jdbc client connection.
Run the rebalance command. call sys.rebalance_all_buckets();
On SnappyData Monitoring Console, check the Heap Memory Used/Total column for the servers. You will notice that before rebalancing the data, there was an unequal distribution of the memory usage and after running the rebalance command, the data is distributed equally among both the servers.

Before Rebalance

After Rebalance

Stop the Cluster¶

You can stop the cluster using the ./sbin/snappy-stop-all.sh command:

./sbin/snappy-stop-all.sh
The SnappyData Leader has stopped.
The SnappyData Server has stopped.
The SnappyData Locator has stopped.

For more details, refer to Stopping the Cluster