User Defined Functions (UDF) and User Defined Aggregate Functions (UDAF)¶
Users can define a function and completely customize how SnappyData evaluates data and manipulates queries using UDF and UDAF functions across sessions. The definition of the functions is stored in a persistent catalog, which enables it to be used after node restart as well.
Note
Support for UDF is available from SnappyData version release 0.8 onwards.
Create User Defined Function¶
You can simply extend any one of the interfaces in the package org.apache.spark.sql.api.java. These interfaces can be included in your client application by adding snappy-spark-sql_2.11-2.0.3-2.jar to your classpath.
Define a User Defined Function class¶
The number of the interfaces (UDF1 to UDF22) signifies the number of parameters a UDF can take.
Note
Currently, any UDF which can take more than 22 parameters is not supported.
package some.package
import org.apache.spark.sql.api.java.UDF1
class StringLengthUDF extends UDF1[String, Int] {
override def call(t1: String): Int = t1.length
}
Create a User Defined Function¶
Note
Place the jars used for creating persistent UDFs in a shared location (NFS, HDFS etc.) if you are configuring multiple leads for high availability. The same jar is used for DDL replay while the standby lead becomes the active lead.
After defining a UDF you can bundle the UDF class in a JAR file and create the function by using ./bin/snappy-sql
of SnappyData. This creates a persistent entry in the catalog after which, you use the UDF.
CREATE FUNCTION udf_name AS qualified_class_name RETURNS data_type USING JAR '/path/to/file/udf.jar'
For example:
CREATE FUNCTION APP.strnglen AS some.package.StringLengthUDF RETURNS Integer USING JAR '/path/to/file/udf.jar'
You can write a JAVA or SCALA class to write a UDF implementation.
Note
For input/output types: The framework always returns the Java types to the UDFs. So, if you are writing scala.math.BigDecimal
as an input type or output type, an exception is reported. You can use java.math.BigDecimal
in the SCALA code.
Return Types to UDF program type mapping
SnappyData Type | UDF Type |
---|---|
STRING | java.lang.String |
INTEGER | java.lang.Integer |
LONG | java.lang.Long |
DOUBLE | java.lang.Double |
DECIMAL | java.math.BigDecimal |
DATE | java.sql.Date |
TIMESTAMP | java.sql.Timestamp |
FLOAT | java.lang.Float |
BOOLEAN | java.lang.Boolean |
SHORT | java.lang.Short |
BYTE | java.lang.Byte |
Use a User Defined Function¶
If you try to use a UDF on a different type of column, for example, an Int column an exception is reported.
Drop the Function¶
For example:
Modify an Existing User Defined Function¶
1) Drop the existing UDF
2) Modify the UDF code and create a new UDF. You can create the UDF with the same name as that of the dropped UDF.
Create User Defined Aggregate Functions¶
SnappyData uses the same interface as that of Spark to define a User Defined Aggregate Function org.apache.spark.sql.expressions.UserDefinedAggregateFunction
. For more information refer to this document.
Known Limitations¶
-
In the current version of the product, setting schema over a JDBC connection (using the
set schema
command) or SnappySession (usingSnappySession.setSchema
API) does not work in all scenarios. Even if the schema is set, the operations are occasionally performed in the defaultAPP
schema. As a workaround, you can qualify the schemaname with tablename. For example, to select all rows from table 't1' in schema 'schema1', use queryselect * from schema1.t1
-
In the current version of the product, user defined functions are not displayed when you run the SHOW FUNCTIONS command in SnappyData shell. This will be available in the future releases.