Write a Scala UDF in Snowflake with BlackDiamond Studio
by Luis Alonso Montero, on Sep 6, 2022 9:04:12 AM
Snowflake provides a set of built-in functions which can be used to manipulate your data. (In Snowflake's documentation, you can find a big list that describes all of those functions.) There will be situations in which you will want to have User-Defined Functions (also known as UDFs). In Snowflake, you can define those custom functions using any of the following programming languages:
In this tutorial, you will learn how to create a User Defined Function using Scala. If you're not familiar with Scala, it is a functional, object-oriented, statically typed programming language that leverages the power of the Java Virtual Machine. Scala also has seamless interoperability with Java, which allows us to call code from Java libraries in our Scala code. This opens up a lot of possibilities, since Java has a rich ecosystem with lots of useful and mature libraries.
So let's get started writing our own UDF in Scala!
What will our UDF do?
Let's say (just for didactic purposes) that we want a function that will tell us if a string is a palindrome or not. If you are not familiar with the concept of palindromes, a palindrome is a string that reads the same backwards as it reads normally. For instance, the word "radar" is the same even if you reverse the string, which makes it a palindrome.
If you take a look at that same list of built-in functions that Snowflake provides and start searching for such a function, you will not find it (at least, I did not find any function with "PALINDROME" in its name). However, writing an "IS_PALINDROME" function in Scala will not be very complicated. In the following code snippet, we define two functions "reverse" and "is_palindrome". We use "reverse" in "is_palindrome". We should test if this works.
Creating a new BlackDiamond Studio project
We should run this code to test that it works the way we want. We will need to compile our Scala code, run it, and be able to interact with our Snowflake account to create the UDF in that account. BlackDiamond Studio can help us by providing an environment where we can do all that without installing anything. Let's go to the main BlackDiamond Studio page (bds.mobilize.net) and log into our account, or you can create a new account if you don't already have one.
After logging into our account, we will be able to create a new project. We just need to click on the "Add" button to create a new project.
You will be prompted to give your project a name and then continue.
After that, you must choose between using an already existing repo (in your GitHub account, if you have one) or a new repo that will be hosted by BlackDiamond Studio Git provider (powered by GitLab). I would advise to use BlackDiamond Studio git provider for this scenario, so that you do not need to create a new repo on GitHub.
Now, you will be prompted to connect this project to your Snowflake account. You can connect later, but for the purpose of this tutorial, it will be easier to just connect your account now, so click on “Yes, I want to connect to my Snowflake account” and fill the form with the corresponding data. If you are not sure on how to fill one of the boxes, just click on the box and the website will give you some information about the value that you should enter there.
When you have finished filling out the form, you can click on “Connect Database”. If everything went well, the website would tell you so and the "Continue" button will be enabled. Click on “Continue”. The final step to create your project is choosing if you want to use a template, upload some code, start from scratch or extract code from your Snowflake account. Let's click on “Start with a template” and choose the “Basic Scala template”, which will help us get started. Now, click on the “Finish Setup” button and your project will be ready.
You will be redirected to your project's page, and now you can click on the “Open IDE” button to open the online IDE.
Creating the Scala UDF
Now that we have our IDE open, we can add a new file StringUtils.scala to our existing Scala project. Let's create that new file in the folder src/main/scala with this code:
We should test that this code works. Let's modify the Main.scala file that is next to the StringUtils.scala. Replace the existing code with this content:
We should compile this and check that it works. We can go to the “Metals” extension tab by clicking on the "M" on the activity bar on the left side.
Let's click on “Import build” and the compilation of our project will begin.
When the compilation is finished, in the Main.scala file, you will see two buttons that read “run” and “debug”, respectively.
You can click on "run". When you run the code, you should get an output just like this:
This confirms that the code is working correctly, since the palindrome "racecar" was identified as a palindrome and "racebike" was not.
Next, we should create the UDF in our Snowflake account. You can right-click on the build.sbt file and click on “BlackDiamond: Export Scala UDF”.
This will create a template file: target/UDF-Export-Snowflake-Scala.sql.
Now you can replace the placeholders with the actual values that you want to use.
You do not need to replace all the placeholders, there are some placeholders that should just be filled out when creating a stored procedure: the ones in section 3.1 of the template. You can delete that entire section since it is not needed.
I will describe what each placeholder should be replaced with:
- <database-name>: The name of the database in which you will create the UDF. This database should already exist on your account.
- <schema-name>: The name of the schema in which you will create the UDF. This schema should already exist on the previously specified database.
- <stage-name>: The name of the stage to which your packaged Scala code will be uploaded.
- <package>: The name of the package in which the function is declared. In this case, it would be “MyUtils”.
- <class-name>: The name of the class in which the function is declared. In this case, it would be “StringUtils”.
- <function-name>: The name of the function you wrote. In this case, it would be “is_palindrome”.
- <function-parameters>: The names and types of the parameters for the function. In this case, this value should be “text STRING”, since there is only one parameter of type “String” and its name is "text".
- <function-return-type>: The return type for the function. In this case it would be “BOOLEAN”, the equivalent Snowflake data type for Scala's “Boolean” type.
Here is a screenshot of the last part of the template, with the placeholders replaced by the actual values.
When you are finished replacing the placeholders with the actual values, you can right-click on the template file and choose “BlackDiamond: Deploy”, which will cause the SQL script to be run with the credentials you provided while setting up the environment.
When this script finishes running, the UDF will be created in your Snowflake account, in the database and schema you specified. You can test it by running the following commands in the Snowflake console:
You can also run these same commands from BlackDiamond Studio, by writing that code in an SQL file, selecting each statement separately, right clicking and choosing “Run Selected Query”.
You have successfully written your own UDF using the Scala programming language. For more information about this topic, read Snowflake's official documentation about the creation of UDFs.