
SnowConvert for Spark
Qualification Tool Download

Inventory and analyze your Spark workload

The SnowConvert for Spark Qualification Tool is designed to help you understand how ready your Spark application is for migration to Snowpark. From any codebase containing Spark Scala or PySpark code, you will get a file inventory, keyword counts, and a Spark reference inventory. Use this information to get started on a move from Spark to Snowpark.

SnowConvert is now a part of Snowflake. To access SnowConvert for PySpark or SnowConvert for Spark Scala, you will need to follow the information available on the documentation page here: https://docs.mobilize.net/snowconvert/general/getting-started/download-and-access.

You can still fill out the form on this page to learn more about Growth Acceleration Partners' Snowflake consulting services. 

Below you will see a summary of the output reports generated by the tool. If you'd like to learn more about your results or how much of your workload can be automatically converted from the Spark API to the Snowpark API, reach out to your Snowflake account representative or contact us at info@mobilize.net.

Output Reports

The assessment tool outputs the following reports:

  • File Inventory - An inventory of all of the files present in the tool's input directory. This covers any file, not just the supported input filetypes listed below. You will get a breakdown by filetype that includes the source technology, code lines, comment lines, and size of the source files.
  • Keyword Counts - A count of all keywords present, broken out by technology. For example, if you have CREATE statements in a SQL file, this report tracks all of them. You will get a count of how many of each keyword you have by filetype.
  • Spark Reference Inventory - Finally, you will get an inventory of every reference to the Spark API present in Scala or Python code. These references form the basis for assessing the level of conversion that can be applied to a given codebase.
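As a rough illustration of what a Spark reference inventory captures, the sketch below counts occurrences of a few common Spark API names in Python source text. The token list and the simple pattern matching are assumptions for illustration only; the actual tool parses the code and covers the full Spark API surface.

```python
import re
from collections import Counter

# Illustrative only: this fixed token list is an assumption, not the
# tool's actual reference catalog.
SPARK_TOKENS = ("pyspark", "SparkSession", "SparkContext")

def spark_reference_counts(source: str) -> Counter:
    """Count occurrences of each Spark-related token in a source string."""
    counts = Counter()
    for token in SPARK_TOKENS:
        hits = len(re.findall(r"\b" + re.escape(token) + r"\b", source))
        if hits:
            counts[token] = hits
    return counts

sample = '''
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("data.csv")
'''
print(spark_reference_counts(sample))
```

Even this simplified count hints at how a readiness assessment works: the more of a codebase's Spark references fall into API areas that Snowpark supports, the more of it can be converted automatically.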

Input Filetypes

The following filetypes and code languages can be input into the assessment tool:

  • Scala (*.scala)
  • Python (*.py and *.ipynb)

Currently, the SnowConvert for Spark Qualification Tool supports Scala Version 2.12 and Python Version 3.8. Other versions can be scanned but may lead to misidentification of keywords and Spark references.

Resources

To learn more about the SnowConvert for Spark Qualification Tool, visit our documentation page on SnowConvert for Spark. To learn more about the strategy for migrating from Spark to Snowpark, review the migration guide put together by Snowflake and Mobilize.Net.

For more information about SnowConvert, you can visit our main page on SnowConvert. For more information about how to use SnowConvert, watch our co-webinar with Snowflake.

Minimum system requirements:

  • Windows: Windows 10, .NET Framework 4.6.2 (runtime), Java JDK 8 or higher, 4 GB of RAM
  • macOS: Catalina 10.15.6, Java JDK 8 or higher, 4 GB of RAM

Privacy and Security

Use of the SnowConvert for Spark Qualification Tool is governed by the End User License Agreement (EULA), Terms of Use, and Privacy Policy provided by Mobilize.Net.

If you have any questions or concerns regarding any of the documents above, please reach out to info@mobilize.net.

Telemetry and Data Collection

The SnowConvert for Spark application requires that the user can run the local installer, either in a container or on a local machine, and that the machine has an internet connection. This connection uses two APIs to perform the following tasks:

  • Validate the license – This happens only once, when the program is first run, and verifies that the input license is valid.
  • Send telemetry data – This sends a few pieces of information back to Mobilize.Net. Everything that is sent is available to review in the output directory generated by the tool. This includes:
    • Execution Summary: This information describes the use of the tool, such as the license string, an execution identification string, and some basic information about the user. You can view this information in the tool_execution.pam file and the Controller-Log-[timestamp].log file in the output directory.
    • Inventories: Two are sent: the keyword inventory and the Spark reference inventory. These are both .csv files that you can view in the output directory (KeywordCounts.csv and SparkReferenceInventory.csv). The keyword counts file can be used to build a “light” assessment of the codebase, and the Spark reference inventory is used to determine the readiness score for a particular workload.
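Because both inventories are written to the output directory, you can review exactly what will be transmitted. A minimal sketch of such a check, assuming only the documented file names (the output directory path and the presence of a CSV header row are assumptions):

```python
import csv
from pathlib import Path

# File names are those documented for the tool's output; the output
# directory location varies by run, so pass it in.
INVENTORY_FILES = ("KeywordCounts.csv", "SparkReferenceInventory.csv")

def summarize_inventories(output_dir):
    """Return {file name: number of data rows}, or None for missing files."""
    summary = {}
    for name in INVENTORY_FILES:
        path = Path(output_dir) / name
        if not path.exists():
            summary[name] = None
            continue
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        summary[name] = max(len(rows) - 1, 0)  # subtract the header row
    return summary

if __name__ == "__main__":
    print(summarize_inventories("./output"))
```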

This application scans a user’s code, but it does not send or transmit any of that code back to Mobilize.Net. Only the summary information and the reports listed above are sent back.

External Connections

SnowConvert does not connect to a source or target database. It is run locally and only transmits the information shown above through the telemetry API.


Learn more about Snowflake with GAP