ETL (Extract, Transform and Load) tools and software are powerful information management products designed to move and convert data into target repositories for analysis and usage. TechRepublic’s ETL Top Products page covers the concept in depth including business-use cases and product examples.
There is a diverse array of ETL tools on the market to cover a broad set of data manipulation needs. Two prominent ETL software products that can be considered among the best ETL tools are Firebolt and Snowflake.
What is Snowflake?
Snowflake is a cloud data platform available on AWS, Azure and Google Cloud. It works with data integration tools such as Informatica, Talend, Fivetran, Matillion and others. In fact, it can be integrated with over 140 data sources, data analysis and business intelligence platforms such as Alooma, Sisense, Datom and DBschema.
Snowflake relies on the concept of warehouses, which are clusters of compute resources made up of nodes that provide memory, storage and CPU (note that you cannot actually tune node types, only overall warehouses). It can also operate as a data lake, which is a store of unprocessed raw data in its native formats.
According to the developers, when Snowflake is used as both a data lake and a data warehouse, there is no need for the ETL process “as no pre-transformations or pre-schemas are needed.”
Scalability is a strong point for Snowflake. It offers auto-scaling options that adjust cluster operations as processing demand requires. It can also auto-suspend idle clusters to optimize performance and cost, since resource utilization is a driving factor in data management.
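As a rough illustration of the auto-suspend and auto-scaling settings described above, the following Python sketch assembles a Snowflake CREATE WAREHOUSE statement as text. The helper name build_warehouse_sql, the warehouse name and all of the values are hypothetical choices for this example. Nothing is sent to Snowflake, and multi-cluster settings may depend on the Snowflake edition in use.

```python
# Subset of Snowflake warehouse sizes accepted by this sketch.
ALLOWED_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def build_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=300,
                        min_clusters=1, max_clusters=3):
    """Build a CREATE WAREHOUSE statement with auto-suspend and auto-scaling.

    All defaults are illustrative assumptions, not recommendations.
    """
    # Reject names that are not plain identifiers so the SQL stays safe.
    if not name.isidentifier():
        raise ValueError(f"unsafe warehouse name: {name!r}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size!r}")
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")

    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' "
        # Suspend the warehouse after this many idle seconds.
        f"AUTO_SUSPEND = {int(auto_suspend_seconds)} "
        # Resume automatically when a new query arrives.
        f"AUTO_RESUME = TRUE "
        # Let Snowflake add or remove clusters between these bounds.
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters}"
    )


print(build_warehouse_sql("reporting_wh"))
```

The resulting statement could be pasted into the Snowflake web interface or run through SnowSQL. A shorter auto-suspend interval trades some resume latency for lower idle cost.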
Job processing, where data is actually manipulated and worked with, is measured in a standard unit of hours per day.
Main features of Snowflake
Snowflake documentation lists the key features as follows:
- Security, governance and data protection: This function controls access to data through strong authentication mechanisms, TLS security, granular access policies, isolation of data and disaster recovery.
- Standard and extended SQL support: SQL support in Snowflake is robust, including compatibility with SQL standards dating back to 1999.
- Tools and interfaces: Snowflake offers a web-based GUI and SnowSQL, a Python-based command-line client. Warehouses can be managed from either interface.
- Connectivity: Because connectivity is integral to data access and management, Snowflake provides connectors and drivers for Python, Spark, Node.js, Go, .NET, JDBC, ODBC and PHP PDO.
- Data import and export: Snowflake supports a broad array of data import and export mechanisms, covering data in supported character encodings, compressed files, local or cloud storage files, and the CSV, TSV, JSON, Avro, ORC, Parquet and XML formats.
- Data sharing: This function entails sharing information with other Snowflake accounts and consuming data that those accounts share with you.
- Database replication and failover: Snowflake protects databases with this option to replicate and sync databases among Snowflake accounts in the same or different regions.
SEE: Snowflake data warehouse platform: A cheat sheet (TechRepublic)
Snowflake dashboards let you monitor user activity per warehouse, per database and over time.
Starter templates allow you to easily monitor elements such as performance out of the box as well.
Price: Snowflake offers a free trial with $400 worth of usage. Its pricing model is complex due to the numerous functions involved with storing and processing data across different organizational sizes and requirements. This is standard for ETL software and ETL tools for big data. In summary, however, cost is based on the storage and resources used.
The pricing guide covers these details and includes two helpful pricing examples. In the first, a small company with eight users with similar needs, storing 5TB of compressed data and working over a 10-hour daily time slot, would pay $22,878 per year. In the second, a larger company with 17 users with differing needs, storing 65TB of compressed data and working over an 11-hour daily time slot, would pay $118,807.20 per year.
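To make the "storage plus resources used" idea concrete, here is a minimal Python sketch of a consumption-based cost estimate. Every rate in it (storage price per TB, credits per hour, price per credit, working days per year) is a placeholder assumption chosen for illustration, not a Snowflake list price, so the result is not expected to match the figures in the pricing guide.

```python
def estimate_annual_cost(storage_tb, storage_rate_per_tb_month,
                         credits_per_hour, hours_per_day, days_per_year,
                         price_per_credit):
    """Estimate yearly cost as storage cost plus compute cost.

    All rates are caller-supplied assumptions; none are vendor prices.
    """
    # Storage is billed per TB per month in this simplified model.
    storage_cost = storage_tb * storage_rate_per_tb_month * 12
    # Compute is billed per credit; credits accrue per running hour.
    compute_cost = (credits_per_hour * hours_per_day
                    * days_per_year * price_per_credit)
    return storage_cost + compute_cost


# Hypothetical inputs: 5TB stored, a warehouse running 10 hours a day.
cost = estimate_annual_cost(
    storage_tb=5,
    storage_rate_per_tb_month=25.0,  # placeholder rate
    credits_per_hour=2,              # placeholder warehouse consumption
    hours_per_day=10,
    days_per_year=260,               # assumes weekdays only
    price_per_credit=3.0,            # placeholder rate
)
print(f"Estimated annual cost: ${cost:,.2f}")
```

With these placeholder inputs, compute dominates the total, which is why features such as auto-suspend matter for cost control.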
Snowflake also provides a handy Compute Cost interface to help you assess cost by warehouse and over time.
What is Firebolt?
Firebolt is a cloud data warehouse that runs on AWS and competes with Snowflake. It integrates with the Looker, Tableau and Sisense business intelligence platforms. Its developers promote the product’s speed and performance, claiming sub-second analytics processing. According to database architect Robert Meyer, “Firebolt has been up to 182x faster than any alternatives. One customer achieved 3x faster performance and 10x lower cost, or a 30x price-performance advantage compared to their Snowflake deployment.”
Meyer also aggressively touted Firebolt’s advantages in the realm of superior data manipulation such as “ad hoc analytics, large complex queries against massive data sets, semi-structured data queries and streaming analytics or continuous ingestion.”
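The 30x figure in Meyer's quote can be read as a simple product, assuming price-performance means work done per unit of cost. The short Python sketch below shows that arithmetic; the function name is hypothetical and the definition is an assumption about how the vendor combined the two numbers.

```python
def price_performance_gain(speedup, cost_reduction):
    """Combine a speedup factor and a cost-reduction factor.

    Assumes price-performance = performance / price, so running
    'speedup' times faster at 1/'cost_reduction' of the price
    multiplies the ratio by speedup * cost_reduction.
    """
    if speedup <= 0 or cost_reduction <= 0:
        raise ValueError("factors must be positive")
    return speedup * cost_reduction


# The customer example from the quote: 3x faster at 10x lower cost.
print(price_performance_gain(3, 10))  # 30
```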
Firebolt allows you to tune individual node types to specify desired storage and resource limits.
Main features of Firebolt
Firebolt documentation lists the following product features, which are generally self-explanatory and similar in form and function to Snowflake’s:
- Access controls/permissions
- Ad hoc query
- Data capture and transfer
- Data dictionary management
- Data extraction
- Data integration
- Data migration
- Data replication
- Data storage management
- Data transformation
- Database conversion
- Database support
- ETL – extract/transform/load
- Mobile access
- Multiple programming languages supported
- Performance analysis
- Third-party integrations
- Workflow management
Firebolt dashboards let you monitor the vitals such as connectors, services, computing, external data and storage.
Price: Firebolt has a free trial available, and its paid pricing model is similar to Snowflake’s in that it is complex and based on resource consumption. Firebolt’s pricing page also includes subjective examples, referencing one customer paying $3.616 per hour for 2.36 use of data across 64 virtual CPUs and 256GB of RAM. Compute costs start at less than $1/hr, and the base storage cost is $3/hr for “as much data as you need,” with average data consumption being ~23TB.
How to decide between the two data warehousing platforms
Snowflake appears to be the more general-purpose product, suited to a common array of needs where performance is less critical than getting results from the data. It seems a good fit for smaller shops with standard requirements. It is also more widely available, running on AWS, Azure and Google Cloud, and it can integrate with many more data sources and BI platforms than Firebolt.
Firebolt’s strengths lie in performance and flexibility. Third-party reviews such as Hevodata.com report that Firebolt is faster than other providers, including Snowflake. Firebolt uses indexing to speed up queries, whereas Snowflake does not. Firebolt also lets you tune individual node types, whereas Snowflake limits you to tuning warehouses only.
Firebolt was developed for AWS, which is an important contrast with Snowflake, available on AWS, Azure and Google Cloud.
Larger shops, or businesses with a more diverse set of needs that depend on rigorous, detailed data massaging and rapid analytical results, would likely fare better with Firebolt, which also seems to have a more forgiving pricing structure.