Snowflake on a Shoestring Budget

The proverbial wagon is packed and your organization has decided to move on from the current, on-premise data warehouse. Perhaps the DBMS software is too old or expensive. Perhaps there are constant complaints about the performance.  Perhaps the data in the current warehouse is corrupt and inconsistent and too few business users trust it.  Unfortunately, the reality is that the hardware, licensing cost and employee training to improve the on-premise warehouse presents an insurmountable cost to your organization.

However, there is hope. You’ve discovered, researched and and gained approval to implement a cloud warehousing service from Snowflake. If you could simply wave a magic wand and implement your warehouse into Snowflake instantaneously, many of the current issues plaguing your organization because of your on-premise warehouse would be solved. While you have budget approved for Snowflake itself, there isn’t much room for the necessary surrounding infrastructure.  Is there a way to implement and run Snowflake in a cost effective manner to push past this final barrier for entry to a stable and robust data-driven future?

Yes there is! Here are various tips, tricks and ideas that will help to minimize cost and springboard your warehouse improvement process forward.

If you’ve already explored Snowflake, this tip won’t be surprising, but it is worth repeating – Snowflake is extremely scalable. This means that you only pay for the horsepower necessary and experience no charge during periods of inactivity.   Therefore, it is fairly simple to find the right balance of power and cost, as a warehouse level change is in effect almost instantaneously.  Also, the unlimited number of SQL processing artifacts, named virtual warehouses in Snowflake, allow clear distinction between consumer groups of your warehouse, allowing for query costs to be born by those who actually incurred them.  This allows those who want faster responses or more frequent loads to bear financial responsibility for those needs.  Make sure users who want more from the Snowflake warehouse understand their options and the costs associated.

A good portion of your Snowflake expenditures occur when the Virtual warehouses are active. A decision to load data frequently can quickly consume credits and cost more than expected. A thoughtful load approach, including careful evaluation around what data has to be loaded near real-time can prevent billing surprises and keep cost to a minimum.

If you are concerned about the cost of an ETL tool to load the warehouse, there are plenty of open source tools available in relatively stable states. My ETL/Scheduler preference for Snowflake is Apache Airflow, listed at Open Source ETL Tools Comparison. It is relatively easy to install and customize, assuming a user with an average level of Python capabilities. It can run in the cloud or an on-premise server and can easily connect to Snowflake via Snowflake’s python connector. You can limit the amount of processing power and memory required by the Airflow server by using it as a Scheduler and avoid processes or logic that require the data to pass through the server. Examples would be Snowflake’s COPY INTO functionality or activating an FTP process that between a source and AWS s3.  Additionally, Astronomer has a hosted cloud solution for Airflow that is relatively inexpensive, scalable and provides the Airflow management for you.

Since Snowflake was created to look and feel similar to traditional data warehousing and the on-premise DBMS systems of the past, IT staff versed in those systems can easily transition to build and support Snowflake. If your staff is unable to learn Snowflake without external paid training, you may want to consider if you have the right folks on this project.

Rather than trying to figure out how to convert all your data, pick a high-impact domain within your current warehouse as a springboard project for your organization’s Snowflake implementation. Look for a domain with as few data sources as possible, but with data consumers who would feel the most impact from a reliable, performing warehouse. This prototype will reveal Snowflake’s value without too much cost, and will generate positive buzz within your organization that can accelerate adoption and additional financial commitment.

No need to purchase SQL developer or DBMS management software to support the Snowflake instance. Snowflake provides a web interface to your databases and artifacts, with all the admin functionality presented in a graphical way. The Snowflake UI also provides a query execution window, but mainly for simple access. Open source DBeaver is a robust database tool that supports all the connecting and queries necessary to develop and load Snowflake, via a JDBC connection.

David Schoel

Principal Architect