The landscape of data management is constantly evolving, with organizations facing the challenge of integrating, processing, and analyzing increasingly diverse and extensive datasets. Snowflake’s AI Data Cloud offers a modern, flexible approach to these challenges, providing a unified platform for a wide range of data workloads. Based on a recent technical overview, this blog post explores key features of Snowflake, covering everything from getting data into the platform to leveraging advanced AI capabilities and sharing data securely. We will delve into how Snowflake empowers users to streamline data operations and extract valuable insights.

Loading your Data into Snowflake

Snowflake provides flexible and efficient methods for data loading, accommodating various data sources and formats. A common approach involves first placing data into a stage. A stage acts as a storage location for files from which data can be loaded into Snowflake tables. Stages can be internal to Snowflake or external, pointing to your existing cloud storage such as AWS S3, Azure Blob Storage, or Google Cloud Storage. This external staging capability is particularly useful for working with data already sitting in cloud data lakes.
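
As a minimal sketch, a stage can be created with a short DDL statement. The bucket path and the storage integration name below are hypothetical placeholders:

```sql
-- An external stage pointing at existing S3 storage
-- (bucket path and integration name are placeholders).
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/raw-data/'
  STORAGE_INTEGRATION = my_s3_integration;

-- An internal stage, fully managed by Snowflake.
CREATE STAGE my_internal_stage;
```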

Snowflake supports loading data from numerous file formats into stages, including structured, semi-structured, and even unstructured types. Supported formats include CSV, JSON, ORC, Avro, Parquet, and XML. The COPY INTO command is used to move data from a stage into a Snowflake table. This SQL command also allows for specifying details about the file format to ensure correct ingestion. For instance, you can define the field delimiter, handle header rows, and specify date formats.

Snowflake also offers options for managing files after loading them. The COPY INTO command has a PURGE option, which automatically removes the source files from the stage after successful ingestion, helping to manage storage space. If you prefer a visual interface, Snowflake’s Snowsight UI includes a data load wizard. This wizard guides users through the process of uploading files to a stage and loading them into a table, providing a user-friendly experience for ingesting data without writing code.
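
To illustrate both points, the sketch below loads CSV files from the stage created earlier into a hypothetical sales table, skipping the header row and purging the source files after a successful load; the table name, path, and format options are illustrative:

```sql
-- Load CSV files from a stage into a table, specifying the file format.
COPY INTO sales
  FROM @my_s3_stage/sales/
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1 DATE_FORMAT = 'YYYY-MM-DD')
  PURGE = TRUE;  -- remove source files from the stage after a successful load
```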

Managing your Data Effectively

Snowflake offers built-in features like Time Travel and Zero Copy Cloning that provide robust capabilities for data recovery, historical analysis, and streamlining development and testing workflows. These features are core to Snowflake’s design philosophy of providing simplicity for the user while managing complexity in the backend. They significantly improve data resilience, enable agile development workflows, and reduce infrastructure costs associated with data duplication and backups.

Time Travel is a feature that allows you to access historical versions of your data. You can query data as it existed at a specific timestamp, or as it was before a particular SQL statement was executed, identified by its query ID. This is incredibly useful for recovering data that was accidentally modified or dropped. Snowflake retains historical data for a configurable period (one day by default, and up to 90 days on Enterprise Edition and above), providing a safety net against unintentional data loss or malicious activity. Querying data from a specific point in the past uses simple syntax, which also makes it easy to compare data across different time periods.
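
As a rough illustration, Time Travel is exposed through the AT and BEFORE clauses; the table name and query ID below are placeholders:

```sql
-- Query the table as it looked one hour ago.
SELECT * FROM orders AT(OFFSET => -60*60);

-- Query the table as it was just before a specific statement ran.
SELECT * FROM orders BEFORE(STATEMENT => '01a2b3c4-0000-1111-2222-333344445555');

-- Recover a table dropped within the retention window.
UNDROP TABLE orders;
```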

Zero Copy Cloning is another powerful feature for data management. It allows you to create instant, writable copies of databases, schemas, or tables without physically duplicating the data. This capability is revolutionary for creating development, testing, or UAT environments quickly and cost-effectively. Clones are independent of the original object; modifications in the clone do not affect the source, and vice versa. Only when data is modified in either the original or the clone does new storage get consumed by the changes. Zero Copy Cloning can also be combined with Time Travel, allowing you to create a clone of a database or table as it existed at a specific point in the past. This provides immense flexibility for recreating historical states for analysis or development purposes.
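
A minimal sketch, assuming a hypothetical analytics database and orders table:

```sql
-- Clone an entire database for development; no extra storage is consumed
-- until either copy is modified.
CREATE DATABASE analytics_dev CLONE analytics;

-- Combine cloning with Time Travel: clone a table as it existed 24 hours ago.
CREATE TABLE orders_yesterday CLONE orders AT(OFFSET => -24*60*60);
```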

Handling Diverse Data Types

Snowflake excels at handling diverse data types alongside traditional structured data, allowing organizations to derive value from all their data assets. Snowflake provides native support for working with both semi-structured and unstructured data. This is crucial, as a large percentage of the data organizations generate falls into these categories and often goes underutilized.

For semi-structured data formats like JSON, Avro, Parquet, ORC, and XML, Snowflake provides a specialized VARIANT data type. Unlike systems that require a distinct type definition for each format, Snowflake’s VARIANT type automatically recognizes, compresses, and optimizes the storage of this data upon ingestion. This simplifies loading and querying semi-structured data, removing the complexity of managing multiple data type definitions that comparable platforms often impose.
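
As a brief sketch (the table and field names are hypothetical), JSON loaded into a VARIANT column can be queried with simple path notation and cast to standard SQL types:

```sql
-- Store raw JSON documents in a single VARIANT column.
CREATE TABLE raw_events (payload VARIANT);

-- Extract nested fields with path notation and cast them to SQL types.
SELECT
  payload:user.id::NUMBER         AS user_id,
  payload:event_type::STRING      AS event_type,
  payload:properties.page::STRING AS page
FROM raw_events;
```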

For unstructured data, which includes files such as PDFs, images (including DICOM medical images), videos, audio files (such as call-center recordings in WAV, MP4, or MP3 formats), and more, Snowflake leverages the concept of stages. As mentioned earlier, stages can point to internal Snowflake storage or to external cloud storage locations where your unstructured data resides. This allows you to work with unstructured data without necessarily loading it into a structured table within Snowflake.

Moreover, Snowflake offers directory tables to interact with files stored in stages. A directory table provides a structured view of the files in a stage, exposing metadata such as file names, sizes, and URLs. This enables you to query information about your unstructured files using SQL. Directory tables are a building block for developing applications and solutions that interact directly with files in storage. Snowflake also provides pre-signed URLs, which are temporary, secure links that allow direct access to files in a stage, useful for building public applications or enabling downloads.
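
A hedged example, assuming a stage named doc_stage that already holds files:

```sql
-- Enable and refresh the directory table on the stage.
ALTER STAGE doc_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE doc_stage REFRESH;

-- Query file metadata with plain SQL.
SELECT relative_path, size, last_modified
FROM DIRECTORY(@doc_stage);

-- Generate a temporary pre-signed URL (here valid for one hour) for a file.
SELECT GET_PRESIGNED_URL(@doc_stage, 'reports/q1.pdf', 3600);
```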

Snowflake also allows you to process and analyze unstructured data by integrating with external services through external functions. This enables scenarios like performing sentiment analysis on audio recordings or facial recognition on images by calling third-party services within Snowflake.
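
As a sketch, an external function is declared against an API integration and a remote endpoint (both hypothetical here) and can then be called like any other SQL function:

```sql
-- Declare an external function backed by a remote sentiment-analysis service.
-- The API integration name and endpoint URL are placeholders.
CREATE EXTERNAL FUNCTION analyze_sentiment(text STRING)
  RETURNS VARIANT
  API_INTEGRATION = my_api_integration
  AS 'https://example.com/sentiment';

-- Call it inline, e.g. over transcribed call-center recordings.
SELECT call_id, analyze_sentiment(transcript) FROM call_transcripts;
```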

Integrating AI and ML using Cortex AI

Snowflake is heavily investing in bringing Artificial Intelligence (AI) and Machine Learning (ML) capabilities directly into its AI Data Cloud under the brand name “Cortex AI”. This integration allows users to perform advanced analytical tasks on their data without needing to export it to separate AI/ML platforms, simplifying data and AI/ML workflows and enhancing data governance.

Cortex AI provides a range of functionalities catering to different levels of expertise. For users looking to quickly leverage AI without deep technical knowledge, Snowflake offers numerous built-in functions. These functions can perform common tasks such as translating text, summarizing documents, completing text sequences, and generating embeddings. These functions can be easily invoked within Snowflake using SQL or other supported languages like Python – even in notebooks.
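
A few hedged examples of these built-in functions in SQL; the model names shown are assumptions (availability varies by region), and the product_reviews table is hypothetical:

```sql
-- Translate, summarize, complete, and embed text directly in SQL.
SELECT SNOWFLAKE.CORTEX.TRANSLATE('Guten Morgen', 'de', 'en');
SELECT SNOWFLAKE.CORTEX.SUMMARIZE(review_text) FROM product_reviews;
SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3.1-8b', 'Write a one-line tagline for a data platform.');
SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', review_text) FROM product_reviews;
```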

In addition, Cortex AI supports the use of various foundational models directly within the platform. This includes hosting models like Llama 4, DeepSeek, and models from Anthropic. Users have the flexibility to utilize these pre-integrated models or bring their own custom models into Snowflake and potentially fine-tune them on their specific datasets. This capability allows organizations to tailor AI models to their unique business needs while keeping the data within the secure Snowflake environment.

Cortex AI also includes specialized tools like Cortex Search for performing searches within your data stored in Snowflake. Another tool called Cortex Analyst assists users in creating LLM-powered applications based on structured data inside Snowflake, e.g., a chatbot for support teams.

Cortex AI’s solution for working with unstructured documents is Document AI, a competitor to Amazon Textract. Document AI enables the extraction of structured information from unstructured documents such as PDFs, TIFFs, and other image-based files. This is particularly valuable for organizations with large archives of documents, like geological surveys or invoices, allowing them to automate the extraction of key data points and populate tables within Snowflake. Document AI provides a way to apply AI directly to your document corpus, transforming unstructured information into actionable data.
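
As a rough sketch, once a Document AI model build has been trained it can be applied to staged files; invoice_model, the stage, and the version number below are illustrative assumptions:

```sql
-- Run a trained Document AI model build over files tracked by a directory table.
SELECT
  relative_path,
  invoice_model!PREDICT(GET_PRESIGNED_URL(@doc_stage, relative_path), 1) AS extracted
FROM DIRECTORY(@doc_stage);
```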

Snowflake’s AI/ML Studio is the graphical interface that provides a dedicated space for working with these capabilities, including fine-tuning custom LLMs, using Cortex Analyst and Cortex Search, and leveraging features like forecasting, anomaly detection, and classification within a single, integrated platform.

Gaining Insights through Observability

Snowflake provides robust observability features that offer deep insights into usage and query execution. At the core of Snowflake’s observability is the Information Schema. This schema, based on the ANSI SQL standard, provides metadata about the objects within each database, such as tables, columns, and row counts. Snowflake enhances the standard Information Schema with additional columns, views, and functions to provide more detailed telemetry and usage information for observability.
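
For instance, table-level metadata can be queried directly; the schema filter below is illustrative:

```sql
-- Inspect table metadata for the current database.
SELECT table_name, row_count, bytes
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'PUBLIC'
ORDER BY bytes DESC;
```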

In addition, Snowflake maintains a comprehensive query history, tracking all queries executed against the platform for up to 365 days. This history includes vital details for each query, such as its unique identifier, start and end times, duration, and execution status. This is invaluable for troubleshooting issues, analyzing performance trends, and meeting security and auditing requirements.
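
A quick sketch of both access paths: the Information Schema table function covers recent activity, while the ACCOUNT_USAGE view retains the longer 365-day history.

```sql
-- Recent queries via the Information Schema table function.
SELECT query_id, query_text, start_time, total_elapsed_time, execution_status
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER BY start_time DESC;

-- Longer-term history via the ACCOUNT_USAGE view.
SELECT query_id, warehouse_name, total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -30, CURRENT_TIMESTAMP());
```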

Snowflake also offers the query profile for a deeper understanding of how specific queries are executed. The query profile provides a visual representation of the query execution plan, illustrating steps like table scans, joins, and the order of operations. This detailed view helps SQL developers and performance optimizers identify bottlenecks and opportunities to tune their queries for better performance.

Flexible Compute with Virtual Warehouses

Snowflake’s architecture separates compute resources from storage, offering significant flexibility and scalability. The compute layer is managed through virtual warehouses. A virtual warehouse is essentially a cluster of compute resources dedicated to executing queries and other processing tasks within Snowflake. Users have full control over the size of their virtual warehouses, allowing them to allocate the appropriate amount of compute power for different types of data workloads.

A key benefit of virtual warehouses is their ability to scale automatically. Snowflake can add or remove compute clusters (scaling out and back in) within a multi-cluster warehouse based on workload demand. This elastic scaling ensures that your queries have sufficient compute power during peak periods while also optimizing costs by reducing resources when demand is low.
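
A minimal sketch of such a warehouse; the sizing and cluster limits are illustrative, and multi-cluster scaling depends on your Snowflake edition:

```sql
-- A warehouse that scales out under concurrency and suspends when idle.
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 300   -- seconds of inactivity before suspending
  AUTO_RESUME       = TRUE;

-- Resize on demand for a heavier workload.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';
```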

This separation of compute and storage, combined with the elasticity of virtual warehouses, provides a cost-effective and highly performant foundation for running diverse data workloads on Snowflake.

Building Applications with Streamlit

Snowflake has integrated Streamlit, an open-source Python library for building interactive web applications, directly into its platform. This integration empowers data professionals to move beyond traditional reports and dashboards and build data-driven applications directly within Snowflake.

The Streamlit integration allows developers to create user interfaces and applications using Python, a language widely familiar to data professionals. This enables the creation of custom dashboards, data exploration tools, and other interactive applications directly on the data residing in Snowflake.

Also, Snowflake hosts Streamlit applications within its platform, providing a streamlined deployment and hosting experience. The ability to build applications directly where the data resides simplifies development workflows and reduces the need to move data to external web hosting platforms.
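
As a hedged sketch, a Streamlit app written in Python and uploaded to a stage can be registered with a single SQL statement; the stage path, file name, and warehouse below are placeholders:

```sql
-- Register a Streamlit app whose Python source lives on a stage.
CREATE STREAMLIT sales_dashboard
  ROOT_LOCATION   = '@analytics.public.app_stage'
  MAIN_FILE       = '/streamlit_app.py'
  QUERY_WAREHOUSE = analytics_wh;
```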

Marketplace and Secure Data Sharing

Snowflake Marketplace serves as a hub where organizations can publish, discover, and access a wide variety of data products, including datasets and connectors. This allows data providers to monetize their datasets by making them available to other users, similar to the AWS Marketplace. Beyond curated datasets, the Marketplace also features data-driven applications and compute platforms.

A core component enabling its Marketplace is Snowflake’s secure, zero-copy data sharing capability. This feature allows organizations to securely share live data with other Snowflake accounts without copying or moving data. This is a highly efficient and cost-effective way to exchange data with partners, customers, or even different departments within the same organization.
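
A minimal sketch of sharing a table with another account; all object and account names are hypothetical:

```sql
-- Create a share, grant read access to specific objects, and add a consumer account.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE analytics TO SHARE sales_share;
GRANT USAGE ON SCHEMA analytics.public TO SHARE sales_share;
GRANT SELECT ON TABLE analytics.public.sales TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;
```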

Snowflake supports both public sharing through the Marketplace and private sharing, enabling organizations to share data directly with specific accounts. This is particularly useful for implementing data mesh architectures within an organization, allowing different business units to share data securely and efficiently. The Snowsight interface also provides clear visibility into the data shared with your account and the data your account shares with others, simplifying the management of these exchanges.

In summary, Snowflake’s AI Data Cloud provides a powerful and flexible platform that addresses the modern challenges of data management and analytics. Its comprehensive features, from efficient data loading and robust data protection to advanced AI capabilities and secure data sharing, empower organizations to unlock the full potential of their data and drive innovation.

References

  1. Snowflake [ Snowflake Platform Training (original) (archived) ]

  2. Snowflake [ Snowflake Platform Training - Course Datasheet (original) (archived) ]
