Using Virtual Tables in Foundry

Why Virtual Tables matter

Virtual Tables are a new addition to Foundry that let you access data in source systems without needing to transfer that data to Foundry.

This is a big deal for a couple reasons:

First, Virtual Tables make integrating Foundry into existing IT architectures much easier because you can leverage existing data warehouse investments without having to replicate either data or logic in Foundry.
Second, Virtual Tables offers a shortcut for getting started with Foundry’s Ontology and AIP capabilities faster than you otherwise could. It does this by letting you use external systems as the backing data for the Ontology and AIP functions instead of Foundry datasets.
Additionally, Virtual Tables make certain data engineering or analytic tasks more performant and potentially cheaper.

In this tutorial we cover how you use Virtual Tables, how they’re different from Datasets, and when you should consider using them.

And, as a reminder, if you want to learn more about Foundry or AIP than what we cover here and in our other public tutorials, you can schedule a live in-person or virtual training session at ontologize.com.

Prefer to watch? Check out the video version of this tutorial on YouTube.

How to use Virtual Tables

First, let’s cover how to get set up with Virtual Tables and use them in Foundry applications.

To start, you need a Source in Data Connection. If you’re not familiar with Data Connection, I’ve made a 10min intro tutorial that covers all the essentials. Here, I’ve got a Snowflake database that contains a variety of tables.

Exploring the tables in a Snowflake database from Data Connection

Normally, to get one of these Snowflake tables into Foundry we would create a Sync, which would produce a Dataset that contains a copy of the data in that Snowflake table. However, what we’ll do instead is create a Virtual Table, which produces a live view of the Snowflake table without transferring any data. I can do that either from the Overview tab or from the Virtual Tables tab.

It prompts me to pick the table and if I want to save it in a different folder than the default for this source, I can. I click Create, and then I can start using this data in Foundry.

A Virtual Table looks a lot like a Dataset, except with a pared back set of options — there’s no version history because it’s a live view of the data; there are no branches because the source systems don’t support git-based version control semantics; and so on. But apart from the simpler interface, Virtual Tables look and feel a lot like Datasets.

For instance, one of the things I can do is analyze this table in Contour. If you’re not familiar with Contour, it’s Foundry’s large scale point-and-click analytics and dashboarding application, and I’ve got a comprehensive beginner tutorial on that too (written | video).

Now, even though this data doesn’t reside in Foundry, all of Contour’s boards still work. For example, I can use an Expression board to cast the Date column from a string to a date. And I can then plot how many people entered the Ontologize Theme Park each day over time.

A Contour analysis using a Virtual Table instead of a dataset

To Contour users, this should feel just like working on a regular dataset and, depending on your data scale and how you’ve got your source system configured, it may even be more performant.

What’s happening here is that instead of running this logic in Foundry, the computation is being pushed down to the source system. In this case, that means using the designated Warehouse in Snowflake to run the query and return the results. Now, in the case of source systems that lack a compute layer, such as Azure Blob Storage, Foundry will handle the compute for those Virtual Tables.

In addition to Contour, Virtual Tables are also compatible with Foundry’s point-and-click data engineering tool, Pipeline Builder. And if you’re not familiar with Pipeline Builder I also have a free tutorial on that too.

Here, we have what I suspect will be the most common way of using Pipeline Builder with Virtual Tables. It’s so simple we can hardly call it a pipeline — all that’s happening is that I’m using a Virtual Table as input and I’m indexing that data into the Ontology in order to create a new Object Type. This how you would use Virtual Tables to quickly start using your existing data warehouse with Foundry’s Ontology. And, since AIP works with object types, links, and actions in the ontology, a “pipeline” like this is all it takes to get started building with AIP.

A simple "pipeline" that outputs a new object type

Even though I expect most people will use Virtual Tables to back object types and links in the ontology, you can also use them in more complicated pipelines that process the data, join it to other virtual tables or datasets, and do whatever data engineering tasks you normally would perform on a dataset. Again, take a look at my tutorial in Pipeline Builder for more details about what that would look like.

How Virtual Tables Differ From Datasets

It’s pretty clear that Virtual Tables offer many of the same capabilities as datasets for the purposes of data engineering, ontology design, and analytics. But how do they differ?

To reiterate, the main difference is that Datasets store their data in Foundry and Virtual Tables are live views into the data stored in external source systems. As such, Datasets offer additional capabilities not available in those source systems, such as supporting the same git-based version control semantics that govern code and logic throughout Foundry.

The other big difference has to do with which Foundry applications support Datasets and Virtual Tables. Today, Virtual Tables are not yet supported by all applications. For example, although you can use Virtual Tables in Contour, you can’t save the analysis as a Dataset like you normally can. Additionally, at the time of recording this tutorial, using Virtual Tables in Python Transforms is still a work in progress. For a full, up-to-date application compatibility matrix, check out the Foundry documentation.

Finally, one difference that will be a bit harder to plan for is how the allocation of costs will change if you use Virtual Tables in lieu of Datasets. When I use Contour to analyze a Virtual Table stored in Snowflake, the compute costs will be incurred by the relevant Snowflake Warehouse. Understanding how this may change how much you’re charged for will depend on your specific data, how you’ve configured the compute engines of the external systems, and other factors such as how much usage different tables receive.

One usage pattern that should be pretty easy to reason about is if your source system imposes egress fees. In that case you’ll want to consider whether regular sync’d datasets would be a more cost-effective method of giving Foundry users access to that data.

When to use Virtual Tables

Today, Virtual Tables are available for a limited number of source systems — Google BigQuery, Snowflake, AWS S3, and Azure Blog Storage. I’m sure more will come in the future.

The question is, if using Virtual Tables is an option for you, when should you use them in lieu of Datasets?

The quick answer is that, at least for Ontologize, we're going to use Virtual Tables by default whenever possible.

One reason is that I can auto-register Virtual Tables in Foundry. Instead of registering individual tables from a data warehouse, I can specify that I want Foundry to periodically check for new tables and to automatically create corresponding Virtual Tables in Foundry. This means that my whole external data warehouse is discoverable by default within Foundry.

Another reason is that I get many of the same capabilities that raw datasets give me. I can still create data pipelines, I can still analyze the data, and I can even use them to back Object Types in the Ontology.

Finally, if I need a capability that only Datasets offer, I can either create a simple transform downstream of the Virtual Table, or I could replace the Virtual Table with a Data Connection Sync that updates on a regular basis.

There are other reasons to use Virtual Tables too.

If your data scale is so large that storage costs are a concern, Virtual Tables will help to reduce costs because they don’t transfer data to Foundry
If your IT architecture plan is strict about where data transformation logic resides or where compute is run — and that’s not in Foundry — then Virtual Tables let you deploy Foundry in a way that sticks to that architecture plan

Conclusion

All-in-all, Virtual Tables are a welcome addition to Foundry’s capabilities and will make the platform a lot more appealing to organizations that already enjoy well-organized data warehouses.

If you have any additional questions about Virtual Tables or want to learn more about our live trainings for Foundry & AIP, feel free to reach out through our contact form at ontologize.com.