Skip to main content

SQL SELECT

My goodness, would there even be modern data teams without SQL SELECT statements? Probably not.

Luckily, we live in a world of tabular data, cloud data warehouses, and SQL prowess. Analysts and analytics engineers are writing queries, creating data models, and leveraging SQL to power their data transformations and analysis. But what makes these queries possible? SELECT statements.

The SQL SELECT statement is the fundamental building block of any query: it allows you to select specific columns (data) from a database schema object (table/view). In a dbt project, a SQL dbt model is technically a singular SELECT statement (often built leveraging CTEs or subqueries).

How to use SELECT

Any query begins with a simple SELECT statement:

select
order_id, --your first column you want selected
customer_id, --your second column you want selected
order_date --your last column you want selected (and so on)
from {{ ref('orders') }} --the table/view/model you want to select from
limit 3

This basic query is selecting three columns from the jaffle shop’s orders table and returning three rows. If you execute this query in your data warehouse, it will return a result looking like this:

order_idcustomer_idorder_date
112018-01-01
232018-01-02
3952018-01-04

You may also commonly see queries that select * from table_name. The asterisk or star is telling the compiler to select all columns from a specified table or view.

Goodbye carpal tunnel

Leverage dbt utils’ star macro to be able to both easily select many and specifically exclude certain columns.

In a dbt project, analytics engineers will typically write models that contain multiple CTEs that build to one greater query. For folks that are newer to analytics engineering or dbt, we recommend they check out the “How we structure our dbt projects” guide to better understand why analytics folks like modular data modeling and CTEs.

SELECT statement syntax in Snowflake, Databricks, BigQuery, and Redshift

While we know the data warehouse players like to have their own slightly different flavors and syntax for SQL, they have conferred together that the SELECT statement is sacred and unchangeable. As a result, writing the actual select…from statement across Snowflake, Databricks, Google BigQuery, and Amazon Redshift would look the same. However, the actual SQL manipulation of data within the SELECT statement (ex. adding dates, casting columns) might look slightly different between each data warehouse.

0