📄️ CTE in SQL
A Common Table Expression (CTE) is a temporary result set that can be used in a SQL query. You can use CTEs to break up complex queries into simpler blocks of code that can connect and build on each other.
A DAG is a Directed Acyclic Graph, a type of graph whose nodes are directionally related to each other and don’t form a directional closed loop.
📄️ Data catalog
A data catalog is an inventory of data assets from different parts of the data stack within an organization. This catalog can display metadata, lineage, and business definitions from your different data sources.
📄️ Data extraction
Data extraction is the process by which data is retrieved from multiple sources, often varying in volume and structure.
📄️ Data lake
A data lake is a data management system used for storing large amounts of data in in its raw, native form as files. Data lakes can store any type of data—structured, semi-structured, unstructured—in one centralized place.
📄️ Data lineage
Data lineage provides a holistic view of how data moves through an organization, where it’s transformed and consumed.
📄️ Data warehouse
A data warehouse is a data management system used for data storage and computing that allows for analytics activities such as transforming and sharing data.
📄️ Data wrangling
Data wrangling describes the different processes used to transform raw data into a consistent and easily usable format. The ultimate goal of data wrangling is to work in a way that allows you to dive right into analysis on a dataset or build upon that data.
A DataFrame is a two-dimensional data structure (rows and columns). It's the most common way of representing and interacting with large datasets in Python.
Data Definition Language (DDL) is a group of SQL statements that you can execute to manage database objects, including tables, views, and more.
📄️ Dimensional modeling
Dimensional modeling is a data modeling technique where you break data up into “facts” and “dimensions” to organize and describe entities within your data warehouse.
Data Manipulation Language (DML) is a class of SQL statements that are used to query, edit, add and delete row-level data from database tables or views. The main DML statements are SELECT, INSERT, DELETE, and UPDATE.
DRY is a software development principle that stands for “Don’t Repeat Yourself.” Living by this principle means that your aim is to reduce repetitive patterns and duplicate code and logic in favor of modular and referenceable code.
An Enterprise Data Warehouse (EDW), like any other data warehouse, is a collection of databases that centralize a business's information from multiple sources and applications.
Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, loading it into a target data warehouse, and finally transforming it.
Extract, Transform, Load (ETL)is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
📄️ Data grain
Your data's grain is the combination of columns at which records in a table are unique. Ideally, this is captured in a single column and a unique primary key.
Idempotent describes a process that gives you the same result no matter how many times you run it.
The exact Data Definition Language (DDL) that dbt will use when creating the model’s equivalent in a data warehouse.
📄️ Primary key
A primary key is a non-null column in a database object that uniquely identifies each row.
📄️ Relational database
A relational database provides a structured way to store data into tables consisting of rows and columns. Different tables in a relational database can be joined together using common columns from each table, forming relationships.
📄️ Reverse ETL
Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
📄️ Subquery in SQL
A subquery is a query within another query. Subqueries are often used when you need to process data in multiple steps.
📄️ Surrogate key
A surrogate key is a unique identifier derived from the data itself. It often takes the form of a hashed value of multiple columns that will create a uniqueness constraint for each row.
In simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells.
A view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse).