Skip to main content

Oracle setup

  • Maintained by: Oracle
  • Authors: Oracle
  • GitHub repo: oracle/dbt-oracle
  • PyPI package: dbt-oracle
  • Slack channel: #db-oracle
  • Supported dbt Core version: v1.2.1 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: Oracle 12c and higher

Installing dbt-oracle

Use pip to install the adapter. Before 1.8, installing the adapter would automatically install dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

Configuring dbt-oracle

For Oracle-specific configuration, please refer to Oracle configs.

Configure the Python driver mode

python-oracledb makes it optional to install the Oracle Client libraries. This driver supports 2 modes

  1. Thin mode (preferred): Python process directly connects to the Oracle database. This mode does not need the Oracle Client libraries
  2. Thick mode: Python process links with the Oracle Client libraries. Some advanced Oracle database functionalities (for example: Advanced Queuing, LDAP connections, Scrollable cursors) are currently available via Oracle Client libraries

You can configure the driver mode using the environment variable ORA_PYTHON_DRIVER_TYPE. Use the thin mode as it vastly simplifies installation.

Driver ModeOracle Client libraries required?Configuration
ThinNoORA_PYTHON_DRIVER_TYPE=thin
ThickYesORA_PYTHON_DRIVER_TYPE=thick

The default value of ORA_PYTHON_DRIVER_TYPE is thin

export ORA_PYTHON_DRIVER_TYPE=thin # default

Configure wallet for Oracle Autonomous Database (ADB-S) in Cloud

dbt can connect to Oracle Autonomous Database (ADB-S) in Oracle Cloud using either TLS (Transport Layer Security) or mutual TLS (mTLS). TLS and mTLS provide enhanced security for authentication and encryption. A database username and password is still required for dbt connections which can be configured as explained in the next section Connecting to Oracle Database.

With TLS, dbt can connect to Oracle ADB without using a wallet. Both Thin and Thick modes of the python-oracledb driver support TLS.

info

In Thick mode, dbt can connect through TLS only when using Oracle Client library versions 19.14 (or later) or 21.5 (or later).

Refer to Oracle documentation to connect to an ADB instance using TLS authentication and the blog post Easy wallet-less connections to Oracle Autonomous Databases in Python to enable TLS for your Oracle ADB instance.

Connecting to Oracle Database

Define the following mandatory parameters as environment variables and refer them in the connection profile using env_var jinja function. Optionally, you can also define these directly in the profiles.yml file, but this is not recommended

export DBT_ORACLE_USER=<username>
export DBT_ORACLE_PASSWORD=***
export DBT_ORACLE_SCHEMA=<username>
export DBT_ORACLE_DATABASE=example_db2022adb

Use the following query to retrieve the database name:

SELECT SYS_CONTEXT('userenv', 'DB_NAME') FROM DUAL

An Oracle connection profile for dbt can be set using any one of the following methods

The tnsnames.ora file is a configuration file that contains network service names mapped to connect descriptors. The directory location of tnsnames.ora file can be specified using TNS_ADMIN environment variable

tnsnames.ora
db2022adb_high = (description =
(retry_count=20)(retry_delay=3)
(address=(protocol=tcps)
(port=1522)
(host=adb.example.oraclecloud.com))
(connect_data=(service_name=example_high.adb.oraclecloud.com))
(security=(ssl_server_cert_dn="CN=adb.example.oraclecloud.com,
OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

The TNS alias db2022adb_high can be defined as an environment variable and referred to in profiles.yml

export DBT_ORACLE_TNS_NAME=db2022adb_high
~/.dbt/profiles.yml
dbt_test:
target: dev
outputs:
dev:
type: oracle
user: "{{ env_var('DBT_ORACLE_USER') }}"
pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
tns_name: "{{ env_var('DBT_ORACLE_TNS_NAME') }}"
schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
threads: 4
Note

Starting with dbt-oracle==1.0.2, it is optional to set the database name in profile.yml

Starting with dbt-oracle==1.8.0 database key in profile.yml is still optional for all but one of the dbt-oracle workflows. if database is missing in profile.yml the generated catalog used for project documentation will be empty.

From dbt-oracle==1.8, we detect that database key is missing from profile.yml and issue a warning to add it for catalog generation. The warning message also shows the database name that dbt-oracle expects. That way users don't have to worry about "what" the database name is and "how" to get it.

Quoting configuration

The default quoting configuration used by dbt-oracle is shown below:

dbt_project.yaml
quoting:
database: false
identifier: false
schema: false

This is recommended and works for most cases.

Approximate relation match error

Often users have complained about an approximate relation match as shown below:

Compilation Error in model <model>
19:09:40 When searching for a relation, dbt found an approximate match. Instead of guessing
19:09:40 which relation to use, dbt will move on. Please delete <model>, or rename it to be less ambiguous.
Searched for: <model>

This is reported in multiple channels:

In all cases, the solution was to enable quoting only for the database.

To solve this issue of approximate match use the following quoting configuration

dbt_project.yaml
quoting:
database: true

Python models using Oracle Autonomous Database (ADB-S)

Oracle's Autonomous Database Serverless (ADB-S) users can run dbt-py models using Oracle Machine Learning (OML4PY) which is available without any extra setup required.

Features

  • User Defined Python function is run in an ADB-S spawned Python 3.12.1 runtime
  • Access to external Python packages available in the Python runtime. For e.g. numpy, pandas, scikit_learn etc
  • Integration with Conda 24.x to create environments with custom Python packages
  • Access to Database session in the Python function
  • DataFrame read API to read TABLES, VIEWS, and ad-hoc SELECT queries as DataFrames
  • DataFrame write API to write DataFrames as TABLES
  • Supports both table and incremental materialization

Setup

Required roles

  • User must be non-ADMIN to execute the Python function
  • User must be granted the OML_DEVELOPER role

OML Cloud Service URL

OML Cloud Service URL is of the following format:

https://tenant1-dbt.adb.us-sanjose-1.oraclecloudapps.com

In this example:

  • tenant1 is the tenancy ID
  • dbt is the database name
  • us-sanjose-1 is the datacenter region
  • oraclecloudapps.com is the root domain

Add oml_cloud_service_url to your existing ~/.dbt/profiles.yml

~/.dbt/profiles.yml
dbt_test:
target: dev
outputs:
dev:
type: oracle
user: "{{ env_var('DBT_ORACLE_USER') }}"
pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
tns_name: "{{ env_var('DBT_ORACLE_TNS_NAME') }}"
schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
oml_cloud_service_url: "https://tenant1-dbt.adb.us-sanjose-1.oraclecloudapps.com"

Python model configurations

ConfigurationDatatypeExamples
MaterializationStringdbt.config(materialized="incremental") or dbt.config(materialized="table")
ServiceStringdbt.config(service="HIGH") or dbt.config(service="MEDIUM") or dbt.config(service="LOW")
Async ModeBooleandbt.config(async_flag=True)
Timeout in seconds only to be used with async mode (min: 1800 and max: 43200)Integerdbt.config(timeout=1800)
Conda environmentStringdbt.config(conda_env_name="dbt_py_env")

In async mode, dbt-oracle will schedule a Python job, poll the job's status, and wait for it to complete. Without async mode, dbt-oracle will immediately invoke the Python job in a blocking manner.

Note

Use dbt.config(async_flag=True) for long-running Python jobs.

Python model examples

Refer other model

Use dbt.ref(model_name) to refer to either SQL or Python model

def model(dbt, session):
# Must be either table or incremental (view is not currently supported)
dbt.config(materialized="table")
# returns oml.core.DataFrame referring a dbt model
s_df = dbt.ref("sales_cost")
return s_df

Refer a source

Use dbt.source(source_schema, table_name)

def model(dbt, session):
# Must be either table or incremental (view is not currently supported)
dbt.config(materialized="table")
# oml.core.DataFrame representing a datasource
s_df = dbt.source("sh_database", "channels")
return s_df

Incremental materialization

def model(dbt, session):
# Must be either table or incremental
dbt.config(materialized="incremental")
# oml.DataFrame representing a datasource
sales_cost_df = dbt.ref("sales_cost")

if dbt.is_incremental:
cr = session.cursor()
result = cr.execute(f"select max(cost_timestamp) from {dbt.this.identifier}")
max_timestamp = result.fetchone()[0]
# filter new rows
sales_cost_df = sales_cost_df[sales_cost_df["COST_TIMESTAMP"] > max_timestamp]

return sales_cost_df

Concatenate a new column in Dataframe


def model(dbt, session):
dbt.config(materialized="table")
dbt.config(async_flag=True)
dbt.config(timeout=1800)

sql = f"""SELECT customer.cust_first_name,
customer.cust_last_name,
customer.cust_gender,
customer.cust_marital_status,
customer.cust_street_address,
customer.cust_email,
customer.cust_credit_limit,
customer.cust_income_level
FROM sh.customers customer, sh.countries country
WHERE country.country_iso_code = ''US''
AND customer.country_id = country.country_id"""

# session.sync(query) will run the sql query and returns a oml.core.DataFrame
us_potential_customers = session.sync(query=sql)

# Compute an ad-hoc anomaly score on the credit limit
median_credit_limit = us_potential_customers["CUST_CREDIT_LIMIT"].median()
mean_credit_limit = us_potential_customers["CUST_CREDIT_LIMIT"].mean()
anomaly_score = (us_potential_customers["CUST_CREDIT_LIMIT"] - median_credit_limit)/(median_credit_limit - mean_credit_limit)

# Add a new column "CUST_CREDIT_ANOMALY_SCORE"
us_potential_customers = us_potential_customers.concat({"CUST_CREDIT_ANOMALY_SCORE": anomaly_score.round(3)})

# Return potential customers dataset as a oml.core.DataFrame
return us_potential_customers

Use Custom Conda environment

  1. As ADMIN user, create a conda environment using OML4PY Conda Notebook:
conda create -n dbt_py_env -c conda-forge --override-channels --strict-channel-priority python=3.12.1 nltk gensim
  1. Save this environment using the following command from the OML4PY Conda Notebook:
conda upload --overwrite dbt_py_env -t application OML4PY
  1. Use the environment in dbt Python models:
# Import custom packages from Conda environments
import nltk
import gensim

def model(dbt, session):
dbt.config(materialized="table")
dbt.config(conda_env_name="dbt_py_env") # Refer the conda environment
dbt.config(async_flag=True) # Use async mode for long running Python jobs
dbt.config(timeout=900)
# oml.core.DataFrame referencing a dbt-sql model
promotion_cost = dbt.ref("direct_sales_channel_promo_cost")
return promotion_cost

Supported features

  • Table materialization
  • View materialization
  • Materialized View
  • Incremental materialization
  • Seeds
  • Data sources
  • Singular tests
  • Generic tests; Not null, Unique, Accepted values and Relationships
  • Operations
  • Analyses
  • Exposures
  • Document generation
  • Serve project documentation as a website
  • Python Models (from dbt-oracle version 1.5.1)
  • Integration with Conda to use any Python packages from Anaconda's repository
  • All dbt commands are supported

Not supported features

  • Ephemeral materialization
0