
Query the Discovery API

The Discovery API supports ad-hoc queries and integrations. If you are new to the API, read the Discovery API overview for an introduction.

Use the Discovery API to evaluate data pipeline health and project state across runs or at a moment in time. dbt Labs provides a GraphQL explorer for this API, enabling you to run queries and browse the schema.

Since GraphQL provides a description of the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query.

Prerequisites

Authorization

Currently, authorization of requests takes place using a service token. dbt Cloud admin users can generate a Metadata Only service token that is authorized to execute a specific query against the Discovery API.

Once you've created a token, you can use it in the Authorization header of requests to the dbt Cloud Discovery API. Be sure to include the Token prefix in the Authorization header, or the request will fail with a 401 Unauthorized error. Note that Bearer can be used in place of Token in the Authorization header. Both syntaxes are equivalent.

Access the Discovery API

  1. Create a service account token to authorize requests. dbt Cloud admin users can generate a Metadata Only service token, which is authorized to execute a specific query against the Discovery API.

  2. Find your API URL using the endpoint https://metadata.{YOUR_ACCESS_URL}/graphql.

    • Replace {YOUR_ACCESS_URL} with the appropriate Access URL for your region and plan. For example, if your multi-tenant region is North America, your endpoint is https://metadata.cloud.getdbt.com/graphql. If your multi-tenant region is EMEA, your endpoint is https://metadata.emea.dbt.com/graphql.
  3. For specific query points, refer to the schema documentation.
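
The steps above can be sketched in Python as two small helpers: one that builds the endpoint from an Access URL and one that builds the request headers. This is a minimal illustration, not an official client; the function names `discovery_endpoint` and `auth_headers` are hypothetical.

```python
def discovery_endpoint(access_url: str) -> str:
    """Build the GraphQL endpoint from an Access URL such as 'cloud.getdbt.com'."""
    host = access_url.strip()
    # Tolerate an Access URL pasted with its scheme or a trailing slash.
    if host.startswith("https://"):
        host = host[len("https://"):]
    host = host.strip("/")
    return f"https://metadata.{host}/graphql"


def auth_headers(token: str, scheme: str = "Bearer") -> dict:
    """Return request headers; the prefix may be 'Bearer' or 'Token'."""
    if scheme not in ("Bearer", "Token"):
        raise ValueError("scheme must be 'Bearer' or 'Token'")
    return {
        "authorization": f"{scheme} {token}",
        "content-type": "application/json",
    }
```

For the North America multi-tenant region, `discovery_endpoint("cloud.getdbt.com")` yields the endpoint shown in step 2.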

Run queries using HTTP requests

You can run queries by sending a POST request to the https://metadata.YOUR_ACCESS_URL/graphql endpoint, making sure to replace:

  • YOUR_ACCESS_URL with the appropriate Access URL for your region and plan.

  • YOUR_TOKEN in the Authorization header with your actual API token. Be sure to include the Token or Bearer prefix.

  • QUERY_BODY with a GraphQL query, for example { "query": "<query text>" }

  • VARIABLES with a dictionary of your GraphQL query variables, such as a job ID or a filter.

  • ENDPOINT with the endpoint you're querying, such as environment.

    curl 'https://metadata.YOUR_ACCESS_URL/graphql' \
    -H 'authorization: Bearer YOUR_TOKEN' \
    -H 'content-type: application/json' \
    -X POST \
    --data QUERY_BODY

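Python example (here QUERY_BODY is the query text itself, because the json argument already wraps it in the "query" key):

    import requests

    response = requests.post(
        'https://metadata.YOUR_ACCESS_URL/graphql',
        headers={"authorization": "Bearer " + YOUR_TOKEN, "content-type": "application/json"},
        json={"query": QUERY_BODY, "variables": VARIABLES},
    )
    metadata = response.json()['data'][ENDPOINT]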

Every query will require an environment ID or job ID. You can get the ID from a dbt Cloud URL or using the Admin API.

There are several illustrative example queries in this documentation. You can see more examples in the use case guide.

Reasonable use

To maintain performance and stability, and prevent abuse, Discovery (GraphQL) API usage is subject to request rate and response size limits.

  • The current request rate limit is 200 requests within a minute for a given IP address. If a user exceeds this limit, they will receive an HTTP 429 response status.
  • Environment-level endpoints will be subject to response size limits in the future. The depth of the graph should not exceed three levels. A user can paginate up to 500 items per query.
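
One way a client might handle the HTTP 429 response is to wait and retry. The sketch below assumes an exponential backoff schedule, which is a common convention and not something the API documents. `send` is a hypothetical zero-argument callable that performs the request and returns a `(status_code, body)` tuple.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns a status other than 429.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    (an assumed schedule) and raises once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.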

Retention limits

You can use the Discovery API to query data from the previous three months. For example, if today were April 1st, you could query data back to January 1st.
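
To estimate the earliest date you can query, you can subtract three calendar months from today's date. The helper below is a rough sketch using only the standard library. The function name is hypothetical, and treating the window as whole calendar months (with the day clamped to the end of shorter months) is an assumption based on the April 1st example.

```python
import calendar
from datetime import date


def earliest_queryable_date(today: date, months: int = 3) -> date:
    """Return the date `months` calendar months before `today`.

    The day is clamped to the last day of the target month, so May 31
    minus three months gives the last day of February.
    """
    month_index = today.year * 12 + (today.month - 1) - months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))
```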

Run queries with the GraphQL explorer

You can run ad-hoc queries directly in the GraphQL API explorer and use the document explorer on the left-hand side, where you can see all possible nodes and fields.

Refer to the Apollo explorer documentation for setup and authorization info.

  1. Access the GraphQL API explorer and select the fields you'd like to query.

  2. Go to Variables at the bottom of the explorer and replace any null fields with your unique values.

  3. Authenticate via Bearer auth with YOUR_TOKEN. Go to Headers at the bottom of the explorer and select +New header.

  4. Select Authorization in the header key drop-down list and enter your Bearer auth token in the value field. Remember to include the Token or Bearer prefix. Your header should look like this: {"Authorization": "Bearer <YOUR_TOKEN>"}.


Enter the header key and Bearer auth token values

  5. Run your query by pressing the blue query button in the top-right of the Operation editor (to the right of the query). You should see a successful query response on the right side of the explorer.

Run queries using the Apollo Server GraphQL explorer

Fragments

Use the ... on notation to query across lineage and retrieve results from specific node types.


    query ($environmentId: BigInt!, $first: Int!) {
      environment(id: $environmentId) {
        applied {
          models(first: $first, filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" }) {
            edges {
              node {
                name
                ancestors(types: [Model, Source, Seed, Snapshot]) {
                  ... on ModelAppliedStateNode {
                    name
                    resourceType
                    materializedType
                    executionInfo {
                      executeCompletedAt
                    }
                  }
                  ... on SourceAppliedStateNode {
                    sourceName
                    name
                    resourceType
                    freshness {
                      maxLoadedAt
                    }
                  }
                  ... on SnapshotAppliedStateNode {
                    name
                    resourceType
                    executionInfo {
                      executeCompletedAt
                    }
                  }
                  ... on SeedAppliedStateNode {
                    name
                    resourceType
                    executionInfo {
                      executeCompletedAt
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
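
Because each fragment returns different fields, client code has to check which fields are present on each ancestor. The following Python sketch shows one way to do that. It assumes the `data` part of the response mirrors the shape of the query above, with `ancestors` returned as a plain list; the function name is hypothetical.

```python
def ancestor_timestamps(data: dict) -> dict:
    """Map each ancestor name to (resourceType, last update timestamp).

    Sources expose freshness.maxLoadedAt; models, snapshots, and seeds
    expose executionInfo.executeCompletedAt. Missing values become None.
    """
    result = {}
    for edge in data["environment"]["applied"]["models"]["edges"]:
        for ancestor in edge["node"]["ancestors"]:
            if "freshness" in ancestor:
                # Only the source fragment selects the freshness field.
                stamp = (ancestor["freshness"] or {}).get("maxLoadedAt")
            else:
                info = ancestor.get("executionInfo") or {}
                stamp = info.get("executeCompletedAt")
            result[ancestor["name"]] = (ancestor["resourceType"], stamp)
    return result
```

You would pass it `response.json()["data"]` from a request that uses the query above.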

Pagination

Querying large datasets can impact performance on multiple functions in the API pipeline. Pagination eases the burden by returning smaller data sets one page at a time. This is useful for returning a particular portion of the dataset or the entire dataset piece-by-piece to enhance performance. dbt Cloud utilizes cursor-based pagination, which makes it easy to return pages of constantly changing data.

Use the PageInfo object to return information about the page. The following fields are available:

  • startCursor string type - corresponds to the first node in the edge.
  • endCursor string type - corresponds to the last node in the edge.
  • hasNextPage boolean type - whether there are more nodes after the returned results.
  • hasPreviousPage boolean type - whether nodes exist before the returned results.

There are connection variables available when making the query:

  • first integer type - will return the first 'n' nodes for each page, up to 500.
  • after string type - sets the cursor to retrieve nodes after. It's best practice to set the after variable with the object ID defined in the endCursor of the previous page.

The following example returns the first 500 models after the object ID specified in the variables. The PageInfo object returns the object ID where the cursor starts, the object ID where it ends, and whether there is a next page.

Example of pagination

Here is a code example of the PageInfo object:

    pageInfo {
      startCursor
      endCursor
      hasNextPage
    }
    totalCount # Total number of records across all pages
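
A client can walk through every page by feeding each page's endCursor back in as the after variable until hasNextPage is false. The generator below is a minimal sketch of that loop. `fetch_page` is a hypothetical callable that you supply: given the `after` cursor (None for the first page), it runs the query and returns the connection object containing `edges` and `pageInfo`.

```python
from typing import Callable, Iterator, Optional


def iter_nodes(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every node, following endCursor until hasNextPage is false."""
    after = None
    while True:
        connection = fetch_page(after)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Resume after the last node of the page just read.
        after = page_info["endCursor"]
```

Because it is a generator, pages are requested only as the caller consumes nodes, which helps keep request volume within the rate limit.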

Filters

Filtering helps to narrow down the results of an API query. Want to query and return only models and tests that are failing? Or find models that are taking too long to run? You can fetch execution details such as executionTime, runElapsedTime, or status. This helps data teams monitor the performance of their models, identify bottlenecks, and optimize the overall data pipeline.

In the following example, we can see that we're filtering results to models that have succeeded on their lastRunStatus:

Example of filtering

Here is a code example that filters for models that have an error on their last run and tests that have failed:


    query ($environmentId: BigInt!, $first: Int!) {
      environment(id: $environmentId) {
        applied {
          models(first: $first, filter: { lastRunStatus: error }) {
            edges {
              node {
                name
                executionInfo {
                  lastRunId
                }
              }
            }
          }
          tests(first: $first, filter: { status: "fail" }) {
            edges {
              node {
                name
                executionInfo {
                  lastRunId
                }
              }
            }
          }
        }
      }
    }
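
Once the filtered results come back, you may want to group them by the run that produced them. The sketch below is one way to do this in Python. It assumes the `data` part of the response mirrors the shape of the query above; the function name is hypothetical.

```python
def failures_by_run(data: dict) -> dict:
    """Group errored models and failed tests by their lastRunId.

    Returns {run_id: [(kind, name), ...]} where kind is 'models' or 'tests'.
    """
    applied = data["environment"]["applied"]
    grouped = {}
    for kind in ("models", "tests"):
        for edge in applied[kind]["edges"]:
            node = edge["node"]
            run_id = node["executionInfo"]["lastRunId"]
            grouped.setdefault(run_id, []).append((kind, node["name"]))
    return grouped
```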
