SQL "Inception": Recursive CTEs, BigQuery + Looker
Have you ever needed to query a data structure where you didn't know how deep the relationships went? Think of an organizational chart, a complex product assembly (Bill of Materials), or a nested comment thread like Hacker News.
Standard SQL joins fail here because they require you to know the number of levels upfront. Enter the Recursive Common Table Expression (CTE)—a powerful tool that allows a query to reference itself, iterating through levels until a termination condition is met.
In this guide, we’ll explore why recursive CTEs are a game-changer, how to implement them in BigQuery, and the specific architectural patterns required to make them work seamlessly in Looker.
Want to skip ahead and see the final LookML? Download the full source code here
Why Recursive CTEs?
Recursive CTEs are designed for hierarchical or graph-based data. Without them, you are often forced to write inefficient loops in a scripting language or perform multiple, expensive joins.
Top Use Cases:
- Organizational Hierarchies: Finding every employee under a specific manager, regardless of level.
- Network Analysis: Traversing a graph to find paths between nodes (e.g., "Degrees of Separation").
- Threaded Conversations: Analyzing comment chains, replies, and sub-threads (our Hacker News example).
- BOM (Bill of Materials): Exploding a product into all its constituent parts and sub-assemblies.
Recursive CTEs in BigQuery
BigQuery uses a standard syntax for recursion, consisting of three parts: the Anchor Member, the Recursive Member, and the Termination Condition.
The Syntax Pattern:
WITH RECURSIVE hierarchy AS (
-- 1. Anchor Member: Where do we start?
SELECT id, parent, 1 as level
FROM my_table
WHERE parent IS NULL
UNION ALL
-- 2. Recursive Member: How do we get to the next level?
SELECT child.id, child.parent, parent.level + 1
FROM my_table child
JOIN hierarchy parent ON child.parent = parent.id
-- 3. Termination: BigQuery has a hard limit of 500 iterations.
-- Ensure your join eventually returns no rows to stop.
)
SELECT * FROM hierarchy
WITH RECURSIVE MUST be at the very top level of your SQL statement. It cannot be inside a subquery. This leads to a small challenge in Looker we will overcome later.
The "Looker Trap" and the PDT Solution
When you define a SQL Derived Table in Looker, the generated SQL typically looks like this:
WITH my_derived_table (
SELECT ... -- Your Derived Table SQL here
)
SELECT ... FROM my_derived_table GROUP BY ... ORDER BY ...
If your Derived Table SQL starts with WITH RECURSIVE, Looker wraps it in a subquery, pushing the WITH clause down. BigQuery will reject this query.
The Solution: Persistent Derived Tables (PDTs)
To use recursive CTEs in BigQuery with Looker, you must use a Persistent Derived Table. When Looker builds a PDT, it uses a CREATE TABLE AS SELECT (CTAS) statement. Because the PDT build process executes the SQL as a standalone statement, the WITH RECURSIVE stays at the top level, and the query succeeds.
view: hn_hierarchy {
derived_table: {
datagroup_trigger: my_datagroup
# persist_for: "1 hour" # we recommend using a datagroup_trigger, but using persist_for works too
# Using a trigger or persist_for is mandatory to enable PDT behavior
sql:
WITH RECURSIVE hierarchy AS (
...
)
SELECT * FROM hierarchy ;;
}
}
Real-World Implementation: Hacker News
We analyzed the Hacker News dataset (47 million rows) to find the deepest comment threads. Using a recursive CTE joined to the main table, we uncovered fascinating insights:
- Record Depth: The deepest thread reached 65 levels deep!
- Pre-Aggregated Stats: By using a window function in our recursive view, we calculated total story comments as a dimension:
COUNT(1) OVER(PARTITION BY root_id) as story_total_commentsThis allows users to sort by a story's total popularity while still viewing individual comment text in the same row.
Taking it Further: Drills and Links
A premium Looker implementation doesn't just show data; it provides a path to action.
Enable Detail Drills
Aggregated counts should always allow users to see the underlying rows. We enabled this by adding drill_fields to our recursive measures:
measure: count {
type: count
drill_fields: [id, parent, root_id, hacker_news.text]
}
Connect to Reality
Using Looker’s link parameter, we added external links to the original Hacker News articles. With the Google favicon lookup, the UI feels integrated and professional:
dimension: id {
link: {
label: "View on Hacker News"
url: "https://news.ycombinator.com/item?id={{ value }}"
icon_url: "https://www.google.com/s2/favicons?domain=news.ycombinator.com"
}
}

Summary: Best Practices for Looker Developers
- Materialize: Always persist recursive CTEs in BigQuery.
- Window Functions: Use them within your CTE to pre-calculate hierarchy-wide stats for better Explore performance.
- Liquid Guards: If using Query-Time Derived Tables (on other dialects), use
{% condition %}to filter the anchor member early.
Recursive CTEs turn impossible data challenges into elegant LookML solutions. By understanding the warehouse-specific constraints and Looker's persistence model, you can unlock hierarchical insights that were previously out of reach.
