DuckDB Reference

This document provides references to DuckDB documentation and key SQL syntax used in the loop-duckdb project.

Official Documentation

DuckDB Homepage: https://duckdb.org
Documentation: https://duckdb.org/docs/
SQL Reference: https://duckdb.org/docs/sql/introduction

Core SQL Syntax

Data Definition Language (DDL)

CREATE TABLE - Create tables
CREATE VIEW - Create views
CREATE OR REPLACE - Idempotent table/view creation

Data Query Language (DQL)

SELECT - Query data
FROM - Specify data sources
WHERE - Filter rows
GROUP BY - Aggregate data
ORDER BY - Sort results
LIMIT - Limit result rows

Common Table Expressions (CTEs)

WITH Clause - Define temporary named result sets

JSON Functions

DuckDB provides comprehensive JSON support:

JSON Operators

JSON Operators - -> and ->>
- -> returns JSON
- ->> returns text

JSON Functions

json_transform() - Convert JSON to structured types
json_extract_string() - Extract string values
json_keys() - Get object keys

Documentation: https://duckdb.org/docs/extensions/json

Array Functions

unnest() - Expand arrays into rows
Array Functions Overview

Window Functions

Window functions perform calculations across rows related to the current row:

Window Functions Overview
ROW_NUMBER() - Assign sequential numbers
PARTITION BY - Divide result set
ORDER BY (in windows) - Order within partitions

Aggregate Functions

COUNT() - Count rows
SUM() - Sum values
AVG() - Calculate average
MIN()/MAX() - Find min/max values
ROUND() - Round numeric values
bool_or() - Boolean OR aggregation
Aggregate Functions Overview

Conditional Expressions

CASE - Conditional logic
COALESCE() - Return first non-null value

Type Casting

CAST() - Convert between types
Type Casting with :: - Postgres-style casting
Data Types - Available data types

File Formats

Parquet

DuckDB has native Parquet support:

Parquet Files - Read/write Parquet
Querying Parquet - Direct queries on files

CSV

CSV Files - Read/write CSV

Date and Time

Date/Time Functions - Date/time operations
INTERVAL - Time intervals
current_timestamp - Current timestamp

Example:

WHERE timestamp >= current_timestamp - INTERVAL 7 DAY

String Functions

String Functions - String operations
LIKE - Pattern matching
String Operators - Concatenation, etc.

JOINs

JOIN Types - INNER, LEFT, RIGHT, FULL
JOIN Syntax

Performance and Optimization

Indexes

Indexes - Create and use indexes

Query Profiling

EXPLAIN - Query execution plans
EXPLAIN ANALYZE - Actual execution statistics

Best Practices

Performance Guide - Optimization tips
Import Data - Efficient data loading

Extensions

DuckDB supports various extensions:

JSON Extension - JSON processing
Parquet Extension - Parquet file support (built-in)
All Extensions - Complete list

MCP Server Integration

For using DuckDB with Model Context Protocol:

mcp-server-duckdb: Python-based MCP server for DuckDB
Installation: uvx mcp-server-duckdb (automatically installed when using the MCP server)
Documentation: See MCP server documentation for API details

Prerequisites: The database must exist before the MCP server can query it. Generate the database by running:
pnpm generate-report

MCP Server Usage

The MCP server provides a query tool that executes SQL queries:

{
  "tool": "query",
  "parameters": {
    "query": "SELECT * FROM loop_items LIMIT 10"
  }
}

Configuration in .mcp.json:

{
  "mcpServers": {
    "duckdb": {
      "command": "uvx",
      "args": [
        "mcp-server-duckdb",
        "--db-path",
        "generate-report/generate-report.db",
        "--readonly"
      ]
    }
  }
}

Command-Line Interface

Project Commands

Generate the report and database:

pnpm generate-report

Start the interactive DuckDB UI:

pnpm start-report-ui

Download Loop data from S3:

pnpm download-loop-data-s3

Generate specific report parts:

# Run specific parts with wildcards
pnpm exec ./scripts/generate-report.sh '050*' '060*'

# Preview what would run
pnpm exec ./scripts/generate-report.sh --dry-run --verbose

DuckDB CLI

Command Line Client - Interactive SQL shell
CLI Parameters - Startup options

Generic DuckDB Commands

# Start DuckDB with a database file
duckdb generate-report/generate-report.db

# Execute query from command line
duckdb database.db "SELECT * FROM table LIMIT 10"

# Read from stdin
echo "SELECT 42" | duckdb

# Output as JSON
duckdb database.db -json "SELECT * FROM table"

# Output as Markdown
duckdb database.db -markdown "SELECT * FROM table"

Dot Commands

Inside the DuckDB CLI:

.help - Show help
.tables - List tables
.schema [TABLE] - Show schema
.mode - Set output mode
.read [FILE] - Execute SQL from file
.exit or .quit - Exit DuckDB

Additional Resources

Learning Resources

Friendly SQL - DuckDB's SQL enhancements
Samples - Practical guides and examples
FAQ - Frequently asked questions

Community

GitHub Repository - Source code
Discord - Community chat
Blog - News and updates

Python Integration

While this project uses the MCP server, DuckDB also has excellent Python support:

Python API - Python client
Pandas Integration - Query Pandas DataFrames

Quick Reference Card

Most Common Operations

Operation	Syntax
Create table	`CREATE TABLE name AS SELECT ...`
Query JSON	`json_field->'key'` or `json_field->>'key'`
Transform JSON	`json_transform(json_col, 'schema')`
Unnest array	`unnest(array_col)`
Window function	`ROW_NUMBER() OVER (PARTITION BY col ORDER BY col2)`
CTE	`WITH cte_name AS (SELECT ...) SELECT ...`
Cast type	`CAST(col AS type)` or `col::type`
Time filter	`WHERE timestamp >= current_timestamp - INTERVAL 7 DAY`
Percentage	`ROUND(count * 100.0 / total, 2)`
Conditional count	`COUNT(CASE WHEN condition THEN 1 END)`

Project-Specific Notes

Database Schema

For the current database schema and table structure, refer to:

020-create-duckdb-database.sh - Schema creation

Query Examples

For practical examples used in this project, see:

examples.md - Real queries from the project
scripts/generate-report-parts/ - Source scripts

FilesExpand file tree

reference.md

Latest commit

History

reference.md

File metadata and controls

DuckDB Reference

Official Documentation

Core SQL Syntax

Data Definition Language (DDL)

Data Query Language (DQL)

Common Table Expressions (CTEs)

JSON Functions

JSON Operators

JSON Functions

Array Functions

Window Functions

Aggregate Functions

Conditional Expressions

Type Casting

File Formats

Parquet

CSV

Date and Time

String Functions

JOINs

Performance and Optimization

Indexes

Query Profiling

Best Practices

Extensions

MCP Server Integration

MCP Server Usage

Command-Line Interface

Project Commands

DuckDB CLI

Generic DuckDB Commands

Dot Commands

Additional Resources

Learning Resources

Community

Python Integration

Quick Reference Card

Most Common Operations

Project-Specific Notes

Database Schema

Query Examples