This document provides references to DuckDB documentation and key SQL syntax used in the loop-duckdb project.
- DuckDB Homepage: https://duckdb.org
- Documentation: https://duckdb.org/docs/
- SQL Reference: https://duckdb.org/docs/sql/introduction
- CREATE TABLE - Create tables
- CREATE VIEW - Create views
- CREATE OR REPLACE - Idempotent table/view creation
- SELECT - Query data
- FROM - Specify data sources
- WHERE - Filter rows
- GROUP BY - Aggregate data
- ORDER BY - Sort results
- LIMIT - Limit result rows
- WITH Clause - Define temporary named result sets
DuckDB provides comprehensive JSON support:
- JSON Operators -
->and->>->returns JSON->>returns text
- json_transform() - Convert JSON to structured types
- json_extract_string() - Extract string values
- json_keys() - Get object keys
Documentation: https://duckdb.org/docs/extensions/json
- unnest() - Expand arrays into rows
- Array Functions Overview
Window functions perform calculations across rows related to the current row:
- Window Functions Overview
- ROW_NUMBER() - Assign sequential numbers
- PARTITION BY - Divide result set
- ORDER BY (in windows) - Order within partitions
- COUNT() - Count rows
- SUM() - Sum values
- AVG() - Calculate average
- MIN()/MAX() - Find min/max values
- ROUND() - Round numeric values
- bool_or() - Boolean OR aggregation
- Aggregate Functions Overview
- CASE - Conditional logic
- COALESCE() - Return first non-null value
- CAST() - Convert between types
- Type Casting with
::- Postgres-style casting - Data Types - Available data types
DuckDB has native Parquet support:
- Parquet Files - Read/write Parquet
- Querying Parquet - Direct queries on files
- CSV Files - Read/write CSV
- Date/Time Functions - Date/time operations
- INTERVAL - Time intervals
- current_timestamp - Current timestamp
Example:
WHERE timestamp >= current_timestamp - INTERVAL 7 DAY- String Functions - String operations
- LIKE - Pattern matching
- String Operators - Concatenation, etc.
- JOIN Types - INNER, LEFT, RIGHT, FULL
- JOIN Syntax
- Indexes - Create and use indexes
- EXPLAIN - Query execution plans
- EXPLAIN ANALYZE - Actual execution statistics
- Performance Guide - Optimization tips
- Import Data - Efficient data loading
DuckDB supports various extensions:
- JSON Extension - JSON processing
- Parquet Extension - Parquet file support (built-in)
- All Extensions - Complete list
For using DuckDB with Model Context Protocol:
- mcp-server-duckdb: Python-based MCP server for DuckDB
- Installation:
uvx mcp-server-duckdb(automatically installed when using the MCP server) - Documentation: See MCP server documentation for API details
Prerequisites: The database must exist before the MCP server can query it. Generate the database by running:
pnpm generate-report
The MCP server provides a query tool that executes SQL queries:
{
"tool": "query",
"parameters": {
"query": "SELECT * FROM loop_items LIMIT 10"
}
}Configuration in .mcp.json:
{
"mcpServers": {
"duckdb": {
"command": "uvx",
"args": [
"mcp-server-duckdb",
"--db-path",
"generate-report/generate-report.db",
"--readonly"
]
}
}
}Generate the report and database:
pnpm generate-reportStart the interactive DuckDB UI:
pnpm start-report-uiDownload Loop data from S3:
pnpm download-loop-data-s3Generate specific report parts:
# Run specific parts with wildcards
pnpm exec ./scripts/generate-report.sh '050*' '060*'
# Preview what would run
pnpm exec ./scripts/generate-report.sh --dry-run --verbose- Command Line Client - Interactive SQL shell
- CLI Parameters - Startup options
# Start DuckDB with a database file
duckdb generate-report/generate-report.db
# Execute query from command line
duckdb database.db "SELECT * FROM table LIMIT 10"
# Read from stdin
echo "SELECT 42" | duckdb
# Output as JSON
duckdb database.db -json "SELECT * FROM table"
# Output as Markdown
duckdb database.db -markdown "SELECT * FROM table"Inside the DuckDB CLI:
.help- Show help.tables- List tables.schema [TABLE]- Show schema.mode- Set output mode.read [FILE]- Execute SQL from file.exitor.quit- Exit DuckDB
- Friendly SQL - DuckDB's SQL enhancements
- Samples - Practical guides and examples
- FAQ - Frequently asked questions
- GitHub Repository - Source code
- Discord - Community chat
- Blog - News and updates
While this project uses the MCP server, DuckDB also has excellent Python support:
- Python API - Python client
- Pandas Integration - Query Pandas DataFrames
| Operation | Syntax |
|---|---|
| Create table | CREATE TABLE name AS SELECT ... |
| Query JSON | json_field->'key' or json_field->>'key' |
| Transform JSON | json_transform(json_col, 'schema') |
| Unnest array | unnest(array_col) |
| Window function | ROW_NUMBER() OVER (PARTITION BY col ORDER BY col2) |
| CTE | WITH cte_name AS (SELECT ...) SELECT ... |
| Cast type | CAST(col AS type) or col::type |
| Time filter | WHERE timestamp >= current_timestamp - INTERVAL 7 DAY |
| Percentage | ROUND(count * 100.0 / total, 2) |
| Conditional count | COUNT(CASE WHEN condition THEN 1 END) |
For the current database schema and table structure, refer to:
- 020-create-duckdb-database.sh - Schema creation
For practical examples used in this project, see:
- examples.md - Real queries from the project
- scripts/generate-report-parts/ - Source scripts