Skip to content

Latest commit

 

History

History
301 lines (207 loc) · 9.69 KB

File metadata and controls

301 lines (207 loc) · 9.69 KB

DuckDB Reference

This document provides references to DuckDB documentation and key SQL syntax used in the loop-duckdb project.

Official Documentation

Core SQL Syntax

Data Definition Language (DDL)

Data Query Language (DQL)

Common Table Expressions (CTEs)

JSON Functions

DuckDB provides comprehensive JSON support:

JSON Operators

JSON Functions

Documentation: https://duckdb.org/docs/extensions/json

Array Functions

Window Functions

Window functions perform calculations across rows related to the current row:

Aggregate Functions

Conditional Expressions

Type Casting

File Formats

Parquet

DuckDB has native Parquet support:

CSV

Date and Time

Example:

WHERE timestamp >= current_timestamp - INTERVAL 7 DAY

String Functions

JOINs

Performance and Optimization

Indexes

Query Profiling

Best Practices

Extensions

DuckDB supports various extensions:

MCP Server Integration

For using DuckDB with Model Context Protocol:

  • mcp-server-duckdb: Python-based MCP server for DuckDB
  • Installation: uvx mcp-server-duckdb (automatically installed when using the MCP server)
  • Documentation: See MCP server documentation for API details

Prerequisites: The database must exist before the MCP server can query it. Generate the database by running:

pnpm generate-report

MCP Server Usage

The MCP server provides a query tool that executes SQL queries:

{
  "tool": "query",
  "parameters": {
    "query": "SELECT * FROM loop_items LIMIT 10"
  }
}

Configuration in .mcp.json:

{
  "mcpServers": {
    "duckdb": {
      "command": "uvx",
      "args": [
        "mcp-server-duckdb",
        "--db-path",
        "generate-report/generate-report.db",
        "--readonly"
      ]
    }
  }
}

Command-Line Interface

Project Commands

Generate the report and database:

pnpm generate-report

Start the interactive DuckDB UI:

pnpm start-report-ui

Download Loop data from S3:

pnpm download-loop-data-s3

Generate specific report parts:

# Run specific parts with wildcards
pnpm exec ./scripts/generate-report.sh '050*' '060*'

# Preview what would run
pnpm exec ./scripts/generate-report.sh --dry-run --verbose

DuckDB CLI

Generic DuckDB Commands

# Start DuckDB with a database file
duckdb generate-report/generate-report.db

# Execute query from command line
duckdb database.db "SELECT * FROM table LIMIT 10"

# Read from stdin
echo "SELECT 42" | duckdb

# Output as JSON
duckdb database.db -json "SELECT * FROM table"

# Output as Markdown
duckdb database.db -markdown "SELECT * FROM table"

Dot Commands

Inside the DuckDB CLI:

  • .help - Show help
  • .tables - List tables
  • .schema [TABLE] - Show schema
  • .mode - Set output mode
  • .read [FILE] - Execute SQL from file
  • .exit or .quit - Exit DuckDB

Additional Resources

Learning Resources

  • Friendly SQL - DuckDB's SQL enhancements
  • Samples - Practical guides and examples
  • FAQ - Frequently asked questions

Community

Python Integration

While this project uses the MCP server, DuckDB also has excellent Python support:

Quick Reference Card

Most Common Operations

Operation Syntax
Create table CREATE TABLE name AS SELECT ...
Query JSON json_field->'key' or json_field->>'key'
Transform JSON json_transform(json_col, 'schema')
Unnest array unnest(array_col)
Window function ROW_NUMBER() OVER (PARTITION BY col ORDER BY col2)
CTE WITH cte_name AS (SELECT ...) SELECT ...
Cast type CAST(col AS type) or col::type
Time filter WHERE timestamp >= current_timestamp - INTERVAL 7 DAY
Percentage ROUND(count * 100.0 / total, 2)
Conditional count COUNT(CASE WHEN condition THEN 1 END)

Project-Specific Notes

Database Schema

For the current database schema and table structure, refer to:

Query Examples

For practical examples used in this project, see: