9 changes: 9 additions & 0 deletions README.md
@@ -141,6 +141,15 @@ knowledge graphs seamlessly within Memgraph.
- **:bulb: Demo: [Graph-Aware Agents with LangGraph and Memgraph AI Toolkit](./integrations/langgraph/memgraph-toolkit-chatbot)**
- This demo showcases a simple agent built using the LangGraph framework and the [Memgraph AI Toolkit](https://github.com/memgraph/ai-toolkit) to demonstrate how to integrate graph-based tooling into your LLM stack.

**MCP**
- **:bulb: Demo: [SIC classification agent](./integrations/mcp/sic-agent)**
- This demo showcases a FastMCP agent that classifies a free-form business description into the OSHA SIC taxonomy stored in Memgraph.
- **:mag_right: Key Features:**
- Vector search over `IndustryGroup` nodes with Memgraph's vector index
- Context expansion through neighboring `Industry` and `MajorGroup` nodes
- Clarifying follow-up questions when the first match is ambiguous
- Included scraper and embedding scripts for building the SIC graph data

**LlamaIndex**
- **:bulb: Demo: [KG creation and retrieval](./integrations/llamaindex/property-graph-index)**
- This demo demonstrates the use of LlamaIndex with Memgraph to
56 changes: 56 additions & 0 deletions integrations/mcp/sic-agent/README.md
@@ -0,0 +1,56 @@
# SIC Classification MCP Agent

This example ports the SIC classification agent from the Memgraph AI Toolkit
into this repository as a standalone FastMCP server.

The agent:
- embeds a free-form business description,
- performs vector search over `IndustryGroup` nodes in Memgraph,
- inspects nearby `Industry` and `MajorGroup` context,
- uses MCP sampling to select the best SIC code,
- asks a clarifying follow-up when the initial match is ambiguous.
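
The last step depends on deciding when a match counts as ambiguous. The following is a minimal sketch, assuming ambiguity means the two best similarity scores are within a small margin of each other. The `Candidate` type, the `is_ambiguous` helper, and the `0.05` margin are hypothetical and are not taken from `sic_classification.py`:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str          # 3-digit IndustryGroup code (hypothetical field name)
    similarity: float  # vector search score, higher means closer


def is_ambiguous(candidates: list[Candidate], margin: float = 0.05) -> bool:
    """Return True when the best match does not clearly beat the runner-up.

    Assumption: a gap smaller than `margin` between the top two scores
    is treated as ambiguous; the real server may use a different rule.
    """
    if len(candidates) < 2:
        return False
    ranked = sorted(candidates, key=lambda c: c.similarity, reverse=True)
    return ranked[0].similarity - ranked[1].similarity < margin
```

A caller could use a `True` result to trigger the clarifying follow-up instead of returning the top code directly.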

## Files

- `sic_classification.py` runs the FastMCP server.
- `sic-scrapper/main.py` scrapes the OSHA SIC manual and generates import
Cypher.
- `sic-scrapper/embeddings.py` generates embeddings for `IndustryGroup` nodes.
- `sic-scrapper/output/sic_vector_index.cypherl` creates the vector index used
by the server.

## Quick Start

1. Generate SIC data and embedding updates:

```bash
cd sic-scrapper
uv sync
uv run main.py
uv run embeddings.py --from-json output/sic_data.json
```

2. Load the generated data into Memgraph:

```bash
mgconsole < output/sic_import.cypherl
mgconsole < output/sic_embeddings.cypherl
mgconsole < output/sic_vector_index.cypherl
```

3. Run the MCP server:

```bash
cd ..
uv sync
uv run python sic_classification.py
```

## Environment Variables

- `MEMGRAPH_URL` defaults to `bolt://localhost:7687`
- `MEMGRAPH_USER` defaults to an empty string
- `MEMGRAPH_PASSWORD` defaults to an empty string
- `MEMGRAPH_DATABASE` defaults to `memgraph`
- `SIC_VECTOR_INDEX` defaults to `sic_industry_group_embedding`
- `SIC_EMBEDDING_MODEL` defaults to `all-MiniLM-L6-v2`
7 changes: 7 additions & 0 deletions integrations/mcp/sic-agent/init.bash
@@ -0,0 +1,7 @@
#!/bin/bash

set -euo pipefail

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
uv run python sic_classification.py
11 changes: 11 additions & 0 deletions integrations/mcp/sic-agent/pyproject.toml
@@ -0,0 +1,11 @@
[project]
name = "sic-classification"
version = "0.1.0"
description = "SIC classification MCP server using Memgraph vector search"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"fastmcp>=0.1.0",
"neo4j>=5.28.1",
"sentence-transformers>=2.2.0",
]
47 changes: 47 additions & 0 deletions integrations/mcp/sic-agent/sic-scrapper/README.md
@@ -0,0 +1,47 @@
# SIC Scraper

Scrapes the SIC (Standard Industrial Classification) hierarchy from OSHA and
generates Cypher queries for Memgraph import.

## SIC Hierarchy

```text
Division (A-J) -> Major Group (2-digit) -> Industry Group (3-digit) -> Industry (4-digit)
```
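
Because each level extends the code of its parent by one digit, the numeric ancestors of an industry can be read off as prefixes of its 4-digit code. A minimal sketch of that relationship follows; the `sic_ancestors` helper is hypothetical and is not part of the scraper:

```python
def sic_ancestors(industry_code: str) -> dict[str, str]:
    """Split a 4-digit SIC industry code into its numeric hierarchy levels."""
    if len(industry_code) != 4 or not (industry_code.isascii() and industry_code.isdigit()):
        raise ValueError(f"expected a 4-digit SIC code, got {industry_code!r}")
    return {
        "major_group": industry_code[:2],     # 2-digit prefix
        "industry_group": industry_code[:3],  # 3-digit prefix
        "industry": industry_code,            # full 4-digit code
    }
```

For example, `sic_ancestors("7372")` returns major group `73`, industry group `737`, and industry `7372`. The division letter is not encoded in the digits, so it has to come from the scraped data.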

## Usage

```bash
uv sync

# Full scrape
uv run main.py

# Quick scrape (skip industry details)
uv run main.py --no-industry-details
```

## Output

- `output/sic_data.json` - Complete hierarchy
- `output/sic_import.cypherl` - Cypher import queries
- `output/sic_embeddings.json` - Generated IndustryGroup embeddings
- `output/sic_embeddings.cypherl` - Embedding update queries

## Generate Embeddings

```bash
uv run embeddings.py --from-json output/sic_data.json
```

## Import to Memgraph

```bash
mgconsole < output/sic_import.cypherl
mgconsole < output/sic_embeddings.cypherl
mgconsole < output/sic_vector_index.cypherl
```
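
If `mgconsole` is not available, the `.cypherl` files can in principle be loaded from Python, since the format holds one Cypher statement per line. The sketch below assumes a session object with a `run(query)` method, such as the one provided by the `neo4j` driver; `iter_statements` and `load_cypherl` are hypothetical helpers, not part of this repository:

```python
from typing import Iterable, Iterator, Protocol


class SupportsRun(Protocol):
    """Anything with a run(query) method, e.g. a neo4j driver session."""

    def run(self, query: str) -> object: ...


def iter_statements(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty line as one Cypher statement."""
    for line in lines:
        statement = line.strip()
        if statement:
            yield statement


def load_cypherl(session: SupportsRun, lines: Iterable[str]) -> int:
    """Run every statement in order and return how many were executed."""
    count = 0
    for statement in iter_statements(lines):
        session.run(statement)
        count += 1
    return count
```

With the `neo4j` driver, this could be called as `load_cypherl(session, open("output/sic_import.cypherl"))` inside a `driver.session()` block, loading the three files in the same order as the `mgconsole` commands.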

## Data Source

https://www.osha.gov/data/sic-manual