devaxai/doc-intelligence-platform

Document Intelligence Platform

A production-ready document intelligence platform for processing, analyzing, and searching document collections with AI-powered text extraction, summarization, and semantic search.

Overview

The Document Intelligence Platform helps organizations get more out of their document repositories through AI-powered analysis, semantic search, and automated content extraction. It is built on a modern Python stack and follows cloud-native design principles.

What This Platform Does

Transform your document workflows with AI-powered intelligence:

  • Smart Document Processing: Upload PDFs and text files with automatic content extraction and analysis
  • Semantic Search: Find documents by meaning, not just keywords, using vector embeddings
  • AI Summarization: Get instant summaries and key insights from lengthy documents
  • RESTful API: Production-ready API with interactive documentation and async processing
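To make "find documents by meaning" concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It is an illustration only: the toy three-dimensional vectors, the `TOY_INDEX` dictionary, and the `search` function are hypothetical stand-ins. In the platform itself, embeddings come from the OpenAI API and similarity lookups are delegated to Pinecone.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Vector, index: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" (hypothetical); real embedding vectors have far more dimensions.
TOY_INDEX: Dict[str, Vector] = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "contract.pdf": [0.1, 0.9, 0.1],
    "receipt.txt": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    # The query vector points in the same direction as the invoice-like documents.
    for doc_id, score in search([1.0, 0.0, 0.0], TOY_INDEX, top_k=2):
        print(f"{doc_id}: {score:.3f}")
```

Because the ranking depends on vector direction rather than shared keywords, two documents with no words in common can still score as close matches if their embeddings point the same way.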

Features

  • Document upload and storage (PDF, text files)
  • AI-powered text extraction and analysis
  • Semantic search using vector embeddings
  • RESTful API with FastAPI
  • Background task processing with Celery
  • Containerized deployment with Docker and Kubernetes
  • Modern Python tooling with UV package manager

Technology Stack

Backend

  • Python 3.9+ - Core language
  • FastAPI - Web framework with async support
  • UV - Package and environment management
  • SQLAlchemy - Database ORM with async support
  • Celery - Background task processing

AI/ML

  • OpenAI API - Text embeddings and summarization
  • Pinecone - Vector database for semantic search
  • PyPDF2/pdfplumber - PDF text extraction

Infrastructure

  • Docker - Containerization
  • Kubernetes - Container orchestration
  • AWS S3 - Object storage
  • PostgreSQL - Relational database
  • Redis - Caching and task queue

Quick Start

Prerequisites

  • Python 3.9+
  • UV package manager
  • Docker and Docker Compose
  • AWS account (for S3 storage)
  • OpenAI API key
  • Pinecone account

Installation

  1. Clone the repository:
git clone <repository-url>
cd ai-document-platform
  2. Install dependencies:
uv sync --dev
  3. Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
  4. Start the development environment:
docker-compose up -d
  5. Run database migrations:
uv run alembic upgrade head
  6. Start the development server:
uv run uvicorn ai_document_platform.api.main:app --reload
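
The environment-variable step above is where most first-run failures come from, so it can help to fail fast when something is missing. The sketch below shows one way an application might read and validate its configuration at startup. The variable names (`DATABASE_URL`, `OPENAI_API_KEY`, `PINECONE_API_KEY`, `S3_BUCKET`, `REDIS_URL`) and the `Settings` class are assumptions made for illustration; check `.env.example` for the names this project actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the real ones are listed in .env.example.
REQUIRED_VARS = ("DATABASE_URL", "OPENAI_API_KEY", "PINECONE_API_KEY", "S3_BUCKET")
DEFAULT_REDIS_URL = "redis://localhost:6379/0"


@dataclass(frozen=True)
class Settings:
    database_url: str
    openai_api_key: str
    pinecone_api_key: str
    s3_bucket: str
    redis_url: str = DEFAULT_REDIS_URL


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), failing fast on gaps."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        s3_bucket=env["S3_BUCKET"],
        redis_url=env.get("REDIS_URL", DEFAULT_REDIS_URL),
    )
```

Passing the mapping as a parameter keeps the function easy to test: a unit test can hand in a plain dictionary instead of mutating the process environment.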

Development

Code Quality

This project uses modern Python tooling for code quality:

  • Black - Code formatting
  • Ruff - Fast Python linter
  • MyPy - Static type checking
  • Pre-commit - Git hooks for code quality

Run code quality checks:

uv run black src/ tests/
uv run ruff check src/ tests/
uv run mypy src/

Testing

Run tests with pytest:

uv run pytest

Run tests with coverage:

uv run pytest --cov=src --cov-report=html
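
For contributors new to pytest, a unit test is just a function whose name starts with `test_` and that uses plain `assert` statements. The sketch below shows the shape such a test might take. The helper `is_supported_upload`, the accepted extensions, and the suggested file location (`tests/unit/test_uploads.py`) are hypothetical; they are inferred from the "PDF, text files" feature and are not taken from this code base.

```python
from pathlib import Path

# Assumed set of accepted extensions, based on "PDF, text files" in the feature list.
SUPPORTED_EXTENSIONS = {".pdf", ".txt"}


def is_supported_upload(filename: str) -> bool:
    """Return True if the file extension is one the platform is assumed to accept."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def test_accepts_pdf_and_text_files():
    assert is_supported_upload("report.pdf")
    # Extension matching is case-insensitive.
    assert is_supported_upload("notes.TXT")


def test_rejects_other_types():
    assert not is_supported_upload("photo.png")
    # A filename with no extension is rejected as well.
    assert not is_supported_upload("README")
```

In a real test module the helper would be imported from the application package and not defined next to the tests; it is inlined here only to keep the example self-contained.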

Docker Development

Build and run with Docker Compose:

docker-compose up --build

API Documentation

The API is built with comprehensive type hints and automatic validation for better reliability and developer experience.

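Once the server is running, the interactive documentation is served by the application itself. With FastAPI's default settings and uvicorn's default address (this project may override either), that is:

  • Swagger UI: http://127.0.0.1:8000/docs
  • ReDoc: http://127.0.0.1:8000/redoc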

Project Structure

├── src/ai_document_platform/    # Main application code
│   ├── api/                     # FastAPI routes and endpoints
│   ├── services/                # Business logic layer
│   ├── models/                  # Data models and schemas
│   ├── repositories/            # Data access layer
│   ├── clients/                 # External service clients
│   └── utils/                   # Utility functions
├── tests/                       # Test suite
│   ├── unit/                    # Unit tests
│   ├── integration/             # Integration tests
│   └── e2e/                     # End-to-end tests
├── docker/                      # Docker configuration
├── scripts/                     # Utility scripts
└── docs/                        # Documentation
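
The layout above separates data models, data access, and business logic. The following sketch shows how those layers might fit together. All class and method names (`Document`, `DocumentRepository`, `InMemoryDocumentRepository`, `DocumentService`, `ingest`, `preview`) are hypothetical and exist only to illustrate the layering; the actual modules may be organized differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Document:
    """Would live under models/: a plain data container."""
    id: str
    filename: str
    text: str


class DocumentRepository(Protocol):
    """Would live under repositories/: the interface the service layer depends on."""

    def save(self, doc: Document) -> None: ...

    def get(self, doc_id: str) -> Optional[Document]: ...


class InMemoryDocumentRepository:
    """Dictionary-backed stand-in for a database-backed repository."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def save(self, doc: Document) -> None:
        self._docs[doc.id] = doc

    def get(self, doc_id: str) -> Optional[Document]:
        return self._docs.get(doc_id)


class DocumentService:
    """Would live under services/: business logic, unaware of storage details."""

    def __init__(self, repo: DocumentRepository) -> None:
        self._repo = repo

    def ingest(self, doc_id: str, filename: str, text: str) -> Document:
        # Normalize surrounding whitespace before storing the document.
        doc = Document(id=doc_id, filename=filename, text=text.strip())
        self._repo.save(doc)
        return doc

    def preview(self, doc_id: str, max_chars: int = 40) -> str:
        doc = self._repo.get(doc_id)
        if doc is None:
            raise KeyError(doc_id)
        return doc.text[:max_chars]
```

Because the service depends only on the repository interface, unit tests can use the in-memory implementation while the running application swaps in one backed by PostgreSQL.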

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and code quality checks
  5. Submit a pull request

About

A cloud-native microservices-based application.
