Metadata-Version: 2.4
Name: fastpluggy-docmanager-ai
Version: 0.1.3
Summary: AI document extraction & classification plugin for FastPluggy — Qwen3-VL multimodal, OCR, NSFW detection
Author: FastPluggy Team
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: FastPluggy>=0.4.33
Requires-Dist: loguru
Requires-Dist: fastpluggy-docmanager>=0.1.0
Requires-Dist: python-dateutil
Provides-Extra: ai
Requires-Dist: pdf2image; extra == "ai"
Requires-Dist: PyMuPDF; extra == "ai"
Requires-Dist: PyPDF2; extra == "ai"
Requires-Dist: Pillow; extra == "ai"
Requires-Dist: pytesseract; extra == "ai"
Provides-Extra: nsfw
Requires-Dist: opennsfw2; extra == "nsfw"
Requires-Dist: ifnude; extra == "nsfw"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Provides-Extra: e2e
Requires-Dist: fastpluggy-cli; extra == "e2e"

# fastpluggy-docmanager-ai

![Doc Manager AI](https://img.shields.io/badge/FastPluggy-Doc%20Manager%20AI-blue)
[![Release](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/-/badges/release.svg)](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/-/releases)
[![Pipeline Status](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/badges/main/pipeline.svg?key_text=CI)](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/-/pipelines?ignore_skipped=true)
[![Coverage](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/badges/main/coverage.svg)](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager-ai/-/pipelines)

AI extraction layer for the [docmanager](https://gitlab.ggcorp.fr/open/fastpluggy/plugins/doc-manager) plugin. Classifies documents into 16 categories and extracts structured fields with a multimodal LLM (Qwen3-VL served by Ollama).

## Features

- 16-category AI classification (bank statements, invoices, contracts, etc.)
- Per-category structured field extraction (amounts, dates, names)
- Audience/NSFW content detection
- OCR text extraction (pytesseract)
- Versioned extraction results with confidence scores
- Event-driven: subscribes to docmanager events
- Asynchronous processing via `tasks_worker` (optional)
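Before a `doc.extract.*` prompt can be chosen, the classifier's reply has to be mapped onto one of the 16 category keys. The sketch below assumes, hypothetically, that `doc.classify` asks the model for a JSON object with `category` and `confidence` fields. The plugin's real response schema is not documented here, and `Classification` and `parse_classification` are illustrative names, not part of the package API.

```python
import json
import math
from dataclasses import dataclass

# The 16 category keys, taken from the doc.extract.* prompt table
# ("other" is the catch-all handled by doc.extract.generic).
CATEGORIES = frozenset({
    "invoice", "receipt", "bill", "payslip", "bank_statement", "contract",
    "correspondence", "id_document", "insurance_document", "medical_document",
    "tax_document", "travel_document", "warranty_document", "photo",
    "screenshot", "other",
})


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def parse_classification(raw: str) -> Classification:
    """Parse a classifier reply, assumed (hypothetically) to be JSON such as
    {"category": "invoice", "confidence": 0.92}.

    Malformed replies or unknown categories fall back to "other" with zero
    confidence instead of raising.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Classification("other", 0.0)
    if not isinstance(data, dict):
        return Classification("other", 0.0)

    category = str(data.get("category", "")).strip().lower()
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    if math.isnan(confidence):
        confidence = 0.0
    # Clamp to [0, 1] so downstream thresholds behave predictably.
    confidence = min(max(confidence, 0.0), 1.0)

    if category not in CATEGORIES:
        return Classification("other", 0.0)
    return Classification(category, confidence)
```

Falling back to `other` keeps an unrecognised reply on the generic extraction path rather than failing the whole document; whether the plugin behaves the same way is an assumption.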

## Install

```bash
pip install fastpluggy-docmanager-ai
# With AI dependencies (PDF rendering, OCR):
pip install "fastpluggy-docmanager-ai[ai]"
# With NSFW detection:
pip install "fastpluggy-docmanager-ai[nsfw]"
# Both extras at once:
pip install "fastpluggy-docmanager-ai[ai,nsfw]"
```
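To check which optional extras are importable in the current environment, a short standard-library script is enough. The import names are worth double-checking: PyMuPDF is imported as `fitz` and Pillow as `PIL`. `find_spec` only detects the Python packages. pytesseract additionally needs the Tesseract binary and pdf2image needs Poppler, neither of which pip installs. `missing_modules` and `extra_available` are hypothetical helpers, not part of the package.

```python
import importlib.util

# Import names for the optional extras declared in the package metadata.
# PyMuPDF is imported as "fitz"; Pillow as "PIL".
EXTRAS = {
    "ai": ["pdf2image", "fitz", "PyPDF2", "PIL", "pytesseract"],
    "nsfw": ["opennsfw2", "ifnude"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the import names from an extra that cannot be found."""
    return [name for name in EXTRAS[extra]
            if importlib.util.find_spec(name) is None]


def extra_available(extra: str) -> bool:
    """True when every module of the extra is importable."""
    return not missing_modules(extra)


if __name__ == "__main__":
    for extra in EXTRAS:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"[{extra}] {status}")
```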

## Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| `model_path` | `qwen3-vl:4b` | Ollama vision model tag |
| `categorization_temperature` | `0.1` | Sampling temperature for classification |
| `extraction_temperature` | `0.0` | Sampling temperature for field extraction |

## 18 Built-in Prompts

| Key | Category | Modality |
|-----|----------|----------|
| `doc.classify` | all | multimodal |
| `doc.audience_classifier` | all | image |
| `doc.extract.invoice` | invoice | multimodal |
| `doc.extract.receipt` | receipt | multimodal |
| `doc.extract.bill` | bill | multimodal |
| `doc.extract.payslip` | payslip | multimodal |
| `doc.extract.bank_statement` | bank_statement | multimodal |
| `doc.extract.contract` | contract | multimodal |
| `doc.extract.correspondence` | correspondence | multimodal |
| `doc.extract.id_document` | id_document | multimodal |
| `doc.extract.insurance_document` | insurance_document | multimodal |
| `doc.extract.medical_document` | medical_document | multimodal |
| `doc.extract.tax_document` | tax_document | multimodal |
| `doc.extract.travel_document` | travel_document | multimodal |
| `doc.extract.warranty_document` | warranty_document | multimodal |
| `doc.extract.photo` | photo | image |
| `doc.extract.screenshot` | screenshot | image |
| `doc.extract.generic` | other | multimodal |
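The table follows a simple naming pattern: every category except `other` has a dedicated `doc.extract.<category>` prompt, and `other` maps to `doc.extract.generic`. Here is a minimal sketch of that lookup that also records the modality column. The helper names are hypothetical, and routing unknown categories to the generic prompt is an assumption.

```python
# Categories with a dedicated extraction prompt, per the table above.
DEDICATED = frozenset({
    "invoice", "receipt", "bill", "payslip", "bank_statement", "contract",
    "correspondence", "id_document", "insurance_document", "medical_document",
    "tax_document", "travel_document", "warranty_document", "photo",
    "screenshot",
})

GENERIC_KEY = "doc.extract.generic"

# Prompts listed with the "image" modality; all others are "multimodal".
IMAGE_ONLY_KEYS = frozenset({
    "doc.audience_classifier", "doc.extract.photo", "doc.extract.screenshot",
})


def extraction_prompt_key(category: str) -> str:
    """Map a classified category to its extraction prompt key.

    "other" and any unrecognised category use the generic prompt.
    """
    if category in DEDICATED:
        return f"doc.extract.{category}"
    return GENERIC_KEY


def prompt_modality(key: str) -> str:
    """Return the modality listed for a prompt key."""
    return "image" if key in IMAGE_ONLY_KEYS else "multimodal"
```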

## Dependencies

- Python 3.10+
- `FastPluggy>=0.4.33`
- `fastpluggy-docmanager>=0.1.0`
- `loguru`, `python-dateutil`

Optional: `tasks_worker` (asynchronous processing), `ollama_connector` (LLM access)

## License

Private — FastPluggy project
