	---
	title: Data Cleaning Environment
	emoji: 🧹
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	base_path: /web
	---
	<div align="center">

	# 🧹 Data Cleaning Environment

	### A Reinforcement Learning Benchmark for Autonomous Data Cleaning Agents

	[![Python](https://img.shields.io/badge/Python-3.12+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
	[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-FF6B35?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)
	[![Pydantic](https://img.shields.io/badge/Pydantic-v2-E92063?style=for-the-badge&logo=pydantic&logoColor=white)](https://docs.pydantic.dev/)
	[![FastAPI](https://img.shields.io/badge/FastAPI-WebSocket-009688?style=for-the-badge&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com/)
	[![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)
	[![HuggingFace](https://img.shields.io/badge/HuggingFace-Deployable-FFD21E?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/)
	[![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)](LICENSE)

	<br/>

	> An OpenEnv-compatible reinforcement learning environment where an LLM agent receives a dirty CSV dataset and must autonomously fix type errors, outliers, missing values, and schema inconsistencies to match a hidden ground truth — step by step.

	<br/>

	```
	┌──────────────────────────────────────────────────────────────────┐
	│  Dirty CSV  →  Agent Observes  →  Issues CleanAction  →  Reward  │
	│                                                                  │
	│  "N/A"  → FILL_MISSING(median)          → Score ↑ → +0.12 reward │
	│  "2099" → SET_VALUE(row=3,"2024-01-15") → Score ↑ → +0.08        │
	│  " bob" → STANDARDIZE_COL("name")       → Score ↑ → +0.05        │
	└──────────────────────────────────────────────────────────────────┘
	```

	</div>

	---

	## 📑 Table of Contents

	- [Overview](#-overview)
	- [Architecture](#-architecture)
	- [Project Structure](#-project-structure)
	- [Tasks](#-tasks)
	- [Action Space](#-action-space)
	- [Observation Space](#-observation-space)
	- [Reward Function](#-reward-function)
	- [Quick Start](#-quick-start)
	- [Running Inference](#-running-inference)
	- [Environment API](#-environment-api)
	- [Configuration](#-configuration)
	- [Deployment](#-deployment)
	- [Development & Testing](#-development--testing)
	- [Troubleshooting](#-troubleshooting)

	---

	## 🌟 Overview

	The Data Cleaning Environment is a structured RL benchmark where an LLM-powered agent must clean tabular datasets. The environment wraps a FastAPI WebSocket server following the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) protocol, making it compatible with any OpenEnv-based training or evaluation framework.

	### Why This Matters

	Data cleaning is commonly estimated to take 60–80% of the effort in real-world data projects. This environment trains agents to:

	- Detect type errors, outliers, missing values, and schema inconsistencies
	- Reason about which fix is most impactful at each step
	- Self-correct from informative error feedback
	- Terminate efficiently without over-cleaning

	### Key Properties

	| Property | Value |
	|---|---|
	| Protocol | OpenEnv (WebSocket + HTTP) |
	| Action Space | Discrete (5 command types) |
	| Observation | Full CSV state + grader feedback |
	| Episode Structure | Reset → N × Step → Done |
	| Concurrency | ✅ Multiple simultaneous sessions |
	| State Management | Server-side, fully isolated per session |

	---

	## 🏗️ Architecture

	```
	┌─────────────────────────────────────────────────────────────────────┐
	│  Agent (LLM / RL Policy)                                            │
	│  Qwen2.5-72B / Mistral / Custom Model                               │
	└────────────────────────┬───────────────────────────────┬────────────┘
	                         │ CleanAction (JSON)            │ CleanObservation
	                         ▼                               │
	┌────────────────────────────────────────────────────────┴────────────┐
	│  DataCleaningEnv (client.py)                                        │
	│  OpenEnv EnvClient[CleanAction, CleanObservation, dict]             │
	│  WebSocket persistent connection                                    │
	└────────────────────────┬────────────────────────────────────────────┘
	                         │ WebSocket /ws
	                         ▼
	┌─────────────────────────────────────────────────────────────────────┐
	│  FastAPI Server (server/app.py)                                     │
	│  HTTP + WebSocket endpoints, sessions                               │
	└────────────────────────┬────────────────────────────────────────────┘
	                         │
	                         ▼
	┌─────────────────────────────────────────────────────────────────────┐
	│  DataCleaningEnvironment (server/data_cleaning_env.py)              │
	│                                                                     │
	│  ┌─────────────┐  ┌──────────────┐  ┌───────────┐  ┌────────────┐   │
	│  │ dataset_    │  │ Action       │  │ Grader    │  │ Reward     │   │
	│  │ factory.py  │  │ Dispatcher   │  │ Engine    │  │ Computer   │   │
	│  │             │  │ SET_VALUE    │  │ grade()   │  │            │   │
	│  │ easy/medium │  │ DROP_ROW     │  │ score     │  │ progress   │   │
	│  │ /hard CSVs  │  │ STANDARD.    │  │ delta     │  │ efficiency │   │
	│  │             │  │ FILL_MISS.   │  │           │  │ penalties  │   │
	│  └─────────────┘  └──────────────┘  └───────────┘  └────────────┘   │
	└─────────────────────────────────────────────────────────────────────┘
	```

	---

	## 📁 Project Structure

	```
	data_cleaning_env/
	│
	├── 📄 client.py # DataCleaningEnv — OpenEnv client
	├── 📄 models.py # CleanAction, CleanObservation, CleanState (Pydantic)
	├── 📄 inference.py # Official evaluation entry point
	├── 📄 dataset_factory.py # Generates easy/medium/hard dirty↔clean CSV pairs
	├── 📄 graders.py # Scoring engine — grade(agent_df vs clean_df)
	├── 📄 openenv.yaml # OpenEnv manifest (HuggingFace Spaces config)
	├── 📄 pyproject.toml # Project metadata and dependencies
	│
	└── server/
	    ├── 📄 app.py               # FastAPI application (HTTP + WebSocket)
	    ├── 📄 data_cleaning_env.py # Core environment logic (reset/step/state)
	    ├── 📄 __init__.py
	    └── 📄 Dockerfile           # Container image definition
	```

	---

	## 🎯 Tasks

	The environment ships three progressively harder tasks, each with fixed-seed deterministic datasets:

	### 🟢 Easy — Sales Orders

	| Property | Value |
	|---|---|
	| Dataset | ~100-row sales orders CSV |
	| Dirty Issues | Cell-level type errors, a few missing values |
	| Step Budget | 40 steps |
	| Success Threshold | Score ≥ 0.95 |
	| Primary Skills | `SET_VALUE`, `FILL_MISSING` |

	What the agent needs to fix: Individual cells with wrong types (e.g., `"N/A"` in a price column, `"abc"` in a numeric field). Straightforward injected errors with clear ground truth.

	---

	### 🟡 Medium — Financial Transactions

	| Property | Value |
	|---|---|
	| Dataset | ~200-row transaction log |
	| Dirty Issues | Outlier rows, mixed date formats, missing amounts |
	| Step Budget | 80 steps |
	| Success Threshold | Score ≥ 0.85 |
	| Primary Skills | `DROP_ROW`, `STANDARDIZE_COL`, `FILL_MISSING` |

	What the agent needs to fix: Statistical outliers disguised as data, inconsistent date formats, missing numeric values. Crucially, some extreme values are valid — dropping them costs a false-positive penalty.
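One way an agent could shortlist outlier candidates before spending a `DROP_ROW` is an interquartile-range (IQR) check. The sketch below is an assumption about agent-side strategy, not the environment's grader; `iqr_outlier_candidates` and the 1.5 multiplier are illustrative choices. Flagged positions are only candidates: because some extreme values are valid, each one should be checked against the schema hint before it is dropped.

```python
from statistics import quantiles


def iqr_outlier_candidates(values: list[float], k: float = 1.5) -> list[int]:
    """Return positions whose value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical agent-side helper; the environment does not expose it.
    """
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


amounts = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 500.0]
print(iqr_outlier_candidates(amounts))  # [7]
```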

	---

	### 🔴 Hard — Multi-Schema Dataset

	| Property | Value |
	|---|---|
	| Dataset | ~400-row multi-domain CSV |
	| Dirty Issues | Cross-column inconsistencies, future-year dates, bulk missing data |
	| Step Budget | 150 steps |
	| Success Threshold | Score ≥ 0.80 |
	| Primary Skills | All commands |

	What the agent needs to fix: Everything from easy + medium, plus cascading schema issues across columns. Requires strategic planning about fix order.

	---

	## 🕹️ Action Space

	Every step the agent sends exactly one `CleanAction`:

	```python
	from models import CleanAction

	# Fix a specific cell
	CleanAction(command="SET_VALUE", row_index=3, column="price", value="29.99")

	# Remove an entire row (use carefully — false positives are penalised)
	CleanAction(command="DROP_ROW", row_index=17)

	# Normalise a column's format (dates → YYYY-MM-DD, numbers → float, strings → stripped)
	CleanAction(command="STANDARDIZE_COL", column="order_date")

	# Fill all NaN values in a column using a strategy
	CleanAction(command="FILL_MISSING", column="quantity", fill_strategy="median")

	# Signal episode completion (only accepted when score ≥ task threshold)
	CleanAction(command="DONE")
	```

	### Command Reference

	| Command | `row_index` | `column` | `value` | `fill_strategy` |
	|---|---|---|---|---|
	| `SET_VALUE` | ✅ required | ✅ required | ✅ required | - |
	| `DROP_ROW` | ✅ required | - | - | - |
	| `STANDARDIZE_COL` | - | ✅ required | - | - |
	| `FILL_MISSING` | - | ✅ required | - | ✅ required |
	| `DONE` | - | - | - | - |
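The table above can be turned into a pre-flight check that an agent runs before sending an action. This is a minimal sketch working on plain dicts; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the real `CleanAction` model may enforce these rules differently.

```python
# Required fields per command, transcribed from the Command Reference table.
REQUIRED_FIELDS = {
    "SET_VALUE": ("row_index", "column", "value"),
    "DROP_ROW": ("row_index",),
    "STANDARDIZE_COL": ("column",),
    "FILL_MISSING": ("column", "fill_strategy"),
    "DONE": (),
}


def missing_fields(action: dict) -> list[str]:
    """Return the required fields that are absent or None (hypothetical helper)."""
    command = action.get("command")
    if command not in REQUIRED_FIELDS:
        raise ValueError(f"unknown command: {command!r}")
    # Compare against None so that a legitimate row_index of 0 is accepted.
    return [name for name in REQUIRED_FIELDS[command] if action.get(name) is None]


print(missing_fields({"command": "FILL_MISSING", "column": "quantity"}))
# ['fill_strategy']
```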

	### `FILL_MISSING` Strategies

	| Strategy | Behaviour |
	|---|---|
	| `"mean"` | Replace NaN with column mean (numeric columns only) |
	| `"median"` | Replace NaN with column median (numeric columns only) |
	| `"mode"` | Replace NaN with most frequent value (any column) |
	| `"drop"` | Remove rows where this column is NaN |
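To make the four strategies concrete, here is a small sketch on a single column held as a Python list, with `None` standing in for NaN. It is not the server's implementation (which is not shown here); `fill_missing` is a hypothetical name, and `"drop"` removes list entries where the environment would remove whole rows.

```python
from statistics import mean, median, mode


def fill_missing(values: list, strategy: str) -> list:
    """Apply one FILL_MISSING strategy to a column; None marks a missing cell."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present  # stand-in for removing the affected rows
    choosers = {"mean": mean, "median": median, "mode": mode}
    if strategy not in choosers:
        raise ValueError(f"unknown fill_strategy: {strategy!r}")
    if not present:
        return list(values)  # nothing to compute a fill value from
    fill = choosers[strategy](present)
    return [fill if v is None else v for v in values]


print(fill_missing([1.0, None, 3.0, 8.0], "median"))  # [1.0, 3.0, 3.0, 8.0]
print(fill_missing([1.0, None, 3.0, 8.0], "mean"))    # [1.0, 4.0, 3.0, 8.0]
```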

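	> ⚠️ Important: `DROP_ROW` removes by positional row index (the `row_index` column in the CSV), not by a row ID field. Row indices shift after each drop: every row below the removed one moves up by one position, so take indices from the latest observation before issuing another `DROP_ROW`.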
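The index shift is easy to trip over when several rows are planned for removal from the same observation. The sketch below uses a plain list as a stand-in for the table and shows one possible workaround, dropping from the highest position down; `drop_rows` is a hypothetical helper, not part of the environment.

```python
def drop_rows(rows: list, positions: list[int]) -> list:
    """Drop rows by position, highest first, so earlier drops cannot shift later targets."""
    remaining = list(rows)
    for pos in sorted(set(positions), reverse=True):
        del remaining[pos]
    return remaining


rows = ["r0", "r1", "r2", "r3", "r4"]

# Naive order: after dropping position 1, the old "r3" now sits at position 2,
# so dropping position 3 removes "r4" by mistake.
naive = list(rows)
del naive[1]
del naive[3]
print(naive)                    # ['r0', 'r2', 'r3']

print(drop_rows(rows, [1, 3]))  # ['r0', 'r2', 'r4']
```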

	---

	## 👁️ Observation Space

	After every `reset()` and `step()`, the agent receives a `CleanObservation`:

	```python
	from dataclasses import dataclass


	@dataclass
	class CleanObservation:
	    # ── Task context (constant per episode) ──────────────────────
	    task_id: str                   # "easy" | "medium" | "hard"
	    schema_hint: str               # Plain-English description of clean schema
	    initial_dirty_cells: int       # Total dirty cells at episode start

	    # ── Per-step state ───────────────────────────────────────────
	    dirty_csv: str                 # Full current CSV as string (all edits applied)
	    current_score: float           # 0.0 → 1.0 (grader score vs ground truth)
	    issues_remaining: int          # Approximate dirty cells still to fix
	    step_number: int               # Steps taken so far
	    max_steps: int                 # Budget for this task

	    # ── Last-action feedback ─────────────────────────────────────
	    last_action_success: bool      # Whether previous action applied cleanly
	    last_action_error: str | None  # Error message if success=False (else None)

	    # ── Inherited ────────────────────────────────────────────────
	    done: bool                     # True = episode ended
	    reward: float | None           # Per-step reward (None after reset)

	### Score Computation

	The grader compares the agent's working DataFrame to the hidden ground-truth DataFrame:

	```
	score = (initial_dirty_cells - remaining_dirty_cells) / initial_dirty_cells
	```

	A score of `1.0` means perfect agreement with ground truth.
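As a worked instance of the formula, using the figures from the easy-task smoke test later in this README (29 dirty cells at reset, one fixed): the score becomes 1/29 ≈ 0.0345. The function name below is hypothetical, and raising on a dataset with no dirty cells is an assumption, since that case is not described here.

```python
def cleaning_score(initial_dirty_cells: int, remaining_dirty_cells: int) -> float:
    """Fraction of the initially dirty cells that now match ground truth."""
    if initial_dirty_cells <= 0:
        # Assumption: an already-clean dataset is not covered by the formula.
        raise ValueError("initial_dirty_cells must be positive")
    return (initial_dirty_cells - remaining_dirty_cells) / initial_dirty_cells


print(round(cleaning_score(29, 28), 4))  # 0.0345
print(cleaning_score(29, 0))             # 1.0
```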

	---

	## 💰 Reward Function

	The reward is dense and shaped to guide efficient, precise cleaning:

	```
	reward = progress_term
	       + efficiency_bonus
	       + false_positive_penalty
	       + early_done_penalty
	       + step_cost
	```

	| Component | Value | When |
	|---|---|---|
	| Progress | `current_score − previous_score` | Every step |
	| Efficiency bonus | `+0.10 × (1 − steps_used/max_steps)` | Only when task is solved this step |
	| False-positive penalty | `−0.15` | `DROP_ROW` removes a valid-extreme row (medium task) |
	| Early DONE penalty | `−0.20` | `DONE` called with score < 0.60 |
	| Step cost | `−0.005` | Every step (discourages padding) |
	| Premature DONE block | `−1.00` | `DONE` below task threshold (episode continues) |

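	Reward range: `[−0.5, +1.0]` (clipped). The `−1.00` premature-`DONE` block falls outside this range; the smoke-test output reports it unclipped as `reward=-1.0`.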
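The shaped part of the reward can be sketched as follows, using the values from the table above. This is an illustration under stated assumptions rather than the server's `_compute_reward()`: the function name and signature are hypothetical, and the two `DONE`-related penalties are left out because this README does not say how they combine.

```python
STEP_COST = -0.005
FALSE_POSITIVE_PENALTY = -0.15
EFFICIENCY_BONUS_WEIGHT = 0.10


def shaped_reward(prev_score: float, new_score: float, steps_used: int,
                  max_steps: int, *, solved: bool = False,
                  dropped_valid_row: bool = False) -> float:
    """Progress + step cost, plus the optional bonus and penalty, clipped to [-0.5, 1.0]."""
    reward = (new_score - prev_score) + STEP_COST
    if solved:
        # Bonus shrinks linearly as more of the step budget is used.
        reward += EFFICIENCY_BONUS_WEIGHT * (1 - steps_used / max_steps)
    if dropped_valid_row:
        reward += FALSE_POSITIVE_PENALTY
    return max(-0.5, min(1.0, reward))


# One of 29 dirty cells fixed on the easy task: 1/29 - 0.005
print(round(shaped_reward(0.0, 1 / 29, steps_used=2, max_steps=40), 4))  # 0.0295
```

The printed value matches the `reward=0.0295` line in the smoke-test output, which suggests progress plus step cost is the whole reward on an ordinary step.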

	### Termination Logic

	The episode terminates when any of these is true:

	1. ✅ `current_score >= task_threshold` (auto-terminated, efficiency bonus awarded)
	2. ✅ Agent sends `DONE` and `current_score >= task_threshold` (accepted)
	3. ⏱️ `step_count >= max_steps` (budget exhausted)

	`DONE` is refused if the score is below threshold — the episode continues with a `−1.0` reward signal.
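The three termination rules and the refusal of an early `DONE` can be condensed into one decision function. This is a sketch of the logic as described above, not the environment's code; `episode_status` and its return shape are hypothetical.

```python
def episode_status(command: str, score: float, threshold: float,
                   step_count: int, max_steps: int) -> tuple[bool, bool]:
    """Return (done, done_blocked) for the step that was just applied."""
    if score >= threshold:
        return True, False            # rules 1 and 2: threshold reached
    done_blocked = command == "DONE"  # DONE below threshold is refused
    if step_count >= max_steps:
        return True, done_blocked     # rule 3: step budget exhausted
    return False, done_blocked


print(episode_status("DONE", 0.0345, 0.95, 3, 40))        # (False, True)
print(episode_status("SET_VALUE", 0.97, 0.95, 18, 40))    # (True, False)
print(episode_status("DROP_ROW", 0.50, 0.85, 80, 80))     # (True, False)
```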

	---

	## 🚀 Quick Start

	### Prerequisites

	- Python 3.12+
	- Docker Desktop (for containerised server)
	- A free [HuggingFace token](https://huggingface.co/settings/tokens) (for the inference LLM)

	### 1. Clone & Install

	```bash
	git clone https://github.com/Code-Knight-Debjit/Data-Cleaning-Environment.git
	cd Data-Cleaning-Environment

	# Create virtual environment
	python -m venv .venv

	# Activate (Windows PowerShell)
	.venv\Scripts\Activate.ps1

	# Activate (macOS/Linux)
	source .venv/bin/activate

	# Install dependencies
	pip install -e .
	```

	### 2. Build the Docker Image

	```bash
	docker build -t openenv-data_cleaning:latest -f server/Dockerfile .
	```

	### 3. Set Your HuggingFace Token

	```powershell
	# Windows PowerShell
	$env:HF_TOKEN = "hf_your_token_here"

	# macOS / Linux
	export HF_TOKEN="hf_your_token_here"
	```

	### 4. Run Inference

	```bash
	python inference.py
	```

	That's it! The script auto-starts the Docker container, runs the LLM agent through all three tasks (easy → medium → hard), and prints structured evaluation logs.

	---

	## 🤖 Running Inference

	### Environment Variables

	| Variable | Default | Description |
	|---|---|---|
	| `HF_TOKEN` | (required) | Your HuggingFace token for LLM API access |
	| `API_BASE_URL` | `https://router.huggingface.co/v1` | LLM API endpoint |
	| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model to use for inference |
	| `LOCAL_IMAGE_NAME` | `openenv-data_cleaning:latest` | Docker image to launch |
	| `ENV_BASE_URL` | `http://localhost:8000` | Direct server URL (if not using Docker) |
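A custom agent script could resolve the same variables with the defaults from the table. The sketch below shows one way to do that; `load_settings` is a hypothetical helper and is not taken from `inference.py`. An explicitly empty `LOCAL_IMAGE_NAME` is kept as an empty string rather than replaced by the default, since that value is how Docker is skipped.

```python
import os

DEFAULTS = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-72B-Instruct",
    "LOCAL_IMAGE_NAME": "openenv-data_cleaning:latest",
    "ENV_BASE_URL": "http://localhost:8000",
}


def load_settings(environ=None) -> dict:
    """Read settings from a mapping (os.environ by default), applying table defaults."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required")
    # env.get keeps a variable that is set to "" instead of falling back.
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["HF_TOKEN"] = token
    return settings


print(load_settings({"HF_TOKEN": "hf_your_token_here"})["MODEL_NAME"])
# Qwen/Qwen2.5-72B-Instruct
```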

	### Switching Models

	```powershell
	# Use Mistral (smaller, faster)
	$env:MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

	# Use Llama
	$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
	```

	### Connecting to a Running Server (skip Docker)

	```powershell
	$env:LOCAL_IMAGE_NAME = "" # must be empty string
	$env:ENV_BASE_URL = "http://localhost:8000"
	python inference.py
	```

	### Expected Output

	```
	API_BASE_URL : https://router.huggingface.co/v1
	MODEL_NAME : Qwen/Qwen2.5-72B-Instruct
	LOCAL_IMAGE_NAME : openenv-data_cleaning:latest
	ENV_BASE_URL : http://localhost:8000

	[START] task=easy env=data_cleaning_env model=Qwen/Qwen2.5-72B-Instruct
	[STEP] step=1 action=FILL_MISSING reward=0.12 done=false error=null
	[STEP] step=2 action=SET_VALUE reward=0.08 done=false error=null
	[STEP] step=3 action=STANDARDIZE_COL reward=0.05 done=false error=null
	...
	[END] success=true steps=18 score=0.97 rewards=0.12,0.08,...

	[START] task=medium env=data_cleaning_env ...
	...

	════════════════════════════════════════════════════════
	Task      Score    Reward   Steps  Pass
	────────────────────────────────────────────────────────
	easy      0.9712   1.3400      18  YES
	medium    0.8823   2.1100      47  YES
	hard      0.7640   1.8500      98  NO
	════════════════════════════════════════════════════════
	```

	---

	## 🔌 Environment API

	### Using the Python Client Directly

	```python
	import asyncio
	from client import DataCleaningEnv
	from models import CleanAction

	async def run():
	    # Option A: Auto-start Docker container
	    env = await DataCleaningEnv.from_docker_image("openenv-data_cleaning:latest")

	    # Option B: Connect to an already-running server
	    # env = DataCleaningEnv(base_url="http://localhost:8000")
	    # await env.connect()

	    try:
	        # Reset for a specific task
	        result = await env.reset(task_id="easy")
	        obs = result.observation

	        print(f"Score: {obs.current_score:.4f}")
	        print(f"Issues: {obs.issues_remaining}")
	        print(f"Schema: {obs.schema_hint}")

	        # Take a step
	        action = CleanAction(
	            command="FILL_MISSING",
	            column="price",
	            fill_strategy="median",
	        )
	        result = await env.step(action)
	        obs = result.observation

	        print(f"Reward: {result.reward:.4f}")
	        print(f"New score: {obs.current_score:.4f}")
	        print(f"Action OK: {obs.last_action_success}")

	        # Signal completion (refused with a -1.0 reward while the score
	        # is still below the task threshold)
	        result = await env.step(CleanAction(command="DONE"))

	    finally:
	        await env.close()

	asyncio.run(run())
	```

	### Using the Sync Wrapper

	```python
	from client import DataCleaningEnv
	from models import CleanAction

	env = DataCleaningEnv(base_url="http://localhost:8000").sync()

	with env:
	    result = env.reset(task_id="easy")
	    result = env.step(CleanAction(command="STANDARDIZE_COL", column="order_date"))
	    print(f"Score: {result.observation.current_score:.4f}")
	```

	### HTTP Endpoints

	When the server is running, the following HTTP endpoints are available:

	| Endpoint | Method | Description |
	|---|---|---|
	| `/health` | GET | Server health check |
	| `/docs` | GET | Swagger / OpenAPI documentation |
	| `/web` | GET | Interactive web UI |
	| `/ws` | WebSocket | Persistent session endpoint |

	---

	## ⚙️ Configuration

	### Step Budgets

	```python
	MAX_STEPS = {
	    "easy": 40,
	    "medium": 80,
	    "hard": 150,
	}
	```

	### Success Thresholds

	```python
	DONE_THRESHOLD = {
	    "easy": 0.95,
	    "medium": 0.85,
	    "hard": 0.80,
	}
	```
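For evaluation scripts it can be convenient to keep each task's budget and threshold together. The sketch below bundles the two dictionaries above into one record per task; `TaskConfig`, `TASKS`, and `task_passed` are hypothetical names, and the pass rule (score at or above the threshold) follows the task tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    max_steps: int
    done_threshold: float


# Values copied from MAX_STEPS and DONE_THRESHOLD.
TASKS = {
    "easy": TaskConfig(max_steps=40, done_threshold=0.95),
    "medium": TaskConfig(max_steps=80, done_threshold=0.85),
    "hard": TaskConfig(max_steps=150, done_threshold=0.80),
}


def task_passed(task_id: str, score: float) -> bool:
    """True when the final score meets the task's success threshold."""
    return score >= TASKS[task_id].done_threshold


# Scores from the sample summary table in "Expected Output".
print(task_passed("easy", 0.9712), task_passed("hard", 0.7640))  # True False
```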

	### Reward Constants

	| Constant | Value | Purpose |
	|---|---|---|
	| `STEP_COST` | `-0.005` | Per-step penalty to discourage padding |
	| `EARLY_DONE_PENALTY` | `-0.20` | Penalty for `DONE` below score 0.60 |
	| `EARLY_DONE_THRESHOLD` | `0.60` | Score floor for DONE without penalty |
	| `FALSE_POSITIVE_PENALTY` | `-0.15` | Penalty for wrongly dropping a valid row |
	| `EFFICIENCY_BONUS_WEIGHT` | `0.10` | Multiplier for early-completion bonus |

	---

	## ☁️ Deployment

	### Deploy to HuggingFace Spaces

	```bash
	# Install the OpenEnv CLI
	pip install openenv

	# Authenticate with HuggingFace
	huggingface-cli login

	# Deploy (from the repo root where openenv.yaml lives)
	openenv push

	# Or deploy privately to a specific repo
	openenv push --repo-id your-username/data-cleaning-env --private
	```

	After deployment, your environment will be live at:
	```
	https://huggingface.co/spaces/your-username/data-cleaning-env
	```

	With endpoints:
	- Web UI: `/web`
	- API Docs: `/docs`
	- Health: `/health`
	- WebSocket: `/ws`

	### Connect to a HuggingFace Space

	```python
	env = await DataCleaningEnv.from_env("your-username/data-cleaning-env")
	# or run locally with UV (no Docker needed)
	env = await DataCleaningEnv.from_env("your-username/data-cleaning-env", use_docker=False)
	```

	### Run the Server Locally (Without Docker)

	```bash
	uvicorn server.app:app --reload --port 8000
	```

	---

	## 🧪 Development & Testing

	### Test the Environment Logic (No Server Needed)

	```bash
	# Runs a smoke test across all three tasks
	python server/data_cleaning_env.py
	```

	Expected output:
	```
	────────────────────────────────────────────────────────────────
	TASK: EASY
	────────────────────────────────────────────────────────────────
	reset() → score=0.0000 issues=29 done=False
	CSV: 101 rows, 5 cols
	Hint: Sales orders dataset. price must be float...
	step (bad col) → success=False error='Column 'DOES_NOT_EXIST' not found...'
	step (fix row=3 col='price') → success=True score=0.0345 reward=0.0295
	step (DONE, blocked) → done=False reward=-1.0 score=0.0345
	...
	All smoke tests passed.
	```

	### Test Pydantic Models

	```bash
	python models.py
	```

	### Test the Client Parser

	```bash
	python test_parse.py
	```

	### Run the Full Server Locally

	```bash
	uvicorn server.app:app --reload
	# Open http://localhost:8000/docs for interactive API explorer
	```

	---

	## 🔧 Troubleshooting

	### `TypeError: Too few arguments for EnvClient`

	Cause: Your `client.py` subclasses `EnvClient` with only 2 type parameters, but OpenEnv requires 3 (`ActT`, `ObsT`, `StateT`).

	Fix:
	```python
	# ❌ Wrong: only two type parameters
	class DataCleaningEnv(EnvClient[CleanAction, CleanObservation]): ...

	# ✅ Correct: action, observation, and state types
	class DataCleaningEnv(EnvClient[CleanAction, CleanObservation, dict]): ...
	```

	Also ensure `_parse_state` is implemented:
	```python
	def _parse_state(self, payload: dict) -> dict:
	    return payload
	```

	---

	### `ValidationError: Input should be 'SET_VALUE', 'DROP_ROW', ...`

	Cause: Passing an invalid command string to `CleanAction`.

	Fix: Only these 5 commands are valid:
	```python
	"SET_VALUE" | "DROP_ROW" | "STANDARDIZE_COL" | "FILL_MISSING" | "DONE"
	```
	There is no `"drop_column"` — columns cannot be dropped, only rows.

	---

	### `UnboundLocalError: cannot access local variable 'env'`

	Cause 1: Docker image doesn't exist yet.
	```bash
	docker build -t openenv-data_cleaning:latest -f server/Dockerfile .
	```

	Cause 2: Stray test lines in `inference.py` referencing `env` before it's assigned.

	Fix: Remove any manually added lines like `action = CleanAction(...)` or `result = await env.step(action)` from inside `main()`. The `main()` function should only call `run_episode()` — all action logic belongs inside that function.

	---

	### `DONE rejected: score X < required Y`

	This is expected behaviour, not a bug. The environment refuses premature termination. The agent should continue cleaning until the score meets the task threshold.

	---

	### HuggingFace Router returns 401

	Ensure your token is set:
	```powershell
	$env:HF_TOKEN = "hf_your_token_here"
	```
	Get a free token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).

	---

	## 📐 Data Flow Diagram

	```
	┌──────────────────────────────────┐
	│  inference.py / custom agent     │
	│                                  │
	│  1. await env.reset(task_id=…)   │
	│  2. obs = result.observation     │
	│  3. build_prompt(obs) → LLM      │
	│  4. parse_action(llm_output)     │
	│  5. await env.step(action)       │
	│  6. GOTO 2 until done            │
	└──────────────┬───────────────────┘
	               │
	CleanAction (JSON over WebSocket)
	               │
	               ▼
	┌──────────────────────────────────┐
	│  DataCleaningEnvironment         │
	│                                  │
	│  _apply_action()                 │
	│    → mutates _dirty_df in-place  │
	│                                  │
	│  grade(agent_df vs clean_df)     │
	│    → score ∈ [0.0, 1.0]          │
	│                                  │
	│  _compute_reward()               │
	│    → progress + bonuses          │
	│                                  │
	│  _build_observation()            │
	│    → CleanObservation            │
	└──────────────────────────────────┘
	```

	---

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch: `git checkout -b feature/my-improvement`
	3. Run the smoke tests: `python server/data_cleaning_env.py`
	4. Commit your changes: `git commit -m "feat: add my improvement"`
	5. Push and open a Pull Request

	---

	## 📄 License

	This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.

	---

	<div align="center">

	Built with ❤️ using [OpenEnv](https://github.com/meta-pytorch/OpenEnv) · [FastAPI](https://fastapi.tiangolo.com/) · [Pydantic](https://docs.pydantic.dev/) · [HuggingFace](https://huggingface.co/)

	</div>