data_cleaning_env / README.md
CodeKnightDebjit's picture
Upload folder using huggingface_hub
3ce214c verified
metadata
title: Data Cleaning Environment
emoji: 🧹
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
base_path: /web

🧹 Data Cleaning Environment

A Reinforcement Learning Benchmark for Autonomous Data Cleaning Agents

Python OpenEnv Pydantic FastAPI Docker HuggingFace License


An OpenEnv-compatible reinforcement learning environment where an LLM agent receives a dirty CSV dataset and must autonomously fix type errors, outliers, missing values, and schema inconsistencies to match a hidden ground truth β€” step by step.


β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Dirty CSV  β†’  Agent Observes  β†’  Issues CleanAction  β†’  Reward β”‚
β”‚                                                                  β”‚
β”‚   "N/A"  β†’  FILL_MISSING(median)  β†’  Score ↑  β†’  +0.12 reward  β”‚
β”‚   "2099" β†’  SET_VALUE(row=3,"2024-01-15")  β†’  Score ↑  β†’  +0.08 β”‚
β”‚   "  bob" β†’ STANDARDIZE_COL("name")  β†’  Score ↑  β†’  +0.05       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“‘ Table of Contents


🌟 Overview

The Data Cleaning Environment is a structured RL benchmark where an LLM-powered agent must clean tabular datasets. The environment wraps a FastAPI WebSocket server following the OpenEnv protocol, making it compatible with any OpenEnv-based training or evaluation framework.

Why This Matters

Real-world data pipelines spend 60–80% of their time on data cleaning. This environment trains agents to:

  • Detect type errors, outliers, missing values, and schema inconsistencies
  • Reason about which fix is most impactful at each step
  • Self-correct from informative error feedback
  • Terminate efficiently without over-cleaning

Key Properties

Property Value
Protocol OpenEnv (WebSocket + HTTP)
Action Space Discrete (5 command types)
Observation Full CSV state + grader feedback
Episode Structure Reset β†’ N Γ— Step β†’ Done
Concurrency βœ… Multiple simultaneous sessions
State Management Server-side, fully isolated per session

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Agent (LLM / RL Policy)                     β”‚
β”‚                  Qwen2.5-72B / Mistral / Custom Model               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚ CleanAction (JSON)             β”‚ CleanObservation
                         β–Ό                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      DataCleaningEnv (client.py)                     β”‚
β”‚               OpenEnv EnvClient[CleanAction, CleanObservation, dict] β”‚
β”‚                   WebSocket persistent connection                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚  WebSocket /ws
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      FastAPI Server (server/app.py)                  β”‚
β”‚                  HTTP + WebSocket endpoints, sessions                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               DataCleaningEnvironment (server/data_cleaning_env.py)  β”‚
β”‚                                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ dataset_    β”‚  β”‚  Action      β”‚  β”‚  Grader   β”‚  β”‚  Reward    β”‚ β”‚
β”‚  β”‚ factory.py  β”‚  β”‚  Dispatcher  β”‚  β”‚  Engine   β”‚  β”‚  Computer  β”‚ β”‚
β”‚  β”‚             β”‚  β”‚  SET_VALUE   β”‚  β”‚  grade()  β”‚  β”‚            β”‚ β”‚
β”‚  β”‚ easy/medium β”‚  β”‚  DROP_ROW    β”‚  β”‚  score    β”‚  β”‚  progress  β”‚ β”‚
β”‚  β”‚ /hard CSVs  β”‚  β”‚  STANDARD.   β”‚  β”‚  delta    β”‚  β”‚  efficiencyβ”‚ β”‚
β”‚  β”‚             β”‚  β”‚  FILL_MISS.  β”‚  β”‚           β”‚  β”‚  penalties β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“ Project Structure

data_cleaning_env/
β”‚
β”œβ”€β”€ πŸ“„ client.py                  # DataCleaningEnv β€” OpenEnv client
β”œβ”€β”€ πŸ“„ models.py                  # CleanAction, CleanObservation, CleanState (Pydantic)
β”œβ”€β”€ πŸ“„ inference.py               # Official evaluation entry point
β”œβ”€β”€ πŸ“„ dataset_factory.py         # Generates easy/medium/hard dirty↔clean CSV pairs
β”œβ”€β”€ πŸ“„ graders.py                 # Scoring engine β€” grade(agent_df vs clean_df)
β”œβ”€β”€ πŸ“„ openenv.yaml               # OpenEnv manifest (HuggingFace Spaces config)
β”œβ”€β”€ πŸ“„ pyproject.toml             # Project metadata and dependencies
β”‚
└── server/
    β”œβ”€β”€ πŸ“„ app.py                 # FastAPI application (HTTP + WebSocket)
    β”œβ”€β”€ πŸ“„ data_cleaning_env.py   # Core environment logic (reset/step/state)
    β”œβ”€β”€ πŸ“„ __init__.py
    └── πŸ“„ Dockerfile             # Container image definition

🎯 Tasks

The environment ships three progressively harder tasks, each with fixed-seed deterministic datasets:

🟒 Easy β€” Sales Orders

Property Value
Dataset ~100-row sales orders CSV
Dirty Issues Cell-level type errors, a few missing values
Step Budget 40 steps
Success Threshold Score β‰₯ 0.95
Primary Skills SET_VALUE, FILL_MISSING

What the agent needs to fix: Individual cells with wrong types (e.g., "N/A" in a price column, "abc" in a numeric field). Straightforward injected errors with clear ground truth.


🟑 Medium β€” Financial Transactions

Property Value
Dataset ~200-row transaction log
Dirty Issues Outlier rows, mixed date formats, missing amounts
Step Budget 80 steps
Success Threshold Score β‰₯ 0.85
Primary Skills DROP_ROW, STANDARDIZE_COL, FILL_MISSING

What the agent needs to fix: Statistical outliers disguised as data, inconsistent date formats, missing numeric values. Crucially, some extreme values are valid β€” dropping them costs a false-positive penalty.


πŸ”΄ Hard β€” Multi-Schema Dataset

Property Value
Dataset ~400-row multi-domain CSV
Dirty Issues Cross-column inconsistencies, future-year dates, bulk missing data
Step Budget 150 steps
Success Threshold Score β‰₯ 0.80
Primary Skills All commands

What the agent needs to fix: Everything from easy + medium, plus cascading schema issues across columns. Requires strategic planning about fix order.


πŸ•ΉοΈ Action Space

Every step the agent sends exactly one CleanAction:

from models import CleanAction

# Fix a specific cell
CleanAction(command="SET_VALUE", row_index=3, column="price", value="29.99")

# Remove an entire row (use carefully β€” false positives are penalised)
CleanAction(command="DROP_ROW", row_index=17)

# Normalise a column's format (dates β†’ YYYY-MM-DD, numbers β†’ float, strings β†’ stripped)
CleanAction(command="STANDARDIZE_COL", column="order_date")

# Fill all NaN values in a column using a strategy
CleanAction(command="FILL_MISSING", column="quantity", fill_strategy="median")

# Signal episode completion (only accepted when score β‰₯ task threshold)
CleanAction(command="DONE")

Command Reference

Command row_index column value fill_strategy
SET_VALUE βœ… required βœ… required βœ… required β€”
DROP_ROW βœ… required β€” β€” β€”
STANDARDIZE_COL β€” βœ… required β€” β€”
FILL_MISSING β€” βœ… required β€” βœ… required
DONE β€” β€” β€” β€”

FILL_MISSING Strategies

Strategy Behaviour
"mean" Replace NaN with column mean (numeric columns only)
"median" Replace NaN with column median (numeric columns only)
"mode" Replace NaN with most frequent value (any column)
"drop" Remove rows where this column is NaN

⚠️ Important: DROP_ROW removes by positional row index (the row_index column in the CSV), not by a row ID field. Row indices shift after each drop.


πŸ‘οΈ Observation Space

After every reset() and step(), the agent receives a CleanObservation:

@dataclass
class CleanObservation:
    # ── Task context (constant per episode) ──────────────────────
    task_id: str               # "easy" | "medium" | "hard"
    schema_hint: str           # Plain-English description of clean schema
    initial_dirty_cells: int   # Total dirty cells at episode start

    # ── Per-step state ───────────────────────────────────────────
    dirty_csv: str             # Full current CSV as string (all edits applied)
    current_score: float       # 0.0 β†’ 1.0  (grader score vs ground truth)
    issues_remaining: int      # Approximate dirty cells still to fix
    step_number: int           # Steps taken so far
    max_steps: int             # Budget for this task

    # ── Last-action feedback ─────────────────────────────────────
    last_action_success: bool  # Whether previous action applied cleanly
    last_action_error: str     # Error message if success=False (else None)

    # ── Inherited ────────────────────────────────────────────────
    done: bool                 # True = episode ended
    reward: float | None       # Per-step reward (None after reset)

Score Computation

The grader compares the agent's working DataFrame to the hidden ground-truth DataFrame:

score = (initial_dirty_cells - remaining_dirty_cells) / initial_dirty_cells

A score of 1.0 means perfect agreement with ground truth.


πŸ’° Reward Function

The reward is dense and shaped to guide efficient, precise cleaning:

reward = progress_term
       + efficiency_bonus
       + false_positive_penalty
       + early_done_penalty
       + step_cost
Component Value When
Progress current_score βˆ’ previous_score Every step
Efficiency bonus +0.10 Γ— (1 βˆ’ steps_used/max_steps) Only when task is solved this step
False-positive penalty βˆ’0.15 DROP_ROW removes a valid-extreme row (medium task)
Early DONE penalty βˆ’0.20 DONE called with score < 0.60
Step cost βˆ’0.005 Every step (discourages padding)
Premature DONE block βˆ’1.00 DONE below task threshold β€” episode continues

Reward range: [βˆ’0.5, +1.0] (clipped)

Termination Logic

The episode terminates when any of these is true:

  1. βœ… current_score >= task_threshold (auto-terminated, efficiency bonus awarded)
  2. βœ… Agent sends DONE and current_score >= task_threshold (accepted)
  3. ⏱️ step_count >= max_steps (budget exhausted)

DONE is refused if the score is below threshold β€” the episode continues with a βˆ’1.0 reward signal.


πŸš€ Quick Start

Prerequisites

  • Python 3.12+
  • Docker Desktop (for containerised server)
  • A free HuggingFace token (for the inference LLM)

1. Clone & Install

git clone https://github.com/Code-Knight-Debjit/Data-Cleaning-Environment.git
cd Data-Cleaning-Environment

# Create virtual environment
python -m venv .venv

# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1

# Activate (macOS/Linux)
source .venv/bin/activate

# Install dependencies
pip install -e .

2. Build the Docker Image

docker build -t openenv-data_cleaning:latest -f server/Dockerfile .

3. Set Your HuggingFace Token

# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"

# macOS / Linux
export HF_TOKEN="hf_your_token_here"

4. Run Inference

python inference.py

That's it! The script auto-starts the Docker container, runs the LLM agent through all three tasks (easy β†’ medium β†’ hard), and prints structured evaluation logs.


πŸ€– Running Inference

Environment Variables

Variable Default Description
HF_TOKEN (required) Your HuggingFace token for LLM API access
API_BASE_URL https://router.huggingface.co/v1 LLM API endpoint
MODEL_NAME Qwen/Qwen2.5-72B-Instruct Model to use for inference
LOCAL_IMAGE_NAME openenv-data_cleaning:latest Docker image to launch
ENV_BASE_URL http://localhost:8000 Direct server URL (if not using Docker)

Switching Models

# Use Mistral (smaller, faster)
$env:MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

# Use Llama
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"

Connecting to a Running Server (skip Docker)

$env:LOCAL_IMAGE_NAME = ""   # must be empty string
$env:ENV_BASE_URL = "http://localhost:8000"
python inference.py

Expected Output

API_BASE_URL     : https://router.huggingface.co/v1
MODEL_NAME       : Qwen/Qwen2.5-72B-Instruct
LOCAL_IMAGE_NAME : openenv-data_cleaning:latest
ENV_BASE_URL     : http://localhost:8000

[START] task=easy env=data_cleaning_env model=Qwen/Qwen2.5-72B-Instruct
[STEP]  step=1  action=FILL_MISSING  reward=0.12 done=false  error=null
[STEP]  step=2  action=SET_VALUE     reward=0.08 done=false  error=null
[STEP]  step=3  action=STANDARDIZE_COL reward=0.05 done=false error=null
...
[END]   success=true steps=18 score=0.97 rewards=0.12,0.08,...

[START] task=medium env=data_cleaning_env ...
...

════════════════════════════════════════════════════════
Task        Score    Reward  Steps  Pass
────────────────────────────────────────────────────────
easy       0.9712    1.3400     18   YES
medium     0.8823    2.1100     47   YES
hard       0.7640    1.8500     98    NO
════════════════════════════════════════════════════════

πŸ”Œ Environment API

Using the Python Client Directly

import asyncio
from client import DataCleaningEnv
from models import CleanAction

async def run():
    # Option A: Auto-start Docker container
    env = await DataCleaningEnv.from_docker_image("openenv-data_cleaning:latest")

    # Option B: Connect to an already-running server
    # env = DataCleaningEnv(base_url="http://localhost:8000")
    # await env.connect()

    try:
        # Reset for a specific task
        result = await env.reset(task_id="easy")
        obs = result.observation

        print(f"Score: {obs.current_score:.4f}")
        print(f"Issues: {obs.issues_remaining}")
        print(f"Schema: {obs.schema_hint}")

        # Take a step
        action = CleanAction(
            command="FILL_MISSING",
            column="price",
            fill_strategy="median"
        )
        result = await env.step(action)
        obs = result.observation

        print(f"Reward: {result.reward:.4f}")
        print(f"New score: {obs.current_score:.4f}")
        print(f"Action OK: {obs.last_action_success}")

        # Signal completion
        result = await env.step(CleanAction(command="DONE"))

    finally:
        await env.close()

asyncio.run(run())

Using the Sync Wrapper

from client import DataCleaningEnv
from models import CleanAction

env = DataCleaningEnv(base_url="http://localhost:8000").sync()

with env:
    result = env.reset(task_id="easy")
    result = env.step(CleanAction(command="STANDARDIZE_COL", column="order_date"))
    print(f"Score: {result.observation.current_score:.4f}")

HTTP Endpoints

When the server is running, the following HTTP endpoints are available:

Endpoint Method Description
/health GET Server health check
/docs GET Swagger / OpenAPI documentation
/web GET Interactive web UI
/ws WebSocket Persistent session endpoint

βš™οΈ Configuration

Step Budgets

MAX_STEPS = {
    "easy":   40,
    "medium": 80,
    "hard":   150,
}

Success Thresholds

DONE_THRESHOLD = {
    "easy":   0.95,
    "medium": 0.85,
    "hard":   0.80,
}

Reward Constants

Constant Value Purpose
STEP_COST -0.005 Per-step penalty to discourage padding
EARLY_DONE_PENALTY -0.20 Penalty for DONE below score 0.60
EARLY_DONE_THRESHOLD 0.60 Score floor for DONE without penalty
FALSE_POSITIVE_PENALTY -0.15 Penalty for wrongly dropping a valid row
EFFICIENCY_BONUS_WEIGHT 0.10 Multiplier for early-completion bonus

☁️ Deployment

Deploy to HuggingFace Spaces

# Install the OpenEnv CLI
pip install openenv

# Authenticate with HuggingFace
huggingface-cli login

# Deploy (from the repo root where openenv.yaml lives)
openenv push

# Or deploy privately to a specific repo
openenv push --repo-id your-username/data-cleaning-env --private

After deployment, your environment will be live at:

https://huggingface.co/spaces/your-username/data-cleaning-env

With endpoints:

  • Web UI: /web
  • API Docs: /docs
  • Health: /health
  • WebSocket: /ws

Connect to a HuggingFace Space

env = await DataCleaningEnv.from_env("your-username/data-cleaning-env")
# or run locally with UV (no Docker needed)
env = await DataCleaningEnv.from_env("your-username/data-cleaning-env", use_docker=False)

Run the Server Locally (Without Docker)

uvicorn server.app:app --reload --port 8000

πŸ§ͺ Development & Testing

Test the Environment Logic (No Server Needed)

# Runs a smoke test across all three tasks
python server/data_cleaning_env.py

Expected output:

────────────────────────────────────────────────────────────────
TASK: EASY
────────────────────────────────────────────────────────────────
reset()  β†’ score=0.0000  issues=29  done=False
  CSV:  101 rows, 5 cols
  Hint: Sales orders dataset. price must be float...
step (bad col) β†’ success=False  error='Column 'DOES_NOT_EXIST' not found...'
step (fix row=3 col='price') β†’ success=True  score=0.0345  reward=0.0295
step (DONE, blocked)  β†’ done=False  reward=-1.0  score=0.0345
...
All smoke tests passed.

Test Pydantic Models

python models.py

Test the Client Parser

python test_parse.py

Run the Full Server Locally

uvicorn server.app:app --reload
# Open http://localhost:8000/docs for interactive API explorer

πŸ”§ Troubleshooting

TypeError: Too few arguments for EnvClient

Cause: Your client.py subclasses EnvClient with only 2 type parameters, but OpenEnv requires 3 (ActT, ObsT, StateT).

Fix:

# ❌ Wrong
class DataCleaningEnv(EnvClient[CleanAction, CleanObservation]):

# βœ… Correct
class DataCleaningEnv(EnvClient[CleanAction, CleanObservation, dict]):

Also ensure _parse_state is implemented:

def _parse_state(self, payload: dict) -> dict:
    return payload

ValidationError: Input should be 'SET_VALUE', 'DROP_ROW', ...

Cause: Passing an invalid command string to CleanAction.

Fix: Only these 5 commands are valid:

"SET_VALUE" | "DROP_ROW" | "STANDARDIZE_COL" | "FILL_MISSING" | "DONE"

There is no "drop_column" β€” columns cannot be dropped, only rows.


UnboundLocalError: cannot access local variable 'env'

Cause 1: Docker image doesn't exist yet.

docker build -t openenv-data_cleaning:latest -f server/Dockerfile .

Cause 2: Stray test lines in inference.py referencing env before it's assigned.

Fix: Remove any manually added lines like action = CleanAction(...) or result = await env.step(action) from inside main(). The main() function should only call run_episode() β€” all action logic belongs inside that function.


DONE rejected: score X < required Y

This is expected behaviour, not a bug. The environment refuses premature termination. The agent should continue cleaning until the score meets the task threshold.


HuggingFace Router returns 401

Ensure your token is set:

$env:HF_TOKEN = "hf_your_token_here"

Get a free token at huggingface.co/settings/tokens.


πŸ“ Data Flow Diagram

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   inference.py / custom agent    β”‚
                    β”‚                                  β”‚
                    β”‚  1. await env.reset(task_id=…)   β”‚
                    β”‚  2. obs = result.observation      β”‚
                    β”‚  3. build_prompt(obs) β†’ LLM       β”‚
                    β”‚  4. parse_action(llm_output)      β”‚
                    β”‚  5. await env.step(action)        β”‚
                    β”‚  6. GOTO 2 until done             β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β”‚
                    CleanAction (JSON over WebSocket)
                                   β”‚
                                   β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚        DataCleaningEnvironment    β”‚
                    β”‚                                  β”‚
                    β”‚  _apply_action()                 β”‚
                    β”‚    β†’ mutates _dirty_df in-place  β”‚
                    β”‚                                  β”‚
                    β”‚  grade(agent_df vs clean_df)     β”‚
                    β”‚    β†’ score ∈ [0.0, 1.0]          β”‚
                    β”‚                                  β”‚
                    β”‚  _compute_reward()               β”‚
                    β”‚    β†’ progress + bonuses          β”‚
                    β”‚                                  β”‚
                    β”‚  _build_observation()            β”‚
                    β”‚    β†’ CleanObservation            β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-improvement
  3. Run the smoke tests: python server/data_cleaning_env.py
  4. Commit your changes: git commit -m "feat: add my improvement"
  5. Push and open a Pull Request

πŸ“„ License

This project is licensed under the MIT License. See LICENSE for details.


Built with ❀️ using OpenEnv · FastAPI · Pydantic · HuggingFace