# openenv.yaml
# ─────────────────────────────────────────────────────────────────────────────
# Manifest for the Data Cleaning Pipeline OpenEnv environment.
#
# Field reference
# ───────────────
# Required by the CLI (serve / build / push / validate):
#   spec_version  - always 1 for this generation of the spec
#   name          - environment identifier used by the CLI and auto-discovery
#   type          - "space" means it can be deployed as a Hugging Face Space
#   runtime       - "fastapi" tells the server how to boot
#   app           - Python import path to the FastAPI app object
#   port          - port the server listens on inside the container
#
# Read by AutoEnv auto-discovery (openenv.auto._discovery):
#   name          - maps to env_key after stripping the "_env" suffix
#   description   - human-readable label shown in env listings
#   spec_version  - stored in EnvironmentInfo for introspection
#   action        - EXPLICIT override of the auto-inferred class name
#   observation   - EXPLICIT override of the auto-inferred class name
#
# NOTE on action / observation overrides:
#   Auto-discovery infers class names from the env name using PascalCase:
#     "data_cleaning_env" → base "data_cleaning" → "DataCleaningAction"
#   Our actual classes are named "CleanAction" and "CleanObservation"
#   (not "DataCleaningAction" / "DataCleaningObservation"), so these fields
#   MUST be set to avoid an ImportError on AutoEnv.from_env().
#
# All other fields (tasks, reward, tags) are informational. They are not
# parsed by the current OpenEnv tooling but are preserved in
# EnvironmentInfo.manifest and available to the web UI and external tools.
# ─────────────────────────────────────────────────────────────────────────────

# ── Core deployment fields ───────────────────────────────────────────────────
spec_version: 1
name: data_cleaning_env
type: space
runtime: fastapi
app: server.app:app
port: 8000

# ── Package metadata ─────────────────────────────────────────────────────────
version: "1.0.0"
description: >-
  Data cleaning pipeline: the agent receives a dirty CSV and must detect
  and fix type errors, missing values, outliers, and schema inconsistencies
  to match a hidden ground-truth dataset. There are three tasks
  (easy → medium → hard), each with a deterministic grader that returns a
  continuous score in [0.0, 1.0].

# ── Auto-discovery class overrides ───────────────────────────────────────────
# These override the auto-inferred names (which would be DataCleaningAction /
# DataCleaningObservation) to match the actual class names defined in models.py.
action: CleanAction
observation: CleanObservation
# The client class is correctly inferred as DataCleaningEnv (data_cleaning →
# DataCleaning + Env), which matches client.py, so no override is needed.
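
# Sketch of the PascalCase inference described above, written as a
# hypothetical helper. infer_class_names is illustrative only; it is NOT the
# code in openenv.auto._discovery, whose exact rules may differ.
#
#   def infer_class_names(env_name: str) -> dict[str, str]:
#       # Strip the "_env" suffix, then PascalCase the remaining words.
#       base = env_name.removesuffix("_env")        # Python 3.9+
#       pascal = "".join(word.capitalize() for word in base.split("_"))
#       return {
#           "action": pascal + "Action",            # DataCleaningAction
#           "observation": pascal + "Observation",  # DataCleaningObservation
#           "client": pascal + "Env",               # DataCleaningEnv
#       }
#
# Neither inferred model name matches models.py, which is why the action and
# observation overrides are required. The client name does match client.py.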

# ── Tags (informational) ─────────────────────────────────────────────────────
tags:
  - data-cleaning
  - tabular
  - real-world
  - hackathon

# ── Task manifest (informational) ────────────────────────────────────────────
# One entry per task. These values mirror the constants in models.py
# (MAX_STEPS, DONE_THRESHOLD) and the descriptions in dataset_factory.py.
tasks:
  - id: easy
    name: Fix obvious errors
    description: >-
      50-row sales CSV with 29 injected dirty cells: 10 type mismatches
      (text in numeric columns), 8 missing values, 5 far-future dates
      (year 2099), and 6 cells with leading/trailing whitespace.
      Graded by exact cell-level match against the ground truth
      (score in [0.0, 1.0]).
    dataset_rows: 50
    dirty_cells: 29
    max_steps: 40
    done_threshold: 0.95
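    # Sketch of an exact cell-level match score. easy_score is a hypothetical
    # name. It assumes cells are compared as raw strings over a given list of
    # (row, col) positions, such as the 29 injected cells. The real grader
    # may instead score every cell.
    #
    #   def easy_score(agent: list[list[str]], truth: list[list[str]],
    #                  positions: list[tuple[int, int]]) -> float:
    #       # Exact comparison: " 42" != "42", so whitespace must be stripped.
    #       hits = sum(agent[r][c] == truth[r][c] for r, c in positions)
    #       return hits / len(positions) if positions else 0.0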
  - id: medium
    name: Outlier detection without false positives
    description: >-
      200-row customer transaction CSV with 15 true statistical outliers
      (negative or > $2000 amounts) that must be fixed or removed, 5 valid
      large transactions ($900 to $2000) that must NOT be removed, and 12
      category spelling typos. Graded by F1 score on outlier detection
      (0.5 weight) and typo correction rate (0.5 weight).
    dataset_rows: 200
    dirty_cells: 27
    max_steps: 80
    done_threshold: 0.85
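    # Sketch of the 0.5 / 0.5 blend. medium_score and its arguments are
    # hypothetical. It assumes outlier handling is scored by comparing the set
    # of row ids the agent fixed or removed against the 15 true outlier ids.
    #
    #   def medium_score(touched: set[int], outliers: set[int],
    #                    typos_fixed: int, typos_total: int = 12) -> float:
    #       tp = len(touched & outliers)
    #       precision = tp / len(touched) if touched else 0.0
    #       recall = tp / len(outliers) if outliers else 0.0
    #       f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    #       return 0.5 * f1 + 0.5 * typos_fixed / typos_total
    #
    # Under this sketch, removing one of the 5 valid large transactions adds
    # a false positive. That lowers precision and therefore F1.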
  - id: hard
    name: Multi-source schema normalisation and deduplication
    description: >-
      430-row CSV (400 unique rows + 30 exact or near-duplicate rows) merged
      from 3 fictional data sources with inconsistent column naming
      (e.g. cust_id / customer_id / CustomerID) and mixed date formats
      (ISO, US, EU). The agent must infer the canonical 9-column schema
      without explicit documentation. Graded by schema match (40%),
      deduplication F1 (35%), and date format compliance (25%).
    dataset_rows: 430
    canonical_rows: 400
    canonical_columns: 9
    duplicate_rows: 30
    max_steps: 150
    done_threshold: 0.80
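    # Sketch of the weighted total. hard_score is a hypothetical name, and the
    # sketch assumes each sub-score is already in [0.0, 1.0].
    #
    #   HARD_WEIGHTS = {"schema": 0.40, "dedup_f1": 0.35, "dates": 0.25}
    #
    #   def hard_score(parts: dict[str, float]) -> float:
    #       return sum(w * parts[k] for k, w in HARD_WEIGHTS.items())
    #
    # Worked example: a perfect schema and perfect dedup with half the dates
    # fixed gives 0.40 + 0.35 + 0.125 = 0.875. That already clears
    # done_threshold 0.80.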

# ── Reward function summary (informational) ──────────────────────────────────
reward:
  type: dense
  range: [-0.5, 1.0]
  step_cost: -0.005
  components:
    - name: progress
      weight: primary
      description: >-
        Grader score delta each step (curr_score - prev_score).
        The main learning signal: any fix that changes the grader score
        produces a non-zero progress term.
    - name: efficiency_bonus
      weight: "+0.10 × (1 - step_fraction)"
      description: >-
        Small bonus awarded on the step in which the episode is solved (the
        score crosses done_threshold). Rewards finishing early relative to
        the step budget.
    - name: false_positive_penalty
      weight: -0.15
      description: >-
        Applied when DROP_ROW removes a valid-extreme row in the medium task.
        Penalises aggressive deletion without checking schema_hint.
    - name: early_done_penalty
      weight: -0.20
      description: >-
        Applied when the agent sends DONE with current_score < 0.60.
        Discourages giving up prematurely.
    - name: step_cost
      weight: -0.005
      description: >-
        Fixed cost every step regardless of outcome.
        Discourages infinite loops and padding with no-op steps.
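
# Sketch of how these components might combine into one step reward.
# step_reward is a hypothetical name. The following assumptions are not
# confirmed by this manifest:
#   - step_fraction is step / max_steps;
#   - "solved" means the score crosses done_threshold on this step;
#   - the total is clipped to the declared range.
#
#   def step_reward(prev_score, curr_score, step, max_steps, done_threshold,
#                   dropped_valid_row=False, sent_done=False):
#       reward = (curr_score - prev_score) - 0.005     # progress + step_cost
#       if prev_score < done_threshold <= curr_score:  # solved on this step
#           reward += 0.10 * (1 - step / max_steps)    # efficiency_bonus
#       if dropped_valid_row:                          # medium task only
#           reward -= 0.15                             # false_positive_penalty
#       if sent_done and curr_score < 0.60:
#           reward -= 0.20                             # early_done_penalty
#       return max(-0.5, min(1.0, reward))
#
#   step_reward(0.90, 0.96, 10, 40, 0.95)  # about 0.13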