agentDebugger / AgentDebugger-training-v3

Commit History

Add --hub-resume and --step-offset to allow resuming directly from HF Hub PEFT checkpoints

8bd8552

shank committed 8 days ago
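
The two flags named in this commit could be wired up roughly as follows. This is a minimal sketch, not the project's code: only the flag names `--hub-resume` and `--step-offset` come from the commit message. The argument types, the defaults, the `build_parser` and `global_step` helpers, and the example repo id are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two flags named in the commit (types are assumed)."""
    parser = argparse.ArgumentParser(description="Resume a training run (sketch)")
    # Assumption: the flag takes a Hub repo id that holds a PEFT adapter checkpoint.
    parser.add_argument("--hub-resume", default=None, metavar="REPO_ID")
    # Assumption: number of optimizer steps already completed before this run.
    parser.add_argument("--step-offset", type=int, default=0)
    return parser


def global_step(local_step: int, step_offset: int) -> int:
    """Map a step counted from 0 in the resumed run onto the overall run."""
    return step_offset + local_step


# Hypothetical invocation: resume from an adapter repo after 400 finished steps.
args = build_parser().parse_args(
    ["--hub-resume", "user/adapter-repo", "--step-offset", "400"]
)
print(args.hub_resume, global_step(25, args.step_offset))
```

Keeping the offset as a separate integer means logs and checkpoint names from a resumed run can continue the original numbering instead of restarting at zero.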

Revert speed optimizations to prioritize real-world model quality

cf25957

shank committed 9 days ago

Optimize training speed for T4 GPUs

2c50d8a

shank committed 9 days ago

Fix HF_TOKEN scope issue for CheckpointPushCallback

ca28017

shank committed 9 days ago

Made changes to train_grpo

a07dc7f

shank committed 9 days ago

Fix links to match correct usernames/orgs

3293f97

shank committed 9 days ago

Fix checkpoint persistence, add leaderboard and update HF links

e160aa1

shank committed 9 days ago

fix: use GitHub raw URLs for images so README renders on HF Space

3eb8edc

shank committed about 1 month ago

Added blog post

c02b65b

shank committed about 1 month ago

Added blog post

eacdf84

shank committed about 1 month ago

Added readme again

3165754

shank committed about 1 month ago

Added readme

f6f33cf

shank committed about 1 month ago

Update: Added final improvements for hackathon

713f336

shank committed about 1 month ago

chore: clean up local dev files and temporary virtual environments

59986c5

shank committed about 1 month ago

chore: clean up local dev files and temporary virtual environments

374c6cc

shank committed about 1 month ago

Update: Triggering the full run

75cd77b

shank committed about 1 month ago

Fix: batch%num_generations math

2b499e7

shank committed about 1 month ago
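
The commit message does not say which quantities were involved, so the following is only a sketch of the kind of check a `batch % num_generations` fix usually implies. GRPO samples `num_generations` completions per prompt and scores each one relative to its own group, so the batch has to split into equal-sized groups. The helper names `valid_num_generations` and `check_grpo_batch`, and the choice of "global batch size" as the quantity being divided, are assumptions.

```python
def valid_num_generations(global_batch_size: int) -> list[int]:
    """Return every group size >= 2 that divides the batch evenly."""
    return [n for n in range(2, global_batch_size + 1) if global_batch_size % n == 0]


def check_grpo_batch(global_batch_size: int, num_generations: int) -> None:
    """Raise early if completions cannot be split into equal-sized groups."""
    if global_batch_size % num_generations != 0:
        raise ValueError(
            f"batch size {global_batch_size} is not divisible by "
            f"num_generations={num_generations}; "
            f"valid values: {valid_num_generations(global_batch_size)}"
        )


check_grpo_batch(8, 4)  # passes silently: 8 completions form two groups of 4
print(valid_num_generations(8))
```

Failing at configuration time with the list of valid values is cheaper than discovering the mismatch after the model has loaded.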

Cuda returns false fixed

b8172c5

shank committed about 1 month ago

COMPUTE_DRIVE fix

77156dd

shank committed about 1 month ago

Fix: Removed BitsandBytes

bdec91d

shank committed about 1 month ago

Fix: Fixed dependency issues

db12eaa

shank committed about 1 month ago

Revert "Fix: Dockerfile"

dc7eb3f

shank committed about 1 month ago

Fix: Dockerfile

5dcd156

shank committed about 1 month ago

Fix: Fixed again again

accb271

shank committed about 1 month ago

Fix: Fixed again

9864e61

shank committed about 1 month ago

Fix: Fixing Again

6747185

shank committed about 1 month ago

Fix: Fixing

18b4e8a

shank committed about 1 month ago

Fix: Trying to fix dependency issues

024f3c7

shank committed about 1 month ago

Fix: Fixed file

cb09ef1

shank committed about 1 month ago

fix: serialize bug_metadata as JSON to fix pyarrow mixed-type error

4668456

shank committed about 1 month ago
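
A column whose values change type from row to row is a common cause of Arrow schema-inference errors, and storing the field as a JSON string gives every row the same type. The sketch below illustrates that idea under stated assumptions: only the field name `bug_metadata` comes from the commit message, while the row layout, the sample values, and the `serialize_metadata` helper are hypothetical.

```python
import json


def serialize_metadata(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with bug_metadata stored as a JSON string."""
    fixed_rows = []
    for row in rows:
        fixed = dict(row)  # shallow copy so the caller's rows stay untouched
        # sort_keys keeps the serialized form stable across runs.
        fixed["bug_metadata"] = json.dumps(row.get("bug_metadata", {}), sort_keys=True)
        fixed_rows.append(fixed)
    return fixed_rows


# Hypothetical rows: "line" is an int in one row and a string in the other.
rows = [
    {"id": 1, "bug_metadata": {"line": 3, "kind": "off_by_one"}},
    {"id": 2, "bug_metadata": {"line": "unknown", "kind": "type_error"}},
]
clean = serialize_metadata(rows)
print(clean[1]["bug_metadata"])
```

Consumers that need the structured form can call `json.loads` on the field when they read a row.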

fix: upgrade bitsandbytes>=0.49.0 (triton.ops), switch to Qwen2.5-Coder-3B

a2fa47a

shank committed on Apr 26

fix: torch at build time, remove mergekit (conflicts accelerate/peft/trl)

2bfaf77

shank committed on Apr 26

fix: empty requirements.txt, install training deps at runtime

5d0b2d4

shank committed on Apr 26

fix: remove wandb - click conflict with gradio causes resolution-too-deep

2005cd2

shank committed on Apr 26

fix: resolve pip dependency conflicts for HF Spaces build

d0d5f60

shank committed on Apr 26

Fix: loosen strict dependencies to prevent pip backtracking

2e3be87

shank committed on Apr 25

fix: remove hardcoded torch from requirements for HF space

fe04772

shank committed on Apr 25

chore: normalize dataset inputs and fix mergekit dependency for TRL 0.14.0

e67270e

shank committed on Apr 25

Add HANDOVER.md: full project state, deps, training instructions, known fixes

97aad17

shank committed on Apr 25

Auto-detect GPU: bfloat16+batch2+gen8 on A100, float16+batch1+gen4 on T4 — same script works on both

ea6fe4e

shank committed on Apr 25
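
The auto-detection described in this commit can be pictured as a small lookup from GPU type to training settings. The sketch below uses the values from the commit message (bfloat16, batch 2, 8 generations on A100; float16, batch 1, 4 generations otherwise). The `GpuProfile` class, the `pick_profile` function, and the string-matching rule are assumptions. In a real script the two inputs would probably come from `torch.cuda.get_device_name(0)` and `torch.cuda.is_bf16_supported()`; they are passed in as plain values here so the example runs without a GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    dtype: str
    per_device_batch_size: int
    num_generations: int


def pick_profile(device_name: str, bf16_supported: bool) -> GpuProfile:
    """Choose training settings from the GPU name and bfloat16 support."""
    # Assumption: A100 is recognized by a substring match on the device name.
    if bf16_supported and "A100" in device_name.upper():
        return GpuProfile(dtype="bfloat16", per_device_batch_size=2, num_generations=8)
    # Conservative fallback, matching the T4 settings listed in the commit.
    return GpuProfile(dtype="float16", per_device_batch_size=1, num_generations=4)


print(pick_profile("NVIDIA A100-SXM4-40GB", True))
print(pick_profile("Tesla T4", False))
```

Falling back to the smaller profile for unrecognized cards keeps one script usable on both machines, at the cost of under-using GPUs that could handle the larger settings.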

Reduce max_completion_length to 160 for T4 speed: target 1000 steps in <8hrs

9487853

shank committed on Apr 25

Fix: bump bitsandbytes to 0.45.3 for CUDA 12.x support on Kaggle T4

6bf2fbb

shank committed on Apr 25

Optimize for Kaggle P100: float16, batch=1, grad_accum=8, num_gen=4, max_completion=256, lora_r=8

73f957d

shank committed on Apr 25

Fix GRPOConfig: rename max_new_tokens to max_completion_length for trl==0.14.0

8b16369

shank committed on Apr 25

Update: Added testing

a5c67b3

shank committed on Apr 25

Align gradio version with Hugging Face Space builder2

633a3b7

shank committed on Apr 25

Add dockerignore to reduce Space build context

c945597

shank committed on Apr 25

Stabilize Space runtime: pin ML deps and disable runtime package drift

663b8db

shank committed on Apr 25

Pin torch to cu121 build + use model.device instead of hardcoded cuda string

8f291e0

shank committed on Apr 25

Replace unsloth with bitsandbytes+peft: fixes CUDA driver incompatibility on HF A100

c325ad7

shank committed on Apr 25
