Parveshiiii committed on
Commit 4a3f223 · verified · 1 parent: 82743e4

Update README.md

Files changed (1)
  1. README.md +1 -63
README.md CHANGED
@@ -5,66 +5,4 @@ base_model:
  - google/gemma-3-1b-it
  pipeline_tag: text-generation
  ---
-
- # Model Card: Parveshiiii/M1-MathX
-
- ## Model Details
- - **Model Name:** Parveshiiii/M1-MathX
- - **Base Architecture:** Gemma (1B parameters)
- - **Model Type:** Causal Language Model (text-generation)
- - **Training Framework:** Hugging Face Transformers
- - **Precision:** fp16
- - **Attention Mechanism:** Hybrid sliding-window and full attention layers
- - **Tokenizer:** Gemma tokenizer (vocab size 262,144)
-
- ## Usage
- ```python
- from transformers import pipeline, TextStreamer
-
- pipe = pipeline("text-generation", model="Parveshiiii/M1-MathX")
- messages = [
-     {"role": "user", "content": "Who are you?"},
- ]
- streamer = TextStreamer(pipe.tokenizer)
- pipe(messages, streamer=streamer, max_new_tokens=10000)
- ```
- ## Intended Use
- - Designed for mathematical reasoning tasks, including problem solving, equation manipulation, and step-by-step derivations.
- - Suitable for educational contexts, math tutoring, and research experiments in reasoning alignment.
- - Not intended for general-purpose conversation or sensitive domains outside mathematics.
-
- ## Training Data
- - **Dataset:** MathX (curated mathematical reasoning dataset)
- - **Samples Used:** ~300
- - **Training Steps:** 50
- - **Method:** GRPO (Group Relative Policy Optimization) fine-tuning
- - **Objective:** Reinforcement-style alignment for improved reasoning clarity and correctness.
-
- ## Performance
- - Demonstrated strong performance on small-scale math problems and symbolic reasoning tasks.
- - Early benchmarks suggest improved accuracy compared to the base Gemma 1B model on math-specific datasets.
- - Requires formal evaluation on GSM8K, MATH, and other benchmarks for quantitative comparison.
-
- ## Limitations
- - Small dataset and limited training steps mean coverage is narrow.
- - May overfit to MathX patterns and fail on broader or more complex problems.
- - Not guaranteed to generalize outside mathematical reasoning.
- - As a 1B model, capacity is limited compared to larger LLMs.
-
- ## Ethical Considerations
- - Intended for safe educational use.
- - Should not be deployed in high-stakes environments without further validation.
- - Outputs may contain errors; human oversight is required.
-
- ## Citation
- If you use this model, please cite as:
- ```
- @misc{Parvesh2025M1MathX,
-   author = {Parvesh Rawal},
-   title = {Parveshiiii/M1-MathX: A Gemma-1B model fine-tuned on MathX with GRPO},
-   year = {2025},
-   howpublished = {\url{https://huggingface.co/Parveshiiii/M1-MathX}}
- }
- ```
-
- ---
 
+ This model was fine-tuned with GRPO for only 50 steps, using 4 samples per step. The result is exceptionally high accuracy on JEE-level mathematics problems, though its broader context handling and instruction-following abilities were diminished. In essence, it has become a compact powerhouse: a "mini-tank" built for raw mathematical problem-solving rather than nuanced reasoning.
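For context on the training method mentioned above: GRPO (Group Relative Policy Optimization) samples a group of completions per prompt and scores each one relative to its own group, rather than using a learned value model. The sketch below illustrates only that group-relative advantage computation; the function name is hypothetical and this is not the actual training code, which (per the diff) used a full GRPO fine-tuning pipeline.

```python
# Illustrative sketch of GRPO's core idea: rewards for a group of sampled
# completions are normalized against the group's own mean and std, so each
# completion is scored relative to its siblings for the same prompt.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Return per-completion advantages normalized within one group.

    `rewards` holds one scalar reward per sampled completion for a single
    prompt; `eps` guards against division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of 4 sampled completions, matching the "4 samples per step"
# described above: two correct (reward 1.0) and two incorrect (reward 0.0).
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because advantages are centered within each group, correct completions are pushed up and incorrect ones pushed down even when absolute rewards are sparse, which is what makes such short runs (50 steps here) able to move the policy at all.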