---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Small)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture.
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~32% of the parameters** while achieving *higher accuracy* than the base model.

Unlike unstructured pruning (which zeros out weights), **structural pruning** physically removes entire channels and filters.
The result is a model that is natively **smaller and faster, with fewer FLOPs**, on standard hardware, without the need for specialized sparse inference engines.
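As an illustration of the difference (a minimal sketch, not the ModHiFi procedure itself), structurally pruning a single `Conv2d` layer means building a physically smaller layer, e.g. keeping only the 12 of 16 filters with the largest L1 norm:

```python
import torch
import torch.nn as nn

# A toy convolution: 16 output channels of 3x3 kernels over 8 input channels
conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)

# Unstructured pruning would merely zero individual weights, leaving tensor
# shapes (and FLOPs) unchanged. Structural pruning instead drops whole
# output channels -- here, the 4 filters with the smallest L1 norm:
l1_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
keep = torch.argsort(l1_norms, descending=True)[:12].sort().values

pruned = nn.Conv2d(in_channels=8, out_channels=12, kernel_size=3, padding=1)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])  # physically smaller weight tensor
    pruned.bias.copy_(conv.bias[keep])

x = torch.randn(1, 8, 32, 32)
print(conv(x).shape)    # torch.Size([1, 16, 32, 32])
print(pruned(x).shape)  # torch.Size([1, 12, 32, 32])
print(sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in pruned.parameters()))  # 1168 876
```

In a real network, removing output channels here also forces the *input* channels of the next layer (and any batch-norm statistics) to shrink, which is what makes structural pruning of residual architectures nontrivial.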

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |
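A quick back-of-the-envelope check (plain Python, numbers taken from the table above) confirms the reported sparsity and gives the theoretical FLOPs reduction:

```python
orig_params, pruned_params = 25.56, 17.4  # millions, from the table above
orig_flops, pruned_flops = 4.12, 1.9      # GFLOPs per forward pass

sparsity = 1 - pruned_params / orig_params
flops_reduction = orig_flops / pruned_flops

print(f"Parameter sparsity: {sparsity:.1%}")          # 31.9%, matching "~32%"
print(f"FLOPs reduction:    {flops_reduction:.2f}x")  # 2.17x theoretical
```

The measured wall-clock speedups (~1.7x) are lower than the ~2.17x theoretical FLOPs reduction, which is typical: memory bandwidth and kernel-launch overheads do not shrink in proportion to FLOPs.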

On the hardware detailed in our [paper](https://arxiv.org/abs/2511.19566), we observe speedups of **1.69x on CPU** and **1.70x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
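For intuition, the FLOPs of a single convolution can be estimated as 2 x C_in x K^2 x C_out x H_out x W_out, counting each multiply-add as two operations. A sketch for ResNet-50's 7x7 stem convolution:

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs for one convolution, counting a multiply-add as 2 ops."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# ResNet-50's stem: 7x7 conv, 3 -> 64 channels, stride 2 on a 224x224 input,
# producing a 112x112 feature map
flops = conv_flops(c_in=3, c_out=64, k=7, h_out=112, w_out=112)
print(f"{flops / 1e9:.3f} GFLOPs")  # 0.236 GFLOPs for this single layer
```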

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please read before evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5-1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
import torch
from torchvision import transforms
from transformers import BatchFeature, pipeline

# 1. Define the exact PyTorch validation transform
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center-crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# 2. Wrap the transform so the pipeline preprocesses with PyTorch
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms and stack into a batch
        pixel_values = torch.stack(
            [self.transform(img.convert("RGB")) for img in images]
        )
        # Return a BatchFeature (not a plain dict) so the pipeline can
        # move the inputs to the right device/dtype via .to()
        return BatchFeature({"pixel_values": pixel_values})

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    image_processor=PyTorchProcessor(val_transform),  # <-- closes the accuracy gap
    trust_remote_code=True,
    device=0,  # GPU index; use device=-1 for CPU
)
```

## Quick Start

If you do not require bit-perfect reproduction of the reported accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch transformers pillow requests
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

# Load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference; results are sorted by descending score
results = pipe(image)
print(f"Predicted class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```
|