---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Small)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture.
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~32% of the parameters** while achieving *higher accuracy* than the base model.

Unlike unstructured pruning (which zeros out weights), **structural pruning** physically removes entire channels and filters.
The result is a model that is natively **smaller and faster, with fewer FLOPs**, on standard hardware, without the need for specialized sparse inference engines.
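As an illustration of the difference (a minimal sketch, not the ModHiFi procedure itself), structurally pruning a single `Conv2d` layer means building a physically smaller layer, e.g. keeping only the 12 of 16 filters with the largest L1 norm:

```python
import torch
import torch.nn as nn

# A toy convolution: 16 output channels of 3x3 kernels over 8 input channels
conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)

# Unstructured pruning would merely zero individual weights, leaving tensor
# shapes (and FLOPs) unchanged. Structural pruning instead drops whole
# output channels -- here, the 4 filters with the smallest L1 norm:
l1_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
keep = torch.argsort(l1_norms, descending=True)[:12].sort().values

pruned = nn.Conv2d(in_channels=8, out_channels=12, kernel_size=3, padding=1)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])  # physically smaller weight tensor
    pruned.bias.copy_(conv.bias[keep])

x = torch.randn(1, 8, 32, 32)
print(conv(x).shape)    # torch.Size([1, 16, 32, 32])
print(pruned(x).shape)  # torch.Size([1, 12, 32, 32])
print(sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in pruned.parameters()))  # 1168 876
```

In a real network, removing output channels here also forces the *input* channels of the next layer (and any batch-norm statistics) to shrink, which is what makes structural pruning of residual architectures nontrivial.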

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |
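A quick back-of-the-envelope check (plain Python, numbers taken from the table above) confirms the reported sparsity and gives the theoretical FLOPs reduction:

```python
orig_params, pruned_params = 25.56, 17.4  # millions, from the table above
orig_flops, pruned_flops = 4.12, 1.9      # GFLOPs per forward pass

sparsity = 1 - pruned_params / orig_params
flops_reduction = orig_flops / pruned_flops

print(f"Parameter sparsity: {sparsity:.1%}")          # 31.9%, matching "~32%"
print(f"FLOPs reduction:    {flops_reduction:.2f}x")  # 2.17x theoretical
```

The measured wall-clock speedups (~1.7x) are lower than the ~2.17x theoretical FLOPs reduction, which is typical: memory bandwidth and kernel-launch overheads do not shrink in proportion to FLOPs.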

On the hardware detailed in our [paper](https://arxiv.org/abs/2511.19566), we observe speedups of **1.69x on CPU** and **1.70x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
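For intuition, the FLOPs of a single convolution can be estimated as 2 x C_in x K^2 x C_out x H_out x W_out, counting each multiply-add as two operations. A sketch for ResNet-50's 7x7 stem convolution:

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs for one convolution, counting a multiply-add as 2 ops."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# ResNet-50's stem: 7x7 conv, 3 -> 64 channels, stride 2 on a 224x224 input,
# producing a 112x112 feature map
flops = conv_flops(c_in=3, c_out=64, k=7, h_out=112, w_out=112)
print(f"{flops / 1e9:.3f} GFLOPs")  # 0.236 GFLOPs for this single layer
```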

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please read before evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5-1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
import torch
from torchvision import transforms
from transformers import BatchFeature, pipeline

# 1. Define the exact PyTorch validation transform
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center-crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# 2. Wrap the transform so the pipeline preprocesses with PyTorch
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms and stack into a batch
        pixel_values = torch.stack(
            [self.transform(img.convert("RGB")) for img in images]
        )
        # Return a BatchFeature (not a plain dict) so the pipeline can
        # move the inputs to the right device/dtype via .to()
        return BatchFeature({"pixel_values": pixel_values})

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    image_processor=PyTorchProcessor(val_transform),  # <-- closes the accuracy gap
    trust_remote_code=True,
    device=0,  # GPU index; use device=-1 for CPU
)
```

## Quick Start

If you do not require bit-perfect reproduction of the reported accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch transformers pillow requests
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

# Load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference; results are sorted by descending score
results = pipe(image)
print(f"Predicted class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```
|