---
library_name: transformers
tags:
- video
- feature
- face
license: cc
base_model:
- ControlNet/MARLIN
pipeline_tag: feature-extraction
---
# MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repo is the official PyTorch implementation for the paper
[MARLIN: Masked Autoencoder for facial video Representation LearnINg](https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper) (CVPR 2023) ([arXiv](https://arxiv.org/abs/2211.06627)).
## Use `transformers` (HuggingFace) for Feature Extraction

Requirements:

- Python
- PyTorch
- transformers
- einops

Currently, the Hugging Face model performs direct feature extraction only; it does not include any video pre-processing (e.g. face detection, cropping, or strided windowing).
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_base_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # torch.Size([1, 1568, 384])
```
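Since the model expects a `(B, C, T, H, W)` float tensor and performs no pre-processing itself, raw video frames must be reshaped and scaled before inference. The sketch below shows one minimal way to do this; `frames_to_clip` is a hypothetical helper (not part of this repo), and the official codebase handles face detection, cropping, and strided windows separately.

```python
import torch

def frames_to_clip(frames: torch.Tensor) -> torch.Tensor:
    """Convert a stack of RGB frames (T, H, W, C) with uint8 values
    into a (1, C, T, H, W) float clip in [0, 1], matching the input
    layout used in the example above. Hypothetical helper for
    illustration only."""
    clip = frames.float() / 255.0    # scale uint8 -> [0, 1]
    clip = clip.permute(3, 0, 1, 2)  # (T, H, W, C) -> (C, T, H, W)
    return clip.unsqueeze(0)         # add batch dim -> (1, C, T, H, W)

# 16 random 224x224 RGB frames standing in for a cropped face clip
frames = torch.randint(0, 256, (16, 224, 224, 3), dtype=torch.uint8)
clip = frames_to_clip(frames)
print(clip.shape)  # torch.Size([1, 3, 16, 224, 224])
```

The resulting `clip` can be passed directly to `model(...)` as in the snippet above.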
## License

This project is under the CC BY-NC 4.0 license. See [LICENSE](LICENSE) for details.
## References

If you find this work useful for your research, please consider citing it.

```bibtex
@inproceedings{cai2022marlin,
    title     = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
    author    = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
    month     = {June},
    pages     = {1493--1504},
    doi       = {10.1109/CVPR52729.2023.00150},
    publisher = {IEEE},
}
```