---
license: apache-2.0
language:
- en
tags:
- diffusion
- text-to-image
- latent-diffusion
- pytorch
pipeline_tag: text-to-image
---

# SykoDiffusion V1.0

The first version of my latent diffusion model. It generates images from text using a CLIP text encoder and a VAE.

## Model Details

| Property | Value |
|---|---|
| Parameters | ~100M |
| Architecture | Latent Diffusion (U-Net) |
| Training data | CC3M (~100k images) |
| Training steps | 20,000 |
| Resolution | 256×256 |
| Hardware | 2× NVIDIA T4 |

## Usage

```python
import torch
from diffusers import UNet2DConditionModel, AutoencoderKL, DDIMScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is only used on GPU; on CPU fall back to float32.
dtype = torch.float16 if device == "cuda" else torch.float32

unet = UNet2DConditionModel.from_pretrained("SykoSLM/SykoDiffusion-V1.0").to(device, dtype)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device, dtype)
clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device, dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
)


def encode(text):
    """Tokenize `text` to 77 tokens and return the CLIP hidden states."""
    tokens = tokenizer(
        text, padding="max_length", truncation=True, max_length=77, return_tensors="pt"
    ).to(device)
    return clip(**tokens).last_hidden_state


@torch.no_grad()
def generate(prompt, steps=30, cfg=7.5, seed=42):
    torch.manual_seed(seed)

    # Batch of two: unconditional (empty prompt) first, then the text prompt.
    emb = torch.cat([encode(""), encode(prompt)])

    # The VAE downsamples by 8, so 256×256 pixels correspond to 32×32 latents.
    latents = torch.randn(1, 4, 32, 32, device=device, dtype=dtype)

    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        pred = unet(torch.cat([latents] * 2), t, encoder_hidden_states=emb).sample
        neg_p, text_p = pred.chunk(2)
        # Classifier-free guidance: push the prediction toward the prompt.
        pred = neg_p + cfg * (text_p - neg_p)
        latents = scheduler.step(pred, t, latents).prev_sample

    # Decode latents to pixels and map from [-1, 1] to uint8 [0, 255].
    image = vae.decode(latents / vae.config.scaling_factor).sample
    image = (image.clamp(-1, 1) + 1) / 2
    image = (image[0].permute(1, 2, 0).cpu().float().numpy() * 255).astype("uint8")
    return Image.fromarray(image)


img = generate("a cat sitting on a chair")
img.save("output.png")
```

## Notes

- This model is an experimental first version, so output quality may be limited.
- For best results, try `cfg` values between 5 and 10.
- English prompts are recommended.

## Developer

[@SykoSLM](https://huggingface.co/SykoSLM)
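
## Guidance scale sketch

To make the effect of the `cfg` argument more concrete, the following is a minimal, dependency-free sketch of the guidance step that `generate` applies to the U-Net output at every timestep. The helper name `apply_cfg` and the toy three-element vectors are illustrative assumptions, not part of this repository; the real code performs the same arithmetic on tensors.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance on plain lists (hypothetical helper).

    Computes uncond + scale * (cond - uncond) element-wise, which mirrors
    `neg_p + cfg * (text_p - neg_p)` in the generation loop.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions standing in for the U-Net outputs.
uncond = [0.0, 1.0, -1.0]  # prediction for the empty prompt
cond = [1.0, 1.0, 0.0]     # prediction for the text prompt

for scale in (0.0, 1.0, 7.5):
    print(scale, apply_cfg(uncond, cond, scale))
# 0.0 [0.0, 1.0, -1.0]   -> prompt ignored
# 1.0 [1.0, 1.0, 0.0]    -> plain conditional prediction
# 7.5 [7.5, 1.0, 6.5]    -> difference amplified 7.5 times
```

A scale of 0 ignores the prompt, a scale of 1 reproduces the conditional prediction, and larger values extrapolate further along the direction from the unconditional to the conditional prediction. Elements where both predictions agree are unchanged at any scale.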