{"id":404,"date":"2024-06-19T09:34:34","date_gmt":"2024-06-19T01:34:34","guid":{"rendered":"https:\/\/aitimes.link\/?p=404"},"modified":"2024-06-19T09:34:34","modified_gmt":"2024-06-19T01:34:34","slug":"hugging-face-huan-ying-stable-diffusion-3-jia-ru-diffusers","status":"publish","type":"post","link":"https:\/\/aitimes.link\/index.php\/2024\/06\/19\/hugging-face-huan-ying-stable-diffusion-3-jia-ru-diffusers\/","title":{"rendered":"[Hugging Face] Welcoming Stable Diffusion 3 to Diffusers"},
"content":{"rendered":"\n<h2 class=\"wp-block-heading\">What's New in SD3<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Model<\/h3>\n\n\n\n<p>SD3 is a latent diffusion model that consists of three different text encoders (<a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/openai\/clip-vit-large-patch14\" target=\"_blank\" rel=\"noreferrer noopener\">CLIP L\/14<\/a>, <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/laion\/CLIP-ViT-bigG-14-laion2B-39B-b160k\" target=\"_blank\" rel=\"noreferrer noopener\">OpenCLIP bigG\/14<\/a> and <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/google\/t5-v1_1-xxl\" target=\"_blank\" rel=\"noreferrer noopener\">T5-v1.1-XXL<\/a>), a novel Multimodal Diffusion Transformer (MMDiT) model, and a 16-channel AutoEncoder model similar to the one used in <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/arxiv.org\/abs\/2307.01952\" target=\"_blank\" rel=\"noreferrer noopener\">Stable Diffusion XL<\/a>.<\/p>\n\n\n\n<p>SD3 processes text inputs and visual latents as sequences of embeddings. Positional encodings are applied to 2&#215;2 patches of the latents, which are then flattened into a sequence of patch embeddings. This sequence, together with the text feature sequence, is fed into each of the MMDiT blocks, where the two sequences are projected to the same feature dimension, concatenated, and passed through a series of attention and multilayer perceptron (MLP) blocks.<\/p>\n\n\n\n<p>To account for the differences between the two modalities, the MMDiT blocks use two separate sets of weights to project the text and image sequences to the common feature dimension. The two sequences are then joined before the attention operation. This design lets each representation work in its own feature space while still allowing each to extract useful information from the other through attention [1]. This bidirectional flow of information between text and image differs from earlier text-to-image models, which inject text information through cross-attention and feed every layer the same text features (the text encoder's output), regardless of depth.<\/p>\n\n\n\n<p>In addition, SD3 adds pooled text features from both CLIP models to its timestep conditioning. These pooled features are concatenated, added to the timestep embedding, and then passed to every MMDiT block.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training with Rectified Flow Matching<\/h3>\n\n\n\n<p>Besides its architectural changes, SD3 is trained with a <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/arxiv.org\/html\/2403.03206v1%23S2\" target=\"_blank\" rel=\"noreferrer noopener\">conditional flow-matching<\/a> objective. In this approach, the forward noising process is defined as a <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/arxiv.org\/html\/2403.03206v1%23S3\" target=\"_blank\" rel=\"noreferrer noopener\">rectified flow<\/a> that connects the data and noise distributions along a straight line.<\/p>\n\n\n\n<p>The sampling process also becomes simpler, and model performance remains stable as the number of sampling steps is reduced. To support this, we introduced a new scheduler (<code>FlowMatchEulerDiscreteScheduler<\/code>) that implements the rectified flow-matching formulation together with Euler-method sampling steps. It also introduces a resolution-dependent <code>shift<\/code> parameter: at higher resolutions, increasing <code>shift<\/code> handles noise scaling better. For the 2B model, we recommend <code>shift=3.0<\/code>.<\/p>\n\n\n\n<p>To try SD3 quickly, you can use the Gradio app below:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/pic4.zhimg.com\/80\/v2-b8b494b6192739d4f449d0175a9169ab_1440w.webp\" alt=\"\"\/><figcaption class=\"wp-element-caption\">stabilityai\/stable-diffusion-3-medium<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Using SD3 with Diffusers<\/h2>\n\n\n\n<p>To use SD3 with diffusers, first make sure you have the latest version of diffusers installed:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install --upgrade diffusers<\/code><\/pre>\n\n\n\n<p>Before using the model, go to the <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/stabilityai\/stable-diffusion-3-medium-diffusers\" target=\"_blank\" rel=\"noreferrer noopener\">Stable Diffusion 3 Medium page on Hugging Face<\/a>, fill in the form, and accept the terms. Once that is done, log in to your Hugging Face account:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>huggingface-cli login<\/code><\/pre>\n\n\n\n<p>The following snippet downloads the 2B-parameter SD3 model and runs it in <code>fp16<\/code> precision. Stability AI originally released the model in <code>fp16<\/code>, and this is also the recommended precision for inference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Text-to-Image<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Pipeline\n\npipe = StableDiffusion3Pipeline.from_pretrained(\"stabilityai\/stable-diffusion-3-medium-diffusers\", torch_dtype=torch.float16)\npipe = pipe.to(\"cuda\")\n\nimage = pipe(\n    \"A cat holding a sign that says hello world\",\n    negative_prompt=\"\",\n    num_inference_steps=28,\n    guidance_scale=7.0,\n).images&#91;0]\nimage<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/pic1.zhimg.com\/80\/v2-2a833b18db87861369887522f7bf6500_1440w.webp\" alt=\"\"\/><figcaption class=\"wp-element-caption\">hello_world_cat<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Image-to-Image<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Img2ImgPipeline\nfrom diffusers.utils import load_image\n\npipe = StableDiffusion3Img2ImgPipeline.from_pretrained(\"stabilityai\/stable-diffusion-3-medium-diffusers\", torch_dtype=torch.float16)\npipe = pipe.to(\"cuda\")\n\ninit_image = load_image(\"https:\/\/huggingface.co\/datasets\/huggingface\/documentation-images\/resolve\/main\/diffusers\/cat.png\")\nprompt = \"cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k\"\nimage = pipe(prompt, image=init_image).images&#91;0]\nimage<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/pic2.zhimg.com\/80\/v2-95836571ceeedad483ab1fd45a8618f5_1440w.webp\" alt=\"\"\/><figcaption class=\"wp-element-caption\">wizard_cat<\/figcaption><\/figure>\n\n\n\n<p>The SD3 documentation is available <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/docs\/diffusers\/main\/en\/api\/pipelines\/stable_diffusion\/stable_diffusion_3\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Memory Optimizations for SD3<\/h2>\n\n\n\n<p>SD3 uses three text encoders, one of which is the very large <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/google\/t5-v1_1-xxl\" target=\"_blank\" rel=\"noreferrer noopener\">T5-XXL model<\/a>. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even in <code>fp16<\/code> precision.<\/p>\n\n\n\n<p>To address this, diffusers integrates several memory optimizations that let SD3 run on a wider range of GPUs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Running Inference with Model Offloading<\/h3>\n\n\n\n<p>One of the most common memory optimizations in Diffusers is model offloading. During inference, it moves model components that are not currently needed to the CPU, saving GPU memory at the cost of a small increase in inference time. Only the component that is currently computing is kept on the GPU; the rest stay on the CPU.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Pipeline\n\npipe = StableDiffusion3Pipeline.from_pretrained(\"stabilityai\/stable-diffusion-3-medium-diffusers\", torch_dtype=torch.float16)\npipe.enable_model_cpu_offload()\n\nprompt = \"smiling cartoon dog sits at a table, coffee mug on hand, as a room goes up in flames. \u201cThis is fine,\u201d the dog assures himself.\"\nimage = pipe(prompt).images&#91;0]<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Dropping the T5 Text Encoder During Inference<\/h3>\n\n\n\n<p><a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/arxiv.org\/html\/2403.03206v1%23S5.F9\" target=\"_blank\" rel=\"noreferrer noopener\">Removing the 4.7B-parameter T5-XXL text encoder at inference time<\/a> greatly reduces memory requirements with only a small loss in performance.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Pipeline\n\npipe = StableDiffusion3Pipeline.from_pretrained(\"stabilityai\/stable-diffusion-3-medium-diffusers\", text_encoder_3=None, tokenizer_3=None, torch_dtype=torch.float16)\npipe = pipe.to(\"cuda\")\n\nprompt = \"smiling cartoon dog sits at a table, coffee mug on hand, as a room goes up in flames. \u201cThis is fine,\u201d the dog assures himself.\"\nimage = pipe(prompt).images&#91;0]<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Using a Quantized Version of the T5-XXL Model<\/h3>\n\n\n\n<p>With the <code>bitsandbytes<\/code> library, you can also load the T5-XXL model in 8-bit to reduce memory requirements further.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Pipeline\nfrom transformers import T5EncoderModel, BitsAndBytesConfig\n\n<em># Make sure you have `bitsandbytes` installed.<\/em>\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)\n\nmodel_id = \"stabilityai\/stable-diffusion-3-medium-diffusers\"\ntext_encoder = T5EncoderModel.from_pretrained(\n    model_id,\n    subfolder=\"text_encoder_3\",\n    quantization_config=quantization_config,\n)\npipe = StableDiffusion3Pipeline.from_pretrained(\n    model_id,\n    text_encoder_3=text_encoder,\n    device_map=\"balanced\",\n    torch_dtype=torch.float16\n)<\/code><\/pre>\n\n\n\n<p><em>The full code is available <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/gist.github.com\/sayakpaul\/82acb5976509851f2db1a83456e504f1\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summary of Memory Optimizations<\/h3>\n\n\n\n<p>All benchmarks were run with the 2B-parameter SD3 model on a single A100-80GB GPU, using <code>fp16<\/code> precision and PyTorch 2.3.<\/p>\n\n\n\n<p>We ran each inference call ten times and recorded the average peak memory usage and the average time for 20 sampling steps.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Performance Optimizations for SD3<\/h2>\n\n\n\n<p>To speed up inference, we can use <code>torch.compile()<\/code> to obtain optimized compute graphs for the <code>vae<\/code> and <code>transformer<\/code> components.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\nfrom diffusers import StableDiffusion3Pipeline\n\ntorch.set_float32_matmul_precision(\"high\")\n\ntorch._inductor.config.conv_1x1_as_mm = True\ntorch._inductor.config.coordinate_descent_tuning = True\ntorch._inductor.config.epilogue_fusion = False\ntorch._inductor.config.coordinate_descent_check_all_directions = True\n\npipe = StableDiffusion3Pipeline.from_pretrained(\n    \"stabilityai\/stable-diffusion-3-medium-diffusers\",\n    torch_dtype=torch.float16\n).to(\"cuda\")\npipe.set_progress_bar_config(disable=True)\n\npipe.transformer.to(memory_format=torch.channels_last)\npipe.vae.to(memory_format=torch.channels_last)\n\npipe.transformer = torch.compile(pipe.transformer, mode=\"max-autotune\", fullgraph=True)\npipe.vae.decode = torch.compile(pipe.vae.decode, mode=\"max-autotune\", fullgraph=True)\n\n<em># Warm Up<\/em>\nprompt = \"a photo of a cat holding a sign that says hello world\"\nfor _ in range(3):\n    _ = pipe(prompt=prompt, generator=torch.manual_seed(1))\n\n<em># Run Inference<\/em>\nimage = pipe(prompt=prompt, generator=torch.manual_seed(1)).images&#91;0]\nimage.save(\"sd3_hello_world.png\")<\/code><\/pre>\n\n\n\n<p><em>The full code is available <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/gist.github.com\/sayakpaul\/508d89d7aad4f454900813da5d42ca97\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/em><\/p>\n\n\n\n<p>We benchmarked the inference speed of SD3 with <code>torch.compile()<\/code> (on an A100-80GB, using <code>fp16<\/code> and PyTorch 2.3). We ran each generation task 10 times, with 20 sampling steps per run. The average inference time was <strong>0.585 seconds<\/strong>, <em>a 4x speedup over eager execution<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Fine-Tuning with DreamBooth and LoRA<\/h2>\n\n\n\n<p>Finally, we also provide a <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/dreambooth.github.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">DreamBooth<\/a> script with <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/blog\/lora\" target=\"_blank\" rel=\"noreferrer noopener\">LoRA<\/a> for fine-tuning SD3. Beyond fine-tuning, the script can also serve as a reference if you want to train a model with rectified flow. Another popular rectified flow implementation is <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/github.com\/cloneofsimo\/minRF\/\" target=\"_blank\" rel=\"noreferrer noopener\">minRF<\/a>.<\/p>\n\n\n\n<p>To use the script, first make sure your environment is set up and a dataset is ready (for example, <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/datasets\/diffusers\/dog-example\" target=\"_blank\" rel=\"noreferrer noopener\">this one<\/a>). Install <code>peft<\/code> and <code>bitsandbytes<\/code>, then launch training:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>export MODEL_NAME=\"stabilityai\/stable-diffusion-3-medium-diffusers\"\nexport INSTANCE_DIR=\"dog\"\nexport OUTPUT_DIR=\"dreambooth-sd3-lora\"\n\naccelerate launch train_dreambooth_lora_sd3.py \\\n  --pretrained_model_name_or_path=${MODEL_NAME} \\\n  --instance_data_dir=${INSTANCE_DIR} \\\n  --output_dir=\/raid\/.cache\/${OUTPUT_DIR} \\\n  --mixed_precision=\"fp16\" \\\n  --instance_prompt=\"a photo of sks dog\" \\\n  --resolution=1024 \\\n  --train_batch_size=1 \\\n  --gradient_accumulation_steps=4 \\\n  --learning_rate=1e-5 \\\n  --report_to=\"wandb\" \\\n  --lr_scheduler=\"constant\" \\\n  --lr_warmup_steps=0 \\\n  --max_train_steps=500 \\\n  --weighting_scheme=\"logit_normal\" \\\n  --validation_prompt=\"A photo of sks dog in a bucket\" \\\n  --validation_epochs=25 \\\n  --seed=\"0\" \\\n  --push_to_hub<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Acknowledgements<\/h2>\n\n\n\n<p>Thanks to the Stability AI team for developing and open-sourcing Stable Diffusion 3 and for giving us early access, and to <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/huggingface.co\/linoyts\" target=\"_blank\" rel=\"noreferrer noopener\">Linoy<\/a> for helping with this post.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Original post: <a href=\"https:\/\/link.zhihu.com\/?target=https%3A\/\/hf.co\/blog\/sd3\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/hf.co\/blog\/sd3<\/a><br>Original authors: Dhruv Nair, YiYi Xu, Sayak Paul, Alvaro Somoza, Kashif Rasul, Apolin\u00e1rio from multimodal AI art<br>Translator: hugging-hoi2022<\/p>\n<\/blockquote>\n","protected":false},
"excerpt":{"rendered":"<p>What's New in SD3 Model SD3 is a latent diffusion model that consists of three different text encoders (CLIP L\/14, Ope [&hellip;]<\/p>\n","protected":false},
"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-404","post","type-post","status-publish","format-standard","hentry","category-ai"],"_links":{"self":[{"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/posts\/404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/comments?post=404"}],"version-history":[{"count":1,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/posts\/404\/revisions"}],"predecessor-version":[{"id":405,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/posts\/404\/revisions\/405"}],"wp:attachment":[{"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/media?parent=404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/categories?post=404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aitimes.link\/index.php\/wp-json\/wp\/v2\/tags?post=404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}