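Why does SD3.5 have a bit of a Wang Feng vibe? It had only just held its launch when, next door, either a celebrity was caught cheating or some huge gossip broke. Are you all targeting my SD? Still, open-source SD is the GOAT. 24 billion parameters? Even if the model is released, I don't know whether my graphics card can handle it, let alone whether 24 GB of VRAM can.
During last night's livestream I compared the ecosystems of all the models and was still saying that Flux, at 12 billion parameters, is currently the largest text-to-image model. Then, in the benchmark table of popular models alongside Flux, I spotted a familiar name: a 24-billion-parameter V3 model that is currently the closest to the Midjourney style. Do you know which one it is? And why is it called the model closest to MJ?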
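Summary: seriously impressive. 24 billion parameters, ahead of Flux pro in GPT-4o-based evaluation, and No. 1 in text understanding, reasoning, and text rendering.
V2.5 is available now. It is currently the model closest to the MJ style, covering 10 art domains with 30,000 images sourced from MJ material.
Link to the latest model and workflow: https://pan.quark.cn/s/407d6b252088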
Livestream replay
Let's look at some samples first:
1. What is Playground v3?
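Using SDXL on the quiet: Playground is free and gives direct access to Dall-E 3 and Google Imagen
Spring Festival Gala mascot Long Chenchen suspected of being AI * Hands-on with Microsoft Copilot | Playground v2 released: its generation results are 2.5 times better than SDXL's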
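What's new: a simplified model architecture, the noise schedule, and a new variational autoencoder (VAE). The novel text-to-image structure is deeply integrated with an LLM and makes full use of the LLM's internal prompt understanding to achieve state-of-the-art prompt-following performance. This includes the use of multi-level captions and model merging in the post-training stage.
PGv3 scores 82% on text synthesis accuracy, higher than Flux-pro's 69% and Ideogram-2's 75%, which shows PGv3's advantage in prompt following and text rendering.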
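Playground v3 (PGv3) is the latest text-to-image model from Playground Research. Its main characteristics and features are:
Technical foundation: PGv3 is built on deep fusion with a large language model (LLM) and has 24 billion parameters, which lets it understand prompts precisely and generate complex image content.
Graphic design ability: on graphic design tasks, PGv3 shows ability that surpasses human designers, especially in common design applications such as memes, posters, and logos.
RGB color control: PGv3 supports precise RGB color control and can generate images that meet specific color requirements, which suits professional design work that needs exact color matching.
Multilingual support: PGv3 can understand and generate text in multiple languages, serving users of different languages.
Model architecture: PGv3 is a latent diffusion model (LDM) that uses a variational autoencoder (VAE) and is trained with the EDM diffusion formulation. Its transformer blocks have the same structure as the corresponding blocks in the language model, which strengthens prompt understanding and prompt following.
New benchmark CapsBench: PGv3 introduces a new benchmark, CapsBench, for evaluating detailed image captioning performance, advancing how image captions are evaluated.
Performance: experimental results show that PGv3 performs well in text prompt following, complex reasoning, and text rendering accuracy.
Application scenarios: PGv3 can be applied to graphic design, content creation, game development, film and entertainment, advertising, education and research, artistic creation, and other fields.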
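2. Model architecture
PGv3 uses a DiT-style model structure. Every transformer block in our image model is set up identically to the corresponding block in the LLM we use, in this case Llama3-8B, with matching parameters such as hidden dimension size, number of attention heads, and attention head dimension. We trained only the image model components, while the LLM was kept frozen. Note that during diffusion sampling, the LLM part of the network only needs to run once to produce all the intermediate hidden embeddings required.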
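The "run the LLM once" point is easy to miss, so here is a minimal toy sketch of it. It is not Playground's code: `FrozenLLM`, `image_block`, and `sample` are hypothetical names, the "latent" is a single float rather than a tensor, and pairing image block i with LLM layer i is an assumption read from the description above. The only real-world values are the publicly documented Llama3-8B shapes (hidden size 4096, 32 attention heads of dimension 128, 32 layers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockConfig:
    """Shape parameters shared by an LLM block and its mirrored image block."""
    hidden_size: int
    num_heads: int
    head_dim: int


# Publicly documented Llama3-8B shape; the image blocks are described as matching it.
LLAMA3_8B = BlockConfig(hidden_size=4096, num_heads=32, head_dim=128)
NUM_LAYERS = 32  # Llama3-8B depth


class FrozenLLM:
    """Hypothetical stand-in for the frozen LLM; counts how often it is run."""

    def __init__(self, num_layers):
        self.num_layers = num_layers
        self.calls = 0

    def hidden_states(self, prompt):
        self.calls += 1
        # Toy "hidden state": one number per layer instead of a real tensor.
        return [float(len(prompt) + layer) for layer in range(self.num_layers)]


def image_block(x, text_hidden, t):
    """Toy image block: nudges the latent toward its paired text feature."""
    return x + 0.01 * t * (text_hidden - x)


def sample(llm, prompt, steps):
    # The prompt does not change during sampling, so the LLM runs once here
    # and its per-layer outputs are reused by every denoising step below.
    text_states = llm.hidden_states(prompt)
    x = 0.0  # toy scalar "latent"
    for step in range(steps):
        t = 1.0 - step / steps  # toy noise level going from 1 toward 0
        for layer_state in text_states:  # assumed pairing: block i reads LLM layer i
            x = image_block(x, layer_state, t)
    return x


llm = FrozenLLM(NUM_LAYERS)
latent = sample(llm, "a red fox", steps=20)
print(llm.calls)  # 1: twenty denoising steps, a single LLM pass
```

In a real sampler the cached values would be per-layer hidden tensors, but the cost argument is the same: only the image blocks are re-evaluated at each step.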
3. Benchmarks
The model has better text accuracy, with an OCR-F score of 40.35, higher than Flux-pro's 35.28.
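This post does not define OCR-F. As a rough illustration only, the sketch below assumes it is an F1-style score that compares the words an OCR system reads back from a generated image against the words the prompt asked for. The function name `ocr_f1` and the word-level, case-insensitive matching are assumptions, not the benchmark's actual procedure.

```python
from collections import Counter


def ocr_f1(target_text, ocr_text):
    """Word-level F1 between the requested text and the text OCR read back."""
    target = Counter(target_text.lower().split())
    found = Counter(ocr_text.lower().split())
    # Multiset intersection: a repeated word only counts as often as it appears in both.
    overlap = sum((target & found).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(found.values())  # share of rendered words that were wanted
    recall = overlap / sum(target.values())    # share of wanted words that were rendered
    return 2 * precision * recall / (precision + recall)


# One of two words is misspelled in the rendered image.
score = ocr_f1("Deutscher Ritterturnier", "deutscher ritterturner")
print(score)  # 0.5
```

A score like this punishes both missing words and extra or misspelled words, which is why it is a natural fit for judging text rendered inside an image.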
A vibrant photograph captures seven knights riding horses through an outdoor medieval reenactment event, positioned in front of a large crowd and tall trees under clear blue skies. The scene is bathed in natural daylight, creating high contrast between the brightly colored costumes and the verdant surroundings. Each knight wears ornate armor and carries distinctive flags with various crests and emblems, including symbols such as lions, dragons, eagles, and heraldic designs. From left to right, the first knight rides a white horse adorned with black and gold armor and a cape featuring circular patterns, holding a black flag with a red lion emblem on a wooden staff. Next is a knight on a gray horse wearing golden armor with intricate designs and a horned silver helmet, carrying a yellow flag with a brown eagle. The third knight, mounted on a brown horse with a white blaze, dons black chainmail armor with gold accents and a white cape with small triangular cut-outs, bearing a white banner displaying a black eagle or dragon silhouette. The fourth knight rides a dark brown steed with a white blaze, clad in elaborate red and gold armor with a flowing crimson cloak, holding a red flag with golden dragon motifs. The fifth knight, atop a light brown horse with a white blaze, sports navy blue armor with red and green trimmings and a forest green cape, bearing a purple flag with a red lion design. The sixth knight, mounted on a black horse, wears metallic silver armor with prominent shoulder plates, a closed visor helmet, and a cape with intricate embroidery, carrying a black flag with a white stag or bull symbol. The seventh knight, on a black horse, completes the lineup with charcoal gray armor embellished with brass details, a unique horn-like helmet, and a cape featuring alternating vertical stripes of royal blue and forest green, topped with a large feather plume in vivid red, yellow, and green hues. 
Behind them, spectators fill wooden benches and stand areas, attentively watching the procession. Large banners with various designs and text are visible among the crowd, one reading ‘Deutscher Ritterturnier’ in white serif font on a blue and white striped background. In the background, tall trees provide a lush backdrop, while large tents with conical roofs display colorful shields with various heraldry symbols, including blue and white triangles over red lines on beige backgrounds. One tent features a shield with red and white stripes. The overall mood is festive, celebratory, and historical, capturing the essence of a lively medieval renaissance festival where the past meets present in a picturesque setting.
A breathtaking night landscape photograph captures a lone figure standing atop a jagged rock formation on a dark sandy beach with scattered pebbles, gazing up at the magnificent Milky Way galaxy spanning diagonally across the sky. The foreground features a rough-textured rock with visible striations and crevices, its weathered surface in shades of brown and gray. The midground showcases the person dressed in a checkered shirt and pants, silhouetted against the soft glow of ambient light from an unseen horizon. To the left stands a massive, rugged cliff with uneven surfaces, visible cracks, and patches of green vegetation clinging to its slopes. On the right side of the midground, another distinctive rock structure extends out into the calm sea, partially submerged and silhouetted against the starry backdrop. The background is dominated by the Milky Way’s vibrant display of colors, including deep navy blue, emerald green, lavender purple, burnt orange, and white hues. A subtle aurora-like effect in pale yellow and green adds depth to the scene below the stars. The water appears smooth and glassy due to a long exposure technique, creating a serene atmosphere with gentle ripples reflecting the faint light above. Small rocks are scattered across the surface near the shore. Distant lights hint at nearby human activity along the coastline. The overall mood is awe-inspiring and tranquil, enhanced by the high contrast between the dark elements and the brightly illuminated stars, creating a harmonious blend of natural beauty and cosmic wonder.
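Summary:
We introduce Playground v3 (PGv3), our latest text-to-image model. It achieves state-of-the-art (SoTA) performance across multiple benchmarks, excels at graphic design, and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models such as T5 or CLIP text encoders, our approach fully integrates a large language model (LLM) with a novel structure that draws its text conditioning exclusively from a decoder-only LLM. In addition, to improve image caption quality, we developed an in-house captioner that can generate captions at varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark, CapsBench, to evaluate detailed image captioning performance. Experimental results show that PGv3 excels at text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate that our model has superhuman graphic design ability in common design applications such as stickers, posters, and logo design. PGv3 also introduces new capabilities, including precise RGB color control and robust multilingual understanding.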