vibestack

IMAGE GENERATION (Modules 15–19)

⚡ The VS field compares each tool to its nearest peer. For a solo builder, image gen earns its keep in two places: app/marketing assets (landing pages, OG images, ads) and client deliverables. Prices verified June 2026.

MODULE 15: Midjourney
CATEGORY: Image
DEPTH: CORE
ONE-LINE SUMMARY: The aesthetic leader. Produces the most striking, art-directed images with the least prompt effort, and now runs as a web app (no longer Discord-only).
VS NEAREST PEER (DALL-E/Flux): Best-looking output by default and superb style consistency, but historically weaker at legible text in images and at strict instruction-following. DALL-E is easier (it lives in ChatGPT); Flux is more controllable and open. Midjourney wins on pure beauty.
BEST FOR:

  • Hero images, brand moodboards, and marketing visuals that need to look premium.
  • Style-consistent sets (use --sref / character refs) for a cohesive brand look.
  • Selling "I'll make your brand visuals" without a designer.

WEAKNESS: No free tier; GPU-hour metering means heavy use needs higher plans; text-in-image and precise layout control still lag Ideogram/Flux; subscription-only.
COST (⚠ verify): Basic $10/mo (~3.3 GPU hrs, ~200 imgs); Standard $30/mo (15 hrs, unlimited Relax); Pro $60/mo (Stealth/private); Mega $120/mo. ~20% off annual. No free trial.
HANDS-ON TASK: Generate a 3-image brand set (logo-free hero, texture, lifestyle shot) using one --sref style seed so all three feel like one brand. That's a client-ready mini deliverable.
GOTCHA: The meter is GPU hours, not image count. Fast-mode hours on Basic run out quickly; learn Relax mode (Standard and up) before you promise a client volume.
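One way to keep the three hands-on images on a single style seed is to build the prompt strings from one template. The sketch below is illustrative only: the shot descriptions, the `brand_set_prompts` and `fast_hours_needed` helpers, and the example style reference are hypothetical. The budget estimate simply rescales the approximate Basic figures quoted above (~3.3 Fast GPU hours for ~200 images), so treat it as a rough guide, not a quote.

```python
# Approximate Basic-plan figures taken from the COST field above.
BASIC_FAST_HOURS = 3.3
BASIC_IMAGE_ESTIMATE = 200

# Hypothetical shot descriptions for the 3-image brand set.
SHOTS = {
    "hero": "wide hero image, no logo, generous negative space",
    "texture": "abstract background texture, close-up detail",
    "lifestyle": "lifestyle shot of a person using the product",
}


def brand_set_prompts(brand: str, style_ref: str, aspect: str = "16:9") -> dict[str, str]:
    """Build one prompt per shot, all ending with the same --sref value."""
    return {
        name: f"{description}, for {brand} --ar {aspect} --sref {style_ref}"
        for name, description in SHOTS.items()
    }


def fast_hours_needed(images: int) -> float:
    """Rough Fast GPU hours for a given image count, scaled from the Basic estimate."""
    minutes_per_image = BASIC_FAST_HOURS * 60 / BASIC_IMAGE_ESTIMATE
    return images * minutes_per_image / 60


if __name__ == "__main__":
    # The style reference here is a placeholder URL, not a real asset.
    for name, prompt in brand_set_prompts("Acme Coffee", "https://example.com/style.png").items():
        print(f"{name}: {prompt}")
    print(f"Fast hours for 500 images: {fast_hours_needed(500):.2f}")
```

Because every prompt ends with the same `--sref` value, the set should share one look. The hours estimate is a quick check on whether a promised volume fits the plan before you quote it.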

MODULE 16: DALL-E 3
CATEGORY: Image
DEPTH: REFERENCE
ONE-LINE SUMMARY: OpenAI's image model, baked into ChatGPT. Generate images conversationally without leaving the chat you already pay for.
VS NEAREST PEER (Midjourney): Lower ceiling on aesthetics, but unbeatable convenience and prompt-following inside ChatGPT: you can iterate in natural language ("make the sky moodier") in the same thread. Midjourney looks better; DALL-E is right there and obedient.
BEST FOR:

  • Quick one-off images mid-conversation (blog header, rough concept, icon idea).
  • Non-designers who want "describe it and tweak it in plain English."
  • Zero extra subscription if you already have ChatGPT Plus.

WEAKNESS: Aesthetic quality below Midjourney/Flux; less style control; output can feel generic. Effectively superseded by newer in-ChatGPT image gen, so know it as the convenient baseline.
COST: Included with ChatGPT Plus ($20/mo); no separate plan needed.
COMMUNICATION SHORTCUT: "DALL-E 3 is the image gen inside ChatGPT. I use it for quick, conversational one-offs, and switch to Midjourney or Flux when the visual actually has to be good."
GOTCHA: It's convenient enough that you'll over-use it and ship mediocre visuals. For anything client-facing or brand-facing, step up to Midjourney/Flux/Ideogram.

MODULE 17: Flux
CATEGORY: Image
DEPTH: SKIM
ONE-LINE SUMMARY: Black Forest Labs' high-quality model family with open weights and a clean pay-as-you-go API. The developer's image model.
VS NEAREST PEER (Midjourney): Comparable quality with far more control and integration: call it from your own app via API, self-host the open variants, fine-tune. Midjourney is a closed product; Flux is infrastructure you build on. Better text rendering than classic Midjourney.
BEST FOR:

  • Generating images programmatically inside your own SaaS (API, per-image pricing).
  • Self-hosted / open-weight pipelines where you need control or commercial clarity.
  • Building an image feature into a product you'll sell.

WEAKNESS: Not a polished consumer app. It's an API/model, and you build the UX. Quality tuning takes effort vs. Midjourney's "just works" beauty.
COST (⚠ verify): Pay-as-you-go API, $0.01/credit. FLUX.2 megapixel pricing (~$0.03/MP input ref, $0.07 first output MP). Variants: FLUX.2 [max] ~$0.05/img, [turbo] ~$0.0001/img.
HANDS-ON TASK: Call the Flux API from a tiny FastAPI endpoint to generate an OG image from a title string. You now have a reusable "dynamic social image" feature for any micro-SaaS.
GOTCHA: Under per-megapixel pricing, a few large 4K generations can cost more than many small ones. Size outputs to actual need (OG images don't need 4K).
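The hands-on task and the megapixel gotcha can be sketched together. Everything API-specific below is an assumption, not taken from the source: the endpoint URL is a placeholder read from a hypothetical `FLUX_ENDPOINT` variable, the `x-key` header name and the `prompt`/`width`/`height` payload fields should be checked against the current Flux API reference, and 1200×630 is used as a common OG size (some models may require other dimension multiples). The HTTP call is injectable so the logic can be exercised offline, and the FastAPI wiring is guarded so the helpers still import where FastAPI is not installed.

```python
import json
import os
import urllib.request
from typing import Callable

# ASSUMPTIONS: placeholder endpoint and env var names; not the real API URL.
FLUX_ENDPOINT = os.environ.get("FLUX_ENDPOINT", "https://example.invalid/v1/generate")
FLUX_API_KEY = os.environ.get("FLUX_API_KEY", "")

OG_WIDTH, OG_HEIGHT = 1200, 630  # common Open Graph image size


def megapixels(width: int, height: int) -> float:
    """Pixel count in megapixels, the unit the pricing is quoted in."""
    return width * height / 1_000_000


def size_ratio(width: int, height: int) -> float:
    """How many OG-sized images have the same pixel count as one width x height image."""
    return megapixels(width, height) / megapixels(OG_WIDTH, OG_HEIGHT)


def build_og_payload(title: str) -> dict:
    """Request body for one OG image; field names are assumed, not confirmed."""
    prompt = f'Clean social card background with bold headline text reading "{title}"'
    return {"prompt": prompt, "width": OG_WIDTH, "height": OG_HEIGHT}


def http_post_json(url: str, payload: dict, api_key: str) -> dict:
    """POST JSON and return the decoded JSON response (header name is assumed)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def generate_og_image(
    title: str, post: Callable[[str, dict, str], dict] = http_post_json
) -> dict:
    """Send the OG payload and return whatever JSON the API responds with."""
    return post(FLUX_ENDPOINT, build_og_payload(title), FLUX_API_KEY)


try:
    from fastapi import FastAPI
except ImportError:  # keep the helpers usable without FastAPI installed
    FastAPI = None

if FastAPI is not None:
    app = FastAPI()

    @app.get("/og")
    def og(title: str) -> dict:
        """GET /og?title=... returns the raw generation response."""
        return generate_og_image(title)
```

On sizing: a 3840×2160 output has roughly 11 times the pixels of a 1200×630 OG image (`size_ratio(3840, 2160)`), so under any pricing that scales with megapixels, one 4K generation costs about as much as eleven OG-sized ones.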

MODULE 18: Ideogram
CATEGORY: Image
DEPTH: SKIM
ONE-LINE SUMMARY: The image model that actually renders legible text: posters, logos, ads, and typography-in-image where others produce gibberish letters.
VS NEAREST PEER (Midjourney/DALL-E): Its one decisive edge is text inside images: headlines, signage, packaging mockups, logo concepts. Midjourney looks prettier overall but mangles words; Ideogram nails the words. Use it specifically when text matters.
BEST FOR:

  • Ad creatives, posters, and social graphics with real headline text baked in.
  • Logo and wordmark concepts; packaging/signage mockups.
  • Marketing assets for your own launches (build-in-public visuals with copy).

WEAKNESS: General aesthetic ceiling below Midjourney; narrower use case (its whole pitch is text); free tier is throttled and public.
COST (⚠ verify): Free (10 slow credits/week, public images); Basic ~$7/mo; Plus ~$15/mo (private, ~1,000 prompts); Pro ~$42/mo (~3,000 prompts). API $0.025–$0.10/image.
HANDS-ON TASK: Generate an ad creative for one of your products with the actual headline + CTA text rendered in the image. Compare the same prompt in Midjourney to see the text-rendering gap.
GOTCHA: Free-tier images are public and the model is text-specialized, so for a non-text artistic visual you'll get better results elsewhere. Pick Ideogram because of text, not by default.

MODULE 19: Stable Diffusion
CATEGORY: Image
DEPTH: REFERENCE
ONE-LINE SUMMARY: The open-source image model you run yourself: total control via ComfyUI/Automatic1111, LoRAs, fine-tunes, and no per-image fees.
VS NEAREST PEER (Flux/Midjourney): Maximum control and zero marginal cost once you have a GPU; supports custom-trained styles (LoRAs), inpainting, ControlNet, and uncensored use. Trade-off: real setup effort, and you supply the compute. Flux is the newer, higher-quality open option; SD is the mature, infinitely tweakable ecosystem.
BEST FOR:

  • Local, free, unlimited generation with full pipeline control (ComfyUI graphs).
  • Custom-trained styles/characters (LoRAs) for a consistent client or product look.
  • Private, uncensored, or offline workflows where nothing leaves your machine.

WEAKNESS: Steep setup; needs a decent GPU; quality requires tuning and good models/LoRAs; you maintain the stack. Not "open and type a prompt."
COST: Free/OSS (you pay for GPU/compute). Cloud-hosted SD endpoints are pay-per-image if you don't self-host.
COMMUNICATION SHORTCUT: "Stable Diffusion is the open-source, self-hosted image stack. I reach for it when I need custom-trained styles, full ComfyUI pipeline control, or zero per-image cost, and I'm willing to run the GPU."
GOTCHA: The power is in the ecosystem (LoRAs, ControlNet, ComfyUI), not the base model. Without learning those, vanilla SD underperforms Midjourney/Flux. Budget learning time before promising client work.
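The "free once you have a GPU" versus "pay-per-image if you don't self-host" trade-off comes down to a break-even count. The sketch below is a rough estimate only: the `break_even_images` helper is hypothetical, both input figures in the example are made-up placeholders rather than quoted prices, and it ignores electricity, setup time, and maintenance.

```python
import math


def break_even_images(gpu_cost_usd: float, hosted_price_per_image: float) -> int:
    """Images you must generate before owning the GPU beats paying per image."""
    if hosted_price_per_image <= 0:
        raise ValueError("hosted_price_per_image must be positive")
    return math.ceil(gpu_cost_usd / hosted_price_per_image)


if __name__ == "__main__":
    # Placeholder inputs: an $800 GPU versus a hosted endpoint at $0.02 per image.
    print(break_even_images(gpu_cost_usd=800, hosted_price_per_image=0.02))
```

If your expected volume sits well below the break-even count, a hosted endpoint is likely the cheaper start; well above it, self-hosting starts to pay for itself.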

Image tools — at a glance

| Tool | Killer trait | Open/API? | Cost reflex | Reach for it when |
| --- | --- | --- | --- | --- |
| Midjourney | Best aesthetics | closed app | $10–120/mo | premium brand visuals |
| DALL-E 3 | In-ChatGPT convenience | in ChatGPT | incl. Plus | quick conversational one-offs |
| Flux | Quality + open + API | open + API | $0.01/credit | image gen inside your app |
| Ideogram | Legible text in image | app + API | free–$42/mo | ads/posters/logos with text |
| Stable Diffusion | Total local control | open/self-host | free + GPU | custom LoRAs, offline, unlimited |
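To compare the subscription plans with the per-image API prices, it can help to turn each plan into an effective cost per unit. This is a minimal sketch using only the approximate figures quoted in the COST fields above (all flagged for verification); the `cost_per_unit` and `summary` helpers are hypothetical, and the Ideogram allowances are counted in prompts, not images.

```python
# (monthly price in USD, approximate monthly allowance) from the COST fields above.
PLANS = {
    "Midjourney Basic": (10.0, 200),   # ~images
    "Ideogram Plus": (15.0, 1000),     # ~prompts
    "Ideogram Pro": (42.0, 3000),      # ~prompts
}


def cost_per_unit(monthly_usd: float, units: int) -> float:
    """Effective price per image or prompt if the whole allowance is used."""
    return monthly_usd / units


def summary() -> dict[str, float]:
    """Cost per unit for every plan, rounded to a tenth of a cent."""
    return {
        name: round(cost_per_unit(price, units), 3)
        for name, (price, units) in PLANS.items()
    }


if __name__ == "__main__":
    for name, unit_cost in summary().items():
        print(f"{name}: ${unit_cost:.3f} per unit")
```

On these figures a fully used Midjourney Basic plan works out to about $0.05 per image, in line with the ~$0.05/img quoted for FLUX.2 [max], while the Ideogram plans land near $0.015 per prompt. The comparison only holds if you actually use the full allowance.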

Sources (verified June 2026)

Next: 04-video-voice.md.