Don't even know where to start!

So I just installed ComfyUI and I've no idea where to start… Copilot is hopeless at providing guidance; it took three hours of my life before I decided to take over and install it directly from the Comfy site!
Anyway, I thought I'd start with something basic like editing an image and then move on to creating a new image from two other images… any pointers welcome.

Thanks in advance

Among image-generation GUIs, ComfyUI has the most difficult setup before you can actually start generating images. On the other hand, it offers the widest range of features of any software of this type…

There are a few simpler GUIs out there, though.


Start with built-in Templates, then learn image-to-image, then inpainting, and only after that move to two-image reference workflows. That is the least painful path because it follows how the official docs are organized: first generation, then basic edit workflows, then newer native model workflows. ComfyUI’s own docs also recommend using the built-in Templates browser for supported workflows. (ComfyUI)

What ComfyUI is, in simple terms

ComfyUI is a workflow editor. A workflow is a graph of connected nodes. One node loads a model, another handles prompts, another encodes or decodes images, another does the sampling, and another saves the result. The official docs describe a workflow exactly as a graph of connected nodes, and they explicitly say the built-in Templates are the place to start. (ComfyUI)
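To make "graph of connected nodes" concrete, here is a minimal sketch of a text-to-image graph in ComfyUI's API-style JSON, written out as a Python dict. The node IDs and wiring mirror the shape ComfyUI exports (each input is either a literal value or a `[node_id, output_index]` link), but this particular graph is illustrative rather than copied from the docs; verify class names against a workflow you export yourself.

```python
# A ComfyUI workflow in API format: a dict of nodes keyed by string id.
# Each node names its class and wires inputs either to literal values
# or to another node's output as [node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",          # positive prompt
          "inputs": {"text": "a cozy cabin in the woods",
                     "clip": ["1", 1]}},           # CLIP = loader's 2nd output
    "3": {"class_type": "CLIPTextEncode",          # negative prompt
          "inputs": {"text": "blurry, low quality",
                     "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",                # the sampling step
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "demo"}},
}

# The sampler pulls its model from node 1, output 0:
print(workflow["5"]["inputs"]["model"])  # → ['1', 0]
```

Every template you load in the UI is a graph with exactly this shape, just bigger; once the linking convention clicks, unfamiliar workflows stop being scary.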

That matters because the first mistake most beginners make is treating ComfyUI like a normal “one button, one model” app. It is not. You are learning which workflow solves which problem. That is why the cleanest beginner order is not “best model first.” It is “best workflow family first.” This is an inference from how the official docs split beginner tasks into first generation, image-to-image, inpainting, and then more advanced model-specific workflows. (ComfyUI)

Step by step: where to start

1. Open Templates first

Go to Workflow → Browse Workflow Templates. Templates are ComfyUI’s browser for native model workflows and some example workflows. That is the safest starting point because the templates are part of the supported path, not a random community graph with unknown dependencies. (ComfyUI)

2. Run one official starter workflow once

Use the official Getting Started with AI Image Generation guide and complete one simple run. That guide is specifically about workflow loading, model installation, and first image generation. Do this even if your real goal is editing. You need one known-good baseline before you start changing things. (ComfyUI)

3. Learn image-to-image next

This should be your first real editing workflow. The official image-to-image guide says it is used for style conversion, line-art to realism, restoration, colorization, and other “change this image into a related image” cases. It is also much easier to understand than more advanced edit stacks because it is basically text-to-image plus an input image. (ComfyUI)

Use one source image and make only small changes at first. Do not try to do major composition changes yet. The point of this stage is to learn how the workflow responds when you nudge it. That is the simplest bridge from “I installed ComfyUI” to “I can actually edit something.” This is a recommendation based on the official beginner workflow structure. (ComfyUI)

4. Learn inpainting after that

Once image-to-image makes sense, move to inpainting. The official inpainting guide covers exactly what a beginner needs for local edits: modifying images with a mask, using the mask editor, and the VAE Encoder (for Inpainting) node. This is the right workflow when you want to change only one area of an image instead of reinterpreting the whole thing. (ComfyUI)

5. Then move to a stronger inpaint model

After you understand the basic inpainting workflow, the first model-specific upgrade I would look at is FLUX.1 Fill dev. Its official guide is specifically about inpainting and outpainting, and it is designed for prompt-following edits that stay consistent with the original image. (ComfyUI)

6. Then pick one modern edit model

For general local editing and style-aware editing, FLUX.1 Kontext Dev is one of the cleanest current native options. Its guide says it supports simultaneous text and image input, targeted editing, style reference, character consistency, and interactive speed, and that it runs locally. (ComfyUI)

If your edits involve text inside images, signs, labels, posters, UI mockups, or more semantic changes, Qwen-Image-Edit is a better next step. Its official guide says it supports precise text editing and dual semantic/appearance editing. (ComfyUI)

7. Only then move to “make one image from two images”

There are two main beginner-safe routes here.

If you mean subject from one image plus style from another, use USO. Its official guide says it supports subject-driven, style-driven, and combined subject-plus-style generation. (ComfyUI)

If you mean use multiple reference images and keep them coherent, use FLUX.2 Dev. Its guide says it adds reliable multi-reference consistency, improved editing precision, and better visual understanding. (ComfyUI)

A very simple first-week plan

Day 1

Open Templates, run one starter workflow, and confirm that ComfyUI can load a model and generate one image. (ComfyUI)

Day 2

Do only image-to-image. Use one input image. Make three versions: one mild, one medium, one strong. Do not add any custom nodes. (ComfyUI)

Day 3

Do only inpainting. Change one small object or one small region. Learn the mask editor. (ComfyUI)

Day 4

Try FLUX.1 Fill dev for a cleaner inpaint/outpaint workflow. (ComfyUI)

Day 5

Pick one of these, not all of them:

  • Kontext Dev for general editing and style-aware edits. (ComfyUI)
  • Qwen-Image-Edit for text-heavy or semantic edits. (ComfyUI)
  • USO for subject-plus-style mixing. (ComfyUI)
  • FLUX.2 Dev for multi-reference generation. (ComfyUI)

Good existing beginner guides

The best guides to start with are these:

  • Getting Started with AI Image Generation. This is the official first-run guide. (ComfyUI)
  • Workflow Templates. This is the safest place to find starter workflows. (ComfyUI)
  • Image-to-Image. This is the best first edit tutorial. (ComfyUI)
  • Inpainting. This is the best first local-edit tutorial. (ComfyUI)
  • ComfyUI Examples. The example images contain metadata, so you can drag them into ComfyUI and recover the workflow used to make them. The examples site itself says it is a good place to start if you have no idea how any of this works. (Comfy Anonymous)
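Since those example images carry the workflow as PNG text metadata, you can inspect one without even opening ComfyUI. The sketch below builds a tiny synthetic PNG containing a `workflow` tEXt chunk (the chunk key ComfyUI is commonly reported to use; verify against your own saved images) and reads it back using only the Python standard library.

```python
import json
import struct
import zlib

def chunk(ctype: bytes, body: bytes) -> bytes:
    """Assemble one PNG chunk: length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(body)) + ctype + body
            + struct.pack(">I", zlib.crc32(ctype + body)))

def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    out, pos = {}, 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if ctype == b"tEXt":
            key, _, val = data[pos + 8:pos + 8 + length].partition(b"\x00")
            out[key.decode("latin-1")] = val.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return out

# Build a minimal stand-in PNG: signature, 1x1 IHDR, embedded workflow, IEND.
workflow = {"1": {"class_type": "LoadImage", "inputs": {"image": "cat.png"}}}
png = (b"\x89PNG\r\n\x1a\n"
       + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + chunk(b"tEXt", b"workflow\x00" + json.dumps(workflow).encode("latin-1"))
       + chunk(b"IEND", b""))

recovered = json.loads(png_text_chunks(png)["workflow"])
print(recovered["1"]["class_type"])  # → LoadImage
```

On a real example image, point `png_text_chunks` at the file's bytes; in practice, though, dragging the image into the ComfyUI canvas does the same recovery for you.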

For video learning, two community resources keep coming up and are easy to follow:

  • Pixaroma’s “Learn ComfyUI From Scratch” playlist. (YouTube)
  • Scott Detweiler’s ComfyUI playlists. (YouTube)

The biggest beginner mistakes to avoid

Do not start with giant community workflows. Start with Templates and official examples. The official Templates page is built for supported workflows, and the official examples repo is set up so example images can be loaded back into ComfyUI with their workflow metadata. (ComfyUI)

Do not install lots of custom nodes on day one. The official custom-node installation docs exist for a reason, but that path adds more moving parts than you need at the beginning. It is easier to learn the core workflow families first, then add extensions later. This is a recommendation grounded in the official install split between native templates and custom-node installation. (ComfyUI)

Do not assume a missing template means you broke something. Several newer model guides say that if a workflow is missing from Templates, your ComfyUI may simply be outdated, and Desktop/stable releases can lag behind newer workflow docs. (ComfyUI)

Good models for your purpose by VRAM

8GB VRAM

This is the hardest tier. Exact 8GB is tight for modern local editing models.

The most realistic official starting point is FLUX.2 Klein 4B Distilled. ComfyUI’s guide describes FLUX.2 Klein as the fastest model in the FLUX family, built for text-to-image and image editing, with support for style transforms, semantic edits, object replacement/removal, multi-reference composition, and iterative edits. The guide also publishes reference numbers of about 8.4GB VRAM for the distilled 4B model and 9.2GB for the 4B base model on an RTX 5090. That means exact 8GB cards are borderline, but Klein 4B Distilled is still the nearest official fit in the current docs. (ComfyUI)

If you are on exact 8GB and want lighter experiments after that, Ovis-Image and Z-Image-Turbo are worth testing because the docs describe them as efficient models aimed at tighter compute budgets. Ovis-Image is a 7B text-to-image model designed to operate efficiently under stringent computational constraints, and Z-Image-Turbo is a distilled 6B model with sub-second inference and a stated fit within 16GB consumer devices. I would treat both as secondary experiments, not as safer bets than Klein for your exact use case, because the published docs do not give an 8GB editing target for them. (ComfyUI)

12GB VRAM

This is the first tier where things become comfortable rather than merely possible.

My default recommendation here is still FLUX.2 Klein 4B, either base or distilled, because the official docs give concrete VRAM figures and the model already covers both image editing and multi-reference composition. That makes it unusually practical for your two goals. (ComfyUI)

For masked edits, I would add FLUX.1 Fill dev next. It is specifically designed for inpainting and outpainting, and the guide is very direct and beginner-friendly. (ComfyUI)

If you want a stronger local edit model and you are willing to tolerate a heavier workflow, FLUX.1 Kontext Dev is the next thing I would test. Its guide positions it for targeted editing, style reference, character consistency, and local operation, but the doc does not publish a simple VRAM figure like Klein does, so I would treat it as a “try after Klein,” not as the first blind recommendation. (ComfyUI)

Over 40GB VRAM

This is where you can start using the big models the way they are meant to be used.

For pure generation quality and text rendering, Qwen-Image bf16 is the clearest official heavyweight option. The ComfyUI guide lists Qwen-Image_bf16 at 40.9 GB and Qwen-Image_fp8 at 20.4 GB, and describes Qwen-Image as a 20B model with strong multilingual text rendering and precise image editing. (ComfyUI)

For editing, especially text edits and semantic edits, use Qwen-Image-Edit. Its guide says it extends Qwen-Image’s text rendering into editing and supports dual semantic and appearance control. (ComfyUI)

For multi-reference image creation, use FLUX.2 Dev. Its guide says it supports reliable consistency across up to 10 reference images and improved editing precision. (ComfyUI)

Also, if you want the older full FLUX stack for high-quality generation, the official FLUX.1 Text-to-Image guide recommends t5xxl_fp16.safetensors when VRAM is greater than 32GB, which places full-quality FLUX configurations comfortably inside your 40GB+ tier. (ComfyUI)

My plain recommendations by tier

If I had to make this very concrete:

  • 8GB: start with FLUX.2 Klein 4B Distilled. It is the closest official fit, but exact 8GB is still tight. (ComfyUI)
  • 12GB: start with FLUX.2 Klein 4B Base or Distilled, then add FLUX.1 Fill dev for inpainting. (ComfyUI)
  • 40GB+: use Qwen-Image bf16 for heavyweight quality, Qwen-Image-Edit for editing, and FLUX.2 Dev for multi-reference work. (ComfyUI)

If you only do three things tonight

  1. Open Templates and run one official starter workflow. (ComfyUI)
  2. Run the official image-to-image workflow on one image you already have. (ComfyUI)
  3. Run the official inpainting workflow and change one small region only. (ComfyUI)

That is the shortest path from “I installed ComfyUI and the graph scares me” to “I can edit images on purpose.”

Great… thanks… I've started to make some progress… looks like my PC is going to need an upgrade! Only 8GB VRAM… I checked pricing on upgraded GPUs and they are well outside my budget… any further recommendations are always welcome… thanks for taking the time to reply.

What about trying the Google Colab free tier?

Yeah, the Google Colab free tier is a good one. I also use the Tesla T4 (16GB) on Colab Free daily for experimental purposes.

Well, with 8GB of VRAM there are models that work just fine as long as they aren't the latest ones, so I think it's safer to start with a lower-end model to get the hang of things, decide whether you'll stick with ComfyUI, and then choose your GPU.
An 8GB GPU is plenty for practice. You might also want to try the HF Spaces demos to see which model actually suits your needs.

The Windows environment is particularly inconvenient when it comes to AI, and there are so many pitfalls to watch out for when choosing a GPU… (I’m a Windows user myself. :cold_face:)


Start with ComfyUI + SD 1.5, not with FLUX or a giant multi-image workflow.

That is the safest fit for your actual setup: 8GB VRAM on Windows plus Colab Free as a backup. ComfyUI’s official beginner docs start with built-in workflow templates, model installation, and a first working run. The official image-to-image tutorial uses SD 1.5 (v1-5-pruned-emaonly-fp16.safetensors), and the official inpainting tutorial uses a dedicated SD 1.5 inpainting checkpoint (512-inpainting-ema.safetensors) and explicitly says it gives more natural inpaint results than a normal SD 1.5 checkpoint. (ComfyUI)

What is realistic for your hardware

Best first local option

SD 1.5 in ComfyUI is the best place to begin. It is directly used in the official beginner image-to-image and inpainting guides, which means you get a supported path and lighter workflows at the same time. (ComfyUI)

Second step

SDXL is plausible on modest NVIDIA hardware, but I would not make it lesson one in ComfyUI on exact 8GB. It is better as a step after you already understand image-to-image and inpainting. This is an inference from your 8GB limit plus the fact that the official beginner docs use SD 1.5 for the basic edit tutorials rather than SDXL. (ComfyUI)

Later experiment

FLUX.2 Klein 4B Distilled is interesting, but not the safest first local target on exact 8GB. ComfyUI’s own guide lists the distilled 4B model at about 8.4GB VRAM and the base 4B model at about 9.2GB VRAM on an RTX 5090. The same guide says Klein supports style transforms, semantic edits, object replacement/removal, multi-reference composition, and iterative edits. That makes it a good later model for your goals, but a tight fit for local 8GB. (ComfyUI)

Colab Free

Use Colab Free as a backup lane, not as your main learning lane. Google’s FAQ says free Colab has dynamic usage limits, no guaranteed or unlimited resources, varying GPU types, idle timeouts, and notebooks that can run for at most 12 hours depending on usage and availability. (Google Research)

My blunt recommendation

For your setup, I would rank the starting paths like this:

  1. ComfyUI + SD 1.5 image-to-image + SD 1.5 inpainting
  2. ComfyUI + SDXL, after the basics feel normal
  3. FLUX.2 Klein 4B Distilled, preferably later or on a favorable Colab session (ComfyUI)

ComfyUI or Forge Classic

For learning, I would still start with ComfyUI.

Why:

  • ComfyUI’s official docs are now structured around Templates, first generation, image-to-image, and inpainting, so there is a clean beginner path. (ComfyUI)
  • ComfyUI is explicitly a graph / nodes / flowchart interface, which is exactly why it scales better once you move from “edit one image” to “build one image from two references.” (GitHub)
  • The official examples repo says all example images contain workflow metadata, so you can drag them into ComfyUI and recover the workflow used to make them. That is extremely beginner-friendly once you know the first few nodes. (GitHub)

Forge Classic is still a valid alternative if you want a more familiar WebUI-style interface. Its README says it is built on top of the original AUTOMATIC1111-style WebUI, focuses on optimization and usability, and currently supports Flux.2-Klein 4B/9B but not FLUX.2 Dev. The same README also notes that Klein there does not support regular img2img and “will always edit.” (GitHub)

So my practical split is:

  • Use ComfyUI if you want the best long-term path for editing + two-image workflows. (GitHub)
  • Use Forge Classic if you mainly want a simpler A1111-style UI and are okay staying in a more traditional WebUI workflow. (GitHub)

Step by step: the safest path

Step 1. Open Templates

In ComfyUI, open Workflow Templates from the sidebar or from Workflow → Browse Workflow Templates. The Templates browser is where ComfyUI puts its natively supported model workflows and example workflows. When you load a template, ComfyUI checks for missing models and prompts you to download them. (ComfyUI)

Step 2. Do one simple first run

Use the official Getting Started with AI Image Generation guide. It is specifically written to cover workflow loading, model installation, and a first working image. It also explains that the default workflow usually loads automatically, and it shows how to load workflows from Templates or from images with workflow metadata. (ComfyUI)

Step 3. Learn image-to-image

This should be your first real task. The official image-to-image guide says it is used for style transfer, line art to realism, restoration, and colorizing old photos. It also says the workflow is very similar to text-to-image, just with an added reference image, which is exactly why it is a good beginner bridge. The key setting is denoise: lower values keep the result closer to the source image, higher values change it more. (ComfyUI)
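If you want to script that mild/medium/strong comparison instead of clicking through it, the sketch below clones an API-format workflow once per denoise value and shows how each variant could be queued against a locally running instance over ComfyUI's HTTP API. The `/prompt` endpoint and default port 8188 match stock ComfyUI, but the sampler node id `"5"` is a placeholder; take the real id from your own exported workflow.

```python
import copy
import json
import urllib.request

def with_denoise(workflow: dict, sampler_id: str, value: float) -> dict:
    """Return a deep copy of an API-format workflow with one denoise changed."""
    wf = copy.deepcopy(workflow)
    wf[sampler_id]["inputs"]["denoise"] = value
    return wf

def queue(workflow: dict, host: str = "http://127.0.0.1:8188") -> None:
    """POST a workflow to a locally running ComfyUI, which queues it to run."""
    req = urllib.request.Request(
        host + "/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# "5" is a placeholder KSampler id; export your workflow to find the real one.
base = {"5": {"class_type": "KSampler",
              "inputs": {"seed": 42, "steps": 20, "denoise": 1.0}}}

# Lower denoise stays closer to the source image; higher changes it more.
variants = [with_denoise(base, "5", d) for d in (0.3, 0.5, 0.8)]
for v in variants:
    print(v["5"]["inputs"]["denoise"])
    # queue(v)  # uncomment while ComfyUI is running locally
```

Keeping the seed fixed across the three variants makes the comparison fair: the only thing that differs between the outputs is how much of the source image survives.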

Step 4. Learn inpainting

After image-to-image, do inpainting. The official inpainting guide is about changing only the masked area, and it covers the Mask Editor and the VAE Encoder (for Inpainting) node. It also explicitly shows that the dedicated inpainting checkpoint gives better transitions than a normal SD 1.5 checkpoint. (ComfyUI)

Step 5. Only then try two-image work

For “make a new image from two images,” the most important shift is conceptual:

  • image A can provide subject/content
  • image B can provide style/look
  • or both images can act as references for a new composition

Klein is relevant here because the official guide says it supports multi-reference composition, but on your hardware I would treat that as a later step, not the first one. (ComfyUI)

The simplest model plan for you

Local Windows 8GB

Use:

  • SD 1.5 for image-to-image
  • SD 1.5 inpainting model for local masked edits

That matches the official beginner tutorials directly. (ComfyUI)

Colab Free

Use Colab only when you want to test something heavier or more modern. Do not build your learning routine around it because the free tier is not predictable. (Google Research)

Klein

Try FLUX.2 Klein 4B Distilled later, especially if you get a decent Colab session or upgrade local VRAM. It is a good model for your eventual goal set, but not the calmest way to start on exact 8GB. (ComfyUI)

Good beginner guides

These are the ones I would actually use:

Official first

  • Getting Started with AI Image Generation. Best first run guide. (ComfyUI)
  • Workflow Templates. Best place to find safe starter workflows. (ComfyUI)
  • Image-to-Image. Best first editing guide. (ComfyUI)
  • Inpainting. Best first local-edit guide. (ComfyUI)
  • ComfyUI Examples. Best place to inspect working workflows because the images contain workflow metadata. (GitHub)

Video guides

  • Pixaroma – Learn ComfyUI From Scratch. It is explicitly presented as a complete beginner course for learning ComfyUI from scratch. (YouTube)
  • Scott Detweiler – ComfyUI playlist. Long-running playlist focused on local Stable Diffusion and ComfyUI workflows. (YouTube)

The biggest beginner mistakes to avoid

Do not start from a random workflow JSON or a giant image with missing custom nodes. Start with Templates and official examples. That is exactly what ComfyUI’s getting-started docs and examples repo are set up for. (ComfyUI)

Do not install a pile of custom nodes before you have one working image-to-image flow and one working inpaint flow. The official beginner path is core workflows first, models next, then more advanced paths. (ComfyUI)

Do not make Klein your first local test on exact 8GB just because it is modern. Its own published VRAM figure is already sitting right at the edge of your card class. (ComfyUI)

The short version

For your exact setup, start like this:

  1. ComfyUI
  2. Official Templates
  3. SD 1.5 image-to-image
  4. SD 1.5 inpainting
  5. Only then test SDXL
  6. Only then test FLUX.2 Klein 4B Distilled, preferably on a good Colab session or stronger VRAM (ComfyUI)

That is the least frustrating path from “I installed this thing” to “I can edit images and understand what the workflow is doing.”

Start with a simple image and tell the AI how you want it changed. If there are problems, be more specific with your prompt. If there are still problems, ask for help and mention: the software name and version (ComfyUI), the LLM you are using, your prompt, and your question.