U.I. Overview – Txt2img (Text to image)
In this section, we are going to cover the basic features and tools of the txt2img (text to image) tab. This is where you create images using text prompts, both positive and negative, and where you will spend most of your time in A1111 if you are generating images.
Let’s begin with the prompt section.
You should see something like this:
Note that I am in the txt2img tab.
This is the positive prompt section.
The positive prompt field indicates what you want in your image by using descriptions, keywords, weights, and LoRAs. The more specific you can be, the better.
Using mediums such as “portrait”, “concept art”, or “painting” will define the category of the artwork. You can use keywords such as “hyperrealistic” to define the art style. You can even add an artist’s name to the prompt to borrow their style, such as “Van Gogh” or “Claude Monet”. Lighting type, color scheme, and even the name of a studio or anime will also translate to a more specific type of image.
But we’ll get more into that later!
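To make the idea of weights and LoRAs a bit more concrete, here is a sketch of how a positive prompt might be assembled. It uses A1111's attention syntax, where `(keyword:1.2)` raises the emphasis on a keyword, and the `<lora:name:strength>` syntax for applying a LoRA. The keywords and the "exampleLora" name are purely illustrative, not part of the course assets.

```python
# Illustrative only: assembling a weighted positive prompt.
subject = "(portrait of an old fisherman:1.2)"  # 1.2 = extra emphasis
medium = "concept art"
style = "hyperrealistic"
lighting = "dramatic lighting"
lora = "<lora:exampleLora:0.8>"  # hypothetical LoRA name at 0.8 strength

prompt = ", ".join([subject, medium, style, lighting, lora])
print(prompt)
```

In practice you would simply type this string straight into the positive prompt field.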
This is the negative prompt section.
The negative prompt field indicates what you DON’T want in your image. Similarly to the positive prompt, you want to add descriptions, keywords, weights, etc. There are a handful of great universal negative prompts that prevent common issues such as too many limbs, disfigurement, ugly proportions, bad lighting, etc. I have included all of mine in the assets folder for you.
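As a hedged example, here is a "universal" negative prompt built from keywords that are common community choices for the issues mentioned above; these are illustrative and not the exact ones from the assets folder.

```python
# Illustrative only: a generic universal negative prompt.
negative_terms = [
    "lowres", "bad anatomy", "extra limbs", "disfigured",
    "bad proportions", "poorly drawn hands", "blurry", "watermark",
]
negative_prompt = ", ".join(negative_terms)
print(negative_prompt)
```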
This is the Sampling method.
This is where you choose the algorithm for the diffusion/denoising process. There are a ton of sampling methods you can choose from. Don’t feel overwhelmed or put off by the complexity of their names. In the grand scheme of things, they all do the same thing (solve equations). Some of them do give slightly different results and have slight advantages in terms of speed & processing power, but choosing the “right” one should not be your concern.
This is where you set the Sampling Steps.
Sampling Steps are the number of steps over which the image is “denoised” during the diffusion process. Typically, the more steps, the better. To save time & VRAM, common practice is to stay within 20 – 40.
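As a purely conceptual sketch: each step removes part of the remaining noise, so more steps refine the image further, with diminishing returns. Real samplers solve a differential equation; the simple decay below is just an illustration, not what the samplers actually compute.

```python
# Conceptual illustration only: each "step" strips away a fraction of
# the remaining noise (decay rate is made up).
def remaining_noise(steps, decay=0.8):
    noise = 1.0
    for _ in range(steps):
        noise *= decay  # one denoising step
    return noise

print(round(remaining_noise(20), 4))  # much cleaner than 1 step
print(round(remaining_noise(40), 6))  # cleaner still, but diminishing gains
```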
Next is the Width & Height section.
Pretty self-explanatory. This is the size of the output image.
V1.5 models (such as the Dreamshaper model we will be using) are trained on images with the size of 512×512.
When generating images with a V1.5 model, it is advised to set at least one side to 512 pixels and to never exceed 768 pixels, in order to produce quality images.
Oftentimes, I get away with breaking this rule and still produce a good image with an aspect ratio of 16:9 or 9:16 by setting the dimensions to 910×512.
For XL models, the native resolution is 1024×1024. It is still advised to set one side to 1024 pixels, but you can be a bit more flexible. But to maintain quality, never exceed 1536 pixels.
Of course, I will cover this more in-depth later on.
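The sizing rules above can be sketched as a small helper. This is a hypothetical function of my own, not part of A1111: it keeps the short side at the V1.5 native 512 pixels and derives the long side from the aspect ratio, which reproduces the 910×512 setting mentioned above for 16:9.

```python
# Hypothetical helper: derive SD 1.5 dimensions from an aspect ratio,
# keeping the short side at the native 512 pixels.
def dims_for_ratio(ratio_w, ratio_h, short_side=512):
    long_side = int(short_side * max(ratio_w, ratio_h) / min(ratio_w, ratio_h))
    if ratio_w >= ratio_h:
        return long_side, short_side  # landscape: width is the long side
    return short_side, long_side      # portrait: height is the long side

print(dims_for_ratio(16, 9))  # → (910, 512), the 16:9 setting above
print(dims_for_ratio(9, 16))  # → (512, 910)
print(dims_for_ratio(1, 1))   # → (512, 512)
```

For XL models you would pass `short_side=1024` instead.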
This is the Batch Count setting.
The Batch Count is where you will set the number of times you want to run the image generation using the parameters you have set.
This is the Batch Size parameter.
The Batch Size is where you set the number of images you want to generate each time you run the sequence using the parameters you have set.
What is the difference between Batch Size & Batch Count?
An easy way to remember is to think of generating an image like buying a gumball.
Batch Size is how many gumballs you want whenever you buy one.
Batch Count is how many times you want the machine to dispense the gumballs.
If you have a batch count of 2 and a batch size of 4, you will produce 8 images, as the sequence will be run twice, each run producing 4 images. Usually you will only lean on the batch count instead of the batch size if you run into memory issues. Maximum efficiency is having a batch size of 4.
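The arithmetic is simply multiplication, since each of the `batch_count` runs produces `batch_size` images in parallel:

```python
# Total images produced = runs (batch count) × images per run (batch size).
def total_images(batch_count, batch_size):
    return batch_count * batch_size

print(total_images(2, 4))  # → 8, matching the example above
```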
This is the Classifier-Free Guidance (CFG) Scale.
This parameter controls how much the checkpoint should adhere to your positive & negative prompt.
A lower value lets the model largely ignore your prompt, whereas a higher value makes the model follow your prompt strictly.
Giving the model some creative freedom is advised; I personally stick to the 5 – 10 range for most of my work.
Having too high of a CFG value can give your image a “saturated” look, whereas having too low of a value most often produces nightmare fuel.
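Under the hood, classifier-free guidance mixes two noise predictions at each step: one conditioned on your prompt and one unconditioned. The standard combination is `uncond + scale * (cond - uncond)`. The sketch below uses made-up toy numbers; in reality these are large latent tensors, not short lists.

```python
# Minimal sketch of the standard classifier-free guidance formula.
# Toy values only; real predictions are latent-space tensors.
def apply_cfg(uncond_pred, cond_pred, cfg_scale):
    # Push the prediction away from "no prompt" toward "prompt",
    # scaled by the CFG value.
    return [u + cfg_scale * (c - u) for u, c in zip(uncond_pred, cond_pred)]

uncond = [0.10, -0.20]  # prediction ignoring the prompt
cond = [0.30, 0.10]     # prediction following the prompt
print(apply_cfg(uncond, cond, 7.0))
```

You can see why an extreme scale overshoots: the larger the scale, the further the result is pushed past the conditioned prediction, which is what produces that "saturated" look.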
This is the Seed.
Technically speaking, the seed value is used to generate the initial random tensor in the latent space.
It essentially controls the content that is in the image. Every single image that is generated has its own seed value.
A seed value of “-1” means that Automatic1111 will pick a random seed for each generation.
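This is why a fixed seed reproduces an image: the seed fully determines the starting noise. In the sketch below, a standard-library RNG stands in for the latent-space random tensor; it is an analogy, not A1111's actual noise generator.

```python
import random

# Sketch: the seed fully determines the initial "noise" the diffusion
# process starts from (stdlib RNG standing in for the latent tensor).
def initial_noise(seed, size=8):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

same_a = initial_noise(1234)
same_b = initial_noise(1234)
different = initial_noise(5678)
print(same_a == same_b)     # → True: same seed, same starting noise
print(same_a == different)  # → False: new seed, new starting noise
```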
This is the Re-use Seed button.
Use this button to re-use the seed from the last image/animation you have created.
This is helpful when you want to keep the overall look of the image you have just generated, or when recreating an image you found online in an instance where the user has provided their workflow (prompt, checkpoint, sampling steps, CFG, & seed).
This is the Randomize Seed button.
This is the default setting (-1).
Click this button to reset the value to -1 so that random seed values are generated again after you have entered or re-used a previous seed.
Click this “Extra” box to bring up extra seed settings.