Skip to main content
Foundations & Tools

AI Image & Video Glossary 2026: A-Z of Terms

29 min read
AI Image and Video Glossary 2026, A-Z of Terms, hero graphic on a dark blue background with a white and orange title and a grid of four outlined letter tiles A, I, V, S with small icons

Hero image is AI-generated. See our AI-disclosure policy.

TL;DR: This is a plain-English dictionary of the words you meet when using AI image and video tools, covering the whole pipeline in one place: model concepts (diffusion, latent space, autoregressive), image and video controls (seed, CFG scale, img2img, image-to-video, temporal consistency, keyframes), prompting and editing, and the business and legal terms most glossaries skip (commercial license, IP indemnification, C2PA, SynthID, EU AI Act Article 50). Each term gives a one-line definition, a short plain explanation, why it matters, and a real example. The biggest under-explained area is commercial risk: you usually own your AI output, but most consumer plans make you indemnify the vendor, and from 2 August 2026 the EU AI Act requires AI content to be marked machine-readably.

This is a plain-English dictionary of the words you actually meet when you use AI image and video tools. It covers the whole creative pipeline in one place: the model concepts underneath the tools, the controls you adjust when generating images and video, the prompting and editing vocabulary, and the business and legal terms that decide whether your work is safe to sell. Every entry gives you a one-line definition, a short explanation, why it matters, and a real example in a tool you can actually use.

If you want a directory of the image generators themselves rather than the words, see our companion Best AI Image Generators 2026: A-Z Encyclopedia. This page is the words; that page is the tools.

The single most under-explained area in any glossary is commercial risk. Here is the short version up front: you can usually use and even sell what you generate, but on most consumer plans you, not the AI company, carry the legal liability if an output triggers a copyright claim, and purely AI-made images generally cannot be copyrighted at all. From 2 August 2026 the EU AI Act also requires AI-generated content shown to EU audiences to be marked in a machine-readable way. Those terms are defined in plain language below.

A quick note on accuracy: where a definition carries a date, price, or version, it was verified against an official source as of June 2026. Examples use tools we teach and recommend.

A

AI disclosure

Telling your audience that content was made or significantly changed by AI. Disclosure now ranges from a visible label to a machine-readable tag inside the file, and for some content it is a legal requirement rather than a courtesy. Why it matters: failing to disclose commercial AI content can mean takedowns, lost monetization, or legal exposure. For the rules and how to apply them, see AI Disclosure Compliance 2026.

Aspect ratio

The width-to-height shape of an image or video, written as a ratio such as 16:9 (widescreen), 9:16 (vertical for phones), or 1:1 (square). Set it before you generate, because cropping a finished image to a new shape can cut off your subject. Some tools expose it as a preset and some as a flag. Example: ask Google Gemini for a 9:16 image when you want a phone-shaped result.

Autoregressive model

A model that builds its output one piece at a time, each step based on what came before, the same way a language model predicts the next word. This sequence-by-sequence approach is especially strong at following instructions and rendering legible text inside an image. GPT Image 2.0, the engine inside ChatGPT, uses an autoregressive design, which is part of why it handles on-image text and precise layouts so well.

Avatar (digital human)

A synthetic on-screen presenter that speaks a script you provide. An avatar tool combines a face model, a synthetic or cloned voice, and lip-sync so a single creator can produce talking-head video without filming. Why it matters: it scales training, explainer, and marketing video without a camera. Example: HeyGen builds a reusable presenter that delivers a script as spoken video.

B

Background removal

Automatically cutting the subject out from its background so you can drop it onto a new one. Modern tools use semantic segmentation to find the subject’s edges down to the pixel and export it with a transparent background. Why it matters: it makes clean product shots and avatar backgrounds in seconds, with no manual cutting.

Batch

Generating several images at once from the same prompt, each with a different random starting point, so you can pick the best. Why it matters: it speeds up exploration, though some plans bill per image in the batch.

B-roll

Secondary footage that supports your main shot, such as scenery or cutaways. AI image-to-video is increasingly used to generate B-roll on demand, which also helps hide cuts between AI clips. Example: generating a few seconds of a city street in Kling to cover a transition.

C

Camera control

Directing the virtual camera in a video model using filmmaking terms: pan (rotate horizontally), tilt (rotate vertically), dolly (move the camera toward or away), and zoom (magnify without moving). Why it matters: deliberate camera moves are what separate a static clip from a cinematic one. Example: prompting Seedance for a slow dolly-in on the subject.

CFG scale (guidance scale)

A dial for how strictly the model follows your prompt versus how much freedom it takes. Set it too low and the model drifts from your request; too high and the image can look harsh and over-cooked. Most modern models do best in a moderate range rather than at the extremes. Why it matters: it is the fastest fix when output ignores your prompt (raise it) or looks burnt and unnatural (lower it).

Checkpoint

A saved snapshot of a fully trained model, distributed as a single downloadable file in the open-source world. Different checkpoints have different default looks, such as photoreal versus illustrative. Why it matters: it mostly concerns people running models locally, but it explains why two open-source setups produce different results from the same prompt.

Color grading

Adjusting the color, contrast, and tone of footage to set a mood and to make separate shots look like one piece. Because AI clips often come out slightly different from each other, grading is usually needed to unify them. Why it matters: it is the difference between raw clips and a finished, cohesive look.

Commercial license

The part of a tool’s terms that says whether, and how, you may use its outputs to make money. Permission to sell AI work comes from this license, not from copyright, and some tools allow commercial use only on paid tiers while free tiers are personal-use only. Why it matters: before you sell AI work, confirm the plan you used grants commercial rights. Example: paid OpenAI plans grant you ownership and the right to use your generated images commercially.

Consistency (character and scene)

Keeping the same person, object, or setting looking identical across multiple generations or shots. Because generation involves randomness, a character described only as “a man in a red hat” will look like a different person each time unless you anchor it with reference images, a fixed seed, or a model feature built for it. Why it matters: without it, a multi-shot story falls apart into a slideshow of strangers.

Content Credentials (C2PA)

An open standard that attaches signed, tamper-evident metadata to a file recording that it is AI-generated and what edits it has had. Because it lives in the file’s metadata, it can be stripped if the file is re-saved without it. Why it matters: it is the disclosure layer that platforms and regulators are converging on. Example: GPT Image 2.0 embeds C2PA Content Credentials in its exported images by default.

Content provenance

The verifiable history of where a piece of content came from and how it was changed. The industry approach pairs signed metadata (C2PA) with an invisible watermark (such as SynthID) so origin survives even if metadata is stripped. Why it matters: it is becoming the backbone of trust and AI disclosure online.

ControlNet

An add-on that forces image generation to follow an extra structural input, such as a pose, a depth map, or an edge outline, instead of placing things randomly. Why it matters: it turns generation from a slot machine into a precise drafting tool, mainly in the open-source ecosystem. Example: using a pose input so a character stands in an exact stance you supplied.

Who, if anyone, owns the rights to AI-generated work. In the United States, the Copyright Office holds that purely AI-generated content is not copyrightable; meaningful human authorship is required, and on 2 March 2026 the Supreme Court declined to revisit the rule that a work needs a human creator. Why it matters: you may be allowed to use and sell an image under the tool’s license yet still not own enforceable copyright in the raw output unless you add substantial human creativity. This is general information, not legal advice, and rules vary by country.

D

Deepfake

Synthetic media that convincingly shows a real person doing or saying something they never did. The technology itself is neutral, but the word usually means deceptive or non-consensual uses. Why it matters: making deepfakes of real people without consent is a legal and ethical minefield, and it is the main reason provenance and disclosure rules exist.

Denoising

The core action inside a diffusion model: starting from random static and removing noise step by step, guided by your prompt, until a clear image remains. Why it matters: it explains why generation takes several steps and why changing the starting noise changes the whole result.

Denoising strength

In image-to-image and inpainting, how much the model is allowed to change the original, from 0 (barely touched) to 1 (effectively a new image). Keep it low to preserve a real photo and make subtle edits; raise it to reinvent the image. Why it matters: it is the key preserve-versus-transform slider, and keeping it low protects a real face during edits or upscaling.

Diffusion model

A generator that creates an image or video by gradually removing noise from random static until a result appears that matches your prompt. It is the dominant approach for most 2026 image and video models. Why it matters: it is why output appears over a number of steps and varies run to run unless you fix the seed. Example: Seedance and Kling are diffusion-based video models.

Diffusion Transformer (DiT)

A model that combines diffusion (denoising) with a transformer backbone, which scales better and gives higher quality than the older approach. In 2026 it is the standard backbone for leading image and video models. Why it matters: when a model is called DiT-based, expect strong prompt-following and high fidelity.

Dubbing

Translating and replacing a video’s spoken audio into another language while matching timing, often with a cloned voice and adjusted lip movements. Why it matters: it lets one video reach global audiences without re-recording. Example: ElevenLabs offers automated dubbing across more than two dozen languages.

Duration

The length of a generated clip, usually a handful of seconds per generation because longer clips are harder to keep consistent. Why it matters: you plan projects as short shots stitched together rather than one long take.

E

EU AI Act Article 50

The European transparency rule requiring AI-generated or manipulated content to be marked in a machine-readable way and deepfakes to be disclosed. The obligations apply from 2 August 2026. A later agreement gave systems already on the market a short grace period for the machine-readable marking requirement, extending that specific obligation to 2 December 2026; new systems must comply from 2 August. Penalties can reach 15 million euros or 3 percent of global annual turnover. Why it matters: if your AI content reaches EU audiences, compliant marking stops being optional this year. Example: GPT Image 2.0 and Google’s image tools ship machine-readable provenance by default.

F

Fair use

A United States legal doctrine that allows limited use of copyrighted material without permission, and the main defense AI companies cite for training on web data. Whether training on copyrighted images is fair use is unresolved, with major lawsuits still in active litigation in 2026. Why it matters: it is the open question behind “was this model trained legally,” so treat confident claims either way with caution. Fair use is US-specific and is general information, not legal advice.

Fine-tuning

Further training of an existing model on a focused set of data so it specializes in a style, a character, or a brand. Full fine-tuning is expensive; lightweight methods like LoRA get much of the effect cheaply. Why it matters: it is how studios make a model reliably reproduce a specific look.

Flicker and morphing

Common AI video faults where details shimmer between frames (flicker) or a subject warps into something else (morphing). Both are failures of temporal consistency and get worse with fast motion or long clips. Why it matters: spotting them tells you to shorten the clip, reduce motion, or switch models.

Foundation (base) model

A large, general-purpose model trained on broad data that serves as the starting engine for specialized work built on top. Why it matters: it is the big underlying engine, as distinct from the fine-tunes and small add-ons layered onto it. Example: GPT Image 2.0, Nano Banana Pro, Seedance, and Kling are foundation models you reach through apps or APIs.

Frame interpolation

Creating new in-between frames to raise a video’s frame rate and smooth its motion, such as turning 24 fps into 60 fps. It works best on smooth, slower motion and struggles with fast action. Why it matters: it smooths AI clips and enables slow motion, but overusing it can flatten the deliberate timing of stylized animation.

Frame rate (fps)

How many frames play each second. 24 fps looks cinematic, while 30 and 60 are common for the web; higher frame rates look smoother but cost more to generate. Why it matters: it affects both the feel of the footage and how well it cuts alongside other clips.

G

GAN (Generative Adversarial Network)

An older approach where two networks compete, one generating images and one judging them, improving each other. Diffusion and transformer models have largely replaced GANs for general generation, but GAN-based systems are still common in fast, specialized jobs like upscaling and real-time face work. Why it matters: you will still meet GAN-based tools in upscaling and avatars even though new generators use other methods.

Generative fill

Selecting part of an image and having AI replace or extend it from a prompt. It is a friendly, prompt-driven form of inpainting and outpainting. Why it matters: it is everyday region editing under a more approachable name.

H

Hallucination

When an AI invents content that was not in the input and is not true, while presenting it as real. In photo restoration this shows up as the model inventing a face or details that were never in the original. Why it matters: AI fills gaps by guessing, so always check faces and facts against the source. Example: forget to attach a photo and ask a model to “restore this,” and it may fabricate a convincing but fake antique image. See this handled in practice in how to restore old photos with AI.

I

Image-to-image (img2img)

Giving the model a starting image plus a prompt so it transforms the existing picture rather than starting from scratch. How much it changes is set by denoising strength. Why it matters: it is the basis of restyling and “make this look like X.” Example: uploading a backyard photo to Gemini and asking Nano Banana Pro to redesign the garden.

Image-to-video

Animating a still image into a short clip, optionally guided by a prompt. You supply a starting frame and the model generates motion from it. Why it matters: it is the natural next step after making an AI image. Example: feeding a product still into Seedance to get a slow push-in shot.

Indemnification (IP indemnification)

A promise by the AI provider to cover your legal costs if an output triggers an intellectual-property claim. Most consumer plans do the opposite and make you indemnify the provider, so the liability sits with you. As of 2026, indemnification is the exception, offered mainly on enterprise tiers. Why it matters: this is the biggest hidden risk in paid work. Example: Adobe Firefly’s enterprise indemnification, backed by its licensed training library, is the strongest in the market, while standard consumer chat plans typically offer none.

Inference

Running a trained model to produce an output, as opposed to training it. Every image or video you generate is one inference, and API pricing is usually per inference. Why it matters: it is the unit you actually pay for and wait on.

Inpainting

Editing a selected region of an image while leaving the rest untouched. You mask an area and the model regenerates only inside it, matching the surroundings. Why it matters: it is the precise tool for removing an object or fixing one element. Example: removing a photobomber from a family photo while keeping everyone else identical.

K

Keyframe (start and end frame)

A fixed anchor image at a specific point in a clip that the model must hit. Setting a start and an end frame lets the model fill in a smooth transition between them. Why it matters: it gives you directorial control over where a shot begins and ends. Example: giving Kling a “before” and “after” frame and letting it morph between them.

L

Latent space

A compressed mathematical map where the model represents images as points instead of working pixel by pixel. The model generates there and then decodes back to pixels. Why it matters: it explains why similar prompts give similar images and why blends between two prompts look smooth.

LoRA (Low-Rank Adaptation)

A small add-on file that nudges a base model toward a specific style or subject without retraining the whole thing. LoRAs are popular in the open-source image world because they are tiny and stackable. Why it matters: when someone says “use a LoRA for that style,” they mean a small specialization layer, not a new model.

M

Metadata

Information about a file, such as its creator, date, and edit history. C2PA provenance is a signed, tamper-evident form of metadata, while ordinary metadata is easily stripped or changed. Why it matters: knowing metadata can be removed is why an invisible watermark is also needed.

Motion (motion control)

How much movement appears in a generated clip and your ability to direct it, set by your prompt or by model controls. Too much motion can break consistency. Why it matters: it is the lever between a safe static clip and a dynamic but riskier one.

Multimodal (native multimodal)

A model that handles more than one kind of data, such as text, image, and audio. “Native multimodal” means it was built to handle them together rather than bolted together afterward, so newer video models can generate synchronized sound and picture in one pass. Why it matters: it is why a single prompt can produce a clip with matching dialogue and effects. Example: Seedance 2.0 is described as a native multimodal audio-video model.

N

Negative prompt

A list of things you do not want in the result, such as “blurry” or “extra fingers.” Open-source tools use a dedicated field for this, while chat tools handle it conversationally. Why it matters: it removes recurring artifacts without rewriting the whole prompt.

O

Open-weight model

A model whose trained weights are published so people can download, run, and adapt it themselves, often under conditions. Open weights are frequently released for non-commercial use, with a separate paid license required for commercial production. Why it matters: open weights enable self-hosting and deep customization, but check the license before selling the output. Example: Ideogram 4.0 was released with open weights, with a commercial license required for production use.

Outpainting

Generating new content beyond the edges of an image to extend the canvas, with the model inventing surroundings that match the existing scene. Why it matters: it widens a too-tight crop or changes the aspect ratio without cutting the subject. Example: extending a vertical portrait into a wide banner.

P

Parameters (model)

The internal numbers a model learns during training, used loosely as a measure of its size and capacity. More parameters can mean more capability but also higher cost. Why it matters: it helps when comparing models, though parameter count alone does not equal quality.

Parameters and flags (prompting)

Short switches added to a prompt to set options like aspect ratio or stylization, such as an “—ar” flag for aspect ratio in some tools. Chat tools usually take plain language instead. Why it matters: knowing them speeds up tools that use them. Tools such as Midjourney use these flags; they are named here only as examples of the syntax.

Photorealism

Output that looks like a real photograph rather than an illustration, achieved with camera and lens language and realistic lighting. Why it matters: it is the goal for product, headshot, and restoration work, and the right vocabulary makes a big difference. Example: GPT Image 2.0 for photoreal portraits. For technique, see our photorealistic AI prompts guide.

Prompt

The instruction you give the model describing what you want. Good prompts specify subject, style, lighting, camera, and mood, and modern tools follow long, detailed prompts well. Why it matters: it is the single biggest lever on output quality.

Prompt adherence

How faithfully the output matches what you asked for, including the specific elements, counts, and layout. Why it matters: it is a primary way creators compare models. Example: Nano Banana Pro is noted for strong prompt understanding.

Prompt engineering

The craft of writing instructions that reliably get the result you want, covering structure, specificity, camera and lighting language, and iteration. Why it matters: it is the highest-leverage skill in AI creation, because small wording changes shift the output a lot.

R

Reference image

An image you supply to steer style, subject, or composition. Chat tools accept reference images directly, and some keep a subject consistent across generations. Why it matters: it is often more reliable than describing a style in words. Example: uploading a brand logo to Nano Banana Pro so it persists across product shots.

Render

Producing the final output file from your settings, whether that is a single generated image or an edited timeline exported to video. Why it matters: it is the final, sometimes slow, export step.

Resolution

The pixel dimensions of an image or video, such as 1920x1080. Higher resolution holds more detail but costs more to generate, and many tools generate at a base size then upscale. Why it matters: it determines print size and crispness.

Rotoscoping

Isolating a moving subject from its background frame by frame. Traditionally a manual job, it is now largely automated by AI for video compositing. Why it matters: it is key to placing a subject onto a new background in motion.

S

Sampler (scheduler)

The algorithm that decides how noise is removed at each step of diffusion. Different samplers trade speed against quality and give slightly different looks at the same settings. Why it matters: it is mostly an advanced control, but it explains why the same prompt differs across tools.

Seed

The number that sets the starting random noise. The same seed plus the same prompt and settings produces the same image. Change it for variety, or lock it to reproduce a result and make small controlled tweaks. Why it matters: it is how you get repeatable results and iterate without losing a look you like.

Segmentation

Automatically identifying and outlining specific objects or regions in an image or video, often used as the first step in editing, masking, or background removal. Modern segmentation models can track an object across video frames. Why it matters: it underpins precise editing tools, letting you select exactly what you want to change.

Self-attention

The mechanism inside a transformer that decides which other parts of the input to weigh most when producing each part of the output. In video, attention across frames helps keep a subject consistent over time. Why it matters: it is the technical reason today’s tools follow long prompts and hold subjects steadier than older ones.

Steps

How many denoising iterations the model runs to make an image. More steps can add detail up to a point, then give diminishing returns and cost more time. Why it matters: raising steps is a common quality lever, but past a certain point it just wastes time.

Style transfer

Applying the visual style of one image or a named style to the content of another, done with reference images or prompts such as “in the style of a watercolor.” Why it matters: it keeps your subject while restyling the look in one step.

SynthID (invisible watermark)

A pattern embedded directly into the pixels, audio, or tokens that marks content as AI-generated and survives compression and screenshots. Unlike metadata, it stays inside the content itself, so it is harder to remove. Why it matters: it is the robust half of provenance and is central to meeting transparency rules. Example: Google’s image tools apply SynthID, and GPT Image 2.0 pairs C2PA with an equivalent watermark.

T

Temporal consistency

Keeping characters, objects, lighting, and color stable across video frames so nothing flickers or morphs. It is achieved with attention that links frames and is the single biggest quality jump in 2026 video. Why it matters: it separates believable clips from the melting, flickering look of older AI video. Example: Seedance is positioned for strong realism and steady subjects. For a model comparison, see Seedance vs Kling vs Veo.

Text-to-image (txt2img)

Creating a new image from a written description alone, with no starting picture. Why it matters: it is the default mode of ChatGPT and Gemini image generation. Example: typing “a golden retriever in a sunlit kitchen, photo” into ChatGPT.

Text-to-speech (TTS)

Turning written text into spoken audio with a synthetic voice. Modern TTS is expressive and multilingual and is the backbone of AI voiceover. Why it matters: it removes the need to record narration yourself. Example: ElevenLabs for voiceover.

Text-to-video

Generating a video clip from a written prompt alone, with the model inventing motion, lighting, and often audio. Why it matters: it is the headline feature of tools like Seedance and Kling.

Training data

The images, video, and text a model learned from. Its makeup shapes a model’s style and the legal questions around its outputs. Why it matters: models trained on licensed or owned data carry less legal risk than those trained on scraped web data. Note that most providers, including OpenAI, do not fully disclose their training data; Adobe is a notable exception, training Firefly on its licensed stock library.

Transformer

A model architecture built around attention that weighs how much each part of the input matters to every other part, and processes data in parallel. It underpins both language models and modern image and video models, often combined with diffusion as a Diffusion Transformer. Why it matters: it is the main reason today’s tools follow long, detailed prompts so much better than early ones.

U

Upscaling

Increasing an image’s resolution while adding plausible detail, not just stretching pixels. Why it matters: it is how a small old photo becomes large and sharp enough to print.

V

VAE (Variational Autoencoder)

A component that encodes images into latent space and decodes the model’s result back into viewable pixels. A poor VAE can wash out colors. Why it matters: it mostly concerns advanced users, but it explains occasional color or detail shifts after generation.

Voice clone

Recreating a specific person’s voice from samples so it can speak new text. It requires consent and is central to dubbing and narration. Why it matters: it scales narration and localization, but it carries clear consent and disclosure duties. Example: ElevenLabs for voice and narration.

W

Watermark

A mark added to content to signal its origin. In AI, the important kind is an invisible watermark such as SynthID embedded in the pixels, as opposed to a visible logo overlay. Why it matters: invisible watermarks are harder to remove than metadata and are part of meeting disclosure rules.

World model

A model that builds an internal sense of how a scene and its objects behave over space and time, aiming for physically coherent, longer video. Why it matters: when a video model claims world-model abilities, expect a pitch about physical realism and longer coherent shots.

Image term vs video term: a quick map

Infographic mapping image terms to video terms on a dark blue background: text-to-image to text-to-video, image-to-image to image-to-video, inpainting to region video edit, upscaling to interpolation, reference image to keyframe

If you already know the image vocabulary, here is how it carries over to video.

Image worldVideo equivalentWhat carries over
text-to-imagetext-to-videoGenerate from a prompt with no source media
image-to-imageimage-to-videoStart from a supplied image and transform or animate it
inpainting (region edit)masked or region video editChange one area, keep the rest
upscaling (more pixels)frame interpolation (more frames)Raise quality: detail per frame versus frames per second
seedseedSame role: reproducibility of a result
CFG scaleprompt adherence and guidanceHow strictly the output follows the prompt
reference imagekeyframe or referenceAnchor the look or a specific frame

Provenance and compliance at a glance

Infographic titled How AI Content Is Marked comparing C2PA signed metadata, SynthID invisible watermark, and plain metadata, noting EU AI Act Article 50 machine-readable marking from 2 August 2026

Verified June 2026.

C2PA / Content CredentialsSynthID / invisible watermarkPlain metadata
What it isSigned metadata manifestInvisible pattern in the pixels or audioUnsigned file info
What it provesOrigin plus full edit historyThat content is AI-generatedClaimed origin (weak)
Can it be removed?Yes, if re-saved without itHard; survives compression and screenshotsEasily removed or edited
Ships by default inGPT Image 2.0, Adobe, Google toolsGoogle tools, GPT Image 2.0Most files
Counts for EU AI Act Article 50Yes, as machine-readable markingYes, as machine-readable markingNot sufficient alone

Commonly confused terms

Infographic of easily confused AI terms with one-line differences: txt2img vs img2img, inpainting vs outpainting, upscaling vs interpolation, seed vs CFG scale, license vs copyright

  • txt2img vs img2img: txt2img builds an image from words only; img2img transforms an image you supply.
  • image-to-video vs text-to-video: image-to-video animates a still you provide; text-to-video invents the whole clip from words.
  • inpainting vs outpainting: inpainting edits inside the frame; outpainting adds new content beyond the edges.
  • upscaling vs frame interpolation: upscaling adds detail per frame; interpolation adds frames for smoother motion.
  • fine-tuning vs LoRA: fine-tuning retrains the whole model; a LoRA is a small add-on that adjusts a base model without retraining it.
  • seed vs CFG scale: seed fixes the starting noise for reproducibility; CFG sets how strictly the model follows the prompt.
  • diffusion vs autoregressive: diffusion denoises the whole image over steps; autoregressive builds it piece by piece in sequence.
  • C2PA vs SynthID: C2PA is signed metadata attached to the file; SynthID is an invisible watermark inside the pixels.
  • commercial license vs copyright: a license lets you use or sell the output; copyright is ownership, which AI-only output usually lacks.
  • indemnification vs ownership: ownership says the output is yours; indemnification says who pays if it triggers a claim, which on most consumer plans is you.

Frequently asked questions

What is the difference between img2img and image-to-video?

Both start from an image you supply. Image-to-image keeps the result a still picture and transforms its look or content. Image-to-video animates that still into a short moving clip. The shared idea is that you begin from a source image rather than from words alone.

Do I own the images and videos I make with AI?

Usually you can use and even sell them, because the tool’s commercial license grants that right on the appropriate plan. But owning enforceable copyright is different: in the United States, purely AI-generated output generally cannot be copyrighted without meaningful human authorship. Check the specific tool’s terms, and remember this is general information rather than legal advice.

What is the safest option for commercial client work?

Look for two things in the terms: a clear commercial license on the plan you are using, and IP indemnification, where the provider agrees to cover legal costs if an output triggers a claim. Most consumer chat plans make you carry that liability instead. Adobe Firefly, trained on a licensed library, offers the strongest indemnification, while standard consumer plans typically offer none.

What is CFG scale and what should I set it to?

CFG scale controls how strictly the model follows your prompt. Too low and it drifts from your request; too high and the image looks harsh and over-cooked. Most modern models do best in a moderate range, so if your output looks burnt, lower it, and if it ignores your prompt, raise it.

Do I have to label AI content?

Increasingly, yes. From 2 August 2026 the EU AI Act requires AI-generated or manipulated content shown to EU audiences to be marked in a machine-readable way, with deepfakes disclosed. Many tools now add this marking automatically through C2PA and invisible watermarks. For the practical steps, see AI Disclosure Compliance 2026.

Keep going

This glossary pairs with our Best AI Image Generators 2026: A-Z Encyclopedia for the tools, our GPT Image 2.0 vs Nano Banana Pro comparison for choosing an image model, and Seedance vs Kling vs Veo for video. Bookmark this page and check the date in the header, because the vocabulary moves fast.

Last reviewed by Daniel Riley on · per our editorial standards.