Loading LoRA weights does not change anything

Hi!

So I have downloaded a LoRA from Civitai (this one: Fitzpatrick style - FLUX | Flux LoRA | Civitai) and stored it locally.

Then I created a pipeline

import torch
from transformers import Mistral3ForConditionalGeneration
from diffusers import AutoModel, Flux2Pipeline

torch_dtype = torch.bfloat16   # defined at module level (bfloat16/CUDA assumed here)
device = "cuda"

def generate_image_using_flux2(prompt: str):
    repo_id = "diffusers/FLUX.2-dev-bnb-4bit"

    text_encoder = Mistral3ForConditionalGeneration.from_pretrained(
        repo_id, subfolder="text_encoder", torch_dtype=torch_dtype, device_map="cpu"
    )
    dit = AutoModel.from_pretrained(
        repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
    )
    pipe = Flux2Pipeline.from_pretrained(
        repo_id, text_encoder=text_encoder, transformer=dit, torch_dtype=torch_dtype
    )
    pipe.enable_model_cpu_offload()
    pipe.enable_attention_slicing()

    pipe.vae.enable_slicing()

    pipe.load_lora_weights("./jimfitzpatrick-fluxlora.safetensors", weight_name="jimfitzpatrick-fluxlora.safetensors")

    return pipe(
        prompt=prompt,
        generator=torch.Generator(device=device).manual_seed(42),
        num_inference_steps=24,
        guidance_scale=4
    ).images[0]

However, the generated image stays the same with or without the LoRA. Neither the style nor anything else changes. What am I missing?


We don’t have quite enough information to fully identify the problem…


If the image is bit-identical (same seed, same prompt, same steps) with and without the LoRA, then the LoRA did not affect the forward pass at all. In Diffusers terms, either (1) the LoRA weights never got attached (key mismatch or loader path issue), or (2) they attached to the wrong component, or (3) you hit a known Flux + 4-bit LoRA bug/regression.

Below is the shortest path to a working setup, with the background for why each change matters.


Background: how LoRA “changes the image” in Diffusers

A LoRA is a set of small matrices that get injected into specific linear layers of the base model. For diffusion models that means:

  • Denoiser (UNet for SD1.5/SDXL, or Transformer/DiT for Flux)
  • Sometimes also text encoder

Diffusers load_lora_weights() is supposed to:

  1. read the LoRA tensors,
  2. match their keys to modules in the pipeline,
  3. install adapter hooks so inference uses modified weights.

If matching fails, Diffusers can end up effectively doing nothing. Some Flux LoRA issues even describe “no error, but the style isn’t applied and output is identical.” (GitHub)
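
You can ask the pipeline what it actually attached. A minimal sanity check, assuming a recent Diffusers with PEFT installed (empty results mean nothing was injected, so the LoRA cannot change the output):

pipe.load_lora_weights("./jimfitzpatrick-fluxlora.safetensors")

# Which components received adapters, and which are currently active?
print(pipe.get_list_adapters())    # e.g. {"transformer": ["default_0"]} if injection worked
print(pipe.get_active_adapters())  # empty means no adapter will influence inference

# The denoiser itself should now carry a PEFT config
print(getattr(pipe.transformer, "peft_config", None))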


The two biggest problems in your code

1) You load the Flux.2 transformer as AutoModel

For Flux.2, Hugging Face’s own blog shows you should instantiate the denoiser as Flux2Transformer2DModel, not a generic AutoModel. (Hugging Face)

Why it matters:

  • LoRA injection depends on model class structure and module naming.
  • A generic auto class can still run inference but can break adapter injection or key matching.

2) You are using the “single file” LoRA path in a risky way

The Diffusers docs show load_lora_weights(pretrained_model_name_or_path, weight_name=...) where the first argument is typically a repo id or directory, and weight_name selects the file. (Hugging Face)

Passing a file path and a weight_name is easy to get wrong across versions and loaders. If Diffusers treats the first argument as a directory, it can fail to locate the file cleanly or load an empty state dict.
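
If you keep the local file, the unambiguous form is “directory + weight_name” (a sketch; adapter_name is optional, but it makes the set_adapters() step below easier):

pipe.load_lora_weights(
    ".",                                               # directory that contains the file
    weight_name="jimfitzpatrick-fluxlora.safetensors",
    adapter_name="jimfitz",
)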


Flux-specific pitfall: “LoRA scaling” is not what you think

In Flux pipelines, lora_scale is documented as applying to text encoder LoRA layers. It is not a guaranteed “global LoRA strength” knob for the transformer denoiser. (GitHub)

There is also a long-standing Flux issue where changing lora_scale produced no change with a fixed seed. (GitHub)

So for Flux, the reliable test is:

  • confirm adapters actually loaded,
  • force a huge adapter weight,
  • confirm output changes.

Flux + 4-bit: you may be on a version where LoRA is broken

Diffusers v0.32.2 explicitly says it “fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models.” (GitHub)

There are also Flux issues about LoRA failing when transformers are quantized to 4-bit. (GitHub)

So even if your code is “conceptually right”, an older Diffusers release can give exactly your symptom: LoRA loads but has no effect.


What I would do for your exact case (in order)

Step 1: Print versions first

You want to know if you are before or after the 4-bit Flux LoRA regression fix.

import diffusers, transformers
print("diffusers", diffusers.__version__)
print("transformers", transformers.__version__)

If you are older than the fix, upgrade Diffusers to at least v0.32.2. (GitHub)

Also avoid random nightly dev builds if something breaks. There are issues where load_lora_weights works in one version and breaks in another, then works again after downgrade. (GitHub)


Step 2: Build the pipeline exactly like the Flux.2 reference code (critical)

Replace AutoModel with Flux2Transformer2DModel exactly as shown in the official Flux.2 post. (Hugging Face)


Step 3: Load the LoRA as “directory + weight_name”

This matches the documented pattern and avoids ambiguity. (Hugging Face)


Step 4: Force adapter activation and do an A/B test

Diffusers supports setting adapter scaling. The docs show set_adapters() for controlling scale. (Hugging Face)

Even if you load only one LoRA, forcing adapter weight removes ambiguity.
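
A minimal A/B check might look like this (a sketch; it assumes torch is imported, pipe and prompt are already defined, and the LoRA was loaded under the adapter name "jimfitz" as in the function below). If it prints True, the adapter is not touching the forward pass at all:

import numpy as np

def render(seed=42):
    # fixed seed, so only the adapter state differs between the two runs
    g = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt=prompt, generator=g,
                num_inference_steps=24, guidance_scale=4).images[0]

pipe.set_adapters("jimfitz", 2.0)   # deliberately exaggerated strength
with_lora = render()

pipe.disable_lora()                 # temporarily switch all adapters off
without_lora = render()
pipe.enable_lora()

print("identical:", np.array_equal(np.array(with_lora), np.array(without_lora)))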


A “known-good” rewritten version of your function

import torch
from transformers import Mistral3ForConditionalGeneration
from diffusers import Flux2Pipeline, Flux2Transformer2DModel

def generate_image_using_flux2(prompt: str):
    repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
    device = "cuda:0"
    torch_dtype = torch.bfloat16

    # Match the official Flux.2 4-bit loading pattern
    transformer = Flux2Transformer2DModel.from_pretrained(
        repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
    )
    text_encoder = Mistral3ForConditionalGeneration.from_pretrained(
        repo_id, subfolder="text_encoder", dtype=torch_dtype, device_map="cpu"
    )

    pipe = Flux2Pipeline.from_pretrained(
        repo_id,
        transformer=transformer,
        text_encoder=text_encoder,
        torch_dtype=torch_dtype,
    )

    # Load LoRA BEFORE offload to reduce device-map weirdness
    # Use directory + weight_name (doc pattern)
    pipe.load_lora_weights(
        ".",
        weight_name="jimfitzpatrick-fluxlora.safetensors",
        adapter_name="jimfitz",
    )

    # Force a very obvious strength for testing
    pipe.set_adapters("jimfitz", 2.0)

    pipe.enable_model_cpu_offload()

    img = pipe(
        prompt=prompt,
        generator=torch.Generator(device=device).manual_seed(42),
        num_inference_steps=24,
        guidance_scale=4,
    ).images[0]

    return img

Key points this fixes:

  • Correct transformer class for Flux.2 loading. (Hugging Face)
  • LoRA load path matches Diffusers adapter-loading docs. (Hugging Face)
  • Explicit adapter activation and exaggerated weight.

If this still produces identical output, then either the LoRA is incompatible (wrong base model family), or you are still hitting a version-specific LoRA bug for quantized Flux. (GitHub)


If it still doesn’t work: diagnose “incompatible LoRA” quickly

1) The LoRA may be for Flux.1 or SDXL, not Flux.2

This happens a lot with Civitai LoRAs. The symptom matches a known Diffusers issue: “silent erroring, doesn’t error but style isn’t applied and output is identical.” (GitHub)

Because I cannot open the Civitai page from here (site access fails), I cannot confirm what base it was trained for. You can still confirm locally:

Inspect the keys inside the safetensors file (a quick sketch follows the list below).

  • If you see lots of unet keys, it is SDXL or SD1.5 style.
  • If you see Flux-specific transformer keys but for Flux.1 shapes or names, it may need conversion.
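
A quick sketch using the safetensors library to peek at the key names (heuristics only; the exact prefixes vary per trainer):

from safetensors import safe_open

with safe_open("jimfitzpatrick-fluxlora.safetensors", framework="pt") as f:
    keys = list(f.keys())

print(len(keys), "tensors")
print("unet-style keys:       ", sum("unet" in k for k in keys))
print("transformer-style keys:", sum("transformer" in k for k in keys))
for k in keys[:10]:   # eyeball a sample of the raw names
    print(k)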

Flux LoRA key mismatch and conversion is a known topic (kohya conversion scripts are mentioned directly in Diffusers issues). (GitHub)

2) Try the LoRA on a non-quantized Flux.2 pipeline

Quantization is a known boundary where LoRA breaks. There are multiple Flux LoRA issues involving quantized transformers. (GitHub)

If it works unquantized but not on *-bnb-4bit, you have confirmed it is a quantization-path problem, not the LoRA file itself.
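
A sketch of that test (assuming black-forest-labs/FLUX.2-dev is the full-precision repo; it is gated, so it needs an accepted license and an HF login, and it requires far more memory than the 4-bit checkpoint):

import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.load_lora_weights(".", weight_name="jimfitzpatrick-fluxlora.safetensors")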


Prompt-trigger pitfall (less likely for “identical output”, but real)

Some LoRAs are trained with a trigger token and expect you to include it in the prompt. Diffusers docs explicitly show an example where you must include a token (“cnmt”) to trigger the LoRA. (Hugging Face)

If you never use the trigger token, the effect can be weak or look like “nothing changed,” especially at low CFG.
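
For example (the trigger phrase below is a placeholder; check the LoRA’s Civitai page for the real one):

prompt = "jimfitzpatrick style, poster of a lighthouse on the Irish coast"   # placeholder trigger phrase
image = pipe(prompt=prompt, num_inference_steps=24, guidance_scale=4).images[0]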


Similar “no effect” cases online (the ones that match your symptom)

  • Flux LoRA key mismatch leading to identical output, described as “silent erroring.” (GitHub)
  • Flux lora_scale not changing output with fixed seed. (GitHub)
  • Flux LoRA failing when transformer is quantized. (GitHub)
  • Diffusers release explicitly fixing a 4-bit Flux LoRA loading regression. (GitHub)
  • Version-specific load_lora_weights breakage reported on dev builds. (GitHub)


Summary

  • Your strongest fix: load the denoiser as Flux2Transformer2DModel, not AutoModel. (Hugging Face)
  • Load the LoRA as directory + weight_name, then force set_adapters(..., 2.0) to prove it affects output. (Hugging Face)
  • Ensure Diffusers includes the 4-bit Flux LoRA regression fix (v0.32.2 mentions it explicitly). (GitHub)
  • If still identical, assume LoRA incompatibility (Flux.1 or SDXL) or a quantization-path LoRA failure. (GitHub)

Wow, what an amazing answer!

Regarding the way I load the LoRA: that is because I did not want to create access tokens and just used plain downloads, like I do with checkpoints. I originally did not use an HF token either, but since some models are gated without one, I started using tokens too, so I can probably go with access tokens for Civitai as well.

I think the issue is that I was using a LoRA for Flux version 1 while using Flux version 2. As I am a noob, I actually did not pay attention to that originally. I realized the issue when I tried a different LoRA with a different model (one that actually matched) and saw that it worked fine.

My question is this, though: do I really need to provide the text encoder and the transformer at all? I can go without them with from_pretrained, right?


Yeah, probably right.


You diagnosed it correctly. A FLUX.1 LoRA applied to a FLUX.2 pipeline usually matches zero layers, so it becomes a no-op.

On your question: no, you do not have to manually load the text encoder and transformer in normal Diffusers usage. You only do that when you want to override how components are loaded or placed.


What from_pretrained() already does (baseline behavior)

Flux2Pipeline.from_pretrained(repo_id, ...) is built on Diffusers’ DiffusionPipeline.from_pretrained(). It:

  1. Reads the pipeline config (the model_index.json in the repo).
  2. Automatically loads the required components (transformer, text encoder, VAE, scheduler, etc.).
  3. Passes them into the pipeline constructor for you. (Hugging Face)

So the “simple” pattern is valid:

import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained("diffusers/FLUX.2-dev-bnb-4bit", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

No manual component loading required.


Why people do pass text_encoder= and transformer= anyway

Manual component loading is mainly for control. Typical reasons:

1) You want explicit device placement per component (avoid OOM)

FLUX.2 is huge. A common trick is:

  • load the text encoder and transformer on CPU via device_map="cpu"
  • then let enable_model_cpu_offload() shuttle them as needed

Black Forest Labs’ official FLUX.2 Diffusers instructions show exactly this pattern for “4-bit transformer and 4-bit text encoder.” (GitHub)

That is why your original code looked like theirs. It is not “required.” It is “VRAM control.”

2) You want to omit the text encoder entirely and supply embeddings yourself

This is a real supported mode for FLUX.2.

Black Forest Labs shows a “remote text encoder” mode where they do:

  • Flux2Pipeline.from_pretrained(..., text_encoder=None)
  • then call the pipeline with prompt_embeds=... instead of prompt=... (GitHub)

This saves local VRAM, but changes your calling convention.
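
A sketch of that mode (get_prompt_embeds() is a hypothetical stand-in for however you obtain the embeddings, e.g. the remote text-encoder endpoint from the BFL example):

import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "diffusers/FLUX.2-dev-bnb-4bit",
    text_encoder=None,            # no local text encoder at all
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt_embeds = get_prompt_embeds("an illustration in Fitzpatrick style")  # hypothetical helper

image = pipe(
    prompt_embeds=prompt_embeds,  # prompt= is no longer passed
    num_inference_steps=24,
    guidance_scale=4,
).images[0]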

3) You want to swap components (custom text encoder, different transformer build, etc.)

Passing components lets you:

  • replace the text encoder
  • reuse already-loaded components across pipelines
  • experiment with different quantization settings per component

Diffusers explicitly supports reusing components via the components property to initialize other pipelines without reallocating weights. (Hugging Face)
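
A sketch of that reuse pattern (assumes pipe is an already-built Flux2Pipeline; shown with the same class, which is always constructor-compatible):

# pipe.components is a dict of every loaded sub-model (transformer, text encoder,
# VAE, scheduler, ...), so a second pipeline can be built without reloading weights.
second_pipe = Flux2Pipeline(**pipe.components)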


Practical guidance for your setup

If you want the simplest code path

Do not pre-load anything. Let the pipeline load its own parts:

pipe = Flux2Pipeline.from_pretrained(
    "diffusers/FLUX.2-dev-bnb-4bit",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

Then load your FLUX.2-compatible LoRA.

This has the fewest moving parts.

If you are close to VRAM limits (likely with FLUX.2)

Keep the “manual load on CPU + offload” approach. It is officially documented for FLUX.2. (GitHub)

If you ever set text_encoder=None

You must stop passing prompt=... and instead pass embeddings (like prompt_embeds=...). The FLUX.2 doc’s remote encoder example does exactly this. (GitHub)


About tokens and “plain downloads”

You do not need tokens for “plain downloads” in principle.

  • HF token is only needed to download gated Hugging Face repos. The FLUX.2 docs mention accepting gating and logging in. (GitHub)
  • For LoRAs you downloaded as files (Civitai or elsewhere), Diffusers does not care how you got the file. It only cares that the tensors match the model.

So your “download locally and load from disk” approach is fine.


Summary

  • No, you do not need to pass text_encoder and transformer. from_pretrained() loads them automatically. (Hugging Face)
  • You do pass them when you need explicit device placement or special setups (remote text encoder, CPU offload strategies). (GitHub)
  • If you set text_encoder=None, you must pass prompt_embeds, not prompt. (GitHub)