Docker: no NVIDIA driver on your system for /image/generations API due to CUDA 12.8 required by RTX5000  #5189

@SuperPat45

Description

LocalAI version:
2.28.0

Environment, CPU architecture, OS, and Version:
Docker
OS: Ubuntu 24.10
CPU: AMD Ryzen 7 9800X3D
GPU: RTX 5090

Describe the bug
The Docker image localai/localai:latest-aio-gpu-nvidia-cuda-12 fails to generate an image with the stablediffusion model, throwing the error:

failed to load model with internal loader: could not load model (no success): Unexpected err=RuntimeError('Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx'), type(err)=

To Reproduce
Try to generate an image with the stablediffusion model in the web UI.
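
The same failure can be triggered outside the UI by calling the /v1/images/generations endpoint directly. A minimal reproduction sketch in Python (assuming the requests library and the port mapping from the compose file below; the prompt text is arbitrary):

# Hypothetical reproduction script; assumes the container is reachable
# on localhost:8080 as in the compose configuration below.
import requests

resp = requests.post(
    "http://localhost:8080/v1/images/generations",
    json={
        "model": "stablediffusion",          # model name taken from the logs
        "prompt": "a lighthouse at sunset",  # arbitrary example prompt
        "step": 25,
        "size": "512x512",
    },
    timeout=300,
)
print(resp.status_code, resp.text)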

Expected behavior
No error

Logs

8:41AM DBG context local model name not found, setting to default defaultModelName=stablediffusion
8:41AM DBG Parameter Config: &{PredictionOptions:{BasicModelRequest:{Model:DreamShaper_8_pruned.safetensors} Language: Translate:false N:0 TopP:0xc00264c800 TopK:0xc00264c808 Temperature:0xc00264c870 Maxtokens:0xc00264ca30 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00264c9c8 TypicalP:0xc00264c9c0 Seed:0xc00264caf8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:stablediffusion F16:0xc00264c365 Threads:0xc00264c6e0 Debug:0xc002600738 Roles:map[] Embeddings:0xc00264caf1 Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter: Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_ANY FLAG_IMAGE] KnownUsecases: PromptStrings:[xxxxxxxxxxxxxxxxxxxxxxx] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00264c8f8 MirostatTAU:0xc00264c8f0 Mirostat:0xc00264c878 NGPULayers:0xc00264ca38 MMap:0xc00264caf0 MMlock:0xc00264caf1 LowVRAM:0xc00264caf1 Grammar: StopWords:[] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00264cca0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m EnableParameters:negative_prompt,num_inference_steps IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[{Filename:DreamShaper_8_pruned.safetensors SHA256: URI:huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors}] Description: Usage:curl http://localhost:8080/v1/images/generations
-H "Content-Type: application/json"
-d '{
"prompt": "|",
"step": 25,
"size": "512x512"
}' Options:[]}
8:41AM INF BackendLoader starting backend=diffusers modelID=stablediffusion o.model=DreamShaper_8_pruned.safetensors
8:41AM DBG Loading model in memory from file: /build/models/DreamShaper_8_pruned.safetensors
8:41AM DBG Loading Model stablediffusion with gRPC (file: /build/models/DreamShaper_8_pruned.safetensors) (backend: diffusers): {backendString:diffusers model:DreamShaper_8_pruned.safetensors modelID:stablediffusion assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0003cf808 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
8:41AM DBG Loading external backend: /build/backend/python/diffusers/run.sh
8:41AM DBG external backend is file: &{name:run.sh size:73 mode:448 modTime:{wall:0 ext:63879015916 loc:0x598eada0} sys:{Dev:69 Ino:29368129 Nlink:1 Mode:33216 Uid:0 Gid:0 X__pad0:0 Rdev:0 Size:73 Blksize:4096 Blocks:8 Atim:{Sec:1744791999 Nsec:809886729} Mtim:{Sec:1743419116 Nsec:0} Ctim:{Sec:1744791999 Nsec:808886738} X__unused:[0 0 0]}}
8:41AM DBG Loading GRPC Process: /build/backend/python/diffusers/run.sh
8:41AM DBG GRPC Service for stablediffusion will be running at: '127.0.0.1:44063'
8:41AM DBG GRPC Service state dir: /tmp/go-processmanager4159078518
8:41AM DBG GRPC Service Started
8:41AM DBG Wait for the service to start up
8:41AM DBG Options: ContextSize:1024 Seed:333832566 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:8 PipelineType:"StableDiffusionPipeline" SchedulerType:"k_dpmpp_2m" CUDA:true
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout Initializing libbackend for diffusers
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout virtualenv activated
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stdout activated virtualenv has been ensured
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/google/protobuf/runtime_version.py:98: UserWarning: Protobuf gencode version 5.29.0 is exactly one major version older than the runtime version 6.30.2 at backend.proto. Please update the gencode to avoid compatibility violations in the next runtime release.
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr warnings.warn(
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr warnings.warn(
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Server started. Listening on: 127.0.0.1:44063
8:41AM DBG GRPC Service Ready
8:41AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00034ce58} sizeCache:0 unknownFields:[] Model:DreamShaper_8_pruned.safetensors ContextSize:1024 Seed:333832566 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/DreamShaper_8_pruned.safetensors Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType:StableDiffusionPipeline SchedulerType:k_dpmpp_2m CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Loading model DreamShaper_8_pruned.safetensors...
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Request Model: "DreamShaper_8_pruned.safetensors"
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ContextSize: 1024
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Seed: 333832566
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr NBatch: 512
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr F16Memory: true
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr MMap: true
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr NGPULayers: 99999999
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr Threads: 8
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ModelFile: "/build/models/DreamShaper_8_pruned.safetensors"
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr PipelineType: "StableDiffusionPipeline"
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr SchedulerType: "k_dpmpp_2m"
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr CUDA: true
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ModelPath: "/build/models"
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 254902.45it/s]
Loading pipeline components...: 0%| | 0/6 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel:
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr ['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|██████████| 6/6 [00:00<00:00, 50.85it/s]
8:41AM DBG GRPC(stablediffusion-127.0.0.1:44063): stderr You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
8:41AM ERR Server error error="failed to load model with internal loader: could not load model (no success): Unexpected err=RuntimeError('Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx'), type(err)=<class 'RuntimeError'>"

Additional context

Note that chat completions correctly use GPU acceleration, so the driver is visible to at least one backend.
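
A hedged way to narrow this down is to run the same driver probe inside the diffusers backend's virtualenv (venv path taken from the logs above; the script path is arbitrary). This is a diagnostic sketch, not an official tool:

# Run inside the container, e.g.:
#   docker exec -it localai \
#     /build/backend/python/diffusers/venv/bin/python /tmp/check_cuda.py
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
try:
    torch.zeros(1).cuda()  # forces driver init; should raise the same RuntimeError here
    print("device:", torch.cuda.get_device_name(0))
except RuntimeError as e:
    print("driver probe failed:", e)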

The Docker Compose configuration:

services:
  localai:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    container_name: localai
    runtime: nvidia
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
    volumes:
      - /opt/openwebui/data_models:/build/models:cached
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
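
Given the hypothesis in the title (the RTX 5090 is compute capability 12.0, which PyTorch only supports in CUDA 12.8 builds), it may also be worth checking which architectures the torch wheel bundled in the diffusers venv was compiled for. A sketch, under the same assumptions as above:

# Lists the compute architectures the installed torch binary was built for.
# A build that supports the RTX 5090 should include sm_120; note that
# torch.cuda.get_arch_list() returns [] when no CUDA runtime is visible.
import torch

print("built for CUDA:", torch.version.cuda)
print("compiled arch list:", torch.cuda.get_arch_list())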
