Vulkan bug: Intel Arc iGPU hangs with granite-4.0-h-tiny-UD-Q8_K_XL.gguf #16684

@giuseppe

Description

Name and Version

Whenever I try to use granite-4.0-h-tiny-UD-Q8_K_XL.gguf on the GPU in my ThinkPad laptop, the second chat message causes a GPU hang, and the llama.cpp process is eventually aborted:

[284083.655827] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:554!
[284084.095778] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:572!
[284084.095945] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:570!
[284084.096089] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56e!
[284084.096233] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56c!
[284084.096359] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56a!
[284084.096554] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:568!
[284091.730106] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in llama-server [1808332]
[284091.730124] i915 0000:00:02.0: [drm] llama-server[1808332] context reset due to GPU hang
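To check whether a rerun (with or without the workaround) hits the same failure, the kernel log can be scanned for the i915 hang report. This is a minimal sketch, assuming the log is available as text (for example the output of dmesg) and that the lines follow exactly the format shown above; the function names find_gpu_hangs and count_fence_timeouts are hypothetical helpers, not part of llama.cpp or the kernel tooling.

```python
import re
from typing import List, Tuple

# Pattern derived only from the log lines in this report (assumption), e.g.:
# [284091.730106] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in llama-server [1808332]
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+(?P<pci>[0-9a-f:.]+):\s+\[drm\]\s+"
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(log_text: str) -> List[Tuple[float, str, str, int]]:
    """Return (timestamp, ecode, process name, pid) for each i915 GPU HANG line."""
    hangs = []
    for line in log_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            hangs.append(
                (float(m.group("ts")), m.group("ecode"), m.group("proc"), int(m.group("pid")))
            )
    return hangs


def count_fence_timeouts(log_text: str) -> int:
    """Count the 'Fence expiration time out' lines that precede the hang."""
    return sum("Fence expiration time out" in line for line in log_text.splitlines())
```

The "context reset due to GPU hang" line is deliberately not matched, so each hang is reported once. Note that reading dmesg may require root on some distributions.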

Here is more information about the device:

00:02.0 VGA compatible controller: Intel Corporation Meteor Lake-P [Intel Arc Graphics] (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 2235
        Flags: bus master, fast devsel, latency 0, IRQ 179, IOMMU group 0
        Memory at 4058000000 (64-bit, prefetchable) [size=16M]
        Memory at 4000000000 (64-bit, prefetchable) [size=256M]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
                CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
                      EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
                      1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
                      DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
                CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
                      SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
                      PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
                      DDR3MaxFreq=2932MHz LPDDR3En-
        Capabilities: [70] Express Root Complex Integrated Endpoint, IntMsgNum 0
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Capabilities: [d0] Power Management version 3
        Capabilities: [100] Null
        Capabilities: [110] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [420] Physical Resizable BAR
        Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [400] Latency Tolerance Reporting
        Kernel driver in use: i915
        Kernel modules: i915, xe

The workaround proposed in #16681 solves the problem.

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

bin/llama-server --jinja  -m ../granite-4.0-h-tiny-UD-Q8_K_XL.ggu

Problem description & steps to reproduce

For my test I used ramalama chat, and it fails on the second message.

First Bad Commit

No response

Relevant log output
