Description
Name and Version
Whenever I try to use granite-4.0-h-tiny-UD-Q8_K_XL.gguf on the GPU in my ThinkPad laptop, the second chat message causes the GPU to hang, and the llama.cpp process is eventually aborted:
[284083.655827] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:554!
[284084.095778] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:572!
[284084.095945] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:570!
[284084.096089] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56e!
[284084.096233] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56c!
[284084.096359] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:56a!
[284084.096554] Fence expiration time out i915-0000:00:02.0:0000:00:02.0:568!
[284091.730106] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in llama-server [1808332]
[284091.730124] i915 0000:00:02.0: [drm] llama-server[1808332] context reset due to GPU hang
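For anyone trying to confirm they are hitting the same hang, here is a minimal sketch that picks the `GPU HANG` line out of kernel log text. The regular expression is written against the exact line format shown above and is only an assumption about how other kernel versions print it; `find_gpu_hangs` is a hypothetical helper name, not part of llama.cpp or any existing tool.

```python
import re

# Pattern modelled on the "GPU HANG" line in the log above.
# Assumption: other kernels print the line in the same layout.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 (?P<dev>\S+): \[drm\] GPU HANG: "
    r"ecode (?P<ecode>\S+), in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(log_text):
    """Return one dict per 'GPU HANG' line found in the kernel log text."""
    return [m.groupdict() for m in HANG_RE.finditer(log_text)]


# The hang line from this report, used as sample input.
sample = (
    "[284091.730106] i915 0000:00:02.0: [drm] GPU HANG: "
    "ecode 12:1:85dffffb, in llama-server [1808332]"
)
hangs = find_gpu_hangs(sample)
for hang in hangs:
    print(hang["proc"], hang["pid"], hang["ecode"])
```

The same function can be applied to saved `dmesg` output to check whether the process named in the hang line is `llama-server`.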
This is more info about the device:
00:02.0 VGA compatible controller: Intel Corporation Meteor Lake-P [Intel Arc Graphics] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 2235
Flags: bus master, fast devsel, latency 0, IRQ 179, IOMMU group 0
Memory at 4058000000 (64-bit, prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP+ OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express Root Complex Integrated Endpoint, IntMsgNum 0
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Null
Capabilities: [110] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [420] Physical Resizable BAR
Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915, xe
The workaround proposed in #16681 solves the problem.
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
bin/llama-server --jinja -m ../granite-4.0-h-tiny-UD-Q8_K_XL.ggu
Problem description & steps to reproduce
For my test I used ramalama chat, and it fails on the second message.
First Bad Commit
No response