runtime: goroutine starvation due to Gosched #13546

@dvyukov

The following program hangs:

package main

import (
    "runtime"
    "sync/atomic"
)

func main() {
    const P = 4
    runtime.GOMAXPROCS(P)
    x := uint32(0)
    for p := 0; p < P; p++ {
        go func() {
            atomic.AddUint32(&x, 1)
            for atomic.LoadUint32(&x) != P {
            }
        }()
    }
    for atomic.LoadUint32(&x) != P {
        runtime.Gosched()
    }
}

SIGABRT: abort
PC=0x44dae0 m=0

goroutine 21 [running]:
sync/atomic.LoadUint32(0xc8200b6000)
    src/sync/atomic/asm_amd64.s:92 fp=0xc820060798 sp=0xc820060790
main.main.func1(0xc8200b6000)
    /tmp/gosched.go:15 +0x37 fp=0xc8200607b8 sp=0xc820060798
runtime.goexit()
    src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8200607c0 sp=0xc8200607b8
created by main.main
    /tmp/gosched.go:17 +0x78

goroutine 1 [runnable]:
runtime.Gosched()
    src/runtime/proc.go:235 +0x14
main.main()
    /tmp/gosched.go:20 +0xa7

goroutine 18 [runnable]:
main.main.func1(0xc8200b6000)
    /tmp/gosched.go:13
created by main.main
    /tmp/gosched.go:17 +0x78

goroutine 19 [running]:
    goroutine running on other thread; stack unavailable
created by main.main
    /tmp/gosched.go:17 +0x78

goroutine 20 [running]:
    goroutine running on other thread; stack unavailable
created by main.main
    /tmp/gosched.go:17 +0x78

One goroutine constantly calls runtime.Gosched, but another runnable goroutine is starved.
The root cause: Gosched puts the current goroutine onto the global run queue; the thread then checks its local run queue (empty), then checks the global run queue and picks up the same goroutine again. Meanwhile, another runnable goroutine sits in a remote per-P queue.
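
To make the lookup order concrete, here is a toy model of the queues, not runtime code. The `sched` type, `pick`, `simulate`, and the goroutine names "yielder" and "waiter" are all made up for illustration; the only thing taken from the description above is the order: own local queue, then global queue, and remote queues only when both are empty.

```go
package main

import "fmt"

// sched is a hypothetical toy model of the run queues, not runtime code.
type sched struct {
	global []string   // global run queue
	local  [][]string // one local run queue per P
}

// pop removes and returns the head of q, or "" if q is empty.
func pop(q *[]string) string {
	if len(*q) == 0 {
		return ""
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// pick models the lookup order for P number p: own local queue,
// then the global queue. Queues of other Ps are consulted only
// when both of those are empty.
func (s *sched) pick(p int) string {
	if g := pop(&s.local[p]); g != "" {
		return g
	}
	if g := pop(&s.global); g != "" {
		return g
	}
	for i := range s.local {
		if i != p {
			if g := pop(&s.local[i]); g != "" {
				return g
			}
		}
	}
	return ""
}

// simulate performs `rounds` Gosched calls on P0 and counts how often
// each goroutine gets to run there.
func simulate(rounds int) map[string]int {
	s := &sched{local: make([][]string, 2)}
	// "waiter" is runnable but queued on P1, which is assumed to be
	// busy in a tight loop and never gets around to running it.
	s.local[1] = []string{"waiter"}
	runs := map[string]int{}
	cur := "yielder"
	for i := 0; i < rounds; i++ {
		runs[cur]++
		s.global = append(s.global, cur) // Gosched: current goroutine goes to the global queue
		cur = s.pick(0)
	}
	return runs
}

func main() {
	runs := simulate(1000)
	fmt.Println("yielder:", runs["yielder"], "waiter:", runs["waiter"])
}
```

In this model the global queue is never empty when P0 looks at it, so the remote queue is never reached and "waiter" never runs.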

This is probably not critical, as it can happen only when there are goroutines in tight non-preemptible loops. Still, we could check the local queues of other Ps ahead of the global queue once in a while in findrunnable. We do the opposite hack in schedule: check the global queue ahead of the local queue once in a while. On the other hand, this would hurt locality, which is bad for performance...
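
A rough sketch of what "once in a while" could look like, again as a toy model and not a patch against the runtime. The `interval` parameter, the `steal` helper, and `waiterScheduledAt` are hypothetical names; the value 61 used in `main` is borrowed from the interval I believe schedule uses for its global-queue check, so treat it as an assumption.

```go
package main

import "fmt"

// sched is a hypothetical toy model of the run queues, not runtime code.
type sched struct {
	global []string   // global run queue
	local  [][]string // one local run queue per P
}

// pop removes and returns the head of q, or "" if q is empty.
func pop(q *[]string) string {
	if len(*q) == 0 {
		return ""
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g
}

// steal takes a goroutine from any P other than p, or returns "".
func (s *sched) steal(p int) string {
	for i := range s.local {
		if i != p {
			if g := pop(&s.local[i]); g != "" {
				return g
			}
		}
	}
	return ""
}

// pick checks the local queue first. On every interval-th tick it then
// looks at remote queues before the global queue; otherwise the global
// queue comes first. interval <= 0 disables the periodic check.
func (s *sched) pick(p, tick, interval int) string {
	if g := pop(&s.local[p]); g != "" {
		return g
	}
	if interval > 0 && tick%interval == 0 {
		if g := s.steal(p); g != "" {
			return g
		}
	}
	if g := pop(&s.global); g != "" {
		return g
	}
	return s.steal(p)
}

// waiterScheduledAt returns the tick at which P0 first picks "waiter"
// while "yielder" keeps calling Gosched, or -1 if that never happens
// within the given number of rounds.
func waiterScheduledAt(interval, rounds int) int {
	s := &sched{local: make([][]string, 2)}
	s.local[1] = []string{"waiter"} // queued on P1, which is assumed busy
	for tick := 1; tick <= rounds; tick++ {
		s.global = append(s.global, "yielder") // Gosched
		if s.pick(0, tick, interval) == "waiter" {
			return tick
		}
	}
	return -1
}

func main() {
	fmt.Println("without periodic check:", waiterScheduledAt(0, 1000))
	fmt.Println("with periodic check:", waiterScheduledAt(61, 1000))
}
```

The model only shows that a periodic remote-first check bounds the starvation. It says nothing about the locality cost, which is the real concern.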
