Skip to content

runtime: under edge conditions, select fails to read from time.After() channel #14561

@bobziuchkovski

Description

@bobziuchkovski

What version of Go are you using (go version)?

Tested on go1.6 darwin/amd64 and go1.6 linux/amd64

What operating system and processor architecture are you using (go env)?

Mac:

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/bobbyz/projects/community/go:/Users/bobbyz/projects/bztech/go:/Users/bobbyz/projects/bobbyz/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GO15VENDOREXPERIMENT="1"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
CXX="clang++"
CGO_ENABLED="1"

Linux:

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bobbyz/gopath/"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

// This is an exact, unaltered copy-paste of the code I'm running.  I trimmed my live code
// down to just this function body in order to test/reproduce
func (l *logger) close(timeout time.Duration) error {
    fmt.Println("before select")
    select {
    case <-time.After(time.Second):
        fmt.Println("timeout occurred")
        return errors.New("timeout")
    }
}

What did you expect to see?

I should see "before select" printed. Then after 1 second, I should see "timeout occurred" printed.

What did you see instead?

I see "before select" printed. Nothing else prints and the program never exits. This is definitely the only function in my code that prints "before select", so the close method is called, but the channel read from the time.After channel never happens. I've noticed the program spins my CPU at 100% while I'm waiting for the timeout that never occurs.

Other

I can only reproduce this when I add a panic() to one of my functions and recover from the panic. The "close" method shown above runs on program termination, after the panic recovery. If I modify the panicking code to return nil in place of the panic, I can't reproduce. That portion of code is outside of this close method. I also can't reproduce if I remove the select and read directly from the channel.

I ripped apart my code trying to find the cause of the hanging select. I uploaded a functioning but mangled/stripped copy of the code to https://github.com/bobziuchkovski/golang-select-repro. The code itself hardly makes sense anymore, but it runs and reproduces the hanging select about 1 in 8 times on my laptop.

The select in question is in logger.go on line 110. The panic is triggered in main.go on line 45. If I comment-out that panic, I can't reproduce the hanging select. The close method with the select in question is triggered indirectly via line 40 of main.go

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions