Skip to content

Conversation

@lilyball
Copy link

@lilyball lilyball commented Aug 3, 2016

This preserves referential integrity for blocks that were created via
the dispatch_block_create* functions which, among other things,
ensures barrier blocks work correctly.

Also add a fast path for barrier blocks to async.

This is a port of swiftlang/swift#3923, though the shims are done a bit differently. I also don't know how to confirm that it actually works, since I don't know how to go about testing swift-corelibs-libdispatch (I just barely figured out how to compile it on my Linode), but I did confirm that it at least compiles.

This preserves referential integrity for blocks that were created via
the `dispatch_block_create*` functions which, among other things,
ensures barrier blocks work correctly.

Also add a fast path for barrier blocks to async.
@dgrove-oss
Copy link
Contributor

It's entirely possible I'm missing something subtle, but I don't see why we need the extra layer of wrapping (eg swift_dispatch_async) on Linux. the C-level functions like dispatch_async come in with dispatch_block_t as the parameter types, which is a typealias for @convention(block) ()=>Void.

What am I missing? Is there a test case I can execute on Linux that would demonstrate a problem that needs fixing?

@lilyball
Copy link
Author

Does Linux actually preserve the @convention(block) on dispatch_block_t? The implementation in apple/swift on OS X loses the @convention(block) on the imported dispatch_block_t, which is what prompted the original fix. I was told in that PR that I should port the fix here.

@dgrove-oss
Copy link
Contributor

From what I can tell from looking at the generated sil, @convention(block) is preserved on dispatch_block_t on Linux. But, I'm still learning how all this works, so its probably worth an expert verifying (or if there is a test case for barrier blocks that would be definitive, I'd be happy to run it Linux).

@mwwa
Copy link
Contributor

mwwa commented Aug 15, 2016

Edit: Scratch that. If it looks like dispatch_block_t has @convention(block) preserved then we probably don't want to use @_silgen_name where we don't have to.

@lilyball
Copy link
Author

I have a test case in SR-2246. It attempts to create a race condition. If the test case doesn't trigger one, that doesn't prove there's no issue, but the test triggered a race condition almost immediately for me on my iMac. My recollection is that it tends to shove blocks onto the global queue much faster than it can run the sync blocks (hence the explicit sleeps in there), but if it doesn't reproduce for you, you could always try replacing the queue.syncs with async calls (and adding sleeps again so you don't enqueue a massive number of blocks).

@lilyball
Copy link
Author

It occurs to me that a good way to try and trigger this might be to change the sample to start with a suspended queue, then alternate putting barrier blocks and regular blocks on there, and after it's enqueued a bunch of them, resume the queue.

@lilyball
Copy link
Author

Here's what that would look like:

import Foundation

class Test {
    var value = 0
    let queue = DispatchQueue(label: "test queue", attributes: .concurrent)

    func runTest() {
        let numBlocks = 10_000
        queue.suspend()
        for _ in 0..<numBlocks {
            queue.async(flags: .barrier, execute: {
                self.value = 1
                Thread.sleep(forTimeInterval: 0.1)
                self.value = 0
            })
            queue.async {
                if self.value == 1 {
                    print("Barrier violation!")
                    exit(0)
                }
            }
        }
        queue.resume()
        queue.sync {
            print("No violation detected after \(numBlocks) attempts")
        }
    }
}

Test().runTest()

@lilyball
Copy link
Author

Actually, here's a much shorter sample that should be pretty good proof:

import Foundation

let sem = DispatchSemaphore(value: 0)
let queue = DispatchQueue(label: "test queue", attributes: .concurrent)
queue.async(flags: .barrier, execute: {
    switch sem.wait(timeout: DispatchTime.now() + 5) {
    case .success:
        print("Barrier violation!")
    case .timedOut:
        print("No violation detected after 5 seconds")
    }
    exit(0)
})
queue.async {
    sem.signal()
}
dispatchMain()

@dgrove-oss
Copy link
Contributor

Thanks for the test cases! I ran the one from SR-2446 and the two above multiple times on both a 2 core and 24 core Linux box. No failures.

@lilyball
Copy link
Author

Alright then. I guess this problem only affects OS X. Thanks for testing!

@lilyball lilyball closed this Aug 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants