Currently, the test code has no standard way to handle devices, which has led to CUDA tests being added in disorganized ways, such as 1, 2 and 3.
It is also worth noting that Travis CI is not running any of the CUDA tests. See here, and search for test_lfilter_cuda.
We should have a standard way to add CUDA tests.
FYI: PyTorch has a mechanism to instantiate a test class for different devices at runtime.
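
To make the idea concrete, here is a rough sketch of what "instantiate a test class per device at runtime" could look like using only `unittest`. It is not PyTorch's actual helper. The names `cuda_available`, `instantiate_for_devices` and `LfilterTestTemplate` are hypothetical and exist only for illustration. The assumption is that each test is written once against `self.device`, and that the CUDA variant is skipped when no GPU (or no torch) is present.

```python
import unittest


def cuda_available():
    """Return True only if torch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def instantiate_for_devices(generic_cls, scope, devices=("cpu", "cuda")):
    """Hypothetical helper: create one TestCase subclass per device.

    Each generated class is registered in `scope` (usually globals()) so that
    the test runner can discover it by name.
    """
    created = []
    for device in devices:
        name = f"{generic_cls.__name__}{device.upper()}"
        cls = type(name, (generic_cls, unittest.TestCase), {"device": device})
        if device == "cuda" and not cuda_available():
            # Mark the whole class as skipped instead of failing.
            cls = unittest.skip("CUDA not available")(cls)
        scope[name] = cls
        created.append(name)
    return created


class LfilterTestTemplate:
    """Device-agnostic test body.

    It does not inherit from TestCase, so it is not collected on its own.
    """

    device = None

    def test_device_is_set(self):
        # A real test would create tensors on self.device here.
        self.assertIn(self.device, ("cpu", "cuda"))


# Generates LfilterTestTemplateCPU and LfilterTestTemplateCUDA in this module.
instantiate_for_devices(LfilterTestTemplate, globals())
```

With something like this, a CUDA test is added by writing one device-agnostic method, and a CI job without a GPU reports the CUDA variant as skipped rather than silently not having it.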