-
Notifications
You must be signed in to change notification settings - Fork 736
Add the AWS Athena JDBC driver for nf-sqldb plugin #2500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oh, the Athena driver does use Log4j? |
Unfortunately yeah 🤷 |
Ok, not such big problem this what slf4j:log4j-over-slf4j is meant for. However, I would like to avoid including the zip in the Git repo. Not sure if there's a way to download it automatically and include it in the build. |
Ah, okay - yes that helps!
I found a Gradle plugin for this one -> https://github.com/michel-kraemer/gradle-download-task/blob/master/examples/unzip.gradle. Will experiment and circle back |
plugins/nf-sqldb/build.gradle
Outdated
defaultTasks 'downloadAndCleanUpLibsFolder' | ||
|
||
build.dependsOn downloadAndCleanUpLibsFolder | ||
compileGroovy.dependsOn downloadAndCleanUpLibsFolder | ||
compileTestGroovy.dependsOn downloadAndCleanUpLibsFolder | ||
test.dependsOn downloadAndCleanUpLibsFolder | ||
copyPluginLibs.dependsOn downloadAndCleanUpLibsFolder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be a better way to do this no? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paolo, as per your comment #2500 (comment) , I've added slf4j:log4j-over-slf4j
library, but not sure how exactly to configure it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a bit ugly. Should not be enough
compileGroovy.dependsOn downloadAndCleanUpLibsFolder
and then it will pull the parent tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, for this one, if I only include compileGroovy.dependsOn ...
it fails to compile and it doesn't trigger the download and import of the JAR file.
Execution failed for task ':plugins:nf-sqldb:copyPluginLibs'.
> Could not resolve all files for configuration ':plugins:nf-sqldb:runtimeClasspath'.
> Could not find :AthenaJDBC42_2.0.25.1001:.
Searched in the following locations:
- file:/Users/eklavya/projects/code/nextflow/plugins/nf-sqldb/libs/AthenaJDBC42_2.0.25.1001.jar
Required by:
project :plugins:nf-sqldb
* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 57s
The other task dependencies I added because the Gradle build system threw a warning that the build might not be reproducible if I don't add these dependencies.
Do you have a working Athena config to use as a showcase? |
122731d
to
487a50f
Compare
OK, think i've managed to simplify a bit the inclusion of the Athena driver. It would be nice to include a no brainer example to use it in the readme. Then I think we can merge it. |
Now does not break, but for some mysterious reason dump these warnings. Give up
|
Yeah, this is a bit confusing - the error message you've mentioned #2500 (comment) are exactly which prompted me to add the explicit dependencies between gradle tasks #2500 (comment) Not sure how to proceed here without those explicit task dependencies. |
I've found this possible explanation, but I was not able to make it work |
Thanks for looking into this Paolo! Since this is dependent on Gradle - do you think we can proceed - for the time being - with the explicit dependency declaration as done previously here #2500 (comment) ? |
After rebasing this branch against |
24046a5
to
4b078ae
Compare
Oh, gradle dark magic! the warning went away |
Well, it looks like only the proximity of @jorgeaguileraseqera was enough to fix the build. Quite promising! 😄 |
@abhi18av do we have a no brain example showing how to configure an Athena datasource and run a query? |
Thanks guys for taking a look at this! @pditommaso , for any example regarding Athena, I think the best documentation would be regarding an open dataset since for any private dataset, it is necessary to create a database first, the simplest instructions for this are https://docs.aws.amazon.com/athena/latest/ug/getting-started.html However, for a public dataset such as SRA, I've relied on the Athena database generated by crawling the SRA Metadata as described here the instructions shared in https://www.ncbi.nlm.nih.gov/sra/docs/sra-athena. The exact instructions to setup the necessary tables/database on the AWS side are descibed in the official walkthrough https://www.youtube.com/watch?v=_F4FhcDWSJg Thoughts? |
After some clean and builds, the messages
appears again. Problem comes because we are using a task (unzip) in different subproject I'll create a fix for it |
This PR follows up from the discussion on nextflow-io/nf-sqldb#5
Open database used for testing
I've relied on the Athena database generated by crawling the SRA Metadata as described here the instructions shared in https://www.ncbi.nlm.nih.gov/sra/docs/sra-athena.
The exact instructruction to setup the necessary tables/database on the AWS side are descibed in the official walkthrough https://www.youtube.com/watch?v=_F4FhcDWSJg
Instructions to use the
Athena
JDBC integrationAWSRegion
S3OutputLocation
sra-athena-results
with ametadata
table)Concern (Log4J)
Given the recent adventures of
log4j
dependency, we can useexlude
it usingGradle
safely right?