Dag files #209
* remove submodule from dev install
* fix typo
* Added bwa component
* Added cpus to bwa command
* added manifest information to the `nextflow.config` file to allow for remote execution (#204)
  - Partial solve to #194 issue
  - Deprecation of the `manifest.config` file
  - Add the manifest information to the `nextflow.config` file
* Added component for haplotypecaller
* Added merge vcfs to haplotypecaller component
* Added mark duplicates component
* Added bam index to mark duplicates
* Added base_recalibrator component
* Removed publishDir for haplotypecaller
* Added apply_bqsr process to base_recalibrator component
* Updated changelog
* Added description to haplotypecaller
* Add check for the location of specific dot files
* Updated changelog
* Updated version
files in the resources folder
treedag and forktree files
Codecov Report
    @@            Coverage Diff             @@
    ##              dev     #209      +/-  ##
    ==========================================
    + Coverage   41.95%   41.97%   +0.01%
    ==========================================
      Files          72       72
      Lines        6461     6464       +3
    ==========================================
    + Hits         2711     2713       +2
    - Misses       3750     3751       +1
tiagofilipe12 left a comment
@cimendes good job here 👍. I left you some comments. The main ones are a suggestion to remove duplication, and the fact that some of your code seems to fix a component unrelated to this PR. While that's OK because the PR isn't that big, it is always better to keep PRs to their subject. Also, the changelog needs to include those additions.
flowcraft/generator/engine.py (outdated)

    os.mkdir(resources_dir)
    outfile_tree_fork = open(os.path.join(resources_dir, "forkTree.json"), "w")
    outfile_tree_fork.write(json.dumps(dict_viz))
    outfile_tree_fork.close()
Hmm, it's interesting that you are keeping consistency between both methods here. But can you see that there is now some duplication between the two methods? Maybe you could write a function called something like `write_dag_to_file` and use it in both methods.
Then you basically put everything inside that function and reuse it in both places:
    def write_dag_to_file(self, file_name, dict_viz):
        # Assumes `import os`, `import json` and `from os.path import dirname`
        resources_dir = os.path.join(dirname(self.nf_file), "resources")
        if not os.path.exists(resources_dir):
            os.mkdir(resources_dir)
        outfile_tree_fork = open(os.path.join(resources_dir, file_name), "w")
        outfile_tree_fork.write(json.dumps(dict_viz))
        outfile_tree_fork.close()

Or you can even go with `with open...`. Then you just call the function in both places. Something like:

    self.write_dag_to_file("forkTree.json", dict_viz)
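The `with open(...)` variant mentioned here could look like the minimal sketch below. It is written as a hypothetical standalone function that receives the pipeline path (`nf_file`) explicitly instead of reading `self.nf_file`, and it returns the written path for convenience; both are assumptions for illustration, not FlowCraft's actual code.

```python
import json
import os
from os.path import dirname


def write_dag_to_file(nf_file, file_name, dict_viz):
    """Write dict_viz as JSON to <pipeline dir>/resources/<file_name>.

    Hypothetical standalone variant: the pipeline path is passed in
    explicitly instead of being read from self.nf_file.
    """
    resources_dir = os.path.join(dirname(nf_file), "resources")
    if not os.path.exists(resources_dir):
        os.mkdir(resources_dir)
    out_path = os.path.join(resources_dir, file_name)
    # The context manager closes the file even if json.dump raises.
    with open(out_path, "w") as outfile:
        json.dump(dict_viz, outfile)
    return out_path
```

Both call sites then reduce to one line each, e.g. `write_dag_to_file(nf_file, "forkTree.json", dict_viz)` and the equivalent call for `treeDag.json`.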
Check the other comments, @cimendes; they should remove much of this duplication/boilerplate when creating these files.
flowcraft/generator/inspect.py (outdated)

      def _dag_file_to_dict(self):
    -     """Function that opens the dotfile named .treeDag.json in the current
    -     working directory
    +     """Function that opens the accessory named treeDag.json in the
What is "accessory"? Also notice that this method not only opens that file but also loads its content into a dict, so the docstring is incomplete. It was already incomplete before, I know.
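A docstring that covers both steps the reviewer mentions (opening the file and loading its content into a dict) might look like the sketch below. It is written as a hypothetical standalone function, `dag_file_to_dict(dag_path)`, that takes the path explicitly; the real method's signature and path handling in `inspect.py` are not shown here.

```python
import json


def dag_file_to_dict(dag_path):
    """Open a DAG JSON file and load its content into a dictionary.

    Parameters
    ----------
    dag_path : str
        Path to the JSON file describing the pipeline DAG
        (for example a ``treeDag.json`` file).

    Returns
    -------
    dict
        The parsed content of the file.

    Raises
    ------
    FileNotFoundError
        If ``dag_path`` does not exist.
    """
    with open(dag_path) as dag_file:
        return json.load(dag_file)
```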
    }

    {{ forks }}
    {{ forks }}
same here
I don't get this comment. Same what?
ODiogoSilva left a comment
Looks mostly good, but pls check the comments about duplication and the directory checks. Also, the forkTree content is not correct.
flowcraft/generator/engine.py (outdated)

    -outfile_dag = open(os.path.join(dirname(self.nf_file), output_file)
    -                   , "w")
    +if not os.path.exists(resources_dir):
Why do you check for the existence of this directory twice? It doesn't seem like this function should be responsible for that. My suggestion is to make this check at a higher level and assume here that the directory already exists. Then, both here and below become simple file-writing operations without the check.
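The split suggested here (check the directory once at a higher level, then only write) can be sketched as follows. `write_json` and `render_outputs` are hypothetical names; `render_outputs` stands in for a caller such as FlowCraft's `render_pipeline`, whose actual signature is not shown in this thread. `os.makedirs(..., exist_ok=True)` replaces the separate `os.path.exists()` test.

```python
import json
import os


def write_json(resources_dir, file_name, data):
    """Write data as JSON into resources_dir.

    The directory is assumed to exist already; creating it is the
    caller's responsibility.
    """
    path = os.path.join(resources_dir, file_name)
    with open(path, "w") as outfile:
        json.dump(data, outfile)
    return path


def render_outputs(pipeline_dir, dag, fork_tree):
    """Hypothetical caller: ensure the directory once, then write both files."""
    resources_dir = os.path.join(pipeline_dir, "resources")
    # Single directory check at the higher level; exist_ok=True makes
    # repeated calls safe when the directory is already there.
    os.makedirs(resources_dir, exist_ok=True)
    return (write_json(resources_dir, "treeDag.json", dag),
            write_json(resources_dir, "forkTree.json", fork_tree))
```

With this split, the writer stays a pure file operation and the filesystem precondition lives in one place.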
simplified dag and treefork file write in a single function added suggestions in #209
The verification was moved to the `render_pipeline` function, and the function that writes the JSON was made more general to accommodate both the forkTree.json and treeDag.json files. Thanks @ODiogoSilva for pointing out my very silly mistake!
ODiogoSilva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-Authored-By: Diogo Silva <[email protected]>
* Dag files (#209)
* move DAG JSON files to the resources directory
* added manifest information to the `nextflow.config` file to allow for remote execution (#204)
  - Partial solve to #194 issue
  - Deprecation of the `manifest.config` file
* Set phred encoding when it fails to be determined - trimmomatic (#211)
* fix bug publishdir (downsample_fastq component)
* add phred33 when encoding fails to be determined, if still fails retry with phred64 encoding (trimmomatic component)
* Fix downsample (#222)
* edited file names for downsample fastqs
* stringified depth for file name
This PR addresses an issue raised in #194 where the .treeDag.json and forktree.json files aren't automatically staged when publishing the resulting FlowCraft pipelines to a repository. As they are hidden files, they are often overlooked, breaking the execution of the pipelines when run remotely.
There was no reason to keep these files as dotfiles, so they were moved to the resources folder.
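To illustrate how hidden files slip through, here is a small, self-contained Python demonstration (not FlowCraft code). Python's `glob`, like a POSIX shell with default settings, does not let `*` match names that start with a dot, so a wildcard such as `git add *` or `cp *` run in the pipeline directory would silently skip a `.treeDag.json`, while a regular file under `resources/` is picked up. The directory layout below is an assumption made for the example.

```python
import glob
import os
import tempfile


def wildcard_matches(directory):
    """Return the file names a plain '*' wildcard picks up in directory."""
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(directory, "*")))


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Old layout: DAG stored as a hidden dotfile next to the pipeline.
        old = os.path.join(root, "old")
        os.mkdir(old)
        for name in ("pipeline.nf", ".treeDag.json"):
            open(os.path.join(old, name), "w").close()

        # New layout: DAG stored as a regular file inside resources/.
        new_resources = os.path.join(root, "new", "resources")
        os.makedirs(new_resources)
        open(os.path.join(root, "new", "pipeline.nf"), "w").close()
        open(os.path.join(new_resources, "treeDag.json"), "w").close()

        return wildcard_matches(old), wildcard_matches(new_resources)


if __name__ == "__main__":
    print(demo())  # (['pipeline.nf'], ['treeDag.json'])
```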