
Conversation

@cimendes
Member

This PR addresses an issue raised in #194 where the .treeDag.json and .forkTree.json files aren't automatically staged when publishing the resulting FlowCraft pipelines to a repository. As they are hidden files, they are often overlooked, which breaks the execution of the pipelines when they are run remotely.

There was no reason to keep these files as dotfiles, so they were moved to the resources folder.
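
For context, a rough sketch of the layout of a generated pipeline after this change; apart from the resources folder and the two JSON files, the names below are illustrative:

my_pipeline/
├── my_pipeline.nf
├── nextflow.config
└── resources/
    ├── treeDag.json
    └── forkTree.json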

cimendes and others added 6 commits June 18, 2019 13:18
* remove submodule from dev install

* fix typo

* Added bwa component

* Added cpus to bwa command

* added manifest information to the `nextflow.config` file to allow for remote execution (#204) - partially solves issue #194

- Deprecation of the `manifest.config` file
- Add the manifest information to the `nextflow.config` file

* Added component for haplotypecaller

* Added merge vcfs to haplotypecaller component

* Added mark duplicates component

* Added bam index to mark duplicates

* Added base_recalibrator component

* Removed publishDir for haplotypecaller

* Added apply_bqsr process to base_recalibrator component

* Updated changelog

* Added description to haplotypecaller

* Add check for the location of specific dot files

* Updated changelog

* Updated version
@cimendes cimendes added the enhancement (New feature or request), engine, and bugfix labels Jun 18, 2019
@cimendes cimendes requested a review from tiagofilipe12 June 18, 2019 14:49
@codecov-io

codecov-io commented Jun 21, 2019

Codecov Report

Merging #209 into dev will increase coverage by 0.01%.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##              dev     #209      +/-   ##
==========================================
+ Coverage   41.95%   41.97%   +0.01%     
==========================================
  Files          72       72              
  Lines        6461     6464       +3     
==========================================
+ Hits         2711     2713       +2     
- Misses       3750     3751       +1
Impacted Files Coverage Δ
flowcraft/generator/error_handling.py 85% <ø> (ø) ⬆️
flowcraft/generator/components/variant_calling.py 100% <ø> (ø) ⬆️
flowcraft/generator/components/mapping.py 100% <ø> (ø) ⬆️
flowcraft/generator/inspect.py 10.47% <0%> (ø) ⬆️
flowcraft/templates/downsample_fastq.py 0% <0%> (ø) ⬆️
flowcraft/generator/engine.py 87.88% <100%> (+0.02%) ⬆️
flowcraft/flowcraft.py 60.62% <100%> (ø) ⬆️
flowcraft/tests/test_assemblerflow.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 890d54d...e430a3c.

Collaborator

@tiagofilipe12 tiagofilipe12 left a comment

@cimendes good job here. 👍 Left you some comments. The main thing is a suggestion to remove some duplication; the other is that you have code that seems to be fixing some component and is not related to this PR. While that's OK because the PR isn't that big, it is always better to keep a PR to its subject. Also, the changelog needs to include these additions.

os.mkdir(resources_dir)
outfile_tree_fork = open(os.path.join(resources_dir, "forkTree.json"), "w")
outfile_tree_fork.write(json.dumps(dict_viz))
outfile_tree_fork.close()
Collaborator

Hmm, interesting that you are keeping consistency here between both methods. But can you see that some duplication now exists between the two? Maybe you could write a function called something like write_dag_to_file and use it in both methods. You basically put everything inside that function and reuse it in both places:

def write_dag_to_file(self, file_name, dict_viz):
    resources_dir = os.path.join(dirname(self.nf_file), "resources")
    if not os.path.exists(resources_dir):
        os.mkdir(resources_dir)
    outfile_tree_fork = open(os.path.join(resources_dir, file_name), "w")
    outfile_tree_fork.write(json.dumps(dict_viz))
    outfile_tree_fork.close()

Or you could even go with `with open(...)`. Then you just call the function in both places. Something like:

write_dag_to_file('forkTree.json', dict_viz)
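
For reference, a minimal sketch of the `with open` variant mentioned above, under the same assumptions (the method lives on the engine class, and os, json and dirname are already imported there):

def write_dag_to_file(self, file_name, dict_viz):
    # Build the resources directory next to the generated .nf file
    resources_dir = os.path.join(dirname(self.nf_file), "resources")
    if not os.path.exists(resources_dir):
        os.mkdir(resources_dir)
    # The context manager closes the file even if the write fails
    with open(os.path.join(resources_dir, file_name), "w") as fh:
        fh.write(json.dumps(dict_viz))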

Collaborator

Check the other comments @cimendes, they should remove much of this duplication/boilerplate when creating these files.

def _dag_file_to_dict(self):
-    """Function that opens the dotfile named .treeDag.json in the current
-    working directory
+    """Function that opens the accessory named treeDag.json in the
Collaborator

What is "accessory"? Also notice that this method not only opens that file but also loads its content into a dict, so the docstring is incomplete. I know it was already incomplete before.
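
A possible rewording along those lines; the file location and return value described here are assumptions based on the rest of this discussion:

def _dag_file_to_dict(self):
    """Opens the treeDag.json file in the pipeline's resources directory
    and loads its contents into a dictionary.

    Returns
    -------
    dict
        The DAG description parsed from treeDag.json.
    """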

}

{{ forks }}
{{ forks }}
Collaborator

same here

Member Author

I don't get this comment? Same what?

Collaborator

@ODiogoSilva ODiogoSilva left a comment

Looks mostly good, but please check the comments about duplication and the directory checks. Also, the forkTree content is not correct.


outfile_dag = open(os.path.join(dirname(self.nf_file), output_file), "w")
if not os.path.exists(resources_dir):
Collaborator

Why do you check for the existence of this directory twice? It doesn't seem like it should be the responsibility of this function to worry about this. My suggestion is that this check can be made at a higher level, and here we assume that the directory already exists. Then this and the code below become simply the file-writing operation, without the check.
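
A rough sketch of that split of responsibilities; the write_json_to_file name is hypothetical, and placing the check in render_pipeline follows what the author describes further down:

def render_pipeline(self):
    # Create the resources directory once, before any DAG/fork files are written
    resources_dir = os.path.join(dirname(self.nf_file), "resources")
    if not os.path.exists(resources_dir):
        os.mkdir(resources_dir)
    ...

def write_json_to_file(self, file_name, dict_viz):
    # Assumes the resources directory already exists
    out_path = os.path.join(dirname(self.nf_file), "resources", file_name)
    with open(out_path, "w") as fh:
        fh.write(json.dumps(dict_viz))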

os.mkdir(resources_dir)
outfile_tree_fork = open(os.path.join(resources_dir, "forkTree.json"), "w")
outfile_tree_fork.write(json.dumps(dict_viz))
outfile_tree_fork.close()
Collaborator

Check the other comments @cimendes, they should remove much of this duplication/boilerplate when creating these files.

cimendes added 2 commits June 27, 2019 23:58
simplified dag and treefork file write in a single function
added suggestions in #209
@cimendes
Member Author

The verification was moved to the render_pipeline function, and the function that writes the JSON was made more general to accommodate both the forkTree.json and treeDag.json files. Thanks @ODiogoSilva for pointing out my very silly mistake!

Collaborator

@ODiogoSilva ODiogoSilva left a comment

LGTM

cimendes and others added 2 commits July 4, 2019 10:53
Co-Authored-By: Diogo Silva <[email protected]>
Co-Authored-By: Diogo Silva <[email protected]>
@cimendes cimendes merged commit c8a8574 into dev Jul 4, 2019
@cimendes cimendes deleted the DAG_files branch July 4, 2019 10:00
cimendes added a commit that referenced this pull request Sep 16, 2019
* Dag files (#209)

* move DAG JSON files to the resources directory

* added manifest information to the `nextflow.config` file to allow for remote execution (#204) - partially solves issue #194
- Deprecation of the `manifest.config` file

* Set phred encoding when it fails to be determined - trimmomatic (#211)

* fix bug publishdir (downsample_fastq component)

* add phred33 when encoding fails to be determined; if it still fails, retry with phred64 encoding (trimmomatic component)

* Fix downsample (#222)

* edited file names for downsample fastqs
* stringified depth for file name