Add metabuli/build module #8148
Merged
Commits
41 commits
f843ae2 add metabuli/build module (pawelciurkaardigen)
a044140 add test for metabuli/build (pawelciurkaardigen)
791b118 tests (pawelciurkaardigen)
689d72f update tests: pass accession2taxid from `prokaryotypes` directory, up… (pawelciurkaardigen)
1246ee7 tackling some linting errors (pawelciurkaardigen)
316caeb filling in meta.yml (pawelciurkaardigen)
dcb6589 Merge branch 'master' into metabuli_build (pawelciurkaardigen)
78f920d don't validate split file for md5 (pawelciurkaardigen)
a9846b1 Merge remote-tracking branch 'origin/metabuli_build' into metabuli_build (pawelciurkaardigen)
df482de remove whitespaces (pawelciurkaardigen)
aab47d9 address few code review comments (pawelciurkaardigen)
c4b6fb1 Merge branch 'master' into metabuli_build (pawelciurkaardigen)
a0bc3e1 Merge branch 'master' into metabuli_build (jfy133)
8d382cc replace realpath -s with echo as -s is not available in busybox image (pawelciurkaardigen)
e56e767 populate stub section with output databse files (pawelciurkaardigen)
eae4a0a populate stub section with output databse files (pawelciurkaardigen)
c1e6b80 Merge branch 'metabuli_build' of github.com:nf-core/modules into meta… (pawelciurkaardigen)
3575fc1 Added a test with two input assemblies, updated fasta input description (pawelciurkaardigen)
4a5792b add --cds-info input handling (pawelciurkaardigen)
bb22691 add --cds-info input handling (pawelciurkaardigen)
b60d0ad Apply suggestion from @jfy133 (pawelciurkaardigen)
3764d52 Update modules/nf-core/metabuli/build/meta.yml (sofstam)
5b716aa Merge branch 'master' into metabuli_build (sofstam)
6149ca1 Add topics
9331317 Update tests
77d5450 Merge branch 'master' into metabuli_build (sofstam)
bcc41f7 Fix linting
9520585 Fix stub test
0579aa5 Merge branch 'master' into metabuli_build (sofstam)
4656534 Merge branch 'master' into metabuli_build (sofstam)
84a4ba1 Update snapshots
2829908 Update modules/nf-core/metabuli/build/meta.yml (sofstam)
0dc43ef Merge branch 'master' into metabuli_build (sofstam)
d794913 Merge branch 'master' into metabuli_build (sofstam)
35bb943 Merge branch 'master' into metabuli_build (sofstam)
7ce7f54 Fix linting
70ec4e1 Fix linting
32344a6 Merge branch 'master' into metabuli_build (sofstam)
09a007e Merge branch 'master' into metabuli_build (sofstam)
923df22 Merge branch 'master' into metabuli_build (sofstam)
44a2b82 Merge branch 'master' into metabuli_build (sofstam)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| --- | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json | ||
| channels: | ||
| - conda-forge | ||
| - bioconda | ||
| dependencies: | ||
| - "bioconda::metabuli=1.1.1" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| process METABULI_BUILD { | ||
| tag "$meta.id" | ||
| label 'process_medium' | ||
| conda "${moduleDir}/environment.yml" | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/metabuli:1.1.1--pl5321h0bb26bb_0': | ||
| 'biocontainers/metabuli:1.1.1--pl5321h0bb26bb_0' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(fasta) | ||
| path taxonomy_names, stageAs: 'taxonomy/names.dmp' | ||
| path taxonomy_nodes, stageAs: 'taxonomy/nodes.dmp' | ||
| path taxonomy_merged, stageAs: 'taxonomy/merged.dmp' | ||
| path accession2taxid, stageAs: 'taxonomy/*' | ||
| path cds_info | ||
|
|
||
| output: | ||
| tuple val(meta), path("$prefix"), emit: db | ||
| tuple val("${task.process}"), val('metabuli'), eval('metabuli 2>&1 | awk \'/metabuli Version:/ {print $3}\''), emit: versions_metabuli, topic: versions | ||
|
|
||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| def make_merged = taxonomy_merged ? "" : "touch taxonomy/merged.dmp" | ||
|
|
||
| def cds_info_arg = cds_info ? "--cds-info cds_info.txt" : "" | ||
|
|
||
| """ | ||
| $make_merged | ||
| echo $fasta | tr ' ' '\\n' > fasta.txt | ||
| echo $cds_info | tr ' ' '\\n' > cds_info.txt | ||
|
|
||
| metabuli build \\ | ||
| "${prefix}" \\ | ||
| fasta.txt \\ | ||
| $accession2taxid \\ | ||
| --taxonomy-path taxonomy \\ | ||
| --max-ram ${task.memory.toGiga()} \\ | ||
|
|
||
| --threads ${task.cpus} \\ | ||
| ${cds_info_arg} \\ | ||
| $args | ||
| """ | ||
|
|
||
| stub: | ||
| def args = task.ext.args ?: '' | ||
| prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| """ | ||
| mkdir -p "$prefix" | ||
|
|
||
| touch "$prefix/acc2taxid.map" | ||
| touch "$prefix/diffIdx" | ||
| touch "$prefix/info" | ||
| touch "$prefix/split" | ||
| touch "$prefix/taxID_list" | ||
| touch "$prefix/db.parameters" | ||
| """ | ||
| } | ||
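The script block above turns the staged inputs into list files before calling `metabuli build`. The following is a minimal shell sketch of that step outside Nextflow, assuming two hypothetical FASTA names and no CDS files; the variable values are stand-ins for what Nextflow would interpolate, not part of the module.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the values Nextflow interpolates into the script.
fasta="genome_a.fasta genome_b.fna.gz"   # space-separated staged file names
cds_info=""                              # empty when no CDS files are supplied

# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# One path per line, as in the module's script block.
# $fasta is left unquoted on purpose so the shell splits it on spaces.
echo $fasta | tr ' ' '\n' > fasta.txt
echo $cds_info | tr ' ' '\n' > cds_info.txt

# Mirror the Groovy ternary: pass --cds-info only when CDS files were given.
if [ -n "$cds_info" ]; then
    cds_info_arg="--cds-info cds_info.txt"
else
    cds_info_arg=""
fi

cat fasta.txt
echo "cds_info_arg='${cds_info_arg}'"
```

In the module itself the newline is written as `'\\n'` because the Groovy string is unescaped once before the shell sees it.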
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| name: "metabuli_build" | ||
| description: Builds a database for classification with metabuli from FASTA files | ||
| and a taxonomy | ||
| keywords: | ||
| - database | ||
|
|
||
| - taxonomic classification | ||
| - classification | ||
| - metagenomics | ||
| tools: | ||
| - "metabuli": | ||
| description: "Metabuli: specific and sensitive metagenomic classification via | ||
| joint analysis of DNA and amino acid" | ||
| homepage: "https://github.com/steineggerlab/Metabuli" | ||
| documentation: "https://github.com/steineggerlab/Metabuli" | ||
| tool_dev_url: "https://github.com/steineggerlab/Metabuli" | ||
| doi: "10.1101/2023.05.31.543018" | ||
| licence: | ||
| - "GPL v3" | ||
| identifier: biotools:metabuli | ||
| input: | ||
| - - meta: | ||
| type: map | ||
| description: | | ||
| Groovy Map containing sample information | ||
| - fasta: | ||
| type: file | ||
| description: List of fasta files with input assemblies | ||
| ontologies: [] | ||
| - taxonomy_names: | ||
| type: file | ||
| description: File describing individual members of a taxonomic tree in NCBI | ||
| names.dmp format | ||
| ontologies: [] | ||
| - taxonomy_nodes: | ||
| type: file | ||
| description: File describing parent-child relationships of a taxonomic tree | ||
| in NCBI nodes.dmp format | ||
| ontologies: [] | ||
| - taxonomy_merged: | ||
| type: file | ||
| description: Optional input to map old/deprecated TaxID to new ones | ||
| ontologies: [] | ||
|
|
||
| - accession2taxid: | ||
| type: file | ||
| description: TSV file (with no header) mapping each accession (first | ||
| column, taken from the first part of each fasta entry) to its | ||
| corresponding TaxID (second column) | ||
| - cds_info: | ||
| type: file | ||
| description: Optional list of CDS files; their paths are written to a list file passed to --cds-info | ||
| ontologies: [] | ||
|
|
||
| output: | ||
| db: | ||
| - - meta: | ||
| type: map | ||
| description: Groovy Map containing sample information | ||
| - $prefix: | ||
| type: directory | ||
| description: metabuli database directory for classification | ||
| versions_metabuli: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - metabuli: | ||
| type: string | ||
| description: The name of the tool | ||
| - metabuli 2>&1 | awk '/metabuli Version:/ {print $3}': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| topics: | ||
| versions: | ||
| - - ${task.process}: | ||
| type: string | ||
| description: The name of the process | ||
| - metabuli: | ||
| type: string | ||
| description: The name of the tool | ||
| - metabuli 2>&1 | awk '/metabuli Version:/ {print $3}': | ||
| type: eval | ||
| description: The expression to obtain the version of the tool | ||
| authors: | ||
| - "@pawelciurkaardigen" | ||
| - "@MichalStachowiakArdigen" | ||
| - "@sofstam" | ||
| maintainers: | ||
| - "@pawelciurkaardigen" | ||
| - "@MichalStachowiakArdigen" | ||
| - "@sofstam" | ||
|
|
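The `eval` entry documented above extracts the tool version by filtering the banner that `metabuli` prints. As a rough illustration of what that filter does, the sketch below runs the same `awk` expression against a mocked command; `mock_metabuli` and its banner text are hypothetical stand-ins, and the real banner may differ apart from containing a `metabuli Version:` line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real binary. The banner layout is an assumption;
# only the "metabuli Version: <x.y.z>" line matters for the filter below.
mock_metabuli() {
    {
        echo "placeholder banner line"
        echo "metabuli Version: 1.1.1"
        echo "placeholder usage line"
    } >&2
}

# Same filter as the module's eval() expression: merge stderr into stdout,
# keep the line containing "metabuli Version:", and print its third field.
version="$(mock_metabuli 2>&1 | awk '/metabuli Version:/ {print $3}')"
echo "$version"
```

The `2>&1` is kept in the sketch because the module's expression redirects stderr as well, which suggests the banner is not guaranteed to arrive on stdout.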
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,171 @@ | ||
| nextflow_process { | ||
|
|
||
| name "Test Process METABULI_BUILD" | ||
| script "../main.nf" | ||
| process "METABULI_BUILD" | ||
|
|
||
| tag "modules" | ||
| tag "modules_nfcore" | ||
| tag "metabuli" | ||
| tag "metabuli/build" | ||
|
|
||
| test("sarscov2 - sarscov2 DNA - no cds info") { | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id:'test' ], | ||
| [ | ||
| file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), | ||
| ] | ||
| ] | ||
| input[1] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/names.dmp', checkIfExists: true) | ||
| input[2] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/nodes.dmp', checkIfExists: true) | ||
| input[3] = [] | ||
| input[4] = file(params.modules_testdata_base_path + 'genomics/prokaryotes/metagenome/taxonomy/accession2taxid/nucl_gb.accession2taxid', checkIfExists: true) | ||
| input[5] = [] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assertAll( | ||
| { assert process.success }, | ||
| { assert snapshot( | ||
| path("${process.out.db[0][1]}/acc2taxid.map"), | ||
| path("${process.out.db[0][1]}/diffIdx"), | ||
| path("${process.out.db[0][1]}/info"), | ||
| file("${process.out.db[0][1]}/split").name, | ||
| path("${process.out.db[0][1]}/taxID_list"), | ||
| file("${process.out.db[0][1]}/db.parameters").name, | ||
|
|
||
| process.out.findAll { key, val -> key.startsWith("versions")} | ||
| ).match() | ||
| } | ||
| ) | ||
| } | ||
| } | ||
|
|
||
| test("sarscov2 - sarscov2 DNA - two input fasta") { | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id:'test' ], | ||
| [ | ||
| file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), | ||
| file(params.modules_testdata_base_path + "genomics/prokaryotes/haemophilus_influenzae/genome/genome.fna.gz", checkIfExists: true), | ||
| ] | ||
| ] | ||
| input[1] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/names.dmp', checkIfExists: true) | ||
| input[2] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/nodes.dmp', checkIfExists: true) | ||
| input[3] = [] | ||
| input[4] = file(params.modules_testdata_base_path + 'genomics/prokaryotes/metagenome/taxonomy/accession2taxid/nucl_gb.accession2taxid', checkIfExists: true) | ||
| input[5] = [] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assertAll( | ||
| { assert process.success }, | ||
| { assert snapshot( | ||
| path("${process.out.db[0][1]}/acc2taxid.map"), | ||
| path("${process.out.db[0][1]}/diffIdx"), | ||
| path("${process.out.db[0][1]}/info"), | ||
| file("${process.out.db[0][1]}/split").name, | ||
| path("${process.out.db[0][1]}/taxID_list"), | ||
| file("${process.out.db[0][1]}/db.parameters").name, | ||
| process.out.findAll { key, val -> key.startsWith("versions")} | ||
| ).match() | ||
| } | ||
| ) | ||
| } | ||
|
|
||
| } | ||
|
|
||
| test("sarscov2 - sarscov2 DNA - with cds info") { | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id:'test' ], | ||
| [ | ||
| file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), | ||
| ] | ||
| ] | ||
| input[1] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/names.dmp', checkIfExists: true) | ||
| input[2] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/nodes.dmp', checkIfExists: true) | ||
| input[3] = [] | ||
| input[4] = file(params.modules_testdata_base_path + 'genomics/prokaryotes/metagenome/taxonomy/accession2taxid/nucl_gb.accession2taxid', checkIfExists: true) | ||
| input[5] = [ | ||
| file(params.modules_testdata_base_path + 'genomics/prokaryotes/bacteroides_fragilis/genome/genome.fna.gz', checkIfExists: true) | ||
| ] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assertAll( | ||
| { assert process.success }, | ||
| { assert snapshot( | ||
| path("${process.out.db[0][1]}/acc2taxid.map"), | ||
| path("${process.out.db[0][1]}/diffIdx"), | ||
| path("${process.out.db[0][1]}/info"), | ||
| file("${process.out.db[0][1]}/split").name, | ||
| path("${process.out.db[0][1]}/taxID_list"), | ||
| file("${process.out.db[0][1]}/db.parameters").name, | ||
| process.out.findAll { key, val -> key.startsWith("versions")} | ||
| ).match() | ||
| } | ||
| ) | ||
| } | ||
|
|
||
| } | ||
|
|
||
| test("sarscov2 - sarscov2 DNA - with cds info - stub") { | ||
|
|
||
| options "-stub" | ||
|
|
||
| when { | ||
| process { | ||
| """ | ||
| input[0] = [ | ||
| [ id:'test' ], | ||
| [ | ||
| file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), | ||
| ] | ||
| ] | ||
| input[1] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/names.dmp', checkIfExists: true) | ||
| input[2] = file(params.modules_testdata_base_path +'genomics/prokaryotes/metagenome/taxonomy/taxdmp/nodes.dmp', checkIfExists: true) | ||
| input[3] = [] | ||
| input[4] = file(params.modules_testdata_base_path + 'genomics/prokaryotes/metagenome/taxonomy/accession2taxid/nucl_gb.accession2taxid', checkIfExists: true) | ||
| input[5] = [ | ||
| file(params.modules_testdata_base_path + 'genomics/prokaryotes/bacteroides_fragilis/genome/genome.fna.gz', checkIfExists: true) | ||
| ] | ||
| """ | ||
| } | ||
| } | ||
|
|
||
| then { | ||
| assertAll( | ||
| { assert process.success }, | ||
| { assert snapshot( | ||
| path("${process.out.db[0][1]}/acc2taxid.map"), | ||
| path("${process.out.db[0][1]}/diffIdx"), | ||
| path("${process.out.db[0][1]}/info"), | ||
| file("${process.out.db[0][1]}/split").name, | ||
| path("${process.out.db[0][1]}/taxID_list"), | ||
| file("${process.out.db[0][1]}/db.parameters").name, | ||
| process.out.versions_metabuli | ||
| ).match() | ||
| } | ||
| ) | ||
| } | ||
|
|
||
| } | ||
|
|
||
|
|
||
| } | ||
|
|
||
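The stub test above snapshots six files inside the database directory. A minimal sketch of what the stub section produces, assuming the prefix resolves to `test` (the `id` used in the tests) and run in a scratch directory rather than a Nextflow work directory:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed prefix: the tests use id 'test' and no ext.prefix override.
prefix="test"

# Scratch directory standing in for the Nextflow task work directory.
workdir="$(mktemp -d)"
cd "$workdir"

# Same placeholder files the stub section touches.
mkdir -p "$prefix"
for name in acc2taxid.map diffIdx info split taxID_list db.parameters; do
    touch "$prefix/$name"
done

ls -1 "$prefix"
```

Every file is empty, so a stub run only confirms that the expected output names exist, not that `metabuli build` produced a usable database.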