Allow custom names in transformations #1240
Comments
Sorry, I do not have plans to implement 100% custom names. You can kind of work around this by defining a custom grouping variable and feeding it to .id:
library(drake)
custom_ids <- c("method1_data1", "method2_data2")
plan <- drake_plan(
x = target( # Start with a short name.
my_fun(path = file_in(my_path)),
transform = map(
my_path = c(
"/nothing/interesting/METHOD1/stuff/DATA1",
"/nothing/interesting/METHOD2_BLA_BLA/stuff/DATA2"
),
custom_ids = !!custom_ids, # The !! is important.
.id = custom_ids,
)
)
)
plan
#> # A tibble: 2 x 2
#> target command
#> <chr> <expr>
#> 1 x_method1_dat… my_fun(path = file_in("/nothing/interesting/METHOD1/stuff/DATA…
#> 2 x_method2_dat… my_fun(path = file_in("/nothing/interesting/METHOD2_BLA_BLA/st…
drake_plan_source(plan)
#> drake_plan(
#> x_method1_data1 = my_fun(path = file_in("/nothing/interesting/METHOD1/stuff/DATA1")),
#> x_method2_data2 = my_fun(path = file_in("/nothing/interesting/METHOD2_BLA_BLA/stuff/DATA2"))
#> )
Created on 2020-04-21 by the reprex package (v0.3.0)
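One way to keep the ids aligned with the paths is to derive them from the paths rather than typing both vectors by hand. The following is a minimal base R sketch, not something from the thread: make_ids() is a hypothetical helper (not part of drake), and it assumes the method name is the third-from-last path component, cut at its first underscore.

```r
# Hypothetical helper (not part of drake): build short ids from the
# method and data directory names embedded in each path.
make_ids <- function(paths) {
  method <- basename(dirname(dirname(paths)))  # e.g. "METHOD2_BLA_BLA"
  data <- basename(paths)                      # e.g. "DATA2"
  method <- sub("_.*$", "", method)            # keep the text before the first underscore
  tolower(paste(method, data, sep = "_"))
}

my_paths <- c(
  "/nothing/interesting/METHOD1/stuff/DATA1",
  "/nothing/interesting/METHOD2_BLA_BLA/stuff/DATA2"
)
custom_ids <- make_ids(my_paths)
custom_ids
#> [1] "method1_data1" "method2_data2"
```

The two vectors could then presumably be spliced into the plan with my_path = !!my_paths and custom_ids = !!custom_ids, so that a change to the paths carries over to the ids.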
Hm, I was just about to write a similar feature request. The arguments I
with resulting target names
If that won't be possible, would you mind explaining how to work around this for
You can supply a
library(drake)
library(tidyverse)
grid <- expand_grid(
param1 = c("FirstLongVariableName", "SecondLongVariableName"),
param2 = c("LongVariableName_1", "LongVariableName_2")
) %>%
mutate(
id1 = recode(
param1,
FirstLongVariableName = "first",
SecondLongVariableName = "second"
)
) %>%
mutate(
id2 = recode(
param2,
LongVariableName_1 = "b1",
LongVariableName_2 = "b2"
)
)
grid
#> # A tibble: 4 x 4
#> param1 param2 id1 id2
#> <chr> <chr> <chr> <chr>
#> 1 FirstLongVariableName LongVariableName_1 first b1
#> 2 FirstLongVariableName LongVariableName_2 first b2
#> 3 SecondLongVariableName LongVariableName_1 second b1
#> 4 SecondLongVariableName LongVariableName_2 second b2
drake_plan(
model = target(
my_fun(param1, param2),
transform = map(.data = !!grid, .id = c(id1, id2))
)
)
#> # A tibble: 4 x 2
#> target command
#> <chr> <expr>
#> 1 model_first_b1 my_fun("FirstLongVariableName", "LongVariableName_1")
#> 2 model_first_b2 my_fun("FirstLongVariableName", "LongVariableName_2")
#> 3 model_second_b1 my_fun("SecondLongVariableName", "LongVariableName_1")
#> 4 model_second_b2 my_fun("SecondLongVariableName", "LongVariableName_2")
Created on 2020-04-21 by the reprex package (v0.3.0)
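For readers who would rather not load the tidyverse, the same grid can be sketched in base R with named lookup vectors. This is an illustrative alternative, not from the thread: short1 and short2 are names introduced here, and whether drake treats a plain data frame in .data exactly like a tibble is an assumption. Note that expand.grid() varies its first argument fastest, so the arguments are given in reverse to reproduce the row order shown above.

```r
# Lookup vectors: names are the long parameter values, values are the short ids.
short1 <- c(FirstLongVariableName = "first", SecondLongVariableName = "second")
short2 <- c(LongVariableName_1 = "b1", LongVariableName_2 = "b2")

# expand.grid() varies the first argument fastest, so list param2 first
# and then put the columns back in the desired order.
grid <- expand.grid(
  param2 = names(short2),
  param1 = names(short1),
  stringsAsFactors = FALSE
)
grid <- grid[, c("param1", "param2")]

# Translate each long value into its short id.
grid$id1 <- unname(short1[grid$param1])
grid$id2 <- unname(short2[grid$param2])
grid
```

Keeping the long-to-short mapping in one named vector per parameter means a new parameter value only has to be added in one place.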
Thank you very much for the example, @wlandau! This works, but I had to use
Yes, thanks for these hacks @wlandau.
I let it sit for a while and changed my mind because implementation turned out to be trivially easy. Implemented in 312e124.
library(drake)
drake_plan(
x = target(
f(x),
transform = map(x = !!seq_len(2), .names = c("a", "b"))
),
y = target(
f(w, x),
transform = cross(
w = !!seq_len(2),
x,
.names = c("aa", "ab", "ba", "bb")
)
),
z = target(
g(y),
transform = map(y)
),
final = target(
h(z),
transform = combine(z, .by = x, .names = c("final1", "final2"))
)
)
#> # A tibble: 12 x 2
#> target command
#> <chr> <expr>
#> 1 final1 h(z_aa, z_ab)
#> 2 final2 h(z_ba, z_bb)
#> 3 a f(1L)
#> 4 b f(2L)
#> 5 aa f(1L, a)
#> 6 ab f(2L, a)
#> 7 ba f(1L, b)
#> 8 bb f(2L, b)
#> 9 z_aa g(aa)
#> 10 z_ab g(ab)
#> 11 z_ba g(ba)
#> 12 z_bb g(bb)
Created on 2020-04-23 by the reprex package (v0.3.0)
I had considered this feature before, but I resisted on general principles. It breaks the abstraction of the interface, which is usually unwise from a software development perspective. I have made the mistake of being too eager to add flexibility over the last 3-ish years of developing drake. If I could go back and develop the whole package from scratch, I would recommend clever workarounds for as long as possible instead of immediately implementing new features. For example, triggers was not a good idea in hindsight. The implementation was extremely complicated to create and even more burdensome to maintain, and I now strongly suspect the most important functionality can be completely covered with global objects, For custom names, it took a go at implementation to see this, but it turned out to have less technical debt than I originally expected.
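Because .names takes one name per generated target, a quick sanity check on the vector before splicing it into the plan can catch simple slips. The sketch below is base R only; check_target_names() is a hypothetical helper (not part of drake), and the rule that names should be unique, syntactic R names is an assumption made here. It checks count, uniqueness, and validity, not the order in which drake expands the targets.

```r
# Hypothetical helper (not part of drake): sanity-check a name vector
# before splicing it into a transformation with !!.
check_target_names <- function(target_names, n_expected) {
  stopifnot(is.character(target_names))
  if (length(target_names) != n_expected) {
    stop("expected ", n_expected, " names, got ", length(target_names))
  }
  dup <- unique(target_names[duplicated(target_names)])
  if (length(dup) > 0) {
    stop("duplicated names: ", paste(dup, collapse = ", "))
  }
  bad <- target_names[target_names != make.names(target_names)]
  if (length(bad) > 0) {
    stop("not syntactic R names: ", paste(bad, collapse = ", "))
  }
  invisible(target_names)
}

# Mirrors the sizes in the example above: 2 targets from map(),
# then 2 x 2 targets from cross().
w_values <- seq_len(2)
x_names <- check_target_names(c("a", "b"), n_expected = 2)
y_names <- check_target_names(
  c("aa", "ab", "ba", "bb"),
  n_expected = length(w_values) * length(x_names)
)
```

The checked vectors could then be passed along as .names = !!x_names and .names = !!y_names.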
Thank you so much for implementing custom names, @wlandau!! I like the flexibility, but I wonder whether it might also be more error prone as one now has to think through and double check the order of arguments and expansions to provide a proper name vector. Especially, when changing the code (e.g., swapping of arguments), this could easily lead to working with "incorrectly" named targets. Personally, I am only interested in modifying
which gives the same result as the
I share your concerns, and this was part of my initial resistance. I recommend avoiding
Fundamental changes to behavior get tricky because
This is all true! I think I will stay with using
I will keep using Here, I'm trying to prepend the previous names:
plan <- drake::drake_plan(
# Read haplotype seqs and clean their names
haps = target(
read_haps(path = drake::file_in(path)),
transform = map(
path = c(
"/home/lejno/Desktop/aBayesQR-nf/out_bwa_001/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf001/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf01/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t10_tf01/haplotypes"
),
.names = c("abqr_001",
"csnv_t5_tf001",
"csnv_t5_tf01",
"csnv_t10_tf01")
)
),
# Align the haps using Muscle, ClustalO and ClustalW with default settings
aln = target(
msa::msa(inputSeqs = haps, method = methods),
transform = cross(
haps,
methods = c("Muscle", "ClustalOmega", "ClustalW"),
.names = outer(
c("Muscle", "ClustO", "ClustW"),
haps,
paste,
sep = "_"
) %>% as.vector()
)
)
)
#> Error in outer(c("Muscle", "ClustO", "ClustW"), haps, paste, sep = "_") :
#> object 'haps' not found
Of course, I can just copy the first
You can work with external variables to support your plan, such as the
library(drake)
hap_names <- c(
"abqr_001",
"csnv_t5_tf001",
"csnv_t5_tf01",
"csnv_t10_tf01"
)
plan <- drake_plan(
haps = target(
read_haps(path = drake::file_in(path)),
transform = map(
path = c(
"/home/lejno/Desktop/aBayesQR-nf/out_bwa_001/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf001/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf01/haplotypes",
"/home/lejno/Desktop/cliqueSNV-nf/out_t10_tf01/haplotypes"
),
.names = !!hap_names
)
),
aln = target(
msa::msa(inputSeqs = haps, method = methods),
transform = cross(
haps,
methods = c("Muscle", "ClustalOmega", "ClustalW"),
.names = as.vector(outer(
c("Muscle", "ClustO", "ClustW"),
!!hap_names,
paste,
sep = "_"
))
)
)
)
drake_plan_source(plan)
#> drake_plan(
#> Muscle_abqr_001 = msa::msa(inputSeqs = abqr_001, method = "Muscle"),
#> ClustO_abqr_001 = msa::msa(inputSeqs = abqr_001, method = "ClustalOmega"),
#> ClustW_abqr_001 = msa::msa(inputSeqs = abqr_001, method = "ClustalW"),
#> Muscle_csnv_t5_tf001 = msa::msa(inputSeqs = csnv_t5_tf001, method = "Muscle"),
#> ClustO_csnv_t5_tf001 = msa::msa(inputSeqs = csnv_t5_tf001, method = "ClustalOmega"),
#> ClustW_csnv_t5_tf001 = msa::msa(inputSeqs = csnv_t5_tf001, method = "ClustalW"),
#> Muscle_csnv_t5_tf01 = msa::msa(inputSeqs = csnv_t5_tf01, method = "Muscle"),
#> ClustO_csnv_t5_tf01 = msa::msa(inputSeqs = csnv_t5_tf01, method = "ClustalOmega"),
#> ClustW_csnv_t5_tf01 = msa::msa(inputSeqs = csnv_t5_tf01, method = "ClustalW"),
#> Muscle_csnv_t10_tf01 = msa::msa(inputSeqs = csnv_t10_tf01, method = "Muscle"),
#> ClustO_csnv_t10_tf01 = msa::msa(inputSeqs = csnv_t10_tf01, method = "ClustalOmega"),
#> ClustW_csnv_t10_tf01 = msa::msa(inputSeqs = csnv_t10_tf01, method = "ClustalW"),
#> abqr_001 = read_haps(path = drake::file_in("/home/lejno/Desktop/aBayesQR-nf/out_bwa_001/haplotypes")),
#> csnv_t5_tf001 = read_haps(path = drake::file_in("/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf001/haplotypes")),
#> csnv_t5_tf01 = read_haps(path = drake::file_in("/home/lejno/Desktop/cliqueSNV-nf/out_t5_tf01/haplotypes")),
#> csnv_t10_tf01 = read_haps(path = drake::file_in("/home/lejno/Desktop/cliqueSNV-nf/out_t10_tf01/haplotypes"))
#> )
Created on 2020-04-24 by the reprex package (v0.3.0)
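The name vector in this plan relies on as.vector() flattening the outer() matrix column by column, so the method label varies fastest within each haplotype set. That matches the target order in the output above. A small base R sketch of that ordering, with expand.grid() as an equivalent construction (methods_short is a name introduced here for illustration):

```r
methods_short <- c("Muscle", "ClustO", "ClustW")
hap_names <- c("abqr_001", "csnv_t5_tf001", "csnv_t5_tf01", "csnv_t10_tf01")

# outer() returns a 3 x 4 matrix (methods in rows, haplotype sets in columns);
# as.vector() reads it column by column.
aln_names <- as.vector(outer(methods_short, hap_names, paste, sep = "_"))
head(aln_names, 4)
#> [1] "Muscle_abqr_001"      "ClustO_abqr_001"      "ClustW_abqr_001"
#> [4] "Muscle_csnv_t5_tf001"

# expand.grid() also varies its first argument fastest, so it gives the same order.
g <- expand.grid(method = methods_short, hap = hap_names, stringsAsFactors = FALSE)
identical(aln_names, paste(g$method, g$hap, sep = "_"))
#> [1] TRUE
```

If the order of the arguments to outer() were swapped, the names would still be valid but would be attached to different commands, so it is worth comparing the result of drake_plan_source() against the intended pairing after any such change.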
Prework
drake's code of conduct.
Proposal
I'd like to be able to explicitly specify names of targets created with map() or cross().
I don't want to have to rely on variable names (see #1220) or remember numbers (.id = FALSE), but simply supply a character vector with my own meaningful names.
Something like this:
Having short meaningful names would help in downstream prototyping or exploratory analysis. Currently I have to do
short_name <- readd("a_la_la_la_la_long_long_li_long_long_name")
for each target.