Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize_multi_plate_design does not optimize within plate distribution of covariates #48

Open
julianesiebourg opened this issue May 2, 2024 · 3 comments

Comments

@julianesiebourg
Copy link
Collaborator

optimize_multi_plate_design does not really optimize within plate distribution of covariates. Instead, clusters form.

Replacing the parameters p = 1 and penalize_lines = "none" fixes the issue but these cannot be set in optimize_multi_plate_design

Example

samples <- structure(list(Subject.Number = c(1, 2, 3, 3, 4, 4, 5, 5, 6, 
7, 7, 8, 8, 9, 10, 10, 11, 12, 13, 13, 14, 14, 15, 15, 16, 17, 
17, 18, 18, 19, 19, 20, 21, 21, 22, 22, 23, 24, 25, 26, 26, 27, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
42, 43, 43, 44, 44, 45, 46, 46, 47, 48, 49, 50, 51, 52, 53, 54, 
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68), Gender = c("Male", 
"Male", "Female", "Female", "Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Female", "Female", "Female", "Female", "Female", "Male", "Male", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Female", "Female", "Male", "Female", "Male", "Male", 
"Male", "Female", "Male", "Male", "Male", "Female", "Female", 
"Female", "Female", "Male", "Female", "Female", "Male", "Female", 
"Male", "Male", "Male", "Female", "Female", "Male", "Male", "Male", 
"Female", "Female", "Male", "Male", "Female", "Male", "Male", 
"Male", "Male", "Female", "Female", "Female"), Timepoint = c("DAY 1", 
"DAY 1", "DAY 1", "DAY 168", "DAY 1", "DAY 168", "DAY 1", "DAY 112", 
"DAY 1", "DAY 1", "DAY 112", "DAY 1", "DAY 112", "DAY 1", "DAY 1", 
"DAY 112", "DAY 1", "DAY 1", "DAY 1", "DAY 168", "DAY 1", "DAY 112", 
"DAY 1", "DAY 112", "DAY 1", "DAY 1", "DAY 112", "DAY 1", "DAY 112", 
"DAY 1", "DAY 112", "DAY 1", "DAY 1", "DAY 112", "DAY 1", "DAY 112", 
"DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 112", "DAY 1", "DAY 112", 
"DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", 
"DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", 
"DAY 1", "DAY 168", "DAY 1", "DAY 112", "DAY 1", "DAY 112", "DAY 1", 
"DAY 1", "DAY 112", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", 
"DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", 
"DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", "DAY 1", 
"DAY 1", "DAY 1", "DAY 1"), Site = c("350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "350545", "350545", "350545", "350545", "350545", "350545", 
"350545", "356312", "356312", "356312", "356312", "356312", "356312", 
"356312", "356312", "356312", "356312", "356312", "356312", "356312", 
"356312", "356312", "356312", "356312", "356312", "356312", "356312", 
"356312", "356312", "356312", "356312", "356312", "356312", "356312", 
"356312", "356312", "356312", "356312", "356312", "356312", "356312", 
"356312"), Previous_treatment = c("Tx naive", "Pre-treated", 
"Tx naive", "Tx naive", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Tx naive", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Tx naive", "Tx naive", "Tx naive", "Tx naive", 
"Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", 
"Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", 
"Pre-treated", "Pre-treated", "Tx naive", "Tx naive", "Pre-treated", 
"Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", 
"Tx naive", "Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Pre-treated", "Tx naive", "Tx naive", "Pre-treated", 
"Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Pre-treated", "Tx naive", "Pre-treated", "Pre-treated", 
"Tx naive", "Tx naive", "Tx naive", "Tx naive", "Tx naive", NA, 
"Pre-treated", "Pre-treated", "Pre-treated", "Tx naive", "Tx naive", 
"Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", "Pre-treated", 
"Pre-treated", "Tx naive")), row.names = c(NA, -88L), class = "data.frame")

# make batch container
bc <- BatchContainer$new(
  dimensions = list(
    "plate" = 1,
    "row" = list(values =c(1:8)),
    "column" = list(values = c(1:11))
  )
)

# initial assignment 
bc <- assign_in_order(bc, samples)

# factors to balance
balance_variables <- c("Site", "Timepoint", "Previous_treatment", "Gender")

# running the wrapper
bc1 <- optimize_multi_plate_design(bc,
  within_plate_variables = balance_variables,
  plate = "plate",
  row = "row",
  column = "column",
  n_shuffle = 2,
  max_iter = 3000,
  quiet = TRUE
)

# plot site
print(plot_plate(bc$get_samples(remove_empty_locations = FALSE),
                   column = column, row = row,
         .color = Site)


# running it manually
# set scoring function for each balance variable
scoring_funcs <- purrr::map(
  balance_variables, 
  ~ mk_plate_scoring_functions(bc, row = "row", column = "column", group = .x, p = 1, penalize_lines = "none")
  ) %>% unlist()
names(scoring_funcs) <- balance_variables

bc2 <- optimize_design(
  bc,
  scoring = scoring_funcs,
  max_iter = 3000,
  quiet = TRUE,
  acceptance_func = accept_leftmost_improvement
)

# plot site
print(plot_plate(bc$get_samples(remove_empty_locations = FALSE),
                   column = column, row = row,
         .color = Site)
@julianesiebourg
Copy link
Collaborator Author

It seems that for euklidean distances (p = 2) higher distances are influencing the score more than smaller distances. In the extreme case this leads to the formation of clusters in the opposite corners of a plate. Since clusters are not lines, the line penalization does not help much here.
With manatten distance (p = 1) we see this problem less.

Maybe it would be better to go for a adjacency score (parametrized with "order of adjacency" e.g. n=1 direct neighbor, n = 2 second order neighbors).
This had been set up by Iakov in the past in an adhoc fashion: https://code.roche.com/PMDA/cross-project/opm/opm-macs-augenklinik-bern/-/blob/main/Olink%20Plate%20Layout/sample_randomization_March21_Iakov.Rmd

The score might also be influenced by how many levels a factor has. e.g. Sex with 2 levels vs a trial with 8 Sites.

@julianesiebourg
Copy link
Collaborator Author

A quick and dirty fix just sets the paramters correctly in the optimize_multi_plate_design function is here #49

@julianesiebourg
Copy link
Collaborator Author

Idea from @banfai:
Could we maximize the minimal distance across all pairs instead of maximizing the average distance?
This should give what we need, no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant