Resample MixedModel object and refit #806

Open · joseah opened this issue Feb 28, 2025 · 1 comment
joseah commented Feb 28, 2025

Dear MixedModels.jl team,

I've implemented a non-parametric bootstrapping strategy for genetic analysis using MixedModels.jl. One thing I've noticed is that the MixedModelsPermutations package by @palday resamples residuals directly within the model (in-place), without having to regenerate the MixedModel object from scratch before each refit! in every bootstrap iteration. This results in a huge speed-up.

Is there any recommendation on how to do this kind of resampling of observations (essentially as if we were to pass a new resampled dataset to fit, but within the MixedModel object)? I've been attempting to resample the model.reterms attribute, but I was wondering whether there's a better way to take all the necessary attributes into account (e.g. X, Xymat, reterms). Is this feasible given the definition of the MixedModel type?
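For reference, the from-scratch baseline I'm trying to avoid looks roughly like this (untested sketch; df, the formula f, and niter are placeholders):

using DataFrames, MixedModels, Random, StatsBase

rng = MersenneTwister(1)
niter = 1000                          # number of bootstrap replicates
estimates = Vector{Vector{Float64}}()
for i in 1:niter
    # resample observations with replacement, then rebuild and fit from scratch
    rows = sample(rng, 1:nrow(df), nrow(df); replace=true)
    boot = fit(MixedModel, f, df[rows, :]; progress=false)
    push!(estimates, coef(boot))
end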

Many thanks,
Jose

palday commented Mar 3, 2025

You don't need to worry about X vs. Xymat -- X is a view into the latter.

I haven't tested this out, so I can't promise accuracy (and I know certain parts probably aren't quite right), but here's something between pseudo- and actual code:

using MixedModels, Random, StatsBase

# pre-allocate a few things
niter = 1000                        # number of bootstrap replicates
selection = collect(1:nobs(model))
scratch = deepcopy(model)
rng = MersenneTwister(1)

for i in 1:niter
    sample!(rng, 1:nobs(model), selection; replace=true)
    # careful use of `copy!` might be a bit faster here
    scratch.X .= @view model.X[selection, :]
    scratch.y .= @view model.y[selection]
    for (scratch_re, re) in zip(scratch.reterms, model.reterms)
        # the transposes are stored, so we need to select columns
        scratch_re.z .= @view(re.z[:, selection])
        scratch_re.adjA .= @view(re.adjA[:, selection])
    end
    # ...then something like refit!(scratch) and record the estimates
end

(The above example ignores the weighted case.)

This is assuming that you're sampling blindly without taking the blocking structure into account. That's a very naive way to sample entire observations (as opposed to residuals) and may (probably?) not give the inference you want; a block-level alternative is sketched after this list. Beyond that concern, it also destroys the sparsity pattern in the random effects, because the observations are no longer sorted by block. This has several knock-on effects:

  • The sparse matrices will be stored far less efficiently, which results in higher memory use and slower element access. The latter slows down everything.
  • The intermediate computations used in fitting a model will also have their sparsity patterns destroyed, which means that you'll either hit slower, more general methods or encounter cases that can't occur when the sparsity pattern is maintained, and thus get a MethodError because we haven't implemented the necessary code for those patterns.
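If the blocks themselves are what you want to resample, a cluster-level bootstrap on the original table sidesteps both problems, because each replicate is rebuilt and fitted from scratch with intact blocks. Equally untested sketch; df, the grouping column :subj, the formula f, and niter are all placeholders:

using DataFrames, MixedModels, Random, StatsBase

rng = MersenneTwister(1)
subjects = unique(df.subj)
estimates = Vector{Vector{Float64}}()
for i in 1:niter
    # draw whole blocks (subjects) with replacement
    chosen = sample(rng, subjects, length(subjects); replace=true)
    parts = map(enumerate(chosen)) do (j, s)
        part = df[df.subj .== s, :]   # indexing copies, so safe to relabel
        # give each draw a fresh level so repeated subjects stay distinct blocks
        part.subj = fill(string(s, "_", j), nrow(part))
        part
    end
    boot = reduce(vcat, parts)
    push!(estimates, coef(fit(MixedModel, f, boot; progress=false)))
end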
