
Potential issue in sparseL: number of nonzero elements seems incorrect #807

Closed
ajinkya-k opened this issue Mar 5, 2025 · 16 comments

Comments

@ajinkya-k
Contributor

ajinkya-k commented Mar 5, 2025

I noticed that the number of nonzero elements in sparseL does not match the sum of the numbers of nonzero elements in the individual components of L.

Here's an example:

using DataFrames, MixedModels

dat = DataFrame(MixedModels.dataset("insteval"))
describe(dat, :min, :max, :mean, :nunique, :eltype)

#
mfast = fit(
  MixedModel,
  @formula(
    y ~ 1 + service + (1 | d) + (1 | s) + (1 | dept) + (0 + service | dept)
  ),
  dat,
  progress=false    # suppress the display of a progress bar
)

sum(count.(!iszero, mfast.L))
# output: 774794

count(!iszero, sparseL(mfast; full=true))
# output: 774777

Am I missing something?

@ajinkya-k ajinkya-k changed the title Potential issue in sparseL Potential issue in sparseL: number of nonzero elements seems incorrect Mar 5, 2025
@ajinkya-k
Contributor Author

@dmbates am I missing something here?

@dmbates
Collaborator

dmbates commented Mar 14, 2025

If you look at A[3,3] and L[3,3] (these are the logical positions; the actual positions are A[6] and L[6]), the diagonal blocks corresponding to the dept random effects, you will see that A[6] is a UniformBlockDiagonal matrix, while L[6], which is regarded as LowerTriangular, is actually stored as a Matrix and has 14 non-zeros above the diagonal. The first step in creating L[3,3] from A[3,3] and \lambda_3 is to take each 2 by 2 block in A[3,3] (which is stored as an array of dimensions [2,2,14]), multiply it on the left by (\lambda_3)' and on the right by \lambda_3, and store the result in a square block on the diagonal of L[3,3]. The elements above the diagonal are ignored in the subsequent downdate and Cholesky factorization operations.

I sent you a link to a Google Colab notebook that shows these non-zeros above the diagonal. I'm a little surprised to see that the difference in the number of non-zeros is 17; I can only account for 14.
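That first step can be sketched in a few lines. The dimensions match the dept blocks described above, but the array contents and the \lambda_3 values here are made-up stand-ins, not taken from the fitted model:

```julia
using LinearAlgebra

# Stand-ins: A[3,3] stored as a 2×2×14 array, λ₃ a 2×2 lower-triangular factor
A33 = rand(2, 2, 14)
λ3  = LowerTriangular([1.0 0.0; 0.5 1.0])

# Dense storage for L[3,3], as described for the MixedModels blocks
L33 = zeros(28, 28)
for k in 1:14
    rows = (2k - 1):(2k)
    # scale each diagonal block: λ₃' * A_k * λ₃
    L33[rows, rows] = λ3' * A33[:, :, k] * λ3
end
# Each 2×2 diagonal block is now fully dense, so entries in its strict
# upper triangle are nonzero even though later steps ignore them.
```

Counting nonzeros in L33 as a raw Matrix then picks up those 14 scratch entries.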

@dmbates
Collaborator

dmbates commented Mar 14, 2025

The other three vestigial non-zeros above the diagonal are in L[4,4] (a.k.a. last(m1.L)).

@ajinkya-k
Contributor Author

ajinkya-k commented Mar 14, 2025

So this is a bug, right? If so, I have an implementation of sparseL with the right number of nonzeros that I can submit as a PR.

@dmbates
Collaborator

dmbates commented Mar 14, 2025

But the count from the current sparseL is the correct one. The difference from the other way of counting is that some of the elements in the strict upper triangle, which are ignored in later calculations, are used as scratch storage.

If it were important we could add code to zero out the elements above the diagonal in the diagonal blocks that are stored as dense matrices, but I don't see that as a priority.
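Such a fix would amount to a single in-place tril! on each dense diagonal block. A toy sketch with made-up values:

```julia
using LinearAlgebra

Lblock = [1.0 9.0;    # 9.0 plays the role of a leftover scratch value
          2.0 3.0]
tril!(Lblock)         # zero the strict upper triangle in place
count(!iszero, Lblock)  # 3: the counts would now agree
```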

@ajinkya-k
Contributor Author

Oh sorry, I read it the other way round.

@ajinkya-k
Contributor Author

So just to recap: the count in sparseL is correct, but the counts in the individual components of L are not, right?

@dmbates
Collaborator

dmbates commented Mar 14, 2025

If the blocks on the diagonal of L were of type LowerTriangular{Matrix{T}}, the components would be correct, too. The discrepancy is because we didn't type them that way, which would have involved using, e.g., L[3].data instead of L[3] when generating the values.

I feel that as long as we provide sparseL and its result is correct we shouldn't be too concerned if the indirect way of evaluating this gives misleading answers.
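The typing point can be seen directly: the same storage gives different nonzero counts with and without the wrapper. The matrix below is a made-up example:

```julia
using LinearAlgebra

M = [1.0 9.0;    # 9.0 is a scratch value above the diagonal
     2.0 3.0]
count(!iszero, M)                   # 4: the raw Matrix counts the scratch entry
count(!iszero, LowerTriangular(M))  # 3: the wrapper masks the upper triangle
```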

@ajinkya-k
Contributor Author

sounds good. In that case i will just close this issue.

@palday
Member

palday commented Mar 14, 2025

One final note here: some of the alternative storage formats we've considered would make this discrepancy go away (by storing only the lower triangle) or make this problem worse (by densely storing everything, as the internal 3-block L functionality does). While making it worse seems bad, it may actually be useful when it comes to moving parts of the evaluation to the GPU. See also #789

I've been thinking about how to apply some of the lessons I learned from refactoring to support multiple optimizers to a potential refactoring supporting different storage backends, but I haven't yet had a chance to spend time on the code.

@dmbates
Collaborator

dmbates commented Mar 15, 2025

Perhaps this should be a comment for #789, but I don't think the 3-block L approach will be viable for models fit to large data sets with multiple, partially crossed grouping factors, like the ml-32m example. The current fully-blocked version uses 6 blocks in L of the form

Diagonal
Sparse Dense
Dense Dense Dense

with the last row of blocks having only 2 rows. However, the L[2,1] block is of size 80,000 by 200,000 and it is not practical to store and manipulate it as a dense matrix. It is possible to append the third row of blocks to the second row of blocks, resulting in a 3-block representation, but I think that is all that can be expected in the way of simplifications.

Usually the most time-consuming step is "downdating" L[2,2] after L[2,1] is evaluated and that is simply because you need to form products of q_1 sparse vectors of length q_2 with themselves.
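That downdate can be sketched with toy dimensions; 200 and 80 here stand in for the real 200,000 and 80,000, and the values are random placeholders, not the ml-32m data:

```julia
using LinearAlgebra, SparseArrays

q1, q2 = 200, 80
L21 = sprand(q2, q1, 0.02)     # sparse off-diagonal block, ~2% nonzeros
A22 = Matrix(10.0I, q2, q2)    # toy stand-in for the block to be downdated

# Downdating forms the outer products of the q1 sparse columns of L21
# with themselves and subtracts them:
L22 = A22 - Matrix(L21 * L21')
```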

@ajinkya-k
Contributor Author

What are the 3 blocks in the 3-block L?

@dmbates
Collaborator

dmbates commented Mar 15, 2025

For something like the ml-32m model the sizes of the 6 blocks are

200,000 Diagonal
 80,000 Sparse     Dense
      2 Dense      Dense     Dense 

The 3-block representation would append the rows of L[3,1] to L[2,1] and the L[3,2] and L[3,3] blocks to L[2,2] resulting in

200,000 Diagonal
 80,002 Sparse  Dense

My point in the earlier comment is that you still need to use a sparse representation for L[2,1] because it is only about 0.2% non-zeros. As a dense matrix it would be nearly 120 GiB in size and evaluating L[2,1] * L[2,1]' would be a formidable task, even on a GPU.
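The 120 GiB figure can be checked directly from the block dimensions:

```julia
q2, q1 = 80_000, 200_000
bytes = q2 * q1 * 8    # Float64 storage for a dense 80,000 × 200,000 block
gib = bytes / 2^30     # ≈ 119.2 GiB
```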

@ajinkya-k
Contributor Author

Handy illustration of the three blocks using the L from insteval (block 1: white border; block 2: yellow; block 3: light green):

[Image: sparsity pattern of L for the insteval model, with the three blocks outlined]

@dmbates
Collaborator

dmbates commented Mar 15, 2025

Do you want to use that in place of the right hand panel in the current Figure 2? I think it might help to relate to Table 2.

@ajinkya-k
Contributor Author

Wrong thread, but let me check how I can do that without making it look asymmetric.
