
Simplify and add more unit tests #42

davidssmith opened this issue Jan 10, 2018 · 5 comments

@davidssmith (Owner)

I'm not entirely happy with the number of dependencies and helper functions for what is essentially a glorified non-linear least squares fitting code. I would be happier if it were distilled to its essential functions and all of those functions were unit tested on small, known data sets, rather than the QIBA phantom.

To start the discussion, I propose:

  • removing the command-line arguments
  • removing parallel processing (the new algorithms are so fast that the benefit is minimal)
  • moving demo and validate to test/
  • replacing the jacobian from Calculus.jl with a simpler, inline expression
  • removing plotting and leaving it to the user, or using something simpler, like UnicodePlots

These are just ideas. I'll think about it for a while and take any comments the community has.
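To make the inline-jacobian idea concrete, here is a minimal sketch for a toy mono-exponential model (not one of the package's actual models): the partial derivatives are written out by hand and checked against a central finite difference.

```julia
# Toy model f(t) = a*exp(-k*t); the package's real models would need their
# own partial derivatives worked out the same way.
f(t, a, k) = a * exp(-k * t)

# Inline jacobian: columns are the partials ∂f/∂a and ∂f/∂k at each time point.
jac(t, a, k) = hcat(exp.(-k .* t), -a .* t .* exp.(-k .* t))

t = collect(0.0:0.5:3.0)
J = jac(t, 2.0, 0.7)

# Sanity check ∂f/∂k against a central finite difference.
h = 1e-6
fd_k = (f.(t, 2.0, 0.7 + h) .- f.(t, 2.0, 0.7 - h)) ./ (2h)
maximum(abs.(J[:, 2] .- fd_k))   # tiny (finite-difference truncation error)
```

A hand-written jacobian like this avoids the Calculus.jl dependency entirely for models whose partials are easy to derive.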

@davidssmith (Owner, Author)

Since the actual in vivo data set has no known ground truth, perhaps it should be removed and the QIBA phantom should be the demo. Then the validation would occur on a handful of fake voxels with known parameter values that are checked in runtests.jl.
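A fake-voxel check along those lines could look like this sketch, where the Tofts curve is generated by a discrete convolution with a toy AIF (not the package's actual implementation) and a brute-force grid search stands in for the real NLS fit:

```julia
# Standard Tofts model via a simple discrete convolution; toy AIF, not
# a physiological one. Hypothetical helper, not the package API.
function tofts(t, Cp, Ktrans, ve)
    kep = Ktrans / ve
    dt = t[2] - t[1]
    Ct = similar(t)
    for i in eachindex(t)
        s = 0.0
        for j in 1:i                       # Ktrans * (Cp ⊗ exp(-kep*t))
            s += Cp[j] * exp(-kep * (t[i] - t[j]))
        end
        Ct[i] = Ktrans * dt * s
    end
    return Ct
end

t  = collect(0.0:0.1:5.0)
Cp = exp.(-t)                              # toy AIF
Ct = tofts(t, Cp, 0.25, 0.1)               # known ground truth

# Brute-force grid search stands in for the real fit in this sketch.
Ks = 0.05:0.05:0.5
vs = 0.05:0.05:0.5
best = argmin([sum(abs2, Ct .- tofts(t, Cp, K, v)) for K in Ks, v in vs])
Khat, vhat = Ks[best[1]], vs[best[2]]      # recovers (0.25, 0.1)
```

A runtests.jl entry would then just assert that the fitted parameters match the known values to within tolerance.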

@davidssmith (Owner, Author)

In fact, the nightly build seems to be failing due to a bug in ArgParse, a dependency we could in theory remove.

@notZaki (Contributor) commented Jan 19, 2018

I've never used the command line version, so can't really comment on that.

The parallel processing (nlsfit()) has been useful to me in the past when working with simulation data. But yeah, it's not really something I'd use for clinical data unless batch processing a big collection. Is there a significant overhead to keeping this feature?

Yes to moving demo and validate.

I have no opinion on jacobian. Just curious if ForwardDiff.jl is any better than Calculus.jl.

The plotting code doesn't seem to be causing any problems. It'll even be out of sight if demo/validate are moved to test.


I think it'll be useful for each model to have its own fitting function - e.g. fitTofts(), fitExtTofts(), etc. The 'numbered model paradigm' worked well for a sequence of nested models, and I guess it is convenient for users that want a pre-configured fitting pipeline, but it is awkward to add new models to it.
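One way to get per-model fitting functions without the numbered-model switch is to dispatch on model types. This is just a shape sketch with stub fits, all names hypothetical, not the package's API:

```julia
# Hypothetical per-model interface: fit() dispatches on the model type, so
# adding a new model means adding a type and two methods, not editing a switch.
abstract type DCEModel end
struct Tofts    <: DCEModel end
struct ExtTofts <: DCEModel end

paramnames(::Tofts)    = (:Ktrans, :ve)
paramnames(::ExtTofts) = (:Ktrans, :ve, :vp)

# Stub: a real method would run the NLS fit for that model's equation.
fit(m::DCEModel, t, Cp, Ct) = zeros(length(paramnames(m)))

length(fit(ExtTofts(), 0:5, zeros(6), zeros(6)))   # 3 parameters back
```

Each model's fit can then be unit tested directly, without going through any top-level pipeline function.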

Another change - which might be unfeasible - is to reduce the size of the package by removing unnecessary files, such as the DICOM files of the QIBA phantoms. The problem there is that git tracks the files, so they're never truly gone. They could be purged from the git history, but then the peerj paper is no longer reproducible.

@davidssmith (Owner, Author)

The overhead of keeping the parallel functionality is a slower single-CPU fit, since it still runs through the same framework, plus code bloat from parallel code that is especially difficult for the casual user to understand.

It looks like ForwardDiff.jl requires a differentiable function, but I bet it is faster. Maybe you can do some math on the Tofts integral to turn it into a differentiable function suitable for passing to it.

I don't understand why each model should have its own fitting function. Why is the numbered model paradigm not working for you?

I'd like to remove the DICOMs and replace them with something like DataDeps.jl, but yes, unfortunately that won't reduce the repo size unless we nuke the old versions, including the one in the paper.

@notZaki (Contributor) commented Jan 24, 2018

I guess parallel processing could be removed if it's slowing down the single-CPU fit, as that will be how most people use the package. It could be replaced by adding Threads.@threads before the for-loops (the ones iterating over each voxel), which will take advantage of multi-threading if the user has it enabled.
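A minimal sketch of that Threads.@threads idea, with a placeholder standing in for the per-voxel fit:

```julia
using Base.Threads  # @threads; the loop runs serially when JULIA_NUM_THREADS is 1

fitvoxel(curve) = sum(curve)   # placeholder for the real per-voxel NLS fit

function fitall(data)          # data: ntime × nvoxels
    results = Vector{Float64}(undef, size(data, 2))
    @threads for v in 1:size(data, 2)   # each voxel is independent
        results[v] = fitvoxel(view(data, :, v))
    end
    return results
end

data = rand(16, 200)
out = fitall(data)
out ≈ vec(sum(data, dims = 1))   # true: same answer regardless of thread count
```

Since voxels are independent, there is no shared state to guard, and the single-threaded path pays essentially no overhead.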

The problem I see is that fitdce() will become big and complicated as more models are added. For example, if we were to add the 2CXM - which has parameters Fp, PS, ve, vp - do we add Fpmax and PSmax as optional parameters to fitdce()? Also, if the user asks to fit both the ExtTofts and 2CXM, how would fitdce() handle the fact that Ktrans, Fp, and PS are not in both models?
That's why I think the fitting part - i.e. the part after if x in models - should be its own function. Then, fitdce() would be a convenient wrapper for the Tofts model family without having to include all models.
This would also allow users to call the fitting part directly to get the 'raw' fits without any post-processing, and would allow each model's fit to be tested without having to go through fitdce().
There's also the smaller issue of the model numbers being arbitrary - e.g. the ExtTofts model is currently Models 3 and 5 - which isn't aesthetically pleasing.
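The wrapper-plus-raw-fits split could be sketched like this, with a name-to-function map replacing the numbered branches (all names and return values hypothetical stubs):

```julia
# Stub per-model fitters returning 'raw' parameter estimates.
fitTofts(t, Cp, Ct)    = (Ktrans = 0.2, ve = 0.1)
fitExtTofts(t, Cp, Ct) = (Ktrans = 0.2, ve = 0.1, vp = 0.05)

const FITTERS = Dict(:Tofts => fitTofts, :ExtTofts => fitExtTofts)

# fitdce stays a thin wrapper: select models, call the raw fits, post-process.
function fitdce(t, Cp, Ct; models = [:Tofts])
    Dict(m => FITTERS[m](t, Cp, Ct) for m in models)
end

res = fitdce(0:5, zeros(6), zeros(6); models = [:Tofts, :ExtTofts])
res[:ExtTofts].vp   # each raw fit is also reachable without the wrapper
```

Models with disjoint parameter sets then pose no problem, since each fitter returns only its own parameters.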

I think the new package manager (Pkg3) is supposed to use shallow clones, so that might speed up installation after removing auxiliary files. But anyone wanting to contribute to the package will still have to download the entire git history.
