ENH: Add three solvers for differences of convex functions. #1307
Conversation
Nice work, and some very cool new stuff coming in! My comments are mostly wrt docs, and one efficiency thing.
    y_n \\in \\partial h(x_n), \\qquad
        x_{n+1} = \\mathrm{Prox}^{\\gamma}_g(x_n + \\gamma y_n)

Here, :math:`\\gamma` is the stepsize parameter ``gamma``.
I think that's obvious enough that you don't need to mention it explicitly.
Perhaps a comment on the difference to the "simple" DCA? I guess the main advantage is that you don't need the subgradient of `g`, but only its prox.
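For readers of the thread, a minimal sketch of the contrast being discussed (using the names `g`, `h`, `gamma`, `x` from the surrounding code; this is my illustration, not text from the PR):

# Plain DCA step: needs a (sub)gradient of the convex conjugate of g.
x = g.convex_conj.gradient(h.gradient(x))

# prox_dca step: only needs the proximal of g (plus the gradient of h).
x = g.proximal(gamma)(x + gamma * h.gradient(x))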
[TA1997] Tao, P D, and An, L T H. *Convex analysis approach to d.c.
programming: Theory, algorithms and applications*. Acta Mathematica
Vietnamica, 22.1 (1997), pp 289--355.
Unless the text references it, I wouldn't add this citation. Without a reference, it's unclear how it relates to this function.
    Initial point, updated in-place.
g : `Functional`
    Convex part. Needs to provide a `Functional.convex_conj` with a
    `Functional.proximal` factory.
gradient
I changed this. Maybe this is an artefact of diffing.
Same comment as above.
and the iteration is given by

.. math::
    y_n \in \partial h(x_n), \qquad x_{n+1} \in \partial g^*(y_n)
Could you draw the connection to the `gradient` property?
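Presumably something along these lines could be added to the docstring (my sketch; it assumes that `g.convex_conj` is differentiable, so the subdifferential inclusion becomes an equality):

# If g^* is differentiable, choosing x_{n+1} in the subdifferential of g^*
# at y_n reduces to x_{n+1} = (grad g^*)(y_n), i.e. in ODL terms:
y = h.gradient(x)              # y_n, a subgradient choice for h at x_n
x = g.convex_conj.gradient(y)  # x_{n+1} = grad g^*(y_n)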
    min_x a/2 (x - b)^2 - |x|

For finding possible (local) solutions, we consider several cases:
* x > 0 ==> ∂|.|(x) = 1, i.e., a necessary optimality condition is
I guess we need a unicode marker for this
Done.
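(For context, a reconstruction of how the case analysis presumably continues; this is my sketch, not the docstring's exact wording. The critical-point condition is a(x - b) ∈ ∂|.|(x).)

* x > 0 ==> ∂|.|(x) = 1, so a(x - b) = 1 and x = b + 1/a, admissible if b + 1/a > 0,
* x < 0 ==> ∂|.|(x) = -1, so a(x - b) = -1 and x = b - 1/a, admissible if b - 1/a < 0,
* x = 0 ==> ∂|.|(0) = [-1, 1], so the condition becomes -a*b ∈ [-1, 1], i.e. |b| <= 1/a.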
raise ValueError('`g.domain` and `h.domain` need to be equal, but '
                 '{} != {}'.format(space, h.domain))
for _ in range(niter):
    g.proximal(gamma)(x + gamma * h.gradient(x), out=x)
If you use `x.lincomb` in the argument, you can probably avoid one copy.
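One possible way to do that (my sketch, reusing a preallocated buffer; it assumes the usual ODL semantics `tmp.lincomb(a, x1, b, x2)` meaning `tmp <- a*x1 + b*x2` and that the gradient operator accepts `out=`):

tmp = x.space.element()  # hypothetical scratch buffer, allocated once
for _ in range(niter):
    h.gradient(x, out=tmp)           # tmp <- grad h(x_n)
    tmp.lincomb(1, x, gamma, tmp)    # tmp <- x_n + gamma * tmp, no new allocation
    g.proximal(gamma)(tmp, out=x)    # x_{n+1} <- Prox_g^gamma(tmp)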
for _ in range(niter):
    g.proximal(gamma)(x + gamma * K.adjoint(y) -
                      gamma * phi.gradient(x), out=x)
    h.convex_conj.proximal(mu)(y + mu * K(x), out=y)
same here
.. math ::
    x_{n+1} &= \mathrm{Prox}_{\gamma}^g (x_n + \gamma K^* y_n
        - \gamma \nabla \varphi(x_n)), \\
Maybe a matter of taste, but I'd take out the common factor `gamma`.
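That is, presumably something like:

.. math::
    x_{n+1} = \mathrm{Prox}_{\gamma}^g \big(x_n + \gamma (K^* y_n - \nabla \varphi(x_n))\big)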
    g.proximal(gamma)(x + gamma * K.adjoint(y) -
                      gamma * phi.gradient(x), out=x)
    h.convex_conj.proximal(mu)(y + mu * K(x), out=y)
If you break down the code to make it more efficient, perhaps it makes sense to again add a `_simple` implementation of the method:

for _ in range(niter):
    x = g.proximal(gamma)(x + gamma * (K.adjoint(y) - phi.gradient(x)))
    y = h.convex_conj.proximal(mu)(y + mu * K(x))
no=4&page=451&year=2003&ppage=462>`_. It solves the problem

.. math ::
    \\min g(x) - h(x)
I overlooked this the first time. Please go for new-style `r'''` math-containing docstrings and no double backslashes.
The problem then is the broken link, which is itself too long for one line because of PEP8. I have not found a way to break the link without inserting additional whitespace.
Hm, too bad. In that case I'd prefer violating PEP8 and let the link go over the length limit, to still get the benefit of less ugly and error-prone math typing. Other opinions?
Okay, I changed this, and neither `pytest --pep8` nor PEP8speaks is complaining.
Looks like a nice contribution! Some minor comments.
"""Solvers for the optimization of the difference of convex functions.

Collection of DCA and related methods which make use of structured optimization
Add explanation of the acronym DCA somewhere?
    Convex part. Needs to provide a `Functional.convex_conj` with a
    `Functional.gradient` method.
h : `Functional`
    Negative of the concave part. Needs to provide a
I find the first sentence hard to understand. Should it be a convex or concave function, and why the negative? Can't we just write "Convex functional. Needs to [...]"?
x : `LinearSpaceElement`
    Initial point, updated in-place.
g : `Functional`
    Convex part. Needs to provide a `Functional.convex_conj` with a
Can we write this as `Needs to implement Functional.convex_conj.gradient.`? Or is that more confusing?
To be more precise than just 👍, `Functional.convex_conj.gradient` will produce a dead link, so it's better to write ``g.convex_conj.gradient`` then.
See also
--------
"""
# `prox_dca`, `doubleprox_dc`
Put back in `See also`.
raise ValueError('`g.domain` and `h.domain` need to be equal, but '
                 '{} != {}'.format(space, h.domain))
for _ in range(niter):
    g.convex_conj.gradient(h.gradient(x), out=x)
I think it would be better to do `g_convex_conj = g.convex_conj` outside the loop and then use `g_convex_conj` in the loop. If the initialization is heavy this should be faster, right? Poke @kohr-h
That's right, good point
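A sketch of the suggested hoisting (my illustration of the idea, with the same names as in the diff):

g_convex_conj = g.convex_conj            # construct the conjugate once
g_cconj_grad = g_convex_conj.gradient    # optionally hoist the gradient operator as well
for _ in range(niter):
    g_cconj_grad(h.gradient(x), out=x)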
----------
[BB2016] Banert, S, and Bot, R I. *A general double-proximal gradient
algorithm for d.c. programming*. arXiv:1610.06538 [math.OC] (2016).
"""
A `See also` section?
        callback(x)


def doubleprox_dc_simple(x, y, g, h, phi, K, niter, gamma, mu):
Should this be kept? If so, move it to a test of the method?
I’ve inserted it on @kohr-h’s request (#1307 (comment)), and in the test it is checked that both methods give exactly the same result. The same principle is already in ODL, for example `odl/odl/solvers/nonsmooth/admm.py`, line 149 in 0ab389f:

    def admm_linearized_simple(x, f, g, L, tau, sigma, niter, **kwargs):

and likewise

    def adupdates_simple(x, g, L, stepsize, inner_stepsizes, niter,
I think it's good that the methods are next to each other so that, if things go seriously wrong, users can go for the "emergency fallback" that is (hopefully) bug-free. As long as it's not in `__all__` it doesn't clutter anything.
Ok!
@@ -0,0 +1,90 @@
# coding=utf-8
Do we usually have this at the top of the files? @kohr-h
No, but without it, the ∂ symbol in the comment would cause a Python 2 error, see #1307 (comment).
In less than two years we can remove it again :-)
* {b + 1/a} if b > 1/a.
"""

# Set the problem parameters
Add a short comment noting that this puts us in the third case?
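For instance (illustrative values only, not necessarily the ones used in the test), any parameter choice with `b > 1/a` lands in the third case:

# Hypothetical parameters placing the problem in the third case (b > 1/a):
a = 0.5
b = 4.0   # 1/a = 2, so b > 1/a and the solution set is {b + 1/a} = {6.0}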
assert float(y_simpl) == pytest.approx(float(y))

# All methods should give approximately one solution of the problem.
assert dist_dca == pytest.approx(0, abs=1e-6)
Should this not use the `HIGH_ACCURACY` and `LOW_ACCURACY` somehow? Or what are they used for?
I see this as a regression test. The test shows that this method achieves an accuracy of `1e-6` for the current configuration of ODL and the given number of iterations (and that tolerance is far too coarse to be about rounding errors). Maybe I could change this to `LOW_ACCURACY`, but then it would not be clear that the values are not expected to be exactly equal even in theory.
I don't have a strong opinion on this. We have something similar in the ray transform tests, but on the other hand things are not fully within our control there. @adler-j?
I’d like to leave this as it is. Instead I will add a clarifying comment.
from __future__ import division
import odl
from odl.solvers import (dca, prox_dca, doubleprox_dc)
no parentheses here
def dca(x, g, h, niter, callback=None):
    r"""Subgradient DCA of Tao and An.

    This algorithm solves a problem of the form::
To avoid the colon being printed in the doc, add a space before the ::
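(In reST, ``form::`` renders as "form:" with a visible colon, while ``form ::`` renders as just "form" followed by the literal block.) That is, presumably:

    This algorithm solves a problem of the form ::

        min_x g(x) - h(x)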
    This algorithm solves a problem of the form::

        min_x g(x) - h(x),
Would it be completely outrageous to use `f` and `g` as the other optimization methods do?
I think I addressed everything now.
That’s kind of strange. The failing Travis test says
This does not happen on my local machine, and I always thought that 1.56e-9 < 1e-6.
After updating my conda packages, I get the same error. Possible reasons for this failure are thus (I suspect that it might be the pytest update).
😂 I'd assume that as well..
For reference, here is the PR with the relevant changes. But since the old tests seem to be fine, I guess there's some subtle issue with your test.
I tried to figure out what’s going on with the debugger.
However, if I convert everything to float, it works fine (as does everything with NumPy 1.13).
This seems like an issue with pytest; can you try to make a minimal example that doesn't involve ODL and file an issue with pytest if it's reproducible? Perhaps it's already been raised though.
I will poke this one again 😉 It is more or less good to go, no?
I’m not aware of any open issues with this pull request. (Except for the bug report to pytest, which might be unnecessary anyway because it works fine with the newest numpy version.)
We need to undo this, otherwise LGTM.
This violates PEP8 because the URL cannot be shortened.
Clarify documentation of the functionals.
DOC: Adapt notation of proximals to ODL conventions.
DOC: Correct several mistakes in the docstrings.
Done. There is no way to remove this annoying “Update branch” button or change its function to a rebase, I assume?
No, unfortunately not. I guess everybody has to be burned at least once :-)
This family of solvers makes it possible to handle, e.g., nonconvex regularizers given as the difference of two convex functionals. The three solvers differ in their use of proximals and subgradients for the different parts of the problem.
The main restriction is that there is no solver which can handle objective functions whose convex part is the composition of a functional with a linear operator.
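For readers who want to try the solvers, here is a minimal usage sketch on a toy problem (my illustration: the space, functionals, starting point and iteration count are made up, not taken from the PR's tests, and it assumes the new solvers are importable from `odl.solvers` as in the test file above):

# Solve min_x ||x||_2^2 - ||x||_1 with the plain subgradient DCA.
import odl
from odl.solvers import dca

space = odl.rn(3)
g = odl.solvers.L2NormSquared(space)  # convex part; its convex_conj has a gradient
h = odl.solvers.L1Norm(space)         # second convex part ("negative of the concave part")

x = space.element([0.3, -0.7, 1.5])   # arbitrary nonzero starting point
dca(x, g, h, niter=10)
print(x)  # entries should approach the DC critical values +/- 1/2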