Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] added definitions for lp-distances. Faster than just using norm #1234

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/LinearAlgebra.jl
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,10 @@ export
lyap,
mul!,
norm,
lpdist,
l1dist,
euclidean,
l2dist,
normalize!,
normalize,
nullspace,
Expand Down
83 changes: 83 additions & 0 deletions src/generic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -892,6 +892,89 @@

norm(v::AdjOrTrans, p::Real) = norm(v.parent, p)

"""
lpdist(x::T, y::T, p::Int) where {T <: AbstractVector}
lpdist(x::T, y::T, p::Real) where {T <: AbstractVector}

Computes the Lₚ distance between two vectors `x` and `y`, defined as:
for 0 < p < Inf: Lₚ dist = ᵖ√(Σ|xᵢ - yᵢ|ᵖ)
for p = 0: count of mismatches for p = 0
for p + Inf: L_∞ = maxᵢ(abs(xᵢ - yᵢ))
"""

function _lpunnorm(x::T, y::T, fn) where {T}
r = 0.0
for i in eachindex(x,y)
@inbounds r += fn(x[i] - y[i])
Copy link
Member

@stevengj stevengj Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this algorithm suffers from spurious overflow if the elements of x or y are very large.

e.g. lpdist([1e300],[0.0]) should return 1e300, but this will return Inf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch! Any pointers on how I can fix this?

Copy link
Member

@jishnub jishnub Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps divide each vector by the max of the maximum absolute value of each vector before computing the distance, and rescale the result later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that rescaling is not needed for the L1 norm or the maximum norm.

Note also that you should be using norm, not abs, everywhere, in order to handle vectors whose entries are themselves vectors. In fact, arguably you should be calling the distance function recursively for such cases.

In general, I would study the generic_norm implementation, which already deals with such issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I intended this to be an implementation of vectors of reals (and maybe complex) only. Would it be better to specialise for real/complex vectors and have a separate implementation for recursive structures?

Copy link
Member

@stevengj stevengj Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how it's done for norm. There's no need to have two implementations, since norm_sqr just calls abs2 for scalars, and norm for scalars just calls abs.

You should really study the norm implementation and build off that, as otherwise you're re-inventing the wheel here.

end

return r
end


function lpdist(x::T, y::T, p::Int) where {T <: AbstractVector}
p < 0 && throw(DomainError("p must be non-negative"))
length(x) == length(y) || throw(DimensionMismatch("x and y have different lenghts"))
if p == 0
@warn("Technically not a distance metric for p = 0")
fn = !iszero # simply count number of nonzeros
elseif p == 1
fn = abs
elseif p == 2
fn = abs2
Comment on lines +922 to +924
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be norm and norm_sqr, respectively.

else
fn = u -> abs(u)^p
end
r = _lpunnorm(x, y, fn)

p <= 1 && return r
p == 2 && return √r

return r^(1/p)
end

function lpdist(x::T, y::T, p::Real) where {T <: AbstractVector}
length(x) == length(y) || throw(DimensionMismatch("x and y have different lenghts"))
p < 0 && throw(DomainError("p must be non-negative"))
p < 1 && @warn("Technically not a distance metric for 0 < p < 1")

# handle inf norm separatey
if p == Inf
r = 0.0
for i in eachindex(x,y)
@inbounds r = max(abs(x[i] - y[i]), r)
end
return r
elseif iszero(p)
return lpdist(x, y, 0)

Check warning on line 949 in src/generic.jl

View check run for this annotation

Codecov / codecov/patch

src/generic.jl#L949

Added line #L949 was not covered by tests
else
fn = u -> abs(u)^p
end
r = _lpunnorm(x, y, fn)

return r^(1/p)
end

"""
euclidean(x::T, y::T)
l2dist(x::T, y::T)

Compute the L₂ distance between vectors x and y.
Basically just aliases for lpdist(x, y, 2)
"""
euclidean(x::T, y::T) where {T <: AbstractVector} = lpdist(x, y, 2)
l2dist(x::T, y::T) where {T <: AbstractVector} = lpdist(x, y, 2)

"""
l1dist(x::T, y::T)

Compute the L₁ distance between vectors x and y.
Basically just an alias for lpdist(x, y, 1)
"""
l1dist(x::T, y::T) where {T <: AbstractVector} = lpdist(x, y, 1)


## dot products
"""
dot(x, y)
x ⋅ y
Expand Down
16 changes: 16 additions & 0 deletions test/generic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -837,4 +837,20 @@ end
end
end

@testset "distance functions" begin
x = [1., 2., 3.]; y = [1., 2.1, 3.4]
@test lpdist(x, y, 1) ≈ norm(x .- y, 1)
@test lpdist(x, y, 2) ≈ norm(x .- y, 2)
@test lpdist(x, y, 4) ≈ norm(x .- y, 4)
@test lpdist(x, y, 3.5) ≈ norm(x .- y, 3.5)
@test euclidean(x, y) ≈ norm(x .- y, 2)
@test l2dist(x, y) ≈ norm(x .- y, 2)
@test l1dist(x, y) ≈ norm(x .- y, 1)
@test lpdist(x, y, 0) ≈ norm(x .- y, 0)
@test lpdist(x, y, Inf) ≈ norm(x .- y, Inf)
@test_throws DomainError lpdist(x, y, -1)
@test_throws DomainError lpdist(x, y, -0.5)
@test_throws DimensionMismatch lpdist(x, [1.5, 2.0, 2.0, 3.1], 1)
end

end # module TestGeneric