Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of zlib #548

Closed
wants to merge 10 commits into from
Closed

Initial implementation of zlib #548

wants to merge 10 commits into from

Conversation

choffstein
Copy link

This is an initial implementation of deflating and inflating a string, implemented with zlib.

@choffstein
Copy link
Author

My biggest concern here is the memory management and who is responsible for the underlying strings. I think I have matched things up with the finalizers, but I would love it someone a bit more experienced could poke through zlib_wrapper and zlib.jl to make sure there aren't any unforeseen circumstances where there could be a memory leak.

Also, I don't think this is the most efficient, or correct, implementation of zlib -- especially with how I "inflate" the data (via realloc). I am sure someone knows a better way.

@JeffBezanson
Copy link
Member

Cool, thanks! I'll take a look at some point.

@ViralBShah
Copy link
Member

@choffstein Any specific reason why only strings, or is it just a starting point? Presumably, we can use zlib to compress any serialized data in julia, arrays of Numbers, etc. - although it is questionable how much compression one would get.

@choffstein
Copy link
Author

Every other zlib library I have run across works has only worked with byte arrays (often implemented via string types -- see: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/zlib/rdoc/Zlib/Deflate.html#method-c-deflate).

My experience tends to be: take a data structure, serialize it to some sort of byte array format / string (JSON, XML, proprietary), deflate it, store / send it, inflate it, then deserialize.

It seems like the appropriate way to reduce coupling and increase cohesion is to make the zlib library take a single input type. That way it is agnostic of the type of information being passed in -- all it needs to know is that it is a byte array (or, String).

So, theoretically, the whole thing can be changed to:

type CompressedBytes
    zp::Ptr{Void}

    function CompressedBytes(zp::Ptr{Void})
        cs = new(zp)
        finalizer(cs, _jl_zlib_free_pack)
        cs
    end
end

function deflate(s::String)
        p = convert(Ptr{Uint8}, s)
    sz = convert(Uint, 0)
    zp = ccall(dlsym(_jl_zlib_wrapper, :_jl_zlib_deflate), Ptr{Void}, (Ptr{Uint8},), p)
    CompressedBytes(zp)
end

function inflate(cs::CompressedBytes)
    cp = ccall(dlsym(_jl_zlib_wrapper, :_jl_zlib_inflate), Ptr{Uint8}, (Ptr{Void},), cs.zp)
    s = cstring(cp)
    _c_free(cp)
    s
end

function string(cs::CompressedBytes)
    cp = ccall(dlsym(_jl_zlib_wrapper, :_jl_zlib_print_pack), Ptr{Uint8}, (Ptr{Void},), cs.zp)
    s = cstring(cp) # copies the string
    _c_free(cp)
    s
end

@StefanKarpinski
Copy link
Member

Thanks! The core compress method should operate on a Array{Uint8,1} value instead. In Julia String is just an abstraction. In order to compress a string, you have to encode it first as a byte array and then compress that. That conversion process implicitly chooses some encoding. We default to UTF-8, but you could explicitly encode string data as anything and then compress the resulting array of bytes.

@StefanKarpinski
Copy link
Member

Can you rebase this onto master so that the commits are cleaner?

@choffstein
Copy link
Author

See #553

@choffstein choffstein closed this Mar 9, 2012
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Oct 11, 2021
Remove unnecessary convert(Vector, wv) calls
Keno pushed a commit that referenced this pull request Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants