Releases: ml-explore/mlx
v0.1.0
Highlights
- Memory use improvements:
  - Gradient checkpointing for training with `mx.checkpoint`
  - Better graph execution order
  - Buffer donation
Core
- Gradient checkpointing with `mx.checkpoint`
- CPU-only QR factorization: `mx.linalg.qr`
- Release the Python GIL during `mx.eval`
- Depth-based graph execution order
- Lazy loading of arrays from files
- Buffer donation for reduced memory use
- `mx.diag`, `mx.diagonal`
- Breaking: `array.shape` is a Python tuple
- GPU support for `int64` and `uint64` reductions
- vmap over reductions and arg reductions: `sum`, `prod`, `max`, `min`, `all`, `any`, `argmax`, `argmin`
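For illustration, the factorization that `mx.linalg.qr` computes can be sketched with NumPy's equivalent (NumPy is used here only so the snippet runs anywhere; the MLX call takes an `mx.array` instead):

```python
import numpy as np

# QR factorization: A = Q @ R, with Q orthonormal and R upper-triangular.
# mx.linalg.qr provides this decomposition on the CPU; np.linalg.qr
# below illustrates the same semantics.
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q, R = np.linalg.qr(A)

# Q has orthonormal columns, and Q @ R reconstructs A.
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q @ R, A)
```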
NN
- Softshrink activation
Bugfixes
- Comparisons with `inf` work, and fix `mx.isinf`
- Bug fix in the RoPE cache
- Handle empty matmul on the CPU
- Negative shape checking for `mx.full`
- Correctly propagate `NaN` in some binary ops: `mx.logaddexp`, `mx.maximum`, `mx.minimum`
- Fix >4D non-contiguous binary ops
- Fix `mx.log1p` with `inf` input
- Fix SGD to apply weight decay even with 0 momentum
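The NaN-propagation fix brings these ops in line with the usual IEEE-754/NumPy convention that a NaN operand yields a NaN result; a NumPy sketch of the expected behavior:

```python
import numpy as np

# IEEE-754 / NumPy semantics: if either operand is NaN, the result is NaN.
# The fix makes mx.logaddexp, mx.maximum, and mx.minimum behave this way.
a = np.float32(np.nan)
b = np.float32(1.0)

assert np.isnan(np.logaddexp(a, b))
assert np.isnan(np.maximum(a, b))
assert np.isnan(np.minimum(a, b))
```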
v0.0.11
Highlights:
- GGUF improvements:
  - Native quantizations `Q4_0`, `Q4_1`, and `Q8_0`
  - Metadata
Core
- Support for reading and writing GGUF metadata
- Native GGUF quantization (`Q4_0`, `Q4_1`, and `Q8_0`)
- Quantize with group size of 32 (2x32, 4x32, and 8x32)
NN
- `Module.save_weights` supports safetensors
- `nn.init` package with several commonly used neural network initializers
- Binary cross entropy and cross entropy losses can take probabilities as targets
- `Adafactor` in `nn.optimizers`
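Cross entropy with probability ("soft") targets generalizes the usual hard-label case; a minimal NumPy sketch of the math (the MLX loss's exact signature and reduction options may differ):

```python
import numpy as np

# Cross entropy with probability targets: the target is a distribution
# over classes rather than a single class index.
def cross_entropy(logits, target_probs):
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(target_probs * log_probs).sum(axis=-1)

logits = np.array([[2.0, 1.0, 0.0]])
hard = np.array([[1.0, 0.0, 0.0]])   # one-hot target: the usual CE
soft = np.array([[0.7, 0.2, 0.1]])   # probability target

assert cross_entropy(logits, hard)[0] > 0
assert cross_entropy(logits, soft)[0] > cross_entropy(logits, hard)[0]
```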
Bugfixes
- Fix `isinf` and friends for integer types
- Fix array creation from lists of Python ints to `int64`, `uint`, and `float32`
- Fix power VJP for `0` inputs
- Fix out-of-bounds `inf` reads in `gemv`
- Fix `mx.arange` crashes on NaN inputs
v0.0.10
Highlights:
- Faster matmul: up to 2.5x faster for certain sizes (see benchmarks)
- Fused matmul + addition (for faster linear layers)
Core
- Quantization supports sizes other than multiples of 32
- Faster GEMM (matmul)
- `addmm` primitive (fused addition and matmul)
- `mx.isnan`, `mx.isinf`, `isposinf`, `isneginf`
- `mx.tile`
- VJPs for `scatter_min` and `scatter_max`
- Multi-output split primitive
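The fused addition-and-matmul primitive is functionally equivalent to an unfused bias-plus-matmul, just computed in one kernel; a NumPy sketch of what it computes:

```python
import numpy as np

# Fused addition + matmul computes bias + A @ B in a single kernel
# (the win behind faster linear layers). Functionally it matches the
# unfused NumPy expression below.
A = np.ones((2, 3))
B = np.ones((3, 4))
bias = np.full((4,), 0.5)

out = bias + A @ B  # what the fused primitive computes

assert out.shape == (2, 4)
assert np.allclose(out, 3.5)  # each A @ B entry is 3.0, plus 0.5 bias
```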
NN
- Losses: Gaussian negative log-likelihood
Misc
- Performance enhancements for graph evaluation with many outputs
- Primitive VJP takes the output as input, reducing redundant work without the need for simplification
- Default PRNG seed is based on the current time instead of being fixed to 0
- Booleans print in Python style when used from Python
Bugfixes
- Fix scatter for < 32-bit precision and an integer overflow
- Fix overflow in `mx.eye`
- Report Metal out-of-memory issues instead of failing silently
- Change `mx.round` to follow NumPy, which rounds half to even
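Round-half-to-even ("banker's rounding") sends ties to the nearest even integer rather than always rounding up; since the change makes `mx.round` follow NumPy, the NumPy behavior shows what to expect:

```python
import numpy as np

# Round-half-to-even: ties go to the nearest even integer,
# so 0.5 -> 0 and 1.5 -> 2. mx.round now matches this.
vals = np.array([0.5, 1.5, 2.5, 3.5])
rounded = np.round(vals)

assert rounded.tolist() == [0.0, 2.0, 2.0, 4.0]
```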
v0.0.9
Highlights:
- Initial (and experimental) GGUF support
- Support for the Python buffer protocol (easy interoperability with NumPy, JAX, TensorFlow, PyTorch, etc.)
- `at[]` syntax for scatter-style operations: `x.at[idx].add(y)` (also `min`, `max`, `prod`, etc.)
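The point of `x.at[idx].add(y)` is that repeated indices accumulate, unlike plain `x[idx] += y`. NumPy's `np.add.at` has the same accumulate-on-duplicates semantics and makes a convenient illustration (assuming MLX follows the JAX-style behavior this syntax comes from):

```python
import numpy as np

# Scatter-add: repeated indices accumulate. Plain fancy-index assignment
# (x[idx] += 1.0) would only apply the update once per unique index.
x = np.zeros(4)
idx = np.array([0, 0, 2])
np.add.at(x, idx, 1.0)  # analogue of x.at[idx].add(1.0)

assert x.tolist() == [2.0, 0.0, 1.0, 0.0]
```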
Core
- Array creation from other `mx.array`s (`mx.array([x, y])`)
- Complete support for the Python buffer protocol
- `mx.inner`, `mx.outer`
- `mx.logical_and`, `mx.logical_or`, and operator overloads
- Array `at` syntax for scatter ops
- Better support for in-place operations (`+=`, `*=`, `-=`, ...)
- VJP for scatter and scatter add
- Constants (`mx.pi`, `mx.inf`, `mx.newaxis`, ...)
NN
- GLU activation
- `cosine_similarity` loss
- Cache for `RoPE` and `ALiBi`
Bugfixes / Misc
- Fix data type with `tri`
- Fix saving non-contiguous arrays
- Fix graph retention for in-place state, and remove `retain_graph`
- Multi-output primitives
- Better support for loading devices
v0.0.7
Core
- Support for loading and saving Hugging Face's safetensors format
- Transposed quantization matmul kernels
- `mlx.core.linalg` sub-package with `mx.linalg.norm` (Frobenius, infinity, p-norms)
- `tensordot` and `repeat`
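The norms listed are the standard ones from `numpy.linalg.norm`, which makes a handy reference for what `mx.linalg.norm` computes (a sketch of the semantics, not MLX's exact API surface):

```python
import numpy as np

# The standard norms: Frobenius for matrices by default, p-norms and the
# infinity norm for vectors via the `ord` argument.
M = np.array([[3.0, 0.0], [0.0, 4.0]])
v = np.array([3.0, 4.0])

fro = np.linalg.norm(M)               # Frobenius: sqrt(9 + 16) = 5
l1 = np.linalg.norm(v, ord=1)         # p = 1: |3| + |4| = 7
linf = np.linalg.norm(v, ord=np.inf)  # infinity norm: max(|3|, |4|) = 4

assert np.isclose(fro, 5.0)
assert l1 == 7.0 and linf == 4.0
```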
NN
- Layers: `Bilinear`, `Identity`, `InstanceNorm`, `Dropout2D`, `Dropout3D`
- More customizable `Transformer` (pre/post norm, dropout)
- More activations: `SoftSign`, `Softmax`, `HardSwish`, `LogSoftmax`
- Configurable scale in `RoPE` positional encodings
- Losses: `hinge`, `huber`, `log_cosh`
Misc
- Faster GPU reductions for certain cases
- Change to memory allocation to allow swapping
v0.0.6
Core
- `quantize`, `dequantize`, `quantized_matmul`
- `moveaxis`, `swapaxes`, `flatten`
- `stack`
- `floor`, `ceil`, `clip`
- `tril`, `triu`, `tri`
- `linspace`
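A rough sketch of the idea behind `quantize`/`dequantize`: group-wise affine quantization, where each group of weights gets its own scale and offset. This is a simplified illustration, not MLX's exact scheme or memory layout:

```python
import numpy as np

# Group-wise affine quantization (simplified sketch, not MLX's exact
# scheme): each group of `group_size` weights is mapped to `bits`-bit
# integers with a per-group scale and offset.
def quantize(w, group_size=32, bits=4):
    levels = 2**bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale[scale == 0] = 1.0  # avoid division by zero for constant groups
    q = np.round((g - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

w = np.linspace(-1.0, 1.0, 64).astype(np.float32)
q, scale, lo = quantize(w)
w_hat = dequantize(q, scale, lo).reshape(-1)

# 4-bit groups reconstruct the weights to within one quantization step.
assert np.max(np.abs(w - w_hat)) <= scale.max()
```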
Optimizers
- `RMSProp`, `Adamax`, `Adadelta`, `Lion`
NN
- Layers: `QuantizedLinear`, `ALiBi` positional encodings
- Losses: label smoothing, Smooth L1 loss, triplet loss
Misc
- Bug fixes