
Optimization of apriori algorithm by replacing iterator with matrix operations #566

Closed
10 of 14 tasks
jmayse opened this issue Jul 18, 2019 · 0 comments · Fixed by #567
jmayse commented Jul 18, 2019

Currently, the apriori implementation constructs a set of possible item combinations, then iterates through this set to determine which combinations have support above min_support. This iteration is slow and can be replaced by matrix equality operations and/or arithmetic. I propose the following changes:

    while max_itemset and max_itemset < (max_len or float('inf')):
        next_max_itemset = max_itemset + 1
        combin = np.array(list(generate_new_combinations(itemset_dict[max_itemset])))

        if combin.size == 0:
            break

        if verbose:
            print('\rProcessing %d combinations | Sampling itemset size %d' %
                  (combin.shape[0], next_max_itemset), end="")

        if is_sparse:
            # Sparse path: compare each candidate's columns against a ones
            # vector and AND the per-item results together.
            all_ones = np.ones((int(rows_count), 1))
            _bools = X[:, combin[:, 0]] == all_ones
            for n in range(1, combin.shape[1]):
                _bools = _bools & (X[:, combin[:, n]] == all_ones)
        else:
            # Dense path: one fancy-indexing pass yields a 3-D array; reducing
            # over the last axis marks the transactions that contain every
            # item of each candidate itemset.
            _bools = np.all(X[:, combin], axis=2)

        support = _support(np.array(_bools), rows_count, is_sparse)
        _mask = (support >= min_support).reshape(-1)

        if any(_mask):
            itemset_dict[next_max_itemset] = np.array(combin[_mask])
            support_dict[next_max_itemset] = np.array(support[_mask])
            max_itemset = next_max_itemset
        else:
            break

In practice, this implementation is much faster than the current implementation, albeit at the cost of a slightly increased memory footprint:

https://gist.github.com/jmayse/ad688d6a7fd842269996a701d7cecd4c

However, it usually remains slower than the fpgrowth implementation, as expected:

https://gist.github.com/jmayse/7c76a2d838ac164b923a47b29527f2ed
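
The gists above contain the actual measurements. For a self-contained illustration of why the matrix form wins (this is a hypothetical harness on synthetic data, not the linked benchmarks), one can time a per-candidate Python loop against the single vectorized pass and check that both produce identical supports:

```python
import numpy as np
import timeit

rng = np.random.default_rng(0)
# Synthetic boolean transaction matrix: 2000 transactions x 40 items.
X = rng.random((2000, 40)) < 0.3
# All candidate 2-itemsets as index pairs.
combin = np.array([(i, j) for i in range(40) for j in range(i + 1, 40)])

def loop_support():
    # Iterator-style: compute support for each candidate one at a time.
    return np.array([(X[:, i] & X[:, j]).mean() for i, j in combin])

def matrix_support():
    # Vectorized: one fancy-indexing pass covers all candidates at once.
    return np.all(X[:, combin], axis=2).mean(axis=0)

# Both strategies must agree on the supports.
assert np.allclose(loop_support(), matrix_support())
print("loop:   %.4fs" % timeit.timeit(loop_support, number=3))
print("matrix: %.4fs" % timeit.timeit(matrix_support, number=3))
```

As in the proposal, the vectorized form trades memory (the intermediate 3-D boolean array) for speed.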

  • Open a new "issue" on GitHub to discuss the new feature / bug fix
  • Fork the mlxtend repository from GitHub (if not already done earlier)
  • Create and check out a new topic branch (please don't make modifications in the master branch)
  • Implement the new feature or apply the bug-fix
  • Add appropriate unit test functions in mlxtend/*/tests
  • Run nosetests ./mlxtend -sv and make sure that all unit tests pass
  • Check/improve the test coverage by running nosetests ./mlxtend --with-coverage
  • Check for style issues by running flake8 ./mlxtend (you may want to run nosetests again after you made modifications to the code)
  • Add a note about the modification/contribution to the ./docs/sources/changelog.md file
  • Modify documentation in the appropriate location under mlxtend/docs/sources/
  • Push the topic branch to the server and create a pull request
  • Check the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend
  • Check/improve the unit test coverage at https://coveralls.io/github/rasbt/mlxtend
  • Check/improve the code health at https://landscape.io/github/rasbt/mlxtend