Currently, the apriori implementation constructs a set of possible item combinations, then iterates through this set to determine which combinations have support above `min_support`. This iteration is slow and can be replaced by matrix equality operations and/or arithmetic. I propose the following changes:
```python
while max_itemset and max_itemset < (max_len or float('inf')):
    next_max_itemset = max_itemset + 1
    # Generate candidate itemsets of size k+1 from the frequent k-itemsets.
    combin = np.array(list(generate_new_combinations(itemset_dict[max_itemset])))
    if combin.size == 0:
        break
    if verbose:
        print('\rProcessing %d combinations | Sampling itemset size %d' %
              (combin.shape[0], next_max_itemset), end="")
    if is_sparse:
        # Sparse path: compare each candidate column against a column of ones
        # and AND the results together, one member item at a time.
        all_ones = np.ones((int(rows_count), 1))
        _bools = X[:, combin[:, 0]] == all_ones
        for n in range(1, combin.shape[1]):
            _bools = _bools & (X[:, combin[:, n]] == all_ones)
    else:
        # Dense path: fancy indexing yields shape (rows, candidates, itemset size);
        # reducing over the last axis tests all member items at once.
        _bools = np.all(X[:, combin], axis=2)
    support = _support(np.array(_bools), rows_count, is_sparse)
    _mask = (support >= min_support).reshape(-1)
    if any(_mask):
        itemset_dict[next_max_itemset] = np.array(combin[_mask])
        support_dict[next_max_itemset] = np.array(support[_mask])
        max_itemset = next_max_itemset
    else:
        break
```
In practice, this version is much faster than the current implementation, albeit at the cost of a slightly larger memory footprint:
https://gist.github.com/jmayse/ad688d6a7fd842269996a701d7cecd4c
However, it usually remains slower than the `fpgrowth` implementation, as is expected:

https://gist.github.com/jmayse/7c76a2d838ac164b923a47b29527f2ed