This repository contains the software for the algorithm presented in the following paper:
- PDF to be added
GDTM is a single-pass DTM approach that combines a context-rich, incremental feature representation model called Random Indexing (RI) with a novel online graph partitioning algorithm to address scalability and dynamicity in topic modeling over short texts. In addition, GDTM uses a rich language modeling approach based on the Skip-gram technique to account for sparsity.
# Synopsis
java -jar gdtm δ α γ
# Params
- δ # Function words adjustment parameter {value >= 1}
- α # Partition expansion threshold {value = [0...1]}
- γ # Function word elimination threshold {value = [0...1]}
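For example, an invocation with illustrative values chosen only to respect the documented ranges (not recommended settings) could look like:

java -jar gdtm 2 0.5 0.4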
Following is a list of optional parameters to customize or enhance performance relative to the volume of the stream; the RI and Skip-gram settings are illustrated in the sketches after the list.
- RI Params
- -dim: the dimensionality of the vectors. {value >= 2}. default = 2000
- -noz: the number of non-zero elements. {value = [1...dim]}. default = 8
- -win: the size of the moving window used to construct the context structures. default = 2
- -mwt: RI vector pruning parameter. {value = [0...1]}. default = 0.3
- Other Params
- -skip: Skip-gram value. {1 = bigram, 2 = 1-skip-bigram, 3 = 2-skip-bigram, ...}
- -SN (snapshot): the algorithm takes a snapshot of the partitioned documents and cleans the memory.
- -input: the input path can be set arbitrarily.
- -output: the output path can be set arbitrarily.
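The following two sketches illustrate how the parameters above behave. They are minimal, hypothetical illustrations written for this README, not the GDTM source code, and all class and method names in them are made up.

The first sketch shows the Random Indexing idea behind -dim, -noz, and -win: every term receives a sparse random index vector with up to -noz non-zero entries in a -dim-dimensional space, and each term's context vector accumulates the index vectors of its neighbours within a moving window of -win tokens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of Random Indexing; not the GDTM implementation.
public class RandomIndexingSketch {
    private final int dim;      // -dim: dimensionality of the vectors
    private final int nonZero;  // -noz: number of non-zero elements per index vector
    private final int window;   // -win: size of the moving context window
    private final Random rnd = new Random(42);

    private final Map<String, float[]> indexVectors = new HashMap<>();   // fixed sparse random vectors
    private final Map<String, float[]> contextVectors = new HashMap<>(); // incrementally updated vectors

    public RandomIndexingSketch(int dim, int nonZero, int window) {
        this.dim = dim;
        this.nonZero = nonZero;
        this.window = window;
    }

    // Lazily create a sparse ternary index vector with up to `nonZero` entries of +1/-1.
    private float[] indexVector(String term) {
        return indexVectors.computeIfAbsent(term, t -> {
            float[] v = new float[dim];
            for (int i = 0; i < nonZero; i++) {
                v[rnd.nextInt(dim)] = rnd.nextBoolean() ? 1f : -1f;
            }
            return v;
        });
    }

    // Single pass over the tokens: each term's context vector accumulates the
    // index vectors of its neighbours inside the moving window.
    public void update(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            float[] ctx = contextVectors.computeIfAbsent(tokens[i], t -> new float[dim]);
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j == i) continue;
                float[] idx = indexVector(tokens[j]);
                for (int k = 0; k < dim; k++) ctx[k] += idx[k];
            }
        }
    }

    public static void main(String[] args) {
        RandomIndexingSketch ri = new RandomIndexingSketch(2000, 8, 2); // the documented defaults
        ri.update("graph based dynamic topic modeling over short texts".split(" "));
        System.out.println("terms indexed: " + ri.contextVectors.size());
    }
}
```

The second sketch illustrates the -skip values: -skip = 1 produces plain bigrams, -skip = 2 produces 1-skip-bigrams (at most one token skipped), and so on, i.e. pairs of tokens at most -skip positions apart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the -skip parameter; not the GDTM implementation.
public class SkipGramSketch {

    // skip = 1 -> bigrams, skip = 2 -> 1-skip-bigrams, skip = 3 -> 2-skip-bigrams, ...
    static List<String> skipBigrams(String[] tokens, int skip) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            for (int gap = 1; gap <= skip && i + gap < tokens.length; gap++) {
                pairs.add(tokens[i] + " " + tokens[i + gap]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] tokens = "topic modeling over short texts".split(" ");
        System.out.println(skipBigrams(tokens, 1)); // bigrams only
        System.out.println(skipBigrams(tokens, 2)); // bigrams plus 1-skip-bigrams
    }
}
```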
# Output
| Document1 | Document2 | ... | Documentn |
| --- | --- | --- | --- |
| T1:L11 T2:L12 ... Tm:L1m | T1:L21 T2:L22 ... Tm:L2m | ... | T1:Ln1 T2:Ln2 ... Tm:Lnm |
Here Ti denotes topic i and Lji denotes the likelihood of topic i for document j.
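The exact serialization of the output file is not specified here beyond the Ti:Lji pairs, so the following is only a hypothetical sketch of parsing one document's pairs; the class name and the assumption of whitespace-separated pairs are illustrative, not part of the tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parsing of one document's topic likelihoods; assumes
// whitespace-separated "Ti:Lji" pairs, which may differ from the actual layout.
public class TopicLikelihoodSketch {
    static Map<String, Double> parseDocument(String pairs) {
        Map<String, Double> likelihoods = new LinkedHashMap<>();
        for (String pair : pairs.trim().split("\\s+")) {
            String[] parts = pair.split(":");          // e.g. "T1:0.42" -> ["T1", "0.42"]
            likelihoods.put(parts[0], Double.parseDouble(parts[1]));
        }
        return likelihoods;
    }

    public static void main(String[] args) {
        System.out.println(parseDocument("T1:0.42 T2:0.31 T3:0.27"));
    }
}
```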