Source code for the paper "Syntactically-Informed Unsupervised Paraphrasing with Non-Parallel Data".
Quora: download from
ParaNMT: download the dataset released with the paper "ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations".
python >= 3.6
torch == 1.6.0
nltk == 3.4.5
zss == 1.2.0
We use the Stanford Parser to obtain the parse tree and template of each sentence.
The commands are as follows:
Single input file:
java -Xmx12g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -threads 1 -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -file file.txt -outputFormat text -outputDirectory /outputdir/
List of input files:
java -Xmx12g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -threads 1 -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -filelist filenames.txt -outputFormat text -outputDirectory /outputdir/
If the data is large, you can use the split command to divide the file into several smaller files and parse them separately. Then use the scripts in the autocg directory to extract the parse trees and templates.
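The splitting step can also be done in Python. The following is a minimal sketch, not part of this repository: the function name `split_for_parsing`, the chunk naming scheme, and the default chunk size are all assumptions chosen for illustration. It divides a one-sentence-per-line file into chunks on line boundaries (which matches the -ssplit.eolonly option used above) and writes a list of the chunk paths that could be passed to -filelist.

```python
from pathlib import Path


def split_for_parsing(src, out_dir, lines_per_chunk=100000,
                      filelist_name="filenames.txt"):
    """Split `src` into chunks of at most `lines_per_chunk` lines.

    Hypothetical helper: returns (list of chunk paths, path of the filelist).
    The filelist contains one absolute chunk path per line.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    chunk = None
    try:
        with src.open(encoding="utf-8") as fin:
            for i, line in enumerate(fin):
                # Start a new chunk file every `lines_per_chunk` lines.
                if i % lines_per_chunk == 0:
                    if chunk is not None:
                        chunk.close()
                    name = f"{src.stem}.part{len(chunk_paths):04d}.txt"
                    path = out_dir / name
                    chunk = path.open("w", encoding="utf-8")
                    chunk_paths.append(path)
                chunk.write(line)
    finally:
        if chunk is not None:
            chunk.close()

    # One chunk path per line, in the order the chunks were written.
    filelist = out_dir / filelist_name
    filelist.write_text(
        "".join(f"{p.resolve()}\n" for p in chunk_paths), encoding="utf-8"
    )
    return chunk_paths, filelist
```

For example, `split_for_parsing("file.txt", "chunks")` would write `chunks/file.part0000.txt`, `chunks/file.part0001.txt`, and so on, plus `chunks/filenames.txt`. Because each sentence occupies exactly one line, splitting on line boundaries never cuts a sentence in half.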