Syntactically-Informed Unsupervised Paraphrasing with Non-Parallel Data

This repository contains the source code for Syntactically-Informed Unsupervised Paraphrasing with Non-Parallel Data.

Datasets

Quora: download from https://drive.google.com/file/d/1RdIQEoWJbm4HtNYaxFHjleBgX5FIZZtp/view?usp=sharing.

ParaNMT: download it from the release accompanying the paper ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations.

Requirements

python >= 3.6
torch == 1.6.0
nltk == 3.4.5
zss == 1.2.0
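Because the versions above are pinned exactly, it can help to check the environment before a long run. The following is a minimal sketch, not part of the repository: the helper names (parse_requirements, installed_version, mismatches) are hypothetical, and reading installed versions assumes Python 3.8+ for importlib.metadata (on older interpreters installed_version simply returns None).

```python
import re
import sys
from typing import Dict, List, Optional

# Pins copied from the Requirements list above; "python >= 3.6" is checked separately.
REQUIREMENTS = """\
torch == 1.6.0
nltk == 3.4.5
zss == 1.2.0
"""


def parse_requirements(text: str) -> Dict[str, str]:
    """Map each 'name == version' line to {name: version}; other lines are ignored."""
    pins = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*(\S+)\s*$", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def mismatches(pins: Dict[str, str], installed: Dict[str, Optional[str]]) -> List[str]:
    """Describe every package whose installed version differs from its pin."""
    problems = []
    for name, wanted in sorted(pins.items()):
        found = installed.get(name)
        # Drop local build labels such as "+cu101" before comparing.
        if found is None or found.split("+")[0] != wanted:
            problems.append("%s: want %s, found %s" % (name, wanted, found or "not installed"))
    return problems


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("python: want >= 3.6, found %d.%d" % sys.version_info[:2])
    pins = parse_requirements(REQUIREMENTS)
    for problem in mismatches(pins, {name: installed_version(name) for name in pins}):
        print(problem)
```

The local-label stripping is a design choice for CUDA builds of torch, whose version strings often carry a suffix such as +cu101.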

Data Processing

We use the Stanford CoreNLP parser to obtain the constituency parse tree of each sentence; the syntactic template is then derived from the parse tree.

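The commands are as follows, depending on whether you parse a single file or a list of files. Run them from the CoreNLP directory, since -cp "*" loads the jar files in the current directory.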

For a single input file:
java -Xmx12g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -threads 1 -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -file file.txt -outputFormat text -outputDirectory /outputdir/
For a list of input files (filenames.txt contains one input path per line):
java -Xmx12g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -threads 1 -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -filelist filenames.txt -outputFormat text -outputDirectory /outputdir/
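For large corpora, the splitting step and the -filelist command can be scripted together. The sketch below is an assumption-laden illustration, not a script from this repository: split_file, write_filelist and corenlp_command are hypothetical helpers, the chunk naming scheme and the default of 100000 lines per chunk are arbitrary, and only the CoreNLP flags are copied from the commands above.

```python
import os
from typing import List

# Annotators and flags copied from the CoreNLP commands above.
ANNOTATORS = "tokenize,ssplit,pos,parse"


def split_file(path: str, out_dir: str, lines_per_chunk: int = 100000) -> List[str]:
    """Split `path` into files of at most `lines_per_chunk` lines, like `split -l`.

    Splitting on line boundaries keeps sentences whole, which matters because
    -ssplit.eolonly only splits sentences at newlines.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths: List[str] = []
    out = None
    with open(path, encoding="utf-8") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:  # start a new chunk
                if out is not None:
                    out.close()
                chunk_path = os.path.join(out_dir, "chunk_%05d.txt" % len(chunk_paths))
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def write_filelist(chunk_paths: List[str], filelist_path: str) -> None:
    """Write one absolute path per line, the input expected by -filelist."""
    with open(filelist_path, "w", encoding="utf-8") as f:
        for chunk_path in chunk_paths:
            f.write(os.path.abspath(chunk_path) + "\n")


def corenlp_command(filelist_path: str, output_dir: str,
                    memory: str = "12g", threads: int = 1) -> List[str]:
    """Build the -filelist command above as an argument list for subprocess.run."""
    return [
        "java", "-Xmx" + memory, "-cp", "*",  # Java expands "*" to the jars in the cwd
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-threads", str(threads),
        "-annotators", ANNOTATORS,
        "-ssplit.eolonly",
        "-filelist", filelist_path,
        "-outputFormat", "text",
        "-outputDirectory", output_dir,
    ]
```

A possible usage: chunks = split_file("file.txt", "chunks"), then write_filelist(chunks, "filenames.txt"), then subprocess.run(corenlp_command(os.path.abspath("filenames.txt"), "/outputdir/"), cwd=<CoreNLP directory>, check=True). Absolute paths are written because the command runs from the CoreNLP directory, not from the directory holding the chunks.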

If the data is large, use the split command to divide the file into multiple smaller files, list them in filenames.txt, and parse them with the -filelist command above. Then run pos_to_file.py and template.py in the autocg directory to extract the parse trees and templates from the parser output.
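The README does not define what a template is, and pos_to_file.py and template.py are not reproduced here. As a hedged illustration only: one common definition, used by SCPN (Iyyer et al., 2018), is the top levels of the linearized constituency parse with the words removed. The sketch below assumes that definition and assumes the bracketed tree has already been separated from the rest of the CoreNLP text output; parse_tree and linearize_template are hypothetical names, and the depth this repository actually uses may differ.

```python
from typing import List, Tuple, Union

# A node is (label, children); words at the leaves are plain strings.
Node = Tuple[str, list]


def parse_tree(text: str) -> Node:
    """Parse a bracketed (Penn Treebank style) tree; newlines and indentation are ignored."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        children: List[Union[Node, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()


def linearize_template(node: Node, max_depth: int, depth: int = 0) -> str:
    """Keep constituent labels down to `max_depth` below the root and drop all words."""
    label, children = node
    parts = [label]
    if depth < max_depth:
        parts += [linearize_template(child, max_depth, depth + 1)
                  for child in children if isinstance(child, tuple)]
    return "(" + " ".join(parts) + ")"
```

For example, with max_depth=2 the parse (ROOT (S (NP (DT The) (NN cat)) (VP (VBZ sleeps)) (. .))) reduces to (ROOT (S (NP) (VP) (.))), so sentences with the same top-level structure share one template.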
