A notebook for question-and-answer generation using one of the most powerful open-source NLU models, FLAN-T5-11B. #215
Conversation
This is code that can be run in a notebook or on its own to generate a dictionary for use in creating synthetic dialogue that can be verified for factual accuracy. To use this notebook, you need your trusted source material in the format of a list of strings (they will be truncated to under 1,100 characters). Requires transformers and accelerate. Make sure to use T5 with bfloat16 or full precision.
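The description says the input must be a list of strings truncated to under 1,100 characters. A minimal sketch of that preprocessing step (the function name and sample data are my own; the thread only states the limit):

```python
# Limit stated in the PR description: each source string must end up
# under 1,100 characters before being fed to FLAN-T5.
MAX_CHARS = 1100

def truncate_sources(sources):
    """Truncate each trusted-source string so its length is strictly under MAX_CHARS."""
    return [s[:MAX_CHARS - 1] for s in sources]

# Example usage: a long passage gets cut, a short one passes through unchanged.
passages = ["x" * 2000, "short passage"]
truncated = truncate_sources(passages)
```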
Could you a) run pre-commit to pass linting, and b) have a look at how the folder structure of …

Colab Link
Moved to proper folder structure
I think it should be fixed now. I ran pre-commit, added the .md file, and changed to the correct folder structure. Let me know if this works now. TwoDukes, you may want to use the changes I put in. There were some whitespace problems and a few unused variables left over from other things I was doing before. I actually forgot that incorporating the logits score was custom code I wrote. I think I implemented it correctly, but it may be helpful for someone else to check that.
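The "logits score" mentioned here is custom code that is not shown in the thread, but the usual idea is to average the per-token log-probabilities of a generated sequence and use that as a confidence score for filtering question/answer pairs. A hypothetical sketch of just that arithmetic (function name and values are illustrative, not from the PR):

```python
import math

def sequence_confidence(token_logprobs):
    """Mean log-probability of the generated tokens (higher = more confident)."""
    return sum(token_logprobs) / len(token_logprobs)

# A sequence whose tokens were each assigned probability ~0.9 scores higher
# than one whose tokens were each assigned ~0.5.
confident = sequence_confidence([math.log(0.9)] * 4)
uncertain = sequence_confidence([math.log(0.5)] * 4)
```

Averaging (rather than summing) log-probabilities avoids penalizing longer answers, which matters when filtering generations of varying length.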
@Rallio67 pre-commit still reports problems; can you fix them and give it another try?
What are the GPU VRAM requirements for the XXL model?
You need a 24-gigabyte card and you need to use bfloat16. The RTX 3090 and other Ampere-level cards with 24 gigabytes of memory work, like the A10, A100, A5000, etc.
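A quick back-of-the-envelope check (my own arithmetic, not from the thread) shows why bfloat16 is what makes a 24 GB card sufficient:

```python
# An 11B-parameter model stored in bfloat16 uses 2 bytes per parameter
# (bfloat16 is 16 bits). Full fp32 precision would need 4 bytes per parameter.
params = 11_000_000_000
weight_gb_bf16 = params * 2 / 1e9   # 22.0 GB: fits (barely) in 24 GB
weight_gb_fp32 = params * 4 / 1e9   # 44.0 GB: does not fit on a 24 GB card
print(weight_gb_bf16, weight_gb_fp32)
```

This counts only the weights; activations and the KV cache add more on top, which is why the fit is tight even in bfloat16.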
It would be nice if someone could convert this approach to work on Colab. T5-11B should be able to run on TPU with Colab.