mangopy/long-form-generation-llm

long-form-generation-llm

Factuality

  1. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
  2. Language Models Hallucinate, but May Excel at Fact Verification
  3. RAGAS: Automated Evaluation of Retrieval Augmented Generation (sentence-level generation)
  4. FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
  5. Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
  6. Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompting Method
  7. Fine-tuning Language Models for Factuality
  8. Chain-of-Verification Reduces Hallucination in Large Language Models
  9. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
  10. RARR: Researching and Revising What Language Models Say, Using Language Models
  11. FELM: Benchmarking Factuality Evaluation of Large Language Models
  12. Improving Model Factuality with Fine-grained Critique-based Evaluator
  13. Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
  14. RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
  15. FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
  16. FactAlign: Long-form Factuality Alignment of Large Language Models
  17. Counterfactual Generation from Language Models
  18. LongReward: Improving Long-context Large Language Models with AI Feedback
  19. MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Dataset resources

Long-form QA datasets (question, long answer)

  1. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences
  2. LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall (based on ELI5)
  3. L-Eval: Instituting Standardized Evaluation for Long Context Language Models
  4. MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents
  5. AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization
  6. ExpertQA: Expert-Curated Questions and Attributed Answers

NLI dataset (claim, evidence)

  1. WiCE: Real-World Entailment for Claims in Wikipedia