How to contribute

Every system is described in a separate YAML file. The first few fields contain basic metadata about the system/model; the rest of the file is a set of _class, _link and _notes triples, one triple per openness criterion. Class can take one of three values: 🟩 open, 🟧 partial or 🟥 closed (leave it empty to signify N/A). Link is a URL providing evidence for the openness classification. Notes provide context and reasoning for the classification.
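As a quick self-check before submitting, the triple structure can be validated mechanically. The sketch below is a minimal illustration, not a tool shipped with this repository. It assumes the YAML file has already been loaded into a Python dict, that the three fields of a criterion share a common key prefix (the prefix weights_basemodel used here is hypothetical), and that classes are stored as the words open, partial and closed.

```python
# Assumed encoding of the _class field: one of these words, or empty for N/A.
ALLOWED_CLASSES = {"open", "partial", "closed", ""}


def check_triple(entry: dict, criterion: str) -> list:
    """Return a list of problems found in one _class/_link/_notes triple.

    `criterion` is the shared key prefix, e.g. the hypothetical "weights_basemodel".
    """
    problems = []
    for suffix in ("_class", "_link", "_notes"):
        if criterion + suffix not in entry:
            problems.append(f"missing field {criterion}{suffix}")
    # An empty YAML value loads as None; treat it like "" (N/A).
    cls = entry.get(criterion + "_class") or ""
    if cls not in ALLOWED_CLASSES:
        problems.append(f"{criterion}_class has unexpected value {cls!r}")
    # A classification should come with evidence; notes may stay empty.
    if cls and not entry.get(criterion + "_link"):
        problems.append(f"{criterion}_class is set but {criterion}_link is empty")
    return problems


entry = {
    "weights_basemodel_class": "open",
    "weights_basemodel_link": "https://example.org/model-weights",  # placeholder URL
    "weights_basemodel_notes": "",
}
print(check_triple(entry, "weights_basemodel"))  # []
```

An empty list means the triple is well-formed; each string in a non-empty list names one problem to fix.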

You're free to build on this work and reuse the data. It is licensed under CC-BY 4.0, with the stipulation that attribution should come in the form of a link to http://opening-up-chatgpt.github.io and a citation to the paper in which the initial dataset & criteria were published:

Andreas Liesenfeld, Alianda Lopez, and Mark Dingemanse. 2023. Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators. In Proceedings of the 5th Conference on Conversational User Interfaces (CUI ’23), July 19–21, 2023, Eindhoven, The Netherlands.

Which models are included?

The index aims to include any instruction-tuned generative AI system or model that is described by the responsible organisation or builder as "open-source" or "open", or that is marketed as such by official outlets of the responsible organisation or builder. Generally, the index aims to:

refer to each model by its most recent version, without naming the model size. Evaluation is then based on the largest model in the family. This may gloss over nuances, such as different sizes using different base models, and it does not capture how models have evolved over time.
capture how models, and the information supplied about them, evolve, through periodic updates by our small team (or community contributors!). For instance, as new models are first released and later get preprints, the related 'preprint' and 'paper' entries may be updated in due course.
include models that span different modalities in more than one modality category (text, image, video, etc.), which leads to multiple entries in the index.
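The first point, naming a model without its size but assessing the largest member of its family, amounts to picking the variant with the most parameters. A minimal sketch; the model names and parameter counts are invented for illustration, and parameter count is assumed to be what "largest" means.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    params_billion: float  # parameter count in billions (illustrative values)


def evaluation_target(family: list) -> Variant:
    """Return the largest variant of a model family, i.e. the one to assess."""
    if not family:
        raise ValueError("model family is empty")
    return max(family, key=lambda v: v.params_billion)


# Hypothetical family: listed in the index as "ExampleLM", assessed on its largest size.
family = [Variant("ExampleLM-7B", 7), Variant("ExampleLM-70B", 70), Variant("ExampleLM-1B", 1)]
print(evaluation_target(family).name)  # ExampleLM-70B
```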

Openness Criteria and system information

For each model, the YAML files in this database collect (1) general information about the system, (2) information about the organisation behind it, and (3) assessments along 14 dimensions of openness. The list below spells out the fields for system information and organisation, followed by the openness criteria, grouped into 'Availability', 'Documentation' and 'Access'. Use these guidelines to document openness determinations as precisely as possible, including links to evidence. Notes are optional.
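The grouping described here, with each criterion spelled out in the sections that follow, can be written down as a simple lookup table. A sketch only: the snake_case identifiers are hypothetical and may not match the key names used in the actual YAML files.

```python
# Openness criteria as listed in this README, grouped as in its headings.
# The snake_case identifiers are hypothetical, not the repository's key names.
CRITERIA = {
    "Availability": [
        "datasources_basemodel", "datasources_endmodel",
        "weights_basemodel", "weights_endmodel", "training_code",
    ],
    "Documentation": [
        "code_documentation", "architecture_documentation", "preprint",
        "paper", "model_card", "datasheet",
    ],
    "Access": ["package", "api", "licenses"],
}

# Extra criteria that apply to text-to-image generators only.
TEXT_TO_IMAGE_EXTRA = ["watermarking", "prompt_moderation"]

n = sum(len(group) for group in CRITERIA.values())
print(n)  # 14
```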

System

name: Name of the model, including any version number or size indication, e.g. Llama 3.1 or Olmo-7B-instruct

link: Link to the official model publisher's website or, if that does not exist, to the platform hosting the model.

type: Model type in one word, e.g. text, video, audio. Multiple keywords possible.

basemodelname: If applicable, the name of the base model ("foundation model") that was used.

endmodelname: Name of the model the end user interacts with.

endmodellicense: License that applies to end-user interaction with the model.

releasedate: Earliest release date of the model through any official source, in YYYY MMM format, e.g. 2024 NOV.

Organisation

name: Organisation that released the model. Usually synonymous with the model builder.

link: Link to an official source of information about the model release, e.g. the official website or blog.
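The System and Organisation fields, including the YYYY MMM format of releasedate, can be checked in the same spirit. A sketch under stated assumptions: the two blocks are assumed to be loaded as separate dicts, every field is assumed to be present as a key even when its value is left empty, and month abbreviations are assumed to be upper-case English (as in 2024 NOV).

```python
import re

# Field names as listed under "System" and "Organisation".
SYSTEM_FIELDS = ["name", "link", "type", "basemodelname",
                 "endmodelname", "endmodellicense", "releasedate"]
ORG_FIELDS = ["name", "link"]

# "YYYY MMM", e.g. "2024 NOV" (upper-case English abbreviations assumed).
MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
RELEASEDATE_RE = re.compile(rf"\d{{4}} (?:{MONTHS})")


def check_metadata(system: dict, org: dict) -> list:
    """Report missing System/Organisation fields and a malformed releasedate."""
    problems = [f"system.{f} missing" for f in SYSTEM_FIELDS if f not in system]
    problems += [f"organisation.{f} missing" for f in ORG_FIELDS if f not in org]
    date = str(system.get("releasedate") or "")
    if date and not RELEASEDATE_RE.fullmatch(date):
        problems.append(f"releasedate {date!r} is not in 'YYYY MMM' format")
    return problems


system = {f: "" for f in SYSTEM_FIELDS}
system["releasedate"] = "2024 NOV"
print(check_metadata(system, {"name": "", "link": ""}))  # []
```

Using fullmatch rather than match rejects values with trailing text, such as "2024 NOVEMBER".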

Availability

Datasources Basemodel

Are the data sources used to train the base model comprehensively documented and made freely available?

🟥 Training data sources of the base large language model are not open for inspection or shared.

🟧 Some of the training data sources of the base large language model are open for inspection or shared.

🟩 All training data sources of the base large language model are open for inspection and shared.

Datasources Endmodel

Are the data sources used to train the model that the end user interacts with comprehensively documented and made freely available?

🟥 Training data sources of the end model are not open for inspection or shared.

🟧 Some of the training data sources of the end model are open for inspection or shared.

🟩 All training data sources of the end model are open for inspection and shared.
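Both data source criteria follow an all/some/none pattern. A minimal sketch of that rule, assuming the assessor has compiled a complete list of training data sources and marked each one as shared or not; the source names are placeholders, and treating an empty list as closed is an assumption.

```python
def classify_shared(sources: dict) -> str:
    """Map per-source availability to "open", "partial" or "closed".

    `sources` maps each training data source to True if it is open for
    inspection and shared, False otherwise.
    """
    if not sources:
        return "closed"  # assumption: undocumented data counts as closed
    shared = sum(1 for ok in sources.values() if ok)
    if shared == len(sources):
        return "open"
    return "partial" if shared else "closed"


print(classify_shared({"source_a": True, "source_b": False}))  # partial
```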

Weights basemodel

Are the weights of the base model made freely available?

🟥 Weights of the base model are not shared.

🟧 Weights of the base model are only partially shared.

🟩 Weights of the base model are shared.

Weights endmodel

Are the weights of the model that the enduser interacts with made freely available?

🟥 Weights of the user-facing end model are not shared.

🟧 Weights of the user-facing end model are only partially shared.

🟩 Weights of the user-facing end model are shared.

Training Code

Is the source code for data source processing, model training and tuning made comprehensively and freely available?

🟥 No source code available.

🟧 Some source code is open.

🟩 Project source code is openly available and fully open for inspection.

Documentation

Code Documentation

Is the source code for data source processing, model training and tuning comprehensively documented?

🟥 Code documentation not available.

🟧 Some components of the system feature code documentation, but not every step of base and/or end model training and tuning is documented (irrespective of whether these components are shared).

🟩 All components of the system feature comprehensive code documentation.

Architecture Documentation

Is the hardware architecture used for datasource processing and model training comprehensively documented?

🟥 System architecture and model training setup are not documented.

🟧 System architecture and model training setup are partially documented.

🟩 System architecture and model training setup are fully documented.

Preprint

Are archived preprint(s) available that detail all major parts of the system, including data source processing, model training and tuning steps?

🟥 No archived preprint(s) available.

🟧 Archived preprint(s) are available that detail some parts of the system, such as data source processing, model training or tuning steps.

🟩 Archived preprint(s) are available that detail all major parts of the system, including data source processing, model training and tuning steps.

Paper

Are peer-reviewed scientific publications available that detail all major parts of the system including datasource processing, model training and tuning steps?

🟥 No peer-reviewed paper(s) available.

🟧 Peer-reviewed paper(s) detail parts of the software including base models, fine-tuning, or RLHF components.

🟩 Peer-reviewed paper(s) are available that cover all parts of the software including base models, fine-tuning, and RLHF components.

Model card

Is a model card in a standardized format available that provides comprehensive insight into model architecture, training, fine-tuning, and evaluation?

🟥 Model card(s) not available.

🟧 Model card(s) that provide partial insight into model architecture, training, fine-tuning, and evaluation are available.

🟩 Model card(s) are available that provide comprehensive insight into model architecture, training, fine-tuning, and evaluation.

Datasheet

Is a datasheet as defined in "Datasheets for Datasets" (Gebru et al. 2021) available?

🟥 Datasheet(s) are not available.

🟧 Datasheet(s) that provide partial insight on data collection and curation are available.

🟩 Datasheet(s) are available that provide comprehensive insight into data collection and curation, following the standards defined in [Datasheets for Datasets](https://doi.org/10.1145/3458723) by Gebru et al. (2021).

Access methods

Package

Is a packaged release of the model available on a software repository (e.g. the Python Package Index, Homebrew)?

🟥 No software package is available on a package index.

🟧 User-oriented code or a web interface is available, but not as a versioned package.

🟩 A packaged release of the model is available on a software repository (e.g. the Python Package Index, Homebrew).

API

Is an API available that provides unrestricted access to the model (other than security and CDN restrictions)?

🟥 No API access.

🟧 A commercial or restricted-access user API is available.

🟩 An API is available that provides unrestricted access to the model (other than security and CDN restrictions).

Licenses

Is the project fully covered by Open Source Initiative (OSI)-approved licenses, including all data sources and training pipeline code?

🟥 The project is not licensed clearly or does not use an Open Source Initiative (OSI)-approved license.

🟧 Only some parts of the model and data sources, such as the model weights, are released under an Open Source Initiative (OSI)-approved license.

🟩 The project is fully covered by Open Source Initiative (OSI)-approved licenses, including all data sources and training pipeline code.
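The Licenses rule can be sketched by looking up each part of the project in a list of OSI-approved licenses. The set below is a small illustrative subset written as SPDX identifiers, not the full OSI list, and the component names are hypothetical; a part whose license is unclear is recorded as None.

```python
from typing import Optional

# Small illustrative subset of OSI-approved licenses (SPDX identifiers).
# The authoritative list is maintained by the Open Source Initiative.
OSI_APPROVED = {"Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-only", "MPL-2.0"}


def classify_licenses(components: "dict[str, Optional[str]]") -> str:
    """Classify the Licenses criterion from per-component license identifiers.

    `components` maps project parts (hypothetical keys such as "weights",
    "training_code", "data") to an SPDX identifier, or None if unclear.
    """
    osi = [lic in OSI_APPROVED for lic in components.values()]
    if osi and all(osi):
        return "open"      # every part is under an OSI-approved license
    if any(osi):
        return "partial"   # only some parts are
    return "closed"        # nothing is clearly under an OSI-approved license


print(classify_licenses({"weights": "Apache-2.0", "training_code": "MIT", "data": None}))  # partial
```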

Additional openness criteria for text-to-image generators only

Watermarking

Are watermarking techniques comprehensively documented and shared?

🟥 Watermarking techniques are used but not documented.

🟧 Some information about watermarking techniques is documented and/or shared.

🟩 Watermarking techniques are comprehensively documented and shared or not applied.

Prompt moderation

Is prompt moderation comprehensively documented and shared?

🟥 Prompt moderation is used but not documented. 

🟧 Some information about prompt moderation is documented and/or shared. 

🟩 Prompt moderation is comprehensively documented and shared or not applied.
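Unlike the other criteria, Watermarking and Prompt moderation also count as open when the technique is not applied at all. A sketch of that rule for either criterion; the three documentation labels are assumptions introduced for illustration.

```python
DOC_LEVELS = {"none", "some", "comprehensive"}  # hypothetical labels


def classify_safeguard(applied: bool, documentation: str) -> str:
    """Classify the Watermarking or Prompt moderation criterion."""
    if documentation not in DOC_LEVELS:
        raise ValueError(f"unknown documentation level {documentation!r}")
    if not applied or documentation == "comprehensive":
        return "open"      # not applied, or comprehensively documented and shared
    if documentation == "some":
        return "partial"   # some information documented and/or shared
    return "closed"        # used but not documented


print(classify_safeguard(applied=True, documentation="some"))  # partial
```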
