What is this?

A GPT-2 model (medium version, ~354.8 M parameters) for Danish text generation. The model was not pre-trained from scratch but adapted from the English version using CLP-Transfer.

How to use

Test the model using the pipeline from the 🤗 Transformers library:

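from transformers import pipeline

# Download the model and tokenizer from the Hub and wrap them in a generation pipeline
generator = pipeline("text-generation", model="KennethTM/gpt2-medium-danish")
text = generator("Manden arbejdede som")  # Danish prompt: "The man worked as"

# The pipeline returns a list with one dict per generated sequence
print(text[0]["generated_text"])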

Or load it using the Auto* classes:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-medium-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-medium-danish")
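
Loading with the Auto* classes leaves generation to you. The following is a minimal sketch of one way to do it, not a recommended configuration: the helper names `generate_text` and `main`, and the sampling settings (`max_new_tokens=50`, `top_p=0.95`), are illustrative assumptions that do not come from this model card.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a continuation of `prompt` (hypothetical helper, not part of the model repo)."""
    # Encode the prompt as PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,        # assumed value, tune as needed
        do_sample=True,                       # sample instead of greedy decoding
        top_p=0.95,                           # assumed nucleus-sampling threshold
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
    )
    # Decode the first (and only) generated sequence back to text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def main():
    # Imported here so the helper above can be reused without loading the library
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "KennethTM/gpt2-medium-danish"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    print(generate_text(model, tokenizer, "Manden arbejdede som"))
```

Calling `main()` downloads the weights on first use and prints one sampled continuation.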

Model training

The model is trained on the Danish part of the OSCAR dataset ('unshuffled_deduplicated_da') with a context length of 1024 tokens.
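
The card does not say how documents were cut to the 1024-token context length. As an illustration only, one common approach is to split a tokenized text into fixed-size blocks; the function name `split_into_blocks` and the choice to drop a short final block are assumptions, not the author's preprocessing.

```python
def split_into_blocks(token_ids, block_size=1024, drop_last=True):
    """Split a list of token ids into consecutive blocks of `block_size` tokens."""
    blocks = [
        token_ids[start:start + block_size]
        for start in range(0, len(token_ids), block_size)
    ]
    # Optionally discard a trailing block that is shorter than the context length
    if drop_last and blocks and len(blocks[-1]) < block_size:
        blocks.pop()
    return blocks


# Stand-in for the token ids of a tokenized document
ids = list(range(2500))
print(len(split_into_blocks(ids)))
```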

The model weights are initialized from the English GPT-2 medium model ('source model') with new word token embeddings created from the Danish GPT-2 small model ('helper model') using the CLP-Transfer method.
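
The idea behind this initialization can be illustrated with a toy sketch. It assumes that tokens shared by both vocabularies keep their source-model embedding, and that each new Danish token is built as a similarity-weighted average of the shared tokens' source embeddings, with similarities measured in the helper model's embedding space. The normalization of the weights, the function name `clp_style_init`, and the tiny dictionaries are assumptions for illustration, not the author's training code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clp_style_init(target_vocab, source_emb, helper_emb):
    """Build target-vocabulary embeddings in the source model's embedding space.

    source_emb: token -> vector from the large source-language model
    helper_emb: token -> vector from the small target-language helper model
    """
    overlap = [t for t in target_vocab if t in source_emb and t in helper_emb]
    dim = len(next(iter(source_emb.values())))
    target_emb = {}
    for token in target_vocab:
        if token in source_emb:
            # Shared token: copy the source embedding unchanged
            target_emb[token] = list(source_emb[token])
            continue
        # New token: weight each shared token by its similarity in helper space
        weights = [cosine(helper_emb[token], helper_emb[o]) for o in overlap]
        total = sum(weights)  # assumed normalization so the weights sum to one
        vec = [0.0] * dim
        for w, o in zip(weights, overlap):
            for i, x in enumerate(source_emb[o]):
                vec[i] += (w / total) * x
        target_emb[token] = vec
    return target_emb


# Toy data: the helper model has a smaller hidden size than the source model
source_emb = {"the": [1.0, 0.0, 0.0], "er": [0.0, 1.0, 0.0]}
helper_emb = {"the": [1.0, 0.0], "er": [0.0, 1.0], "mand": [1.0, 1.0]}
new_emb = clp_style_init(["the", "er", "mand"], source_emb, helper_emb)
print(new_emb["mand"])
```

Note that the new embeddings have the source model's dimensionality, which is why the helper model can be smaller than the model being adapted.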

The model is trained on ~1,000,000 samples.

For reference, the model achieves a perplexity of 24.7 on 5,000 random validation samples.
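
Perplexity is the exponential of the mean negative log-likelihood per token, so a perplexity of 24.7 corresponds to a mean cross-entropy loss of about 3.21 nats. A minimal sketch of that relationship follows; the function name `perplexity` and the example loss values are illustrative, and the card does not state exactly how the reported figure was averaged.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Made-up per-token losses, for illustration only
example_nlls = [3.0, 3.2, 3.4]
print(perplexity(example_nlls))
```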

The model is trained on an 8 GB GPU.

Notes

This is a pre-trained model; for optimal performance, it should be fine-tuned for new tasks.
