# Contributing The best way to contribute growing P3 is by writing prompts for new datasets! ### What are Prompts? A prompt consists of a template: input template and target template, along with collection of associated metadata. A template is a piece of code written in a templating language called [Jinja](https://jinja.palletsprojects.com/en/3.0.x/). A template defines a function that maps an example from a dataset in the [Hugging Face datasets library](https://huggingface.co/datasets) to two strings of text. The first is called the _input_ which provides all information that will be available to solve a task, such as the instruction and the context. The second piece is called the _target_, which is the desired response to the prompt. ### Quick-Start Guide to Writing Prompts 1. **Set up the app.** Fork the app and set up using the [README](https://github.com/bigscience-workshop/promptsource/blob/main/README.md). 1. **Examine the dataset.** In the "Sourcing" mode, select or type the dataset into the dropdown. If the dataset has subsets (subsets are not the same as splits), you can select which one to work on. Note that prompts are subset-specific. You can find out background information on the dataset by reading the information in the app. The dataset is a collection of examples, and each example is a Python dictionary. The sidebar will tell you the schema that each example has. 1. **Start a new prompt**. Enter a name for your first prompt and hit "Create." You can always update the name later. If you want to cancel the prompt, select "Delete Prompt." 1. **Write the prompt**. In the box labeled "Template," enter a Jinja expression. See the [getting started guide](#getting-started-using-jinja-to-write-prompts) and [cookbook](#jinja-cookbook) for details on how to write templates. 1. **Fill in metadata**. Fill in the metadata for the current prompt: reference, original task, choices in templates, metrics, languages, and answer choices. See [Metadata](#metadata) for more details about these fields. 1. **Save the prompt**. Hit the "Save" button. The output of the prompt applied to the current example will appear in the right sidebar. 1. **Verify the prompt**. Check that you didn't miss any case by scrolling through a handful of examples of the prompted dataset using the "Prompted dataset viewer" mode. 1. **Write between 5 and 10 prompts**. Repeat the steps 4 to 9 to create between 5 and 10 (more if you want!) prompts per dataset/subset. Feel free to introduce a mix of formats, some that follow the templates listed in the [best practices](#best-practices) and some that are more diverse in the format and the formulation. 1. **Duplicate the prompts(s).** If the dataset you have chosen bear the same format as other datasets (for instance, `MNLI` and `SNLI` have identical formats), you can simply duplicate the prompts you have written to these additional datasets. 1. **Upload the template(s).** Submit a PR using the instructions [here](#uploading-prompts). ## Getting Started Using Jinja to Write Prompts Here is a quick crash course on using [Jinja](https://jinja.palletsprojects.com/en/3.0.x/) to write templates. More advanced usage is in the [cookbook](#jinja-cookbook). Generally, in a template, you'll want to use a mix of hard-coded data that is task-specific and stays the same across examples, and commands that tailor the input and target to a specific example. To write text that should be rendered as written, just write it normally. The following "template" will produce the same text every time: ```jinja2 This is just literal text that will be printed the same way every time. ``` To make your template do something more interesting, you'll need to use Jinja expressions. Jinja expressions are surrounded by curly braces `{` and `}`. One common thing you'll want to do is access information in the dataset example. When applied to an example, you can access any value in the example dictionary via its key. If you just want to print that value surround it in double curly braces. For example, if you want to print a value with the key `text`, use this: ```jinja2 The text in this example is {{ text }}. ``` You can also use information from the example to control behavior. For example, suppose we have a label with the key `label` in our example, which either has a value of 0 or 1. That's not very "natural" language, so maybe we want to decide which label name to use based on the example. We can do this by creating a list and indexing it with the example key: ```jinja2 The label for this example is {{ ["Label A", "Label B"][label] }}. ``` We can also use dictionaries for the same thing: ```jinja2 The label for this example is {{ {"a": "Label A", "b": "Label B" }[label] }}. ``` Note that some things in a template are particular to the task, and should not be modified by downstream steps that try to increase the diversity of the prompts. A common example is listing label names in the prompt to provide choices. Anything that should not be modified by data augmentation should be surrounded by double curly braces and quoted. For example: ```jinja2 The choices are {{"a"}}, {{"b"}}, and {{"c"}}. ``` You can leave binary options like yes/no, true/false, etc. unprotected. Finally, remember that a template must produce two strings: an input and a target. To separate these two pieces, use three vertical bars `|||`. So, a complete template for Squad could be: ```jinja2 I'm working on the final exam for my class and am trying to figure out the answer to the question "{{question}}" I found the following info on Wikipedia and I think it has the answer. Can you tell me the answer? {{context}} ||| {{answers["text"][0]}}' ``` ## Metadata In addition to the template itself, you need to fill out several other fields. These metadata facilitate finding and using the prompts. * **Prompt Reference.** If your template was inspired by a paper, note the reference in the "Prompt Reference" section. You can also add a description of what your template does. * **Original Task?** The checkbox should be checked if the template requires solving a task that the underlying dataset is used to study. For example, a template that asks a question from a question answering dataset would be an original task template, but one that asks to generate a question for a given answer would not. * **Choices in Template?** The checkbox should be checked if the input explicitly indicates the options for the possible outputs (regardless of whether `answer_choices` is used). * **Metrics.** Use the multiselect widget to select all metrics commonly used to evaluate this task. Choose “Other” if there is one that is not included in the list. * **Languages.** Use the multiselect widget to select all languages used in the prompt. This is independent of what languages are used in the underlying dataset. For example, you could have an English prompt for a Spanish dataset. * **Answer Choices.** If the prompt has a small set of possible outputs (e.g., Yes/No, class labels, entailment judgements, etc.), then the prompt should define and use answer choices as follows. This allows evaluation to consider just the possible targets for scoring model outputs. The answer choices field is a Jinja expression that should produce a `|||` separated list of all possible targets. If the choices don't change from example to example, then you can just list them. For example, AG News is ```jinja2 World News ||| Sports ||| Business ||| Science and Technology ``` Note that whitespace is stripped from the ends of the choices. If answer choices are set, then they are also available to Jinja in the prompt itself in the form of a list called `answer_choices`. You should use this list in both input and target templates so that the resulting inputs and targets match the answer choices field exactly. For example, a prompt for AG News could use `answer_choices` like this: ```jinja2 {{text}} Which of the following sections of a newspaper would this article likely appear in? {{answer_choices[0]}}, {{answer_choices[1]}}, {{answer_choices[2]}}, or {{answer_choices[3]}}? ||| {{ answer_choices[label] }} ``` Since Answer Choices is a Jinja expression that has access to the example, it can also be used to extract example-specific choices from the underlying data. For example, in AI2 ARC, we could use ```jinja2 {{choices.text | join("|||")}} ``` ## Best Practices * **Writing target templates.** The target template should only contain the answer to the task. It should not contain any extra text such as “The answer is…” (unless that extra text is also in `answer_choices`). If `answer_choices` is populated, the output should only contain the values in `answer_choices`. * **Formatting multple-choice questions.** If the target should match the name of the choice (e.g., “World News”), then it should list the choices either as part of a grammatical question or a list with the marker for each (e.g, dashes). If the target should indicate the choice from the list (e.g., “A,” “Explanation 1,” etc.), then it should list the choices with the indicator before each one. * **Choosing input and target pairs.** Lots of datasets have multiple columns that can be combined to form different (input, target) pairs i.e. different "tasks". Don't hesitate to introduce some diversity by prompting a given dataset into multiple tasks and provide some description in the "Template Reference" text box. An example is given in the already prompted `movie_rationales`. * **Filtering prompts.** If a prompt is applied to an example and produces an empty string, that prompt/example pair will be skipped. You can therefore create prompts that only apply to a subset of the examples by wrapping them in Jinja if statements. For example, in the `TREC` dataset, there are fine-grained categories that are only applicable to certain coarse-grained categories. We can capture this with the following prompt: ```jinja2 {% if label_coarse == 0 %} Is this question asking for a {{"definition"}}, a {{"description"}}, a {{"manner of action"}}, or a {{"reason"}}? {{text}} ||| {{ {0: "Manner", 7: "Defintion", 9: "Reason", 12: "Description"}[label_fine] }} {% endif %} ``` For datasets that have splits with no labels (for instance test split without ground truth labels), you can wrap the conditional statement on the target side. For instance for `super_glue/boolq`, the following prompt would return an empty target on the test split, but not an empty prompted example: ```jinja2 {{ passage }} Question: {{ question }} Answer: ||| {% if label != -1 %} {{ answer_choices[label] }} {% endif %} ``` * **Conditional generation format.** Always specify the target and separate it from the prompt by indicating the vertical bars `|||`. The target will be generated by a generative model conditioned on the input you wrote. You can always transform an "infix" prompt format ```jinja2 Given that {{premise}}, it {{ ["must be true", "might be true", "must be false"][label] }} that {{hypothesis}} ``` into a conditional generation format ```jinja2 Given that {{premise}}, it {{ "must be true, might be true, or must be false" }} that {{hypothesis}}?||| {{ ["must be true", "might be true", "must be false"][label] }} ``` * **Pre-defined formats.** The goal is to collect a diverse set of prompts with diverse formats, but we also want to include a few less diverse prompts that follow the following two structures: 1) A question-answer pair with optional multiple choices like: ``` [Context] # optional depending on the task [Question] [Label1], [Label2], [Label3] # optional depending on the task ``` So for SNLI it will look like: ```jinja2 {{premise}} Is it the case that {{hypothesis}}? {{ "Yes" }}, {{ "No" }}, {{ "Maybe" }} ||| {{ ["Yes", "No", "Maybe"][label] }} ``` 2) Task description followed by the input. So for SNLI it will look like: ```jinja2 Determine the relation between the following two sentences. The relations are entailment, contradiction, or neutral. {{premise}} {{hypothesis}} ||| {{label}} ``` * **Setting variables.** You might want to use the Jinja expression `{% set %}` to define a variable. If you do, do it at the beginning of the prompt, outside any conditional statements, so that the automatic prompt checks recognize that the variable is defined. ## More Examples Here are a few interesting examples of prompts with explanations. Here's one for `hellaswag`: ```jinja2 First, {{ ctx_a.lower() }} Then, {{ ctx_b.lower() }}... Complete the above description with a chosen ending: (a) {{ answer_choices[0] }} (b) {{ answer_choices[1] }} (c) {{ answer_choices[2] }} (d) {{ answer_choices[3] }} ||| {{ answer_choices[label | int()] }} ``` Notice how it uses functions to consistently capitalize the information and provides lots of context (referring explicitly to "description" and "chosen ending.") Here's one for `head_qa`: ```jinja2 Given this list of statements about {{category}}: {{ answers | map(attribute="atext") | map("lower") | map("trim", ".") | join(", ") }}. Which one is the most appropriate answer/completion for the paragraph that follows? {{qtext}} ||| {% for answer in answers if answer["aid"]==ra -%} {{answer["atext"]}} {%- endfor %} ``` Like above, it uses functions to present the choices in a readable way. Also, it uses a for loop with conditions to handle the more intricate dataset schema. Here's one for `paws`: ```jinja2 Sentence 1: {{sentence1}} Sentence 2: {{sentence2}} Question: Does Sentence 1 paraphrase Sentence 2? Yes or No? ||| {{answer_choices[label]}} ``` Notice that the choices `Yes or No` are not escaped. Yes/no, true/false are choices that do not need to be escaped (unlike categories). ## Uploading Prompts Once you save or modify a template, the corresponding file inside the `templates` directory in the repo will be modified. To upload it, follow these steps: 1. Run `make style` and `make quality`. 2. Commit the modified template files (anything under `templates`) to git. 3. Push to your fork on GitHub. 4. Open a pull request against `main` on the PromptSource repo. ## Jinja Cookbook - Accessing nested attributes of a dict ```jinja {{ answers_spans.spans }} ``` - Joining list ```jinja= {{ spans_list | join(", ") }} ``` - If conditions ```jinja {% if label==0 %} do_something {% elif condition %} do_something_else {% endif %} ``` - Using `zip()` to zip multiple lists ```jinja {% for a, b in zip(list_A, list_B) %} do_something_with_a_and_b {% endfor %} ``` Jinja includes lots of complex features but for most instances you likely only need to use the methods above. If there's something you're not sure how to do, just open an issue. We'll collect other frequent patterns here.