Generative Artificial Intelligence systems are increasingly deployed across industry and research settings. Developers and end users interact with these systems through prompting, or prompt engineering. While prompting is a widespread and highly researched concept, conflicting terminology and a poor ontological understanding of what constitutes a prompt persist due to the area's nascency. This paper establishes a structured understanding of prompts by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.
During paper collection, we followed a systematic review process grounded in the PRISMA method. We first scraped arXiv, Semantic Scholar, and ACL via keyword search, using a list of 44 terms closely related to prompting and prompt engineering. We then deduplicated the dataset by paper title, conducted extensive human and AI review for relevance, and automatically removed unrelated papers by checking paper bodies for the term "prompt".
The PRISMA review process. We accumulate 4,247 unique records from which we extract 1,565 relevant records.
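To make the filtering stage concrete, the sketch below shows how title-based deduplication and the automatic "prompt" body check could be implemented. It is a minimal illustration under assumed record fields, not the pipeline code used for the review.

from dataclasses import dataclass

@dataclass
class Paper:
    title: str  # paper title, used as the deduplication key
    body: str   # full extracted text of the paper

def deduplicate(papers: list[Paper]) -> list[Paper]:
    # Keep one record per normalized title (case- and whitespace-insensitive).
    seen: set[str] = set()
    unique = []
    for p in papers:
        key = " ".join(p.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

def mentions_prompt(p: Paper) -> bool:
    # Automatic relevance check: the paper body must contain the term "prompt".
    return "prompt" in p.body.lower()

def filter_records(papers: list[Paper]) -> list[Paper]:
    return [p for p in deduplicate(papers) if mentions_prompt(p)]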
We present a comprehensive taxonomy of prompting techniques, methods for instructing Large Language Models (LLMs) to complete tasks. We divide prompting techniques into three categories: text-based, multilingual, and multimodal. Multilingual techniques are used to prompt LLMs in non-English settings. Multimodal techniques are used when working with non-textual modalities such as image and audio.
All text-based prompting techniques from our dataset.
All multilingual prompting techniques.
All multimodal prompting techniques.
We discuss various prompting terms, including prompt engineering, answer engineering, and few-shot prompting.
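To make one of these terms concrete, a few-shot prompt prepends worked input-output exemplars to the query the model should answer. The sentiment example below is a generic illustration, not a prompt drawn from the paper.

# Few-shot prompting: exemplars demonstrate the task before the real query.
exemplars = [
    ("I loved this movie!", "positive"),
    ("The plot made no sense at all.", "negative"),
]
query = "The acting was superb."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in exemplars)
prompt += f"\n\nReview: {query}\nSentiment:"
print(prompt)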
In our first case study, we benchmark six distinct prompting techniques on the MMLU benchmark. We also explore the impact of formatting, finding that results vary between the two formats tested for each prompting technique.
Accuracy values are shown for each prompting technique. Purple error bars illustrate the minimum and maximum for each technique, since each was run with different phrasings (except SC) and formats.
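In outline, the benchmark builds a prompt for each MMLU question with a given technique, queries the model, extracts an answer letter, and computes accuracy against the gold label. The sketch below is a simplified illustration; call_llm and the technique templates are hypothetical stand-ins for the actual experimental harness.

import re

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the model API used in the experiments.
    raise NotImplementedError

def extract_choice(output: str) -> str:
    # Answer engineering: take the last standalone A-D letter in the output.
    matches = re.findall(r"\b([ABCD])\b", output)
    return matches[-1] if matches else ""

TECHNIQUES = {
    "zero-shot": lambda q: q["question"],
    "zero-shot-cot": lambda q: q["question"] + "\nLet's think step by step.",
    # ...few-shot, self-consistency (SC), and the other benchmarked variants
}

def accuracy(build_prompt, questions) -> float:
    correct = sum(
        extract_choice(call_llm(build_prompt(q))) == q["gold"] for q in questions
    )
    return correct / len(questions)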
In the second case study, we apply prompting techniques to the task of labeling Reddit posts as indicative of suicide crisis syndrome (SCS). Through this case study, we aim to provide an example of the prompt engineering process in the context of a real-world problem. We use the University of Maryland Reddit Suicidality Dataset and an expert prompt engineer, documenting the process by which they boost the F1 score from 0 to 0.53.
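For reference, the F1 score used here is the standard harmonic mean of precision and recall over the positive (SCS-indicative) class. The labels in the sketch below are illustrative placeholders, not data from the study.

from sklearn.metrics import f1_score

# Illustrative gold labels and model predictions (1 = indicative of SCS).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 0]

# F1 = 2 * precision * recall / (precision + recall), on the positive class.
print(f"F1: {f1_score(gold, preds):.2f}")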
Our systematic review of prompting techniques is based on the dataset of 1,565 relevant papers we collected. Below is a preview of the dataset; specific columns, such as 'abstract', have been excluded. The full dataset is available on Hugging Face, including the complete CSV file and all paper PDFs.
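A typical way to load the dataset is with the Hugging Face datasets library, as in the sketch below. The repository ID is a placeholder; see the project's Hugging Face page for the exact name.

from datasets import load_dataset

# Placeholder repository ID; check the project's Hugging Face page for the real one.
ds = load_dataset("org-name/prompt-report-papers", split="train")

df = ds.to_pandas()          # one row per paper
print(df.columns.tolist())   # e.g. title, authors, url, ... ('abstract' excluded in the preview)
print(len(df))               # 1,565 relevant records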
We conducted several analyses of the dataset, which can be found in the paper, including analyses of citation counts for different GenAI models, prompting techniques, and datasets.
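One such analysis can be approximated by counting how many paper bodies mention each model name, as in the rough sketch below. The model list and the df table (loaded in the previous sketch, assumed to have a 'body' column) are assumptions; this is not the paper's exact methodology.

# `df` is the papers table from the previous sketch.
MODELS = ["GPT-3", "GPT-4", "PaLM", "LLaMA", "BLOOM"]

counts = {
    name: int(df["body"].fillna("").str.contains(name, case=False, regex=False).sum())
    for name in MODELS
}
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {n} papers")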
@misc{schulhoff2024prompt,
title={The Prompt Report: A Systematic Survey of Prompting Techniques},
author={Sander Schulhoff and Michael Ilie and Nishant Balepur and Konstantine Kahadze and Amanda Liu and Chenglei Si and Yinheng Li and Aayush Gupta and HyoJung Han and Sevien Schulhoff and Pranav Sandeep Dulepet and Saurav Vidyadhara and Dayeon Ki and Sweta Agrawal and Chau Pham and Gerson Kroiz and Feileen Li and Hudson Tao and Ashay Srivastava and Hevander Da Costa and Saloni Gupta and Megan L. Rogers and Inna Goncearenco and Giuseppe Sarli and Igor Galynker and Denis Peskoff and Marine Carpuat and Jules White and Shyamal Anadkat and Alexander Hoyle and Philip Resnik},
year={2024},
eprint={2406.06608},
archivePrefix={arXiv},
primaryClass={cs.CL}
}