CRA-I Blog
The CRA-I Blog frequently shares news, timely information about the computing research industry community, and items of interest to the general community. Subscribe to blog emails here to stay connected.
ACM SIGPLAN BLOG: Prompts are Programs
The following is a repost from the ACM SIGPLAN Blog: PL Perspectives. It was written by Tommy Guy, Peli de Halleux, Reshabh K Sharma, and past CRA-I Co-Chair Ben Zorn on October 22, 2024. Please see the original post here.
In this post, we highlight just how important it is to understand that an AI model prompt has much in common with a traditional software program. Taking this perspective creates important opportunities and challenges for the programming language and software engineering communities and we urge these communities to undertake new research agendas to address them.
Moving Beyond Chat
ChatGPT, released in December 2022, had a huge impact on our understanding of what large language models (LLMs) can do and how we can use them. The millions of people who have used it understand what a prompt is and how powerful it can be. We marvel at the breadth and depth of the AI model's ability to understand and respond to what we say, and its ability to hold an informed conversation that allows us to refine its responses as needed.
Having said that, many chatbot users have experienced challenges in getting LLMs to do what they want. Skill is required in phrasing the input to the chatbot so that it correctly interprets the user's intent. Similarly, the user may have very specific expectations about what the chatbot produces (such as data formatted in a particular way, e.g., as a JSON object), and it is important to capture these expectations in the prompt.
Also, chat interactions with LLMs have significant limitations beyond challenges in phrasing a prompt. Unlike writing and debugging a piece of code, having an interactive chat session does not result in an artifact that can then be reused, shared, parameterized, etc. So, for one-off uses chat is a good experience, but for repeated application of a solution, chat falls short.
Prompts are Programs
The shortcomings of chatbots are overcome when LLM interactions are embedded into software systems that support automation, reuse, etc. We call such systems AI Software systems (AISW) to distinguish them from software that does not leverage an LLM at runtime (which we call Plain Ordinary Software, POSW). In this context, LLM prompts have to be considered part of the broader software system and carry the same robustness, security, and other requirements that any software has. In a related blog, we’ve outlined how much the evolution of AISW will impact the entire system stack. In this post, we focus on how important prompts are in this new software ecosystem and what new challenges they present to our existing approaches to creating robust software.
Before proceeding, we clarify what we mean by a “prompt”. First, our most familiar experience with prompting is what we type into a chatbot. We call the direct input to the chatbot the user prompt. Another, more complex prompt is the prompt that was written to process the user prompt, which is often called the system prompt. The system prompt contains application-specific directions (such as “You are a chatbot…”) and is combined with other inputs (such as the user prompt, documents, etc.) before being sent to the LLM. The system prompt is a fixed set of instructions that define the nature of the task to be completed, what other inputs are expected, and how the output should be generated. In that way, the system prompt guides the execution of the LLM to compute a specific result, much as any software function does. In the following discussion, our focus is mainly on thinking of system prompts as programs, but many of the observations apply directly to user prompts as well.
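The distinction above can be made concrete with a short sketch of how a fixed system prompt is combined with a per-request user prompt and other inputs before being sent to the LLM. The message structure below mirrors common chat-completion APIs, but the model name is a placeholder and no real model is called; it is an illustration, not a specific vendor's API.

```python
# Sketch: assembling the full input an LLM actually sees. The fixed
# system prompt plays the role of the "program"; the user prompt and
# any context documents are its runtime inputs.

SYSTEM_PROMPT = "You are a chatbot that tags parts of speech."

def build_request(user_prompt, context_docs=None):
    """Assemble a chat-style request from fixed and per-call inputs."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for doc in context_docs or []:
        # Other inputs (retrieved documents, etc.) are spliced in too.
        messages.append({"role": "system", "content": f"Context: {doc}"})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": "<your-model-here>", "messages": messages}

request = build_request("Tag the word 'hat' in: The cat ate the hat.")
```

Note how the system prompt is fixed across calls while the user prompt varies, which is exactly the fixed-code/runtime-input split of an ordinary function.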
An Example of a Prompt
We use the following prompt as an example, loosely adapted from a recent paper on prompt optimization to illustrate our discussion.
You are given two items: 1) a sentence and 2) a word contained in that sentence.
Return the part of speech tag for the given word in the sentence.
This system prompt describes the input it expects (in this case a pair of a sentence such as “The cat ate the hat.” and a word, such as “hat”), the transformation to perform, and the expected structure of the output. With this example, it is easy to see that all the approaches we take to creating robust software should now be rethought in terms of how they apply to prompts.
If Prompts are Programs, What is the Programming Language?
There are many questions related to understanding the best way to prompt language models and it is a topic of active PL and AI research. Expressing prompts purely in natural language can be effective in practice. In addition, best practice guidelines for writing prompts often recommend structuring prompts using traditional document structuring mechanisms (like using markdown) and clearly delineating sections, such as a section of examples, output specifications, etc. Uses of templating, where parts of prompts can be substituted programmatically, are also popular. Approaches to controlling the structure and content in the output of prompts both in model training and through external specifications, such as OpenAI JSON mode, or Pydantic Validators, have been effective.
Efforts have also been made to more deeply integrate programming language constructs into the prompts themselves, including the Guidance and LMQL languages, which allow additional specifications. All of these methods (1) observe the value of more explicit and precise specifications in the prompt and (2) leverage any opportunity to apply systematic checking to the resulting model output.
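The second point, systematically checking model output, can be illustrated with a small sketch. Suppose the prompt asked the model to reply with a JSON object like `{"word": ..., "tag": ...}`; real systems might use a vendor's JSON mode or Pydantic validators, but plain `json` parsing plus hand-written checks (shown here with an assumed, illustrative tag set) conveys the idea.

```python
# Sketch: validating a model's structured output before trusting it.
import json

VALID_TAGS = {"NOUN", "VERB", "ADJ", "DET", "PRON", "ADV", "ADP"}

def check_output(raw):
    """Parse and validate the model's reply; raise on any violation."""
    obj = json.loads(raw)                   # must be well-formed JSON
    if set(obj) != {"word", "tag"}:         # exactly the expected keys
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if obj["tag"] not in VALID_TAGS:        # tag drawn from a closed set
        raise ValueError(f"unknown tag: {obj['tag']}")
    return obj

result = check_output('{"word": "hat", "tag": "NOUN"}')
```

The checks turn a free-form model reply into something the surrounding POSW can rely on, which is precisely the role a type signature or output contract plays in traditional code.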
Prompting in natural language will evolve as the rich set of infrastructures that the LLMs can interact with become available. Tools that extend the abilities of LLMs to take actions (such as retrieval augmented generation, search, or code execution) become abstractions that are available to the LLM to use but must be expressed in the prompt such that the user intent to leverage them is clear. Much PL research is required to define such tool abstractions, help LLMs choose them effectively, and help prompt writers express their intent effectively.
Software Engineering for Prompts
If we understand that prompts are programs, then how do we transition our knowledge and tools for building POSW so that we can create robust and effective prompts? Tooling for authoring, debugging, deploying and maintaining prompts is required and existing tools for POSW do not directly transfer.
One major difference between prompts and traditional software is that the underlying engine that interprets prompts, the LLM, is not deterministic and so the same prompt can result in different results in different calls even using the same LLM. Also, because the types and varieties of LLMs are proliferating, it is even harder to ensure that the same prompt will produce the same result across different LLMs. In fact, LLMs are evolving rapidly and there are important tradeoffs that can be made between inference cost, output quality, and local models versus cloud-hosted models. The implication of this fact is that when the underlying model changes, the prompt may require changes as well, which suggests that prompts will require continuous tweaking as models evolve.
There are a number of existing research approaches to automatically optimizing and updating prompts, such as DSPy, but such technologies are still in their infancy. Also, a given AI software application may choose to use different models at different times for efficiency, so, like binary formats that support multiple ISAs (e.g., the Apple universal binary format), prompts may require structure that supports multiple target LLMs.
Ultimately, tools that support testing, debugging, and optimizing the prompt/model pairing will be necessary and become widely used. Because standards for prompt representation or even how prompts are integrated into existing software applications have not been adopted, research into the most effective approaches to these problems is needed.
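One way to picture what prompt testing might look like is a small regression-test sketch. Because the model is nondeterministic, each test case is run several times and a pass rate is reported instead of a single pass/fail bit. The `fake_model` function below is a deterministic stand-in for a real LLM call; the harness shape, not the stub, is the point.

```python
# Sketch: regression-testing a prompt/model pairing under nondeterminism.

def fake_model(prompt):
    # Stand-in for an LLM call; a real harness would hit an API here
    # and would generally not return the same answer every time.
    return "NOUN" if "hat" in prompt else "VERB"

def pass_rate(prompt, expected, runs=5):
    """Fraction of runs whose output matches the expectation."""
    hits = sum(fake_model(prompt) == expected for _ in range(runs))
    return hits / runs

rate = pass_rate("Word: hat", expected="NOUN")
```

A real suite would track pass rates per model version, so that swapping or upgrading the underlying LLM surfaces regressions the same way an ordinary test suite catches a breaking library upgrade.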
Next Steps for Prompt Research
Because prompts are programs, the software engineering and programming languages communities have much to offer in improving our understanding of, and our ability to create, expressive, effective, efficient, and easy-to-write prompts. There are incredible research opportunities to explore, and the impact will inform the next generation of software systems based on AISW. Moreover, because writing prompts is much more accessible to non-programmers, an entirely new set of challenges relates to how our research can support individuals who are not professional developers in leveraging LLMs through writing effective, expressive, robust, and reusable prompts.
In this post, we’ve considered how a single prompt should be considered a program but, in practice, many applications that leverage AI contain multiple prompts that are chained together with traditional software. Multi-prompt systems introduce even greater software engineering challenges, such as how to ensure that a composition of prompts is robust and predictable. And this field is moving very fast. Agentic systems, such as AutoGen and Swarm, where AI-based agents are defined and interact with each other, are already widely available. How does our existing understanding of building robust software translate to these new scenarios? Learning what such systems are capable of and how we can construct them robustly is increasingly important for the research community to explore.
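A minimal sketch of the multi-prompt pattern: two prompt "functions" composed by ordinary glue code, where the first extracts a word from a sentence and the second tags it. Both LLM calls are stubbed with trivial logic so the pipeline is runnable; a real system would substitute actual model calls, and the robustness question becomes how errors in one stage propagate to the next.

```python
# Sketch: a two-prompt chain glued together with traditional software.

def extract_word(sentence):
    # Prompt 1 (stubbed): e.g., "Return the last word of the sentence."
    return sentence.rstrip(".").split()[-1]

def tag_word(sentence, word):
    # Prompt 2 (stubbed): the part-of-speech prompt discussed earlier.
    return "NOUN"

def pipeline(sentence):
    """Ordinary code composing two prompt-backed steps."""
    word = extract_word(sentence)
    return word, tag_word(sentence, word)

word, tag = pipeline("The cat ate the hat.")
```

Even this tiny chain raises the composition questions mentioned above: if the first prompt returns a word not in the sentence, the second prompt's precondition is silently violated unless the glue code checks it.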
The challenges and effective strategies for creating robust prompts are not well understood and will evolve as rapidly as the underlying LLM models and systems evolve. The PL and SE communities have to be agile and eager to bring the decades of research and experience building languages and tools for robust software development to this new and important domain.
Biographies:
Tommy Guy is a Principal Architect on the Copilot AI team at Microsoft. His research interests include AI-assisted data mining, large-scale A/B testing, and the productization of AI.
Peli de Halleux is a Principal Research Software Development Engineer in Redmond, Washington, working in the Research in Software Engineering (RiSE) group. His research interests include empowering individuals to build LLM-powered applications more efficiently.
Reshabh K Sharma is a PhD student at the University of Washington. His research lies at the intersection of programming languages and security, focusing on developing infrastructure for creating secure systems and improving existing systems using software-based mitigations to address various vulnerabilities, including those in LLM-based systems.
Ben Zorn is a Partner Researcher at Microsoft Research in Redmond, Washington, working in (and previously having managed) the Research in Software Engineering (RiSE) group. His research interests include programming language design and implementation, end-user programming, and empowering individuals with responsible uses of artificial intelligence.
Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.
Bruce Hendrickson (LLNL) and Ronak Shah (NVIDIA) Join CRA-Industry Council
CRA-Industry (CRA-I) is pleased to welcome Bruce Hendrickson (LLNL) and Ronak Shah (NVIDIA) to the CRA-I Council. They join an expanding group of Council members, under the leadership of CRA-I Council Chair Ron Brachman (Cornell Tech), who will continue collaborating with the CRA-I Steering Committee. Together, they will focus on shaping the committee’s future directions, engaging with the community, and advancing the goals of CRA-I.
Bruce Hendrickson is the Principal Associate Director for Computing at Lawrence Livermore National Lab. In this role, he leads the largest computing organization in the U.S. national lab system. He has spent his entire career in DOE labs, focused on mathematics, algorithms, and architectures for large-scale simulation and data science. He is a lifelong techno-optimist and is excited about the abundant opportunities for computing technology to help address societal challenges. Bruce is a member of the CRA Board of Directors and is working to better connect CRA (and specifically CRA-I) with the large community of computing researchers at the national labs.
Ronak Shah is a Senior Manager at NVIDIA, where he oversees research institution relationships. His team helps researchers accelerate their time-to-science, fosters innovative methods, and builds partnerships. Ronak joined NVIDIA in 2016 as a Senior Account Manager for Higher Education and later managed Account Managers in the Northeast US. Prior to his tenure at NVIDIA, he was an Executive Account Manager at CDW Government, managing higher education business relationships. Ronak holds a BS in Industrial Management with a specialization in Manufacturing Engineering Technology from Northern Illinois University.
Please help the industry research community by continuing to nominate outstanding colleagues for the CRA-I Council. We have a goal to reach a steady state of 21 members that represent the breadth of the industry computing research community. Read more here and send nominations to industryinfo@cra.org.
Welcome, Bruce and Ronak!
Hector Gonzalez (SpiNNcloud Systems) Joins CRA-Industry Council
CRA-Industry (CRA-I) is excited to announce that Hector Gonzalez of SpiNNcloud Systems has joined the CRA-I Council. Hector joins a vibrant group of council members led by CRA-I Council Chair Ron Brachman from Cornell Tech. Together, they are committed to working with the CRA-I Steering Committee to guide the direction of future initiatives, engage with the community, and advance the goals of CRA-I.
Hector is the co-founder and co-CEO of SpiNNcloud Systems, a deep-tech company providing brain-inspired microchips and systems for the third generation of AI. Hector has helped position SpiNNcloud among the most relevant hardware startups in Germany. Under his co-leadership, the company has received several recognitions, including the largest EU grant for startups (EIC Transition) in the challenge of “Green digital devices of the future”. Hector is a Fellow of the Konrad Zuse School of Excellence in Embedded Composite Artificial Intelligence (SECAI). He holds a B.Sc. degree in Electronics Engineering and is a graduate of the MIT and Masdar Institute Cooperative Program in Abu Dhabi, where he earned an M.Sc. degree in Microsystems after conducting research on AI hardware for EEG-based emotion detection. His PhD studies at TU Dresden focused on chip design for AI-enabled Digital Signal Processors (DSPs) for automotive radars. Hector has held senior industrial positions in Instrumentation & Control across various countries and has received numerous international academic honors and awards. He has authored or co-authored more than 23 peer-reviewed articles, is the inventor of a patent in the cognitive radar field, and has been featured in numerous press communications from prominent sources such as IEEE Spectrum, the BBC, Arm, EE Times, Sandia National Labs, eeNews Europe, Silicon Angle, and the EU flagship Human Brain Project, among others.
Please help the industry research community by continuing to nominate outstanding colleagues for the CRA-I Council. Read more here and send nominations to industryinfo@cra.org.
Welcome, Hector!