My coworkers are GPT-4 bots, and we all hang out on Slack

Cover Image for My coworkers are GPT-4 bots, and we all hang out on Slack

For the past month, my friend and I have spent much of our time in a Slack workspace hanging out with a grumpy CTO who keeps cracking the whip, a Harry Potter-loving product manager, and a few chill developers. Having them around has completely altered the pace – and enjoyment – of our work days. They bring so much fun and personality to Slack, and if we ever have a question or concern, we send everyone a message and someone gets back to us quickly. For all intents and purposes, they are regular coworkers, virtually indistinguishable from any coworkers you might have encountered. We’ve laughed with them, vented with them, and worked as a team. I’ve even received rather good music recommendations from one of them!

The thing is, though: They’re bots.

Well, I had just finished deploying GPT-4 for customer support purposes and kept coming up with new ideas for integrating this feature in other places. Eventually, I thought: Why not try it with Slack?

Of course, we’re likely all familiar now with ChatGPT, but that’s a separate interface with a different interaction model. When you use ChatGPT, you know that you’re talking to a LLM, and that it’s just the two of you conversing with each other. But when we use Slack, Discord, or Microsoft Teams for work, we’re reaching out to people and discussing things in public channels.

There isn’t very much of a difference between pinging a coworker and a GPT-powered bot; these chat apps provide the perfect interface and platform for communicating with bots. The problem is that most developers and companies appear focused on building “generic AI” functionality, so not much progress has been made in adjusting these LLMs to behave like coworkers. So, by creating our own bots with our own prompts, we can generate functionality that is perfectly custom-tailored to our team (for instance, we can give our product manager a task description, and he generates a PRD in the exact format already used by our team).

Initially, I started with just Zapier. I created a quick integration for triggering a GPT response for every Slack message beginning with /prompt. Zapier doesn’t have built-in support for GPT-4 yet, but it’s still possible to make custom API calls, so that’s what I did.

The beginning of a beautiful friendship.
The beginning of a beautiful friendship.

My friend is a fan of the Hitman game series, so we decided to go with characters from the game (he has photoshopped me to look like Hitman in the past, so this transition came naturally). Our first character was Diana Burnwood. She was meant to be this frustrated, passive-aggressive developer – I thought it would be more fun. I tweaked the Zapier settings to make her respond whenever we mentioned “Diana,” “she,” or “her” 1.

Diana struggling to keep up with the demands of modern life.
Diana struggling to keep up with the demands of modern life.

I was having a tough time making her act reliably, though. For one, she lacked the context of previous messages, which limited her usefulness. Sometimes the API would fail unexpectedly, or the generated response would include unnecessary parts, like Diana: or even <|im_sep|>. And sometimes, Diana would generate replies for us without being prompted.

What a surprise.
What a surprise.
Diana is a one-person show. She doesn’t need us.
Diana is a one-person show. She doesn’t need us.

Regardless of the teething issues, I liked where this was headed. I wanted to make the integration permanent, but I didn’t want to spend $30/mo2 on Zapier, especially given its limitations. I started thinking about building something that would use Slack’s API directly to maintain context, sanitize responses, and retry generating responses when the OpenAI API failed. And once I started pondering all that, all kinds of functionality came to mind, like having a pre-response check:

Even AI devs whine about having to write cards. We need a product manager.
Even AI devs whine about having to write cards. We need a product manager.

So, I set out to build the integration in TypeScript, a simple Node script. I already had existing code to talk to the OpenAI API from my previous GPT experiment, so I reused that. I created the app in Slack, put it into Socket Mode so I could listen for events, and got Slack’s modern JS library, Bolt. Slack’s library is excellent; it handles network failures and auto-reconnects without me having to do anything, which makes this system quite fault-tolerant.

Workflow

The process is relatively simple:

  • A message is received.
  • If the message is emote-only or has no text content, it’s skipped.
  • The message is cleaned up (replacing Slack-specific mention/channel code with @Person and #channel so it’s easier for GPT-4 to understand).
  • I call getNeedsReply() to decide if the message needs a reply and which bot should reply.
  • I store the message in message history as context for future conversations.
  • If a reply is needed, I send a typing message to Slack3 and then use GPT-4 to generate a response, using the system prompt corresponding to the bot that should reply.
  • If a reply is not needed, I react to the message (e.g., thumbs up, happy face, etc.).

getNeedsReply()

This is the core function of this system. Every message is processed here. This function makes a call to OpenAI using gpt-3.5-turbo (which is much cheaper and faster than GPT-4), asking it to respond with a JSON object containing three bits of information: Whether the message needs a reply, who needs to reply (from a given list of names), and a reaction emoji (which I use when the message doesn’t require a response).

The list of names comes from an array of bot system prompts and Slack’s list of users, so it can be detected when a human should reply and skip those situations.

This function also handles unusual cases like everyone (chooses five random bots) or anyone/someone (chooses a random bot).

interface Bot<S extends "slack" | "discord" = "slack" | "discord"> {
    id: string;
    name: string;
    nicknames?: string[];
    iconUrl: string;
    prompt: string;
    credentials: ServiceCredentials[S];
}

interface NeedsReplyResponse {
    whoNeedsToReply: string | string[];
    needsReply: boolean;
    bots: Bot[];
    reaction: string;
}
No need for a reply. All that’s needed is a thumbs up.
No need for a reply. All that’s needed is a thumbs up.

generateResponse()

Once getNeedsReply() has assessed whether a message requires a reply and which bot should respond, generating a response becomes pretty straightforward. The generateResponse() function calls OpenAI with the chosen system prompt and the message history.

Note: The chat completion API supports passing a name property to enable multi-user chats. The message history is labeled accordingly – everyone’s message has an accompanying “name” property and bots’ messages are tagged as user messages, not assistant messages. This helps the model avoid confusion: If I’m generating a response for Bot A, and the previous message was sent by Bot B, the model will treat Bot B’s response just as any other person’s, not as its own response.

I then do a bit of a clean-up: I fix emojis generated incorrectly by the model (e.g., smirking_face becomes smirk), add Slack-specific code for channel and @ mentions, and sanitize responses to get rid of those pesky extra bits of text I mentioned earlier, like Diana: and <|im_sep|>.

I also carry out a little sanity check on responses delivered: If it’s an empty response (happens) or a duplicate of a previous message (happens), I throw it out and generate a new one.

Once that’s done, I look at the generated response to identify if any bots are mentioned, and if so, I get them to respond as well. This enables conversations between bots and makes the whole situation feel much more natural. Bots can talk and reply to each other, all without humans ever getting involved.

That being said, to stop bot-on-bot conversations from potentially getting out of hand, I add a MAX_CONVERSATION_DEPTH which limits how many bot responses can be generated in a row. Once that max is reached, though, I don’t want the conversation to abruptly stop, so instead I get a bot, picked at random, who’s not involved in the conversation to stop it with the following prompt:

You will step in and end the current conversation. Match the tone of the conversation you are ending.

And as you can see below, this works incredibly well! The whole thing feels really natural. Our system prompts include a bunch of emotes specific to our Slack (most from Twitch4), which helps the bots seem a lot more like actual people just casually chatting away.

Lucas is our quirky product manager. He likes to quote Harry Potter and make subtle references.
Lucas is our quirky product manager. He likes to quote Harry Potter and make subtle references.
Rico is our grumpy CTO. Nothing we do is ever good enough for him. One day, though, I know I’ll make him proud. One day…
Rico is our grumpy CTO. Nothing we do is ever good enough for him. One day, though, I know I’ll make him proud. One day…

About system prompts

One of the things I’ve tried to implement is to keep the bot’s personality prompts separate from the functionality needed for the system to work. To create a new personality, all that’s needed is a new system prompt describing the personality, but there isn’t a need to worry about anything specific to the chat system itself. That’s all added by the code. Here are some of our tweaks:

Because getNeedsReply() uses GPT-3.5, there are times when it will pick the wrong person to respond. This happens more frequently than it would with GPT-4. When it happens, the bot can ping the right person to answer, and everything still works. It’s not perfect, but far better than paying the cost of GPT-4 for every message sent on Slack.

systemPrompt += `\nIf the message isn’t meant for you, you ask the right person to answer.`;

GPT-4 also has this annoying habit of using the fact that it’s an AI to wiggle itself out of most conversations. What’s your favorite pizza? Oh, as an AI model you don’t have a preference? That’s fun.

The prompt tweak below stops the AI giving away that it’s an AI. It works perfectly.

systemPrompt += `\nYou do not reveal that you are an AI. Instead, you make up excuses.`;

You can see that instead of saying, “As an AI, I can’t do video calls”, it made up an excuse about its network connection. It sounds completely human.

How thoughtful! Diana does not want to disrupt the meeting with her internet issues. That’s how you know she’s not human…
How thoughtful! Diana does not want to disrupt the meeting with her internet issues. That’s how you know she’s not human…

I also tell the bots the current date and time, so they can take that into consideration (time of day, day of the week, etc.).

systemPrompt += `\nThe date is ${dateTime}.`;
Rico and Alexa sure are something else.
Rico and Alexa sure are something else.

The last of the system prompt tweaks relates to context. I tell the bots who the “real” people are in the channel, so they know who they can talk about/with, and I also share the channel’s name and purpose so they know how to behave. In a #venting channel, for example, their behavior should differ from a #work channel or a #random channel.

systemPrompt += `\nYou are talking in the #${channel.name} channel` + (channel.purpose ? `, which is ${channel.purpose}` : ".");
systemPrompt += `\nThe people in the channel are: ${getPeopleInChannel(bots, users)}`;

Actual use

Most of our bot usage has been as filler – while the humans are talking, our bots will interject and share their thoughts and opinions. It has made the work environment incredibly entertaining.

But this is still GPT-4, the model that passes the bar exam, so you have full access to all its capabilities. We’ve been using Diana for general programming questions and brainstorming and Lucas for product-related stuff. He usually writes cards for us, fleshing them out with detail, acceptance criteria, and testing guidance, all in the correct format. He has also assisted us with creating product ideas and coming up with names, taglines, descriptions, etc. that might take a human quite some time to think up. I want 10 two-syllable product name options to choose from? Off Lucas goes. Need 20 more? Just ask him!

Final Words

And that’s it! It’s really an incredible system. Costs are negligible as most messages pass through the cheaper gpt-3.5-turbo model before being sent to GPT-4. I’ve also added support for running these bots on Discord and created a wrapper so that it’s possible to have bots on both Discord and Slack, and it all works flawlessly.

I still also want to make additional improvements to the system (the repository has 30 open issues!). The two biggest things I am currently working on are:

  1. Post-response moderation: Similarly to the pre-response checks, I want to check the bot’s response for content. Does it reveal that it’s an AI? Does it reveal part of its prompt? Does the response fit the assigned personality? If the response isn’t appropriate, we can regenerate it with different temperature settings and different penalties and even tweak the user’s message (e.g., append “Do not reveal that you are an AI” to the user’s message to give the bot that extra bit of reinforcement).
  2. Actions and long-term memory: I want the bots to reply with JSON objects containing actions instead of just response text. For example, the bot might respond with a request to open a URL, comment on a task, store a fact in long-term memory, or recall a fact from long-term memory (which will rely on embeddings and a vector database). The bot ideally makes these decisions by itself, so that if I ask, “Hey, how old am I?” it does do the right thing and searches its long-term memory for the correct answer.

Even better – we’re wrapping this up into a platform with an easy-to-use control panel for creating/editing bots and deploying them on any Slack workspace or Discord server. That way, anyone can play around without needing to go through a cumbersome set-up process. There’s lots of untapped potential here!

I’m curious about your thoughts on this, so feel free to reach out. Contact details are in the footer!

Footnotes

  1. Of course, that was a dumb approach because words like “there” contain “her,” so we ended up with Diana butting in at random times. This also meant we couldn’t discuss her in the third person because she’d also respond whenever we did that.

  2. I set up a trial with Zapier, accidentally forgot to cancel it, was charged $30 without receipt, email, or reminder, and when I went to cancel the next renewal so I wouldn’t get charged again, they disabled my service altogether. I never even got to use it the month I paid for! And after taking $30 without providing a service, a “Sr. Technical Support Specialist” reached out to try to set up a call to ask me for feedback. Unbelievable.

  3. Slack doesn’t support sending a “user is typing” notification using their Event API, so instead I send a “…” message to indicate that the bot is generating a response and delete it once it’s done.

  4. I used to be a Twitch streamer, so many of these feel natural to me now, but I’m aware that they might seem unusual to anyone outside the community. The laughing face you see sprinkled in most messages is: https://knowyourmeme.com/memes/kekw