Generated using DALL-E-2 with the prompt "A painting of a chatbot under construction, made from building blocks"

Introduction

In this blog post, we will explore how to implement a minimalist ChatGPT-style app in a Jupyter Notebook or command line. The goal is to provide an understanding of the important concepts, components, and techniques required to create a chat app on top of a large language model (LLM), specifically OpenAI’s GPT. The resulting chat app can serve as a foundation for creating your own customised conversational AI applications.

The code in this blog post can be found in a notebook here. The script for the command line version can be found here.

Chat Completions API Overview

Let us begin with a quick overview of the Chat Completions endpoint of the OpenAI API, which enables you to interact with OpenAI’s large language models to generate text-based responses in a conversational manner. It’s designed for both single-turn tasks and multi-turn conversations.

Example API Call:

The provided code snippet demonstrates how to make an API call for chat completions. In this example, the chat model used is “gpt-3.5-turbo,” and a conversation is created with system, user, and assistant messages:

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who wrote 'A Tale of Two Cities'?"},
        {"role": "assistant", "content": "Charles Dickens wrote 'A Tale of Two Cities'."},
        {"role": "user", "content": "When was it first published?"}
    ]
)

Message Structure:

  • The main input for the API call is the messages parameter, which is an array of message objects.
  • Each message object has two main properties: role (either “system,” “user,” or “assistant”) and content (the text of the message). An optional name property can also be attached to a message.
  • Conversations can be as short as one message or include many back-and-forth turns.

Typical Conversation Format:

A typical conversation format starts with a system message, followed by alternating user and assistant messages. The system message helps set the behavior of the assistant, but it’s optional. If omitted, the model behaves much as it would with a generic message like “You are a helpful assistant.”

Importance of Conversation History:

Including conversation history is crucial when user instructions refer to prior messages. The model has no memory of past requests, so all relevant information must be supplied within the conversation history. If a conversation exceeds the model’s token limit, it needs to be shortened.
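For example, a follow-up question only makes sense if the earlier turns are resent along with it. Here is a minimal sketch (assuming openai is imported and an API key is configured):

messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who wrote 'A Tale of Two Cities'?"}]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
# Keep the assistant's reply in the history so the follow-up can refer back to it
messages.append(response["choices"][0]["message"])
messages.append({"role": "user", "content": "When was it first published?"})
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)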

Interacting with the API

Now let us write a set of functions for communicating with OpenAI’s Chat Completions API. These functions will serve as the backbone of our minimalist chat app. Each function plays a specific role in managing the conversation, formatting messages, and handling responses.

import openai
import json
import os
import sys
# Uncomment and replace with your api key or api key path
# openai.api_key = YOUR_API_KEY
# openai.api_key_path = YOUR_API_KEY_PATH
def get_system_message(system=None):
    """
    Generate a system message for the conversation.

    Args:
        system (str, optional): The system message content. Defaults to None.

    Returns:
        dict: A message object with 'role' set to 'system' and 'content' containing the system message.
    """
    if system is None:
        system = "You are a helpful assistant."
    return {"role": "system", "content": system}
  • get_system_message is responsible for creating a system message. This message is optional but can be used to set the behavior of the assistant. If no system message is provided, it defaults to “You are a helpful assistant.” The function returns a message object with ‘role’ set to ‘system’ and ‘content’ containing the system message.
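For example, calling it without arguments returns the default message object:

get_system_message()
# {'role': 'system', 'content': 'You are a helpful assistant.'}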
def get_response(msg,
                 system_msg=None,
                 msgs=[],
                 model='gpt-4',
                 return_incomplete=False):
    """
    Get a response from the Chat Completions API.

    Args:
        msg (str): The user's message.
        system_msg (dict, optional): The system message object. Defaults to None.
        msgs (list, optional): Previous conversation messages. Defaults to an empty list.
        model (str, optional): The chat model to use. Defaults to 'gpt-4'.
        return_incomplete (bool, optional): Whether to return incomplete responses. Defaults to False.

    Returns:
        list or tuple: A list of response chunks if not returning incomplete, or a tuple containing the list of chunks and a completion status.
    """
    _stream_response = openai.ChatCompletion.create(
        model=model,
        messages=[
            system_msg if system_msg is not None else get_system_message(),
            *msgs,
            {"role": "user", "content": msg}
        ],
        stream=True
    )
    _chunks = []
    
    if return_incomplete:
        complete = False
    
    try:
        for _chunk in _stream_response:
            _delta = _chunk['choices'][0]['delta']
            # The last chunk's delta will be empty
            if 'content' in _delta:
                sys.stdout.write(_delta['content'])
                sys.stdout.flush()  # display streamed tokens immediately

            _chunks.append(_chunk)

        # The stream finished without interruption
        if return_incomplete:
            complete = True

    # Raise KeyboardInterrupt if return_incomplete is False
    except KeyboardInterrupt:
        if not return_incomplete:
            raise
        
    return _chunks if not return_incomplete else (_chunks, complete)
  • get_response is the core function for obtaining a response from the Chat Completions API. It takes the user’s message, an optional system message, previous messages, the model to use, and a flag to indicate whether incomplete responses should be returned.
  • The function first creates an API call with the specified parameters, including the system message (or a default one) and the user’s message. It uses stream=True to stream the response chunks.
  • It then iterates over the streamed response chunks, writing each chunk’s content to stdout as it arrives and storing the full chunks in _chunks.
  • If return_incomplete is set to True, the function returns a result even if the stream is interrupted (e.g. by a KeyboardInterrupt). In this case the function returns a tuple containing the list of chunks and a completion status, which is True only if the full stream was processed. If return_incomplete is False, an interruption is re-raised, so the function returns only the list of chunks and only once the full stream has been processed.
def stream2msg(stream):
    """
    Convert a stream of response chunks into a single message.

    Args:
        stream (list): A list of response chunks.

    Returns:
        str: A single message containing the concatenated content of the response chunks.
    """
    return "".join([i["choices"][0]["delta"].get("content", "") for i in stream])
  • stream2msg is a utility function that converts a stream of response chunks into a single message. It takes a list of response chunks as input and concatenates the content of each chunk to form a complete message.
def format_msgs(inp, ans):
    """
    Format user input and model's response into message objects.

    Args:
        inp (str): User input message.
        ans (str or list): Model's response message as a string or a list of response stream chunks

    Returns:
        list: A list containing user and assistant message objects.
    """
    msg_inp = {"role": "user", "content": inp}
    msg_ans = {"role": "assistant", "content": stream2msg(ans) if not isinstance(ans, str) else ans}
    return [msg_inp, msg_ans]
  • format_msgs takes the user’s input and the model’s response (which can be a message string or a list of response chunks) and creates a list containing message objects for both the user and the assistant, which can subsequently be used in the conversation history.
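Putting these functions together, a single exchange might look like this (a sketch, assuming a configured API key; the response is also streamed to stdout as it arrives):

chunks = get_response("What spices go well with chocolate?")
answer = stream2msg(chunks)  # collapse the streamed chunks into one string
history = format_msgs("What spices go well with chocolate?", answer)
# history can now be passed via msgs on a follow-up call
chunks = get_response("Why does cinnamon go well?", msgs=history)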

Token Counting

Generated using DALL-E-2 with the prompt "Two toucans counting tokens, digital art"

Before we delve into the implementation details, let us briefly discuss token counting. Tokens are chunks of text that language models use to process and generate responses. It’s crucial to keep track of token counts, as they impact the cost and feasibility of using the API. Token counting includes both input and output tokens. This means that not only the messages you send to the model but also the responses you receive contribute to the total token count.

The exact way tokens are counted can vary between different model versions. The function below for counting tokens is adapted from the OpenAI API documentation (as of 05.09.2023). It was written for gpt-3.5-turbo-0613 and serves as a reference. The documentation adds this caveat:

The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate.

Depending on the model, the value returned by the function might not be exact but it will be a decent estimate that suffices for this simple example.
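To get a feel for tokens, you can encode a raw string directly with tiktoken (a small illustration; the exact count depends on the encoding):

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode("What spices and herbs go well with chocolate?")
print(len(tokens))  # the number of tokens this string occupies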

It’s also worth noting that each model has a maximum token limit. Exact details for each model are available in the Models section of the documentation. For example, it is 8192 for gpt-4 and 4097 for gpt-3.5-turbo. In our example, we are using the model’s maximum token limit, but in practice, you may want to use a lower value to ensure that both input and output tokens are within the limit.

import tiktoken

def num_tokens_from_messages(messages, model="gpt-4"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens -= 1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
  • num_tokens_from_messages is a function that takes a list of messages as input and returns the estimated number of tokens used by those messages.

  • It uses the tiktoken library to calculate token counts. The function attempts to get the token encoding for the specified model. If it encounters a KeyError (indicating an unsupported model), it falls back to the “cl100k_base” encoding, which is a reasonable default.

  • The function initializes num_tokens to 0, which will be used to accumulate the token count.

  • For each message in the input list of messages:
    • It adds 4 tokens to account for the message structure, including <im_start>, role or name, content, and <im_end> tags.
    • It then iterates through the message items (e.g., role, content).
    • For each item, it calculates the token count by encoding the value using the specified encoding and adding the length of the encoded value.
    • If the item is the “name,” it subtracts 1 token because the role is always required and always counts as 1 token.
  • Finally, it adds 2 tokens to account for the message primed with <im_start>assistant.
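As a quick sanity check, we can estimate the token count of a short conversation:

msgs = [get_system_message(),
        {"role": "user", "content": "Who wrote 'A Tale of Two Cities'?"}]
print(num_tokens_from_messages(msgs))  # prints the estimated token count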

Truncating Conversation History to Limit Tokens

Generated using DALL-E-2 with the prompt "A lengthy historical manuscript and a pair of scissors"

In the context of managing conversations with language models, it’s crucial to ensure that the conversation history remains within the model’s token limit. To achieve this, we have a function called maybe_truncate_history which helps truncate the conversation history when it approaches or exceeds the maximum token limit.

Here’s an overview of this function and its purpose:

def maybe_truncate_history(msgs, max_tokens, model='gpt-4', includes_input=True):
    """
    Truncate conversation history so that it fits within the token limit.

    Args:
        msgs (list): Conversation messages, possibly starting with a system message.
        max_tokens (int): The maximum number of tokens allowed.
        model (str, optional): The model used for token counting. Defaults to 'gpt-4'.
        includes_input (bool, optional): Whether the last message is the user's input,
            which must not be dropped. Defaults to True.

    Returns:
        tuple: (success flag, token count, truncated list of messages).
    """
    msgs_new = []
    start = 0

    if msgs[0]['role'] == 'system':
        # System messages should not be truncated
        msgs_new.append(msgs[0])
        start = 1
        msgs = msgs[1:]

    if includes_input:
        # At least the last message should be included if input
        msgs_new.append(msgs[-1])
        msgs = msgs[:-1]

    # First ensure that input (and maybe system) messages don't exceed token limit
    tkns = num_tokens_from_messages(msgs_new, model=model)

    if tkns > max_tokens:
        return False, tkns, []

    # Then retain the latest messages that fit within the token limit, inserting each
    # one after the system message (if any) and before the more recent messages
    for msg in msgs[::-1]:
        msgs_tmp = msgs_new[:start] + [msg] + msgs_new[start:]
        tkns_tmp = num_tokens_from_messages(msgs_tmp, model=model)
        if tkns_tmp <= max_tokens:
            msgs_new = msgs_tmp
            tkns = tkns_tmp
        else:
            break

    return True, tkns, msgs_new
  • maybe_truncate_history is designed to keep the conversation history within the token limit of the model. It takes as input the current list of messages (msgs), the maximum token limit (max_tokens), and the model name (defaulting to gpt-4). There is also a flag, includes_input, indicating whether the user’s input is present in the messages, to ensure it is not dropped.

  • If the first message in the conversation history is a system message, it is added to msgs_new, and start is set to 1. This step is necessary because system messages should not be truncated.

  • If includes_input is True, the last message (usually the user’s input) is added to msgs_new, and it’s removed from the msgs list.

  • The function first checks if the token count of the messages in msgs_new exceeds the max_tokens. If it does, it returns False, the token count (tkns), and an empty list to indicate that the conversation history cannot be accommodated within the token limit.

  • Next, the function attempts to retain the latest messages that fit within the token limit. It iterates through the msgs list in reverse order, gradually adding messages to msgs_tmp. If the token count of msgs_tmp is within the max_tokens, it updates msgs_new with msgs_tmp. This ensures that the conversation history retains as much context as possible while staying within the token limit.

  • The function returns True to indicate that the conversation history has been successfully truncated to fit within the token limit. It also returns the updated token count (tkns) and the modified msgs_new.
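Here is a sketch of how the function behaves with an artificially small token limit, which typically forces the earliest history messages to be dropped:

msgs = [get_system_message(),
        {"role": "user", "content": "What spices go well with chocolate?"},
        {"role": "assistant", "content": "Cinnamon, nutmeg, chili powder."},
        {"role": "user", "content": "Why does cinnamon go well?"}]
valid, tkns, trimmed = maybe_truncate_history(msgs, max_tokens=40)
# valid: True if the system and input messages alone fit within the limit
# trimmed: the system message, the input, and as much recent history as fits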

This is a simple approach to manage token counts which drops entire messages to keep to the token limit but there are more sophisticated approaches that you could try such as summarising or filtering earlier parts of the conversation.
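For instance, one could condense the dropped messages into a single summary message before continuing. Below is an illustrative sketch, not part of MinChatGPT; summarise_history is a hypothetical helper:

def summarise_history(dropped_msgs, model='gpt-3.5-turbo'):
    # Condense dropped messages into one summary message (illustrative only)
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in dropped_msgs)
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Summarise this conversation in a few sentences:\n{transcript}"}])
    summary = resp["choices"][0]["message"]["content"]
    return {"role": "system", "content": "Summary of earlier conversation: " + summary}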

MinChatGPT

Generated using DALL-E-2 with the prompt "A partially constructed mini lego robot surrounded by lego bricks"

Finally, we are in a position to implement our minimalist chat app. The MinChatGPT class defines its foundation. First, let us set up the class and add some helper functions; subsequently, we will implement the conversation functionality.

class MinChatGPT(object):
    """
    A simplified ChatGPT chatbot implementation. 
    
    Parameters:
        system: The content of the system message (optional).
        model: The OpenAI model to use; restricted to 'gpt-3.5-turbo' and 'gpt-4'.
        log: Boolean that decides if logging is required or not.
        logfile: The location of the file where chat logs will be stored.
        errfile: The location of the file where error logs will be stored.
        include_incomplete: Boolean that decides if incomplete responses are to be included in the history or not.
        num_retries: The number of times to retry if there is a connection error.
        mock: Boolean that decides if the system is in testing mode.
        debug: Boolean that decides if the system should go into debug mode.
        max_tokens: Maximum number of tokens the model can handle while generating responses.
    """
    def __init__(self,
            system=None, 
            model='gpt-4',
            log=True, 
            logfile='./chatgpt.log',
            errfile='./chatgpt.error', 
            include_incomplete=True, # whether to include incomplete responses in history 
            num_retries=3,
            mock=False,
            debug=False,
                 
            max_tokens=None):
        """
        Initializes a MinChatGPT instance with provided parameters.
        """
        # For simplicity restrict to these two
        assert model in ['gpt-3.5-turbo', 'gpt-4'] # Ensures the model parameter is valid
        
        # System & GPT Model related parameters
        self.system = system
        self.system_msg = get_system_message(system)  # Build the system message (default if None)
        self.model = model
        
        # logging related parameters
        self.log = log
        self.logfile = logfile
        self.errfile = errfile
        
        # Behavioural flags
        self.include_incomplete = include_incomplete
        self.num_retries = num_retries
        self.mock = mock
        self.debug = debug
        
        # History and error storage related parameters
        self.history = []
        self.history_size = []
        self.errors = []
        
        # Setting maximum tokens model can handle, defaults are provided for the two specified models
        self.max_tokens = {'gpt-4': 8192, 'gpt-3.5-turbo': 4097}[model] if max_tokens is None else max_tokens
        
    def _logerr(self):
        with open(self.errfile, 'w') as f:
            f.write('\n'.join(self.errors))
                
    def _logchat(self):
        with open(self.logfile, 'w') as f:
            json.dump(fp=f, obj={'history': self.history, 'history_size': self.history_size}, indent=4)
            
    def _chatgpt_response(self, msg="", newline=True):
        sys.stdout.write(f'\nMinChatGPT: {msg}' + ('\n' if newline else ''))
  • The __init__ method initializes our chatbot instance with parameters, such as the system message, the OpenAI model to be used, and several behavioural flags for logging, debugging, or testing (mock).
  • It also initializes history-related containers, such as self.history, self.history_size, and self.errors, for tracking the chat history and potential errors.
  • The max_tokens parameter sets the limit for tokens the model can handle, defaulting to the restrictions of the chosen model.
  • We have two logging functions, _logerr and _logchat, saving error logs and chat logs respectively to specified locations.
  • We also have a helper function _chatgpt_response for printing the bot’s response to the console.

However, we have not yet implemented the main functionality of the chatbot, which is to manage the conversation. Let us go ahead and implement a chat method that enables the user to interact with the model.

Implementing the conversation functionality

A mini robot made of building blocks with a speech bubble from its mouth
Generated using DALL-E-2 with the prompt "A mini robot made of building blocks with a speech bubble from its mouth"

The chat method is the main entry point for initiating a conversation with the MinChatGPT chatbot. It manages user interaction, input processing, handling special cases like an ‘exit’ command or an empty message, generating responses, and logging information if desired.

Here is a detailed walkthrough of the chat method:

def chat(self):
    """
    Initiates a chat session with the user. During the chat session, the chatbot will receive user message inputs, 
    process them and generate appropriate responses.
    
    The chat session will continue indefinitely until the user enters a termination command like "Bye", "Goodbye", "exit", 
    or "quit". The function also logs the chat session, and any errors that occur during the session.
    """
    
    # Maybe_exit flag for controlling the exit prompt
    maybe_exit = False

    # Welcome message for user
    print('Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.')
    

User Interaction

The core of the chat method is an infinite while loop that simulates a conversation. The user is prompted for an input message, which is then handled within the loop. To allow the user to end the conversation at any point, the code checks for exit phrases such as “bye”, “goodbye”, “exit”, or “quit”.


    # Main chat loop
    while True:
        # Capture user input
        inp = input('\nYour message: ')
        

Handling Empty Messages

If the user input is an empty string, the method reminds the user to enter a message and goes back to the start of the loop to ask again.

        
        
        try:
            # Handling empty input from user
            if len(inp) == 0:
                print('Please enter a message')
                continue
                

Exiting

If the previous input appeared to indicate an intention to exit (maybe_exit == True), the user is asked for confirmation. If the user gives an affirmative response, the bot replies with a goodbye and breaks the loop to end the conversation. If the user does not want to exit, the bot continues to chat.


            # Case insensitive user input
            stripped_lowered_inp = inp.strip().lower()

            # Handling user's confirmation on exit
            if maybe_exit:
                if  stripped_lowered_inp in ['y', 'yes', 'yes!', 'yes.']:
                    self._chatgpt_response('Goodbye!')
                    break
                else:
                    self._chatgpt_response("Ok. Let's keep chatting.")
                    maybe_exit = False
                    continue
                    

Intention to exit

This simple approach checks whether the user input matches any of the exit signals. If it does, the maybe_exit flag is set to True, and in the next interaction the user is asked for confirmation. You could also try more sophisticated approaches that get the model to infer whether the user wishes to end the conversation.

            
            # Checking if user wants to exit
            if stripped_lowered_inp in [
                'exit', 'exit()', 'exit!', 'exit.',
                'quit', 'quit()', 'quit!', 'quit.',
                'bye', 'bye!', 'bye.',
                'goodbye', 'goodbye!', 'goodbye.'
            ]:
                maybe_exit = True
                self._chatgpt_response('Are you sure you want to quit? Enter Yes or No.')
                continue
                

Process User Inputs

The code next deals with non-empty, non-exit user inputs. It prepares the message history to be sent to the OpenAI model by appending the user’s new message. The history is then checked to ensure it doesn’t exceed the max token limit of the model. If the history is too long, we inform the user, don’t produce a response, and again loop to the start for a new input.


            # Preparing message history before calling the model
            msgs = [self.system_msg, *self.history, {'role': 'user', 'content': inp}]

            # Call to helper function to check that conversation history does not exceed max tokens
            valid, tkns, trimmed = maybe_truncate_history(msgs, max_tokens=self.max_tokens, model=self.model)
            

Generate Response and Update History

If the length of the input is within limits, then the bot produces a response. If the system is in mock mode, it just returns a test message. Otherwise, an actual response is generated and streamed to the user. If there is a connection error while getting a response from the API, the call is retried up to num_retries times. Incomplete messages are handled according to the include_incomplete flag, which determines whether or not to add incomplete responses to the history. The code also records the length of the history used for this response generation.


            # Handling valid and invalid token scenarios
            if valid:

                # Strip the system and input messages; get_response adds them back
                [_, *msgs_to_send, _] = trimmed

                # Inform user if history was truncated
                if len(trimmed) < len(msgs):
                    print(f'\nDropping earliest {len(msgs) - len(trimmed)} messages from history to keep within token limits')
                
                num_api_calls = 0
                
                if self.mock:
                    # For testing response functionality
                    msg = 'Test message'
                    self._chatgpt_response(msg)
                else:    
                    # Generate response from model
                    self._chatgpt_response(newline=False)
                    while True:
                        try:
                            msg, complete = get_response(inp, system_msg=self.system_msg, msgs=msgs_to_send, model=self.model, return_incomplete=True)
                            break
                        except ConnectionResetError:
                            if num_api_calls < self.num_retries:
                                num_api_calls += 1
                            else:
                                raise
                    
                    # Skip to next if incomplete messages not included in history
                    if not complete and not self.include_incomplete:
                        continue
            else:
                # If message exceeds token limit, ask user to reduce message length
                print(f'\nTotal of {tkns} tokens exceeds the maximum number of tokens allowed. Please try again after reducing message length.')
                continue

            # Keeping track of history size   
            self.history_size.append(len(msgs_to_send))
            

Logging and Debugging

Log details are printed if the system is in debug mode. Then, a new pair of messages is created from the user input and the generated response and added to the message history. If the log flag is set, the chat history is saved to the log file.


            # Debug information provided for development and troubleshooting
            if self.debug:
                print(f"\n\nLast {self.history_size[-1]} message(s) used as history / Num tokens sent: {tkns} / Num retries: {num_api_calls + 1}")
                print("Messages sent:")
                print("="*100)
                for i in trimmed:
                    print(f'{i["role"]}: {i["content"]}')
                print("="*100)

            # Adding user and system response to chat history
            self.history.extend(format_msgs(inp, msg))

            # Saving chat history if logging is True  
            if self.log:
                self._logchat()
                

Handling Errors

Any exceptions that occur during the above process are caught, added to the bot’s error log, and displayed to the user, who is then invited to try again.


        except Exception as e:  # Exception handling for unexpected inputs or system errors
            self.errors.append(str(e))
            # Logging error details if logging is True
            if self.log:
                self._logerr()

            print(f'\nThere was the following error:\n\n{e}.\n\nPlease try again.')
            continue
            

Finally, we attach this function as a method of the MinChatGPT class:

MinChatGPT.chat = chat

Demo

Let us now take a look at a simple demo in debug mode to see what input is given to the model each time. We can also see how it behaves when given an empty input, how it handles exit signals and what happens when you interrupt it mid-message.

minchat = MinChatGPT(log=True, debug=True)
minchat.chat()
Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.

Your message: 
Please enter a message

Your message: Bye

MinChatGPT: Are you sure you want to quit? Enter Yes or No.

Your message: No

MinChatGPT: Ok. Let's keep chatting.

Your message: What spices and herbs go well with chocolate? Answer as a comma separated list.

MinChatGPT: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.

Last 0 message(s) used as history / Num tokens sent: 34
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
====================================================================================================

Your message: Why does cinnamon go well?

MinChatGPT: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.

Last 2 message(s) used as history / Num tokens sent: 87
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
====================================================================================================

Your message: Can you give some examples of these dishes?

MinChatGPT: Certainly, here are some examples of chocolate dishes where cinnamon can shine:

1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.

2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.

3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.

4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors

Last 4 message(s) used as history / Num tokens sent: 169
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
====================================================================================================

Your message: Ok got the idea. 

MinChatGPT: Great! If you have any other questions or need further information, feel free to ask. Enjoy your culinary adventures with chocolate and cinnamon!

Last 6 message(s) used as history / Num tokens sent: 294
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
assistant: Certainly, here are some examples of chocolate dishes where cinnamon can shine:

1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.

2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.

3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.

4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors
user: Ok got the idea. 
====================================================================================================

Your message: Goodbye!

MinChatGPT: Are you sure you want to quit? Enter Yes or No.

Your message: Yes

MinChatGPT: Goodbye!

Command Line Interface

To run this as a command line application, copy all the code from this notebook into a Python file called minchatgpt.py. Then add the following code to the end of the file:

if __name__ == '__main__':
    import argparse
    import os
    
    # Get key from environment instead of assigning
    openai.api_key = os.environ.get("API_KEY")
    # alternatively
    # openai.api_key_path = os.environ.get("API_KEY_PATH")

    # Define a function to parse boolean arguments
    def bool_arg(s):
        if s.lower() in ['true', 't', 'yes', 'y', '1']:
            return True
        elif s.lower() in ['false', 'f', 'no', 'n', '0']:
            return False
        else:
            raise ValueError('Boolean value expected.')

    parser = argparse.ArgumentParser(
        description='MinChatGPT: A minimalist chat app based on OpenAI\'s GPT model')
    parser.add_argument(
        '--debug', help='Run in debug mode', type=bool_arg, default=False)
    parser.add_argument(
        '--mock', help='Run in mock mode', type=bool_arg, default=False)
    parser.add_argument(
        '--log', help='Log chat history', type=bool_arg, default=True)
    parser.add_argument(
        '--logfile', type=str, default='./chatgpt.log', help='Location of chat history log file')
    parser.add_argument(
        '--errfile', type=str, default='./chatgpt.error', help='Location of error log file')
    parser.add_argument(
        '--model', type=str, default='gpt-4', help='OpenAI model to use')
    parser.add_argument(
        '--include_incomplete', type=bool_arg, default=True,
        help='Include incomplete responses in history') 
    parser.add_argument(
        '--num_retries', type=int, default=3, 
        help='Number of times to retry if there is a connection error')
    parser.add_argument(
        '--max_tokens', type=int, default=None, 
        help='Maximum number of tokens the model can handle while generating responses')

    args = parser.parse_args()
    kwargs = vars(args)

    minchat = MinChatGPT(**kwargs)
    minchat.chat()

To run the application, assign your API key to the API_KEY environment variable (or to the API_KEY_PATH environment variable). Then run python minchatgpt.py with arguments as required. For example, to run in debug mode with logging enabled, you can use the following command:

export API_KEY=YOUR_API_KEY; python minchatgpt.py --debug True --log True

Limitations

The goal of MinChatGPT is to demonstrate how a chat app can be implemented on top of a conversational language model. Whilst it serves as a useful starting point for engaging with LLMs, it has several limitations at this stage, including:

  • Lack of Input Moderation: MinChatGPT doesn’t filter or restrict the type of content that users can input. This can potentially lead to inappropriate or offensive messages that might violate the API’s rules.

  • Inability to Resume Chats or Start New Ones: The app does not provide features for resuming previous conversations from saved history or starting entirely new chats. Users are limited to a single, continuous conversation session. However it would be fairly straightforward to incorporate these features.

  • Limited Testing of Chat Logic: MinChatGPT’s chat logic has not been comprehensively tested with a wide range of input combinations. As a result, there may be scenarios where the chat logic behaves unexpectedly, encounters errors or does not properly handle errors.

Conclusion

In this blog post, we explored the building blocks for creating a minimalist chat-style application based on OpenAI’s GPT model within a Jupyter notebook (or command line). We discussed API interaction, token counting, conversation history truncation, and building a chat interface. You can use MinChatGPT as a starting point for building more complex and sophisticated applications. You can also modify it to make it compatible with other LLMs. I encourage you to experiment by adding features, making it more robust, extending its capabilities and adapting it to suit your requirements.