In this blog post, we will show how to use the diffusers library to upscale images using the Stable Diffusion Upscaler model. The model is a diffusion-based super-resolution model capable of generating high-quality upscaled images.
The diffusers library provides a simple and easy-to-use interface for working with the Stable Diffusion Upscaler model. This blog post assumes you have installed the diffusers library and have access to a GPU. If you haven't installed the library yet, follow the installation instructions in the official documentation. You can also take a look at my blog post at /2023/09/16/How-I-Ran-Textual-Inversion.html, which covers how to set up the library on an AWS EC2 instance.
Let us get started with imports and setting up a pipeline to do the upscaling. The pipeline will take care of loading the models and weights and provide a simple interface that takes as input an image and returns the upscaled image.
from PIL import Image
import numpy as np
from diffusers import StableDiffusionUpscalePipeline
import torch
# load model and scheduler
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_id, revision="fp16", torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
/home/ubuntu/fusion/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py:269: FutureWarning: You are loading the variant fp16 from stabilityai/stable-diffusion-x4-upscaler via `revision='fp16'` even though you can load it via `variant=`fp16`. Loading model variants via `revision='fp16'` is deprecated and will be removed in diffusers v1. Please use `variant='fp16'` instead.
warnings.warn(
text_encoder/model.safetensors not found
Loading pipeline components...: 100%|██████████| 6/6 [00:01<00:00, 5.41it/s]
/home/ubuntu/fusion/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py:130: FutureWarning: The configuration file of the vae does not contain `scaling_factor` or it is set to 0.18215, which seems highly unlikely. If your checkpoint is a fine-tuned version of `stabilityai/stable-diffusion-x4-upscaler` you should change 'scaling_factor' to 0.08333 Please make sure to update the config accordingly, as not doing so might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull Request for the `vae/config.json` file
deprecate("wrong scaling_factor", "1.0.0", deprecation_message, standard_warn=False)
Let us download an image of a sunflower head and use it as an example for super-resolution. The image contains a lot of texture and detail, which makes it a good candidate to demonstrate the capabilities of the Stable Diffusion model for super-resolution.
!wget https://upload.wikimedia.org/wikipedia/commons/4/44/Helianthus_whorl.jpg
--2024-02-01 23:40:22-- https://upload.wikimedia.org/wikipedia/commons/4/44/Helianthus_whorl.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 198.35.26.112, 2620:0:863:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|198.35.26.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 327296 (320K) [image/jpeg]
Saving to: ‘Helianthus_whorl.jpg’
Helianthus_whorl.jp 100%[===================>] 319.62K --.-KB/s in 0.02s
2024-02-01 23:40:22 (19.4 MB/s) - ‘Helianthus_whorl.jpg’ saved [327296/327296]
img = Image.open('Helianthus_whorl.jpg')
img.size
(640, 480)
The model upscales to 4x the initial size (hence stable-diffusion-x4-upscaler), so let us downscale the image to a quarter of the original dimensions and then rescale it using the model.
new_size = tuple(np.floor_divide(img.size, 4).astype('int'))
low_res_img = img.resize(new_size)
low_res_img.size
(160, 120)
We can see that the downscaled image is quite blurry, with much of the textural detail lost.
import matplotlib.pyplot as plt
import numpy as np
# List of image arrays
image_arrays = [img, low_res_img]
titles = ["Original Image", "Downscaled Image" ]
# Create subplots
fig, axes = plt.subplots(1, len(image_arrays), figsize=(15, 5))
# Iterate over image arrays and titles
for i, (img_array, title) in enumerate(zip(image_arrays, titles)):
    axes[i].imshow(img_array)
    axes[i].set_title(title)
    axes[i].axis('off')
# Adjust layout and display the plot
plt.tight_layout()
Image credit: L. Shyamal, CC BY-SA 2.5 https://creativecommons.org/licenses/by-sa/2.5, via Wikimedia Commons
Upscaling using the super-resolution model is a simple matter of calling pipeline with the prompt and image. There are other settings that can be adjusted, such as the number of iterations and the number of images to generate; refer to the documentation for more details. Here we will use the default settings of a single image and 75 iterations.
result = pipeline(prompt='Sunflower head displaying the floret arrangement',
                  image=low_res_img)
upscaled_image = result.images[0]
100%|██████████| 75/75 [00:13<00:00, 5.42it/s]
Since this model increases the size of the image by 4x and the input was the original image downscaled by 4x, the super-resolved image should be the same size as the original image.
upscaled_image.size
(640, 480)
Below we plot the downscaled input image, the downscaled image resized using bicubic interpolation, the super-resolution model output and the original image.
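As an aside, the plain resize in the comparison counts as bicubic because PIL's resize uses bicubic resampling by default; the sketch below (on a synthetic image, so it runs without the downloaded photo) makes the filter explicit:

```python
from PIL import Image

# Synthetic stand-in for low_res_img, at the same 160x120 size
tiny = Image.new("RGB", (160, 120), color=(255, 200, 0))

# Upscale 4x using bicubic interpolation, stated explicitly
bicubic = tiny.resize((640, 480), resample=Image.BICUBIC)
```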
import matplotlib.pyplot as plt
# List of image arrays
image_arrays = [low_res_img, low_res_img.resize(img.size), upscaled_image, img]
titles = ["Downscaled", "Bicubic interpolation", "Super-resolution", "Original"]
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
axes = axes.ravel()
# Iterate over image arrays and titles
for i, (img_array, title) in enumerate(zip(image_arrays, titles)):
    axes[i].imshow(img_array)
    axes[i].set_title(title, fontsize=16)
    axes[i].axis('off')
# Adjust layout and display the plot
plt.tight_layout()
Image credit: L. Shyamal, CC BY-SA 2.5 https://creativecommons.org/licenses/by-sa/2.5, via Wikimedia Commons
It can be seen that the super-resolved image is superior to the one rescaled using bicubic interpolation and restores a lot of the textural detail that was lost during downscaling. However, it is not perfect. For example, the innermost florets have the same shape as the outer ones, whereas in the original image they are pointy and directed inwards. This area in the downscaled image is quite blurred, and the model fails to recover the details of the original image.
In this simple example, we have only touched on the basics of diffusion-based super-resolution and the capabilities of the Stable Diffusion Upscaler model in order to get you started. I encourage you to explore the various settings and options that can be adjusted to obtain different and possibly better results.
This blog post will guide you through the steps of creating matrices in LaTeX. It will start with the general syntax and then explain how to create row and column vectors, determinants, arbitrary sized matrices and nested matrices. It will conclude with several examples of real world matrices and the use of matrices in mathematical expressions.
In LaTeX, matrices are created using the “bmatrix” environment. The general syntax for a matrix is as follows:
\begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i \\
\end{bmatrix}
Replace the placeholders (a, b, c, etc.) with your desired matrix elements. The ampersand (&) is used to separate columns, and the double backslash (\\) indicates the end of a row.
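Note that the matrix environments used in this post are provided by the amsmath package and must appear in math mode. A minimal complete document looks like this:

```latex
\documentclass{article}
\usepackage{amsmath} % provides bmatrix, pmatrix, vmatrix, etc.
\begin{document}
\[
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\]
\end{document}
```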
Both column and row vectors are essentially specialized forms of matrices, and you can use the “bmatrix” environment to represent them.
To create a column vector, you can use the “bmatrix” environment with a single column:
\begin{bmatrix}
a \\
b \\
c \\
\end{bmatrix}
Similarly, a row vector is a matrix with a single row:
\begin{bmatrix}
a & b & c
\end{bmatrix}
LaTeX supports various bracket types for matrices.
To represent matrices with parentheses $(\cdot)$ you can use the “pmatrix” environment:
\begin{pmatrix}
a & b \\
c & d \\
\end{pmatrix}
To represent determinants, you can use the “vmatrix” or “Vmatrix” environments:
vmatrix: Vertical bars $\vert \cdot \vert$ as brackets
\begin{vmatrix}
a & b \\
c & d \\
\end{vmatrix}
Vmatrix: Double vertical $\Vert \cdot \Vert$ bars as brackets
\begin{Vmatrix}
a & b \\
c & d \\
\end{Vmatrix}
To structure data in a matrix format without any brackets, you can use the “matrix” environment:
\begin{matrix}
a & b \\
c & d \\
\end{matrix}
Sometimes you want to represent a range of elements in a matrix without explicitly listing them all, for example when you have an arbitrary $m \times n$ matrix and want to show only a few representative elements.
For this, you can use ellipses. There are different commands for horizontal, vertical and diagonal ellipses.
The \dots (or \ldots) command can be used to represent a range of skipped elements in a row. For example, here is a row vector showing only the first two and the last of $n$ elements:
\begin{bmatrix}
a_1 & a_2 & \dots & a_n
\end{bmatrix}
The \vdots command can be used to represent a range of skipped elements in a column. For example, here is a column vector showing only the first two and the last of $n$ elements:
\begin{bmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n
\end{bmatrix}
The \ddots command, often used in combination with \dots and \vdots, can be used to skip columns and rows simultaneously.
\begin{bmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn} \\
\end{bmatrix}
Matrix elements can be arbitrary LaTeX expressions and the matrix will automatically adjust to accommodate the size of the expression.
R = \begin{bmatrix}
\cos \theta &-\sin \theta \\
\sin \theta &\cos \theta
\end{bmatrix}
\nabla \times \mathbf {F} =
\begin{vmatrix}
\boldsymbol {\hat {\imath }} & \boldsymbol {\hat {\jmath }} & \boldsymbol {\hat {k}} \\
\frac {\partial }{\partial x} & \frac {\partial }{\partial y} & \frac {\partial }{\partial z} \\
F_{x} & F_{y} & F_{z}
\end{vmatrix}
W = \frac{1}{\sqrt{N}} \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{N-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(N-1)} \\
1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{N-1} & \omega^{2(N-1)} & \omega^{3(N-1)} & \cdots & \omega^{(N-1)(N-1)}
\end{bmatrix}
where $\omega = e^{-2\pi i/N}$
You can also nest matrices within matrices. Here’s an example of a block diagonal matrix that represents an important quantum computing logic gate called the CNOT gate:
\begin{bmatrix}
I_2 & 0 \\
0 & X
\end{bmatrix}
= \begin{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} & 0 \\
0 & \begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
\end{bmatrix}
where $I_2$ is the $2 \times 2$ identity matrix and $X$ is the Pauli-X matrix
You can nest matrices to any depth as needed. For example, the matrix for another quantum logic gate, the Toffoli gate is a block diagonal matrix with a nested matrix in the bottom right corner:
\begin{bmatrix}
I_4 & 0 \\
0 & CNOT
\end{bmatrix}
= \begin{bmatrix}
I_4 & 0 \\
0 & \begin{bmatrix}
I_2 & 0 \\
0 & X
\end{bmatrix}
\end{bmatrix}
= \begin{bmatrix}
I_4 & 0 \\
0 & \begin{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} & 0 \\
0 & \begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
\end{bmatrix}
\end{bmatrix}
where $I_4$ is the $4 \times 4$ identity matrix.
Matrices can be incorporated into mathematical expressions just like any other variable. For example, here is the rotation of an arbitrary vector $\mathbf{v}$ by an angle $\theta$:
R\mathbf {v} =
\begin{bmatrix}
\cos \theta &-\sin \theta \\
\sin \theta &\cos \theta
\end{bmatrix}
\begin{bmatrix}
x\\y
\end{bmatrix}=
\begin{bmatrix}
x\cos \theta -y\sin \theta
\\x\sin \theta +y\cos \theta
\end{bmatrix}
Here is another example of the exponential of a diagonal matrix that demonstrates how matrices in LaTeX can be seamlessly integrated into a variety of mathematical expressions and how they play nicely with other LaTeX capabilities like superscripts and brackets.
\begin{aligned}
e^{\operatorname{diag}\left(
\begin{bmatrix}
a_1 \\
a_2
\end{bmatrix}
\right)} &= \exp\left(\begin{bmatrix}
a_1 & 0 \\
0 & a_2
\end{bmatrix}\right)
\\&= \sum_{i=0}^\infty \frac{1}{i!} \begin{bmatrix}
a_1 & 0 \\
0 & a_2
\end{bmatrix}^i
\\&= \sum_{i=0}^\infty \frac{1}{i!} \begin{bmatrix}
a_1^i & 0 \\
0 & a_2^i
\end{bmatrix}
\\&= \begin{bmatrix}
e^{a_1} & 0 \\
0 & e^{a_2}
\end{bmatrix}
\end{aligned}
In this blog post we covered the key features of representing matrices in LaTeX. You should now be able to create a variety of matrices and incorporate them in your LaTeX code.
Quantum computing is a rapidly growing field that leverages the principles of quantum mechanics to process information. One of the pillars of quantum computing is the quantum circuit, a model for quantum computation in which a computation is represented by a sequence of quantum gates. In this blog we will learn how to create quantum gate and quantum circuit diagrams in Python using the SymPy library.
SymPy is a Python library for symbolic mathematics. It includes a module for quantum computing called sympy.physics.quantum
which we will use to create quantum gates and circuits and to generate their diagrams. We will start from simple single qubit gates to more complex two qubit gates and finally learn to create custom gates and quantum circuits.
This tutorial assumes that you have basic knowledge of quantum computing and quantum gates. I am planning on writing a blog on quantum gates soon, but until then I recommend this Wikipedia article as a good reference for quantum gates.
Let’s start by creating an X gate, also known as a NOT gate. The X gate acts on a single qubit. It is represented by the following matrix:
\[X = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}\]
SymPy has a number of commonly used quantum gates already defined. To instantiate a gate, you need to specify the qubit or qubits it acts on.
Key qubit numbering conventions in SymPy:
- Qubit indices are zero-based, so the first qubit has index 0, the second index 1, and so on.
- In circuit diagrams, wires are numbered from top to bottom: the topmost wire corresponds to the first index.
Let’s now create an X gate acting on the first qubit.
from sympy.physics.quantum.gate import X
gate = X(0)
In order to plot our quantum gate, we'll use the circuit_plot function from the circuitplot module (imported below as plot). The circuit_plot function takes two arguments, the gate and the number of qubits in the circuit. Let's plot the X gate we created above.
import sympy.physics.quantum.circuitplot as plot
plot.circuit_plot(gate, 1);
Saving the generated quantum circuit figure is straightforward: you can use the savefig function from the matplotlib package. Let's save the figure we generated above.
import matplotlib.pyplot as plt
plt.savefig("X_gate.png");
<Figure size 432x288 with 0 Axes>
The Hadamard gate or $H$ gate is a one-qubit gate which performs a Hadamard transform on the given qubit.
It is represented by the following matrix:
\[H = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 \\ 1 & -1\end{bmatrix}\]
The steps to create a Hadamard gate are the same as the X gate. Let’s create a Hadamard gate acting on the first qubit and plot it.
from sympy.physics.quantum.gate import H
H_gate = H(0)
plot.circuit_plot(H_gate, 1);
Controlled NOT gate, or CNOT, is a two-qubit gate which takes two inputs, a control and a target qubit and applies a NOT to the target only when the control is $\left\vert 1\right>$. We use the notation CNOT_{xy} to denote a gate where the qubit with index $x$ is the control and qubit with index $y$ is the target.
The matrix representation of CNOT_{21} i.e. CNOT with the second qubit as the control and the first qubit as the target is:
\[\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\]
I should note that you may find other sources that represent the CNOT gate with the target qubit as the first qubit and the control qubit as the second qubit, in which case the above would be the matrix representation of CNOT_{12}. However, I have tried to use notation that is consistent with the way the gates are instantiated in SymPy.
The CNOT class in SymPy takes two arguments, the control qubit index and the target qubit index, in that order. Let us first construct and plot a CNOT_{21} gate.
from sympy.physics.quantum.gate import CNOT
CNOT_21 = CNOT(1, 0) # Note the zero based indexing so that 2->1, 1->0
plot.circuit_plot(CNOT_21, 2);
Note that in the diagram the qubits are numbered from top to bottom. The topmost wire corresponds to the first index, the second wire to the second index and so on. Now let's plot a CNOT_{12} gate, where the control and the target are reversed.
CNOT_12 = CNOT(0, 1)
plot.circuit_plot(CNOT_12, 2);
If you look at the matrix for $X$ you can see that it flips qubits turning $\left\vert 0\right>$ to $\left\vert 1\right>$ and $\left\vert 1\right>$ to $\left\vert 0\right>$. (Recall that the vector representations of the qubits are $\left\vert 0\right> = [1, 0]^T$ and $\left\vert 1\right> = [0, 1]^T$).
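We can check this flipping behaviour directly in SymPy by applying the gate to basis states with qapply (a small sketch; qapply and Qubit live in sympy.physics.quantum):

```python
from sympy.physics.quantum.gate import X
from sympy.physics.quantum.qapply import qapply
from sympy.physics.quantum.qubit import Qubit

# Applying X to |0> yields |1>, and vice versa
flipped_zero = qapply(X(0) * Qubit('0'))
flipped_one = qapply(X(0) * Qubit('1'))
```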
The gate is often represented using the symbol $\oplus$ that's used in the CNOT gate. X has a built-in function plot_gate_plus that can plot the gate using this symbol, but it is not used by circuit_plot. Let us subclass X to create XNOT, where we redefine the plot_gate function as an alias for the plot_gate_plus function, maintaining identical functionality otherwise, so that circuit_plot will now plot X using the desired symbol.
class XNOT(X):
    plot_gate = X.plot_gate_plus

plot.circuit_plot(XNOT(0), 1);
You can use the UGate class to create a custom quantum gate. As an example, we'll create a rotation gate. The rotation of a qubit by an angle $\theta$ around the Y-axis of the Bloch sphere is represented by the following matrix:
\[R_Y(\theta) = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, Y\]
Here, $R_Y(\theta)$ represents the Y-axis rotation gate, $I$ is the identity matrix, and $Y$ is the Pauli $Y$ matrix, which is defined as:
\[Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\]
To create a Y rotation gate from the UGate class, you need to instantiate the class with two arguments: the target qubit and the matrix representation of the gate.
Let us write a simple function to create a Y rotation gate.
from sympy.physics.quantum.gate import UGate
import sympy as sy
def R_Y(qubit, theta):
    Y_sy = sy.Matrix([[0, -sy.I], [sy.I, 0]])
    gate = UGate(qubit, sy.eye(2)*sy.cos(theta/2) - sy.I * Y_sy * sy.sin(theta/2))
    return gate
Let us now define an $R_Y\left(\frac{\pi}{4}\right)$ gate.
R_Y_piby4 = R_Y(0, sy.pi/4)
If you now try to plot the gate, this is what it looks like:
plot.circuit_plot(R_Y_piby4, 1);
As you can see, this is labeled as a generic $U$ gate. To customize the label, you can modify the gate_name_latex attribute of the gate. Let's update the function accordingly.
def R_Y(qubit, theta):
    Y_sy = sy.Matrix([[0, -sy.I], [sy.I, 0]])
    gate = UGate(qubit, sy.eye(2)*sy.cos(theta/2) - sy.I * Y_sy * sy.sin(theta/2))
    name = f'R_Y\\left({sy.latex(theta)}\\right)'
    gate.gate_name_latex = name
    gate.gate_name = name
    return gate
Now, if you create and plot the gate again, you’ll see it is correctly labeled as $R_Y\left(\frac{\pi}{4}\right)$.
R_Y_piby4 = R_Y(0, sy.pi/4)
plot.circuit_plot(R_Y_piby4, 1);
We can also construct custom multi-qubit gates. As an example, let’s create a Toffoli gate which takes three inputs, two of which are controls whilst the other is the target. The Toffoli gate applies a NOT to target only when both controls are $\left\vert 1\right>$. It is effectively a controlled-CNOT gate.
The matrix representation of the Toffoli gate is:
\[\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}\]
To construct the Toffoli gate we can use the CGate class. The CGate class takes two arguments: the control qubit indices (given as a list) and the gate to apply when all the controls are $\left\vert 1\right>$.
Since the Toffoli gate can be represented as a controlled CNOT gate, we can create a Toffoli gate as follows:
from sympy.physics.quantum.gate import CGate
control_qubit1 = 2
control_qubit2 = 1
target_qubit = 0
toffoli_1 = CGate([control_qubit1], CNOT(control_qubit2, target_qubit))
plot.circuit_plot(toffoli_1, 3);
You can also regard the Toffoli gate as a doubly-controlled NOT gate where the first two qubits are controls and the third qubit is the target. In this case, we can create a Toffoli gate (using XNOT to ensure the NOT symbol is used in the plot) as follows:
toffoli_2 = CGate([control_qubit1, control_qubit2], XNOT(target_qubit))
plot.circuit_plot(toffoli_2, 3);
As you can see, both the representations are equivalent and lead to identical gate diagrams.
A sequence of quantum gates gives rise to a quantum circuit. You can construct a quantum circuit in SymPy by multiplying gates together in the order in which you want them to be applied in the circuit.
A simple example is a circuit that uses $X$ and rotation gates to create an $H$ gate. It is straightforward to show that
\[H = R_Y\left(-\frac{\pi}{4}\right)X R_Y\left(\frac{\pi}{4}\right)\]
Let's create and plot the circuit in SymPy.
circuit = R_Y(0, -sy.pi/4) * X(0) * R_Y(0, sy.pi/4)
plot.circuit_plot(circuit, 1);
Note that since the gates in the circuit are applied from left to right, the order of the gates in the circuit is the reverse of the order in which they are multiplied to form $H$. To confirm that the circuit is indeed equivalent to the $H$ gate, we can use the represent function from the represent module to get the matrix representation of the circuit.
from sympy.physics.quantum.represent import represent
represent(circuit, nqubits=1).simplify()
$\displaystyle \left[\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right]$
which is indeed the matrix representation of the Hadamard gate.
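As a sanity check, the identity can also be verified with plain SymPy matrices, outside the quantum module (a sketch using the standard 2x2 matrix forms of $R_Y$, $X$ and $H$):

```python
import sympy as sy

theta = sy.pi / 4

def Ry(t):
    # Standard Y-rotation matrix [[cos(t/2), -sin(t/2)], [sin(t/2), cos(t/2)]]
    return sy.Matrix([[sy.cos(t/2), -sy.sin(t/2)], [sy.sin(t/2), sy.cos(t/2)]])

Xm = sy.Matrix([[0, 1], [1, 0]])
Hm = sy.Matrix([[1, 1], [1, -1]]) / sy.sqrt(2)

# R_Y(-pi/4) * X * R_Y(pi/4) should equal H
product = sy.simplify(Ry(-theta) * Xm * Ry(theta))
```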
Now we will implement a more complex circuit involving 2 qubits. The CNOT gate has the property that if you apply a Hadamard gate to both qubits at the input as well as the output, you get a CNOT gate with the control and target qubits swapped.
Let’s see how this comes about. First, a Hadamard gate applied in parallel to both qubits gives rise to the following matrix
\[H\otimes H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}\]
The CNOT_{12} gate has the following matrix
\[CNOT_{12} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}\]
Using these two matrices, it is straightforward to show that
\[H\otimes H\cdot CNOT_{21}\cdot H\otimes H = CNOT_{12}\]
Let us plot the circuit.
circuit = H(0) * H(1) * CNOT_21 * H(0) * H(1)
plot.circuit_plot(circuit, 2);
We can also get the matrix representation of the circuit to confirm that it is indeed a CNOT_{12} gate.
matrix = represent(circuit, nqubits=2)
from IPython.display import Markdown
## Needed to make the matrix display correctly in the markdown document
Markdown(f'$${sy.latex(matrix)}$$')
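Alternatively, the identity can be checked numerically with explicit matrices, using TensorProduct for $H\otimes H$ (a self-contained sketch with the matrices exactly as given above):

```python
import sympy as sy
from sympy.physics.quantum import TensorProduct

H2 = sy.Matrix([[1, 1], [1, -1]]) / sy.sqrt(2)
HH = TensorProduct(H2, H2)  # H ⊗ H as an explicit 4x4 matrix

CNOT21 = sy.Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
CNOT12 = sy.Matrix([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])

# Conjugating CNOT_21 by H⊗H should swap control and target
lhs = sy.simplify(HH * CNOT21 * HH)
```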
In this blog, we learned how to create quantum gate and quantum circuit diagrams using Python’s SymPy library. We started by creating single qubit gates like the X gate and Hadamard gate. We then moved on to creating multi-qubit gates like the CNOT gate and Toffoli gate. Finally, we learned how to create custom gates and quantum circuits.
This only scratches the surface of SymPy’s quantum computing capabilities and of quantum computing in general. If you want to learn more about quantum computing, I recommend checking out the MIT Open Learning courses on Quantum Information Science (courses 8.370 and 8.371). For more about SymPy’s quantum computing capabilities, check out the SymPy documentation.
In this blog post, we will explore how to implement a minimalist ChatGPT-style app in a Jupyter Notebook or command line. The goal is to provide an understanding of the important concepts, components, and techniques required to create a chat app on top of a large language model (LLM), specifically OpenAI’s GPT. The resulting chat app can serve as a foundation for creating your own customised conversational AI applications.
The code in this blog post can be found in a notebook here. The script for the command line version can be found here.
Let us begin with a quick overview of the Chat Completions endpoint of the OpenAI API, which enables you to interact with OpenAI’s large language models to generate text-based responses in a conversational manner. It’s designed for both single-turn tasks and multi-turn conversations.
Example API Call:
The provided code snippet demonstrates how to make an API call for chat completions. In this example, the chat model used is “gpt-3.5-turbo,” and a conversation is created with system, user, and assistant messages:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who wrote 'A Tale of Two Cities'?"},
        {"role": "assistant", "content": "Charles Dickens wrote 'A Tale of Two Cities'."},
        {"role": "user", "content": "When was it first published?"}
    ]
)
Message Structure:
- Conversations are passed in the messages parameter, which is an array of message objects.
- Each message object has a role (either "system," "user," or "assistant") and content (the text of the message).
Typical Conversation Format:
A typical conversation format starts with a system message, followed by alternating user and assistant messages. The system message helps set the behavior of the assistant, but it’s optional. If omitted, the model’s behavior will be similar to a generic message like “You are a helpful assistant.”
Importance of Conversation History:
Including conversation history is crucial when user instructions refer to prior messages. The model has no memory of past requests, so all relevant information must be supplied within the conversation history. If a conversation exceeds the model’s token limit, it needs to be shortened.
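To make this concrete, here is a tiny sketch (plain Python, no API calls; the helper name add_turn is made up for illustration) of how a running history accumulates the context that each new request must carry:

```python
# Running history starts with an optional system message
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_turn(history, user_text, assistant_text):
    """Append one user/assistant exchange to the running history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

add_turn(history, "Who wrote 'A Tale of Two Cities'?", "Charles Dickens.")
# The follow-up only makes sense because the prior turn is in the history
add_turn(history, "When was it first published?", "It was first published in 1859.")
```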
Now let us write a set of functions for communicating with OpenAI’s Chat Completions API. These functions will serve as the backbone of our minimalist chat app. Each function plays a specific role in managing the conversation, formatting messages, and handling responses.
import openai
import json
import os
import sys
# Uncomment and replace with your api key or api key path
# openai.api_key = YOUR_API_KEY
# openai.api_key_path = YOUR_API_KEY_PATH
def get_system_message(system=None):
    """
    Generate a system message for the conversation.

    Args:
        system (str, optional): The system message content. Defaults to None.

    Returns:
        dict: A message object with 'role' set to 'system' and 'content' containing the system message.
    """
    if system is None:
        system = "You are a helpful assistant."
    return {"role": "system", "content": system}
- get_system_message is responsible for creating a system message. This message is optional but can be used to set the behavior of the assistant. If no system message is provided, it defaults to "You are a helpful assistant." The function returns a message object with 'role' set to 'system' and 'content' containing the system message.

def get_response(msg,
                 system_msg=None,
                 msgs=[], model='gpt-4',
                 return_incomplete=False):
    """
    Get a response from the Chat Completions API.

    Args:
        msg (str): The user's message.
        system_msg (str, optional): The system message. Defaults to None.
        msgs (list, optional): Previous conversation messages. Defaults to an empty list.
        model (str, optional): The chat model to use. Defaults to 'gpt-4'.
        return_incomplete (bool, optional): Whether to return incomplete responses. Defaults to False.

    Returns:
        list or tuple: A list of response chunks if not returning incomplete, or a tuple containing the list of chunks and a completion status.
    """
    _stream_response = openai.ChatCompletion.create(
        model=model,
        messages=[
            system_msg if system_msg is not None else get_system_message(),
            *msgs,
            {"role": "user", "content": msg}
        ],
        stream=True
    )
    _chunks = []
    complete = False
    try:
        for _chunk in _stream_response:
            _delta = _chunk['choices'][0]['delta']
            # Last chunk will be empty
            if 'content' in _delta:
                sys.stdout.write(_delta['content'])
            _chunks.append(_chunk)
        complete = True  # the full stream was processed
    except KeyboardInterrupt:
        # Re-raise KeyboardInterrupt unless incomplete responses are acceptable
        if not return_incomplete:
            raise
    return _chunks if not return_incomplete else (_chunks, complete)
- get_response is the core function for obtaining a response from the Chat Completions API. It takes the user’s message, an optional system message, previous messages, the model to use, and a flag to indicate whether incomplete responses should be returned.
- The API call uses stream=True to stream the response chunks, which are accumulated in _chunks.
- If return_incomplete is set to True, the function returns a result even if the stream is interrupted; in this case it returns a tuple containing the list of chunks and a completion status. If return_incomplete is False, it only returns the result when the full stream is processed and returns only the list of chunks.

def stream2msg(stream):
    """
    Convert a stream of response chunks into a single message.

    Args:
        stream (list): A list of response chunks.

    Returns:
        str: A single message containing the concatenated content of the response chunks.
    """
    return "".join([i["choices"][0]["delta"].get("content", "") for i in stream])
- stream2msg is a utility function that converts a stream of response chunks into a single message. It takes a list of response chunks as input and concatenates the content of each chunk to form a complete message.

def format_msgs(inp, ans):
    """
    Format user input and model's response into message objects.

    Args:
        inp (str): User input message.
        ans (str or list): Model's response message as a string or a list of response stream chunks

    Returns:
        list: A list containing user and assistant message objects.
    """
    msg_inp = {"role": "user", "content": inp}
    msg_ans = {"role": "assistant", "content": stream2msg(ans) if not isinstance(ans, str) else ans}
    return [msg_inp, msg_ans]
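To see how these helpers compose without calling the API, we can feed them mock chunks shaped like the streaming deltas (stream2msg and format_msgs are repeated here so the sketch is self-contained):

```python
def stream2msg(stream):
    # Concatenate the 'content' pieces from a list of streamed chunks
    return "".join([i["choices"][0]["delta"].get("content", "") for i in stream])

def format_msgs(inp, ans):
    # Package a user input and a model answer (string or chunk list) as message objects
    msg_inp = {"role": "user", "content": inp}
    msg_ans = {"role": "assistant",
               "content": stream2msg(ans) if not isinstance(ans, str) else ans}
    return [msg_inp, msg_ans]

# Mock chunks mimicking the streaming response format
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world!"}}]},
    {"choices": [{"delta": {}}]},  # final chunk carries no content
]

msgs = format_msgs("Say hello", chunks)
```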
- format_msgs takes the user’s input and the model’s response (which can be a message string or a list of response chunks) and creates a list containing message objects for both the user and the assistant, which can subsequently be used in the conversation history.

Before we delve into the implementation details, let us briefly discuss token counting. Tokens are chunks of text that language models use to process and generate responses. It’s crucial to keep track of token counts, as they impact the cost and feasibility of using the API. Token counting includes both input and output tokens. This means that not only the messages you send to the model but also the responses you receive contribute to the total token count.
The exact way tokens are counted can vary between different model versions. The function below for counting tokens is adapted from the OpenAI API documentation (dated 05.09.2023). It was written for gpt-3.5-turbo-0613 and serves as a reference. The documentation adds this caveat:
The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate.
Depending on the model, the value returned by the function might not be exact but it will be a decent estimate that suffices for this simple example.
It’s also worth noting that each model has a maximum token limit; exact details for each model are available in the Models section of the documentation. For example, it is 8192 for gpt-4 and 4097 for gpt-3.5-turbo. In our example, we are using the model’s maximum token limit, but in practice, you may want to use a lower value to ensure that both input and output tokens are within the limit.
import tiktoken
def num_tokens_from_messages(messages, model="gpt-4"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
num_tokens_from_messages is a function that takes a list of messages as input and returns the estimated number of tokens used by those messages. It uses the tiktoken library to calculate token counts. The function attempts to get the token encoding for the specified model. If it encounters a KeyError (indicating an unsupported model), it falls back to the “cl100k_base” encoding, which is a reasonable default.
The function initializes num_tokens to 0, which is used to accumulate the token count.
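To make the counting arithmetic concrete, here is a small sketch that mirrors the same logic with a stub encoder counting one token per whitespace-separated word (purely illustrative; the function name approx_tokens is made up, and real counts require the tiktoken encoding):

```python
def approx_tokens(messages, encode=lambda s: s.split()):
    # Mirrors num_tokens_from_messages: 4 tokens of per-message overhead,
    # plus the encoded length of every field, plus 2 for the reply priming.
    total = 0
    for message in messages:
        total += 4  # <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total -= 1  # role is omitted when a name is present
    total += 2  # every reply is primed with <im_start>assistant
    return total

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello there"},
]
# system: 4 + 1 + 5, user: 4 + 1 + 2, plus 2 for priming = 19 word-level "tokens"
print(approx_tokens(msgs))  # → 19
```

With the real tiktoken encoder the numbers differ, but the overhead structure is the same.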
For each message, it adds a fixed overhead of 4 tokens to account for the <im_start>, role or name, content, and <im_end> tags, plus the encoded length of each field. Finally, it adds 2 tokens because every reply is primed with <im_start>assistant.

In the context of managing conversations with language models, it’s crucial to ensure that the conversation history remains within the model’s token limit. To achieve this, we have a function called maybe_truncate_history which helps truncate the conversation history when it approaches or exceeds the maximum token limit.
Here’s an overview of this function and its purpose:
def maybe_truncate_history(msgs, max_tokens, model='gpt-4', includes_input=True):
    msgs_new = []
    if msgs[0]['role'] == 'system':
        msgs_new.append(msgs[0])
        msgs = msgs[1:]
    if includes_input:
        # At least the last message should be included if input
        msgs_new.append(msgs[-1])
        msgs = msgs[:-1]
    # First ensure that input (and maybe system) messages don't exceed token limit
    tkns = num_tokens_from_messages(msgs_new, model=model)
    if tkns > max_tokens:
        return False, tkns, []
    # Then retain latest messages that fit within token limit
    for msg in msgs[::-1]:
        msgs_tmp = msgs_new[:1] + [msg] + msgs_new[1:]
        tkns = num_tokens_from_messages(msgs_tmp, model=model)
        if tkns <= max_tokens:
            msgs_new = msgs_tmp
        else:
            break
    return True, tkns, msgs_new
maybe_truncate_history is designed to manage the length of the conversation history within the token limit of the model. It takes as input the current list of messages (msgs), the maximum token limit (max_tokens), and the model name (defaulting to gpt-4). There is also a flag, includes_input, indicating whether the user’s input is present in the messages, to ensure it is not dropped.
If the first message in the conversation history is a system message, it is added to msgs_new and removed from msgs. This step is necessary because system messages should not be truncated.
If includes_input is True, the last message (usually the user’s input) is added to msgs_new and removed from the msgs list.
The function first checks whether the token count of the messages in msgs_new exceeds max_tokens. If it does, it returns False, the token count (tkns), and an empty list to indicate that the conversation history cannot be accommodated within the token limit.
Next, the function attempts to retain the latest messages that fit within the token limit. It iterates through the msgs list in reverse order, gradually adding messages to msgs_tmp. If the token count of msgs_tmp is within max_tokens, it updates msgs_new with msgs_tmp. This ensures that the conversation history retains as much context as possible while staying within the token limit.
The function returns True to indicate that the conversation history has been successfully truncated to fit within the token limit, along with the updated token count (tkns) and the modified msgs_new.
This is a simple approach to managing token counts, which drops entire messages to stay within the token limit, but there are more sophisticated approaches you could try, such as summarising or filtering earlier parts of the conversation.
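To see the drop-oldest strategy in isolation, here is a self-contained sketch of the same retention loop using a stub token counter (one token per word, ignoring per-message overhead; the names word_count and truncate_drop_oldest are made up for illustration):

```python
def word_count(msgs):
    # Stub token counter for illustration: one "token" per word.
    return sum(len(m["content"].split()) for m in msgs)

def truncate_drop_oldest(history, new_input, max_tokens):
    # Always keep the new input, then add history messages newest-first
    # while the running total stays within the budget.
    kept = [new_input]
    for msg in reversed(history):
        if word_count([msg] + kept) <= max_tokens:
            kept.insert(0, msg)
        else:
            break
    return kept

history = [
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven"},
]
new_input = {"role": "user", "content": "eight nine"}
kept = truncate_drop_oldest(history, new_input, max_tokens=6)
# A budget of 6 keeps the input (2), "seven" (1) and "five six" (2),
# dropping "one two three four" which would push the total to 9.
print([m["content"] for m in kept])  # → ['five six', 'seven', 'eight nine']
```

The real function additionally preserves the system message and returns a flag when even the input alone does not fit.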
Finally, we are in a position to implement our minimalist chat app. With the MinChatGPT class, we define the foundation of the app. First, let us set up the class and add some helper functions. Subsequently we will implement the conversation functionality.
class MinChatGPT(object):
    """
    A simplified ChatGPT chatbot implementation.

    Parameters:
        system: A system-related parameter (optional).
        model: The OpenAI model to use; restricted to 'gpt-3.5-turbo' and 'gpt-4'.
        log: Boolean that decides if logging is required or not.
        logfile: The location of the file where chat logs will be stored.
        errfile: The location of the file where error logs will be stored.
        include_incomplete: Boolean that decides if incomplete responses are to be included in the history or not.
        num_retries: The number of times to retry if there is a connection error.
        mock: Boolean that decides if the system is in testing mode.
        debug: Boolean that decides if the system should go into debug mode.
        max_tokens: Maximum number of tokens the model can handle while generating responses.
    """
    def __init__(self,
                 system=None,
                 model='gpt-4',
                 log=True,
                 logfile='./chatgpt.log',
                 errfile='./chatgpt.error',
                 include_incomplete=True,  # whether to include incomplete responses in history
                 num_retries=3,
                 mock=False,
                 debug=False,
                 max_tokens=None):
        """
        Initializes a MinChatGPT instance with provided parameters.
        """
        # For simplicity restrict to these two
        assert model in ['gpt-3.5-turbo', 'gpt-4']  # Ensures the model parameter is valid
        # System & GPT model related parameters
        self.system = system
        self.system_msg = get_system_message(system)  # Retrieve system message if available
        self.model = model
        # Logging related parameters
        self.log = log
        self.logfile = logfile
        self.errfile = errfile
        # Behavioural flags
        self.include_incomplete = include_incomplete
        self.num_retries = num_retries
        self.mock = mock
        self.debug = debug
        # History and error storage related parameters
        self.history = []
        self.history_size = []
        self.errors = []
        # Setting maximum tokens model can handle, defaults are provided for the two specified models
        self.max_tokens = {'gpt-4': 8192, 'gpt-3.5-turbo': 4097}[model] if max_tokens is None else max_tokens

    def _logerr(self):
        with open(self.errfile, 'w') as f:
            f.write('\n'.join(self.errors))

    def _logchat(self):
        with open(self.logfile, 'w') as f:
            json.dump(fp=f, obj={'history': self.history, 'history_size': self.history_size}, indent=4)

    def _chatgpt_response(self, msg="", newline=True):
        sys.stdout.write(f'\nMinChatGPT: {msg}' + ('\n' if newline else ''))
The __init__ method initializes our chatbot instance with parameters such as the system message, the OpenAI model to be used, and several behavioural flags for logging, debugging, or testing (mock). It also sets up self.history, self.history_size, and self.errors for tracking the chat history and potential errors. The max_tokens parameter sets the limit for tokens the model can handle, defaulting to the restrictions of the chosen model. Two helper methods, _logerr and _logchat, save error logs and chat logs respectively to the specified locations, and _chatgpt_response prints the bot’s response to the console.

However, we have not yet implemented the main functionality of the chatbot, which is to manage the conversation. Let us go ahead and implement a chat method that enables the user to interact with the model.
The chat method is the main entry point for initiating a conversation with the MinChatGPT chatbot. It manages user interaction, input processing, handling special cases like an ‘exit’ command or an empty message, generating responses, and logging information if desired.

Here is a detailed walkthrough of the chat method.
def chat(self):
    """
    Initiates a chat session with the user. During the chat session, the chatbot will receive user message inputs,
    process them and generate appropriate responses.

    The chat session will continue indefinitely until the user enters a termination command like "Bye", "Goodbye", "exit",
    or "quit". The function also logs the chat session, and any errors that occur during the session.
    """
    # maybe_exit flag for controlling the exit prompt
    maybe_exit = False
    # Welcome message for user
    print('Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.')
User Interaction
The core of the chat method is an infinite while loop that simulates a conversation. The user is prompted for an input message, which is then handled in the loop. To allow the user to end the conversation at any point, the code checks for certain phrases such as “bye”, “goodbye”, “exit”, or “quit”.
    # Main chat loop
    while True:
        # Capture user input
        inp = input('\nYour message: ')
Handling Empty Messages
If the user input is an empty string, the method reminds the user to enter a message and goes back to the start of the loop to ask again.
        try:
            # Handling empty input from user
            if len(inp) == 0:
                print('Please enter a message')
                continue
Exiting
If the previous input appeared to indicate an intention to exit (maybe_exit == True), the user is asked for confirmation. If the user gives an affirmative response, the bot replies with a goodbye and breaks the loop to end the conversation. Otherwise, the bot resets the flag and continues to chat.
            # Case insensitive user input
            stripped_lowered_inp = inp.strip().lower()
            # Handling user's confirmation on exit
            if maybe_exit:
                if stripped_lowered_inp in ['y', 'yes', 'yes!', 'yes.']:
                    self._chatgpt_response('Goodbye!')
                    break
                else:
                    self._chatgpt_response("Ok. Let's keep chatting.")
                    maybe_exit = False
                    continue
Intention to exit
This simple approach checks whether the user input matches any of the exit signals. If it does, the maybe_exit flag is set to True, and in the next interaction the user is asked for confirmation.
You could also try more sophisticated approaches that get the model to infer whether the user wishes to end the conversation.
            # Checking if user wants to exit
            if stripped_lowered_inp in [
                    'exit', 'exit()', 'exit!', 'exit.',
                    'quit', 'quit()', 'quit!', 'quit.',
                    'bye', 'bye!', 'bye.',
                    'goodbye', 'goodbye!', 'goodbye.']:
                maybe_exit = True
                self._chatgpt_response('Are you sure you want to quit? Enter Yes or No.')
                continue
Process User Inputs
The code next deals with non-empty, non-exit user inputs. It prepares the message history to be sent to the OpenAI model by appending the user’s new message. The history is then checked to ensure it doesn’t exceed the max token limit of the model. If the history is too long, we inform the user, don’t produce a response, and again loop to the start for a new input.
            # Preparing message history before calling the model
            msgs = [self.system_msg, *self.history, {'role': 'user', 'content': inp}]
            # Call to helper function to check that conversation history does not exceed max tokens
            valid, tkns, trimmed = maybe_truncate_history(msgs, max_tokens=self.max_tokens)
            [_, *msgs_to_send, _] = trimmed
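The last line above uses Python’s extended iterable unpacking: the first and last elements (the system message and the user’s input, which are passed to the model separately) are discarded into _, and everything in between is collected into msgs_to_send. A minimal illustration of the idiom with placeholder strings:

```python
# Extended unpacking: first and last elements are bound to _,
# everything in between is captured as a list.
trimmed = ["system", "msg1", "msg2", "user_input"]
[_, *middle, _] = trimmed
print(middle)  # → ['msg1', 'msg2']

# With exactly two elements, the starred target receives an empty list.
[_, *middle, _] = ["system", "user_input"]
print(middle)  # → []
```

Note that this raises a ValueError on a list with fewer than two elements, so it relies on trimmed containing at least the system message and the input.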
Generate Response and Update History
If the length of the input is within limits, the bot produces a response. If the system is in mock mode, it just returns a test message. Otherwise, an actual response is generated and delivered to the user. If there is a connection error in getting a response from the API, it retries up to num_retries times. Incomplete messages are handled as per the include_incomplete flag, which determines whether or not to add incomplete responses to the history. The code also saves the length of the history used for this response generation.
            # Handling valid and invalid token scenarios
            if valid:
                # Inform user if history was truncated
                if len(trimmed) < len(msgs):
                    print(f'\nDropping earliest {len(msgs) - len(trimmed)} messages from history to keep within token limits')
                num_api_calls = 0
                if self.mock:
                    # For testing response functionality
                    msg = 'Test message'
                    self._chatgpt_response(msg)
                else:
                    # Generate response from model
                    self._chatgpt_response(newline=False)
                    while True:
                        try:
                            msg, complete = get_response(inp, system_msg=self.system_msg, msgs=msgs_to_send, return_incomplete=True)
                            break
                        except ConnectionResetError:
                            if num_api_calls < self.num_retries:
                                num_api_calls += 1
                            else:
                                raise
                    # Skip to next if incomplete messages not included in history
                    if not complete and not self.include_incomplete:
                        continue
            else:
                # If message exceeds token limit, ask user to reduce message length
                print(f'\nTotal number of {tkns} tokens exceeds max number of tokens allowed. Please try again after reducing message length.')
                continue
            # Keeping track of history size
            self.history_size.append(len(msgs_to_send))
Logging and Debugging
Log details are printed if the system is in debug mode. Then a new pair of messages is created from the user input and the generated response and added to the message history. If the log flag is set, the chat history is saved.
            # Debug information provided for development and troubleshooting
            if self.debug:
                print(f"\n\nLast {self.history_size[-1]} message(s) used as history / Num tokens sent: {tkns} / Num retries: {num_api_calls + 1}")
                print("Messages sent:")
                print("="*100)
                for i in trimmed:
                    print(f'{i["role"]}: {i["content"]}')
                print("="*100)
            # Adding user and assistant messages to chat history
            self.history.extend(format_msgs(inp, msg))
            # Saving chat history if logging is True
            if self.log:
                self._logchat()
Handling Errors
Any exceptions that occur during the above process are caught, added to the bot’s error log, and displayed to the user, who is then invited to try again.
        except Exception as e:  # Exception handling for unexpected inputs or system errors
            self.errors.append(str(e))
            # Logging error details if logging is True
            if self.log:
                self._logerr()
            print(f'\nThere was the following error:\n\n{e}.\n\nPlease try again.')
            continue
Finally, make this a method of the MinChatGPT class:
MinChatGPT.chat = chat
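Assigning the standalone chat function as a class attribute works because Python binds functions looked up on a class as methods of its instances. A tiny sketch of the pattern (the Greeter class and greet function are made-up names for illustration):

```python
class Greeter:
    def __init__(self, name):
        self.name = name

# A function defined outside the class; its first argument plays the role of self.
def greet(self):
    return f"Hello, {self.name}!"

# Attach it afterwards; instances now see it as a bound method.
Greeter.greet = greet
print(Greeter("world").greet())  # → Hello, world!
```

This is handy in notebooks, where it lets you build up a class across several cells.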
Let us now take a look at a simple demo in debug mode to see what input is given to the model each time. We can also see how it behaves when given an empty input, how it handles exit signals, and what happens when you interrupt it mid-message.
minchat = MinChatGPT(log=True, debug=True)
minchat.chat()
Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.
Your message:
Please enter a message
Your message: Bye
MinChatGPT: Are you sure you want to quit? Enter Yes or No.
Your message: No
MinChatGPT: Ok. Let's keep chatting.
Your message: What spices and herbs go well with chocolate? Answer as a comma separated list.
MinChatGPT: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
Last 0 message(s) used as history / Num tokens sent: 34
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
====================================================================================================
Your message: Why does cinnamon go well?
MinChatGPT: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
Last 2 message(s) used as history / Num tokens sent: 87
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
====================================================================================================
Your message: Can you give some examples of these dishes?
MinChatGPT: Certainly, here are some examples of chocolate dishes where cinnamon can shine:
1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.
2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.
3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.
4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors
Last 4 message(s) used as history / Num tokens sent: 169
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
====================================================================================================
Your message: Ok got the idea.
MinChatGPT: Great! If you have any other questions or need further information, feel free to ask. Enjoy your culinary adventures with chocolate and cinnamon!
Last 6 message(s) used as history / Num tokens sent: 294
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
assistant: Certainly, here are some examples of chocolate dishes where cinnamon can shine:
1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.
2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.
3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.
4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors
user: Ok got the idea.
====================================================================================================
Your message: Goodbye!
MinChatGPT: Are you sure you want to quit? Enter Yes or No.
Your message: Yes
MinChatGPT: Goodbye!
To run this as a command line application, copy all the code from this notebook into a Python file called minchatgpt.py. Then add this code to the end of the file.
if __name__ == '__main__':
    import argparse
    import os

    # Get key from environment instead of assigning
    openai.api_key = os.environ.get("API_KEY")
    # alternatively
    # openai.api_key_path = os.environ.get("API_KEY_PATH")

    # Define a function to parse boolean arguments
    def bool_arg(s):
        if s.lower() in ['true', 't', 'yes', 'y', '1']:
            return True
        elif s.lower() in ['false', 'f', 'no', 'n', '0']:
            return False
        else:
            raise ValueError('Boolean value expected.')

    parser = argparse.ArgumentParser(
        description='MinChatGPT: A minimalist chat app based on OpenAI\'s GPT model')
    parser.add_argument(
        '--debug', help='Run in debug mode', type=bool_arg, default=False)
    parser.add_argument(
        '--mock', help='Run in mock mode', type=bool_arg, default=False)
    parser.add_argument(
        '--log', help='Log chat history', type=bool_arg, default=True)
    parser.add_argument(
        '--logfile', type=str, default='./chatgpt.log', help='Location of chat history log file')
    parser.add_argument(
        '--errfile', type=str, default='./chatgpt.error', help='Location of error log file')
    parser.add_argument(
        '--model', type=str, default='gpt-4', help='OpenAI model to use')
    parser.add_argument(
        '--include_incomplete', type=bool_arg, default=True,
        help='Include incomplete responses in history')
    parser.add_argument(
        '--num_retries', type=int, default=3,
        help='Number of times to retry if there is a connection error')
    parser.add_argument(
        '--max_tokens', type=int, default=None,
        help='Maximum number of tokens the model can handle while generating responses')

    args = parser.parse_args()
    kwargs = vars(args)
    minchat = MinChatGPT(**kwargs)
    minchat.chat()
To run the application, assign your API key to the API_KEY environment variable (or to the API_KEY_PATH environment variable). Then run python minchatgpt.py with arguments as required. For example, to run in debug mode with logging enabled, you can use the following command:
export API_KEY=YOUR_API_KEY; python minchatgpt.py --debug True --log True
The goal of MinChatGPT is to demonstrate how a chat app can be implemented on top of a conversational language model. Whilst it serves as a useful starting point for engaging with LLMs, it has several limitations at this stage, including:
Lack of Input Moderation: MinChatGPT doesn’t filter or restrict the type of content that users can input. This can potentially lead to inappropriate or offensive messages that might violate the API’s rules.
Inability to Resume Chats or Start New Ones: The app does not provide features for resuming previous conversations from saved history or starting entirely new chats. Users are limited to a single, continuous conversation session. However it would be fairly straightforward to incorporate these features.
Limited Testing of Chat Logic: MinChatGPT’s chat logic has not been comprehensively tested with a wide range of input combinations. As a result, there may be scenarios where the chat logic behaves unexpectedly, encounters errors or does not properly handle errors.
In this blog post, we explored the building blocks for creating a minimalist chat-style application based on OpenAI’s GPT model within a Jupyter notebook (or command line). We discussed API interaction, token counting, conversation history truncation, and building a chat interface. You can use MinChatGPT as a starting point for building more complex and sophisticated applications. You can also modify it to make it compatible with other LLMs. I encourage you to experiment by adding features, making it more robust, extending its capabilities and adapting it to suit your requirements.
In this blog I outline the steps I took in setting up an AWS EC2 instance to run the Hugging Face Diffusers Textual Inversion tutorial. Note that this is not a tutorial about Textual Inversion or Diffusion models. Instead, it extends the official tutorial by providing the steps I took to set up an EC2 instance, install the code and run the example.
From the Hugging Face tutorial
Textual Inversion is a training technique for personalizing image generation models with just a few example images of what you want it to learn. This technique works by learning and updating the text embeddings (the new embeddings are tied to a special word you must use in the prompt) to match the example images you provide.
The repo has a link to run on Colab, but I wanted the convenience of running it on an EC2 instance. The process I followed is admittedly hacky and by no means the best or most efficient way to do it, but it was quick and it worked. The idea behind this blogpost is that if you can get it running without too much frustration, you will be motivated to take your learning further, explore the topic in more depth and do things in a more robust way.
The code in this blog is for the PyTorch version of the example, but there is also a Jax version, for which I refer you to the tutorial. You could probably use a similar process to run any of the other examples in Diffusers. I think I have given all the commands that I ran, but I may have missed some, so let me know if something does not work.
These are the key details of my AWS EC2 instance (values in square brackets are options to choose or input)
Instance type [g4dn.xlarge]
1 x [140] GiB [gp2] Root volume (Not encrypted)
1 x [100] GiB [gp2] EBS volume (Not encrypted)
I used the Deep Learning AMI (Ubuntu 18.04) Version 64.2 (ami-04d05f63d9566224b).
I also used an existing security group that I had set up previously with an inbound rule with Type All traffic.
I added the following to my ~/.ssh/config file on my local machine:
Host fusion
    AddKeysToAgent yes
    HostName <YOUR_INSTANCE_IP_ADDRESS of the form ec2-xx-xxx-xxx-xxx.eu-west-1.compute.amazonaws.com>
    IdentityFile <LOCATION_OF_YOUR_PRIVATE_KEY>
    User ubuntu
    LocalForward localhost:8892 localhost:8892
    LocalForward localhost:6014 localhost:6014
I added port forwarding for jupyter (8892) and tensorboard (6014) so that I could access them from my local machine. Then I could run the following to connect to the instance
ssh fusion
The instructions here state that Diffusers has been tested using Python 3.8+. The instance had Python 3.6 and the installation of the libraries failed, so here is what I did to install Python 3.8:
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.8
While doing this, I encountered the following error
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
I attempted to rectify this by running the following commands. I can’t recall where I found this solution but some similar solutions are suggested here and here.
sudo rm /var/lib/dpkg/lock-frontend
sudo dpkg --configure -a
The issue did not resolve, so I rebooted the instance, ran the commands again, and then the installation worked.
To install pip I ran the following commands:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.8 get-pip.py
To create virtual environments with venv I needed to install python3.8-venv:
sudo apt install python3.8-venv
Finally I set up a virtual environment in my home directory called fusion with the following command:
python3.8 -m venv fusion
Now I was in a position to follow the installation instructions in the tutorial.
First I activated the virtual environment
source ~/fusion/bin/activate
Unless otherwise specified, all the python commands in this article are intended to be run from within the virtual environment.
Then I installed the libraries
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
I got the following error
FileNotFoundError: [errno 2] no such file or directory: '/tmp/pip-build-ioswilqu/safetensors/setup.py'
According to this StackOverflow post the solution is to run the following command
pip3 install --upgrade pip
Then re-run the installation command to complete the installation.
The next step is to install the dependencies for the example after navigating to the examples/textual_inversion directory.
cd examples/textual_inversion
pip install -r requirements.txt
Accelerate is a library from Hugging Face that helps train on multiple GPUs/TPUs or with mixed-precision. It automatically configures the training setup based on your hardware and environment. You can initialise it with a custom configuration, or use the default configuration.
For the custom configuration, the command is
accelerate config
I used the default configuration
accelerate config default
After you run the command it will tell you where the configuration file is located. For me it was /home/ubuntu/.cache/huggingface/accelerate/default_config.yaml. Ensure that use_cpu is set to false to enable GPU training. Here is what my configuration file looked like:
{
    "compute_environment": "LOCAL_MACHINE",
    "debug": false,
    "distributed_type": "NO",
    "downcast_bf16": false,
    "machine_rank": 0,
    "main_training_function": "main",
    "mixed_precision": "no",
    "num_machines": 1,
    "num_processes": 1,
    "rdzv_backend": "static",
    "same_network": false,
    "tpu_use_cluster": false,
    "tpu_use_sudo": false,
    "use_cpu": false
}
Then I created a new file in examples/textual_inversion called run.py and added the following code from the tutorial to download the mini dataset for the example:
from huggingface_hub import snapshot_download

local_dir = "./cat"
snapshot_download(
    "diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes"
)
Now run the run.py file to download the dataset:

python3 run.py
The dataset is very small, containing only six images of cat toys, as shown below
Next I created another file run.sh in examples/textual_inversion and added the following code from the tutorial, making only the change of leaving out the --push_to_hub flag as I did not want to push the model to the Hugging Face Hub:
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="./cat"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" \
--initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 \
--scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat"
Note that in textual_inversion.py the special word used for textual inversion is input via the flag placeholder_token, which in the example is, unsurprisingly, <cat-toy>.
Then I tried to run the example
sh run.sh
but I kept getting this warning
UserWarning: CUDA initialization: The NVIDIA driver on your system is too old
The rest of the warning also suggested downloading a newer driver from here. I went to the page on my local machine and selected the following options
Product Type: Data Center / Tesla
Product Series: T-Series
Product: Tesla T4
Operating System: Linux 64-bit
CUDA Toolkit: 12.2
Language: English (US)
I clicked “Download” in the next page
and then right-clicked on the “Agree & Download” button on the subsequent page and copied the link address
Back in the instance I downloaded the driver using the link
wget https://us.download.nvidia.com/tesla/535.129.03/NVIDIA-Linux-x86_64-535.129.03.run
Then I installed the driver
sudo sh NVIDIA-Linux-x86_64-535.129.03.run
and went through the installation process accepting the default options.
Following the installation, the warning message disappeared and I was able to run the example. The example took roughly a couple of hours to run, although I left it running and did not monitor the time exactly. It saves tensorboard logs, which can be viewed as follows:
tensorboard --logdir textual_inversion_cat --port <YOUR_PORT>
You can now view this in the browser on whichever port you have forwarded to the instance. For me it was localhost:6014.
Note that I needed to install six to get this to work:
pip install six
Note that checkpoints and weights are saved in the textual_inversion_cat directory in the examples/textual_inversion folder. If you want to redo the training for any reason, delete this directory.
So that I could disconnect from the instance while the process continued running, I ran it in tmux. To start a new session, run:
tmux new -s <YOUR_SESSION_NAME>
Then run the example, making sure the virtual environment is activated and you are in the examples/textual_inversion directory. To leave the tmux session press Ctrl+b and then d. To reattach to the session run:
tmux attach -t <YOUR_SESSION_NAME>
To delete the session, either run exit within the session or, from outside the session, run:
tmux kill-session -t <YOUR_SESSION_NAME>
The tutorial provides an inference script which I have slightly modified to run in a Jupyter notebook.
First I needed to install jupyterlab and to register the virtual environment with jupyter
pip install jupyterlab
python -m ipykernel install --user --name=fusion
Then I created a notebook in inference.ipynb
in examples/textual_inversion
and ran a cell with this code to setup the model
from diffusers import StableDiffusionPipeline
import torch
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_textual_inversion("sd-concepts-library/cat-toy")
Now you can generate an image, noting that the placeholder_token
<cat-toy>
must be present in the prompt
image = pipeline("A <cat-toy> train", num_inference_steps=50).images[0]
Since image
is a PIL
image, you can view it in the notebook by simply running
image
Here are some ideas for what to do next
In this blogpost we will implement Example 4.2: Jack’s Car Rental from Chapter 4 of Reinforcement Learning (Sutton and Barto, a.k.a. the RL book). This is an example of a problem involving a finite Markov Decision Process for which policy iteration is used to find an optimal policy.
I strongly suggest that you study Chapter 4 of the book and that you have a go at implementing the example yourself using this blogpost as a reference in case you get stuck.
Here is a slightly modified version of the description of the problem given in Chapter 4.3 Policy Iteration of the RL book.
Example 4.2: Jack’s Car Rental
Jack manages two locations for a nationwide car rental company. Each day, some number of customers arrive at each location to rent cars. If Jack has a car available, he rents it out and is credited \$10 by the national company. If he is out of cars at that location, then the business is lost. Cars become available for renting the day after they are returned. To help ensure that cars are available where they are needed, Jack can move them between the two locations overnight, at a cost of \$2 per car moved. We assume that the numbers of cars requested and returned at each location are Poisson random variables, meaning that the probability that the number is $n$ is $\frac{\lambda^n}{n!}e^{-\lambda}$, where $\lambda$ is the expected number. Suppose $\lambda$ is 3 and 4 for rental requests at the first and second locations and 3 and 2 for returns. To simplify the problem slightly, we assume that there can be no more than 20 cars at each location (any additional cars are returned to the nationwide company, and thus disappear from the problem) and a maximum of five cars can be moved from one location to the other in one night. We take the discount rate to be $\gamma = 0.9$ and formulate this as a continuing finite MDP, where the time steps are days, the state is the number of cars at each location at the end of the day, and the actions are the net numbers of cars moved between the two locations overnight.
A Markov Decision Process (MDP) is a mathematical framework used in reinforcement learning to describe an environment. It provides a formalism to make sequential decisions under uncertainty, with an assumption that the future depends only on the current state and not on the past states. An MDP is described by a tuple (S, A, P, R), where S represents states, A represents actions, P is the state transition probability, and R is the reward function.
Using the MDP model, we can characterize Jack’s Car Rental problem as follows. The state is represented by the number of cars at each location at the end of the day, the actions correspond to the number of cars moved from location 1 to location 2, where a negative number means the cars were moved from location 2 to location 1 instead. The state transition probability depends on the number of cars rented and returned, which follow Poisson distributions. The reward function is defined by the profit made from renting cars and the cost of moving cars between locations.
A finite MDP is one in which the sets of states, actions, and rewards (S, A, and R) all have a finite number of elements. Since the states, rewards and actions are integer-valued and take on a finite number of values (e.g. the action is an integer between -5 and +5), this is a finite MDP.
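As a concrete illustration, the finite state and action spaces can be enumerated directly (a sketch, with the ranges taken from the problem statement; variable names are illustrative):

```python
# States: number of cars at each location, 0..20 -> 21 * 21 = 441 states.
# Actions: net cars moved from location 1 to location 2, an integer in -5..5.
states = [(n1, n2) for n1 in range(21) for n2 in range(21)]
actions = list(range(-5, 6))

print(len(states))   # 441
print(len(actions))  # 11
```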
Let us now define the problem more formally.
The probability of ending up in state $s’$ and receiving reward $r$ given that we started in state $s$ and took action $a$, $p(s’, r \vert s, a)$, is often described in the RL book as the four argument function. It is used in the policy evaluation step of policy iteration to calculate the value of a state under a given policy.
Let us now define $p(s’, r \vert s, a)$ for this problem.
Since each combination $(n_{r, 1}, n_{r, 2}, n_{b, 1}, n_{b, 2})$ is independent we can sum over their probabilities.
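Putting this together, one way to write out the sum is as follows (a sketch; $x = (x_1, x_2)$ denotes the numbers of cars after the overnight move, $n_{r,i}$ the rentals and $n_{b,i}$ the returns at location $i$, with each $p(n)$ a Poisson probability whose tail is truncated at the capacity limits as handled in the code below):

```latex
p(s', r \mid s, a)
= \sum_{n_{r,1},\, n_{r,2},\, n_{b,1},\, n_{b,2}}
  \mathbb{1}\!\left[ s' = \left( x_1 - n_{r,1} + n_{b,1},\; x_2 - n_{r,2} + n_{b,2} \right) \right]
  \,\mathbb{1}\!\left[ r = 10\,(n_{r,1} + n_{r,2}) - 2\lvert a \rvert \right]
  \prod_{i=1}^{2} p(n_{r,i})\, p(n_{b,i})
```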
Let us now implement the four argument function $p(s’, r \vert s, a)$ for this problem. We will implement it in two ways. First we will implement it in a way that is easy to understand and then we will implement it in a way that is faster to run.
To start with let us import the necessary libraries and define some helper functions.
import numpy as np
import random
import pandas as pd
from scipy.stats import poisson
import itertools
import matplotlib.pyplot as plt
import sys
def get_allowed_acts(state):
# Returns the allowed actions for a given state
n1, n2 = state
acts = [0] # can always move no cars
# We can move up to 5 cars from 1 to 2
# but no more than 20-n2
# so that the total at 2 does not exceed 20
for i in range(1, min(21-n2, n1 + 1, 6)):
acts.append(i)
# The actions are defined as the number of cars moved from 1 to 2
# so if cars are moved in the opposite direction the action is negative
for i in range(1, min(21-n1, n2 + 1, 6)):
acts.append(-i)
return acts
def get_state_after_act(state, act):
# Returns the intermediate state after action
# i.e. the number of cars at each location
# after cars are moved between locations
return (min(state[0] - act, 20), min(state[1] + act, 20))
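As a quick sanity check, here is how these helpers behave at a couple of states (the definitions are repeated so the snippet is self-contained):

```python
def get_allowed_acts(state):
    # Allowed net moves from location 1 to location 2 (negative = 2 to 1)
    n1, n2 = state
    acts = [0]
    for i in range(1, min(21 - n2, n1 + 1, 6)):
        acts.append(i)
    for i in range(1, min(21 - n1, n2 + 1, 6)):
        acts.append(-i)
    return acts

def get_state_after_act(state, act):
    # Numbers of cars at each location after the overnight move
    return (min(state[0] - act, 20), min(state[1] + act, 20))

print(sorted(get_allowed_acts((12, 10))))  # all of -5..5 are feasible here
print(sorted(get_allowed_acts((2, 20))))   # location 2 is full, so only 0 and -5..-1
print(get_state_after_act((12, 10), 4))    # (8, 14)
```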
To start with let us implement the four argument function in a way that is easy to follow. We will use this to clarify our understanding of the problem and to verify the more efficient vectorised implementation that we will implement later.
def get_four_arg_iter(state, act):
First steps
Find the intermediate state after action and set up the parameters and function for the Poisson distributions.
num_after_move = get_state_after_act(state, act)
lam_b1, lam_b2 = 3, 2
lam_r1, lam_r2 = 3, 4
x1, x2 = num_after_move
def prob_fn(x, cond, lam):
# If condition is true use P(X=x), if false use 1-P(X<=x-1) = P(X >= x)
return poisson.pmf(x, mu=lam) if cond else 1 - poisson.cdf(x-1, mu=lam)
Probabilities for numbers of cars rented and added back
For each location we go through all the valid values of $n_r$ and $n_b$, calculate the next state, rental credit and the probability of $(s’, r)$ pair to which these values give rise.
location_dicts = [dict(), dict()]
for idx, (xi, lam_ri, lam_bi) in enumerate(zip((x1, x2), (lam_r1, lam_r2), (lam_b1, lam_b2))):
for nr_i in range(0, xi+1):
p_nri = prob_fn(nr_i, nr_i < xi, lam_ri)
n_space_i = 20 - (xi - nr_i)
for nb_i in range(0, n_space_i + 1):
p_nbi = prob_fn(nb_i, nb_i < n_space_i, lam_bi)
s_next_i = xi - nr_i + nb_i
location_dicts[idx][(nr_i, nb_i)] = (s_next_i, 10 * nr_i, p_nri * p_nbi)
Four argument function
Next we combine the states from the two locations, calculate the total reward and $p(s’, r|s, a, n_{b,1}, n_{b,2}, n_{r,1}, n_{r,2})$ by multiplying the probabilities from each location.
We then accumulate the probabilities for the $(s’, r)$ pairs given each combination of $n_{b,1}, n_{b,2}, n_{r,1}, n_{r,2}$ to arrive at the values of $p(s’, r|s, a)$
psrsa = dict()
move_cost = 2 * abs(act)
for (nr1, nb1), (s_next1, r1, prob1) in location_dicts[0].items():
for (nr2, nb2), (s_next2, r2, prob2) in location_dicts[1].items():
s_next = (s_next1, s_next2)
r = -move_cost + r1 + r2
key = (s_next, r)
prob = prob1 * prob2
if key not in psrsa:
psrsa[key] = prob
else:
psrsa[key] += prob
return psrsa
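The truncation handled by prob_fn is worth a note: when the quantity can be at most some capacity $x$, the tail probability $P(X \geq x)$ is lumped onto the boundary value so that the $x + 1$ probabilities form a proper distribution. A minimal standalone check (with illustrative values of the rate and capacity):

```python
import numpy as np
from scipy.stats import poisson

# For n < x use P(X = n); at the boundary n = x use P(X >= x) = 1 - P(X <= x-1).
# The x + 1 truncated probabilities then sum to exactly 1.
lam, x = 3, 5
probs = [poisson.pmf(n, mu=lam) if n < x else 1 - poisson.cdf(n - 1, mu=lam)
         for n in range(x + 1)]
assert np.isclose(sum(probs), 1.0)
```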
Whilst straightforward to understand, this implementation is quite slow as we are looping through all the valid values of $n_{r,i}$ and $n_{b,i}$ for each location.
s_init = (12, 10)
n_move = 4
%timeit get_four_arg_iter(s_init, n_move)
56.6 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
During the policy iteration algorithm the function will be called multiple times so ideally we need a faster running time which we can achieve by vectorising the implementation.
The function will use numpy to calculate the probabilities for all pairs of requests and returns at a given location in parallel. It will then use pandas to group by the $(s’,r)$ pairs to which each $(n_{r,1}, n_{r,2}, n_{b,1}, n_{b,2})$ combination gives rise, in an efficient manner.
Let us first define a helper function that creates a unique index for each state, which makes it easy to iterate through the states whilst running the algorithm.
def get_idx(s1, s2):
# Flatten the 2d state space into a 1d space
return s1 * 21 + s2
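A quick check that the flattening is invertible: since $s_2$ takes 21 possible values, divmod recovers the state from the flat index.

```python
def get_idx(s1, s2):
    # Flatten the 2d state space into a 1d space
    return s1 * 21 + s2

idx = get_idx(12, 10)
print(idx)              # 262
print(divmod(idx, 21))  # (12, 10) -- divmod inverts the flattening
```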
Now we can implement the four argument function in a vectorised manner.
def get_four_arg_vect(state, act):
First steps
Find the intermediate state after action and set up the parameters and function for the Poisson distributions.
num_after_move = get_state_after_act(state, act)
lam_b1, lam_b2 = 3, 2
lam_r1, lam_r2 = 3, 4
x1, x2 = num_after_move
def prob_fn(x, cond, lam):
# If condition is true use P(X=x), if false use 1-P(X<=x-1) = P(X >= x)
return np.where(cond,
poisson.pmf(x, mu=lam),
1 - poisson.cdf(x-1, mu=lam))
Probabilities and next states for each location
This is similar to the iterative implementation above, but we calculate the probabilities and next states for each combination of nr and nb at a given location at the same time rather than looping through them.
One subtlety is that the maximum value of $n_{b,i}$ is $20 - (x_i - n_{r,i})$, but a simple vectorised approach (one that avoids structures like ragged arrays) requires all rows of the array to have the same number of columns. To handle this we use masking to filter out the invalid combinations.
location_arrs = []
for xi, lam_ri, lam_bi in zip((x1, x2), (lam_r1, lam_r2), (lam_b1, lam_b2)):
# Define nr, calculate probability of nr given xi and calculate the number of spaces for nb
# Shape: [xi + 1]
nr_i = np.arange(0, xi+1)
p_nri = prob_fn(nr_i, nr_i < xi, lam_ri)
n_space_i = 20 - (xi - nr_i)
# All the possible values of nb that can arise from a given xi
# Not all lead to valid combinations of (nr, nb) so we will mask them out later
# Note that np.max(n_space_i) = 20 - xi
# Shape: [20 - xi + 1]
nb_i = np.arange(np.max(n_space_i) + 1)
# Note that the condition is nb_i < n_space_i[:, None]
# which has shape [xi + 1, 20 - xi + 1].
# This ensures we calculate probabilities with a different upper limit for each row.
p_nbi = prob_fn(nb_i, np.less(nb_i, n_space_i[:, None]), lam_bi)
# Mask to exclude invalid combinations of (nr, nb)
# which occur when nb exceeds the number of spaces available
# Shape: [xi + 1, 20 - xi + 1]
mask_i = np.less_equal(nb_i, n_space_i[:, None])
# Select the valid pairs
nr_i, nb_i = np.where(mask_i)
# Find value of next state and probability of next state
s_next_i = xi - nr_i + nb_i
prob_nbi_nri = (p_nbi * p_nri[:, None])[mask_i]
location_arrs.append((s_next_i, nr_i, prob_nbi_nri))
Combine the states
At this point we have all the valid combinations of $(n_{r,1}, n_{b,1})$ and $(n_{r,2}, n_{b,2})$. We now combine them to get all the valid combinations of $(n_{r,1}, n_{b,1}, n_{r,2}, n_{b,2})$ and the states, rewards and probabilities that arise from them.
(s_next1, s_next2), (n_rent1, n_rent2), (prob1, prob2) = [
map(np.ravel, np.meshgrid(arr1, arr2))
for (arr1, arr2) in zip(*location_arrs)
]
n_rent = n_rent1 + n_rent2
prob = prob1 * prob2
Final Dataframe
We store the $(s’, r_c)$ pairs, where $r_c$ is the rental credit, rather than the $(s’, r)$ pairs in the dataframe, but we can easily convert the former to the latter by subtracting the cost of moving the cars. Apart from this cost, $p(s’, r \vert s, a)$ depends only on the intermediate state $x = (x_1, x_2)$, so storing the values in this way lets us reuse the dataframe for all the $(s, a)$ pairs that lead to a given $x$. Since the cost 2*abs(n_a) is constant for all the $(s’, r)$ given $(s, a)$, the grouping in groupby is independent of this constant offset. The probabilities in the prob column are also correct despite the offset, because the probability stored against $(s’, r + 2\vert n_a \vert)$ is exactly $p(s’, r \vert s, a)$.
df = pd.DataFrame(
{'s1': s_next1,
's2': s_next2,
'nr': n_rent,
'prob': prob
}
)
df = df.groupby(['s1', 's2', 'nr'], as_index=False).prob.sum()
df['r'] = df['nr'] * 10
# Add flat index to help iterate through states
df['idx'] = get_idx(df['s1'], df['s2'])
return df
Let us verify that this method matches the iterative implementation for the same values of s_init
and n_move
pdict_iter = get_four_arg_iter(s_init, n_move)
z = get_four_arg_vect(s_init, n_move)
pdict_vect = z.assign(rr = -2*abs(n_move) + z['r']).set_index(['s1','s2', 'rr']).to_dict()['prob']
pdict_vect = {((_s1, _s2), _r): p for (_s1, _s2, _r), p in pdict_vect.items()}
assert set(pdict_iter) == set(pdict_vect)
for i in pdict_iter:
assert np.isclose(pdict_iter[i], pdict_vect[i])
del z, pdict_vect, pdict_iter
We can also see that the function runs considerably faster than the iterative implementation.
%timeit get_four_arg_vect(s_init, n_move)
9.01 ms ± 365 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
To find an optimal policy we start with some arbitrary policy and repeat the following two steps, policy evaluation and policy improvement, until the policy no longer changes.
Because the Jack’s Car Rental MDP is finite, it has only a finite number of policies, which means that policy iteration must converge to an optimal policy and the optimal value function in a finite number of iterations.
To get started let us define the state space and some other values that we will need later. Since the state space is finite it can be enumerated and stored in an array. We will use a 1d array to store the state indices and a dictionary to map the index of each state to the state itself. We will also define some parameters including the discount factor $\gamma$ and the threshold $\theta$ for the policy evaluation step.
state_tuples = list(itertools.product(range(21), range(21)))
state_idx = [get_idx(*s) for s in state_tuples]
states = dict(zip(state_idx, state_tuples))
gamma = 0.9
theta = 1e-6
Next we will write a helper function that finds
\[\sum_{s',r} p(s',r \vert s, a)\left[r + \gamma V(s')\right]\]The function assumes the existence of a cache where $p(s’,r \vert s, a)$ dataframes are stored keyed by the intermediate states $x$ following the action. As noted earlier we store the $(s’, r + 2\vert n_a \vert)$ rather than the $(s’, r)$ pairs in the dataframe in order to be able to use the dataframe for different $(s, a)$ pairs that give rise to the same intermediate state $x$.
For instance all the following pairs of $(s, a)$ give rise to the same $x = (19, 17)$
\[((16, 20), -3) \longrightarrow (19, 17) \\ ((17, 19), -2) \longrightarrow (19, 17) \\ ((18, 18), -1) \longrightarrow (19, 17) \\ ((19, 17), 0) \longrightarrow (19, 17) \\ ((20, 16), 1) \longrightarrow (19, 17)\]Using a cache can help avoid unnecessary calculations and speed up the running time of the algorithm.
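A quick check of this many-to-one mapping, using get_state_after_act (repeated here so the snippet stands alone):

```python
def get_state_after_act(state, act):
    # Numbers of cars at each location after the overnight move
    return (min(state[0] - act, 20), min(state[1] + act, 20))

pairs = [((16, 20), -3), ((17, 19), -2), ((18, 18), -1),
         ((19, 17), 0), ((20, 16), 1)]
# Every (s, a) pair above leads to the same intermediate state x = (19, 17),
# so a single cached dataframe serves all five.
assert all(get_state_after_act(s, a) == (19, 17) for s, a in pairs)
```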
def get_value(state, action, values, cache, gamma=0.9):
# Cache the p(s', r | s, a) dataframe keyed by the intermediate state after the move
start = get_state_after_act(state, action)
if start not in cache:
cache[start] = get_four_arg_vect(state, action)
psrsa_df = cache[start]
# Select values of next states
V_s_next = values[psrsa_df['idx'].values].values
# Find final reward by subtracting move cost
rental_credit = psrsa_df['r'].values
move_cost = abs(action) * 2
r = rental_credit - move_cost
# Calculate value
psrsa = psrsa_df['prob'].values
val = ((gamma * V_s_next + r) * psrsa)
return val.sum()
Now we will implement and run policy iteration. The algorithm statement given below is from Policy Iteration (using iterative policy evaluation) for estimating $\pi \approx \pi_*$, found in Chapter 4.3 Policy Iteration of the RL book.
1. Initialization
$V(s) \in \mathbb{R}$ and $\pi(s) \in A(s)$ arbitrarily for all $s \in \mathcal{S}$
# Also initialise a cache and a `history` array to store the intermediate results for plotting
cache = dict()
V = pd.Series(index=state_idx, data=np.zeros(len(states)))
pi = pd.Series(index=state_idx, data=np.zeros(len(states)))
delta_vals = {}
history = []
iters = 0
while True:
2. Policy Evaluation
\[\begin{aligned} &\text{Loop:} \\ &\quad \quad \Delta \leftarrow 0 \\ &\quad \quad \text{Loop for each $s \in \mathcal{S}$:} \\ &\quad \quad \quad \quad v \leftarrow V(s) \\ &\quad \quad \quad \quad V(s) \leftarrow \sum_{s', r} p\left(s',r\vert s, \pi(s)\right)\left[r + \gamma V(s') \right] \\ &\quad \quad \quad \quad \Delta \leftarrow \max\left(\Delta, \left\lvert v - V(s)\right\rvert\right) \\ &\text{until $\Delta < \theta$ (a small positive number determining the accuracy of estimation)} \end{aligned}\]
print('Starting Policy Evaluation')
iters += 1
policy_eval_iters = 0
delta_vals[iters] = []
while True:
delta = 0
for idx in (state_idx):
v = V[idx]
V[idx] = get_value(states[idx], pi[idx], V, cache, gamma)
delta = max(delta, abs(v - V[idx]))
policy_eval_iters += 1
sys.stdout.write('\rIteration {}, Policy Eval Iteration {}, Δ = {:.5e}'.format(iters, str(policy_eval_iters).rjust(2, ' '), delta));
delta_vals[iters].append(delta)
if delta < theta:
break
print()
print('Policy Evaluation done')
print()
3. Policy Improvement
\[\begin{aligned} &policy\text{-}stable \leftarrow true \\ &\text{For each $s \in \mathcal{S}$:} \\ &\quad \quad old\text{-}action \leftarrow \pi(s) \\ &\quad \quad \pi(s) \leftarrow \arg\max_a\sum_{s', r} p\left(s',r\vert s, a\right)\left[r + \gamma V(s') \right] \\ &\quad \quad \text{If $old\text{-}action \neq \pi(s)$, then $policy\text{-}stable \leftarrow false$} \\ &\text{If $policy\text{-}stable$, then stop and return $V \approx v_*$ and $\pi \approx \pi_*$; else go to $2$} \end{aligned}\]
print('Starting Policy Iteration')
policy_stable = True
history.append(pi.copy(deep=True))
for idx in (state_idx):
s = states[idx]
act = pi[idx]
actions = get_allowed_acts(s)
values_for_acts = [get_value(s, a, V, cache, gamma) for a in actions]
best_act_ind = np.argmax(values_for_acts)
pi[idx] = actions[best_act_ind]
if (act != pi[idx]):
policy_stable = False
if policy_stable:
print('Done')
break
else:
print('Policy changed - repeating eval')
print()
print('-' * 100)
Starting Policy Evaluation
Iteration 1, Policy Eval Iteration 96, Δ = 8.99483e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 2, Policy Eval Iteration 76, Δ = 9.01651e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 3, Policy Eval Iteration 70, Δ = 9.62239e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 4, Policy Eval Iteration 52, Δ = 8.39798e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 5, Policy Eval Iteration 17, Δ = 7.18887e-07
Policy Evaluation done
Starting Policy Iteration
Done
We can visualise the progress of the algorithm in the plots below. Evidently policy evaluation converges in fewer sweeps in later iterations of the algorithm, and after the first few iterations it attains a lower final value of $\Delta$.
Having successfully run the algorithm, let us visualise the results by creating a figure similar to Figure 4.2 in the RL book. The figure shows the initial policy $\pi_0$ and the policy after each iteration as a heatmap for each combination of the number of cars at each location, $(s_1, s_2)$. It also shows the value function $v_{\pi_4}$ as a 3D plot for each state.
The policies are in a flattened form so we reshape them into a 2D grid for plotting the results as heatmaps. We also reshape the states and values into 21x21 grids.
pi_grid = np.reshape([pi[i] for i in state_idx], (21, 21))
idx_grid = np.reshape([states[i] for i in state_idx], (21, 21, 2))
V_grid = np.reshape([V[i] for i in state_idx], (21, 21))
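A small standalone check that the flat index produced by get_idx lines up with positions in the reshaped 21x21 grid:

```python
import numpy as np

def get_idx(s1, s2):
    # Flatten the 2d state space into a 1d space
    return s1 * 21 + s2

flat = np.arange(441)
grid = flat.reshape(21, 21)
# Element [s1, s2] of the grid is exactly the flat index get_idx(s1, s2),
# so reshaping preserves the ordering used throughout the algorithm.
assert grid[12, 10] == get_idx(12, 10)
```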
Here is a helper function that adds text labels to the grid. The function takes the states, values, mask and offsets for the text labels as arguments. It also takes a threshold and a boolean lower
which determines whether the mask is applied to values lower or higher than the threshold. The function then plots the values as text on the grid, with the text colour set to white if the mask is satisfied and black otherwise.
def plot_grid(states, values, mask, offx, offy, th, axis=None, lower=True):
# Assumes a colormesh has already been plotted on the axis
# and adds text labels to the grid
axis = plt.gca() if axis is None else axis
for state, value, m in zip(states, values, mask):
axis.text(state[1]+offx,state[0]+offy,value,
color='white' if (m < th if lower else m > th) else 'k')
Finally let us make the plots. The plots are arranged in a 2x3 grid. The first 5 plots show the policies $\pi_0, \pi_1, \pi_2, \pi_3, \pi_4$ and the last plot shows the value function $v_{\pi_4}$.
fig = plt.figure(figsize=(33, 22))
gs = plt.GridSpec(2, 3)
fontsize = 19
cmap = 'YlOrRd'
for t in range(len(history) + 1):
if t < 5:
pi_grid_i = np.reshape([history[t][i] for i in state_idx], (21, 21))
axis = plt.subplot(gs[t])
axis.pcolormesh(pi_grid_i, cmap=cmap)
plot_grid(
idx_grid.reshape([-1, 2]),
pi_grid_i.reshape([-1]),
pi_grid_i.reshape([-1]),
.25, .25, 2, axis=axis, lower=False
)
axis.set_aspect('equal', adjustable='box')
axis.set_title(f'$\pi_{t}$', fontsize=fontsize)
else:
axis = plt.subplot(gs[-1], projection='3d')
X = idx_grid[..., 1]
Y = idx_grid[..., 0]
Z = V_grid
axis.plot_surface(X, Y, Z, cmap='summer', linewidth=0, antialiased=False)
axis.view_init(60, -60)
axis.set_xlim([0, 20])
axis.set_ylim([0, 20]);
axis.set_title('$v_{\\pi_4}$', fontsize=25)
if t in [3, 5]:
space = '\n'*2 if t == 5 else ''
axis.set_ylabel(f'{space}No. cars at location 1', fontsize=fontsize)
axis.set_xlabel(f'{space}No. cars at location 2', fontsize=fontsize)
axis.tick_params(labelsize=fontsize)
Here is an animated version showing how the algorithm progresses towards the final result
And here is an interactive version of the final state-value function