In this blog post, we will show how to use the diffusers library to upscale images using the Stable Diffusion Upscaler model. The model is a diffusion-based super-resolution model that is capable of generating high-quality upscaled images.
The diffusers library provides a simple and easy-to-use interface for working with the Stable Diffusion Upscaler model. This blog post assumes you have installed the diffusers library and have access to a GPU. If you haven't installed the library yet, follow the installation instructions in the official documentation. You can also take a look at my [blog post](/2023/09/16/How-I-Ran-Textual-Inversion.html), which covers how to set up the library on an AWS EC2 instance.
Let us get started with imports and setting up a pipeline to do the upscaling. The pipeline will take care of loading the models and weights and provide a simple interface that takes as input an image and returns the upscaled image.
from PIL import Image
import numpy as np
from diffusers import StableDiffusionUpscalePipeline
import torch
# load model and scheduler
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_id, variant="fp16", torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
Loading pipeline components...: 100%|██████████| 6/6 [00:01<00:00, 5.41it/s]
Let us download an image of a sunflower head and use it as an example for super-resolution. The image contains a lot of texture and detail, which makes it a good candidate to demonstrate the capabilities of the Stable Diffusion model for super-resolution.
!wget https://upload.wikimedia.org/wikipedia/commons/4/44/Helianthus_whorl.jpg
--2024-02-01 23:40:22-- https://upload.wikimedia.org/wikipedia/commons/4/44/Helianthus_whorl.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 198.35.26.112, 2620:0:863:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|198.35.26.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 327296 (320K) [image/jpeg]
Saving to: ‘Helianthus_whorl.jpg’
Helianthus_whorl.jp 100%[===================>] 319.62K --.-KB/s in 0.02s
2024-02-01 23:40:22 (19.4 MB/s) - ‘Helianthus_whorl.jpg’ saved [327296/327296]
img = Image.open('Helianthus_whorl.jpg')
img.size
(640, 480)
The model upscales to 4x the initial size (hence stable-diffusion-x4-upscaler), so let us downscale the image to a quarter of the original dimensions and then rescale it using the model.
new_size = tuple(np.floor_divide(img.size, 4).astype('int'))
low_res_img = img.resize(new_size)
low_res_img.size
(160, 120)
We can see that the downscaled image is quite blurry with a lot of the textural details lost.
import matplotlib.pyplot as plt
import numpy as np
# List of image arrays
image_arrays = [img, low_res_img]
titles = ["Original Image", "Downscaled Image" ]
# Create subplots
fig, axes = plt.subplots(1, len(image_arrays), figsize=(15, 5))
# Iterate over image arrays and titles
for i, (img_array, title) in enumerate(zip(image_arrays, titles)):
    axes[i].imshow(img_array)
    axes[i].set_title(title)
    axes[i].axis('off')
# Adjust layout and display the plot
plt.tight_layout()
Image credit: L. Shyamal, CC BY-SA 2.5 https://creativecommons.org/licenses/by-sa/2.5, via Wikimedia Commons
Upscaling using the super-resolution model is a simple matter of calling pipeline with the prompt and image. There are other settings that can be adjusted, such as the number of iterations and the number of images to generate; refer to the documentation for more details. Here we will use the default settings of a single image and 75 iterations.
result = pipeline(prompt='Sunflower head displaying the floret arrangement',
                  image=low_res_img)
upscaled_image = result.images[0]
100%|██████████| 75/75 [00:13<00:00, 5.42it/s]
Since this model increases the size of the image by 4x and the input was the original image downscaled by 4x, the super-resolved image should be the same size as the original image.
upscaled_image.size
(640, 480)
Below we plot the downscaled input image, the downscaled image resized using bicubic interpolation, the super-resolution model output and the original image.
import matplotlib.pyplot as plt
# List of image arrays
image_arrays = [low_res_img, low_res_img.resize(img.size), upscaled_image, img]
titles = ["Downscaled", "Bicubic interpolation", "Super-resolution", "Original"]
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
axes = axes.ravel()
# Iterate over image arrays and titles
for i, (img_array, title) in enumerate(zip(image_arrays, titles)):
    axes[i].imshow(img_array)
    axes[i].set_title(title, fontsize=16)
    axes[i].axis('off')
# Adjust layout and display the plot
plt.tight_layout()
Image credit: L. Shyamal, CC BY-SA 2.5 https://creativecommons.org/licenses/by-sa/2.5, via Wikimedia Commons
It can be seen that the super-resolved image is superior to the one rescaled using bicubic interpolation and restores a lot of the textural detail that was lost during downscaling. However, it is not perfect. For example, the innermost florets have the same shape as the outer ones, whereas in the original image they are pointy and directed inwards. This area in the downscaled image is quite blurred and the model fails to recover the details of the original image.
In this simple example, we have only touched on the basics of diffusion-based super-resolution and the capabilities of the Stable Diffusion Upscaler model in order to get you started. I encourage you to explore the various settings and options that can be adjusted to obtain different and possibly better results.
This blog post will guide you through the steps of creating matrices in LaTeX. It will start with the general syntax and then explain how to create row and column vectors, determinants, arbitrary sized matrices and nested matrices. It will conclude with several examples of real world matrices and the use of matrices in mathematical expressions.
In LaTeX, matrices are created using the “bmatrix” environment. The general syntax for a matrix is as follows:
\begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i \\
\end{bmatrix}
Replace the placeholders (a, b, c, etc.) with your desired matrix elements. The ampersand (&) is used to separate columns, and the double backslash (\\) indicates the end of a row.
Both column and row vectors are essentially specialized forms of matrices, and you can use the “bmatrix” environment to represent them.
To create a column vector, you can use the “bmatrix” environment with a single column:
\begin{bmatrix}
a \\
b \\
c \\
\end{bmatrix}
Similarly, a row vector is a matrix with a single row:
\begin{bmatrix}
a & b & c
\end{bmatrix}
LaTeX supports various bracket types for matrices.
To represent matrices with parentheses $(\cdot)$ you can use the “pmatrix” environment:
\begin{pmatrix}
a & b \\
c & d \\
\end{pmatrix}
To represent determinants, you can use the “vmatrix” or “Vmatrix” environments:
vmatrix: Vertical bars $\vert \cdot \vert$ as brackets
\begin{vmatrix}
a & b \\
c & d \\
\end{vmatrix}
Vmatrix: Double vertical $\Vert \cdot \Vert$ bars as brackets
\begin{Vmatrix}
a & b \\
c & d \\
\end{Vmatrix}
To structure data in a matrix format without any brackets, you can use the “matrix” environment:
\begin{matrix}
a & b \\
c & d \\
\end{matrix}
Sometimes you want to represent a range of elements in a matrix without explicitly listing them all. For example when you have an arbitrary $m \times n$ matrix and you want to show only a few representative elements.
For this, you can use ellipses. There are different commands for horizontal, vertical and diagonal ellipses.
The \dots (or \ldots) command can be used to represent a range of skipped elements in a row. For example, here is a row vector showing only the first two and the last of $n$ elements:
\begin{bmatrix}
a_1 & a_2 & \dots & a_n
\end{bmatrix}
The \vdots command can be used to represent a range of skipped elements in a column. For example, here is a column vector showing only the first two and the last of $n$ elements:
\begin{bmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n
\end{bmatrix}
The \ddots command, often used in combination with \dots and \vdots, can be used to skip columns and rows simultaneously.
\begin{bmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn} \\
\end{bmatrix}
Matrix elements can be arbitrary LaTeX expressions and the matrix will automatically adjust to accommodate the size of the expression.
R = \begin{bmatrix}
\cos \theta &-\sin \theta \\
\sin \theta &\cos \theta
\end{bmatrix}
\nabla \times \mathbf{F} =
\begin{vmatrix}
\boldsymbol{\hat{\imath}} & \boldsymbol{\hat{\jmath}} & \boldsymbol{\hat{k}} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
F_{x} & F_{y} & F_{z}
\end{vmatrix}
W = \frac{1}{\sqrt{N}} \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{N-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(N-1)} \\
1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{N-1} & \omega^{2(N-1)} & \omega^{3(N-1)} & \cdots & \omega^{(N-1)(N-1)}
\end{bmatrix}
where $\omega = e^{-2\pi i/N}$
You can also nest matrices within matrices. Here’s an example of a block diagonal matrix that represents an important quantum computing logic gate called the CNOT gate:
\begin{bmatrix}
I_2 & 0 \\
0 & X
\end{bmatrix}
= \begin{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} & 0 \\
0 & \begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
\end{bmatrix}
where $I_2$ is the $2 \times 2$ identity matrix and $X$ is the Pauli-X matrix
You can nest matrices to any depth as needed. For example, the matrix for another quantum logic gate, the Toffoli gate, is a block diagonal matrix with a nested matrix in the bottom right corner:
\begin{bmatrix}
I_4 & 0 \\
0 & CNOT
\end{bmatrix}
= \begin{bmatrix}
I_4 & 0 \\
0 & \begin{bmatrix}
I_2 & 0 \\
0 & X
\end{bmatrix}
\end{bmatrix}
= \begin{bmatrix}
I_4 & 0 \\
0 & \begin{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} & 0 \\
0 & \begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
\end{bmatrix}
\end{bmatrix}
where $I_4$ is the $4 \times 4$ identity matrix.
Matrices can be incorporated into mathematical expressions just like any other variable. For example, here is the rotation of an arbitrary vector $\mathbf{v}$ by an angle $\theta$:
R\mathbf {v} =
\begin{bmatrix}
\cos \theta &-\sin \theta \\
\sin \theta &\cos \theta
\end{bmatrix}
\begin{bmatrix}
x\\y
\end{bmatrix}=
\begin{bmatrix}
x\cos \theta -y\sin \theta
\\x\sin \theta +y\cos \theta
\end{bmatrix}
Here is another example of the exponential of a diagonal matrix that demonstrates how matrices in LaTeX can be seamlessly integrated into a variety of mathematical expressions and how they play nicely with other LaTeX capabilities like superscripts and brackets.
\begin{aligned}
e^{\operatorname{diag}\left(
\begin{bmatrix}
a_1 \\
a_2
\end{bmatrix}
\right)} &= \exp\left(\begin{bmatrix}
a_1 & 0 \\
0 & a_2
\end{bmatrix}\right)
\\&= \sum_{i=0}^\infty \frac{1}{i!} \begin{bmatrix}
a_1 & 0 \\
0 & a_2
\end{bmatrix}^i
\\&= \sum_{i=0}^\infty \frac{1}{i!} \begin{bmatrix}
a_1^i & 0 \\
0 & a_2^i
\end{bmatrix}
\\&= \begin{bmatrix}
e^{a_1} & 0 \\
0 & e^{a_2}
\end{bmatrix}
\end{aligned}
In this blog post we covered the key features of representing matrices in LaTeX. You should now be able to create a variety of matrices and incorporate them in your LaTeX code.
Quantum computing is a rapidly growing field that leverages the principles of quantum mechanics to process information. One of the pillars of quantum computing is the quantum circuit, a model for quantum computation in which a computation is represented by a sequence of quantum gates. In this blog we will learn how to create quantum gate and quantum circuit diagrams in Python using the SymPy library.
SymPy is a Python library for symbolic mathematics. It includes a module for quantum computing called sympy.physics.quantum
which we will use to create quantum gates and circuits and to generate their diagrams. We will start from simple single qubit gates to more complex two qubit gates and finally learn to create custom gates and quantum circuits.
This tutorial assumes that you have basic knowledge of quantum computing and quantum gates. I am planning on writing a blog on quantum gates soon, but until then I recommend this Wikipedia article as a good reference for quantum gates.
Let’s start by creating an X gate, also known as a NOT gate. The X gate acts on a single qubit. It is represented by the following matrix:
\[X = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}\]

SymPy has a number of commonly used quantum gates already defined. To instantiate a gate, you need to specify the qubit or qubits it acts on.
Key qubit numbering conventions in SymPy:
- Qubit indices are zero-based, so X(0) acts on the first qubit.
- In circuit diagrams, the topmost wire corresponds to the first qubit index.
Let's now create an X gate acting on the first qubit.
from sympy.physics.quantum.gate import X
gate = X(0)
In order to plot our quantum gate, we'll use the circuit_plot function from the circuitplot module. The circuit_plot function takes two arguments: the gate and the number of qubits in the circuit. Let's plot the X gate we created above.
import sympy.physics.quantum.circuitplot as plot
plot.circuit_plot(gate, 1);
Saving the generated quantum circuit figure is a straightforward process. You can use the savefig function from the matplotlib package to save the figure. Note that savefig must be called while the figure is still active (for example, in the same notebook cell as the plot); otherwise it saves a new, empty figure. Let's plot and save the figure again.
import matplotlib.pyplot as plt
plot.circuit_plot(gate, 1)
plt.savefig("X_gate.png");
The Hadamard gate or $H$ gate is a one-qubit gate which performs a Hadamard transform on the given qubit.
It is represented by the following matrix:
\[H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\]
The steps to create a Hadamard gate are the same as for the X gate. Let's create a Hadamard gate acting on the first qubit and plot it.
from sympy.physics.quantum.gate import H
H_gate = H(0)
plot.circuit_plot(H_gate, 1);
Controlled NOT gate, or CNOT, is a two-qubit gate which takes two inputs, a control and a target qubit and applies a NOT to the target only when the control is $\left\vert 1\right>$. We use the notation CNOTxy to denote a gate where the qubit with index $x$ is the control and qubit with index $y$ is the target.
The matrix representation of CNOT21 i.e. CNOT with the second qubit as the control and the first qubit as the target is:
\[\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\]

I should note that you may find other sources that represent the CNOT gate with the target qubit as the first qubit and the control qubit as the second qubit, in which case the above would be the matrix representation of CNOT12. However, I have tried to use notation that is consistent with the way the gates are instantiated in SymPy.
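If you want to check this matrix yourself, SymPy's represent function (covered in more detail later for circuits) converts a gate into its matrix form for a given number of qubits:

```python
import sympy as sy
from sympy.physics.quantum.gate import CNOT
from sympy.physics.quantum.represent import represent

# Matrix of CNOT21: qubit index 1 is the control, qubit index 0 the target
m = represent(CNOT(1, 0), nqubits=2)
print(m)
```

The printed matrix matches the CNOT21 matrix shown above.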
The CNOT class in SymPy takes two arguments, the control qubit index and the target qubit index, in that order. Let us first construct and plot a CNOT21 gate.
from sympy.physics.quantum.gate import CNOT
CNOT_21 = CNOT(1, 0) # Note the zero based indexing so that 2->1, 1->0
plot.circuit_plot(CNOT_21, 2);
Note that in the diagram the qubits are numbered from top to bottom. The topmost wire corresponds to the first index, the second wire to the second index, and so on. Now let's plot a CNOT12 gate, where the control and the target are reversed.
CNOT_12 = CNOT(0, 1)
plot.circuit_plot(CNOT_12, 2);
If you look at the matrix for $X$ you can see that it flips qubits turning $\left\vert 0\right>$ to $\left\vert 1\right>$ and $\left\vert 1\right>$ to $\left\vert 0\right>$. (Recall that the vector representations of the qubits are $\left\vert 0\right> = [1, 0]^T$ and $\left\vert 1\right> = [0, 1]^T$).
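This flipping behavior can be verified symbolically with SymPy's qapply function and Qubit states (a small aside, separate from the plotting workflow):

```python
from sympy.physics.quantum.qapply import qapply
from sympy.physics.quantum.qubit import Qubit
from sympy.physics.quantum.gate import X

# Applying X flips the qubit state
print(qapply(X(0) * Qubit('0')))  # |1>
print(qapply(X(0) * Qubit('1')))  # |0>
```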
The gate is often represented using the symbol $\oplus$ that's used in the CNOT gate. X has a built-in function plot_gate_plus that can plot the gate using this symbol, but it is not used by circuit_plot. Let us subclass X to create XNOT, where we redefine the plot_gate function as an alias for the plot_gate_plus function, maintaining identical functionality otherwise, so that circuit_plot will now plot X using the desired symbol.
class XNOT(X):
    plot_gate = X.plot_gate_plus

plot.circuit_plot(XNOT(0), 1);
You can use the UGate class to create a custom quantum gate. As an example, we'll create a rotation gate. The rotation of a qubit by an angle $\theta$ around the Y-axis of the Bloch sphere is represented by the following matrix:
\[R_Y(\theta) = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, Y\]
Here, $R_Y(\theta)$ represents the Y-axis rotation gate, $I$ is the identity matrix, and $Y$ is the Pauli $Y$ matrix, which is defined as:
\[Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\]

To create a Y rotation gate from the UGate class, you need to instantiate the class with the following arguments:
- the qubit (or tuple of qubit indices) the gate acts on, and
- the matrix of the gate.
Let us write a simple function to create a Y rotation gate.
from sympy.physics.quantum.gate import UGate
import sympy as sy
def R_Y(qubit, theta):
    Y_sy = sy.Matrix([[0, -sy.I], [sy.I, 0]])
    gate = UGate(qubit, sy.eye(2)*sy.cos(theta/2) - sy.I * Y_sy * sy.sin(theta/2))
    return gate
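As a quick sanity check of the rotation matrix itself (plain SymPy matrices, independent of UGate): at $\theta = \pi$ the expression $\cos(\theta/2) I - i \sin(\theta/2) Y$ reduces to $-iY$.

```python
import sympy as sy

Y_sy = sy.Matrix([[0, -sy.I], [sy.I, 0]])
theta = sy.pi

# R_Y(pi) = cos(pi/2) I - i sin(pi/2) Y = -i Y
R = sy.eye(2) * sy.cos(theta / 2) - sy.I * Y_sy * sy.sin(theta / 2)
print(R)  # Matrix([[0, -1], [1, 0]])
```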
Let us now define an $R_Y\left(\frac{\pi}{4}\right)$ gate.
R_Y_piby4 = R_Y(0, sy.pi/4)
If you now try to plot the gate, this is what it looks like:
plot.circuit_plot(R_Y_piby4, 1);
As you can see, this is labeled as a generic $U$ gate. To customize the label, you can modify the gate_name_latex attribute of the gate. Let's update the function accordingly.
def R_Y(qubit, theta):
    Y_sy = sy.Matrix([[0, -sy.I], [sy.I, 0]])
    gate = UGate(qubit, sy.eye(2)*sy.cos(theta/2) - sy.I * Y_sy * sy.sin(theta/2))
    name = f'R_Y\\left({sy.latex(theta)}\\right)'
    gate.gate_name_latex = name
    gate.gate_name = name
    return gate
Now, if you create and plot the gate again, you’ll see it is correctly labeled as $R_Y\left(\frac{\pi}{4}\right)$.
R_Y_piby4 = R_Y(0, sy.pi/4)
plot.circuit_plot(R_Y_piby4, 1);
We can also construct custom multi-qubit gates. As an example, let’s create a Toffoli gate which takes three inputs, two of which are controls whilst the other is the target. The Toffoli gate applies a NOT to target only when both controls are $\left\vert 1\right>$. It is effectively a controlled-CNOT gate.
The matrix representation of the Toffoli gate is:
\[\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}\]

To construct the Toffoli gate we can use the CGate class. The CGate class takes two arguments:
- the control qubit index (or a list of control qubit indices), and
- the gate to be controlled.
Since the Toffoli gate can be represented as controlled CNOT gate, we can create a Toffoli gate as follows:
from sympy.physics.quantum.gate import CGate
control_qubit1 = 2
control_qubit2 = 1
target_qubit = 0
toffoli_1 = CGate([control_qubit1], CNOT(control_qubit2, target_qubit))
plot.circuit_plot(toffoli_1, 3);
You can also regard the Toffoli gate as a doubly-controlled NOT gate where the first two qubits are controls and the third qubit is the target. In this case, we can create a Toffoli gate (using XNOT to ensure the NOT symbol is used in the plot) as follows:
toffoli_2 = CGate([control_qubit1, control_qubit2], XNOT(target_qubit))
plot.circuit_plot(toffoli_2, 3);
As you can see, both the representations are equivalent and lead to identical gate diagrams.
A sequence of quantum gates gives rise to a quantum circuit. You can construct a quantum circuit in SymPy by multiplying gates together; as in matrix multiplication, the rightmost gate in the product is applied first.
A simple quantum circuit uses $X$ and rotation gates to create a $H$ gate. It is straightforward to show that
\[H = R_Y\left(-\frac{\pi}{4}\right)X R_Y\left(\frac{\pi}{4}\right)\]

Let's create and plot the circuit in SymPy.
circuit = R_Y(0, -sy.pi/4) * X(0) * R_Y(0, sy.pi/4)
plot.circuit_plot(circuit, 1);
Note that since the gates in the circuit diagram are applied from left to right, the order of the gates in the circuit is the reverse of the order in which they are multiplied to form $H$. To confirm that the circuit is indeed equivalent to the $H$ gate, we can use the represent function from the represent module to get the matrix representation of the circuit.
from sympy.physics.quantum.represent import represent
represent(circuit, nqubits=1).simplify()
$\displaystyle \left[\begin{matrix}\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2}\\ \frac{\sqrt{2}}{2} & - \frac{\sqrt{2}}{2}\end{matrix}\right]$
which is indeed the matrix representation of the Hadamard gate.
Now we will implement a more complex circuit involving 2 qubits. The CNOT gate has the property that if you apply a Hadamard gate to both qubits at the input as well as the output, you get a CNOT gate with the control and target qubits swapped.
Let’s see how this comes about. First, a Hadamard gate applied in parallel to both qubits gives rise to the following matrix
\[H\otimes H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}\]

The CNOT12 gate has the following matrix:

\[CNOT_{12} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}\]

Using these two matrices, it is straightforward to show that

\[(H\otimes H)\, CNOT_{21}\, (H\otimes H) = CNOT_{12}\]

Let us plot the circuit.
circuit = H(0) * H(1) * CNOT_21 * H(0) * H(1)
plot.circuit_plot(circuit, 2);
We can also get the matrix representation of the circuit to confirm that it is indeed a CNOT12 gate.
matrix = represent(circuit, nqubits=2)
from IPython.display import Markdown
## Needed to make the matrix display correctly in the markdown document
Markdown(f'$${sy.latex(matrix)}$$')
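The same identity can also be checked with plain matrices, independent of the gate machinery, using TensorProduct on explicit SymPy matrices:

```python
import sympy as sy
from sympy.physics.quantum import TensorProduct

H_m = sy.Matrix([[1, 1], [1, -1]]) / sy.sqrt(2)
HH = TensorProduct(H_m, H_m)

CNOT21_m = sy.Matrix([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]])
CNOT12_m = sy.Matrix([[1, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0],
                      [0, 1, 0, 0]])

# (H ⊗ H) · CNOT21 · (H ⊗ H) should equal CNOT12
print(sy.simplify(HH * CNOT21_m * HH))
```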
In this blog, we learned how to create quantum gate and quantum circuit diagrams using Python’s SymPy library. We started by creating single qubit gates like the X gate and Hadamard gate. We then moved on to creating multi-qubit gates like the CNOT gate and Toffoli gate. Finally, we learned how to create custom gates and quantum circuits.
This only scratches the surface of SymPy’s quantum computing capabilities and of quantum computing in general. If you want to learn more about quantum computing, I recommend checking out the MIT Open Learning courses on Quantum Information Science (courses 8.370 and 8.371). For more about SymPy’s quantum computing capabilities, check out the SymPy documentation.
In this blog post, we will explore how to implement a minimalist ChatGPT-style app in a Jupyter Notebook or command line. The goal is to provide an understanding of the important concepts, components, and techniques required to create a chat app on top of a large language model (LLM), specifically OpenAI’s GPT. The resulting chat app can serve as a foundation for creating your own customised conversational AI applications.
The code in this blog post can be found in a notebook here. The script for the command line version can be found here.
Let us begin with a quick overview of the Chat Completions endpoint of the OpenAI API, which enables you to interact with OpenAI’s large language models to generate text-based responses in a conversational manner. It’s designed for both single-turn tasks and multi-turn conversations.
Example API Call:
The provided code snippet demonstrates how to make an API call for chat completions. In this example, the chat model used is “gpt-3.5-turbo,” and a conversation is created with system, user, and assistant messages:
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who wrote 'A Tale of Two Cities'?"},
{"role": "assistant", "content": "Charles Dickens wrote 'A Tale of Two Cities'."},
{"role": "user", "content": "When was it first published?"}
]
)
Message Structure:
Conversations are passed via the messages parameter, which is an array of message objects. Each message object has a role (either "system," "user," or "assistant") and content (the text of the message).
Typical Conversation Format:
A typical conversation format starts with a system message, followed by alternating user and assistant messages. The system message helps set the behavior of the assistant, but it’s optional. If omitted, the model’s behavior will be similar to a generic message like “You are a helpful assistant.”
Importance of Conversation History:
Including conversation history is crucial when user instructions refer to prior messages. The model has no memory of past requests, so all relevant information must be supplied within the conversation history. If a conversation exceeds the model’s token limit, it needs to be shortened.
Now let us write a set of functions for communicating with OpenAI’s Chat Completions API. These functions will serve as the backbone of our minimalist chat app. Each function plays a specific role in managing the conversation, formatting messages, and handling responses.
import openai
import json
import os
import sys
# Uncomment and replace with your api key or api key path
# openai.api_key = YOUR_API_KEY
# openai.api_key_path = YOUR_API_KEY_PATH
def get_system_message(system=None):
    """
    Generate a system message for the conversation.

    Args:
        system (str, optional): The system message content. Defaults to None.

    Returns:
        dict: A message object with 'role' set to 'system' and 'content' containing the system message.
    """
    if system is None:
        system = "You are a helpful assistant."
    return {"role": "system", "content": system}
get_system_message is responsible for creating a system message. This message is optional but can be used to set the behavior of the assistant. If no system message is provided, it defaults to "You are a helpful assistant." The function returns a message object with 'role' set to 'system' and 'content' containing the system message.
def get_response(msg,
                 system_msg=None,
                 msgs=[], model='gpt-4',
                 return_incomplete=False):
    """
    Get a response from the Chat Completions API.

    Args:
        msg (str): The user's message.
        system_msg (str, optional): The system message. Defaults to None.
        msgs (list, optional): Previous conversation messages. Defaults to an empty list.
        model (str, optional): The chat model to use. Defaults to 'gpt-4'.
        return_incomplete (bool, optional): Whether to return incomplete responses. Defaults to False.

    Returns:
        list or tuple: A list of response chunks if not returning incomplete, or a tuple containing the list of chunks and a completion status.
    """
    _stream_response = openai.ChatCompletion.create(
        model=model,
        messages=[
            system_msg if system_msg is not None else get_system_message(),
            *msgs,
            {"role": "user", "content": msg}
        ],
        stream=True
    )
    _chunks = []
    complete = False
    try:
        for _chunk in _stream_response:
            _delta = _chunk['choices'][0]['delta']
            # The last chunk has an empty delta
            if 'content' in _delta:
                sys.stdout.write(_delta['content'])
            _chunks.append(_chunk)
        complete = True
    except KeyboardInterrupt:
        # Re-raise the interrupt unless incomplete responses are wanted
        if not return_incomplete:
            raise
    return _chunks if not return_incomplete else (_chunks, complete)
get_response is the core function for obtaining a response from the Chat Completions API. It takes the user's message, an optional system message, previous messages, the model to use, and a flag to indicate whether incomplete responses should be returned.
The API call sets stream=True to stream the response chunks. As they arrive, the chunks are printed and accumulated in _chunks.
If return_incomplete is set to True, the function returns a result even if the stream is interrupted; in this case, it returns a tuple containing the list of chunks and a completion status. If return_incomplete is False, it only returns a result when the full stream has been processed, and it returns only the list of chunks.
def stream2msg(stream):
"""
Convert a stream of response chunks into a single message.
Args:
stream (list): A list of response chunks.
Returns:
str: A single message containing the concatenated content of the response chunks.
"""
return "".join([i["choices"][0]["delta"].get("content", "") for i in stream])
stream2msg is a utility function that converts a stream of response chunks into a single message. It takes a list of response chunks as input and concatenates the content of each chunk to form a complete message.
def format_msgs(inp, ans):
"""
Format user input and model's response into message objects.
Args:
inp (str): User input message.
ans (str or list): Model's response message as a string or a list of response stream chunks
Returns:
list: A list containing user and assistant message objects.
"""
msg_inp = {"role": "user", "content": inp}
msg_ans = {"role": "assistant", "content": stream2msg(ans) if not isinstance(ans, str) else ans}
return [msg_inp, msg_ans]
format_msgs takes the user's input and the model's response (which can be a message string or a list of response chunks) and creates a list containing message objects for both the user and the assistant, which can subsequently be used in the conversation history.
Before we delve into the implementation details, let us briefly discuss token counting. Tokens are chunks of text that language models use to process and generate responses. It’s crucial to keep track of token counts, as they impact the cost and feasibility of using the API. Token counting includes both input and output tokens. This means that not only the messages you send to the model but also the responses you receive contribute to the total token count.
The exact way tokens are counted can vary between different model versions. The function below for counting tokens is adapted from the OpenAI API documentation (dated 05.09.2023). It was written for gpt-3.5-turbo-0613 and serves as a reference. The documentation adds this caveat:
The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate.
Depending on the model, the value returned by the function might not be exact but it will be a decent estimate that suffices for this simple example.
It’s also worth noting that each model has a maximum token limit. Exact details for each model are available in the Models section of the documentation. For example, it is 8192 for gpt-4 and 4097 for gpt-3.5-turbo. In our example, we are using the model’s maximum token limit, but in practice, you may want to use a lower value to ensure that both input and output tokens are within the limit.
import tiktoken
def num_tokens_from_messages(messages, model="gpt-4"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
num_tokens_from_messages is a function that takes a list of messages as input and returns the estimated number of tokens used by those messages. It uses the tiktoken library to calculate token counts. The function attempts to get the token encoding for the specified model. If it encounters a KeyError (indicating an unsupported model), it falls back to the “cl100k_base” encoding, which is a reasonable default.

The function initializes num_tokens to 0, which will be used to accumulate the token count. For each message it adds a fixed overhead of four tokens for the <im_start>, role or name, content, and <im_end> tags, plus the encoded length of each field, subtracting one token when a name is present because the role is then omitted. Finally, two tokens are added because every reply is primed with <im_start>assistant.

In the context of managing conversations with language models, it’s crucial to ensure that the conversation history remains within the model’s token limit. To achieve this, we have a function called maybe_truncate_history which helps truncate the conversation history when it approaches or exceeds the maximum token limit.
Here’s an overview of this function and its purpose:
def maybe_truncate_history(msgs, max_tokens, model='gpt-4', includes_input=True):
    msgs_new = []
    if msgs[0]['role'] == 'system':
        msgs_new.append(msgs[0])
        start = 1
        msgs = msgs[1:]
    if includes_input:
        # At least the last message should be included if input
        msgs_new.append(msgs[-1])
        msgs = msgs[:-1]
    # First ensure that input (and maybe system) messages don't exceed token limit
    tkns = num_tokens_from_messages(msgs_new, model=model)
    if tkns > max_tokens:
        return False, tkns, []
    # Then retain latest messages that fit within token limit
    for msg in msgs[::-1]:
        msgs_tmp = msgs_new[:1] + [msg] + msgs_new[1:]
        tkns = num_tokens_from_messages(msgs_tmp, model=model)
        if tkns <= max_tokens:
            msgs_new = msgs_tmp
        else:
            break
    return True, tkns, msgs_new
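To see the truncation strategy in isolation, here is a self-contained sketch. The token counter is a deliberately crude stand-in (one token per word plus a fixed per-message overhead, an assumption for illustration only); the real function uses tiktoken via num_tokens_from_messages.

```python
# Toy stand-in for the tiktoken-based counter: one token per word plus
# the fixed per-message overhead. Illustrative only.
def toy_token_count(messages):
    tokens = 3  # rough fixed overhead per request (assumption)
    for m in messages:
        tokens += 4 + len(m["content"].split())
    return tokens

def toy_truncate(msgs, max_tokens):
    kept = [msgs[0], msgs[-1]]      # always keep system msg and new input
    for msg in reversed(msgs[1:-1]):  # newest history first
        candidate = [kept[0]] + [msg] + kept[1:]
        if toy_token_count(candidate) <= max_tokens:
            kept = candidate
        else:
            break
    return kept

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first question about chocolate"},
    {"role": "assistant", "content": "a long answer " * 10},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "short answer"},
    {"role": "user", "content": "new input"},
]
trimmed = toy_truncate(history, max_tokens=40)
print(len(trimmed))  # the oldest, longest exchange gets dropped first
```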
maybe_truncate_history
is designed to manage the length of conversation history within the token limit of the model. It takes as input the current list of messages (msgs), the maximum token limit (max_tokens), the model name (defaulting to gpt-4), and a flag indicating whether the user’s input is present in the messages, to ensure it is not dropped.
If the first message in the conversation history is a system message, it is added to msgs_new, and start is set to 1. This step is necessary because system messages should not be truncated. If includes_input is True, the last message (usually the user’s input) is added to msgs_new and removed from the msgs list.
The function first checks if the token count of the messages in msgs_new exceeds max_tokens. If it does, it returns False, the token count (tkns), and an empty list to indicate that the conversation history cannot be accommodated within the token limit.
Next, the function attempts to retain the latest messages that fit within the token limit. It iterates through the msgs list in reverse order, gradually adding messages to msgs_tmp. If the token count of msgs_tmp is within max_tokens, it updates msgs_new with msgs_tmp. This ensures that the conversation history retains as much context as possible while staying within the token limit.
The function returns True to indicate that the conversation history has been successfully truncated to fit within the token limit, along with the updated token count (tkns) and the modified msgs_new.
This is a simple approach to managing token counts: it drops entire messages to keep within the token limit. There are more sophisticated approaches that you could try, such as summarising or filtering earlier parts of the conversation.
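As a sketch of the summarisation idea, the snippet below folds older messages into a single summary message. The summariser here is a trivial stand-in that just concatenates truncated contents; a real implementation would call the model itself to produce the summary, and all names are hypothetical.

```python
# Sketch: compress older history into one summary message instead of
# dropping it. summarise_stub stands in for a model-generated summary.
def summarise_stub(messages):
    topics = "; ".join(m["content"][:30] for m in messages)
    return {"role": "system",
            "content": f"Summary of earlier conversation: {topics}"}

def compact_history(msgs, keep_last=2):
    # Keep the most recent messages verbatim, fold the rest into a summary
    if len(msgs) <= keep_last:
        return msgs
    return [summarise_stub(msgs[:-keep_last])] + msgs[-keep_last:]

history = [
    {"role": "user", "content": "What spices go well with chocolate?"},
    {"role": "assistant", "content": "Cinnamon, nutmeg, chili powder..."},
    {"role": "user", "content": "Why does cinnamon go well?"},
    {"role": "assistant", "content": "It adds warmth and depth."},
]
compacted = compact_history(history, keep_last=2)
print(len(compacted))  # 3: one summary message plus the last two
```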
Finally, we are in a position to implement our minimalist chat app. The MinChatGPT class defines its foundation. First, let us set up the class and add some helper functions; subsequently we will implement the conversation functionality.
class MinChatGPT(object):
    """
    A simplified ChatGPT chatbot implementation.

    Parameters:
        system: A system-related parameter (optional).
        model: The OpenAI model to use; restricted to 'gpt-3.5-turbo' and 'gpt-4'.
        log: Boolean that decides if logging is required or not.
        logfile: The location of the file where chat logs will be stored.
        errfile: The location of the file where error logs will be stored.
        include_incomplete: Boolean that decides if incomplete responses are to be included in the history or not.
        num_retries: The number of times to retry if there is a connection error.
        mock: Boolean that decides if the system is in testing mode.
        debug: Boolean that decides if the system should go into debug mode.
        max_tokens: Maximum number of tokens the model can handle while generating responses.
    """
    def __init__(self,
                 system=None,
                 model='gpt-4',
                 log=True,
                 logfile='./chatgpt.log',
                 errfile='./chatgpt.error',
                 include_incomplete=True,  # whether to include incomplete responses in history
                 num_retries=3,
                 mock=False,
                 debug=False,
                 max_tokens=None):
        """
        Initializes a MinChatGPT instance with provided parameters.
        """
        # For simplicity restrict to these two
        assert model in ['gpt-3.5-turbo', 'gpt-4']  # Ensures the model parameter is valid
        # System & GPT model related parameters
        self.system = system
        self.system_msg = get_system_message(system)  # Retrieve system message if available
        self.model = model
        # Logging related parameters
        self.log = log
        self.logfile = logfile
        self.errfile = errfile
        # Behavioural flags
        self.include_incomplete = include_incomplete
        self.num_retries = num_retries
        self.mock = mock
        self.debug = debug
        # History and error storage related parameters
        self.history = []
        self.history_size = []
        self.errors = []
        # Setting maximum tokens model can handle, defaults are provided for the two specified models
        self.max_tokens = {'gpt-4': 8192, 'gpt-3.5-turbo': 4097}[model] if max_tokens is None else max_tokens

    def _logerr(self):
        with open(self.errfile, 'w') as f:
            f.write('\n'.join(self.errors))

    def _logchat(self):
        with open(self.logfile, 'w') as f:
            json.dump(fp=f, obj={'history': self.history, 'history_size': self.history_size}, indent=4)

    def _chatgpt_response(self, msg="", newline=True):
        sys.stdout.write(f'\nMinChatGPT: {msg}' + ('\n' if newline else ''))
The __init__ method initializes our chatbot instance with parameters such as the system message, the OpenAI model to be used, and several behavioural flags for logging, debugging, or testing (mock). It also sets up self.history, self.history_size, and self.errors for tracking the chat history and potential errors. The max_tokens parameter sets the limit for tokens the model can handle, defaulting to the restrictions of the chosen model. Two helper methods, _logerr and _logchat, save error logs and chat logs respectively to the specified locations, and _chatgpt_response prints the bot’s response to the console.

However, we have not yet implemented the main functionality of the chatbot, which is to manage the conversation. Let us go ahead and implement a chat method that enables the user to interact with the model.
The chat method is the main entry point for initiating a conversation with the MinChatGPT chatbot. It manages user interaction, input processing, handling special cases like an ‘exit’ command or an empty message, generating responses, and logging information if desired. Here is a detailed walkthrough of the chat method.
def chat(self):
    """
    Initiates a chat session with the user. During the chat session, the chatbot will receive user message inputs,
    process them and generate appropriate responses.

    The chat session will continue indefinitely until the user enters a termination command like "Bye", "Goodbye",
    "exit", or "quit". The function also logs the chat session, and any errors that occur during the session.
    """
    # maybe_exit flag for controlling the exit prompt
    maybe_exit = False
    # Welcome message for user
    print('Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.')
User Interaction
The core of the chat method is an infinite while loop that simulates a conversation. The user is asked for an input message, which is then handled in the loop. To allow the user to end the conversation at any point, the code checks for certain phrases such as “bye”, “goodbye”, “exit”, or “quit”.
    # Main chat loop
    while True:
        # Capture user input
        inp = input('\nYour message: ')
Handling Empty Messages
If the user input is an empty string, the method reminds the user to enter a message and goes back to the start of the loop to ask again.
        try:
            # Handling empty input from user
            if len(inp) == 0:
                print('Please enter a message')
                continue
Exiting
If the previous input appeared to indicate an intention to exit (maybe_exit == True), the user is asked for confirmation.
If the user gives an affirmative response, the bot replies with a goodbye and breaks the loop to end the conversation.
If the user does not want to exit, the bot continues to chat.
            # Case insensitive user input
            stripped_lowered_inp = inp.strip().lower()
            # Handling user's confirmation on exit
            if maybe_exit:
                if stripped_lowered_inp in ['y', 'yes', 'yes!', 'yes.']:
                    self._chatgpt_response('Goodbye!')
                    break
                else:
                    self._chatgpt_response("Ok. Let's keep chatting.")
                    maybe_exit = False
                    continue
Intention to exit
This simple approach determines if the user input matches any of the exit signals. If it does, the maybe_exit flag is set to True and in the next interaction the user is asked for confirmation.
You could also try more sophisticated approaches that get the model to infer whether the user wishes to end the conversation.
            # Checking if user wants to exit
            if stripped_lowered_inp in [
                    'exit', 'exit()', 'exit!', 'exit.',
                    'quit', 'quit()', 'quit!', 'quit.',
                    'bye', 'bye!', 'bye.',
                    'goodbye', 'goodbye!', 'goodbye.'
            ]:
                maybe_exit = True
                self._chatgpt_response('Are you sure you want to quit? Enter Yes or No.')
                continue
Process User Inputs
The code next deals with non-empty, non-exit user inputs. It prepares the message history to be sent to the OpenAI model by appending the user’s new message. The history is then checked to ensure it doesn’t exceed the max token limit of the model. If the history is too long, we inform the user, don’t produce a response, and again loop to the start for a new input.
            # Preparing message history before calling the model
            msgs = [self.system_msg, *self.history, {'role': 'user', 'content': inp}]
            # Call to helper function to check if conversation history does not exceed max tokens
            valid, tkns, trimmed = maybe_truncate_history(msgs, max_tokens=self.max_tokens)
            # Strip the system and input messages (passed to get_response separately);
            # slicing also copes with the empty list returned when the input is too long
            msgs_to_send = trimmed[1:-1]
Generate Response and Update History
If the length of the input is within limits, then the bot produces a response. If the system is in mock mode, it just returns a test message. Otherwise, an actual response is generated and delivered to the user. If there is a connection error in getting a response from the API, it retries up to num_retries times. Incomplete messages are handled as per the include_incomplete flag, which determines whether or not to add incomplete responses to the history. The code also saves the length of the history used for this response generation.
            # Handling valid and invalid token scenarios
            if valid:
                # Inform user if history was truncated
                if len(trimmed) < len(msgs):
                    print(f'\nDropping earliest {len(msgs) - len(trimmed)} messages from history to keep within token limits')
                num_api_calls = 0
                if self.mock:
                    # For testing response functionality
                    msg = 'Test message'
                    self._chatgpt_response(msg)
                else:
                    # Generate response from model
                    self._chatgpt_response(newline=False)
                    while True:
                        try:
                            msg, complete = get_response(inp, system_msg=self.system_msg, msgs=msgs_to_send, return_incomplete=True)
                            break
                        except ConnectionResetError:
                            if num_api_calls < self.num_retries:
                                num_api_calls += 1
                            else:
                                raise
                    # Skip to next if incomplete messages not included in history
                    if not complete and not self.include_incomplete:
                        continue
            else:
                # If message exceeds token limit, ask user to reduce message length
                print(f'\nTotal number of {tkns} tokens exceeds max number of tokens allowed. Please try again after reducing message length.')
                continue
            # Keeping track of history size
            self.history_size.append(len(msgs_to_send))
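The retry loop above retries immediately on a connection error. A common refinement is exponential backoff between attempts; here is a generic sketch (the helper name and delays are illustrative, not part of MinChatGPT), demonstrated with a stand-in function that fails twice before succeeding.

```python
import time

# Generic retry-with-backoff sketch; helper name, delays and the
# exception type are illustrative.
def with_retries(fn, num_retries=3, base_delay=0.01):
    """Call fn(), retrying on ConnectionResetError with exponential backoff."""
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except ConnectionResetError:
            if attempt == num_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stand-in endpoint that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError
    return "ok"

result = with_retries(flaky)
print(result, calls["n"])  # ok 3
```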
Logging and Debugging
Log details are printed if the system is in debug mode. Then, a new pair of messages is created from the user input and the generated response and added to the message history. If the log flag is set, the chat history is saved.
            # Debug information provided for development and troubleshooting
            if self.debug:
                print(f"\n\nLast {self.history_size[-1]} message(s) used as history / Num tokens sent: {tkns} / Num retries: {num_api_calls + 1}")
                print("Messages sent:")
                print("=" * 100)
                for i in trimmed:
                    print(f'{i["role"]}: {i["content"]}')
                print("=" * 100)
            # Adding user and assistant messages to chat history
            self.history.extend(format_msgs(inp, msg))
            # Saving chat history if logging is True
            if self.log:
                self._logchat()
Handling Errors
Any exceptions that occur during the above process are caught, added to the bot’s error log, and displayed to the user, who is then invited to try again.
        except Exception as e:  # Exception handling for unexpected inputs or system errors
            self.errors.append(str(e))
            # Logging error details if logging is True
            if self.log:
                self._logerr()
            print(f'\nThere was the following error:\n\n{e}.\n\nPlease try again.')
            continue
Finally, make this a method of the MinChatGPT class.
MinChatGPT.chat = chat
Let us now take a look at a simple demo in debug mode to see what input is given to the model each time. We can also see how it behaves when given an empty input, how it handles exit signals, and what happens when you interrupt it mid-message.
minchat = MinChatGPT(log=True, debug=True)
minchat.chat()
Welcome to MinChatGPT! Type "Bye", "Goodbye", "exit" or "quit", to quit.
Your message:
Please enter a message
Your message: Bye
MinChatGPT: Are you sure you want to quit? Enter Yes or No.
Your message: No
MinChatGPT: Ok. Let's keep chatting.
Your message: What spices and herbs go well with chocolate? Answer as a comma separated list.
MinChatGPT: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
Last 0 message(s) used as history / Num tokens sent: 34
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
====================================================================================================
Your message: Why does cinnamon go well?
MinChatGPT: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
Last 2 message(s) used as history / Num tokens sent: 87
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
====================================================================================================
Your message: Can you give some examples of these dishes?
MinChatGPT: Certainly, here are some examples of chocolate dishes where cinnamon can shine:
1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.
2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.
3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.
4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors
Last 4 message(s) used as history / Num tokens sent: 169
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
====================================================================================================
Your message: Ok got the idea.
MinChatGPT: Great! If you have any other questions or need further information, feel free to ask. Enjoy your culinary adventures with chocolate and cinnamon!
Last 6 message(s) used as history / Num tokens sent: 294
Messages sent:
====================================================================================================
system: You are a helpful assistant.
user: What spices and herbs go well with chocolate? Answer as a comma separated list.
assistant: Cinnamon, nutmeg, chili powder, cardamom, ginger, vanilla, peppermint, lavender, rosemary, star anise, sea salt, cloves, espresso powder.
user: Why does cinnamon go well?
assistant: Cinnamon adds a warmth and complexity to the flavor of chocolate, enhancing its richness and depth. The sweet-spicy character of cinnamon can complement both milk and dark chocolate, and it's often used in various chocolate dishes, such as hot cocoa, truffles, and cakes, to create a more intriguing taste profile.
user: Can you give some examples of these dishes?
assistant: Certainly, here are some examples of chocolate dishes where cinnamon can shine:
1. Cinnamon Hot Chocolate: This beverage combines the richness of chocolate with the warmth of cinnamon, creating a comforting drink.
2. Cinnamon Chocolate Truffles: These desserts blend the two flavors in a sweet, bite-size treat.
3. Mexican Mole Sauce: This traditional dish uses both chocolate and cinnamon (among other ingredients) to create a unique, rich sauce often served over meats.
4. Chocolate and Cinnamon Swirl Bread: A sweet bread where both flavors
user: Ok got the idea.
====================================================================================================
Your message: Goodbye!
MinChatGPT: Are you sure you want to quit? Enter Yes or No.
Your message: Yes
MinChatGPT: Goodbye!
To run this as a command-line application, copy all the code from this notebook into a Python file called minchatgpt.py. Then add this code to the end of the file.
if __name__ == '__main__':
    import argparse
    import os

    # Get key from environment instead of assigning
    openai.api_key = os.environ.get("API_KEY")
    # alternatively
    # openai.api_key_path = os.environ.get("API_KEY_PATH")

    # Define a function to parse boolean arguments
    def bool_arg(s):
        if s.lower() in ['true', 't', 'yes', 'y', '1']:
            return True
        elif s.lower() in ['false', 'f', 'no', 'n', '0']:
            return False
        else:
            raise ValueError('Boolean value expected.')

    parser = argparse.ArgumentParser(
        description='MinChatGPT: A minimalist chat app based on OpenAI\'s GPT model')
    parser.add_argument(
        '--debug', help='Run in debug mode', type=bool_arg, default=False)
    parser.add_argument(
        '--mock', help='Run in mock mode', type=bool_arg, default=False)
    parser.add_argument(
        '--log', help='Log chat history', type=bool_arg, default=True)
    parser.add_argument(
        '--logfile', type=str, default='./chatgpt.log', help='Location of chat history log file')
    parser.add_argument(
        '--errfile', type=str, default='./chatgpt.error', help='Location of error log file')
    parser.add_argument(
        '--model', type=str, default='gpt-4', help='OpenAI model to use')
    parser.add_argument(
        '--include_incomplete', type=bool_arg, default=True,
        help='Include incomplete responses in history')
    parser.add_argument(
        '--num_retries', type=int, default=3,
        help='Number of times to retry if there is a connection error')
    parser.add_argument(
        '--max_tokens', type=int, default=None,
        help='Maximum number of tokens the model can handle while generating responses')

    args = parser.parse_args()
    kwargs = vars(args)
    minchat = MinChatGPT(**kwargs)
    minchat.chat()
To run the application, assign your API key to the API_KEY
environment variable (or to the API_KEY_PATH
environment variable). Then run python minchatgpt.py
with arguments as required. For example, to run in debug mode with logging enabled, you can use the following command:
export API_KEY=YOUR_API_KEY; python minchatgpt.py --debug True --log True
The goal of MinChatGPT is to demonstrate how a chat app can be implemented on top of a conversational language model. Whilst it serves as a useful starting point for engaging with LLMs, it has several limitations at this stage, including:
Lack of Input Moderation: MinChatGPT doesn’t filter or restrict the type of content that users can input. This can potentially lead to inappropriate or offensive messages that might violate the API’s rules.
Inability to Resume Chats or Start New Ones: The app does not provide features for resuming previous conversations from saved history or starting entirely new chats. Users are limited to a single, continuous conversation session. However, it would be fairly straightforward to incorporate these features.
Limited Testing of Chat Logic: MinChatGPT’s chat logic has not been comprehensively tested with a wide range of input combinations. As a result, there may be scenarios where the chat logic behaves unexpectedly, encounters errors or does not properly handle errors.
In this blog post, we explored the building blocks for creating a minimalist chat-style application based on OpenAI’s GPT model within a Jupyter notebook (or command line). We discussed API interaction, token counting, conversation history truncation, and building a chat interface. You can use MinChatGPT as a starting point for building more complex and sophisticated applications. You can also modify it to make it compatible with other LLMs. I encourage you to experiment by adding features, making it more robust, extending its capabilities and adapting it to suit your requirements.
In this blog I outline the steps I took in setting up an AWS EC2 instance to run the Hugging Face Diffusers Textual Inversion tutorial. Note that this is not a tutorial about Textual Inversion or diffusion models. Instead it extends the official tutorial by providing the steps I took to set up an EC2 instance, install the code and run the example.
From the Hugging Face tutorial
Textual Inversion is a training technique for personalizing image generation models with just a few example images of what you want it to learn. This technique works by learning and updating the text embeddings (the new embeddings are tied to a special word you must use in the prompt) to match the example images you provide.
The repo has a link to run on Colab, but I wanted the convenience of running it on an EC2 instance. The process I followed is admittedly hacky and by no means the best or most efficient way to do it, but it was quick and it worked. The idea behind this blogpost is that if you can get it running without too much frustration, you will be motivated to take your learning further, explore the topic in more depth and do things in a more robust way.
The code in this blog is for the PyTorch version of the example, but there is also a Jax version, for which I refer you to the tutorial. You could also probably use a similar process to run any other examples in Diffusers. I think I have given all the commands that I ran, but I may have missed some, so let me know if something does not work.
These are the key details of my AWS EC2 instance (values in square brackets are options to choose or input)
Instance type [g4dn.xlarge]
1 x [140] GiB [gp2] Root volume (Not encrypted)
1 x [100] GiB [gp2] EBS volume (Not encrypted)
I used the Deep Learning AMI (Ubuntu 18.04) Version 64.2 (ami-04d05f63d9566224b).
I also used an existing security group that I had set up previously with an inbound rule with Type All traffic
.
I added the following to my ~/.ssh/config
file on my local machine
Host fusion
AddKeysToAgent yes
HostName <YOUR_INSTANCE_IP_ADDRESS of the form ec2-xx-xxx-xxx-xxx.eu-west-1.compute.amazonaws.com>
IdentityFile <LOCATION_OF_YOUR_PRIVATE_KEY>
User ubuntu
LocalForward localhost:8892 localhost:8892
LocalForward localhost:6014 localhost:6014
I added port forwarding for jupyter (8892) and tensorboard (6014) so that I could access them from my local machine. Then I could run the following to connect to the instance
ssh fusion
The instructions here state that Diffusers has been tested using Python 3.8+.
The instance had Python 3.6 and the installation of the libraries failed so here is what I did to install Python 3.8
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.8
While doing this, I encountered the following error
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
I attempted to rectify this by running the following commands. I can’t recall where I found this solution but some similar solutions are suggested here and here.
sudo rm /var/lib/dpkg/lock-frontend
sudo dpkg --configure -a
The issue did not resolve and I had to reboot the instance and then run the commands again and then the installation worked.
To install pip
I ran the following commands
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.8 get-pip.py
To create virtual environments with venv I needed to install python3.8-venv
sudo apt install python3.8-venv
Finally I set up a virtual environment in my home directory called fusion
with the following command
python3.8 -m venv fusion
Now I was in a position to follow the installation instructions in the tutorial.
First I activated the virtual environment
source ~/fusion/bin/activate
Unless otherwise specified, all the python commands in this article are intended to be run from within the virtual environment.
Then I installed the libraries
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
I got the following error
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-build-ioswilqu/safetensors/setup.py'
According to this StackOverflow post the solution is to run the following command
pip3 install --upgrade pip
Then re-run the installation command to complete the installation.
The next step is to install the dependencies for the example after navigating to the examples/textual_inversion directory.
cd examples/textual_inversion
pip install -r requirements.txt
Accelerate is a library from Hugging Face that helps train on multiple GPUs/TPUs or with mixed-precision. It automatically configures the training setup based on your hardware and environment. You can initialise it with a custom configuration, or use the default configuration.
For the custom configuration, the command is
accelerate config
I used the default configuration
accelerate config default
After you run the command it will tell you where the configuration file is located. For me it was /home/ubuntu/.cache/huggingface/accelerate/default_config.yaml
. Ensure that use_cpu
is set to false
to enable GPU training. Here is what my configuration file looked like
{
"compute_environment": "LOCAL_MACHINE",
"debug": false,
"distributed_type": "NO",
"downcast_bf16": false,
"machine_rank": 0,
"main_training_function": "main",
"mixed_precision": "no",
"num_machines": 1,
"num_processes": 1,
"rdzv_backend": "static",
"same_network": false,
"tpu_use_cluster": false,
"tpu_use_sudo": false,
"use_cpu": false
}
Then I created a new file in examples/textual_inversion
called run.py
and added the following code from the tutorial to download the mini dataset for the example
from huggingface_hub import snapshot_download
local_dir = "./cat"
snapshot_download(
"diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes"
)
Now call the run.py
file to download the dataset
python3 run.py
The dataset is very small, containing only six images of cat toys, as shown below
Next I created another file run.sh
in examples/textual_inversion
and added the following code from the tutorial, making only the change of leaving out the --push_to_hub flag, as I did not want to push the model to the Hugging Face Hub
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="./cat"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" \
--initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 \
--scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat"
Note that in textual_inversion.py the special word used for textual inversion is input via the placeholder_token flag, which in the example is, unsurprisingly, <cat-toy>.
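A note on the --scale_lr flag: in the diffusers training scripts it scales the base learning rate by the effective batch size. If I have read textual_inversion.py correctly (treat the exact formula as an assumption and check the script), with the flags above the computation works out as:

```python
# Effective learning rate when --scale_lr is set, using the values
# from run.sh above (num_processes assumed to be 1 for a single GPU)
learning_rate = 5.0e-04
gradient_accumulation_steps = 4
train_batch_size = 1
num_processes = 1

scaled_lr = learning_rate * gradient_accumulation_steps * train_batch_size * num_processes
print(scaled_lr)  # 0.002
```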
Then I tried to run the example
sh run.sh
but I kept getting this warning
UserWarning: CUDA initialization: The NVIDIA driver on your system is too old
The rest of the warning also suggested downloading a newer driver from here. I went to the page on my local machine and selected the following options
Product Type: Data Center / Tesla
Product Series: T-Series
Product: Tesla T4
Operating System: Linux 64-bit
CUDA Toolkit: 12.2
Language: English (US)
I clicked “Download” in the next page
and then right-clicked on the “Agree & Download” button on the subsequent page and copied the link address
Back in the instance I downloaded the driver using the link
wget https://us.download.nvidia.com/tesla/535.129.03/NVIDIA-Linux-x86_64-535.129.03.run
Then I installed the driver
sudo sh NVIDIA-Linux-x86_64-535.129.03.run
and went through the installation process accepting the default options.
Following the installation, the warning message disappeared and I was able to run the example. The example took about a couple of hours to run, although I left it running and did not monitor the time exactly. It saves tensorboard logs which can be viewed as follows
tensorboard --logdir textual_inversion_cat --port <YOUR_PORT>
You can now view this in the browser on whichever port you have forwarded to the instance. For me it was localhost:6014.
Note that I needed to install six
to get this to work
pip install six
Note that checkpoints and weights are saved in the textual_inversion_cat
directory in the examples/textual_inversion
folder. If you want to redo the training for any reason, delete this directory.
So that I could disconnect from the instance while the process continued running, I ran it in tmux. To start a session run
tmux new -s <YOUR_SESSION_NAME>
Then run the example, making sure the virtual environment is activated and you are in the examples/textual_inversion
directory. To leave the tmux session press Ctrl+b and then d. To reattach to the session run
tmux attach -t <YOUR_SESSION_NAME>
To delete the session either run exit
within the session or from outside the session run
tmux kill-session -t <YOUR_SESSION_NAME>
The tutorial provides an inference script which I have slightly modified to run in a Jupyter notebook.
First I needed to install jupyterlab and to register the virtual environment with jupyter
pip install jupyterlab
python -m ipykernel install --user --name=fusion
Then I created a notebook in inference.ipynb
in examples/textual_inversion
and ran a cell with this code to setup the model
from diffusers import StableDiffusionPipeline
import torch
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_textual_inversion("sd-concepts-library/cat-toy")
Now you can generate an image, noting that the placeholder_token
<cat-toy>
must be present in the prompt
image = pipeline("A <cat-toy> train", num_inference_steps=50).images[0]
Since image
is a PIL
image, you can view it in the notebook by simply running
image
Here are some ideas for what to do next
In this blogpost we will implement Example 4.2: Jack’s Car Rental from Chapter 4 of Reinforcement Learning (Sutton and Barto, aka the RL book). This is an example of a problem involving a finite Markov Decision Process for which policy iteration is used to find an optimal policy.
I strongly suggest that you study Chapter 4 of the book and that you have a go at implementing the example yourself using this blogpost as a reference in case you get stuck.
Here is a slightly modified version of the description of the problem given in Chapter 4.3 Policy Iteration of the RL book.
Example 4.2: Jack’s Car Rental
Jack manages two locations for a nationwide car rental company. Each day, some number of customers arrive at each location to rent cars. If Jack has a car available, he rents it out and is credited \$10 by the national company. If he is out of cars at that location, then the business is lost. Cars become available for renting the day after they are returned. To help ensure that cars are available where they are needed, Jack can move them between the two locations overnight, at a cost of \$2 per car moved. We assume that the number of cars requested and returned at each location are Poisson random variables, meaning that the probability that the number is $n$ is $\frac{\lambda^n}{n!}e^{-\lambda}$, where $\lambda$ is the expected number. Suppose $\lambda$ is 3 and 4 for rental requests at the first and second locations and 3 and 2 for returns. To simplify the problem slightly, we assume that there can be no more than 20 cars at each location (any additional cars are returned to the nationwide company, and thus disappear from the problem) and a maximum of five cars can be moved from one location to the other in one night. We take the discount rate to be $\gamma = 0.9$ and formulate this as a continuing finite MDP, where the time steps are days, the state is the number of cars at each location at the end of the day, and the actions are the net numbers of cars moved between the two locations overnight.
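To make the Poisson assumption concrete, here is a small pure-Python sketch (the helper name is mine) that evaluates $\frac{\lambda^n}{n!}e^{-\lambda}$ for the rental-request rate $\lambda = 3$ at the first location:

```python
import math

def poisson_pmf(n, lam):
    # P(X = n) for a Poisson random variable with mean lam
    return lam**n / math.factorial(n) * math.exp(-lam)

# Probability of 0, 1, 2, 3 rental requests at location 1 (lambda = 3)
probs = [poisson_pmf(n, 3) for n in range(4)]
```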
A Markov Decision Process (MDP) is a mathematical framework used in reinforcement learning to describe an environment. It provides a formalism to make sequential decisions under uncertainty, with an assumption that the future depends only on the current state and not on the past states. An MDP is described by a tuple (S, A, P, R), where S represents states, A represents actions, P is the state transition probability, and R is the reward function.
Using the MDP model, we can characterize Jack’s Car Rental problem as follows. The state is represented by the number of cars at each location at the end of the day, the actions correspond to the number of cars moved from location 1 to location 2, where a negative number means the cars were moved from location 2 to location 1 instead. The state transition probability depends on the number of cars rented and returned, which follow Poisson distributions. The reward function is defined by the profit made from renting cars and the cost of moving cars between locations.
A finite MDP is one in which the sets of states, actions, and rewards (S, A, and R) all have a finite number of elements. Since the states, rewards and actions are integer-valued and take on a finite number of values (e.g. the action is an integer between -5 and +5), this is a finite MDP.
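As a quick check on the sizes involved (a sketch using the same 21×21 state grid and ±5 action range described above):

```python
import itertools

# All (cars at location 1, cars at location 2) pairs, each between 0 and 20
states = list(itertools.product(range(21), range(21)))

# Net moves from location 1 to location 2 (not all are legal in every state)
actions = list(range(-5, 6))

print(len(states), len(actions))  # 441 states, 11 actions
```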
Let us now define the problem more formally.
The probability of ending up in state $s’$ and receiving reward $r$ given that we started in state $s$ and took action $a$, $p(s’, r \vert s, a)$, is often described in the RL book as the four argument function. It is used in the policy evaluation step of policy iteration to calculate the value of a state under a given policy.
Let us now define $p(s’, r \vert s, a)$ for this problem
Since each combination $(n_{r, 1}, n_{r, 2}, n_{b, 1}, n_{b, 2})$ is independent we can sum over their probabilities.
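Spelling this out (a reconstruction based on the description above, not necessarily the post's original notation): letting $x = (x_1, x_2)$ be the numbers of cars after the overnight move, we sum the product of the four Poisson probabilities over all rental/return combinations that produce the pair $(s', r)$:

```latex
p(s', r \vert s, a) \;=\;
\sum_{\substack{(n_{r,1},\, n_{r,2},\, n_{b,1},\, n_{b,2}) \,: \\
s'_i \,=\, x_i - n_{r,i} + n_{b,i}, \quad
r \,=\, 10\,(n_{r,1} + n_{r,2}) - 2\lvert a \rvert}}
\;\prod_{i=1}^{2} P(n_{r,i})\, P(n_{b,i})
```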
Let us now implement the four argument function $p(s’, r \vert s, a)$ for this problem. We will implement it in two ways. First we will implement it in a way that is easy to understand and then we will implement it in a way that is faster to run.
To start with let us import the necessary libraries and define some helper functions.
import sys
import numpy as np
import random
import pandas as pd
from scipy.stats import poisson
import itertools
import matplotlib.pyplot as plt
def get_allowed_acts(state):
# Returns the allowed actions for a given state
n1, n2 = state
acts = [0] # can always move no cars
# We can move upto 5 cars from 1 to 2
# but no more than 20-n2
# so that the total at 2 does not exceed 20
for i in range(1, min(21-n2, n1 + 1, 6)):
acts.append(i)
# The actions are defined as the number of cars moved from 1 to 2
# so if cars are moved in the opposite direction the action is negative
for i in range(1, min(21-n1, n2 + 1, 6)):
acts.append(-i)
return acts
def get_state_after_act(state, act):
# Returns the intermediate state after action
# i.e. the number of cars at each location
# after cars are moved between locations
return (min(state[0] - act, 20), min(state[1] + act, 20))
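As a quick sanity check of the helpers (re-stated here from above so the snippet runs standalone):

```python
def get_allowed_acts(state):
    # Allowed net moves from location 1 to 2 (negative = from 2 to 1)
    n1, n2 = state
    acts = [0]
    for i in range(1, min(21 - n2, n1 + 1, 6)):
        acts.append(i)
    for i in range(1, min(21 - n1, n2 + 1, 6)):
        acts.append(-i)
    return acts

def get_state_after_act(state, act):
    # Cars at each location after the overnight move
    return (min(state[0] - act, 20), min(state[1] + act, 20))

# With plenty of cars and space on both sides, all 11 actions are allowed
assert sorted(get_allowed_acts((12, 10))) == list(range(-5, 6))

# With no cars at location 1, cars can only move from location 2 to 1
assert sorted(get_allowed_acts((0, 10))) == [-5, -4, -3, -2, -1, 0]

# Moving 4 cars from location 1 to 2
assert get_state_after_act((12, 10), 4) == (8, 14)
```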
To start with let us implement the four argument function in a way that is easy to follow. We will use this to clarify our understanding of the problem and to verify the more efficient vectorised implementation that we will implement later.
def get_four_arg_iter(state, act):
First steps
Find the intermediate state after action and set up the parameters and function for the Poisson distributions.
num_after_move = get_state_after_act(state, act)
lam_b1, lam_b2 = 3, 2
lam_r1, lam_r2 = 3, 4
x1, x2 = num_after_move
def prob_fn(x, cond, lam):
# If condition is true use P(X=x), if false use 1-P(X<=x-1) = P(X >= x)
return poisson.pmf(x, mu=lam) if cond else 1 - poisson.cdf(x-1, mu=lam)
Probabilities for numbers of cars rented and added back
For each location we go through all the valid values of $n_r$ and $n_b$, calculate the next state, rental credit and the probability of $(s’, r)$ pair to which these values give rise.
location_dicts = [dict(), dict()]
for idx, (xi, lam_ri, lam_bi) in enumerate(zip((x1, x2), (lam_r1, lam_r2), (lam_b1, lam_b2))):
for nr_i in range(0, xi+1):
p_nri = prob_fn(nr_i, nr_i < xi, lam_ri)
n_space_i = 20 - (xi - nr_i)
for nb_i in range(0, n_space_i + 1):
p_nbi = prob_fn(nb_i, nb_i < n_space_i, lam_bi)
s_next_i = xi - nr_i + nb_i
location_dicts[idx][(nr_i, nb_i)] = (s_next_i, 10 * nr_i, p_nri * p_nbi)
Four argument function
Next we combine the states from the two locations, calculate the total reward and $p(s’, r|s, a, n_{b,1}, n_{b,2}, n_{r,1}, n_{r,2})$ by multiplying the probabilities from each location.
We then accumulate the probabilities for the $(s’, r)$ pairs given each combination of $n_{b,1}, n_{b,2}, n_{r,1}, n_{r,2}$ to arrive at the values of $p(s’, r|s, a)$
psrsa = dict()
move_cost = 2 * abs(act)
for (nr1, nb1), (s_next1, r1, prob1) in location_dicts[0].items():
for (nr2, nb2), (s_next2, r2, prob2) in location_dicts[1].items():
s_next = (s_next1, s_next2)
r = -move_cost + r1 + r2
key = (s_next, r)
prob = prob1 * prob2
if key not in psrsa:
psrsa[key] = prob
else:
psrsa[key] += prob
return psrsa
Whilst straightforward to understand, this implementation is quite slow as we are looping through all the valid values of $n_{r,i}$ and $n_{b,i}$ for each location.
s_init = (12, 10)
n_move = 4
%timeit get_four_arg_iter(s_init, n_move)
56.6 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
During the policy iteration algorithm the function will be called multiple times so ideally we need a faster running time which we can achieve by vectorising the implementation.
The function will use numpy to calculate the probabilities for all pairs of requests and returns at a given location in parallel. It will then use pandas to group by the $(s’,r)$ pairs to which each $(n_{r,1}, n_{r,2}, n_{b,1}, n_{b,2})$ combination gives rise, in an efficient manner.
Let us first define a helper function that creates a unique index for each state, which makes it easy to iterate through the states whilst running the algorithm.
def get_idx(s1, s2):
# Flatten the 2d state space into a 1d space
return s1 * 21 + s2
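To see the flattening at work, `divmod` recovers the original state from the flat index (a quick check, re-stating the helper so it runs standalone):

```python
def get_idx(s1, s2):
    # Flatten the 2d state space into a 1d index
    return s1 * 21 + s2

idx = get_idx(3, 5)
assert idx == 68
# divmod by 21 inverts the mapping
assert divmod(idx, 21) == (3, 5)
```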
Now we can implement the four argument function in a vectorised manner.
def get_four_arg_vect(state, act):
First steps
Find the intermediate state after action and set up the parameters and function for the Poisson distributions.
num_after_move = get_state_after_act(state, act)
lam_b1, lam_b2 = 3, 2
lam_r1, lam_r2 = 3, 4
x1, x2 = num_after_move
def prob_fn(x, cond, lam):
# If condition is true use P(X=x), if false use 1-P(X<=x-1) = P(X >= x)
return np.where(cond,
poisson.pmf(x, mu=lam),
1 - poisson.cdf(x-1, mu=lam))
Probabilities and next states for each location
This is similar to the iterative implementation above, but we calculate the probabilities and next states for each combination of nr and nb for a given location at the same time, rather than looping through them.
One subtlety is that the maximum value of $n_{b,i}$ is $20 - (x_i - n_{r,i})$, but a simple vectorised approach not involving structures like ragged arrays requires all rows of the array to have the same number of columns. To handle this we use masking to filter out the invalid combinations.
location_arrs = []
for xi, lam_ri, lam_bi in zip((x1, x2), (lam_r1, lam_r2), (lam_b1, lam_b2)):
# Define nr, calculate probability of nr given xi and calculate the number of spaces for nb
# Shape: [xi + 1]
nr_i = np.arange(0, xi+1)
p_nri = prob_fn(nr_i, nr_i < xi, lam_ri)
n_space_i = 20 - (xi - nr_i)
# All the possible values of nb that can arise from a given xi
# Not all lead to valid combinations of (nr, nb) so we will mask them out later
# Note that np.max(n_space_i) = 20 - xi
# Shape: [20 - xi + 1]
nb_i = np.arange(np.max(n_space_i) + 1)
# Note that the condition is nb_i < n_space_i[:, None]
# which has shape [xi + 1, 20 - xi + 1].
# This ensures we calculate probabilities with a different upper limit for each row.
p_nbi = prob_fn(nb_i, np.less(nb_i, n_space_i[:, None]), lam_bi)
# Mask to exclude invalid combinations of (nr, nb)
# which occur when nb exceeds the number of spaces available
# Shape: [xi + 1, 20 - xi + 1]
mask_i = np.less_equal(nb_i, n_space_i[:, None])
# Select the valid pairs
nr_i, nb_i = np.where(mask_i)
# Find value of next state and probability of next state
s_next_i = xi - nr_i + nb_i
prob_nbi_nri = (p_nbi * p_nri[:, None])[mask_i]
location_arrs.append((s_next_i, nr_i, prob_nbi_nri))
Combine the states
At this point we have all the valid combinations of $(n_{r,1}, n_{b,1})$ and $(n_{r,2}, n_{b,2})$. We now combine them to get all the valid combinations of $(n_{r,1}, n_{b,1}, n_{r,2}, n_{b,2})$ and the states, rewards and probabilities that arise from them.
(s_next1, s_next2), (n_rent1, n_rent2), (prob1, prob2) = [
map(np.ravel, np.meshgrid(arr1, arr2))
for (arr1, arr2) in zip(*location_arrs)
]
n_rent = n_rent1 + n_rent2
prob = prob1 * prob2
Final Dataframe
We store the $(s’, r_c)$ rather than the $(s’, r)$ pairs in the dataframe, where $r_c$ is the rental credit, but we can easily convert the former to the latter by subtracting the cost of moving the cars. We saw that $p(s’, r \vert s, a)$ depends only on the intermediate state $x = (x_1, x_2)$, so storing the values in this way lets us reuse them for all the $(s, a)$ pairs that lead to a given $x$. Since the cost 2*abs(n_a) is constant for all the $(s’, r)$ pairs given $(s, a)$, the grouping in groupby is independent of this constant offset. Note also that the probabilities in the prob
column are correct despite this offset because, as noted earlier, $p(s’, r \vert s, a) = p(s’, r + 2\vert n_a \vert \vert s, a)$.
df = pd.DataFrame(
{'s1': s_next1,
's2': s_next2,
'nr': n_rent,
'prob': prob
}
)
df = df.groupby(['s1', 's2', 'nr'], as_index=False).prob.sum()
df['r'] = df['nr'] * 10
# Add flat index to help iterate through states
df['idx'] = get_idx(df['s1'], df['s2'])
return df
Let us verify that this method matches the iterative implementation for the same values of s_init
and n_move
pdict_iter = get_four_arg_iter(s_init, n_move)
z = get_four_arg_vect(s_init, n_move)
pdict_vect = z.assign(rr = -2*abs(n_move) + z['r']).set_index(['s1','s2', 'rr']).to_dict()['prob']
pdict_vect = {((_s1, _s2), _r): p for (_s1, _s2, _r), p in pdict_vect.items()}
assert set(pdict_iter) == set(pdict_vect)
for i in pdict_iter:
assert np.isclose(pdict_iter[i], pdict_vect[i])
del z, pdict_vect, pdict_iter
We can also see that the function runs considerably faster than the iterative implementation.
%timeit get_four_arg_vect(s_init, n_move)
9.01 ms ± 365 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
To find an optimal policy we start with some arbitrary policy and repeat the following until convergence
Because the Jack’s car rental MDP is finite, it has only a finite number of policies which means that policy iteration must converge to an optimal policy and the optimal value function in a finite number of iterations.
To get started let us define the state space and some other values that we will need later. Since the state space is finite it can be stored in a finite array. We will use a 1d array to store the states and a dictionary to map the index of each state to the state itself. We will also define some parameters including the discount factor $\gamma$ and the threshold $\theta$ for the policy evaluation step.
state_tuples = list(itertools.product(range(21), range(21)))
state_idx = [get_idx(*s) for s in state_tuples]
states = dict(zip(state_idx, state_tuples))
gamma = 0.9
theta = 1e-6
Next we will write a helper function that finds
\[\sum_{s',r} p(s',r \vert s, a)\left[r + \gamma V(s')\right]\]The function assumes the existence of a cache where $p(s’,r \vert s, a)$ dataframes are stored keyed by the intermediate states $x$ following the action. As noted earlier we store the $(s’, r + 2\vert n_a \vert)$ rather than the $(s’, r)$ pairs in the dataframe in order to be able to use the dataframe for different $(s, a)$ pairs that give rise to the same intermediate state $x$.
For instance all the following pairs of $(s, a)$ give rise to the same $x = (19, 17)$
\[((16, 20), -3) \longrightarrow (19, 17) \\ ((17, 19), -2) \longrightarrow (19, 17) \\ ((18, 18), -1) \longrightarrow (19, 17) \\ ((19, 17), 0) \longrightarrow (19, 17) \\ ((20, 16), 1) \longrightarrow (19, 17)\]Using a cache can help avoid unnecessary calculations and speed up the running time of the algorithm.
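We can confirm that these $(s, a)$ pairs all collapse to the same intermediate state (re-stating get_state_after_act so the check runs standalone):

```python
def get_state_after_act(state, act):
    # Cars at each location after moving `act` cars from location 1 to 2
    return (min(state[0] - act, 20), min(state[1] + act, 20))

pairs = [((16, 20), -3), ((17, 19), -2), ((18, 18), -1),
         ((19, 17), 0), ((20, 16), 1)]

# All five (s, a) pairs share the cache key x = (19, 17)
assert all(get_state_after_act(s, a) == (19, 17) for s, a in pairs)
```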
def get_value(state, action, values, cache, gamma=0.9):
# The intermediate state after the move serves as the cache key
start = get_state_after_act(state, action)
if start not in cache:
cache[start] = get_four_arg_vect(state, action)
psrsa_df = cache[start]
# Select values of next states
V_s_next = values[psrsa_df['idx'].values].values
# Find final reward by subtracting move cost
rental_credit = psrsa_df['r'].values
move_cost = abs(action) * 2
r = rental_credit - move_cost
# Calculate value
psrsa = psrsa_df['prob'].values
val = ((gamma * V_s_next + r) * psrsa)
return val.sum()
Now we will implement and run policy iteration. The algorithm statement given below is from Policy Iteration (using iterative policy evaluation) for estimating $\pi \approx \pi_*$ found in Chapter 4.3 Policy Iteration of the RL book.
1. Initialization
$V(s) \in \mathcal{R}$ and $\pi(s) \in A(s)$ arbitrarily for all $s \in \mathcal{S}$
# Also initialise a cache and a `history` array to store the intermediate results for plotting
cache = dict()
V = pd.Series(index=state_idx, data=np.zeros(len(states)))
pi = pd.Series(index=state_idx, data=np.zeros(len(states)))
delta_vals = {}
history = []
iters = 0
while True:
2. Policy Evaluation
\[\begin{aligned} &\text{Loop:} \\ &\quad \quad \Delta \leftarrow 0 \\ &\quad \quad \text{Loop for each $s \in \mathcal{S}$:} \\ &\quad \quad \quad \quad v \leftarrow V(s) \\ &\quad \quad \quad \quad V(s) \leftarrow \sum_{s', r} p\left(s',r\vert s, \pi(s)\right)\left[r + \gamma V(s') \right] \\ &\quad \quad \quad \quad \Delta \leftarrow \max\left(\Delta, \left\lvert v - V(s)\right\rvert\right) \\ &\text{until $\Delta < \theta$ (a small positive number determining the accuracy of estimation)} \end{aligned}\]
print('Starting Policy Evaluation')
iters += 1
policy_eval_iters = 0
delta_vals[iters] = []
while True:
delta = 0
for idx in (state_idx):
v = V[idx]
V[idx] = get_value(states[idx], pi[idx], V, cache, gamma)
delta = max(delta, abs(v - V[idx]))
policy_eval_iters += 1
sys.stdout.write('\rIteration {}, Policy Eval Iteration {}, Δ = {:.5e}'.format(iters, str(policy_eval_iters).rjust(2, ' '), delta));
delta_vals[iters].append(delta)
if delta < theta:
break
print()
print('Policy Evaluation done')
print()
3. Policy Improvement
\[\begin{aligned} &policy\text{-}stable \leftarrow true \\ &\text{For each $s \in \mathcal{S}$:} \\ &\quad \quad old\text{-}action \leftarrow \pi(s) \\ &\quad \quad \pi(s) \leftarrow \arg\max_a\sum_{s', r} p\left(s',r\vert s, a\right)\left[r + \gamma V(s') \right] \\ &\quad \quad \text{If $old\text{-}action \neq \pi(s)$, then $policy\text{-}stable \leftarrow false$} \\ &\text{If $policy\text{-}stable$, then stop and return $V \approx v_*$ and $\pi \approx \pi_*$; else go to $2$} \ \end{aligned}\]
print('Starting Policy Iteration')
policy_stable = True
history.append(pi.copy(deep=True))
for idx in (state_idx):
s = states[idx]
act = pi[idx]
actions = get_allowed_acts(s)
values_for_acts = [get_value(s, a, V, cache, gamma) for a in actions]
best_act_ind = np.argmax(values_for_acts)
pi[idx] = actions[best_act_ind]
if (act != pi[idx]):
policy_stable = False
if policy_stable:
print('Done')
break
else:
print('Policy changed - repeating eval')
print()
print('-' * 100)
Starting Policy Evaluation
Iteration 1, Policy Eval Iteration 96, Δ = 8.99483e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 2, Policy Eval Iteration 76, Δ = 9.01651e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 3, Policy Eval Iteration 70, Δ = 9.62239e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 4, Policy Eval Iteration 52, Δ = 8.39798e-07
Policy Evaluation done
Starting Policy Iteration
Policy changed - repeating eval
----------------------------------------------------------------------------------------------------
Starting Policy Evaluation
Iteration 5, Policy Eval Iteration 17, Δ = 7.18887e-07
Policy Evaluation done
Starting Policy Iteration
Done
We can visualise the progress of the algorithm in the plots below. Evidently policy evaluation converges faster in later iterations of the algorithm, and after the first few iterations it attains a lower final value.
Having successfully run the algorithm, let us visualise the results by creating a figure similar to Figure 4.2 in the RL book. The figure shows the initial policy $\pi_0$ and the policy after each iteration as a heatmap for each combination of the number of cars at each location, $(s_1, s_2)$. It also shows the value function $v_{\pi_4}$ as a 3D plot for each state.
The policies are in a flattened form so we reshape them into a 2D grid for plotting the results in the form of heatmaps. We also reshape the states and values into 21x21 grids.
pi_grid = np.reshape([pi[i] for i in state_idx], (21, 21))
idx_grid = np.reshape([states[i] for i in state_idx], (21, 21, 2))
V_grid = np.reshape([V[i] for i in state_idx], (21, 21))
Here is a helper function that adds text labels to the grid. The function takes the states, values, mask and offsets for the text labels as arguments. It also takes a threshold and a boolean lower
which determines whether the mask is applied to values lower or higher than the threshold. The function then plots the values as text on the grid, with the text colour set to white if the mask is satisfied and black otherwise.
def plot_grid(states, values, mask, offx, offy, th, axis=None, lower=True):
# Assumes a colormesh has already been plotted on the axis
# and adds text labels to the grid
axis = plt.gca() if axis is None else axis
for state, value, m in zip(states, values, mask):
axis.text(state[1]+offx,state[0]+offy,value,
color='white' if (m < th if lower else m > th) else 'k')
Finally let us make the plots. The plots are arranged in a 2x3 grid. The first 5 plots show the policies $\pi_0, \pi_1, \pi_2, \pi_3, \pi_4$ and the last plot shows the value function $v_{\pi_4}$.
fig = plt.figure(figsize=(33, 22))
gs = plt.GridSpec(2, 3)
fontsize = 19
cmap = 'YlOrRd'
for t in range(len(history) + 1):
if t < 5:
pi_grid_i = np.reshape([history[t][i] for i in state_idx], (21, 21))
axis = plt.subplot(gs[t])
axis.pcolormesh(pi_grid_i, cmap=cmap)
plot_grid(
idx_grid.reshape([-1, 2]),
pi_grid_i.reshape([-1]),
pi_grid_i.reshape([-1]),
.25, .25, 2, axis=axis, lower=False
)
axis.set_aspect('equal', adjustable='box')
axis.set_title(f'$\\pi_{t}$', fontsize=fontsize)
else:
axis = plt.subplot(gs[-1], projection='3d')
X = idx_grid[..., 1]
Y = idx_grid[..., 0]
Z = V_grid
axis.plot_surface(X, Y, Z, cmap='summer', linewidth=0, antialiased=False)
axis.view_init(60, -60)
axis.set_xlim([0, 20])
axis.set_ylim([0, 20]);
axis.set_title('$v_{\\pi_4}$', fontsize=25)
if t in [3, 5]:
space = '\n'*2 if t == 5 else ''
axis.set_ylabel(f'{space}No. cars at location 1', fontsize=fontsize)
axis.set_xlabel(f'{space}No. cars at location 2', fontsize=fontsize)
axis.tick_params(labelsize=fontsize)
Here is an animated version showing how the algorithm progresses towards the final result
And here is an interactive version of the final state-value function
In this tutorial, we will discuss various ways to use curly brackets $\{\}$ in LaTeX. The simplest case involves using curly brackets to denote a set e.g. $\{x, y, z\}$ is the set containing the elements $x$, $y$ and $z$. However, curly brackets, or braces, can also be used to group multiple lines of calculations and mathematical equations or to add explanatory text above or below the expressions.
As an added bonus, some of the examples in this tutorial come from the Denoising Diffusion Probabilistic Model (DDPM) paper. You can find a tutorial about the DDPM paper here.
Curly brackets have a special meaning in LaTeX. They are used for grouping and for delimiting the scope of commands, allowing you to apply formatting or modifications to specific portions of text or to define the extent of arguments for commands.
For example “\sqrt{2x}” would render as the square root of the expression $2x$, $\sqrt{2x}$ while “\sqrt 2x” would render as the square root of “2” times “x”, $\sqrt 2x$. Here, the curly brackets are used to define the extent of the argument for the square root command.
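The same grouping rule applies to sub- and superscripts; for example:

```latex
x^{10}  % braces group the whole superscript: x to the power 10
x^10    % only the first character is superscripted: x to the power 1, followed by 0
```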
To display the curly brackets themselves, we need to use the escape characters \{
and \}
. If we simply write {x, y, z}
, it will be interpreted as a grouping of x, y, z
and rendered as $x, y, z$.
To display the curly brackets, we should write \{x, y, z\}
. This will be rendered as $\{x, y, z\}$.
In some cases, we may need to adjust the size of the curly brackets to match the size of the content inside them. This can be done using the \left
and \right
commands. These commands automatically resize the brackets based on the height of the content.
Without using \left
and \right
\{\frac{1}{4}, \frac{1}{2}, 1\}
renders as
\[\{\frac{1}{4}, \frac{1}{2}, 1\}\]where the fractions are not properly enclosed by the brackets.
Using \left and \right we can write
\left\{\frac{1}{4}, \frac{1}{2}, 1\right\}
which renders as
\[\left\{\frac{1}{4}, \frac{1}{2}, 1\right\}\]which is a more aesthetically pleasing and easier-to-read expression.
In LaTeX, you can easily group multiple lines together with the align
environment. Insert the \begin{align}
and \end{align}
commands to start and end the aligning of your lines respectively. Then, use curly braces to enclose the portions you wish to group together and the &
symbol to align around equals or other operators.
The general syntax is as follows:
\left\{
\begin{aligned}
& expression1 \\
& expression2 \\
& expression3 \\
& expression4 \\
\end{aligned}
\right.
Breakdown of the syntax:

- \left\{ and \right. : these commands create the resizable curly brace that encloses the expressions; \right. is an invisible closing delimiter.
- \begin{aligned} and \end{aligned} : these commands create the aligned environment.
- & : this symbol aligns the expressions vertically.

Below is an example that breaks down the calculation of an absolute value into several parts:
\begin{align*}
|x| =
\left\{
\begin{aligned}
& x \quad & x \geq 0 \\
& -x \quad & x < 0
\end{aligned}
\right.
\end{align*}
which produces the following output:
\[\begin{align*} |x| = \left\{ \begin {aligned} & x \quad & x \geq 0 \\ & -x \quad & x < 0 \end{aligned} \right. \end{align*}\]Here we have also introduced additional syntax to format each line of the equation.
& case1 \quad & condition1
where the \quad
command adds a space between the two expressions. The &
symbol after \quad
aligns the conditions vertically.
The final transition distribution of the reverse process $p_\theta\left(\mathbf{x}_0 \vert \mathbf{x}_1\right)$, which denotes the transition from the final latent $\mathbf{x}_1$, is modelled as an independent decoder. The data $\mathbf{x}_0$ is assumed to consist of integers $0,1,\ldots,255$, linearly scaled to lie in the interval $[−1,1]$. The form of the distribution for the last term of the reverse process is given by the products of the element-wise distributions
\[p_\theta\left(\mathbf{x}_0 \vert \mathbf{x}_1\right) = \prod_{i=1}^D\int_{\delta_{-}\left(x_0^i\right)}^{\delta_{+}\left(x_0^i\right)} \mathcal{N}\left(x; \mu_\theta^i\left(\mathbf{x}_1, 1\right), \sigma_1^2\right)dx \\ \delta_{+}(x)= \left\{ \begin{aligned} &\infty \quad & x = 1 \\ &x + \frac{1}{255} \quad& x < 1 \\ \end{aligned} \right. \\ \delta_{-}(x)= \left\{ \begin{aligned} &-\infty \quad & x = -1 \\ &x - \frac{1}{255} \quad& x > -1 \\ \end{aligned} \right.\]where the delta terms are written using multi-line braces e.g. for $\delta_{+}(x)$ the latex code is:
\begin{aligned}
&\infty \quad & x = 1 \\
&x + \frac{1}{255} \quad& x < 1 \\
\end{aligned}
You can also group a subset of lines together within the align environment. Wrap the lines you wish to group in an inner aligned block, enclosed by \left. and \right\}, and use the & symbol to align the group label with the other lines.
The general syntax is as follows:
\begin{align*}
expression1 \\
expression2 \\
\left.
\begin{aligned}
& expression3 \\
& expression4 \\
\end{aligned}
\right\} \quad & \text{Group 1} \\
expression5 \\
expression6 \\
\end{align*}
Below is an example that breaks down the definition of the Fibonacci sequence into the base cases and the recursive case. The two base cases are grouped together with a right brace and the recursive case is placed outside the grouping.
\begin{align*}
\left.
\begin{aligned}
& F(0) = 0 \\
& F(1) = 1 \\
\end{aligned}
\right\} \quad & \text{Base cases} \\
F(n) = F(n-1) + F(n-2) \quad & \text{Recursive case}
\end{align*}
which produces the following output:
\[\begin{align*} \left. \begin{aligned} & F(0) = 0 \\ & F(1) = 1 \\ \end{aligned} \right\} \quad & \text{Base cases} \\ F(n) = F(n-1) + F(n-2) \quad & \text{Recursive case} \end{align*}\]Another handy function includes the overbrace and underbrace which enables you to add explanatory text above or below the expressions you wish to highlight.
The general syntax is as follows:
Overbrace: \overbrace{expression}^{explanation}
Underbrace: \underbrace{expression}_{explanation}
Here is an example of an equation with three terms, each explained by an underbrace:
f(x)= \underbrace{3x^2}_{\text{quadratic term}} + \underbrace{2x}_{\text{linear term}} + \underbrace{1}_{\text{constant}}
which produces the following output:
\[f(x)= \underbrace{3x^2}_{\text{quadratic term}} + \underbrace{2x}_{\text{linear term}} + \underbrace{1}_{\text{constant}}\]The following example groups the linear and quadratic terms together using an overbrace:
f(x) = \overbrace{3x^2 + 2x}^{\text{terms which depend on } x} + 1
which produces the following output:
\[f(x) = \overbrace{3x^2 + 2x}^{\text{terms which depend on } x} + 1\]We may also use a combination of overbrace
and underbrace
. For example, here an overbrace encloses an expression with two underbraces:
f(x) = \overbrace{\underbrace{3x^2}_{\text{quadratic term}} + \underbrace{2x}_{\text{linear term}}}^{\text{terms which depend on } x} + \underbrace{1}_{\text{constant}}
which would render as
\[f(x) = \overbrace{\underbrace{3x^2}_{\text{quadratic term}} + \underbrace{2x}_{\text{linear term}}}^{\text{terms which depend on } x} + \underbrace{1}_{\text{constant}}\]It can be shown that the loss function can be written as follows
\[\mathbb{E}_q\left[\underbrace{\text{D}_\text{KL}\left(q(\mathbf{x}_T \vert \mathbf{x}_0) \,\Vert\, p(\mathbf{x}_T)\right)}_{L_T} + \overbrace{\sum_{t>1} \underbrace{\text{D}_\text{KL}\left(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \,\Vert\, p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t)\right)}_{L_{t-1}} \underbrace{- \log p_{\theta} (\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0}}^{\text{terms which depend on }\theta}\right]\]To summarise, we have explored how to use braces in LaTeX. We have discussed the align
environment for grouping multiple lines together and the syntax for creating multi-line braces using curly brackets. Additionally, we have learned about the \overbrace
and \underbrace
functions that allow us to add explanatory text above or below expressions. These techniques will be useful when writing complex mathematical equations and calculations in LaTeX.
A windrose, or polar rose plot, is a polar histogram that shows the distribution of wind speeds and directions over a period of time. It is a useful tool for visualising how wind speed and direction are typically distributed at a given location. The data is divided into direction sectors, and the frequency of observations in each sector is represented by the radius of the bars. Typically the data is also binned into speed intervals and plotted as a set of stacked bars, where each segment of a bar represents a speed interval: the length of each segment is the frequency of wind speeds in that interval, so the total radius of the bar is the sum of the frequencies across all speed intervals for that direction.
In this tutorial we will learn:
This blogpost can be found as a Colab notebook here.
The data used in this tutorial is available here. To follow along, download it and replace filepath
below with the location at which you saved the data.
We have a dataframe containing wind speed and direction data measured at Coimbatore airport in India.
- SPD - wind speed in m/s
- DIR - wind direction in degrees
- MM - month of the year between 1 and 12

Let us take a look at the first few rows of the data:
import calendar
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Modify to point to location where your data is saved
filepath = 'coimbatore-airport-2016.csv'
# Load data
df = pd.read_csv(filepath)
# Display the first few rows of the dataframe
df.head()
|   | NAME | HR_TIME | YYYY | MM | DD | HR | MN | TEMP | DEWP | SPD | DIR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 433210-99999 | 2016010100 | 2016 | 1 | 1 | 0 | 0 | 65 | 62 | 5 | 360 |
| 1 | 433210-99999 | 2016010100 | 2016 | 1 | 1 | 0 | 0 | 64 | 63 | 5 | 360 |
| 2 | 433210-99999 | 2016010100 | 2016 | 1 | 1 | 0 | 30 | 64 | 63 | 3 | 360 |
| 3 | 433210-99999 | 2016010101 | 2016 | 1 | 1 | 1 | 0 | 64 | 63 | 3 | 360 |
| 4 | 433210-99999 | 2016010102 | 2016 | 1 | 1 | 2 | 30 | 66 | 63 | 3 | 20 |
First, we need to transform our raw wind data, which is a crucial step in constructing a windrose plot.
The windrose_histogram
function partitions the raw wind speed and direction data into bins, counts the frequency of observations in each bin, and optionally normalizes these frequencies. The result is a 2D histogram, ideal for creating windrose plots.
def windrose_histogram(wspd, wdir, speed_bins=12, normed=False, norm_axis=None):
    """
    Compute a windrose histogram given wind speed and direction data.
    wspd: array of wind speeds
    wdir: array of wind directions
    speed_bins: Integer or Sequence, defines the bin edges for the wind speed (default is 12 equally spaced bins)
    normed: Boolean, optional, whether to normalize the histogram values. (default is False)
    norm_axis: Integer, optional, the axis along which the histograms are normalized (default is None)
    """
    # If speed_bins is an integer, we create linearly spaced bins from 0 to max speed
    if isinstance(speed_bins, int):
        speed_bins = np.linspace(0, wspd.max(), speed_bins)
    num_spd = len(speed_bins)
    num_angle = 16
    # Shift wind directions by 11.25 degrees (half a sector) so that each
    # sector is centred on its compass direction
    wdir_shifted = (wdir + 11.25) % 360
    angle_bins = np.linspace(0, 360, num_angle + 1)
    # Generate a 2D histogram using the defined speed bins and shifted wind directions
    hist, *_ = np.histogram2d(wspd, wdir_shifted, bins=(speed_bins, angle_bins))
    # Normalize if required
    if normed:
        hist /= hist.sum(axis=norm_axis, keepdims=True)
        hist *= 100
    return hist, angle_bins, speed_bins
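To see the half-sector shift in action, here is a toy sketch with three hypothetical observations; the counts follow directly from the binning logic above:

```python
import numpy as np

# Three hypothetical observations: two near north (350° and 5°), one due east
wspd = np.array([2.0, 4.0, 7.0])     # m/s
wdir = np.array([350.0, 5.0, 90.0])  # degrees

# Shift by half a sector (11.25°) so 350° and 5° both fall in the "N" sector
wdir_shifted = (wdir + 11.25) % 360
angle_bins = np.linspace(0, 360, 17)  # 16 sectors of 22.5°
speed_bins = np.array([0, 3, 6, 9])   # 3 speed bins

hist, *_ = np.histogram2d(wspd, wdir_shifted, bins=(speed_bins, angle_bins))
# hist[speed_bin, sector]: both northerly winds land in sector 0 (N),
# the easterly wind lands in sector 4 (E)
```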
Next, we wrap this histogram output in a structured DataFrame that is more suitable for plotting. To achieve this, we’ll employ the following function:
DIRECTION_NAMES = ("N", "NNE", "NE", "ENE",
                   "E", "ESE", "SE", "SSE",
                   "S", "SSW", "SW", "WSW",
                   "W", "WNW", "NW", "NNW")
DIRECTION_ANGLES = np.arange(0, 2*np.pi, 2*np.pi/16)
# Mapping from direction name to angles in radians
NAME2ANGLE = dict(zip(
DIRECTION_NAMES,
DIRECTION_ANGLES
))
def make_wind_df(data_df, num_partitions, max_speed=None, normed=False, norm_axis=None, month=None):
    """
    This function transforms raw wind speed and direction data into a DataFrame for windrose plotting.
    data_df: Dataframe containing wind data
    num_partitions: Integer, number of partitions to divide the wind speed data
    max_speed: Float, optional, maximum wind speed to be included in the partitions
    normed: Boolean, optional, whether to normalize the frequency values
    norm_axis: Integer, optional, the axis along which the histograms are normalized
    month: Integer, optional, month (1-12) to filter the data by; if None all months are used
    """
    if month is not None:
        data_df = data_df[data_df.MM == month]
    wspd = data_df['SPD'].values
    wdir = data_df['DIR'].values
    # If max_speed is not specified, we use the maximum value in the 'wspd' data.
    # Otherwise, we include all speeds up to and including max_speed.
    # An additional partition is created to handle outliers.
    if max_speed is None:
        speed_bins = np.linspace(0, wspd.max(), num_partitions + 1)
    else:
        speed_bins = np.append(np.linspace(0, max_speed, num_partitions + 1), np.inf)
    # windrose_histogram function is called to partition data based on the bins created.
    # Additional parameters control how the frequency values are normalised
    h, *_ = windrose_histogram(wspd, wdir, speed_bins, normed=normed, norm_axis=norm_axis)
    # A dataframe is formed containing the histogram data. Column names are for the directions
    wind_df = pd.DataFrame(data=h, columns=DIRECTION_NAMES)
    # speed_bin_names stores speed range strings to describe each interval
    # e.g. when speed_bins is [0, 3, 6, inf], speed_bin_names is ['0-3', '3-6', '>6']
    speed_bin_names = []
    speed_bins_rounded = [round(i, 2) for i in speed_bins]
    for start, end in zip(speed_bins_rounded[:-1], speed_bins_rounded[1:]):
        speed_bin_names.append(f'{start:g}-{end:g}' if end < np.inf else f'>{start:g}')
    wind_df['strength'] = speed_bin_names
    # Reshapes data for plotting. Now, each row represents one sector of the windrose
    wind_df = wind_df.melt(id_vars=['strength'], var_name='direction', value_name='frequency')
    return wind_df
Here is what the dataframe looks like:
make_wind_df(df, num_partitions=4).head(12)
|   | strength | direction | frequency |
|---|---|---|---|
| 0 | 0-6.25 | N | 189.0 |
| 1 | 6.25-12.5 | N | 14.0 |
| 2 | 12.5-18.75 | N | 0.0 |
| 3 | 18.75-25 | N | 0.0 |
| 4 | 0-6.25 | NNE | 648.0 |
| 5 | 6.25-12.5 | NNE | 145.0 |
| 6 | 12.5-18.75 | NNE | 1.0 |
| 7 | 18.75-25 | NNE | 0.0 |
| 8 | 0-6.25 | NE | 815.0 |
| 9 | 6.25-12.5 | NE | 389.0 |
| 10 | 12.5-18.75 | NE | 15.0 |
| 11 | 18.75-25 | NE | 0.0 |
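This long format comes from the `melt` call inside `make_wind_df`. Here is a minimal sketch of that reshaping step on a tiny hand-made table (the numbers are just illustrative):

```python
import pandas as pd

# Two strength bins (rows) x two direction columns, in wide format
wide = pd.DataFrame({'N': [189.0, 14.0], 'NNE': [648.0, 145.0]})
wide['strength'] = ['0-6.25', '6.25-12.5']

# melt turns each (strength, direction) pair into its own row
long = wide.melt(id_vars=['strength'], var_name='direction', value_name='frequency')
```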
Matplotlib is a versatile and comprehensive library in Python that allows for a wide variety of plotting types, such as scatter plots, bar charts, and line charts.
While Matplotlib does not have out-of-the-box capabilities to create a windrose, thanks to the library’s flexibility, we can make a plot which resembles a windrose using polar bar diagrams, where the height of the bars demonstrates the frequency of the wind speed, and the orientation of the bars represents the wind direction.
A polar plot is a diagram in which data points are represented in a polar coordinate system. In this system, a point’s location is determined by its distance from the centre (known as the radial coordinate $\rho$ or $r$) and by the angle from a reference direction (known as the angular coordinate $\phi$ or $\theta$).
Here is an example of a polar plot showing an Archimedean spiral, which can be represented by the linear polar equation

\[r(\phi) = a + b\phi\]

We plot it with

\[a = 0, b = \frac{1}{2\pi} \quad \text{and} \quad a = 2, b = \frac{1}{4\pi}\]

phi = np.linspace(0, 10*np.pi, 1001)
avals = [0, 2]
bvals = [1/(2*np.pi), 1/(4*np.pi)]
blabels = [r'2\pi', r'4\pi']
fig, axes = plt.subplots(1, 2, subplot_kw={'projection': 'polar'}, figsize=(10, 4))
for clr, axis, a, b, bl in zip(['mediumblue', 'darkorange'], axes, avals, bvals, blabels):
    axis.plot(phi, a + b*phi, linewidth=2, color=clr)
    # Raw strings avoid invalid escape warnings in the LaTeX title
    axis.set_title(r'$r(\phi) = ' + (f'{a} + ' if a > 0 else '')
                   + r'\frac{\phi}{' + bl + r'}$')
plt.suptitle('Archimedean spirals');
In the context of windroses, we use polar plots for illustrating the wind direction (angular coordinate) and the wind speed (radial coordinate).
def matplotlib_windrose(data_df, num_partitions=4, max_speed=4, month=None):
    """
    Function to create a windrose plot using matplotlib.
    Args:
        data_df: Dataframe containing wind data
        num_partitions: number of partitions for wind strength
        max_speed: maximum wind speed to be considered while partitioning the wind strength
        month: (optional) an integer between 1 and 12 selecting the month to plot; if None the whole year is used
    Returns:
        fig: a matplotlib Figure object containing the windrose plot
    """
    # calls the make_wind_df function to create a dataframe which
    # reshapes raw wind speed and direction data for windrose plotting
    wind_df2 = make_wind_df(data_df=data_df,
                            num_partitions=num_partitions,
                            max_speed=max_speed,
                            normed=True,
                            month=month)
    wind_df2['frequency'] = wind_df2['frequency'] / 100
    # extracts the starting speed of each strength interval from its label
    wind_df2 = wind_df2.assign(strength_start=[float(x.split('-')[0] if '-' in x else x[1:])
                                               for x in wind_df2.strength.values])
    # converts the direction names to angles and sorts the dataframe by the strength start
    wind_df2['angle'] = wind_df2.direction.map(NAME2ANGLE)
    wind_df2 = wind_df2.sort_values(by='strength_start')
    # The command below does three things sequentially:
    # 1. It sorts the dataframe 'wind_df2' in ascending order according to 'strength_start'.
    # 2. It groups this sorted dataframe by 'direction' so that each group contains the sorted values of 'strength_start' for the corresponding direction.
    # 3. It cumulatively sums the frequencies within each group (i.e., each unique wind direction).
    # The cumulative frequency for a given strength_start value is the sum of the frequencies of all the strength_start values less than or equal to it. This enables us to plot stacked bars in the windrose plot.
    wind_df2['cumulative_frequency'] = wind_df2.sort_values(by='strength_start').groupby('direction').frequency.cumsum()
    # Use black background
    with plt.style.context('dark_background'):
        # create a polar plot (windrose) with given figure size
        fig, axis = plt.subplots(figsize=(8, 8), ncols=1, subplot_kw=dict(projection="polar"))
        # adds a grid to the plot in white since background will be dark
        axis.grid(color='white')
        # extracts values from the 'strength_start' column of the dataframe
        strength_starts = wind_df2.strength_start.values
        # defines the colormap to be used for different strength partitions in the windrose
        colours = plt.cm.magma_r(
            np.linspace(0, 1, wind_df2.strength.nunique())
        )
        # groups the dataframe by 'strength'
        strength_splits = wind_df2.groupby('strength')
        # plots a bar chart for each unique strength value in the windrose plot
        # in descending order of strength; since the frequency values are
        # cumulative, the bars appear stacked on top of each other
        for clr, strength in list(zip(colours, wind_df2.strength.unique()))[::-1]:
            split = strength_splits.get_group(strength)
            # Note that the width is slightly less than 22.5 degrees (i.e., 360/16) so that the bars are slightly separated from each other which makes the plot look better
            # zorder=2 ensures grid does not appear on top of the bars
            axis.bar(split['angle'].values,
                     split['cumulative_frequency'].values,
                     color=clr,
                     label=strength,
                     width=np.deg2rad(19),
                     edgecolor='black',
                     linewidth=0.5,
                     zorder=2)
        # sets direction names as xticks in the windrose plot
        axis.set_xticks(DIRECTION_ANGLES)
        axis.set_xticklabels(DIRECTION_NAMES)
        # adds a legend to the plot
        handles, labels = axis.get_legend_handles_labels()
        axis.legend(
            handles[::-1],
            labels[::-1],
            loc='upper left',
            bbox_to_anchor=(1.1, 1.1)
        )
        # sets the zero location of the theta axis to North
        axis.set_theta_zero_location('N')
        # sets the direction of the theta axis to clockwise (-1)
        axis.set_theta_direction(-1)
        axis.set_title('Wind rose ({})'.format(
            calendar.month_name[month] if month in range(1, 13) else 'Annual'
        ))
        axis.set_rlabel_position(135)
        yticks = axis.get_yticks()
        ytick_labels = [('{:.2f}'.format(round(i * 100, 2))[:-1]).rstrip('0').rstrip('.') + '%' for i in yticks]
        axis.set_yticks(yticks)
        axis.set_yticklabels(ytick_labels)
    # returns the figure containing the windrose plot
    return fig
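The stacking in the function above relies on the cumulative sums: bars are drawn in descending strength order, so each shorter bar is painted over the longer ones. A toy sketch of the sort/groupby/cumsum step (with made-up frequencies):

```python
import pandas as pd

# Toy frequencies for two directions and two strength bins (hypothetical numbers)
toy = pd.DataFrame({
    'direction': ['N', 'N', 'E', 'E'],
    'strength_start': [0.0, 3.0, 0.0, 3.0],
    'frequency': [0.10, 0.05, 0.20, 0.10],
})

# Sort by strength, then cumulatively sum frequencies within each direction
toy = toy.sort_values(by='strength_start')
toy['cumulative_frequency'] = toy.groupby('direction').frequency.cumsum()
# For 'N' the cumulative values are 0.10 then 0.15: the outer bar reaches
# 0.15 and the inner 0.10 is drawn on top of it
```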
To use the matplotlib_windrose
function, you pass in the wind dataframe plus additional arguments to vary the appearance of the plot. Below we shall see how to add interactivity so that we can vary these parameters and update the plot each time.
matplotlib_windrose(df, num_partitions=8, max_speed=25);
For an alternative approach, we turn to Plotly, another Python library known for its advanced plotting capabilities. We can leverage Plotly Express to achieve a straightforward realisation of a windrose plot.
The `plotly_windrose` function operates similarly to the `matplotlib_windrose` function but utilizes Plotly’s `px.bar_polar` to create the windrose plot, enhancing it with interactive capabilities.
import plotly.express as px
def plotly_windrose(data_df, num_partitions=4, max_speed=4, month=None):
    """
    Function to generate a windrose plot using Plotly.
    Args:
        data_df: Dataframe containing wind data
        num_partitions: number of partitions for wind strength
        max_speed: maximum wind speed to be considered while partitioning the wind strength
        month: (optional) an integer between 1 and 12 selecting the month to plot; if None the whole year is used
    Returns:
        fig: a Plotly Figure object containing the windrose plot
    """
    # Convert raw data into a windrose-friendly format
    wind_df = make_wind_df(data_df=data_df,
                           num_partitions=num_partitions,
                           max_speed=max_speed,
                           normed=True,
                           month=month)
    # Convert frequency percentages to proportions (for the % tick format below)
    wind_df['frequency'] = wind_df['frequency'] / 100
    # Sort strength bins in order
    strengths = sorted(wind_df.strength.unique(),
                       key=lambda x: float(x.split('-')[0] if '-' in x else x[1:]))
    # defines the colormap to be used for different strength partitions in the windrose
    colours = px.colors.sample_colorscale(px.colors.get_colorscale('Magma_r'), len(strengths))
    colour_dict = dict(zip(strengths, colours))
    # Create a polar bar plot with the specified properties
    fig = px.bar_polar(wind_df, r="frequency", theta="direction",
                       color="strength",
                       color_discrete_map=colour_dict,
                       template="plotly_dark",
                       title='Wind rose ({})'.format(
                           calendar.month_name[month] if month in range(1, 13) else 'Annual'
                       ))
    # Update the polar plot parameters to enhance readability and aesthetics
    fig.update_polars(
        radialaxis_angle=-45,  # To rotate the radial axis
        radialaxis_tickangle=-45,  # To rotate the tick labels on the radial axis
        radialaxis_tickformat=',.0%',  # To change the tick labels to percentages
        radialaxis_tickfont_color='white',  # To change tick labels to white for better readability
    )
    # Make figure square
    fig.update_layout(
        autosize=False,  # To prevent automatic adjustment of figure size
        width=500,  # To set figure width to 500 pixels
        height=500,  # To set figure height to 500 pixels
    )
    return fig
Creating a plot with `plotly_windrose`, we can see that it looks very similar to the previous plot but with Plotly’s interactive features.
plotly_windrose(df, num_partitions=8, max_speed=25)
The plots so far have shown windroses using the entire dataset. However, wind patterns can vary significantly depending on the time of year. We can use Jupyter widgets to create an interactive windrose that allows us to select the month of the year to plot.
In addition, we can also choose the number of partitions and the maximum wind speed to be included in the plot. These settings will give us greater control over the appearance of the windrose.
Widgets are interactive controls for python within Jupyter notebooks, allowing you to dynamically change variables and view the changes in output.
To create an interactive widget, we can use the interact function of the ipywidgets
library. The interact function automatically creates user interface (UI) controls for function arguments, and then calls the function with those arguments when you manipulate the controls interactively.
In addition our widget will let us choose which of the two plotting libraries we want to use, Matplotlib or Plotly.
from ipywidgets import interact, fixed, IntSlider, Dropdown, Layout, Output

month_dict = {('Annual' if i == 0 else calendar.month_name[i]): i for i in range(13)}

def interactive_windrose(num_partitions, max_speed, month, method):
    fn = {'Plotly': plotly_windrose, 'Matplotlib': matplotlib_windrose}[method]
    f = fn(df, num_partitions=num_partitions,
           max_speed=max_speed, month=month if month > 0 else None)
    if method == 'Plotly':
        f.show()

interact(interactive_windrose,
         num_partitions=IntSlider(value=5, min=2, max=33, description='No. partitions',
                                  layout=Layout(width='400px'),
                                  style=dict(description_width='initial')),
         max_speed=IntSlider(value=17, min=2, max=33, description='Max wind speed (m/s)',
                             layout=Layout(width='450px'),
                             style=dict(description_width='initial')),
         month=Dropdown(options=month_dict, value=3, description='Month',
                        layout=Layout(margin="0px 0px 10px 0px"),
                        style=dict(description_width='initial')),
         method=Dropdown(options=['Plotly', 'Matplotlib'], value='Plotly', description='Method',
                         layout=Layout(margin="0px 0px 25px 0px"),
                         style=dict(description_width='initial'))
         );
This is what the widget looks like. You can try it out for yourself in the Colab for this blog.
In this tutorial, we have walked through the process of plotting a windrose, a polar plot which shows the distribution of wind speeds and directions over time. We covered the essential knowledge behind windroses, walked through some methods of data preparation, and learned to generate windroses using two notable Python libraries - Matplotlib and Plotly. Furthermore, we touched upon how to add interactivity to our windrose plots with Jupyter widgets, enhancing their usefulness. The techniques covered in this tutorial can be leveraged in meteorology, climate studies, aviation and other fields requiring wind data analysis and visualisation.
When you send a message to ChatGPT you start seeing its answer within a moment. Instead of waiting for the model to produce the full answer before displaying the output, the app prints out the tokens in sequence, giving you the feel of a real-time conversation. In this blog post we will look at how you can replicate this experience using the API by streaming the response. To follow along make sure you have signed up for an OpenAI API account and have installed the openai
library that lets you use Python to make API calls (pip install openai
).
Let’s get started by asking ChatGPT to write a function to make a 3D plot of a 2D Gaussian distribution. It is not too difficult, but the answer will be several lines long and is highly unlikely to be generated almost instantly, which makes it a suitable input for comparing the user experience with and without streaming.
When you make an API call to ChatGPT with stream
set to False
(the default option) you get back a response that looks something like this:
<OpenAIObject chat.completion id=chatcmpl-7eijdlu4SYVIczkFfRVZMd7tZEURH at 0x108259ef0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Here's a Python function ...",
        "role": "assistant"
      }
    }
  ],
  "created": 1689939437,
  "id": "chatcmpl-7eijdlu4SYVIczkFfRVZMd7tZEURH",
  "model": "gpt-4-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 370,
    "prompt_tokens": 63,
    "total_tokens": 433
  }
}
The response contains the entirety of the text output by ChatGPT in `response['choices'][0]['message']['content']` (throughout we assume only one choice of output is generated).
import time
import openai
question = ("Write a Python function `plot_2d_normal` which makes a 3D plot of a 2D Gaussian distribution with arguments `sigma` and `mu`. "
"By default it should plot a 2D standard normal.")
print(question)
Write a Python function `plot_2d_normal` which makes a 3D plot of a 2D Gaussian distribution with arguments `sigma` and `mu`. By default it should plot a 2D standard normal.
# record start time to calculate total time to get response
start_time = time.time()

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question}
    ]
)
response_time = time.time() - start_time
print(f"Full response received {response_time:.2f} seconds after request")
print('\n\nResponse:\n')
print(response['choices'][0]['message']['content'])
Full response received 31.49 seconds after request
Response:
Here's a Python function that will generate this 3D plot:
```python
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
def plot_2d_normal(sigma=np.array([[1, 0], [0, 1]]), mu=np.array([0, 0])):
    # Define the 2d normal distribution
    def p(x, y):
        det_sigma = np.linalg.det(sigma)
        inv_sigma = np.linalg.inv(sigma)
        weight = 1.0 / (2.0 * np.pi * np.sqrt(det_sigma))
        vec = np.array([x - mu[0], y - mu[1]])
        return weight * np.exp(-0.5 * np.dot(vec.T, np.dot(inv_sigma, vec)))

    # Create a grid of x, y values
    x = np.linspace(-3, 3, 100)
    y = np.linspace(-3, 3, 100)
    x, y = np.meshgrid(x, y)

    # Get the z values from the 2d normal distribution
    z = np.vectorize(p)(x, y)

    # Create the 3d plot
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(x, y, z, cmap='viridis')
    plt.show()

plot_2d_normal()
```
By default this function plots a 2D standard normal, but you can input any `sigma` and `mu` values to change the standard deviation and mean of the distribution, respectively. This function uses the `matplotlib` library to plot the 3D Gaussian, and `numpy` to generate the grid of x and y values.
With stream
set to the default value of False
the response can take a while to be returned, making for a less pleasant user experience. Here it took over 30 seconds.
Traditionally you get data from an API by making a request and getting a response. That is what we have done above. But setting `stream` to `True` causes the response to be sent back a few chunks at a time via an event stream. In contrast to a regular API call, with an event stream you make a request and get a response, but then the server keeps the connection open and sends you new data as it becomes available, without you having to make any additional requests.
In the case of ChatGPT, instead of receiving the entire response in one go, the event stream enables us to receive the response gradually, chunk by chunk, so that we can start reading the response quite quickly.
The chunks have a field `delta` which shares a similar structure to `message`. It can have fields like:

- `role` - of the assistant
- `content` - a part of the message

or it can be empty (`{}`) when the stream is over. Note, however, that the `content` field contains only a part of the response rather than the full response.
Here is an example of a sequence of chunks corresponding to a message that starts with “Sure,”, where the final chunk has an empty `delta` field:
[<OpenAIObject chat.completion.chunk id=chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a at 0x13148be00> JSON: {
  "choices": [
    {
      "delta": {
        "content": "",
        "role": "assistant"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1689948459,
  "id": "chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a",
  "model": "gpt-4-0613",
  "object": "chat.completion.chunk"
},
<OpenAIObject chat.completion.chunk id=chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a at 0x1327cbc20> JSON: {
  "choices": [
    {
      "delta": {
        "content": "Sure"
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1689948459,
  "id": "chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a",
  "model": "gpt-4-0613",
  "object": "chat.completion.chunk"
},
<OpenAIObject chat.completion.chunk id=chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a at 0x1327cb130> JSON: {
  "choices": [
    {
      "delta": {
        "content": ","
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1689948459,
  "id": "chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a",
  "model": "gpt-4-0613",
  "object": "chat.completion.chunk"
}],
...
<OpenAIObject chat.completion.chunk id=chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a at 0x1354b9220> JSON: {
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1689948459,
  "id": "chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a",
  "model": "gpt-4-0613",
  "object": "chat.completion.chunk"
}]
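Reassembling the full message from such a stream is just a matter of concatenating the `content` of each `delta` that has one. A minimal sketch using hypothetical chunk dictionaries shaped like the output above:

```python
# Hypothetical chunks with the same shape as the printed stream above
stream_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "Sure"}}]},
    {"choices": [{"delta": {"content": ","}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

# The final chunk has an empty delta, so .get() with a default handles it
message = "".join(c["choices"][0]["delta"].get("content", "") for c in stream_chunks)
```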
To get the complete response, we need to iterate through the event stream, collecting the parts of the response from each chunk as it is sent.
We will use sys.stdout.write to update the output in place to see the response generated in real-time.
import sys

# new start time
stream_start_time = time.time()
# list to store times
times = []

stream_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question}
    ],
    stream=True  # this time, we set stream=True
)

print('Response:\n')
chunks = []
for chunk in stream_response:
    delta = chunk['choices'][0]['delta']
    # how long since start
    times.append(time.time() - stream_start_time)
    # Last will be empty
    if 'content' in delta:
        sys.stdout.write(delta['content'])
    chunks.append(chunk)

print(f"\n\nFull response received {times[-1]:.2f} seconds after request")
Response:
Sure, you can use the `matplotlib` and `numpy` libraries to create a 3D plot of a 2D Gaussian distribution. Here is an example function named `plot_2d_normal`:
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import multivariate_normal
def plot_2d_normal(mu=None, sigma=None):
    # Define the mean and covariance matrix for a 2D Gaussian
    if mu is None:
        mu = np.array([0.0, 0.0])
    if sigma is None:
        sigma = np.array([[1.0, 0.0], [0.0, 1.0]])

    # Create a grid of x, y values
    x = np.linspace(-3, 3, 100)
    y = np.linspace(-3, 3, 100)
    X, Y = np.meshgrid(x, y)
    pos = np.dstack((X, Y))

    # Define the Gaussian distribution over the grid of values
    rv = multivariate_normal(mu, sigma)

    # Plot the 3D graph
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, rv.pdf(pos), cmap='viridis', linewidth=0)
    plt.show()

# Call the function to plot the graph
plot_2d_normal()
```
In this code, `mu` is the mean and `sigma` is the covariance matrix. By default, they are set to resemble a standard normal distribution. The x and y values are used to create a grid of positions at which the Gaussian distribution is evaluated. The `multivariate_normal` function from `scipy.stats` is used to compute the value of the Gaussian distribution at each position. The function `surf` from `matplotlib.pyplot` is used to create the 3D surface plot.
Full response received 36.41 seconds after request
chunks[-1]
<OpenAIObject chat.completion.chunk id=chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a at 0x1354b9220> JSON: {
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1689948459,
  "id": "chatcmpl-7el59DFH8bYJWFZE6KeEkdyYNNV3a",
  "model": "gpt-4-0613",
  "object": "chat.completion.chunk"
}
When I ran the code, after a brief initial pause, the text seemed to be generated at a similar rate to what I have experienced in ChatGPT. Looking at the total time, however, the streamed response was actually slightly slower. On the other hand, the response started appearing in less than a second, making for a better user experience.
print(f'Time until response start: {times[0]:.2f} secs')
Time until response start: 0.93 secs
# Use https://github.com/openai/tiktoken to tokenise text
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4")
# Join the streamed chunks to reconstruct the full response text
streamed = ''.join([chunk['choices'][0]['delta'].get('content', '') for chunk in chunks])
resp_num_tokens = response['usage']['completion_tokens']
streamed_num_tokens = len(enc.encode(streamed))
Since the responses are of different lengths we can compare the time per token. Again we find that the streamed version is slightly slower. Note however that these statistics and relative differences are likely to vary when you run the code. I ran this code a few times and saw variations in response start time, total time and message length.
print(f'unstreamed | total time: {response_time:.2f} secs, num tokens: {resp_num_tokens}, time per token: {response_time / resp_num_tokens:.4f} secs/token')
print(f'streamed | total time: {times[-1]:.2f} secs, num tokens: {streamed_num_tokens}, time per token: {times[-1] / streamed_num_tokens:.4f} secs/token')
unstreamed | total time: 31.49 secs, num tokens: 370, time per token: 0.0851 secs/token
streamed | total time: 36.41 secs, num tokens: 414, time per token: 0.0879 secs/token
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
ids = np.arange(len(times)) + 1
plt.plot(ids, times);
plt.title('Cumulative streamed response time', fontsize=12);
plt.xlabel('Message index', fontsize=12)
plt.ylabel('Time since API call (secs)', fontsize=12);
plt.xlim(ids.min(), ids.max());
plt.ylim(0, max(times));
t_init = times[0]
plt.hlines(xmin=ids.min(), xmax=ids.max(), y=t_init, linestyle='--', color='green', label='Time until streamed response start')
plt.hlines(xmin=ids.min(), xmax=ids.max(), y=response_time, linestyle='--', color='indigo', label='Time for response without streaming')
yticks = plt.gca().get_yticks()
index = np.searchsorted(yticks, [t_init, response_time])
plt.gca().set_yticks(np.insert(yticks, index, [t_init, response_time]));
Whilst streamed responses lead to a much better user experience, there are some potential disadvantages:

- the streamed response does not include a `usage` field indicating how many tokens were consumed (but we can calculate it ourselves using `tiktoken` as done above)

Out of interest we can run the code that was returned from the responses to see if it works. First we have a function that gets the code from the response. Note that this function is for demo purposes only since it uses a simple regular-expression-based method to get the code and assumes there is a single Python code block in the text.
import re
# Function to get code from the response - written with help of ChatGPT
def get_python_code_from_markdown(markdown):
    code_pattern = re.compile(r'```python\n(.*?)\n```', re.DOTALL)
    match = code_pattern.search(markdown)
    if match:
        return match.group(1)
    return None
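As a quick sanity check of the extraction logic, applying the same pattern to a hypothetical markdown string pulls out just the code between the fences:

```python
import re

code_pattern = re.compile(r'```python\n(.*?)\n```', re.DOTALL)

# Hypothetical markdown response containing one fenced Python block
md = "Here's a function:\n```python\nx = 1 + 1\n```\nHope that helps!"
extracted = code_pattern.search(md).group(1)
# extracted is just the code line between the fences
```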
Now we have a simple function that saves the code to a randomly named file and imports it to run it. This function assumes that the returned code from ChatGPT will call the written function.
from importlib import import_module
import uuid
import os

def run_response_code(response):
    code_name = f'code_{uuid.uuid4().hex}'
    code_file = f'{code_name}.py'
    with open(code_file, 'w') as f:
        f.write(get_python_code_from_markdown(response))
    import_module(code_name)
    os.remove(code_file)

run_response_code(response['choices'][0]['message']['content'])
run_response_code(streamed)
The norm function is a mathematical concept that measures the size or length of a mathematical object, such as a vector or a matrix. It is often denoted by enclosing the object in double vertical bars, also known as the norm bars. There are different types of norms, such as the Euclidean norm, the 1-norm, and the infinity norm, among others.
To write the norm function in LaTeX, you can use the following notation:
\left \lVert x \right \rVert
You can also just use \Vert
instead of \lVert
and \rVert
:
\left \Vert x \right \Vert
or you can simply use the double vertical bar symbol:
\left \| x \right \|
all of which render as $\left \lVert x \right \rVert$.
You can define a command for the norm function as follows:
\newcommand{\norm}[1]{\left \lVert #1 \right \rVert}
This command allows you to easily use the norm function in your LaTeX document by typing \norm{x}
.
In addition to the general notation for the norm function, you can specify different types of norms by adding a subscript. Here are a few examples:
Euclidean Norm (2-norm): The Euclidean norm, also known as the 2-norm or the Euclidean length, is commonly used and represents the straight-line distance in Euclidean space. It is denoted as $\left \lVert x \right \rVert_2$.
1-Norm: The 1-norm, also known as the Manhattan norm or the taxicab norm, calculates the sum of the absolute values of the components. It is denoted as $\left \lVert x \right \rVert_1$.
Infinity Norm (Max Norm): The infinity norm, also known as the maximum norm or the supremum norm, calculates the maximum absolute value of the components. It is denoted as $\left \lVert x \right \rVert_\infty$.
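Although this post is about notation, the three norms can also be checked numerically; NumPy's `np.linalg.norm` implements each of them via its `ord` argument:

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)                # Euclidean norm: sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, ord=1)         # 1-norm: |3| + |-4| = 7
linf = np.linalg.norm(x, ord=np.inf)  # infinity norm: max(|3|, |-4|) = 4
```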
To write these specific norms in LaTeX, you can add the appropriate subscript e.g. for the Euclidean norm you can write:
\left \lVert x \right \rVert_2
which renders as $\left \lVert x \right \rVert_2$.
You can also create a newcommand
using the same approach as before. For example:
\newcommand{\norm}[2]{\left \lVert #1 \right \rVert_{#2}}
This command allows you to write the norm function with a specified subscript e.g. \norm{x}{2}
renders as $\left \lVert x \right \rVert_2$.
To summarise, we have learned about the norm function in mathematics and how to write it in LaTeX. We have also learned how to write different types of norms, such as the Euclidean norm, the 1-norm, and the infinity norm, among others.