{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Deep Learning and PyTorch\n", "## 11/28/2023" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# OMET Teaching Survey\n", "\n", "Please fill out." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Perceptron\n", "\n", "
\n", "\n", "$$output = \\begin{cases} 0 \\text{ if } w\\cdot x + b \\le 0 \\\\ 1 \\text{ if } w\\cdot x + b > 0 \\end{cases}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Perceptron\n", "\n", "Consider the following perceptron:\n", "\n", "
\n", "\n", "If $x$ takes on only binary values, what are the possible outputs?" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Neurons\n", "\n", "\n", "\n", "Instead of a *binary* output, we set the output to the result of an **activation function** $\\sigma$\n", "\n", "$$output = \\sigma(w\\cdot x + b)$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "%matplotlib inline\n", "x = np.linspace(-10,10,500)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Activation Functions: Step (Perceptron)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(x, x > 0,linewidth=1,clip_on=False);\n", "plt.hlines(xmin=-10,xmax=0,y=0,linewidth=3,color='b')\n", "plt.hlines(xmin=0,xmax=10,y=1,linewidth=3,color='b');" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Activation Functions: Sigmoid (Logistic)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(x, 1/(1+np.exp(-x)),linewidth=4,clip_on=False);\n", "plt.plot(x, 1/(1+np.exp(-2*x)),linewidth=2,clip_on=False);\n", "plt.plot(x, 1/(1+np.exp(-.5*x)),linewidth=2,clip_on=False);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Activation Functions: tanh" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot([-10,10],[0,0],'k--')\n", "plt.plot(x, np.tanh(x),linewidth=4,clip_on=False);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Activation Functions: ReLU\n", "Rectified Linear Unit: $\\sigma(z) = \\max(0,z)$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(x,x*(x > 0),clip_on=False,linewidth=4);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Networks\n", "\n", "\n", "\n", "Terminology alert: networks of neurons are sometimes called *multilayer perceptrons*, despite not using the step function." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Networks\n", "\n", "The number of input neurons corresponds to the number of features.\n", "\n", "The number of output neurons corresponds to the number of label classes. For binary classification, it is common to have two output nodes (one per class), although a single sigmoid output node also works.\n", "\n", "Layers are typically *fully connected*: every node in one layer feeds every node in the next." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Neural Networks\n", "\n", "The universal approximation theorem says that, under mild assumptions on the activation function, a feedforward neural network with a single hidden layer containing a finite number of nodes can approximate any continuous function to within a given error $\\epsilon$ over a bounded input domain.\n", "\n", "The theorem says nothing about the design (number of nodes/layers) of such a network.\n", "\n", "The theorem says nothing about the *learnability* of the weights of such a network.\n", "\n", "These are open theoretical questions.\n", "\n", "Given a network design, how are we going to learn weights for the neurons?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Stochastic Gradient Descent\n", "\n", "\n", "Randomly select a *mini-batch* of $m$ training examples $X_j$ and compute the gradient of the loss function $L$ for each one. Then update every weight and bias by stepping against the average gradient, scaled by a _learning rate_ $\\eta$:\n", "$$ w_k' = w_k - \\frac{\\eta}{m}\\sum_{j=1}^{m} \\frac{\\partial L_{X_j}}{\\partial w_k}$$\n", "$$ b_l' = b_l - \\frac{\\eta}{m}\\sum_{j=1}^{m} \\frac{\\partial L_{X_j}}{\\partial b_l}$$\n", "\n", "Common loss functions: logistic, hinge, cross entropy, Euclidean (squared error)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Loss Functions\n", "\n", "
\n", "\n", "$x = 1$ is a correct prediction and $x = -1$ a wrong prediction." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Backpropagation\n", "\n", "\n", "Backpropagation is an efficient algorithm for computing the partial derivatives needed by the gradient descent update rule. For a training example $x$ and loss function $L$ in a network with $N$ layers:\n", "\n", "1. **Feedforward**. Starting from $a^{0} = x$, for each layer $l$ compute\n", " $$z^{l} = w^{l} a^{l-1} + b^{l}, \\qquad a^{l} = \\sigma(z^{l})$$\n", " where $z^{l}$ is the weighted input and $a^{l}$ is the activation induced by $x$ (these are vectors representing all nodes of layer $l$).\n", " \n", "2. **Compute output error**\n", "$$\\delta^{N} = \\nabla_a L \\odot \\sigma'(z^N)$$\n", "where $(\\nabla_a L)_j = \\partial L / \\partial a^N_j$, the gradient of the loss with respect to the output activations. $\\odot$ is the elementwise product.\n", "\n", "3. **Backpropagate the error** for $l = N-1, \\ldots, 1$\n", "$$\\delta^{l} = ((w^{l+1})^T \\delta^{l+1}) \\odot \\sigma'(z^{l})$$\n", " \n", "4. **Calculate gradients**\n", "$$\\frac{\\partial L}{\\partial w^l_{jk}} = a^{l-1}_k \\delta^l_j \\text{ and } \\frac{\\partial L}{\\partial b^l_j} = \\delta^l_j$$\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Backpropagation as the Chain Rule\n", "\n", " \n", "\n", "$$\\frac{\\partial L}{\\partial x} = \\frac{\\partial L}{\\partial a^N} \\cdot \\frac{\\partial a^N}{\\partial z^N} \\cdot \\frac{\\partial z^N}{\\partial a^{N-1}} \\cdot \\frac{\\partial a^{N-1}}{\\partial z^{N-1}} \\cdot \\frac{\\partial z^{N-1}}{\\partial a^{N-2}} \\cdots \\frac{\\partial a^{1}}{\\partial z^{1}} \\cdot \\frac{\\partial z^{1}}{\\partial x} $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Deep Learning\n", "\n", "
\n", "\n", "A deep network is not more powerful in principle (recall that a network with a single hidden layer can already approximate any continuous function), but it may be more concise: some functions can be approximated by a deep network with far fewer nodes than a shallow one needs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolutional Neural Nets\n", "\n", "
\n", "\n", "Image recognition challenge results. Purple entries are deep learning methods." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolution Filters\n", "\n", "A filter applies a *convolution kernel* to an image.\n", "\n", "The kernel is represented by an $n \\times n$ matrix whose center is placed over the target pixel.\n", "\n", "The output of the filter at that pixel is the sum of the products of the matrix elements with the corresponding pixels.\n", "\n", "Examples from [Wikipedia](https://en.wikipedia.org/wiki/Kernel_(image_processing)):\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
Identity, Blur, Edge Detection
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Feature Maps\n", "\n", "We can think of a kernel as identifying a *feature* in an image, and of the resulting image as a *feature map* that has high values (white) where the feature is present and low values (black) elsewhere.\n", "\n", "*Feature maps retain the **spatial relationship** between features present in the original image.*\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolutional Layers\n", "\n", "A single kernel is slid across the entire input, so each output feature map has a single, shared set of weights." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolutional Layers\n", "\n", "For images, each pixel is an input feature. Each hidden layer is a set of feature maps.\n", "\n", "
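As a rough sketch of the sum-of-products filter and of weight sharing, the NumPy code below slides one 3x3 kernel over a small image: every output pixel reuses the same 9 weights, and each kernel yields one feature map. The helper name `conv2d_valid` is hypothetical, and the identity, blur, and edge kernels are common textbook choices assumed here for illustration. Like PyTorch convolutional layers, it does not flip the kernel (strictly, this is cross-correlation), and it assumes no padding and a stride of 1.

```python
import numpy as np


def conv2d_valid(image, kernel):
    # Slide a single n x n kernel over the image with stride 1 and no padding.
    # Each output value is the sum of elementwise products between the kernel
    # and the image patch centred on pixel (i + n//2, j + n//2).
    n = kernel.shape[0]
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - n + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + n, j:j + n]
            out[i, j] = np.sum(patch * kernel)  # same n*n weights at every position
    return out


# Illustrative 3x3 kernels (assumed textbook choices).
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
blur = np.ones((3, 3)) / 9.0
edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)

# A 6x6 test image: dark (0) left half, bright (1) right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

for name, kernel in [('identity', identity), ('blur', blur), ('edge', edge)]:
    feature_map = conv2d_valid(image, kernel)  # 4x4 output
    print(name)
    print(feature_map)
```

In the edge feature map, each row is zero except in the two columns that straddle the dark/bright boundary, which is the sense in which a kernel detects a feature. A convolutional layer learns its kernel values instead of fixing them, and produces one such map per output channel.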
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Pooling\n", "\n", "Pooling layers apply a fixed, non-learned sliding-window operation (usually the non-linear MAX kernel). The kernel is usually applied with a *stride* to reduce the size of the layer. Benefits:\n", " * faster to train\n", " * fewer parameters to fit in later (fully connected) layers\n", " * less sensitive to small shifts in the input (MAX)\n", "
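Max pooling with a stride can be sketched the same way. The hypothetical helper `max_pool2d` below assumes a 2x2 window and stride 2 (the settings MyNet uses later with `F.max_pool2d`), which halves each spatial dimension and involves no learnable weights. Moving the largest value to another pixel inside the same window leaves the output unchanged, which illustrates the insensitivity to small changes listed above.

```python
import numpy as np


def max_pool2d(feature_map, size=2, stride=2):
    # Fixed (non-learned) MAX kernel: each output value is the largest entry
    # in a size x size window, and the window jumps `stride` pixels at a time.
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = feature_map[r:r + size, c:c + size].max()
    return out


fmap = np.array([[1, 3, 0, 2],
                 [4, 2, 1, 0],
                 [0, 1, 5, 6],
                 [2, 0, 7, 1]], dtype=float)
pooled = max_pool2d(fmap)  # 4x4 -> 2x2, window maxima 4, 2, 2, 7
print(pooled)

# Move the 4 in the top-left window from (1, 0) to (0, 0); the pooled map is unchanged.
shifted = fmap.copy()
shifted[0, 0], shifted[1, 0] = 4.0, 1.0
print(np.array_equal(max_pool2d(shifted), pooled))  # True
```

On a real batch, `F.max_pool2d(x, kernel_size=2, stride=2)` applies this operation independently to every feature map.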
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Consider an input image with 100 pixels. In a classic neural network, we hook these pixels up to a hidden layer with 10 nodes. In a CNN, we hook these pixels up to a convolutional layer with a 3x3 kernel and 10 output feature maps." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "The last feature maps are typically connected to one or more fully-connected layers to produce the desired output." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# PyTorch\n", "\n", "PyTorch creates the dataflow graph implicitly as operations are performed.\n", "\n", "* Dataflow graph can be parallelized\n", "* Dataflow graph maintains **autograd** information: how to compute gradients for backpropagation\n", "* Extremely flexible\n", "* Easier to debug and develop than the static-graph approach" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# PyTorch Tensors\n", "\n", "A tensor is very similar in functionality to a `numpy.ndarray`, but it\n", " * is allocated to a device (CPU vs GPU)\n", " * potentially maintains autograd information" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[0.6449, 0.8679, 0.7000, 0.7213],\n", " [0.7735, 0.5921, 0.2053, 0.5418],\n", " [0.5145, 0.2320, 0.5560, 0.0838]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch # note package is not called pytorch\n", "\n", "T = torch.rand(3,4)\n", "T" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(torch.Size([3, 4]), torch.float32, device(type='cpu'), False)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.shape,T.dtype,T.device,T.requires_grad" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Modules vs Functional\n", "\n", "**Modules** are objects that can be initialized with default parameters and store any learnable parameters. Learnable parameters can be easily extracted from the module (and any member modules). Modules are called as functions on their inputs.\n", "\n", "**Functional** APIs maintain no state. 
All parameters are passed when the function is called." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import torch.nn as nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# A network is a module\n", "\n", "To define a network, we create a module with submodules for the operations that have learnable parameters. Operations without learnable parameters generally use the functional API." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "class MyNet(nn.Module):\n", "    def __init__(self):  # initialize submodules here - this defines our network architecture\n", "        super().__init__()\n", "        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)  # 1x28x28 -> 32x28x28\n", "        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1)  # 32x14x14 -> 64x12x12\n", "        self.fc1 = nn.Linear(2304, 10)  # 64 feature maps * 6 * 6 = 2304 inputs -> 10 digit classes\n", "\n", "    def forward(self, x):  # this actually applies the operations\n", "        x = self.conv1(x)\n", "        x = F.relu(x)\n", "        x = F.max_pool2d(x, kernel_size=2, stride=2)  # POOL: 28x28 -> 14x14\n", "        x = self.conv2(x)\n", "        x = F.relu(x)\n", "        x = F.max_pool2d(x, kernel_size=2, stride=2)  # POOL: 12x12 -> 6x6\n", "        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest\n", "        x = self.fc1(x)\n", "        return x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# MNIST" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from torchvision import datasets\n", "train_data = datasets.MNIST(root='../data', train=True, download=True)\n", "test_data = datasets.MNIST(root='../data', train=False)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(, 5)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data[0]" ] }, { 
"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAABAElEQVR4nGNgGMyAWUhIqK5jvdSy/9/rGRgYGFhgEnJsVjYCwQwMDAxPJgV+vniQgYGBgREqZ7iXH8r6l/SV4dn7m8gmCt3++/fv37/Htn3/iMW+gDnZf/+e5WbQnoXNNXyMs/5GoQoxwVmf/n9kSGFiwAW49/11wynJoPzx4YIcRlyygR/+/i2XxCWru+vv32nSuGQFYv/83Y3b4p9/fzpAmSyoMnohpiwM1w5h06Q+5enfv39/bcMiJVF09+/fv39P+mFKiTtd/fv3799jgZiBJLT69t+/f/8eDuDEkDJf8+jv379/v7Ryo4qzMDAwMAQGMjBc3/y35wM2V1IfAABFF16Aa0wAOwAAAABJRU5ErkJggg==", "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data[0][0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Inputs need to be tensors..." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "from torchvision import transforms\n", "train_data = datasets.MNIST(root='../data', train=True,transform=transforms.ToTensor())\n", "test_data = datasets.MNIST(root='../data', train=False,transform=transforms.ToTensor())" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0118, 0.0706, 0.0706, 0.0706,\n", " 0.4941, 0.5333, 0.6863, 0.1020, 0.6510, 1.0000, 0.9686, 0.4980,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.1176, 0.1412, 0.3686, 0.6039, 0.6667, 0.9922, 0.9922, 0.9922,\n", " 0.9922, 0.9922, 0.8824, 0.6745, 0.9922, 0.9490, 0.7647, 0.2510,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1922,\n", " 0.9333, 0.9922, 0.9922, 0.9922, 0.9922, 0.9922, 0.9922, 0.9922,\n", " 0.9922, 0.9843, 0.3647, 0.3216, 0.3216, 0.2196, 0.1529, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0706,\n", " 0.8588, 0.9922, 0.9922, 0.9922, 0.9922, 0.9922, 0.7765, 0.7137,\n", " 0.9686, 0.9451, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.3137, 0.6118, 0.4196, 0.9922, 0.9922, 0.8039, 0.0431, 0.0000,\n", " 0.1686, 0.6039, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0549, 0.0039, 0.6039, 0.9922, 0.3529, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.5451, 0.9922, 0.7451, 0.0078, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0431, 0.7451, 0.9922, 0.2745, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.1373, 0.9451, 0.8824, 0.6275,\n", " 0.4235, 0.0039, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3176, 0.9412, 0.9922,\n", " 0.9922, 0.4667, 0.0980, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1765, 0.7294,\n", " 0.9922, 0.9922, 0.5882, 0.1059, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0627,\n", " 0.3647, 0.9882, 0.9922, 0.7333, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.9765, 0.9922, 0.9765, 0.2510, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1804, 0.5098,\n", " 0.7176, 0.9922, 0.9922, 0.8118, 0.0078, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.1529, 0.5804, 0.8980, 0.9922,\n", " 0.9922, 0.9922, 0.9804, 0.7137, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0941, 0.4471, 0.8667, 0.9922, 0.9922, 0.9922,\n", " 0.9922, 0.7882, 0.3059, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0902, 0.2588, 0.8353, 0.9922, 0.9922, 0.9922, 0.9922, 0.7765,\n", " 0.3176, 0.0078, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0706, 0.6706,\n", " 0.8588, 0.9922, 0.9922, 0.9922, 0.9922, 0.7647, 0.3137, 0.0353,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.2157, 0.6745, 0.8863, 0.9922,\n", " 0.9922, 0.9922, 0.9922, 0.9569, 0.5216, 0.0431, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.5333, 0.9922, 0.9922, 0.9922,\n", " 0.8314, 0.5294, 0.5176, 0.0627, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,\n", " 0.0000, 0.0000, 0.0000, 0.0000]]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data[0][0]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.imshow(train_data[0][0][0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Training MNIST" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "#process 10 randomly sampled images at a time\n", "train_loader = torch.utils.data.DataLoader(train_data,batch_size=10,shuffle=True)\n", "test_loader = torch.utils.data.DataLoader(test_data,batch_size=10,shuffle=False)\n", "\n", "#instantiate our neural network and put it on the GPU\n", "model = MyNet().to('cuda')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[tensor([[[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]],\n", " \n", " \n", " [[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]],\n", " \n", " \n", " [[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]],\n", " \n", " \n", " ...,\n", " \n", " \n", " [[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]],\n", " \n", " \n", " [[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " 
[0., 0., 0., ..., 0., 0., 0.]]],\n", " \n", " \n", " [[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]]]),\n", " tensor([3, 4, 2, 2, 0, 0, 4, 8, 4, 7])]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch = next(iter(train_loader))\n", "batch" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "tensor([[-0.0618, 0.0725, 0.0696, 0.0907, 0.0204, -0.1858, 0.0752, 0.0432,\n", " -0.1025, -0.0215],\n", " [-0.0696, 0.0885, -0.0112, 0.1133, 0.0270, -0.1915, 0.1485, -0.0218,\n", " -0.1089, 0.0137],\n", " [-0.0315, 0.0997, 0.0130, 0.0563, 0.0575, -0.1342, 0.1309, 0.0970,\n", " -0.0519, 0.0064],\n", " [-0.0242, 0.0943, 0.0179, 0.0685, 0.0844, -0.2072, 0.0581, 0.1100,\n", " -0.1413, 0.0031],\n", " [-0.0648, 0.0661, 0.0950, 0.0312, 0.0466, -0.1491, 0.0859, 0.0665,\n", " -0.0603, 0.0305],\n", " [-0.0315, 0.0449, 0.0610, 0.1021, 0.0320, -0.1344, 0.1114, 0.0320,\n", " -0.0648, 0.0369],\n", " [-0.0270, 0.0672, 0.0473, 0.0491, 0.0293, -0.1694, 0.0731, 0.0291,\n", " -0.1106, 0.0358],\n", " [-0.0456, 0.0428, 0.0516, 0.0689, 0.0399, -0.2126, 0.0604, 0.0615,\n", " -0.0844, -0.0294],\n", " [-0.0477, 0.0740, 0.0625, 0.0808, 0.0134, -0.1671, 0.1014, 0.0406,\n", " -0.1062, 0.0309],\n", " [-0.0168, 0.0673, 0.0582, 0.0483, 0.0313, -0.1919, 0.0552, 0.0547,\n", " -0.0937, 0.0116]], device='cuda:0', grad_fn=)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output = model(batch[0].to('cuda')) # model is on GPU, so must put input there too\n", "output" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Training MNIST\n", "\n", "Our network takes an image (as a tensor) and outputs 
class scores (unnormalized logits), not probabilities; a softmax converts the scores into probabilities.\n", " * Need a loss\n", " * Need an optimizer (e.g. SGD, Adam)\n", " * `backward` does _not_ update parameters" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(2.3067, device='cuda:0', grad_fn=)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loss = F.cross_entropy(output,batch[1].to('cuda')) # combines log softmax and negative log-likelihood loss\n", "loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L(x,\\mathrm{class}) = - \\log\\left(\\frac{e^{x_{\\mathrm{class}}}}{\\sum_j e^{x_j}}\\right)$$" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "loss.backward() # computes and stores .grad for each parameter, but does not change the parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Training MNIST\n", "\n", "**Epoch**: one pass through the training data." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3min 44s, sys: 1.69 s, total: 3min 46s\n", "Wall time: 3min 50s\n" ] } ], "source": [ "%%time\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.00001) # need to tell optimizer what it is optimizing\n", "\n", "losses = []\n", "for epoch in range(10):\n", "    for i, (img,label) in enumerate(train_loader):\n", "        optimizer.zero_grad() # IMPORTANT! gradients accumulate across backward calls\n", "        img, label = img.to('cuda'), label.to('cuda')\n", "        output = model(img)\n", "        loss = F.cross_entropy(output, label)\n", "        loss.backward()\n", "        optimizer.step()\n", "        losses.append(loss.item())" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(losses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the **batch loss**: the loss on each individual 10-image batch, not on the whole training set." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Testing MNIST" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy 0.9684\n" ] } ], "source": [ "correct = 0\n", "with torch.no_grad():  # no gradients needed: we never call backward here, so skip tracking to save memory and time\n", "    for img, label in test_loader:\n", "        img, label = img.to('cuda'), label.to('cuda')\n", "        output = F.softmax(model(img), dim=1)  # logits -> class probabilities\n", "        pred = output.argmax(dim=1, keepdim=True)  # index of the most probable class\n", "        correct += pred.eq(label.view_as(pred)).sum().item()\n", "\n", "print(\"Accuracy\", correct/len(test_loader.dataset))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Some Failures\n", "\n", "*Not from this particular network*\n", "\n", "
\n", "Top label is the true label; bottom label is the CNN's prediction." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Generative vs. Discriminative\n", "\n", "A *generative* model produces as output what a discriminative model takes as input: it models $P(X|Y=y)$ *or* $P(X,Y)$, whereas a discriminative model models $P(Y|X)$.\n", "\n", "
\n", "$y \\rightarrow$ Model $\\rightarrow X$\n", "\n", "Model $\\rightarrow X,y$
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Autoencoders\n", "\n", "A neural network trained to reproduce its own input at its output, typically by passing it through a narrow hidden layer (the *latent* representation).\n", "\n", "\n", "\n", "https://en.wikipedia.org/wiki/Autoencoder" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Latent Space\n", "\n", "http://blog.fastforwardlabs.com/2016/08/12/introducing-variational-autoencoders-in-prose-and.html\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Latent Space Arithmetic\n", "\n", "https://arxiv.org/pdf/1707.05776.pdf\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Latent Spaces for Molecules\n", "\n", "https://arxiv.org/abs/1610.02415\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "1% to 70% of outputs are valid SMILES strings\n", "\n", "\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Generative Models of the Cell\n", "\n", "https://arxiv.org/pdf/1705.00092.pdf\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "https://drive.google.com/file/d/0B2tsfjLgpFVhMnhwUVVuQnJxZTg/view" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Generative Adversarial Networks\n", "\n", "https://arxiv.org/abs/1406.2661\n", "\n", "https://youtu.be/G06dEcZ-QTg\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "https://thispersondoesnotexist.com\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# pix2pix\n", "\n", "\n", "\n", "https://affinelayer.com/pixsrv/\n", "\n", "https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# CycleGAN\n", "\n", "https://medium.com/coding-blocks/introduction-to-cyclegans-1dbdb8fbe781\n", "\n", "\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-from-its-creators-to-cheat-at-its-appointed-task/\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Deep learning is not profound learning.\n", "\n", "But it is quite powerful and flexible." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 1 }