{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# numpy: arrays and functions\n", "## 09/12/2023\n", "\n", "print view
\n", "notebook" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Arrays\n", "\n", "`numpy` arrays are dense, contiguous, uniformly sized blocks of identically typed data values" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "L = [[0,1],[2,3]]\n", "A = np.array(L)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L: [[0, 1], [2, 3]]\n", "A:\n", " [[0 1]\n", " [2 3]]\n" ] } ], "source": [ "print(\"L:\",L)\n", "print(\"A:\\n\",A)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \n" ] } ], "source": [ "print(type(L),type(A))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Array Memory Layout" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Array Memory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the standard python interpretter, the return value of `id` is the memory address of the object." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4448783488\n" ] } ], "source": [ "print(id(L))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-128\n" ] } ], "source": [ "print(id(L[1])-id(L[0])) #rows are far away" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "32\n" ] } ], "source": [ "print(id(L[0][1])-id(L[0][0])) #columns not so much, but 32 bytes?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Why does this matter?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "Keeping data close together results in faster access times.\n", " * It's easier to figure out the location of the data\n", " * The data is more likely to fit in the processor's *cache*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## If you have a *block* of *dense* numerical data, store it in a `numpy` array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Creating `numpy` Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `np.ndarray` and `np.array` are the same thing." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([1,2,3,4])\n", "A.dtype #type of what is stored in the array - NOT python types!" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.ndim #number of dimensions (axes in numpy speak)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4,)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.shape #size of the dimensions as a tuple" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 1)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.reshape((4,1)).shape #a column vector" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "
\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "A = np.array([1,2,3,4]).reshape(4,1)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Initializing numpy Arrays" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64 (2, 3)\n" ] } ], "source": [ "#can initialize an array with a list, or list of lists (or list of lists of lists, etc)\n", "M = np.array([[1,2,3],[4,5,6.0]])\n", "print(M.dtype,M.shape)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "#if you know the size, but not the data, you can initialize to zeros:\n", "Z = np.zeros((10,10))\n", "#or ones\n", "O = np.ones((5,10))\n", "#or identity\n", "I = np.identity(3) #this makes a 3x3 square identity matrix" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] } ], "source": [ "print(Z.dtype) #note, default type is floating point" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int64\n" ] } ], "source": [ "Z = np.zeros((10,10),np.int64) #the dtype can be specified explicitly\n", "print(Z.dtype)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Indexing and Slicing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "`numpy` arrays can be indexed and sliced a lot like Python lists, but take a **tuple** of indices or slices, one per dimension." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2],\n", " [3, 4, 5]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.array([[0,1,2],[3,4,5]])\n", "M" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n", "2\n" ] } ], "source": [ "print(M[1,1]) #indexing\n", "print(M[0,-1]) #last item of first row" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2]\n" ] } ], "source": [ "print(M[0,1:]) #can have slices - all but first column of first row" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3 4 5] [3 4 5]\n" ] } ], "source": [ "print(M[1],M[1,:]) #missing indices are treated as complete slices" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "M = [[0,1,2],[3,4,5]]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Advanced Slicing: Integer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` arrays support advanced indexing by arrays of integers or booleans:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "A = np.array([0,1,4,9,16,25])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 4 25]\n" ] } ], "source": [ "print(A[[2,5]]) #choose just indices 2 and 5" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Advanced Slicing: Boolean\n", "\n", "Indexing by **Boolean** *numpy arrays* can be used to select elements" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False True True True]\n" ] } ], "source": [ "b = A > 4\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 9 16 25]\n" ] } ], "source": [ "print(A[b])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Slicing Assignment" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b = [False False False True True True]\n" ] } ], "source": [ "print(\"b =\",b)\n", "A[b] = 0" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 4 0 0 0]\n" ] } ], "source": [ "print(A)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "S = np.array(['a','b','c','b','a'])\n", "S[S != 'a'] = 'z'" ] }, { "cell_type": "code", 
"execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Array Views vs. Copies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * A `numpy` array object has a pointer to a dense block of memory that stores the data of the array.\n", " * *Basic* slices are just *views* of this data - they are **not** a new copy. \n", " * Binding the same object to different variables will **not** create a copy.\n", " * *Advanced* slices will create a copy if bound to a new variable - these are cases where the result may contain elements that are not contiguous in the original array " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Views" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "A = np.array([[0,1,2],[3,4,5],[6,7,8]])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = A #A and B reference the _same_ object\n", "A is B" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1000, 1, 2],\n", " [ 3, 4, 5],\n", " [ 6, 7, 8]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[0,0] = 1000\n", "A" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Sliced Views" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4, 5])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row = A[1,:]\n", "row" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1000, 1, 2],\n", " [ 3, 4, 5000],\n", " [ 6, 7, 8]])" ] }, "execution_count": 38, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "row[2] = 5000\n", "A" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Explicit Copy" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1000, 1, 2],\n", " [ 3, 4, 5000],\n", " [ 6, 7, 8]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newMat = A.copy() #this will actually copy the data\n", "newMat[0,0] = 0\n", "A" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2],\n", " [ 3, 4, 5000],\n", " [ 6, 7, 8]])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newMat" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Advanced Slices Copy" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 6, 7, 8])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[0,1,2],[3,4,5],[6,7,8]])\n", "B = A[A > 4]\n", "B" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1, -1, -1, -1])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[:] = -1\n", "B" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2],\n", " [3, 4, 5],\n", " [6, 7, 8]])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "but..." 
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2],\n", " [ 3, 4, -1],\n", " [-1, -1, -1]])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[A > 4] = -1\n", "A" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def z(M):\n", " M[:] = 0\n", "A = np.array([1,2,3])\n", "z(A)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Functions on Arrays\n", "\n", "`numpy` includes a number of standard functions that will work on arrays" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.5" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [1,2,3,4]\n", "np.mean(A)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(A)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sin(A)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Axis\n", "\n", "Most aggregation operations take an `axis` parameter that limits the operation to a specific direction in the array\n", "* axis 0: across rows (apply operation to individual columns)\n", "* axis 1: across columns (apply operation to individual rows)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.arange(12).reshape(3,4); b" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "66" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(b)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([12, 15, 18, 21])" ] }, "execution_count": 
52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(b,axis=0)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6, 22, 38])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(b,axis=1)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Loading Data\n", "\n", "`genfromtxt` (and the simpler `loadtxt`) will read in delimited files." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([nan, nan, nan, ..., nan, nan, nan])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.genfromtxt('../files/Spellman.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default delimiter is *whitespace*, which will not work with a CSV file." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ nan, 4.0000000e+01, 5.0000000e+01, ...,\n", " 2.4000000e+02, 2.5000000e+02, 2.6000000e+02],\n", " [ nan, -7.0000000e-02, -2.3000000e-01, ...,\n", " 5.7000000e-01, 0.0000000e+00, 1.0000000e-02],\n", " [ nan, 2.1500000e-01, 9.0000000e-02, ...,\n", " -1.0000000e-01, 2.7000000e-01, 2.3500001e-01],\n", " ...,\n", " [ nan, -2.5500000e-01, -3.6000000e-01, ...,\n", " 8.4000000e-01, -3.9000000e-01, -4.1500000e-01],\n", " [ nan, 5.7000000e-01, 1.2000000e-01, ...,\n", " -1.2000000e-01, 6.9000000e-01, 5.5500000e-01],\n", " [ nan, 4.0500000e-01, 1.7000000e-01, ...,\n", " -8.0000000e-02, 6.5000000e-01, 5.2000000e-01]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.genfromtxt('../files/Spellman.csv',delimiter=',')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Why `nan`? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Loading Data\n", "\n", "Recall that numpy arrays are dense, uniformly typed arrays. You can't mix a gene name (string) with expression values (float) in the same array." 
] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([['time', '40', '50', ..., '240', '250', '260'],\n", " ['YAL001C', '-0.07', '-0.23', ..., '0.57', '0', '0.01'],\n", " ['YAL014C', '0.215', '0.09', ..., '-0.1', '0.27', '0.23500001'],\n", " ...,\n", " ['YPR201W', '-0.255', '-0.36', ..., '0.84', '-0.39', '-0.415'],\n", " ['YPR203W', '0.57', '0.12', ..., '-0.12', '0.69', '0.555'],\n", " ['YPR204W', '0.405', '0.17', ..., '-0.08', '0.65', '0.52']],\n", " dtype='\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
\n", "" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function genfromtxt in module numpy:\n", "\n", "genfromtxt(fname, dtype=, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=\" !#$%&'()*+,-./:;<=>?@[\\\\]^{|}~\", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, ndmin=0, like=None)\n", " Load data from a text file, with missing values handled as specified.\n", " \n", " Each line past the first `skip_header` lines is split at the `delimiter`\n", " character, and characters following the `comments` character are discarded.\n", " \n", " Parameters\n", " ----------\n", " fname : file, str, pathlib.Path, list of str, generator\n", " File, filename, list, or generator to read. If the filename\n", " extension is ``.gz`` or ``.bz2``, the file is first decompressed. Note\n", " that generators must return bytes or strings. The strings\n", " in a list or produced by a generator are treated as lines.\n", " dtype : dtype, optional\n", " Data type of the resulting array.\n", " If None, the dtypes will be determined by the contents of each\n", " column, individually.\n", " comments : str, optional\n", " The character used to indicate the start of a comment.\n", " All the characters occurring on a line after a comment are discarded.\n", " delimiter : str, int, or sequence, optional\n", " The string used to separate values. By default, any consecutive\n", " whitespaces act as delimiter. An integer or sequence of integers\n", " can also be provided as width(s) of each field.\n", " skiprows : int, optional\n", " `skiprows` was removed in numpy 1.10. 
Please use `skip_header` instead.\n", " skip_header : int, optional\n", " The number of lines to skip at the beginning of the file.\n", " skip_footer : int, optional\n", " The number of lines to skip at the end of the file.\n", " converters : variable, optional\n", " The set of functions that convert the data of a column to a value.\n", " The converters can also be used to provide a default value\n", " for missing data: ``converters = {3: lambda s: float(s or 0)}``.\n", " missing : variable, optional\n", " `missing` was removed in numpy 1.10. Please use `missing_values`\n", " instead.\n", " missing_values : variable, optional\n", " The set of strings corresponding to missing data.\n", " filling_values : variable, optional\n", " The set of values to be used as default when the data are missing.\n", " usecols : sequence, optional\n", " Which columns to read, with 0 being the first. For example,\n", " ``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.\n", " names : {None, True, str, sequence}, optional\n", " If `names` is True, the field names are read from the first line after\n", " the first `skip_header` lines. This line can optionally be preceded\n", " by a comment delimiter. If `names` is a sequence or a single-string of\n", " comma-separated names, the names will be used to define the field names\n", " in a structured dtype. If `names` is None, the names of the dtype\n", " fields will be used, if any.\n", " excludelist : sequence, optional\n", " A list of names to exclude. This list is appended to the default list\n", " ['return','file','print']. 
Excluded names are appended with an\n", " underscore: for example, `file` would become `file_`.\n", " deletechars : str, optional\n", " A string combining invalid characters that must be deleted from the\n", " names.\n", " defaultfmt : str, optional\n", " A format used to define default field names, such as \"f%i\" or \"f_%02i\".\n", " autostrip : bool, optional\n", " Whether to automatically strip white spaces from the variables.\n", " replace_space : char, optional\n", " Character(s) used in replacement of white spaces in the variable\n", " names. By default, use a '_'.\n", " case_sensitive : {True, False, 'upper', 'lower'}, optional\n", " If True, field names are case sensitive.\n", " If False or 'upper', field names are converted to upper case.\n", " If 'lower', field names are converted to lower case.\n", " unpack : bool, optional\n", " If True, the returned array is transposed, so that arguments may be\n", " unpacked using ``x, y, z = genfromtxt(...)``. When used with a\n", " structured data-type, arrays are returned for each field.\n", " Default is False.\n", " usemask : bool, optional\n", " If True, return a masked array.\n", " If False, return a regular array.\n", " loose : bool, optional\n", " If True, do not raise errors for invalid values.\n", " invalid_raise : bool, optional\n", " If True, an exception is raised if an inconsistency is detected in the\n", " number of columns.\n", " If False, a warning is emitted and the offending lines are skipped.\n", " max_rows : int, optional\n", " The maximum number of rows to read. Must not be used with skip_footer\n", " at the same time. If given, the value must be at least 1. Default is\n", " to read the entire file.\n", " \n", " .. versionadded:: 1.10.0\n", " encoding : str, optional\n", " Encoding used to decode the inputfile. Does not apply when `fname` is\n", " a file object. 
The special value 'bytes' enables backward compatibility\n", " workarounds that ensure that you receive byte arrays when possible\n", " and passes latin1 encoded strings to converters. Override this value to\n", " receive unicode arrays and pass strings as input to converters. If set\n", " to None the system default is used. The default value is 'bytes'.\n", " \n", " .. versionadded:: 1.14.0\n", " ndmin : int, optional\n", " Same parameter as `loadtxt`\n", " \n", " .. versionadded:: 1.23.0\n", " like : array_like, optional\n", " Reference object to allow the creation of arrays which are not\n", " NumPy arrays. If an array-like passed in as ``like`` supports\n", " the ``__array_function__`` protocol, the result will be defined\n", " by it. In this case, it ensures the creation of an array object\n", " compatible with that passed in via this argument.\n", " \n", " .. versionadded:: 1.20.0\n", " \n", " Returns\n", " -------\n", " out : ndarray\n", " Data read from the text file. If `usemask` is True, this is a\n", " masked array.\n", " \n", " See Also\n", " --------\n", " numpy.loadtxt : equivalent function when no data is missing.\n", " \n", " Notes\n", " -----\n", " * When spaces are used as delimiters, or when no delimiter has been given\n", " as input, there should not be any missing data between two fields.\n", " * When the variables are named (either by a flexible dtype or with `names`),\n", " there must not be any header in the file (else a ValueError\n", " exception is raised).\n", " * Individual values are not stripped of spaces by default.\n", " When using a custom converter, make sure the function does remove spaces.\n", " \n", " References\n", " ----------\n", " .. 
[1] NumPy User Guide, section `I/O with NumPy\n", " `_.\n", " \n", " Examples\n", " --------\n", " >>> from io import StringIO\n", " >>> import numpy as np\n", " \n", " Comma delimited file with mixed dtype\n", " \n", " >>> s = StringIO(u\"1,1.3,abcde\")\n", " >>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),\n", " ... ('mystring','S5')], delimiter=\",\")\n", " >>> data\n", " array((1, 1.3, b'abcde'),\n", " dtype=[('myint', '>> _ = s.seek(0) # needed for StringIO example only\n", " >>> data = np.genfromtxt(s, dtype=None,\n", " ... names = ['myint','myfloat','mystring'], delimiter=\",\")\n", " >>> data\n", " array((1, 1.3, b'abcde'),\n", " dtype=[('myint', '>> _ = s.seek(0)\n", " >>> data = np.genfromtxt(s, dtype=\"i8,f8,S5\",\n", " ... names=['myint','myfloat','mystring'], delimiter=\",\")\n", " >>> data\n", " array((1, 1.3, b'abcde'),\n", " dtype=[('myint', '>> s = StringIO(u\"11.3abcde\")\n", " >>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],\n", " ... delimiter=[1,3,5])\n", " >>> data\n", " array((1, 1.3, b'abcde'),\n", " dtype=[('intvar', '>> f = StringIO('''\n", " ... text,# of chars\n", " ... hello world,11\n", " ... numpy,5''')\n", " >>> np.genfromtxt(f, dtype='S12,S12', delimiter=',')\n", " array([(b'text', b''), (b'hello world', b'11'), (b'numpy', b'5')],\n", " dtype=[('f0', 'S12'), ('f1', 'S12')])\n", "\n" ] } ], "source": [ "help(np.genfromtxt)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Normalization\n", "\n", "## Q1: How would you rescale your data to range from 0 to 1?\n", "\n", "## Q2: How would you rescale your data to have zero mean and unit standard deviation?" 
] }, { "cell_type": "code", "execution_count": 62, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[0.51995439, 0.50171038, 0.51653364, ..., 0.59293044, 0.52793615,\n", " 0.5290764 ],\n", " [0.55245154, 0.5381984 , 0.53078677, ..., 0.51653364, 0.55872292,\n", " 0.55473204],\n", " [0.54503991, 0.54503991, 0.55302166, ..., 0.48916762, 0.55644242,\n", " 0.54960091],\n", " ...,\n", " [0.49885975, 0.48688712, 0.49372862, ..., 0.62371722, 0.48346636,\n", " 0.48061574],\n", " [0.59293044, 0.54161916, 0.51995439, ..., 0.51425314, 0.60661345,\n", " 0.59122007],\n", " [0.57411631, 0.54732041, 0.52280502, ..., 0.51881414, 0.60205245,\n", " 0.58722919]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(values-values.min())/(values.max()-values.min())" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.9999999999999999" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std((values-values.mean())/values.std())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Project: Expression Data\n", "\n", "https://MSCBIO2025.github.io/files/Spellman.csv\n", "\n", "* Read this data into a numpy array\n", "* Plot a histogram of the expression values for the first time point\n", "* Plot a histogram of the expression values for the last time point\n", "* Plot a histogram of the average expression value for the genes across all time points\n", "* Plot the average expression value (across all genes) at each time point as a line graph\n", "* Plot two series of average expression values: one for all genes where the first value is positive and the other for all genes where the first value is negative\n" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "slideshow": { 
"slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOWUlEQVR4nO3df6jd9X3H8edrzvlHLajkLktjXKRkg3R0sVys0P5hcZu/xqKjE/1DXedICxEUhBEVZmEIga5269iEdIoWnFZQMaxh0wbB9Q+tUcQfia6hjZgQTVo3FYSO6Ht/3K96TO/NPeeec+6593OfDzjc7/fz/XHe3+Tklc/9fH+cVBWSpLb8xqQLkCSNnuEuSQ0y3CWpQYa7JDXIcJekBv3mpAsAWLVqVa1fv37SZUjSsvLss8/+oqqmZlu2JMJ9/fr17NmzZ9JlSNKykuS1uZY5LCNJDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ1aEneoSurf+m0//Gj6wPZLJ1iJljJ77pLUIMNdkhpkuEtSgxxzlxrkuLzsuUtSgwx3SWqQwzJSI3qHYiR77pLUIMNdkhpkuEtSgxxzl5Yxx9k1l3l77knWJXkiyd4kLye5oWv/ZpJDSZ7vXpf0bHNzkv1JXk1y4TgPQNKJrd/2w49eWjn66bkfA26qqueSfBp4Nsnj3bLvVNXf966cZCNwJfA54DPAj5L8XlW9P8rCJUlzm7fnXlWHq+q5bvpdYB+w9gSbbAYeqKpfVdXPgf3AuaMoVpLUn4FOqCZZD5wDPN01XZ/khSR3Jzm9a1sLvN6z2UFm+c8gyZYke5LsOXr06OCVS5Lm1He4JzkVeAi4sareAe4EPgtsAg4D3x7kjatqR1VNV9X01NTUIJtKkubRV7gnOZmZYL+vqh4GqKo3q+r9qvoA+B4fD70cAtb1bH5m1yZJWiT9XC0T4C5gX1Xd0dO+pme1y4GXuumdwJVJTklyNrAB+MnoSpYkzaefq2W+BFwNvJjk+a7tFuCqJJuAAg4AXweoqpeTPAjsZeZKm61eKSNJi2vecK+qHwOZZdGuE2xzO3D7EHVJkobg4wckqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSg3yeu7RE9T6i98D2SydYiZYje+6S1CDDXZIa5LCMtAz4LUoalD13SWqQ4S5JDTLcJalBjrlLK5SXWrbNnrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNmjfck6xL8kSSvUleTnJD135GkseT/LT7eXrXniTfTbI/yQtJvjDug5AkfVI/PfdjwE1VtRE4D9iaZCOwDdhdVRuA3d08wMXAhu61Bbhz5FVLkk5o3nCvqsNV9Vw3/S6wD1gLbAbu7Va7F7ism94MfL9mPAWclmTNqAuXJM1toK/ZS7IeOAd4GlhdVYe7RW8Aq7vptcDrPZsd7NoO97SRZAszPXvOOuusQeuWtAC9X62ntvV9QjXJqcBDwI1V9U7vsqoqoAZ546raUVXTVTU9NTU1yKaSpHn0Fe5JTmYm2O+rqoe75jc/HG7pfh7p2g8B63o2P7NrkyQtkn6ulglwF7Cvqu7oWbQTuLabvhZ4tKf9mu6qmfOAt3uGbyRJi6CfMfcvAVcDLyZ5vmu7BdgOPJjkOuA14Ipu2S7gEmA/8B7wtVEWLEma37zhXlU/BjLH4gtmWb+ArUPWJUkagneoSlKDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUoIEeHCZpvHywl0bFnrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUG
GuyQ1yG9ikvSJb4A6sP3SCVaiUbHnLkkNMtwlqUGGuyQ1yHCXpAbNG+5J7k5yJMlLPW3fTHIoyfPd65KeZTcn2Z/k1SQXjqtwSdLc+um53wNcNEv7d6pqU/faBZBkI3Al8Llum39JctKoipUk9WfecK+qJ4G3+tzfZuCBqvpVVf0c2A+cO0R9kqQFGGbM/fokL3TDNqd3bWuB13vWOdi1/ZokW5LsSbLn6NGjQ5QhSTreQsP9TuCzwCbgMPDtQXdQVTuqarqqpqemphZYhiRpNgsK96p6s6rer6oPgO/x8dDLIWBdz6pndm2SpEW0oHBPsqZn9nLgwytpdgJXJjklydnABuAnw5UoSRrUvM+WSXI/cD6wKslB4Dbg/CSbgAIOAF8HqKqXkzwI7AWOAVur6v2xVC5JmtO84V5VV83SfNcJ1r8duH2YoiRJw/EOVUlqkI/8lSbAR+xq3Oy5S1KD7LlL+oTe3yrA3yyWK3vuktQge+7ShB3fU5ZGwZ67JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUHexCTphHzI2fJkz12SGmS4S1KDHJaRFonPkNFisucuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1aN5wT3J3kiNJXuppOyPJ40l+2v08vWtPku8m2Z/khSRfGGfxkqTZ9dNzvwe46Li2bcDuqtoA7O7mAS4GNnSvLcCdoylTkjSIecO9qp4E3jqueTNwbzd9L3BZT/v3a8ZTwGlJ1oyoVklSnxY65r66qg53028Aq7vptcDrPesd7Np+TZItSfYk2XP06NEFliFJms3QJ1SrqoBawHY7qmq6qqanpqaGLUOS1GOhX9bxZpI1VXW4G3Y50rUfAtb1rHdm1yatSH5BhyZloT33ncC13fS1wKM97dd0V82cB7zdM3wjSVok8/bck9wPnA+sSnIQuA3YDjyY5DrgNeCKbvVdwCXAfuA94GtjqFmSNI95w72qrppj0QWzrFvA1mGLkiQNxztUJalBhrskNchwl6QGGe6S1KCFXucuaQ5e266lwJ67JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkN8jp3SQvSez3/ge2XTrASzcZwl9Q3b9BaPhyWkaQGGe6S1CDDXZIa5Ji7NAKORWupsecuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGuSlkNICefmjljJ77pLUIHvukobmEyKXHnvuktQge+6SRspe/NIwVLgnOQC8C7wPHKuq6SRnAD8A1gMHgCuq6n+GK1OSNIhRDMt8pao2VdV0N78N2F1VG4Dd3bwkaRGNY8x9M3BvN30vcNkY3kOSdALDhnsBjyV5NsmWrm11VR3upt8AVs+2YZItSfYk2XP06NEhy5Ak9Rr2hOqXq+pQkt8GHk/ySu/CqqokNduGVbUD2AEwPT096zrSUuONS1ouhuq5V9Wh7ucR4BHgXODNJGsAup9Hhi1SkjSYBYd7kk8l+fSH08CfAC8BO4Fru9WuBR4dtkhJ0mCGGZZZDTyS5MP9/FtV/UeSZ4AHk1wHvAZcMXyZ0uQ4FKPlaMHhXlU/A/5wlvZfAhcMU5QkaTg+fkCSGmS4S1KDfLaMNAvH2bXcGe6SxsaHiE2OwzKS1CDDXZIa5LCMpEXhEM3isucuSQ0y3CWpQYa7JDXIcJekBnlCVSuWNyqpZYa7VhQDXSuFwzKS1CB77pIW3VzXvHst/OjYc5ekBtlzlzRRngcZD3vuktQge+6SljzH4gdnz12SGmS4S1KDHJZR8zxhp5XIcFeTDHStdIa7mmGgt6Wfv09PtM7NcNeS4p2L0mh4QlWSGmTPXdKy4vBbfwx3LZq5/lHONcwy1/oO0Wg2fi4+yXDXvJbyOLi9OM1nKXxOJ8Fw18QZ0NLojS3ck1wE/CNwEvCvVbV9XO+1EixG76Of9xj08jRpKRn0M36if2tL/TeCsYR
7kpOAfwb+GDgIPJNkZ1XtHcf7tWpcITmqEJeWqkE/vy12WsbVcz8X2F9VPwNI8gCwGRh5uJ/oD3yY/00HPfk37v30s/+F7HNUH9jl9sGXFqPz1I9x9fpTVaPfafJV4KKq+utu/mrgi1V1fc86W4At3ezvA6+OvJDBrAJ+MeEaFoPH2ZaVcpywco51kOP83aqamm3BxE6oVtUOYMek3v94SfZU1fSk6xg3j7MtK+U4YeUc66iOc1x3qB4C1vXMn9m1SZIWwbjC/RlgQ5Kzk/wWcCWwc0zvJUk6zliGZarqWJLrgf9k5lLIu6vq5XG81wgtmSGiMfM427JSjhNWzrGO5DjHckJVkjRZPhVSkhpkuEtSgwz3TpK/S/JCkueTPJbkM5OuaVySfCvJK93xPpLktEnXNA5J/iLJy0k+SNLcJXRJLkryapL9SbZNup5xSXJ3kiNJXpp0LeOSZF2SJ5Ls7T6zNwy7T8P9Y9+qqs9X1Sbg34G/nXA94/Q48AdV9Xngv4GbJ1zPuLwE/Dnw5KQLGbWeR3xcDGwErkqycbJVjc09wEWTLmLMjgE3VdVG4Dxg67B/n4Z7p6re6Zn9FNDsmeaqeqyqjnWzTzFzH0JzqmpfVU36zudx+egRH1X1f8CHj/hoTlU9Cbw16TrGqaoOV9Vz3fS7wD5g7TD79JG/PZLcDlwDvA18ZcLlLJa/An4w6SI0sLXA6z3zB4EvTqgWjVCS9cA5wNPD7GdFhXuSHwG/M8uiW6vq0aq6Fbg1yc3A9cBti1rgCM13rN06tzLz6+B9i1nbKPVznNJykeRU4CHgxuNGEwa2osK9qv6oz1XvA3axjMN9vmNN8pfAnwIX1DK+2WGAv9PW+IiPxiQ5mZlgv6+qHh52f465d5Js6JndDLwyqVrGrfsilb8B/qyq3pt0PVoQH/HRkCQB7gL2VdUdI9nnMu60jVSSh5h59PAHwGvAN6qqyZ5Qkv3AKcAvu6anquobEyxpLJJcDvwTMAX8L/B8VV040aJGKMklwD/w8SM+bp9sReOR5H7gfGYehfsmcFtV3TXRokYsyZeB/wJeZCaDAG6pql0L3qfhLkntcVhGkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QG/T/f+Nm+J6yvwAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "#bins = [-3,-2,-1,0,1,2,3]\n", "#bins = np.linspace(-3,3,100)\n", "plt.hist(values[:,0],bins=100);" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAANx0lEQVR4nO3df+hd9X3H8edrzvWPKaiYZS6GRUpWSMcWyxfn6P5wc1v9MRYdm+gfNusc6R8RFISh7R/tP4XAVsvKNkc6xRScTlAxrHZtGgQpTOs3Emxi6hraiAnRfDtHFYSO2Pf+yIm7xm/y/XF/nO/3c58PuNxzPuece9+HfO/rfvI5P26qCklSW36h7wIkSaNnuEtSgwx3SWqQ4S5JDTLcJalBv9h3AQCXXnppbdiwoe8yJGlV2bdv30+qas18y1ZEuG/YsIHZ2dm+y5CkVSXJa2db5rCMJDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1aEVcoSqN2oZ7v/H+9JEdN/ZYidQPe+6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXI89zVjMFz26VpZ89dkhq0YLgnWZ/k2SSvJDmY5K6u/YtJjiXZ3z1uGNjmviSHk7ya5FPj3AFJ0octZljmJHBPVb2U5EJgX5I93bKvVNXfDa6cZBNwK/Bx4NeA7yT5jap6b5SFS5LObsGee1Udr6qXuul3gEPAunNssgV4rKp+VlU/Bg4DV42iWEnS4ixpzD3JBuBK4IWu6c4kLyd5KMnFXds64PWBzY5y7i8DSdKILTrck1wAPAHcXVVvAw8AHwU2A8eBLy/ljZNsSzKbZHZubm4pm0qSFrCoUyGTnM+pYH+kqp4EqKo3B5Z/Dfj3bvYYsH5g88u7tg+oqp3AToCZmZlaTvHSUnkrYE2LxZwtE+BB4FBV3T/QftnAajcDB7rp3cCtST6S5ApgI/C90ZUsSVrIYnrunwRuB76fZH/X9jngtiSbgQKOAJ8FqKqDSR4HXuHUmTbbPVNGkiZrwXCvqu8CmWfRM+fY5kvAl4aoS5I0BK9QlaQGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIn9nTquZP60nzs+cuSQ0y3CWpQYa7JDXIcJekBnlAVVPrzIOx/niHWmK4q3meUaNpZLhr1TGspYU55i5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIe8tIncF71niHSK129twlqUGGuyQ1aMFwT7I+ybNJXklyMMldXfslSfYk+WH3fHHXniRfTXI4yctJPjHunZAkfdBieu4ngXuqahNwNbA9ySbgXmBvVW0E9nbzANcDG7vHNuCBkVctSTqnBcO9qo5X1Uvd9DvAIWAdsAXY1a22C7ipm94CfL1OeR64KMlloy5cknR2SxpzT7IBuBJ4AVhbVce7RW8Aa7vpdcDrA5sd7drOfK1tSWaTzM7NzS21bknSOSw63JNcADwB3F1Vbw8uq6oCailvXFU7q2qmqmbWrFmzlE0lSQtYVLgnOZ9Twf5IVT3ZNb95erilez7RtR8D1g9sfn
nXJkmakMWcLRPgQeBQVd0/sGg3sLWb3go8PdD+6e6smauBnw4M30iSJmAxV6h+Ergd+H6S/V3b54AdwONJ7gBeA27plj0D3AAcBt4FPjPKgiVJC1sw3Kvqu0DOsvjaedYvYPuQdUkfMHhrAEkL8wpVSWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJatBifmZP6sVK+fWlwTqO7Lixx0qkxbPnLkkNMtwlqUGGuyQ1yDF3aR4rZbxfWi7DXSuKoSqNhsMyktQgw12SGmS4S1KDDHdJapDhLkkN8mwZaQm8FYFWC3vuktSgBcM9yUNJTiQ5MND2xSTHkuzvHjcMLLsvyeEkryb51LgKlySd3WJ67g8D183T/pWq2tw9ngFIsgm4Ffh4t80/JTlvVMVKkhZnwXCvqueAtxb5eluAx6rqZ1X1Y+AwcNUQ9UmSlmGYMfc7k7zcDdtc3LWtA14fWOdo1/YhSbYlmU0yOzc3N0QZkqQzLTfcHwA+CmwGjgNfXuoLVNXOqpqpqpk1a9YsswxJ0nyWFe5V9WZVvVdVPwe+xv8PvRwD1g+sennXJkmaoGWFe5LLBmZvBk6fSbMbuDXJR5JcAWwEvjdciZKkpVrwIqYkjwLXAJcmOQp8AbgmyWaggCPAZwGq6mCSx4FXgJPA9qp6byyVS5LOKlXVdw3MzMzU7Oxs32VoBVit93P3alX1Icm+qpqZb5lXqEpSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoP8DVX1brVelSqtZPbcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQV6hql54Vao0XvbcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoM8FVIagcFTO4/suLHHSqRT7LlLUoMMd0lq0ILhnuShJCeSHBhouyTJniQ/7J4v7tqT5KtJDid5Ocknxlm8JGl+i+m5Pwxcd0bbvcDeqtoI7O3mAa4HNnaPbcADoylTkrQUC4Z7VT0HvHVG8xZgVze9C7hpoP3rdcrzwEVJLhtRrZKkRVrumPvaqjreTb8BrO2m1wGvD6x3tGuTJE3Q0AdUq6qAWup2SbYlmU0yOzc3N2wZkqQByw33N08Pt3TPJ7r2Y8D6gfUu79o+pKp2VtVMVc2sWbNmmWVIkuaz3HDfDWztprcCTw+0f7o7a+Zq4KcDwzeSpAlZ8ArVJI8C1wCXJjkKfAHYATye5A7gNeCWbvVngBuAw8C7wGfGULNWEa/clPqxYLhX1W1nWXTtPOsWsH3YoiRJw/EKVUlqkOEuSQ0y3CWpQd7yVxojDyirL/bcJalBhrskNchhGU3M4BCFpPGy5y5JDTLcJalBhrskNchwl6QGGe6S1CDPlpFGzLOCtBLYc5ekBhnuktQgh2WkCfE+M5oke+6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgr1CVeuDVqho3e+6S1CDDXZIaZLhLUoMcc9fI+WMVUv+GCvckR4B3gPeAk1U1k+QS4N+ADcAR4Jaq+p/hypQkLcUohmV+v6o2V9VMN38vsLeqNgJ7u3lJ0gSNY8x9C7Crm94F3DSG95AkncOw4V7At5PsS7Kta1tbVce76TeAtfNtmGRbktkks3Nzc0OWIUkaNOwB1d+rqmNJfgXYk+QHgwurqpLUfBtW1U5gJ8DMzMy860iSlmeonntVHeueTwBPAVcBbya5DKB7PjFskZKkpVl2uCf55SQXnp4G/hg4AOwGtnarbQWeHrZISdLSDDMssxZ4Ksnp1/nXqvqPJC8Cjye5A3gNuGX4MiVJS7HscK+qHwG/PU/7fwPXDlOUJGk4XqGqkfCqVGll8d4yktQge+5Sz7y3u8bBnrskNcieu7RC2aPXMOy5S1KDDHdJap
DDMtIq4BCNlsqeuyQ1yJ67ls0Ll6SVy3CXVhC/MDUqDstIUoPsuWtJ7FlKq4PhLq0ynjmjxXBYRpIaZLhLUoMMd0lqkGPuWpAHUVcfx+Vlz12SGmS4S1KDDHdJapDhLkkN8oCq1AgPfGuQPXdJapA9d2kVs7eus7HnLkkNsueuedkjbIcXNE0nw33K+cHXmc78YvfvYnUy3PU+e+vt88t8ehju0pSaRND7ZdKfsYV7kuuAvwfOA/6lqnaM672mzVI/MPbItZBz/Y2cbdng354hvvKMJdyTnAf8I/BHwFHgxSS7q+qVcbzfNPNDpdViMV8Sw7ymf/8fNK6e+1XA4ar6EUCSx4AtwMjD/Vw9jnH8Yw/Tax7H+lJfzvbZW+r/FJe6/jCfo5UyFDWJOlJVo3/R5M+B66rqr7v524Hfqao7B9bZBmzrZj8GvDryQhbnUuAnPb13n9zv6TOt+97yfv96Va2Zb0FvB1Sraiews6/3Py3JbFXN9F3HpLnf02da931a93tcV6geA9YPzF/etUmSJmBc4f4isDHJFUl+CbgV2D2m95IknWEswzJVdTLJncC3OHUq5ENVdXAc7zUCvQ8N9cT9nj7Tuu9Tud9jOaAqSeqXd4WUpAYZ7pLUIMMdSPK3SX6Q5OUkTyW5qO+aJiHJXyQ5mOTnSZo/VSzJdUleTXI4yb191zMpSR5KciLJgb5rmaQk65M8m+SV7u/8rr5rmiTD/ZQ9wG9W1W8B/wXc13M9k3IA+DPgub4LGbeBW2JcD2wCbkuyqd+qJuZh4Lq+i+jBSeCeqtoEXA1sn6J/c8MdoKq+XVUnu9nnOXVefvOq6lBV9XVl8KS9f0uMqvpf4PQtMZpXVc8Bb/Vdx6RV1fGqeqmbfgc4BKzrt6rJMdw/7K+Ab/ZdhEZuHfD6wPxRpuiDPu2SbACuBF7ouZSJmZr7uSf5DvCr8yz6fFU93a3zeU79V+6RSdY2TovZb6llSS4AngDurqq3+65nUqYm3KvqD8+1PMlfAn8CXFsNnfy/0H5PEW+JMYWSnM+pYH+kqp7su55JcliG939Y5G+AP62qd/uuR2PhLTGmTJIADwKHqur+vuuZNMP9lH8ALgT2JNmf5J/7LmgSktyc5Cjwu8A3knyr75rGpTtgfvqWGIeAx1fwLTFGKsmjwH8CH0tyNMkdfdc0IZ8Ebgf+oPtc709yQ99FTYq3H5CkBtlzl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQf8HJXuJ/flNKXMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(values[:,-1],bins=100);" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Number of Instances')" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "bins = np.linspace(-3,3,100)\n", "plt.hist(values[:,0],bins=bins, alpha=0.5,label=\"ts-40\")\n", "plt.hist(values[:,-1],bins=100,alpha=0.5,label=\"ts-260\")\n", "plt.legend(loc=\"best\");\n", "plt.xlabel(\"Expression\", size=14)\n", "plt.ylabel(\"Number of Instances\", size=14)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAD4CAYAAAANbUbJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOzElEQVR4nO3dbYxc1X3H8e+v5KFSkxaoXdc1Jk5Tt5KRWhNtKVKqhoq0EJBqIkUUpIAbUTkvQEpU3jjJi0SVkNyHQBupRXUKiqlCCCFBWA1NCy5SGqk8rCki2JTgJKbYNfYmoQSVitbw74u9Tib22jO7M7Oz6/P9SKO5c+bcmb+PZ3979sydO6kqJEmnt5+YdAGSpPEz7CWpAYa9JDXAsJekBhj2ktSAN0y6AIAVK1bUunXrJl2GJC0ru3fv/m5VrRyk75II+3Xr1jE9PT3pMiRpWUny3KB9XcaRpAYY9pLUAMNekhpg2EtSAwx7SWqAYS9JDTDsJakBhr0kNcCwl6QGLIlP0EotWLf1Kz/c3r/t8glWohY5s5ekBhj2ktQAw16SGmDYS1IDDHtJaoBH40gT5lE6WgzO7CWpAYa9JDXAsJekBrhmLy0hvev3vVzL17Cc2UtSAwx7SWqAYS9JDTDsJakBfcM+ydokDyXZm2RPkg937Z9McjDJE93lsp59PppkX5Jnklwyzn+AJKm/QY7GOQrcWFWPJ3krsDvJA919t1TVn/d2TrIBuAo4D/gF4MEkv1xVr42ycEnS4PrO7KvqUFU93m2/DDwNrDnFLpuAu6rq1ar6DrAPuGAUxUqSFmZea/ZJ1gHnA490TTckeTLJ7UnO6trWAM/37HaAU/9ykCSN2cBhn+QtwJeAj1TVD4BbgXcAG4FDwKfm88RJtiSZTjI9MzMzn10lSfM0UNgneSOzQf+5qvoyQFUdrqrXqup14DP8aKnmILC2Z/dzurYfU1Xbq2qqqqZWrlw5zL9BktRH3zdokwS4DXi6qm7uaV9dVYe6m+8Dnuq2dwJ3JrmZ2Tdo1wOPjrRqaZk72WkRpHEZ5GicdwHXAN9I8kTX9jHg6iQbgQL2Ax8CqKo9Se4G9jJ7JM/1HokjSZPVN+yr6utA5rjr/lPscxNw0xB1SZJGyE/QSlIDDHtJaoDns5eWAb+nVsNyZi9JDTDsJakBhr0kNcCwl6QGGPaS1ADDXpIaYNhLUgMMe0lqgGEvSQ0w7CWpAYa9JDXAsJekBhj2ktQAw16SGuApjqVlxtMdayGc2UtSA5zZSyPmzFtLkTN7SWqAYS9JDTDsJakBhr0kNcA3aKUx6n2zVpokw15axjzyR4NyGUeSGmDYS1IDDHtJakDfsE+yNslDSfYm2ZPkw1372UkeSPJsd31W154kn06yL8mTSd457n+EJOnUBpnZHwVurKoNwIXA9Uk2AFuBXVW1HtjV3QZ4L7C+u2wBbh151ZKkeekb9lV1qKoe77Z
fBp4G1gCbgB1dtx3AFd32JuCOmvUwcGaS1aMuXJI0uHmt2SdZB5wPPAKsqqpD3V0vAKu67TXA8z27Hejajn+sLUmmk0zPzMzMt25J0jwMHPZJ3gJ8CfhIVf2g976qKqDm88RVtb2qpqpqauXKlfPZVZI0TwOFfZI3Mhv0n6uqL3fNh48tz3TXR7r2g8Dant3P6dokSRMyyNE4AW4Dnq6qm3vu2gls7rY3A/f1tF/bHZVzIfBSz3KPJGkCBjldwruAa4BvJHmia/sYsA24O8l1wHPAld199wOXAfuAV4APjrJgSdL89Q37qvo6kJPcffEc/Qu4fsi6JEkj5CdoJakBnvVSGgFPZaylzpm9JDXAsJekBhj2ktQAw16SGmDYS1IDDHtJaoBhL0kNMOwlqQGGvSQ1wLCXpAYY9pLUAMNekhpg2EtSAzzrpXSa6D3z5v5tl0+wEi1FzuwlqQGGvSQ1wLCXpAYY9pLUAMNekhpg2EtSAwx7SWqAYS9JDTDsJakBhr0kNcCwl6QGGPaS1ADDXpIa0Dfsk9ye5EiSp3raPpnkYJInustlPfd9NMm+JM8kuWRchUuSBjfIzP6zwKVztN9SVRu7y/0ASTYAVwHndfv8dZIzRlWsJGlh+oZ9VX0N+P6Aj7cJuKuqXq2q7wD7gAuGqE+SNALDfHnJDUmuBaaBG6vqRWAN8HBPnwNd2wmSbAG2AJx77rlDlCFNRu+XhUhL3ULfoL0VeAewETgEfGq+D1BV26tqqqqmVq5cucAyJEmDWFDYV9Xhqnqtql4HPsOPlmoOAmt7up7TtUmSJmhBYZ9kdc/N9wHHjtTZCVyV5M1J3g6sBx4drkRJ0rD6rtkn+TxwEbAiyQHgE8BFSTYCBewHPgRQVXuS3A3sBY4C11fVa2OpXJoA1+m1XPUN+6q6eo7m207R/ybgpmGKkiSNlp+glaQGGPaS1ADDXpIaYNhLUgMMe0lqgGEvSQ0w7CWpAYa9JDXAsJekBgxzimNJS1TvaR32b7t8gpVoqXBmL0kNMOwlqQGGvSQ1wLCXpAYY9pLUAMNekhpg2EtSAwx7SWqAYS9JDTDsJakBhr0kNcCwl6QGeCI06TTnSdEEzuwlqQmGvSQ1wGUcqSEu6bTLmb0kNcCwl6QG9A37JLcnOZLkqZ62s5M8kOTZ7vqsrj1JPp1kX5Ink7xznMVLkgYzyMz+s8Clx7VtBXZV1XpgV3cb4L3A+u6yBbh1NGVKkobRN+yr6mvA949r3gTs6LZ3AFf0tN9Rsx4GzkyyekS1SpIWaKFr9quq6lC3/QKwqtteAzzf0+9A13aCJFuSTCeZnpmZWWAZkqRBDH3oZVVVklrAftuB7QBTU1Pz3l9aLL2HK0rL1UJn9oePLc9010e69oPA2p5+53RtkqQJWmjY7wQ2d9ubgft62q/tjsq5EHipZ7lHkjQhfZdxknweuAhYkeQA8AlgG3B3kuuA54Aru+73A5cB+4BXgA+OoWZp7Fy60emmb9hX1dUnueviOfoWcP2wRUmSRstP0EpSAwx7SWqAYS9JDTDsJakBhr0kNcCwl6QGGPaS1ADDXpIaYNhLUgP8wnGpUX75eFuc2UtSAwx7SWqAYS9JDTDsJakBhr0kNcCwl6QGGPaS1ADDXpIaYNhLUgP8BK0kP03bAGf2ktQAw16SGmDYS1IDXLOXOr3r1tLpxrCX9GOO/6XnG7anB5dxJKkBhr0kNcCwl6QGGPaS1ICh3qBNsh94GXgNOFpVU0nOBr4ArAP2A1dW1YvDlSmNnkffqCWjmNn/dlVtrKqp7vZWYFdVrQd2dbclSRM0jmWcTcCObnsHcMUYnkOSNA/Dhn0B/5Rkd5ItXduqqjrUbb8ArJprxyRbkkwnmZ6ZmRmyDEnSqQz7oarfrKqDSX4OeCDJv/feWVWVpObasaq2A9sBpqam5uwjSRqNoWb2VXWwuz4C3AtcABxOshqguz4ybJGSpOEsOOyT/FSStx7bBn4XeArYCWz
uum0G7hu2SEnScIZZxlkF3Jvk2OPcWVVfTfIYcHeS64DngCuHL1MaDQ+3VKsWHPZV9W3g1+Zo/x5w8TBFSaNkwEt+glaSmmDYS1IDDHtJaoBfXiLplHrf8/CLTJYvZ/aS1ADDXpIaYNhLUgMMe0lqgGEvSQ0w7CWpAYa9JDXA4+wlLYjH3y8vzuwlqQGGvSQ1wGUcnZY8rfF4OK7Ll2Gv04ZBJJ2cyziS1ABn9pKG5pE5S59hr2XBMJGG4zKOJDXAsJekBriMo2XHo26WNpfcliZn9pLUAGf2ksbGWf7SYdhrSTEcpPEw7LVkuTYvjY5hL2nRnewX+cn+mvMvvuEZ9horf0ilpcGwl7QoBlmWc3IwPmML+ySXAn8JnAH8bVVtG9dzaTJO9oPpD7XGydfOwqSqRv+gyRnAN4HfAQ4AjwFXV9XeufpPTU3V9PT0yOtY7ub7ol7ID8Ega6e+Uaqlar6v00H6n6zPUvzFkmR3VU0N0ndcM/sLgH1V9e2uoLuATcCcYT+MYf8zBpmdDhO043ixDDtznu/+0lI139fpMK/r4/cd5ud5Er9ExjWzfz9waVX9YXf7GuA3quqGnj5bgC3dzV8Bnhl5IZO3AvjupItYghyXEzkmJ3JMTnT8mLytqlYOsuPE3qCtqu3A9kk9/2JIMj3on1gtcVxO5JicyDE50TBjMq5z4xwE1vbcPqdrkyRNwLjC/jFgfZK3J3kTcBWwc0zPJUnqYyzLOFV1NMkNwD8ye+jl7VW1ZxzPtcSd1stUQ3BcTuSYnMgxOdGCx2Qsb9BKkpYWz2cvSQ0w7CWpAYb9CCU5O8kDSZ7trs+ao8/GJP+aZE+SJ5P8/iRqXUyDjEvX76tJ/ivJ3y92jYshyaVJnkmyL8nWOe5/c5IvdPc/kmTdBMpcVAOMyW8leTzJ0e7zO00YYFz+KMneLkN2JXlbv8c07EdrK7CrqtYDu7rbx3sFuLaqzgMuBf4iyZmLV+JEDDIuAH8GXLNoVS2i7hQifwW8F9gAXJ1kw3HdrgNerKpfAm4B/mRxq1xcA47JfwB/ANy5uNVNzoDj8m/AVFX9KnAP8Kf9HtewH61NwI5uewdwxfEdquqbVfVst/2fwBFgoE/ALWN9xwWgqnYBLy9STYvth6cQqar/BY6dQqRX7zjdA1ycJItY42LrOyZVtb+qngRen0SBEzLIuDxUVa90Nx9m9rNMp2TYj9aqqjrUbb8ArDpV5yQXAG8CvjXuwiZsXuNymloDPN9z+0DXNmefqjoKvAT87KJUNxmDjEmL5jsu1wH/0O9BPZ/9PCV5EPj5Oe76eO+NqqokJz2uNclq4O+AzVW17GctoxoXSYNL8gFgCnh3v76G/TxV1XtOdl+Sw0lWV9WhLsyPnKTfTwNfAT5eVQ+PqdRFNYpxOc0NcgqRY30OJHkD8DPA9xanvInwtCpzG2hckryH2cnUu6vq1X4P6jLOaO0ENnfbm4H7ju/QnT7iXuCOqrpnEWubpL7j0oBBTiHSO07vB/65Tu9PPXpalbn1HZck5wN/A/xeVQ02eaoqLyO6MLu+ugt4FngQOLtrn2L227oAPgD8H/BEz2XjpGuf9Lh0t/8FmAH+h9l1yksmXfuIx+EyZr/U51vM/lUH8MfdDyzATwJfBPYBjwK/OOmal8CY/Hr3WvhvZv/K2TPpmpfIuDwIHO7JkJ39HtPTJUhSA1zGkaQGGPaS1ADDXpIaYNhLUgMMe0lqgGEvSQ0w7CWpAf8PJxHm/X3mia4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.hist(values.mean(axis=1),bins=100);" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(header,values.mean(axis=0))\n", "plt.xlabel(\"Time\",size=14)\n", "plt.ylabel(\"Avg. Expression\",size=14);" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(header,(values[values[:,0]>0]).mean(axis=0),label=\"positive\")\n", "plt.plot(header,(values[values[:,0]<0]).mean(axis=0),label=\"negative\");\n", "plt.xlabel(\"Time\",size=14)\n", "plt.ylabel(\"Avg. Expression\",size=14)\n", "plt.legend()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }