{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<small><small><i>\n", "All the IPython Notebooks in this lecture series by Dr. Milan Parmar are available @ **[GitHub](https://github.com/milaan9/11_Python_Matplotlib_Module)**\n", "</i></small></small>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Matplotlib\n", "\n", "**[Matplotlib](https://matplotlib.org/)** is a Python 2D plotting library that produces high-quality charts and figures, which helps us visualize extensive data to understand better. Pandas is a handy and useful data-structure tool for analyzing large and complex data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Necessary Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:01.238492Z", "start_time": "2021-07-04T12:44:00.049041Z" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic Graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:48.486380Z", "start_time": "2021-07-04T12:30:46.481499Z" } }, "outputs": [], "source": [ "x = [0,1,2,3,4]\n", "y = [0,2,4,6,8]\n", "\n", "# Resize your Graph (dpi specifies pixels per inch. 
When saving, use 300 if possible)\n", "plt.figure(figsize=(8,5), dpi=100)\n", "\n", "# Line 1\n", "\n", "# Keyword Argument Notation\n", "#plt.plot(x,y, label='2x', color='red', linewidth=2, marker='.', linestyle='--', markersize=10, markeredgecolor='blue')\n", "\n", "# Shorthand notation\n", "# fmt = '[color][marker][line]'\n", "plt.plot(x,y, 'b^--', label='2x')\n", "\n", "## Line 2\n", "\n", "# Select the interval at which to plot points\n", "x2 = np.arange(0,4.5,0.5)\n", "\n", "# Plot the first part of the graph as a solid line\n", "plt.plot(x2[:6], x2[:6]**2, 'r', label='X^2')\n", "\n", "# Plot the remainder of the graph as a dashed line\n", "plt.plot(x2[5:], x2[5:]**2, 'r--')\n", "\n", "# Add a title (specify font parameters with fontdict)\n", "plt.title('Our First Graph!', fontdict={'fontname': 'Comic Sans MS', 'fontsize': 20})\n", "\n", "# X and Y labels\n", "plt.xlabel('X Axis')\n", "plt.ylabel('Y Axis')\n", "\n", "# X, Y axis Tickmarks (scale of your graph)\n", "plt.xticks([0,1,2,3,4,])\n", "#plt.yticks([0,2,4,6,8,10])\n", "\n", "# Add a legend\n", "plt.legend()\n", "\n", "# Save figure (dpi 300 is good when saving so graph has high resolution)\n", "plt.savefig('mygraph.png', dpi=300)\n", "\n", "# Show plot\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The line plot should look like this:\n", "<div>\n", "<img src=\"img/ex1_1.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bar Chart" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:49.327203Z", "start_time": "2021-07-04T12:30:48.497612Z" } }, "outputs": [], "source": [ "labels = ['A', 'B', 'C']\n", "values = [1,4,2]\n", "\n", "plt.figure(figsize=(5,3), dpi=100)\n", "\n", "bars = plt.bar(labels, values)\n", "\n", "# Give each bar its own hatch pattern\n", "patterns = ['/', 'O', '*']\n", "for bar, pattern in zip(bars, patterns):\n", "    bar.set_hatch(pattern)\n", "\n", "plt.savefig('barchart.png', dpi=300)\n", "\n", "plt.show()" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "The line bar chart should look like this:\n", "<div>\n", "<img src=\"img/ex1_2.png\" width=\"500\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Real World Examples\n", "\n", "Download datasets from my Github: \n", "1. **[gas_prices.csv](https://github.com/milaan9/11_Python_Matplotlib_Module/blob/main/gas_prices.csv)** \n", "2. **[fifa_data.csv](https://github.com/milaan9/11_Python_Matplotlib_Module/blob/main/fifa_data.csv)**\n", "3. **[iris_data.csv](https://github.com/milaan9/11_Python_Matplotlib_Module/blob/main/iris_data.csv)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Line Graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:50.709523Z", "start_time": "2021-07-04T12:30:49.336967Z" } }, "outputs": [], "source": [ "gas = pd.read_csv('gas_prices.csv')\n", "\n", "plt.figure(figsize=(8,5))\n", "\n", "plt.title('Gas Prices over Time (in USD)', fontdict={'fontweight':'bold', 'fontsize': 18})\n", "\n", "plt.plot(gas.Year, gas.USA, 'b.-', label='United States')\n", "plt.plot(gas.Year, gas.Canada, 'r.-')\n", "plt.plot(gas.Year, gas['South Korea'], 'g.-')\n", "plt.plot(gas.Year, gas.Australia, 'y.-')\n", "\n", "# Another Way to plot many values!\n", "# countries_to_look_at = ['Australia', 'USA', 'Canada', 'South Korea']\n", "# for country in gas:\n", "# if country in countries_to_look_at:\n", "# plt.plot(gas.Year, gas[country], marker='.')\n", "\n", "plt.xticks(gas.Year[::3].tolist()+[2011])\n", "\n", "plt.xlabel('Year')\n", "plt.ylabel('US Dollars')\n", "\n", "plt.legend()\n", "\n", "plt.savefig('Gas_price_figure.png', dpi=300)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The line graph should look like this:\n", "<div>\n", "<img src=\"img/ex1_3.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Fifa 
Data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:11.002158Z", "start_time": "2021-07-04T12:44:10.222864Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Unnamed: 0</th>\n", " <th>ID</th>\n", " <th>Name</th>\n", " <th>Age</th>\n", " <th>Photo</th>\n", " <th>Nationality</th>\n", " <th>Flag</th>\n", " <th>Overall</th>\n", " <th>Potential</th>\n", " <th>Club</th>\n", " <th>...</th>\n", " <th>Composure</th>\n", " <th>Marking</th>\n", " <th>StandingTackle</th>\n", " <th>SlidingTackle</th>\n", " <th>GKDiving</th>\n", " <th>GKHandling</th>\n", " <th>GKKicking</th>\n", " <th>GKPositioning</th>\n", " <th>GKReflexes</th>\n", " <th>Release Clause</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>0</td>\n", " <td>158023</td>\n", " <td>L. 
Messi</td>\n", " <td>31</td>\n", " <td>https://cdn.sofifa.org/players/4/19/158023.png</td>\n", " <td>Argentina</td>\n", " <td>https://cdn.sofifa.org/flags/52.png</td>\n", " <td>94</td>\n", " <td>94</td>\n", " <td>FC Barcelona</td>\n", " <td>...</td>\n", " <td>96.0</td>\n", " <td>33.0</td>\n", " <td>28.0</td>\n", " <td>26.0</td>\n", " <td>6.0</td>\n", " <td>11.0</td>\n", " <td>15.0</td>\n", " <td>14.0</td>\n", " <td>8.0</td>\n", " <td>€226.5M</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>1</td>\n", " <td>20801</td>\n", " <td>Cristiano Ronaldo</td>\n", " <td>33</td>\n", " <td>https://cdn.sofifa.org/players/4/19/20801.png</td>\n", " <td>Portugal</td>\n", " <td>https://cdn.sofifa.org/flags/38.png</td>\n", " <td>94</td>\n", " <td>94</td>\n", " <td>Juventus</td>\n", " <td>...</td>\n", " <td>95.0</td>\n", " <td>28.0</td>\n", " <td>31.0</td>\n", " <td>23.0</td>\n", " <td>7.0</td>\n", " <td>11.0</td>\n", " <td>15.0</td>\n", " <td>14.0</td>\n", " <td>11.0</td>\n", " <td>€127.1M</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2</td>\n", " <td>190871</td>\n", " <td>Neymar Jr</td>\n", " <td>26</td>\n", " <td>https://cdn.sofifa.org/players/4/19/190871.png</td>\n", " <td>Brazil</td>\n", " <td>https://cdn.sofifa.org/flags/54.png</td>\n", " <td>92</td>\n", " <td>93</td>\n", " <td>Paris Saint-Germain</td>\n", " <td>...</td>\n", " <td>94.0</td>\n", " <td>27.0</td>\n", " <td>24.0</td>\n", " <td>33.0</td>\n", " <td>9.0</td>\n", " <td>9.0</td>\n", " <td>15.0</td>\n", " <td>15.0</td>\n", " <td>11.0</td>\n", " <td>€228.1M</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>3</td>\n", " <td>193080</td>\n", " <td>De Gea</td>\n", " <td>27</td>\n", " <td>https://cdn.sofifa.org/players/4/19/193080.png</td>\n", " <td>Spain</td>\n", " <td>https://cdn.sofifa.org/flags/45.png</td>\n", " <td>91</td>\n", " <td>93</td>\n", " <td>Manchester United</td>\n", " <td>...</td>\n", " <td>68.0</td>\n", " <td>15.0</td>\n", " <td>21.0</td>\n", " <td>13.0</td>\n", " <td>90.0</td>\n", " 
<td>85.0</td>\n", " <td>87.0</td>\n", " <td>88.0</td>\n", " <td>94.0</td>\n", " <td>€138.6M</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>4</td>\n", " <td>192985</td>\n", " <td>K. De Bruyne</td>\n", " <td>27</td>\n", " <td>https://cdn.sofifa.org/players/4/19/192985.png</td>\n", " <td>Belgium</td>\n", " <td>https://cdn.sofifa.org/flags/7.png</td>\n", " <td>91</td>\n", " <td>92</td>\n", " <td>Manchester City</td>\n", " <td>...</td>\n", " <td>88.0</td>\n", " <td>68.0</td>\n", " <td>58.0</td>\n", " <td>51.0</td>\n", " <td>15.0</td>\n", " <td>13.0</td>\n", " <td>5.0</td>\n", " <td>10.0</td>\n", " <td>13.0</td>\n", " <td>€196.4M</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 89 columns</p>\n", "</div>" ], "text/plain": [ " Unnamed: 0 ID Name Age \\\n", "0 0 158023 L. Messi 31 \n", "1 1 20801 Cristiano Ronaldo 33 \n", "2 2 190871 Neymar Jr 26 \n", "3 3 193080 De Gea 27 \n", "4 4 192985 K. De Bruyne 27 \n", "\n", " Photo Nationality \\\n", "0 https://cdn.sofifa.org/players/4/19/158023.png Argentina \n", "1 https://cdn.sofifa.org/players/4/19/20801.png Portugal \n", "2 https://cdn.sofifa.org/players/4/19/190871.png Brazil \n", "3 https://cdn.sofifa.org/players/4/19/193080.png Spain \n", "4 https://cdn.sofifa.org/players/4/19/192985.png Belgium \n", "\n", " Flag Overall Potential \\\n", "0 https://cdn.sofifa.org/flags/52.png 94 94 \n", "1 https://cdn.sofifa.org/flags/38.png 94 94 \n", "2 https://cdn.sofifa.org/flags/54.png 92 93 \n", "3 https://cdn.sofifa.org/flags/45.png 91 93 \n", "4 https://cdn.sofifa.org/flags/7.png 91 92 \n", "\n", " Club ... Composure Marking StandingTackle SlidingTackle \\\n", "0 FC Barcelona ... 96.0 33.0 28.0 26.0 \n", "1 Juventus ... 95.0 28.0 31.0 23.0 \n", "2 Paris Saint-Germain ... 94.0 27.0 24.0 33.0 \n", "3 Manchester United ... 68.0 15.0 21.0 13.0 \n", "4 Manchester City ... 
88.0 68.0 58.0 51.0 \n", "\n", " GKDiving GKHandling GKKicking GKPositioning GKReflexes Release Clause \n", "0 6.0 11.0 15.0 14.0 8.0 €226.5M \n", "1 7.0 11.0 15.0 14.0 11.0 €127.1M \n", "2 9.0 9.0 15.0 15.0 11.0 €228.1M \n", "3 90.0 85.0 87.0 88.0 94.0 €138.6M \n", "4 15.0 13.0 5.0 10.0 13.0 €196.4M \n", "\n", "[5 rows x 89 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fifa = pd.read_csv('fifa_data.csv')\n", "\n", "fifa.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Histogram" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:53.151909Z", "start_time": "2021-07-04T12:30:51.834036Z" } }, "outputs": [], "source": [ "bins = [40,50,60,70,80,90,100]\n", "\n", "plt.figure(figsize=(8,5))\n", "\n", "plt.hist(fifa.Overall, bins=bins, color='#abcdef')\n", "\n", "plt.xticks(bins)\n", "\n", "plt.ylabel('Number of Players')\n", "plt.xlabel('Skill Level')\n", "plt.title('Distribution of Player Skills in FIFA 2018')\n", "\n", "plt.savefig('histogram.png', dpi=300)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The histogram should look like this:\n", "<div>\n", "<img src=\"img/ex1_4.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pie Chart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pie Chart #1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:53.536672Z", "start_time": "2021-07-04T12:30:53.161675Z" } }, "outputs": [], "source": [ "# Count the rows for each preferred foot\n", "left = fifa.loc[fifa['Preferred Foot'] == 'Left'].shape[0]\n", "right = fifa.loc[fifa['Preferred Foot'] == 'Right'].shape[0]\n", "\n", "plt.figure(figsize=(8,5))\n", "\n", "labels = ['Left', 'Right']\n", "colors = ['#abcdef', '#aabbcc']\n", "\n", "plt.pie([left, right], labels = labels, colors=colors, autopct='%.2f %%')\n", 
"\n", "plt.title('Foot Preference of FIFA Players')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The piechart should look like this:\n", "<div>\n", "<img src=\"img/ex1_5.png\" width=\"400\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pie Chart #2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:54.220755Z", "start_time": "2021-07-04T12:30:53.542534Z" } }, "outputs": [], "source": [ "plt.figure(figsize=(8,5), dpi=100)\n", "\n", "plt.style.use('ggplot')\n", "\n", "fifa.Weight = [int(x.strip('lbs')) if type(x)==str else x for x in fifa.Weight]\n", "\n", "light = fifa.loc[fifa.Weight < 125].count()[0]\n", "light_medium = fifa[(fifa.Weight >= 125) & (fifa.Weight < 150)].count()[0]\n", "medium = fifa[(fifa.Weight >= 150) & (fifa.Weight < 175)].count()[0]\n", "medium_heavy = fifa[(fifa.Weight >= 175) & (fifa.Weight < 200)].count()[0]\n", "heavy = fifa[fifa.Weight >= 200].count()[0]\n", "\n", "weights = [light,light_medium, medium, medium_heavy, heavy]\n", "label = ['under 125', '125-150', '150-175', '175-200', 'over 200']\n", "explode = (.4,.2,0,0,.4)\n", "\n", "plt.title('Weight of Professional Soccer Players (lbs)')\n", "\n", "plt.pie(weights, labels=label, explode=explode, pctdistance=0.8,autopct='%.2f %%')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The piechart should look like this:\n", "<div>\n", "<img src=\"img/ex1_6.png\" width=\"500\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pie Chart #3" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:38.377634Z", "start_time": "2021-07-04T12:44:38.353221Z" } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " 
vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Id</th>\n", " <th>SepalLengthCm</th>\n", " <th>SepalWidthCm</th>\n", " <th>PetalLengthCm</th>\n", " <th>PetalWidthCm</th>\n", " <th>Species</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>5.1</td>\n", " <td>3.5</td>\n", " <td>1.4</td>\n", " <td>0.2</td>\n", " <td>Iris-setosa</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>4.9</td>\n", " <td>3.0</td>\n", " <td>1.4</td>\n", " <td>0.2</td>\n", " <td>Iris-setosa</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>3</td>\n", " <td>4.7</td>\n", " <td>3.2</td>\n", " <td>1.3</td>\n", " <td>0.2</td>\n", " <td>Iris-setosa</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>4</td>\n", " <td>4.6</td>\n", " <td>3.1</td>\n", " <td>1.5</td>\n", " <td>0.2</td>\n", " <td>Iris-setosa</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>5</td>\n", " <td>5.0</td>\n", " <td>3.6</td>\n", " <td>1.4</td>\n", " <td>0.2</td>\n", " <td>Iris-setosa</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species\n", "0 1 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 2 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 3 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5 5.0 3.6 1.4 0.2 Iris-setosa" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.read_csv(\"iris_data.csv\")\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:57.738331Z", "start_time": "2021-07-04T12:30:54.329154Z" } }, "outputs": [], "source": [ "SepalLength = data['SepalLengthCm'].value_counts()\n", "\n", "# Plot a pie chart\n", 
"%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "SepalLength.plot(kind='pie', title='Sepal Length', figsize=(9,9))\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The piechart should look like this:\n", "<div>\n", "<img src=\"img/ex1_7.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Box and Whiskers Chart\n", "\n", "A box and whisker plot(box plot) *displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Box plot #1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:58.134328Z", "start_time": "2021-07-04T12:30:57.745170Z" } }, "outputs": [], "source": [ "plt.figure(figsize=(5,8), dpi=100)\n", "\n", "plt.style.use('default')\n", "\n", "barcelona = fifa.loc[fifa.Club == \"FC Barcelona\"]['Overall']\n", "madrid = fifa.loc[fifa.Club == \"Real Madrid\"]['Overall']\n", "revs = fifa.loc[fifa.Club == \"New England Revolution\"]['Overall']\n", "\n", "#bp = plt.boxplot([barcelona, madrid, revs], labels=['a','b','c'], boxprops=dict(facecolor='red'))\n", "bp = plt.boxplot([barcelona, madrid, revs], labels=['FC Barcelona','Real Madrid','NE Revolution'], patch_artist=True, medianprops={'linewidth': 2})\n", "\n", "plt.title('Professional Soccer Team Comparison')\n", "plt.ylabel('FIFA Overall Rating')\n", "\n", "for box in bp['boxes']:\n", " # change outline color\n", " box.set(color='#4286f4', linewidth=2)\n", " # change fill color\n", " box.set(facecolor = '#e0e0e0' )\n", " # change hatch\n", " #box.set(hatch = '/')\n", " \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The box plot should look like this:\n", "<div>\n", "<img src=\"img/ex1_8.png\" width=\"400\"/>\n", "</div>" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "#### Box plot #2" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:46.874699Z", "start_time": "2021-07-04T12:44:46.856146Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Name Salary Hours Grade\n", "0 John 60000 41 50\n", "1 Rad 64000 40 50\n", "2 Var 60000 36 46\n", "3 Mathew 289000 30 95\n", "4 Alina 66000 35 50\n", "5 Lee 50000 39 5\n", "6 Rogers 60000 40 57\n" ] } ], "source": [ "#cateating data\n", "import pandas as pd\n", "df = pd.DataFrame({'Name': ['John', 'Rad', 'Var', 'Mathew', 'Alina', 'Lee', 'Rogers'],\n", " 'Salary':[60000,64000,60000,289000,66000,50000,60000],\n", " 'Hours':[41,40,36,30,35,39,40],\n", " 'Grade':[50,50,46,95,50,5,57]})\n", "print(df)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:53.560730Z", "start_time": "2021-07-04T12:44:53.537293Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.25 35.5\n", "0.50 39.0\n", "0.75 40.0\n", "Name: Hours, dtype: float64\n" ] } ], "source": [ "# Quartiles of Hours\n", "print(df['Hours'].quantile([0.25, 0.5, 0.75]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:58.755910Z", "start_time": "2021-07-04T12:30:58.341361Z" } }, "outputs": [], "source": [ "# Plot a box-whisker chart\n", "import matplotlib.pyplot as plt\n", "df['Hours'].plot(kind='box', title='Weekly Hours Distribution', figsize=(10,8))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The box plot should look like this:\n", "<div>\n", "<img src=\"img/ex1_9.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:44:57.581721Z", "start_time": "2021-07-04T12:44:57.561215Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.25 
60000.0\n", "0.50 60000.0\n", "0.75 65000.0\n", "Name: Salary, dtype: float64\n" ] } ], "source": [ "# Quartiles of Salary\n", "print(df['Salary'].quantile([0.25, 0.5, 0.75]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2021-07-04T12:30:59.252980Z", "start_time": "2021-07-04T12:30:58.796927Z" } }, "outputs": [], "source": [ "# Plot a box-whisker chart\n", "df['Salary'].plot(kind='box', title='Salary Distribution', figsize=(10,8))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The box plot should look like this:\n", "<div>\n", "<img src=\"img/ex1_10.png\" width=\"600\"/>\n", "</div>" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }