@reina137
Created March 23, 2019 04:57
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Segmenting and Clustering Neighborhoods in Toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.\n",
"\n",
"Start by creating a new Notebook for this assignment.\n",
"Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done!\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"#json tools\n",
"import json\n",
"from pandas.io.json import json_normalize\n",
"\n",
"#scraping\n",
"import requests\n",
"from urllib.request import urlopen\n",
"from bs4 import BeautifulSoup\n",
"\n",
"#geocoders\n",
"from geopy.geocoders import Nominatim\n",
"\n",
"#visualization libraries\n",
"import matplotlib.cm as cm\n",
"import matplotlib.colors as colors\n",
"import folium\n",
"\n",
"#kmeans clustering\n",
"from sklearn.cluster import KMeans\n",
"\n",
"print('Done!')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 1 - Scraping Wikipedia page, creating Pandas DF, cleaning data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fetching the page with urlopen (from urllib.request) and parsing it with BeautifulSoup:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"wlink = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'\n",
"raw_page = urlopen(wlink).read().decode('utf-8')\n",
"page = BeautifulSoup(raw_page, 'html.parser')\n",
"table = page.body.table.tbody"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, transform the table data into a pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#functions for getting cell and row data\n",
"\n",
"def table_cell(i):\n",
"    cells = i.find_all('td')\n",
"    row = []\n",
"    \n",
"    for cell in cells:\n",
"        if cell.a:\n",
"            if (cell.a.text):\n",
"                row.append(cell.a.text)\n",
"                continue\n",
"        row.append(cell.string.strip())\n",
"    \n",
"    return row\n",
"\n",
"def table_row():\n",
"    data = []\n",
"    \n",
"    for tr in table.find_all('tr'):\n",
"        row = table_cell(tr)\n",
"        if len(row) != 3:\n",
"            continue\n",
"        data.append(row)\n",
"    \n",
"    return data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M2A</td>\n",
" <td>Not assigned</td>\n",
" <td>Not assigned</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M3A</td>\n",
" <td>North York</td>\n",
" <td>Parkwoods</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M4A</td>\n",
" <td>North York</td>\n",
" <td>Victoria Village</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1A Not assigned Not assigned\n",
"1 M2A Not assigned Not assigned\n",
"2 M3A North York Parkwoods\n",
"3 M4A North York Victoria Village\n",
"4 M5A Downtown Toronto Harbourfront"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#writing into pandas dataframe\n",
"data = table_row()\n",
"columns = ['Postcode', 'Borough', 'Neighbourhood']\n",
"df = pd.DataFrame(data, columns=columns)\n",
"df.head()"
]
},
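{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a possible shortcut, `pandas.read_html` can parse HTML tables directly. The cell below is only a sketch: it assumes that the first table on the page is the postal code table with exactly three columns, and that an HTML parser such as lxml is installed. The name `df_alt` is illustrative and is not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from io import StringIO\n",
"\n",
"# Sketch only: parse every <table> in the downloaded page into a list of dataframes\n",
"tables = pd.read_html(StringIO(raw_page), header=0)\n",
"\n",
"# Assumption: the first table is the postal code table with three columns\n",
"df_alt = tables[0]\n",
"df_alt.columns = ['Postcode', 'Borough', 'Neighbourhood']\n",
"df_alt.head()"
]
},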
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cleaning the data:\n",
"\n",
"1. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.\n",
"2. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.\n",
"3. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.\n",
"4. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.\n",
"5. In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Port Union</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1B Scarborough Rouge\n",
"1 M1B Scarborough Malvern\n",
"2 M1C Scarborough Highland Creek\n",
"3 M1C Scarborough Rouge Hill\n",
"4 M1C Scarborough Port Union"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#dropping the \"Not Assigned\" borough\n",
"df1 = df[df.Borough != 'Not assigned']\n",
"df1 = df1.sort_values(by=['Postcode','Borough'])\n",
"\n",
"df1.reset_index(inplace=True)\n",
"df1.drop('index',axis=1,inplace=True)\n",
"df1.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge,Malvern</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek,Rouge Hill,Port Union</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood,Morningside,West Hill</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood\n",
"0 M1B Scarborough Rouge,Malvern\n",
"1 M1C Scarborough Highland Creek,Rouge Hill,Port Union\n",
"2 M1E Scarborough Guildwood,Morningside,West Hill\n",
"3 M1G Scarborough Woburn\n",
"4 M1H Scarborough Cedarbrae"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Consolidating the neighbourhoods that share the postcode\n",
"\n",
"df_postcodes = df1['Postcode']\n",
"df_postcodes.drop_duplicates(inplace=True)\n",
"df2 = pd.DataFrame(df_postcodes)\n",
"df2['Borough'] = '';\n",
"df2['Neighbourhood'] = '';\n",
"\n",
"\n",
"df2.reset_index(inplace=True)\n",
"df2.drop('index', axis=1, inplace=True)\n",
"df1.reset_index(inplace=True)\n",
"df1.drop('index', axis=1, inplace=True)\n",
"\n",
"for i in df2.index:\n",
"    for j in df1.index:\n",
"        if df2.iloc[i, 0] == df1.iloc[j, 0]:\n",
"            df2.iloc[i, 1] = df1.iloc[j, 1]\n",
"            df2.iloc[i, 2] = df2.iloc[i, 2] + ',' + df1.iloc[j, 2]\n",
"\n",
"for i in df2.index:\n",
"    s = df2.iloc[i, 2]\n",
"    if s[0] == ',':\n",
"        s = s[1:]\n",
"    df2.iloc[i, 2] = s\n",
"\n",
"df2.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(103, 3)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Checking dataframe shape\n",
"df2.shape"
]
},
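{
"cell_type": "markdown",
"metadata": {},
"source": [
"The consolidation above can also be sketched with `groupby`, which avoids the nested loops. This is a hedged alternative, not the method used for the results in this notebook. It assumes each postcode belongs to exactly one borough, and it also applies cleaning rule 3 (a Not assigned neighbourhood takes the borough name). The names `df1_alt` and `df2_alt` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: df1_alt and df2_alt are illustrative names, not part of the original analysis\n",
"df1_alt = df1.copy()\n",
"\n",
"# Rule 3: a borough with a 'Not assigned' neighbourhood takes the borough name\n",
"unassigned = df1_alt['Neighbourhood'] == 'Not assigned'\n",
"df1_alt.loc[unassigned, 'Neighbourhood'] = df1_alt.loc[unassigned, 'Borough']\n",
"\n",
"# Rule 2: join the neighbourhoods that share a postcode into one comma-separated row\n",
"df2_alt = (df1_alt.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']\n",
"                  .agg(','.join)\n",
"                  .reset_index())\n",
"df2_alt.shape"
]
},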
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 2 - Get Coordinates\n",
"\n",
"Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.\n",
"\n",
"Using the provided Geospatial_Coordinates.csv file to get the coordinates:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"#reading the file to coord dataframe\n",
"df2['Latitude'] = '0';\n",
"df2['Longitude'] = '0';\n",
"\n",
"coord = pd.read_csv('https://cocl.us/Geospatial_data')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M1B</td>\n",
" <td>Scarborough</td>\n",
" <td>Rouge,Malvern</td>\n",
" <td>43.8067</td>\n",
" <td>-79.1944</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M1C</td>\n",
" <td>Scarborough</td>\n",
" <td>Highland Creek,Rouge Hill,Port Union</td>\n",
" <td>43.7845</td>\n",
" <td>-79.1605</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M1E</td>\n",
" <td>Scarborough</td>\n",
" <td>Guildwood,Morningside,West Hill</td>\n",
" <td>43.7636</td>\n",
" <td>-79.1887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M1G</td>\n",
" <td>Scarborough</td>\n",
" <td>Woburn</td>\n",
" <td>43.771</td>\n",
" <td>-79.2169</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M1H</td>\n",
" <td>Scarborough</td>\n",
" <td>Cedarbrae</td>\n",
" <td>43.7731</td>\n",
" <td>-79.2395</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude \\\n",
"0 M1B Scarborough Rouge,Malvern 43.8067 \n",
"1 M1C Scarborough Highland Creek,Rouge Hill,Port Union 43.7845 \n",
"2 M1E Scarborough Guildwood,Morningside,West Hill 43.7636 \n",
"3 M1G Scarborough Woburn 43.771 \n",
"4 M1H Scarborough Cedarbrae 43.7731 \n",
"\n",
" Longitude \n",
"0 -79.1944 \n",
"1 -79.1605 \n",
"2 -79.1887 \n",
"3 -79.2169 \n",
"4 -79.2395 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#merging the dataframe that contains coordinates with the one that contains borough names\n",
"for i in df2.index:\n",
"    for j in coord.index:\n",
"        if df2.iloc[i, 0] == coord.iloc[j, 0]:\n",
"            df2.iloc[i, 3] = coord.iloc[j, 1]\n",
"            df2.iloc[i, 4] = coord.iloc[j, 2]\n",
"\n",
"#checking the results \n",
"df2.head()"
]
},
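{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same lookup can be sketched as a single `merge` on the postcode. This is a hedged alternative under two assumptions: the first column of the CSV holds the postal code, and the remaining two columns hold latitude and longitude, as the positional indexing above implies. The names `coord_alt` and `df_geo_alt` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: coord_alt and df_geo_alt are illustrative names\n",
"# Rename the first CSV column by position, since its header is not shown in this notebook\n",
"coord_alt = coord.rename(columns={coord.columns[0]: 'Postcode'})\n",
"\n",
"# A left join keeps every postcode row and leaves NaN where no coordinates are found\n",
"df_geo_alt = df2[['Postcode', 'Borough', 'Neighbourhood']].merge(coord_alt, on='Postcode', how='left')\n",
"df_geo_alt.head()"
]
},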
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 3 - Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.\n",
"\n",
"Just make sure:\n",
"\n",
"1. to add enough Markdown cells to explain what you decided to do and to report any observations you make.\n",
"2. to generate maps to visualize your neighborhoods and how they cluster together."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 Select only the neighbourhoods of Downtown Toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select the rows whose borough is \"Downtown Toronto\":"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M4W</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Rosedale</td>\n",
" <td>43.6796</td>\n",
" <td>-79.3775</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M4X</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Cabbagetown,St. James Town</td>\n",
" <td>43.668</td>\n",
" <td>-79.3677</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M4Y</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Church and Wellesley</td>\n",
" <td>43.6659</td>\n",
" <td>-79.3832</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront,Regent Park</td>\n",
" <td>43.6543</td>\n",
" <td>-79.3606</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5B</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Ryerson,Garden District</td>\n",
" <td>43.6572</td>\n",
" <td>-79.3789</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude Longitude\n",
"0 M4W Downtown Toronto Rosedale 43.6796 -79.3775\n",
"1 M4X Downtown Toronto Cabbagetown,St. James Town 43.668 -79.3677\n",
"2 M4Y Downtown Toronto Church and Wellesley 43.6659 -79.3832\n",
"3 M5A Downtown Toronto Harbourfront,Regent Park 43.6543 -79.3606\n",
"4 M5B Downtown Toronto Ryerson,Garden District 43.6572 -79.3789"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toronto = df2[df2['Borough'] == 'Downtown Toronto'].reset_index(drop=True)\n",
"toronto.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The geographical coordinates of Toronto are 43.653963, -79.387207.\n"
]
}
],
"source": [
"#get the coordinates for Toronto\n",
"address = 'Toronto, Canada'\n",
"\n",
"geolocator = Nominatim(user_agent=\"ny_explorer\")\n",
"location = geolocator.geocode(address)\n",
"latitude = location.latitude\n",
"longitude = location.longitude\n",
"print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7fdb92f34438>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#create the Folium map of Downtown Toronto\n",
"map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)\n",
"\n",
"# add markers to map\n",
"for lat, lng, label in zip(toronto['Latitude'], toronto['Longitude'], toronto['Neighbourhood']):\n",
"    label = folium.Popup(label, parse_html=True)\n",
"    folium.CircleMarker(\n",
"        [lat, lng],\n",
"        radius=5,\n",
"        popup=label,\n",
"        color='blue',\n",
"        fill=True,\n",
"        fill_color='#3186cc',\n",
"        fill_opacity=0.7,\n",
"        parse_html=False).add_to(map_toronto)\n",
" \n",
"map_toronto"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Utilizing the Foursquare API to get top 100 venues in Downtown Toronto"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"#set credentials\n",
"CLIENT_ID = 'IPTYUZQHVW5OCDTT331BXA1SFQCJ3QCNQ2NVFZHQI5M4ZJLY' # your Foursquare ID\n",
"CLIENT_SECRET = '4ARF5SHATZIHJ2FJURJBFIUBZWYKR0UZ4FP5XHAGRE4BCZJ1' # your Foursquare Secret\n",
"VERSION = '20190323' # Foursquare API version"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Borrowing the function from the lab to get Top 100 venues in Downtown Toronto within a radius of 500m:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def getNearbyVenues(names, latitudes, longitudes, radius=500):\n",
"    \n",
"    venues_list=[]\n",
"    for name, lat, lng in zip(names, latitudes, longitudes):\n",
"        print(name)\n",
"        \n",
"        # create the API request URL\n",
"        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(\n",
"            CLIENT_ID, \n",
"            CLIENT_SECRET, \n",
"            VERSION, \n",
"            lat, \n",
"            lng, \n",
"            radius, \n",
"            LIMIT)\n",
"        \n",
"        # make the GET request\n",
"        results = requests.get(url).json()[\"response\"]['groups'][0]['items']\n",
"        \n",
"        # return only relevant information for each nearby venue\n",
"        venues_list.append([(\n",
"            name, \n",
"            lat, \n",
"            lng, \n",
"            v['venue']['name'], \n",
"            v['venue']['location']['lat'], \n",
"            v['venue']['location']['lng'], \n",
"            v['venue']['categories'][0]['name']) for v in results])\n",
"\n",
"    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])\n",
"    nearby_venues.columns = ['Neighborhood', \n",
"                             'Neighborhood Latitude', \n",
"                             'Neighborhood Longitude', \n",
"                             'Venue', \n",
"                             'Venue Latitude', \n",
"                             'Venue Longitude', \n",
"                             'Venue Category']\n",
"    \n",
"    return(nearby_venues)"
]
},
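{
"cell_type": "markdown",
"metadata": {},
"source": [
"The single request inside the function above can be sketched more defensively by passing the query as a `params` dictionary and checking the HTTP status before reading the JSON. This is a hedged sketch, not part of the lab function: `explore_venues` is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the endpoint and response layout are assumed to be the same as in the function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def explore_venues(lat, lng, radius=500, limit=100):\n",
"    # Sketch only: hypothetical helper that wraps one call to the explore endpoint\n",
"    params = {\n",
"        'client_id': CLIENT_ID,\n",
"        'client_secret': CLIENT_SECRET,\n",
"        'v': VERSION,\n",
"        'll': '{},{}'.format(lat, lng),\n",
"        'radius': radius,\n",
"        'limit': limit,\n",
"    }\n",
"    response = requests.get('https://api.foursquare.com/v2/venues/explore',\n",
"                            params=params, timeout=10)\n",
"    # Raise an exception on HTTP errors instead of failing later on a missing key\n",
"    response.raise_for_status()\n",
"\n",
"    # Return an empty list when the response carries no venue groups\n",
"    groups = response.json().get('response', {}).get('groups', [])\n",
"    return groups[0]['items'] if groups else []"
]
},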
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now run the above function on each neighbourhood and create a new dataframe called downtown_venues.\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rosedale\n",
"Cabbagetown,St. James Town\n",
"Church and Wellesley\n",
"Harbourfront,Regent Park\n",
"Ryerson,Garden District\n",
"St. James Town\n",
"Berczy Park\n",
"Central Bay Street\n",
"Adelaide,King,Richmond\n",
"Harbourfront East,Toronto Islands,Union Station\n",
"Design Exchange,Toronto Dominion Centre\n",
"Commerce Court,Victoria Hotel\n",
"Harbord,University of Toronto\n",
"Chinatown,Grange Park,Kensington Market\n",
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara\n",
"Stn A PO Boxes 25 The Esplanade\n",
"First Canadian Place,Underground city\n",
"Christie\n"
]
}
],
"source": [
"LIMIT = 100 # limit of number of venues returned by Foursquare API\n",
"\n",
"downtown_venues = getNearbyVenues(names=toronto['Neighbourhood'],\n",
"                                  latitudes=toronto['Latitude'],\n",
"                                  longitudes=toronto['Longitude']\n",
"                                  )"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1273, 7)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#checking the size of venues dataframe\n",
"downtown_venues.shape"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 207 unique categories.\n"
]
}
],
"source": [
"#checking how many unique venue categories there are\n",
"print('There are {} unique categories.'.format(len(downtown_venues['Venue Category'].unique())))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Analyze each neighbourhood"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Yoga Studio</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>American Restaurant</th>\n",
" <th>...</th>\n",
" <th>Toy / Game Store</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wine Shop</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 207 columns</p>\n",
"</div>"
],
"text/plain": [
" Yoga Studio Accessories Store Afghan Restaurant Airport \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Food Court Airport Gate Airport Lounge Airport Service \\\n",
"0 0 0 0 0 \n",
"1 0 0 0 0 \n",
"2 0 0 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"\n",
" Airport Terminal American Restaurant ... Toy / Game Store Trail \\\n",
"0 0 0 ... 0 0 \n",
"1 0 0 ... 0 0 \n",
"2 0 0 ... 0 0 \n",
"3 0 0 ... 0 1 \n",
"4 0 0 ... 0 0 \n",
"\n",
" Train Station Vegetarian / Vegan Restaurant Video Game Store \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"\n",
" Vietnamese Restaurant Wine Bar Wine Shop Wings Joint Women's Store \n",
"0 0 0 0 0 0 \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"\n",
"[5 rows x 207 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# one hot encoding\n",
"dt_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix=\"\", prefix_sep=\"\")\n",
"\n",
"# add neighborhood column back to dataframe\n",
"dt_onehot['Neighborhood'] = downtown_venues['Neighborhood'] \n",
"\n",
"# move neighborhood column to the first column\n",
"fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])\n",
"dt_onehot = dt_onehot[fixed_columns]\n",
"\n",
"dt_onehot.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1273, 207)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#checking the dataframe size\n",
"dt_onehot.shape"
]
},
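{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape above has 207 columns, the same as the number of unique venue categories. This suggests that one category is itself named `Neighborhood` and was overwritten when the neighborhood column was added, which would also explain why `Yoga Studio` rather than `Neighborhood` appears first in the preview. A minimal sketch that sidesteps such a collision, assuming a hypothetical column name `Neighbourhood Name` that is not a venue category (`dt_onehot_alt` is likewise illustrative and is not used for the results in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: dt_onehot_alt and 'Neighbourhood Name' are illustrative names\n",
"dt_onehot_alt = pd.get_dummies(downtown_venues['Venue Category'])\n",
"\n",
"# insert() places the column first and raises ValueError if the name already exists\n",
"dt_onehot_alt.insert(0, 'Neighbourhood Name', downtown_venues['Neighborhood'])\n",
"dt_onehot_alt.shape"
]
},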
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>Yoga Studio</th>\n",
" <th>Accessories Store</th>\n",
" <th>Afghan Restaurant</th>\n",
" <th>Airport</th>\n",
" <th>Airport Food Court</th>\n",
" <th>Airport Gate</th>\n",
" <th>Airport Lounge</th>\n",
" <th>Airport Service</th>\n",
" <th>Airport Terminal</th>\n",
" <th>...</th>\n",
" <th>Toy / Game Store</th>\n",
" <th>Trail</th>\n",
" <th>Train Station</th>\n",
" <th>Vegetarian / Vegan Restaurant</th>\n",
" <th>Video Game Store</th>\n",
" <th>Vietnamese Restaurant</th>\n",
" <th>Wine Bar</th>\n",
" <th>Wine Shop</th>\n",
" <th>Wings Joint</th>\n",
" <th>Women's Store</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide,King,Richmond</td>\n",
" <td>0.000000</td>\n",
" <td>0.01</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Berczy Park</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CN Tower,Bathurst Quay,Island airport,Harbourf...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.076923</td>\n",
" <td>0.076923</td>\n",
" <td>0.076923</td>\n",
" <td>0.153846</td>\n",
" <td>0.153846</td>\n",
" <td>0.153846</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Cabbagetown,St. James Town</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Central Bay Street</td>\n",
" <td>0.012821</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.012821</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.012821</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Chinatown,Grange Park,Kensington Market</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.010204</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.051020</td>\n",
" <td>0.000000</td>\n",
" <td>0.051020</td>\n",
" <td>0.010204</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Christie</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Church and Wellesley</td>\n",
" <td>0.012195</td>\n",
" <td>0.00</td>\n",
" <td>0.012195</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.012195</td>\n",
" <td>0.012195</td>\n",
" <td>0.000000</td>\n",
" <td>0.012195</td>\n",
" <td>0.012195</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Commerce Court,Victoria Hotel</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Design Exchange,Toronto Dominion Centre</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.01</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>First Canadian Place,Underground city</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.01</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Harbord,University of Toronto</td>\n",
" <td>0.027778</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.027778</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Harbourfront East,Toronto Islands,Union Station</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.01</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Harbourfront,Regent Park</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Rosedale</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.25</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Ryerson,Garden District</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.010000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.010000</td>\n",
" <td>0.010000</td>\n",
" <td>0.010000</td>\n",
" <td>0.010000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>St. James Town</td>\n",
" <td>0.000000</td>\n",
" <td>0.01</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.010000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Stn A PO Boxes 25 The Esplanade</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>18 rows × 207 columns</p>\n",
"</div>"
],
"text/plain": [
" Neighborhood Yoga Studio \\\n",
"0 Adelaide,King,Richmond 0.000000 \n",
"1 Berczy Park 0.000000 \n",
"2 CN Tower,Bathurst Quay,Island airport,Harbourf... 0.000000 \n",
"3 Cabbagetown,St. James Town 0.000000 \n",
"4 Central Bay Street 0.012821 \n",
"5 Chinatown,Grange Park,Kensington Market 0.000000 \n",
"6 Christie 0.000000 \n",
"7 Church and Wellesley 0.012195 \n",
"8 Commerce Court,Victoria Hotel 0.000000 \n",
"9 Design Exchange,Toronto Dominion Centre 0.000000 \n",
"10 First Canadian Place,Underground city 0.000000 \n",
"11 Harbord,University of Toronto 0.027778 \n",
"12 Harbourfront East,Toronto Islands,Union Station 0.000000 \n",
"13 Harbourfront,Regent Park 0.000000 \n",
"14 Rosedale 0.000000 \n",
"15 Ryerson,Garden District 0.000000 \n",
"16 St. James Town 0.000000 \n",
"17 Stn A PO Boxes 25 The Esplanade 0.000000 \n",
"\n",
" Accessories Store Afghan Restaurant Airport Airport Food Court \\\n",
"0 0.01 0.000000 0.000000 0.000000 \n",
"1 0.00 0.000000 0.000000 0.000000 \n",
"2 0.00 0.000000 0.076923 0.076923 \n",
"3 0.00 0.000000 0.000000 0.000000 \n",
"4 0.00 0.000000 0.000000 0.000000 \n",
"5 0.00 0.000000 0.000000 0.000000 \n",
"6 0.00 0.000000 0.000000 0.000000 \n",
"7 0.00 0.012195 0.000000 0.000000 \n",
"8 0.00 0.000000 0.000000 0.000000 \n",
"9 0.00 0.000000 0.000000 0.000000 \n",
"10 0.00 0.000000 0.000000 0.000000 \n",
"11 0.00 0.000000 0.000000 0.000000 \n",
"12 0.00 0.000000 0.000000 0.000000 \n",
"13 0.00 0.000000 0.000000 0.000000 \n",
"14 0.00 0.000000 0.000000 0.000000 \n",
"15 0.00 0.000000 0.000000 0.000000 \n",
"16 0.01 0.000000 0.000000 0.000000 \n",
"17 0.00 0.000000 0.000000 0.000000 \n",
"\n",
" Airport Gate Airport Lounge Airport Service Airport Terminal ... \\\n",
"0 0.000000 0.000000 0.000000 0.000000 ... \n",
"1 0.000000 0.000000 0.000000 0.000000 ... \n",
"2 0.076923 0.153846 0.153846 0.153846 ... \n",
"3 0.000000 0.000000 0.000000 0.000000 ... \n",
"4 0.000000 0.000000 0.000000 0.000000 ... \n",
"5 0.000000 0.000000 0.000000 0.000000 ... \n",
"6 0.000000 0.000000 0.000000 0.000000 ... \n",
"7 0.000000 0.000000 0.000000 0.000000 ... \n",
"8 0.000000 0.000000 0.000000 0.000000 ... \n",
"9 0.000000 0.000000 0.000000 0.000000 ... \n",
"10 0.000000 0.000000 0.000000 0.000000 ... \n",
"11 0.000000 0.000000 0.000000 0.000000 ... \n",
"12 0.000000 0.000000 0.000000 0.000000 ... \n",
"13 0.000000 0.000000 0.000000 0.000000 ... \n",
"14 0.000000 0.000000 0.000000 0.000000 ... \n",
"15 0.000000 0.000000 0.000000 0.000000 ... \n",
"16 0.000000 0.000000 0.000000 0.000000 ... \n",
"17 0.000000 0.000000 0.000000 0.000000 ... \n",
"\n",
" Toy / Game Store Trail Train Station Vegetarian / Vegan Restaurant \\\n",
"0 0.000000 0.00 0.00 0.010000 \n",
"1 0.000000 0.00 0.00 0.000000 \n",
"2 0.000000 0.00 0.00 0.000000 \n",
"3 0.000000 0.00 0.00 0.000000 \n",
"4 0.000000 0.00 0.00 0.012821 \n",
"5 0.010204 0.00 0.00 0.051020 \n",
"6 0.000000 0.00 0.00 0.000000 \n",
"7 0.000000 0.00 0.00 0.000000 \n",
"8 0.000000 0.00 0.00 0.000000 \n",
"9 0.000000 0.00 0.01 0.010000 \n",
"10 0.000000 0.00 0.01 0.010000 \n",
"11 0.000000 0.00 0.00 0.000000 \n",
"12 0.000000 0.00 0.01 0.010000 \n",
"13 0.000000 0.00 0.00 0.000000 \n",
"14 0.000000 0.25 0.00 0.000000 \n",
"15 0.010000 0.00 0.00 0.010000 \n",
"16 0.000000 0.00 0.00 0.000000 \n",
"17 0.000000 0.00 0.00 0.000000 \n",
"\n",
" Video Game Store Vietnamese Restaurant Wine Bar Wine Shop Wings Joint \\\n",
"0 0.000000 0.000000 0.010000 0.000000 0.000000 \n",
"1 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"2 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"3 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"4 0.000000 0.000000 0.012821 0.000000 0.000000 \n",
"5 0.000000 0.051020 0.010204 0.000000 0.000000 \n",
"6 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"7 0.012195 0.012195 0.000000 0.012195 0.012195 \n",
"8 0.000000 0.000000 0.010000 0.000000 0.000000 \n",
"9 0.000000 0.000000 0.010000 0.000000 0.000000 \n",
"10 0.000000 0.000000 0.010000 0.000000 0.000000 \n",
"11 0.027778 0.000000 0.000000 0.000000 0.000000 \n",
"12 0.000000 0.000000 0.010000 0.000000 0.000000 \n",
"13 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"14 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"15 0.010000 0.010000 0.010000 0.000000 0.000000 \n",
"16 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"17 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"\n",
" Women's Store \n",
"0 0.010000 \n",
"1 0.000000 \n",
"2 0.000000 \n",
"3 0.000000 \n",
"4 0.000000 \n",
"5 0.010204 \n",
"6 0.000000 \n",
"7 0.000000 \n",
"8 0.000000 \n",
"9 0.000000 \n",
"10 0.000000 \n",
"11 0.000000 \n",
"12 0.000000 \n",
"13 0.000000 \n",
"14 0.000000 \n",
"15 0.000000 \n",
"16 0.010000 \n",
"17 0.000000 \n",
"\n",
"[18 rows x 207 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()\n",
"dt_grouped"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18, 207)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#checking the grouped dataframe size\n",
"dt_grouped.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing out each neighborhood along with the top 5 most common venues in it:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"----Adelaide,King,Richmond----\n",
" venue freq\n",
"0 Coffee Shop 0.06\n",
"1 Bar 0.04\n",
"2 Thai Restaurant 0.04\n",
"3 Steakhouse 0.04\n",
"4 Café 0.04\n",
"\n",
"\n",
"----Berczy Park----\n",
" venue freq\n",
"0 Coffee Shop 0.08\n",
"1 Restaurant 0.05\n",
"2 Cocktail Bar 0.05\n",
"3 Italian Restaurant 0.03\n",
"4 Steakhouse 0.03\n",
"\n",
"\n",
"----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----\n",
" venue freq\n",
"0 Airport Lounge 0.15\n",
"1 Airport Terminal 0.15\n",
"2 Airport Service 0.15\n",
"3 Boat or Ferry 0.08\n",
"4 Harbor / Marina 0.08\n",
"\n",
"\n",
"----Cabbagetown,St. James Town----\n",
" venue freq\n",
"0 Coffee Shop 0.09\n",
"1 Restaurant 0.07\n",
"2 Market 0.05\n",
"3 Café 0.05\n",
"4 Bakery 0.05\n",
"\n",
"\n",
"----Central Bay Street----\n",
" venue freq\n",
"0 Coffee Shop 0.17\n",
"1 Café 0.06\n",
"2 Italian Restaurant 0.05\n",
"3 Burger Joint 0.04\n",
"4 Bar 0.04\n",
"\n",
"\n",
"----Chinatown,Grange Park,Kensington Market----\n",
" venue freq\n",
"0 Café 0.07\n",
"1 Bar 0.05\n",
"2 Vietnamese Restaurant 0.05\n",
"3 Vegetarian / Vegan Restaurant 0.05\n",
"4 Coffee Shop 0.04\n",
"\n",
"\n",
"----Christie----\n",
" venue freq\n",
"0 Café 0.20\n",
"1 Grocery Store 0.20\n",
"2 Park 0.13\n",
"3 Restaurant 0.07\n",
"4 Convenience Store 0.07\n",
"\n",
"\n",
"----Church and Wellesley----\n",
" venue freq\n",
"0 Coffee Shop 0.07\n",
"1 Japanese Restaurant 0.07\n",
"2 Gay Bar 0.04\n",
"3 Restaurant 0.04\n",
"4 Burger Joint 0.04\n",
"\n",
"\n",
"----Commerce Court,Victoria Hotel----\n",
" venue freq\n",
"0 Coffee Shop 0.13\n",
"1 Café 0.06\n",
"2 Hotel 0.06\n",
"3 American Restaurant 0.04\n",
"4 Restaurant 0.04\n",
"\n",
"\n",
"----Design Exchange,Toronto Dominion Centre----\n",
" venue freq\n",
"0 Coffee Shop 0.13\n",
"1 Café 0.08\n",
"2 Hotel 0.07\n",
"3 American Restaurant 0.04\n",
"4 Italian Restaurant 0.03\n",
"\n",
"\n",
"----First Canadian Place,Underground city----\n",
" venue freq\n",
"0 Coffee Shop 0.08\n",
"1 Café 0.07\n",
"2 Hotel 0.06\n",
"3 Steakhouse 0.04\n",
"4 American Restaurant 0.03\n",
"\n",
"\n",
"----Harbord,University of Toronto----\n",
" venue freq\n",
"0 Café 0.17\n",
"1 Japanese Restaurant 0.06\n",
"2 Bakery 0.06\n",
"3 Restaurant 0.06\n",
"4 Bookstore 0.06\n",
"\n",
"\n",
"----Harbourfront East,Toronto Islands,Union Station----\n",
" venue freq\n",
"0 Coffee Shop 0.13\n",
"1 Aquarium 0.05\n",
"2 Hotel 0.05\n",
"3 Italian Restaurant 0.04\n",
"4 Café 0.04\n",
"\n",
"\n",
"----Harbourfront,Regent Park----\n",
" venue freq\n",
"0 Coffee Shop 0.16\n",
"1 Café 0.08\n",
"2 Bakery 0.06\n",
"3 Park 0.06\n",
"4 Pub 0.06\n",
"\n",
"\n",
"----Rosedale----\n",
" venue freq\n",
"0 Park 0.50\n",
"1 Playground 0.25\n",
"2 Trail 0.25\n",
"3 Yoga Studio 0.00\n",
"4 Movie Theater 0.00\n",
"\n",
"\n",
"----Ryerson,Garden District----\n",
" venue freq\n",
"0 Coffee Shop 0.08\n",
"1 Clothing Store 0.07\n",
"2 Café 0.04\n",
"3 Cosmetics Shop 0.04\n",
"4 Middle Eastern Restaurant 0.03\n",
"\n",
"\n",
"----St. James Town----\n",
" venue freq\n",
"0 Coffee Shop 0.07\n",
"1 Restaurant 0.06\n",
"2 Café 0.05\n",
"3 Hotel 0.05\n",
"4 Breakfast Spot 0.04\n",
"\n",
"\n",
"----Stn A PO Boxes 25 The Esplanade----\n",
" venue freq\n",
"0 Coffee Shop 0.09\n",
"1 Restaurant 0.05\n",
"2 Café 0.04\n",
"3 Hotel 0.03\n",
"4 Cocktail Bar 0.03\n",
"\n",
"\n"
]
}
],
"source": [
"num_top_venues = 5\n",
"\n",
"for hood in dt_grouped['Neighborhood']:\n",
"    print(\"----\"+hood+\"----\")\n",
"    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()\n",
"    temp.columns = ['venue','freq']\n",
"    temp = temp.iloc[1:]\n",
"    temp['freq'] = temp['freq'].astype(float)\n",
"    temp = temp.round({'freq': 2})\n",
"    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))\n",
"    print('\\n')"
]
},
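{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above transposes one row at a time. As a minimal sketch of the same idea on a toy table (the values, the `toy` frame and the `top_venues` helper are hypothetical and not part of this notebook's data), `Series.nlargest` can pick the most frequent categories of each row directly:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy frequency table; the values are made up for illustration only\n",
"toy = pd.DataFrame({\n",
"    'Neighborhood': ['A', 'B'],\n",
"    'Coffee Shop': [0.5, 0.1],\n",
"    'Park': [0.3, 0.6],\n",
"    'Pub': [0.2, 0.3],\n",
"})\n",
"\n",
"def top_venues(df, n):\n",
"    # Index by neighborhood so only the numeric category columns are ranked\n",
"    freq = df.set_index('Neighborhood')\n",
"    # nlargest returns the n highest values of each row in descending order\n",
"    return {hood: row.nlargest(n).index.tolist() for hood, row in freq.iterrows()}\n",
"\n",
"result = top_venues(toy, 2)\n",
"print(result)\n",
"```\n",
"\n",
"With this toy input, neighborhood `A` maps to Coffee Shop then Park, and `B` to Park then Pub."
]
},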
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Converting the results to a pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# return the names of the most frequent venue categories for one row, in descending order of frequency\n",
"\n",
"def return_most_common_venues(row, num_top_venues):\n",
"    # skip the first entry, which holds the neighborhood name\n",
"    row_categories = row.iloc[1:]\n",
"    row_categories_sorted = row_categories.sort_values(ascending=False)\n",
"\n",
"    return row_categories_sorted.index.values[0:num_top_venues]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Neighborhood</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adelaide,King,Richmond</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Steakhouse</td>\n",
" <td>Bar</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Café</td>\n",
" <td>American Restaurant</td>\n",
" <td>Hotel</td>\n",
" <td>Asian Restaurant</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Bakery</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Berczy Park</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Cocktail Bar</td>\n",
" <td>Restaurant</td>\n",
" <td>Café</td>\n",
" <td>Pub</td>\n",
" <td>Farmers Market</td>\n",
" <td>Bakery</td>\n",
" <td>Seafood Restaurant</td>\n",
" <td>Steakhouse</td>\n",
" <td>Cheese Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CN Tower,Bathurst Quay,Island airport,Harbourf...</td>\n",
" <td>Airport Lounge</td>\n",
" <td>Airport Terminal</td>\n",
" <td>Airport Service</td>\n",
" <td>Harbor / Marina</td>\n",
" <td>Sculpture Garden</td>\n",
" <td>Boutique</td>\n",
" <td>Boat or Ferry</td>\n",
" <td>Airport Gate</td>\n",
" <td>Airport</td>\n",
" <td>Airport Food Court</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Cabbagetown,St. James Town</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Park</td>\n",
" <td>Pub</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Café</td>\n",
" <td>Market</td>\n",
" <td>Pizza Place</td>\n",
" <td>Pet Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Central Bay Street</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Burger Joint</td>\n",
" <td>Bar</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Spa</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Salad Place</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Neighborhood 1st Most Common Venue \\\n",
"0 Adelaide,King,Richmond Coffee Shop \n",
"1 Berczy Park Coffee Shop \n",
"2 CN Tower,Bathurst Quay,Island airport,Harbourf... Airport Lounge \n",
"3 Cabbagetown,St. James Town Coffee Shop \n",
"4 Central Bay Street Coffee Shop \n",
"\n",
" 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue \\\n",
"0 Steakhouse Bar Thai Restaurant \n",
"1 Cocktail Bar Restaurant Café \n",
"2 Airport Terminal Airport Service Harbor / Marina \n",
"3 Restaurant Park Pub \n",
"4 Café Italian Restaurant Burger Joint \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"0 Café American Restaurant Hotel \n",
"1 Pub Farmers Market Bakery \n",
"2 Sculpture Garden Boutique Boat or Ferry \n",
"3 Italian Restaurant Bakery Café \n",
"4 Bar Sushi Restaurant Bubble Tea Shop \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"0 Asian Restaurant Sushi Restaurant Bakery \n",
"1 Seafood Restaurant Steakhouse Cheese Shop \n",
"2 Airport Gate Airport Airport Food Court \n",
"3 Market Pizza Place Pet Store \n",
"4 Spa Chinese Restaurant Salad Place "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#create the new dataframe and display the top 10 venues for each neighborhood:\n",
"\n",
"num_top_venues = 10\n",
"\n",
"indicators = ['st', 'nd', 'rd']\n",
"\n",
"# create columns according to number of top venues\n",
"columns = ['Neighborhood']\n",
"for ind in np.arange(num_top_venues):\n",
"    try:\n",
"        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))\n",
"    except IndexError:\n",
"        columns.append('{}th Most Common Venue'.format(ind+1))\n",
"\n",
"# create a new dataframe\n",
"neighborhoods_venues_sorted = pd.DataFrame(columns=columns)\n",
"neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']\n",
"\n",
"for ind in np.arange(dt_grouped.shape[0]):\n",
"    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)\n",
"\n",
"neighborhoods_venues_sorted.head()"
]
},
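{
"cell_type": "markdown",
"metadata": {},
"source": [
"The column labels above use the first three suffixes and fall back to `th`, which is enough for ten columns. A hedged sketch of a more general helper follows; the name `ordinal` is hypothetical (it is not a pandas or standard-library function), and it also covers cases such as 11th or 21st:\n",
"\n",
"```python\n",
"def ordinal(n):\n",
"    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3\n",
"    if 10 <= n % 100 <= 20:\n",
"        suffix = 'th'\n",
"    else:\n",
"        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')\n",
"    return '{}{}'.format(n, suffix)\n",
"\n",
"# Build the same ten column labels as in the cell above\n",
"columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]\n",
"```\n",
"\n",
"For ten columns this yields the same labels as the `try`/`except` version; the difference only shows once more than twenty columns are requested."
]
},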
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4 Cluster Neighborhoods\n",
"Run k-means to cluster the neighborhoods into 4 clusters."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 2, 0, 0, 0, 3, 0, 0, 0], dtype=int32)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# set number of clusters\n",
"kclusters = 4\n",
"\n",
"dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')\n",
"\n",
"# run k-means clustering\n",
"kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)\n",
"\n",
"# check cluster labels generated for each row in the dataframe\n",
"kmeans.labels_[0:10] "
]
},
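{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the clustering step concrete, here is a small sketch on made-up frequency rows (the array `X` is hypothetical and not taken from this notebook). It assumes scikit-learn is installed; `n_init` is passed explicitly because its default has changed between scikit-learn releases.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Hypothetical venue-frequency rows: two cafe-heavy areas, then two park-heavy areas\n",
"X = np.array([\n",
"    [0.9, 0.1],\n",
"    [0.8, 0.2],\n",
"    [0.1, 0.9],\n",
"    [0.2, 0.8],\n",
"])\n",
"\n",
"# Fit k-means with two clusters and a fixed seed for repeatable results\n",
"model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)\n",
"labels = model.labels_\n",
"```\n",
"\n",
"Rows with a similar venue mix receive the same label; the label integers themselves are arbitrary identifiers, not a ranking."
]
},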
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Postcode</th>\n",
" <th>Borough</th>\n",
" <th>Neighbourhood</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M4W</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Rosedale</td>\n",
" <td>43.6796</td>\n",
" <td>-79.3775</td>\n",
" <td>1</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Trail</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Electronics Store</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dog Run</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M4X</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Cabbagetown,St. James Town</td>\n",
" <td>43.668</td>\n",
" <td>-79.3677</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Park</td>\n",
" <td>Pub</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Café</td>\n",
" <td>Market</td>\n",
" <td>Pizza Place</td>\n",
" <td>Pet Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M4Y</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Church and Wellesley</td>\n",
" <td>43.6659</td>\n",
" <td>-79.3832</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Gay Bar</td>\n",
" <td>Burger Joint</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Men's Store</td>\n",
" <td>Café</td>\n",
" <td>Mediterranean Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M5A</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Harbourfront,Regent Park</td>\n",
" <td>43.6543</td>\n",
" <td>-79.3606</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Bakery</td>\n",
" <td>Pub</td>\n",
" <td>Park</td>\n",
" <td>Theater</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Health Food Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M5B</td>\n",
" <td>Downtown Toronto</td>\n",
" <td>Ryerson,Garden District</td>\n",
" <td>43.6572</td>\n",
" <td>-79.3789</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Clothing Store</td>\n",
" <td>Café</td>\n",
" <td>Cosmetics Shop</td>\n",
" <td>Middle Eastern Restaurant</td>\n",
" <td>Ramen Restaurant</td>\n",
" <td>Tea Room</td>\n",
" <td>Lingerie Store</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Pizza Place</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Postcode Borough Neighbourhood Latitude Longitude \\\n",
"0 M4W Downtown Toronto Rosedale 43.6796 -79.3775 \n",
"1 M4X Downtown Toronto Cabbagetown,St. James Town 43.668 -79.3677 \n",
"2 M4Y Downtown Toronto Church and Wellesley 43.6659 -79.3832 \n",
"3 M5A Downtown Toronto Harbourfront,Regent Park 43.6543 -79.3606 \n",
"4 M5B Downtown Toronto Ryerson,Garden District 43.6572 -79.3789 \n",
"\n",
" Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue \\\n",
"0 1 Park Playground Trail \n",
"1 0 Coffee Shop Restaurant Park \n",
"2 0 Coffee Shop Japanese Restaurant Restaurant \n",
"3 0 Coffee Shop Café Bakery \n",
"4 0 Coffee Shop Clothing Store Café \n",
"\n",
" 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue \\\n",
"0 Deli / Bodega Electronics Store Dumpling Restaurant \n",
"1 Pub Italian Restaurant Bakery \n",
"2 Gay Bar Burger Joint Bubble Tea Shop \n",
"3 Pub Park Theater \n",
"4 Cosmetics Shop Middle Eastern Restaurant Ramen Restaurant \n",
"\n",
" 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue \\\n",
"0 Donut Shop Doner Restaurant Dog Run \n",
"1 Café Market Pizza Place \n",
"2 Sushi Restaurant Men's Store Café \n",
"3 Breakfast Spot Mexican Restaurant Restaurant \n",
"4 Tea Room Lingerie Store Bubble Tea Shop \n",
"\n",
" 10th Most Common Venue \n",
"0 Discount Store \n",
"1 Pet Store \n",
"2 Mediterranean Restaurant \n",
"3 Health Food Store \n",
"4 Pizza Place "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add clustering labels\n",
"neighborhoods_venues_sorted.insert(0, 'Labels', kmeans.labels_)\n",
"\n",
"dt_merged = toronto\n",
"\n",
"# join the sorted venues (built from dt_grouped) onto the toronto data, which holds latitude/longitude for each neighborhood\n",
"# note: the toronto dataframe spells its key column 'Neighbourhood', so the join key differs from 'Neighborhood'\n",
"dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')\n",
"\n",
"dt_merged.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualize the resulting clusters on a map:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"width:100%;\"><div style=\"position:relative;width:100%;height:0;padding-bottom:60%;\"><iframe src=\"data:text/html;charset=utf-8;base64,\" style=\"position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;\" allowfullscreen webkitallowfullscreen mozallowfullscreen></iframe></div></div>"
],
"text/plain": [
"<folium.folium.Map at 0x7fdb92dfd630>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create map\n",
"map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)\n",
"\n",
"# set color scheme for the clusters\n",
"x = np.arange(kclusters)\n",
"ys = [i + x + (i*x)**2 for i in range(kclusters)]\n",
"colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))\n",
"rainbow = [colors.rgb2hex(i) for i in colors_array]\n",
"\n",
"# add markers to the map\n",
"markers_colors = []\n",
"for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighbourhood'], dt_merged['Labels']):\n",
"    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)\n",
"    folium.CircleMarker(\n",
"        [lat, lon],\n",
"        radius=9,\n",
"        popup=label,\n",
"        color=rainbow[cluster-1],\n",
"        fill=True,\n",
"        fill_color=rainbow[cluster-1],\n",
"        fill_opacity=0.7).add_to(map_clusters)\n",
" \n",
"map_clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(A screenshot is attached because GitHub does not render the interactive map.)\n",
"\n",
"<img src=\"toronto_map.png\" >"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.5 Examine the clusters"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Park</td>\n",
" <td>Pub</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Café</td>\n",
" <td>Market</td>\n",
" <td>Pizza Place</td>\n",
" <td>Pet Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Gay Bar</td>\n",
" <td>Burger Joint</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Men's Store</td>\n",
" <td>Café</td>\n",
" <td>Mediterranean Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Bakery</td>\n",
" <td>Pub</td>\n",
" <td>Park</td>\n",
" <td>Theater</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Health Food Store</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Clothing Store</td>\n",
" <td>Café</td>\n",
" <td>Cosmetics Shop</td>\n",
" <td>Middle Eastern Restaurant</td>\n",
" <td>Ramen Restaurant</td>\n",
" <td>Tea Room</td>\n",
" <td>Lingerie Store</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Pizza Place</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Hotel</td>\n",
" <td>Café</td>\n",
" <td>Breakfast Spot</td>\n",
" <td>Clothing Store</td>\n",
" <td>Bakery</td>\n",
" <td>Cosmetics Shop</td>\n",
" <td>Gastropub</td>\n",
" <td>Gym</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Cocktail Bar</td>\n",
" <td>Restaurant</td>\n",
" <td>Café</td>\n",
" <td>Pub</td>\n",
" <td>Farmers Market</td>\n",
" <td>Bakery</td>\n",
" <td>Seafood Restaurant</td>\n",
" <td>Steakhouse</td>\n",
" <td>Cheese Shop</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Burger Joint</td>\n",
" <td>Bar</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Bubble Tea Shop</td>\n",
" <td>Spa</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Salad Place</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Steakhouse</td>\n",
" <td>Bar</td>\n",
" <td>Thai Restaurant</td>\n",
" <td>Café</td>\n",
" <td>American Restaurant</td>\n",
" <td>Hotel</td>\n",
" <td>Asian Restaurant</td>\n",
" <td>Sushi Restaurant</td>\n",
" <td>Bakery</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Hotel</td>\n",
" <td>Aquarium</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Café</td>\n",
" <td>Fried Chicken Joint</td>\n",
" <td>Brewery</td>\n",
" <td>Scenic Lookout</td>\n",
" <td>Pizza Place</td>\n",
" <td>Music Venue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Hotel</td>\n",
" <td>American Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Seafood Restaurant</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Gastropub</td>\n",
" <td>Bakery</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Hotel</td>\n",
" <td>American Restaurant</td>\n",
" <td>Restaurant</td>\n",
" <td>Gastropub</td>\n",
" <td>Gym</td>\n",
" <td>Seafood Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Steakhouse</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Café</td>\n",
" <td>Restaurant</td>\n",
" <td>Bookstore</td>\n",
" <td>Japanese Restaurant</td>\n",
" <td>Bar</td>\n",
" <td>Bakery</td>\n",
" <td>Yoga Studio</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Sandwich Place</td>\n",
" <td>Comfort Food Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Café</td>\n",
" <td>Vietnamese Restaurant</td>\n",
" <td>Bar</td>\n",
" <td>Vegetarian / Vegan Restaurant</td>\n",
" <td>Bakery</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Chinese Restaurant</td>\n",
" <td>Mexican Restaurant</td>\n",
" <td>Caribbean Restaurant</td>\n",
" <td>Comfort Food Restaurant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Restaurant</td>\n",
" <td>Café</td>\n",
" <td>Seafood Restaurant</td>\n",
" <td>Hotel</td>\n",
" <td>Pub</td>\n",
" <td>Cocktail Bar</td>\n",
" <td>Fast Food Restaurant</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Creperie</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>0</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Café</td>\n",
" <td>Hotel</td>\n",
" <td>Steakhouse</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Gastropub</td>\n",
" <td>Bakery</td>\n",
" <td>Bar</td>\n",
" <td>Burger Joint</td>\n",
" <td>Seafood Restaurant</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"1 Downtown Toronto 0 Coffee Shop Restaurant \n",
"2 Downtown Toronto 0 Coffee Shop Japanese Restaurant \n",
"3 Downtown Toronto 0 Coffee Shop Café \n",
"4 Downtown Toronto 0 Coffee Shop Clothing Store \n",
"5 Downtown Toronto 0 Coffee Shop Restaurant \n",
"6 Downtown Toronto 0 Coffee Shop Cocktail Bar \n",
"7 Downtown Toronto 0 Coffee Shop Café \n",
"8 Downtown Toronto 0 Coffee Shop Steakhouse \n",
"9 Downtown Toronto 0 Coffee Shop Hotel \n",
"10 Downtown Toronto 0 Coffee Shop Café \n",
"11 Downtown Toronto 0 Coffee Shop Café \n",
"12 Downtown Toronto 0 Café Restaurant \n",
"13 Downtown Toronto 0 Café Vietnamese Restaurant \n",
"15 Downtown Toronto 0 Coffee Shop Restaurant \n",
"16 Downtown Toronto 0 Coffee Shop Café \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue \\\n",
"1 Park Pub \n",
"2 Restaurant Gay Bar \n",
"3 Bakery Pub \n",
"4 Café Cosmetics Shop \n",
"5 Hotel Café \n",
"6 Restaurant Café \n",
"7 Italian Restaurant Burger Joint \n",
"8 Bar Thai Restaurant \n",
"9 Aquarium Italian Restaurant \n",
"10 Hotel American Restaurant \n",
"11 Hotel American Restaurant \n",
"12 Bookstore Japanese Restaurant \n",
"13 Bar Vegetarian / Vegan Restaurant \n",
"15 Café Seafood Restaurant \n",
"16 Hotel Steakhouse \n",
"\n",
" 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue \\\n",
"1 Italian Restaurant Bakery Café \n",
"2 Burger Joint Bubble Tea Shop Sushi Restaurant \n",
"3 Park Theater Breakfast Spot \n",
"4 Middle Eastern Restaurant Ramen Restaurant Tea Room \n",
"5 Breakfast Spot Clothing Store Bakery \n",
"6 Pub Farmers Market Bakery \n",
"7 Bar Sushi Restaurant Bubble Tea Shop \n",
"8 Café American Restaurant Hotel \n",
"9 Café Fried Chicken Joint Brewery \n",
"10 Restaurant Seafood Restaurant Italian Restaurant \n",
"11 Restaurant Gastropub Gym \n",
"12 Bar Bakery Yoga Studio \n",
"13 Bakery Coffee Shop Chinese Restaurant \n",
"15 Hotel Pub Cocktail Bar \n",
"16 Deli / Bodega Gastropub Bakery \n",
"\n",
" 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue \n",
"1 Market Pizza Place Pet Store \n",
"2 Men's Store Café Mediterranean Restaurant \n",
"3 Mexican Restaurant Restaurant Health Food Store \n",
"4 Lingerie Store Bubble Tea Shop Pizza Place \n",
"5 Cosmetics Shop Gastropub Gym \n",
"6 Seafood Restaurant Steakhouse Cheese Shop \n",
"7 Spa Chinese Restaurant Salad Place \n",
"8 Asian Restaurant Sushi Restaurant Bakery \n",
"9 Scenic Lookout Pizza Place Music Venue \n",
"10 Deli / Bodega Gastropub Bakery \n",
"11 Seafood Restaurant Bakery Steakhouse \n",
"12 Chinese Restaurant Sandwich Place Comfort Food Restaurant \n",
"13 Mexican Restaurant Caribbean Restaurant Comfort Food Restaurant \n",
"15 Fast Food Restaurant Italian Restaurant Creperie \n",
"16 Bar Burger Joint Seafood Restaurant "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cluster 1 (label 0): show Borough, Labels, and the ten most-common-venue columns\n",
"dt_merged.loc[dt_merged['Labels'] == 0, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>1</td>\n",
" <td>Park</td>\n",
" <td>Playground</td>\n",
" <td>Trail</td>\n",
" <td>Deli / Bodega</td>\n",
" <td>Electronics Store</td>\n",
" <td>Dumpling Restaurant</td>\n",
" <td>Donut Shop</td>\n",
" <td>Doner Restaurant</td>\n",
" <td>Dog Run</td>\n",
" <td>Discount Store</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"0 Downtown Toronto 1 Park Playground \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"0 Trail Deli / Bodega Electronics Store \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"0 Dumpling Restaurant Donut Shop Doner Restaurant \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"0 Dog Run Discount Store "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cluster 2 (label 1): show Borough, Labels, and the ten most-common-venue columns\n",
"dt_merged.loc[dt_merged['Labels'] == 1, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>2</td>\n",
" <td>Airport Lounge</td>\n",
" <td>Airport Terminal</td>\n",
" <td>Airport Service</td>\n",
" <td>Harbor / Marina</td>\n",
" <td>Sculpture Garden</td>\n",
" <td>Boutique</td>\n",
" <td>Boat or Ferry</td>\n",
" <td>Airport Gate</td>\n",
" <td>Airport</td>\n",
" <td>Airport Food Court</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"14 Downtown Toronto 2 Airport Lounge Airport Terminal \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"14 Airport Service Harbor / Marina Sculpture Garden \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"14 Boutique Boat or Ferry Airport Gate \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"14 Airport Airport Food Court "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cluster 3 (label 2): show Borough, Labels, and the ten most-common-venue columns\n",
"dt_merged.loc[dt_merged['Labels'] == 2, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Borough</th>\n",
" <th>Labels</th>\n",
" <th>1st Most Common Venue</th>\n",
" <th>2nd Most Common Venue</th>\n",
" <th>3rd Most Common Venue</th>\n",
" <th>4th Most Common Venue</th>\n",
" <th>5th Most Common Venue</th>\n",
" <th>6th Most Common Venue</th>\n",
" <th>7th Most Common Venue</th>\n",
" <th>8th Most Common Venue</th>\n",
" <th>9th Most Common Venue</th>\n",
" <th>10th Most Common Venue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Downtown Toronto</td>\n",
" <td>3</td>\n",
" <td>Grocery Store</td>\n",
" <td>Café</td>\n",
" <td>Park</td>\n",
" <td>Italian Restaurant</td>\n",
" <td>Coffee Shop</td>\n",
" <td>Convenience Store</td>\n",
" <td>Nightclub</td>\n",
" <td>Diner</td>\n",
" <td>Restaurant</td>\n",
" <td>Baby Store</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Borough Labels 1st Most Common Venue 2nd Most Common Venue \\\n",
"17 Downtown Toronto 3 Grocery Store Café \n",
"\n",
" 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue \\\n",
"17 Park Italian Restaurant Coffee Shop \n",
"\n",
" 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue \\\n",
"17 Convenience Store Nightclub Diner \n",
"\n",
" 9th Most Common Venue 10th Most Common Venue \n",
"17 Restaurant Baby Store "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Cluster 4 (label 3): show Borough, Labels, and the ten most-common-venue columns\n",
"dt_merged.loc[dt_merged['Labels'] == 3, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.6 Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataframes above, one per cluster label, support the following conclusions:\n",
"\n",
"1. Cluster 1 (label 0): 15 neighborhoods where Coffee Shop or Café ranks first, followed mostly by restaurants, hotels, and bars.\n",
"2. Cluster 2 (label 1): a single neighborhood whose most common venue types are Park, Playground, and Trail.\n",
"3. Cluster 3 (label 2): a single neighborhood dominated by airport venues (Airport Lounge, Airport Terminal, Airport Service).\n",
"4. Cluster 4 (label 3): a single neighborhood whose most common venue type is Grocery Store, followed by Café and Park.\n",
"\n",
"Overall, Coffee Shop is the most popular venue type in Downtown Toronto: it ranks first in 13 of the 15 neighborhoods in Cluster 1."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
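
The per-cluster tallies behind the conclusion in section 3.6 can be checked with a short sketch. It is not part of the original notebook: the `rows` list is transcribed by hand from the `Labels` and `1st Most Common Venue` columns in the outputs above, and the helper name `top_venue_by_cluster` is hypothetical. It uses only the standard library so that it runs without the notebook's `dt_merged` dataframe.

```python
from collections import Counter, defaultdict

# (cluster label, 1st most common venue) pairs, transcribed from the
# cluster outputs above in row-index order 0..17 (an assumption of this sketch).
rows = (
    [(1, "Park")]                 # row 0
    + [(0, "Coffee Shop")] * 11   # rows 1-11
    + [(0, "Café")] * 2           # rows 12-13
    + [(2, "Airport Lounge")]     # row 14
    + [(0, "Coffee Shop")] * 2    # rows 15-16
    + [(3, "Grocery Store")]      # row 17
)


def top_venue_by_cluster(pairs):
    """Return {label: (top venue, times it ranks first, cluster size)}."""
    counts = defaultdict(Counter)
    for label, venue in pairs:
        counts[label][venue] += 1
    summary = {}
    for label, counter in sorted(counts.items()):
        venue, n = counter.most_common(1)[0]
        summary[label] = (venue, n, sum(counter.values()))
    return summary


summary = top_venue_by_cluster(rows)
for label, (venue, n, size) in summary.items():
    # The notebook names clusters 1..4 while the k-means labels run 0..3.
    print(f"Cluster {label + 1} (label {label}): "
          f"{venue} ranks first in {n} of {size} neighborhoods")
```

Inside the notebook itself, the same tally could probably be obtained directly from the dataframe with `dt_merged.groupby('Labels')['1st Most Common Venue'].value_counts()`, assuming `dt_merged` still carries the column names shown in the outputs.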