{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "<center>\n <img src=\"https://gitlab.com/ibm/skills-network/courses/placeholder101/-/raw/master/labs/module%201/images/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n</center>\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# **Collecting Job Data Using APIs**\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Estimated time needed: **45 to 60** minutes\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Objectives\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "After completing this lab, you will be able to:\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "- Collect job data from the GitHub Jobs API.\n- Store the collected data in an Excel spreadsheet.\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Warm-Up Exercise\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Before you attempt the actual lab, here is a fully solved warm-up exercise that will help you learn how to access an API.\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Using an API, let us find out who is currently on the International Space Station (ISS).<br> The API at [http://api.open-notify.org/astros.json](http://api.open-notify.org/astros.json) gives us information about the astronauts currently on the ISS in JSON format.<br>\nYou can read more about this API at [http://open-notify.org/Open-Notify-API/People-In-Space/](http://open-notify.org/Open-Notify-API/People-In-Space/)\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "import requests # you need this module to make an API call",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "api_url = \"http://api.open-notify.org/astros.json\" # this url gives us the astronaut data",
"execution_count": 2,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "response = requests.get(api_url) # Call the API using the get method and store the\n # output of the API call in a variable called response.",
"execution_count": 3,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "if response.ok: # the server returned a success status (HTTP status code below 400)\n    data = response.json() # parse the JSON body and store it in a variable called data\n    # the variable data is of type dictionary.",
"execution_count": 4,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "print(data) # print the data just to check the output or for debugging",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "{'message': 'success', 'number': 7, 'people': [{'craft': 'ISS', 'name': 'Sergey Ryzhikov'}, {'craft': 'ISS', 'name': 'Kate Rubins'}, {'craft': 'ISS', 'name': 'Sergey Kud-Sverchkov'}, {'craft': 'ISS', 'name': 'Mike Hopkins'}, {'craft': 'ISS', 'name': 'Victor Glover'}, {'craft': 'ISS', 'name': 'Shannon Walker'}, {'craft': 'ISS', 'name': 'Soichi Noguchi'}]}\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Print the number of astronauts currently on the ISS.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "print(data.get('number'))",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": "7\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Print the names of the astronauts currently on the ISS.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "astronauts = data.get('people')\nprint(\"There are {} astronauts on ISS\".format(len(astronauts)))\nprint(\"And their names are :\")\nfor astronaut in astronauts:\n    print(astronaut.get('name'))",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "There are 7 astronauts on ISS\nAnd their names are :\nSergey Ryzhikov\nKate Rubins\nSergey Kud-Sverchkov\nMike Hopkins\nVictor Glover\nShannon Walker\nSoichi Noguchi\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Hope the warmup was helpful. Good luck with your next lab!\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Lab: Collect Jobs Data using GitHub Jobs API\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Before you start this lab, get familiar with the GitHub Jobs API.<br>\nThe documentation for the GitHub Jobs API can be found at <https://jobs.github.com/api><br>\n\n- Understand which URLs to use.\n- Understand which parameters have to be passed.\n- Understand the format of the output data.\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Objective: Determine the number of jobs currently open for various technologies\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Collect the number of job postings for the following technologies using the API:\n\n- C\n- C#\n- C++\n- Java\n- JavaScript\n- Python\n- Scala\n- Oracle\n- SQL Server\n- MySQL Server\n- PostgreSQL\n- MongoDB\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "#Import required libraries\nimport requests",
"execution_count": 8,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "baseurl = \"https://jobs.github.com/positions.json\"",
"execution_count": 9,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Write a function to get the number of jobs for the given technology.<br>\n_Note:_ The API returns a maximum of 50 jobs per page.<br>\nIf a page contains exactly 50 jobs, there may be more job listings available, so make another API call for the next page.<br>\nIf a page contains fewer than 50 jobs, it is the last page, and the running total is the final count.<br>\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "response = requests.get(baseurl)\nif response.ok: # the server returned a success status (HTTP status code below 400)\n    data = response.json()",
"execution_count": 10,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "#print(data) # print the data just to check the output or for debugging #Disabled before printout",
"execution_count": 11,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "#Function to get the number of jobs for the given technology\ndef get_number_of_jobs(technology):\n    number_of_jobs = 0\n    page = 0\n    while True:\n        # request one page of results; the API returns at most 50 jobs per page\n        params = {'description': technology, 'page': page}\n        r = requests.get(baseurl, params=params)\n        r.raise_for_status()  # stop on HTTP errors instead of counting an error body\n        jobs_on_page = len(r.json())\n        number_of_jobs += jobs_on_page\n        if jobs_on_page < 50:  # a short page means there are no more results\n            break\n        page += 1\n    return technology, number_of_jobs\n",
"execution_count": 12,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Call the function for Python and check if it is working.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "print (get_number_of_jobs('Python'))",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": "('Python', 118)\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Store the results in an Excel file\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Call the API for all the technologies listed above and write the results to an Excel spreadsheet.\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "If you do not know how to create an Excel file using Python, double-click here for **hints**.\n\n<!--\nfrom openpyxl import Workbook # import Workbook class from module openpyxl\nwb=Workbook() # create a workbook object\nws=wb.active # use the active worksheet\nws.append(['Country','Continent']) # add a row with two columns 'Country' and 'Continent'\nws.append(['Egypt','Africa']) # add a row with two columns 'Egypt' and 'Africa'\nws.append(['India','Asia']) # add another row\nws.append(['France','Europe']) # add another row\nwb.save(\"countries.xlsx\") # save the workbook into a file called countries.xlsx\n\n\n-->\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Create a Python list of all technologies for which you need to find the number of job postings.\n"
},
{
"metadata": {
"scrolled": true
},
"cell_type": "code",
"source": "#your code goes here\ntechno_list = ['C', 'C#', 'C++', 'Java', 'JavaScript', 'Python', 'Scala', 'Oracle', 'SQL Server', 'MySQL Server', 'PostgreSQL', 'MongoDB']\ntechno_list",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": "['C',\n 'C#',\n 'C++',\n 'Java',\n 'JavaScript',\n 'Python',\n 'Scala',\n 'Oracle',\n 'SQL Server',\n 'MySQL Server',\n 'PostgreSQL',\n 'MongoDB']"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Import the libraries required to create an Excel spreadsheet.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\n!pip install openpyxl\n",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Create a workbook and select the active worksheet\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\nimport openpyxl\nwb = openpyxl.Workbook()  # create a new workbook\nws = wb.active  # use the active worksheet",
"execution_count": 39,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Find the number of job postings for each technology in the above list.\nWrite the technology name and the number of job postings into the Excel spreadsheet.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "#your code goes here\nfor technology in techno_list:\n    # get_number_of_jobs returns a (technology, count) tuple, which becomes one row\n    row = get_number_of_jobs(technology)\n    ws.append(row)",
"execution_count": 40,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Save into an Excel spreadsheet named 'github-job-postings.xlsx'.\n"
},
{
"metadata": {},
"cell_type": "code",
"source": "#your code goes here\nwb.save(\"github-job-postings.xlsx\")",
"execution_count": 41,
"outputs": []
},
{
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd\nfilename = \"/home/wsuser/work/github-job-postings.xlsx\"\n# Note: the sheet has no header row, so read_excel treats the first row ('C') as the header.\n# Pass header=None to keep that row as data.\ndf = pd.read_excel(filename)\ndf.columns = [\"Technologies\", \"Postings\"]\ndf2 = df.sort_values('Postings', ascending=False)\ndf2.head(10)",
"execution_count": 56,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 56,
"data": {
"text/plain": " Technologies Postings\n2 Java 187\n3 JavaScript 153\n4 Python 118\n5 Scala 106\n7 SQL Server 45\n0 C# 32\n8 MySQL Server 20\n1 C++ 19\n9 PostgreSQL 19\n6 Oracle 12",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Technologies</th>\n <th>Postings</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>Java</td>\n <td>187</td>\n </tr>\n <tr>\n <th>3</th>\n <td>JavaScript</td>\n <td>153</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Python</td>\n <td>118</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Scala</td>\n <td>106</td>\n </tr>\n <tr>\n <th>7</th>\n <td>SQL Server</td>\n <td>45</td>\n </tr>\n <tr>\n <th>0</th>\n <td>C#</td>\n <td>32</td>\n </tr>\n <tr>\n <th>8</th>\n <td>MySQL Server</td>\n <td>20</td>\n </tr>\n <tr>\n <th>1</th>\n <td>C++</td>\n <td>19</td>\n </tr>\n <tr>\n <th>9</th>\n <td>PostgreSQL</td>\n <td>19</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Oracle</td>\n <td>12</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "code",
"source": "ax=df2.plot(kind = 'barh', x = \"Technologies\", y = \"Postings\", figsize=(8, 4), color = 'g' )\nax.set_title('Number of Job Postings for the Various Technologies')\n",
"execution_count": 62,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 62,
"data": {
"text/plain": "Text(0.5, 1.0, 'Number of Job Postings for the Various Technologies')"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 576x288 with 1 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Authors\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Ramesh Sannareddy\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Other Contributors\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Rav Ahuja\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Change Log\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n| ----------------- | ------- | ----------------- | ---------------------------------- |\n| 2020-10-17 | 0.1 | Ramesh Sannareddy | Created initial version of the lab |\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Copyright \u00a9 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license).\n"
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.7.10",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 4
}