Created
May 16, 2022 06:54
colab-introduction.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "colab-introduction.ipynb", | |
"provenance": [], | |
"collapsed_sections": [], | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/jirislav/2603780a3aebe1028ce23c870f2c5fd0/colab-introduction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Welcome to the Google Colab notebook!\n", | |
"\n", | |
"Colab is an extension of Jupyter notebooks that allows its users to run their code directly from a browser." | |
], | |
"metadata": { | |
"id": "QwmcCdc-qVnq" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"print('Hello, world!')" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "E0gWtX1NqxXO", | |
"outputId": "7f3d0524-4ec1-4215-a072-5bcecc6c0930" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Hello, world!\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"1 + 2 + 3" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "B1xysoVpq5jA", | |
"outputId": "b5c0f0c4-5b95-40fa-feb0-a827c769af47" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"6" | |
] | |
}, | |
"metadata": {}, | |
"execution_count": 15 | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"names = ['Tomas', 'Eliska', 'Honza']\n", | |
"for name in names:\n", | |
" print(f'Hello, {name}!')" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "9R2kzpRaryMh", | |
"outputId": "7cb770ac-97bb-4d2d-b37c-faa2f3c3bafd" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Hello, Tomas!\n", | |
"Hello, Eliska!\n", | |
"Hello, Honza!\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### A simple warmup task\n", | |
"\n", | |
"Word count is one of the first benchmark tasks used to measure the performance of different systems, mostly in distributed environments.\n", | |
"The objective is to compute the count of occurrences for each word in a set of documents.\n", | |
"\n", | |
"In order to get to know Colab better, we'll try to compute a word count on a single file." | |
], | |
"metadata": { | |
"id": "C8lhJoS7sclQ" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"#### Load a text file\n", | |
"\n", | |
"Load the `big-data-wiki.txt` file and split it into words. Keep it simple and use a single space as the separator." | |
], | |
"metadata": { | |
"id": "V9KBv1Ctt2Y3" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!test -f big-data-wiki.txt || wget https://raw.githubusercontent.com/seznam/IT-akademie-bigdata/main/big-data/data/big-data-wiki.txt\n", | |
"!test -f big-data-wiki.txt\n", | |
"!ls -l" | |
], | |
"metadata": { | |
"id": "0n_MFSUToGUP" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Here you can write your code" | |
], | |
"metadata": { | |
"id": "0M8_29ZYoSFc" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"#### Compute the word count\n", | |
"\n", | |
"Create a map from each word in the text to its number of occurrences. Print the most frequent words with their counts." | |
], | |
"metadata": { | |
"id": "r2p-PCehu68G" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Here you can write your code" | |
], | |
"metadata": { | |
"id": "uBMWfeo1oVDE" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
} | |
] | |
} |
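
The two exercise cells in the notebook (load and split the file, then count the words) could be approached roughly as follows. This is only a minimal sketch, not the notebook author's solution: the helper names `split_words`, `load_words`, and `word_count` are hypothetical, and `collections.Counter` is just one convenient choice for the "map" the task asks for. The demo runs on a short inline sample string, so it does not depend on `big-data-wiki.txt` having been downloaded.

```python
from collections import Counter


def split_words(text):
    """Split on a single space, as the task suggests, dropping empty strings."""
    return [word for word in text.split(' ') if word]


def load_words(path):
    """Hypothetical helper: read a UTF-8 text file and split it into words."""
    with open(path, encoding='utf-8') as f:
        return split_words(f.read())


def word_count(words):
    """Map each word to its number of occurrences (Counter is a dict subclass)."""
    return Counter(words)


# Inline sample so the sketch runs without the data file.
sample = "big data is big and data is everywhere"
counts = word_count(split_words(sample))

# Most frequent words first; ties keep their first-seen order.
print(counts.most_common(3))

# In the notebook, assuming the download cell succeeded, one could instead run:
# counts = word_count(load_words('big-data-wiki.txt'))
```

Because the split uses only a single space, words separated by a newline or followed by punctuation stay glued together (for example `data.` and `data` count as different words). That matches the "keep it simple" instruction, and a stricter tokenizer can be swapped in later.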