Created
February 12, 2024 17:50
Copy of the Demo of the Topics API for the Web hosted by Google on Colab: https://colab.research.google.com/drive/1hIVoz8bRCTpllYvads51MV7YS3zi3prn
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"provenance": [] | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{
"cell_type": "markdown",
"source": [
"# Topics API: Model Execution Demo\n",
"\n",
"This colab makes it easy to load the [TensorFlow Lite](https://www.tensorflow.org/lite) model used by Chrome to infer topics from hostnames.\n",
"\n",
"Before running the steps below, upload the `.tflite` model file and the [override list](https://developer.chrome.com/docs/privacy-sandbox/topics/#where-can-i-find-the-current-classifier-model):\n",
"\n",
"1. Upload the `.tflite` file: locate the file on your computer, click the folder icon at the left of this page, then click the upload icon. The code below loads it as `model.tflite`, so rename the file if it has a different name.\n",
"\n",
"2. Upload the override list. It is in the same directory as the model file, under the filename `override_list.pb.gz`.\n",
"\n", | |
"[Access the tflite classifier model file](https://developer.chrome.com/docs/privacy-sandbox/topics/#access-tflite-file) provides more detailed instructions.\n", | |
"\n" | |
], | |
"metadata": { | |
"id": "jVPGDNBgGPtI" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Libraries, Override List and Taxonomy" | |
], | |
"metadata": { | |
"id": "Zso7BOh2xEc1" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!pip install tflite-support-nightly protobuf-compiler\n" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 1000 | |
}, | |
"id": "C_nC4A5KFzcF", | |
"outputId": "3677fafa-0f53-437c-e6d6-9184acc02980" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", | |
"Collecting tflite-support-nightly\n", | |
" Downloading tflite_support_nightly-0.4.0.dev20220816-cp37-cp37m-manylinux2014_x86_64.whl (60.2 MB)\n", | |
"\u001b[K |████████████████████████████████| 60.2 MB 1.2 MB/s \n", | |
"\u001b[?25hCollecting protobuf-compiler\n", | |
" Downloading protobuf_compiler-1.0.20-py3-none-any.whl (8.6 kB)\n", | |
"Collecting protobuf<4,>=3.18.0\n", | |
" Downloading protobuf-3.20.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)\n", | |
"\u001b[K |████████████████████████████████| 1.0 MB 67.0 MB/s \n", | |
"\u001b[?25hCollecting sounddevice>=0.4.4\n", | |
" Downloading sounddevice-0.4.5-py3-none-any.whl (31 kB)\n", | |
"Collecting pybind11>=2.6.0\n", | |
" Downloading pybind11-2.10.0-py3-none-any.whl (213 kB)\n", | |
"\u001b[K |████████████████████████████████| 213 kB 70.7 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: numpy>=1.20.0 in /usr/local/lib/python3.7/dist-packages (from tflite-support-nightly) (1.21.6)\n", | |
"Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from tflite-support-nightly) (1.2.0)\n", | |
"Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.7/dist-packages (from tflite-support-nightly) (22.9.24)\n", | |
"Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.7/dist-packages (from sounddevice>=0.4.4->tflite-support-nightly) (1.15.1)\n", | |
"Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from CFFI>=1.0->sounddevice>=0.4.4->tflite-support-nightly) (2.21)\n", | |
"Collecting tqdm==4.31.1\n", | |
" Downloading tqdm-4.31.1-py2.py3-none-any.whl (48 kB)\n", | |
"\u001b[K |████████████████████████████████| 48 kB 6.1 MB/s \n", | |
"\u001b[?25hCollecting bleach==2.1.0\n", | |
" Downloading bleach-2.1-py2.py3-none-any.whl (27 kB)\n", | |
"Collecting grpcio==1.18.0\n", | |
" Downloading grpcio-1.18.0-cp37-cp37m-manylinux1_x86_64.whl (10.6 MB)\n", | |
"\u001b[K |████████████████████████████████| 10.6 MB 4.4 MB/s \n", | |
"\u001b[?25hCollecting termcolor==1.1.0\n", | |
" Downloading termcolor-1.1.0.tar.gz (3.9 kB)\n", | |
"Collecting colorama==0.3.3\n", | |
" Downloading colorama-0.3.3.tar.gz (22 kB)\n", | |
"Collecting grpcio-tools==1.18.0\n", | |
" Downloading grpcio_tools-1.18.0-cp37-cp37m-manylinux1_x86_64.whl (22.8 MB)\n", | |
"\u001b[K |████████████████████████████████| 22.8 MB 1.4 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: html5lib!=1.0b1,!=1.0b2,!=1.0b3,!=1.0b4,!=1.0b5,!=1.0b6,!=1.0b7,!=1.0b8,>=0.99999999pre in /usr/local/lib/python3.7/dist-packages (from bleach==2.1.0->protobuf-compiler) (1.0.1)\n", | |
"Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from bleach==2.1.0->protobuf-compiler) (1.15.0)\n", | |
"Requirement already satisfied: webencodings in /usr/local/lib/python3.7/dist-packages (from html5lib!=1.0b1,!=1.0b2,!=1.0b3,!=1.0b4,!=1.0b5,!=1.0b6,!=1.0b7,!=1.0b8,>=0.99999999pre->bleach==2.1.0->protobuf-compiler) (0.5.1)\n", | |
"Building wheels for collected packages: colorama, termcolor\n", | |
" Building wheel for colorama (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for colorama: filename=colorama-0.3.3-py3-none-any.whl size=14329 sha256=b501accd2af7319bec084421129d801686f659a4af04ab4242fd8b982f61ef1d\n", | |
" Stored in directory: /root/.cache/pip/wheels/ac/42/97/77eb85865f435ca81a91fe4c269471f5b4d50144344868f3b1\n", | |
" Building wheel for termcolor (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4848 sha256=e5599e9513d7b08cbb44e12026cf0812f8acdc0747cfcef21d3219343c87b319\n", | |
" Stored in directory: /root/.cache/pip/wheels/3f/e3/ec/8a8336ff196023622fbcb36de0c5a5c218cbb24111d1d4c7f2\n", | |
"Successfully built colorama termcolor\n", | |
"Installing collected packages: protobuf, grpcio, tqdm, termcolor, sounddevice, pybind11, grpcio-tools, colorama, bleach, tflite-support-nightly, protobuf-compiler\n", | |
" Attempting uninstall: protobuf\n", | |
" Found existing installation: protobuf 3.17.3\n", | |
" Uninstalling protobuf-3.17.3:\n", | |
" Successfully uninstalled protobuf-3.17.3\n", | |
" Attempting uninstall: grpcio\n", | |
" Found existing installation: grpcio 1.49.1\n", | |
" Uninstalling grpcio-1.49.1:\n", | |
" Successfully uninstalled grpcio-1.49.1\n", | |
" Attempting uninstall: tqdm\n", | |
" Found existing installation: tqdm 4.64.1\n", | |
" Uninstalling tqdm-4.64.1:\n", | |
" Successfully uninstalled tqdm-4.64.1\n", | |
" Attempting uninstall: termcolor\n", | |
" Found existing installation: termcolor 2.0.1\n", | |
" Uninstalling termcolor-2.0.1:\n", | |
" Successfully uninstalled termcolor-2.0.1\n", | |
" Attempting uninstall: bleach\n", | |
" Found existing installation: bleach 5.0.1\n", | |
" Uninstalling bleach-5.0.1:\n", | |
" Successfully uninstalled bleach-5.0.1\n", | |
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", | |
"tensorflow 2.8.2+zzzcolab20220929150707 requires grpcio<2.0,>=1.24.3, but you have grpcio 1.18.0 which is incompatible.\n", | |
"tensorflow 2.8.2+zzzcolab20220929150707 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.\n", | |
"tensorboard 2.8.0 requires grpcio>=1.24.3, but you have grpcio 1.18.0 which is incompatible.\n", | |
"spacy 3.4.1 requires tqdm<5.0.0,>=4.38.0, but you have tqdm 4.31.1 which is incompatible.\n", | |
"prophet 1.1.1 requires tqdm>=4.36.1, but you have tqdm 4.31.1 which is incompatible.\n", | |
"panel 0.12.1 requires tqdm>=4.48.0, but you have tqdm 4.31.1 which is incompatible.\u001b[0m\n", | |
"Successfully installed bleach-2.1 colorama-0.3.3 grpcio-1.18.0 grpcio-tools-1.18.0 protobuf-3.20.3 protobuf-compiler-1.0.20 pybind11-2.10.0 sounddevice-0.4.5 termcolor-1.1.0 tflite-support-nightly-0.4.0.dev20220816 tqdm-4.31.1\n" | |
] | |
}, | |
{ | |
"output_type": "display_data", | |
"data": { | |
"application/vnd.colab-display-data+json": { | |
"pip_warning": { | |
"packages": [ | |
"google" | |
] | |
} | |
} | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{
"cell_type": "markdown",
"source": [
"## Override List Loading/Proto Definition\n",
"\n",
"The proto definition below is copied from the Chromium source: https://source.chromium.org/chromium/chromium/src/+/main:components/optimization_guide/proto/page_topics_override_list.proto;l=1?q=page_topics_override_list.proto&ss=chromium\n"
], | |
"metadata": { | |
"id": "1uZTJAwExKju" | |
} | |
}, | |
{
"cell_type": "code",
"source": [
"\n",
"proto_path = 'page_topics_override_list.proto'\n",
"\n",
"example_proto = b\"\"\"\n",
"syntax = \"proto2\";\n",
"\n",
"// The whole override list.\n",
"message PageTopicsOverrideList {\n",
"  repeated PageTopicsOverrideEntry entries = 1;\n",
"}\n",
"\n",
"// An individual entry in the override list. |domain| is expected to be the\n",
"// exact string input that is otherwise passed to the TFLite model, with all\n",
"// needed pre-processing and/or cleaning done to it already.\n",
"message PageTopicsOverrideEntry {\n",
"  optional string domain = 1;\n",
"  optional AnnotatedPageTopics topics = 2;\n",
"}\n",
"\n",
"// The topic identifiers for the given domain.\n",
"message AnnotatedPageTopics {\n",
"  repeated int32 topic_ids = 1;\n",
"}\n",
"\"\"\"\n",
"\n",
"with open(proto_path, 'wb') as f:\n",
"    f.write(example_proto)\n",
"\n",
"# Each `!` command runs in its own subshell, so a `!cd` would not persist;\n",
"# the commands below run in the notebook's working directory.\n",
"!protoc \"page_topics_override_list.proto\" --python_out=.\n",
"!rm \"override_list.pb\"\n",
"!gzip -dk \"override_list.pb.gz\""
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "4XjYTlrAxJTa", | |
"outputId": "438c18d2-be15-4f45-ee82-2d7774644050" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"rm: cannot remove 'override_list.pb': No such file or directory\n" | |
] | |
} | |
] | |
}, | |
{
"cell_type": "code",
"source": [
"import page_topics_override_list_pb2\n",
"\n",
"# Parse the decompressed override list.\n",
"override_list = page_topics_override_list_pb2.PageTopicsOverrideList()\n",
"try:\n",
"    with open(\"override_list.pb\", \"rb\") as f:\n",
"        override_list.ParseFromString(f.read())\n",
"except IOError:\n",
"    print(\"Could not open file.\")\n",
"\n",
"# Replaces a set of common domain characters with white space. See https://source.chromium.org/chromium/chromium/src/+/main:components/optimization_guide/core/page_topics_model_executor.cc;l=211?q=meaningless%20f:optimization_guide&ss=chromium\n",
"def process_domain(domain):\n",
"    replace_chars = ['-', '_', '.', '+']\n",
"    for rc in replace_chars:\n",
"        domain = domain.replace(rc, \" \")\n",
"    return domain\n",
"\n",
"# Returns the topic ids for an exact match on the processed domain, or None.\n",
"def check_override_list(override_list, domain):\n",
"    if override_list is None:\n",
"        return None\n",
"    for entry in override_list.entries:\n",
"        if entry.domain == domain:\n",
"            return entry.topics.topic_ids\n",
"    return None\n"
], | |
"metadata": { | |
"id": "DQwHe_H9xRbl" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Taxonomy" | |
], | |
"metadata": { | |
"id": "tajaY11wxciK" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Taxonomy ids and names pulled from here: https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v1.md\n", | |
"\n", | |
"cat_map = {\n", | |
"1: '/Arts & Entertainment',\n", | |
"2: '/Arts & Entertainment/Acting & Theater',\n", | |
"3: '/Arts & Entertainment/Comics',\n", | |
"4: '/Arts & Entertainment/Concerts & Music Festivals',\n", | |
"5: '/Arts & Entertainment/Dance',\n", | |
"6: '/Arts & Entertainment/Entertainment Industry',\n", | |
"7: '/Arts & Entertainment/Humor',\n", | |
"8: '/Arts & Entertainment/Humor/Live Comedy',\n", | |
"9: '/Arts & Entertainment/Live Sporting Events',\n", | |
"10: '/Arts & Entertainment/Magic',\n", | |
"11: '/Arts & Entertainment/Movie Listings & Theater Showtimes',\n", | |
"12: '/Arts & Entertainment/Movies',\n", | |
"13: '/Arts & Entertainment/Movies/Action & Adventure Films',\n", | |
"14: '/Arts & Entertainment/Movies/Animated Films',\n", | |
"15: '/Arts & Entertainment/Movies/Comedy Films',\n", | |
"16: '/Arts & Entertainment/Movies/Cult & Indie Films',\n", | |
"17: '/Arts & Entertainment/Movies/Documentary Films',\n", | |
"18: '/Arts & Entertainment/Movies/Drama Films',\n", | |
"19: '/Arts & Entertainment/Movies/Family Films',\n", | |
"20: '/Arts & Entertainment/Movies/Horror Films',\n", | |
"21: '/Arts & Entertainment/Movies/Romance Films',\n", | |
"22: '/Arts & Entertainment/Movies/Thriller, Crime & Mystery Films',\n", | |
"23: '/Arts & Entertainment/Music & Audio',\n", | |
"24: '/Arts & Entertainment/Music & Audio/Blues',\n", | |
"25: '/Arts & Entertainment/Music & Audio/Classical Music',\n", | |
"26: '/Arts & Entertainment/Music & Audio/Country Music',\n", | |
"27: '/Arts & Entertainment/Music & Audio/Dance & Electronic Music',\n", | |
"28: '/Arts & Entertainment/Music & Audio/Folk & Traditional Music',\n", | |
"29: '/Arts & Entertainment/Music & Audio/Jazz',\n", | |
"30: '/Arts & Entertainment/Music & Audio/Musical Instruments',\n", | |
"31: '/Arts & Entertainment/Music & Audio/Pop Music',\n", | |
"32: '/Arts & Entertainment/Music & Audio/Rap & Hip-Hop',\n", | |
"33: '/Arts & Entertainment/Music & Audio/Rock Music',\n", | |
"34: '/Arts & Entertainment/Music & Audio/Rock Music/Classic Rock & Oldies',\n", | |
"35: '/Arts & Entertainment/Music & Audio/Rock Music/Hard Rock & Progressive',\n", | |
"36: '/Arts & Entertainment/Music & Audio/Rock Music/Indie & Alternative Music',\n", | |
"37: '/Arts & Entertainment/Music & Audio/Soul & R&B',\n", | |
"38: '/Arts & Entertainment/Music & Audio/Soundtracks',\n", | |
"39: '/Arts & Entertainment/Music & Audio/Talk Radio',\n", | |
"40: '/Arts & Entertainment/Music & Audio/World Music',\n", | |
"41: '/Arts & Entertainment/Music & Audio/World Music/Reggae & Caribbean Music',\n", | |
"42: '/Arts & Entertainment/Online Image Galleries',\n", | |
"43: '/Arts & Entertainment/Online Video',\n", | |
"44: '/Arts & Entertainment/Opera',\n", | |
"45: '/Arts & Entertainment/TV Shows & Programs',\n", | |
"46: '/Arts & Entertainment/TV Shows & Programs/TV Comedies',\n", | |
"47: '/Arts & Entertainment/TV Shows & Programs/TV Documentary & Nonfiction',\n", | |
"48: '/Arts & Entertainment/TV Shows & Programs/TV Dramas',\n", | |
"49: '/Arts & Entertainment/TV Shows & Programs/TV Dramas/TV Soap Operas',\n", | |
"50: '/Arts & Entertainment/TV Shows & Programs/TV Family-Oriented Shows',\n", | |
"51: '/Arts & Entertainment/TV Shows & Programs/TV Reality Shows',\n", | |
"52: '/Arts & Entertainment/TV Shows & Programs/TV Sci-Fi & Fantasy Shows',\n", | |
"53: '/Arts & Entertainment/Visual Art & Design',\n", | |
"54: '/Arts & Entertainment/Visual Art & Design/Design',\n", | |
"55: '/Arts & Entertainment/Visual Art & Design/Painting',\n", | |
"56: '/Arts & Entertainment/Visual Art & Design/Photographic & Digital Arts',\n", | |
"57: '/Autos & Vehicles',\n", | |
"58: '/Autos & Vehicles/Cargo Trucks & Trailers',\n", | |
"59: '/Autos & Vehicles/Classic Vehicles',\n", | |
"60: '/Autos & Vehicles/Custom & Performance Vehicles',\n", | |
"61: '/Autos & Vehicles/Gas Prices & Vehicle Fueling',\n", | |
"62: '/Autos & Vehicles/Motor Vehicles (By Type)',\n", | |
"63: '/Autos & Vehicles/Motor Vehicles (By Type)/Autonomous Vehicles',\n", | |
"64: '/Autos & Vehicles/Motor Vehicles (By Type)/Convertibles',\n", | |
"65: '/Autos & Vehicles/Motor Vehicles (By Type)/Coupes',\n", | |
"66: '/Autos & Vehicles/Motor Vehicles (By Type)/Hatchbacks',\n", | |
"67: '/Autos & Vehicles/Motor Vehicles (By Type)/Hybrid & Alternative Vehicles',\n", | |
"68: '/Autos & Vehicles/Motor Vehicles (By Type)/Luxury Vehicles',\n", | |
"69: '/Autos & Vehicles/Motor Vehicles (By Type)/Microcars & Subcompacts',\n", | |
"70: '/Autos & Vehicles/Motor Vehicles (By Type)/Motorcycles',\n", | |
"71: '/Autos & Vehicles/Motor Vehicles (By Type)/Off-Road Vehicles',\n", | |
"72: '/Autos & Vehicles/Motor Vehicles (By Type)/Pickup Trucks',\n", | |
"73: '/Autos & Vehicles/Motor Vehicles (By Type)/Scooters & Mopeds',\n", | |
"74: '/Autos & Vehicles/Motor Vehicles (By Type)/Sedans',\n", | |
"75: '/Autos & Vehicles/Motor Vehicles (By Type)/Station Wagons',\n", | |
"76: '/Autos & Vehicles/Motor Vehicles (By Type)/SUVs & Crossovers',\n", | |
"77: '/Autos & Vehicles/Motor Vehicles (By Type)/SUVs & Crossovers/Crossovers',\n", | |
"78: '/Autos & Vehicles/Motor Vehicles (By Type)/Vans & Minivans',\n", | |
"79: '/Autos & Vehicles/Towing & Roadside Assistance',\n", | |
"80: '/Autos & Vehicles/Vehicle & Traffic Safety',\n", | |
"81: '/Autos & Vehicles/Vehicle Parts & Accessories',\n", | |
"82: '/Autos & Vehicles/Vehicle Repair & Maintenance',\n", | |
"83: '/Autos & Vehicles/Vehicle Shopping',\n", | |
"84: '/Autos & Vehicles/Vehicle Shopping/Used Vehicles',\n", | |
"85: '/Autos & Vehicles/Vehicle Shows',\n", | |
"86: '/Beauty & Fitness',\n", | |
"87: '/Beauty & Fitness/Body Art',\n", | |
"88: '/Beauty & Fitness/Face & Body Care',\n", | |
"89: '/Beauty & Fitness/Face & Body Care/Antiperspirants, Deodorants & Body Sprays',\n", | |
"90: '/Beauty & Fitness/Face & Body Care/Bath & Body Products',\n", | |
"91: '/Beauty & Fitness/Face & Body Care/Clean Beauty',\n", | |
"92: '/Beauty & Fitness/Face & Body Care/Make-Up & Cosmetics',\n", | |
"93: '/Beauty & Fitness/Face & Body Care/Nail Care Products',\n", | |
"94: '/Beauty & Fitness/Face & Body Care/Perfumes & Fragrances',\n", | |
"95: '/Beauty & Fitness/Face & Body Care/Razors & Shavers',\n", | |
"96: '/Beauty & Fitness/Fashion & Style',\n", | |
"97: '/Beauty & Fitness/Fitness',\n", | |
"98: '/Beauty & Fitness/Fitness/Bodybuilding',\n", | |
"99: '/Beauty & Fitness/Hair Care',\n", | |
"100: '/Books & Literature',\n", | |
"101: '/Books & Literature/Childrens Literature',\n", | |
"102: '/Books & Literature/Poetry',\n", | |
"103: '/Business & Industrial',\n", | |
"104: '/Business & Industrial/Advertising & Marketing',\n", | |
"105: '/Business & Industrial/Advertising & Marketing/Sales',\n", | |
"106: '/Business & Industrial/Agriculture & Forestry',\n", | |
"107: '/Business & Industrial/Agriculture & Forestry/Food Production',\n", | |
"108: '/Business & Industrial/Automotive Industry',\n", | |
"109: '/Business & Industrial/Aviation Industry',\n", | |
"110: '/Business & Industrial/Business Operations',\n", | |
"111: '/Business & Industrial/Business Operations/Flexible Work Arrangements',\n", | |
"112: '/Business & Industrial/Business Operations/Human Resources',\n", | |
"113: '/Business & Industrial/Commercial Lending',\n", | |
"114: '/Business & Industrial/Construction & Maintenance',\n", | |
"115: '/Business & Industrial/Construction & Maintenance/Civil Engineering',\n", | |
"116: '/Business & Industrial/Defense Industry',\n", | |
"117: '/Business & Industrial/Energy & Utilities',\n", | |
"118: '/Business & Industrial/Energy & Utilities/Water Supply & Treatment',\n", | |
"119: '/Business & Industrial/Hospitality Industry',\n", | |
"120: '/Business & Industrial/Manufacturing',\n", | |
"121: '/Business & Industrial/Metals & Mining',\n", | |
"122: '/Business & Industrial/Pharmaceuticals & Biotech',\n", | |
"123: '/Business & Industrial/Printing & Publishing',\n", | |
"124: '/Business & Industrial/Retail Trade',\n", | |
"125: '/Business & Industrial/Venture Capital',\n", | |
"126: '/Computers & Electronics',\n", | |
"127: '/Computers & Electronics/Antivirus & Malware',\n", | |
"128: '/Computers & Electronics/Computer Peripherals',\n", | |
"129: '/Computers & Electronics/Consumer Electronics',\n", | |
"130: '/Computers & Electronics/Consumer Electronics/Cameras & Camcorders',\n", | |
"131: '/Computers & Electronics/Consumer Electronics/Home Automation',\n", | |
"132: '/Computers & Electronics/Consumer Electronics/Home Theater Systems',\n", | |
"133: '/Computers & Electronics/Consumer Electronics/Wearable Technology',\n", | |
"134: '/Computers & Electronics/Desktop Computers',\n", | |
"135: '/Computers & Electronics/Laptops & Notebooks',\n", | |
"136: '/Computers & Electronics/Network Security',\n", | |
"137: '/Computers & Electronics/Networking',\n", | |
"138: '/Computers & Electronics/Networking/Distributed & Cloud Computing',\n", | |
"139: '/Computers & Electronics/Programming',\n", | |
"140: '/Computers & Electronics/Software',\n", | |
"141: '/Computers & Electronics/Software/Audio & Music Software',\n", | |
"142: '/Computers & Electronics/Software/Desktop Publishing',\n", | |
"143: '/Computers & Electronics/Software/Freeware & Shareware',\n", | |
"144: '/Computers & Electronics/Software/Graphics & Animation Software',\n", | |
"145: '/Computers & Electronics/Software/Operating Systems',\n", | |
"146: '/Computers & Electronics/Software/Photo Software',\n", | |
"147: '/Computers & Electronics/Software/Video Software',\n", | |
"148: '/Computers & Electronics/Software/Web Browsers',\n", | |
"149: '/Finance',\n", | |
"150: '/Finance/Accounting & Auditing',\n", | |
"151: '/Finance/Accounting & Auditing/Tax Preparation & Planning',\n", | |
"152: '/Finance/Credit Cards',\n", | |
"153: '/Finance/Financial Planning & Management',\n", | |
"154: '/Finance/Financial Planning & Management/Retirement & Pension',\n", | |
"155: '/Finance/Grants, Scholarships & Financial Aid',\n", | |
"156: '/Finance/Grants, Scholarships & Financial Aid/Study Grants & Scholarships',\n", | |
"157: '/Finance/Home Financing',\n", | |
"158: '/Finance/Insurance',\n", | |
"159: '/Finance/Insurance/Auto Insurance',\n", | |
"160: '/Finance/Insurance/Health Insurance',\n", | |
"161: '/Finance/Insurance/Home Insurance',\n", | |
"162: '/Finance/Insurance/Life Insurance',\n", | |
"163: '/Finance/Insurance/Travel Insurance',\n", | |
"164: '/Finance/Investing',\n", | |
"165: '/Finance/Investing/Commodities & Futures Trading',\n", | |
"166: '/Finance/Investing/Currencies & Foreign Exchange',\n", | |
"167: '/Finance/Investing/Hedge Funds',\n", | |
"168: '/Finance/Investing/Mutual Funds',\n", | |
"169: '/Finance/Investing/Stocks & Bonds',\n", | |
"170: '/Finance/Personal Loans',\n", | |
"171: '/Finance/Student Loans & College Financing',\n", | |
"172: '/Food & Drink',\n", | |
"173: '/Food & Drink/Cooking & Recipes',\n", | |
"174: '/Food & Drink/Cooking & Recipes/BBQ & Grilling',\n", | |
"175: '/Food & Drink/Cooking & Recipes/Cuisines',\n", | |
"176: '/Food & Drink/Cooking & Recipes/Cuisines/Vegetarian Cuisine',\n", | |
"177: '/Food & Drink/Cooking & Recipes/Cuisines/Vegetarian Cuisine/Vegan Cuisine',\n", | |
"178: '/Food & Drink/Cooking & Recipes/Healthy Eating',\n", | |
"179: '/Food & Drink/Food & Grocery Retailers',\n", | |
"180: '/Games',\n", | |
"181: '/Games/Billiards',\n", | |
"182: '/Games/Card Games',\n", | |
"183: '/Games/Computer & Video Games',\n", | |
"184: '/Games/Computer & Video Games/Action & Platform Games',\n", | |
"185: '/Games/Computer & Video Games/Adventure Games',\n", | |
"186: '/Games/Computer & Video Games/Casual Games',\n", | |
"187: '/Games/Computer & Video Games/Competitive Video Gaming',\n", | |
"188: '/Games/Computer & Video Games/Massively Multiplayer Games',\n", | |
"189: '/Games/Computer & Video Games/Music & Dance Games',\n", | |
"190: '/Games/Computer & Video Games/Simulation Games',\n", | |
"191: '/Games/Computer & Video Games/Sports Games',\n", | |
"192: '/Games/Computer & Video Games/Strategy Games',\n", | |
"193: '/Games/Drawing & Coloring',\n", | |
"194: '/Games/Roleplaying Games',\n", | |
"195: '/Games/Table Tennis',\n", | |
"196: '/Hobbies & Leisure',\n", | |
"197: '/Hobbies & Leisure/Anniversaries',\n", | |
"198: '/Hobbies & Leisure/Birthdays & Name Days',\n", | |
"199: '/Hobbies & Leisure/Diving & Underwater Activities',\n", | |
"200: '/Hobbies & Leisure/Fiber & Textile Arts',\n", | |
"201: '/Hobbies & Leisure/Outdoors',\n", | |
"202: '/Hobbies & Leisure/Outdoors/Fishing',\n", | |
"203: '/Hobbies & Leisure/Outdoors/Hunting & Shooting',\n", | |
"204: '/Hobbies & Leisure/Paintball',\n", | |
"205: '/Hobbies & Leisure/Radio Control & Modeling',\n", | |
"206: '/Hobbies & Leisure/Weddings',\n", | |
"207: '/Home & Garden',\n", | |
"208: '/Home & Garden/Gardening',\n", | |
"209: '/Home & Garden/Home & Interior Decor',\n", | |
"210: '/Home & Garden/Home Appliances',\n", | |
"211: '/Home & Garden/Home Improvement',\n", | |
"212: '/Home & Garden/Home Safety & Security',\n", | |
"213: '/Home & Garden/Household Supplies',\n", | |
"214: '/Home & Garden/Landscape Design',\n", | |
"215: '/Internet & Telecom',\n", | |
"216: '/Internet & Telecom/Email',\n", | |
"217: '/Internet & Telecom/ISPs',\n", | |
"218: '/Internet & Telecom/Phone Service Providers',\n", | |
"219: '/Internet & Telecom/Search Engines',\n", | |
"220: '/Internet & Telecom/Smart Phones',\n", | |
"221: '/Internet & Telecom/Teleconferencing',\n", | |
"222: '/Internet & Telecom/Text & Instant Messaging',\n", | |
"223: '/Internet & Telecom/Web Apps & Online Tools',\n", | |
"224: '/Internet & Telecom/Web Design & Development',\n", | |
"225: '/Internet & Telecom/Web Hosting',\n", | |
"226: '/Jobs & Education',\n", | |
"227: '/Jobs & Education/Education',\n", | |
"228: '/Jobs & Education/Education/Academic Conferences & Publications',\n", | |
"229: '/Jobs & Education/Education/Colleges & Universities',\n", | |
"230: '/Jobs & Education/Education/Distance Learning',\n", | |
"231: '/Jobs & Education/Education/Early Childhood Education',\n", | |
"232: '/Jobs & Education/Education/Early Childhood Education/Preschool',\n", | |
"233: '/Jobs & Education/Education/Homeschooling',\n", | |
"234: '/Jobs & Education/Education/Standardized & Admissions Tests',\n", | |
"235: '/Jobs & Education/Education/Vocational & Continuing Education',\n", | |
"236: '/Jobs & Education/Jobs',\n", | |
"237: '/Jobs & Education/Jobs/Career Resources & Planning',\n", | |
"238: '/Jobs & Education/Jobs/Job Listings',\n", | |
"239: '/Law & Government',\n", | |
"240: '/Law & Government/Crime & Justice',\n", | |
"241: '/Law & Government/Legal',\n", | |
"242: '/Law & Government/Legal/Legal Services',\n", | |
"243: '/News',\n", | |
"244: '/News/Economy News',\n", | |
"245: '/News/Local News',\n", | |
"246: '/News/Mergers & Acquisitions',\n", | |
"247: '/News/Politics',\n", | |
"248: '/News/Weather',\n", | |
"249: '/News/World News',\n", | |
"250: '/Online Communities',\n", | |
"251: '/Online Communities/Dating & Personals',\n", | |
"252: '/Online Communities/Forum & Chat Providers',\n", | |
"253: '/Online Communities/Social Networks',\n", | |
"254: '/People & Society',\n", | |
"255: '/People & Society/Family & Relationships',\n", | |
"256: '/People & Society/Family & Relationships/Ancestry & Genealogy',\n", | |
"257: '/People & Society/Family & Relationships/Marriage',\n", | |
"258: '/People & Society/Family & Relationships/Parenting',\n", | |
"259: '/People & Society/Family & Relationships/Parenting/Adoption',\n", | |
"260: '/People & Society/Family & Relationships/Parenting/Babies & Toddlers',\n", | |
"261: '/People & Society/Family & Relationships/Parenting/Child Internet Safety',\n", | |
"262: '/People & Society/Science Fiction & Fantasy',\n", | |
"263: '/Pets & Animals',\n", | |
"264: '/Pets & Animals/Pet Food & Pet Care Supplies',\n", | |
"265: '/Pets & Animals/Pets',\n", | |
"266: '/Pets & Animals/Pets/Birds',\n", | |
"267: '/Pets & Animals/Pets/Cats',\n", | |
"268: '/Pets & Animals/Pets/Dogs',\n", | |
"269: '/Pets & Animals/Pets/Fish & Aquaria',\n", | |
"270: '/Pets & Animals/Pets/Reptiles & Amphibians',\n", | |
"271: '/Pets & Animals/Veterinarians',\n", | |
"272: '/Real Estate',\n", | |
"273: '/Real Estate/Lots & Land',\n", | |
"274: '/Real Estate/Timeshares & Vacation Properties',\n", | |
"275: '/Reference',\n", | |
"276: '/Reference/Educational Resources',\n", | |
"277: '/Reference/Foreign Language Study',\n", | |
"278: '/Reference/How-To, DIY & Expert Content',\n", | |
"279: '/Science',\n", | |
"280: '/Science/Augmented & Virtual Reality',\n", | |
"281: '/Science/Biological Sciences',\n", | |
"282: '/Science/Biological Sciences/Genetics',\n", | |
"283: '/Science/Chemistry',\n", | |
"284: '/Science/Ecology & Environment',\n", | |
"285: '/Science/Geology',\n", | |
"286: '/Science/Machine Learning & Artificial Intelligence',\n", | |
"287: '/Science/Physics',\n", | |
"288: '/Science/Robotics',\n", | |
"289: '/Shopping',\n", | |
"290: '/Shopping/Antiques & Collectibles',\n", | |
"291: '/Shopping/Childrens Clothing',\n", | |
"292: '/Shopping/Consumer Resources',\n", | |
"293: '/Shopping/Consumer Resources/Coupons & Discount Offers',\n", | |
"294: '/Shopping/Costumes',\n", | |
"295: '/Shopping/Flowers',\n", | |
"296: '/Shopping/Mens Clothing',\n", | |
"297: '/Shopping/Party & Holiday Supplies',\n", | |
"298: '/Shopping/Womens Clothing',\n", | |
"299: '/Sports',\n", | |
"300: '/Sports/American Football',\n", | |
"301: '/Sports/Australian Football',\n", | |
"302: '/Sports/Auto Racing',\n", | |
"303: '/Sports/Baseball',\n", | |
"304: '/Sports/Basketball',\n", | |
"305: '/Sports/Bowling',\n", | |
"306: '/Sports/Boxing',\n", | |
"307: '/Sports/Cheerleading',\n", | |
"308: '/Sports/College Sports',\n", | |
"309: '/Sports/Cricket',\n", | |
"310: '/Sports/Cycling',\n", | |
"311: '/Sports/Equestrian',\n", | |
"312: '/Sports/Extreme Sports',\n", | |
"313: '/Sports/Extreme Sports/Climbing & Mountaineering',\n", | |
"314: '/Sports/Fantasy Sports',\n", | |
"315: '/Sports/Golf',\n", | |
"316: '/Sports/Gymnastics',\n", | |
"317: '/Sports/Hockey',\n", | |
"318: '/Sports/Ice Skating',\n", | |
"319: '/Sports/Martial Arts',\n", | |
"320: '/Sports/Motorcycle Racing',\n", | |
"321: '/Sports/Olympics',\n", | |
"322: '/Sports/Rugby',\n", | |
"323: '/Sports/Running & Walking',\n", | |
"324: '/Sports/Skiing & Snowboarding',\n", | |
"325: '/Sports/Soccer',\n", | |
"326: '/Sports/Surfing',\n", | |
"327: '/Sports/Swimming',\n", | |
"328: '/Sports/Tennis',\n", | |
"329: '/Sports/Track & Field',\n", | |
"330: '/Sports/Volleyball',\n", | |
"331: '/Sports/Wrestling',\n", | |
"332: '/Travel & Transportation',\n", | |
"333: '/Travel & Transportation/Adventure Travel',\n", | |
"334: '/Travel & Transportation/Air Travel',\n", | |
"335: '/Travel & Transportation/Business Travel',\n", | |
"336: '/Travel & Transportation/Car Rentals',\n", | |
"337: '/Travel & Transportation/Cruises & Charters',\n", | |
"338: '/Travel & Transportation/Family Travel',\n", | |
"339: '/Travel & Transportation/Honeymoons & Romantic Getaways',\n", | |
"340: '/Travel & Transportation/Hotels & Accommodations',\n", | |
"341: '/Travel & Transportation/Long Distance Bus & Rail',\n", | |
"342: '/Travel & Transportation/Low Cost & Last Minute Travel',\n", | |
"343: '/Travel & Transportation/Luggage & Travel Accessories',\n", | |
"344: '/Travel & Transportation/Tourist Destinations',\n", | |
"345: '/Travel & Transportation/Tourist Destinations/Beaches & Islands',\n", | |
"346: '/Travel & Transportation/Tourist Destinations/Regional Parks & Gardens',\n", | |
"347: '/Travel & Transportation/Tourist Destinations/Theme Parks',\n", | |
"348: '/Travel & Transportation/Tourist Destinations/Zoos, Aquariums & Preserves',\n", | |
"349: '/Travel & Transportation/Travel Guides & Travelogues',\n", | |
"-2: 'Unknown'\n", | |
"}" | |
], | |
"metadata": { | |
"id": "JbicWvp3F5Gi" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Model Execution Demo\n" | |
], | |
"metadata": { | |
"id": "h-jNa5GFHFI8" | |
} | |
}, | |
{
"cell_type": "code",
"source": [
"from tflite_support.task import text\n",
"from tflite_support.task import core\n",
"\n",
"# Remember, this will fail if you haven't uploaded model.tflite.\n",
"# See instructions at the top of this file.\n",
"\n",
"options = text.BertNLClassifierOptions(\n",
"    base_options=core.BaseOptions(\n",
"        file_name='model.tflite'))\n",
"\n",
"tflite_topics = text.BertNLClassifier.create_from_options(options)\n",
"\n",
"def CategorySort(elem):\n",
"    return elem.score"
], | |
"metadata": { | |
"id": "5xa5NqXzF8RE" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"domains = [\n", | |
" \"github.com\",\n", | |
" \"wikipedia.org\",\n", | |
" \"wikipedia.com\",\n", | |
"]" | |
], | |
"metadata": { | |
"id": "6OyGmF-gHAdr" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{
"cell_type": "code",
"source": [
"\n",
"for domain in domains:\n",
"    print(\"domain: \", domain)\n",
"    processed_domain = process_domain(domain)\n",
"    # Prefer the curated override list; fall back to the model otherwise.\n",
"    topics = check_override_list(override_list, processed_domain)\n",
"    if topics is not None:\n",
"        for c in topics:\n",
"            print(\"Category: {} \\t\".format(cat_map[c]))\n",
"    else:\n",
"        topics = tflite_topics.classify(processed_domain)\n",
"        # Keep the five highest-scoring categories, best first.\n",
"        cats = sorted(topics.classifications[0].categories, key=CategorySort)[-5:][::-1]\n",
"        for c in cats:\n",
"            print(\"Category: {} \\t - Score: {}\".format(cat_map[int(c.category_name)], c.score))\n",
"\n",
"    print(\"\\n\")"
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "8ntQHb06F9th", | |
"outputId": "c98d05ca-6896-4663-9e10-49640bab1639" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"domain: github.com\n", | |
"Category: /Computers & Electronics/Programming \t\n", | |
"Category: /Computers & Electronics/Software \t\n", | |
"Category: /Internet & Telecom \t\n", | |
"Category: /Internet & Telecom/Web Hosting \t\n", | |
"\n", | |
"\n", | |
"domain: wikipedia.org\n", | |
"Category: /Reference \t\n", | |
"\n", | |
"\n", | |
"domain: wikipedia.com\n", | |
"Category: /Reference \t - Score: 0.9774933457374573\n", | |
"Category: /Arts & Entertainment \t - Score: 0.019126325845718384\n", | |
"Category: /News \t - Score: 0.009499592706561089\n", | |
"Category: /Reference/Educational Resources \t - Score: 0.00749183027073741\n", | |
"Category: /Law & Government \t - Score: 0.0072003500536084175\n", | |
"\n", | |
"\n" | |
] | |
} | |
] | |
} | |
] | |
} |
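The lookup flow in the notebook above (normalize the hostname, consult the override list, otherwise fall back to the classifier) can be sketched without the model or protobuf files. This is only an illustration under stated assumptions: `OVERRIDES`, `classify_stub`, and `topics_for` are hypothetical names that appear in neither Chrome nor the notebook, the dictionary stands in for the parsed `PageTopicsOverrideList`, and its two entries simply mirror the topic ids implied by the recorded output for `github.com` and `wikipedia.org`.

```python
from typing import Callable, Dict, List

# Characters that the notebook's process_domain() replaces with spaces.
REPLACE_CHARS = "-_.+"
_TRANSLATION = str.maketrans(REPLACE_CHARS, " " * len(REPLACE_CHARS))


def process_domain(domain: str) -> str:
    """Replace common domain separator characters with spaces."""
    return domain.translate(_TRANSLATION)


# Hypothetical stand-in for the parsed override list:
# processed domain -> taxonomy topic ids.
OVERRIDES: Dict[str, List[int]] = {
    "github com": [139, 140, 215, 225],
    "wikipedia org": [275],
}


def classify_stub(processed_domain: str) -> List[int]:
    """Hypothetical placeholder for the TFLite model; always answers 'Unknown' (-2)."""
    return [-2]


def topics_for(
    domain: str,
    overrides: Dict[str, List[int]],
    classify: Callable[[str], List[int]],
) -> List[int]:
    """Return override topics on an exact match, else the classifier's answer."""
    processed = process_domain(domain)
    hit = overrides.get(processed)
    if hit is not None:
        return hit
    return classify(processed)


if __name__ == "__main__":
    for name in ("github.com", "wikipedia.org", "wikipedia.com"):
        print(name, topics_for(name, OVERRIDES, classify_stub))
```

Keying the overrides by processed domain makes each lookup a single dictionary access, whereas the notebook's `check_override_list` scans the entries linearly. Whether that matters depends on how many domains are checked.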