@AnderRV
Created August 11, 2021 15:09
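from bs4 import BeautifulSoup

# visited, to_visit, get_html, extract_content and extract_links
# are expected to be defined elsewhere; they are not part of this snippet.
def crawl(url):
    # Skip empty URLs and pages that were already crawled.
    if not url or url in visited:
        return
    print('Crawl: ', url)
    visited.add(url)
    html = get_html(url)
    soup = BeautifulSoup(html, 'html.parser')
    extract_content(soup)
    # Queue newly discovered links for later crawling.
    links = extract_links(soup)
    to_visit.update(links)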
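
The snippet adds links to `to_visit` but does not show what consumes that set. Below is a minimal sketch of one possible driver loop, not the author's implementation. Everything except `crawl`, `visited`, `to_visit`, `get_html` and `extract_links` is hypothetical: the in-memory `PAGES` dict stands in for real HTTP responses, `LinkParser` uses the standard library's `html.parser` in place of BeautifulSoup so the sketch runs without third-party packages, `extract_links` takes raw HTML instead of a soup object, and `run` with its `max_pages` cap is an assumed name. `extract_content` and the `print` call are left out for brevity.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="a">A</a>',
    "c": '<a href="d">D</a>',
    "d": "",
}

visited = set()
to_visit = set()


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def get_html(url):
    # Assumption: a real version would fetch the page over HTTP.
    return PAGES.get(url, "")


def extract_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def crawl(url):
    # Same guard as the original: skip empty or already-visited URLs.
    if not url or url in visited:
        return
    visited.add(url)
    to_visit.update(extract_links(get_html(url)))


def run(start_url, max_pages=10):
    # Keep crawling until the queue is empty or the page cap is reached.
    to_visit.add(start_url)
    while to_visit and len(visited) < max_pages:
        crawl(to_visit.pop())
    return visited
```

Calling `run("a")` walks the four stub pages. Because `crawl` returns early for URLs already in `visited`, the link from `b` back to `a` does not cause a loop, and the `max_pages` cap bounds the work on a larger site.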