@davidmezzetti
Created June 12, 2026 16:19
# pip install txtai-minimal beautifulsoup4
# pip freeze
# beautifulsoup4==4.15.0
# soupsieve==2.8.4
# txtai_minimal==9.10.0
# typing_extensions==4.15.0
# du -hs /python
# 19M /python
from txtai import Textractor

# Extract text from the page, split into sections
textractor = Textractor(sections=True)

# Print each extracted section
for x in textractor("https://github.com/neuml"):
    print("SECTION", x)
# SECTION **NeuML · GitHub**
#
# *NeuML is the company behind txtai, one of the most popular open-source AI frameworks in the world. - NeuML*
# SECTION NeuML is the company behind txtai, one of the most popular open-source AI frameworks in the world.
#
# We are building a suite of applications to make it easy to integrate AI into production.
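
As a follow-up, here is a minimal sketch of post-processing the extracted sections. It assumes `Textractor(sections=True)` yields plain strings, as the output above suggests. `summarize_sections` and its `width` parameter are hypothetical helpers for illustration, not part of txtai. The sample data is copied from the output above so the sketch runs without network access.

```python
def summarize_sections(sections, width=60):
    """Return (index, word_count, preview) for each non-empty section.

    Hypothetical helper for illustration; not part of txtai.
    """
    rows = []
    for index, section in enumerate(sections):
        # Collapse newlines and repeated whitespace into single spaces
        text = " ".join(section.split())
        if not text:
            continue
        rows.append((index, len(text.split()), text[:width]))
    return rows


# Stand-in data copied from the output shown above. In practice this would be
# the result of textractor("https://github.com/neuml") (assumed to be strings).
sections = [
    "**NeuML · GitHub**\n\n*NeuML is the company behind txtai, one of the most "
    "popular open-source AI frameworks in the world. - NeuML*",
    "NeuML is the company behind txtai, one of the most popular open-source AI "
    "frameworks in the world.\n\nWe are building a suite of applications to make "
    "it easy to integrate AI into production.",
]

for index, words, preview in summarize_sections(sections):
    print(index, words, preview)
```

Keeping the helper free of txtai imports means the same function can be pointed at the live extraction result or at saved output.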