@afparsons
Last active August 2, 2022 13:19
spaCy: Tabular View of Token Attributes
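# standard library imports
from operator import attrgetter
from typing import Union, Generator

# third-party library imports
from pandas import DataFrame
from spacy.tokens import Token, Span, Doc


def analyze_tokens(
    doclike: Union[Doc, Span],
    *attributes
) -> DataFrame:
    """
    Build a table of token attributes for every token in a Doc or Span.

    Example:
        `analyze_tokens(doc, 'like_num', 'lemma_', 'pos_', 'children', '_.custom')`
    """
    # 'text' is always the first column. Pass at least one attribute: with a
    # single name, attrgetter returns a bare value instead of a tuple.
    columns = ('text', *attributes)
    data = (
        (
            # unpack generator-valued attributes (e.g. `children`) into tuples
            (*attribute,) if isinstance(attribute, Generator) else attribute
            for attribute in attrgetter(*columns)(token)
        ) for token in doclike
    )
    # transpose: each column is a token, each row is an attribute
    return DataFrame(data=data, columns=columns).T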
@afparsons
Author

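This returns a pandas DataFrame, transposed so that each column is a token and each row is one of the requested attributes (with `text` as the first row).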

There is nothing revolutionary about this function; it is simply a convenient way to evaluate a given Doc or Span in, for example, a Jupyter notebook.
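
For anyone wondering why a dotted name such as `'_.custom'` or a generator-valued attribute such as `'children'` works, here is a minimal sketch of the row-building step in isolation. It uses only the standard library; the `token` object and `fake_children` are hypothetical stand-ins built with `SimpleNamespace`, not real spaCy objects, so treat this as an illustration of the mechanism rather than of spaCy itself.

```python
from collections.abc import Generator
from operator import attrgetter
from types import SimpleNamespace


def fake_children():
    # Hypothetical stand-in for a generator-valued attribute like `children`.
    yield "quick"
    yield "brown"


# Hypothetical stand-in for a token; not a real spaCy Token.
token = SimpleNamespace(
    text="fox",
    pos_="NOUN",
    children=fake_children(),
    _=SimpleNamespace(custom=True),
)

columns = ("text", "pos_", "children", "_.custom")

# With several names, attrgetter returns a tuple of values;
# a dotted name is resolved one attribute at a time.
values = attrgetter(*columns)(token)

# Generators are unpacked into tuples so the cell shows its items
# instead of a generator object.
row = tuple(
    (*value,) if isinstance(value, Generator) else value
    for value in values
)
print(row)  # ('fox', 'NOUN', ('quick', 'brown'), True)
```

Each such row corresponds to one token; the function above collects one row per token and then transposes the result.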
