"""
A quick excerpt demonstrating usage of a custom `RegularExpressionMatcher` for spaCy 3.
This is from one of my personal projects (HaleyNLP/Irnerius). Module-level imports and other code blocks have been elided.
"""
class ComponentExtractionBibliographer(
    AbstractComponentMatcher,
    matcher=RegularExpressionMatcher,
):
    """
class RegularExpressionMatcher:
    """
    Akin to spaCy's Token Matcher, although this runs regular expressions
    on the entire doc text.
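The preview stops before showing how whole-text regex matches are related back to tokens. As a rough illustration only, the sketch below runs a pattern over the full text and keeps just the matches whose character offsets line up with token boundaries. It uses the standard library alone: `tokenize`, `char_span_to_tokens`, and `match_text` are hypothetical names, the whitespace tokenizer is a crude stand-in for spaCy's, and the ECLI pattern in the usage note is a simplified assumption, not the project's actual expression. In spaCy itself, `Doc.char_span` serves the same alignment purpose.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical stand-in for a tokenized document:
# one (start, end) character-offset pair per token.
TokenOffsets = List[Tuple[int, int]]


def tokenize(text: str) -> TokenOffsets:
    """Split on runs of non-whitespace; a crude stand-in for a real tokenizer."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_to_tokens(
    tokens: TokenOffsets, start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Map a character span to (first_token, last_token + 1).

    Returns None when the span does not begin and end exactly on token
    boundaries.
    """
    starts = {s: i for i, (s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, e) in enumerate(tokens)}
    if start in starts and end in ends and starts[start] <= ends[end]:
        return starts[start], ends[end] + 1
    return None


def match_text(pattern: str, text: str) -> Iterator[Tuple[int, int, re.Match]]:
    """Run the regex over the whole text and yield token-aligned matches."""
    tokens = tokenize(text)
    for match in re.finditer(pattern, text):
        span = char_span_to_tokens(tokens, match.start(), match.end())
        if span is not None:
            yield span[0], span[1], match
```

For example, `list(match_text(r"ECLI:[A-Z]{2}:[A-Z0-9]+:\d{4}:[A-Z0-9.]+", "See ECLI:EU:C:2014:317 for details"))` yields a single match covering token indices 1 to 2, while a pattern that starts mid-token yields nothing.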
@registry.misc('haleynlp.common.extraction.handler.on_match.bibliography._european_union_ecli')
def _european_union_ecli(
    span: Span,
    match: re.Match,
) -> None:
SELECT ?cur ?curLabel ?code ?char ?symbol ?endTime ?countryLabel ?altLabel
WHERE
{
  ?sign wdt:P31 wd:Q308229 .
  ?sign wdt:P487 ?char .
  ?cur wdt:P489 ?sign .
  ?cur wdt:P31 wd:Q8142 .
  ?cur wdt:P498 ?code .
  OPTIONAL { ?cur wdt:P5061 ?symbol . }
  ?cur wdt:P17 ?country .
"""
Adapted from: https://stackoverflow.com/a/14822210/4189676
"""
from math import floor, log
def bytes_to_human_readable(number_of_bytes: int) -> str:
    magnitude: int = int(floor(log(number_of_bytes, 1024)))
    value: float = number_of_bytes / pow(1024, magnitude)
    if magnitude > 3:
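The excerpt above is cut off before the unit handling, so what follows is a self-contained sketch of the same idea, not the original function body. Several things here are assumptions: the `UNITS` labels, the cap at the largest listed unit, and the output format. It also swaps the floating-point `log` for an integer bit-length calculation, which sidesteps rounding at exact powers of 1024 and the undefined `log(0)`.

```python
# Assumed unit labels (binary prefixes); the original excerpt does not show its own.
UNITS = ("B", "KiB", "MiB", "GiB", "TiB")


def bytes_to_human_readable(number_of_bytes: int) -> str:
    """Format a byte count with a binary-prefixed unit."""
    if number_of_bytes < 0:
        raise ValueError("number_of_bytes must be non-negative")
    if number_of_bytes == 0:
        return "0 B"
    # Each factor of 1024 is 10 bits, so this is floor(log(n, 1024)) in exact
    # integer arithmetic.
    magnitude = (number_of_bytes.bit_length() - 1) // 10
    # Assumption: values beyond the largest unit are reported in that unit.
    magnitude = min(magnitude, len(UNITS) - 1)
    if magnitude == 0:
        return f"{number_of_bytes} B"
    value = number_of_bytes / 1024 ** magnitude
    return f"{value:.1f} {UNITS[magnitude]}"
```

With these assumptions, `bytes_to_human_readable(1536)` gives `"1.5 KiB"`.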
from typing import List, Set, Tuple, Type
from taggit.models import Tag, TaggedItem
from django.db.models import Model, QuerySet
from django.contrib.contenttypes.models import ContentType
class TagAdder:
    """
    An object which adds specific tags to given Django Model objects in bulk.
    """
from typing import Dict
from pandas import DataFrame
from taggit.models import TaggedItem
from django.db.models import Q, Count
def get_tag_counts() -> DataFrame:
    """
    Returns a DataFrame with the counts of django-taggit TaggedItems by
    Tag name and ContentType model name.
# standard library imports
from operator import attrgetter
from typing import Union, Generator
# third-party library imports
from pandas import DataFrame
from spacy.tokens import Token, Span, Doc
def analyze_tokens(
# Enhanced Django QuerySet printing using PrettyPrinter
# Example usage: dropped into and employed within an IPython notebook.
# --- PRETTYPRINT -------------------------------------------------------------
# A PrettyPrinter object contains a _dispatch dictionary.
# This lookup table contains (key, value) pairs wherein the key corresponds to
# an object's __repr__ method, and the value is a special _pprint_<OBJECT>
# method. The PrettyPrinter method pprint() queries the dictionary to call the
# appropriate object printer.
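The dispatch mechanism described in these comments can be sketched as follows. This is a hedged illustration, not the gist's QuerySet printer: `Basket`, `_pprint_basket`, and `BasketPrinter` are hypothetical names, and `_dispatch` is a private CPython detail that could change between versions. Note that CPython consults the table only when an object's one-line `repr` is too wide for the configured width.

```python
import io
from pprint import PrettyPrinter


class Basket:
    """Hypothetical container standing in for a QuerySet-like object."""

    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return f"<Basket {self.items!r}>"


def _pprint_basket(printer, obj, stream, indent, allowance, context, level):
    """Write one item per line instead of the single-line repr."""
    stream.write("<Basket\n")
    for item in obj.items:
        stream.write(" " * (indent + 2) + repr(item) + "\n")
    stream.write(" " * indent + ">")


class BasketPrinter(PrettyPrinter):
    # Copy the private lookup table so the stdlib printer is left untouched.
    _dispatch = dict(PrettyPrinter._dispatch)


# Key: the type's __repr__ function; value: the custom printer for that type.
BasketPrinter._dispatch[Basket.__repr__] = _pprint_basket

# The repr is wider than 20 columns, so the custom printer is used.
buffer = io.StringIO()
BasketPrinter(stream=buffer, width=20).pprint(Basket(["apple", "banana", "cherry"]))
print(buffer.getvalue(), end="")
```

Subclassing and copying the table keeps the change local; assigning into `PrettyPrinter._dispatch` directly would affect every `pprint` call in the process.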
ZCTA5CE10 | count
---|---
49008 | 15
49036 | 1
49740 | 1
49506 | 5
48170 | 4
48302 | 5
49009 | 10
49017 | 1
48306 | 4