Russell Jurney rjurney

@rjurney
rjurney / .claude_slash_settings.json
Last active May 3, 2025 07:37
My current CLAUDE.md file, always a work in progress...
{
  "permissions": {
    "allow": [
      "Bash(git status:*)",
      "Bash(git diff:*)",
      "Bash(git log:*)",
      "Bash(poetry update:*)",
      "Bash(pip show:*)",
      "Bash(pip freeze:*)",
      "Bash(pip list:*)",
rjurney / parse.py
Created April 23, 2025 05:25
Parse JSONIsh brackets
class Solution:
    openers = ["(", "{", "["]
    closers = [")", "}", "]"]
    valid = openers + closers

    def __init__(self):
        self.stack = []

    def push(self, x):
        self.stack.append(x)
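The preview above is cut off after `push`, so the rest of the gist's logic is not visible here. As a rough, independent sketch of how stack-based bracket matching usually works (the function name `is_balanced` and the `PAIRS` table are hypothetical and not taken from the gist), something like the following would do:

```python
# Hypothetical helper for illustration; not the gist's actual implementation.
PAIRS = {")": "(", "}": "{", "]": "["}  # closer -> matching opener
OPENERS = set(PAIRS.values())


def is_balanced(text: str) -> bool:
    """Return True if every bracket in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            # Remember the opener until its closer shows up.
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        # Any other character (letters, digits, quotes) is ignored.
    # Leftover openers mean something was never closed.
    return not stack
```

This sketch ignores brackets inside string literals, which a real JSON-ish parser would need to handle separately.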
rjurney / substr.py
Created April 23, 2025 04:54
Find the longest common prefix among a set of strings...
from collections import defaultdict
from pprint import pprint
from typing import List  # needed for the List[str] annotation below


class Solution:
    def longestCommonPrefix(self, strs: List[str]) -> str:
        str_count = len(strs)
        min_len = min([len(s) for s in strs])
        print(f"Minimum string length: {min_len:,}")
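The preview stops after computing the minimum string length. One common way to finish from there, shown only as a sketch and not as the gist's actual approach, is to compare characters column by column up to that length (the function name `longest_common_prefix` is hypothetical):

```python
from typing import List


def longest_common_prefix(strs: List[str]) -> str:
    """Return the longest prefix shared by every string in strs."""
    if not strs:
        return ""
    # No common prefix can be longer than the shortest string.
    min_len = min(len(s) for s in strs)
    first = strs[0]
    for i in range(min_len):
        # Stop at the first column where any string disagrees.
        if any(s[i] != first[i] for s in strs):
            return first[:i]
    return first[:min_len]
```

For example, `longest_common_prefix(["flower", "flow", "flight"])` returns `"fl"`.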
rjurney / pregel.py
Created March 25, 2025 17:10
GraphFrames Pregel API - sum the ages of a node's neighbors
from graphframes.lib import AggregateMessages as AM
from graphframes.examples import Graphs
from pyspark.sql.functions import sum as sqlsum
g = Graphs(spark).friends() # Get example graph
# For each user, sum the ages of the adjacent users
msgToSrc = AM.dst["age"]
msgToDst = AM.src["age"]
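The two messages above send each endpoint of an edge the age of the vertex at the other end, and the (truncated) aggregation then sums them per vertex. To make that data flow concrete without a Spark session, here is a plain-Python sketch of the same idea. It does not use the GraphFrames API, and the function name and the tiny graph in the usage note are made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_neighbor_ages(
    ages: Dict[str, int], edges: List[Tuple[str, str]]
) -> Dict[str, int]:
    """Sum, for each vertex, the ages of the vertices it shares an edge with."""
    totals: Dict[str, int] = defaultdict(int)
    for src, dst in edges:
        totals[src] += ages[dst]  # message to the source: destination's age
        totals[dst] += ages[src]  # message to the destination: source's age
    return dict(totals)
```

With hypothetical data `ages = {"a": 34, "b": 36, "c": 30}` and `edges = [("a", "b"), ("b", "c")]`, vertex `b` receives 34 from `a` and 30 from `c`, for a total of 64. Vertices with no edges receive no messages and do not appear in the result.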
rjurney / csv_single.py
Last active February 3, 2025 13:20
A proposed monkey patch to save PySpark DataFrames into a single CSV file
import os
import glob
import shutil
import uuid
from pyspark.sql.readwriter import DataFrameWriter
def csv_single(self, path, **options):
    """
    Write the DataFrame as a single CSV file at the specified path.
rjurney / extract.py
Last active January 31, 2025 07:58
Relik for relation extraction on the GraphFrames paper
"""Script that tests and times Relik's relation extraction and entity linking on the GraphFrames Paper: https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf"""
import timeit
import warnings
from pprint import pprint
from relik import Relik # type: ignore
from relik.inference.data.objects import RelikOutput # type: ignore
# Squash Relik's warnings for prettier screenshots
warnings.simplefilter("ignore")
rjurney / paper.py
Created January 31, 2025 06:48
Extracting the text from the GraphFrames paper with PyPDF
from pypdf import PdfReader
# Load the PDF. The GraphFrames paper normally resides at
# https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf
reader = PdfReader("data/grades_graphframes.pdf")
# Extract text from all pages
text = "\n".join(t for page in reader.pages if (t := page.extract_text()))  # extract each page only once
# Write it to a text file
rjurney / comment.txt
Last active January 31, 2025 17:27
Relik: Hello, World!
You can see it linked many related topics, such as Apache Phoenix, that are not actually mentioned in the text...
rjurney / command.txt
Last active January 18, 2025 10:31
Warp.dev shell command to count my papers on graph pattern matching: graphlets and network motifs
Prompt: Find all instances of files containing the term 'motif' or 'graphlet' in this folder
or any below it. List the filenames, then print the total count of unique files.
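For comparison, the same task can be sketched in Python. This is not the command Warp generated (which is not shown in this preview). It assumes the files are readable as text and that matching should be case-insensitive, and the function name `find_matching_files` is hypothetical:

```python
import os
from typing import List

TERMS = ("motif", "graphlet")  # search terms taken from the prompt


def find_matching_files(root: str) -> List[str]:
    """Return sorted unique paths under root whose contents mention a term."""
    matches = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are skipped (assumption: files are text).
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read().lower()
            except OSError:
                continue  # unreadable file: skip it
            if any(term in text for term in TERMS):
                matches.add(path)
    return sorted(matches)
```

Calling `find_matching_files(".")`, printing each path, and then printing `len(...)` of the result gives the listing and the unique-file count the prompt asks for. Binary formats such as compressed PDFs would need a text-extraction step first.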
rjurney / A GraphFrames Bug
Last active January 11, 2025 07:49
GraphFrames Connected Components OutOfMemoryError in Java 11 on TINY Graph...
I can't figure out why this unit test is failing with this error:
> [error] Uncaught exception when running org.graphframes.lib.ConnectedComponentsSuite: java.lang.OutOfMemoryError: Java heap space sbt.ForkMain$ForkError: java.lang.OutOfMemoryError: Java heap space
The test uses an 8-node, 6-edge graph with two components and two dangling vertices. WTF heap space? I cleaned up the `Dockerfile`
below because it was on wonky versions and tried the same commands there... no go. Same exception. The weird thing is that
CI does pass these tests... so I don't get what is going wrong.
HOW YOU CAN HELP: Please run this command and tell me if the tests pass: