
Bridger Dyson-Smith CanOfBees

CanOfBees / nginx.conf
Created May 31, 2024 13:29 — forked from Stanback/nginx.conf
Example Nginx configuration for adding cross-origin resource sharing (CORS) support to reverse proxied APIs
#
# CORS header support
#
# One way to use this is by placing it into a file called "cors_support"
# under your Nginx configuration directory and placing the following
# statement inside your **location** block(s):
#
# include cors_support;
#
# As of Nginx 1.7.5, add_header supports an "always" parameter which
CanOfBees / write_pages_2.py
Created October 7, 2021 02:16
python - create subset PDFs for error checking
import sys
import os
from PyPDF2 import PdfFileReader, PdfFileWriter
inpfn = sys.argv[1]
file_base_name = os.path.splitext(inpfn)[0]  # strip only the trailing extension
pdf = PdfFileReader(inpfn)
CanOfBees / slow-check.xq
Last active October 5, 2021 18:47
checking tika-generated xhtml output from PDFs
(:
: for each xh:html document in the database, check for 1 occurrence of $str1
: and 2 occurrences of $str2, returning the db:path (or name) of each document
: where both conditions hold
:)
declare namespace xh = "http://www.w3.org/1999/xhtml";
for $html in //xh:html
let $str1 := "(original signatures are on file with official student records"
let $str2 := "to the graduate council:"
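For a quick cross-check outside BaseX, the same occurrence test can be sketched in Python with only the standard library. This is a hedged illustration, not the author's query: it assumes the Tika output is saved as `*.xhtml` files in one directory (the `check_dir` layout is hypothetical) and that matching should be case-insensitive with whitespace collapsed, which the lowercase search strings suggest but the truncated preview does not confirm.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

XH = "{http://www.w3.org/1999/xhtml}"
STR1 = "(original signatures are on file with official student records"
STR2 = "to the graduate council:"


def normalized_text(root: ET.Element) -> str:
    """Join all text nodes, lowercase them, and collapse runs of whitespace (assumed matching rule)."""
    return re.sub(r"\s+", " ", "".join(root.itertext())).lower()


def matches(root: ET.Element) -> bool:
    """True when STR1 occurs exactly once and STR2 exactly twice (non-overlapping counts)."""
    text = normalized_text(root)
    return text.count(STR1) == 1 and text.count(STR2) == 2


def check_dir(xhtml_dir: str) -> list[str]:
    """Return paths of matching xh:html files in a directory (hypothetical *.xhtml layout)."""
    hits = []
    for path in sorted(Path(xhtml_dir).glob("*.xhtml")):
        root = ET.parse(path).getroot()
        if root.tag == XH + "html" and matches(root):
            hits.append(str(path))
    return hits
```

Joining text nodes without a separator keeps phrases intact when inline markup splits a word; the trade-off is that adjacent block elements without whitespace between them run together.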
CanOfBees / steps.md
Created June 15, 2021 14:52
Setup Steps for ArchivesSpace

Steps to set up ArchivesSpace

ArchivesSpace application

The ArchivesSpace (AS) application is self-contained, but it does need a JVM. A JRE may be sufficient, but we've deployed with the full OpenJDK. The AS docs specify Java version 1.8.

  1. cd /vhosts/ && wget https://github.com/archivesspace/archivesspace/releases/download/v3.0.1/archivesspace-v3.0.1.zip
  2. unzip archivesspace-v3.0.1.zip
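If you would rather script steps 1 and 2, here is a minimal Python sketch using only the standard library. It assumes the v3.0.1 URL pattern above holds for other release versions and keeps `/vhosts/` as the install root; both are assumptions, so check the GitHub releases page before relying on it.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed to follow the same naming pattern as the v3.0.1 release above.
RELEASE_URL = (
    "https://github.com/archivesspace/archivesspace/releases/download/"
    "v{version}/archivesspace-v{version}.zip"
)


def release_url(version: str) -> str:
    """Build the release zip URL for a given version string such as '3.0.1'."""
    return RELEASE_URL.format(version=version)


def fetch_and_unpack(version: str = "3.0.1", dest: str = "/vhosts/") -> Path:
    """Download the release zip into dest (step 1), then extract it there (step 2)."""
    dest_dir = Path(dest)
    zip_path = dest_dir / f"archivesspace-v{version}.zip"
    urllib.request.urlretrieve(release_url(version), zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return zip_path
```

One caveat: `zipfile.extractall` does not restore Unix file permissions, so the bundled shell scripts may need `chmod +x` afterward. The `unzip` command in step 2 does not have this problem.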

Database

For anything beyond local testing, use a robust SQL database; see the official README for database-specific details. The following steps assume you have already downloaded and unpacked the ArchivesSpace release from GitHub.

CanOfBees / validate_mtsu_16.xq
Last active June 3, 2021 18:08
Validate MTSU_16
xquery version "3.1";
declare variable $oai-resp := fetch:xml("http://cdm15838.contentdm.oclc.org/oai/oai.php?verb=ListRecords&amp;set=p15838coll16&amp;metadataPrefix=oai_dc");
(: two items -
: 1. you'll want to add the saxon-8.7.jar to $BaseX/lib/custom, and,
: 2. you'll want to modify these paths if you want to double-check this
:)
declare variable $xslt := "/home/bridger/Documents/metadata-notes/DLTN/XSLT/mtsup15838coll16dctomods.xsl";
declare variable $schema := "/home/bridger/Documents/metadata-notes/DLTN/tests/testSchemas/DLTN_oai_mods.xsd";
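The query starts from a single `ListRecords` response. As a rough Python illustration of that first step (not the author's BaseX pipeline), the sketch below builds the same request URL and extracts record identifiers plus any `resumptionToken`, which OAI-PMH uses to page through large sets. The XSLT transform and schema validation steps are not reproduced here.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://cdm15838.contentdm.oclc.org/oai/oai.php"


def list_records_url(set_spec: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    OAI-PMH treats resumptionToken as an exclusive argument (sent only with
    the verb), so set and metadataPrefix are dropped when paging.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "set": set_spec, "metadataPrefix": "oai_dc"}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str) -> tuple[list, Optional[str]]:
    """Return the record identifiers on one response page and its resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(OAI + "identifier") for h in root.iter(OAI + "header")]
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None  # an empty token marks the last page
```

Using `urlencode` also sidesteps the escaping issue: the `&` separators are produced in Python rather than written inside an XQuery string literal.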
CanOfBees / another-bad-query.xq
Created November 6, 2020 17:02
Pull all geographic values that have no valueURI on either the geographic element or its parent subject element.
xquery version "3.1";
declare namespace mods = "http://www.loc.gov/mods/v3";
declare namespace xlink = "http://www.w3.org/1999/xlink";
distinct-values(
//mods:mods/mods:subject[not(@valueURI)]/mods:geographic[not(@valueURI)]
)[contains(lower-case(.), 'tenn.') and not(contains(., '--'))]
(: this returns the following 290 values
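The selection logic of the query above can be sketched in Python's ElementTree for anyone checking MODS records outside BaseX. It mirrors the path `//mods:mods/mods:subject[not(@valueURI)]/mods:geographic[not(@valueURI)]` and the same `'tenn.'` and `'--'` filters. Since `distinct-values` does not guarantee an order, keeping first-seen order here is this sketch's own choice.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def unlinked_tenn_places(root: ET.Element) -> list[str]:
    """Distinct mods:geographic values with no @valueURI on the element or its parent
    mods:subject, keeping values that contain 'tenn.' (case-insensitive) but not '--'."""
    found: dict[str, None] = {}  # dict keys deduplicate while keeping first-seen order
    for mods in root.iter(MODS + "mods"):
        for subject in mods.findall(MODS + "subject"):
            if "valueURI" in subject.attrib:
                continue
            for geo in subject.findall(MODS + "geographic"):
                if "valueURI" in geo.attrib:
                    continue
                value = "".join(geo.itertext())  # full string value, like XQuery atomization
                if "tenn." in value.lower() and "--" not in value:
                    found.setdefault(value)
    return list(found)
```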
CanOfBees / almost-but-not-really.xq
Last active October 17, 2020 01:28
BaseX - return distinct XPaths as strings
xquery version "3.1";
declare variable $input :=
<test>
<aaa>
<bbb type="foo">bbb content</bbb>
</aaa>
<aaa>
<bbb type="foo" enc="bar">bbb content</bbb>
</aaa>
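The goal of collecting distinct XPaths can be sketched in Python with ElementTree. This sketch follows its own conventions (local names without namespace prefixes, attributes appended as `/@name`, first-seen order) and does not reconstruct the truncated XQuery. The test input is the visible part of `$input` above, closed off with `</test>`.

```python
import xml.etree.ElementTree as ET


def local(name: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def distinct_paths(root: ET.Element) -> list[str]:
    """Return each distinct element path, plus attribute paths as /@name, in document order."""
    seen: dict[str, None] = {}  # insertion-ordered set

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{local(elem.tag)}"
        seen.setdefault(path)
        for attr in elem.attrib:
            seen.setdefault(f"{path}/@{local(attr)}")
        for child in elem:
            walk(child, path)

    walk(root, "")
    return list(seen)
```')
assert distinct_paths(sample) == ["/test", "/test/aaa", "/test/aaa/bbb", "/test/aaa/bbb/@type", "/test/aaa/bbb/@enc"]
</test>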
CanOfBees / tdh-titles-collections.xq
Created May 6, 2020 03:13
Ugly xq for pulling titles / collection information for TDH
declare namespace tei = "http://www.tei-c.org/ns/1.0";
"### TDH Contributing Repositories: ###",
"### TDH Titles: ###",
for $TEI in //tei:TEI
let $TEIid := $TEI/@xml:id/data()
let $TEIbibl := $TEI/tei:teiHeader/descendant::tei:bibl
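A Python sketch of the same walk over `tei:TEI` documents: it pairs each document's `xml:id` with the whitespace-normalized text of every `tei:bibl` descendant of its `teiHeader`. The preview cuts off before showing what the original does with each `bibl`, so taking its full text is an assumption.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def titles(root: ET.Element) -> list[tuple[str, list[str]]]:
    """Pair each tei:TEI's xml:id with the normalized text of the tei:bibl elements in its teiHeader."""
    out = []
    for tei in root.iter(TEI + "TEI"):
        header = tei.find(TEI + "teiHeader")
        bibls = [] if header is None else list(header.iter(TEI + "bibl"))  # descendant::tei:bibl
        texts = [" ".join("".join(b.itertext()).split()) for b in bibls]
        out.append((tei.get(XML_ID, ""), texts))
    return out
```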
CanOfBees / slurp_all_MODS_to_solr.xslt
Created May 4, 2020 15:39
Refactor slurp_all_MODS for identifiers
<?xml version="1.0" encoding="UTF-8"?>
<!-- Basic MODS -->
<xsl:stylesheet version="1.0"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
xmlns:mods="http://www.loc.gov/mods/v3"
exclude-result-prefixes="mods java">
<!-- <xsl:include href="/vhosts/fedora/tomcat/webapps/fedoragsearch/WEB-INF/classes/config/index/FgsIndex/islandora_transforms/library/xslt-date-template.xslt"/>-->
<!--<xsl:include href="/usr/share/tomcat/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/islandora_transforms/library/xslt-date-template.xslt"/>-->
CanOfBees / tdh-file-renaming.xq
Created May 3, 2020 02:31
I needed regexes, and this was the least destructive way I could think of to use them
xquery version "3.1";
(: analyze-string example :)
declare variable $source external := "/home/bridger/rename-test/";
declare variable $target external := "/home/bridger/rename-target/";
declare variable $pav-text:= "(pav)(\.xml|\.mods\.xml|\.txt)";
declare variable $pav-images := "(pav)(\d{1,})(\.tif)";
declare variable $cmmn-text:= "(\p{L}{1,})(\d{1,})(\.xml|\.mods\.xml|\.txt)";
declare variable $cmmn-image:= "(\p{L}{1,})(\d{1,})(\p{L}{1,})(\.tif)";
declare variable $cmmn-image-2:= "(\p{L}{1,})(\d{1,})(\.tif)";
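Porting these patterns to Python requires one substitution: the `re` module has no `\p{L}` class. The sketch below uses `[^\W\d_]` (word characters minus digits and underscore) as an approximation of "any letter"; it is close but not identical to `\p{L}`. The sketch only classifies a filename and returns its captured groups. It anchors each pattern to the whole name with `fullmatch`, unlike `analyze-string`, and the dictionary keys are just labels borrowed from the variable names above. The actual renaming rules are not visible in the preview.

```python
import re

# Approximation of \p{L}: Python's re module does not support Unicode property classes.
LETTER = r"[^\W\d_]"

# Checked in order; the first full match wins, so pav-* patterns take priority.
PATTERNS = {
    "pav-text": re.compile(r"(pav)(\.xml|\.mods\.xml|\.txt)"),
    "pav-images": re.compile(r"(pav)(\d+)(\.tif)"),
    "cmmn-text": re.compile(rf"({LETTER}+)(\d+)(\.xml|\.mods\.xml|\.txt)"),
    "cmmn-image": re.compile(rf"({LETTER}+)(\d+)({LETTER}+)(\.tif)"),
    "cmmn-image-2": re.compile(rf"({LETTER}+)(\d+)(\.tif)"),
}


def classify(filename: str):
    """Return (pattern label, captured groups) for the first pattern matching the whole name, else None."""
    for label, pattern in PATTERNS.items():
        m = pattern.fullmatch(filename)
        if m:
            return label, m.groups()
    return None
```

Classifying names before touching any files keeps the approach non-destructive: you can review the proposed mapping first and rename afterward.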