Skip to content

Instantly share code, notes, and snippets.

View mhaeussermann's full-sized avatar

Manuel Häußermann mhaeussermann

View GitHub Profile
@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.
@gidili
gidili / profanity_check.js
Created January 31, 2013 16:46
simple javascript profanity check
// extend strings with the method "contains"
String.prototype.contains = function(str) { return this.indexOf(str) != -1; };
// profanities of choice
var profanities = new Array("ass", "cunt", "pope");
var containsProfanity = function(text){
var returnVal = false;
for (var i = 0; i < profanities.length; i++) {
if(text.toLowerCase().contains(profanities[i].toLowerCase())){