Created July 15, 2014 20:46
A rough implementation of Normalized Google Distance in Clojure
; Rough implementation of the Normalized Google Distance (NGD) algorithm.
; Assumes the total number of indexed pages is 42,000,000,000.
; `google-search` must be defined elsewhere; it is expected to return the parsed search response as a map.
(defn get-ngd
  "Returns the Normalized Google Distance between two search terms.
  Returns nil if either query has no results, or if the two terms never occur together.
  The closer the result is to 0, the more closely related the terms are."
  [term1 term2]
  (let [m 42000000000
        ; Parse counts as longs: result totals can exceed Integer/MAX_VALUE, which makes (Integer. ...) throw.
        ; :totalResults is assumed to be a numeric string.
        hits (fn [query] (Long/parseLong (:totalResults (:searchInformation (google-search query)))))
        ; Java's Math/log10 is used so no extra math namespace is required.
        log10 (fn [n] (Math/log10 (double n)))
        fx (hits term1)
        fy (hits term2)
        fxy (hits (str term1 "+" term2))
        ngd-numerator (- (max (log10 fx) (log10 fy)) (log10 fxy))
        ngd-denominator (- (log10 m) (min (log10 fx) (log10 fy)))
        ngd (/ ngd-numerator ngd-denominator)]
    ; A zero count gives log10(0) = -Infinity, so the result becomes infinite or NaN.
    (if (or (Double/isNaN ngd) (Double/isInfinite ngd))
      nil
      ngd)))
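
The arithmetic inside `get-ngd` can be checked without making any search requests. The sketch below is not part of the original gist: `ngd-from-counts` is a hypothetical helper that takes the three hit counts and the assumed index size directly, and it uses Java's `Math/log10` in place of the `math/log10` alias so that it needs no extra requires.

```clojure
; Hypothetical helper (not in the original gist): computes NGD from raw hit counts.
; fx, fy = hits for each term alone; fxy = hits for both terms together;
; m = assumed total number of indexed pages.
(defn ngd-from-counts
  [fx fy fxy m]
  (let [log10 (fn [n] (Math/log10 (double n)))
        numerator (- (max (log10 fx) (log10 fy)) (log10 fxy))
        denominator (- (log10 m) (min (log10 fx) (log10 fy)))
        ngd (/ numerator denominator)]
    ; A zero count yields log10(0) = -Infinity, which makes ngd infinite or NaN.
    (when-not (or (Double/isNaN ngd) (Double/isInfinite ngd))
      ngd)))

; Worked example with made-up counts:
; numerator   = max(3, 2) - 1 = 2
; denominator = 6 - min(3, 2) = 4
(ngd-from-counts 1000 100 10 1000000) ; => 0.5

; No co-occurrence: the numerator is infinite, so the result is nil.
(ngd-from-counts 1000 100 0 1000000)  ; => nil
```

Keeping the formula separate from the search calls also makes it easy to try different values for the assumed index size `m`, which is only an estimate.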