Gist andrewsmhay/9039012, created February 16, 2014 19:02
I've got host scan data that I've imported into MongoDB in the following format:
{
    "_id" : ObjectId("52fd928c62c9815b36f66e68"),
    "date" : "1/1/2014",
    "scanner" : "123.9.74.172",
    "csp" : "aws",
    "ip" : "126.34.44.38",
    "port" : 445,
    "latt" : 35.685,
    "long" : 139.7514,
    "country" : "Japan",
    "continent" : "AS",
    "region" : 40,
    "city" : "Tokyo"
}
{
    "_id" : ObjectId("52fd928c62c9815b36f66e69"),
    "date" : "1/1/2014",
    "scanner" : "119.9.74.172",
    "csp" : "aws",
    "ip" : "251.252.216.196",
    "port" : 135,
    "latt" : -33.86150000000001,
    "long" : 151.20549999999997,
    "country" : "Australia",
    "continent" : "OC",
    "region" : 2,
    "city" : "Sydney"
}
{
    "_id" : ObjectId("52fd928c62c9815b36f66e6a"),
    "date" : "1/1/2014",
    "scanner" : "143.9.74.172",
    "csp" : "aws",
    "ip" : "154.248.219.132",
    "port" : 139,
    "latt" : 35.685,
    "long" : 139.7514,
    "country" : "Japan",
    "continent" : "AS",
    "region" : 40,
    "city" : "Tokyo"
}
Since I'm new to MongoDB, I've been looking at the aggregation framework and MapReduce to figure out how to write some queries. However, I can't for the life of me figure out how to do something as simple as:
1) Count the distinct "ip" addresses with "port" 445 and a "date" of "1/1/2014"
2) Return the "ip" address with the most open "ports", by "date"
3) Count the distinct "ip" addresses, by "csp", for every "date" in January
Any help would be greatly appreciated. I've been reading and reading, but my queries keep exceeding the 16MB limit. As you can see below, I have a lot of entries:
{
    "ns" : "brisket.my_models",
    "count" : 117715343,
    "size" : 25590813424,
    "avgObjSize" : 217.3957342502073,
    "storageSize" : 29410230112,
    "numExtents" : 33,
    "nindexes" : 1,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 3819900784,
    "indexSizes" : {
        "_id_" : 3819900784
    },
    "ok" : 1
}
MapReduce will probably get you what you want, but I've been using pymongo as a way to avoid learning MapReduce.
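
For what it's worth, the three queries in the question can probably be written as aggregation pipelines and run through pymongo without any MapReduce. The following is only a rough sketch, not a tested answer against this data set. The function names and the JANUARY_2014 pattern are made up for illustration, and the pattern assumes the "date" strings are M/D/YYYY (the sample "1/1/2014" does not say which). Each pipeline groups down to the distinct keys first and then counts, so it never builds one huge array of addresses. As far as I know, returning results as a cursor and the allowDiskUse option both need MongoDB 2.6 or later, which is what should get around the 16MB limit.

```python
"""Aggregation pipeline sketches for the three host-scan queries."""

# Assumption: "date" strings are M/D/YYYY, so January 2014 is "1/<day>/2014".
JANUARY_2014 = r"^1/\d{1,2}/2014$"


def distinct_ips_on_port(port, date):
    """Query 1: count the distinct "ip" values for one port on one date."""
    return [
        {"$match": {"port": port, "date": date}},
        # One output document per distinct ip ...
        {"$group": {"_id": "$ip"}},
        # ... then count those documents.
        {"$group": {"_id": None, "count": {"$sum": 1}}},
    ]


def busiest_ip_per_date():
    """Query 2: for each date, the ip with the most distinct open ports."""
    return [
        # Collapse duplicate (date, ip, port) rows.
        {"$group": {"_id": {"date": "$date", "ip": "$ip", "port": "$port"}}},
        # Count the distinct ports per (date, ip).
        {"$group": {
            "_id": {"date": "$_id.date", "ip": "$_id.ip"},
            "ports": {"$sum": 1},
        }},
        # Highest port count first, so $first below picks the busiest ip.
        {"$sort": {"ports": -1}},
        {"$group": {
            "_id": "$_id.date",
            "ip": {"$first": "$_id.ip"},
            "ports": {"$first": "$ports"},
        }},
    ]


def distinct_ips_by_csp(date_regex=JANUARY_2014):
    """Query 3: distinct ip count per (date, csp) for the matching dates."""
    return [
        {"$match": {"date": {"$regex": date_regex}}},
        # Collapse duplicate (date, csp, ip) rows.
        {"$group": {"_id": {"date": "$date", "csp": "$csp", "ip": "$ip"}}},
        {"$group": {
            "_id": {"date": "$_id.date", "csp": "$_id.csp"},
            "distinct_ips": {"$sum": 1},
        }},
    ]


def run(collection, pipeline):
    """Run a pipeline on a pymongo Collection, letting big stages use disk."""
    return list(collection.aggregate(pipeline, allowDiskUse=True))
```

With pymongo installed, something like run(MongoClient().brisket.my_models, distinct_ips_on_port(445, "1/1/2014")) should return a single document holding the count. With only the _id index on roughly 117 million documents, every $match will scan the whole collection, so an index on "date" and "port" (for example create_index([("date", 1), ("port", 1)])) is probably worth trying first.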