Goals: Add links that give reasonable, sound explanations of how things work. No hype, and no vendor content if possible. Practical first-hand accounts of models in production are eagerly sought.

import logging
import uuid
import time
from mesos.interface import Scheduler
from mesos.native import MesosSchedulerDriver
from mesos.interface import mesos_pb2
logging.basicConfig(level=logging.INFO)

import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378
// Input tsv with 3 fields: rowIndex(Long), columnIndex(Long), weight(Double), indices start with 0
// Assume the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue

Vagrant.configure("2") do |config|
  config.vm.box = "dummy"
  config.vm.provider :aws do |aws, override|
    aws.access_key_id = "..."
    aws.secret_access_key = "..."
    # you'll need to create the EC2 keypair used here -- I called it vagrant for easy tracking
    aws.keypair_name = "vagrant"
    # you'll want to use a group that has at least SSH open

As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, followed by two different commutative monoids, one used to write and one used to read. Finally, the read monoids always have the property that (a + b) <= a and (a + b) <= b, and the write monoids have the property that (a + b) >= a and (a + b) >= b.
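To make the read/write split concrete, here is a minimal Python sketch of a Count-min sketch. It is an illustration under stated assumptions, not the implementation from Algebra for Analytics: the class name `CountMinSketch`, the `width` and `depth` defaults, and the salted SHA-256 hashing are all choices made for this example. The write monoid is integer addition over non-negative counts, and the read monoid is `min`.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-min sketch: write with +, read with min."""

    def __init__(self, width=64, depth=4):
        # width and depth are arbitrary illustrative sizes, not tuned values.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # Project the item into one column per row. Salting the hash with the
        # row index stands in for `depth` independent hash functions
        # (an assumption of this sketch).
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Write monoid: (non-negative ints, +), so a + b >= a and a + b >= b.
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Read monoid: (ints, min), so min(a, b) <= a and min(a, b) <= b.
        return min(self.table[row][col] for row, col in self._columns(item))

    def merge(self, other):
        # Merging two sketches applies the write monoid cell by cell.
        assert (self.width, self.depth) == (other.width, other.depth)
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        return merged


left, right = CountMinSketch(), CountMinSketch()
left.add("apple", 3)
right.add("apple", 2)
right.add("pear", 5)
both = left.merge(right)
# Estimates never undercount: collisions can only push a cell upward,
# and taking the min across rows picks the least inflated cell.
print(both.estimate("apple") >= 5)
```

Because writes only grow cells and reads take the smallest cell, an estimate can overcount but never undercount. A Bloom filter fits the same shape with bitwise OR as the write monoid and AND as the read monoid.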
## Some questions:
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
<Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
<Extension name="user" value="lanenga" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2014-01-07 15:33:34</Timestamp>
</Header>
<DataDictionary numberOfFields="4">
<DataField name="sepal_width" optype="continuous" dataType="double"/>
<DataField name="sepal_length" optype="continuous" dataType="double"/>

bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (http://repo1.maven.org/maven2)
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
from https://clojars.org/repo/
Could not find artifact stencil:stencil:pom:0.3.0 in central (http://repo1.maven.org/maven2)
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
from https://clojars.org/repo/
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
from http://repo1.maven.org/maven2/
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)

bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean