andrewvc · December 18, 2015 11:58 · Jul 24, 2013 · Jul 24, 2013 · Jul 17, 2013 · Jun 14, 2013
diff --git a/laruby-elasticsearchtalk.md b/laruby-elasticsearchtalk.md
@@ -157,26 +157,36 @@ ElasticSearch can report counts of common terms in documents, frequently seen on
 
 ## Let's Facet
 
-```ruby
-# Create a mapping for bands, with a 'name' and a 'genre'
-server.index(:bands).create(mappings: {band: {properties: {name: {type: :string}, genre: {type: :string, index: :not_analyzed} }}})
-
-#Import some docs
-[["Stone Roses", "madchester"], ["Boards of Canada", "IDM"], ["Aphex Twin", "IDM"], ["Mogwai", "Post Rock"], ["Godspeed", "Post Rock"], ["Harry Belafonte", "Calypso"]].
-each_with_index {|b,i|
-  server.index(:bands).type(:band).put(i, {name: b[0], genre: b[1]})
-}
-
-# Perform a search
-server.index(:bands).search(facets: {bands: {terms: {field: :genre}}}).facets.bands.terms.map {|f| [f[:term], f[:count]]}
-# => [["Post Rock", 2], ["IDM", 2], ["madchester", 1], ["Calypso", 1]]
-
-# A more specific search
-server.index(:bands).search(query: {match: {name: "Boards"}}, facets: {bands: {terms: {field: :genre}}}).facets.bands.terms.map {|f| [f[:term], f[:count]]}
-# => [["IDM", 1]]
+```
+POST /bands
+
+PUT /bands/band/_mapping
+{"band":{"properties":{"name":{"type":"string"},"genre":{"type":"string","index":"not_analyzed"}}}}
+
+POST /_bulk
+{"index": {"_index": "bands", "_type": "band", "_id": 1}}
+{"name": "Stone Roses", "genre": "madchester"}
+{"index": {"_index": "bands", "_type": "band", "_id": 2}}
+{"name": "Aphex Twin", "genre": "IDM"}
+{"index": {"_index": "bands", "_type": "band", "_id": 4}}
+{"name": "Boards of Canada", "genre": "IDM"}
+{"index": {"_index": "bands", "_type": "band", "_id": 5}}
+{"name": "Mogwai", "genre": "Post Rock"}
+{"index": {"_index": "bands", "_type": "band", "_id": 6}}
+{"name": "Godspeed", "genre": "Post Rock"}
+{"index": {"_index": "bands", "_type": "band", "_id": 7}}
+{"name": "Harry Belafonte", "genre": "Calypso"}
+
+// Perform a search
+POST /bands/band/_search
+{"size": 0, "facets":{"bands":{"terms":{"field":"genre"}}}}
+
+// A more specific search
+POST /bands/band/_search
+{"size": 5, "query": {"match": {"name": "Harry"}}, "facets":{"bands":{"terms":{"field":"genre"}}}}
 ```
 
-## Integrating With Rails
+## Integrating With an App Server
 
 ## Key Rails Integration Criteria
 

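The revision above swaps the Stretcher calls for raw HTTP bodies. As a rough sketch of how those bodies could be produced from Ruby with only the standard library, the snippet below builds the newline-delimited `_bulk` payload and the two facet search bodies. The helper names `bulk_body` and `facet_search_body` are hypothetical, and the IDs are assumed to be sequential (the slide's own IDs skip 3). Nothing here talks to a server.

```ruby
require 'json'

# Band data taken from the slide; IDs are assigned sequentially (an assumption).
BANDS = [
  ["Stone Roses", "madchester"],
  ["Aphex Twin", "IDM"],
  ["Boards of Canada", "IDM"],
  ["Mogwai", "Post Rock"],
  ["Godspeed", "Post Rock"],
  ["Harry Belafonte", "Calypso"]
].freeze

# Build the body for POST /_bulk: one action line followed by one document
# line per band. The bulk API expects a newline after the final line too.
def bulk_body(bands, index: "bands", type: "band")
  lines = bands.each_with_index.flat_map do |(name, genre), i|
    [
      JSON.generate("index" => { "_index" => index, "_type" => type, "_id" => i + 1 }),
      JSON.generate("name" => name, "genre" => genre)
    ]
  end
  lines.join("\n") + "\n"
end

# Build the body for POST /bands/band/_search with a terms facet on genre.
# An optional match query on name narrows the documents being counted.
def facet_search_body(name_query: nil, size: 0)
  body = { "size" => size, "facets" => { "bands" => { "terms" => { "field" => "genre" } } } }
  body["query"] = { "match" => { "name" => name_query } } if name_query
  JSON.generate(body)
end

puts bulk_body(BANDS)
puts facet_search_body
puts facet_search_body(name_query: "Harry", size: 5)
```

Generating the payloads as plain strings keeps the example independent of any client library; the output can be sent with whatever HTTP tool is at hand.
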
diff --git a/laruby-elasticsearchtalk.md b/laruby-elasticsearchtalk.md
@@ -1,4 +1,4 @@
-# An Elasticsearch in Ruby Crash Course!
+# An Elasticsearch Crash Course!
 
 ### By Andrew Cholakian
 

diff --git a/laruby-elasticsearchtalk.md b/laruby-elasticsearchtalk.md
@@ -293,11 +293,10 @@ Good because:
 
 ### Links
 
+* This talk: http://bit.ly/142wv13
 * http://www.elasticsearch.org/
 * http://exploringelasticsearch.com (my free book on elasticsearch)
 * https://github.com/PoseBiz/stretcher 
 * Paramedic Cluster Monitoring tool: https://github.com/karmi/elasticsearch-paramedic
-* This presentation: https://gist.github.com/andrewvc/ebbe0e832cdd2ff7b431
-
 
 ## This Page Intentionally Left Blank
diff --git a/laruby-elasticsearchtalk.md b/laruby-elasticsearchtalk.md
@@ -0,0 +1,303 @@
+# An Elasticsearch in Ruby Crash Course!
+
+### By Andrew Cholakian
+
+*All examples use the Stretcher Ruby gem*
+
+## What is Elasticsearch?
+
+* An Information Retrieval (IR) System
+* A way to search your data in terms of natural language, and so much more
+* A distributed version of Lucene with a JSON API
+* A fancy clustered, eventually consistent database
+
+## Elasticsearch and Lucene
+
+Lucene is an information retrieval library providing full-text indexing and search. Elasticsearch provides a RESTish HTTP interface, clustering support, and other tools on top of it.
+
+## Modeling Data
+
+* Data is stored in an **index**, similar to an SQL DB
+* Each index can store multiple **types**, similar to an SQL table
+* Items inside the index are **documents** that have a type
+* All documents are nested JSON data
+* Strongly typed schema
+
+## Creating a Schema
+
+```ruby
+# Setup our server
+server = Stretcher::Server.new('http://localhost:9200')
+# Create the index with its schema
+server.index(:foo).create(mappings: {
+                  tweet: {
+                    properties: {
+                      text: {type: 'string', 
+                      analyzer: 'snowball'}}}}) rescue nil
+```
+
+## Create some fake data
+
+```ruby
+words = %w(Many dogs dog cat cats candles candleizer abscond rightly candlestick monkey monkeypulley deft deftly)
+words.each.with_index {|w,idx|  
+  server.index(:foo).type(:tweet).put(idx+1, {text: w })
+}
+```
+
+* The document is a simple JSON hash: `{"text": "word" }`
+* Each document has a unique ID
+* We use `put` because Elasticsearch has a RESTish API, and storing a document under an ID maps to an HTTP PUT
+
+## And Perform a Search!
+
+```ruby
+# A simple search
+server.index(:foo).search(query: {match: {text: "abscond"}}).results.map(&:text)
+# => ["abscond"]
+```
+
+* our query is actually a JSON object
+* our response is also JSON!
+
+## What is Analysis?
+
+Analysis is the process whereby words are transformed into tokens.
+The Snowball analyzer, for instance, turns English words into tokens based on their stems.
+
+![An Analyzer in Action](https://www.evernote.com/shard/s46/sh/d5eb1481-b9a1-459f-ba93-8ebd9bcae64f/dd6870867a5a06fb6f561b8eade356ef/deep/0/analysis-rollerblading.png)
+
+## Analysis Using the API
+
+```ruby
+server.analyze("deft", analyzer: :snowball).tokens.map(&:token)
+# => ["deft"]
+server.analyze("deftly", analyzer: :snowball).tokens.map(&:token)
+# => ["deft"]
+server.analyze("deftness", analyzer: :snowball).tokens.map(&:token)
+# => ["deft"]
+server.analyze("candle", analyzer: :snowball).tokens.map(&:token)
+# => ["candl"]
+server.analyze("candlestick", analyzer: :snowball).tokens.map(&:token)
+# => ["candlestick"]
+```
+
+## Analysis in Action
+
+```ruby
+# Will match deft and deftly
+server.index(:foo).search(query: {match: {text: "deft"}}).results.map(&:text)
+# => ["deft", "deftly"]
+# Will match candles, but not candlestick
+server.index(:foo).search(query: {match: {text: "candle"}}).results.map(&:text)
+# => ["candles"]
+```
+
+## More kinds of Analysis
+
+```ruby
+# NGram
+server.analyze("news", tokenizer: "ngram", filter: "lowercase").tokens.map(&:token)
+# =>  ["n", "e", "w", "s", "ne", "ew", "ws"]
+
+# Stop word
+server.analyze("The quick brown fox jumps over the lazy dog.", analyzer: :stop).tokens.map(&:token)
+#=> ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
+
+# Path Hierarchy
+server.analyze("/var/lib/racoons", tokenizer: :path_hierarchy).tokens.map(&:token)
+# => ["/var", "/var/lib", "/var/lib/racoons"]
+```
+
+## Searching With An NGram
+
+```ruby
+# Create the index
+server.index(:users).create(settings: {analysis: {analyzer: {my_ngram: {type: "custom", tokenizer: "ngram", filter: 'lowercase'}}}}, mappings: {user: {properties: {name: {type: :string, analyzer: :my_ngram}}}})
+
+# Store some fake data
+users = %w(bender fry lela hubert cubert hermes calculon)
+users.each_with_index {|name,i| server.index(:users).type(:user).put(i, {name: name}) }
+
+# Our analyzer in action
+server.index(:users).analyze("hubert", analyzer: :my_ngram).tokens.map(&:token)
+# => ["h", "u", "b", "e", "r", "t", "hu", "ub", "be", "er", "rt"]
+
+# Some queries
+
+# Exact
+server.index(:users).search(query: {match: {name: "Hubert"}}).results.map(&:name)
+# => ["hubert", "cubert", "bender", "hermes", "fry", "calculon", "lela"]
+
+# A misspelled query
+server.index(:users).search(query: {match: {name: "Calclulon"}}).results.map(&:name)
+# => ["calculon", "lela", "cubert", "bender", "hubert"]
+```
+
+## Boosting
+
+```ruby
+# Individual docs can be boosted
+server.index(:users).type(:user).put(1000, {name: "boiler", "_boost" => 1_000_000})
+
+server.index(:users).search(query: {match: {name: "bender"}}).results.map(&:name)
+# Wha?
+# => ["boiler", "bender", "hermes", "cubert", "hubert", "calculon", "fry", "lela"]
+
+server.index(:users).search(query: {match: {name: "lela"}}).results.map(&:name)
+# Sweet Zombie Jesus!
+# => ["boiler", "lela", "calculon", "bender", "hermes", "cubert", "hubert"]
+```
+
+## Faceting
+
+ElasticSearch can report counts of common terms in documents. These counts are 'facets', frequently seen on the left-hand side of websites.
+
+![Facets on Amazon](https://www.evernote.com/shard/s46/sh/dcc9a51c-9296-40ac-83b3-ae0ad66379d5/7b1a9c6e980f87c6adc8c3dfed93993a/deep/0/Amazon.com.jpg)
+
+## Let's Facet
+
+```ruby
+# Create a mapping for bands, with a 'name' and a 'genre'
+server.index(:bands).create(mappings: {band: {properties: {name: {type: :string}, genre: {type: :string, index: :not_analyzed} }}})
+
+#Import some docs
+[["Stone Roses", "madchester"], ["Boards of Canada", "IDM"], ["Aphex Twin", "IDM"], ["Mogwai", "Post Rock"], ["Godspeed", "Post Rock"], ["Harry Belafonte", "Calypso"]].
+each_with_index {|b,i|
+  server.index(:bands).type(:band).put(i, {name: b[0], genre: b[1]})
+}
+
+# Perform a search
+server.index(:bands).search(facets: {bands: {terms: {field: :genre}}}).facets.bands.terms.map {|f| [f[:term], f[:count]]}
+# => [["Post Rock", 2], ["IDM", 2], ["madchester", 1], ["Calypso", 1]]
+
+# A more specific search
+server.index(:bands).search(query: {match: {name: "Boards"}}, facets: {bands: {terms: {field: :genre}}}).facets.bands.terms.map {|f| [f[:term], f[:count]]}
+# => [["IDM", 1]]
+```
+
+## Integrating With Rails
+
+## Key Rails Integration Criteria
+
+* Generally use an RDBMS(SQL) as primary store
+* Elasticsearch data should respond correctly to RDBMS transactions
+* Elasticsearch data can be rebuilt from RDBMS any time
+* ActiveRecord objects do not necessarily map 1:1 w/ ES objects
+* ES should fail gracefully whenever possible. If ES dies, your app should degrade, not stop.
+
+## What NOT to do!
+
+```ruby
+after_save do
+  es_client.put(self.id, self.as_json)
+end
+```
+
+Bad because:
+
+* If another after_save block fails and causes a transaction rollback, the Elasticsearch write won't be rolled back
+* ES goes down, your app goes down
+* Even if you handle ES going down, you have to figure out which records need re-indexing when it comes back up
+
+## How We Solved This at Pose
+
+```ruby
+after_save do
+  # Add to RDBMS queue of objects needing indexing
+  IndexRequest.create(self)
+end
+```
+
+Good because:
+
+* Processed in background
+* Transaction safe
+* If ES dies, our queue backs up
+* BONUS: Efficient bulk update now possible
+
+## Queue Visualized
+
+![The Queue](http://blog.andrewvc.com/assets/images/elasticsearch_model_pipeline.png)
+
+## How We Implemented Bulk Updates
+
+* Indexes are rebuilt w/o using queue
+* Multiple DelayedJob workers run mod-sharded queries (e.g. `id % N = k` for worker k of N) over the table
+* High-speed, parallel re-imports possible
+* New content will use queue
+
+## Complex Schema Update Problems
+
+* No (good) way in ES to change a field's type
+* Delete / rebuild may leave the site inoperable for too long
+
+## Complex Schema Update Solutions
+
+* Allow N indexes per model
+* All indexes are updated in real-time, IndexRequest queue centralizes reqs
+* Batch job runs in background retroactively adding new records
+* When new index caught up, point queries at it, delete old
+
+## Requirements For Multi-Schema Solutions
+
+* Ability to map models:indexes 1:n. We implemented m:n
+* Simultaneous bulk range and real-time indexing
+* Fast enough bulk operations that you don't take ∞ time
+
+## Problem: Building BIG Queries
+
+* Some queries will be large and programmatically generated
+* Our largest query > 100 lines expanded JSON
+* Sometimes need to run A/B tests between queries
+
+## Solution: Class Per Query
+
+* Each query gets own class
+* Plenty of space for DRY helpers within classes
+* When running A/B tests, subclassing for variations
+
+## Search API Class Structure
+
+![class structure](http://blog.andrewvc.com/assets/images/elasticsearch-classes.png)
+
+## Does ElasticSearch Support Clustering?
+
+## You're Damn Right it Supports Clustering!
+
+![ES Clustering](https://www.evernote.com/shard/s46/sh/85bb4d5b-0b8f-4bb0-bf1e-3d5ed01b6048/41e5d0d37143ce4276d6783d61da6b4f/deep/0/Paramedic%20%7C%20pose-cluster.png)
+
+## The Clustering Story
+
+* All queries run across all shards in the cluster
+* Shards are allocated automatically to nodes and rebalanced
+* A query to any node will work; the actual queries will be executed on the proper shard / node
+* Shards are rack aware
+* Indexes have a configurable number of replicas; set this based on your failure tolerance
+
+
+## The Ops Side of elasticsearch
+
+* elasticsearch is easy to set up!
+* Just a Java jar; all you need is Java installed
+* Has a .deb package available
+
+## Clustering just works
+
+* Clustering just works...
+* If on a LAN, nodes will find each other and figure everything out
+* If on EC2, install the EC2 plugin and nodes will find each other
+* There is no built-in security, but putting nginx in front as a proxy works well
+
+## Thank You for Listening!
+
+### Links
+
+* http://www.elasticsearch.org/
+* http://exploringelasticsearch.com (my free book on elasticsearch)
+* https://github.com/PoseBiz/stretcher 
+* Paramedic Cluster Monitoring tool: https://github.com/karmi/elasticsearch-paramedic
+* This presentation: https://gist.github.com/andrewvc/ebbe0e832cdd2ff7b431
+
+
+## This Page Intentionally Left Blank
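
The "How We Solved This at Pose" slide describes queueing index requests instead of writing to Elasticsearch inside `after_save`. The following is a minimal in-memory sketch of that idea, not the Pose implementation: `IndexRequestQueue` and its methods are hypothetical names, and the talk keeps the queue in the RDBMS so that enqueued rows commit or roll back with the surrounding transaction. A background worker drains IDs in batches and only removes them once the bulk write succeeds.

```ruby
# In-memory stand-in for the IndexRequest table (hypothetical; a real version
# would be a database table so that rows share the model's transaction).
class IndexRequestQueue
  def initialize
    @pending = []
  end

  # Called from the model's after_save hook: record only the ID, once.
  def enqueue(record_id)
    @pending << record_id unless @pending.include?(record_id)
  end

  def size
    @pending.size
  end

  # Called by a background worker. Yields up to batch_size IDs to the block,
  # which is expected to bulk-index them. IDs are removed only when the block
  # returns without raising, so an ES outage leaves the queue backed up.
  # Returns the number of IDs removed.
  def drain(batch_size: 100)
    batch = @pending.first(batch_size)
    return 0 if batch.empty?

    begin
      yield batch
    rescue StandardError
      return 0
    end

    @pending.shift(batch.size)
    batch.size
  end
end

queue = IndexRequestQueue.new
[1, 2, 3, 2].each { |id| queue.enqueue(id) }

# Simulate Elasticsearch being down: nothing is lost, the queue just waits.
queue.drain { |_ids| raise "connection refused" }
puts queue.size

# Once ES is back, a worker indexes in batches.
indexed = []
queue.drain(batch_size: 2) { |ids| indexed.concat(ids) }
puts indexed.inspect
puts queue.size
```

Because the worker receives a batch of IDs, it can load the records and send them in a single bulk request, which is the "efficient bulk update" bonus the slide mentions.
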