- “we don’t care about perf”
- “clojure should be fast”
- “we’re using mongo!”
- “clojure might not be fast enough, we should use type hints”
- build it, then test it
- instrument with graphite
(GET "/slow" [] (t/time! (t/timer "slow-page")
(do (Thread/sleep (+ 100 (rand-int 100))) "oy!")))- “maybe we could use a macro” – no!
- madness. use bidi!
- use robert/hooke to instrument anything
(add-hook #'function #'hook)(postwalk identity r)to disable laziness recursively
- fix 2: don’t iterate over slow things
- remove Schema
- just return fields we need
- or maybe paginate
- don’t do this prematurely!
- fix 3: tags rarely change - we can memoize
- (what about cache invalidation?)
- fix 4: queries without indices
- mongoDB handy setting:
// tablescans are now banned!
db.getSiblingDB("admin").runCommand(
{ setParameter: 1, notablescan: 1}
);- fix 5: unrolled queries
- temptation:
(->> (all-tumblrs db)
(map (partial all-posts-for-tumble db))
(map (partial all-pics-for-post db)))- slow in code, fast in SQL
- can roll your own profiler
- JVisualVM - free, but limited
- Yourkit - free for big OS projects, or 15-day demo
- Riemann?
- fix 6: RTFM
- stencil/render-string not memoized
- use stencil/render-file instead
- fix 7: more dakka
- fix 8: async work queues
- fix 10: tweak clojure
- “clojure high perf programming” by Shantanu Kumar
- whole nothing issue. esp mobile browsers
- hard to test at scale
- scale “from cheap, fast feedback, flexible” to “accurate, slow, rigid”
- don’t need to go full sophisticated immediately!
- “performance test pyramid” by analogy with test pyramid
- simulant
- don’t know, sounds like a good idea
- is memoization a harder problem?
- memoization is a cheat
- the first person still gets horrible perf
- if things change, the req which gets the change gets horrible perf
- memoization is a cheat
- aren’t most of your problems RTFM? aren’t you solving a cultural
problem with tech?
- we shouldn’t have needed JMeter to find stencil slowness
- reading manuals goes out of the window under pressure
- what do you do when you hit a JVM limit? eg integer parsing <JDK8
had a global lock
- throw up your hands and say screw it
- it’s hard to estimate how much time you should put into perf testing
- distributed
core.asyncand associated libraries
-
- have named “slots”
- code or data
- no method/field distinction
- q: how do you send an “object” from Ponyville to Equestria?
- separation of data and schema and behaviour
- immutable data structures: not cyclical
- transit: richer than json, cross-platform serialization
- nothing is synchronous
- but we fake it
- multiple cores and NUMA
- but
- we humans gloss over it in day-to-day life
- in computer systems, we have to think of everything as asynchronous
- akka:
Any => Unit- ie take any value, return no response
- “I’m going to the pub”
X => LAFuture[Y]- “bruce, I’d like a hug”
- timeouts
- Channels > actors
- because:
- backpressure (paging @RobAshton)
- separation of concerns
- fanout
- actors can be modelled with channels (not supervisors though), but not vice versa
- because:
- script:
- Spike: find the book
- Pinky: find Applejack
- S&P: Look for gems
- when S&P are done, Twilight does stuff with gems, applejack, and the book aka the gofor macro
- gofor: for comprehension, go-style
(gofor
[a (foo channel {:thing val})]
:let [b (inc a)]
[c (baz other-chan)
:timeout 45000]
(println a b c)
:error (println &err &at))sends a message to a channel
simultaneous dispatch
- you can query services for documentation
- lift is the best thing for server push around
- and has been for 8 yeras
- currently http long poll
- actors server & client side
- multiple actors but only one http connection
- developers don’t have to worry about guts
- retries, multiplexing, etc
- the best plumbing “just works”
- focus on taking a shower, rather than source or destination of water
- pipes accessible but not in your face
- so you can fix them when you need to
- or call a plumber
- eg Lift Comet
- parameters for retry backoff tuning
- how to surface errors, in browser and in server
- isolate logic from transport for testing
- separate thinking about domain logic from thinking about REST calls
- fewer developer context switches
- faster dev cycles
- discover services, send messages
- @noahcampbell - “treat your servers like cattle, not pets”
- REFful µservices
- github.com/mixradio/mr-clojure
- legacy java, php
- riak, rabbit, elasticsearch
- continuous delivery since 2010
- snowflake servers
- configuration drift
- variation of size, spec, versions
- slow provisioning (2 week lead time)
- slow deployment
- configuration confusion
- database issues & rollbacks
- no audit, so hard to know
- database issues & rollbacks
- escaped own tin dc, migrated to aws
- dev account, prod
- naming scheme: mr-*
- command line swiss army knife
- written in go
- JVM dependency would suck :(
- also considered node & python (but rejected)
- go is cross-platform
- uses our RESTful services
- create app, find images, deploy, list boxes
- github.com/ptaoussanis/faraday
- bakes machine images
- throw away old servers and deploy new ones to upgrade
- no upgrades
- handled by new bake
- puppet for a few important things
- amazon linux base image
- mixradio base image
- service image | ad-hoc image (testing)
- shells out to packer
- uses Raynes/conch
- per-environment configuration
- mixradio/mr-tyrant
- backed by github for config storage
- app is a readonly frontend
- Raynes/tentacles
- app properties
- db conn strings
- port numbers
- deployment parameters
- capacity, security groups
- launch data
- ie userdata
- keeping infrastructure in sync
- mr-pedantic
- “puppet for cloud infrastructure”
- configuration exists
- gets compared with running env
- running env gets corrected to match config
- idempotent
- lets us roll out in another region
- backed by github for config storage
- clojure for configuration
- clojail to avoid stupidity
- deployment orchestration
- mr-maestro
- asgard via APIs
- red/black deployment
- bring up new version
- healthcheck
- add to load balancer pool (but keep old version)
- when happy, remove old version
- (if unhappy, can always flip back to old version)
- asgard was good for finding what we wanted
- maestro can deploy itself
- the new servers will receive the messages which will kill the old versions
- someone else’s good idea redone on a more primitive level
- datalog queryable in-memory database
- triple store:
- <entity, attribute, value>
- database as an immutable value
- completely in-memory
- written in cljs, js bindings available
(defn create-conn [schema]
(atom (empty-db schema)
:meta {:listeners (atom {})}))
(defrecord DB
[schema
eavt aevt avet ; three indices
max-eid max-t])
(defn with [db datoms]
(-> db
(update-in [:eavt into datoms])
(update-in [:aevt into datoms])))
(defn transact [conn datoms]
(swap! ;...
))- impl: BTSet: B+ tree (sorted set)
- perf comparable with sorted-set:
- slower conj, but faster iterate
- binary search lookups
- fast range scans
- reverse iteration
- fast first-time creation
- clojurescript, but heavy use of js arrays and APIs
- datascript is lightweight
- 700 loc btset
- 550 loc query engine
- 1700 loc total
- 1200 loc tests
- unfortunate to be associated with the word “database”
- no networking
- query over memory
- every app has an ad-hoc state
- put everything in a database instead
- non-trivial SPA has complex state
- KV stores do not cut it
- no inherent hierarchy
- natural for any data: sparse, irregular, hierarchical, graph
- faster data retrieval from big datasets
- server sync
- undo/redo
- local caching
- audit
- datascript + react
- db update causes full re-render, top-down
- immutability makes it fast
- datomic never removes anything
- datascript will clean up old values, to ensure storage is bound
- acha-acha.co
- no server fetch on any navigation, all queries and aggregations happen on client
- considered alpha
- ephemeral? transient?
- persistent-first, but not always?
- Persistent data structures
- immutable
- permit update-like operations
- with at-most logarithmic slowdown when compared with mutable counterpart
- “fully persistent data structures”
- full branching history remains available
- transients
- mutable, but share structure with persistent data structures
- enforce (<1.7) / demand (>= 1.7?) thread isolation
- change in 1.7 removes the enforcement
- still requires a single thread to “own”
- can hand-off via safe publication, so long as single ownership remains
- can be frozen to create a persistent DS
- ephemeral data structures: another term for mutable data structures
- implementation strategies
- trees with path copying for efficient PDS ‘updates’
- a notion of ownership for subtrees to support transients
- Bagwell (2000), Ideal Hash Trees
- mutable Hash Array Mapped Tries
- contribution: traditional hash tables have an occasional expensive resizing operation
- the unlucky insertion has to wait while the whole hash table is copied
- HAMTs avoids this by using a tree structure
- demonstrate comparable performance in best case, while avoiding whole-table copy resize ops
- mutable Hash Array Mapped Tries
- Hickey (2007), Clojure
- persistent hash maps based on HAMTs with path copying
- also vectors based on a modification of the idea behind HAMTs
- Hickey (2009), Clojure
- transient vectors
- transient maps added later by @cgrand
c.c/transientoriginally calledmutable,persistent!-immutable!- thread isolation introduced later
- Prokopec, Bronson, Bagwell, Odesky (2011)
- Cache-Aware Lock-Free Concurrent Hash Tries
- Concurrent Tries with Efficient Non-Blocking Snapshots
- Ctries - lock-free mutable HAMTs with O(1) snapshots
- the snapshots are first-class Ctries independent of originals
- key notion: subtree ownership
- distinguished central location stores an ‘edit’ marker
- an AtomicReference
- and so do individual nodes
- the transient instance itself owns nodes with its edit marker
- other nodes might be shared
- updates check edit markers, mutate own nodes in place, copy paths otherwise
- new paths get new edit markers, subsequent updates will be in-place
persistent!invalidates the transient by resetting its edit marker to nil
- data.avl vs JDK’s NavigableMaps
- NavigableMap: nearest neighbour queries and subsetting
- two impls: java util TreeMap & java util SkipListMap
- JDK impls return subcollections as views
- modifications to the original reflected in the view (and vice versa)
- cannot add items to views outside original bounds
- inappropriate when the subcoll is to be passed on for arbitrary use
- NavigableMap: nearest neighbour queries and subsetting
- however, BSTs (including red-black and AVL trees) support join/split/range
- join: merge/concat for pairs of trees with an ordering precondition
- split: produce subcolls including keys < and > a given key
- range: extract subcoll bounded by limits
- all in logarithmic time
- in mutable setting, these modify original tree
- intention of NavigableMap is different
- that it leaves original untouched
- TreeMap is forced into view-based impl
- AVL doesn’t have this problem
- return new first-class NavigableMaps, leave original untouched
- better than Java’s own NavigableMaps in this respect
- …unless we add transients
- a transient can be stored in a j.u.TreeMap-workalike wrapper
- an independent snapshot can be produced by invalidating the transient
- the original wrapper is free to install a new edit marker immediately
- the snapshot can be subsetted to produce first-class subcolls
- these can be safely returned
- cost: must store edit markers, a little slower for updates
- no benefit for SortedMap, only NavigableMap
- map data structure with support for concurrent updates
- comparable to regular HAMTs in perf of HAMT ops
- snapshotable in O(1) time with little degradation to perf of other ops
- snapshots are completely independent first-class ctries
- now impl in pure clojure - ctries.clj (not on github yet)
- ctries: internally similar to clj PersitentHashMap
- though different range of node types
- key structure difference:
- indirection nodes
- ensure linearizability of history at a slight perf cost
- CAS-based in-place updates to tree structure
- “generation” marker used to determine subtree ownership
- with transients, establishing ownership happens at every level of the tree
- with ctries, can skip levels
- would benefit from a generalized CAS operation
- GCAS: original contribution of second ctries paper
- RDCSS: Harris, Fraser, Pratt (2002), A Practical Multi-Word Compare-and-Swap Operation
- based on descriptor objects stored at the modified location
- GCAS only creates descriptors upon failure
- at first glacne, ctries could support persistent and transient APIs simultaneously
- persistent ops would take snapshot and modify snapshot
- no go because transients reuse method names from the persistent API
- instead, ctries.clj maps are “transient first”
persistent!creates immutable snapshots that behave like persistent mapsderefcreates mutable snapshots in the form of independent ctriesderefalso works on immutable snapshots (returns mutable independent ctries)
- persistent ctrie-based maps to be optimized further
- Wisp, a lisp for js
- clojurescript as npm modules
- mori
- compiled clojurescript
- prefix notation
- includes quite a few clojure standard library functions
- clojure data structures
- immutablejs
- javascript
- prototype extension
- mori
- om
- love your library, not your language
- had a huge influence on javascript single-page apps
- clearly faster
- clearly fewer bugs
- omniscient
- javascript clone of om
- immutablejs/react
- immstruct
- implement cursors from om
- javascript clone of om
-
- confident community
- es6
- libraries > framework
- http://slides.com/rrees/trojan-horsing-clojure-with-js
-
- do js people really understand the Om influence on the way
single-page apps are going?
- two-way data binding is dying
- is wisp ready for prod?
- you know what you’re shipping
- you’ll have to test the js that comes out of it
- Glasgow-based, Arnold Clark
- stuart sierra component workflow
- juxt/jig – superceded by stuart sierra’s work this year
- github: james-henderson
- twitter: jarohen
+------------+
listens | Atom | updates
/------------| |<--------\
v +------------+ |
Widget Model
| +------------+ ^
\----------->| Channel |---------/
events +------------+ reacts to
- core.async
- lots of boilerplate
- could a library help here?
- led to clidget
- ensured DOM was updated every time atom changed
- evolved into Flow
- 2 weeks later, Om was announced
- why would any sane person carry on with Flow?!
- couldn’t shake the feeling that things could be simpler
- Om & Reagent introduce a lot of new concepts
- 100% declarative
- minimise number of new concepts
- perform “well enough” to be useful (but don’t compete with React)
- https://github.com/james-henderson/flow
- (also, facebook created a thing called “flow” too. not the same)
(:require [flow.core :as f :include-macros true])
(defn hello-world []
(f/root js/document.body
(f/el
[:p.message {::f/style {:font-weight :bold
:color "#a00"}}
"Hello world!"])))
;; dynamic
(defn counter-component [!counter]
(f/el
[:p "value is " (<< !counter)]))- the
<<operator continuously reads from an atom
- important for composable, maintainable code
- the other operator
!<<
- input
- lexing
- parsing
- code synthesis
- output
- read-string
- notions:
DynamicElement :: AppState -> (Element, DynamicElement)
DynamicValue :: AppState -> Value(if <DynamicValue>
<DynamicElement>
<DynamicElement>)
(defmethod fc/compile-el-form :if [[_ test then else] opts]
`(build-if (fn [] ~(fc/compile-value-form test opts))
(fn [] ~(fc/compile-el-form then opts))
(fn [] ~(fc/compile-el-form else opts))))- don’t write macros
- don’t write macros!
- (unless you know you have to)
- Get out of macro-lang as soon as you can
- changing the execution order?
- analysing a form?
- what’s in
build-if?- cached state (of the current value of the test)
-
- the bottleneck is becoming me, rather than the clojurescript compiler
- stable release
- more tutorials, examples, docs
- next version: ???
- feedback please!
- give it a go!
- get involved!
- @tcoupland
- MixRadio
- graphite
- graphical frontend for viewing metrics
- carbon
- default data store for graphite
- (technically, whisper, but no matter)
- flat files on disk
- round-robin rollup
- cyanite from @pyr
- elasticsearch/cassandra backend for graphite, replacement for whisper
- core.async primer
- channels, take/put, go blocks
- thread pool executors
- channels, take/put, go blocks
- YourKit agent added to cyanite
- start CPU profiling
- culprit: add-path
- doing blocking I/O inside go-loop
- don’t block the thread pool
- Single Responsibility Principle
- changed topology to read from one input, write to two separate output channels (elasticsearch + cassandra)
- but… performs even worse!
- less CPU usage, but less metrics/s and less network usage
- faster, but stil not great
- 600 metrics/s
- 50k/s traffic in, 160k/s traffic out
- deserves a cup of tea :)
- eventually we find a problem:
- AssertionError: No more than 1024 pending puts are allowed on a single channel
- channel has in-built buffer
- but there’s an extra “buffer” of pending-puts (and takes)
- which has a hard limit of 1024
- we hit it
- helpfully, the error message suggests “windowed-buffer”
- which drops work on the floor
- nope nope nope!
- (@aphyr from the twitter gallery: LITTLES LAW)
- core.async partition
(<! (partition 1000 chan))- consumes messages in batches of 1000
- add-path fn is doing too much
- split into separate CSP processes
- check full path
- break up
- insert
- now:
- 2500 metrics/s
- network
- 400 k/s in
- 1500 k/s out
- are we stressing cyanite enough to see problems?
- soak testing
- generate vast quantities of data to stress cyanite
- use yourkit profiling
- observe most of what we’re doing is waiting for elasticsearch
- current elasticsearch client uses clj-http
- others could use httpkit or similar (which use java nio)
- being open source really pays off here
- we can just look inside the elasticsearch driver
- we can copy-paste it and swap out another http client
- qbits.hayt.cql – cassandra query builder
- half the system’s time is spent on
clojure.string/join- this isn’t so good
- I’m using the library wrong
- it’s for building nice queries
- I’m just doing inserts, I don’t care about queries
- should just do batching myself
- now
- CPU usage halved to 16% (from 30%)
- memory usage down
- 2500 metrics/s, same as before
- network
- in: 400k/s
- out: 880k/s (down from 1500k/s, due to batching)
- all our time is in the format_processor
- this looks good
- Single Responsibility Principle
- works at all the abstraction levels
- it’s always a good idea
- except when it isn’t, like all good ideas
- open source is brilliant
- but RTFM
- if their motivation doesn’t align with yours, it might not help you
- but RTFM
- core.async
- did it deliver on its promise?
- it’s a nice way of breaking things down and following SRP
- check the JIRA before betting the business on it
- performance tuning
- it’s brilliant, do it
- go use YourKit
- BirdWatch, a twitter processing thingy
- joy of clojure
- react
- Om
- transducers
- listened to rich’s talk several times
- each time learned something new
- listened to rich’s talk several times
- A farewell note to a programming language (namely scala)
- twitter streaming API
- clojure stream client
- percolation queries for elasticsearch
- sends data only to those clients who are interested via websockets
- logging, pprinting
- bah
- tail -F didn’t work, wanted multiline
- log to core.async channels with dropping-buffers
- can inspect latest state by reading from them
- composition
- micro -> macro:
- immutability, pure fns, idiomatic clojure
- core.async, component
- prod
- but we spend more of our time at the high level – where are our composition tools here?
- micro -> macro:
- wish-list for experience report:
- context
- I want to know why I need this thing
- limitations in the group
- values in the group
- alternatives
- and why didn’t you choose them?
- high-level view of component
- before the details are presented. won’t under the details without this
- interesting details
- context
- goal: archive data from kafka to S3, without any hadoop dependencies
- main app:
- nginx
- ruby webapp
- comparison service + mongo
- collector + MySQL
- all logs sent to kafka
- kafka keeps a sliding window, keeping latest n messages
- behind kafka:
- a loader
- sends data to buckets
- “blueshift” - uswitch tool that sends stuff to aws redshift
- time
- S3
- part of larger piece of work
- streaming (no longer batching)
- secor – has a dependency on hadoop
- 30 minute slots – please contact me!
- first meetup jan 2015
- lenses! <3
update-in- separates the focus of context from the applied function
- the same paths work with
get-inandassoc-in - however,
update-inis specialised to one kind of focus
- “functors are structure-preserving maps between categories”
- Barr and Wells
- functors are functions that lift functions into a context.
- functors compose as functions
- sequence functor
(defn fsequence [f] (partial map f))
- identity functor
(defn fidentity [f] f)
- constant functor
(defn fconst [_] identity)
- “in” functor
(defn fin [k f] (fn [x] (update-in x [k] f)))((fin :x inc) {:x 1 :y 1}) => {:x 2 :y 1}
- Lenses are functions that lift contextualising functions into a context.
update,put,viewcan all be represented by one function.- lenses compose as functions.
- I don’t think my notes can do this talk justice. it’s really good, watch the video :)
- lenses that can have more than one target
- the
updatepart works - but
viewneeds variadic functors (~ Applicatives) and Monoid targets- need a strategy for combining multiple values into one
- problem in clojure: monoid zero value doesn’t know the context it’s in
- so how do you deal with traversals with zero targets?
- inspired by kmett’s haskell lenses
- but uses building blocks from clojure/core
- and recognizes limitations of lack of contextual type information
- a lens in this world is a pair of fns:
focusandfmap - in this world, the traversal laws are more what you’d call guidelines…
- https://github.com/ctford/traversy
- what are your thoughts on teh different worlds of haskell and
clojure?
- the social difference is most clear
- clojure is v friendly
- haskell is friendly too, but a bit of a rough exterior sometimes
- I work at factual
- I do a lot of database stuff
- hugs form a monoid!
- ->>, map, filter, reduce
(->> coll
(map inc)
(filter odd?)
(reduce +))- if you create intermediate sequences for these seqs, you put a lot of pressure on the GC
- catamorphisms!
- reducers eliminate the intermediate seqs
- haskell calls this “stream fusion”
- a specific case of deforestation - a fp optimization technique
- reducers are defined for each coll type
- trancducers have no underlying type (to confuse haskell people)
(mapping inc)– maps over anything- but still sequential
- if f is associative, we can fold in either direction
- you can have an arborescent fold (tree-like)
- can more easily parallelize
- then bring together the resultant results
- this is already in reducers
- because Rich invented everything and all we have to do is
discover it in the core library
- eg PersistentQueue - who knew?
- results preserve order
- order requires coordination
- coordination might be hard
- if
(= (f a b) (f b a))we can do stuff in any order we like - monoids compose wonderfully
- commutative monoids compose even better
- they’re great for distributed systems
- see also CRDTs (which are commutative monoids with idempotence)
- parallel
- unordered
- stream fusion
- collection independent
- can do this in hadoop
- using stateful mappers
- can do reductions in the mappers
- post-reduce phase
- reduce over transient, or native array
- this phase can seal it
- post-combine phase
- similar, after combining
- tesser uses a map!
- rather than multiple arity fns
{:reduce ...
:reduce-identity ...
:combine ...
:combine-identity...
... ...}- unlike transducers, we defer the build phase til the end
- so we can compose inside & out
- allows seq API
- identity fns must be pure
- transducers doesn’t use pure reduction fns
- take sets up an atom for no of elements
- increments each time
- doesn’t have to wrap the accumulator value
- no-go for tesser
- we need all the state in the accumulator
- so we can pass it back and forth over network
- transducers doesn’t use pure reduction fns
- reducers/combiners must be associative
- and also commutative
- algebird isn’t commutative. should we order?
- must short-circuit
(reduced x)values
- stats!
- mean, variance, covariance, correlation matrices