cscorley/proposal.md

Last active September 10, 2015 18:59

the general algo goes like so:

for chunk in corpus:
  e-step
  m-step

gensim hacks in multiple passes:

for pass_ in passes:
  for chunk in corpus:
    e-step
    m-step

what we've been doing (only works for batch):

for pass_ in passes:
  for bound_iter in iters:
    for chunk in corpus:
      e-step
      m-step
  
    break if done
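
a rough python sketch of that third loop, just to make the ordering concrete. `batch_schedule`, `is_done`, and the toy `docs` are made-up names for illustration, not gensim API. the e-step/m-step are left as a comment and only the visiting order is recorded.

```python
from typing import Callable, List, Sequence, Tuple


def batch_schedule(
    corpus: Sequence[Sequence[str]],
    chunksize: int,
    passes: int,
    iters: int,
    is_done: Callable[[int, int], bool] = lambda pass_, bound_iter: False,
) -> List[Tuple[int, int, int]]:
    """Return (pass, bound_iter, chunk_no) in the order the steps would run."""
    # split the corpus into chunks up front; this needs the whole corpus in hand
    chunks = [corpus[i:i + chunksize] for i in range(0, len(corpus), chunksize)]
    order = []
    for pass_ in range(passes):
        for bound_iter in range(iters):
            for chunk_no, _chunk in enumerate(chunks):
                # e-step and m-step on _chunk would go here
                order.append((pass_, bound_iter, chunk_no))
            # "break if done": hypothetical convergence check on the bound
            if is_done(pass_, bound_iter):
                break
    return order


docs = [["a"], ["b"], ["c"], ["d"]]
schedule = batch_schedule(docs, chunksize=2, passes=1, iters=2)
print(schedule)  # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```

every chunk gets revisited on each bound iteration, so the corpus has to be replayable. that is why this ordering only fits batch.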

for online updates, would it make more sense to:

for chunk in corpus:
  for bound_iter in iters:
    e-step
    m-step
  
    break if done

this would give us a single loop that behaves the same as what we do now for batch (via chunksize=len(corpus) and bound_iters > 1, where bound_iters is the length of `iters` above) and that also works for online mode (via chunksize<len(corpus) and bound_iters > 1).
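
a minimal python sketch of the proposed loop, assuming a hypothetical `step` callable that does one e-step plus one m-step on a chunk and returns the bound. `proposed_update`, `ToyModel`, and `tol` are invented for illustration, and "done" is assumed to mean the bound changed by less than `tol`. the toy model just pulls a single number toward the chunk mean, so it is a stand-in and not LDA.

```python
from typing import Callable, Iterable, List, Sequence


def proposed_update(
    chunks: Iterable[Sequence[float]],
    iters: int,
    step: Callable[[Sequence[float]], float],
    tol: float = 1e-3,
) -> List[int]:
    """Run up to `iters` e-step/m-step rounds per chunk; return rounds used."""
    rounds_used = []
    for chunk in chunks:  # each chunk is seen once, so a stream works too
        previous = None
        used = 0
        for bound_iter in range(iters):
            bound = step(chunk)  # hypothetical: e-step + m-step, returns bound
            used += 1
            if previous is not None and abs(bound - previous) < tol:
                break  # "break if done"
            previous = bound
        rounds_used.append(used)
    return rounds_used


class ToyModel:
    """Stand-in model: one parameter pulled toward each chunk's mean."""

    def __init__(self) -> None:
        self.mean = 0.0

    def step(self, chunk: Sequence[float]) -> float:
        target = sum(chunk) / len(chunk)          # "e-step": chunk statistic
        self.mean += 0.5 * (target - self.mean)   # "m-step": move halfway
        return -(target - self.mean) ** 2         # stand-in for the bound


data = [1.0, 1.0, 3.0, 3.0]

# batch: chunksize == len(corpus), so there is exactly one chunk
batch_model = ToyModel()
batch_rounds = proposed_update([data], iters=50, step=batch_model.step)

# online: chunksize < len(corpus), chunks arrive one at a time
online_model = ToyModel()
online_rounds = proposed_update([data[:2], data[2:]], iters=50, step=online_model.step)
```

in the batch call the model settles near the mean of the whole corpus. in the online call it ends up near the last chunk it saw, because this toy has no step-size decay. a real online update would need to weight chunks to avoid that.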

hazelybell commented Sep 10, 2015

I mean the only reason I can see to use example 4 is if the updates are "actually" online. If chunks are being used just due to memory constraints but we're really running in a batch mode, then having chunks on the inside makes more sense and follows the algorithm in the paper more closely.
