Monday
05Oct2009
OMGWTFBBQ: if CONC trees replace CONS lists, then mapreduce can be parallel
20091005 at 10:20 Alternate title: Mary had a little λ (kidding!)
Guy Steele: Organizing Functional Code for Parallel Execution
CONS (or “Lisp”) lists are inherently linear and sequential. One of CAR, CDR is constant-time, and the other is linear (on the length of the list). For a given list, there is only one CONS representation.
CONC primitives are null <>, singleton <42>, and “concatenation” a || b
CONC trees can represent CONS style lists, or balanced binary trees, or sparse trees, or other structures.
- CAR CDR and CONS are compared to CONC primitives… and rather than use accessors left, right Guy prefers a SPLIT functional accessor which calls the second argument (a function) with the left and right parts as parameters.
- Guy says this somewhere later on, and it’s easy to miss:
“We are going to use CONC trees to optimize delay”
- An important goal is a corollary to the equivalence
(cons (car xs (cdr xs)) = xs
- It’s not this:
(conc (left xs) (right xs)) = xs
- but this functional gem:
(split xs (λ (ys zs) (conc ys zs))) = xs
- Think of the lambda λ as a continuation, and a way to keep the left and right together. In practice, split does not have to behave uniformly, but for the sake of this talk, consider split to be purely functional with no side effects. You will notice we now have a corollary to the most low-level CONS equivalence, but one which is expressed functionally, recursively, by “binary decomposition and reassembly”
Having taken care of the basics, Guy breaks down the implementations of MAP, REDUCE, MAPREDUCE, LENGTH, FILTER, QUICKSORT and MERGESORT for both CONS lists and the new CONC trees.
if CONC trees replace CONS lists, then MAPREDUCE can be parallel - Watch more Videos at Vodpod.
Heres the recursive mapreduce implementation with the Opportunity for Parallelism.
(define (mapreduce f g id xs) ; Logarithmic in (length xs)??
(cond ((null? xs) id)
((singleton? xs) (f (item xs)))
(else (split xs (λ (ys zs)
(g (mapreduce f g id ys) ; OMG Opportunity for
(mapreduce f g id zs))))))) ; WTF Parallelism
; BBQ (mmmm, barbecue… <gurgle>)
(cond ((null? xs) id)
((singleton? xs) (f (item xs)))
(else (split xs (λ (ys zs)
(g (mapreduce f g id ys) ; OMG Opportunity for
(mapreduce f g id zs))))))) ; WTF Parallelism
; BBQ (mmmm, barbecue… <gurgle>)
…. Almost done! ….
RANT ON
If you are on Windows searching the Unicode characters looking for “LAMBDA”, you should STOP IMMEDIATELY. Search for “LAMDA” (no B). Microsoft doesn’t know how to spell “LAMBDA” much less describe its purpose in binding new variables dynamically, so the lambda (and its children) have a protected scope that hides variables in outer scopes, until the lambda completes and its scope goes away. Anyway HTML for the λ is ” λ ” — that’s ampersand lambda semicolon (no spaces)
RANT OFF
in
code,
video | tagged
analysis,
concurrency,
design,
language,
parallel,
performance,
programming,
scalability
code,
video | tagged
analysis,
concurrency,
design,
language,
parallel,
performance,
programming,
scalability 
Reader Comments (2)
Ahh, brings back memories of the FFT recursive algorithm by breaking the problem up into even and odd portions and iterating until simple multiple/add operation is done.
Just out of curiousity, is this post relate to opening list operations to parallel implementations? Any luck getting this speaking to CUDA (nvidias parallel sorta c language). They have a predefined set of hash like tables (grids, offsets) but my guess is optimized compliled local massive parallelized code for a video card could make quick work of these type of problems. But the memory space is SMALL.
Mark - great question!
Did you watch the video to the end? He talks briefly about his research project - "Fortress" - a high-level language, compiler and interpreter. His focus seems targeted at very high end research though... And the market would be top research institutions and universities. Sun's elite server clients.
What you are talking about though does seem like it could easily "fall out" ... in the question part at the end they talk about how to automatically assess how to execute sequentially once you've exhausted your parallel resources.
This seems tractable for the problem you mention, but I do believe it would have to be a hardware solution -- given this language is a "research project" and is unlikely to run anything in "very small memory".
The hardware solution could be general though - a tiny lisp or other interpreter/compiler built with CONC primitives.
Check out the questions which begin around the 58:00 mark. He hopes to "keep the trees fairly well balanced" which makes reasoning about certain operations possible using theorems yet to be developed.