From blerner@cs.washington.edu Thu Dec  6 11:51:01 2007
Date: Thu, 06 Dec 2007 11:50:58 -0800
From: Ben Lerner <blerner@cs.washington.edu>
To: Brad Chamberlain <bradc@cray.com>
Subject: Re: Thoughts on using Chapel

And some replies to those.  Most of my "complaints" can be waved away as
"newbie alerts", so take them with a large grain of salt...

~ben
> > * I know we discussed this in class, but while writing my toy program, I
> > could not keep straight the difference between calling functions with []
> > versus ().  Most of the time, it didn't matter -- most of my functions
> > were one-argument anyway, but I was confused by the documentation about
> > string.substring:  you show both foo.substring(i) and foo.substring[i],
> > but only foo.substring[i..j], so I wasn't ever sure whether ranges were
> > treated as single variables or whether they were zipped over...
> 
> Here's how I remember the zipper vs. tensor semantics -- it's based
> strongly on how the choice of syntax fell out:  Our belief was that for a
> function call with two promoted arguments -- say, exp(A, B) -- the natural
> interpretation seemed to be to zipper the arguments and create a result of
> the same shape (e.g., exp(A(1), B(1)), exp(A(2), B(2)), ...).  Meanwhile,
> array slicing notation, which is a tensor product, more traditionally uses
> square brackets -- A[2:n-1, 2:n-1] in Fortran, for example.
> 
> Stylistically, when I'm slicing an array, I almost always use square
> brackets, even if its a 1D slice like A[i, 1..n].  This is equivalent to
> A(i, 1..n) of course, but I tend to use the square brackets because it
> looks more like a slice to me.
> 
> I suspect that the author of the substring code (which may have been me,
> though I don't think it was), was falling into a similar mode, thinking of
> a substring as being a slice of a string.  At some point, I think we even
> supported substring slicing via the indexing function, so this may have
> been a leftover from that conversion.  All that said, I suspect that I'd
> probably write it using parenthesis today now that it looks more like a
> method and I don't think of it as being a promotion.  Do you remember
> where you saw that?  Language spec or example codes?
> 
I saw these three snippets in the documentation.  Part of my confusion I
think was my C-programmer's intuition that a string is a char array, and
therefore brackets are appropriate.  But it's a function call, and I don't
think I want tensor semantics, so parentheses.  Wait, but there's only one
argument, so they're equivalent.  And so on, round and round.  Your example
of A[i, 1..n] is particularly tricky -- when i is holding a range, this has
very different semantics between [] and (), even though it looks like
there's only one range present to slice over.  If you went with your current
suspicion and wrote it with parens today, you'd get, what, a diagonal slice
of characters through an array of strings?  I'm not saying your design is
wrong, just that it takes getting used to, and can hide some unexpected
complexities.
> 
> > * The semantics of begin, cobegin, forall and coforall need some
> > clarification.  For instance, the spec should say explicitly something
> > like, "Coforall <cond> { <body> } is sugar for forall <cond> { begin {
> > <body> } }", rather than just the English prose it currently has.
> 
> Agreed.  This makes sense to me.  I'd probably describe it less in terms
> of sugar and more interms of "semantically equivalent to", since that
> suggests that we might implement it intelligently (using a tree of spawns
> rather than a linear sequence of them, e.g.).  Sugar implies strongly to
> me that it is probably just implemented by making the conversion you
> specified.
As I understood the semantics of forall, you *might* implement the
iterations sequentially, or in parallel; I wasn't sure which behavior
happened when or why.  But if you were to implement forall in parallel, then
surely the same tree-of-spawns would be used for coforall.  So I didn't see
the sugar as being restrictive of your implementation strategies.  That
said, sure -- saying they're semantically equivalent is fine for programmer
understanding.
> 
> Note that the sugar you give above is incomplete because it doesn't
> capture the join at the bottom of the forall loop.  The rewrite of this
> would probably involve capturing some sort of termination condition into a
> sync variable and waiting after the loop for all the sync variables to be
> filled.  The tedium of writing this common idiom all of the time was part
> of the motivation for introducing the coforall (along with the observation
> that we could probably do the spawn and joins more efficiently for a large
> group of tasks than a beginning or lazy user would bother to.
True, I forgot that point :).  But the larger point remains, if you can
"compile away" coforall loops into even an inefficient representation using
foralls and joins and whatever, then you've shrunk the core language.  I
think, in fact, that the only two concepts you need in this space are
for-loops and begin; cobegin, forall, and coforall can all be implemented in
terms of those two.
> 
> 
> > * I ran across a bug in the implementation of coforall loops where if
> > your iteration variable has structure (a tuple or something), the
> > current compiler doesn't privatize the destructured values properly.
> > Combining the first and third points above, why can't you say "'for
> > <tuple> in <2+D domain> { <body> }' is sugar for 'for temp in <2+D
> > domain> { const <tuple> = temp; <body> }'. In all cases, temp is locally
> > scoped to the current iteration of the loop." Then your compiler bug
> > goes away, because your semantic support for loops is simplified.
> 
> As I understand it, this rewrite is essentially what we do, but a
> "privatize me" flag on the original variables wasn't getting propagated to
> the temporary variables appropriately, causing the temps to be allocated
> on the stack rather than the heap in the generated code, which is what
> resulted in having multiple threads share the values.  So yes, we think in
> terms of these rewrites, but that doesn't imply that the compiler will be
> bug-free.  Steve's already working on the fix for this.
Ok, I was thinking of fixing this as sugar in the parser, because we know
that rewriting this explicitly fixed the bug.  Either way works.
> 
> > * Idiomatically, I wanted the ability to construct lists of values and
> > fold over them.  The best I came up with was to define sparse domains
> > and add elements to them, then iterate over indices in the sparse
> > domain.  That seems rather "meta" -- I want a list of those values, not
> > have those values be indices to some other array.
> 
> I'd suggest thinking of the domains truly as 1st class index sets, and not
> simply a means of creating an array.  For example, I use associative
> domains to implement sets of values all the time, and never allocate
> arrays over them.  Along those same lines, I think sparse arithmetic
> domains are a very reasonable way to store a sorted list of elements from
> a bounded set of indices, even if they're never used to implement an
> array.
> 
> All that said, it is our intention to support a standard list class in the
> standard libraries (and in fact, I think we have a first draft already
> there -- I'm not sure if it's documented).  We used to have a sequence
> type in the language, but it was one thing that's been pruned out of the
> language and put into the library with the goal of making the language
> smaller.
> 
> 
Makes sense.  The thing I felt awkward about was wanting to write ([i in D]
i) all the time to grab the indices out of the domain.  It feels like a
weird typecast : arith.domain.indextype -> int.  I was trying to maintain in
my head that "yeah, they domain uses ints for indices, but treat them as
opaque, or else there's a slippery slope to thinking ints are indices when
they're not."  Maybe that's just me being new to Chapel.

> > * Also, once I'd defined a sparse domain, and an array using it, every
> > time I added an element to the domain that array must be reallocated,
> > no?  I'd like a way to say, "Here's a dense domain, and a sparse domain
> > over it, and an array over *that*; now hang on a sec while I initialize
> > the sparse domain... ok, *now* go ahead and allocate that array once and
> > for all, thanks."
> 
> Does anything prevent you in this case from moving the declaration of the
> array after the sparse domain's initialization?  If not, I'd suggest doing
> that -- declare the dense domain, the sparse domain, initialize the sparse
> domain, now declare the sparse array.  That way, the allocation happens in
> one fell swoop as you'd like.
I wanted all the declarations to be global, so I wouldn't have to pass them
around in my code. 
> 
> In any case, I believe an important compiler optimization for sparse
> domains/arrays is going to be to batch up reallocations like this on the
> user's behalf (and this is the sort of thing that fits into that second
> "optional interface" category of a distribution -- a sparse implementation
> that wants to support this kind of optimized addition would support it,
> but there would be no requirement to support modification of more than one
> index at a time.
> 
> Even with such an optimization, as a distrustful user, I'd still move the
> array declaration after the sparse domain's initialization, if I had the
> power to do so.
> 
> Or, can the sparse domain be initialized on its declaration line rather
> than after its declaration (using an iterator, e.g.)?  This would be the
> ideal way to do it since it would allow the sparse domain to be declared
> const, which should result in other optimization and readability benefits.
Not in my case -- the sparse domain StartIndices, for instance, is the
location of all "<" characters in the input, and the positions just after
all ">" characters.  That said, they are constant, right after I figure that
information out.   Here's a wacky thought: would the following work?
const AllIndices : [1..length(input)]
var StartIndices : single sparse subdomain(AllIndices)  <<== note the
single!
var EndIndices : single sparse subdomain(AllIndices)  <<== note the single!
var parsedElements : [StartIndices, EndIndices] single XmlElement  <<==
hierarchical domains, not there yet but soon :)

/* expectation: parsedElements has not been allocated yet, since it's
waiting for StartIndices and EndIndices to be assigned */

StartIndices = ComputeStartIndices(input); /* still waiting... */
EndIndices = ComputeEndIndices(input);
/* Ok, go!  Both sets of indices are available, so parsedElements can be
allocated.  Moreover, unless it's reassigned, the domain of parsedElements
must be constant. */

> > * There doesn't seem to be an easy way to re-order iterators, or if
> > there is I didn't understand it clearly.  For instance, given the
> > iteration [1..2, 1..4], which yields the pairs (1,1), (1,2), (1,3),
> > (1,4), (2,1), (2,2), (2,3), (2,4) in that order, I'd like a simple way
> > to "take the transpose" of that and get column major order.  If I'd had
> > that, then I wouldn't have been bitten by the coforall bug above -- I
> > could have used a forall loop instead. :)  I suspect I could work around
> > this by having done forall (y,x) in [1..4, 1..2], which swaps the
> > variables and also the iterator, but that feels not-clean, somehow.
> 
> If you're doing a forall loop then by definition you're abdicating control
> over the iteration order of the loop (and since our forall loops are
> currently sequential, you're going to get our default iteration order of
> RMO in practice).  You can regain this control by authoring the
> distribution for the domain over which you're iterating, in which case
> you're specifying the implementation of forall loops.
> 
> A lighter-weight way to do this would be to simply write your own
> standalone iterator:
> 
>     def CMO(D: domain(2)) {
>       for i in D.dim(2) {
>         for j in D.dim(2) {
>           yield (i,j);
>         }
>       }
>     }
> 
> and invoke it:
> 
>     for (x,y) in CMO([1..2, 1..4]) {
>       ...
>     }
> 
> This looks simple because it's all sequential -- making it parallel would
> be harder, but given that you have some pretty heavy constraints on the
> order of the parallelism, that seems appropriate.  Implementing parallel
> wavefront style loops has similar issues -- one technique is to use an
> array of sync variables to control when each iteration can legally fire.
Well, yeah.  I was kinda hazy on the iterators section (I think I missed
class that day).  But essentially what I did was coforall create every
thread at once, then have my array of single variables to let the iterations
proceed in wavefronts as you described.  (I don't need sync vars because
each subproblem should be solved uniquely, and solved exactly once, so
single vars give me some sanity checks on my algorithm, and should give the
compiler some constant optimization opportunities.)  When the coforall loop
was broken, I thought of trying to explicitly code this other ordering on
the parallelism via a forall loop.
> 
> 
> 
> > * The syntax isn't well specified in the spec, I think due to its change
> > over time.
> 
> Hmm, I don't think it's changed that much since the spec was written.
> Could you clarify for me in what way it seems poorly specified and how it
> could be improved?  Are you referring to the syntax "diagrams" or the
> supporting text that describes them?
There are some BNF-style descriptions in the spec, but I'm not sure they
match reality explicitly.  (I think the syntax for class declarations and
deriving from base classes is missing a colon separating class D : B, for
instance.  Also evidently semicolons are prohibited following class and
function body definitions, but are required after variable definitions in
class bodies.  I didn't gather that from the spec; just from the example
programs.  I was also confused by the if-then syntax, see below.)  It's not
so much that the spec is blatantly poorly specified, but when there are
mismatches between the spec and reality, I don't know which is newer or more
correct.  Having an appendix at the end containing the Bison grammar you
actually use in the compiler (dropping the semantic actions or other cruft)
should be an automated way to keep the spec in sync with the code.

> 
> > But it wasn't clear why you had an optional "then" keyword, for
> > instance; pick one syntax (e.g. C's, since you're very close to it
> > already) and stick to it.
> 
> For single-statement conditionals, there's a syntactic ambiguity due to
> Chapel's use of square brackets for both function arguments and forall
> loops.  For example:
> 
>     if A [D] write("hi");
> 
> Could be:
> 
>     if A[D] {
>       write("hi");
>     }
> 
> or:
> 
>     if A {
>       [D] write("hi");
>     }
> 
> So optional keywords like "then" (and its partners in crime) are used to
> support the single statement cases of control flow:
> 
>     if A[D] then write("hi");
> 
>     if A then [D] write("hi");
> 
> Since the single statement could itself be a compound statement, this
> permits:
> 
>     if A then { [D] write("hi"); write("there"); }
> 
> but of course, the then isn't strictly required in this case.  The result
> is if you use curly brackets religiously, you'll end up being very C-like.
> If you like to avoid curly brackets in single-statement cases, you need to
> use the keywords.
> 
> This rationale should be captured in the spec if it isn't already.
> 
> If you have thoughts about getting rid of the then altogether, that would
> of course be of interest.
I see.  Well, I personally would just consistently use brackets until I was
more comfortable with the language, but another possibility is one taken by
Perl and I think Ruby, where you can say
<stmt> if <expr>;
instead of the typical
if <expr> then <stmt>

The claimed advantage to this is that for one-liner statements like this,
the point is the <stmt>, and not its guarding <expr>, so get the guard out
of the way and read the statement first.

> > * There's a heavy learning curve, I think, which is aggravated by the
> > lack of examples.  A lot of the code I wrote was written via
> > copy-paste-modify, and I wasn't always sure what syntax was critical and
> > what was style and what was optional.
> 
> Are you talking about examples in the spec (which I'd completely agree
> with), or examples in the release itself (which I could also agree with,
> but would need some suggestions about what examples you'd like to see
> included, or how the existing examples could be made more instructive --
> we're happy to add more, but don't want to overwhelm anyone by providing
> more than they can wrap their heads around).
> 
Mostly in the spec.  I found the example .chpl files very helpful.  I didn't
get far enough into the language with this one foray to need anything much
beyond genericStack and beer, with a few others for quick checks.  But in
the spec, given that I wasn't sure about syntax in some cases, and semantics
in some others (the loops, mentioned above), it would have helped to see a
few example statements or expressions that showed "ok, here's what a derived
class looks like; here's the options for if statements, etc."
