Discussion:
Optimizing PMC-based MMD
Chromatic
2008-12-21 22:35:14 UTC
The following code performs far more work than it has to, mostly due to
crossing the C/PCC boundary multiple times, as well as throwing away known
information:

$P0 = box 10
$I0 = cmp $P0, 10

This:

- calls VTABLE_cmp on $P0, reaching VTABLE_cmp in the Default PMC
- calls Parrot_mmd_multi_dispatch_from_c_args
- passes 'cmp', the 'PP->I' signature, and the args as varargs
- builds sig object from varargs
- loops through signature string
- creates a new CallSignature PMC
- creates a new return PMC for all return arguments
- creates a new CPointer for each return argument
- pushes arguments onto the CallSignature PMC
- builds a type tuple for MMD
- loops through signature stored in CallSignature to find MMD-participant arguments
- loops through type signature to set argument types
- checks MMD cache
- uses cached candidate if possible
- finds new candidate
- creates new array PMC for candidate list
- searches CallSignature's namespace for candidates (?)
- searches global MULTI namespace for candidates
- sorts candidate list by MMD type tuple
- loops through candidate list
- calculates distance to each candidate
- loops through each argument (parallel iteration over type tuple and argument list)
- loops over all elements in MRO for each argument type
- calls Parrot_pcc_invoke_sub_from_sig_object
- converts CallSignature string to C string
- creates array PMCs for arguments and results
- counts number of arguments and return values (looping over signature string)
- sets up input parameters in current context
- loops over the C signature
- assigns each parameter to the appropriate context
- invokes the Parrot sub (NCI)
- calls the NCI thunk (pcf_I_JPP)
- calls Parrot_init_arg_nci
- initializes data structures
- calls Parrot_init_arg_indexes_and_sig_pmc
- calls Parrot_init_arg_sig
- calls C function
- calls set_nci_I to store return value
- converts argument to INTVAL if necessary
- stores argument into register
- assigns return values from the context to the CallSignature
- loops over the C signature
- assigns each return value appropriately

The default Integer case ultimately performs a plain C-level comparison
(< and >) of two integers. Most of this codepath is new as of the MMD
branch merge.

Within the cmp op bodies, we *know* the arity and most of the types of MMD-
participant arguments at compile time. We can get the types of PMC
participants within the body of the op itself. Thus we could avoid most of
the argument marshalling and counting and analysis if we had a way to perform
cached MMD lookup without constructing a CallSignature PMC. That would clear
up a third of the work.

Another area for optimization is invoking a Sub from a signature PMC; I
believe we're throwing away and recalculating valuable information, though we
may have to wait for dramatic improvements until we can unify contexts and
CallSignature.

The final opportunity for optimization is making the multis defined in
C-level PMCs use PCC instead of C calling conventions. Corresponding multis
written in PIR already use PCC, and we want to support those, so we should
unify our approach. That would remove the NCI expense here, though that's
probably minor in comparison to the CallSignature PMC expense.

-- c
Patrick R. Michaud
2008-12-24 18:15:57 UTC
Post by Chromatic
>> Within the cmp op bodies, we *know* the arity and most of the types of MMD-
>> participant arguments at compile time. We can get the types of PMC
>> participants within the body of the op itself. Thus we could avoid
>> most of the argument marshalling and counting and analysis if we had a
>> way to perform cached MMD lookup without constructing a CallSignature
>> PMC. That would clear up a third of the work.

> This we should open up to general discussion. The consequence of
> short-cutting like this is that individual PMCs will no longer be able
> to override 'cmp' to do something other than multi-dispatch.

Does "individual PMCs" here mean "PMC instance" or "PMC classes"? I.e.,
are you saying that a specific PMC instance could choose to override
the cmp opcode for that individual PMC? If so, do we have any examples
where this is being done now?

> At the moment, developers still have the option of providing their own
> quick comparison, which gives an even more extreme speedup than this
> shortcut. So, question for language developers and other PMC developers,
> how important is the ability to define a 'cmp' vtable function that's
> called when the 'cmp' opcode is invoked? Or, is defining a 'cmp' multi
> for your PMC type enough?

From a Rakudo perspective, the ability to define custom 'cmp' vtable
functions doesn't appear to be at all important. Comparisons are
almost invariably done by invoking :multi Sub PMCs of one form or
another and letting those handle the MMD dispatch. The opcode form
seems to impose too many limitations to be used directly.

To turn the question around a bit: I can tell that a lot of work
has gone into Parrot to make MMD possible at the vtable level,
but I haven't seen how vtable MMD is at all useful or usable in
languages where operator overloading is possible from the HLL itself.
And most dynamic languages I'm looking at seem to support that
in one form or another.

If someone (Allison) could make an example of how vtable MMD is
intended to improve things -- i.e., taking an HLL language
statement and showing how that translates to PIR that is improved
by vtable MMD, that would be very helpful.

> The calling conventions refactors are non-critical (some will likely
> land after 1.0), because the interface will stay the same, it's only the
> internals that will change.

Oh, I'm very disappointed to hear this. Named and positional argument
handling still has an odd behavior [*], and Perl 6 still really
needs the :lookahead option described earlier in the year. I thought
that was going to be made possible by the refactor, and is partially
why PDS had "calling conventions" scheduled for the December 2008 release.

[*] Currently named parameters are filled from any leftover positionals
in the argument list -- there's no way to declare an argument that
can _only_ be filled by name, short of defining a :slurpy array
that grabs any "extra" positional arguments and then checking
that the slurpy is empty.

And, Jonathan can correct me on this if I'm mistaken, but
I suspect the other big reason that "calling convention refactor" was
scheduled for the December 2008 release is that it's likely a blocker
or important component for the custom dispatcher that Jonathan will
be creating for Rakudo as part of his funded grant. That's due to be
completed by the end of January, IIRC.

Pm
