[perl #36283] [TODO] pasm/pir: forbid assignment syntax for inout params

Discussion:

(too old to reply)

Klaas-Jan Stol via RT

2008-05-08 21:28:47 UTC

It makes sense to allow e.g. C<$P0 = add $P1, $P2> as alternative
syntax for C<add $P0, $P1, $P2>. However, when the first parameter is
"inout" rather than "out", assignment syntax is *not* appropriate.
For example, C<substr $P0, 1, 2, "x"> actually modifies $P0 rather
than just assigning to it, so C<$P0 = substr 1, 2, "x"> is just wrong,
and should be a compile-time error.

In order to make this work, a check should be done for each op in the
pir compiler. Besides the question whether an identifier is an op
(is_op() ), its meta-data must be retrieved. While I'm sure this is
possible, by having the .ops pre-processor store the mode of the op
parameters somewhere, it does involve extra checks, making the pir
compiler slower.
Also, for ops in dyn.op.libs written by third parties, which are only
distributed in binary form (as the .so or .dll file), which does not
include any source, this check cannot be done, EXCEPT this check is ALSO
done during runtime.
In short, this issue is a pain to solve; is it worth fixing?
(probably that's why this ticket is stalled.)

on second thought, I had another idea.

currently the PIR compiler (IMCC) will select the appropriate operation
based on the types of the arguments. So:

print "hi"

will become:

print_sc "hi" # print s(tring) c(onstant).

Currently, you can also write:

$S0 = "hi"
$S0 = print

Maybe it's an idea that inout arguments are marked using the same
strategy; so for instance:

$S0 = print

will be translated to:

print_sco # "print" s(tring) c(onstant) o(utput)

or whatever letter/mark is chosen to mark that a parameter is an output
parameter; surely, Parrot won't have a print_sco instruction, only a
print_sc (or, if you want to make it explicit: print_sci).

This way, there's no additional overhead of checking any meta data, but
we can easily emit an error message when writing stuff like "$S0 = print".

just some thoughts.

kjs

Kjstol

2009-02-12 00:37:40 UTC

Permalink

--00163645819887471d0462aded43
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

I almost couldn't believe how I could have missed that, until I remembered
what was the problem.Thing is, there are some ops (and of course, this
should be possible, as users can load dynops) that have different variants,
with different directions for the first arg. For instance: (can't remember
the real example; just try run make test with the patch)

$P0 = foo "x" # foo_p_sc, variant 1
$P0 = foo # foo_p, variant 2

If you know the short name of an op, you get a semi-random variant of that
op (that is, the byte code). So, you can't know which opinfo you get (for
what variant). You can ONLY know the right variant, if you know the full
signature of the op, something that you don't know until bytecode generation
(happens way later in imcc). I'm probably not explaining this correctly or
very clearly, but I can guarantee it will never work. (unless, of course,
you precalculate the full signature right there, but... let's just stick
with "good luck" as the appropriate message for that :-)

I've tried this approach, it doesn't work, unfortunately.

Also, I would strongly suggest that we disallow use of full-signatured
opnames in PIR. It's not needed, as literally nobody uses ops like
"print_sc". Also, it might be slightly more efficient, as whenever an
is_op() call is done with a non-op, there's 2 searches, 1 for the short
name, 1 for the long name. (and I think PIRC can't even handle it as it
stands now, although implementability shouldn't be a valid reason, I admit
that. I wonder whether IMCC will handle it fully correctly in all cases)

kjs

On Wed, Feb 11, 2009 at 8:54 PM, Andrew Whitworth via RT <

I've attached my first attempt to make this work. I had hoped we could
combine the is_op check in imcc.l with this check in imcc.y to prevent
having to dig into the oplib twice for each op, but that's an
optimization to consider at a later date.
The patch checks the op to ensure that it's first argument is an "OUT"
argument. If it is not, we throw a syntax error exception.
I haven't tested this yet, no flex/bison on this computer. I'll do it at
home tonight. If anybody else can test this, or has an opinion on it, or
whatever let me know.

Andrew Whitworth

2009-02-13 16:09:46 UTC

Permalink

Well, it was a long shot. Nothing is ever as easy as it seems! The
problem with this ticket is that we need all sorts of different
information, none of which is available at the same time. We need to
know the syntax that's used by the op (using '=' or not) which is
known in the parser but not elsewhere. We need to know the types of
parameters, so that we can know the long name of the opcode, so we can
know whether the first argument is OUT or not.

The next best solution I can think of is to attach a flag to the
instruction node in the parse tree that "used_assign_syntax" or
something. Then the bytecode generator can check that flag to make
sure the syntax matches the given opcode. This is bad for a number of
reasons, but there don't seem to be many other ways to get all the
information we need in one place.

--Andrew Whitworth

Post by Kjstol
I almost couldn't believe how I could have missed that, until I remembered
what was the problem.Thing is, there are some ops (and of course, this
should be possible, as users can load dynops) that have different variants,
with different directions for the first arg. For instance: (can't remember
the real example; just try run make test with the patch)
$P0 = foo "x" # foo_p_sc, variant 1
$P0 = foo # foo_p, variant 2
If you know the short name of an op, you get a semi-random variant of that
op (that is, the byte code). So, you can't know which opinfo you get (for
what variant). You can ONLY know the right variant, if you know the full
signature of the op, something that you don't know until bytecode generation
(happens way later in imcc). I'm probably not explaining this correctly or
very clearly, but I can guarantee it will never work. (unless, of course,
you precalculate the full signature right there, but... let's just stick
with "good luck" as the appropriate message for that :-)
I've tried this approach, it doesn't work, unfortunately.
Also, I would strongly suggest that we disallow use of full-signatured
opnames in PIR. It's not needed, as literally nobody uses ops like
"print_sc". Also, it might be slightly more efficient, as whenever an
is_op() call is done with a non-op, there's 2 searches, 1 for the short
name, 1 for the long name. (and I think PIRC can't even handle it as it
stands now, although implementability shouldn't be a valid reason, I admit
that. I wonder whether IMCC will handle it fully correctly in all cases)
kjs
On Wed, Feb 11, 2009 at 8:54 PM, Andrew Whitworth via RT <

Kjstol

2009-02-13 18:01:19 UTC

Permalink

--0016368e2b18b932c50462d09fce
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Post by Andrew Whitworth
Well, it was a long shot. Nothing is ever as easy as it seems! The

Yes, but nice try anyway ;-)

Post by Andrew Whitworth
problem with this ticket is that we need all sorts of different
information, none of which is available at the same time. We need to
know the syntax that's used by the op (using '=' or not) which is
known in the parser but not elsewhere. We need to know the types of
parameters, so that we can know the long name of the opcode, so we can
know whether the first argument is OUT or not.
The next best solution I can think of is to attach a flag to the
instruction node in the parse tree that "used_assign_syntax" or
something. Then the bytecode generator can check that flag to make
sure the syntax matches the given opcode. This is bad for a number of
reasons, but there don't seem to be many other ways to get all the
information we need in one place.

Yes, that solution's also what I thought. However, I don't think it's worth
the trouble; sure it's not *nice* if people write $S0 = print, and we can
certainly document very clearly that this syntax should not be used as it
will be removed, but it's not a complete disaster...

Adding the extra code to achieve this is also prone for new bugs; I'd say,
let's document it and close the ticket. (pirc does catch these cases)

kjs

Post by Andrew Whitworth
--Andrew Whitworth

Post by Kjstol
I almost couldn't believe how I could have missed that, until I

remembered

Post by Kjstol
what was the problem.Thing is, there are some ops (and of course, this
should be possible, as users can load dynops) that have different

variants,

Post by Kjstol
with different directions for the first arg. For instance: (can't

remember

Post by Kjstol
the real example; just try run make test with the patch)
$P0 = foo "x" # foo_p_sc, variant 1
$P0 = foo # foo_p, variant 2
If you know the short name of an op, you get a semi-random variant of

that

Post by Kjstol
op (that is, the byte code). So, you can't know which opinfo you get (for
what variant). You can ONLY know the right variant, if you know the full
signature of the op, something that you don't know until bytecode

generation

Post by Kjstol
(happens way later in imcc). I'm probably not explaining this correctly

Post by Kjstol
very clearly, but I can guarantee it will never work. (unless, of course,
you precalculate the full signature right there, but... let's just stick
with "good luck" as the appropriate message for that :-)
I've tried this approach, it doesn't work, unfortunately.
Also, I would strongly suggest that we disallow use of full-signatured
opnames in PIR. It's not needed, as literally nobody uses ops like
"print_sc". Also, it might be slightly more efficient, as whenever an
is_op() call is done with a non-op, there's 2 searches, 1 for the short
name, 1 for the long name. (and I think PIRC can't even handle it as it
stands now, although implementability shouldn't be a valid reason, I

admit

Post by Kjstol
that. I wonder whether IMCC will handle it fully correctly in all cases)
kjs
On Wed, Feb 11, 2009 at 8:54 PM, Andrew Whitworth via RT <