Discussion:
parrot smoke-me/illegal-escape-gh1103
Reini Urban
2014-10-15 13:58:53 UTC
parrot wants to change how it handles illegal escape sequences, starting with 6.9.0.
See https://github.com/parrot/parrot/issues/1103

parrot and rakudo smoked fine with this branch,
and it helps find imcc parser bugs, especially .lex quoting issues,
tracked in GH #1095, which was found by this perl6 ticket:
https://rt.perl.org/Public/Bug/Display.html?id=116643

Previously:
Silently ignore illegal escapes for a-zA-Z
and change the string from "foo\o" to "fooo"

Now:
Throw "Illegal escape sequence \o in foo\o"

The C standard requires such "invalid" escape sequences to be diagnosed
(i.e., the compiler must print an error message). Parrot behaved strangely
here, and as a result imcc parser quoting bugs were never fixed.

Parrot_str_unescape()
Unescapes the specified C string. These sequences are covered:

\xhh 1..2 hex digits
\ooo 1..3 oct digits
\cX control char X
\x{h..h} 1..8 hex digits
\uhhhh 4 hex digits
\Uhhhhhhhh 8 hex digits
\a, \b, \t, \n, \v, \f, \r, \e

These sequences are not escaped: C<\\ \" \' \?>

All other escape sequences within C<[a-zA-Z]> are illegal.
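
To make the rules above concrete, here is a rough sketch in Python rather than Parrot's C. It is not the Parrot_str_unescape() implementation. The names `unescape` and `IllegalEscape` are made up for illustration. It assumes that `\\`, `\"`, `\'` and `\?` yield the character after the backslash (one reading of "not escaped"), that `\cX` maps X to its code XOR 64, and that a backslash before any other non-letter passes that character through.

```python
import re

# Single-letter escapes from the list above; \e is ESC (0x1B).
SIMPLE = {"a": "\a", "b": "\b", "t": "\t", "n": "\n",
          "v": "\v", "f": "\f", "r": "\r", "e": "\x1b"}


class IllegalEscape(ValueError):
    """Hypothetical error type for an unrecognized letter escape."""


# Alternatives are tried in order, so \x{...} is checked before \xhh.
_ESCAPE = re.compile(
    r"\\(?:"
    r"x\{(?P<brace>[0-9A-Fa-f]{1,8})\}"   # \x{h..h}   1..8 hex digits
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"        # \xhh       1..2 hex digits
    r"|u(?P<u4>[0-9A-Fa-f]{4})"           # \uhhhh     4 hex digits
    r"|U(?P<u8>[0-9A-Fa-f]{8})"           # \Uhhhhhhhh 8 hex digits
    r"|(?P<oct>[0-7]{1,3})"               # \ooo       1..3 oct digits
    r"|c(?P<ctrl>.)"                      # \cX        control char X
    r"|(?P<other>.)"                      # anything else after a backslash
    r")",
    re.DOTALL,
)


def unescape(s):
    """Unescape s, raising IllegalEscape for unknown [a-zA-Z] escapes."""
    def repl(m):
        for name in ("brace", "hex", "u4", "u8"):
            if m.group(name) is not None:
                # chr() itself raises ValueError above U+10FFFF.
                return chr(int(m.group(name), 16))
        if m.group("oct") is not None:
            return chr(int(m.group("oct"), 8))
        if m.group("ctrl") is not None:
            # Assumption: control char is the upper-cased letter XOR 64.
            return chr(ord(m.group("ctrl").upper()) ^ 64)
        ch = m.group("other")
        if ch in SIMPLE:
            return SIMPLE[ch]
        if ch.isascii() and ch.isalpha():
            raise IllegalEscape(f"Illegal escape sequence \\{ch} in {s}")
        # Assumption: \\ \" \' \? and other non-letters yield the char itself.
        return ch

    return _ESCAPE.sub(repl, s)
```

With this sketch, `unescape(r"foo\\o")` gives the five characters `foo\o`, while `unescape(r"foo\o")` raises with the message quoted under "Now:".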

This printed ok 4 instead of ok 3:

.sub 'main' :main
$S0 = 'bar\o'
$P1 = box 'ok 1'
set_global $S0, $P1
$P2 = get_global 'bar\o'
say $P2

$S1 = "foo\\o"
$P1 = box 'ok 2'
set_global "foo\\o", $P1 # ok, parsed as "foo\\o"
$P2 = get_global "foo\\o"
say $P2

$S2 = "foo\o"
$P1 = box 'ok 3'
$S3 = "fooo"
$P2 = box 'ok 4'
set_global "foo\o", $P1 # wrong, parsed as "fooo"
set_global "fooo", $P2
$P3 = get_global "foo\o"
say $P3

$P3 = get_global "fooo"
say $P3
.end

but the real problem is with double-quoted .lex names.
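
The name collision in the PIR example can be mimicked outside Parrot. The following is only a toy model in Python: a dict stands in for the global namespace, and `unescape_letters` is a hypothetical helper that handles nothing but unknown letter escapes, either dropping the backslash (the old behaviour described above) or raising (the new one).

```python
import re

# Letters that start a recognized escape sequence and are left untouched here.
KNOWN = set("abtnvfrexcuU")


def unescape_letters(s, strict):
    """Toy model: only deals with unknown [a-zA-Z] escapes in s."""
    def repl(m):
        ch = m.group(1)
        if ch.isascii() and ch.isalpha() and ch not in KNOWN:
            if strict:
                raise ValueError(f"Illegal escape sequence \\{ch} in {s}")
            return ch          # old behaviour: silently drop the backslash
        return m.group(0)      # leave \\ and recognized escapes as they are

    # Matching backslash plus one char consumes "\\" as a pair, so the
    # second backslash of "foo\\o" is never treated as starting "\o".
    return re.sub(r"\\(.)", repl, s, flags=re.DOTALL)


# A dict standing in for the global namespace of the PIR example.
table = {}
table[unescape_letters(r"foo\o", strict=False)] = "ok 3"
table["fooo"] = "ok 4"     # same key, so this overwrites "ok 3"
result = table[unescape_letters(r"foo\o", strict=False)]   # "ok 4"
```

In strict mode the first assignment would fail immediately, which is what surfaces the quoting bug instead of hiding it.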

What do you think? Should I allow illegal escape chars for one
deprecation cycle and just warn once?
Or change it right away? I'd go for right away.
--
Reini Urban
http://cpanel.net/ http://www.perl-compiler.org/
Tobias Leich
2014-10-15 15:15:58 UTC
I think I can (sort of) speak for rakudo and say that we would rather
have the correct behaviour in the upcoming release than a
deprecation cycle.

Because, if we spot a problem in nqp or rakudo with 6.9.0, we can always
decide to delay upgrading the parrot version for one month.

Cheers, FROGGS