[perl #61224] .eof returns false if last read call read the last byte of the file, but not beyond
Jonathan Worthington
2008-12-09 14:36:55 UTC
# New Ticket Created by Jonathan Worthington
# Please include the string: [perl #61224]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=61224 >


Hi,

It seems that the .eof() method on file handles can sometimes return
false even if there is nothing more to read. This occurs when you have
read up to the last byte of a file (e.g. when a readline reads up to the
end of a newline, and that newline is the last thing in the file), but
not beyond it (reading beyond the end seems to be what causes the EOF
flag to be set). I'm thinking this is the wrong behaviour?
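
To make the report concrete, here is a minimal Python sketch of the
semantics described above. It is not Parrot's implementation:
FlagReader and demo are hypothetical names, and the only assumption is
that the EOF flag is set solely by a read that comes back empty.

```python
import os
import tempfile


class FlagReader:
    """Hypothetical reader whose EOF flag is set only by an empty read."""

    def __init__(self, fd):
        self.fd = fd
        self.eof = False

    def read(self, n):
        data = os.read(self.fd, n)
        if not data:
            # Only a read that goes past the last byte sets the flag.
            self.eof = True
        return data


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"only line\n")
        os.lseek(fd, 0, os.SEEK_SET)
        reader = FlagReader(fd)
        reader.read(10)            # consumes every byte, newline included
        before = reader.eof        # flag not set, although nothing is left
        leftover = reader.read(1)  # this read goes beyond the end
        after = reader.eof         # flag is set only now
        return before, leftover, after
    finally:
        os.close(fd)
        os.remove(path)
```

Calling demo() shows the flag changing only after the second, empty
read, which is the situation the ticket describes.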

Thoughts and fixes welcome!

Jonathan
Joshua Juran
2008-12-12 08:46:33 UTC
The way to check if the byte after the last requested byte is the end
of the file is to read ahead. Perl (at least 5.10) does this by actually
reading the next character and then putting it back with 'ungetc'. Not
the best solution. Any read ahead can be a bit expensive. I experimented
with a quick patch to use 'peek' in the test for EOF in Parrot, just to
see what would happen... it broke a large quantity of code (probably
because all the code is expecting the old behavior of the EOF test, or
possibly a bug in 'peek').

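
As a rough illustration of a peek-based EOF test, here is a Python
sketch. It uses Python's io.BufferedReader.peek(), which is not
Parrot's 'peek'; at_eof and demo are hypothetical names introduced
only for this example.

```python
import io


def at_eof(stream):
    """Return True if no bytes remain, without moving the stream position."""
    # peek() may return more or fewer bytes than asked for; only whether
    # the result is empty matters here.
    return stream.peek(1) == b""


def demo():
    stream = io.BufferedReader(io.BytesIO(b"only line\n"))
    at_start = at_eof(stream)      # data is still pending
    stream.readline()              # reads up to and including the newline
    pos = stream.tell()
    at_end = at_eof(stream)        # nothing left to read
    return at_start, at_end, stream.tell() == pos
```

The stream position is unchanged by the check, but the peeked bytes
are still pulled from the underlying source into the reader's buffer.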
If I'm understanding correctly, unrequested read-ahead is an error.
The problem is that you can't put the toothpaste back into the tube,
so to speak. Continuing with this analogy, calling ungetc() is like
putting the extra toothpaste in a paper cup for later. If I'm the
next person to brush my teeth, then sure, I'll scrape the toothpaste
out of the cup first before I get more from the tube, but any
hypothetical roommates would regard the cup as personal to me and
ignore it, going straight for the tube.

Toothpaste is fungible, though, and it doesn't matter in what order
it's used, whereas the same is not true of streamed bytes. If
multiple processes are sharing a file descriptor and coordinating
reads from it, any non-undoable read-ahead* will break the protocol.
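
A small Python sketch of why that matters, assuming a buffered reader
layered over a shared descriptor (offset_after_buffered_read is a
hypothetical name): the caller asks for one byte, but the descriptor's
offset, which is what any other reader of that descriptor continues
from, moves much further.

```python
import os
import tempfile


def offset_after_buffered_read():
    """Read one byte through a buffer, then report the descriptor's offset."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)
        # closefd=False: the buffered wrapper does not own the descriptor.
        with open(fd, "rb", closefd=False) as buffered:
            first = buffered.read(1)   # the caller asked for one byte only
            # The offset another user of the descriptor would read from.
            shared_offset = os.lseek(fd, 0, os.SEEK_CUR)
        return first, shared_offset
    finally:
        os.close(fd)
        os.remove(path)
```

With a file this small the buffer typically swallows the whole file,
so a second reader on the same descriptor would find nothing left.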

* Read-ahead could be undone via lseek() for files, or be done
non-destructively with recv( ..., MSG_PEEK ) for sockets.
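
Both options in the footnote can be sketched in Python, whose os.lseek
and socket.recv wrap the C calls named there. The helper names
file_at_eof, socket_peek and demo are hypothetical, and the file
variant assumes a seekable descriptor.

```python
import os
import socket
import tempfile


def file_at_eof(fd):
    """Read one byte ahead on a seekable descriptor, then undo it."""
    probe = os.read(fd, 1)
    if probe:
        os.lseek(fd, -len(probe), os.SEEK_CUR)  # put the offset back
    return probe == b""


def socket_peek(sock, n=1):
    """Look at pending bytes without removing them from the socket."""
    return sock.recv(n, socket.MSG_PEEK)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"ab")
        os.lseek(fd, 0, os.SEEK_SET)
        # The check before the read leaves both bytes available.
        results = [file_at_eof(fd), os.read(fd, 2), file_at_eof(fd)]
    finally:
        os.close(fd)
        os.remove(path)
    left, right = socket.socketpair()
    try:
        left.sendall(b"hi")
        # The peeked byte is still delivered by the following recv().
        results += [socket_peek(right), right.recv(2)]
    finally:
        left.close()
        right.close()
    return results
```

Note that in file_at_eof the read and the seek are two separate calls,
so the pair is not atomic with respect to other users of the same
descriptor.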

Josh
