April 23, 2007

Real Tail'ing in Python

or, finding the last few lines of a file.

OK, so the last solution was not perfect: it returned only the last line of a file. What about returning, say, 10 or maybe more lines? Here is the modified Tail function to do that:
def Tail(filepath, nol=10, read_size=1024):
    """
    This function returns the last nol lines of a file as one string.
    Args:
        filepath: path to file
        nol: number of lines to return (optional, default=10)
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    # Binary mode: Python 3 text files cannot seek backwards from the end.
    # Newlines are not translated, so '\r\n' endings keep their '\r'.
    with open(filepath, 'rb') as f:  # closes the file on every return path
        offset = read_size
        f.seek(0, 2)
        file_size = f.tell()
        while 1:
            if file_size < offset:
                offset = file_size
            f.seek(-1*offset, 2)
            read_str = f.read(offset)
            # Remove newline at the end (endswith is safe on an empty file)
            if read_str.endswith(b'\n'):
                read_str = read_str[:-1]
            lines = read_str.split(b'\n')
            # More than nol pieces: the first piece may be a partial line,
            # but the last nol are complete
            if len(lines) > nol:
                return b"\n".join(lines[-nol:]).decode('utf-8')  # assumes UTF-8 text
            if offset == file_size:  # Reached the beginning
                return read_str.decode('utf-8')
            offset += read_size

You can call it in your program like this:
    Tail('/var/log/syslog')
or, for the last 100 lines,
    Tail('/etc/httpd/logs/access.log', 100)
Useful, isn't it?
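
One thing to keep in mind: Tail re-reads the whole tail of the file on every pass through its loop, because the offset keeps growing. The following is only a minimal sketch of a variant that reads each block once while walking backwards from the end. The name tail_lines, the block_size parameter, the list return value, and the UTF-8 decoding are all assumptions made for illustration; none of them come from the function above.

```python
import os
import tempfile


def tail_lines(path, n=10, block_size=1024):
    """Return the last n lines of path as a list of str (hypothetical helper)."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        # Prepend blocks until data holds more than n newlines (or the whole
        # file), which guarantees that the last n lines are complete.
        while pos > 0 and data.count(b'\n') <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Split on bytes first, then decode only complete lines.
    # Assumption: the file is UTF-8 text.
    return [line.decode('utf-8') for line in data.splitlines()[-n:]]


# Demo on a throwaway file; the tiny block_size forces many backward reads.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(''.join('line %d\n' % i for i in range(50)).encode('utf-8'))
demo_path = tmp.name
last_three = tail_lines(demo_path, 3, block_size=7)
everything = tail_lines(demo_path, 100)
os.remove(demo_path)
print(last_three)  # ['line 47', 'line 48', 'line 49']
```

For log files with short lines the difference is unlikely to matter, since the requested lines usually fit in the first block or two.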

Cheers,
-Manu

April 2, 2007

Tail'ing in Python

or, finding the last line of a huge file.

How do you find the last line of a 2 GB log file from within your program? You don't want to go through the whole file, right? Right. What you want to do is start reading from the end until you find a newline character. Here is how I did it in Python:


def Tail(filepath, read_size=1024):
    """
    This function returns the last line of a file.
    Args:
        filepath: path to file
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    # Binary mode: Python 3 text files cannot seek backwards from the end.
    # Newlines are not translated, so '\r\n' endings keep their '\r'.
    with open(filepath, 'rb') as f:  # closes the file on every return path
        offset = read_size
        f.seek(0, 2)
        file_size = f.tell()
        while 1:
            if file_size < offset:
                offset = file_size
            f.seek(-1*offset, 2)
            read_str = f.read(offset)
            # Remove newline at the end (endswith is safe on an empty file)
            if read_str.endswith(b'\n'):
                read_str = read_str[:-1]
            lines = read_str.split(b'\n')
            if len(lines) > 1:  # Got a line
                return lines[-1].decode('utf-8')  # assumes UTF-8 text
            if offset == file_size:  # Reached the beginning
                return read_str.decode('utf-8')
            offset += read_size


(There will hardly be any reason to change read_size. I used it mainly for testing.)
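
To show why a small read_size is handy for testing, here is a minimal sketch that runs a last-line lookup with every chunk size from 1 to 64 bytes and checks that the answer never changes. Small chunks force the loop to cross line boundaries many times. The helper last_line is a hypothetical stand-in written for this example, not the Tail function itself, and it returns bytes to avoid assuming an encoding.

```python
import os
import tempfile


def last_line(path, read_size=1024):
    """Return the last line of path as bytes (hypothetical stand-in for Tail)."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        offset = 0
        while offset < file_size:
            # Grow the window by one chunk, but never past the file start.
            offset = min(offset + read_size, file_size)
            f.seek(-offset, os.SEEK_END)
            chunk = f.read(offset)
            if chunk.endswith(b'\n'):
                chunk = chunk[:-1]  # ignore the newline that ends the file
            # A remaining newline means the last line is complete.
            if b'\n' in chunk or offset == file_size:
                return chunk.split(b'\n')[-1]
        return b''  # empty file


with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'first\nsecond line\nthird and last\n')
results = set(last_line(tmp.name, size) for size in range(1, 65))
os.remove(tmp.name)
print(results)  # {b'third and last'}
```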

Tail works much like Unix 'tail -1'. It can easily be modified to return the last 10 or 'n' lines, I believe. But I haven't had the time or a reason to try that yet :)
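
For comparison, when a file is small enough to scan from the start, the standard library's collections.deque with a maxlen gives the last 'n' lines in a couple of lines of code. This is a different technique from the seek-based one in this post (it reads the whole file), so treat it as a cross-check for small inputs rather than a replacement. The name tail_by_scanning is made up for this sketch.

```python
import os
import tempfile
from collections import deque


def tail_by_scanning(path, n=10):
    """Return the last n lines by reading the whole file (hypothetical helper)."""
    with open(path, 'r') as f:
        # A deque with maxlen drops the oldest line each time a new one arrives,
        # so only n lines are held in memory at once.
        return [line.rstrip('\n') for line in deque(f, maxlen=n)]


with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('a\nb\nc\nd\ne\n')
last_two = tail_by_scanning(tmp.name, 2)
os.remove(tmp.name)
print(last_two)  # ['d', 'e']
```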

Remember, it's meant to be called from within Python programs, not from the command line (Unix tail does that better ;-)).

I have done quite a bit of testing, so it should be safe to use.

cheers,
Manu