April 2, 2007

Tail'ing in Python

or, finding the last line of a huge file

How do you find the last line of a 2 GB log file from within your program? You don't want to go through the whole file, right? Right. What you want to do is start reading from the end, in chunks, until you find a newline character. Here is how I did it in Python:


import os

def Tail(filepath, read_size=1024):
    """
    This function returns the last line of a file.
    Args:
        filepath: path to file
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    # Binary mode: Python 3 text-mode files cannot seek relative to the end
    with open(filepath, 'rb') as f:  # 'with' closes the file on every return
        offset = read_size
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        while True:
            if file_size < offset:
                offset = file_size
            f.seek(-offset, os.SEEK_END)
            read_str = f.read(offset)
            # Remove newline at the end ('\n' or '\r\n')
            if read_str.endswith(b'\r\n'):
                read_str = read_str[:-2]
            elif read_str.endswith(b'\n'):
                read_str = read_str[:-1]
            lines = read_str.split(b'\n')
            if len(lines) > 1:  # Got a line
                return lines[-1].decode()  # assumes UTF-8 text
            if offset == file_size:  # Reached the beginning
                return read_str.decode()
            offset += read_size


(There will hardly be any reason to change read_size. I used it mainly for testing.)

It works much like Unix 'tail -1'. It can easily be modified to return the last 10 or 'n' lines, I believe. But I haven't had the time or a reason to try that yet :)
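For what it's worth, here is a rough sketch of how the 'n' lines variant might look. It is not something I have run against real logs. The name tail_lines and its n parameter are made up for this illustration, it assumes the file is UTF-8 text, and it re-reads a growing window from the end of the file instead of stitching chunks together, which keeps the code short at the cost of some repeated reading.

```python
import os
import tempfile


def tail_lines(filepath, n=10, read_size=1024):
    """Return the last n lines of a file as a list of strings.

    Hypothetical helper for illustration; assumes UTF-8 encoded text.
    """
    if n <= 0:
        return []
    with open(filepath, 'rb') as f:  # binary mode allows seeking from the end
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        offset = 0
        data = b''
        # Grow the window backwards until it holds more than n newlines
        # (so the first of the n lines is complete) or covers the whole file.
        while offset < file_size and data.count(b'\n') <= n:
            offset = min(offset + read_size, file_size)
            f.seek(-offset, os.SEEK_END)
            data = f.read(offset)
    # bytes.splitlines() splits on '\n', '\r\n' and '\r', and does not
    # produce an extra empty item for a trailing newline.
    return [line.decode('utf-8') for line in data.splitlines()[-n:]]


if __name__ == '__main__':
    # Small demo: a tiny read_size forces several passes through the loop.
    with tempfile.NamedTemporaryFile('wb', suffix='.log', delete=False) as tmp:
        tmp.write(b'one\ntwo\nthree\nfour\n')
    print(tail_lines(tmp.name, n=2, read_size=4))
    os.remove(tmp.name)
```

Asking for more lines than the file has simply returns the whole file, and an empty file returns an empty list.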

Remember, it's meant to be called from within Python programs, not from the command line (because Unix tail does that better ;-)).

I have done quite a bit of testing, so it should be safe to use.

cheers,
Manu

Comments:

  1. Why use this to get the last line? What's wrong with tail -1?

  2. Nothing is wrong with tail -1. As I said, this code is meant to be used inside a Python program. Yes, I can call (fork) tail -1 from a Python program, but forking is expensive.

  3. Not only that, but Windows does not appear to have an equivalent, so it is very nice to have.

  4. Yes, very nice to have this on Windows machines... However, I get:

    if read_str[offset-1] == '\n':
    IndexError: string index out of range

    Because offset is somehow 1024 at this point of execution...

    But it works if I change:

    if read_str[offset-1] == '\n':

    to:

    if read_str[-1] == '\n':

    I'm still not able to read last lines correctly from my ffmpeg encoding output log files which are being updated very often.

    Thanks anyway :)

  5. Okay, it was a newline issue with my ffmpeg log. Switching to universal newline support when opening the file:

    f = open(filepath, 'r')

    changed to:

    f = open(filepath, 'U')

    and it seems to work perfectly.

  6. Thanks for reverting ii! I sure didn't think about using this function on Windows while writing it :) This PEP explains the rationale behind universal newline support: http://svn.python.org/projects/peps/trunk/pep-0278.txt

    I'll change "r" to "rU" in the post.

  7. f.close()

    will NEVER be executed. I believe it should be right before the return statements.
