August 26, 2007

Hacking squid

In this post, I would like to talk about the recent fun I had with squid. It involved some troubleshooting and some hacking.
Problem: Squid will stop responding after running for some random period of time, say 10 to 40 min and cpu usage will shoot up to 95-100%.

I started with strace, but everything looked fine there. Then I tried ltrace and there I got the first clue. squid was comparing 2 strings in an infinite loop:
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1

Looks like some bad 'for' loop. But, what part of code and why? It needed little more debugging to answer these questions. The squid binary that I was running was installed from a debian package and thus was stripped off debugging symbols. To fix that problem, I rebuilt the squid package with debugging information. On debian, you do that by supplying "DEB_BUILD_OPTIONS = nostrip" as an environment variable or in debian/rules file, while building the package. I ran this newly compiled squid within gdb. And, when it was hung up again, I took the backtrace:
gdb) bt
#0 0x00002ae1d60c6914 in strcmp () from /lib/libc.so.6
#1 0x000000000049f269 in hash_lookup (hid=0x79aae0, k=0x299f3e0) at hash.c:192
#2 0x000000000042c9bd in idnsCachedLookup (key=0xa733a8 "i12.ebaystatic.com",
callback=0x44ffe0 , data=0x7ad400) at dns_internal.c:1016
#3 0x000000000042d380 in idnsALookup (name=0x299f3e0 "thumbnail.videoegg.com",
callback=0xa733a8, data=0x7ad400) at dns_internal.c:1042
#4 0x00000000004506be in ipcache_gethostbyname (name=0x32bc628 "thumbnail.videoegg.com",
flags=1) at ipcache.c:521
#5 0x000000000040a3b4 in aclMatchAclList (list=0x78fe90, checklist=0xa730a8) at acl.c:1800
#6 0x000000000040ac50 in aclCheck (checklist=0xa730a8) at acl.c:2160
#7 0x00000000004200ae in clientReadRequest (fd=542, data=) at
client_side.c:4058
#8 0x00000000004288f8 in comm_select (msec=) at comm_generic.c:264
#9 0x0000000000452baa in main (argc=3, argv=) at main.c:858

On the first look it seems to be the problem with hash_lookup function as it's calling strcmp repetitively. hash_lookup seems to be pretty generic function however and it's unlikely that it will do something that stupid. Looking at the rest of the stack, it seems pretty clear that the problem is either in the internal dns part of the code (dns_internal.c) or hash routines. After this it was all dirty work. Going through various parts of the squid code, mainly internal dns and hash routines, and trying different things. Interestingly, I nailed down the cause of the problem while sitting at a bus stop, waiting for the bus to Pune :) Luckily I had my laptop with me at that time and I could verify it immediately. For the results of my findings, I'll quote myself from the bug report:
The problem seems to be in the way squid's internal DNS system (dns_internal.c) keeps record of looked up but not yet answered DNS queries. This bug is hit specifically when multiple search paths are used in /etc/resolv.conf.

Squid caches all dns queries before sending them to avoid duplicate queries for the same name. (look at: idnsCacheQuery(q) and hash_table *idns_lookup_hash, in dns_internal.c). This mechanism works well unless multiple search paths are defined in /etc/resolv.conf. When multiple dns search paths are defined, same query object is modified and next search path is concatenated to it's name. This query is cached again and resent.

Problem is that the query is not unlinked before being cached and thus linked again. Only the key of hash object (that's actually name) changes this time; object itself remains same. This corrupts the hash table of looked up queries.

Once you know the problem and what's causing it, it becomes pretty easy to fix that. That's true at least in the world of computers, especially open source world, if not generally in life :) So, I wrote a patch to fix it and it worked. This patch made into both the branches of squid - 2.6 and 3.0. Also, it got me my name listed on the changesets page as a contributor :)

It was really exciting as it was my first contribution (in the form of code) to an existing open source project. I have started a couple of small time open source projects, but contributing to an existing and mature project was another type of fun :-)

1 comment:

  1. Have you ever heard of celeb networth - the famous website for celebrities information?

    ReplyDelete