December 18, 2007

pacparser - a library to parse PAC files

As I mentioned earlier, proxy auto-config (PAC) files are becoming more and more important for web proxy usage because of the automation and ease of administration they provide. Almost all popular browsers support them today. But there is still a dearth of tools for processing PAC files; for example, popular web software like curl, wget and python-urllib still can't take a PAC file for proxy configuration.

That was the problem I wanted to solve when I started to work on pacparser. Now it's ready in its full glory - http://code.google.com/p/pacparser. From the release announcement:

I am very pleased to announce the release of "pacparser" - a C library to parse proxy auto-config (PAC) scripts. Needless to say, PAC files are now a widely accepted method for proxy configuration management and almost all popular browsers support them. The idea behind pacparser is to make it easy to add this PAC file parsing capability to other programs. It comes as a shared C library with a clear API. You can use it to make any C or Python (using ctypes) program PAC-aware. Some very useful targets could be popular web software like wget, curl and python-urllib.
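To give a feel for what a PAC-aware program has to handle: a PAC script's FindProxyForURL() returns a string such as "PROXY host:port; DIRECT", a semicolon-separated list of choices to try in order. Here is a minimal sketch of splitting such a string in Python. It does not use pacparser's API at all; the function name parse_pac_result and the proxy host names are made up for illustration.

```python
def parse_pac_result(result):
    """Split a FindProxyForURL() return value into (kind, address) pairs.

    parse_pac_result is a hypothetical helper, not part of pacparser.
    """
    entries = []
    for part in result.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore empty pieces, e.g. after a trailing ';'
        fields = part.split(None, 1)
        kind = fields[0].upper()  # e.g. PROXY, SOCKS or DIRECT
        address = fields[1].strip() if len(fields) > 1 else None
        entries.append((kind, address))
    return entries


if __name__ == "__main__":
    # Example host names are placeholders.
    print(parse_pac_result("PROXY proxy1.example.com:3128; DIRECT"))
```

A program would typically try the entries in order and fall back to the next one if a proxy is unreachable.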

For documentation and available packages, please visit the project home page at:
http://code.google.com/p/pacparser

For those who like to start with the source code, here is the direct download link:
http://pacparser.googlecode.com/files/pacparser-1.0.1.tar.gz

I hope you will find some use for it. Anyway, let me know how you find it :-)

Cheers :-),
Manu

Update: For easy installation on Ubuntu boxes, I have created debs for Ubuntu Gutsy for the i386 and amd64 platforms. These debs can be downloaded from here:
http://code.google.com/p/pacparser/downloads/list
---
Manu Garg / http://www.manugarg.com/ / "Journey is the destination of life"

September 16, 2007

ladakh, land of peace and quiet - part II

Time to continue the Ladakh story started in the last post. So we reached our guest house in Leh on Monday night at around 10 PM. Pankaj was bowled over by the beautiful smile of the receptionist and manager of the guest house, a simple country girl. Actually, she was cute :) She was the daughter of the guest house owner. The whole guest house was run by the family alone - gardening, managing, cooking, cleaning, everything. The people there were really nice. They cooked food just for us even though the regular dinner time was already over.


We had a good sleep that night. Next morning, after having a Ladakhi breakfast (Ladakhi bread, honey, jam and butter), we went out to see Leh. The main market was about 20 minutes away from the guest house and the whole route was lined with handicraft shops and scenic views on both sides. We had lunch in the market itself and came back. Then we slept again in the afternoon. The two days of travel were finally catching up with us. In the evening, we decided to see Shanti Stupa as it was very close to our guest house. This is one amazing place. It's where you feel the power of the space the most. We climbed about 500 steep stairs to reach there. The feeling of being at the top, the strong breeze in your hair, and the powerful space around you; it works on you immediately. There is also a temple there called Shanti Stupa temple. It's a "peace" temple. You just have to be there to calm yourself, nothing else is required. Since it was getting dark we came down after spending about an hour there. But we had almost resolved in our hearts to come back there again.

For the next two days we didn't go on any long trips. What did we do? We got up by 7 am, had breakfast on time, played table tennis, went out to see nearby places, had lunch outside, came back in the evening, spent time in the guest house library, talked to the guest house people, had dinner in the guest house, watched stars in the night sky and just relaxed. It was a retreat for us. Time was passing so very smoothly. We saw the Spituk and Shankar gompas (monasteries) in these two days. Spituk gompa is a little far from Leh and you have to take a bus to get there, while Shankar gompa was only a 30-minute walk from our guest house.


On Friday, we started for Pangong Lake (Tso) in the morning after having our breakfast. It's about a five-hour drive from Leh. We passed the mighty Chang La on the way. It's the third-highest road in the world. We reached Pangong Tso by 3 PM. And what a place it is. Try to think of a huge sea-like lake, surrounded by mountains, at a height of 4250 meters, and with only 5 people for a long, long distance. Yes, it's as magical as it sounds. The ultra clear water, ever-changing colors, shadows of the mountains, and the sound of nothing but the soft breeze. You feel like staying there forever. I can't say it enough in words. You have to be there to feel it. We started back from there at about 6 PM.


On the way back, we decided to stay in Tangse village. It's a very small village, with not more than 50 houses I'd say. We found a place to stay there for just Rs 300/- a night. There were no tourists there except us. The morning there was really, really pleasant. The weather was so good. The sky was absolutely clear and very blue :) After breakfast, we took a walk in the village and met school-going kids. At about 10 AM, we started back for Leh. We reached Leh by 1 PM.

That day we visited Shanti Stupa again :) This time we went a little early to enjoy it for longer. We sat there and just sat there for a long time. We could see a thunderstorm coming from the mountains at the far end. It made the breeze even stronger. I can remember myself lying there and singing "Trying to find, trying to find, where I've been" from Led Zeppelin's Kashmir. The feeling was absolutely great.

Now only Sunday was left. On Monday morning we had to fly back to Delhi. On Sunday we went out to see Leh Palace. It was in the interior of the city. We walked a long way to get there, and coming back we walked even longer. On the way back we found a very good and genuine handicrafts shop. It wasn't in the main market, so prices were pretty OK. That's where we did our shopping. In the evening, we settled our guest house bills and asked the lady to book a cab for us for the next morning. Next day we took the cab, reached the airport and boarded our flight to Delhi. From inside the flight, we got some amazing views of the mountain tops.

That's how our journey ended and somewhere in our minds a new journey started.

Links to Leh albums: Reaching Leh In Ladakh

August 26, 2007

ladakh, land of peace and quiet - part I

Have you ever felt the power of space? When you feel that the space, just the space around you, affects you strongly. Almost all of us have experienced it for a short duration in some way or another, for example when we go to a temple. I felt it for a much longer duration. It happened to us when we visited Ladakh last month. By we, I mean Pankaj and me. For those who don't know, Pankaj and I are best buddies.

So, we went to this land of peace and quiet. There are some obvious things that make Ladakh different from all other hill stations: altitude so high that AMS (Acute Mountain Sickness) comes to you easily, a different kind of people, and proximity to both the Pakistan and China borders. But there are some things which are not easy to imagine. Things like how it can calm you beyond your imagination.

We were very excited about this trip. We decided to go by the Manali-Leh road and come back by air. Our route was something like: Hyderabad -> Delhi -> Chandigarh -> Manali -> Leh -> Delhi -> Hyderabad. We had started preparing for it well in advance: booking flight tickets from Leh to Delhi, hotels in Manali and Keylong, train tickets from Delhi to Chandigarh, the guest house in Leh etc. And what date did we choose to travel - July 13th, Friday. Yes, Friday the 13th. It was accidental. We didn't notice it until the last day :)


My flight from Hyderabad to Delhi was a little delayed, but I managed to reach Delhi just in time. Pankaj had come to the airport in a cab to pick me up. From there we went together to the Delhi railway station in the same cab. We reached on time but no time was left for dinner. So, that's how we started our journey. Everything went as planned. We reached Manali by the next evening. After cleaning ourselves up and having snacks, we visited Hidimba temple. It's really a nice temple, and in the late evening it was a quiet place with a calming effect. The next morning we started for Leh. Our next destination was Keylong, which is about 8 hours away from Manali. We passed Rohtang La on our way. We reached Keylong by 4 pm. This whole region is very beautiful. The air is fresh, the land is green all around, the mountains are capped with snow, and the space is free of any kind of pollution. We absolutely loved that place: walking around in the town, watching the sky full of stars at night. Next day we started early, at about 6 am. And then we realized that mountains look even more beautiful in the morning :)


Next we encountered a broken bridge which was being repaired. We took this as an opportunity to feel our surroundings. We hiked down to the shore of the river and sat there to appreciate the lovely morning. It was about 9:30 am. It wasn't windy, the sun was shining and the temperature was a little on the colder side, but not too cold. I touched the water; it was freezing. In the meantime, our driver decided to cross the river through the water and called us back. It was a bad move. We got stuck in the middle of the river. And then, we got a little adventurous. We removed our shoes and got into the water to push the car. But the water was unthinkably cold. Our feet became red and our nails started turning yellow. Even after all the effort, we managed to move the car only a bit. After trying everything, we finally gave up. The driver went out to fetch some help to tow the car. Luckily he found a tractor that was working at a road repair site not too far off. That's how we finally came out of the river :)

Our next stop was at Sarchu. We had our lunch there in a mini-restaurant. It was in a tent and was run by exactly two people - a man and a woman, a couple probably. We had dal-roti, paratha and tea there. After that we started for Leh again. It was still a long drive from here. We passed the mighty Tanglang La on our way. We reached Leh at night, at about 9:30 PM. The market was almost closed and it was completely dark there. After asking a couple of people, we found our guest house and got in :)



(..to be continued. Photographs of the trip so far - http://picasaweb.google.com/manugarg/ReachingLeh)

Hacking squid

In this post, I would like to talk about the recent fun I had with squid. It involved some troubleshooting and some hacking.
Problem: Squid would stop responding after running for some random period of time, say 10 to 40 minutes, and CPU usage would shoot up to 95-100%.

I started with strace, but everything looked fine there. Then I tried ltrace and there I got the first clue. squid was comparing 2 strings in an infinite loop:
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1
strcmp("thumbnail.videoegg.com", "i12.ebaystatic.com") = -1

Looks like some bad 'for' loop. But which part of the code, and why? It needed a little more debugging to answer these questions. The squid binary that I was running was installed from a Debian package and thus was stripped of debugging symbols. To fix that problem, I rebuilt the squid package with debugging information. On Debian, you do that by supplying DEB_BUILD_OPTIONS=nostrip as an environment variable, or by setting it in the debian/rules file, while building the package. I ran this newly compiled squid within gdb. And when it hung again, I took the backtrace:
(gdb) bt
#0 0x00002ae1d60c6914 in strcmp () from /lib/libc.so.6
#1 0x000000000049f269 in hash_lookup (hid=0x79aae0, k=0x299f3e0) at hash.c:192
#2 0x000000000042c9bd in idnsCachedLookup (key=0xa733a8 "i12.ebaystatic.com",
    callback=0x44ffe0 , data=0x7ad400) at dns_internal.c:1016
#3 0x000000000042d380 in idnsALookup (name=0x299f3e0 "thumbnail.videoegg.com",
    callback=0xa733a8, data=0x7ad400) at dns_internal.c:1042
#4 0x00000000004506be in ipcache_gethostbyname (name=0x32bc628 "thumbnail.videoegg.com",
    flags=1) at ipcache.c:521
#5 0x000000000040a3b4 in aclMatchAclList (list=0x78fe90, checklist=0xa730a8) at acl.c:1800
#6 0x000000000040ac50 in aclCheck (checklist=0xa730a8) at acl.c:2160
#7 0x00000000004200ae in clientReadRequest (fd=542, data=) at
    client_side.c:4058
#8 0x00000000004288f8 in comm_select (msec=) at comm_generic.c:264
#9 0x0000000000452baa in main (argc=3, argv=) at main.c:858

At first glance it looks like a problem with the hash_lookup function, as it's the one calling strcmp repeatedly. However, hash_lookup is a pretty generic function and it's unlikely that it would do something that stupid on its own. Looking at the rest of the stack, it seems pretty clear that the problem is either in the internal DNS part of the code (dns_internal.c) or in the hash routines. After this it was all dirty work: going through various parts of the squid code, mainly the internal DNS and hash routines, and trying different things. Interestingly, I nailed down the cause of the problem while sitting at a bus stop, waiting for the bus to Pune :) Luckily I had my laptop with me at that time and I could verify it immediately. For the results of my findings, I'll quote myself from the bug report:
The problem seems to be in the way squid's internal DNS system (dns_internal.c) keeps record of looked up but not yet answered DNS queries. This bug is hit specifically when multiple search paths are used in /etc/resolv.conf.

Squid caches all dns queries before sending them to avoid duplicate queries for the same name. (look at: idnsCacheQuery(q) and hash_table *idns_lookup_hash, in dns_internal.c). This mechanism works well unless multiple search paths are defined in /etc/resolv.conf. When multiple dns search paths are defined, the same query object is modified and the next search path is concatenated to its name. This query is cached again and resent.

Problem is that the query is not unlinked before being cached and thus linked again. Only the key of hash object (that's actually name) changes this time; object itself remains same. This corrupts the hash table of looked up queries.
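To make the failure mode concrete, here is a small Python sketch of a chained hash table whose entries carry their own "next" link. It is not squid's code: the class and function names (Node, ChainedHash, requery_buggy, requery_fixed) are made up, the single-bucket table is a simplification so that the collision is guaranteed, and the search suffix is a placeholder. Linking an object that is already in the table makes it point to itself, and a lookup for any other name then walks the chain forever, which matches the endless strcmp calls seen in ltrace. The sketch caps the walk so it raises instead of spinning.

```python
class Node:
    """A query object that carries its own chain link, like a C hash link."""
    def __init__(self, key):
        self.key = key
        self.next = None


class ChainedHash:
    """Toy chained hash table (hypothetical, not squid's hash.c)."""
    def __init__(self, nbuckets=1):
        self.buckets = [None] * nbuckets

    def _index(self, key):
        return sum(key.encode()) % len(self.buckets)

    def join(self, node):
        # Link the node at the head of its bucket chain.
        i = self._index(node.key)
        node.next = self.buckets[i]
        self.buckets[i] = node

    def remove(self, node):
        # Unlink the node from the chain it is currently filed under.
        i = self._index(node.key)
        prev, cur = None, self.buckets[i]
        while cur is not None:
            if cur is node:
                if prev is None:
                    self.buckets[i] = cur.next
                else:
                    prev.next = cur.next
                cur.next = None
                return
            prev, cur = cur, cur.next

    def lookup(self, key, max_steps=1000):
        # max_steps exists only so the demo fails loudly instead of hanging.
        cur = self.buckets[self._index(key)]
        steps = 0
        while cur is not None:
            if cur.key == key:
                return cur
            steps += 1
            if steps > max_steps:
                raise RuntimeError("bucket chain has a cycle")
            cur = cur.next
        return None


def requery_buggy(table, node, new_key):
    node.key = new_key   # only the key changes ...
    table.join(node)     # ... and the same object is linked a second time


def requery_fixed(table, node, new_key):
    table.remove(node)   # unlink under the old key first
    node.key = new_key
    table.join(node)


if __name__ == "__main__":
    table = ChainedHash()
    q = Node("i12.ebaystatic.com")
    table.join(q)
    requery_buggy(table, q, "i12.ebaystatic.com.example.com")
    try:
        table.lookup("thumbnail.videoegg.com")
    except RuntimeError as err:
        print("corrupted table:", err)
```

With requery_fixed in place of requery_buggy, the same lookup simply returns None.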

Once you know the problem and what's causing it, it becomes pretty easy to fix. That's true at least in the world of computers, especially the open source world, if not generally in life :) So, I wrote a patch to fix it and it worked. This patch made it into both branches of squid - 2.6 and 3.0. Also, it got me my name listed on the changesets page as a contributor :)

It was really exciting as it was my first contribution (in the form of code) to an existing open source project. I have started a couple of small-time open source projects, but contributing to an existing and mature project was another kind of fun :-)

April 23, 2007

Real Tail'ing in Python

or, finding the last few lines in a file.

Ok. So, the last solution was not perfect. It just returned the last line of a file. What about returning, say, 10 or maybe more lines? Here is the modified Tail function to do that:
def Tail(filepath, nol=10, read_size=1024):
    """
    This function returns the last nol lines of a file.
    Args:
        filepath: path to file
        nol: number of lines to return (optional, default=10)
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    # Binary mode, because Python 3 only allows seeking relative to the
    # end of a file in binary mode. Line endings are not translated.
    f = open(filepath, 'rb')
    try:
        f.seek(0, 2)
        file_size = f.tell()
        offset = read_size
        while 1:
            if file_size < offset:
                offset = file_size
            f.seek(-1*offset, 2)
            read_str = f.read(offset)
            # Remove newline at the end
            if read_str.endswith(b'\n'):
                read_str = read_str[:-1]
            lines = read_str.split(b'\n')
            # The first piece may be a partial line, so we need more
            # than nol pieces before the last nol are known to be whole.
            if len(lines) > nol:  # Got nol complete lines
                return b'\n'.join(lines[-nol:]).decode()
            if offset == file_size:  # Reached the beginning
                return read_str.decode()
            offset += read_size
    finally:
        f.close()  # Runs on every return path

You can call it in your program like this:
Tail('/var/log/syslog') or,
Tail('/etc/httpd/logs/access.log', 100)
Useful, isn't it?

Cheers,
-Manu

April 2, 2007

Tail'ing in Python

or, finding last line of a huge file..

How do you find the last line of a 2 GB log file from within your program? You don't want to go through the whole file, right? Right. What you want is to start reading from the end until you find a newline character. Here is how I did it in Python:


def Tail(filepath, read_size=1024):
    """
    This function returns the last line of a file.
    Args:
        filepath: path to file
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    # Binary mode, because Python 3 only allows seeking relative to the
    # end of a file in binary mode. Line endings are not translated.
    f = open(filepath, 'rb')
    try:
        f.seek(0, 2)
        file_size = f.tell()
        offset = read_size
        while 1:
            if file_size < offset:
                offset = file_size
            f.seek(-1*offset, 2)
            read_str = f.read(offset)
            # Remove newline at the end
            if read_str.endswith(b'\n'):
                read_str = read_str[:-1]
            lines = read_str.split(b'\n')
            if len(lines) > 1:  # Got a line
                return lines[-1].decode()
            if offset == file_size:  # Reached the beginning
                return read_str.decode()
            offset += read_size
    finally:
        f.close()  # Runs on every return path


(There will hardly be any reason to change read_size. I used it mainly for testing.)

It works much like Unix 'tail -1'. It can easily be modified to return the last 10 or 'n' lines, I believe. But I haven't had the time or a reason to try that yet :)

Remember, it's supposed to be called from within Python programs, not from the command line (because Unix tail does that better ;-)).

I have done quite a bit of testing, so it should be safe to use.

cheers,
Manu

January 25, 2007

pactester - a tool to test proxy auto-config (PAC) files

Hackers and Sysadmins :-)

Google has recently released "pactester", a tool to test proxy auto-configuration (PAC) files. The use of PAC files is becoming more and more common because of the automation and ease of administration they provide. Before pactester there was no "real" way to test PAC files. We could tell whether a site was accessible using a given PAC file, but we could not tell which proxy server would be used for a specific URL unless we examined the traffic with a network sniffer or checked the access logs on the proxy server. Both of these approaches were inconvenient and time consuming. Of course, another way to test is manual inspection of the PAC file, but that is error prone and quite impractical for large and complex PAC files.

Pactester resolves all these issues by simulating the browser behavior. It evaluates the PAC file in a JavaScript context and returns the proxy server for a specific URL using the PAC file's logic. It's written in Perl and uses the same JavaScript code to evaluate the PAC file functions as is used by Mozilla browsers. Documentation, in the form of README and INSTALL files, is included in the source code tarball. Some relevant URLs:

Project home page: http://code.google.com/p/pactester
Project download: http://code.google.com/p/pactester/downloads/list
Quick Documentation: http://pactester.googlecode.com/svn/trunk/README
Mailing list and discussion group: http://groups.google.com/group/pactester

Tested to work on: Linux, Mac OS X, Windows-Cygwin
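To illustrate the kind of question pactester answers ("which proxy will this URL use?"), here is a toy stand-in written in Python. It is only a sketch under loud assumptions: a real PAC file is JavaScript and pactester evaluates it in a JavaScript context, whereas this mimics a made-up PAC policy with rough Python approximations of the PAC helpers isPlainHostName, dnsDomainIs and shExpMatch. All host names and the policy itself are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_plain_host_name(host):
    # Rough equivalent of PAC isPlainHostName(): no dots in the host.
    return "." not in host


def dns_domain_is(host, domain):
    # Rough equivalent of PAC dnsDomainIs(): host ends with the domain.
    return host.endswith(domain)


def sh_exp_match(text, pattern):
    # Approximation of PAC shExpMatch() using shell-style wildcards.
    return fnmatchcase(text, pattern)


def find_proxy_for_url(url, host):
    """A made-up policy standing in for a PAC file's FindProxyForURL()."""
    if is_plain_host_name(host) or dns_domain_is(host, ".corp.example.com"):
        return "DIRECT"
    if sh_exp_match(url, "ftp://*"):
        return "PROXY ftp-proxy.example.com:3128"
    return "PROXY proxy.example.com:3128; DIRECT"


def check(url):
    # Derive the host from the URL, as a browser would before calling
    # FindProxyForURL(url, host).
    return find_proxy_for_url(url, urlparse(url).hostname)


if __name__ == "__main__":
    for url in ("http://intranet/", "ftp://ftp.example.org/pub",
                "http://www.example.org/"):
        print(url, "->", check(url))
```

Running a list of representative URLs through the policy like this is much quicker than sniffing traffic or reading proxy logs.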

Cheers to open source!! :-)
-M

Update: Now there is an implementation of pactester in C using the pacparser library. A compiled binary of pactester for Windows can be downloaded from the pactester download page.