November 16, 2005

Continuing a disrupted file transfer: the magic of dd and netcat

We had a problem last week. The DBA team wanted to clone the QA database from an export backup of the production database. A usual activity in the DBA world, I guess. Our production and QA systems are at different sites, and we were facing some WAN issues that made file transfers very slow. The export backup consisted of a single 3.8 GB file. Our offshore DBA started the file transfer at 3 AM EDT. By 8 PM, almost 3.3 GB had been copied. I know, it's a long, long time, but the WAN was really slow because of some third-party issues. And just then the network team started working on the issue (with no prior notification to us, of course). You can imagine what happened after that. Yes, the connection broke, and the Solaris 8 ftp server doesn't support 'restart'.

Knowing that ftp copies a file sequentially, I was pretty confident that there must be some way to carry on from that 3.3 GB partial file. My acquaintance with dd and nc came to the rescue. This is how I did it:

Problem: There is a file called prdcma_fullexp_200511032230.dmp.gz on amusprddb06, of size 3877579206 bytes (approx 3.8 GB). This file was being transferred to amusqadb02 over ftp when the network went down. 3343810560 bytes (3.3 GB) had been copied. How do we complete this file without starting over?

Solution: nc (formal name: netcat) was the obvious choice for moving data over the network. I also needed a tool that could seek within a file, and dd worked just fine for that. After going through the dd manpage, I came up with the following commands:

At server side (amusprddb06):
dd if=prdcma_fullexp_200511032230.dmp.gz iseek=3265440 bs=1024 | ~/nc -l -p 2005


At the client side (amusqadb02):
./nc amusprddb06 2005 | dd of=prdcma_fullexp_200511032230.dmp.gz seek=3265440 bs=1024


This is how I calculated the seek value:
Copied bytes = 3343810560 => Copied blocks = 3343810560 / 1024 = 3265440
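
The same arithmetic can be done in the shell. This is only a small sketch: the variable names are mine, and it assumes a shell with 64-bit integer arithmetic (bash, ksh93 and the like), since the byte count does not fit in 32 bits.

```shell
# Size of the partial file on the client, as reported by ls -l.
copied_bytes=3343810560
# Block size passed to dd as bs=.
block_size=1024

# Number of whole blocks already on the client.
seek_blocks=$((copied_bytes / block_size))
# Multiply back to see which byte offset dd will resume from.
resume_offset=$((seek_blocks * block_size))

echo "seek=$seek_blocks resume_offset=$resume_offset"
# prints: seek=3265440 resume_offset=3343810560
```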

The block size can be different too. In fact, the number of copied bytes doesn't have to be divisible by the block size. You just round the block count down. For example, if 3343810570 bytes had been copied instead of 3343810560 (10 bytes more), I could have seeked the same number of blocks and overwritten the last 10 bytes with identical data. Not a big deal, right?
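
To convince yourself that the unaligned case works, here is a purely local sketch with no network involved. The file names source.bin and partial.bin and the sizes are made up for the demo. It uses skip= and seek=, the POSIX spellings; on Solaris dd, iseek= plays the role of skip= for seekable input files.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the dump file: 10000 bytes of random data.
dd if=/dev/urandom of=source.bin bs=1000 count=10 2>/dev/null

# Simulate the interrupted transfer: only the first 5130 bytes arrived,
# i.e. 5 whole 1024-byte blocks plus 10 extra bytes.
dd if=source.bin of=partial.bin bs=1 count=5130 2>/dev/null

copied_bytes=$(wc -c < partial.bin)
block_size=1024
blocks=$((copied_bytes / block_size))   # integer division rounds down to 5

# Resume: skip the whole blocks on input, seek past them on output.
# The 10 extra bytes are simply written again with identical content.
dd if=source.bin skip="$blocks" bs="$block_size" 2>/dev/null |
    dd of=partial.bin seek="$blocks" bs="$block_size" 2>/dev/null

# cmp is silent and exits 0 when the two files are identical.
cmp source.bin partial.bin && echo "files match"
```

In the real transfer, the pipe between the two dd commands is replaced by the nc listener on one host and the nc client on the other.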

Here are some interesting observations from the above transfer:

- dd on the server side reported 521258+1 records transferred, i.e. 521258 full blocks and 1 partial block.
- dd on the client side reported 500678+42702 records transferred, i.e. 500678 full blocks and 42702 partial blocks.

Obviously, the client is not receiving the data in neat 1024-byte chunks. TCP delivers a stream of bytes, so the 1024-byte writes made on the server side are not preserved across the network, and each read on the client returns whatever has arrived so far. And interestingly (from the manpage of dd): "When dd reads from a pipe, using the ibs=X and obs=Y operands, the output will always be blocked in chunks of size Y. When bs=Z is used, the output blocks will be whatever was available to be read from the pipe at the time."

The server side shows just 1 partial block, and that is because the number of bytes left to send is not an exact multiple of 1024.
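
The server-side record count can be checked against the file sizes given in the problem statement. Again just a sketch with variable names of my own, assuming a shell with 64-bit arithmetic:

```shell
total_bytes=3877579206     # size of the dump on amusprddb06
copied_bytes=3343810560    # size of the partial copy on amusqadb02
block_size=1024

# Bytes that still had to cross the network.
remaining_bytes=$((total_bytes - copied_bytes))
# Whole 1024-byte blocks dd reads on the server side.
full_blocks=$((remaining_bytes / block_size))
# Size of the single short block at the end of the file.
leftover_bytes=$((remaining_bytes % block_size))

echo "remaining=$remaining_bytes full=$full_blocks leftover=$leftover_bytes"
# prints: remaining=533768646 full=521258 leftover=454
```

That matches the 521258+1 reported by dd: 521258 full blocks and one final block of 454 bytes.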

Fine-tuning the block size (to something in sync with the network MSS) might speed things up. I didn't bother checking, so treat that as a guess. You can give it a try.

cheers,
Manu Garg
http://www.manugarg.com
"Journey is the destination of life"
