Sunday 24 August 2014

It's time to Zip!




Hey guys, sorry for posting so late. I have been busy lately with my semester examinations.

Today we are going to discuss file compression and archiving in Linux. For Windows users, this may be a bit unconventional. In Windows, you just click the compress option in the menu to get a “ZIP Archive” of the selected files and folders. What I mean to say is that archiving and compression are done together in a single step. Many Windows users may not even distinguish between compression and archiving. The same can be done in Linux with a similar GUI tool, but things done from the command line have their own charm, as I say ;)

Let us make the concepts clear first.

Archiving is combining multiple files into a single file, called the archive file. The total size of this archive file is approximately equal to (but not less than) the sum of the sizes of the files it contains.
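To see the "approximately equal, but not less" part in action, here is a small sketch. It assumes a throwaway directory and two made-up files, a.txt and b.txt. The extra bytes come from the header that tar stores for each file and from padding, so for tiny files like these the overhead looks large, while for big files it becomes negligible.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny sample files (hypothetical names, just for this demo).
printf 'hello\n' > a.txt
printf 'world\n' > b.txt

# c = create, f = name of the archive file; no compression is requested.
tar -cf plain.tar a.txt b.txt

# Byte counts: the inputs together versus the archive.
files_size=$(( $(cat a.txt b.txt | wc -c) ))
archive_size=$(( $(wc -c < plain.tar) ))
echo "files: $files_size bytes, archive: $archive_size bytes"
```

The exact archive size depends on your tar implementation, but it should never come out smaller than the files themselves.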

On the other hand, compression is reducing the size of a file on disk by using certain algorithms. If decompression gives back the exact original file, the algorithm is lossless; otherwise it is lossy.
It is common practice to create an archive of the selected files and then compress it to get the final “compressed archive”. Unlike in Windows, in Linux compression and archiving are two separate steps. For archiving, the most common format is '.tar', which stands for tape archive.
Once we make the archive, we are ready to compress it using one of the various compression tools available. The most common tools are gzip, bzip2, zip, 7z, etc. Out of these, 7z usually gives the best compression ratio but is also the slowest. The basic trade-off is between speed and compression ratio. The most popular tools in Linux are gzip and bzip2. Here we will use gzip, which gives the compressed file the extension '.gz'. It is worth mentioning that bzip2 usually compresses better than gzip but is slower. For general usage, I would always recommend gzip.
The basic strategy is to convert the files into an archive and then compress it using gzip. Thus 'myFile' becomes 'myFile.tar' and then 'myFile.tar.gz'.
Luckily, we can do both of these things with a single command-line tool, 'tar'.
Open the terminal and pick a folder to compress. Say we pick 'directory1', which contains a large number of text files.
Now go to the parent directory of 'directory1' and enter the following command:

tar -czvf myArchive.tar.gz ./directory1

Note: To do the same with individual files instead of an entire directory, simply enter:

tar -czvf myArchive.tar.gz file1 file2 file3 (...and so on)

Your compressed archive will be created with the name 'myArchive.tar.gz'. The .tar.gz extension indicates that the directory was first archived into 'myArchive.tar' and that this 'myArchive.tar' was then compressed into 'myArchive.tar.gz' using the gzip tool. Alternatively, we can name the file 'myArchive.tgz' to show the same thing. The .tgz or .tar.gz files are also called tarballs.
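If you want to try this without touching your real files, the following is a minimal sketch. The directory and file names are made up for the demo, and the “t” option (list the contents of an archive without extracting it) is an extra that the commands in this post do not otherwise use.

```shell
# Build a small sample directory in a throwaway location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir directory1
printf 'first note\n'  > directory1/notes1.txt
printf 'second note\n' > directory1/notes2.txt

# Archive and gzip-compress the directory in one go.
tar -czvf myArchive.tar.gz ./directory1

# List what went into the tarball without extracting it.
listing=$(tar -tzf myArchive.tar.gz)
echo "$listing"
```

Listing a tarball like this is a handy check before you delete the original files.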
To extract the file, use the following command:

tar -xvzf myArchive.tar.gz
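By default the files are extracted into the current directory. As a rough sketch of a full round trip, the example below extracts into a separate folder using the “-C” option (a real tar option, though not one covered in this post) and then checks that the extracted copy matches the original. The names 'restored' and 'file1.txt' are assumptions for the demo.

```shell
# Prepare a sample directory and its tarball in a throwaway location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir directory1
printf 'some text\n' > directory1/file1.txt
tar -czf myArchive.tar.gz ./directory1

# Extract into a separate folder; -C makes tar change into it first.
mkdir restored
tar -xvzf myArchive.tar.gz -C restored

# diff -r prints nothing and succeeds when both trees are identical.
diff -r directory1 restored/directory1 && echo "extracted copy matches the original"
```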

Now let us explore the tar command in a bit more detail:
The basic structure is:
tar <options> <archive name> <file1> <file2> <file3> ....
where the options are:
  • c: Create archive. Create a new “tar” archive.
  • z: Zip/unzip. Compress or decompress the archive using gzip (for bzip2, use “j”).
  • v: Verbose. List the files as they are processed. You may leave this option out, but it is good practice to see what “tar” is doing with your files.
  • f: File. The next argument is the name of the archive file to create or read. Because the name must come right after it, “f” goes last in the group of options. Without “f”, tar falls back to a default device chosen by your system (traditionally a tape drive) instead of your file.
  • x: Extract. Extract the files from the archive.


The tar tool has numerous such options. For a detailed list, you can view its man page by entering:

man tar

Further, you can archive the files first and then compress the archive manually with the gzip command-line tool. To do this, first create a tar archive (do not use the “z” option), then use gzip to compress the tar file. To decompress a .gz file, use the 'gunzip' tool. See the man page of gzip for more details.
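The manual two-step route could look something like the sketch below. The sample directory and file are made up for the demo. Note that gzip, by default, replaces the file it compresses, and gunzip does the reverse.

```shell
# Prepare a sample directory in a throwaway location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir directory1
printf 'some text\n' > directory1/file1.txt

# Step 1: archive only (note: no "z" option).
tar -cvf myArchive.tar ./directory1

# Step 2: compress; gzip replaces myArchive.tar with myArchive.tar.gz.
gzip myArchive.tar
after_gzip=$(ls)
echo "$after_gzip"

# Reverse step 2: gunzip restores myArchive.tar and removes the .gz file.
gunzip myArchive.tar.gz
after_gunzip=$(ls)
echo "$after_gunzip"
```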
Compressed archives are a very good way to back up or efficiently store files that are not needed at present. Being a coder, I frequently archive my programs to free up disk space. For simple text documents, the reduction in size may be 90% or more. For example, my folder of C and C++ code, about 106 MB in total, was reduced to 4.43 MB after converting it into a tarball (roughly 96% smaller).
But before this news gets you excited enough to try your hand at tar, let me give you an important tip:
Do not try to compress multimedia files (pictures, videos, music), as your effort will be in vain. This is because most multimedia formats are already compressed, so you will gain very little. PDFs also do not compress much, but it can be worth it if you want to free up a few MB. Once I tried to compress 14 movies with a total size of 20 GB. The entire process took 30 minutes, and the result was a tarball of 19 GB. So it was a big waste of my 30 minutes. With 7z it obviously took much longer, and I had to kill the process midway as I was getting bored ( :D ).
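You can get a feel for this difference with a quick experiment. The sketch below is only an approximation: it uses about 1 MB of very repetitive text (which compresses far better than real source code would) and 1 MB of random bytes as a stand-in for already-compressed media. The file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# About 1 MB of highly repetitive text (an extreme best case for gzip).
awk 'BEGIN { for (i = 0; i < 35000; i++) print "int main(void) { return 0; }" }' > text.txt

# 1 MB of random bytes, standing in for already-compressed media.
dd if=/dev/urandom of=random.bin bs=1024 count=1024 2>/dev/null

# -c writes the compressed data to stdout and keeps the original file.
gzip -c text.txt   > text.txt.gz
gzip -c random.bin > random.bin.gz

# Compare sizes in bytes before and after compression.
text_raw=$(( $(wc -c < text.txt) ))
text_gz=$(( $(wc -c < text.txt.gz) ))
random_raw=$(( $(wc -c < random.bin) ))
random_gz=$(( $(wc -c < random.bin.gz) ))
echo "text:   $text_raw -> $text_gz bytes"
echo "random: $random_raw -> $random_gz bytes"
```

The text file should shrink to a tiny fraction of its size, while the random file should stay about the same size, much like my movies did.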
