---
title: "Tar Is for Tape Drive"
category:
  - blog
abstract: Exploring IT history to understand current quirks
date: 2023-04-12T17:17:07+02:00
year: 2023
draft: false
tags:
  - tar
  - tarball
  - Unix
  - POSIX
  - computer-history
---

One of the things I never knew I wanted to know:

> tar creates and manipulates streaming archive files.
> (source: man tar(1))

A tarball is a file that stands in for the tape drive:

> A tar archive consists of a series of file objects, hence the popular term tarball, referencing how a tarball collects objects of all kinds that stick to its surface.
> (source: [Wikipedia](https://en.wikipedia.org/wiki/Tarball_(computing)))

This explains two things I never got about tar:

- It's slow. The tarball represents a physical tape, so to get a file from it, tar must virtually wind the tape forward to the needed position.
- It's not compressed, as otherwise the computer would need to read the entire tape into memory, decompress it, and work on that.

Tar is one of those cases where computer archaeology makes things clearer. Nowadays, we often forget that computers have a history and that a lot of the things we take for granted had to be invented. Many quirks can be understood if we explore the past a bit.

**Errata**

[Humm](https://bsd.network/@humm) pointed out to me that the above is not technically true.

1. At a low level, what makes tar slower than ZIP is the lack of an index. ZIP files have an index that allows finding and reading the wanted file directly. In tar, we need to read each file header to determine whether that part of the archive contains the file we need. This forces us to walk through every entry that comes before it: if the header matches our file, we extract it; otherwise we look at the next header.
2. It is entirely possible to compress an archive with a streaming compressor like gzip and still read the headers as we stream the archive.

Thanks!
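
The two points from the errata can be tried out with a small sketch. It assumes Python's standard `tarfile` module; `build_archive` and `find_in_stream` are hypothetical helper names made up for this illustration. The archive is gzip-compressed and opened as a forward-only stream, and the lookup has to pass every header that precedes the wanted file:

```python
import io
import tarfile


def build_archive(files):
    """Return the bytes of a gzip-compressed tar archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_in_stream(archive_bytes, wanted):
    """Scan headers in order; return (headers_read, content), content is None if not found."""
    headers_read = 0
    # "r|gz" opens the archive as a forward-only stream: no index, no seeking back.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|gz") as tar:
        for member in tar:
            headers_read += 1
            if member.name == wanted:
                # In stream mode the data must be read before moving to the next header.
                return headers_read, tar.extractfile(member).read()
    return headers_read, None


archive = build_archive({"a.txt": b"first", "b.txt": b"second", "c.txt": b"third"})
print(find_in_stream(archive, "a.txt"))  # (1, b'first')
print(find_in_stream(archive, "c.txt"))  # (3, b'third')
```

The last file costs three header reads and the first only one, which is the "no index" effect in miniature. A ZIP reader would instead look the name up in the archive's index and jump straight to the data.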