---
title: "Git Objects"
categories:
- blog
abstract: How does Git store its database?
date: 2023-04-28T22:37:57+02:00
year: 2023
draft: false
tags:
- Git
- tutorial
- engineering
---
Every Git repository has a hidden `.git` folder. Open it, and all of Git's internals are at your disposal. Today's topic is something I should have learned a long time ago: objects.

First: a commit is an object. You can see it via `git cat-file -p <SHA of commit>`. The first two lines of the output will look like this:
```
tree b4653c20c7486d8b9e4eb10a882b79a3a9f3cfdf
parent 5eb01813d3e6b1f2ac1c7f432d5d994a7fee9ec1
```
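
If you want to pick that output apart in a script, the layout is simple: header lines of the form `key value`, a blank line, then the commit message. Here is a minimal sketch of a parser. The function name `parse_commit` and the author, committer, and message lines in the sample are made up for illustration; only the `tree` and `parent` lines come from the output above.

```python
def parse_commit(text):
    """Split `git cat-file -p <commit>` output into header fields and the message."""
    header_part, _, message = text.partition("\n\n")
    headers = {"parent": []}
    for line in header_part.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header (e.g. a signature); ignored here
        key, _, value = line.partition(" ")
        if key == "parent":
            headers["parent"].append(value)  # merge commits list several parents
        else:
            headers[key] = value
    return headers, message


# Sample input: the tree/parent lines are from the post, the rest is placeholder data.
sample = (
    "tree b4653c20c7486d8b9e4eb10a882b79a3a9f3cfdf\n"
    "parent 5eb01813d3e6b1f2ac1c7f432d5d994a7fee9ec1\n"
    "author Jane Doe <jane@example.com> 1682714277 +0200\n"
    "committer Jane Doe <jane@example.com> 1682714277 +0200\n"
    "\n"
    "Add a post about Git objects\n"
)
headers, message = parse_commit(sample)
print(headers["tree"])
print(headers["parent"])
```

Keeping `parent` as a list is a deliberate choice: a root commit has none, a regular commit has one, and a merge has two or more.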

The parent is the SHA of the parent commit, but that's unimportant today. Instead, let's focus on the tree. You can check what's inside using the same `git cat-file -p <SHA>`, and you will see a listing of the top-level folder of the repository. You can also `cat-file` any of the entries in that listing. Besides commits (and annotated tags, which I'll skip), there are two types of objects in Git:

- tree - a directory: a list of names pointing to blobs and other trees
- blob - the content of a file (compressed)
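
To make the tree object less abstract, here is a rough sketch of how its raw payload can be decoded. It assumes a SHA-1 repository, where each entry is `<mode> <name>`, a NUL byte, and then 20 raw bytes of object id. The helper name `parse_tree` is my own, and the sample payload is built by hand from the well-known ids of the empty blob and the empty tree rather than read from a real repository.

```python
def parse_tree(payload):
    """Decode a raw tree payload into (mode, kind, hex_id, name) tuples."""
    entries = []
    pos = 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        nul = payload.index(b"\0", space)
        mode = payload[pos:space].decode()
        name = payload[space + 1:nul].decode()
        raw_id = payload[nul + 1:nul + 21]  # 20 raw bytes, assuming SHA-1
        # 40000 marks a directory, 160000 a submodule; everything else is file-like.
        kind = {"40000": "tree", "160000": "commit"}.get(mode, "blob")
        entries.append((mode, kind, raw_id.hex(), name))
        pos = nul + 21
    return entries


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Hand-built payload: one (empty) file and one (empty) directory.
payload = (
    b"100644 README.md\0" + bytes.fromhex(EMPTY_BLOB)
    + b"40000 content\0" + bytes.fromhex(EMPTY_TREE)
)
for mode, kind, hex_id, name in parse_tree(payload):
    # Roughly the layout `git cat-file -p <tree>` prints.
    print(f"{mode:0>6} {kind} {hex_id}\t{name}")
```

Notice that the names live here, in the tree, next to the ids they point to.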

What does it mean? A commit is a reference to the state of the entire repository at a given moment in time. That state is a tree whose entries point to whole files (blobs) and to other trees (directories). Neat.

This is also why you don't want to store big binary files in Git: every time such a file changes, the new version is stored as another complete blob. Not very space-efficient.
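
A back-of-the-envelope sketch of that cost, under some assumptions: the "binary file" is 100,000 bytes of pseudo-random data standing in for an already-compressed asset, and `stored_size` (a name I made up) approximates a loose object as zlib over a `blob <size>` header plus the content. Git can later pack objects and store deltas between them, which this sketch ignores.

```python
import random
import zlib


def stored_size(content):
    """Approximate size of a loose blob: zlib over 'blob <size>' + NUL + content."""
    header = b"blob %d\0" % len(content)
    return len(zlib.compress(header + content))


rng = random.Random(0)  # fixed seed so the numbers are reproducible
version_1 = bytes(rng.getrandbits(8) for _ in range(100_000))

# Second version: identical except for a single flipped byte.
edited = bytearray(version_1)
edited[50_000] ^= 0xFF
version_2 = bytes(edited)

size_1 = stored_size(version_1)
size_2 = stored_size(version_2)
print(size_1, size_2, size_1 + size_2)
```

Both versions end up at roughly the full size of the file, so a one-byte change about doubles what those two objects take on disk.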

You can see each of those objects in `.git/objects`, but since they are compressed, it's much easier to use `git cat-file`. Note that blob objects don't have any filename attached - just the content. Instead, the filename is taken from a tree object. This is a benefit: when the same content exists under multiple names, the blob object is stored once and reused.
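
Here is a small sketch of why that reuse works, again assuming a SHA-1 repository. A blob's id is the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the content, so the filename never enters the hash. The function names and the two example paths are made up for illustration.

```python
import hashlib
import zlib


def blob_id(content):
    """Object id of a blob: SHA-1 over the header 'blob <size>' + NUL, then the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def loose_path(object_id):
    """Where a loose (unpacked) object with this id lives inside the repository."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def read_loose(raw):
    """Decompress the bytes of a loose object file and split them into (type, content)."""
    header, _, content = zlib.decompress(raw).partition(b"\0")
    kind, _, size = header.partition(b" ")
    assert int(size) == len(content)
    return kind.decode(), content


content = b"hello world\n"
print(blob_id(content))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_path(blob_id(content)))

# The same content under two names maps to a single blob id.
files = {"LICENSE": content, "docs/LICENSE": content}
ids = {name: blob_id(data) for name, data in files.items()}
print(len(set(ids.values())))  # 1
```

In a real repository you can compare the first printed id with the output of `git hash-object` on a file containing the same bytes.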