It's easy for me to understand how Git commits are implemented, but it's much harder to figure out how other people think about commits. So I asked some questions on Mastodon.
How do you think about Git commits?
I ran a very unscientific poll asking people how they think about Git commits: is a commit a snapshot, a diff, or a list of all previous commits? (Of course, it's reasonable to think of it as all three, but I was curious about people's main way of thinking about it.)
The results:
- 51% diff
- 42% snapshot
- 4% history of all previous commits
- 3% "other"
I'm surprised by how evenly the answers split between diff and snapshot. People also made some interesting but contradictory points, like "in my mind a commit is a diff, but I think it's actually implemented as a snapshot" and "in my mind a commit is a snapshot, but I think it's actually implemented as a diff". We'll talk more about how commits are actually implemented later.
Before we go any further: what do we mean by "a diff" or "a snapshot"?
What is a diff?
The "diff" I'm talking about is probably pretty obvious: it's what you get when you run git show COMMIT_ID. For example, here's a typo fix in the rbspy project:
diff --git a/src/ui/summary.rs b/src/ui/summary.rs
index 5c4ff9c..3ce9b3b 100644
--- a/src/ui/summary.rs
+++ b/src/ui/summary.rs
@@ -160,7 +160,7 @@ mod tests {
 ";
         let mut buf: Vec<u8> = Vec::new();
-        stats.write(&mut buf).expect("Callgrind write failed");
+        stats.write(&mut buf).expect("summary write failed");
         let actual = String::from_utf8(buf).expect("summary output not utf8");
         assert_eq!(actual, expected, "Unexpected summary output");
     }
You can see it on GitHub: https://github.com/rbspy/rbspy/commit/24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b
What is a snapshot?
What I mean by "snapshot" is "all the files you get when you run git checkout COMMIT_ID".
Git usually refers to the list of files in a commit as a "tree" (as in "directory tree"). You can see all the files in the commit above on GitHub:
https://github.com/rbspy/rbspy/tree/24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b (note that it's /tree/ instead of /commit/)
Is "how Git is implemented" really the right way to explain it?
The advice I hear most often about learning Git is probably "just learn how Git represents things internally, and everything will become clearer." I obviously find this perspective very appealing (if you've spent some time reading this blog, you'll know that I like thinking about how things work internally).
But as a way of teaching Git, it hasn't been as successful as I'd hoped! Often I excitedly start explaining "okay, so a Git commit is a snapshot with a pointer to its parent commit, and a branch is a pointer to a commit, and then...", but the people I'm trying to help tell me they didn't find the explanation very useful and they still don't get it. So I've been looking at other options.
But let’s talk about the internal implementation first.
How Git represents commits internally - Snapshot
Internally, Git represents commits as snapshots (it stores a "tree" of the current version of every file). I've written about this in "In a git repository, where do your files live?", but here's a very quick overview of the internal format.
Here is how a commit is represented:
$ git cat-file -p 24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b
tree e197a79bef523842c91ee06fa19a51446975ec35
parent 26707359cdf0c2db66eb1216bf7ff00eac782f65
author Adam Jensen 1672104452 -0500
committer Adam Jensen 1672104890 -0500

Fix typo in expectation message
And, when we view this tree object, we see a list of every file/subdirectory under the root of the repository in this commit:
$ git cat-file -p e197a79bef523842c91ee06fa19a51446975ec35
040000 tree 2fcc102acd27df8f24ddc3867b6756ac554b33ef	.cargo
040000 tree 7714769e97c483edb052ea14e7500735c04713eb	.github
100644 blob ebb410eb8266a8d6fbde8a9ffaf5db54a5fc979a	.gitignore
100644 blob fa1edfb73ce93054fe32d4eb35a5c4bee68c5bf5	ARCHITECTURE.md
100644 blob 9c1883ee31f4fa8b6546a7226754cfc84ada5726	CODE_OF_CONDUCT.md
100644 blob 9fac1017cb65883554f821914fac3fb713008a34	CONTRIBUTORS.md
100644 blob b009175dbcbc186fb8066344c0e899c3104f43e5	Cargo.lock
100644 blob 94b87cd2940697288e4f18530c5933f3110b405b	Cargo.toml
This means that checking out a Git commit is always fast: it's just as easy for Git to check out yesterday's commit as it is to check out a million commits ago. Git never needs to reapply 10,000 diffs to determine the current state because commits are never stored as diffs at all.
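To make the "checkout is always fast" point concrete, here is a minimal sketch of a snapshot-based store in Python. ToyRepo and its methods are hypothetical names made up for illustration, not anything Git provides, and the tree is flattened to a single dictionary. The only part that mirrors real Git is git_blob_id, which hashes a "blob <size>" header plus the content with SHA-1.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file content the way Git hashes a blob: SHA-1 of a header plus the bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical in-memory store where every commit points at a full snapshot."""

    def __init__(self):
        self.blobs = {}    # blob id -> file content
        self.commits = []  # each commit: {"parent": index or None, "tree": {path: blob id}}

    def commit(self, files: dict) -> int:
        tree = {}
        for path, content in files.items():
            blob_id = git_blob_id(content)
            self.blobs[blob_id] = content  # an unchanged file maps to the same blob id
            tree[path] = blob_id
        parent = len(self.commits) - 1 if self.commits else None
        self.commits.append({"parent": parent, "tree": tree})
        return len(self.commits) - 1

    def checkout(self, commit_index: int) -> dict:
        # One lookup per file in the snapshot; earlier commits are never visited.
        tree = self.commits[commit_index]["tree"]
        return {path: self.blobs[blob_id] for path, blob_id in tree.items()}


repo = ToyRepo()
first = repo.commit({"a.txt": b"one\n", "b.txt": b"same\n"})
second = repo.commit({"a.txt": b"two\n", "b.txt": b"same\n"})
```

In this sketch, checking out the first commit costs the same as checking out the second, and the unchanged b.txt is stored only once because both snapshots point at the same blob id.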
Snapshots are compressed using packfiles
I just said that Git commits are snapshots, but when someone says "in my mind a commit is a snapshot, but I think it's actually implemented as a diff", that's actually kind of true too! Git commits aren't stored as diffs in the sense you might be used to (they're not stored on disk as a diff from the previous commit), but the basic intuition is right: if you edit a 10,000-line file 500 times, storing 500 full copies of that file would be very inefficient.
Git does have a way to store objects as differences from other objects: packfiles. Git periodically packs objects into packfiles to save disk space, and when you git clone a repository, Git also compresses the data. Here is my rough understanding of how the deltas in a packfile work, and how they differ from diffs:
- The object is stored as a reference to the "original file" and a "delta"
- A delta is a series of instructions such as "read bytes 0 to 100, then insert the byte 'hello there', then read bytes 120 to 200". It pieces together new text from the original files. So there is no concept of "delete", only copy and add.
- I think there aren't many layers of deltas: I don't know how to check how many layers of deltas Git has to go through to get a given object, but my impression is that it's usually not many. Maybe fewer than 10 layers? I'd love to know how to actually find out, though.
- The original file does not have to be from the previous commit, it can be anything. Maybe it could even be from a later commit? I am not sure.
- There is no "correct" algorithm for calculating changes, Git just has some approximate heuristics
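The copy-and-insert idea in the list above can be sketched in a few lines of Python. The instruction tuples here are a simplified encoding I made up for illustration; Git's real packfile deltas use a compact binary format, so treat this as a picture of the idea rather than the actual format.

```python
def apply_delta(base: bytes, instructions: list) -> bytes:
    """Rebuild an object from a base object plus copy/insert instructions.

    Simplified, hypothetical encoding (not Git's binary packfile format):
      ("copy", start, end)  -> append base[start:end]
      ("insert", data)      -> append new bytes
    """
    out = bytearray()
    for instruction in instructions:
        if instruction[0] == "copy":
            _, start, end = instruction
            out += base[start:end]
        elif instruction[0] == "insert":
            out += instruction[1]
        else:
            raise ValueError(f"unknown instruction: {instruction[0]!r}")
    return bytes(out)


base = b"The quick brown fox jumps over the lazy dog"
delta = [
    ("copy", 0, 10),          # b"The quick "
    ("insert", b"red"),       # new bytes that are not in the base
    ("copy", 15, len(base)),  # skips b"brown": a "deletion" is just bytes never copied
]
new_object = apply_delta(base, delta)
# new_object == b"The quick red fox jumps over the lazy dog"
```

Notice that nothing in the delta says "delete brown". The word disappears only because no copy instruction covers those bytes.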
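What actually happens when you look at a diff is a bit weird
What actually happens when we run git show SOME_COMMIT to see the diff for a commit is a bit counterintuitive. My understanding is:
- Git looks in its packfiles and applies deltas to reconstruct the tree for that commit and the tree for its parent.
- Git compares the two trees. This is usually fast because almost all the files are identical, so Git can compare their hashes and skip them.
- Git computes a diff for the files that changed and displays it.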
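One way to picture this: the diff is computed on demand from two snapshots, and files whose hashes match are skipped without looking at their contents. Here is a minimal sketch of that idea in Python. The snapshot and show functions are hypothetical helpers, a flat dictionary stands in for a Git tree, and difflib stands in for Git's own diff code.

```python
import difflib
import hashlib


def snapshot(files: dict) -> dict:
    """Map each path to (content hash, content), standing in for a Git tree."""
    return {
        path: (hashlib.sha1(text.encode()).hexdigest(), text)
        for path, text in files.items()
    }


def show(parent: dict, commit: dict) -> str:
    """Compute a diff between two snapshots, skipping files whose hashes match."""
    out = []
    for path in sorted(set(parent) | set(commit)):
        old_hash, old_text = parent.get(path, (None, ""))
        new_hash, new_text = commit.get(path, (None, ""))
        if old_hash == new_hash:
            continue  # identical content: nothing to diff
        out.extend(difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        ))
    return "".join(out)


parent = snapshot({"README.md": "hello\n", "main.py": "print('hi')\n"})
commit = snapshot({"README.md": "hello\n", "main.py": "print('hello')\n"})
patch = show(parent, commit)
```

Here patch only mentions main.py, because README.md has the same hash in both snapshots and is never diffed.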
That said, the way I think about it is that Git stores commits as snapshots, and packfiles are just an implementation detail to save disk space and speed up cloning. I've never actually needed to know how packfiles work, but it does help me understand how Git can store commits as snapshots without taking up too much disk space.
A “wrong” Git understanding: commits are diffs
I think a fairly common "wrong" understanding of Git is:
- Commits are stored as diffs based on the previous commit (plus a pointer to the parent commit and author and message).
- To get the current state of a commit, Git needs to reapply all previous commits from scratch.
I think this "wrong" understanding is often very useful, and it doesn't seem to cause problems in day-to-day Git use. I really like that it makes the thing we look at most (diffs) the most basic element; it's very intuitive to me.
I've also been thinking about some other useful but "wrong" understandings of Git, such as:
- Commit messages can be edited (they can't, really: you make a copy of the commit with a new message, and the old commit still exists)
- Commits can be moved to a different base (similarly, they are copied)
I think there is a range of "wrong" understandings of Git that make perfect sense, are largely supported by the Git user interface, and do not cause problems in most cases. But it can get confusing when you want to undo a change or something goes wrong.
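Here's a small sketch of why "editing" a commit really means copying it. A commit's ID is a hash of its contents, so changing the message or the parent gives a different ID and therefore a different object. The commit_id function below is a simplified stand-in: real commits also have a committer line and timestamps, and the author name and the objects dictionary are made up for illustration.

```python
import hashlib


def commit_id(tree: str, parent: str, author: str, message: str) -> str:
    """Hash a simplified commit body the way Git hashes objects (header + body).

    Real commits also carry a committer line and timestamps; they are left out here.
    """
    body = f"tree {tree}\nparent {parent}\nauthor {author}\n\n{message}\n".encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


objects = {}  # hypothetical object store: commit id -> message

tree = "e197a79bef523842c91ee06fa19a51446975ec35"
parent = "26707359cdf0c2db66eb1216bf7ff00eac782f65"
other_parent = "24ad81d2439f9e63dd91cc1126ca1bb5d3a4da5b"

old_id = commit_id(tree, parent, "A. Author", "Fix tpyo")
objects[old_id] = "Fix tpyo"

# "Amending" the message cannot change the old object; it creates a second one.
new_id = commit_id(tree, parent, "A. Author", "Fix typo")
objects[new_id] = "Fix typo"

# A different parent also gives a different id, which is why a rebased commit is a copy.
moved_id = commit_id(tree, other_parent, "A. Author", "Fix typo")
```

After this runs, the store holds both the old and the new commit, which matches the "old commit still exists" behaviour described above.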
Some advantages of treating commits as diffs
Even if I know commits are snapshots in Git, I probably treat them as diffs most of the time because:
- Most of the time I focus on the changes I'm making - if I just change a line of code, obviously I'm mainly thinking about that line of code rather than the current state of the entire codebase
- You see the diff when you click on a commit on GitHub or use git show, so it's just what I'm used to seeing
- I use rebase a lot, and rebasing is all about reapplying diffs
Some advantages of treating commits as snapshots
But I also sometimes think of commits as snapshots because:
- Git often gets confused by file moves: sometimes I move a file and edit it, and Git doesn't recognize that it was moved and instead shows it as "old.py removed, new.py added". This is because Git only stores snapshots, so when it says "moved old.py -> new.py", it's just guessing because the contents of old.py and new.py are similar.
- It makes it easier to understand what git checkout COMMIT_ID is doing (the idea of reapplying 10,000 commits stresses me out)
- Merge commits look more like snapshots to me, since the merge commit can actually be anything (it's just a new snapshot!). This helped me understand why you can make arbitrary changes when resolving a merge conflict, and why it's important to be careful when resolving conflicts.
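The "it's just guessing" part of file moves can be sketched like this: given the files that disappeared and the files that appeared between two snapshots, pair up the ones whose contents are similar enough. This is only an illustration. guess_renames is a hypothetical helper, and difflib's ratio is a stand-in for Git's own similarity measure. The 0.5 threshold is chosen to echo Git's documented default of 50% for rename detection.

```python
import difflib


def guess_renames(removed: dict, added: dict, threshold: float = 0.5) -> list:
    """Pair each removed path with the most similar added path, if similar enough.

    difflib's ratio is a stand-in here; Git uses its own similarity measure.
    """
    renames = []
    for old_path, old_text in removed.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in added.items():
            score = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path))
    return renames


old = "def greet(name):\n    print('hello', name)\n\ngreet('world')\n"
light_edit = old.replace("'hello'", "'hi'")  # moved and slightly edited
rewrite = "import sys\n\nsys.exit(0)\n"      # moved and completely rewritten

moved = guess_renames({"old.py": old}, {"new.py": light_edit})
not_moved = guess_renames({"old.py": old}, {"new.py": rewrite})
```

With the light edit, the sketch reports old.py -> new.py. With the rewrite, it reports nothing, which is the "old.py removed, new.py added" situation.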
Some other ways of thinking about commits
Some of the Mastodon replies also mentioned:
- "Extra" out-of-band information about the commit, such as an email, a GitHub pull request, or a conversation you had with a colleague
- Thinking of a "diff" as a "before state" plus an "after state"
- And, of course, many people think about commits differently depending on the situation
Some other words people use when talking about commits that may be less ambiguous:
- "Revision" (seems more like a snapshot)
- "Patch" (looks more like a diff)
That’s it!
I have a hard time getting a sense of the different ways people understand Git. What's especially tricky is that, although "wrong" understandings are often very useful, people are so wary of "wrong" mental models that they're reluctant to share their "wrong" ideas, for fear that some Git explainer will show up and tell them exactly why they're wrong. (These Git explainers usually mean well, but it can have a negative impact regardless.)
Thanks to Marco Rogers, Marie Flanagan, and everyone on Mastodon for discussing Git commits with me.