Meaningful Diffs

I used to think a 2,000-line diff meant I’d had an incredibly productive afternoon. Then I realized 1,980 of those lines were just pnpm-lock.yaml updating every sub-dependency because I bumped a single minor version of a linting tool. That kind of noise is a psychological tax on your reviewers. When a human sees a massive scrollbar, they stop looking for bugs and start looking for the "Approve" button just to make the headache go away.

The signal-to-noise ratio in your Pull Requests (PRs) is a choice. You don't have to let auto-generated gunk and binary blobs clutter your history.

The Secret Weapon: .gitattributes

Most developers are familiar with .gitignore, but its sibling, .gitattributes, is where the real magic happens. While .gitignore tells Git what to ignore entirely, .gitattributes tells Git how to *treat* the files it's already tracking.

Create a .gitattributes file in your project root. Here is how we start cleaning things up.

Hiding the Lockfile Churn

Lockfiles (like package-lock.json, poetry.lock, or Cargo.lock) are essential for reproducible builds, but they are rarely "human-readable" in any meaningful way during a review.

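You can tell Git to treat these as binary files so they don't show up as a wall of text. Keep in mind that binary is a macro for -diff -merge -text, so it also switches off text merging and line-ending conversion for these files; if you only want to suppress the diff, -diff on its own is enough: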

# Treat lockfiles as binary so they don't clutter diffs
package-lock.json binary
yarn.lock binary
pnpm-lock.yaml binary

When you run git diff now, instead of 500 lines of JSON changes, you’ll see:
Binary files a/package-lock.json and b/package-lock.json differ
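One way to confirm that Git is actually picking up the rule is git check-attr, which prints the attributes it resolves for a path. The sketch below runs that check in a throwaway repository (the temporary directory is only there to keep the example self-contained); in a real project you would run just the check-attr line from the repository root.

```shell
# Scratch repository so nothing here touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Same rule as in the article: mark the lockfile as binary.
printf 'package-lock.json binary\n' > .gitattributes

# Ask Git which attributes it resolves for the path.
# "binary" is a macro, so we query the three attributes it expands to.
attrs="$(git check-attr diff merge text -- package-lock.json)"
printf '%s\n' "$attrs"
```

If the rule is active, each of the three attributes is reported as unset. When you do want to read the lockfile changes, git diff --text -- package-lock.json should still show the full textual diff for that one invocation.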

If you are using GitHub, you can go a step further using the linguist attributes. This tells GitHub to hide the file by default in the PR UI:

package-lock.json linguist-generated=true

Dealing with Generated Code

If you’re working with Protobufs, GraphQL codegen, or OpenAPI clients, you’re likely checking in thousands of lines of machine-generated code. Reviewing these is a waste of time. If the schema changed, that's what the reviewer should look at, not the 5,000-line Java class that resulted from it.

Mark your generated directories:

# A trailing /** matches files at any depth; a single * stops at the first level
**/generated/** linguist-generated=true
src/api/client.ts linguist-generated=true

On GitHub, this collapses the file by default. The reviewer can still click "Load diff" if they’re suspicious, but the "Signal" (your manual logic changes) is no longer buried under the "Noise" (the machine's output).
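Attribute patterns are easy to get subtly wrong, because unlike .gitignore, a pattern that matches a directory does not automatically apply to the files inside it. The sketch below compares two patterns in a throwaway repository; the file path src/generated/api/User.java is a made-up example, and the file does not need to exist for git check-attr to evaluate it.

```shell
# Scratch repository so the experiment is isolated.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Pattern ending in a single *: only matches direct children of generated/.
printf '%s\n' '**/generated/* linguist-generated=true' > .gitattributes
shallow="$(git check-attr linguist-generated -- src/generated/api/User.java)"

# Pattern ending in /**: matches everything below generated/, at any depth.
printf '%s\n' '**/generated/** linguist-generated=true' > .gitattributes
deep="$(git check-attr linguist-generated -- src/generated/api/User.java)"

printf '%s\n%s\n' "$shallow" "$deep"
```

The first check reports the attribute as unspecified for the nested file, while the second reports it as true. Running git check-attr against a few real paths before pushing is a cheap way to catch this.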

Smarter Merging for Configs

Have you ever had a merge conflict in a .env.example or a CHANGELOG.md that felt unnecessary? You can assign a merge driver per path or file type.

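For example, if you want Git to keep the lines from both sides instead of stopping on a conflict, the built-in "union" driver does exactly that. It suits append-only files like a changelog, though it can leave lines in an unexpected order, so skim the result after merging. (Always keeping your own version of a file is also possible, but that requires defining a custom merge driver in your Git config.)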

# Keep both sides of the change for the changelog
CHANGELOG.md merge=union
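The "always keep your version" case needs one extra step, because Git has no built-in driver for it. A common approach is to define a custom driver whose command is simply true: it exits successfully without touching the file, so the current branch's copy wins. The sketch below demonstrates this in a throwaway repository; the driver name ours, the file contents, and the branch name feature are all illustrative choices, not anything Git mandates.

```shell
# Scratch repository with a throwaway identity for the demo commits.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Custom merge driver (name "ours" is our own choice): `true` leaves the
# current branch's version in place and reports success.
git config merge.ours.driver true
printf '.env.example merge=ours\n' > .gitattributes

printf 'API_URL=base\n' > .env.example
git add .
git commit -q -m "base"
main_branch="$(git symbolic-ref --short HEAD)"

# Change the file on a side branch...
git checkout -q -b feature
printf 'API_URL=feature\n' > .env.example
git commit -q -am "feature change"

# ...and differently on the original branch.
git checkout -q "$main_branch"
printf 'API_URL=mine\n' > .env.example
git commit -q -am "my change"

# Normally this would conflict; with the driver, our version is kept.
git merge -q --no-edit feature
cat .env.example
```

Like any git config setting, the driver definition lives outside the repository, so every contributor has to define it locally before the attribute has any effect.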

Better Diffs for Non-Text Files

Git is surprisingly capable if you give it a hint. If you have files that aren't exactly "code" but are still text-based (like Large Language Model prompts or long-form documentation), the default line-by-line diff is often useless because changing one word in a paragraph marks the whole block as "changed."

One way to fix this is by using Semantic Line Breaks (breaking lines at clauses), but if you can't change the writing style, you can at least tell Git to treat certain files differently.
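For plain-text prose, one option that needs no configuration at all is a word-level diff. The sketch below uses git diff --word-diff in a throwaway repository; the file name prompt.txt and its contents are made up for illustration.

```shell
# Scratch repository with a throwaway identity for the demo commit.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit a one-line "paragraph", then change a single word.
printf 'The quick brown fox jumps over the lazy dog.\n' > prompt.txt
git add prompt.txt
git commit -q -m "add prompt"
printf 'The fast brown fox jumps over the lazy dog.\n' > prompt.txt

# Word-level diff: removed words appear as [-...-], added words as {+...+}.
out="$(git diff --word-diff -- prompt.txt)"
printf '%s\n' "$out"
```

Instead of showing the whole line as removed and re-added, the output marks only the changed word inline. This only affects your local terminal; hosted review UIs apply their own rendering.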

For Word documents or PDFs (yes, really), you can actually configure Git to convert them to text before diffing. In your .gitattributes:

*.docx diff=word

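Then configure the matching driver in your local Git config. This assumes pandoc is installed, and because the setting lives in your own config rather than in the repository, each teammate has to run it once. The quotes matter: without them, Git tries to parse --to=plain as one of its own options.
git config diff.word.textconv "pandoc --to=plain"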

Now, git diff shows you the actual text changes in a Word doc rather than a message saying "binary files differ."
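The same idea can be extended to PDFs. The sketch below is one possible setup, assuming the pdftotext tool (from poppler-utils) is installed; the driver name pdf and the file name report.pdf are illustrative. Git appends the file path as an argument to the textconv command and reads the converted text from standard output, so the small sh -c wrapper is there to pass the trailing "-" that tells pdftotext to write to stdout.

```shell
# Scratch repository; in a real project, run the two config steps in place.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Route every PDF through a diff driver named "pdf" (name is our choice).
printf '*.pdf diff=pdf\n' > .gitattributes

# Assumption: pdftotext is on PATH. "$0" receives the path Git passes in,
# and the final "-" makes pdftotext print to stdout instead of a file.
git config diff.pdf.textconv "sh -c 'pdftotext -layout \"\$0\" -'"

# Verify both halves of the setup without needing an actual PDF.
git check-attr diff -- report.pdf
git config diff.pdf.textconv
```

The converted text is only used for display in git diff and git log -p; the PDF itself is still stored and merged as a binary file.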

A Cleaner Workflow

Setting up a .gitattributes file is a one-time task that pays dividends every time someone opens a PR. It respects your teammates' time.

Reviewing code is hard enough. Don't make people hunt for your logic in a haystack of node_modules updates and minified CSS. Mark your generated files, silence your lockfiles, and keep the focus on the code that actually matters.