Blame docs/diff-internals.md

Packit Service 20376f
Diff is broken into four phases:

1. Building a list of things that have changed.  These changes are called
   deltas (`git_diff_delta` objects) and are grouped into a `git_diff_list`.
2. Applying file similarity measurement for rename and copy detection (and
   to potentially split files that have changed radically).  This step is
   optional.
3. Computing the textual diff for each delta.  Not all deltas have a
   meaningful textual diff.  For those that do, the textual diff can
   either be generated on the fly and passed to output callbacks or can be
   turned into a `git_diff_patch` object.
4. Formatting the diff and/or patch into standard text formats (such as
   patches, raw lists, etc.).

In the source code, step 1 is implemented in `src/diff.c`, step 2 in
`src/diff_tform.c`, step 3 in `src/diff_patch.c`, and step 4 in
`src/diff_print.c`.  Additionally, when it comes to accessing file
content, everything goes through diff drivers that are implemented in
`src/diff_driver.c`.
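
As a rough illustration of phase 1, the sketch below walks two path-sorted
item lists in step and records one delta per difference.  The `sketch_*`
types and function are hypothetical stand-ins rather than the real
structures, and comparing string ids is an assumption made to keep the
example small; it is not how `src/diff.c` is written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for one side of a delta and for the delta itself. */
typedef struct { const char *path; const char *id; } sketch_file;
typedef enum { SKETCH_ADDED, SKETCH_DELETED, SKETCH_MODIFIED } sketch_status;
typedef struct { sketch_status status; const char *path; } sketch_delta;

/* Walk two path-sorted lists in step and record one delta per difference.
 * Returns the number of deltas written, which never exceeds cap. */
static size_t sketch_build_deltas(const sketch_file *old_side, size_t n_old,
                                  const sketch_file *new_side, size_t n_new,
                                  sketch_delta *out, size_t cap)
{
    size_t i = 0, j = 0, n = 0;

    while ((i < n_old || j < n_new) && n < cap) {
        int cmp;

        if (i == n_old)
            cmp = 1;                 /* only new-side items remain */
        else if (j == n_new)
            cmp = -1;                /* only old-side items remain */
        else
            cmp = strcmp(old_side[i].path, new_side[j].path);

        if (cmp < 0) {               /* path exists only on the old side */
            out[n].status = SKETCH_DELETED;
            out[n].path = old_side[i].path;
            n++;
            i++;
        } else if (cmp > 0) {        /* path exists only on the new side */
            out[n].status = SKETCH_ADDED;
            out[n].path = new_side[j].path;
            n++;
            j++;
        } else {                     /* same path: a delta only if ids differ */
            if (strcmp(old_side[i].id, new_side[j].id) != 0) {
                out[n].status = SKETCH_MODIFIED;
                out[n].path = new_side[j].path;
                n++;
            }
            i++;
            j++;
        }
    }
    return n;
}
```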

External Objects
----------------

* `git_diff_options` represents user choices about how a diff should be
  performed and is passed to most diff generating functions.
* `git_diff_file` represents an item on one side of a possible delta.
* `git_diff_delta` represents a pair of items that have changed in some
  way; it contains two `git_diff_file` entries plus a status and other
  metadata.
* `git_diff_list` is a list of deltas along with information about how
  those particular deltas were found.
* `git_diff_patch` represents the actual diff between a pair of items.  In
  some cases, a delta may not have a corresponding patch (if the objects
  are binary, for example).  The content of a patch will be a set of hunks
  and lines.
* A `hunk` is a range of lines described by a `git_diff_range` (e.g. "lines
  10-20 in the old file became lines 12-23 in the new").  It will have a
  header that compactly represents that information, and it will have a
  number of lines of context surrounding added and deleted lines.
* A `line` is simply a line of data along with a `git_diff_line_t` value
  that tells how the data should be interpreted (e.g. context or added).
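
To make the hunk and line descriptions concrete, here is a small sketch that
turns the example range above ("lines 10-20 in the old file became lines
12-23 in the new") into a unified-diff style header and prefixes a line with
its origin character.  The `sketch_range` fields and the `sketch_*` helpers
are assumptions made for illustration; the real `git_diff_range` and
`git_diff_line_t` definitions may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical range type; the field names are assumptions for this sketch. */
typedef struct {
    int old_start, old_lines;   /* first line and line count in the old file */
    int new_start, new_lines;   /* first line and line count in the new file */
} sketch_range;

/* How a line of data should be interpreted, in unified-diff notation. */
typedef enum {
    SKETCH_LINE_CONTEXT = ' ',
    SKETCH_LINE_ADDED = '+',
    SKETCH_LINE_DELETED = '-'
} sketch_line_t;

/* Format the compact hunk header into buf; returns snprintf's result. */
static int sketch_hunk_header(char *buf, size_t len, const sketch_range *r)
{
    return snprintf(buf, len, "@@ -%d,%d +%d,%d @@",
                    r->old_start, r->old_lines, r->new_start, r->new_lines);
}

/* Format one line of data, prefixed with its origin character. */
static int sketch_format_line(char *buf, size_t len, sketch_line_t origin,
                              const char *text)
{
    return snprintf(buf, len, "%c%s", (char)origin, text);
}
```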

Internal Objects
----------------

* `git_diff_file_content` is an internal structure that represents the
  data on one side of an item to be diffed; it is an augmented
  `git_diff_file` with more flags and the actual file data.

    * it is created from a repository plus a) a `git_diff_file`, b) a `git_blob`,
      or c) raw data and size
    * there are three main operations on `git_diff_file_content`:

        * _initialization_ sets up the data structure and does what it can up to,
          but not including, loading and looking at the actual data
        * _loading_ loads the data, preprocesses it (i.e. applies filters) and
          potentially analyzes it (to decide if binary)
        * _free_ releases loaded data and frees any allocated memory

* The internal structure of a `git_diff_patch` stores the actual diff
  between a pair of `git_diff_file_content` items

    * it may be "unset" if the items are not diffable
    * it may be "empty" if the items are the same
    * otherwise it will consist of a set of hunks, each of which covers some
      number of lines of context, additions and deletions
    * a patch is created from two `git_diff_file_content` items
    * a patch is fully instantiated in three phases:

        * initial creation and initialization
        * loading of data and preliminary data examination
        * diffing of data and optional storage of diffs
    * (TBD) if a patch is asked to store the diffs and the size of the diff
      is significantly smaller than the raw data of the two sides, then the
      patch may be flattened using a pool of string data

* `git_diff_output` is an internal structure that represents an output
  target for a `git_diff_patch`
    * It consists of file, hunk, and line callbacks, plus a payload
    * There is a standard flattened output that can be used for plain text output
    * Typically we use a `git_xdiff_output` which drives the callbacks via the
      xdiff code taken from core Git.

* `git_diff_driver` is an internal structure that encapsulates the logic
  for a given type of file
    * a driver is looked up based on the name and mode of a file.
    * the driver can then be used to:
        * determine if a file is binary (by attributes, by `git_diff_options`
          settings, or by examining the content)
        * give you a function pointer that is used to evaluate function context
          for hunk headers
    * At some point, the logic for getting a filtered version of file content
      or calculating the OID of a file may be moved into the driver.