Rewriting project history for fun and profit
Cleaning up the version history of various firmware projects
Our Front panel adapter project started out as two separate programs:
The two were combined into a MikroBasic implementation of a front panel with a setup menu sometime around 2007. The result was used on a minor variant of the RFG, after which front panel development was largely suspended.
Later, in about 2014-2015, development of the standard RFG firmware resumed. A key development was the introduction of the "enhanced" front panel adapter, which enabled the addition of further controls without the need to redesign the main Analogue Controller PCB. Around this time the code base was translated line-for-line from Basic to C.
The process of translating the code was slow but surprisingly straightforward, apart from some unfortunate differences in the way source modules are managed.
After the code base was converted from Basic to C the project was placed under revision control using the "git" revision control system.
Prior to the use of git each major development had been retained by storing a copy of the entire project in a suitably named folder, but minor development stages were simply overwritten. The only way to really track the development path was through a large comment block at the beginning of the main section.
Revision control retains the history of the project as a series of date-stamped steps called "commits". At any time the project may be "rewound" to an earlier commit and that version tested, avoiding the need to duplicate the entire workspace. In addition it is possible to maintain parallel versions to support development in multiple directions that may later be reconciled with a "merge".
In short, revision control gives us a really deep "undo" capability and the ability to return to past versions. It also grants the designer the freedom to take development in multiple directions and reconcile them later.
With more organisation it allows multiple team members to work on the same project separately and simultaneously, then combine their work at a later date.
Due to inexperience and learning-by-doing the early project history of the front panel is slightly broken. In particular a significant number of files are in the history that should have been excluded. In addition the first four commits are incomplete, and do not contain all the required source files. By about the fifth commit the project is complete, containing all required source and configuration files, but at this point still also containing unwanted intermediate (non-source) files.
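It isn't possible to simply edit the history directly, as each commit "depends" on the previous one with strong integrity-checking: a commit's ID is a hash that covers both its content and the ID of its parent, so changing one commit changes the ID of every commit that follows it. In this respect Git resembles a blockchain. This is excessive for a one-user RCS but essential to shared Git projects.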
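The dependency between commits can be seen directly. This is only a sketch in a throwaway repository (the file name, messages and identity are made up for the demonstration): it prints the raw commit object, which names its parent by ID, then shows that merely rewording a commit gives it a new ID.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo "v1" > main.c
git add main.c
git commit -q -m "first"
echo "v2" > main.c
git commit -q -a -m "second"

# The raw commit object contains a "parent" line holding the ID of the previous commit.
git cat-file -p HEAD

before=$(git rev-parse HEAD)
# Same files, same parent, different message: the commit still gets a different ID.
git commit -q --amend -m "second (reworded)"
after=$(git rev-parse HEAD)
echo "before: $before"
echo "after:  $after"
```

Because the ID changes, anything that pointed at the old commit (including later commits) no longer matches, which is why the later commits have to be re-created rather than edited.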
A "rebase" operation allows us to create an alternative history and apply all subsequent changes to that history, leaving us with a new chain of commits almost identical to the original but with the alterations we specify.
Typically the original history is retained until a garbage collection operation removes it, and it is kept indefinitely if it is still referenced somewhere. My preference is to "name" the old history (give it a branch or tag) so it is retained, then run comparisons between the old and the new history. This is critical as "rebase" is one of the few operations in Git with potentially irreversible consequences.
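A minimal sketch of "naming" the old history and comparing afterwards. The branch name "was" and the file are my own choices for the demonstration, and an amend stands in for the real rebase:

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "added some comments"

# Name the old history so it stays referenced and cannot be garbage collected.
git branch was

# Stand-in for the rewrite: any operation that replaces the commit.
git commit -q --amend -m "Added some comments"

# The two histories now have different commit IDs...
git log --oneline was
git log --oneline HEAD

# ...but the files they contain should be identical.
if git diff --quiet was HEAD; then
    echo "old and new history contain the same files"
fi
```

Once the comparison has been made and the new history is trusted, the "was" branch can be deleted and the old commits will eventually be collected.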
MPLAB X Project file structure:
Project root contains source code, an editable "Makefile" and the ".gitignore" and ".gitattributes" configuration files
Note that some developers prefer to put source code in a subfolder. At this time I consider it a personal preference, though in some environments it may be mandatory.
".gitignore" contains a list of folders and files that are not to be stored in revision control. The .gitignore file itself IS typically stored under revision control, though it doesn't have to be. Note that if an "ignored" file has already been checked in, either because that happened before .gitignore was added or because the add was forced, it will continue to be tracked regardless of whether it is listed as excluded. If you add .gitignore to an existing project you may therefore still need to clean up manually.
".gitattributes" mostly contains instructions for handling various file types in the project. This tells Git if a file is a binary or a text file, and for text files if they are in Windows or Linux format. It is worth noting that MPLAB X appears to use JGit, a separate Java implementation of Git that lags behind the release version of Git, and it does not appear to check .gitattributes. This can give rise to compatibility issues.
.git folder: This is where the history is stored and will usually only be accessed using "git" commands.
build, debug, dist folders: These are where compiled code and intermediate files go, and should be excluded from the repository.
nbproject: This contains two files that are needed to reconstruct the project's configuration: configurations.xml and project.xml. It also contains a significant number of generated files that are usually excluded, and the "private" subfolder containing machine-specific configuration. For this reason the .gitignore rules for "nbproject" tend to be a bit complicated.
nbproject/private contains a SECOND configurations.xml file. This is deliberate: the MPLAB options have been split such that project configuration goes in the first one and computer-specific information such as install files is stored in the private one. By ignoring "private" we help ensure that the stored project is "portable" between computers.
nbproject: project.project appears to be a junk file of zero length. It is not always present and can usually be deleted.
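The layout above suggests a .gitignore along the following lines. This is a sketch of one possible set of rules, not the file actually used in the project; the generated file name "Makefile-default.mk" is only an illustration of the kind of file MPLAB X places in nbproject. The script writes the rules, creates some empty stand-in files and lists what Git would track.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q

cat > .gitignore <<'EOF'
# Compiled output and intermediate files
/build/
/debug/
/dist/
# Ignore everything inside nbproject...
/nbproject/*
# ...except the two files needed to reconstruct the project
!/nbproject/configurations.xml
!/nbproject/project.xml
EOF

# Empty stand-in files, just to exercise the rules.
mkdir -p build nbproject/private
touch main.c Makefile build/main.o
touch nbproject/configurations.xml nbproject/project.xml
touch nbproject/Makefile-default.mk nbproject/private/configurations.xml

git add .
git ls-files                           # lists only the files that were accepted
```

The "/nbproject/*" form ignores the contents of the folder rather than the folder itself, which is what allows the two "!" lines to bring individual files back. The private configurations.xml stays excluded because its parent folder is ignored.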
Filling in gaps in the opening commits of the Front Panel project
Given what we now know about MPLAB it is possible to inspect the project commits to determine what should have been excluded. Specifically it will be desirable to insert the correct "gitignore" at the outset. We'll also insert a "gitattributes" file but with only the minimum "* text=auto" configuration to reduce issues with line-endings.
The first commit is titled: "added some comments" and only contains the main source file, at this point still named fpa0862.c. Revision control removes the need to "version" the filename, so it is preferable to use one consistent filename.
The second commit is titled: "Added Idle hook to displays and Read_Analog", and is the one where the majority of the project is added to revision control, but not the configuration. This is also the point where the options filename is corrected from fpa0861options to fpa0862options.
The third commit is "Imported serial code" and just adds serial.c, a complete MAX3110 demo that has yet to be adapted into a library.
The fourth: "Serial code compiles" adds an incomplete "gitignore" file, the missing configurations and a lot of files that should have been ignored.
The sixth "Testing Serial" also adds some unwanted files.
In particular most of "nbproject" should have been excluded. These files pop up repeatedly in subsequent commits.
1: Make a "dry run" on a duplicate before attempting the proper operation. If too many errors occur then abandon the attempt and try again later. (This turned out to be unnecessary, as Git allows you to easily abandon the rebase.)
2: Initiate an interactive rewrite of "master"
3: Duplicate the first commit
4: Select editing of the first, second, third (was second) and fifth (was fourth) commits
5: When the first commit comes up amend it, adding the proper gitignore, removing any source files already present and adding all the 0.861 project files.
5b: Note that I now consider it preferable to put .gitignore and .gitattributes in a commit at the beginning before the project files, but this is a personal preference not a requirement.
6: The second commit should complete by itself, however if there is contention then the source file may need adding. Amend it to remove the old source file. When this is done Git will report that fpa0861 was renamed to fpa0862.
7: When the third commit comes up make sure the redundant fpa0861options files are removed, using amend if necessary. Again this makes it look like the files were renamed.
8: When the fifth commit comes up make sure the unwanted files are removed.
An example command would be "git rm --cached <filename>", which stages a deletion while leaving the file on disk. We then amend the previous commit: the add and the delete cancel out, leaving the file on disk but not committed. Then we deliberately remove the unwanted files from nbproject so any subsequent commits referencing those files will fail and need merging. Note that the details of when to use "rm --cached" and when to use "reset" are complex, and explained better elsewhere.
9: Subsequent commits may fail because they contain changes to excluded files. The procedure is simply to "rm --cached" these files so that the "rebase" can continue. Deliberately remove the unwanted files from nbproject.
It should be noted that there is an easier way to do the file removal: it is possible to bulk scan a whole repository and remove files matching a pattern, but "git filter-branch" is a separate subject.
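The "rm --cached" then "amend" step can be tried in isolation. This is a sketch in a throwaway repository with made-up file names; during the real rebase the same two commands are run while the rebase is paused on the commit being edited.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# A commit that wrongly includes a generated file.
mkdir nbproject
echo "int main(void) { return 0; }" > main.c
echo "generated" > nbproject/Makefile-default.mk
git add .
git commit -q -m "Serial code compiles"

# Stage a deletion of the unwanted file but leave it on disk...
git rm -q --cached nbproject/Makefile-default.mk
# ...then fold that deletion into the commit that added it.
git commit -q --amend --no-edit

git ls-files                           # files in the amended commit
ls nbproject                           # the generated file is still on disk
```

After the amend the commit looks as if the generated file had never been added, which is the effect wanted for each of the early commits.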
git rebase -i HEAD~<number> performs an interactive in-place rebase of the current branch. It opens a text editor listing the last <number> commits OLDEST FIRST (the reverse of how commits are usually visualised).
git rebase -i --root goes all the way back to the beginning.
When using "rebase" interactively the text editor is often "vim", which operates in a slightly obscure way. To perform conventional editing place the cursor where you wish to start and press "i", then press escape when done and type ":wq" followed by Enter to save and exit.
Alternatively some configurations (github shell) just open notepad instead
"git status" indicates which files are staged to be committed
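For experimenting it can be handy to drive the interactive rebase from a script instead of an editor. The sketch below is an assumption-laden demonstration rather than the procedure I followed: it relies on the GIT_SEQUENCE_EDITOR environment variable and on GNU sed, and the file contents and .gitignore entries are made up. It marks the first commit for editing, adds a .gitignore to it, and lets the rebase replay the rest.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo "v1" > main.c
git add main.c
git commit -q -m "added some comments"
echo "v2" > main.c
git commit -q -a -m "Imported serial code"

# Instead of opening vim, let sed change the first "pick" in the list to "edit".
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" git rebase -i --root

# The rebase is now paused on the first commit: amend it to add .gitignore.
printf 'build/\ndist/\n' > .gitignore
git add .gitignore
git status --short                     # shows what is staged for the amend
git commit -q --amend --no-edit

# Replay the remaining commits on top of the amended one.
git rebase --continue

git log --oneline
```

If anything goes wrong while the rebase is paused, "git rebase --abort" puts the branch back where it started.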
On the first test run a significant number of "Merge conflicts" sprang up. Merging is supposed to be the correct way to resolve conflicts between the old and new history, but there should not have been any conflicts. Most of these were line-ending problems. With an inconsistent line-ending configuration, different installs of Git see the same file differently. ".gitattributes" fixes this.
At first it also looked as if there was no simple way to "back out" once a conflict had appeared. Normal commits can be reset and rewritten using the "splitting" procedure or by amending, but a conflicted step has to be resolved before the rebase will move on. The alternative is "git rebase --abort", which abandons the whole rebase and returns the branch to where it started. This is tricky as the merge procedure is quite scary when you are unfamiliar with it.
git checkout --theirs <filename> retrieves the newer version
git checkout --ours <filename> retrieves the older version. (This contradicts how a "merge" normally works: in rebase "ours" represents the status quo and "theirs" represents the new.)
git rm --cached <filename> successfully unstages the unwanted files
Using the above it was possible to rebase a test version, however the "github" git version had persistent problems merging XML files and tripped over CR/LF issues frequently.
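The meaning of "ours" and "theirs" during a rebase is easy to get backwards, so here is a small experiment. It is only a sketch: the branch name "topic" and the file are invented, and the conflict is manufactured on purpose.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
base=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on the install

echo "original" > notes.txt
git add notes.txt
git commit -q -m "base"

git checkout -q -b topic
echo "from topic" > notes.txt
git commit -q -a -m "topic change"

git checkout -q "$base"
echo "from base" > notes.txt
git commit -q -a -m "base change"

# Replay "topic" on top of the base branch; this stops with a conflict.
git checkout -q topic
git rebase "$base" || true

# "theirs" is the commit being replayed (the newer work from topic);
# "ours" would be the history already in place underneath it.
git checkout --theirs notes.txt
cat notes.txt

git add notes.txt
GIT_EDITOR=true git rebase --continue  # "true" accepts the commit message unchanged
```

Swapping "--theirs" for "--ours" keeps the text from the base branch instead.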
A subsequent attempt with ".gitattributes" added completed without the problems.
Further it was a trivial matter to point the master branch to the new location on contract1971 and to rebase devel onto its new home:
git rebase --onto <destination> <source> <branch>
Footnotes: Compatibility with MPLAB/Netbeans
There is an important difference in terminology: if you are checking out a revision in Netbeans then "Revert" just means discard the current changes.
In Git "revert" often, but not always, means create a commit that undoes a previous commit.
The "switch" operation in Netbeans appears to be the Git "reset" function, which is important but "unsafe".
Monday re-try using Thursday 14/9/2017 backup as base (to remove Friday's "hacking")
The "Modifications to I2C" commit has an unwanted disassembly file. Removed.
MPLAB git may be set to treat all files as binary?
contract1971 vfd7000 compiles (other variants currently need work anyway)
Checking out revisions in MPLAB: MPLAB tends to choke if there are major changes to project settings, so if project.xml changes then it is best to close the project, perform the check-out in Git GUI, then re-open the project.
master vfd7000 compiles
0861 variants compile but "complain"
Further work: splicing a project history.
The Frequency Agile project was split into two projects for reasons that seemed good at the time.
This is not too hard to fix.
Start with the newest project. Create an extra "was" branch at the head so it will be retained.
Add the old project as a "remote" and "fetch" it. This will leave you with two histories.
Create an "old" branch at the head of the remote's history. I thought I had to force this using its SHA, but actually "create new branch" should have done it.
Remove the remote. Now we have a master, a "was" and an "old" commit chain.
Check out "old" then delete all the project files and check in the result. This is important, the newer commits start from an empty state so to join them up Git needs to see an empty project. The "old" chain should end in an empty commit.
Rebase master onto old:
git checkout master
git rebase --onto old --root
This should proceed without conflict since there were no files at the head of old.
The new history has a gap where everything was deleted. Perform an interactive rebase and "squash" the following empty commit. This will close the gap enabling changes to be tracked across the join.
After the join use the "diff" function to compare master with "was" to confirm there is no change
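The whole splice can be rehearsed on two toy repositories before touching the real ones. This is a sketch under my own assumptions: the folder names, remote name and file contents are invented, and the final "squash" of the empty commit is left out because it needs the interactive editor.

```shell
set -e
cd "$(mktemp -d)"

# Two unrelated repositories standing in for the old and the new project.
for name in oldproj newproj; do
    git init -q "$name"
    git -C "$name" config user.email "demo@example.invalid"
    git -C "$name" config user.name "Demo"
done
echo "old code" > oldproj/main.c
git -C oldproj add main.c
git -C oldproj commit -q -m "Old project"
echo "new code" > newproj/main.c
git -C newproj add main.c
git -C newproj commit -q -m "New project"

cd newproj
current=$(git symbolic-ref --short HEAD)               # "master" or "main"
oldhead=$(git -C ../oldproj symbolic-ref --short HEAD)

git branch was                          # keep the unspliced history for comparison
git remote add oldremote ../oldproj
git fetch -q oldremote
git branch --no-track old "oldremote/$oldhead"
git remote remove oldremote

# End the old chain with an empty project so the new commits apply cleanly.
git checkout -q old
git rm -q -r .
git commit -q -m "Empty the project ready for the join"

# Replay the whole of the new history on top of the old one.
git checkout -q "$current"
git rebase -q --onto old --root

git log --oneline
git diff --quiet was HEAD && echo "no change compared with was"
```

The log should show the old commit, the emptying commit and the new commit in one chain, and the final comparison confirms that the files at the head are unchanged.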
Resolution of the line-endings issue
Background to the line-endings issue:
Different computer systems use different line ending markers, and this causes problems when plain text is transferred from one computer to another.
MS-DOS derived systems use "CRLF", two bytes.
Linux uses "LF"
Many GNU tools use "LF" even when ported to Windows.
Some really old systems use "CR".
Git internally expects text files to use "LF" but can tolerate "CRLF".
The official strategy for Git on Windows is to convert text files to Linux format when checking them in and convert them back when checking out.
The ".gitattributes" file should list file types and their correct conversion strategy.
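A sketch of what such a file might look like. Only the "* text=auto" line comes from the project; the per-type rules are hypothetical examples of the syntax. The script writes the file and asks Git how it would treat a few file names (the files do not need to exist).

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q

cat > .gitattributes <<'EOF'
# Default: let Git detect text files and store them with LF endings
* text=auto
# Hypothetical per-type rules, shown only to illustrate the syntax
*.c    text
*.h    text
*.xml  text
*.pdf  binary
EOF

# Report the "text" attribute Git assigns to each name.
git check-attr text -- main.c manual.pdf README
```

A value of "set" means the file is always treated as text, "unset" means never, and "auto" leaves the decision to Git's content detection.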
A common alternative in Windows (apparently used by MPLAB) is to perform no conversions at all. In order to make MPLAB git follow the convention followed by later "gits" it is necessary to make a configuration change in the repository (not global): in the repository config add "autocrlf=true" to the section "[core]".
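The same change can be made from the command line rather than by editing the config file by hand. A minimal sketch in a throwaway repository; I have not confirmed how MPLAB's built-in Git reacts to it beyond what is described above.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q

# --local writes to this repository's .git/config only, not the global settings.
git config --local core.autocrlf true

git config --local --get core.autocrlf
cat .git/config                        # the setting appears under the [core] section
```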
Problems occur when a Windows only project is transferred to Linux.
It may be preferable to "normalize" line endings.
There are three strategies to follow:
1: Given the choice, start a project with a clear ".gitattributes" configuration.
2: Have a change-over commit and recommit everything that is affected, then use normalised format from that point on. This may still cause grief if anyone has to access the history.
3: Rewrite the whole history to the correct format. This is a scary option but ultimately preferable.
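Strategy 2, the change-over commit, can be sketched as follows. This assumes a Git new enough to have "git add --renormalize" (2.16 or later) and uses an invented two-line file; blob sizes are printed before and after to show the CR bytes being dropped from the stored copy.

```shell
set -e
cd "$(mktemp -d)"                      # throwaway repository
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config core.autocrlf false         # store files exactly as they are on disk

# A file with Windows line endings, committed without any conversion.
printf 'line one\r\nline two\r\n' > main.c
git add main.c
git commit -q -m "CRLF file committed as-is"
git cat-file -s HEAD:main.c            # stored size including the CR bytes

# Change-over commit: add the attributes file and re-add every tracked file.
echo "* text=auto" > .gitattributes
git add .gitattributes
git add --renormalize .
git commit -q -m "Normalise line endings"
git cat-file -s HEAD:main.c            # stored size after conversion to LF
```

The earlier commit still holds the CRLF version, which is the "grief" mentioned in strategy 2: comparisons across the change-over show every line as modified.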