Inconvenient syntax in source code

Some ways of expressing data in source code or in text-format data files cause unnecessary difficulty when the files are managed in git.

Difficulties take the form of excess differences that may bear little relation to the actual change. These excess differences are sometimes referred to as "diff noise", since the function of comparing files is named "diff".

The Cambridge Comma

A Web search for "cambridge comma" currently returns references to an April Fools' Day joke. In computing, however, there is a real need for a catchy term for an additional, technically redundant comma after the last item in a list!

Comma-delimited lists cause an irritating problem in revision control because they tend to make the last entry in the list special. Every other entry consists of an item of data followed by a separator. The final entry is unique in that it is not followed by a separator.

Consider the following sample:

{
  Item 1,
  Item 2,
  Item 3,
  Item 4
}

See how Item 4 is not interchangeable with the others. Now let's say I extend the list to:

{
  Item 1,
  Item 2,
  Item 3,
  Item 4,
  Item 5
}

To add one line I have had to alter the previous line as well, by appending a comma to Item 4.

This is a problem in revision control as it means the change takes place over more than the minimum number of lines.

If an additional trailing comma is added, the problem goes away:

{
  Item 1,
  Item 2,
  Item 3,
  Item 4,
  Item 5,
}

We can now add, delete and re-arrange the lines without regard to their position in the list.
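As one concrete case, C accepts a trailing comma in an array initialiser, so the pattern can be written directly. The following is only a minimal sketch; the names items, item_count and item_name are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list, named for illustration only. Every entry, including
   the last, ends with a comma, so lines can be added, removed or
   reordered without touching their neighbours. */
static const char *const items[] = {
    "Item 1",
    "Item 2",
    "Item 3",
    "Item 4",
    "Item 5",
};

/* Number of entries, derived by the compiler from the initialiser. */
size_t item_count(void)
{
    return sizeof items / sizeof items[0];
}

/* Bounds-checked access to one entry. */
const char *item_name(size_t index)
{
    assert(index < item_count());
    return items[index];
}
```

Adding an "Item 6" line to this list would show up in a diff as a single added line.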

While this may seem a trivial matter when editing manually, it can be an enormous help when using revision control, especially when combined with machine-generated file formats where the user does not control the order in which the list elements appear. JSON is an egregious offender, since it does not permit a trailing comma. In contrast, XML generally doesn't suffer from this, as each entry tends to be of the form <opening tag> content </closing tag> with no special lead or tail format, provided the XML is formatted nicely.

Where a trailing "Cambridge" comma is expressly disallowed, a dummy end record can be added to the same effect.
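The dummy end record idea can be sketched in C notation, with the caveat that C itself accepts a trailing comma, so the sentinel here only illustrates the principle. The names records, record_count and record_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record list. The NULL entry is a dummy end record: it
   carries no data and always stays last, so every real entry is
   followed by a comma. */
static const char *const records[] = {
    "Item 1",
    "Item 2",
    "Item 3",
    NULL  /* dummy end record, never moved or edited */
};

/* Count the real entries by walking the list up to the dummy record. */
size_t record_count(void)
{
    size_t n = 0;
    while (records[n] != NULL) {
        n++;
    }
    return n;
}

/* Bounds-checked access to one real entry. */
const char *record_at(size_t index)
{
    assert(index < record_count());
    return records[index];
}
```

Only the dummy record is special, and since it never changes it never appears in a diff.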

Unnecessary enumeration

Sometimes when listing data a file syntax requires the number of entries to be given separately. A classic example would be a dimensioned constant array.

Any change to the length of the content of the array must be accompanied by a change to the dimension. This is a nuisance as it is entirely possible for git to automatically track and merge multiple changes to the array's content, but the changes to the dimension value will overlap and need manual intervention.

The solution is typically to count the entries in the list and use the derived length instead. In C this can be achieved by leaving the dimension blank, so that the compiler sizes the array from its initialiser.
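A minimal sketch of the difference, using made-up array names (fixed_levels, levels) and a hypothetical accessor level_at:

```c
#include <assert.h>
#include <stddef.h>

/* Explicit dimension: adding an entry means editing the "4" as well, and
   two branches that each add an entry will both change this line. */
const unsigned char fixed_levels[4] = { 10, 20, 30, 40 };

/* Blank dimension: the compiler sizes the array from the initialiser, so
   adding an entry changes only the line that holds it. */
const unsigned char levels[] = {
    10,
    20,
    30,
    40,
};

/* Bounds-checked read; the limit follows the initialiser automatically. */
unsigned char level_at(size_t index)
{
    assert(index < sizeof levels / sizeof levels[0]);
    return levels[index];
}
```

With the second form, two developers can each append an entry and the only lines either of them touches are the ones they added.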

If the number of elements is actually required it may be derived using "sizeof", by dividing the size of the whole array by the size of one element. One complication of sizeof is that, in XC8 at least, the value was only valid in the module where the array was initialised, and even then only after the array initialisation. For this value to be usable elsewhere in the project it had to be assigned to a "constant" variable.
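The arrangement might look something like the sketch below. It is written as plain standard C and has not been checked against XC8 specifically; the names thresholds, thresholds_count and threshold_at are assumptions for illustration. Standard C has the same restriction: sizeof cannot be applied to an array that a module only sees as an extern declaration without a dimension, because its size is unknown there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup table, defined (and therefore sized) in this module. */
const unsigned int thresholds[] = {
    100,
    250,
    500,
};

/* Element count: total size in bytes divided by the size of one element.
   It is computed after the initialiser, where the array size is known,
   and stored in a "constant" variable so other modules can read it. */
const size_t thresholds_count = sizeof thresholds / sizeof thresholds[0];

/* Other modules would declare, typically in a shared header:
       extern const unsigned int thresholds[];
       extern const size_t thresholds_count;
   and use thresholds_count instead of applying sizeof themselves. */
unsigned int threshold_at(size_t index)
{
    assert(index < thresholds_count);
    return thresholds[index];
}
```

Adding a fourth threshold then needs no other edit: the count follows the initialiser.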

More unnecessary enumeration

Worse than the above are formats where the list entries are explicitly numbered.

Here's an extract from a project configuration file that manages to combine both problems, numbered entries and a separate count:

[EXPANDED_NODES]
0=MMC.mcppi
1=Sources
Count=2

A plain-text merge cannot work on this without manual intervention: inserting or removing an entry renumbers every entry after it and changes Count as well. To add or remove a line automatically, a merge tool has to be written for the specific list format.

Project file trivia

Many development environments have a "project file" that stores information about how to build (compile) the project. They may also store information about the configuration of the computer the program runs on, and lists of what files are open in the IDE, tab orders etc.

The problem is that if the project file committed to git contains information about computer configuration it will cause conflicts between developers who may not have the exact same setup. The solution is typically to store project file paths relative to the project folder, and exclude the absolute project path from the file.

Further to the above, if trivial IDE state information is stored in the project file then it is likely to change with every commit, regardless of whether the change is needed. The solution is to push IDE state information into a separate file.

Microchip partly addressed this in MPLAB X, where the IDE configuration has been split into two parts: one in nbproject and a second, machine-specific one in nbproject/private.

I would argue that there is a need for at least three separate configuration files: one for project configuration, one for machine-specific configuration and a third for IDE layout.