I finally accrued enough credit on Google Play to purchase HJ Lim’s recording of 30 Beethoven Sonatas ($9.49). I pulled it up in VLC to give its metadata a cursory look-over, and I was horrified. The birdbrain in charge of typing this up had bungled it royally – typos and inconsistencies (album-wide, but more seriously, sometimes within the same sonata) plagued the 98-piece traversal.
Because Google Play (like every sensible digital music vendor) sells everything in MP3, I couldn’t use my own tagging tool (naklo, which I unwisely and unportably tied to metaflac) to fix the problem. In fact, I had never before encountered such a huge set of MP3s to be tagged batchwise. Ex Falso could do the job, but the per-song overhead (because I’m inept and can’t navigate simple GUIs at all) was far, far too high. I needed some way to see all the titles at once (at the least, the per-sonata titles two/three/four at a time), following the model I use with naklo.
mid3v2 came to my rescue. As of Fedora 23 it comes bundled with Mutagen – so it had been living quietly and anonymously on my system (Quod Libet and Ex Falso both pull in Mutagen) for a few years now. First things first – less a few modifications (including a typo and a really serious numbering anomaly), I exported the virgin tags straight into backup files.
for f in *.mp3; do mid3v2 -l "$f" > "$f".tags; done
So each MP3 file yields up its ID3 tags into a file of the same name, plus a “.tags” extension I tack on. Then I collected all the titles in one single file to streamline the fascist homogenization process. “-h” suppresses the filename prepended to the match, and “TIT2” is the ID3 tag for “title.”
grep -h "TIT2" ./*.tags > itles
(It is my convention to call the control file “ontrol” and the titles file “itles” so I can run “naklo -c ontrol -t itles…” it’s a cute way of making the typing fall under my fingers.)
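A small sketch of the collection step, for trying it out without touching real music files. The two `.tags` files and their contents are made-up stand-ins for `mid3v2 -l` output, and `collect_titles` is a hypothetical name used only for illustration. The pattern is anchored to `^TIT2=` on the assumption that a stricter match is wanted, so that some other frame that merely contains the string "TIT2" is not swept in.

```shell
#!/usr/bin/env bash
# Sketch of the collection step, run against fake data in a scratch directory.
# The file names and tag values below are invented stand-ins for `mid3v2 -l` output.
demo_dir=$(mktemp -d)
printf 'TPE1=HJ Lim\nTIT2=Sonata No.21 in C Opus 53: I. Allegro con brio\n' > "$demo_dir/01.mp3.tags"
printf 'TPE1=HJ Lim\nTIT2=Sonata No.21 in C Opus 53: II. Introduzione\n' > "$demo_dir/02.mp3.tags"

# collect_titles (hypothetical name): print the title line of every .tags file
# in glob order. -h drops the "filename:" prefix grep adds for multiple files.
collect_titles() {
    grep -h '^TIT2=' "$1"/*.tags
}

collect_titles "$demo_dir" > "$demo_dir/itles"
cat "$demo_dir/itles"
```

Because the glob sorts the `.tags` files the same way it sorts the MP3s they were exported from (at least for ordinary track-numbered names), line N of the titles file should correspond to file N.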
The dark descent
So all the titles now lived in the single file “itles.” I inhaled gently.
My inner librarian withered. I was diminished by the incontinence of this imbecile – I’ll call him “Hal” from now on – and how badly he had ruined the tagging on this box set.
The first in the cycle, the “Hammerklavier,” was given as “No.29 in Bb Opus 106.” (The first half of the box set roughly followed in this vein.) The last in the cycle, the jazzy no. 32, was tagged “No. 32 in C Minor, Op.111.” The space between “No.” and the number mysteriously slid over to separate the “Opus” from its number, the comma evaporated entirely, and “Opus” truncated itself to “Op.”
The strangest thing was watching Hal try to decide whether to use “Bb” or “B flat.” In fact, the “Moonlight” sonata suffered from this bizarre splinching:
TIT2=Sonata No. 14 in C Sharp Minor, Op.27 No.2 ‘Moonlight’: I. Adagio sostenuto
TIT2=Sonata No.14 in C# minor Opus 27 No.2 “Moonlight”: II. Allegretto
TIT2=Sonata No.14 in C# minor Opus 27 No.2 “Moonlight”: III. Presto agitato
and the remaining titles actually followed in the explicit “flat / sharp” manner.
The “Moonlight” editorial actually floats up another breakage: I consider single quotes sacred, so I avoid their use unless I want something written verbatim for bash. The division between double-quoted subtitles (“Hammerklavier”) and single-quoted ones (‘Pathetique’) was again roughly half-and-half, between the former and latter halves of the set.
Hal induced a rather special seizure in the “Waldstein:”
TIT2=Sonata No.21 in C Opus 53 “Waldstein”: I. Allegro con brio
TIT2=Sonata No.21 in C Opus 53 “Waldstein”: II. Introduzione: Adagio molto
TIT2=Piano Sonata No.21 in C Major, Op. 53 ‘Waldstein’: III. Rondo: Allegretto moderato – Prestissimo
The third movement was alone among the 98 pieces in having the “major” modifier stated explicitly, while the other 97 followed the convention of a bare “C” for major keys and “A minor” for minor ones.
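Before fixing anything, it can help to see how widespread each spelling is. The following is a rough sketch, not part of the original workflow: `count_variants` is a hypothetical helper that counts how many lines of a titles file use “Opus” versus “Op.”, and it assumes one title per line as in the “itles” file.

```shell
#!/usr/bin/env bash
# count_variants (hypothetical helper): tally the lines that use each spelling.
# $1 is a titles file with one "TIT2=..." line per track.
count_variants() {
    local file=$1
    # grep -c prints the number of matching lines (0 if there are none).
    printf 'Opus\t%s\n' "$(grep -c ' Opus ' "$file")"
    printf 'Op.\t%s\n' "$(grep -c ' Op\.' "$file")"
}

# Usage (assuming the titles file from earlier):
#   count_variants itles
```

The same idea extends to any other pair of variants by adding one `grep -c` line per pattern.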
“Jelly babies to manual”
The corner cases of the “Moonlight” and “Waldstein” were tweaked manually – there was no sense in writing generalized procedures to fix these singleton problems. The rest was justifiably automated.
All solutions provided in vim.
“Bb -> B flat” was easy:
:%s/ \([A-G]\)b / \1 Flat /g
Spacing issues are hardly worth mentioning:
:%s/ No\. / No./
:%s/ Opus / Op./
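For anyone who would rather not open vim, the same three substitutions can be sketched as a sed filter. This is an equivalent under assumptions, not what I actually ran: `normalize_titles` is a made-up name, and the dot in “No.” is escaped so it matches only a literal full stop.

```shell
#!/usr/bin/env bash
# normalize_titles (hypothetical name): sed version of the vim substitutions.
# Reads titles on stdin, writes the normalized titles to stdout.
normalize_titles() {
    sed -e 's/ \([A-G]\)b / \1 Flat /g' \
        -e 's/ No\. / No./' \
        -e 's/ Opus / Op./'
}

# Two fragments taken from the titles quoted earlier:
printf '%s\n' 'Sonata No.29 in Bb Opus 106' | normalize_titles
printf '%s\n' 'Sonata No. 32 in C Minor, Op.111' | normalize_titles
```

To apply it to the titles file, something like `normalize_titles < itles > itles.new` keeps the original around for comparison.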
Single-to-double quotes was a little tricky. Single-quoted subtitles should be changed to double-quoted ones, but single quotes afterward (“L’absence”) should be left alone:
:%s/ '\([^']\+\)'/ "\1"/
As it turns out this consideration was stupid on my part – I should have gone straight for a blanket single-to-double replacement and manually rolled back exceptions, because “Les Adieux” was the only title to actually feature a dangling single quote outside of its subtitle.
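The careful version of the quote fix can also be sketched in sed. It assumes the tags contain plain ASCII quotes, and `requote_subtitle` is a hypothetical name. Only the first single-quoted run that follows a space is converted, so an apostrophe inside a later word is left alone.

```shell
#!/usr/bin/env bash
# requote_subtitle (hypothetical name): turn the first 'single-quoted' subtitle
# on each line into a "double-quoted" one, leaving later apostrophes untouched.
requote_subtitle() {
    # \{1,\} is the portable spelling of "one or more" in basic regexes.
    sed "s/ '\([^']\{1,\}\)'/ \"\1\"/"
}

# Illustrative title, shortened from the real one:
printf '%s\n' "Sonata No.26 'Les Adieux': II. L'absence" | requote_subtitle
```

The apostrophe in “L'absence” survives for two reasons: there is no `g` flag, and it is not preceded by a space.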
Tagging it all together
The single non-baby step was actually writing the tags back to the files. I tore all the current ID3 tags out and cobbled up a bash script, titled “ontrol.sh.” It should mostly be self-explanatory, save the lonely sed usage (to trim some trailing whitespace on the track numbers). The disc subtitles were lifted from the covers of the individual albums on AllMusic (HJ also released all the sonatas across four double-decker sets).
I’m happy to say that I didn’t go the way of Hal on a bash script this elementary – with very few tweaks the whole thing ran fairly beautifully and produced the results I desired. Notice I entirely left out album, artist, and composer tags – I assumed the per-file overhead was not so high as to dominate the operation, so I thought it prudent to restrict the script’s operation to only the things that I couldn’t do immediately through Ex Falso (and it does just fine in applying a single constant tag to a lot of files).
The final result, a far cry from Hal’s horrible hodgepodge, is something I can listen to without much guilt.
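For the curious, the title half of a write-back can be sketched roughly as follows. To be clear, this is not “ontrol.sh”: `retitle` and `DRY_RUN` are names invented here, the sketch handles titles only (no track numbers or disc subtitles), and it assumes line N of the titles file belongs to the Nth file named on the command line. By default it only previews the `mid3v2 -t` commands it would run.

```shell
#!/usr/bin/env bash
# retitle (hypothetical helper): pair each "TIT2=..." line of a titles file
# with the MP3 in the same position and set that file's title with mid3v2 -t.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
retitle() {
    local titles_file=$1
    shift
    local line f
    while IFS= read -r line; do
        f=${1-}
        [ -n "$f" ] || break              # more titles than files: stop
        shift
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Preview only; the quoting here is for reading, not for pasting.
            printf 'mid3v2 -t "%s" "%s"\n' "${line#TIT2=}" "$f"
        else
            # </dev/null keeps mid3v2 from reading the titles file on stdin.
            mid3v2 -t "${line#TIT2=}" "$f" </dev/null
        fi
    done < "$titles_file"
}

# Usage: preview first, then run for real once the pairing looks right.
#   retitle itles *.mp3
#   DRY_RUN=0 retitle itles *.mp3
```

Previewing first is the cheap insurance here: a single missing or extra line in the titles file would otherwise shift every later title onto the wrong track.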