Thomas Orgis on RVA, Gain and Pain


Ok, so I'm going to add RVA/ReplayGain support... the problem there is not reading these values from ID3 or Lame tags, nor even doing the adjustment itself.
The problem is rather figuring out how to interpret the dB values one gets there.

Main players in the field of relative volume adjustment / soft gain (without modifying actual audio data):

http://www1.cs.columbia.edu/~cvaill/normalize/
	...writes RVA2 ID3v2 tags with the dB offset to a user-chosen target amplitude, the default being -12dB(FS)
http://www.replaygain.org/
	...stores the difference to a reference of 83dB(SPL) ... somewhere

Both calculate some running RMS and do statistics with it - the main difference is the potentially different target level.
Also, both know two basic types of adjustment: one per track, to make all tracks sound at the same level (track / radio), and one that by default keeps the loudness relations over albums (batch / audiophile).

dB can mean many things, and the raw value of a PCM sample does not translate directly to loudness (power of a wave != amplitude).

This is what ReplayGain says about applying the adjustment:

	scale=10.^(replay_gain/20);

Luckily, this is the same as what I worked out on my own for the normalize RVA values in my mixplayer script:

	return 10**($s/20);

I'll take that interpretation of dB -> linear scale factor for samples for granted, then.
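
For illustration, here is the same conversion as a small Python sketch. This is not the mpg123 code; the function names, the 16-bit sample range and the clipping step are assumptions made up for this example.

```python
def db_to_scale(gain_db):
    """Convert a gain in dB to a linear factor for sample amplitudes."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db, limit=32767):
    """Scale integer PCM samples and clip them to +/-limit.

    The clipping is an assumption of this sketch; the text above only
    covers the scale factor itself.
    """
    scale = db_to_scale(gain_db)
    out = []
    for s in samples:
        v = int(round(s * scale))
        out.append(max(-limit, min(limit, v)))
    return out
```

With this, a gain of 0dB leaves samples untouched, -6dB roughly halves the amplitude and +20dB multiplies it by ten.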

The replay_gain value is meant in the standard to represent the offset to 83dB(SPL - depending on your amplifier...), the idea being that the most wanted average playback level is 83dB(SPL) (defined by movie ppl as the loudness of a -20dB(FS) signal, leaving room for louder stuff).
But then there is the proposal to add a 6dB preamp for pop music - am I judging music types with mpg123??
These 6dB are in fact the real world, since lots of programs use 89dB(SPL...) as reference.
Thus, lame since 3.95.1 (according to MADplay's Rob Leslie, who discussed this with the Lame ppl; verified in the 3.96 source) stores the adjustment to 89dB.
To make it all sound the same, one should add 6dB to lame <3.95.1 ReplayGain values and use later ones verbatim - achieving 89dB every time, whatever that may mean in reality out of my speakers (my Marantz' volume knob doesn't have a scale at all - be it dB or percent;-).
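
The 6dB correction could be sketched in Python like this. The names are hypothetical, and how the caller learns which reference a stored value uses is deliberately left open; the flag is simply passed in.

```python
# Difference between the two reference levels: 89dB(SPL) - 83dB(SPL).
REFERENCE_SHIFT_DB = 6.0


def to_89db_reference(gain_db, relative_to_83db):
    """Return the gain relative to the 89dB reference.

    relative_to_83db is a hypothetical flag: True for values written
    against the 83dB reference (lame before 3.95.1 according to the
    text above), False for values that already use 89dB.
    """
    if relative_to_83db:
        return gain_db + REFERENCE_SHIFT_DB
    return gain_db
```

For example, a value of -13.4dB stored against 83dB becomes -7.4dB, while a value already relative to 89dB is passed through unchanged.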

A funny aspect of this 6dB issue is having to tell lame 3.95.1 from lame 3.95.

As for normalize... the desired playback level is essentially undefined. Ignoring that and realizing that mpg123 has no way to determine real world sound power anyway, one just has to take the provided dB values and apply them with the formula above.
The user is responsible for providing files with his desired settings... for that reason I also won't follow the ReplayGain demand/suggestion that a player should apply an average of the gains of previous tracks if the current one lacks a setting.

So, well. Considering that ReplayGain (at least the radio one) is stored by current lame on encoding, I suppose that if there are RVA2 values in ID3v2 tags, these were added by a conscious user act and override the ReplayGain ones.
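
That precedence rule, together with the decision not to average over previous tracks, might look like the following Python sketch. The function and parameter names are invented for this example; None stands for "no value found in the file".

```python
def choose_gain_db(rva2_db=None, replaygain_db=None):
    """Pick the gain to apply for one track.

    RVA2 (assumed to be set deliberately by the user) wins over
    ReplayGain from the Lame tag. If neither is present, no adjustment
    is made; in particular no average of earlier tracks is used.
    """
    if rva2_db is not None:
        return rva2_db
    if replaygain_db is not None:
        return replaygain_db
    return 0.0
```

A file carrying both an RVA2 value of 4.3dB and a ReplayGain of -7.4dB would thus be played with 4.3dB.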

I already read ReplayGain entries in the Lame tag... should add ID3v2 parsing. Especially since the Lame tag is ambiguous because of the 6dB issue... I cannot distinguish 3.95.1 from 3.95 by reading the tag - frick!
But wait... 6dB?

[thomas@thorvas /home/thomas-data/mpg123-neu/lame-3.96.1]$ frontend/lame --cbr -T /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3 ../testfiles/happy_man_lame-3.96.1.mp3
ID3v2 found. Be aware that the ID3 tag is currently lost when transcoding.
LAME version 3.96.1 (http://lame.sourceforge.net/)
Using polyphase lowpass filter, transition band: 17249 Hz - 17782 Hz
Encoding /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3
      to ../testfiles/happy_man_lame-3.96.1.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  6371/6374  (100%)|    1:41/    1:41|    1:47/    1:47|   1.6353x|    0:00
average: 128.0 kbps   LR: 754 (11.83%)   MS: 5620 (88.17%)

Writing LAME Tag...done
ReplayGain: -7.4dB
revmethod = 1
encoder padding: 1728

[thomas@thorvas /home/thomas-data/mpg123-neu/lame-3.95.1]$ frontend/lame --cbr -T /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3 ../testfiles/happy_man.mp3
ID3v2 found. Be aware that the ID3 tag is currently lost when transcoding.
LAME version 3.95  (http://www.mp3dev.org/)
Using polyphase lowpass  filter, transition band: 17249 Hz - 17782 Hz
Encoding /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3
      to ../testfiles/happy_man.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  6371/6374  (100%)|    1:36/    1:36|    1:48/    1:48|   1.7289x|    0:00
average: 128.0 kbps   LR: 759 (11.91%)   MS: 5615 (88.09%)

Writing LAME Tag...done
ReplayGain: -7.4dB

[thomas@thorvas /home/thomas-data/mpg123-neu/lame-3.95]$ frontend/lame --cbr -T /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3 ../testfiles/happy_man_lame-3.95.mp3
ID3v2 found. Be aware that the ID3 tag is currently lost when transcoding.
LAME version 3.95  (http://www.mp3dev.org/)
Using polyphase lowpass  filter, transition band: 17249 Hz - 17782 Hz
Encoding /mnt/knecht_mp3/music/covenant/2006_skyshaper/03-happy_man.mp3
      to ../testfiles/happy_man_lame-3.95.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  6371/6374  (100%)|    1:37/    1:37|    1:43/    1:43|   1.7041x|    0:00
average: 128.0 kbps   LR: 759 (11.91%)   MS: 5615 (88.09%)

Writing LAME Tag...done
ReplayGain: -13.4dB


Together with the gain values read from tags:

3.96.1:	-1.0dB	(claimed -7.4dB)
3.95:	-1.0dB	(claimed -7.4dB)
3.95:	-0.6dB	(claimed -13.4dB)

So, the difference of 6dB shows in the values lame prints on the command line... but the values in the Lame tags differ by only 0.4dB and are much smaller in magnitude anyway - do I parse them correctly?

Opinion of normalize on these files: -2dB. Great. I guess the -1 is what lame really meant, then...


Storage places
==============

Points 1, 2 and 4 are implemented to some extent.


1. Lame/Info tag

Supposedly in the format according to the proposed standard - but I have yet to verify that Lame really does this.
see http://gabriel.mp3-tech.org/mp3infotag.html
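
For reference, a Python sketch of decoding one 16-bit ReplayGain field as I understand the proposed standard: 3 bits name code, 3 bits originator code, 1 sign bit, 9 bits of absolute gain in steps of 0.1dB. That bit layout is an assumption to be checked against the page linked above, and the function name is made up.

```python
def decode_replaygain_field(raw):
    """Decode a 16-bit ReplayGain field (big-endian value as an int).

    Assumed layout, most significant bits first:
      3 bits name code (0 = not set, 1 = radio, 2 = audiophile)
      3 bits originator code
      1 bit  sign (1 = negative)
      9 bits absolute gain, in tenths of a dB
    Returns (name_code, originator_code, gain_db), or None if not set.
    """
    name_code = (raw >> 13) & 0x7
    originator_code = (raw >> 10) & 0x7
    negative = (raw >> 9) & 0x1
    tenths = raw & 0x1FF
    if name_code == 0:
        return None
    gain_db = tenths / 10.0
    return (name_code, originator_code, -gain_db if negative else gain_db)
```

Under that layout, the value 0x2E4A would decode to a radio gain of -7.4dB with originator code 3.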


2. ID3v2 RVA2 frame(s)

Normalize does that. Software reading them is rare.
I've never seen those frames myself, since id3v2 -l doesn't know them.


3. APE tags

Gah, another tag format. Foobar2000 uses this as default.
It's getting really messy, folks.


4. By convention in ID3 tags

Well, I myself once used the ID3v1 comment field for storing the mix RVA value (textual) ... but that is a tad too unspecific.
I then also used user-defined ID3v2 comments like this:

[thomas@thorvas /home/thomas-data/mpg123-neu/svn/trunk]$ id3v2 -l /mnt/knecht_mp3/music/underworld/second_toughest_in_the_infants/02-banstyle_sappys_curry.mp3
id3v1 tag info for /mnt/knecht_mp3/music/underworld/second_toughest_in_the_infants/02-banstyle_sappys_curry.mp3:
Title  : banstyle  sappys curry          Artist: underworld
Album  : second toughest in the infants  Year: 0   , Genre: Other (12)
Comment: Created by Grip                 Track: 2
id3v2 tag info for /mnt/knecht_mp3/music/underworld/second_toughest_in_the_infants/02-banstyle_sappys_curry.mp3:
TYER (Year): 0
TRCK (Track number/Position in set): 2
COMM (Comments): (ID3v1 Comment)[XXX]: Created by Grip
TCON (Content type): Other (12)
TPE1 (Lead performer(s)/Soloist(s)): underworld
TALB (Album/Movie/Show title): second toughest in the infants
TIT2 (Title/songname/content description): banstyle  sappys curry
COMM (Comments): (RVA)[]: 4.3291
COMM (Comments): (RVA_ALBUM)[]: 3.666101

That still doesn't look like a bad idea to me. No bothering with byte ordering and whatnot. Just atof(id3v2_comm_rva).
One could still add dB, though.

Another convention (from the rockbox mailing list, not checked myself) is the one used by Foobar:

TXXX (User defined text information): (replaygain_track_gain): -7.17 dB
TXXX (User defined text information): (replaygain_track_peak): 1.057122
TXXX (User defined text information): (replaygain_album_gain): -6.53 dB
TXXX (User defined text information): (replaygain_album_peak): 1.107456

So what are custom comment fields for when there are also custom text fields? They look very similar to me.
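
Both textual conventions boil down to "a number, optionally followed by dB". A minimal Python sketch of parsing such a value, tolerant of the unit and surrounding whitespace, could look like this (the function name and the exact set of accepted spellings are assumptions of this example):

```python
import re

# A signed decimal number, optionally followed by "dB" (any case).
_GAIN_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*(?:dB)?\s*$", re.IGNORECASE)


def parse_gain_text(text):
    """Return the gain as a float, or None if the text is not a gain value."""
    match = _GAIN_RE.match(text)
    if match is None:
        return None
    return float(match.group(1))
```

This accepts both the bare "4.3291" from the COMM example and the "-7.17 dB" from the TXXX example, and rejects unrelated comment text such as "Created by Grip".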


5. Leave the haunted music file alone and store metadata externally.

That's the only sane way for stuff like album art... and it's the way I do it in my music archive. The wrapper script reads the adjustment values and then sets an adjusted volume.
That's fine for my mixing daemon that manipulates the PCM data anyway, but it would be nice to have this functionality in the minimalist console mode, too.
Even more so since it can be done without additional CPU power during decoding (well, a one-time setup of the decode tables is needed for every track), similar to the equalizer.
I could simply start with text files with lines like

RVA_MIX: 3.4dB
RVA_ALBUM: 1.7dB

The problem here is that the effort to open and parse that extra file may hinder gapless decoding between tracks...
Well, one could parse all metadata files for a list of tracks before playback starts.
But all this won't work for streams via stdin (hm, one could argue about whether a stream needs RVA at all).
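
A minimal Python sketch of reading such a metadata file, assuming exactly the "KEY: value" lines shown above. The function name is hypothetical, and lines that do not fit the pattern are simply skipped.

```python
def parse_rva_lines(lines):
    """Parse lines like 'RVA_MIX: 3.4dB' into a dict of key -> gain in dB."""
    gains = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, not a KEY: value line
        value = value.strip()
        if value.lower().endswith("db"):
            value = value[:-2]  # drop the optional unit
        try:
            gains[key.strip()] = float(value)
        except ValueError:
            continue  # value is not a number, ignore the line
    return gains
```

Since the function takes any iterable of lines, it could be run over the metadata files of a whole playlist before playback starts, which would keep the file access out of the gap between tracks.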