Tree - source-git/mpg123 - CentOS Git server

Blame doc/libmpg123_speed.txt

Packit	c32a2d	`This is historic... one should make a new investigation.`
Packit	c32a2d	`What I can say is that a quick test of pre0.59s versus 1.7.3 with the generic decoder on my x86-64 GNU/Linux box is not able to call a winner (or loser, for that matter).`
Packit	c32a2d	`Though, 1.8.0 will make the new libmpg123 a winner, because there is new optimization code going on!`
Packit	c32a2d
Packit	c32a2d	`The move to libmpg123 means some more code separation / interfacing and especially the move of any local static variables into the mpg123_handle to make multiple stream handling possible.`
Packit	c32a2d	`That may very well have an impact on performance of the mpg123 decoder.`
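To illustrate the kind of change meant here, the following is a minimal sketch, not the actual libmpg123 code. The names `demo_handle`, `decode_frame_static` and `decode_frame_handle` are hypothetical and the real `mpg123_handle` layout is not shown. It contrasts a function-local static with state kept in a per-stream handle.

```c
/* Hypothetical per-stream state; NOT the real mpg123_handle layout. */
typedef struct {
    long frames_decoded;   /* would have been a local static before */
} demo_handle;

/* Old style: one hidden counter shared by every caller in the process,
   so two streams decoded side by side would disturb each other. */
long decode_frame_static(void)
{
    static long frames_decoded = 0;
    return ++frames_decoded;
}

/* Library style: the state is reached through the handle pointer, so each
   stream owns its counter. The extra dereference is where a performance
   difference could come from. */
long decode_frame_handle(demo_handle *h)
{
    return ++h->frames_decoded;
}
```

With two zero-initialised `demo_handle` objects, each one counts its own frames, while every caller of `decode_frame_static` shares a single counter.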
Packit	c32a2d	`I made some tests, even using gcc's -pg option and gprof, with mixed results: SSE and MMX on my Thinkpad X31 are slower, especially the asm synth function, while the generic code is fine.`
Packit	c32a2d	`On the other hand, on a K6-3+ using the same gcc version 4.1.2, the library-based mpg123 is _faster_ for MMX and 3DNowExt.`
Packit	c32a2d	`Especially the mmx synth is faster... while the 3DNowExt synth is slower, too (it's the same code as SSE synth, just calling different dct64) - but speedups in other regions still make 3DNowExt of the library mpg123 more efficient.`
Packit	c32a2d
Packit	c32a2d	`What I can clearly say is that dropping the multi-cpu support via ./configure --with-cpu does help for both monolithic and library mpg123, but that is no wonder as it removes indirection.`
Packit	c32a2d	`The main point stays, though: On my Thinkpad the library is slow, on the K6-3+ it's fast.`
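The indirection mentioned above can be pictured with a small sketch. It is an assumption-laden illustration, not mpg123's real dispatch code: `synth_generic`, `synth_mmx`, `demo_decoder` and the other names are hypothetical stand-ins. A multi-CPU build chooses the synth routine at run time and calls it through a function pointer, whereas a single-CPU build can call the routine directly.

```c
/* Hypothetical stand-ins for two synth implementations. */
static int synth_generic(int sample) { return sample * 2; }
static int synth_mmx(int sample)     { return sample * 2; }

typedef int (*synth_fn)(int);

/* Hypothetical decoder object holding the routine chosen at run time. */
typedef struct {
    synth_fn synth;
} demo_decoder;

/* Multi-CPU build: the routine is selected once, at run time. */
void demo_select_cpu(demo_decoder *d, int have_mmx)
{
    d->synth = have_mmx ? synth_mmx : synth_generic;
}

/* Every call goes through the pointer; the compiler generally cannot
   inline the target. */
int demo_decode_dynamic(const demo_decoder *d, int sample)
{
    return d->synth(sample);
}

/* Single-CPU build: the target is known at compile time, so the call is
   direct and may be inlined. */
int demo_decode_fixed(int sample)
{
    return synth_mmx(sample);
}
```

Both paths compute the same result; only the way the call is reached differs, which is the part a fixed-CPU build removes.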
Packit	c32a2d
Packit	c32a2d	`What's the point to take from this? I am not sure. We're depending on the compiler optimization (btw: Intel Compiler doesn't change the relation for the Thinkpad; not tested on the K6).`
Packit	c32a2d	`I guess that for my Thinkpad another gcc version could invert the picture again...`
Packit	c32a2d	`Also, I am not sure how far I should trust the gprof analysis... but it can be right; even when there is no apparent cause for the speed difference in the code itself, it could be some effect of cache and memory access.`
Packit	c32a2d	`Some reordering of instructions and data... for sure that happened.`
Packit	c32a2d
Packit	c32a2d	`I'll need further numbers to conclude anything about the (positive/negative) impact my code changes have.`
Packit	c32a2d
Packit	c32a2d	`OK, ran the test of trunk against branches/mpg123lib on my media box with AMD Geode (AthlonXP, actually):`
Packit	c32a2d
Packit	c32a2d	`thomas@kiste:~$ for i in mpg123-lib mpg123-trunk; do for cpu in mmx 3dnowext sse; do echo $i $cpu; time $i/src/mpg123 --cpu $cpu -q -t /thorma/var/music/metallica/ride_the_lightning/*.mp3; done; done`
Packit	c32a2d	`mpg123-lib mmx`
Packit	c32a2d
Packit	c32a2d	`real 0m25.949s`
Packit	c32a2d	`user 0m25.395s`
Packit	c32a2d	`sys 0m0.534s`
Packit	c32a2d	`mpg123-lib 3dnowext`
Packit	c32a2d
Packit	c32a2d	`real 0m25.442s`
Packit	c32a2d	`user 0m24.863s`
Packit	c32a2d	`sys 0m0.558s`
Packit	c32a2d	`mpg123-lib sse`
Packit	c32a2d
Packit	c32a2d	`real 0m25.794s`
Packit	c32a2d	`user 0m25.214s`
Packit	c32a2d	`sys 0m0.562s`
Packit	c32a2d	`mpg123-trunk mmx`
Packit	c32a2d
Packit	c32a2d	`real 0m26.650s`
Packit	c32a2d	`user 0m26.004s`
Packit	c32a2d	`sys 0m0.626s`
Packit	c32a2d	`mpg123-trunk 3dnowext`
Packit	c32a2d
Packit	c32a2d	`real 0m25.886s`
Packit	c32a2d	`user 0m25.262s`
Packit	c32a2d	`sys 0m0.600s`
Packit	c32a2d	`mpg123-trunk sse`
Packit	c32a2d
Packit	c32a2d	`real 0m25.695s`
Packit	c32a2d	`user 0m25.136s`
Packit	c32a2d	`sys 0m0.539s`
Packit	c32a2d	`thomas@kiste:~$ for i in mpg123-lib mpg123-trunk; do for cpu in 3dnow; do echo $i $cpu; time $i/src/mpg123 --cpu $cpu -q -t /thorma/var/music/metallica/ride_the_lightning/*.mp3; done; done`
Packit	c32a2d	`mpg123-lib 3dnow`
Packit	c32a2d
Packit	c32a2d	`real 0m33.011s`
Packit	c32a2d	`user 0m32.365s`
Packit	c32a2d	`sys 0m0.621s`
Packit	c32a2d	`mpg123-trunk 3dnow`
Packit	c32a2d
Packit	c32a2d	`real 0m32.830s`
Packit	c32a2d	`user 0m32.192s`
Packit	c32a2d	`sys 0m0.619s`
Packit	c32a2d
Packit	c32a2d
Packit	c32a2d	`You can't really make a decision there. It's tight.`
Packit	c32a2d	`What worries me a bit is the total loss of 3DNow against MMX - should it be that drastic?`
Packit	c32a2d	`Well, it's higher quality, at least.`
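To put the timings above into proportion, a small helper can express one run as a percentage of another. This is only a sketch; the function name `percent_slower` is made up for this note and is not part of mpg123.

```c
/* Relative difference of run b against run a, in percent.
   A positive result means b took longer than a. */
double percent_slower(double a_seconds, double b_seconds)
{
    return (b_seconds - a_seconds) / a_seconds * 100.0;
}
```

Fed with the user times listed above, `percent_slower(25.395, 26.004)` (library versus trunk, MMX) comes out at about 2.4 %, while `percent_slower(25.395, 32.365)` (library MMX versus library 3DNow) is about 27 %. The first gap is too small to call without repeated runs; the second is the drastic one.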
Packit	c32a2d
Packit	c32a2d
Packit	c32a2d	`Addendum: The game on a K6-3+`
Packit	c32a2d
Packit	c32a2d	`On mpg123 < 1.8.0, the 3DNowExt decoder used to be slower than the 3DNow decoder. Only recently has it been observed that simplifying the runtime decoder choice code sped it up significantly, to the same performance level as the single-decoder build of mpg123 1.6.4 (--with-cpu=3dnowext_alone has a broken build in later versions :-/).`
Packit	c32a2d	`We are talking about a difference of 20% here... there is something special about the K6-3+ that makes it that sensitive to how the function pointers get thrown around.`
Packit	c32a2d
Packit	c32a2d	`Example numbers: Dynamic x86 build of 1.6.4, 3DNowExt needs 5.9 s, 3DNow 5.6 s.`
Packit	c32a2d	`3DNowExt-only build: 4.9 s`
Packit	c32a2d	`3DNow-only build: 5.6 s`
Packit	c32a2d	`Now... dynamic build of mpg123 trunk of 2010-05-24: 3DNowExt 4.9 s, 3DNow 5.6 s. That's how it should be. One might investigate how exactly the old ways before mpg123 1.8 worked against the K6-3+ ... possibly helping performance issues seen with the mpg123 coded for MPlayer on that CPU.`
Packit	c32a2d
Packit	c32a2d	`--`
Packit	c32a2d	`Thomas.`
Packit	c32a2d
