alpaastero

Libav and Free Software development



RealAudio interleavers

RealAudio files have several possible interleavers. The simplest is “Int0”, which means that the packets are in order. Today, I was contrasting “Int4” and “genr”. They both require rearranging data, in highly similar but not identical ways. “genr” is slightly more complex than “Int4”.

A typical Int4 pattern, writing to subpackets 0, 1, 2, 3, etc., would read data from subpackets 0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 1, 7, 13, etc., in that order – assuming subpacket_h is 12, as it was in one sample file. It is effectively subpacket_h rows of subpacket_h / 2 columns, counting up by subpacket_h / 2 and wrapping every two rows.

A typical genr pattern is a little trickier. For subpacket_h = 14, and the same 6 columns per row as above, the pattern to read from looks like 0, 12, 24, 36, 48, 60, 72, 6, 18, 30, 42, 54, 66, 78, 1, etc.

I spent most of today implementing genr, carefully working with a paper notebook, pencil, Python, and a terse formula from the old implementation:

case DEINT_ID_GENR:
    for (x = 0; x < w/sps; x++)
        avio_read(pb, ast->pkt.data+sps*(h*x+((h+1)/2)*(y&1)+(y>>1)), sps);
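The two read orders quoted above can be reproduced with a short Python sketch. It works in units of whole subpackets and assumes each row holds cols subpackets. The genr destination index is taken from the formula above with sps divided out; the Int4 mapping is only inferred from the sequence quoted in this post, not from the Libav source. Both function names are hypothetical.

```python
def genr_read_order(h, cols):
    """For each destination subpacket, return the index of the subpacket read."""
    order = [None] * (h * cols)
    for y in range(h):            # y: row currently being read
        for x in range(cols):     # x: column within that row
            src = y * cols + x
            # Same expression as the C formula, in subpacket units.
            dst = h * x + ((h + 1) // 2) * (y & 1) + (y >> 1)
            order[dst] = src
    return order


def int4_read_order(h, cols):
    """Assumed Int4 mapping: a plain rows/columns transpose."""
    order = [None] * (h * cols)
    for y in range(h):
        for x in range(cols):
            order[h * x + y] = y * cols + x
    return order


print(int4_read_order(12, 6)[:15])
print(genr_read_order(14, 6)[:15])
```

The first 15 entries of each list match the Int4 and genr sequences given earlier, which is a handy cross-check when working through the formula by hand.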

After various debug printfs, a lot of quality time in GDB running commands like x /94x (pkt->data + 14 * 94), a few interestingly garbled bits of audio playback, and a mentor pointing out I have some improvements to make on header parsing, I can play (some) genr files.

I have also recently implemented SIPR support, and it works in both RA and RM files. RV10 video also largely works.



Framecrc

Today, I learned how to use framecrc as a debug tool. Many Libav tests use framecrc to compare expected and actual decoding. While rewriting existing code, the output from the old and new versions of the code on the same sample can be checked; this makes a lot of mistakes clear quickly, including ones that can be quite difficult to debug otherwise.

Checking framecrcs interactively is straightforward: ./avconv -i somefile -c:a copy -f framecrc -. The -c:a copy specifies that the original, rather than decoded, packet should be used. The - at the end makes the output go to stdout, rather than a named file.

The output has several columns, for the stream index, dts, pts, duration, packet size, and crc:

0, 0, 0, 192, 2304, 0xbf0a6b45
0, 192, 192, 192, 2304, 0xdd016b78
0, 384, 384, 192, 2304, 0x18da71d6
0, 576, 576, 192, 2304, 0xcf5a6a07
0, 768, 768, 192, 2304, 0x3a84620a

It is also unusually simple to find out what the fields are, as libavformat/framecrcenc.c spells it out quite clearly:

static int framecrc_write_packet(struct AVFormatContext *s, AVPacket *pkt)
{
    uint32_t crc = av_adler32_update(0, pkt->data, pkt->size);
    char buf[256];

    snprintf(buf, sizeof(buf), "%d, %10"PRId64", %10"PRId64", %8d, %8d, 0x%08"PRIx32"\n",
             pkt->stream_index, pkt->dts, pkt->pts, pkt->duration, pkt->size, crc);
    avio_write(s->pb, buf, strlen(buf));
    return 0;
}
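As a rough illustration of what that function prints, here is a Python sketch that builds one framecrc-style line. It assumes av_adler32_update implements the standard Adler-32 algorithm and that its first argument is the starting value (0 in the code above); the helper names adler32 and framecrc_line are hypothetical and not part of Libav.

```python
MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data, seed=0):
    """Standard Adler-32 over a bytes object, starting from the given seed."""
    s1 = seed & 0xFFFF
    s2 = (seed >> 16) & 0xFFFF
    for byte in data:
        s1 = (s1 + byte) % MOD_ADLER
        s2 = (s2 + s1) % MOD_ADLER
    return (s2 << 16) | s1


def framecrc_line(stream_index, dts, pts, duration, data):
    """Format the six fields in the same order as the snprintf call above."""
    return "%d, %10d, %10d, %8d, %8d, 0x%08x\n" % (
        stream_index, dts, pts, duration, len(data), adler32(data))


# A made-up packet of 2304 zero bytes, only to show the layout of a line.
print(framecrc_line(0, 0, 0, 192, bytes(2304)), end="")
```

Note that the seed of 0 differs from zlib's default of 1, so checksums computed this way will not match a plain zlib Adler-32 of the same bytes.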

Keiler, one of my Libav mentors, patiently explained the above; I hope documenting it helps other people who are starting with Libav development.



Microbenchmarking: a null result

My first patch for undefined behavior eliminates left shifts of negative numbers, replacing a << b (where a can be negative) with a * (1 << b). This change fixes bug 686, at least for fate-idct8x8 and libavcodec/dct-test -i (compiled with ubsan and -fno-sanitize-recover). Due to Libav policy, the next step is to benchmark the change. I was also asked to write a simple benchmarking HowTo for the Libav wiki.

First, I installed perf: sudo aptitude install linux-tools-generic
I made two build directories, and built the code with defined behavior in one, and the code with undefined behavior in the other (with ../configure && make -j8 && make fate). Then, in each directory, I ran:

perf stat --repeat 150 ./libavcodec/dct-test -i > /dev/null

The results were somewhat more stable than with --repeat 30, but it still looks much more like noise than a meaningful result. I ran the command with --repeat 30 for both before the recorded 150 run, so both would start on equal footing. With defined behavior, the results were “0.121670022 seconds time elapsed ( +-  0.11% )”; with undefined behavior, “0.123038640 seconds time elapsed ( +-  0.15% )”. The best of a further three runs had the opposite result, shown below:

% cat undef.150.best

perf stat --repeat 150 ./libavcodec/dct-test -i > /dev/null

Performance counter stats for './libavcodec/dct-test -i' (150 runs):

120.427535 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
21 context-switches # 0.178 K/sec ( +- 1.88% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
226 page-faults # 0.002 M/sec ( +- 0.01% )
455'393'772 cycles # 3.781 GHz ( +- 0.05% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1'306'169'698 instructions # 2.87 insns per cycle ( +- 0.00% )
89'674'090 branches # 744.631 M/sec ( +- 0.00% )
1'144'351 branch-misses # 1.28% of all branches ( +- 0.18% )

0.120741498 seconds time elapsed

% cat def.150.best

Performance counter stats for './libavcodec/dct-test -i' (150 runs):

120.838976 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
21 context-switches # 0.172 K/sec ( +- 1.98% )
0 cpu-migrations # 0.000 K/sec
226 page-faults # 0.002 M/sec ( +- 0.01% )
457'077'626 cycles # 3.783 GHz ( +- 0.08% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1'306'321'521 instructions # 2.86 insns per cycle ( +- 0.00% )
89'673'780 branches # 742.093 M/sec ( +- 0.00% )
1'148'393 branch-misses # 1.28% of all branches ( +- 0.11% )

0.121162660 seconds time elapsed ( +- 0.11% )
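To put the elapsed times quoted in this post side by side, a small Python sketch (the helper name is hypothetical) computes how much slower or faster the undefined-behavior build was relative to the defined-behavior build, once for the first recorded pair of runs and once for the best-of-three pair.

```python
def relative_difference_percent(baseline, other):
    """How far 'other' is from 'baseline', as a percentage of 'baseline'."""
    return (other - baseline) / baseline * 100.0


# Elapsed seconds copied from the perf output in this post.
first_pair = relative_difference_percent(0.121670022, 0.123038640)
best_pair = relative_difference_percent(0.121162660, 0.120741498)

print("first pair: undefined vs defined %+.2f%%" % first_pair)
print("best pair:  undefined vs defined %+.2f%%" % best_pair)
```

The sign flips between the two pairs: the undefined-behavior build is slower in the first and faster in the second, which is what noise looks like rather than a real effect.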

I also compared the disassembled code from jrevdct.o, before and after the changes to have defined behavior (using gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2 on x86_64).

In the build directory for the code with defined behavior:
objdump -d libavcodec/jrevdct.o > def.dis
sed -e 's/^.*://' def.dis > noline.def.dis

In the build directory for the code with undefined behavior:
objdump -d libavcodec/jrevdct.o > undef.dis
sed -e 's/^.*://' undef.dis > noline.undef.dis

Leaving aside differences in jump locations (despite the fact that they can impact performance), there are two differences:

diff -u build_benchmark_undef/noline.undef.dis build_benchmark_def/noline.def.dis

-       0f bf 50 f0             movswl -0x10(%rax),%edx
+       0f b7 58 f0             movzwl -0x10(%rax),%ebx

It’s switched to using a zero-extension rather than sign-extension in one place.

-       74 1c                   je     40 <ff_j_rev_dct+0x40>
-       c1 e2 02                shl    $0x2,%edx
-       0f bf d2                movswl %dx,%edx
-       89 d1                   mov    %edx,%ecx
-       0f b7 d2                movzwl %dx,%edx
-       c1 e1 10                shl    $0x10,%ecx
-       09 d1                   or     %edx,%ecx
-       89 48 f0                mov    %ecx,-0x10(%rax)
-       89 48 f4                mov    %ecx,-0xc(%rax)
-       89 48 f8                mov    %ecx,-0x8(%rax)
-       89 48 fc                mov    %ecx,-0x4(%rax)
+       74 19                   je     3d <ff_j_rev_dct+0x3d>
+       c1 e3 02                shl    $0x2,%ebx
+       89 da                   mov    %ebx,%edx
+       0f b7 db                movzwl %bx,%ebx
+       c1 e2 10                shl    $0x10,%edx
+       09 da                   or     %ebx,%edx
+       89 50 f0                mov    %edx,-0x10(%rax)
+       89 50 f4                mov    %edx,-0xc(%rax)
+       89 50 f8                mov    %edx,-0x8(%rax)
+       89 50 fc                mov    %edx,-0x4(%rax)

Leaving aside differences in register use:

-       0f bf d2                movswl %dx,%edx
There is one extra movswl instruction in the version with undefined behavior, at least with the particular version of the particular compiler for the particular architecture checked.
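One way to convince yourself that the two instruction sequences store the same word is to model them in Python. This is only a sketch: it assumes the two diff hunks above operate on the same 16-bit value, models just the instructions shown (the je branch and anything between the hunks are ignored), and treats registers as unsigned 32-bit integers. The function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """movswl: copy bit 15 of the low word into bits 16..31."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def zero_extend16(value):
    """movzwl: clear bits 16..31."""
    return value & 0xFFFF


def undefined_build(word):
    edx = sign_extend16(word)        # movswl -0x10(%rax),%edx
    edx = (edx << 2) & MASK32        # shl    $0x2,%edx
    edx = sign_extend16(edx)         # movswl %dx,%edx
    ecx = edx                        # mov    %edx,%ecx
    edx = zero_extend16(edx)         # movzwl %dx,%edx
    ecx = (ecx << 16) & MASK32       # shl    $0x10,%ecx
    return ecx | edx                 # or     %edx,%ecx


def defined_build(word):
    ebx = zero_extend16(word)        # movzwl -0x10(%rax),%ebx
    ebx = (ebx << 2) & MASK32        # shl    $0x2,%ebx
    edx = ebx                        # mov    %ebx,%edx
    ebx = zero_extend16(ebx)         # movzwl %bx,%ebx
    edx = (edx << 16) & MASK32       # shl    $0x10,%edx
    return edx | ebx                 # or     %ebx,%edx


mismatches = [w for w in range(0x10000)
              if undefined_build(w) != defined_build(w)]
print("inputs where the stored word differs:", len(mismatches))
```

Under those assumptions, only the low 16 bits of the shifted value reach the stored word, so the choice between sign extension and zero extension on load does not change the result.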

This is an example of a null result while benchmarking; neither version performs better, although any given benchmark has one or the other come out ahead, generally by less than the variance within the run. If this were a suggested performance change, it would not make sense to apply it. However, the point of this change was correctness; a performance increase is not expected, and the lack of a performance penalty is a bonus.



A welcome task: looking for undefined behavior with -fsanitize=undefined

One of my fantastic OPW mentors prepared a “Welcome task package”, of self-contained, approachable, useful tasks that can be done while getting used to the code, and with a much smaller scope than the core objective. This is awesome. To any mentors reading this: consider making a welcome package!

Step one of it is to use ubsan with gdb. This turned out to be somewhat intricate, so I have decided to supplement the wiki’s documentation with a step-by-step guide for Ubuntu 14.04.

1) Install clang-3.5 (sudo aptitude install clang-3.5), as Ubuntu 14.04 comes with gcc 4.8, which does not support -fsanitize=undefined.

2) Under libav, mkdir build_ubsan && cd build_ubsan && ../configure --toolchain=clang-usan --extra-cflags=-fno-sanitize-recover (alternatively, --cc=clang --extra-cflags=-fsanitize=undefined --extra-ldflags=-fsanitize=undefined can be used instead of --toolchain=clang-usan).

3) make -j8 && make fate

4) Watch where the tests die (they only die if --extra-cflags=-fno-sanitize-recover is used). For me, they died on TEST idct8x8. Running make V=1 fate and asking my mentors pointed me towards libavcodec/dct-test -i, which is dying on jrevdct.c:310:47: with “runtime error: left shift of negative value -14”. If you really want to err on the side of caution, make a second build dir, and ./configure --cc=clang && make -j8 && make fate in it, making sure it does not fail… this confirms that the problem is related to configuring with --toolchain=clang-usan (and, it turns out, with -fsanitize=undefined).

5) It’s time to use the information my mentor pointed out on the wiki about ubsan at https://wiki.libav.org/Security/Tools  – specifically, the information about useful gdb breakpoints. I put a modified version of the b_u definitions into ~/.gdbinit. The wiki has been updated now, but was originally missing a few functions, including one that turns out to be relevant: __ubsan_handle_shift_out_of_bounds

6) Run gdb ./libavcodec/dct-test, then at the gdb prompt, type set args -i to set the arguments dct-test was being run with, and then b_u to load the ubsan breakpoints defined above. Then start the program: type run at the gdb prompt.

7) It turns out that a problem can be found, and the program stops running. Get a backtrace with bt.


680 in __ubsan_handle_shift_out_of_bounds ()
#1  0x000000000048ac96 in __ubsan_handle_shift_out_of_bounds_abort ()
#2  0x000000000042c074 in row_fdct_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:219
#3  ff_jpeg_fdct_islow_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:273
#4  0x0000000000425c46 in dct_error (dct=<optimized out>, test=<optimized out>, is_idct=<optimized out>, speed=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:246
#5  main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

It would be nice to see a bit more detail, so I wanted to compile the project so that less would be optimized out, and eventually settled on -O1 because compiling with ubsan and without optimizations failed (which I reported as bug 683). This led to a slightly better backtrace:


#0  0x0000000000491a70 in __ubsan_handle_shift_out_of_bounds ()
#1  0x0000000000492086 in __ubsan_handle_shift_out_of_bounds_abort ()
#2  0x0000000000434dfb in ff_j_rev_dct (data=<optimized out>) at /home/me/opw/libav/libavcodec/jrevdct.c:275
#3  0x00000000004258eb in dct_error (dct=0x4962b0 <idct_tab+64>, test=1, is_idct=1, speed=0) at /home/me/opw/libav/libavcodec/dct-test.c:246
#4  0x00000000004251cc in main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

It is possible to work around the problem by modifying the source code rather than the compiler flags: FFmpeg did so within hours of the bug report – the commit is at http://git.videolan.org/?p=ffmpeg.git;a=commit;h=bebce653e5601ceafa004db0eb6b2c7d4d16f0c0 ! Both FFmpeg and Libav have also merged my patch to work around the problem (FFmpeg patch, Libav patch). The workaround of using -O1 was suggested by one of my mentors, lu_zero; --disable-optimizations does not actually disable all optimizations (in practice, it leaves in ones necessary for compilation), and it does not touch the -O1 that --toolchain=clang-usan now sets.

Wanting a better backtrace leads to the next post: a detailed guide to narrowing down a bug in the C compiler, Clang. Yes, I know, the problem is never a bug in the C compiler – but this time, it was.



Running Libav’s tests under emulated aarch64

What’s the fun of only running code on platforms you physically have? Portability is important, and Libav actively targets several platforms. It can be useful to be able to try out the code, even if the hardware is totally unavailable.

Here is how to run Libav’s tests under aarch64, on x86_64 hardware and Ubuntu 14.04. This guide is provided in the hopes that it saves someone else 20 hours or more: there is a lot of once-excellent information which has become misleading, because a lot of progress has been made in aarch64 support. I have tried three approaches: building with Linaro’s cross-compiler, building under QEMU user emulation, and building under QEMU system emulation. Building with a cross-compiler is the fastest option. Building under user emulation is about ten times slower. Building under system emulation is about a hundred times slower. There is actually a fourth option, using ARM Foundation Model, but I have not tried it. Running under QEMU user emulation is the only approach I managed to make entirely work.

For all three approaches, you will want a rootfs; I used Ubuntu Core. You can download Ubuntu Core for aarch64 (a minimal rootfs; see https://wiki.ubuntu.com/Core to learn more),  and untar it (as root) into a new directory. Then, set an environment variable that the rest of this guide/set of notes uses frequently, changing the path to match your system:

export a64root=/path/to/your/aarch64/rootdir

Approach 1 – build under QEMU’s user emulation.

Step 1) Set up QEMU. The days when using SUSE branches were necessary are over, but it still needs to be statically linked, and not all QEMU packages are. Ubuntu has a static QEMU:

sudo aptitude install qemu-user-static

This package also sets up binfmt for you. You can delete broken or stale binfmt information by running:
echo -1 > /proc/sys/fs/binfmt_misc/archnamehere – this can be useful, especially if you have previously installed QEMU by hand.

Step 2) Copy your QEMU binary into the chroot, as root, with:

cp `which qemu-aarch64-static` $a64root/usr/bin/

Step 3) As root, set up the aarch64 image so it can do DNS resolution, so you can freely use apt-get:
echo 'nameserver 8.8.8.8' > $a64root/etc/resolv.conf

Step 4) Chroot into your new system. Run chroot $a64root /bin/bash as root.

At this point, you should be able to run an aarch64 version of ls, and confirm with file /bin/ls that it is an aarch64 binary.

Now you have a working, emulated, minimal aarch64 system.

On x86, you would run aptitude build-dep libav, but there is no such package for aarch64 yet, so outside of the chroot, on the normal system, I installed apt-rdepends and ran:
apt-rdepends --build-depends --follow=DEPENDS libav

With version information stripped out, the following packages are considered dependencies:
debhelper frei0r-plugins-dev libasound2-dev libbz2-dev libcdio-cdda-dev libcdio-dev libcdio-paranoia-dev libdc1394-22-dev libfreetype6-dev  libgnutls-dev libgsm1-dev libjack-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenjpeg-dev libopus-dev libpulse-dev libraw1394-dev librtmp-dev libschroedinger-dev libsdl1.2-dev libspeex-dev libtheora-dev libtiff-dev libtiff5-dev libva-dev libvdpau-dev libvo-aacenc-dev libvo-amrwbenc-dev libvorbis-dev libvpx-dev libx11-dev libx264-dev libxext-dev libxfixes-dev libxvidcore-dev libxvmc-dev texi2html yasm zlib1g-dev doxygen

Many of the libraries do not have current aarch64 Ubuntu packages, and neither does frei0r-plugins-dev, but running aptitude install on the above list installs a lot of useful things – including build-essential. The full list is in the command below; the missing packages are non-essential.

Step 5) Set it up: apt-get install aptitude

aptitude install git debhelper frei0r-plugins-dev libasound2-dev libbz2-dev libcdio-cdda-dev libcdio-dev libcdio-paranoia-dev libdc1394-22-dev libfreetype6-dev  libgnutls-dev libgsm1-dev libjack-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenjpeg-dev libopus-dev libpulse-dev libraw1394-dev librtmp-dev libschroedinger-dev libsdl1.2-dev libspeex-dev libtheora-dev libtiff-dev libtiff5-dev libva-dev libvdpau-dev libvo-aacenc-dev libvo-amrwbenc-dev libvorbis-dev libvpx-dev libx11-dev libx264-dev libxext-dev libxfixes-dev libxvidcore-dev libxvmc-dev texi2html yasm zlib1g-dev doxygen

Now it is time to actually build libav.

Step 6) Create a user within your chroot: useradd -m auser, and switch to running as that user: sudo -u auser bash, and type cd to go to the home directory.

Step 7) Run git clone git://git.libav.org/libav.git, then ./configure --disable-pthreads && make -j8 (change the 8 to approximately the number of CPU cores you have).
On my hardware, this takes 10-11 minutes, and ‘make fate’ takes about 16. Disabling pthreads is essential, as qemu-user does not handle threads well, and running the tests hangs randomly without it.


Approach 2: cross-compile (warning: I do not have the tests working with this approach).

1) Start by getting an aarch64 compiler. A good place to get one is http://releases.linaro.org/latest/components/toolchain/binaries/; I am using http://releases.linaro.org/latest/components/toolchain/binaries/gcc-linaro-aarch64-linux-gnu-4.8-2014.04_linux.tar.xz . Untar it, and add it to your path:

export PATH=$PATH:/path/to/your/linaro/tools/bin

2) Make the cross-compiler work. Run aptitude install lsb lib32stdc++6. Without this, invoking the compiler will say “No such file or directory”. See http://lists.linaro.org/pipermail/linaro-toolchain/2012-January/002016.html.

3) Under the libav directory (run git clone git://git.libav.org/libav.git if you do not have one), type mkdir a64crossbuild; cd a64crossbuild. Make sure the libav directory is somewhere under $a64root (it should simplify running the tests, later).

4) ./configure --arch=aarch64 --cpu=generic --cross-prefix=aarch64-linux-gnu- --cc=aarch64-linux-gnu-gcc --target-os=linux --sysroot=$a64root --target-exec="qemu-aarch64-static -L $a64root" --disable-pthreads

This is a minimal variant of the configuration used by Jannau, a developer who has recently done a lot of libav aarch64 work.

5) Run make -j8. On my hardware, it takes just under a minute.

6) Run make fate. Unfortunately, both versions of QEMU I tried hung on wait4 at this point (in fft-test, fate-fft-4), and used an extra couple of hundred megabytes of RAM per second until I stopped QEMU, even if I asked it to wait for a remote GDB. For anyone else trying this, https://lists.libav.org/pipermail/libav-devel/2014-May/059584.html has several useful tips for getting the tests to run after cross-compilation.


Approach 3: Use QEMU’s system emulation. In theory, this should allow you to use pthreads; in practice, the tests hung for me. The following May 9th post describes what to do: http://www.bennee.com/~alex/blog/2014/05/09/running-linux-in-qemus-aarch64-system-emulation-mode/. In short: git clone git://git.qemu.org/qemu.git qemu.git && cd qemu.git && ./configure --target-list=aarch64-softmmu && make, then

./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img  --append "console=ttyAMA0" -fsdev local,id=r,path=$a64root,security_model=none -device virtio-9p-device,fsdev=r,mount_tag=r

Then, under the buildroot system, log in as root (no password), and type mkdir /mnt/core && mount -t 9p -o trans=virtio r /mnt/core. At this point, you can run chroot /mnt/core /bin/bash, and follow the approach 1 instructions from useradd onwards, except that ./configure without --disable-pthreads should theoretically work. On my system, ./configure takes a bit over 5 minutes with this approach. Running make is quite slow; time make took 113 minutes. Do not use -j – you are limited to a single core, so -j would slow compilation down slightly. However, make fate consistently hung on acodec-pcm-alaw, and I have not yet figured out why.


 

Things not to do:

  • Use a rootfs from a year ago; I have yet to try one that is not broken, and some come with fun bonuses like infinite file system loops. These cost me well over a dozen hours.
  • Compile SUSE’s QEMU; qemu-system is bleeding-edge enough that you need to compile it from upstream, but SUSE’s patches have long been merged into the normal QEMU upstream. Unless you want qemu-system, you do not need to compile QEMU at all under Ubuntu 14.04.
  • Leave the environment variables in this tutorial unset in a new shell and wonder why things do not work.

 



Getting into Libav’s OPW program: initial contributions

Applying to OPW requires an initial contribution. The Libav IRC channel suggested porting the asettb filter from FFmpeg, so I did (version 5 of the patch was merged upstream, in two parts: a rename patch and a content patch; the FFmpeg author was credited as author for the latter, while I did a signed-off-by). I also contributed a 3000+ line documentation patch, standardizing the libavfilter documentation and removing numerous English errors, and triaged a few bugs, git bisecting the one that was reproducible.