Libav and Free Software development

Leave a comment

Microbenchmarking: a null result

My first patch for undefined behavior eliminates left shifts of negative numbers, replacing a << b (where a can be negative) with a * (1 << b). This change fixes bug686, at least for fate-idct8x8 and libavcodec/dct-test -i (compiled with ubsan and fno-sanitize-recover). Due to Libav policy, the next step is to benchmark the change. I was also asked to write a simple benchmarking HowTo for the Libav wiki.

First, I installed perf: sudo aptitude install linux-tools-generic
I made two build directories, and built the code with defined behavior in one, and the code with undefined behavior in the other (with ../configure && make -j8 && make fate). Then, in each directory, I ran:

perf stat --repeat 150 ./libavcodec/dct-test -i > /dev/null

The results were somewhat more stable than with –repeat 30, but it still looks much more like noise than a meaningful result. I ran the command with –repeat 30 for both before the recorded 150 run, so both would start on equal footing. With defined behavior, the results were “0.121670022 seconds time elapsed ( +-  0.11% )”; with undefined behavior, “0.123038640 seconds time elapsed ( +-  0.15% )”. The best of a further three runs had the opposite result, shown below:

% cat

perf stat –repeat 150 ./libavcodec/dct-test -i > /dev/null

Performance counter stats for ‘./libavcodec/dct-test -i’ (150 runs):

120.427535 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
21 context-switches # 0.178 K/sec ( +- 1.88% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
226 page-faults # 0.002 M/sec ( +- 0.01% )
455’393’772 cycles # 3.781 GHz ( +- 0.05% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1’306’169’698 instructions # 2.87 insns per cycle ( +- 0.00% )
89’674’090 branches # 744.631 M/sec ( +- 0.00% )
1’144’351 branch-misses # 1.28% of all branches ( +- 0.18% )

0.120741498 seconds time elapse

% cat

Performance counter stats for ‘./libavcodec/dct-test -i’ (150 runs):

120.838976 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
21 context-switches # 0.172 K/sec ( +- 1.98% )
0 cpu-migrations # 0.000 K/sec
226 page-faults # 0.002 M/sec ( +- 0.01% )
457’077’626 cycles # 3.783 GHz ( +- 0.08% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1’306’321’521 instructions # 2.86 insns per cycle ( +- 0.00% )
89’673’780 branches # 742.093 M/sec ( +- 0.00% )
1’148’393 branch-misses # 1.28% of all branches ( +- 0.11% )

0.121162660 seconds time elapsed ( +- 0.11% )

I also compared the disassembled code from jrevdct.o, before and after the changes to have defined behavior (using gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2 on x86_64).

In the build directory for the code with defined behavior:
objdump -d libavcodec/jrevdct.o > def.dis
sed -e 's/^.*://' def.dis > noline.def.dis

In the build directory for the code with undefined behavior:
objdump -d libavcodec/jrevdct.o > undef.dis
sed -e 's/^.*://' undef.dis > noline.undef.dis

Leaving aside difference in jump locations (despite the fact that they can impact performance), there are two differences:

diff -u build_benchmark_undef/noline.undef.dis build_benchmark_def/noline.def.dis

–       0f bf 50 f0             movswl -0x10(%rax),%edx
+       0f b7 58 f0             movzwl -0x10(%rax),%ebxi

It’s switched to using a zero-extension rather than sign-extension in one place.

–       74 1c                   je     40 <ff_j_rev_dct+0x40>
–       c1 e2 02                shl    $0x2,%edx
–       0f bf d2                movswl %dx,%edx
–       89 d1                   mov    %edx,%ecx
–       0f b7 d2                movzwl %dx,%edx
–       c1 e1 10                shl    $0x10,%ecx
–       09 d1                   or     %edx,%ecx
–       89 48 f0                mov    %ecx,-0x10(%rax)
–       89 48 f4                mov    %ecx,-0xc(%rax)
–       89 48 f8                mov    %ecx,-0x8(%rax)
–       89 48 fc                mov    %ecx,-0x4(%rax)
+       74 19                   je     3d <ff_j_rev_dct+0x3d>
+       c1 e3 02                shl    $0x2,%ebx
+       89 da                   mov    %ebx,%edx
+       0f b7 db                movzwl %bx,%ebx
+       c1 e2 10                shl    $0x10,%edx
+       09 da                   or     %ebx,%edx
+       89 50 f0                mov    %edx,-0x10(%rax)
+       89 50 f4                mov    %edx,-0xc(%rax)
+       89 50 f8                mov    %edx,-0x8(%rax)
+       89 50 fc                mov    %edx,-0x4(%rax)

Leaving aside differences in register use:

–       0f bf d2                movswl %dx,%edx
There is one extra movswl instruction in the version with undefined behavior, at least with the particular version of the particular compiler for the particular architecture checked.

This is an example of a null result while benchmarking; neither version performs better, although any given benchmark has one or the other come out ahead, generally by less than the variance within the run. If this were a suggested performance change, it would not make sense to apply it. However, the point of this change was correctness; a performance increase is not expected, and the lack of a performance penalty is a bonus.

Leave a comment

Tracking down a bug in Clang: -fsanitize=undefined is not bug-free

In search of a better backtrace, I tried compiling Libav with -fsanitize=undefined and –disable-optimizations, and filed Libav bug 683 when it failed. The failure did not seem reasonable given the source code, and on further examination it looked like the key problem was actually in the C compiler, Clang, rather than the code, so I looked into it and filed a Clang bug on Here is the confirmed Clang bug report.

No one wants to deal with 4000 lines of preprocessed source code if there is a better alternative, so I wrote a creduce script to find what some fairly minimal source code to trigger the bug would look like. It turns out that the problem can be reproduced in a rather tiny way: fn1() { __asm__(“” ::”r”(0), “i”(0 * 0)); }

This is a one-line file of C source code which uses inline assembly.

An acquaintance of mine asked for a detailed writeup, so here it is, in full detail, from narrowing down the problematic compiler flag combination onwards. If you are curious to see how to narrow down a bug like this by example, read on; otherwise, please skip the rest of this post. Continue reading

1 Comment

A welcome task: looking for undefined behavior with -fsanitize=undefined

One of my fantastic OPW mentors prepared a “Welcome task package”, of self-contained, approachable, useful tasks that can be done while getting used to the code, and with a much smaller scope than the core objective. This is awesome. To any mentors reading this: consider making a welcome package!

Step one of it is to use ubsan with gdb. This turned out to be somewhat intricate, so I have decided to supplement the wiki’s documentation with a step-by-step guide for Ubuntu 14.04.

1) Install clang-3.5 (sudo aptitude install clang-3.5), as Ubuntu 14.04 comes with gcc 4.8, which does not support -fsanitize=undefined.

2) Under libav, mkdir build_ubsan && cd build_ubsan && ../configure --toolchain=clang-usan --extra-cflags=-fno-sanitize-recover (alternatively, –cc=clang –extra-cflags=-fsanitize=undefined –extra-ldflags=-fsanitize=undefined can be used instead of –toolchain=clang-usan).

3) make -j8 && make fate

4) Watch where the tests die (they only die if –extra-cflags=-fno-sanitize-recover is used). For me, they died on TEST idct8x8. Running make V=1 fate and asking my mentors pointed me towards libavcodec/dct-test -i, which is dying on jrevdct.c:310:47: with “runtime error: left shift of negative value -14”. If you really want to err on the side of caution, make a second build dir, and ./configure --cc=clang && make -j8 && make fate in it, making sure it does not fail… this confirms that the problem is related to configuring with –toolchain=clang-usan (and, it turns out, with -fsanitize=undefined).

5) It’s time to use the information my mentor pointed out on the wiki about ubsan at  – specifically, the information about useful gdb breakpoints. I put a modified version of the b_u definitions into ~/.gdbinit. The wiki has been updated now, but was originally missing a few functions, including one that turns out to be relevant: __ubsan_handle_shift_out_of_bounds

6 Run gdb ./libavcodec/dct-test, then at the gdb prompt, set args -i to set the arguments dct-test was being run with, and then b_u to load the ubsan breakpoints defined above. Then start the program: type run at the gdb prompt.

7) It turns out that a problem can be found, and the program stops running. Get a backtrace with bt.

680 in __ubsan_handle_shift_out_of_bounds ()
#1  0x000000000048ac96 in __ubsan_handle_shift_out_of_bounds_abort ()
#2  0x000000000042c074 in row_fdct_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:219
#3  ff_jpeg_fdct_islow_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:273
#4  0x0000000000425c46 in dct_error (dct=<optimized out>, test=<optimized out>, is_idct=<optimized out>, speed=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:246
#5  main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

It would be nice to see a bit more detail, so I wanted to compile the project so that less would be optimized out, and eventually settled on -O1 because compiling with ubsan and without optimizations failed (which I reported as bug 683). This led to a slightly better backtrace:

#0  0x0000000000491a70 in __ubsan_handle_shift_out_of_bounds ()
#1  0x0000000000492086 in __ubsan_handle_shift_out_of_bounds_abort ()
#2  0x0000000000434dfb in ff_j_rev_dct (data=<optimized out>) at /home/me/opw/libav/libavcodec/jrevdct.c:275
#3  0x00000000004258eb in dct_error (dct=0x4962b0 <idct_tab+64>, test=1, is_idct=1, speed=0) at /home/me/opw/libav/libavcodec/dct-test.c:246
#4  0x00000000004251cc in main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

It is possible to work around the problem by modifying the source code rather than the compiler flags: FFmpeg did so within hours of the bug report – the commit is at;a=commit;h=bebce653e5601ceafa004db0eb6b2c7d4d16f0c0 ! Both FFmpeg and Libav have also merged my patch to work around the problem (FFmpeg patch, Libav patch). The workaround of using -O1 was suggested by one of my mentors, lu_zero; –disable-optimizations does not actually disable all optimizations (in practice, it leaves in ones necessary for compilation), and it does not touch the -O1 that –toolchain=clang-usan now sets.

Wanting a better backtrace leads to the next post: a detailed guide to narrowing down a bug in a the C compiler, Clang. Yes, I know, the problem is never a bug in the C compiler – but this time, it was.