- May 08, 2013
Diego Biurrun authored
-
- May 04, 2013
Diego Biurrun authored
-
- Apr 22, 2013
Diego Biurrun authored
-
- Mar 28, 2013
Clément Bœsch authored
There is no noticeable benefit for such precision.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Clément Bœsch authored
Current dithering only uses the first 4 instead of the whole 8 random values.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Clément Bœsch authored
Current code divides before increasing precision. Also reduce the upper bound for strength from 255 to 64. This prevents an overflow in the SSSE3 and MMX filter_line code: delta is expressed as a u16 shifted left by 2. If it overflows, a strength no higher than 64 ensures that m is set to 0 (making the m*m*delta >> 14 expression a no-op). A value above 64 makes no sense unless gradfun is used as a blur filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- Mar 16, 2013
James Darnley authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
The filter already checks that width (and height) are greater than 3.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
These smaller samples do not need to be unpacked to double words, allowing the code to process more pixels every iteration (still 2 in MMX but 6 in SSE2). It also avoids emulating the missing double-word instructions on older instruction sets. Like the previous code for 16-bit samples, this has been tested on an Athlon64 and a Core2Quad.
Athlon64:
1809275 decicycles in C, 32718 runs, 50 skips
911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
Core2Quad:
921363 decicycles in C, 32756 runs, 12 skips
486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
This is a fairly dumb copy of the assembly for 8-bit samples, but it works and produces identical output to the C version. The options have been tested on an Athlon64 and a Core2Quad.
Athlon64:
1810385 decicycles in C, 32726 runs, 42 skips
1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster
818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster
Core2Quad:
924025 decicycles in C, 32750 runs, 18 skips
623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster
406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster
387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster
307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Mar 13, 2013
James Darnley authored
Always use the special filter for the first and last 3 columns (only). Changes made in 64ed3976 slowed the filter to just under 3/4 of its former speed. This commit restores the speed while maintaining identical output. For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Loren Merritt authored
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- Feb 15, 2013
Anton Khirnov authored
Some changes in the border pixels, visually indistinguishable.
-
- Feb 06, 2013
Anton Khirnov authored
clang says: libavfilter/vf_yadif.c:192:28: warning: incompatible pointer types assigning to 'void (*)(uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int)' from 'void (uint16_t *, uint16_t *, uint16_t *, uint16_t *, int, int, int, int, int)'
-
- Feb 04, 2013
Diego Biurrun authored
-
- Feb 02, 2013
Michael Niedermayer authored
Reference: 7a1944b9
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Feb 01, 2013
Diego Biurrun authored
-
- Jan 14, 2013
Daniel Kang authored
Manually load registers to avoid using 8 registers on x86_32 with compilers that do not align the stack (e.g. MSVC).
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- Jan 09, 2013
Daniel Kang authored
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- Dec 19, 2012
Clément Bœsch authored
There is no noticeable benefit for such precision.
-
Clément Bœsch authored
Current dithering only uses the first 4 instead of the whole 8 random values.
-
Clément Bœsch authored
Current code divides before increasing precision.
-
- Dec 06, 2012
Carl Eugen Hoyos authored
-
- Dec 05, 2012
Justin Ruggles authored
-
Justin Ruggles authored
-
- Oct 31, 2012
Diego Biurrun authored
-
- Oct 30, 2012
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- Oct 12, 2012
Diego Biurrun authored
-
- Sep 22, 2012
Loren Merritt authored
Fixes ticket #1752. Commit message by committer.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Aug 30, 2012
Diego Biurrun authored
-
Diego Biurrun authored
-
- Aug 26, 2012
Loren Merritt authored
13% faster on Penryn, 16% on Sandy Bridge, 15% on Bulldozer. Not SIMD; a compiler should have generated this, but gcc didn't.
-
- Aug 16, 2012
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Aug 15, 2012
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- Aug 13, 2012
Mans Rullgard authored
Under some circumstances, suncc will use a single register for the address of all memory operands, inserting lea instructions that load the correct address before each memory operand is used. In the yadif code, the branch in the asm block bypasses such an lea instruction, causing an incorrect address to be used in the following load. This patch replaces the tmpX arrays with a single array and uses a register operand to hold its address. Although this prevents using offsets from the stack pointer to access these locations, the code still builds as 32-bit PIC even with old compilers.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- Aug 08, 2012
Mans Rullgard authored
This puts x86-specific things in the x86/ subdirectory where they belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- Aug 03, 2012
Diego Biurrun authored
Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
-
- Jul 31, 2012
Diego Biurrun authored
The yadif mmx optimizations contain the pmaxsw and pmaxub mmxext instructions, causing SIGILLs on CPUs that do not support mmxext.
-