- May 08, 2013
Diego Biurrun authored
-
- May 04, 2013
Diego Biurrun authored
-
- Apr 22, 2013
Diego Biurrun authored
-
- Mar 28, 2013
Clément Bœsch authored
There is no noticeable benefit for such precision.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Clément Bœsch authored
Current dithering only uses the first 4 instead of the whole 8 random values.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
Clément Bœsch authored
Current code divides before increasing precision. Also reduce the upper bound for strength from 255 to 64. This prevents an overflow in the SSSE3 and MMX filter_line code: delta is expressed as a u16 shifted left by 2. If it overflows, a strength no higher than 64 ensures that m is set to 0 (making the m*m*delta >> 14 expression a no-op). A value above 64 makes no sense unless gradfun is used as a blur filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- Mar 16, 2013
James Darnley authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
The filter already checks that width (and height) are greater than 3.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
These smaller samples do not need to be unpacked to double words, allowing the code to process more pixels every iteration (still 2 in MMX but 6 in SSE2). It also avoids emulating the missing double-word instructions on older instruction sets. Like the previous code for 16-bit samples, this has been tested on an Athlon64 and a Core2Quad.
Athlon64:
1809275 decicycles in C, 32718 runs, 50 skips
911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
Core2Quad:
921363 decicycles in C, 32756 runs, 12 skips
486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
James Darnley authored
This is a fairly dumb copy of the assembly for 8-bit samples, but it works and produces identical output to the C version. The options have been tested on an Athlon64 and a Core2Quad.
Athlon64:
1810385 decicycles in C, 32726 runs, 42 skips
1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster
818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster
Core2Quad:
924025 decicycles in C, 32750 runs, 18 skips
623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster
406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster
387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster
307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Mar 13, 2013
James Darnley authored
Always use the special filter for the first and last 3 columns (only). Changes made in 64ed3976 slowed the filter to just under 3/4 of its former speed. This commit restores the speed while maintaining identical output. For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
Loren Merritt authored
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-
- Feb 15, 2013
Anton Khirnov authored
Some changes in the border pixels, visually indistinguishable.
-
- Feb 06, 2013
Anton Khirnov authored
clang says: libavfilter/vf_yadif.c:192:28: warning: incompatible pointer types assigning to 'void (*)(uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int)' from 'void (uint16_t *, uint16_t *, uint16_t *, uint16_t *, int, int, int, int, int)'
-
- Feb 04, 2013
Diego Biurrun authored
-
- Feb 02, 2013
Michael Niedermayer authored
Reference: 7a1944b9
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Feb 01, 2013
Diego Biurrun authored
-
- Jan 14, 2013
Daniel Kang authored
Manually load registers to avoid using 8 registers on x86_32 with compilers that do not align the stack (e.g. MSVC).
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- Jan 09, 2013
Daniel Kang authored
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
-
- Dec 19, 2012
Clément Bœsch authored
There is no noticeable benefit for such precision.
-
Clément Bœsch authored
Current dithering only uses the first 4 instead of the whole 8 random values.
-
Clément Bœsch authored
Current code divides before increasing precision.
-
- Dec 06, 2012
Carl Eugen Hoyos authored
-
- Dec 05, 2012
Justin Ruggles authored
-
Justin Ruggles authored
-
- Oct 31, 2012
Diego Biurrun authored
-
- Oct 30, 2012
Diego Biurrun authored
This is more consistent with the way we handle C #includes and it simplifies the build system.
-
Diego Biurrun authored
This is necessary to allow refactoring some x86util macros with cpuflags.
-
- Oct 12, 2012
Diego Biurrun authored
-
- Sep 22, 2012
Loren Merritt authored
Fixes ticket #1752. Commit message by committer.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Aug 30, 2012
Diego Biurrun authored
-
Diego Biurrun authored
-
- Aug 26, 2012
Loren Merritt authored
13% faster on Penryn, 16% on Sandy Bridge, 15% on Bulldozer. Not SIMD; a compiler should have generated this, but gcc didn't.
-
- Aug 16, 2012
Michael Niedermayer authored
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
-
- Aug 15, 2012
Martin Storsjö authored
Signed-off-by: Martin Storsjö <martin@martin.st>
-
- Aug 13, 2012
Mans Rullgard authored
Under some circumstances, suncc will use a single register for the address of all memory operands, inserting lea instructions that load the correct address before each memory operand is used. In the yadif code, the branch in the asm block bypasses such an lea instruction, causing an incorrect address to be used in the following load. This patch replaces the tmpX arrays with a single array and uses a register operand to hold its address. Although this prevents using offsets from the stack pointer to access these locations, the code still builds as 32-bit PIC even with old compilers.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- Aug 08, 2012
Mans Rullgard authored
This puts x86-specific things in the x86/ subdirectory where they belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- Aug 03, 2012
Diego Biurrun authored
Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
-
- Jul 31, 2012
Diego Biurrun authored
The yadif mmx optimizations contain the pmaxsw and pmaxub mmxext instructions, causing SIGILLs on CPUs that do not support mmxext.
-