...
 
Commits (35)
  • Manuel Stoeckl's avatar
    Add SSE, AVX, and NEON optimized diff routines · 92788387
    Manuel Stoeckl authored
    The conditional compilation of the different routines is handled using
    meson's unstable simd module.
    
    The vectorized diff routines themselves are relatively straightforward
    translations of the plain C diff routine. Additional alignment constraints
    are imposed to ensure that loads and stores to the mapped buffer and its
    mirror are aligned; in some cases this can help performance.
    
    The output diff format has not been changed beyond a small adjustment
    to the handling of trailing data; while it would have been possible to
    use e.g. distinct diff data and control buffers, the performance gain
    for such a change is negligble.
    92788387
  • Manuel Stoeckl's avatar
    Make an ifdef condition more portable · 7a2b093b
    Manuel Stoeckl authored
    7a2b093b
  • Manuel Stoeckl's avatar
    Unroll sse diff to 32-byte chunks · ae498679
    Manuel Stoeckl authored
    The latency of SSE operations is rather high (especially on older systems),
    so this should speed up e.g. scanning unchanged buffers by a few percent.
    ae498679
  • Manuel Stoeckl's avatar
    Use per-file static libraries for simd code · 2ddf9a7c
    Manuel Stoeckl authored
    The unstable simd module from meson does not always detect neon
    correctly, and cannot set multiple flags to e.g. require multiple
    instruction sets at the same time.
    2ddf9a7c
  • Manuel Stoeckl's avatar
    Check if memfd_create is available · cbb295f9
    Manuel Stoeckl authored
    glibc support lagged kernel support by a few years,
    so some relatively recent glibc versions don't support it.
    cbb295f9
  • Manuel Stoeckl's avatar
    Control compression levels from CLI · ad58b215
    Manuel Stoeckl authored
    Both LZ4 and Zstd offer a wide range of compression speeds and qualities;
    this commit makes it possible to select the exact compression level used
    by waypipe.
    ad58b215
  • Manuel Stoeckl's avatar
    Unroll avx diff to 64-byte chunks · 9961f4a0
    Manuel Stoeckl authored
    The larger unroll size amortizes some of loop exit logic. No performance
    impact is observed under ideal benchmark scenarios, since the code is
    bandwidth limited. In theory, since the code now loads and writes
    entire cache lines at a time, performance with write-combined memory
    may be improved.  At the very least, this change reduces the total
    instruction count and the number of branches for the diff test by several
    percent, which should be helpful in hyperthreaded cases.
    9961f4a0
  • Manuel Stoeckl's avatar
    Add AVX-512F diff implementation · 70bd4297
    Manuel Stoeckl authored
    The AVX-512F diff runs roughly as fast as the AVX2 diff. (The system
    used to test this did not allow for precise measurements.)
    70bd4297
  • Manuel Stoeckl's avatar
    Forbid tiled DMABUF format modifiers by default · 5e3d5522
    Manuel Stoeckl authored
    With i915, CPU access to tiled formats is often much slower than for untiled formats,
    possibly due to the work needed for the driver/hardware to present the tiled buffer
    as though it had linear layout. This slowdown outweighs any performance gains due to
    tiling for typical OpenGL applications.
    5e3d5522
  • Jan Beich's avatar
    test: unbreak on FreeBSD after 92788387 · 998dbdf4
    Jan Beich authored
    ../test/diff_roundtrip.c:99:16: error: implicit declaration of function 'aligned_alloc' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                    char *diff = aligned_alloc(alignment, bufsize);
                                 ^
    ../test/diff_roundtrip.c:99:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
                    char *diff = aligned_alloc(alignment, bufsize);
                          ^      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../test/diff_roundtrip.c:100:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
                    char *source = aligned_alloc(alignment, bufsize);
                          ^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../test/diff_roundtrip.c:101:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
                    char *mirror = aligned_alloc(alignment, bufsize);
                          ^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../test/diff_roundtrip.c:102:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
                    char *target1 = aligned_alloc(alignment, bufsize);
                          ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../test/diff_roundtrip.c:103:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int' [-Werror,-Wint-conversion]
                    char *target2 = aligned_alloc(alignment, bufsize);
                          ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    998dbdf4
  • Jan Beich's avatar
    02983d7e
  • Jan Beich's avatar
    arm: detect NEON at runtime on FreeBSD · 23446300
    Jan Beich authored
    23446300
  • Manuel Stoeckl's avatar
    Revert to simpler LZ4 and Zstd APIs · 09d53eea
    Manuel Stoeckl authored
    As adding frame headers to compressed data is no longer needed, it
    is safe to revert from the LZ4F API to the simpler LZ4 and LZ4HC APIs
    for fast and slow LZ4 compression modes.
    
    Similarly, as fine grained control of the Zstd compression parameters
    is neither needed nor used, the use of the very new ZSTD_compress2
    function is replaced with the much older and equivalent ZSTD_compressCCtx
    function.
    09d53eea
  • Manuel Stoeckl's avatar
    Fix building NEON routines with Clang on ARMv6 · c341ce9d
    Manuel Stoeckl authored
    Clang only compiles NEON code for armv(>=7). To ensure that it is still
    possible to compile waypipe to run on armv6, and run on armv7 with neon,
    the compile architecture is increased only for the code that uses neon.
    
    Because architectures are not forward compatible, the code to check at
    runtime which instruction sets can be used has been moved to kernel.c,
    where the increased architecture version can not affect it.
    c341ce9d
  • Manuel Stoeckl's avatar
    Fix crash when resizing buffers · 33918846
    Manuel Stoeckl authored
    33918846
  • Manuel Stoeckl's avatar
    diff_roundtrip: test all available diff routines · 65518f46
    Manuel Stoeckl authored
    Also reduce the total number of test iterations, since the
    number of tests has (on some systems) been increased by a factor
    of four.
    65518f46
  • Manuel Stoeckl's avatar
    Add compression benchmarking subprogram · ab26372c
    Manuel Stoeckl authored
    Add a `waypipe bench` mode which can be used to estimate, on a computer,
    which compression level produces the lowest latency transfers for either
    highly compressible (text heavy) or weakly compressible (photographic, noisy)
    images.
    ab26372c
  • Manuel Stoeckl's avatar
    In benchmarking subprogram, use multithreaded compression · 3bd0cee9
    Manuel Stoeckl authored
    Also use the various functions and structures from shadow.c, so that
    the benchmarking routine uses most of the same code as the main data
    transfer routine.
    3bd0cee9
  • Manuel Stoeckl's avatar
    Fix import for wl_drm DMABUFs · 0ad9414f
    Manuel Stoeckl authored
    The DMABUFs provided by wl_drm::create_prime_buffer may have an
    associated modifier, which is stored as driver-specific data associated
    with the file descriptor, and not provided by the protocol. This change
    uses the gbm_bo_get_modifier function to determine the DMABUF layout
    modifier after import.
    0ad9414f
  • Manuel Stoeckl's avatar
    Use coroutines for headless test, misc fixes · 350db8ae
    Manuel Stoeckl authored
    The change to use coroutines to track each subtest makes it easier
    to also test control pipes and reconnection support.
    
    The test with the weston-simple-egl client has been commented
    out, as that client currently does not properly detect Wayland
    connection shutdown.
    350db8ae
  • Manuel Stoeckl's avatar
    Move all main source file into src/ · ae43bbd4
    Manuel Stoeckl authored
    In the process, split up the main meson.build file into one file per
    directory, as is customary.
    ae43bbd4
  • Manuel Stoeckl's avatar
    Use more standard integer types · f0236744
    Manuel Stoeckl authored
    Most notably, avoid using the `long` integer type for calculations,
    because on some systems it is the same size as `int`.
    
    Also fix integer format modifiers to use the correct prefix.
    f0236744
  • Manuel Stoeckl's avatar
    Split util.h into more specific headers · 0305b98e
    Manuel Stoeckl authored
    0305b98e
  • Manuel Stoeckl's avatar
    Align damaged intervals before merging them · adc5a860
    Manuel Stoeckl authored
    To support this efficiently, the alignment required for a diff method now
    is specified in terms of the number of trailing bits of a coordinate that
    must be zero.
    adc5a860
  • Manuel Stoeckl's avatar
    Rename damage.h to interval.h · 031379da
    Manuel Stoeckl authored
    031379da
  • Manuel Stoeckl's avatar
    Unlink shm segments immediately after creation · eded0dd7
    Manuel Stoeckl authored
    This prevents the shared memory file objects from leaking, as long
    as waypipe doesn't crash between shm_open and shm_unlink.
    eded0dd7
  • Manuel Stoeckl's avatar
    Increase timeout for fd_mirror test · a3071ff1
    Manuel Stoeckl authored
    a3071ff1
  • Manuel Stoeckl's avatar
    Fix build with -Wconversion · 3b1fe170
    Manuel Stoeckl authored
    In the process, fix the `waypipe bench` transfer time estimates.
    3b1fe170
  • Manuel Stoeckl's avatar
    Fix DMABUF creation on some old systems · 134084b9
    Manuel Stoeckl authored
    gbm_bo_create_with_modifiers is newer than gbm_bo_create, and is
    not supported on all computers. Similarly, gbm_bo_get_modifier
    is not always available.
    134084b9
  • Manuel Stoeckl's avatar
    Fix exception in startup failure test · 7b384f41
    Manuel Stoeckl authored
    os.dup2 returns None for Python < 3.7.
    7b384f41
  • Manuel Stoeckl's avatar
    Make buffer diffs use chunks of 4 bytes · 10bbade4
    Manuel Stoeckl authored
    Reducing the chunksize for the diff routines very slightly reduces the
    amount of data copied; it also should not change performance for any
    of the SIMD optimized routines, since the instructions executed and
    branches taken should not change.
    10bbade4
  • Manuel Stoeckl's avatar
    4764d1df
  • Manuel Stoeckl's avatar
    e49e5a72
  • Manuel Stoeckl's avatar
    Document "waypipe bench" mode · ac16471a
    Manuel Stoeckl authored
    ac16471a
  • Manuel Stoeckl's avatar
    Bump version · e66f4244
    Manuel Stoeckl authored
    e66f4244
......@@ -47,7 +47,7 @@ debian32_container_prep:
when: always
paths:
- b-*/meson-logs
- b-*/test
- b-*/run
- p-*
debian-build:
......
......@@ -9,3 +9,16 @@ format all source code files in the project.
[0] https://github.com/python/black
[1] https://clang.llvm.org/docs/ClangFormat.html
# Types
* Typedefs should be used only for function signatures, and never applied to
structs.
* `short`, `long`, and `long long` should not be used, in favor of `int16_t`
and `int64_t`.
* All wire-format structures should use fixed size types. It's safe to assume
that buffers will never be larger than about 1 GB, so buffer sizes and
indices do not require 64 bit types when used in protocol message headers.
* `printf` should be called with the correct format codes. For example, `%zd`
for `ssize_t`, and the `PRIu32` macro for `uint32_t`.
* Avoid unnecessary casts.
......@@ -49,8 +49,8 @@ Requirements:
* meson (build, >= 0.47. with dependencies `ninja`, `pkg-config`, `python3`)
* wayland (build, >= 1.10 for the `wl_surface::damage_buffer` request)
* wayland-protocols (build, >= 1.12, for the xdg-shell protocol, and others)
* liblz4 (optional)
* libzstd (optional, >= 1.4.0)
* liblz4 (optional, >=1.7.0)
* libzstd (optional, >= 0.4.6)
* libgbm (optional, to support programs using OpenGL via DMABUFs)
* libdrm (optional, same as for libgbm)
* ffmpeg (optional, >=3.1, needs avcodec/avutil/swscale for lossy video encoding)
......
#!/bin/sh
clang-format -style=file --assume-filename=C -i \
util.h \
waypipe.c server.c handlers.c client.c util.c parsing.c dmabuf.c shadow.c mainloop.c interval.c video.c \
test/diff_roundtrip.c test/damage_merge.c test/fd_mirror.c test/wire_parse.c test/fuzz_hook.c
black -q test/headless.py test/startup_failure.py \
protocols/symgen.py
clang-format -style=file --assume-filename=C -i src/*.h src/*.c test/*.c
black -q test/*.py protocols/*.py
......@@ -8,14 +8,15 @@ project(
'warning_level=3',
'werror=true',
],
version: '0.4.0',
version: '0.5.0',
)
cc = meson.get_compiler('c')
config_data = configuration_data()
# mention version
version = '"@0@"'.format(meson.project_version())
add_project_arguments('-DWAYPIPE_VERSION=@0@'.format(version), language: 'c')
config_data.set('WAYPIPE_VERSION', version)
# Make build reproducible if possible
python3 = import('python').find_installation()
......@@ -37,7 +38,7 @@ endif
libgbm = dependency('gbm', required: get_option('with_dmabuf'))
libdrm = dependency('libdrm', required: get_option('with_dmabuf'))
if libgbm.found() and libdrm.found()
add_project_arguments('-DHAS_DMABUF=1', language: 'c')
config_data.set('HAS_DMABUF', 1, description: 'Support DMABUF replication')
endif
pthreads = dependency('threads')
rt = cc.find_library('rt')
......@@ -45,59 +46,35 @@ rt = cc.find_library('rt')
is_linux = host_machine.system() == 'linux'
is_darwin = host_machine.system() == 'darwin'
if (is_linux or is_darwin) and cc.has_header('sys/sdt.h')
add_project_arguments('-DHAS_USDT=1', language: 'c')
config_data.set('HAS_USDT', 1, description: 'Enable static trace probes')
endif
liblz4 = dependency('liblz4', required: get_option('with_lz4'))
liblz4 = dependency('liblz4', version: '>=1.7.0', required: get_option('with_lz4'))
if liblz4.found()
add_project_arguments('-DHAS_LZ4=1', language: 'c')
config_data.set('HAS_LZ4', 1, description: 'Enable LZ4 compression')
endif
libzstd = dependency('libzstd', version: '>=1.4.0', required: get_option('with_zstd'))
libzstd = dependency('libzstd', version: '>=0.4.6', required: get_option('with_zstd'))
if libzstd.found()
add_project_arguments('-DHAS_ZSTD=1', language: 'c')
config_data.set('HAS_ZSTD', 1, description: 'Enable Zstd compression')
endif
libavcodec = dependency('libavcodec', required: get_option('with_video'))
libavutil = dependency('libavutil', required: get_option('with_video'))
libswscale = dependency('libswscale', required: get_option('with_video'))
if libavcodec.found() and libavutil.found() and libswscale.found()
add_project_arguments('-DHAS_VIDEO=1', language: 'c')
config_data.set('HAS_VIDEO', 1, description: 'Enable video (de)compression')
endif
libva = dependency('libva', required: get_option('with_vaapi'))
if libva.found()
add_project_arguments('-DHAS_VAAPI=1', language: 'c')
config_data.set('HAS_VAAPI', 1, description: 'Enable hardware video (de)compression with VAAPI')
endif
subdir('protocols')
waypipe_source_files = ['client.c', 'dmabuf.c', 'handlers.c', 'mainloop.c', 'parsing.c', 'server.c', 'shadow.c', 'interval.c', 'util.c', 'video.c']
waypipe_dependencies = [
libgbm, # General GPU buffer creation, aligned with dmabuf proto
liblz4, # Fast compression option
libzstd, # Slow compression option
libavcodec,libavutil,libswscale, # Video encoding
pthreads, # To run expensive computations in parallel
protos, # Wayland protocol data
rt, # For shared memory
libva, # For NV12->RGB conversions
]
waypipe_includes = []
waypipe_includes = [include_directories('protocols'), include_directories('src')]
if libdrm.found()
waypipe_includes += include_directories(libdrm.get_pkgconfig_variable('includedir'))
endif
lib_waypipe_src = static_library(
'waypipe_src',
waypipe_source_files,
include_directories: waypipe_includes,
dependencies: waypipe_dependencies,
)
waypipe_prog = executable(
'waypipe',
['waypipe.c'],
link_with: lib_waypipe_src,
install: true
)
subdir('protocols')
subdir('src')
subdir('test')
scdoc = dependency('scdoc', version: '>=1.9.4', native: true, required: get_option('man-pages'))
if scdoc.found()
......@@ -115,95 +92,3 @@ if scdoc.found()
install_dir: '@0@/man1'.format(mandir)
)
endif
# Testing
test_diff = executable(
'diff_roundtrip',
['test/diff_roundtrip.c'],
include_directories: waypipe_includes,
link_with: lib_waypipe_src,
)
test('Whether diff operations successfully roundtrip', test_diff)
test_damage = executable(
'damage_merge',
['test/damage_merge.c'],
link_with: lib_waypipe_src
)
test('If damage rectangles merge efficiently', test_damage)
test_mirror = executable(
'fd_mirror',
['test/fd_mirror.c'],
link_with: lib_waypipe_src,
dependencies: [libgbm]
)
# disable leak checking, because library code is often responsible
test('How well file descriptors are replicated', test_mirror, env: ['ASAN_OPTIONS=detect_leaks=0'])
gen_path = join_paths(meson.current_source_dir(), 'protocols/symgen.py')
test_fnlist = files('test/test_fnlist.txt')
testproto_src = custom_target(
'test-proto code',
output: '@BASENAME@-data.c',
input: 'test/test-proto.xml',
command: [python3, gen_path, 'data', test_fnlist, '@INPUT@', '@OUTPUT@'],
)
testproto_header = custom_target(
'test-proto client-header',
output: '@BASENAME@-defs.h',
input: 'test/test-proto.xml',
command: [python3, gen_path, 'header', test_fnlist, '@INPUT@', '@OUTPUT@'],
)
test_parse = executable(
'wire_parse',
['test/wire_parse.c', testproto_src, testproto_header],
link_with: lib_waypipe_src,
dependencies: [protos]
)
test('That protocol parsing fails cleanly', test_parse)
weston_dep = dependency('weston', required: false)
testprog_paths = []
if weston_dep.found()
# Sometimes weston's test clients are installed here instead
testprog_paths += weston_dep.get_pkgconfig_variable('libexecdir')
endif
weston_prog = find_program('weston', required: false)
envlist = [
'TEST_WAYPIPE_PATH=@0@'.format(waypipe_prog.full_path()),
]
if weston_prog.found()
envlist += 'TEST_WESTON_PATH=@0@'.format(weston_prog.path())
endif
test_programs = [
['TEST_WESTON_SHM_PATH', 'weston-simple-shm'],
['TEST_WESTON_EGL_PATH', 'weston-simple-egl'],
['TEST_WESTON_DMA_PATH', 'weston-simple-dmabuf-drm'],
['TEST_WESTON_TERM_PATH', 'weston-terminal'],
['TEST_WESTON_PRES_PATH', 'weston-presentation-shm'],
['TEST_WESTON_SUBSURF_PATH', 'weston-subsurfaces'],
]
have_test_progs = false
foreach t : test_programs
test_prog = find_program(t[1], required: false)
foreach p : testprog_paths
if not test_prog.found()
test_prog = find_program(join_paths(p, t[1]), required: false)
endif
endforeach
if test_prog.found()
have_test_progs = true
envlist += '@0@=@1@'.format(t[0], test_prog.path())
endif
endforeach
if weston_prog.found() and have_test_progs
test_headless = join_paths(meson.current_source_dir(), 'test/headless.py')
test('If clients crash when run with weston via waypipe', python3, args: test_headless, env: envlist)
endif
test_startup = join_paths(meson.current_source_dir(), 'test/startup_failure.py')
test('That waypipe exits cleanly given a bad setup', python3, args: test_startup, env: envlist)
fuzz_hook = executable(
'fuzz_hook',
['test/fuzz_hook.c'],
link_with: lib_waypipe_src,
dependencies: [pthreads]
)
# todo: make function list dependency explicit
gen_path = join_paths(meson.current_source_dir(), 'symgen.py')
symgen_path = join_paths(meson.current_source_dir(), 'symgen.py')
fn_list = join_paths(meson.current_source_dir(), 'function_list.txt')
wayland_protos = dependency('wayland-protocols', version: '>=1.12') # xdg-shell
......@@ -34,13 +34,13 @@ foreach xml : protocols
'@0@ code'.format(xml.underscorify()),
output: '@BASENAME@-data.c',
input: xml,
command: [python3, gen_path, 'data', fn_list, '@INPUT@', '@OUTPUT@'],
command: [python3, symgen_path, 'data', fn_list, '@INPUT@', '@OUTPUT@'],
)
protocols_headers += custom_target(
'@0@ client-header'.format(xml.underscorify()),
output: '@BASENAME@-defs.h',
input: xml,
command: [python3, gen_path, 'header', fn_list, '@INPUT@', '@OUTPUT@'],
command: [python3, symgen_path, 'header', fn_list, '@INPUT@', '@OUTPUT@'],
)
endforeach
......
......@@ -124,9 +124,9 @@ def write_func(is_header, ostream, iface_name, func, is_request, export_list):
i
)
)
W("\tuint32_t arg{}_a = (uint32_t)payload[i];".format(i))
W("\tint arg{}_a = (int)payload[i];".format(i))
if n_reg_left > 0:
W("\ti += 1 + ((arg{}_a + 0x3) >> 2);".format(i))
W("\ti += 1 + (unsigned int)((arg{}_a + 0x3) >> 2);".format(i))
tmp_names.append("arg{}_a".format(i))
tmp_names.append("arg{}_b".format(i))
......@@ -196,7 +196,7 @@ def write_func(is_header, ostream, iface_name, func, is_request, export_list):
if arg_type in ("string", "array"):
gaps.append(0)
nta.append("true" if arg_type == "string" else "false")
newvec_idxs.append("-1")
newvec_idxs.append("(unsigned int)-1")
newvec_types.append("NULL")
base_g = str(gaps[0])
......
This diff is collapsed.
......@@ -25,7 +25,7 @@
#define _XOPEN_SOURCE 700
#include "util.h"
#include "main.h"
#include <errno.h>
#include <fcntl.h>
......
......@@ -24,6 +24,7 @@
*/
#define _XOPEN_SOURCE 700
#include "dmabuf.h"
#include "util.h"
#ifndef HAS_DMABUF
......@@ -36,12 +37,13 @@ int init_render_data(struct render_data *data)
}
void cleanup_render_data(struct render_data *data) { (void)data; }
struct gbm_bo *import_dmabuf(struct render_data *rd, int fd, size_t *size,
const struct dmabuf_slice_data *info)
struct dmabuf_slice_data *info, bool read_modifier)
{
(void)rd;
(void)fd;
(void)size;
(void)info;
(void)read_modifier;
return NULL;
}
bool is_dmabuf(int fd)
......@@ -96,6 +98,7 @@ uint32_t dmabuf_get_simple_format_for_plane(uint32_t format, int plane)
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
......@@ -147,6 +150,9 @@ int init_render_data(struct render_data *data)
data->dev = dev;
/* Set the path to the card used for protocol handlers to see */
data->drm_node_path = card;
/* Assume true initially, fall back to old buffer creation path
* if the newer path errors out */
data->supports_modifiers = true;
return 0;
}
void cleanup_render_data(struct render_data *data)
......@@ -157,10 +163,9 @@ void cleanup_render_data(struct render_data *data)
data->dev = NULL;
data->drm_fd = -1;
}
cleanup_hwcontext(data);
}
static long get_dmabuf_fd_size(int fd)
static ssize_t get_dmabuf_fd_size(int fd)
{
ssize_t endp = lseek(fd, 0, SEEK_END);
if (endp == -1) {
......@@ -177,7 +182,7 @@ static long get_dmabuf_fd_size(int fd)
}
struct gbm_bo *import_dmabuf(struct render_data *rd, int fd, size_t *size,
const struct dmabuf_slice_data *info)
struct dmabuf_slice_data *info, bool read_modifier)
{
ssize_t endp = get_dmabuf_fd_size(fd);
if (endp == -1) {
......@@ -185,12 +190,32 @@ struct gbm_bo *import_dmabuf(struct render_data *rd, int fd, size_t *size,
}
*size = (size_t)endp;
/* Multiplanar formats are all rather badly supported by
* drivers/libgbm/libdrm/compositors/applications/everything. */
struct gbm_import_fd_modifier_data data;
if (info) {
// Select all plane metadata associated to planes linked to this
// fd
struct gbm_bo *bo;
if (!info) {
/* No protocol info, so guess the dimensions */
struct gbm_import_fd_data data;
data.fd = fd;
data.width = 256;
data.height = (uint32_t)(endp + 1023) / 1024;
data.format = GBM_FORMAT_XRGB8888;
data.stride = 1024;
bo = gbm_bo_import(rd->dev, GBM_BO_IMPORT_FD, &data,
GBM_BO_USE_RENDERING | GBM_BO_USE_LINEAR);
} else if (read_modifier) {
struct gbm_import_fd_data data;
data.fd = fd;
data.width = info->width;
data.height = info->height;
data.format = info->format;
data.stride = info->strides[0];
bo = gbm_bo_import(rd->dev, GBM_BO_IMPORT_FD, &data,
GBM_BO_USE_RENDERING | GBM_BO_USE_LINEAR);
} else {
/* Multiplanar formats are all rather badly supported by
* drivers/libgbm/libdrm/compositors/applications/everything. */
struct gbm_import_fd_modifier_data data;
// Select all plane metadata associated to planes linked
// to this fd
data.modifier = info->modifier;
data.num_fds = 0;
uint32_t simple_format = 0;
......@@ -214,24 +239,23 @@ struct gbm_bo *import_dmabuf(struct render_data *rd, int fd, size_t *size,
data.width = info->width;
data.height = info->height;
data.format = simple_format;
} else {
data.num_fds = 1;
data.fds[0] = fd;
data.offsets[0] = 0;
data.strides[0] = 1024;
data.width = 256;
data.height = (uint32_t)(endp + 1023) / 1024;
data.format = GBM_FORMAT_XRGB8888;
data.modifier = 0;
}
struct gbm_bo *bo = gbm_bo_import(rd->dev, GBM_BO_IMPORT_FD_MODIFIER,
&data, GBM_BO_USE_RENDERING);
bo = gbm_bo_import(rd->dev, GBM_BO_IMPORT_FD_MODIFIER, &data,
GBM_BO_USE_RENDERING);
}
if (!bo) {
wp_error("Failed to import dmabuf to gbm bo: %s",
strerror(errno));
return NULL;
}
if (read_modifier) {
info->modifier = gbm_bo_get_modifier(bo);
const uint64_t drm_format_mod_invalid = 0x00ffffffffffffffULL;
if (info->modifier == drm_format_mod_invalid) {
/* gbm_bo_get_modifier can fail */
info->modifier = 0;
}
}
return bo;
}
......@@ -280,6 +304,7 @@ struct gbm_bo *make_dmabuf(struct render_data *rd, size_t size,
const struct dmabuf_slice_data *info)
{
struct gbm_bo *bo;
retry:
if (!info || info->num_planes == 0) {
uint32_t width = 512;
uint32_t height =
......@@ -292,6 +317,40 @@ struct gbm_bo *make_dmabuf(struct render_data *rd, size_t size,
wp_error("Failed to make dmabuf: %s", strerror(errno));
return NULL;
}
} else if (!rd->supports_modifiers) {
uint32_t simple_format = dmabuf_get_simple_format_for_plane(
info->format, 0);
/* If the modifier is nonzero, assume that the backend
* preferred modifier matches it. With this old API, there
* really isn't any way to do this better */
bo = gbm_bo_create(rd->dev, info->width, info->height,
simple_format,
GBM_BO_USE_RENDERING |
(info->modifier ? 0
: GBM_BO_USE_LINEAR));
if (!bo) {
wp_error("Failed to make dmabuf (old path): %s",
strerror(errno));
return NULL;
}
uint64_t mod = gbm_bo_get_modifier(bo);
const uint64_t drm_format_mod_invalid = 0x00ffffffffffffffULL;
if (mod != drm_format_mod_invalid && mod != info->modifier) {
wp_error("DMABUF with autoselected modifier %" PRIx64
" does not match desired %" PRIx64
", expect a crash",
mod, info->modifier);
gbm_bo_destroy(bo);
return NULL;
}
int tfd = gbm_bo_get_fd(bo);
ssize_t csize = get_dmabuf_fd_size(tfd);
close(tfd);
if (csize != (ssize_t)size) {
wp_error("Created DMABUF size (%zd disagrees with original size (%zu), giving up");
gbm_bo_destroy(bo);
return NULL;
}
} else {
uint64_t modifiers[2] = {info->modifier, GBM_BO_USE_RENDERING};
uint32_t simple_format = dmabuf_get_simple_format_for_plane(
......@@ -313,21 +372,26 @@ struct gbm_bo *make_dmabuf(struct render_data *rd, size_t size,
*/
bo = gbm_bo_create_with_modifiers(rd->dev, info->width,
info->height, simple_format, modifiers, 2);
if (!bo && errno == ENOSYS) {
wp_debug("Creating a DMABUF with modifiers explicitly set is not supported; retrying");
rd->supports_modifiers = false;
goto retry;
}
if (!bo) {
wp_error("Failed to make dmabuf (with modifier %lx): %s",
info->modifier, strerror(errno));
return NULL;
}
int tfd = gbm_bo_get_fd(bo);
long csize = get_dmabuf_fd_size(tfd);
ssize_t csize = get_dmabuf_fd_size(tfd);
close(tfd);
if (csize != (long)size) {
wp_error("Created DMABUF size (%ld disagrees with original size (%ld), %s",
if (csize != (ssize_t)size) {
wp_error("Created DMABUF size (%zd disagrees with original size (%zu), %s",
csize, size,
(csize > (long)size)
(csize > (ssize_t)size)
? "keeping anyway"
: "attempting taller");
if (csize < (long)size) {
if (csize < (ssize_t)size) {
// Retry, with height increased to hopefully
// contain enough bytes
uint32_t nheight =
......@@ -346,10 +410,10 @@ struct gbm_bo *make_dmabuf(struct render_data *rd, size_t size,
return NULL;
}
int nfd = gbm_bo_get_fd(bo);
long nsize = get_dmabuf_fd_size(nfd);
ssize_t nsize = get_dmabuf_fd_size(nfd);
close(nfd);
if (nsize < (long)size) {
wp_error("Trying to fudge dmabuf height to reach target size of %ld bytes; failed, got %ld",
if (nsize < (ssize_t)size) {
wp_error("Trying to fudge dmabuf height to reach target size of %zu bytes; failed, got %zd",
size, nsize);
}
}
......
/*
* Copyright © 2019 Manuel Stoeckl
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef WAYPIPE_DMABUF_H
#define WAYPIPE_DMABUF_H
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
typedef void *VADisplay;
typedef unsigned int VAGenericID;
typedef VAGenericID VAConfigID;
struct render_data {
bool disabled;
int drm_fd;
const char *drm_node_path;
struct gbm_device *dev;
bool supports_modifiers;
/* video hardware context */
bool av_disabled;
struct AVBufferRef *av_hwdevice_ref;
struct AVBufferRef *av_drmdevice_ref;
VADisplay av_vadisplay;
VAConfigID av_copy_config;
};
struct dmabuf_slice_data {
/* This information partially duplicates that of a gbm_bo. However, for
* instance with weston, it is possible for the compositor to handle
* multibuffer multiplanar images, even though a driver may only support
* multiplanar images derived from a single underlying dmabuf. */
uint32_t width;
uint32_t height;
uint32_t format;
int32_t num_planes;
uint32_t offsets[4];
uint32_t strides[4];
uint64_t modifier;
// to which planes is the matching dmabuf assigned?
uint8_t using_planes[4];
};
/* Additional information to help serialize a dmabuf */
int init_render_data(struct render_data *);
void cleanup_render_data(struct render_data *);
bool is_dmabuf(int fd);
struct gbm_bo *make_dmabuf(struct render_data *rd, size_t size,
const struct dmabuf_slice_data *info);
int export_dmabuf(struct gbm_bo *bo);
struct gbm_bo *import_dmabuf(struct render_data *rd, int fd, size_t *size,
struct dmabuf_slice_data *info, bool read_modifier);
void destroy_dmabuf(struct gbm_bo *bo);
void *map_dmabuf(struct gbm_bo *bo, bool write, void **map_handle);
int unmap_dmabuf(struct gbm_bo *bo, void *map_handle);
/** The handle values are unique among the set of currently active buffer
* objects. To compare a set of buffer objects, produce handles in a batch, and
* then free the temporary buffer objects in a batch */
int get_unique_dmabuf_handle(
struct render_data *rd, int fd, struct gbm_bo **temporary_bo);
uint32_t dmabuf_get_simple_format_for_plane(uint32_t format, int plane);
#endif // WAYPIPE_DMABUF_H
......@@ -24,7 +24,10 @@
*/
#define _XOPEN_SOURCE 700
#include "util.h"
#include "main.h"
#include "parsing.h"
#include "shadow.h"
#include <errno.h>
#include <stdlib.h>
......@@ -112,12 +115,12 @@ struct waypipe_presentation {
struct wp_object base;
// reference clock - given clock
long clock_delta_nsec;
int64_t clock_delta_nsec;
int clock_id;
};
struct waypipe_presentation_feedback {
struct wp_object base;
long clock_delta_nsec;
int64_t clock_delta_nsec;
};
struct wp_linux_dmabuf_params {
......@@ -499,9 +502,9 @@ static int compute_damage_coordinates(int *xlow, int *xhigh, int *ylow,
* 789 789
* 00000000
*/
bool xyexch = magic & (1 << (4 * transform));
bool xflip = magic & (1 << (4 * transform + 1));
bool yflip = magic & (1 << (4 * transform + 2));
bool xyexch = magic & (1u << (4 * transform));
bool xflip = magic & (1u << (4 * transform + 1));
bool yflip = magic & (1u << (4 * transform + 2));
int ew = xyexch ? buf_height : buf_width;
int eh = xyexch ? buf_width : buf_height;
if (xflip) {
......@@ -663,7 +666,8 @@ void do_wl_surface_req_commit(struct context *ctx)
}
}
merge_damage_records(&sfd->damage, i, damage_array);
merge_damage_records(&sfd->damage, i, damage_array,
ctx->g->threads.diff_alignment_bits);
free(damage_array);
free(surface->damage_list);
surface->damage_list = NULL;
......@@ -734,13 +738,13 @@ void do_wl_keyboard_evt_keymap(
size_t fdsz = 0;
fdcat_t fdtype = get_fd_type(fd, &fdsz);
if (fdtype != FDC_FILE || fdsz != size) {
wp_error("keymap candidate fd %d was not file-like (type=%s), and with size=%ld did not match %d",
wp_error("keymap candidate fd %d was not file-like (type=%s), and with size=%zu did not match %u",
fd, fdcat_to_str(fdtype), fdsz, size);
return;
}
struct shadow_fd *sfd = translate_fd(
&ctx->g->map, &ctx->g->render, fd, fdtype, fdsz, NULL);
struct shadow_fd *sfd = translate_fd(&ctx->g->map, &ctx->g->render, fd,
fdtype, fdsz, NULL, false);
struct wp_keyboard *keyboard = (struct wp_keyboard *)ctx->obj;
keyboard->owned_buffer = shadow_incref_protocol(sfd);
(void)format;
......@@ -764,14 +768,14 @@ void do_wl_shm_req_create_pool(
* which then increases the size
*/
if (fdtype != FDC_FILE || (int32_t)fdsz < size) {
wp_error("File type or size mismatch for fd %d with claimed: %s %s | %ld %d",
wp_error("File type or size mismatch for fd %d with claimed: %s %s | %zu %u",
fd, fdcat_to_str(fdtype),
fdcat_to_str(FDC_FILE), fdsz, size);
return;
}
struct shadow_fd *sfd = translate_fd(
&ctx->g->map, &ctx->g->render, fd, fdtype, fdsz, NULL);
struct shadow_fd *sfd = translate_fd(&ctx->g->map, &ctx->g->render, fd,
fdtype, fdsz, NULL, false);
the_shm_pool->owned_buffer = shadow_incref_protocol(sfd);
}
......@@ -790,8 +794,8 @@ void do_wl_shm_pool_req_resize(struct context *ctx, int32_t size)
}
/* The display side will be updated already via buffer update msg */
if (!ctx->on_display_side) {
extend_shm_shadow(
&ctx->g->map, the_shm_pool->owned_buffer, size);
extend_shm_shadow(&ctx->g->map, &ctx->g->threads,
the_shm_pool->owned_buffer, (size_t)size);
}
}
void do_wl_shm_pool_req_create_buffer(struct context *ctx, struct wp_object *id,
......@@ -865,7 +869,8 @@ void do_zwlr_screencopy_frame_v1_evt_ready(struct context *ctx,
.width = buffer->shm_height * buffer->shm_stride,
.stride = 0,
.rep = 1};
merge_damage_records(&sfd->damage, 1, &interval);
merge_damage_records(&sfd->damage, 1, &interval,
ctx->g->threads.diff_alignment_bits);
(void)tv_sec_lo;
(void)tv_sec_hi;
......@@ -884,10 +889,10 @@ void do_zwlr_screencopy_frame_v1_req_copy(
frame->buffer_id = buf->obj_id;
}
static long timespec_diff(struct timespec val, struct timespec sub)
static int64_t timespec_diff(struct timespec val, struct timespec sub)
{
// Overflows only with 68 year error, insignificant
return (val.tv_sec - sub.tv_sec) * 1000000000L +
return (val.tv_sec - sub.tv_sec) * 1000000000LL +
(val.tv_nsec - sub.tv_nsec);
}
void do_wp_presentation_evt_clock_id(struct context *ctx, uint32_t clk_id)
......@@ -907,8 +912,8 @@ void do_wp_presentation_evt_clock_id(struct context *ctx, uint32_t clk_id)
clock_gettime(pres->clock_id, &t0);
clock_gettime(reference_clock, &t1);
clock_gettime(pres->clock_id, &t2);
long diff1m0 = timespec_diff(t1, t0);
long diff2m1 = timespec_diff(t2, t1);
int64_t diff1m0 = timespec_diff(t1, t0);
int64_t diff2m1 = timespec_diff(t2, t1);
pres->clock_delta_nsec = (diff1m0 - diff2m1) / 2;
}
}
......@@ -939,18 +944,18 @@ void do_wp_presentation_feedback_evt_presented(struct context *ctx,
/* convert local to reference, on display side */
int dir = ctx->on_display_side ? 1 : -1;
uint64_t sec = tv_sec_lo + tv_sec_hi * 0x100000000L;
long nsec = tv_nsec;
uint64_t sec = tv_sec_lo + tv_sec_hi * 0x100000000uLL;
int64_t nsec = tv_nsec;
nsec += dir * feedback->clock_delta_nsec;
sec = (uint64_t)((long)sec + nsec / 1000000000L);
sec = (uint64_t)((int64_t)sec + nsec / 1000000000LL);
nsec = nsec % 1000000000L;
if (nsec < 0) {
nsec += 1000000000L;
sec--;
}
// Size not changed, no other edits required
ctx->message[2] = (uint32_t)(sec / 0x100000000L);
ctx->message[3] = (uint32_t)(sec % 0x100000000L);
ctx->message[2] = (uint32_t)(sec / 0x100000000uLL);
ctx->message[3] = (uint32_t)(sec % 0x100000000uLL);
ctx->message[4] = (uint32_t)nsec;
}
......@@ -1021,7 +1026,7 @@ void do_wl_drm_req_create_prime_buffer(struct context *ctx,
#endif
struct shadow_fd *sfd = translate_fd(&ctx->g->map, &ctx->g->render,
name, FDC_DMABUF, 0, &info);
name, FDC_DMABUF, 0, &info, true);
buf->type = BUF_DMA;
buf->dmabuf_nplanes = 1;
buf->dmabuf_buffers[0] = shadow_incref_protocol(sfd);
......@@ -1040,7 +1045,7 @@ void do_zwp_linux_dmabuf_v1_evt_modifier(struct context *ctx, uint32_t format,
(void)format;
uint64_t modifier = modifier_hi * 0x100000000uL * modifier_lo;
// Prevent all advertisements for dmabufs with modifiers
if (modifier && ctx->g->config->linear_dmabuf) {
if (modifier && ctx->g->config->only_linear_dmabuf) {
ctx->drop_this_msg = true;
}
}
......@@ -1204,8 +1209,8 @@ void do_zwp_linux_buffer_params_v1_req_create(struct context *ctx,
if (!ctx->on_display_side) {
reintroduce_add_msgs(ctx, params);
}
struct dmabuf_slice_data info = {.width = width,
.height = height,
struct dmabuf_slice_data info = {.width = (uint32_t)width,
.height = (uint32_t)height,
.format = format,
.num_planes = params->nplanes,
.strides = {params->add[0].stride,
......@@ -1254,7 +1259,7 @@ void do_zwp_linux_buffer_params_v1_req_create(struct context *ctx,
struct shadow_fd *sfd = translate_fd(&ctx->g->map,
&ctx->g->render, params->add[i].fd, res_type, 0,
&info);
&info, false);
/* increment for each extra time this fd will be sent */
if (sfd->has_owner) {
shadow_incref_transfer(sfd);
......@@ -1328,7 +1333,7 @@ void do_zwlr_export_dmabuf_frame_v1_evt_object(struct context *ctx,
struct dmabuf_slice_data info = {.width = frame->width,
.height = frame->height,
.format = frame->format,
.num_planes = frame->nobjects,
.num_planes = (int32_t)frame->nobjects,
.strides = {frame->objects[0].stride,
frame->objects[1].stride,
frame->objects[2].stride,
......@@ -1352,7 +1357,7 @@ void do_zwlr_export_dmabuf_frame_v1_evt_object(struct context *ctx,
#endif
struct shadow_fd *sfd = translate_fd(&ctx->g->map, &ctx->g->render, fd,
FDC_DMABUF, 0, &info);
FDC_DMABUF, 0, &info, false);
if (sfd->buffer_size < size) {
wp_error("Frame object %u has a dmabuf with less (%u) than the advertised (%u) size",
index, (uint32_t)sfd->buffer_size, size);
......@@ -1393,7 +1398,7 @@ static void translate_data_transfer_fd(struct context *context, int32_t fd)
* around should be, according to the protocol, only written into and
* closed */
translate_fd(&context->g->map, &context->g->render, fd, FDC_PIPE_IW, 0,
NULL);
NULL, false);
}
void do_gtk_primary_selection_offer_req_receive(
struct context *ctx, const char *mime_type, int fd)
......
......@@ -24,7 +24,7 @@
*/
#define _XOPEN_SOURCE 700
#include "util.h"
#include "shadow.h"
#include <stdlib.h>
#include <string.h>
......@@ -122,7 +122,8 @@ static int fix_merge_stack_property(int size, struct merge_stack_elem *stack,
* TODO: explicit time limiting/adaptive margin! */
void merge_mergesort(const int old_count, struct interval *old_list,
const int new_count, const struct ext_interval *const new_list,
int *dst_count, struct interval **dst_list, int merge_margin)
int *dst_count, struct interval **dst_list, int merge_margin,
int alignment_bits)
{
/* Stack-based mergesort: the buffer at position `i+1`
* should be <= 1/2 times the size of the buffer at
......@@ -155,6 +156,7 @@ void merge_mergesort(const int old_count, struct interval *old_list,
if (e.width <= 0 || e.rep <= 0 || e.start < 0) {
continue;
}
/* To limit CPU time, if it is very likely that
* an interval would be merged anyway, then
* replace it with its containing interval. */
......@@ -162,7 +164,8 @@ void merge_mergesort(const int old_count, struct interval *old_list,
bool force_combine = (absorbed > 30000) ||
10 * remaining < src_count;
long end = e.start + e.stride * (long)(e.rep - 1) + e.width;
int64_t end = e.start + e.stride * (int64_t)(e.rep - 1) +
e.width;
if (end >= INT32_MAX) {
/* overflow protection */
e.width = INT32_MAX - 1 - e.start;
......@@ -180,21 +183,37 @@ void merge_mergesort(const int old_count, struct interval *old_list,
&base.size, (void **)&base.data);
struct interval *vec = &base.data[base.count];
for (int k = 0; k < e.rep; k++) {
vec[k].start = e.start + k * e.stride;
vec[k].end = vec[k].start + e.width;
int iw = 0;
int last_end = INT32_MIN;
for (int ir = 0; ir < e.rep; ir++) {
int start = e.start + ir * e.stride;
int end = start + e.width;
start = (start >> alignment_bits) << alignment_bits;
end = ((end + (1 << alignment_bits) - 1) >>
alignment_bits)
<< alignment_bits;
if (start > last_end) {
vec[iw].start = start;
vec[iw].end = end;
last_end = end;
iw++;
} else {
vec[iw - 1].end = end;
last_end = end;
}
}
/* end sentinel */
vec[e.rep] = (struct interval){
vec[iw] = (struct interval){
.start = INT32_MAX, .end = INT32_MAX};
src_count += e.rep;
src_count += iw;
substack[substack_size] = (struct merge_stack_elem){
.offset = base.count, .count = e.rep};
.offset = base.count, .count = iw};
substack_size++;
base.count += e.rep + 1;
base.count += iw + 1;
/* merge down the stack as far as possible */
substack_size = fix_merge_stack_property(substack_size,
......@@ -214,7 +233,7 @@ void merge_mergesort(const int old_count, struct interval *old_list,
/* This value must be larger than 8, or diffs will explode */
#define MERGE_MARGIN 256
void merge_damage_records(struct damage *base, int nintervals,
const struct ext_interval *const new_list)
const struct ext_interval *const new_list, int alignment_bits)
{
for (int i = 0; i < nintervals; i++) {
base->acc_damage_stat += new_list[i].width * new_list[i].rep;
......@@ -227,7 +246,8 @@ void merge_damage_records(struct damage *base, int nintervals,
}
merge_mergesort(base->ndamage_intvs, base->damage, nintervals, new_list,
&base->ndamage_intvs, &base->damage, MERGE_MARGIN);
&base->ndamage_intvs, &base->damage, MERGE_MARGIN,
alignment_bits);
}
int get_damage_area(const struct damage *base)
......@@ -243,7 +263,8 @@ int get_damage_area(const struct damage *base)
for (int i = 0; i < base->ndamage_intvs; i++) {
tca += base->damage[i].end - base->damage[i].start;
}
double cover_fraction = base->acc_damage_stat / (double)tca;
float cover_fraction =
(float)base->acc_damage_stat / (float)tca;
wp_debug("Damage interval: {%d(%d)} -> [%d, %d) [%d], %f",
base->ndamage_intvs, base->acc_count, low, high,
tca, cover_fraction);
......
/*
* Copyright © 2019 Manuel Stoeckl
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef WAYPIPE_INTERVAL_H
#define WAYPIPE_INTERVAL_H
#include <stdint.h>
struct ext_interval {
/* A slight modification of the standard 'damage' rectangle
* formulation, written to be agnostic of whatever buffers
* underlie the system.
*
* [start,start+width),[start+stride,start+stride+width),
* ... [start+(rep-1)*stride,start+(rep-1)*stride+width) */
int32_t start;
/* Subinterval width */
int32_t width;
/* Number of distinct subinterval start positions. For a single
* interval, this is one. */
int32_t rep;
/* Spacing between start positions, should be > width, unless
* the is only one subinterval, in which case the value shouldn't
* matter and is conventionally set to 0. */
int32_t stride;
};
struct interval {
/* start+end is better than start+width, since the limits are used
* repeatedly by merge operations, while width is only needed for
* e.g. streaming area estimates which are very fast anyway */
int32_t start;
int32_t end;
};
#define DAMAGE_EVERYTHING ((struct interval *)-1)
struct damage {
/* Interval-based damage tracking. If damage is NULL, there is
* no recorded damage. If damage is DAMAGE_EVERYTHING, the entire
* region should be updated. If ndamage_rects > 0, then
* damage points to an array of struct damage_interval objects. */
struct interval *damage;
int ndamage_intvs;
int64_t acc_damage_stat;
int acc_count;
};
/** Given an array of extended intervals, update the base damage structure
* so that it contains a reasonably small disjoint set of extended intervals
* which contains the old base set and the new set. Before merging, all
* interval boundaries will be rounded to the next multiple of
* `1 << alignment_bits`. */
void merge_damage_records(struct damage *base, int nintervals,
const struct ext_interval *const new_list, int alignment_bits);
/** Return the total area covered by the damage region */
int get_damage_area(const struct damage *base);
/** Set damage to empty */
void reset_damage(struct damage *base);
/** Expand damage to cover everything */
void damage_everything(struct damage *base);
/* internal merge driver, made visible for testing */
void merge_mergesort(const int old_count, struct interval *old_list,
const int new_count, const struct ext_interval *const new_list,
int *dst_count, struct interval **dst_list, int merge_margin,
int alignment_bits);
#endif // WAYPIPE_INTERVAL_H
/*
* Copyright © 2019 Manuel Stoeckl
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "interval.h"
#include "util.h"
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__linux__) && defined(__arm__)
#include <asm/hwcap.h>
#include <sys/auxv.h>
#elif defined(__FreeBSD__) && defined(__arm__)
#include <sys/auxv.h>
#endif
size_t run_interval_diff_C(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ idiff, size_t i, const size_t i_end)
{
const uint64_t *__restrict__ mod = imod;
uint64_t *__restrict__ base = ibase;
uint64_t *__restrict__ diff = (uint64_t * __restrict__) idiff;
/* we paper over gaps of a given window size, to avoid fine
* grained context switches */
const size_t i_start = i;
size_t dc = 0;
uint64_t changed_val = i < i_end ? mod[i] : 0;
uint64_t base_val = i < i_end ? base[i] : 0;
i++;
// Alternating scanners, ending with a mispredict each.
bool clear_exit = false;
while (i < i_end) {
while (changed_val == base_val && i < i_end) {
changed_val = mod[i];
base_val = base[i];
i++;
}
if (i == i_end) {
/* it's possible that the last value actually;
* see exit block */
clear_exit = true;
break;
}
size_t last_header = dc++;
diff[last_header] = (uint64_t)((i - 1) * 2);
diff[dc++] = changed_val;
base[i - 1] = changed_val;
// changed_val != base_val, difference occurs at early
// index
size_t nskip = 0;
// we could only sentinel this assuming a tiny window
// size
while (i < i_end && nskip <= (size_t)diff_window_size / 2) {
base_val = base[i];
changed_val = mod[i];
base[i] = changed_val;
i++;
diff[dc++] = changed_val;
nskip++;
nskip *= (base_val == changed_val);
}
dc -= nskip;
diff[last_header] |= (uint64_t)((i - nskip) * 2) << 32;
/* our sentinel, at worst, causes overcopy by one. this
* is fine
*/
}
/* If only the last block changed */
if ((clear_exit || i_start + 1 == i_end) && changed_val != base_val) {
diff[dc++] = ((uint64_t)(i_end * 2) << 32) |
(uint64_t)((i_end - 1) * 2);
diff[dc++] = changed_val;
base[i_end - 1] = changed_val;
}
return dc * 2;
}
#ifdef HAVE_AVX512F
static bool avx512f_available(void)
{
return __builtin_cpu_supports("avx512f");
}
size_t run_interval_diff_avx512f(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ idiff, size_t i, const size_t i_end);
#endif
#ifdef HAVE_AVX2
static bool avx2_available(void) { return __builtin_cpu_supports("avx2"); }
size_t run_interval_diff_avx2(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ idiff, size_t i, const size_t i_end);
#endif
#ifdef HAVE_NEON
static bool neon_available(void)
{
/* The actual methods are platform-dependent */
#if defined(__linux__) && defined(__arm__)
return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__FreeBSD__) && defined(__arm__)
unsigned long hwcap = 0;
elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap));
return (hwcap & HWCAP_NEON) != 0;
#endif
return true;
}
size_t run_interval_diff_neon(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ idiff, size_t i, const size_t i_end);
#endif
#ifdef HAVE_SSE41
static bool sse41_available(void) { return __builtin_cpu_supports("sse4.1"); }
size_t run_interval_diff_sse41(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ idiff, size_t i, const size_t i_end);
#endif
interval_diff_fn_t get_diff_function(enum diff_type type, int *alignment_bits)
{
#ifdef HAVE_AVX512F
if ((type == DIFF_FASTEST || type == DIFF_AVX512F) &&
avx512f_available()) {
*alignment_bits = 6;
return run_interval_diff_avx512f;
}
#endif
#ifdef HAVE_AVX2
if ((type == DIFF_FASTEST || type == DIFF_AVX2) && avx2_available()) {
*alignment_bits = 6;
return run_interval_diff_avx2;
}
#endif
#ifdef HAVE_NEON
if ((type == DIFF_FASTEST || type == DIFF_NEON) && neon_available()) {
*alignment_bits = 4;
return run_interval_diff_neon;
}
#endif
#ifdef HAVE_SSE41
if ((type == DIFF_FASTEST || type == DIFF_SSE41) && sse41_available()) {
*alignment_bits = 5;
return run_interval_diff_sse41;
}
#endif
if ((type == DIFF_FASTEST || type == DIFF_C)) {
*alignment_bits = 3;
return run_interval_diff_C;
}
*alignment_bits = 0;
return NULL;
}
/** Construct the main portion of a diff. The provided arguments should
* be validated beforehand. All intervals, as well as the base/changed data
* pointers, should be aligned to the alignment size associated with the
* interval diff function */
size_t construct_diff_core(interval_diff_fn_t idiff_fn, int alignment_bits,
const struct interval *__restrict__ damaged_intervals,
int n_intervals, void *__restrict__ base,
const void *__restrict__ changed, void *__restrict__ diff)
{
uint32_t *diff_blocks = (uint32_t *)diff;
size_t cursor = 0;
for (int i = 0; i < n_intervals; i++) {
struct interval e = damaged_intervals[i];
size_t bend = (size_t)e.end >> alignment_bits;
size_t bstart = (size_t)e.start >> alignment_bits;
cursor += (*idiff_fn)(24, changed, base, diff_blocks + cursor,
bstart, bend);
}
return cursor * sizeof(uint32_t);
}
size_t construct_diff_trailing(size_t size, int alignment_bits,
char *__restrict__ base, const char *__restrict__ changed,
char *__restrict__ diff)
{
size_t alignment = 1u << alignment_bits;
size_t ntrailing = size % alignment;
size_t offset = size - ntrailing;
bool tail_change = false;
if (ntrailing > 0) {
for (size_t i = 0; i < ntrailing; i++) {
tail_change |= base[offset + i] != changed[offset + i];
}
}
if (tail_change) {
for (size_t i = 0; i < ntrailing; i++) {
diff[i] = changed[offset + i];
base[offset + i] = changed[offset + i];
}
return ntrailing;
}
return 0;
}
void apply_diff(size_t size, char *__restrict__ target1,
char *__restrict__ target2, size_t diffsize, size_t ntrailing,
const char *__restrict__ diff)
{
size_t nblocks = size / sizeof(uint32_t);
size_t ndiffblocks = diffsize / sizeof(uint32_t);
uint32_t *__restrict__ t1_blocks = (uint32_t *)target1;
uint32_t *__restrict__ t2_blocks = (uint32_t *)target2;
uint32_t *__restrict__ diff_blocks = (uint32_t *)diff;
for (size_t i = 0; i < ndiffblocks;) {
size_t nfrom = (size_t)diff_blocks[i];
size_t nto = (size_t)diff_blocks[i + 1];
size_t span = nto - nfrom;
if (nto > nblocks || nfrom >= nto ||
i + (nto - nfrom) >= ndiffblocks) {
wp_error("Invalid copy range [%zu,%zu) > %zu=nblocks or [%zu,%zu) > %zu=ndiffblocks",
nfrom, nto, nblocks, i + 1,
i + 1 + span, ndiffblocks);
return;
}
memcpy(t1_blocks + nfrom, diff_blocks + i + 2,
sizeof(uint32_t) * span);
memcpy(t2_blocks + nfrom, diff_blocks + i + 2,
sizeof(uint32_t) * span);
i += span + 2;
}
if (ntrailing > 0) {
size_t offset = size - ntrailing;
for (size_t i = 0; i < ntrailing; i++) {
target1[offset + i] = diff[diffsize + i];
target2[offset + i] = diff[diffsize + i];
}
}
}
/*
* Copyright © 2019 Manuel Stoeckl
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef WAYPIPE_KERNEL_H
#define WAYPIPE_KERNEL_H
#include <stddef.h>
#include <stdint.h>
struct interval;
typedef size_t (*interval_diff_fn_t)(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ diff, size_t i, const size_t i_end);
enum diff_type {
DIFF_FASTEST,
DIFF_AVX512F,
DIFF_AVX2,
DIFF_SSE41,
DIFF_NEON,
DIFF_C,
};
/** Returns a function pointer to a diff construction kernel, and indicates
* the alignment of the data which is to be passed in */
interval_diff_fn_t get_diff_function(enum diff_type type, int *alignment_bits);
size_t construct_diff_core(interval_diff_fn_t idiff_fn, int alignment_bits,
const struct interval *__restrict__ damaged_intervals,
int n_intervals, void *__restrict__ base,
const void *__restrict__ changed, void *__restrict__ diff);
size_t construct_diff_trailing(size_t size, int alignment_bits,
char *__restrict__ base, const char *__restrict__ changed,
char *__restrict__ diff);
void apply_diff(size_t size, char *__restrict__ target1,
char *__restrict__ target2, size_t diffsize, size_t ntrailing,
const char *__restrict__ diff);
#endif // WAYPIPE_KERNEL_H
/*
* Copyright © 2019 Manuel Stoeckl
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial
* portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>
#ifdef __x86_64__
static inline int tzcnt(uint64_t v) { return (int)_tzcnt_u64(v); }
#else
static inline int tzcnt(uint64_t v) { return v ? __builtin_ctzll(v) : 64; }
#endif
#ifdef __x86_64__
static inline int lzcnt(uint64_t v) { return (int)_lzcnt_u64(v); }
#else
static inline int lzcnt(uint64_t v) { return v ? __builtin_clzll(v) : 64; }
#endif
size_t run_interval_diff_avx2(const int diff_window_size,
const void *__restrict__ imod, void *__restrict__ ibase,
uint32_t *__restrict__ diff, size_t i, const size_t i_end)
{
const __m256i *__restrict__ mod = imod;
__m256i *__restrict__ base = ibase;
size_t dc = 0;
while (1) {
/* Loop: no changes */
uint32_t *ctrl_blocks = &diff[dc];
dc += 2;
int trailing_unchanged = 0;
for (; i < i_end; i++) {
__m256i m0 = _mm256_load_si256(&mod[2 * i]);
__m256i m1 = _mm256_load_si256(&mod[2 * i + 1]);
__m256i b0 = _mm256_load_si256(&base[2 * i]);
__m256i b1 = _mm256_load_si256(&base[2 * i + 1]);
__m256i eq0 = _mm256_cmpeq_epi32(m0, b0);
__m256i eq1 = _mm256_cmpeq_epi32(m1, b1);
/* It's very hard to tell which loop exit method is
* better, since the routine is typically bandwidth
* limited */
#if 1
uint32_t mask0 = (uint32_t)_mm256_movemask_epi8(eq0);
uint32_t mask1 = (uint32_t)_mm256_movemask_epi8(eq1);
uint64_t mask = mask0 + mask1 * 0x100000000uLL;
if (~mask) {
#else
__m256i andv = _mm256_and_si256(eq0, eq1);
if (_mm256_testz_si256(andv, _mm256_set1_epi8(-1))) {
uint32_t mask0 = (uint32_t)_mm256_movemask_epi8(
eq0);
uint32_t mask1 = (uint32_t)_mm256_movemask_epi8(
eq1);
uint64_t mask = mask0 + mask1 * 0x100000000uLL;
#endif
_mm256_store_si256(&base[2 * i], m0);
_mm256_store_si256(&base[2 * i + 1], m1);
/* Write the changed bytes, starting at the
* first modified term,
* and set the n_unchanged counter */
size_t ncom = (size_t)tzcnt(~mask) >> 2;