...
 
Commits (121)
A simple program written in C to learn more about the ioctl's used by the
bifrost GPU's kernel driver, so we can start drawing some triangles.
Build mesa with panfrost. Clone panwrap into:
Eventually, this code will become historical.
mesa/src/panfrost/pandecode/
So you have a path like:
/home/alyssa/mesa/src/panfrost/pandecode/panloader/panwrap
(Yes, really.)
Inside panwrap, make a build/ directory, cd in, meson .. . && ninja. You'll get
a libpanwrap.so out in build/panwrap/libpanwrap.so, upload that to the board
you're trying to trace and LD_PRELOAD it in to get a trace (including
disassembly of both Midgard and Bifrost).
If your name is Ryan, uncomment "#define dvalin" in include/mali-ioctl.h and panwrap will be built to trace a Dvalin kernel instead.
Framebuffer memory notes
=========================
The framebuffer is BGRA8888 format. It likely uses 16x16 tiles (TODO:
verify). There is a stride of zeroes every 1856 bytes for unknown
reasons.
---
Zeroes at:
1345548
1347404
1349260
1351116
Deltas of 1856 between zero regions; groups of 1804 valid pixels in
between
2048 = 1856 + 4 * 48?
Coarse job descriptor memory map
================================
E000 (refreshed -- 0x340)
E040 (descriptor)
E060 (VERTEX)
E120 (referenced in VERTEX)
E140 (referenced in TILER)
E170 (referenced in VERTEX + TILER)
E180 (descriptor)
E1A0 (TILER)
E280 (refreshed -- 0x80)
E300 (descriptor set)
E320 (SET_VALUE)
E340 (refreshed -- 0x80)
E380 (descriptor)
E3A0 (FRAGMENT)
E3C0 (soft job chain, refreshed -- 0x28)
Conclusions:
FRAGMENT <= 32 bytes
SET_VALUE <= 32 bytes
VERTEX <= 192 bytes
TILER <= 224 bytes
This is a job-based architecture. All interesting behaviour (shaders,
rendering) is a result of jobs. Each job is sent from the driver across
the shim to the GPU. The job is encoded as a special data structure in
GPU memory.
There are two families of jobs, hardware jobs and software jobs.
Hardware jobs interact with the GPU directly. Software jobs are used to
manipulate the driver. Software jobs set BASE_JD_REQ_SOFT_JOB in the
.core_req field of the atom.
Hardware jobs contain the jc pointer into GPU memory. This points to the
job descriptor. All hardware jobs begin with the job descriptor header
which is found in the shim headers. The header contains a field,
job_type, which must be set according to the job type:
Byte | Job type
----- | ---------
0 | Not started
1 | Null
2 | Set value
3 | Cache flush
4 | Compute
5 | Vertex
6 | (none)
7 | Tiler
8 | Fused
9 | Fragment
This header contains a pointer to the next job, forming sequences of
hardware jobs.
The header also denotes the type of job (vertex, fragment, or tiler).
After the header there is simple type specific information.
Set value jobs follow:
struct tentative_set_value {
uint64_t write_iut; /* Maybe vertices or shader? */
uint64_t unknown1; /* Trace has it set at 3 */
}
Fragment jobs follow:
struct tentative_fragment {
tile_coord_t min_tile_coord;
tile_coord_t max_tile_coord;
uint64_t fragment_fbd;
};
tile_coord_t is an ordered pair for specifying a tile. It is encoded as
a uint32_t where bits 0-11 represent the X coordinate and bits 16-27
represent the Y coordinate.
Tiles are 16x16 pixels. This can be concluded from max_tile_coord in
known render sizes.
Fragment jobs contain an external resource, the framebuffer (in shared
memory / UMM). The framebuffer is in BGRA8888 format.
Vertex and tiler jobs follow the same structure (pointers are 32-bit to
GPU memory):
struct tentative_vertex_tiler {
uint32_t block1[11];
uint32_t addresses1[4];
tentative_shader *shaderMeta;
attribute_buffer *vb[];
attribute_meta_t *attribute_meta[];
uint32_t addresses2[5];
tentative_fbd *fbd;
uint32_t addresses3[1];
uint32_t block2[36];
}
In tiler jobs, block1[8] encodes the drawing mode used:
Byte | Mode
----- | -----
0x01 | GL_POINTS
0x02 | GL_LINES
0x08 | GL_TRIANGLES
0x0A | GL_TRIANGLE_STRIP
0x0C | GL_TRIANGLE_FAN
The shader metadata follows a (very indirect) structure:
struct tentative_shader {
uint64_t shader; /* ORed with 16 bytes of flags */
uint32_t block[unknown];
}
Shader points directly to the compiled instruction stream. For vertex
jobs, this is the vertex shader. For tiler jobs, this is the fragment
shader.
Shaders are 128-bit aligned. The lower 128-bits of the shader metadata
pointer contains flags. Bit 0 is set for all shaders. Bit 2 is set for
vertex shaders. Bit 3 is set for fragment shaders.
The attribute buffers encode each attribute (like vertex data) specified
in the shader.
struct attribute_buffer {
float *elements;
size_t element_size; /* sizeof(*elements) * component_count */
size_t total_size; /* element_size * num_vertices */
}
The attribute buffers themselves have metadata in attribute_meta_t,
which is a uint64_t internally. The lowest byte of attribute_meta_t is
the corresponding attribute number. The rest appears to be flags
(0x2DEA2200). After the final attribute, the metadata will be zero,
acting as a null terminator for the attribute list.
Comparing a simple triangle sample with a texture mapped sample, the
following differences appear between the fragment jobs:
- Different shaders (obviously)
- Texture mapped attributes aren't decoding at all (vec5?)
- Null tripped!
null1: B291E720
null2: B291E700
(null 3 is still NULL)
Notice that these are both SAME_VA addresses 32 bytes apart.
null1 contains texture metadata. In this case, the 0x20 buffer is:
23 00 00 00 00 00 01 00 88 e8 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
null2 contains a(nother) list of addresses, like how attributes are
encoded. In this case, the buffer is:
c0 81 02 02 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
It appears null1 is a zero-terminated array of metadata and null2 is the
corresponding zero-terminated array of textures themselves.
- Different uniforms (understandable)
- Shader metadata is different:
No texture map: 00 00 00 00 00 00 08 00 02 06 22 00
^ mask: 01 00 01 00 00 00 0F 00 00 08 20 00
Texture mapped: 01 00 01 00 00 00 07 00 02 0e 02 00
- Addr[8] is a little different, but not necessarily related
- Addr[9] is one byte different (ditto)
import gdb
import gdb.printing
class MaliPtr:
""" Print a GPU pointer """
def __init__(self, val):
self.val = val
def to_string(self):
return hex(self.val)
pp = gdb.printing.RegexpCollectionPrettyPrinter("panloader")
pp.add_printer('gpu_ptr', '^mali_ptr$', MaliPtr)
gdb.printing.register_pretty_printer(None, pp)
/*
* © Copyright 2017-2018 The Panfrost Community
*
* This program is free software and is provided to you under the terms of the
* GNU General Public License version 2 as published by the Free Software
* Foundation, and any use by you of this program is subject to the terms
* of such GNU license.
*
* A copy of the licence is included with the program, and can also be obtained
* from Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
* Boston, MA 02110-1301, USA.
*
*/
#ifndef __MALI_IOCTL_DVALIN_H__
#define __MALI_IOCTL_DVALIN_H__
union mali_ioctl_mem_alloc {
union mali_ioctl_header header;
/* [in] */
struct {
u64 va_pages;
u64 commit_pages;
u64 extent;
u64 flags;
} in;
struct {
u64 flags;
u64 gpu_va;
} out;
} __attribute__((packed));
struct mali_ioctl_get_gpuprops {
u64 buffer;
u32 size;
u32 flags;
};
#define MALI_IOCTL_TYPE_BASE 0x80
#define MALI_IOCTL_TYPE_MAX 0x81
#define MALI_IOCTL_TYPE_COUNT (MALI_IOCTL_TYPE_MAX - MALI_IOCTL_TYPE_BASE + 1)
#define MALI_IOCTL_GET_VERSION (_IOWR(0x80, 0, struct mali_ioctl_get_version))
#define MALI_IOCTL_SET_FLAGS ( _IOW(0x80, 1, struct mali_ioctl_set_flags))
#define MALI_IOCTL_JOB_SUBMIT ( _IOW(0x80, 2, struct mali_ioctl_job_submit))
#define MALI_IOCTL_GET_GPUPROPS ( _IOW(0x80, 3, struct mali_ioctl_get_gpuprops))
#define MALI_IOCTL_POST_TERM ( _IO(0x80, 4))
#define MALI_IOCTL_MEM_ALLOC (_IOWR(0x80, 5, union mali_ioctl_mem_alloc))
#define MALI_IOCTL_MEM_QUERY (_IOWR(0x80, 6, struct mali_ioctl_mem_query))
#define MALI_IOCTL_MEM_FREE ( _IOW(0x80, 7, struct mali_ioctl_mem_free))
#define MALI_IOCTL_HWCNT_SETUP ( _IOW(0x80, 8, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_ENABLE ( _IOW(0x80, 9, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_DUMP ( _IO(0x80, 10))
#define MALI_IOCTL_HWCNT_CLEAR ( _IO(0x80, 11))
#define MALI_IOCTL_DISJOINT_QUERY ( _IOR(0x80, 12, __ioctl_placeholder))
// Get DDK version
// mem jit init
#define MALI_IOCTL_SYNC ( _IOW(0x80, 15, struct mali_ioctl_sync))
#define MALI_IOCTL_FIND_CPU_OFFSET (_IOWR(0x80, 16, __ioctl_placeholder))
#define MALI_IOCTL_GET_CONTEXT_ID ( _IOR(0x80, 17, struct mali_ioctl_get_context_id))
// TLStream acquire
// TLStream Flush
#define MALI_IOCTL_MEM_COMMIT ( _IOW(0x80, 20, struct mali_ioctl_mem_commit))
#define MALI_IOCTL_MEM_ALIAS (_IOWR(0x80, 21, struct mali_ioctl_mem_alias))
#define MALI_IOCTL_MEM_IMPORT (_IOWR(0x80, 22, struct mali_ioctl_mem_import))
#define MALI_IOCTL_MEM_FLAGS_CHANGE ( _IOW(0x80, 23, struct mali_ioctl_mem_flags_change))
#define MALI_IOCTL_STREAM_CREATE ( _IOW(0x80, 24, struct mali_ioctl_stream_create))
#define MALI_IOCTL_FENCE_VALIDATE ( _IOW(0x80, 25, __ioctl_placeholder))
#define MALI_IOCTL_GET_PROFILING_CONTROLS ( _IOW(0x80, 26, __ioctl_placeholder))
#define MALI_IOCTL_DEBUGFS_MEM_PROFILE_ADD ( _IOW(0x80, 27, __ioctl_placeholder))
// Soft event update
// sticky resource map
// sticky resource unmap
// Find gpu start and offset
#define MALI_IOCTL_HWCNT_SET ( _IOW(0x80, 32, __ioctl_placeholder))
// gwt start
// gwt stop
// gwt dump
/// Begin TEST type region
/// End TEST type region
#endif /* __MALI_IOCTL_DVALIN_H__ */
/*
* © Copyright 2017-2018 The Panfrost Community
*
* This program is free software and is provided to you under the terms of the
* GNU General Public License version 2 as published by the Free Software
* Foundation, and any use by you of this program is subject to the terms
* of such GNU license.
*
* A copy of the licence is included with the program, and can also be obtained
* from Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
* Boston, MA 02110-1301, USA.
*
*/
#ifndef __MALI_IOCTL_MIDGARD_H__
#define __MALI_IOCTL_MIDGARD_H__
#define MALI_IOCTL_TYPE_BASE 0x80
#define MALI_IOCTL_TYPE_MAX 0x82
union mali_ioctl_mem_alloc {
struct {
union mali_ioctl_header header;
/* [in] */
u64 va_pages;
u64 commit_pages;
u64 extent;
/* [in/out] */
u64 flags;
/* [out] */
mali_ptr gpu_va;
u16 va_alignment;
u32 :32;
u16 :16;
} inout;
} __attribute__((packed));
#define MALI_IOCTL_TYPE_COUNT (MALI_IOCTL_TYPE_MAX - MALI_IOCTL_TYPE_BASE + 1)
#define MALI_IOCTL_GET_VERSION (_IOWR(0x80, 0, struct mali_ioctl_get_version))
#define MALI_IOCTL_MEM_ALLOC (_IOWR(0x82, 0, union mali_ioctl_mem_alloc))
#define MALI_IOCTL_MEM_IMPORT (_IOWR(0x82, 1, struct mali_ioctl_mem_import))
#define MALI_IOCTL_MEM_COMMIT (_IOWR(0x82, 2, struct mali_ioctl_mem_commit))
#define MALI_IOCTL_MEM_QUERY (_IOWR(0x82, 3, struct mali_ioctl_mem_query))
#define MALI_IOCTL_MEM_FREE (_IOWR(0x82, 4, struct mali_ioctl_mem_free))
#define MALI_IOCTL_MEM_FLAGS_CHANGE (_IOWR(0x82, 5, struct mali_ioctl_mem_flags_change))
#define MALI_IOCTL_MEM_ALIAS (_IOWR(0x82, 6, struct mali_ioctl_mem_alias))
#define MALI_IOCTL_SYNC (_IOWR(0x82, 8, struct mali_ioctl_sync))
#define MALI_IOCTL_POST_TERM (_IOWR(0x82, 9, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_SETUP (_IOWR(0x82, 10, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_DUMP (_IOWR(0x82, 11, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_CLEAR (_IOWR(0x82, 12, __ioctl_placeholder))
#define MALI_IOCTL_GPU_PROPS_REG_DUMP (_IOWR(0x82, 14, struct mali_ioctl_gpu_props_reg_dump))
#define MALI_IOCTL_FIND_CPU_OFFSET (_IOWR(0x82, 15, __ioctl_placeholder))
#define MALI_IOCTL_GET_VERSION_NEW (_IOWR(0x82, 16, struct mali_ioctl_get_version))
#define MALI_IOCTL_SET_FLAGS (_IOWR(0x82, 18, struct mali_ioctl_set_flags))
#define MALI_IOCTL_SET_TEST_DATA (_IOWR(0x82, 19, __ioctl_placeholder))
#define MALI_IOCTL_INJECT_ERROR (_IOWR(0x82, 20, __ioctl_placeholder))
#define MALI_IOCTL_MODEL_CONTROL (_IOWR(0x82, 21, __ioctl_placeholder))
#define MALI_IOCTL_KEEP_GPU_POWERED (_IOWR(0x82, 22, __ioctl_placeholder))
#define MALI_IOCTL_FENCE_VALIDATE (_IOWR(0x82, 23, __ioctl_placeholder))
#define MALI_IOCTL_STREAM_CREATE (_IOWR(0x82, 24, struct mali_ioctl_stream_create))
#define MALI_IOCTL_GET_PROFILING_CONTROLS (_IOWR(0x82, 25, __ioctl_placeholder))
#define MALI_IOCTL_SET_PROFILING_CONTROLS (_IOWR(0x82, 26, __ioctl_placeholder))
#define MALI_IOCTL_DEBUGFS_MEM_PROFILE_ADD (_IOWR(0x82, 27, __ioctl_placeholder))
#define MALI_IOCTL_JOB_SUBMIT (_IOWR(0x82, 28, struct mali_ioctl_job_submit))
#define MALI_IOCTL_DISJOINT_QUERY (_IOWR(0x82, 29, __ioctl_placeholder))
#define MALI_IOCTL_GET_CONTEXT_ID (_IOWR(0x82, 31, struct mali_ioctl_get_context_id))
#define MALI_IOCTL_TLSTREAM_ACQUIRE_V10_4 (_IOWR(0x82, 32, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_TEST (_IOWR(0x82, 33, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_STATS (_IOWR(0x82, 34, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_FLUSH (_IOWR(0x82, 35, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_READER_SETUP (_IOWR(0x82, 36, __ioctl_placeholder))
#define MALI_IOCTL_SET_PRFCNT_VALUES (_IOWR(0x82, 37, __ioctl_placeholder))
#define MALI_IOCTL_SOFT_EVENT_UPDATE (_IOWR(0x82, 38, __ioctl_placeholder))
#define MALI_IOCTL_MEM_JIT_INIT (_IOWR(0x82, 39, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_ACQUIRE (_IOWR(0x82, 40, __ioctl_placeholder))
#endif /* __MALI_IOCTL_MIDGARD_H__ */
......@@ -20,12 +20,9 @@
#ifndef __MALI_IOCTL_H__
#define __MALI_IOCTL_H__
#include <panloader-util.h>
#include <config.h>
//#define dvalin
#define MALI_GPU_NUM_TEXTURE_FEATURES_REGISTERS 3
#define MALI_GPU_MAX_JOB_SLOTS 16
#define MALI_MAX_COHERENT_GROUPS 16
#include <panloader-util.h>
typedef u8 mali_atom_id;
......@@ -484,231 +481,6 @@ struct mali_jd_replay_jc {
u64 jc;
};
/* Capabilities of a job slot as reported by JS_FEATURES registers */
#define JS_FEATURE_NULL_JOB (1u << 1)
#define JS_FEATURE_SET_VALUE_JOB (1u << 2)
#define JS_FEATURE_CACHE_FLUSH_JOB (1u << 3)
#define JS_FEATURE_COMPUTE_JOB (1u << 4)
#define JS_FEATURE_VERTEX_JOB (1u << 5)
#define JS_FEATURE_GEOMETRY_JOB (1u << 6)
#define JS_FEATURE_TILER_JOB (1u << 7)
#define JS_FEATURE_FUSED_JOB (1u << 8)
#define JS_FEATURE_FRAGMENT_JOB (1u << 9)
struct mali_gpu_core_props {
/**
* Product specific value.
*/
u32 product_id;
/**
* Status of the GPU release.
* No defined values, but starts at 0 and increases by one for each
* release status (alpha, beta, EAC, etc.).
* 4 bit values (0-15).
*/
u16 version_status;
/**
* Minor release number of the GPU. "P" part of an "RnPn" release
* number.
* 8 bit values (0-255).
*/
u16 minor_revision;
/**
* Major release number of the GPU. "R" part of an "RnPn" release
* number.
* 4 bit values (0-15).
*/
u16 major_revision;
u16 :16;
/**
* @usecase GPU clock speed is not specified in the Midgard
* Architecture, but is <b>necessary for OpenCL's clGetDeviceInfo()
* function</b>.
*/
u32 gpu_speed_mhz;
/**
* @usecase GPU clock max/min speed is required for computing
* best/worst case in tasks as job scheduling ant irq_throttling. (It
* is not specified in the Midgard Architecture).
*/
u32 gpu_freq_khz_max;
u32 gpu_freq_khz_min;
/**
* Size of the shader program counter, in bits.
*/
u32 log2_program_counter_size;
/**
* TEXTURE_FEATURES_x registers, as exposed by the GPU. This is a
* bitpattern where a set bit indicates that the format is supported.
*
* Before using a texture format, it is recommended that the
* corresponding bit be checked.
*/
u32 texture_features[MALI_GPU_NUM_TEXTURE_FEATURES_REGISTERS];
/**
* Theoretical maximum memory available to the GPU. It is unlikely
* that a client will be able to allocate all of this memory for their
* own purposes, but this at least provides an upper bound on the
* memory available to the GPU.
*
* This is required for OpenCL's clGetDeviceInfo() call when
* CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The
* client will not be expecting to allocate anywhere near this value.
*/
u64 gpu_available_memory_size;
};
struct mali_gpu_l2_cache_props {
u8 log2_line_size;
u8 log2_cache_size;
u8 num_l2_slices; /* Number of L2C slices. 1 or higher */
u64 :40;
};
struct mali_gpu_tiler_props {
u32 bin_size_bytes; /* Max is 4*2^15 */
u32 max_active_levels; /* Max is 2^15 */
};
struct mali_gpu_thread_props {
u32 max_threads; /* Max. number of threads per core */
u32 max_workgroup_size; /* Max. number of threads per workgroup */
u32 max_barrier_size; /* Max. number of threads that can
synchronize on a simple barrier */
u16 max_registers; /* Total size [1..65535] of the register
file available per core. */
u8 max_task_queue; /* Max. tasks [1..255] which may be sent
to a core before it becomes blocked. */
u8 max_thread_group_split; /* Max. allowed value [1..15] of the
Thread Group Split field. */
enum {
MALI_GPU_IMPLEMENTATION_UNKNOWN = 0,
MALI_GPU_IMPLEMENTATION_SILICON = 1,
MALI_GPU_IMPLEMENTATION_FPGA = 2,
MALI_GPU_IMPLEMENTATION_SW = 3,
} impl_tech :8;
u64 :56;
};
/**
* @brief descriptor for a coherent group
*
* \c core_mask exposes all cores in that coherent group, and \c num_cores
* provides a cached population-count for that mask.
*
* @note Whilst all cores are exposed in the mask, not all may be available to
* the application, depending on the Kernel Power policy.
*
* @note if u64s must be 8-byte aligned, then this structure has 32-bits of
* wastage.
*/
struct mali_ioctl_gpu_coherent_group {
u64 core_mask; /**< Core restriction mask required for the
group */
u16 num_cores; /**< Number of cores in the group */
u64 :48;
};
/**
* @brief Coherency group information
*
* Note that the sizes of the members could be reduced. However, the \c group
* member might be 8-byte aligned to ensure the u64 core_mask is 8-byte
* aligned, thus leading to wastage if the other members sizes were reduced.
*
* The groups are sorted by core mask. The core masks are non-repeating and do
* not intersect.
*/
struct mali_gpu_coherent_group_info {
u32 num_groups;
/**
* Number of core groups (coherent or not) in the GPU. Equivalent to
* the number of L2 Caches.
*
* The GPU Counter dumping writes 2048 bytes per core group,
* regardless of whether the core groups are coherent or not. Hence
* this member is needed to calculate how much memory is required for
* dumping.
*
* @note Do not use it to work out how many valid elements are in the
* group[] member. Use num_groups instead.
*/
u32 num_core_groups;
/**
* Coherency features of the memory, accessed by @ref gpu_mem_features
* methods
*/
u32 coherency;
u32 :32;
/**
* Descriptors of coherent groups
*/
struct mali_ioctl_gpu_coherent_group group[MALI_MAX_COHERENT_GROUPS];
};
/**
* A complete description of the GPU's Hardware Configuration Discovery
* registers.
*
* The information is presented inefficiently for access. For frequent access,
* the values should be better expressed in an unpacked form in the
* base_gpu_props structure.
*
* @usecase The raw properties in @ref gpu_raw_gpu_props are necessary to
* allow a user of the Mali Tools (e.g. PAT) to determine "Why is this device
* behaving differently?". In this case, all information about the
* configuration is potentially useful, but it <b>does not need to be processed
* by the driver</b>. Instead, the raw registers can be processed by the Mali
* Tools software on the host PC.
*
*/
struct mali_gpu_raw_props {
u64 shader_present;
u64 tiler_present;
u64 l2_present;
u64 stack_present;
u32 l2_features;
u32 suspend_size; /* API 8.2+ */
u32 mem_features;
u32 mmu_features;
u32 as_present;
u32 js_present;
u32 js_features[MALI_GPU_MAX_JOB_SLOTS];
u32 tiler_features;
u32 texture_features[3];
u32 gpu_id;
u32 thread_max_threads;
u32 thread_max_workgroup_size;
u32 thread_max_barrier_size;
u32 thread_features;
/*
* Note: This is the _selected_ coherency mode rather than the
* available modes as exposed in the coherency_features register.
*/
u32 coherency_mode;
};
typedef u64 mali_ptr;
#define MALI_PTR_FMT "0x%" PRIx64
......@@ -778,7 +550,6 @@ struct mali_jd_atom_v2 {
u8 :8;
mali_jd_core_req core_req; /**< core requirements */
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_jd_atom_v2, 48, 48);
/**
* enum mali_error - Mali error codes shared with userspace
......@@ -803,14 +574,17 @@ enum mali_error {
* Header used by all ioctls
*/
union mali_ioctl_header {
#ifdef dvalin
u32 pad[0];
#else
/* [in] The ID of the UK function being called */
u32 id :32;
/* [out] The return value of the UK function that was called */
enum mali_error rc :32;
u64 :64;
#endif
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(union mali_ioctl_header, 8, 8);
struct mali_ioctl_get_version {
union mali_ioctl_header header;
......@@ -818,24 +592,6 @@ struct mali_ioctl_get_version {
u16 minor; /* [out] */
u32 :32;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_get_version, 16, 16);
struct mali_ioctl_mem_alloc {
union mali_ioctl_header header;
/* [in] */
u64 va_pages;
u64 commit_pages;
u64 extent;
/* [in/out] */
u64 flags;
/* [out] */
mali_ptr gpu_va;
u16 va_alignment;
u32 :32;
u16 :16;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_mem_alloc, 56, 56);
struct mali_mem_import_user_buffer {
u64 ptr;
......@@ -859,7 +615,6 @@ struct mali_ioctl_mem_import {
u64 gpu_va;
u64 va_pages;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_mem_import, 48, 48);
struct mali_ioctl_mem_commit {
union mali_ioctl_header header;
......@@ -870,7 +625,6 @@ struct mali_ioctl_mem_commit {
u32 result_subcode;
u32 :32;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_mem_commit, 32, 32);
enum mali_ioctl_mem_query_type {
MALI_MEM_QUERY_COMMIT_SIZE = 1,
......@@ -887,7 +641,6 @@ struct mali_ioctl_mem_query {
/* [out] */
u64 value;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_mem_query, 32, 32);
struct mali_ioctl_mem_free {
union mali_ioctl_header header;
......@@ -928,29 +681,12 @@ struct mali_ioctl_sync {
} type :8;
u64 :56;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_sync, 40, 40);
struct mali_ioctl_gpu_props_reg_dump {
union mali_ioctl_header header;
struct mali_gpu_core_props core;
struct mali_gpu_l2_cache_props l2;
u64 :64;
struct mali_gpu_tiler_props tiler;
struct mali_gpu_thread_props thread;
struct mali_gpu_raw_props raw;
/** This must be last member of the structure */
struct mali_gpu_coherent_group_info coherency_info;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_gpu_props_reg_dump, 536, 536);
struct mali_ioctl_set_flags {
union mali_ioctl_header header;
u32 create_flags; /* [in] */
u32 :32;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_set_flags, 16, 16);
struct mali_ioctl_stream_create {
union mali_ioctl_header header;
......@@ -960,7 +696,6 @@ struct mali_ioctl_stream_create {
s32 fd;
u32 :32;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_stream_create, 48, 48);
struct mali_ioctl_job_submit {
union mali_ioctl_header header;
......@@ -969,63 +704,27 @@ struct mali_ioctl_job_submit {
u32 nr_atoms;
u32 stride;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_job_submit, 24, 24);
struct mali_ioctl_get_context_id {
union mali_ioctl_header header;
/* [out] */
s64 id;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_ioctl_get_context_id, 16, 16);
#undef PAD_PTR
#undef PAD_CPU_PTR
/* Defined in mali-props.h */
struct mali_ioctl_gpu_props_reg_dump;
/* For ioctl's we haven't written decoding stuff for yet */
typedef struct {
union mali_ioctl_header header;
} __ioctl_placeholder;
#define MALI_IOCTL_TYPE_BASE 0x80
#define MALI_IOCTL_TYPE_MAX 0x82
#define MALI_IOCTL_TYPE_COUNT (MALI_IOCTL_TYPE_MAX - MALI_IOCTL_TYPE_BASE + 1)
#define MALI_IOCTL_GET_VERSION (_IOWR(0x80, 0, struct mali_ioctl_get_version))
#define MALI_IOCTL_MEM_ALLOC (_IOWR(0x82, 0, struct mali_ioctl_mem_alloc))
#define MALI_IOCTL_MEM_IMPORT (_IOWR(0x82, 1, struct mali_ioctl_mem_import))
#define MALI_IOCTL_MEM_COMMIT (_IOWR(0x82, 2, struct mali_ioctl_mem_commit))
#define MALI_IOCTL_MEM_QUERY (_IOWR(0x82, 3, struct mali_ioctl_mem_query))
#define MALI_IOCTL_MEM_FREE (_IOWR(0x82, 4, struct mali_ioctl_mem_free))
#define MALI_IOCTL_MEM_FLAGS_CHANGE (_IOWR(0x82, 5, struct mali_ioctl_mem_flags_change))
#define MALI_IOCTL_MEM_ALIAS (_IOWR(0x82, 6, struct mali_ioctl_mem_alias))
#define MALI_IOCTL_SYNC (_IOWR(0x82, 8, struct mali_ioctl_sync))
#define MALI_IOCTL_POST_TERM (_IOWR(0x82, 9, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_SETUP (_IOWR(0x82, 10, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_DUMP (_IOWR(0x82, 11, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_CLEAR (_IOWR(0x82, 12, __ioctl_placeholder))
#define MALI_IOCTL_GPU_PROPS_REG_DUMP (_IOWR(0x82, 14, struct mali_ioctl_gpu_props_reg_dump))
#define MALI_IOCTL_FIND_CPU_OFFSET (_IOWR(0x82, 15, __ioctl_placeholder))
#define MALI_IOCTL_GET_VERSION_NEW (_IOWR(0x82, 16, struct mali_ioctl_get_version))
#define MALI_IOCTL_SET_FLAGS (_IOWR(0x82, 18, struct mali_ioctl_set_flags))
#define MALI_IOCTL_SET_TEST_DATA (_IOWR(0x82, 19, __ioctl_placeholder))
#define MALI_IOCTL_INJECT_ERROR (_IOWR(0x82, 20, __ioctl_placeholder))
#define MALI_IOCTL_MODEL_CONTROL (_IOWR(0x82, 21, __ioctl_placeholder))
#define MALI_IOCTL_KEEP_GPU_POWERED (_IOWR(0x82, 22, __ioctl_placeholder))
#define MALI_IOCTL_FENCE_VALIDATE (_IOWR(0x82, 23, __ioctl_placeholder))
#define MALI_IOCTL_STREAM_CREATE (_IOWR(0x82, 24, struct mali_ioctl_stream_create))
#define MALI_IOCTL_GET_PROFILING_CONTROLS (_IOWR(0x82, 25, __ioctl_placeholder))
#define MALI_IOCTL_SET_PROFILING_CONTROLS (_IOWR(0x82, 26, __ioctl_placeholder))
#define MALI_IOCTL_DEBUGFS_MEM_PROFILE_ADD (_IOWR(0x82, 27, __ioctl_placeholder))
#define MALI_IOCTL_JOB_SUBMIT (_IOWR(0x82, 28, struct mali_ioctl_job_submit))
#define MALI_IOCTL_DISJOINT_QUERY (_IOWR(0x82, 29, __ioctl_placeholder))
#define MALI_IOCTL_GET_CONTEXT_ID (_IOWR(0x82, 31, struct mali_ioctl_get_context_id))
#define MALI_IOCTL_TLSTREAM_ACQUIRE_V10_4 (_IOWR(0x82, 32, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_TEST (_IOWR(0x82, 33, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_STATS (_IOWR(0x82, 34, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_FLUSH (_IOWR(0x82, 35, __ioctl_placeholder))
#define MALI_IOCTL_HWCNT_READER_SETUP (_IOWR(0x82, 36, __ioctl_placeholder))
#define MALI_IOCTL_SET_PRFCNT_VALUES (_IOWR(0x82, 37, __ioctl_placeholder))
#define MALI_IOCTL_SOFT_EVENT_UPDATE (_IOWR(0x82, 38, __ioctl_placeholder))
#define MALI_IOCTL_MEM_JIT_INIT (_IOWR(0x82, 39, __ioctl_placeholder))
#define MALI_IOCTL_TLSTREAM_ACQUIRE (_IOWR(0x82, 40, __ioctl_placeholder))
#ifdef dvalin
#include <mali-ioctl-dvalin.h>
#else
#include <mali-ioctl-midgard.h>
#endif
#endif /* __MALI_IOCTL_H__ */
/*
* © Copyright 2017-2018 The Panfrost Community
*
* This program is free software and is provided to you under the terms of the
* GNU General Public License version 2 as published by the Free Software
* Foundation, and any use by you of this program is subject to the terms
* of such GNU licence.
*
* A copy of the licence is included with the program, and can also be obtained
* from Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
* Boston, MA 02110-1301, USA.
*
*/
#ifndef __MALI_JOB_H__
#define __MALI_JOB_H__
#include <config.h>
#include <mali-ioctl.h>
#define MALI_SHORT_PTR_BITS (sizeof(uintptr_t)*8)
#define MALI_FBD_HIERARCHY_WEIGHTS 8
#define MALI_PAYLOAD_SIZE 256
enum mali_job_type {
JOB_NOT_STARTED = 0,
JOB_TYPE_NULL = 1,
JOB_TYPE_SET_VALUE = 2,
JOB_TYPE_CACHE_FLUSH = 3,
JOB_TYPE_COMPUTE = 4,
JOB_TYPE_VERTEX = 5,
JOB_TYPE_TILER = 7,
JOB_TYPE_FUSED = 8,
JOB_TYPE_FRAGMENT = 9,
};
enum mali_gl_mode {
MALI_GL_POINTS = 0x1,
MALI_GL_LINES = 0x2,
MALI_GL_LINE_STRIP = 0x4,
MALI_GL_LINE_LOOP = 0x6,
MALI_GL_TRIANGLES = 0x8,
MALI_GL_TRIANGLE_STRIP = 0xA,
MALI_GL_TRIANGLE_FAN = 0xC,
};
#define MALI_GL_CULL_FACE_BACK 0x80
#define MALI_GL_CULL_FACE_FRONT 0x40
#define MALI_GL_FRONT_FACE(v) (v << 5)
#define MALI_GL_CCW (0)
#define MALI_GL_CW (1)
/* TODO: Might this actually be a finer bitfield? */
#define MALI_DEPTH_STENCIL_ENABLE 0x6400
#define DS_ENABLE(field) \
(field == MALI_DEPTH_STENCIL_ENABLE) \
? "MALI_DEPTH_STENCIL_ENABLE" \
: (field == 0) ? "0" \
: "0 /* XXX: Unknown, check hexdump */"
/* Used in stencil and depth tests */
enum mali_func {
MALI_FUNC_NEVER = 0,
MALI_FUNC_LESS = 1,
MALI_FUNC_EQUAL = 2,
MALI_FUNC_LEQUAL = 3,
MALI_FUNC_GREATER = 4,
MALI_FUNC_NOTEQUAL = 5,
MALI_FUNC_GEQUAL = 6,
MALI_FUNC_ALWAYS = 7
};
/* Same OpenGL, but mixed up. Why? Because forget me, that's why! */
enum mali_alt_func {
MALI_ALT_FUNC_NEVER = 0,
MALI_ALT_FUNC_GREATER = 1,
MALI_ALT_FUNC_EQUAL = 2,
MALI_ALT_FUNC_GEQUAL = 3,
MALI_ALT_FUNC_LESS = 4,
MALI_ALT_FUNC_NOTEQUAL = 5,
MALI_ALT_FUNC_LEQUAL = 6,
MALI_ALT_FUNC_ALWAYS = 7
};
/* Flags apply to unknown2_3? */
#define MALI_HAS_MSAA (1 << 0)
#define MALI_CAN_DISCARD (1 << 5)
#define MALI_HAS_BLEND_SHADER (1 << 6)
/* func is mali_func */
#define MALI_DEPTH_FUNC(func) (func << 8)
#define MALI_GET_DEPTH_FUNC(flags) ((flags >> 8) & 0x7)
#define MALI_DEPTH_FUNC_MASK MALI_DEPTH_FUNC(0x7)
#define MALI_DEPTH_TEST (1 << 11)
/* Next flags to unknown2_4 */
#define MALI_STENCIL_TEST (1 << 0)
/* What?! */
#define MALI_SAMPLE_ALPHA_TO_COVERAGE_NO_BLEND_SHADER (1 << 1)
#define MALI_NO_DITHER (1 << 9)
#define MALI_DEPTH_RANGE_A (1 << 12)
#define MALI_DEPTH_RANGE_B (1 << 13)
#define MALI_NO_MSAA (1 << 14)
/* Stencil test state is all encoded in a single u32, just with a lot of
* enums... */
enum mali_stencil_op {
MALI_STENCIL_KEEP = 0,
MALI_STENCIL_REPLACE = 1,
MALI_STENCIL_ZERO = 2,
MALI_STENCIL_INVERT = 3,
MALI_STENCIL_INCR_WRAP = 4,
MALI_STENCIL_DECR_WRAP = 5,
MALI_STENCIL_INCR = 6,
MALI_STENCIL_DECR = 7
};
struct mali_stencil_test {
unsigned ref : 8;
unsigned mask : 8;
enum mali_func func : 3;
enum mali_stencil_op sfail : 3;
enum mali_stencil_op dpfail : 3;
enum mali_stencil_op dppass : 3;
unsigned zero : 4;
} __attribute__((packed));
/* Blending is a mess, since anything fancy triggers a blend shader, and
* -those- are not understood whatsover yet */
#define MALI_MASK_R (1 << 0)
#define MALI_MASK_G (1 << 1)
#define MALI_MASK_B (1 << 2)
#define MALI_MASK_A (1 << 3)
enum mali_nondominant_mode {
MALI_BLEND_NON_MIRROR = 0,
MALI_BLEND_NON_ZERO = 1
};
enum mali_dominant_blend {
MALI_BLEND_DOM_SOURCE = 0,
MALI_BLEND_DOM_DESTINATION = 1
};
enum mali_dominant_factor {
MALI_DOMINANT_UNK0 = 0,
MALI_DOMINANT_ZERO = 1,
MALI_DOMINANT_SRC_COLOR = 2,
MALI_DOMINANT_DST_COLOR = 3,
MALI_DOMINANT_UNK4 = 4,
MALI_DOMINANT_SRC_ALPHA = 5,
MALI_DOMINANT_DST_ALPHA = 6,
MALI_DOMINANT_UNK7 = 7,
};
enum mali_blend_modifier {
MALI_BLEND_MOD_UNK0 = 0,
MALI_BLEND_MOD_NORMAL = 1,
MALI_BLEND_MOD_SOURCE_ONE = 2,
MALI_BLEND_MOD_DEST_ONE = 3,
};
struct mali_blend_mode {
enum mali_blend_modifier clip_modifier : 2;
unsigned unused_0 : 1;
unsigned negate_source : 1;
enum mali_dominant_blend dominant : 1;
enum mali_nondominant_mode nondominant_mode : 1;
unsigned unused_1 : 1;
unsigned negate_dest : 1;
enum mali_dominant_factor dominant_factor : 3;
unsigned complement_dominant : 1;
} __attribute__((packed));
struct mali_blend_equation {
/* Of type mali_blend_mode */
unsigned rgb_mode : 12;
unsigned alpha_mode : 12;
unsigned zero1 : 4;
/* Corresponds to MALI_MASK_* above and glColorMask arguments */
unsigned color_mask : 4;
unsigned padding : 32;
} __attribute__((packed));
/* Alpha coverage is encoded as 4-bits (from a clampf), with inversion
* literally performing a bitwise invert. This function produces slightly wrong
* results and I'm not sure why; some rounding issue I suppose... */
#define MALI_ALPHA_COVERAGE(clampf) ((uint16_t) (int) (clampf * 15.0f))
#define MALI_GET_ALPHA_COVERAGE(nibble) ((float) nibble / 15.0f)
/* Applies to unknown1 */
#define MALI_NO_ALPHA_TO_COVERAGE (1 << 10)
struct mali_tripipe {
mali_ptr shader;
u16 texture_count;
u16 sampler_count;
/* Counted as number of address slots (i.e. half-precision vec4's) */
u16 attribute_count;
u16 varying_count;
/* 0x200 except MALI_NO_ALPHA_TO_COVERAGE. Mysterious 1 other times. Who knows really? */
u16 unknown1;
/* Whole number of uniform registers used, times two; whole number of
* work registers used (no scale).
*/
unsigned work_count : 5;
unsigned uniform_count : 5;
unsigned unknown2 : 6;
} __attribute__((packed));
struct mali_fragment_core {
/* Depth factor is exactly as passed to glDepthOffset. Depth units is
* equal to the value passed to glDeptOhffset + 1.0f (use
* MALI_NEGATIVE) */
float depth_units;
float depth_factor;
u32 unknown2_2;
u16 alpha_coverage;
u16 unknown2_3;
u8 stencil_mask_front;
u8 stencil_mask_back;
u16 unknown2_4;
struct mali_stencil_test stencil_front;
struct mali_stencil_test stencil_back;
u32 unknown2_7;
u32 unknown2_8;
/* Check for MALI_HAS_BLEND_SHADER to decide how to interpret */
union {
mali_ptr blend_shader;
/* Exact format of this is not known yet */
struct mali_blend_equation blend_equation;
};
} __attribute__((packed));
/* See the presentations about Mali architecture for why these are together like this */
struct mali_shader_meta {
struct mali_tripipe tripipe;
struct mali_fragment_core fragment_core;
} __attribute__((packed));
/* This only concerns hardware jobs */
/* Possible values for job_descriptor_size */
#define MALI_JOB_32 0
#define MALI_JOB_64 1
struct mali_job_descriptor_header {
u32 exception_status;
u32 first_incomplete_task;
u64 fault_pointer;
u8 job_descriptor_size : 1;
enum mali_job_type job_type : 7;
u8 job_barrier : 1;
u8 unknown_flags : 7;
u16 job_index;
u16 job_dependency_index_1;
u16 job_dependency_index_2;
union {
u64 next_job_64;
u32 next_job_32;
};
} __attribute__((packed));
struct mali_payload_set_value {
u64 out;
u64 unknown;
} __attribute__((packed));
/* Special attributes have a fixed index */
#define MALI_SPECIAL_ATTRIBUTE_BASE 16
#define MALI_VERTEX_ID (MALI_SPECIAL_ATTRIBUTE_BASE + 0)
#define MALI_INSTANCE_ID (MALI_SPECIAL_ATTRIBUTE_BASE + 1)
struct mali_attr {
mali_ptr elements;
u32 stride;
u32 size;
} __attribute__((packed));
/* TODO: I'm pretty sure this isn't really right in the presence of more
* complicated metadata, like matrices or varyings */
enum mali_attr_type {
MALI_ATYPE_PACKED = 1,
MALI_ATYPE_UNK1 = 1,
MALI_ATYPE_BYTE = 3,
MALI_ATYPE_SHORT = 4,
MALI_ATYPE_INT = 5,
MALI_ATYPE_GPVARYING = 6,
MALI_ATYPE_FLOAT = 7,
};
struct mali_attr_meta {
u8 index;
u64 unknown1 :14;
/* Part of the type specifier, anyway:
* 1: packed (with other encoding weirdness)
* 3: byte
* 4: short
* 5: int
* 6: used for float gl_Position varying?
* 7: half, float, packed
*/
unsigned type : 3;
/* After MALI_POSITIVE, 4 for vec4, 1 for scalar, etc */
unsigned nr_components : 2;
/* Somewhat correlated to the opposite of not_normalised, or the opposite of is_half_float? */
unsigned unknown2 : 1;
/* If the type is a signed integer, is_int_signed is set. If the type
* is a half-float, it's also set. Otherwise, it is clear. */
unsigned is_int_signed : 1;
/* if `normalized` passed to VertexAttribPointer is clear */
unsigned not_normalised : 1;
u64 unknown3 :34;
} __attribute__((packed));
ASSERT_SIZEOF_TYPE(struct mali_attr_meta,
sizeof(u64), sizeof(u64));
enum mali_fbd_type {
MALI_SFBD = 0,
MALI_MFBD = 1,
};
#define FBD_TYPE (1)
#define FBD_MASK (~0x3f)
/* Applies to unknown_draw */
#define MALI_DRAW_INDEXED_UINT8 (0x10)
#define MALI_DRAW_INDEXED_UINT16 (0x20)
#define MALI_DRAW_INDEXED_UINT32 (0x30)
struct mali_payload_vertex_tiler {
/* Exactly as passed to glLineWidth */
float line_width;
/* Off by one */
u32 vertex_count;
u32 unk1; // 0x28000000
unsigned draw_mode : 4;
unsigned unknown_draw : 28;
u32 zero0;
u32 zero1;
/* Like many other strictly nonzero quantities, index_count is
* subtracted by one. For an indexed cube, this is equal to 35 = 6
* faces * 2 triangles/per face * 3 vertices/per triangle - 1. For
* non-indexed draws, equal to vertex_count. */
u32 index_count;
/* No hidden structure; literally just a pointer to an array of
* uint32_t indices. Thanks, guys, for not making my life insane for
* once! NULL for non-indexed draws. */
uintptr_t indices;
u32 zero3;
u32 gl_enables; // 0x5
/* Offset for first vertex in buffer */
u32 draw_start;
u32 zero5;
/* Zero for vertex jobs. Pointer to the position (gl_Position) varying
* output from the vertex shader for tiler jobs. */
uintptr_t position_varying;
uintptr_t unknown1; /* pointer */
/* For reasons I don't quite understand this is a pointer to a pointer.
* That second pointer points to the actual texture descriptor. */
uintptr_t texture_trampoline;
/* For OpenGL, from what I've seen, this is intimately connected to
* texture_meta. cwabbott says this is not the case under Vulkan, hence
* why this field is seperate (Midgard is Vulkan capable) */
uintptr_t sampler_descriptor;
uintptr_t uniforms;
u8 flags : 4;
uintptr_t _shader_upper : MALI_SHORT_PTR_BITS - 4; /* struct shader_meta */
uintptr_t attributes; /* struct attribute_buffer[] */
uintptr_t attribute_meta; /* attribute_meta[] */
uintptr_t varyings; /* struct attr */
uintptr_t unknown6; /* pointer */
uintptr_t viewport;
u32 zero6;
mali_ptr framebuffer;
} __attribute__((packed));
//ASSERT_SIZEOF_TYPE(struct mali_payload_vertex_tiler, 256, 256);
/* Pointed to from texture_trampoline, mostly unknown still, haven't
* managed to replay successfully */
/* Purposeful off-by-one in width, height fields. For example, a (64, 64)
* texture is stored as (63, 63) in these fields. This adjusts for that.
* There's an identical pattern in the framebuffer descriptor. Even vertex
* count fields work this way, hence the generic name -- integral fields that
* are strictly positive generally need this adjustment. */
#define MALI_POSITIVE(dim) (dim - 1)
/* Opposite of MALI_POSITIVE, found in the depth_units field */
#define MALI_NEGATIVE(dim) (dim + 1)
/* Used with channel swizzling */
enum mali_channel {
MALI_CHANNEL_RED = 0,
MALI_CHANNEL_GREEN = 1,
MALI_CHANNEL_BLUE = 2,
MALI_CHANNEL_ALPHA = 3,
MALI_CHANNEL_ZERO = 4,
MALI_CHANNEL_ONE = 5,
MALI_CHANNEL_RESERVED_0 = 6,
MALI_CHANNEL_RESERVED_1 = 7,
};
/* Used with wrapping. Incomplete (this is a 4-bit field...) */
enum mali_wrap_mode {
MALI_WRAP_REPEAT = 0x8,
MALI_WRAP_CLAMP_TO_EDGE = 0x9,
MALI_WRAP_CLAMP_TO_BORDER = 0xB,
MALI_WRAP_MIRRORED_REPEAT = 0xC
};
struct mali_texture_descriptor {
uint16_t width;
uint16_t height;
uint16_t depth;
uint16_t unknown1;
uint32_t format1;
uint32_t unknown3;
/* Swizzling is a single 32-bit word, broken up here for convenience.
* Here, swizzling refers to the ES 3.0 texture parameters for channel
* level swizzling, not the internal pixel-level swizzling which is
* below OpenGL's reach */
enum mali_channel swizzle_r : 3;
enum mali_channel swizzle_g : 3;
enum mali_channel swizzle_b : 3;
enum mali_channel swizzle_a : 3;
unsigned swizzle_zero : 20;
uint32_t unknown5;
uint32_t unknown6;
uint32_t unknown7;
mali_ptr swizzled_bitmap_0;
mali_ptr swizzled_bitmap_1;
} __attribute__((packed));
/* Used as part of filter_mode */
#define MALI_GL_LINEAR 0
#define MALI_GL_NEAREST 1
/* Used to construct low bits of filter_mode */
#define MALI_GL_TEX_MAG(mode) (((mode) & 1) << 0)
#define MALI_GL_TEX_MIN(mode) (((mode) & 1) << 1)
#define MALI_GL_TEX_MAG_MASK (1)
#define MALI_GL_TEX_MIN_MASK (2)
#define MALI_FILTER_NAME(filter) (filter ? "MALI_GL_NEAREST" : "MALI_GL_LINEAR")
struct mali_sampler_descriptor {
uint32_t filter_mode;
/* Who knows? ("Someone under NDA" "Um, who else?" "You, in the future,
* I hope?") */
uint32_t unknown1;
/* All one word in reality, but packed a bit */
enum mali_wrap_mode wrap_s : 4;
enum mali_wrap_mode wrap_t : 4;
enum mali_wrap_mode wrap_r : 4;
enum mali_alt_func compare_func : 3;
/* A single set bit of unknown, ha! */
unsigned unknown2 : 1;
unsigned zero : 16;
uint32_t zero2;
float border_color[4];
} __attribute__((packed));
/* TODO: What are the floats? Apparently always { -inf, -inf, inf, inf },
* unless the scissor test is enabled.
*
* viewport0/viewport1 form the arguments to glViewport. viewport1 is modified
* by MALI_POSITIVE; viewport0 is as-is.
*/
struct mali_viewport {
float floats[4];
float depth_range_n;
float depth_range_f;
u16 viewport0[2];
u16 viewport1[2];
} __attribute__((packed));
/* TODO: Varying meta is symmetrical with attr_meta, but there is some
* weirdness associated. Figure it out. */
struct mali_unknown6 {
u64 unknown0;
u64 unknown1;
};
/* From presentations, 16x16 tiles externally. Use shift for fast computation
* of tile numbers. */
#define MALI_TILE_SHIFT 4
#define MALI_TILE_LENGTH (1 << MALI_TILE_SHIFT)
/* Tile coordinates are stored as a compact u32, as only 12 bits are needed to
* each component. Notice that this provides a theoretical upper bound of (1 <<
* 12) = 4096 tiles in each direction, addressing a maximum framebuffer of size
* 65536x65536. Multiplying that together, times another four given that Mali
* framebuffers are 32-bit ARGB8888, means that this upper bound would take 16
* gigabytes of RAM just to store the uncompressed framebuffer itself, let
* alone rendering in real-time to such a buffer.
*
* Nice job, guys.*/
/* From mali_kbase_10969_workaround.c */
#define MALI_X_COORD_MASK 0x00000FFF
#define MALI_Y_COORD_MASK 0x0FFF0000
/* Extract parts of a tile coordinate */
#define MALI_TILE_COORD_X(coord) ((coord) & MALI_X_COORD_MASK)
#define MALI_TILE_COORD_Y(coord) (((coord) & MALI_Y_COORD_MASK) >> 16)
#define MALI_TILE_COORD_FLAGS(coord) ((coord) & ~(MALI_X_COORD_MASK | MALI_Y_COORD_MASK))
/* No known flags yet, but just in case...? */
#define MALI_TILE_NO_FLAG (0)
/* Helpers to generate tile coordinates based on the boundary coordinates in
* screen space. So, with the bounds (0, 0) to (128, 128) for the screen, these
* functions would convert it to the bounding tiles (0, 0) to (7, 7).
* Intentional "off-by-one"; finding the tile number is a form of fencepost
* problem. */
#define MALI_MAKE_TILE_COORDS(X, Y) ((X) | ((Y) << 16))
#define MALI_BOUND_TO_TILE(B, bias) ((B - bias) >> MALI_TILE_SHIFT)
#define MALI_COORDINATE_TO_TILE(W, H, bias) MALI_MAKE_TILE_COORDS(MALI_BOUND_TO_TILE(W, bias), MALI_BOUND_TO_TILE(H, bias))
#define MALI_COORDINATE_TO_TILE_MIN(W, H) MALI_COORDINATE_TO_TILE(W, H, 0)
#define MALI_COORDINATE_TO_TILE_MAX(W, H) MALI_COORDINATE_TO_TILE(W, H, 1)
struct mali_payload_fragment {
u32 min_tile_coord;
u32 max_tile_coord;
mali_ptr framebuffer;
} __attribute__((packed));
//ASSERT_SIZEOF_TYPE(struct mali_payload_fragment, 12, 16);
/* (Single?) Framebuffer Descriptor */
/* Flags apply to format. With just MSAA_A and MSAA_B, the framebuffer is
* configured for 4x. With MSAA_8, it is configured for 8x. */
#define MALI_FRAMEBUFFER_MSAA_8 (1 << 3)
#define MALI_FRAMEBUFFER_MSAA_A (1 << 4)
#define MALI_FRAMEBUFFER_MSAA_B (1 << 23)
/* Fast/slow based on whether all three buffers are cleared at once */
#define MALI_CLEAR_FAST (1 << 18)
#define MALI_CLEAR_SLOW (1 << 28)
#define MALI_CLEAR_SLOW_STENCIL (1 << 31)
struct mali_single_framebuffer {
u32 unknown1;
u32 unknown2;
u64 unknown_address_0;
u64 zero1;
u64 zero0;
/* Exact format is ironically not known, since EGL is finnicky with the
* blob. MSAA, colourspace, etc are configured here. */
u32 format;
u32 clear_flags;
u32 zero2;
/* Purposeful off-by-one in these fields should be accounted for by the
* MALI_DIMENSION macro */
u16 width;
u16 height;
u32 zero3[8];
/* By default, the framebuffer is upside down from OpenGL's
* perspective. Set framebuffer to the end and negate the stride to
* flip in the Y direction */
mali_ptr framebuffer;
int32_t stride;
u32 zero4;
/* Depth and stencil buffers are interleaved, it appears, as they are
* set to the same address in captures. Both fields set to zero if the
* buffer is not being cleared. Depending on GL_ENABLE magic, you might
* get a zero enable despite the buffer being present; that still is
* disabled. */
mali_ptr depth_buffer; // not SAME_VA
u64 depth_buffer_enable;
mali_ptr stencil_buffer; // not SAME_VA
u64 stencil_buffer_enable;
u32 clear_color_1; // RGBA8888 from glClear, actually used by hardware
u32 clear_color_2; // always equal, but unclear function?
u32 clear_color_3; // always equal, but unclear function?
u32 clear_color_4; // always equal, but unclear function?
/* Set to zero if not cleared */
float clear_depth_1; // float32, ditto
float clear_depth_2; // float32, ditto
float clear_depth_3; // float32, ditto
float clear_depth_4; // float32, ditto
u32 clear_stencil; // Exactly as it appears in OpenGL
u32 zero6[7];
/* Very weird format, see generation code in trans_builder.c */
u32 resolution_check;
u32 tiler_flags;
u64 unknown_address_1; /* Pointing towards... a zero buffer? */
u64 unknown_address_2;
/* See mali_kbase_replay.c */
u64 tiler_heap_free;
u64 tiler_heap_end;
/* More below this, maybe */
} __attribute__((packed));
/* Multi? Framebuffer Descriptor */
struct mali_tentative_mfbd {
u64 blah; /* XXX: what the fuck is this? */
/* This GPU address is unknown, except for the fact there's something
* executable here... */
u64 ugaT;
u32 block1[10];
u32 unknown1;
u32 flags;
u8 block2[16];
u64 heap_free_address;
u64 unknown2;
u32 weights[MALI_FBD_HIERARCHY_WEIGHTS];
u64 unknown_gpu_addressN;
u8 block3[88];
u64 unknown_gpu_address;
u64 unknown3;
u8 block4[40];
} __attribute__((packed));
/* Originally from chai, which found it from mali_kase_reply.c */
#endif /* __MALI_JOB_H__ */
/*
* © Copyright 2017-2018 The Panfrost Community
*
* This program is free software and is provided to you under the terms of the
* GNU General Public License version 2 as published by the Free Software
* Foundation, and any use by you of this program is subject to the terms
* of such GNU license.
*
* A copy of the licence is included with the program, and can also be obtained
* from Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
* Boston, MA 02110-1301, USA.
*
*/
#ifndef __MALI_PROPS_H__
#define __MALI_PROPS_H__
#include "mali-ioctl.h"
#define MALI_GPU_NUM_TEXTURE_FEATURES_REGISTERS 3
#define MALI_GPU_MAX_JOB_SLOTS 16
#define MALI_MAX_COHERENT_GROUPS 16
/* Capabilities of a job slot as reported by JS_FEATURES registers */
#define JS_FEATURE_NULL_JOB (1u << 1)
#define JS_FEATURE_SET_VALUE_JOB (1u << 2)
#define JS_FEATURE_CACHE_FLUSH_JOB (1u << 3)
#define JS_FEATURE_COMPUTE_JOB (1u << 4)
#define JS_FEATURE_VERTEX_JOB (1u << 5)
#define JS_FEATURE_GEOMETRY_JOB (1u << 6)
#define JS_FEATURE_TILER_JOB (1u << 7)
#define JS_FEATURE_FUSED_JOB (1u << 8)
#define JS_FEATURE_FRAGMENT_JOB (1u << 9)
struct mali_gpu_core_props {
/**
* Product specific value.
*/
u32 product_id;
/**
* Status of the GPU release.
* No defined values, but starts at 0 and increases by one for each
* release status (alpha, beta, EAC, etc.).
* 4 bit values (0-15).
*/
u16 version_status;
/**
* Minor release number of the GPU. "P" part of an "RnPn" release
* number.
* 8 bit values (0-255).
*/
u16 minor_revision;
/**
* Major release number of the GPU. "R" part of an "RnPn" release
* number.
* 4 bit values (0-15).
*/
u16 major_revision;
u16 :16;
/**
* @usecase GPU clock speed is not specified in the Midgard</