aarch64 neon support (cont'd)

Merged Marek Vasut requested to merge marex/orc:aarch64 into master

This extends MR !37 (closed) and implements most of the remaining AArch64 instructions. This is enough to accelerate e.g. videotestsrc; the accumulator and flags2d are also implemented.

Acceleration of videoconvert from I420 to RGB (useful e.g. for jpegdec and openh264dec) is not implemented because the loadupdb implementation is missing. This will be added in a separate MR.

Edited by Tim-Philipp Müller

Activity

  • Marek Vasut mentioned in merge request !37 (closed)

  • Marek Vasut changed the description

  • Both accumulator and 2d operations are still missing.

    Does it gracefully fall back to the backup code if any of those are used? That is, can this be merged already to improve the situation or would it cause failures with the unsupported operations?

  • Marek Vasut added 57 commits

    • a3b51418...8fc6bdf5 - 20 commits from branch gstreamer:master
    • 80c1defe - aarch64: make some setups for aarch64 support
    • 7c64f89f - aarch64: implement emits for general instructions
    • 9264fd76 - aarch64: implement emits for some vector instructions and ORC ops (add)
    • ae4e4d6a - aarch64: orcprogram-neon porting to aarch64
    • 4bd7309e - aarch64: Use 64bit operations on 64bit pointers
    • de0b15dd - aarch64: Repair 8bit load/store opcode
    • 7d1c4da3 - aarch64: Repair emit for imm 1
    • 8b8dade9 - aarch64: Repair storeX instructions
    • 4c83f87e - aarch64: Implement unary instruction emit
    • 42a956a1 - aarch64: Implement convX instructions
    • d17fcdd7 - aarch64: Implement select{0,1}X instructions
    • 1fafc66a - aarch64: Implement mulhX instructions
    • 1cdfaaba - aarch64: Implement mov instructions
    • 5a6baa5f - aarch64: Implement shift instructions
    • bf7603ff - aarch64: Implement loadX instructions
    • 5345fc6f - aarch64: Clean up mergeX/splatX instructions
    • e44e7e38 - aarch64: Implement mergeX instructions
    • 4455a443 - aarch64: Implement copyX/orX instructions
    • 3cc2d82e - aarch64: Implement xorX instructions
    • f8c53ae3 - aarch64: Implement absX instructions
    • 629e7ea2 - aarch64: Implement andX instructions
    • a74f2c47 - aarch64: Implement subX instructions
    • 65391500 - aarch64: Implement loadiX instructions
    • 3d873fd0 - aarch64: Implement accX instructions
    • b084cb8f - aarch64: Implement vminX/vmaxX instructions
    • cd198fa6 - aarch64: Implement signX instructions
    • 35c8f664 - aarch64: Implement splitX/splatX instructions
    • 30f15617 - aarch64: Implement loadupdb instruction
    • 2aa2e3f3 - aarch64: Implement avgX instructions
    • 9b619424 - aarch64: Implement cmpX instructions
    • 8d784b4d - aarch64: Implement mulX instructions
    • 46f6e6a0 - aarch64: Implement div255w instruction
    • 2c55d753 - aarch64: Implement swapX instructions
    • 9f99280a - aarch64: Implement splatw3q instruction
    • 8c39b126 - aarch64: Implement andn instruction
    • ee2c7eaa - aarch64: Implement floating-point arithmetic instructions
    • b93dd9ca - aarch64: Implement accumulator store

  • Marek Vasut added 1 commit

    • b1f3be50 - aarch64: Implement const64 loadiq

  • Marek Vasut added 16 commits

    • 2012ee8e - aarch64: Implement accX instructions
    • 2b2ecc44 - aarch64: Implement vminX/vmaxX instructions
    • 49cc5250 - aarch64: Implement signX instructions
    • d6b987e5 - aarch64: Implement splitX/splatX instructions
    • a00198bd - aarch64: Implement loadupdb instruction
    • 6ab7e50a - aarch64: Implement avgX instructions
    • ab81317b - aarch64: Implement cmpX instructions
    • d2c9d4f2 - aarch64: Implement mulX instructions
    • e68b5eac - aarch64: Implement div255w instruction
    • 7a5a5e7e - aarch64: Implement swapX instructions
    • 179973d3 - aarch64: Implement splatw3q instruction
    • e17726ce - aarch64: Implement andn instruction
    • c88cb606 - aarch64: Implement floating-point arithmetic instructions
    • 19819c61 - aarch64: Implement accumulator store
    • 43d57033 - aarch64: Implement const64 loadiq
    • eb027f95 - aarch64: Implement flags2d

  • Marek Vasut added 25 commits

    • 37910403 - aarch64: Implement loadX instructions
    • cd793476 - aarch64: Clean up mergeX/splatX instructions
    • 8dd3da66 - aarch64: Implement mergeX instructions
    • 4de4e86f - aarch64: Implement copyX/orX instructions
    • 1583be7d - aarch64: Implement xorX instructions
    • cc12b32f - aarch64: Implement absX instructions
    • 8624e098 - aarch64: Implement andX instructions
    • b81b455f - aarch64: Implement subX instructions
    • c158b220 - aarch64: Implement loadiX instructions
    • d2692517 - aarch64: Implement accX instructions
    • a79dee7b - aarch64: Implement vminX/vmaxX instructions
    • 934c3c07 - aarch64: Implement signX instructions
    • 3d8a3baf - aarch64: Implement splitX/splatX instructions
    • 622779b5 - aarch64: Implement loadupdb instruction
    • 6987dcad - aarch64: Implement avgX instructions
    • 00943762 - aarch64: Implement cmpX instructions
    • ac30e990 - aarch64: Implement mulX instructions
    • e4339b43 - aarch64: Implement div255w instruction
    • 2b0b1a04 - aarch64: Implement swapX instructions
    • b0c0092f - aarch64: Implement splatw3q instruction
    • d4ba6b5b - aarch64: Implement andn instruction
    • 8922c7c4 - aarch64: Implement floating-point arithmetic instructions
    • bcd510e4 - aarch64: Implement accumulator store
    • c53f6898 - aarch64: Implement const64 loadiq
    • 09a333e9 - aarch64: Implement flags2d

  • Marek Vasut added 3 commits

    • d89699c6 - aarch64: Implement double-precision floating-point arithmetic instructions
    • 22109801 - aarch64: Implement divf instruction
    • e82bd525 - aarch64: Implement sqrtf instruction

  • Marek Vasut added 3 commits

    • 53fc0a5c - aarch64: Implement double-precision floating-point arithmetic instructions
    • 1d0bfd62 - aarch64: Implement divf instruction
    • 2759a325 - aarch64: Implement sqrtf instruction

  • Marek Vasut added 32 commits

    • d6dc365e - aarch64: Implement select{0,1}X instructions
    • 3dd3d3e9 - aarch64: Implement mulhX instructions
    • fa368b10 - aarch64: Implement mov instructions
    • 64f5a009 - aarch64: Implement shift instructions
    • c842588f - aarch64: Implement loadX instructions
    • e92f62be - aarch64: Clean up mergeX/splatX instructions
    • c3fd0b27 - aarch64: Implement mergeX instructions
    • 4b12ee1e - aarch64: Implement copyX/orX instructions
    • be11f5a8 - aarch64: Implement xorX instructions
    • 084ad6f7 - aarch64: Implement absX instructions
    • 40277eb7 - aarch64: Implement andX instructions
    • f6145dc8 - aarch64: Implement subX instructions
    • ef67e36a - aarch64: Implement loadiX instructions
    • 26651ffc - aarch64: Implement accX instructions
    • f5b0e20e - aarch64: Implement vminX/vmaxX instructions
    • 8b6682b2 - aarch64: Implement signX instructions
    • 82318e2a - aarch64: Implement splitX/splatX instructions
    • 01677c5a - aarch64: Implement loadupdb instruction
    • b4a7374f - aarch64: Implement avgX instructions
    • 40670229 - aarch64: Implement cmpX instructions
    • 2818bf91 - aarch64: Implement mulX instructions
    • 30acad7d - aarch64: Implement div255w instruction
    • 1e3a7765 - aarch64: Implement swapX instructions
    • 2773f9f3 - aarch64: Implement splatw3q instruction
    • c8eed996 - aarch64: Implement andn instruction
    • 29105437 - aarch64: Implement floating-point arithmetic instructions
    • 686d1684 - aarch64: Implement accumulator store
    • e184d5b0 - aarch64: Implement const64 loadiq
    • fd3a31e0 - aarch64: Implement flags2d
    • b0c6bcfb - aarch64: Implement double-precision floating-point arithmetic instructions
    • c412702b - aarch64: Implement divf instruction
    • 5bacf34e - aarch64: Implement sqrtf instruction

  • Marek Vasut added 42 commits

    • 6ad01fcf - aarch64: make some setups for aarch64 support
    • 788c909b - aarch64: implement emits for general instructions
    • ee91c27b - aarch64: implement emits for some vector instructions and ORC ops (add)
    • f8aaecb5 - aarch64: orcprogram-neon porting to aarch64
    • 55a2b65c - aarch64: Use 64bit operations on 64bit pointers
    • 68236d84 - aarch64: Repair 8bit load/store opcode
    • cc15cb84 - aarch64: Repair emit for imm 1
    • a0df3bb0 - aarch64: Repair storeX instructions
    • 0e9f3693 - aarch64: Implement unary instruction emit
    • ae90f1e4 - aarch64: Implement convX instructions
    • f130fed6 - aarch64: Implement select{0,1}X instructions
    • 7f232ef6 - aarch64: Implement mulhX instructions
    • 80992353 - aarch64: Implement mov instructions
    • 6b61afbc - aarch64: Implement shift instructions
    • 527e340e - aarch64: Implement loadX instructions
    • 78a960dc - aarch64: Clean up mergeX/splatX instructions
    • 389e9eb9 - aarch64: Implement mergeX instructions
    • 704b3683 - aarch64: Implement copyX/orX instructions
    • fb46152c - aarch64: Implement xorX instructions
    • e22be081 - aarch64: Implement absX instructions
    • cfc7c896 - aarch64: Implement andX instructions
    • d9dccb9b - aarch64: Implement subX instructions
    • 7a66b215 - aarch64: Implement loadiX instructions
    • 7fabaa7d - aarch64: Implement accX instructions
    • 8671f23c - aarch64: Implement vminX/vmaxX instructions
    • f7031940 - aarch64: Implement signX instructions
    • b22fa663 - aarch64: Implement splitX/splatX instructions
    • 860411c3 - aarch64: Implement loadupdb instruction
    • b476c2cc - aarch64: Implement avgX instructions
    • 4f1ff670 - aarch64: Implement cmpX instructions
    • da181a13 - aarch64: Implement mulX instructions
    • 4737af57 - aarch64: Implement div255w instruction
    • 5ee54c3e - aarch64: Implement swapX instructions
    • e6a41e25 - aarch64: Implement splatw3q instruction
    • 6b96b59f - aarch64: Implement andn instruction
    • b8e65adb - aarch64: Implement floating-point arithmetic instructions
    • e0408513 - aarch64: Implement accumulator store
    • d84b1687 - aarch64: Implement const64 loadiq
    • 237746ae - aarch64: Implement flags2d
    • f4f34d58 - aarch64: Implement double-precision floating-point arithmetic instructions
    • d4755780 - aarch64: Implement divf instruction
    • fc42f9c5 - aarch64: Implement sqrtf instruction

  • Marek Vasut changed the description

  • Both aarch32 and aarch64 pass the ORC tests, as does whatever other ORC code I could find and test. There are compile failures on the ldres{lin,near}{l,b} opcodes, as those are not implemented, but they were missing before too. I think this can be merged now.

  • Marek Vasut added 26 commits

    • edbd8a0f - aarch64: Implement mergeX instructions
    • b9d47aa0 - aarch64: Implement copyX/orX instructions
    • f2521eff - aarch64: Implement xorX instructions
    • ebf84090 - aarch64: Implement absX instructions
    • dad18cc9 - aarch64: Implement andX instructions
    • 52a27376 - aarch64: Implement subX instructions
    • c148e5d5 - aarch64: Implement loadiX instructions
    • 34ecc949 - aarch64: Implement accX instructions
    • f2fdf4bc - aarch64: Implement vminX/vmaxX instructions
    • 477cd20c - aarch64: Implement signX instructions
    • 7dd9d4d3 - aarch64: Implement splitX/splatX instructions
    • 925cbe0a - aarch64: Implement loadupdb instruction
    • 3b240459 - aarch64: Implement avgX instructions
    • 0f070cdd - aarch64: Implement cmpX instructions
    • f4575d48 - aarch64: Implement mulX instructions
    • 65d73339 - aarch64: Implement div255w instruction
    • 2c9545fe - aarch64: Implement swapX instructions
    • 1d57567f - aarch64: Implement splatw3q instruction
    • d3710200 - aarch64: Implement andn instruction
    • f4553ccd - aarch64: Implement floating-point arithmetic instructions
    • 6bbd74d8 - aarch64: Implement accumulator store
    • 952bce19 - aarch64: Implement const64 loadiq
    • c9166faa - aarch64: Implement flags2d
    • 56c122a9 - aarch64: Implement double-precision floating-point arithmetic instructions
    • 18803a0c - aarch64: Implement divf instruction
    • 48dbc63d - aarch64: Implement sqrtf instruction

  • Marek Vasut added 1 commit

  • Marek Vasut added 1 commit

  • Marek Vasut added 35 commits

    • 164bcb6f - aarch64: Fix MSVC warnings
    • ccf12ae3 - aarch64: Implement unary instruction emit
    • 5503b7e0 - aarch64: Implement convX instructions
    • ae6de7ac - aarch64: Implement select{0,1}X instructions
    • 9cd4030e - aarch64: Implement mulhX instructions
    • 1f55fc25 - aarch64: Implement mov instructions
    • 58900f71 - aarch64: Implement shift instructions
    • 5820f57a - aarch64: Implement loadX instructions
    • da85036e - aarch64: Clean up mergeX/splatX instructions
    • 255b0ea2 - aarch64: Implement mergeX instructions
    • 069cab4a - aarch64: Implement copyX/orX instructions
    • 10a43495 - aarch64: Implement xorX instructions
    • 5aba5c42 - aarch64: Implement absX instructions
    • 93eab062 - aarch64: Implement andX instructions
    • 00bf8149 - aarch64: Implement subX instructions
    • 3de19796 - aarch64: Implement loadiX instructions
    • cde478dc - aarch64: Implement accX instructions
    • a8dfd255 - aarch64: Implement vminX/vmaxX instructions
    • 7525fefd - aarch64: Implement signX instructions
    • 85c14b67 - aarch64: Implement splitX/splatX instructions
    • 8dc2714b - aarch64: Implement loadupdb instruction
    • b74936d7 - aarch64: Implement avgX instructions
    • 774f2b04 - aarch64: Implement cmpX instructions
    • bfb0b9a0 - aarch64: Implement mulX instructions
    • 7de2087e - aarch64: Implement div255w instruction
    • c6d25ab9 - aarch64: Implement swapX instructions
    • 8a0696c3 - aarch64: Implement splatw3q instruction
    • d38ef503 - aarch64: Implement andn instruction
    • 1828d761 - aarch64: Implement floating-point arithmetic instructions
    • 588bb481 - aarch64: Implement accumulator store
    • d561250b - aarch64: Implement const64 loadiq
    • 65b648f0 - aarch64: Implement flags2d
    • 9751e1fc - aarch64: Implement double-precision floating-point arithmetic instructions
    • cb33d35b - aarch64: Implement divf instruction
    • ba88b06f - aarch64: Implement sqrtf instruction

  • Marek Vasut added 16 commits

    • 31774aca - aarch64: Implement loadupdb instruction
    • 46ae28fa - aarch32: Implement loadupdb instruction
    • 651978f6 - aarch64: Implement avgX instructions
    • f490214e - aarch64: Implement cmpX instructions
    • 61d84d8f - aarch64: Implement mulX instructions
    • 27996e36 - aarch64: Implement div255w instruction
    • 97397db5 - aarch64: Implement swapX instructions
    • 16449550 - aarch64: Implement splatw3q instruction
    • efbd5a64 - aarch64: Implement andn instruction
    • ff490170 - aarch64: Implement floating-point arithmetic instructions
    • 732bb307 - aarch64: Implement accumulator store
    • 76ac0041 - aarch64: Implement const64 loadiq
    • eaea8117 - aarch64: Implement flags2d
    • 84b6a1d7 - aarch64: Implement double-precision floating-point arithmetic instructions
    • dc2d0945 - aarch64: Implement divf instruction
    • b1a17ab0 - aarch64: Implement sqrtf instruction

  • Marek Vasut added 15 commits

    • b0e1a59a - aarch32: Implement loadupdb instruction
    • b701af88 - aarch64: Implement avgX instructions
    • d8e7a23b - aarch64: Implement cmpX instructions
    • 78cef540 - aarch64: Implement mulX instructions
    • 5c282df5 - aarch64: Implement div255w instruction
    • 47c4c819 - aarch64: Implement swapX instructions
    • edf6a010 - aarch64: Implement splatw3q instruction
    • bc51fc64 - aarch64: Implement andn instruction
    • 20a716da - aarch64: Implement floating-point arithmetic instructions
    • 36229f79 - aarch64: Implement accumulator store
    • d215ba96 - aarch64: Implement const64 loadiq
    • 9a228d60 - aarch64: Implement flags2d
    • 7be398bc - aarch64: Implement double-precision floating-point arithmetic instructions
    • ed2be3a6 - aarch64: Implement divf instruction
    • 63f3a5dd - aarch64: Implement sqrtf instruction

  • Sebastian Dröge unmarked as a Work In Progress
