- 07 Feb, 2019 10 commits
-
-
Dmitry Osipenko authored
Currently gathers of a hung job are getting NOP'ed and a restarted CDMA executes the NOP'ed gathers. There shouldn't be a reason to not restart CDMA execution starting with a next job, avoiding the unnecessary churning with gathers NOP'ing. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Dmitry Osipenko authored
There is a chance that the last job has been completed at the time of CDMA timeout handler invocation. In this case there is no need to complete the completed job. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Dmitry Osipenko authored
Host1x doesn't have information about jobs inter-dependency, that is something that will become available once host1x will get a proper jobs scheduler implementation. Currently a hang job causes other unrelated jobs to be canceled, that is a relic from downstream driver which is irrelevant to upstream. Let's cancel only the hanging job and not to touch other jobs in queue. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The host1x CDMA push buffer is terminated by a special opcode (RESTART) that tells the CDMA to wrap around to the beginning of the push buffer. To accomodate the RESTART opcode, an extra 4 bytes are allocated on top of the 512 * 8 = 4096 bytes needed for the 512 slots (1 slot = 2 words) that are used for other commands passed to CDMA. This requires that two memory pages are allocated, but most of the second page (4092 bytes) is never used. Decrease the number of slots to 511 so that the RESTART opcode fits within the page. Adjust the push buffer wraparound code to take into account push buffer sizes that are not a power of two. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an absolute address. This can cause SMMU faults if the CDMA fetches past a pushbuffer's IOMMU mapping. Properly setting the DMAEND prevents the CDMA from fetching beyond that address and avoid such issues. This is currently not observed because a whole (almost) page of essentially scratch space absorbs any excessive prefetching by CDMA. However, changing the number of slots in the push buffer can trigger these SMMU faults. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The host1x and clients instantiated on Tegra186 support addressing 40 bits of memory. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that is not addressable by the memory clients. To fix this, restrict the IOVA space to the DMA mask of the host1x device. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
Tegra186 and later support 40 bits of address space. Additional registers need to be programmed to store the full 40 bits of push buffer addresses. Since command stream gathers can also reside in buffers in a 40-bit address space, a new variant of the GATHER opcode is also introduced. It takes two parameters: the first parameter contains the lower 32 bits of the address and the second parameter contains bits 32 to 39. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The CDMA push buffer can currently only handle opcodes that take a single word parameter. However, the host1x implementation on Tegra186 and later supports opcodes that require multiple words as parameters. Unfortunately the way the push buffer is structured, these wide opcodes cannot simply be composed of two regular opcodes because that could result in the wide opcode being split across the end of the push buffer and the final RESTART opcode required to wrap the push buffer around would break the wide opcode. One way to fix this would be to remove the concept of slots to simplify push buffer operations. However, that's not entirely trivial and should be done in a separate patch. For now, simply use a different function to push four-word opcodes into the push buffer. Technically only three words are pushed, with the fourth word used as padding to preserve the 2-word alignment required by the slots abstraction. The fourth word is always a NOP opcode. Additional care must be taken when the end of the push buffer is reached. If a four-word opcode doesn't fit into the push buffer without being split by the boundary, NOP opcodes will be introduced and the new wide opcode placed at the beginning of the push buffer. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
When processing command streams, make sure the host1x's stream ID is programmed for the channel so that addresses are properly translated through the SMMU. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 04 Feb, 2019 3 commits
-
-
Thierry Reding authored
In order to enable the MMIO path stream ID protection provided by the incarnation of host1x found in Tegra186 and later, the host1x must be provided with the list of stream ID register offsets for each of its clients. Some clients (such as VIC) have multiple stream ID registers that are assumed to be contiguous. The host1x is programmed with the base offset and a limit which provide the range of registers that the host1x needs to monitor for writes. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
This new debugfs file represents the state of host1x bus devices, specifying the list of subdevices and marking which ones have successfully registered. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Arnd Bergmann authored
In this usage, the two are completely equivalent, but the completion documents better what is going on, and we generally try to avoid semaphores these days. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 29 Nov, 2018 1 commit
-
-
Thierry Reding authored
The host1x hardware found on Tegra194 is mostly backwards compatible with the version found on Tegra186, with the notable exceptions of the increased number of syncpoints and mlocks. In addition, some rarely used features such as syncpoint wait bases were dropped and some registers had to move around to accomodate the increased number of syncpoints. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 27 Nov, 2018 2 commits
-
-
Thierry Reding authored
The number of syncpoints on Tegra186 is 576 and therefore no longer fits into 8 bits. Increase the size of the syncpoint ID field to 10 in order to accomodate all syncpoints. Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The register region allocated per channel was decreased from 16384 bytes to 256 bytes on Tegra186 and later. Resize the region to make sure every channel (instead of only the first) is properly programmed. Suggested-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 26 Sep, 2018 2 commits
-
-
Dmitry Osipenko authored
Host1x is getting attached to an implicit IOMMU DMA domain if CONFIG_ARM_DMA_USE_IOMMU=y. Since Host1x driver manages IOMMU by itself, Host1x device must be detached from the implicit domain using arch-specific IOMMU-API. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
All other assignments have a single space around the = sign, so remove the spurious tab for consistency. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 09 Jul, 2018 2 commits
-
-
Dmitry Osipenko authored
Only gather pins are mapped by the Host1x driver, regular BO relocations are not. Check whether size of unpin isn't 0, otherwise IOVA allocation at 0x0 could be erroneously released. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Dmitry Osipenko authored
Host1x's CDMA can't access the command buffers if IOMMU and Host1x firewall are enabled in the kernels config because firewall doesn't map the copied buffer into IOVA space. Fix this by skipping IOMMU initialization if firewall is enabled as firewall merges sparse cmdbufs into a single contiguous buffer and hence IOMMU isn't needed in this case. Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 18 May, 2018 7 commits
-
-
Thierry Reding authored
The number of words and the offset in a gather don't need to be explicitly sized, so make them unsigned int instead. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
All other array variables use a plural, and this is the only one using the *array suffix. This is confusing, so rename it for consistency. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
Functions taking a pointer to a host1x syncpoint as an argument don't need to specify a pointer to a host1x instance because it can be obtained from the syncpoint. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
Use unsigned int where possible and don't unnecessarily initialize the loop variable. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
Rather than storing some identifier derived from the application context that can't be used concretely anywhere, store a pointer to the client directly so that accesses can be made directly through that client object. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The job submission userspace ABI doesn't support this and there are no plans to implement it, so all of this code is dead and can be removed. Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Emil Goode authored
The compiler is complaining with the following errors: drivers/gpu/host1x/cdma.c:94:48: error: passing argument 3 of ‘dma_alloc_wc’ from incompatible pointer type [-Werror=incompatible-pointer-types] drivers/gpu/host1x/cdma.c:113:48: error: passing argument 3 of ‘dma_alloc_wc’ from incompatible pointer type [-Werror=incompatible-pointer-types] The expected pointer type of the third argument to dma_alloc_wc() is dma_addr_t but phys_addr_t is passed. Change the phys member of struct push_buffer to be dma_addr_t so that we pass the correct type to dma_alloc_wc(). Also check pb->mapped for non-NULL in the destroy function as that is the right way of checking if dma_alloc_wc() was successful. Signed-off-by:
Emil Goode <emil.fsw@goode.io> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 17 May, 2018 2 commits
-
-
Thierry Reding authored
The IOVA API uses a memory cache to allocate IOVA nodes from. To make sure that this cache is available, obtain a reference to it and release the reference when the cache is no longer needed. On 64-bit ARM this is hidden by the fact that the DMA mapping API gets that reference and never releases it. On 32-bit ARM, this is papered over by the Tegra DRM driver (the sole user of the host1x API requiring the cache) acquiring a reference to the IOVA cache for its own purposes. However, there may be additional users of this API in the future, so fix this upfront to avoid surprises. Fixes: 404bfb78 ("gpu: host1x: Add IOMMU support") Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Dmitry Osipenko authored
If IOVA allocation or IOMMU mapping fails, dma_free_wc() is invoked with size=0 because of a typo, that triggers "kernel BUG at mm/vmalloc.c:124!". Signed-off-by:
Dmitry Osipenko <digetx@gmail.com> Reviewed-by:
Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 03 May, 2018 2 commits
-
-
Christoph Hellwig authored
With each bus implementing its own DMA configuration callback, there is no need for bus to explicitly set the force_dma flag. Modify the of_dma_configure function to accept an input parameter which specifies if implicit DMA configuration is required when it is not described by the firmware. Signed-off-by:
Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts Reviewed-by:
Rob Herring <robh@kernel.org> [hch: tweaked the changelog a bit] Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Nipun Gupta authored
ACPI/OF support for configuration of DMA is a bus specific aspect, and thus should be configured by the bus. Introduces a 'dma_configure' bus method so that busses can control their DMA capabilities. Also update the PCI, Platform, ACPI and host1x buses to use the new method. Suggested-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts Acked-by:
Thierry Reding <treding@nvidia.com> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [hch: simplified host1x_dma_configure based on a comment from Thierry, rewrote changelog] Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
- 21 Dec, 2017 1 commit
-
-
Thierry Reding authored
Use IOMMU groups to attach the host1x device to its IOMMU domain. This is not strictly necessary because the domain isn't shared with any other device, but it makes the code consistent with how IOMMU is handled in other drivers and provides an easy way to detect when no IOMMU has been attached via device tree. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 13 Dec, 2017 2 commits
-
-
Thierry Reding authored
When an error happens during the initialization of one of the sub- devices, make sure to properly cleanup all sub-devices that have been initialized up to that point. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Thierry Reding authored
The current check is slightly difficult to read, rewrite it to improve that a little. Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
- 02 Nov, 2017 1 commit
-
-
Greg Kroah-Hartman authored
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by:
Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by:
Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 20 Oct, 2017 5 commits
-
-
Mikko Perttunen authored
This function actually doesn't sleep in the version that was merged. Signed-off-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Mikko Perttunen authored
The disassembler for debug dumps was missing some newer host1x opcodes. Add disassembly support for these. Signed-off-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Mikko Perttunen authored
The host1x driver prints out "disassembly" dumps of the command FIFO and gather contents on submission timeouts. However, the output has been quite difficult to read with unnecessary newlines and occasional missing parentheses. Fix these problems by using pr_cont to remove unnecessary newlines and by fixing other small issues. Signed-off-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Tested-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Mikko Perttunen authored
The gather filter is a feature present on Tegra124 and newer where the hardware prevents GATHERed command buffers from executing commands normally reserved for the CDMA pushbuffer which is maintained by the kernel driver. This commit enables the gather filter on all supporting hardware. Signed-off-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-
Mikko Perttunen authored
Since Tegra186 the Host1x hardware allows syncpoints to be assigned to specific channels, preventing any other channels from incrementing them. Enable this feature where available and assign syncpoints to channels when submitting a job. Syncpoints are currently never unassigned from channels since that would require extra work and is unnecessary with the current channel allocation model. Signed-off-by:
Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by:
Dmitry Osipenko <digetx@gmail.com> Signed-off-by:
Thierry Reding <treding@nvidia.com>
-