[CIR] Include union tail pad in getTypeSizeInBits (#198361)
Padded CIR unions (e.g. libstdc++ `std::string` SSO layout) carry a
trailing byte-array member so the record matches the AST layout size.
`RecordType::getTypeSizeInBits` returned only the size of the largest-aligned
member and ignored that tail, so the CIR view of the union was 8 bytes
smaller than what `LowerToLLVM` emits. Parent structs then picked up
a spurious trailing pad via `insertPadding`, arrays of those structs
used the wrong stride, and heap allocations could be overrun (Eigen's
`array_of_string` hits this directly).
The fix adds the padding member's size when the union is marked
`padded`, so struct size, GEP strides, and `new T[n]` allocation sizes
match OGCG. A regression test models the SSO-shaped record and checks
the 96-byte `new` for three elements.
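The size arithmetic is easier to follow in a small C++ sketch. `MemberModel`, `UnionModel` and the size functions are hypothetical stand-ins, not CIR's `RecordType` API. The rule for ties between equally aligned members is an assumption. The example assumes a 64-bit target where the SSO-shaped union holds a `char[16]` buffer and an 8-byte capacity field.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a CIR union record (not the real RecordType API).
struct MemberModel {
  uint64_t sizeInBits;
  uint64_t alignInBits;
};

struct UnionModel {
  std::vector<MemberModel> members; // the union's declared members
  bool padded = false;              // mirrors the `padded` marker
  uint64_t tailPadInBits = 0;       // size of the trailing byte-array member
};

// Size of the largest-aligned member. On an alignment tie the first member
// wins (an assumption made for this sketch).
uint64_t largestAlignedSizeInBits(const UnionModel &u) {
  const MemberModel *best = nullptr;
  for (const MemberModel &m : u.members)
    if (!best || m.alignInBits > best->alignInBits)
      best = &m;
  return best ? best->sizeInBits : 0;
}

// Old behaviour: only the largest-aligned member is counted.
uint64_t sizeInBitsBeforeFix(const UnionModel &u) {
  return largestAlignedSizeInBits(u);
}

// New behaviour: a padded union also counts its tail pad.
uint64_t sizeInBitsAfterFix(const UnionModel &u) {
  return largestAlignedSizeInBits(u) + (u.padded ? u.tailPadInBits : 0);
}
```

For that union, the largest-aligned member is the 8-byte capacity field. The old computation therefore yields 64 bits, while the padded union is really 128 bits, which is the 8-byte gap described above. With a pointer and a length in front of the union, each element is 32 bytes, so three elements need the 96 bytes that the regression test checks.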
[OpenCL] Add Intel subgroup buffer prefetch and local block I/O builtins (#199258)
Add cl_intel_subgroup_buffer_prefetch and cl_intel_subgroup_local_block_io
declarations to OpenCLBuiltins.td and cover them with header-free SPIR
tests.
This keeps the generated OpenCL builtins in sync with opencl-c.h for the
Intel subgroup buffer prefetch and local block I/O extensions.
Per the cl_intel_subgroup_local_block_io specification, the _ui local
aliases (intel_sub_group_block_read_ui*, intel_sub_group_block_write_ui*
with a __local pointer) are declared under FuncExtIntelSubgroupLocalBlockIO
alone, without a char/short/long prerequisite. A dedicated test
(intel-subgroup-local-block-io-ui-without-char-short-long.cl) verifies that
they resolve when only cl_intel_subgroup_local_block_io is active.
[6 lines not shown]
[OpenCL] Fix image2d_t qualifier for intel_sub_group_block_write_ui (#199232)
The intel_sub_group_block_write_ui[2,4,8] overloads for image2d_t were
declared with a read_only qualifier, both in opencl-c.h and in
OpenCLBuiltins.td. A write operation cannot target a read_only image, and
the base intel_sub_group_block_write together with the analogous _us, _uc
and _ul aliases all correctly use write_only image2d_t.
Per the cl_intel_subgroups_short [1], cl_intel_subgroups_char [2] and
cl_intel_subgroups_long [3] specifications, the _ui aliases are added "for
naming consistency [...] There is no change to the description or behavior
of these functions" relative to the cl_intel_subgroups base, which uses
write_only image2d_t for writes.
The typo was introduced in b833bf6ae14f and preserved across all
[18 lines not shown]
[offload] Use device memory for the multithreaded kernel launch test (#199132)
This commit modifies the multithreaded kernel launch test to use device
memory instead of managed memory. The test has been reported to fail
intermittently on systems where concurrent managed memory access is
not supported. This is the case for NVIDIA devices that do not support
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS.
The concept of concurrent and coherent managed memory access should
be exposed to liboffload users somehow, e.g., as a device property,
so it is clear which execution patterns are allowed with managed memory.
However, this test only exercises concurrent kernel launches, so this
commit switches it to device memory until we decide what guarantees to
provide for that type of allocation.
[SCEV] Fold zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if possible. (#142599)
Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if
* zext (C + A)<nsw> >=s 0 and
* A >=s V.
For now this is limited to cases where the first operand is a constant,
so the SExt can be folded to a new constant. This can be relaxed in the
future.
The initial version checks for non-negativity manually to limit compile
time, supporting only A = smax(C2, ...) where C2 >= abs(C).
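The restricted precondition can be sanity-checked exhaustively at a small bit width. The C++ sketch below is a bounded brute-force check, not a proof and not SCEV code. The i8-to-i16 widths are an assumption chosen for speed. It enumerates every i8 pair (C, A) with A >= abs(C) whose sum C + A has no signed overflow, and it compares zext(C + A) against sext(C) + zext(A) in i16.

```cpp
#include <cstdint>
#include <cstdlib>

struct FoldCheck {
  int checked = 0;    // (C, A) pairs that satisfy the preconditions
  int violations = 0; // pairs where zext(C + A) != sext(C) + zext(A)
};

// Exhaustive check of the fold for i8 operands widened to i16, under the
// initial version's restriction A >= abs(C).
FoldCheck checkZextFoldI8() {
  FoldCheck r;
  for (int c = -128; c <= 127; ++c) {
    for (int a = -128; a <= 127; ++a) {
      if (a < std::abs(c))
        continue; // precondition: A >= abs(C), so A >= 0 and C + A >= 0
      int sum = c + a;
      if (sum < -128 || sum > 127)
        continue; // C + A must not overflow i8 (the <nsw> flag)
      // Left side: C + A in i8, then zero-extended to i16.
      uint16_t lhs = static_cast<uint8_t>(sum);
      // Right side: sign-extend C, zero-extend A, add modulo 2^16.
      auto sextC = static_cast<uint16_t>(static_cast<int16_t>(c));
      auto zextA = static_cast<uint16_t>(static_cast<uint8_t>(a));
      auto rhs = static_cast<uint16_t>(sextC + zextA);
      ++r.checked;
      if (lhs != rhs)
        ++r.violations;
    }
  }
  return r;
}
```

Every qualifying pair agrees. Without the precondition the fold is wrong. For example, with C = 1 and A = -1, zext(C + A) is 0, but sext(C) + zext(A) is 1 + 255 = 256.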
Alive2 proof of the general pattern and the test changes in zext-nuw.ll
(times out in the online instance but verifies locally)
https://alive2.llvm.org/ce/z/_BtyGy
PR: github.com/llvm/llvm-project/pull/142599
[clang-doc][nfc] Silence tidy warning about anonymous namespace (#198071)
clang-tidy complains that we should prefer static over the anonymous
namespace, despite the API being static in addition to being in the
anonymous namespace. We can silence the diagnostic by simply removing
the namespace declaration.
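Here is a minimal before/after illustration of this kind of cleanup. `makeAnchor` is a hypothetical helper, not the clang-doc function that was actually changed.

```cpp
#include <string>

// Hypothetical helper, for illustration only.
// Before: the function was both `static` and inside an anonymous namespace,
// and clang-tidy suggested preferring `static` over the namespace:
//
//   namespace {
//   static std::string makeAnchor(const std::string &Name) { ... }
//   } // namespace
//
// After: the namespace is removed; `static` alone already gives the
// function internal linkage.
static std::string makeAnchor(const std::string &Name) {
  return "#" + Name;
}
```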
[MLIR] Fix mlir-doc build, add missing "-dialect nvgpu" (#199279)
The build was broken with:
> when more than 1 dialect is present, one must be selected via '-dialect'
Reapply [SimplifyCFG] Extend jump-threading to allow live local defs (#197850)
Restore "Extend jump-threading to allow live local defs" #135079. The long
compilation time for reduce.cu in hipcub/warp was partially addressed
in #195744. With this PR (on top of #195744), reduce.cu compiles in
6 minutes 40 seconds; without #195744, compilation took several hours.
The long compilation time for reduce.cu was only exposed by jump-threading.
In my view, the primary causes were inlining, SROA tripling the IR size,
and SSA updating of 26K phi nodes, which triggered an O(N^2) search for
duplicates. #195744 limits phi search time.
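A naive all-pairs sketch in C++ shows why a quadratic duplicate search hurts at this scale. `PhiModel`, `naiveDuplicateSearch` and `pairwiseComparisons` are hypothetical. The real SSA updater uses different data structures, and this sketch does not describe how #195744 bounds the search.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a phi node: its incoming values, one per edge.
using PhiModel = std::vector<int>;

struct DupSearchResult {
  uint64_t comparisons = 0;
  uint64_t duplicates = 0;
};

// Naive search: compare each phi with every earlier phi until a match.
DupSearchResult naiveDuplicateSearch(const std::vector<PhiModel> &phis) {
  DupSearchResult r;
  for (std::size_t i = 0; i < phis.size(); ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      ++r.comparisons;
      if (phis[i] == phis[j]) {
        ++r.duplicates; // phi i duplicates an earlier phi
        break;
      }
    }
  }
  return r;
}

// Comparisons needed for one all-pairs pass over n distinct phis.
uint64_t pairwiseComparisons(uint64_t n) {
  return n == 0 ? 0 : n * (n - 1) / 2;
}
```

With 26,000 distinct phis, a single all-pairs pass already needs `pairwiseComparisons(26000)` = 337,987,000 comparisons.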
This reverts commit a76750e6de6aba2223097dc505578556ec245d50.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[flang] Fixed FIR AA's getSource() for box loads inside acc.compute_region. (#199157)
This patch fixes a regression caused by #198635: when getSource() is
called for a `fir.load` of a box, the input value may be a
`BlockArgument`, which must be handled and passed through.
Open YAML, etc. as text files (#199253)
These tests were failing on z/OS because the text input files were being
opened as binary.
```
FAIL: LLVM :: tools/dsymutil/AArch64/typedef-different-types.test
FAIL: LLVM :: tools/dsymutil/X86/mismatch.m
FAIL: LLVM :: tools/dsymutil/embed-resource.test
FAIL: LLVM :: tools/llvm-gsymutil/X86/elf-symtab-file.yaml
```
Opening the files as text fixes these failures.
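The patch's exact change is not shown here and presumably uses LLVM's own file APIs. In standard C++, the distinction is whether a stream is opened with `std::ios::binary`. The sketch below uses a hypothetical `readTextFile` helper that opens in text mode, so the runtime may apply its platform-specific text handling.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file in text mode (no std::ios::binary).
// On POSIX systems text and binary mode behave the same; on other platforms
// (newline translation on Windows, or z/OS as in the failures above) they
// can differ, which is why text inputs such as YAML need text mode.
std::string readTextFile(const std::string &Path) {
  std::ifstream In(Path); // text mode
  if (!In)
    return {};
  std::ostringstream Out;
  Out << In.rdbuf();
  return Out.str();
}
```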
[FileCheck] Refactor -dump-input test (#198137)
This PR is stacked on PR #198136.
This patch refactors `llvm/test/FileCheck/dump-input/annotations.txt` to
improve maintainability and coverage and to prepare for the upcoming
implementation of search range annotations.
Lit substitutions
=================
The test repeats the same basic set of RUN lines *many* times. This
patch encapsulates those in lit substitutions to improve
maintainability. By doing so, it also helps to ensure more consistent
coverage of all cases and thus slightly expands coverage.
-strict-whitespace
==================
[25 lines not shown]
[docs] update noescape semantics to disallow free (#195973)
This changes the documented semantics of the `noescape` attribute to
disallow freeing the pointer, and allow escapes of the integer value of
the memory address, as discussed in
https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.
It also clarifies that the attribute may only be used to annotate the
outermost pointer level of nested pointer parameters.
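This sketch assumes the change refers to Clang's `__attribute__((noescape))` on pointer parameters. It contrasts a use that stays allowed under the updated wording with one that becomes disallowed. `NOESCAPE`, `recordAddress`, `releaseBuffer` and `readThrough` are hypothetical names. The macro expands to nothing on compilers that lack the attribute.

```cpp
#include <cstdint>

// Hypothetical macro: Clang's noescape attribute where available, else empty.
#if defined(__has_attribute)
#if __has_attribute(noescape)
#define NOESCAPE __attribute__((noescape))
#endif
#endif
#ifndef NOESCAPE
#define NOESCAPE
#endif

static std::uintptr_t LastAddress = 0;

// Allowed under the updated semantics: only the integer value of the
// address escapes, not the pointer itself.
void recordAddress(NOESCAPE int *P) {
  LastAddress = reinterpret_cast<std::uintptr_t>(P);
}

// Disallowed under the updated semantics (left commented out): the callee
// must not free a noescape pointer.
//   void releaseBuffer(NOESCAPE int *P) { std::free(P); }

// For a nested pointer such as `int **PP`, the attribute annotates only the
// outermost level (PP itself), not *PP.
int readThrough(NOESCAPE int **PP) { return **PP; }
```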
[CIR][CUDA] Introduce cu.var_registration for shadow and attach device-side var metadata, internalize device side variables, and lower poison attribute (#190087)
Signed-off-by: ZakyHermawan <zaky.hermawan9615 at gmail.com>
[FileCheck] Resurrect overflow tests (#198136)
D150880 (landed as 0726cb004718) uses `APInt` to eliminate most integer
overflow issues from FileCheck numeric variables. It also removed the 4
tests in `llvm/test/FileCheck/match-time-error-propagation`.
While the elimination of overflow issues reduces the importance of those
tests, the tests still seem worthwhile. Without them, I see no test that
exercises the "unable to substitute variable or numeric expression:
overflow error" diagnostic in FileCheck input dumps.
This patch resurrects those tests and updates them to exercise the
remaining unsigned underflow case.