[CIR] Implement non-scalar lvalue return values (#190795)
I could only get these to happen in C++03 (as we do a
materialize-temporary-expr in later standards), but this does appear in
a number of benchmarks. The implementation ends up being pretty trivial,
as we just have to lower the aggregate correctly.
[CIR] Add lowering for long-double increment/decrement (#190812)
This showed up a handful of times in some benchmarks. Supporting
long-double is pretty trivial, so this patch does so, with some work to
make sure all 3 formats of long-double work in the test (plus some
command-line replacement, hopefully that isn't too confusing).
The NYI is left in place, as we're not yet implementing any of the
'half' types (or other smaller FP types).
[lldb][DWARFASTParserClang] Handle pointer-to-member-data non-type (#189510)
## Reland Notes
Reapplying [187598](https://github.com/llvm/llvm-project/pull/187598).
This is a reland of the original commit which was reverted due to a
failure on the Windows buildbot.
Root cause of the Windows failure:
* The fix introduces TemplateArgument::Declaration (pointing to a
FieldDecl)
* GetValueParamType() in TypeSystemClang.cpp did not handle this kind,
so CreateTemplateParameterList() created a
TemplateTypeParmDecl instead of a NonTypeTemplateParmDecl for the
corresponding template parameter.
* On Windows, the Microsoft name mangler calls
cast<NonTypeTemplateParmDecl>(Parm) when mangling member data pointer
NTTPs, which crashed because Parm was a TemplateTypeParmDecl.
* The Itanium mangler (Linux/Mac) does not inspect the parameter
[58 lines not shown]
[VPlan] Properly preserve IsMaterialized in VPlan::duplicate (NFC). (#190849)
Make sure IsMaterialized is preserved in VPlan::duplicate for
VPSymbolicValues. This is currently NFC.
Split off from approved
https://github.com/llvm/llvm-project/pull/156262.
[Flang][OpenMP] Fix Common Blocks use in update to/from and target maps causing compiler errors (#187221)
This patch attempts to fix a compiler ICE when common blocks are used in
target update to/from. It seems to stem from the fact that we do not
resolve the symbols in the relevant clauses, so when we later process
the maps we don't have the right symbol referencing the common block
that was set up and bound by the Fortran lowering. Resolving the names
seems to do the trick.
There is a second issue: when referencing a common block that contains
an array and using that array within the target region, we currently do
not map the bounds accurately, which causes a FIR/MLIR verification
error. The fix is to move the common block member
re-binding/re-materialization for the target region to before the
bounds data re-materialization we do during target region generation.
[RISCV] Use signed target constants for XCVmem post-inc loads (#189276)
First time opening a PR against LLVM, so please let me know if anything
is missing / wrong.
This fixes an assertion in RISC-V DAG isel for CORE-V xcvmem
post-increment loads with negative immediate offsets.
`RISCVDAGToDAGISel::Select` recognizes `xcvmem POST_INC` loads and
checks whether the offset fits the signed 12-bit immediate form used by
`cv.lb/cv.lbu/cv.lh/cv.lhu/cv.lw ... , (rs1), imm12`. That path was
extracting the offset with `getSExtValue()`, but then rebuilding it with
`getTargetConstant(...)`, which takes the unsigned constant path.
For negative offsets, that could trip the APInt assertion:
```
Assertion failed: (llvm::isUIntN(BitWidth, val) && "Value is not an N-bit unsigned value")
```
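A minimal standalone sketch of the mismatch described above. `isIntN` and
`isUIntN` here are local stand-ins modeled on the LLVM helpers of the same
name, `fitsImm12` and `passesUnsignedCheck` are hypothetical names, and the
32-bit constant width is an assumption made for illustration only.

```cpp
#include <cstdint>

// Local stand-ins modeled on llvm::isIntN / llvm::isUIntN (assumption:
// same semantics; this is not the LLVM source). N must be > 0.
inline bool isIntN(unsigned N, int64_t X) {
  if (N >= 64)
    return true;
  const int64_t Bound = INT64_C(1) << (N - 1);
  return X >= -Bound && X < Bound;
}

inline bool isUIntN(unsigned N, uint64_t X) {
  return N >= 64 || X < (UINT64_C(1) << N);
}

// Hypothetical helper: does the offset fit the signed 12-bit immediate?
inline bool fitsImm12(int64_t Offset) { return isIntN(12, Offset); }

// Hypothetical helper mirroring the failing path: a sign-extended offset
// is handed to a check that expects an N-bit *unsigned* value.
inline bool passesUnsignedCheck(int64_t Offset, unsigned BitWidth) {
  return isUIntN(BitWidth, static_cast<uint64_t>(Offset));
}
```

In this sketch an offset of -4 is a valid imm12 (`fitsImm12(-4)` holds), yet
`passesUnsignedCheck(-4, 32)` fails, which is the shape of the assertion
quoted above.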
[34 lines not shown]
[NFCI] Check for non-null before dereferencing a VPBB ptr (#190403)
A VPBB variable is possibly null (defined via a ternary), but is
subsequently dereferenced without a check. This patch adds a check to
avoid a possible null dereference. This was found via static analysis;
there is no known case right now where this issue is hit.
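A toy illustration of the pattern, not the actual VPlan code: `ToyBlock`
and `countRecipes` are hypothetical names, and the ternary stands in for
whatever condition can leave the pointer null.

```cpp
// Hypothetical stand-in for a basic-block-like object.
struct ToyBlock {
  int NumRecipes = 0;
};

// The pointer comes from a ternary, so it may be null; guard before use.
inline int countRecipes(bool UseHeader, ToyBlock *Header) {
  ToyBlock *VPBB = UseHeader ? Header : nullptr;
  if (!VPBB)
    return 0; // nothing to inspect, and no dereference of a null pointer
  return VPBB->NumRecipes;
}
```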
[Flang][OpenMP] Allow user-defined default mappers to bypass the implicit mapper fence (#189136)
Currently we wall out implicit declare mappers from being applied to
enter/exit/update (which we'll need to address in future PRs, as this
likely should work to some extent for allocatable member mapping). A
side effect is that user-defined default declare mappers do not apply
in scenarios where they should.
I believe these user-defined default declare mappers should apply in all
cases where that type is mapped and no other mapper has been explicitly
specified, as they replace the original default mapping behaviour, at
least from my admittedly shoddy reading of the specification.
The user-defined default mappers should "implicitly" apply because:
1. No explicit mapper modifier is specified
2. The fallback behavior should be "as if the modifier was specified
with the default mapper-identifier" (Section 5.9)
3. The user-defined default mapper "overrides the predefined default
mapper for the given type" (Section 5.8.2)
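The selection order argued for above could be sketched as follows. This is
an illustrative C++ model only, not Flang code; `MapperKind` and
`chooseMapper` are hypothetical names.

```cpp
#include <optional>
#include <string>

enum class MapperKind { Explicit, UserDefault, PredefinedDefault };

// Hypothetical model of the fallback order:
//  1. an explicit mapper modifier wins if present;
//  2. otherwise a user-defined default mapper for the type applies;
//  3. otherwise the predefined default mapper is used.
inline MapperKind chooseMapper(const std::optional<std::string> &ExplicitMapper,
                               bool TypeHasUserDefaultMapper) {
  if (ExplicitMapper)
    return MapperKind::Explicit;
  if (TypeHasUserDefaultMapper)
    return MapperKind::UserDefault;
  return MapperKind::PredefinedDefault;
}
```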
[10 lines not shown]
[MCP] Never eliminate frame-setup/destroy instructions
Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. Similar to reserved registers this
is probably overly conservative, but as this causes no codegen change in
any lit test I think it is benign.
The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.
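A toy sketch of the rule, under the assumption that redundancy is decided
per copy: `ToyInstr` and `mayEliminate` are hypothetical, and the predicate
name `isNeverRedundant` is borrowed from the companion cleanup commit
without claiming this is its real body.

```cpp
// Hypothetical, simplified instruction record.
struct ToyInstr {
  bool IsCopy = false;
  bool FrameSetup = false;   // marked as part of the prologue
  bool FrameDestroy = false; // marked as part of the epilogue
};

// Frame instructions are treated as significant regardless of dataflow.
inline bool isNeverRedundant(const ToyInstr &MI) {
  return MI.FrameSetup || MI.FrameDestroy;
}

// A copy may be dropped only if its value is already available and it is
// not one of the never-redundant instructions.
inline bool mayEliminate(const ToyInstr &MI, bool ValueAlreadyAvailable) {
  return MI.IsCopy && ValueAlreadyAvailable && !isNeverRedundant(MI);
}
```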
Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
[clang][OpenMP] declare_target/local clause variable can't be in map clause (#190470)
In OpenMP 6.0, the 'local' clause was added to the declare_target
directive. Variables listed in the 'local' clause are considered to be
device-local. In addition, a new map clause restriction was added:
A device-local variable must not appear as a list item in a map clause.
See OpenMP 6.0 specification section 7.9.6, map Clause, Restrictions, p.
386.
Testing:
- New error-message test for variables listed in declare_target local
clauses (device-local) that are used in map clauses.
- ninja check-openmp
[MCP][NFC] Opinionated refactoring
There are a few minor inconsistencies across the pass which I found mildly
distracting:
* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
the `optional`.
Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.
Assume `std::optional::operator*` will assert in any reasonable
implementation, even though this may technically be undefined behavior.
When asserts are disabled, the dereference would be undefined behavior
anyway.
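For reference, a small standalone sketch of the two access styles on
`std::optional`. `ToyPair`, `asCopy`, `dstOrZero`, and `valueThrowsOnEmpty`
are hypothetical names; only the standard-library behavior is relied on:
`operator*` on an empty optional is undefined, while `value()` throws
`std::bad_optional_access`.

```cpp
#include <optional>

// Hypothetical pair, ordered Dst then Src.
struct ToyPair {
  unsigned Dst;
  unsigned Src;
};

// Hypothetical query that only sometimes recognizes a copy.
inline std::optional<ToyPair> asCopy(bool IsCopy) {
  if (!IsCopy)
    return std::nullopt;
  return ToyPair{1, 2};
}

// operator* is only used after the optional has been tested.
inline unsigned dstOrZero(bool IsCopy) {
  std::optional<ToyPair> Copy = asCopy(IsCopy);
  return Copy ? (*Copy).Dst : 0;
}

// value() is the checked alternative: it throws on an empty optional.
inline bool valueThrowsOnEmpty() {
  try {
    (void)asCopy(false).value();
  } catch (const std::bad_optional_access &) {
    return true;
  }
  return false;
}
```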
[11 lines not shown]
[HLSL] Rewrite inline HLSL intrinsics into TableGen (#188362)
Partially addresses https://github.com/llvm/llvm-project/issues/188345.
This PR rewrites all applicable inline HLSL intrinsics from
`hlsl_intrinsics.h` into TableGen.
The unsigned `abs` from `hlsl_alias_intrinsics.h` is also rewritten into
TableGen since it can also be defined inline.
The `NonUniformResourceIndex` is moved from `hlsl_intrinsics.h` over to
`hlsl_alias_intrinsics.h` since it can be defined as an alias.
`__detail::.*_impl` helper functions that were one-liners have been
removed, and their corresponding HLSL intrinsics have been defined in
TableGen using the `Body` field instead.
Note that rewriting `refract` in TableGen instead of templates
introduces some significant changes to error messages and also
introduces a new offload test suite failure in the fp16 test because a
[10 lines not shown]
[MCP][NFC] Cleanup and prepare to preserve frame-setup/destroy
This mixes renames, removing redundant code, avoiding
`else`-after-`return`, etc. with factoring out the `isNeverRedundant`
concept.
Change-Id: I43a62a9415019cdd63c68fd3b915ebb7505d317a
[RISCV][MCA] Do not use mask instructions that can potentially be optimized by uArch (#190820)
Context:
https://github.com/llvm/llvm-project/pull/189785#discussion_r3019282209
Some mask instructions have a form that can potentially be optimized by
HW implementation: `vmxor.mm vd, vs, vs` and `vmclr vd, vs`, for
instance. This patch avoids using such instructions in MCA tests.
[BOLT][AArch64] Optimize the mov-imm-to-reg operation (#189304)
On AArch64, logical immediate instructions are used to encode some
special immediate values. Even at `-O0`, the AArch64 backend does not
generate 4 instructions (movz, movk, movk, movk) to move such a special
value into a 64-bit register.
For example, to move the 64-bit value `0x0001000100010001` to `x0`, the
AArch64 backend would not choose a 4-instruction sequence like
```
movz x0, 0x0001
movk x0, 0x0001, lsl 16
movk x0, 0x0001, lsl 32
movk x0, 0x0001, lsl 48
```
Instead, the AArch64 backend generates a single instruction
```
mov x0, 0x0001000100010001
```
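A rough standalone sketch of why that value qualifies, assuming the usual
description of a 64-bit logical immediate as a rotated run of ones
replicated across 2-, 4-, 8-, 16-, 32- or 64-bit elements. Both function
names are hypothetical; this is not the BOLT or LLVM implementation.

```cpp
#include <bitset>
#include <cstdint>

// How many movz/movk a chunk-by-chunk materialization would need:
// one per non-zero 16-bit chunk (and one instruction for zero itself).
inline unsigned naiveMovCount(uint64_t V) {
  unsigned N = 0;
  for (unsigned Shift = 0; Shift < 64; Shift += 16)
    if ((V >> Shift) & 0xFFFF)
      ++N;
  return N ? N : 1;
}

// True if V is some element size's pattern repeated across all 64 bits,
// where the element is a rotated contiguous run of ones (not all zeros
// and not all ones).
inline bool isLogicalImm64(uint64_t V) {
  for (unsigned Size = 2; Size <= 64; Size *= 2) {
    const uint64_t Mask =
        Size == 64 ? ~UINT64_C(0) : (UINT64_C(1) << Size) - 1;
    const uint64_t Elt = V & Mask;
    bool Replicated = true;
    for (unsigned I = Size; I < 64; I += Size)
      if (((V >> I) & Mask) != Elt) {
        Replicated = false;
        break;
      }
    if (!Replicated)
      continue;
    // Rotate the element right by one within Size bits; a single circular
    // run of ones differs from its rotation in exactly two bit positions.
    const uint64_t Rot = ((Elt >> 1) | ((Elt & 1) << (Size - 1))) & Mask;
    if (std::bitset<64>(Elt ^ Rot).count() == 2)
      return true;
  }
  return false;
}
```

Under these assumptions `0x0001000100010001` needs four chunk moves by
`naiveMovCount` but passes `isLogicalImm64`, while a value such as `0x1234`
does not pass.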
[10 lines not shown]