LLVM/project 61a5d04 llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project 285da48 llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project e1ec0cf llvm/lib/Target/AMDGPU SISchedule.td VOP1Instructions.td, llvm/test/tools/llvm-mca/AMDGPU gfx12-permlane16-cycles.s

[AMDGPU] Add DummySchedWrite to avoid multiple issue cycles (#190095)

TargetSchedule.td specifies that each explicit def of an instruction
must have an associated SchedWrite type. This is a bit unfortunate due
to the MachineScheduler's handling of the SchedWrites.

Each of these SchedWrites contributes to the number of MicroOps
for the MCSchedClassDesc for the instruction --
https://github.com/llvm/llvm-project/blob/096f9d0aa8edb8bad77e8061a6aa9cbf61bcb5ac/llvm/utils/TableGen/SubtargetEmitter.cpp#L1136

Then in the MachineScheduler.cpp's bumpNode, we grab the numMicroOps
from the MCSchedClassDesc
https://github.com/llvm/llvm-project/blob/3d7eedce5658c41a1b22775938359bfafac47fc9/llvm/lib/CodeGen/MachineScheduler.cpp#L2948

We then use this numMicroOps as how many micro ops we'll need to issue
for this instruction. For our target, the issueWidth per cycle is 1. So,
for these instructions the MachineScheduler thinks they will take
multiple cycles to issue, and we add stalls to the hazardRecognizer
https://github.com/llvm/llvm-project/blob/3d7eedce5658c41a1b22775938359bfafac47fc9/llvm/lib/CodeGen/MachineScheduler.cpp#L3100
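
The arithmetic described above can be sketched as a toy model. This is not the MachineScheduler code: `issue_cycles` and the per-write micro-op lists are hypothetical names, and the zero-micro-op write in the second case is only an assumption about what a dummy write could look like, not something taken from the patch.

```python
from math import ceil


def num_micro_ops(write_micro_ops):
    """Sum the micro-ops contributed by each SchedWrite attached to a def."""
    return sum(write_micro_ops)


def issue_cycles(write_micro_ops, issue_width=1):
    """Cycles needed to issue an instruction in this simplified model.

    Every instruction takes at least one issue cycle; beyond that the
    count grows with the number of micro-ops divided by the issue width.
    """
    return max(1, ceil(num_micro_ops(write_micro_ops) / issue_width))


# Two explicit defs, each with a write that counts as one micro-op.
print(issue_cycles([1, 1]))

# Same instruction if the second def used a write contributing no micro-ops
# (assumed shape of a dummy write).
print(issue_cycles([1, 0]))
```

With an issue width of 1, the first case models the extra issue cycle the message complains about, and the second models the single-cycle issue the title aims for.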

    [11 lines not shown]
DeltaFile
+110 -0 llvm/test/tools/llvm-mca/AMDGPU/gfx12-permlane16-cycles.s
+36 -0 llvm/lib/Target/AMDGPU/SISchedule.td
+1 -1 llvm/lib/Target/AMDGPU/VOP1Instructions.td
+147 -1 3 files

LLVM/project cf551dc llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project dc6e5db llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project 9a03e00 llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project 8918319 clang/lib/CIR/CodeGen CIRGenExpr.cpp, clang/test/CIR/CodeGen non-scalar-lval-return.cpp

[CIR] Implement non-scalar lvalue return values (#190795)

I could only get these to happen in C++03 (as we do a
materialize-temporary-expr in later standards), but this does appear in
a number of benchmarks. The implementation ends up being pretty trivial,
as we just have to lower the aggregate correctly.
DeltaFile
+42 -0 clang/test/CIR/CodeGen/non-scalar-lval-return.cpp
+3 -4 clang/lib/CIR/CodeGen/CIRGenExpr.cpp
+45 -4 2 files

LLVM/project a50839d clang/lib/CIR/CodeGen CIRGenExprScalar.cpp, clang/test/CIR/CodeGen long-double-inc-dec.cpp

[CIR] Add lowering for long-double increment/decrement (#190812)

This showed up a handful of times in some benchmarks. Supporting
long-double is pretty trivial, so this patch does so, with some work to
make sure all 3 formats of long-double work in the test (plus some
command-line replacement, hopefully that isn't too confusing).

The NYI is left in place, as we're not yet implementing any of the
'half' types (or other smaller FP types).
DeltaFile
+126 -0 clang/test/CIR/CodeGen/long-double-inc-dec.cpp
+2 -1 clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+128 -1 2 files

LLVM/project 103f821 lldb/source/Plugins/SymbolFile/DWARF DWARFASTParserClang.cpp DWARFASTParserClang.h, lldb/source/Plugins/TypeSystem/Clang TypeSystemClang.cpp

[lldb][DWARFASTParserClang] Handle pointer-to-member-data non-type (#189510)

## Reland Notes
Reapplying [187598](https://github.com/llvm/llvm-project/pull/187598)

This is a reland of the original commit which was reverted due to a
failure on the Windows buildbot.

Root cause of the Windows failure:
* The fix introduces TemplateArgument::Declaration (pointing to a
FieldDecl)
* GetValueParamType() in TypeSystemClang.cpp did not handle this kind,
so CreateTemplateParameterList() created a
TemplateTypeParmDecl instead of a NonTypeTemplateParmDecl for the
corresponding template parameter.
* On Windows, the Microsoft name mangler calls
cast<NonTypeTemplateParmDecl>(Parm) when mangling member data pointer
NTTPs, which crashed because Parm was a TemplateTypeParmDecl.
* The Itanium mangler (Linux/Mac) does not inspect the parameter

    [58 lines not shown]
DeltaFile
+94 -11 lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+16 -0 lldb/test/API/lang/cpp/non-type-template-param-member-ptr/main.cpp
+14 -0 lldb/test/API/lang/cpp/non-type-template-param-member-ptr/TestCppNonTypeTemplateParamPtrToMember.py
+7 -0 lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h
+3 -0 lldb/test/API/lang/cpp/non-type-template-param-member-ptr/Makefile
+2 -0 lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
+136 -11 6 files

LLVM/project 1b2c4d7 llvm/tools/llvm-profgen ProfiledBinary.cpp ProfiledBinary.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+11 -0 llvm/tools/llvm-profgen/ProfiledBinary.cpp
+9 -0 llvm/tools/llvm-profgen/ProfiledBinary.h
+20 -0 2 files

LLVM/project 2e29531 llvm/lib/Transforms/Vectorize VPlan.cpp

[VPlan] Properly preserve IsMaterialized in VPlan::duplicate (NFC). (#190849)

Make sure IsMaterialized is preserved in VPlan::duplicate for
VPSymbolicValues. This is currently NFC.

Split off from approved
https://github.com/llvm/llvm-project/pull/156262.
DeltaFile
+20 -8 llvm/lib/Transforms/Vectorize/VPlan.cpp
+20 -8 1 files

LLVM/project 9b16edb flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Semantics resolve-directives.cpp

[Flang][OpenMP] Fix Common Blocks use in update to/from and target maps causing compiler errors (#187221)

This patch attempts to fix a compiler ICE when common blocks are used in
target update to/from. It seems to stem from the fact that we do not
resolve the symbols in the relevant clauses, so when we later process
the maps we don't have the right symbol referencing the common block
that was set up and bound by the Fortran lowering. Resolving the names
seems to do the trick.

There is a second issue where when referencing a common block with an
array contained in it and utilising the array within the target region,
we'll currently not accurately map over the bounds and cause a FIR/MLIR
verification error. The fix for this is to simply move the common block
member re-binding/re-materialization for the target region to before the
bounds data re-materialization we do during target region generation.
DeltaFile
+41 -0 flang/lib/Semantics/resolve-directives.cpp
+24 -10 flang/lib/Lower/OpenMP/OpenMP.cpp
+30 -0 flang/test/Lower/OpenMP/common-block-target-update.f90
+95 -10 3 files

LLVM/project 0ac8fed llvm/lib/Target/RISCV RISCVISelDAGToDAG.cpp, llvm/test/CodeGen/RISCV xcvmem.ll

[RISCV] Use signed target constants for XCVmem post-inc loads (#189276)

First time opening a PR against LLVM, so please let me know if anything
is missing / wrong.

This fixes an assertion in RISC-V DAG isel for CORE-V xcvmem
post-increment loads with negative immediate offsets.

`RISCVDAGToDAGISel::Select` recognizes `xcvmem POST_INC` loads and
checks whether the offset fits the signed 12-bit immediate form used by
`cv.lb/cv.lbu/cv.lh/cv.lhu/cv.lw ... , (rs1), imm12`. That path was
extracting the offset with `getSExtValue()`, but then rebuilding it with
`getTargetConstant(...)`, which takes the unsigned constant path.

For negative offsets, that could trip the APInt assertion:
```
Assertion failed: (llvm::isUIntN(BitWidth, val) && "Value is not an N-bit unsigned value")
```
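
The mismatch between the two range checks can be sketched with plain integers. This is only a model of the arithmetic, not the SelectionDAG API: the helper names are hypothetical, and the 32-bit constant width is an assumption made for illustration.

```python
U64_MASK = (1 << 64) - 1


def as_uint64(value):
    """Reinterpret a signed integer as the 64-bit unsigned value a C++
    uint64_t parameter would receive."""
    return value & U64_MASK


def is_uint_n(bits, raw):
    """True if the non-negative value `raw` fits in `bits` unsigned bits."""
    return 0 <= raw < (1 << bits)


def is_int_n(bits, value):
    """True if `value` fits in `bits` bits as a two's complement integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


offset = -4                                   # a negative post-increment offset
fits_imm12 = is_int_n(12, offset)             # legal for the signed 12-bit form
unsigned_path_ok = is_uint_n(32, as_uint64(offset))  # unsigned check, assumed 32-bit width
signed_path_ok = is_int_n(32, offset)         # signed check, same width

print(fits_imm12, unsigned_path_ok, signed_path_ok)
```

The offset is a perfectly valid signed 12-bit immediate, yet once it is passed through an unsigned 64-bit parameter it no longer satisfies an N-bit unsigned check, which is the shape of the assertion quoted above.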


    [34 lines not shown]
DeltaFile
+15 -0 llvm/test/CodeGen/RISCV/xcvmem.ll
+2 -2 llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+17 -2 2 files

LLVM/project fc01c11 llvm/lib/Transforms/Vectorize VPlanUtils.cpp

[NFCI] Check for non-null before dereferencing a VPBB ptr (#190403)

A VPBB variable is possibly null (defined via a ternary), but is
subsequently dereferenced without a check. This patch adds a check to
avoid a possible null dereference. This was found via static analysis;
there is no known case right now where the issue is hit.
DeltaFile
+12 -11 llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+12 -11 1 files

LLVM/project ba01e8d flang/lib/Lower/OpenMP ClauseProcessor.cpp ClauseProcessor.h, flang/test/Lower/OpenMP declare-mapper.f90 target-motion-skip-implicit-mapper.f90

[Flang][OpenMP] Allow user-defined default mappers to bypass the implicit mapper fence (#189136)

Currently we wall out implicit declare mappers from being applied to
enter/exit/update (which we'll need to address in future PRs, as this
likely should work to some extent for allocatable member mapping). A
side effect of this is that it's causing user-defined default declare
mappers to not apply in scenarios when they should.

I believe these user-defined default declare mappers should apply in all
cases where that type is mapped and no other mapper has been explicitly
specified, as they replace the original default mapping behaviour, at
least going by my admittedly shoddy specification reading skills.

The user defined default mappers should "implicitly" apply because:
1. No explicit mapper modifier is specified
2. The fallback behavior should be "as if the modifier was specified
with the default mapper-identifier" (Section 5.9)
3. The user-defined default mapper "overrides the predefined default
mapper for the given type" (Section 5.8.2)
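
The three points above amount to a lookup order, sketched here as a toy function. This is not the Flang lowering code; `select_mapper`, the dictionary of user-defined default mappers, and the `PREDEFINED_DEFAULT` marker are all hypothetical names used for illustration.

```python
from typing import Dict, Optional

# Stand-in for the mapping behaviour that applies when nothing overrides it.
PREDEFINED_DEFAULT = "predefined-default"


def select_mapper(explicit_mapper: Optional[str],
                  user_default_mappers: Dict[str, str],
                  type_name: str) -> str:
    """Pick the mapper for a mapped variable of type `type_name`.

    1. An explicit mapper modifier always wins.
    2. Otherwise behave as if the default mapper-identifier was given,
       so a user-defined default mapper for the type is used if present.
    3. Only when neither exists does the predefined default apply.
    """
    if explicit_mapper is not None:
        return explicit_mapper
    return user_default_mappers.get(type_name, PREDEFINED_DEFAULT)


user_defaults = {"my_type": "my_type_default_mapper"}
print(select_mapper(None, user_defaults, "my_type"))
print(select_mapper("custom", user_defaults, "my_type"))
print(select_mapper(None, user_defaults, "other_type"))
```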

    [10 lines not shown]
DeltaFile
+78 -63 flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+129 -0 flang/test/Lower/OpenMP/declare-mapper.f90
+0 -30 flang/test/Lower/OpenMP/target-motion-skip-implicit-mapper.f90
+2 -2 flang/lib/Lower/OpenMP/ClauseProcessor.h
+209 -95 4 files

LLVM/project 2b9944b llvm/lib/CodeGen MachineCopyPropagation.cpp, llvm/test/CodeGen/X86 machine-copy-prop.mir

[MCP] Never eliminate frame-setup/destroy instructions

Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. Similar to reserved registers this
is probably overly conservative, but as this causes no codegen change in
any lit test I think it is benign.

The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.

Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
DeltaFile
+22 -0 llvm/test/CodeGen/X86/machine-copy-prop.mir
+11 -5 llvm/lib/CodeGen/MachineCopyPropagation.cpp
+33 -5 2 files

LLVM/project 4b31f1e llvm/lib/Target/AArch64 AArch64A57FPLoadBalancing.cpp AArch64.h

[NewPM] Port for AArch64A57FPLoadBalancing (#190652)
DeltaFile
+66 -46 llvm/lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp
+9 -2 llvm/lib/Target/AArch64/AArch64.h
+2 -2 llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+1 -0 llvm/lib/Target/AArch64/AArch64PassRegistry.def
+78 -50 4 files

LLVM/project bafb2cb clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaOpenMP.cpp

[clang][OpenMP] declare_target/local clause variable can't be in map clause (#190470)

In OpenMP 6.0, the 'local' clause was added to the declare_target
directive. Variables listed in the 'local' clause are considered to be
device-local. In addition, a new map clause restriction was added:
A device-local variable must not appear as a list item in a map clause.
See OpenMP 6.0 specification section 7.9.6, map Clause, Restrictions, p.
386.

Testing:
- New error-message test for variables listed in declare_target local
  clauses (device-local) that are used in map clauses.
- ninja check-openmp
DeltaFile
+70 -0 clang/test/OpenMP/declare_target_local_map_messages.cpp
+15 -0 clang/lib/Sema/SemaOpenMP.cpp
+2 -0 clang/include/clang/Basic/DiagnosticSemaKinds.td
+87 -0 3 files

LLVM/project f069b82 llvm/lib/Target/AArch64/GISel AArch64PostLegalizerCombiner.cpp

[NFC] Drop AArch64PostLegalizerCombiner dep on TargetPassConfig (#190569)

This will enable NewPM porting.

Replaced with the definition in

[AArch64PassConfig::getCSEConfig](https://github.com/llvm/llvm-project/blob/1d549d9a777a6faef6d425cb6482ab1fa6b91bb7/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L614)
DeltaFile
+2 -5 llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+2 -5 1 files

LLVM/project 9b6226f clang/test/CodeGen/AArch64/neon intrinsics.c, clang/test/Headers __clang_hip_math.hip

Merge remote-tracking branch 'origin/main' into users/ziqingluo/PR-172429193

 Conflicts:
        clang/include/clang/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.h
        clang/lib/ScalableStaticAnalysisFramework/Analyses/CMakeLists.txt
        clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp
        clang/test/Analysis/Scalable/UnsafeBufferUsage/tu-summary-serialization.test
        clang/test/Analysis/Scalable/ssaf-format/list.test
        clang/unittests/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageTest.cpp
DeltaFile
+3,666 -5,073 llvm/test/CodeGen/RISCV/rvv/expandload.ll
+1,318 -117 llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll
+736 -647 clang/test/Headers/__clang_hip_math.hip
+835 -387 llvm/test/CodeGen/AMDGPU/fcanonicalize.bf16.ll
+465 -665 llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
+1,005 -45 clang/test/CodeGen/AArch64/neon/intrinsics.c
+8,025 -6,934 1,655 files not shown
+56,634 -26,343 1,661 files

LLVM/project 4ef126d llvm/lib/CodeGen MachineCopyPropagation.cpp

[MCP][NFC] Opinionated refactoring

There are a few minor inconsistencies across the pass which I found mildly
distracting:

* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
  is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
  the `optional`.

Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.

Assume `std::optional::operator*` will assert in any reasonable
implementation, even though this may technically be undefined behavior.
When asserts are disabled it would be anyway.


    [11 lines not shown]
DeltaFile
+163 -194 llvm/lib/CodeGen/MachineCopyPropagation.cpp
+163 -194 1 files

LLVM/project f68868d llvm/lib/IR Value.cpp

Revert "[IR] Use iteration limit in stripPointerCastsAndOffsets" (#190839)

Reverts llvm/llvm-project#190472

Causes crashes:
https://github.com/llvm/llvm-project/pull/190472#issuecomment-4201843466
DeltaFile
+7 -12 llvm/lib/IR/Value.cpp
+7 -12 1 files

LLVM/project 88af280 clang/include/clang/Basic HLSLIntrinsics.td, clang/lib/Headers/hlsl hlsl_intrinsics.h hlsl_intrinsic_helpers.h

[HLSL] Rewrite inline HLSL intrinsics into TableGen (#188362)

Partially addresses https://github.com/llvm/llvm-project/issues/188345.
This PR rewrites all applicable inline HLSL intrinsics from
`hlsl_intrinsics.h` into TableGen.

The unsigned `abs` from `hlsl_alias_intrinsics.h` is also rewritten into
TableGen since it can also be defined inline.

The `NonUniformResourceIndex` is moved from `hlsl_intrinsics.h` over to
`hlsl_alias_intrinsics.h` since it can be defined as an alias.

`__detail::.*_impl` helper functions that were one liners have been
removed, and their corresponding HLSL intrinsics have been defined in
TableGen using the `Body` field instead.

Note that rewriting `refract` in TableGen instead of templates
introduces some significant changes to error messages and also
introduces a new offload test suite failure in the fp16 test because a

    [10 lines not shown]
DeltaFile
+0 -591 clang/lib/Headers/hlsl/hlsl_intrinsics.h
+325 -4 clang/include/clang/Basic/HLSLIntrinsics.td
+25 -33 clang/test/SemaHLSL/BuiltIns/refract-errors.hlsl
+10 -45 clang/lib/Headers/hlsl/hlsl_intrinsic_helpers.h
+13 -38 clang/test/SemaHLSL/BuiltIns/length-errors.hlsl
+24 -24 clang/test/CodeGenHLSL/builtins/ldexp.hlsl
+397 -735 12 files not shown
+487 -937 18 files

LLVM/project 75b35df llvm/lib/CodeGen MachineCopyPropagation.cpp

[MCP][NFC] Cleanup and prepare to preserve frame-setup/destroy

This mixes renames, removing redundant code, avoiding
`else`-after-`return`, etc. with factoring out the `isNeverRedundant`
concept.

Change-Id: I43a62a9415019cdd63c68fd3b915ebb7505d317a
DeltaFile
+71 -62 llvm/lib/CodeGen/MachineCopyPropagation.cpp
+71 -62 1 files

LLVM/project b3c093d llvm/test/tools/llvm-mca/RISCV/Inputs/rvv mask.s, llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv mask.test

[RISCV][MCA] Do not use mask instructions that can potentially be optimized by uArch (#190820)

Context:
https://github.com/llvm/llvm-project/pull/189785#discussion_r3019282209

Some mask instructions have a form that can potentially be optimized by
HW implementation: `vmxor.mm vd, vs, vs` and `vmclr vd, vs`, for
instance. This patch avoids using such instructions in MCA tests.
DeltaFile
+176 -176 llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/mask.test
+176 -176 llvm/test/tools/llvm-mca/RISCV/SiFiveP600/rvv/mask.test
+176 -176 llvm/test/tools/llvm-mca/RISCV/SiFiveX100/rvv/mask.test
+176 -176 llvm/test/tools/llvm-mca/RISCV/SpacemitX60/rvv/mask.test
+88 -88 llvm/test/tools/llvm-mca/RISCV/Inputs/rvv/mask.s
+792 -792 5 files

LLVM/project 3c11ae6 lldb/tools/driver lldb-mte-entitlements.plist

[lldb] Fixup MTE entitlement spelling
DeltaFile
+2 -2 lldb/tools/driver/lldb-mte-entitlements.plist
+2 -2 1 files

LLVM/project 05f9c66 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 branch-on-bool.ll

[AArch64] Normalize (bool CC 1) to (bool NewCC 0) in LowerBR_CC (#189380)
DeltaFile
+202 -0 llvm/test/CodeGen/AArch64/branch-on-bool.ll
+42 -1 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+244 -1 2 files

LLVM/project 12245c2 llvm/lib/IR Value.cpp

Revert "[IR] Use iteration limit in stripPointerCastsAndOffsets (#190472)"

This reverts commit b5e7dbb30ace6c9f7b7920462e209bb08e7ffa56.
DeltaFile
+7 -12 llvm/lib/IR/Value.cpp
+7 -12 1 files

LLVM/project c7c9025 bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp CMakeLists.txt, bolt/unittests/Core MCPlusBuilder.cpp

[BOLT][AArch64] Optimize the mov-imm-to-reg operation (#189304)

On AArch64, logical immediate instructions are used to encode some
special immediate values. Even at `-O0`, the AArch64 backend would not
choose to generate 4 instructions (movz, movk, movk, movk) for moving
such a special value to a 64-bit register.

For example, to move the 64-bit value `0x0001000100010001` to `x0`, the
AArch64 backend would not choose a 4-instruction-sequence like
```
movz x0, 0x0001
movk x0, 0x0001, lsl 16
movk x0, 0x0001, lsl 32
movk x0, 0x0001, lsl 48
```
Actually, the AArch64 backend would choose to generate one instruction
```
mov x0, 0x0001000100010001
```
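
Whether a value qualifies for the single-instruction form can be sketched as follows. This is a stand-alone check of the usual AArch64 bitmask-immediate rule (a repeating 2/4/8/16/32/64-bit element that is a rotated run of ones), written for illustration only; `is_logical_immediate` is a hypothetical name and this is not the BOLT or LLVM implementation.

```python
def is_logical_immediate(value: int, width: int = 64) -> bool:
    """Return True if `value` is a repeating element made of one rotated
    run of ones, i.e. the shape a logical immediate can express."""
    mask = (1 << width) - 1
    value &= mask
    # All-zeros and all-ones cannot be expressed this way.
    if value == 0 or value == mask:
        return False

    # Shrink to the smallest element size whose repetition gives `value`.
    size = width
    while size > 2:
        half = size // 2
        half_mask = (1 << half) - 1
        if (value & half_mask) != ((value >> half) & half_mask):
            break
        size = half

    elem_mask = (1 << size) - 1
    elem = value & elem_mask
    # Rotate right by one; a single circular run of ones produces exactly
    # two positions where neighbouring bits differ.
    rotated = ((elem >> 1) | ((elem & 1) << (size - 1))) & elem_mask
    return bin(elem ^ rotated).count("1") == 2


print(is_logical_immediate(0x0001000100010001))
print(is_logical_immediate(0x1234567890ABCDEF))
```

The value from the example above repeats the 16-bit element `0x0001`, which is a single run of ones, so it passes the check; an arbitrary constant such as `0x1234567890ABCDEF` does not.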

    [10 lines not shown]
DeltaFile
+97 -0 bolt/unittests/Core/MCPlusBuilder.cpp
+63 -24 bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+1 -0 bolt/lib/Target/AArch64/CMakeLists.txt
+161 -24 3 files

LLVM/project 5baec2c bolt/include/bolt/Profile DataAggregator.h, bolt/lib/Profile DataAggregator.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+86 -3 bolt/lib/Profile/DataAggregator.cpp
+6 -0 bolt/include/bolt/Profile/DataAggregator.h
+92 -3 2 files