LLVM/project 0b8239d bolt/docs BinaryAnalysis.md

Fix wording in the description of auth oracles
Delta File
+12 -5 bolt/docs/BinaryAnalysis.md
+12 -5 1 files

LLVM/project 15cd9f7 llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[DAG] expandIntMINMAX - use getOppositeSignednessMinMaxOpcode helper to flip min/max signedness. NFC. (#177450)
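
As a rough illustration of what "flipping min/max signedness" means, the sketch below maps each integer min/max operation to its opposite-signedness counterpart while keeping the min/max direction. The `MinMaxOp` enum and `flipSignedness` function are hypothetical stand-ins written for this note; they are not the actual `ISD` opcodes or the `getOppositeSignednessMinMaxOpcode` helper.

```cpp
#include <cassert>

// Hypothetical stand-in for the four integer min/max opcodes.
enum class MinMaxOp { SMin, SMax, UMin, UMax };

// Swap signed <-> unsigned while preserving whether the op is a min or a max.
constexpr MinMaxOp flipSignedness(MinMaxOp Op) {
  switch (Op) {
  case MinMaxOp::SMin: return MinMaxOp::UMin;
  case MinMaxOp::SMax: return MinMaxOp::UMax;
  case MinMaxOp::UMin: return MinMaxOp::SMin;
  case MinMaxOp::UMax: return MinMaxOp::SMax;
  }
  return Op; // Not reached for valid enumerators.
}
```

Applying the mapping twice returns the original operation, which is one easy sanity check for a helper of this kind.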

Delta File
+1 -17 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+1 -17 1 files

LLVM/project be96289 mlir/include/mlir/Interfaces ControlFlowInterfaces.h, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][Interfaces] Add generic pattern for region inlining (#176641)

Add a new canonicalization pattern that inlines the body of acyclic
`RegionBranchOpInterface` ops. This pattern is a generalization and
replacement for the following existing patterns:

* `SingleBlockExecuteInliner`: inlines `scf.execute_region` ops with a
single block.
* `SimplifyTrivialLoops`: inlines / folds away `scf.for` ops with 0 or 1
iterations.
* `RemoveStaticCondition`: inlines `scf.if` ops with a static condition.
* `FoldConstantCase`: inlines `scf.index_switch` ops with a constant
operand.

Additionally, this new pattern is also enabled for `scf.while` ops.
Loops with `scf.condition(%false)` are now also inlined. (New test case
added.)

The new pattern looks for region branch ops with a single acyclic path

    [3 lines not shown]
Delta File
+233 -0 mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+26 -145 mlir/lib/Dialect/SCF/IR/SCF.cpp
+39 -0 mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
+20 -0 mlir/test/Dialect/SCF/canonicalize.mlir
+4 -2 mlir/test/Dialect/SCF/one-shot-bufferize.mlir
+4 -2 mlir/test/Dialect/Arith/int-range-interface.mlir
+326 -149 6 files

LLVM/project 0aa4082 llvm/lib/Target/RISCV RISCVInstrInfoY.td, llvm/test/MC/RISCV/rvy rvy-valid-mode-independent.s

inline format templates that are used only once and fix packy test

Created using spr 1.3.8-beta.1
Delta File
+19 -27 llvm/lib/Target/RISCV/RISCVInstrInfoY.td
+2 -2 llvm/test/MC/RISCV/rvy/rvy-valid-mode-independent.s
+21 -29 2 files

LLVM/project 265d093 llvm/lib/Target/NVPTX NVPTXAssignValidGlobalNames.cpp, llvm/test/CodeGen/NVPTX extern-shared-valid-name.ll

[NVPTX] fix illegal name for .extern .shared global variables (#173018)

In PTX we can create a GV in AS(3) that will be compiled to a `.extern
.shared` in PTX. Since the `.extern .shared` is not an "extern" in the
traditional sense of the word, it will not be linked based on name but
will rather refer to the shared memory allocated at kernel launch.

Since we don't care about the name, it's tempting to make the GV unnamed.
The problem is that `nameUnamedGlobals` then gives the global a name that
is invalid PTX. For non-extern globals, this is later fixed by running
the `NVPTXAssignValidGlobalNames` pass. However, that pass makes sure
not to touch externs, as changing the name of "traditional externs"
will cause linking issues down the road.

This MR treats `.extern .shared` in the same manner as non-extern
globals during `NVPTXAssignValidGlobalNames` to fix the invalid names
given by `nameUnamedGlobals`.

Co-authored-by: Kjetil Kjeka <kjetil at muybridge.com>
Delta File
+7 -2 llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
+6 -0 llvm/test/CodeGen/NVPTX/extern-shared-valid-name.ll
+13 -2 2 files

LLVM/project 153f3cf llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

use bits instead of hex

Created using spr 1.3.8-beta.1
Delta File
+47,697 -51,378 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+14,474 -16,242 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+16,328 -12,881 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+13,036 -14,705 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+11,668 -13,311 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+10,558 -11,908 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+113,761 -120,425 994 files not shown
+261,448 -237,434 1,000 files

LLVM/project c26571d llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
Delta File
+47,697 -51,378 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+14,474 -16,242 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+16,328 -12,881 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+13,036 -14,705 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+11,668 -13,311 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+10,558 -11,908 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+113,761 -120,425 993 files not shown
+261,425 -237,431 999 files

LLVM/project 967aeec llvm/lib/Target/AMDGPU AMDGPUSubtarget.cpp, llvm/test/CodeGen/AMDGPU waves-per-eu-hints-lower-occupancy-target.ll default-flat-work-group-size-overrides-waves-per-eu.ll

[AMDGPU] Allow amdgpu-waves-per-eu to lower target occupancy range (#168358)

`amdgpu-waves-per-eu` currently does not allow users to lower the target
occupancy range that the backend will try to achieve. For example, for a
kernel targeting gfx942 with default flat workgroup sizes and no LDS
usage, `AMDGPUSubtarget::getWavesPerEU` will by default produce the
range [4,10]. Setting `"amdgpu-waves-per-eu=M,N"` (N being optional)
will only have an effect if 4 <= M <= N <= 10, in which case the [M,N]
range will be produced instead. Advanced developers may in some cases
know that a specific kernel would not benefit from higher occupancies and
wish to communicate that to the backend. The backend could in turn make
better codegen decisions if it knows that increasing occupancy is not a
priority.
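
The pre-change rule described above can be sketched as follows. This is only a model of the behavior as stated in this message, not the code in `AMDGPUSubtarget::getWavesPerEU`; the `Range` alias and the `effectiveWavesPerEU` function are hypothetical names, and the sketch assumes both M and N are given.

```cpp
#include <cassert>
#include <utility>

// Hypothetical [min, max] waves/EU range.
using Range = std::pair<unsigned, unsigned>;

// Old behavior as described: a requested [M, N] only takes effect when it
// lies entirely inside the default range; otherwise the default is kept.
Range effectiveWavesPerEU(Range Default, unsigned M, unsigned N) {
  bool Inside = Default.first <= M && M <= N && N <= Default.second;
  return Inside ? Range{M, N} : Default;
}
```

With the default range [4,10] from the gfx942 example, a request of 5,8 narrows the range, while a request of 2,8 is ignored because its minimum falls below 4. That second case is the one this change is meant to make possible.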

This modifies the computation of the waves/EU range to enable this
behavior. User-provided minimum/maximum number of waves/EU are now able
to change the default waves/EU range almost arbitrarily, with only
subtarget's specifications and the maximum occupancy induced by
workgroup size and LDS usage limiting the target occupancy range. In the

    [7 lines not shown]
Delta File
+84 -0 llvm/test/CodeGen/AMDGPU/waves-per-eu-hints-lower-occupancy-target.ll
+0 -61 llvm/test/CodeGen/AMDGPU/default-flat-work-group-size-overrides-waves-per-eu.ll
+20 -14 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+13 -12 llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll
+12 -0 llvm/test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll
+2 -2 llvm/test/CodeGen/AMDGPU/vgpr-limit-gfx1250.ll
+131 -89 1 files not shown
+134 -90 7 files

LLVM/project 6d8b254 llvm/test/CodeGen/X86 pclmulqdq.ll

[X86] Add test coverage for #176879 (#177393)

Delta File
+571 -0 llvm/test/CodeGen/X86/pclmulqdq.ll
+571 -0 1 files

LLVM/project 429afdd llvm/test/CodeGen/AMDGPU fneg-combines.f16.ll bf16.ll, llvm/test/CodeGen/NVPTX cmpxchg-sm90.ll cmpxchg-sm70.ll

rebase

Created using spr 1.3.8-beta.1
Delta File
+3,002 -1,209 llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll
+2,975 -1,201 llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll
+2,975 -1,201 llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
+2,278 -0 llvm/test/CodeGen/X86/clmul-vector-256.ll
+372 -419 llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+247 -430 llvm/test/CodeGen/AMDGPU/bf16.ll
+11,849 -4,460 237 files not shown
+17,985 -9,085 243 files

LLVM/project e89a99e llvm/test/CodeGen/X86 combine-pclmul.ll

[X86] Add crash regression test reported on #176932 (#177414)

is128BitLaneRepeatedShuffleMask can't handle target shuffle masks containing SM_SentinelZero
Delta File
+11 -0 llvm/test/CodeGen/X86/combine-pclmul.ll
+11 -0 1 files

LLVM/project 6934ed5 llvm/test/Transforms/Attributor nofpclass.ll, llvm/test/Transforms/GVN merge-nofpclass.ll

IR: Add !nofpclass metadata (#177140)

This adds metadata analogous to the nofpclass attribute, asserting
that values are not in a certain set of floating-point classes.
This allows the same information to be expressed if a function
argument is passed indirectly. The metadata uses the same bitmask
encoding as nofpclass.
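
To make the bitmask encoding concrete, here is a small sketch of how such a mask can be checked against a value. The bit values follow the nofpclass table in the LLVM Language Reference as I understand it (snan = 1 through pinf = 512). The enum and the `classify` and `satisfiesNoFPClass` helpers are hypothetical names for this note, not LLVM APIs, and as a simplification the sketch does not distinguish signaling from quiet NaNs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Bit layout assumed to match the nofpclass attribute's test mask.
enum FPClassBits : unsigned {
  fcSNan = 1u << 0, fcQNan = 1u << 1,
  fcNegInf = 1u << 2, fcNegNormal = 1u << 3,
  fcNegSubnormal = 1u << 4, fcNegZero = 1u << 5,
  fcPosZero = 1u << 6, fcPosSubnormal = 1u << 7,
  fcPosNormal = 1u << 8, fcPosInf = 1u << 9,
  fcNan = fcSNan | fcQNan, fcInf = fcNegInf | fcPosInf
};

// Return the class bit for X. Simplification: any NaN yields both NaN bits.
unsigned classify(double X) {
  bool Neg = std::signbit(X);
  switch (std::fpclassify(X)) {
  case FP_NAN:       return fcNan;
  case FP_INFINITE:  return Neg ? fcNegInf : fcPosInf;
  case FP_ZERO:      return Neg ? fcNegZero : fcPosZero;
  case FP_SUBNORMAL: return Neg ? fcNegSubnormal : fcPosSubnormal;
  default:           return Neg ? fcNegNormal : fcPosNormal;
  }
}

// True if X is consistent with a "value is none of these classes" mask.
bool satisfiesNoFPClass(double X, unsigned Mask) {
  return (classify(X) & Mask) == 0;
}
```

For example, a mask of `fcNan | fcInf` accepts `1.0` but rejects an infinity, and a mask of `fcNegZero` rejects `-0.0` while still accepting `+0.0`.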

I also think this should be allowed for stores to symmetrically handle
sret, but leave that for later.

Alternatively we could add a more expressive !fprange metadata,
but that would be much more complex. It's useful to match the attribute,
and more annotations can always be added.

Fixes #133560
Delta File
+91 -0 llvm/test/Verifier/nofpclass-metadata.ll
+70 -0 llvm/test/Transforms/InstCombine/loadstore-metadata.ll
+58 -0 llvm/test/Transforms/Attributor/nofpclass.ll
+45 -0 llvm/test/Transforms/GVN/PRE/load-metadata.ll
+44 -0 llvm/test/Transforms/SimplifyCFG/hoist-with-metadata.ll
+38 -0 llvm/test/Transforms/GVN/merge-nofpclass.ll
+346 -0 14 files not shown
+556 -15 20 files

LLVM/project 7719519 llvm/test/CodeGen/PowerPC aix-ifunc-alias.ll

add XFAIL'ed alias testcase; to be addressed in a separate PR
Delta File
+16 -0 llvm/test/CodeGen/PowerPC/aix-ifunc-alias.ll
+16 -0 1 files

LLVM/project f4d2970 compiler-rt/test/profile/AIX ifunc.c

add PGO test
Delta File
+16 -0 compiler-rt/test/profile/AIX/ifunc.c
+16 -0 1 files

LLVM/project ee621d2 bolt/docs BinaryAnalysis.md

Add table of contents
Delta File
+13 -7 bolt/docs/BinaryAnalysis.md
+13 -7 1 files

LLVM/project 346c3e7 bolt/docs BinaryAnalysis.md

Move 'Usage' section, add brief algorithm description to the above section
Delta File
+56 -47 bolt/docs/BinaryAnalysis.md
+56 -47 1 files

LLVM/project 869256d llvm/utils/gn/secondary/clang/lib/Analysis BUILD.gn

[gn build] Port 65134634f994
Delta File
+1 -0 llvm/utils/gn/secondary/clang/lib/Analysis/BUILD.gn
+1 -0 1 files

LLVM/project 14a209f llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize vplan-printing-reductions.ll

[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672)

Replace ComputeFindIVResult with ComputeReductionResult + explicit
compare + select, to model finding the first/last induction more
explicitly and simply. This boils down to a min/max reduction plus a
compare and select of the sentinel value.
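
The idea of "min/max reduction + compare and select of the sentinel" can be sketched in scalar form as below. This is an illustrative model only, not the VPlan recipes themselves; the function name `findLastIndex`, the `Data[I] > Threshold` condition, and the choice of the minimum `int64_t` as sentinel are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Return the last index I with Data[I] > Threshold, or Start if none matches.
int64_t findLastIndex(const std::vector<int> &Data, int Threshold,
                      int64_t Start) {
  // Sentinel: a value no real index can take (assumption for this sketch).
  const int64_t Sentinel = std::numeric_limits<int64_t>::min();
  int64_t Rdx = Sentinel;
  for (int64_t I = 0; I < static_cast<int64_t>(Data.size()); ++I) {
    // Each iteration contributes its index if the condition holds,
    // otherwise the sentinel; the reduction itself is a plain max.
    int64_t Candidate = Data[I] > Threshold ? I : Sentinel;
    Rdx = std::max(Rdx, Candidate);
  }
  // Explicit compare + select: if nothing matched, fall back to Start.
  return Rdx != Sentinel ? Rdx : Start;
}
```

Finding the first matching index works the same way with a min reduction and a sentinel at the other end of the range.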

PR: https://github.com/llvm/llvm-project/pull/176672
Delta File
+70 -65 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+2 -40 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+22 -17 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+0 -7 llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+3 -1 llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+1 -3 llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+98 -133 3 files not shown
+99 -135 9 files

LLVM/project 5bc9a4b bolt/docs BinaryAnalysis.md

Move explanation of no-cfg fallback to the 'detailed' section
Delta File
+7 -12 bolt/docs/BinaryAnalysis.md
+7 -12 1 files

LLVM/project e0a1326 llvm/lib/Target/SystemZ SystemZISelLowering.cpp SystemZISelLowering.h

[SystemZ] Precommit for moving some functions around. (#177441)

In preparation for #171066 (FP16 vector support).
Delta File
+38 -27 llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+1 -8 llvm/lib/Target/SystemZ/SystemZISelLowering.h
+39 -35 2 files

LLVM/project e936715 llvm/test/CodeGen/NVPTX cmpxchg-sm90.ll cmpxchg-sm60.ll

[NVPTX] Weak cmpxchg unittests for NVPTX (#176078)

Delta File
+3,002 -1,209 llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll
+2,975 -1,201 llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
+2,975 -1,201 llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll
+92 -79 llvm/test/CodeGen/NVPTX/cmpxchg.py
+9,044 -3,690 4 files

LLVM/project d5545db llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Mark strict_fp16_to_fp as expand (#177417)

This prevents a regression in a future change.
Delta File
+2 -1 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+2 -1 1 files

LLVM/project d056ae4 clang/include/clang/Analysis/Analyses/LifetimeSafety Facts.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp Origins.cpp

[LifetimeSafety] Detect dangling fields
Delta File
+151 -0 clang/test/Sema/warn-lifetime-safety-dangling-field.cpp
+51 -17 clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+48 -4 clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+0 -28 clang/test/Analysis/lifetime-cfg-output.cpp
+17 -8 clang/lib/Analysis/LifetimeSafety/Origins.cpp
+15 -8 clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
+282 -65 14 files not shown
+381 -93 20 files

LLVM/project 45f2ed9 bolt/docs BinaryAnalysis.md

Link: incomplete help message is a known issue
Delta File
+2 -0 bolt/docs/BinaryAnalysis.md
+2 -0 1 files

LLVM/project 35563fa clang/test/Sema warn-lifetime-analysis-nocfg.cpp

[LifetimeSafety] Add dtor for failing CFG-based tests (#177362)

Delta File
+12 -6 clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
+12 -6 1 files

LLVM/project d10b2b5 llvm/lib/IR Globals.cpp, llvm/lib/Linker LinkModules.cpp

[NFCI] replace getValueType with new getGlobalSize query (#177186)

Returns uint64_t to simplify callers. The goal is to eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during creation.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
Delta File
+5 -8 llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+5 -6 llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
+6 -4 llvm/lib/Linker/LinkModules.cpp
+4 -5 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+4 -5 llvm/lib/Transforms/IPO/GlobalOpt.cpp
+6 -0 llvm/lib/IR/Globals.cpp
+30 -28 23 files not shown
+64 -60 29 files

LLVM/project b6fd5bb llvm/utils profcheck-xfail.txt

[ProfCheck] Add LoopInterchange test to xfail list

LoopInterchange is not currently on in the default pipeline and we have
not done any work around profile propagation within the pass yet, so
disable the test for now to get the profcheck bot back to green.
Delta File
+1 -0 llvm/utils/profcheck-xfail.txt
+1 -0 1 files

LLVM/project ce88e6e clang Maintainers.rst

Move John McCall to the inactive maintainers list (#177406)

While reaching out to folks for a maintainers list refresh, John asked
to step down due to other commitments. Thank you for all your help!
Delta File
+1 -3 clang/Maintainers.rst
+1 -3 1 files

LLVM/project 056e5a3 llvm/test/CodeGen/AMDGPU fneg-combines.f16.ll bf16.ll

AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7 (#175795)

Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.

I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
Delta File
+372 -419 llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+247 -430 llvm/test/CodeGen/AMDGPU/bf16.ll
+116 -174 llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll
+139 -139 llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+112 -153 llvm/test/CodeGen/AMDGPU/select-fabs-fneg-extract.f16.ll
+140 -114 llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll
+1,126 -1,429 81 files not shown
+3,579 -4,360 87 files

LLVM/project d2df453 llvm/lib/Target/AMDGPU GCNSubtarget.h SIISelLowering.cpp

[NFCI][AMDGPU] Use `GET_SUBTARGETINFO_MACRO` in `GCNSubtarget.h`
Delta File
+9 -278 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+7 -5 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+16 -283 2 files