[Passes] Remove some optsize checks (#189369)
LibCallsShrinkWrapPass and PGOMemOPSizeOpt already check for optsize
attributes internally, so there is no need to handle this in the pass
pipeline.
The context here is that I'd like to make the pass pipeline completely
independent of Os/Oz so that we know for sure that function-level
optsize/minsize attributes behave identically to the pipeline-level
option.
[AMDGPU][SIFoldOperands] Fix OR -1 fold
In SIFoldOperands, folding `or x, -1` to `v_mov_b32 -1` removed `Src1Idx`,
which is incorrect because `-1` is in `Src0Idx` (after canonicalization).
[Support][APint] Fix APInt::urem for edge case. Use `U.pVal[0]` instead of `getZExtValue()` (#189441)
Use `U.pVal[0]` instead of `getZExtValue()` in the
`APInt::urem(uint64_t)` power-of-two fast path. `getZExtValue()`
requires the entire APInt to fit into 64 bits, but this code can be
reached for multi-word values, which may trigger assertions
(`assert(getActiveBits() <= 64 && "Too many bits for uint64_t")`) or
otherwise mis-handle wide integers.
Also add a simple test for the edge cases.
Improvement for https://github.com/llvm/llvm-project/pull/189245
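The reason the low word is enough can be shown with a minimal sketch. It uses a toy multi-word integer, not `llvm::APInt`; the names `WideUInt`, `isPowerOfTwo`, and `uremPowerOfTwo` are hypothetical and exist only for this illustration. For a power-of-two divisor of at most 64 bits, the remainder depends only on the low bits of the least significant word, so no conversion of the whole value to `uint64_t` is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a multi-word unsigned integer (NOT llvm::APInt).
// Words[0] is the least significant 64-bit word.
struct WideUInt {
  std::vector<uint64_t> Words;
};

// True if RHS has exactly one bit set.
constexpr bool isPowerOfTwo(uint64_t RHS) {
  return RHS != 0 && (RHS & (RHS - 1)) == 0;
}

// Remainder by a power-of-two divisor. Every higher word is a multiple of
// 2^64, and therefore of RHS, so only the low word contributes.
inline uint64_t uremPowerOfTwo(const WideUInt &V, uint64_t RHS) {
  assert(isPowerOfTwo(RHS) && "fast path requires a power-of-two divisor");
  return V.Words[0] & (RHS - 1);
}
```

With `WideUInt V{{0x1234, 0x1}}` (the value 2^64 + 0x1234), `uremPowerOfTwo(V, 16)` reads only the low word, whereas a helper that insisted on the whole value fitting into 64 bits would reject this input.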
[ADT] implement countl_zero_constexpr and reuse it for countl_zero & bit_width_constexpr (#189111)
Implement constant evaluated `countl_zero_constexpr` similar to
`countr_zero_constexpr` and use it for `countl_zero` and
`bit_width_constexpr`.
Also, `countl_zero` now uses the fast intrinsic path for `uint8/uint16` types
(it checks `sizeof(T) <= 4` instead of `sizeof(T) == 4`).
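As a rough illustration of a constant-evaluated leading-zero count, here is a minimal sketch. It is not the ADT implementation; `countLeadingZerosSketch` and `bitWidthSketch` are hypothetical names, and the loop-based approach is an assumption chosen for clarity (it needs C++14 or later for loops in `constexpr` functions).

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Counts leading zero bits by shifting the value right until it is zero.
// Usable in constant expressions; returns the full bit width for 0.
template <typename T>
constexpr int countLeadingZerosSketch(T Val) {
  static_assert(std::is_unsigned<T>::value, "only unsigned integral types");
  int Count = std::numeric_limits<T>::digits;
  while (Val != 0) {
    Val >>= 1;
    --Count;
  }
  return Count;
}

// Number of bits needed to represent Val, derived from the count above.
template <typename T>
constexpr int bitWidthSketch(T Val) {
  return std::numeric_limits<T>::digits - countLeadingZerosSketch(Val);
}

// Evaluated at compile time, including for the narrow types.
static_assert(countLeadingZerosSketch<uint8_t>(1) == 7, "");
static_assert(countLeadingZerosSketch<uint16_t>(0x00FF) == 8, "");
static_assert(countLeadingZerosSketch<uint32_t>(0) == 32, "");
static_assert(bitWidthSketch<uint64_t>(5) == 3, "");
```

Deriving the bit width from the leading-zero count mirrors the reuse described above: one constant-evaluated primitive serves both helpers.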
[CodeView] Expose fallible type accessors in TpiStream (#188299)
`LazyRandomTypeCollection` already has fallible functions for
`getType(TypeIndex)`. This patch exposes them in `TpiStream` and does a
small cleanup in `LazyRandomTypeCollection`'s `GetType`.
Context: #186948 saw a crash in LLDB where we call `GetType` without
checking the type index before calling the method. In `GetType` we
called `error(std::move(EC))`, which ignores the error in release mode.
The cause was the type index `0x80000169` in an `S_LOCAL`.
We now do a soft fail in release mode - we already check the error, so
we might as well return an empty value.
Aside: The type index there feels really unusual; the type indices in
other records around the `S_LOCAL` were in a similar range. It almost
looks like some integer over-/underflow.
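The difference between a fallible accessor and one that assumes a valid index can be sketched as follows. This is a toy table, not the CodeView API: `ToyTypeTable` and `tryGetType` are hypothetical names, and `std::optional` (C++17) stands in for whatever error-carrying return type the real accessors use.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Toy type table keyed by a 32-bit type index (hypothetical, for illustration).
struct ToyTypeTable {
  std::unordered_map<uint32_t, std::string> Records;

  // Fallible accessor: reports a missing index to the caller instead of
  // assuming the index is valid.
  std::optional<std::string> tryGetType(uint32_t Index) const {
    auto It = Records.find(Index);
    if (It == Records.end())
      return std::nullopt; // soft failure: empty value, no crash
    return It->second;
  }
};
```

A caller that receives an out-of-range index such as `0x80000169` then gets an empty result it can check, rather than relying on an assertion that is compiled out in release builds.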
clang: Check -Xarch compatibility using Triple parsed architecture.
This will allow recognizing any of the triple aliases for the architecture.
This will avoid test failures when the amdgcn triple top level architecture
is renamed.
[AArch64] Always print the PRFM operation name (#182035)
When the encoding in the "Rt" field of the PRFM instruction maps to a
`<prfop>` value, the name of the prefetch operation should be printed
regardless of whether the associated feature (e.g. FEAT_PRFMSLC)
is available. If the feature is not available, the instruction is a nop.
All other encodings are printed as an immediate.
OpenMP: Match all Triple recognized arch aliases
This liberalizes match(device = {arch(some_arch)}) to recognize
other names for some_arch.
Previously this compared against getArchTypeForLLVMName, which
only matches a subset of names (which seems to be the canonical
architecture names). There was a special case hack for "x86_64",
which is one of the "x86-64" aliases accepted by parseArch, but is
not the canonical architecture name.
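The distinction between a canonical-name lookup and alias-aware parsing can be sketched with a toy model. Nothing here is the `llvm::Triple` API: `ToyArch`, `lookupCanonicalName`, `parseAnyAlias`, and `archMatches` are hypothetical, and the alias table is invented for illustration rather than copied from LLVM.

```cpp
#include <cassert>
#include <string>

enum class ToyArch { Unknown, X86_64, AArch64 };

// Accepts only one canonical spelling per architecture (toy table).
inline ToyArch lookupCanonicalName(const std::string &Name) {
  if (Name == "x86-64")
    return ToyArch::X86_64;
  if (Name == "aarch64")
    return ToyArch::AArch64;
  return ToyArch::Unknown;
}

// Accepts the spellings a triple might use for the architecture (toy table,
// an assumption for this sketch).
inline ToyArch parseAnyAlias(const std::string &Name) {
  if (Name == "x86_64" || Name == "amd64")
    return ToyArch::X86_64;
  if (Name == "aarch64" || Name == "arm64")
    return ToyArch::AArch64;
  return ToyArch::Unknown;
}

// Matches a requested arch name against the target by comparing parsed
// architectures, so no per-name special cases are needed.
inline bool archMatches(const std::string &Requested, ToyArch Target) {
  ToyArch Parsed = parseAnyAlias(Requested);
  return Parsed != ToyArch::Unknown && Parsed == Target;
}
```

In this model `lookupCanonicalName("x86_64")` yields `ToyArch::Unknown`, which is the kind of gap the old special case papered over, while `archMatches("x86_64", ToyArch::X86_64)` succeeds without one.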
Triple: Expose parseArch as a public method
Clang has some code which is doing a direct arch name
string compare which should really be recognizing anything
usable as a triple architecture. It makes more sense to
directly parse the architecture than to construct a temporary
triple just to see what the parsed arch is.
For some reason the existing public parsing method is
getArchTypeForLLVMName. I'm not fully sure what the difference between
the two is supposed to be. My current guess is getArchTypeForLLVMName is
only supposed to handle the canonical architecture name.
Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794)" (#188851)
Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.
The regression reported in #188536 occurred because
LinearClauseProcessor was rewriting all basic blocks whose names
contained a given substring, including those that were not part of the
translated SIMD region.
This didn't cause problems before because linear variables were always
privatized, which doesn't happen with this change.
The issue is fixed by rewriting only the basic blocks that correspond to
the omp.simd operation.
[openmp] Add support for Arm64X to libomp
This patch allows building libomp.dll and libomp.lib as Arm64X binaries
containing both arm64 and arm64ec code and useable from applications
compiled for both architectures.
[mlir][spirv] Add Cast/Rescale ops in TOSA Ext Inst Set (#189028)
This patch introduces the following operators:
spirv.Tosa.Cast
spirv.Tosa.Rescale
Also dialect and serialization round-trip tests have been added.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[openmp] Add support for arm64ec to libomp (#176151)
This patch adds arm64ec support to libomp.
Note that this support isn't entirely usable on Windows hosts as libomp
requires LLVM_PER_TARGET_RUNTIME_DIR=On to work correctly when
multiple runtimes are built, which is unsupported on Windows. A
following patch will add arm64x support to the build to rectify this.
[lldb] Remove "flash" and "blocksize" from MemoryRegionInfo constructor (#189636)
These are only set to non-default values after calling a constructor.
Removing them removes noise from many tests that make MemoryRegionInfos.
[llc] Change `TargetMachine` allocation assert to error (#189541)
We shouldn't assert on an allocation, since it can fail.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[DA] Remove ExtraGCD from GCD MIV (NFC) (#172004)
As some code was removed in #169927, `ExtraGCD` in `gcdMIVtest` is no
longer necessary. This patch removes it and also adjusts the comments.