[CodeGen] Expand power-of-2 div/rem at IR level in ExpandIRInsts.
Previously, power-of-2 div/rem operations wider than
MaxLegalDivRemBitWidth were excluded from IR expansion and left for
backend peephole optimizations. Some backends can fail to process such
instructions when DAGCombiner is switched off.
Now ExpandIRInsts expands them into shift/mask sequences:
- udiv X, 2^C -> lshr X, C
- urem X, 2^C -> and X, (2^C - 1)
- sdiv X, 2^C -> bias adjustment + ashr X, C
- srem X, 2^C -> X - (((X + Bias) >> C) << C)
Special cases handled:
- Division/remainder by 1 or -1 (identity, negation, or zero)
- Exact division (sdiv exact skips bias, produces ashr exact)
- Negative power-of-2 divisors (result is negated)
- INT_MIN divisor (correct via countr_zero on bit pattern)
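The rewrites listed above can be checked with a small model. The following is a minimal sketch, not the pass's actual code: it uses Python's unbounded integers to stand in for wide IR integers, assumes a positive divisor 2^C, and picks the bias with a comparison for readability (the pass may compute it differently). All function names are hypothetical.

```python
def udiv_pow2(x, c):
    # udiv X, 2^C -> lshr X, C   (x is assumed non-negative)
    return x >> c


def urem_pow2(x, c):
    # urem X, 2^C -> and X, (2^C - 1)
    return x & ((1 << c) - 1)


def sdiv_pow2(x, c):
    # sdiv rounds toward zero, but an arithmetic shift rounds toward
    # negative infinity, so negative inputs get a bias of 2^C - 1 first.
    bias = (1 << c) - 1 if x < 0 else 0
    return (x + bias) >> c  # Python's >> on ints is an arithmetic shift


def srem_pow2(x, c):
    # srem X, 2^C -> X - (((X + Bias) >> C) << C)
    bias = (1 << c) - 1 if x < 0 else 0
    return x - (((x + bias) >> c) << c)


def trunc_div(x, d):
    # Reference: division rounding toward zero, for positive d.
    q = abs(x) // d
    return -q if x < 0 else q


def trunc_rem(x, d):
    # Reference: remainder that takes the sign of the dividend.
    return x - trunc_div(x, d) * d
```

For example, `sdiv_pow2(-7, 1)` yields -3 and `srem_pow2(-7, 1)` yields -1, matching round-toward-zero semantics; without the bias, the shift alone would give -4.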
[SPIRV] Add handling for `uinc_wrap` and `udec_wrap` atomics (#179114)
This adds support for the atomicrmw `uinc_wrap` and `udec_wrap` operations in
SPIR-V. Since SPIR-V doesn't provide dedicated instructions for those
two operations, we have to use the `AtomicExpand` pass to expand the
operations into CAS forms.
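As a rough illustration of what a CAS form looks like, here is a minimal sketch of the two operations built on a compare-and-swap loop. The update rules follow the LLVM LangRef definitions of `uinc_wrap` and `udec_wrap`; the `AtomicCell` class and the helper names are hypothetical and only simulate atomic memory with a lock.

```python
import threading


class AtomicCell:
    """Toy memory cell that offers only load and compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Returns (value seen, success flag), like cmpxchg.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old, old == expected


def uinc_wrap_value(old, limit):
    # Increment, wrapping to zero once the old value reaches the limit.
    return 0 if old >= limit else old + 1


def udec_wrap_value(old, limit):
    # Decrement, wrapping to the limit from zero or from above the limit.
    return limit if old == 0 or old > limit else old - 1


def atomicrmw_via_cas(cell, update, limit):
    # Retry until no other writer changed the cell between load and swap.
    old = cell.load()
    while True:
        seen, ok = cell.compare_and_swap(old, update(old, limit))
        if ok:
            return old  # atomicrmw yields the previous value
        old = seen
```

Calling `atomicrmw_via_cas(AtomicCell(3), uinc_wrap_value, 3)` returns the old value 3 and leaves 0 in the cell.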
Closes #177204.
workflows/release-task: Use less privileged token for uploading release notes (#180299) (#180650)
We were using one token for both pushing to the llvmbot fork and
creating a pull request against the www-releases repository. Since the
fork and the repository have different owners, this required a classic
access token, which has very coarse-grained permissions. By using two
separate tokens, we limit the permissions to just what we need to do the
task.
This is a re-commit of b6ee085068972a41f3b2735a9f7e3ca48eab0f00 minus
the environment changes which were causing the workflow to fail.
[lldb] Get shared cache path from inferior, open (#180323)
Get the shared cache filepath and uuid that the inferior process is
using from debugserver, try to open that shared cache on the lldb host
Mac, and if the UUID matches, index all of the binaries in that shared
cache. When looking for binaries loaded in the process, get them from
the already-indexed shared cache.
Every time a binary is loaded, PlatformMacOSX may query the shared cache
filepath and uuid from the Process, and pass that to
HostInfo::GetSharedCacheImageInfo() if available (else fall back to the
old HostInfo::GetSharedCacheImageInfo method which only looks at lldb's
own shared cache), to get the file being requested.
ProcessGDBRemote caches the shared cache filepath and uuid from the
inferior, once it has a non-zero UUID. I added a lock for this ivar
specifically, so I don't have 20 threads all asking for the shared cache
information from debugserver and updating the cached answer. If we never
get back a non-zero UUID shared cache reply, we will re-query at every
[20 lines not shown]
[clang][driver] Correcting arguments when using `libFuzzer` with `-shared-libsan` (#164842)
This PR contains two commits:
- Add required dependencies when using `-shared-libsan` and fuzzer.
Since libFuzzer is a static library we need to make sure that we add its
dependencies when building with `-shared-libsan`. E.g., libFuzzer uses
`ceilf()` from `libm.so` when building with the GNU toolchain.
Previously, the resulting command did not contain the required link
libraries, causing build failures (only a static sanitizer runtime would
trigger the call to `linkSanitizerRuntimeDeps`).
- Correcting dependency order when using fuzzer.
When building using `-shared-libsan` the sanitizer library needs to be
first in link order.
Since the fuzzer requires `-lstdc++` we have to make sure that the
sanitizer library is added before `-lstdc++`.
---------
Signed-off-by: Björn Svensson <bjorn.a.svensson at est.tech>
[SPIR-V] Emit ceil(Bitwidth / 32) words during OpConstant creation (#180218)
Fixes an error in handling constant integers with widths in the (64; 128) range.
Found during review of
https://github.com/llvm/llvm-project/pull/180182
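To make the word count concrete, here is a minimal sketch of splitting a constant into ceil(Bitwidth / 32) 32-bit words. The function name is hypothetical, and the low-order-word-first ordering is an assumption taken from the SPIR-V convention for multi-word literals, not from the patch itself.

```python
def constant_words(value, bitwidth):
    # ceil(bitwidth / 32) using integer arithmetic only.
    num_words = (bitwidth + 31) // 32
    # Keep only the bits that belong to the type.
    value &= (1 << bitwidth) - 1
    # Assumed ordering: lower-order words come first.
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in range(num_words)]
```

A 96-bit constant therefore takes three words, where a fixed two- or four-word scheme would mis-size it.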
[win][aarch64] The Windows Control Flow Guard Check function also preserves X15 (#179738)
The target function to be checked by the Control Flow Guard Check
function is stored in `X15` on AArch64. This register is guaranteed to
be preserved by that function (on success), thus after it returns `X15`
can be used to branch to the target function instead of having to load
it from another register or the stack.
[LV] Add FindLast tests where IV-based expression could be sunk. (NFC)
Add a set of FindLast tests where the selected expression is based on an
IV and could be sunk.
[Bazel] NFC refactor out redundant is_x86_64_non_windows config (#180296)
The logic of `is_x86_64_non_windows` looks unnecessarily complicated and
is only used at one site. Clean up the unused targets and refactor the
x86_64 BLAKE3 asm sources into a separate filegroup, so that
`is_x86_64_non_windows` can be put inside a default condition.
[bazel] Fix multicall tool invocation disambiguation (#180607)
The code seems to have considered the potential problem but did not
quite succeed in solving it ;)
[libc] Disable Clinger fast path for baremetal (#180598)
Clinger fast path bloats baremetal targets, which are constrained in
binary size. Disable it for baremetal libc builds.
[LV] Add additional tests for reductions with intermediate stores. (NFC)
Adds missing test coverage for reductions with intermediate stores,
including partial reductions with intermediate stores, as well as
chained min/max reductions with intermediate stores.
[MLIR][Utils] Fix overflow in constantTripCount for narrow types (#179985)
Extend operands when computing ub - lb to avoid overflow in signed
arithmetic. E.g., i8: ub=127, lb=-128 yields 255, which overflows
without extension.
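The i8 example can be reproduced with a small model of fixed-width signed arithmetic. This is a minimal sketch with hypothetical helper names; widening by a single bit is an assumption that suffices for this illustration and may differ from what the patch does.

```python
def wrap_signed(x, bits):
    # Interpret the low `bits` bits of x as a two's-complement value.
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def diff_narrow(ub, lb, bits):
    # ub - lb computed in the original width: may wrap around.
    return wrap_signed(ub - lb, bits)


def diff_extended(ub, lb, bits):
    # ub - lb computed after widening by one bit: cannot wrap.
    return wrap_signed(ub - lb, bits + 1)
```

With `ub=127`, `lb=-128`, and `bits=8`, the narrow difference wraps to -1 while the extended one gives the intended 255.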
[RISCV] Combine shuffle of shuffles to a single shuffle (#178095)
Compressing to a single shuffle doesn't remove any information and the backend can better apply specific optimizations to a single shuffle.
Addresses #176218.
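The claim that no information is lost can be seen by composing the masks. The sketch below is simplified to single-source shuffles, with -1 marking an undefined lane; the function names are hypothetical and this is not the RISC-V combine itself.

```python
def shuffle(vec, mask):
    # Lane i of the result takes vec[mask[i]]; -1 yields an undefined lane.
    return [None if m < 0 else vec[m] for m in mask]


def compose_masks(inner, outer):
    # Lane i of the outer shuffle reads lane outer[i] of the inner result,
    # which in turn reads lane inner[outer[i]] of the original vector.
    return [-1 if o < 0 else inner[o] for o in outer]
```

For any vector `v`, `shuffle(shuffle(v, inner), outer)` equals `shuffle(v, compose_masks(inner, outer))`, so the two shuffles collapse into one.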
---------
Co-authored-by: Luke Lau <luke_lau at igalia.com>
[clang][modules] Add single-module-parse-mode callback (#179714)
This PR adds a new preprocessor callback that's invoked whenever the
single-module-parse-mode skips over a module import. This will be used
later on from the dependency scanner.
[lld][WebAssembly] Add new __rodata_start/__rodata_end symbols (#172102)
This is similar to etext/_etext in the ELF linker. It's useful in
emscripten to know where the RO data ends and the data begins (even
though the Wasm format itself has no concept of RO data).
See
https://github.com/emscripten-core/emscripten/discussions/25939#discussioncomment-15243731
[SPIRV] Implement lowering for HLSL Texture2D sampling intrinsics (#179312)
This patch implements the SPIR-V lowering for the following HLSL
intrinsics:
- SampleBias
- SampleGrad
- SampleLevel
- SampleCmp
- SampleCmpLevelZero
It defines the required LLVM intrinsics in 'IntrinsicsDirectX.td' and
'IntrinsicsSPIRV.td'.
It updates 'SPIRVInstructionSelector.cpp' to handle the new intrinsics
and generates the correct 'OpImageSample*' instructions with the required
operands (Bias, Grad, Lod, ConstOffset, MinLod, etc.).
[3 lines not shown]
[Github] Add runs-on to release-tasks.yml
This was failing validation against main and sending everyone emails.
Try adding the fix that was suggested in the workflow run.