[AArch64] Preserve SDNode flags when lowering fixed vectors to scalable operations (#204616)
Preserve the original SDNode flags when LowerToScalableOp rebuilds fixed-length vector operations using their scalable container type. This allows combines to use flag information generated before the scalable operation was created.
[lldb][Linux][AArch64] Get NT_ARM_ constants from llvm's ELF header (#205834)
The first thing I do for any new register set is add it to the LLVM
header. So we should just use those values instead of having all these
macros to handle older kernels.
[libc++] Remove __broadcast simd function (#205559)
The simd vector type in Clang already provides a conversion operator
that acts as a broadcast constructor. We can use that instead.
[flang-rt][CMake] Avoid 'use, intrinsic ::' (#205634)
Two build failures were reported after #204260:
* Unix Makefiles generator stops working: The cause is that the rules
for building each OBJECT library land in their own Makefile, e.g.
`flang_rt.mod.fortran.builtins.dir/build.make` and
`libomp-mod.dir/build.make`. Injecting dependencies directly into the
build rules of the other file does not work.
* `__ppc_types.f90` not tracked: Forgotten in #204260 due to being only
conditionally enabled for PowerPC targets.
The solution for both is to remove the workaround that caused these
problems, which existed because CMake does not recognize module uses
declared with `intrinsic`. This PR promotes the `use` constructs in the
module sources to normal dependencies that CMake does not ignore.
The `intrinsic` modifier changes the search path to only look for such a
[35 lines not shown]
[CIR][OpenMP] Initial implementation of target region support (#195452)
This patch adds support for target regions with some basic support for map
clauses. It also changes the clause handling to make use of the OMP dialect
ClauseOps to simplify op construction.
Assisted-by: Cursor / claude-4.6-opus-high
[libsycl] add operators to sycl::range and sycl::id (#203572)
This PR was assisted by GH Copilot (tests extension).
Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
[clang][opencl][sycl] Deprecate opencl_global_device and opencl_global_host (#203569)
These attributes were originally introduced as part of the SYCL
upstreaming effort to enable improved performance for USM pointers on
FPGA targets. However, subsequent evaluation indicates that they are not
meaningfully used in practice. Additionally, given the current shift in
focus away from FPGAs in DPC++, these attributes no longer serve an
active purpose. Their removal would simplify the codebase and reduce
ongoing maintenance burden.
RFC:
https://discourse.llvm.org/t/rfc-remove-opencl-global-device-and-opencl-global-host-address-space-attributes/90677
[RISCV][P-ext] Select signed widening add/sub accumulate to wadda/wsuba (#205475)
WADDA is rd += sext(rs1) + sext(rs2) and WSUBA is rd += sext(rs1) - sext(rs2),
the signed counterparts of WADDAU/WSUBAU added in #181396.
Add the WADDA/WSUBA SelectionDAG nodes, fold ADDD/SUBD whose addend is a
sign-extended i32 (high half == sra(lo, 31)) into them, collapse chained
accumulates into the free source slot, and select them to the wadda/wsuba
instructions.
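A scalar model of the semantics stated above may help: WADDA/WSUBA accumulate two sign-extended 32-bit sources into a 64-bit value, and the fold only applies when an i64's high half equals sra(lo, 31). This is a hypothetical sketch, not LLVM or RISC-V specification code; the names `wadda`, `wsuba`, and `isSignExtendedI32` are invented for illustration, and the accumulator is assumed to wrap modulo 2^64 as a register would.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of WADDA/WSUBA; not LLVM or spec code.
// rd is modeled as a 64-bit accumulator; sums wrap modulo 2^64.
int64_t wadda(int64_t rd, int32_t rs1, int32_t rs2) {
  // rd += sext(rs1) + sext(rs2)
  uint64_t acc = static_cast<uint64_t>(rd);
  acc += static_cast<uint64_t>(static_cast<int64_t>(rs1));
  acc += static_cast<uint64_t>(static_cast<int64_t>(rs2));
  return static_cast<int64_t>(acc);
}

int64_t wsuba(int64_t rd, int32_t rs1, int32_t rs2) {
  // rd += sext(rs1) - sext(rs2)
  uint64_t acc = static_cast<uint64_t>(rd);
  acc += static_cast<uint64_t>(static_cast<int64_t>(rs1));
  acc -= static_cast<uint64_t>(static_cast<int64_t>(rs2));
  return static_cast<int64_t>(acc);
}

// An i64 held as (lo, hi) 32-bit halves is a sign-extended i32 exactly when
// hi == sra(lo, 31), i.e. hi is filled with copies of lo's sign bit.
bool isSignExtendedI32(uint32_t lo, uint32_t hi) {
  // Arithmetic right shift of a negative int32_t (guaranteed since C++20).
  int32_t signFill = static_cast<int32_t>(lo) >> 31;
  return hi == static_cast<uint32_t>(signFill);
}
```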
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[AMDGPU][NFC] Roundtrip gfx11_asm_vop3_from_vop2.s
Removes the need for gfx11_dasm_vop3_from_vop2_hi.txt, which currently
sits downstream.
Catches a problem with printing op_sel for the tied operands in
v_fmac_f16_e64.
[libc++] Add missing attribute usages to `<__memory/shared_ptr.h>` (#205776)
Since 44546e0e32077241ca9a9a90ac57f2f086f9488a, missing
`_LIBCPP_NODEBUG` and `_LIBCPP_HIDE_FROM_ABI` annotations are caught by
clang-tidy. This patch adds them wherever they are expected.
[libc][stdlib] Add unsetenv (#202422)
Added the POSIX unsetenv() function and its internal support.
Implemented EnvironmentManager::unset() to remove a variable by name,
free the string if allocated, and compact the array.
Updated EnvironmentManager to synchronize the public global environ
pointer when transitioning to managed storage.
Registered for x86_64, aarch64, and riscv. Integration tests cover basic
operations and edge cases.
Assisted-by: Automated tooling, human reviewed.
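The removal step described above (find the variable by name, free the string if it was allocated, compact the array) can be sketched as follows. This is a simplified, hypothetical model, not LLVM libc's EnvironmentManager: `EnvTable` and `sketch_unsetenv` are invented names, ownership is assumed to be tracked with a parallel `owned` array, and the EINVAL cases (null, empty, or `=`-containing names) follow POSIX.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical environment table: entries is null-terminated like environ,
// and owned[i] records whether entries[i] was heap-allocated by us.
struct EnvTable {
  char **entries;
  bool *owned;
  std::size_t size;  // number of non-null entries
};

// True if entry has the form "<name>=...".
static bool matches(const char *entry, const char *name, std::size_t len) {
  return std::strncmp(entry, name, len) == 0 && entry[len] == '=';
}

// Validates the name, then removes every matching entry, freeing owned
// strings and compacting the remaining entries in place.
int sketch_unsetenv(EnvTable &env, const char *name) {
  assert(env.entries != nullptr && env.owned != nullptr);
  if (name == nullptr || name[0] == '\0' || std::strchr(name, '=') != nullptr) {
    errno = EINVAL;
    return -1;
  }
  std::size_t len = std::strlen(name);
  std::size_t out = 0;
  for (std::size_t in = 0; in < env.size; ++in) {
    if (matches(env.entries[in], name, len)) {
      if (env.owned[in])
        std::free(env.entries[in]);
      continue;  // drop this slot
    }
    env.entries[out] = env.entries[in];
    env.owned[out] = env.owned[in];
    ++out;
  }
  env.size = out;
  env.entries[out] = nullptr;  // keep the array null-terminated
  return 0;
}
```

Removing a name that is not present leaves the table unchanged and still returns 0, which matches POSIX.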
[Bazel] Fixes 5314be5 (#205818)
This fixes 5314be5a740c9985b0b3ab958269b5f1824cce02.
Signed-off-by: Ingo Müller <ingomueller at google.com>
Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
[AArch64] Correct A510 scheduling information for LDn instructions (#205518)
The latency and throughput for these instructions don't match what's in
the A510 Software Optimization Guide, so adjust them so that they do
match. Also rearrange the definitions to match how they're structured in
the optimization guide, and rename them following the convention used
for the C1 CPUs, which is much clearer.
[mlir][vector] add consistent stride verification to `masked load/store` and `gather/scatter` ops (#204842)
Extend negative stride checks to MaskedLoadOp, MaskedStoreOp, GatherOp,
and ScatterOp to match LoadOp and StoreOp behavior.
Depends on: #204611.
AI Disclaimer: I used AI for the tests.
---------
Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Revert "Reapply "[InstCombine] Merge consecutive assumes", round 2" (#205805)
It looks like there is still a bug with removing assumes from the
assumption cache.
Reverts llvm/llvm-project#205773
[clang][bytecode] Fix `evaluateDestruction()` (#205778)
My previous testing of this seems to have been insufficient, or this
regressed at some point along the way.
Now that `CLANG_USE_EXPERIMENTAL_CONST_INTERP` is used for testing, I
noticed a few regressions.
We need to special-case the evaluating decl in a few places, since it's
a global variable that we're allowed to modify.
[libc] Add libgen.h to target public headers (#205804)
Ensure libgen.h is included in TARGET_PUBLIC_HEADERS for Linux targets
so that it gets generated and installed.
Assisted-by: Automated tooling, human reviewed.
[Offload][OpenMP][Flang] Update no-loop test (#205803)
Updates to the kernel type detection logic now allow `target parallel
do` to be promoted to SPMD-No-Loop.
A currently broken offload test that was affected by this change is
updated here.
[clang][dataflow] Move expensive solver asserts under EXPENSIVE_CHECKS (#205715)
The watched-literal solver has a few invariant checks that run on every
solver iteration in assertion builds. Some of these checks rebuild and
iterate over the watched-literal state. This overhead is usually hidden,
but it becomes dominant for large flow-sensitive analyses.
While testing clang-tidy's `unchecked-optional-access` check on
real-world projects (in this case, LLVM itself), we found a few
extremely slow analyses caused by this overhead.
| Time | File |
|---------|-----------------------------------------------------|
| 8235.7s | llvm-project/clang/utils/TableGen/RISCVVEmitter.cpp |
| 8197.2s | llvm-project/clang/lib/Driver/Multilib.cpp |
(Measured on a 32-core Ice Lake machine with 128 GB of memory.)
After moving these asserts to `EXPENSIVE_CHECKS`, the same files
[13 lines not shown]
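The pattern of moving an invariant check under `EXPENSIVE_CHECKS` looks roughly like this. It is a hypothetical sketch: `SolverState`, `watchedLiteralsValid`, and `solverIteration` are invented stand-ins for the solver's internal state and checks, not the actual watched-literal solver code. LLVM defines `EXPENSIVE_CHECKS` in builds configured with `LLVM_ENABLE_EXPENSIVE_CHECKS`, so regular assertion builds skip the full rescan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the solver's watched-literal bookkeeping.
struct SolverState {
  std::vector<std::size_t> watched;      // watched literal index per clause
  std::vector<std::size_t> clauseSizes;  // number of literals per clause
};

// Expensive invariant: rescans every clause on each call.
[[maybe_unused]] static bool watchedLiteralsValid(const SolverState &s) {
  if (s.watched.size() != s.clauseSizes.size())
    return false;
  for (std::size_t i = 0; i < s.watched.size(); ++i)
    if (s.watched[i] >= s.clauseSizes[i])
      return false;
  return true;
}

void solverIteration(SolverState &s) {
  // ... cheap per-iteration propagation work would go here ...
#ifdef EXPENSIVE_CHECKS
  // Compiled only when expensive checks are enabled, so ordinary assertion
  // builds no longer pay for a full rescan on every iteration.
  assert(watchedLiteralsValid(s));
#endif
  (void)s;
}
```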
[NaryReassociate] Fix divide by zero crash in NaryReassociatePass (#202377)
Updates NaryReassociatePass with a safety check that guards against GEPs
whose element type is zero-sized (e.g. [0 x ptr]), preventing a
division by zero.
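A hypothetical sketch of this kind of guard, assuming the pass divides a byte size by the GEP's element size: checking for a zero element size before the modulo and division avoids the crash. In LLVM the sizes would come from the DataLayout; here they are plain integers, and `elementsPerIndex` is an invented name, not the pass's actual helper.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns how many ElementSize-byte elements fit exactly in IndexedSize
// bytes, or std::nullopt when the transform should bail out.
std::optional<uint64_t> elementsPerIndex(uint64_t IndexedSize,
                                         uint64_t ElementSize) {
  // A zero-sized element type such as [0 x ptr] would make both the
  // modulo and the division below divide by zero, so bail out first.
  if (ElementSize == 0)
    return std::nullopt;
  if (IndexedSize % ElementSize != 0)
    return std::nullopt;
  return IndexedSize / ElementSize;
}
```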
[analyzer] Fix unjustified early return in processCallExit (#205656)
In `ExprEngine::processCallExit`, step 3 may theoretically split the
state because it calls `removeDead`, which activates `LiveSymbols` and
`DeadSymbols` callbacks of various checkers. (However, in practice it is
likely that these checker callbacks never actually split the state -- at
least, no such state splits happen in the LIT tests.)
The nodes produced by `removeDead` are placed in the set `CleanedNodes`;
in theory the different execution paths should be handled in parallel,
independently of each other. However, the loop `for (ExplodedNode *N :
CleanedNodes)` contained an early return statement, which meant that if
the creation of `CEENode` failed for a node `N`, then the subsequent
iterations were skipped altogether.
This commit replaces the `return` with a `continue` to ensure that the
nodes in `CleanedNodes` are handled independently (if there are several
such nodes).
[6 lines not shown]
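The difference between the early `return` and the `continue` described above can be shown with a small model. This is a hypothetical sketch: `Node`, `canCreateCEE`, and `processCleanedNodes` are invented stand-ins for `ExplodedNode`, the `CEENode` creation step, and the loop over `CleanedNodes`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ExplodedNode and whether a CEENode can be
// created for it.
struct Node {
  int id;
  bool canCreateCEE;
};

// Returns the ids of nodes for which a CEENode was created. Using
// `continue` means a failure for one node no longer skips the remaining,
// independent nodes in CleanedNodes.
std::vector<int> processCleanedNodes(const std::vector<Node> &CleanedNodes) {
  std::vector<int> Processed;
  for (const Node &N : CleanedNodes) {
    if (!N.canCreateCEE)
      continue;  // previously `return`, which dropped all later nodes
    Processed.push_back(N.id);
  }
  return Processed;
}
```

With the old early `return`, the input below would yield only node 1; with `continue`, node 3 is still processed.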
GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen merge
In widenScalarMergeValues, WideTy is an input provided by the target. Use
the same LLT kind as WideTy for the other types of different sizes
instead of LLT::scalar. This makes a difference with extended LLTs.