[CGP][AArch64] Do not sink instructions that might read/write memory. (#176182)
The test case's call instruction was being sunk past the point where the
memory it accessed was valid. Add a check so that CGP does not try to sink
instructions that might be invalid to move.
Fixes #176095
[SCCP] Correct range calculation for get.vector.length to use getUnsignedMax instead of getUpper. (#176493)
getUpper returns 1 more than the maximum value included in the range.
This may be 0. We should not use this in a umin. Instead we should
get the maximum value included in the range and use that for the umin.
Then convert that to Upper for the new range by adding 1.
The test was manually reduced from a downstream failure, but I couldn't
get it to behave exactly the same way without more instructions. It should
be enough to show an incorrect range being calculated.
Fixes #176471
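The wrap-around problem described above can be illustrated with a small sketch. It does not use LLVM's `ConstantRange`; `ToyRange`, `naiveUpper`, and `fixedUpper` are hypothetical names for an 8-bit half-open range `[Lower, Upper)`, and the "naive" function is only an assumed shape of the mistake, not the code that was in SCCP.

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for a half-open unsigned range [Lower, Upper).
// Upper is exclusive, so a range that contains 255 has Upper == 0.
struct ToyRange {
  uint8_t Lower;
  uint8_t Upper;

  // Largest value contained in the range (assumes a non-empty range that
  // does not wrap other than Upper itself wrapping to 0).
  uint8_t unsignedMax() const { return static_cast<uint8_t>(Upper - 1); }
};

// Assumed shape of the mistake: the exclusive bound is fed into the umin.
// When Count.Upper has wrapped to 0, the minimum is 0 and the result is wrong.
uint8_t naiveUpper(ToyRange Count, uint8_t MaxLanes) {
  assert(MaxLanes < 255 && "sketch assumes MaxLanes + 1 does not wrap");
  uint8_t LanesUpper = static_cast<uint8_t>(MaxLanes + 1);
  return std::min(Count.Upper, LanesUpper);
}

// Approach described in the commit message: take the umin of the inclusive
// maxima, then add 1 to turn the result back into an exclusive bound.
uint8_t fixedUpper(ToyRange Count, uint8_t MaxLanes) {
  assert(MaxLanes < 255 && "sketch assumes MaxLanes + 1 does not wrap");
  uint8_t Max = std::min(Count.unsignedMax(), MaxLanes);
  return static_cast<uint8_t>(Max + 1);
}
```

With a count range covering 1 through 255 (`ToyRange{1, 0}`) and a maximum of 4 lanes, `naiveUpper` yields 0 while `fixedUpper` yields 5, which is the exclusive bound for a result in 0 through 4.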
[mlir][memref] Refactor `ViewOpShapeFolder` (#176567)
This PR makes the following changes to ViewOpShapeFolder:
- Add comments for `ViewOpShapeFolder`.
- Drop the redundant offset check.
- Simplify the implementation by introducing `foldDynamicToStaticDimSizes`.
- Add missing test coverage.
[llvm][clang] Remove `llvm::OwningArrayRef` (#169126)
`OwningArrayRef` has several problems.
The naming is strange: `ArrayRef` is specifically a non-owning view, so
the name means "owning non-owning view".
It has a const-correctness bug that is inherent to the interface.
`OwningArrayRef<T>` publicly derives from `MutableArrayRef<T>`. This
means that the following code compiles:
```c++
void const_incorrect(llvm::OwningArrayRef<int> const a) {
  a[0] = 5;
}
```
It's surprising for a non-reference type to allow modification of its
elements even when it's declared `const`. However, the problems from
[55 lines not shown]
[ControlFlowHub] Fix duplicate DomTree updates when branch successors are identical
When a conditional branch has both successors pointing to the same block (e.g., `br i1 %cond, label %bb, label %bb`), `ControlFlowHub::finalize` generates duplicate `Delete` updates for the same CFG edge. This can trigger an assertion failure in the `fix-irreducible` pass.
Fixes #176553.
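One way to avoid emitting the same edge twice is to deduplicate successors before building the update list. The sketch below is an assumption about the general technique, not the actual patch: it models blocks as strings, and `Edge` and `deleteUpdatesFor` are hypothetical names rather than LLVM APIs.

```c++
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a CFG edge as (source block, destination block).
using Edge = std::pair<std::string, std::string>;

// Build one "delete" update per distinct successor of From. A branch such as
// `br i1 %cond, label %bb, label %bb` lists %bb twice but has a single edge.
std::vector<Edge> deleteUpdatesFor(const std::string &From,
                                   const std::vector<std::string> &Successors) {
  assert(!From.empty() && "source block needs a name");
  std::vector<Edge> Updates;
  std::set<std::string> Seen;
  for (const std::string &Succ : Successors) {
    // insert() reports whether the successor was new; skip repeats.
    if (Seen.insert(Succ).second)
      Updates.emplace_back(From, Succ);
  }
  return Updates;
}
```

Calling `deleteUpdatesFor("entry", {"bb", "bb"})` produces a single update for the `entry -> bb` edge, while two distinct successors still produce two updates in their original order.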
[NFC][clang-tidy] Update documentation for StatusOr check. (#176498)
Specifically:
1. Avoid the "or" suffix for variable names per
[abseil.io/tips/181](https://abseil.io/tips/181)
2. Replace DCHECK with CHECK which works in non-debug mode
3. Suggest init-capture in workaround for lambda captures
4. Reduce one line length to satisfy `doc8`
workflows/release-binaries: Run this job once a week to catch regressions (#176008)
This will increase the chances that we can have this job working for the
first release candidate.
[RFC][Clang][AMDGPU] Emit only delta target-features to reduce IR bloat
Currently, AMDGPU functions have a `target-features` attribute populated with all default features for the target GPU. This is redundant because the backend can derive these defaults from the `target-cpu` attribute via `AMDGPUTargetMachine::getFeatureString()`.
In this PR, for AMDGPU targets only:
- Functions without explicit target attributes no longer emit `target-features`
- Functions with `__attribute__((target(...)))` or `-target-feature` emit only features that differ from the target's defaults (delta)
The backend already handles missing `target-features` correctly by falling back to the TargetMachine's defaults.
A new cc1 flag `-famdgpu-emit-full-target-features` is added to emit full features when needed.
Example:
Before:
```llvm
attributes #0 = { "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,..." }
[13 lines not shown]
workflows/release-binaries: Fix attestation artifact name (#176417)
We were constructing the attestation artifact name using the arch and the
OS of the current runner instead of using the runner that the builds
were done on. This led to a conflict in artifact names between all the
release binary jobs.
[libc] Remove ballot on slab find (#176606)
Summary:
This negatively impacts performance, while the other changes in the
initial PR slightly improved it. This was originally done to make Volta
independent thread scheduling work, but that doesn't seem to work
correctly all the time either so we should make this faster.
[Clang][AMDGPU] Handle `wavefrontsize32` and `wavefrontsize64` features more robustly
We should also not allow `-wavefrontsize32` and `-wavefrontsize64` to be specified at the same time.