[mlir][amdgpu] gfx1250+ lower fat_raw_pointer_cast (#175047)
* `numRecords` is set to all 1s if out-of-bounds checking is not requested.
* Flags are now correctly set to zero.
[CodeGen] Strip Coroutine suffixes when generating pseudo probe (#173834)
The CoroSplit pass now creates separate DWARF symbols with the `.resume`,
`.destroy`, and `.cleanup` suffixes
(https://github.com/llvm/llvm-project/pull/141889). However, pseudo probes
are created in an earlier pass (`SampleProfileProbePass`), before
CoroSplit runs. During CodeGen, when the AsmPrinter iterates through the
`InlinedAt` chain and generates the `InlineStack`, the Function GUIDs
computed from the original function name and from the names with the
coroutine suffixes therefore do not match.
This creates mismatched pseudo probes in the final binary, and
llvm-profgen also fails when parsing the pseudo probe section. This
fix simply strips the coroutine suffixes from the inline callers' names,
so the CoroSplit changes are transparent.
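As a rough illustration of the suffix stripping described above, here is a minimal standalone sketch. The helper name `stripCoroSuffix` is hypothetical and this is not the code from the patch; it only assumes the three suffixes named in the message and strips at most one of them from the end of a name.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper for illustration only; not the function used in LLVM.
// Returns Name without a trailing ".resume", ".destroy", or ".cleanup".
// Names that consist only of a suffix are returned unchanged.
inline std::string_view stripCoroSuffix(std::string_view Name) {
  constexpr std::string_view Suffixes[] = {".resume", ".destroy", ".cleanup"};
  for (std::string_view S : Suffixes) {
    if (Name.size() > S.size() &&
        Name.substr(Name.size() - S.size()) == S)
      return Name.substr(0, Name.size() - S.size());
  }
  return Name;
}
```

With this sketch, a caller name such as `foo.resume` maps back to `foo`, so a GUID computed from the stripped name would agree with the one computed before CoroSplit ran.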
[InstCombine] Limit canonicalization of extractelement(cast) to constant index or same basic block (#166227)
The current canonicalization of extractelement(cast) requires that the
CastInst has only one use. However, when that use occurs inside a loop,
it still satisfies this condition, even though the cast is effectively
used multiple times, once per iteration, rather than truly being used
once.
```cpp
} else if (auto *CI = dyn_cast<CastInst>(I)) {
// Canonicalize extractelement(cast) -> cast(extractelement).
// Bitcasts can change the number of vector elements, and they cost
// nothing.
if (CI->hasOneUse() && (CI->getOpcode() != Instruction::BitCast)){
```
Before
```llvm
%34 = fptosi <4 x float> %33 to <4 x i32>
;/loop{
[21 lines not shown]
[clang-tidy] Prefer the faster LLVM ADT sets and maps over `std::` ones (#174357)
The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](https://github.com/llvm/llvm-project/pull/174237#issuecomment-3707395449)
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:
Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s]
[11 lines not shown]
DeveloperPolicy: Add note about legacy bitcode performance (#174720)
Note that bitcode does not attempt to guarantee performance
parity with upgraded bitcode.
[mlir][acc] Add OffloadLiveInValueCanonicalization pass (#174671)
Introduce a pass to canonicalize live-in values for regions that will be
outlined for device execution.
When a region is outlined, values defined outside but used inside become
arguments to the outlined function. However, some values cannot or
should not be passed as arguments:
- Synthetic types (shape metadata, field indices)
- Constants better recreated inside the region
- Address-of operations for device-resident globals
This pass identifies such values and either sinks the defining operation
into the region (when all uses are inside) or clones it inside (when
uses exist both inside and outside).
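The sink-versus-clone rule above can be sketched as a small decision function. This is a simplified model, not the pass itself: `LiveInInfo`, `LiveInAction`, and `classifyLiveIn` are hypothetical names, and the `shouldNotBeArgument` flag stands in for whatever checks the pass uses to recognize synthetic types, constants, and device-resident address-of operations.

```cpp
#include <cassert>

// Hypothetical model of one live-in value; not an MLIR data structure.
struct LiveInInfo {
  bool shouldNotBeArgument; // assumed result of the pass's candidate checks
  unsigned usesInside;      // uses inside the region to be outlined
  unsigned usesOutside;     // uses outside that region
};

enum class LiveInAction { Keep, Sink, Clone };

// Sink the defining op when every use is inside the region; clone it into
// the region when uses exist both inside and outside; otherwise leave it.
inline LiveInAction classifyLiveIn(const LiveInInfo &Info) {
  if (!Info.shouldNotBeArgument || Info.usesInside == 0)
    return LiveInAction::Keep;
  return Info.usesOutside == 0 ? LiveInAction::Sink : LiveInAction::Clone;
}
```

Values classified as `Keep` in this model would simply remain ordinary arguments of the outlined function.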
To identify target regions in a dialect-agnostic way, this patch
introduces `OffloadRegionOpInterface`. This marker interface allows the
pass to work uniformly across OpenACC compute constructs, GPU
[11 lines not shown]
[VPlan] Merge cases inferring type of operand 0 (NFC).
Merge all cases that infer the scalar type of operand 0 in
inferScalarTypeForRecipe(const VPInstruction).
[SLP]Update deps for copyables operands, if the user is used several times in node
If the user instruction is used several times in the node, and in one
case its operand is copyable but in another it is not, all operands
need to be checked to be sure we do not miss scheduling.
Precommit test for PR #171012 (#171013)
This patch precommits a test where base offsets are negative. PR
[171012](https://github.com/llvm/llvm-project/pull/171012) will
eliminate negative offsets by sorting the scratch instructions.
[AMDGPU] Handle `s_setreg_imm32_b32` targeting `MODE` register
On certain hardware, this instruction clobbers VGPR MSB `bits[12:19]`, so we need to restore the current mode.
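To make the bit range concrete, the following sketch shows only the arithmetic of preserving an 8-bit field at bits 12 through 19 of a 32-bit value. It is an illustration under assumptions, not the AMDGPU lowering: `preserveField` and the mask constant are hypothetical names, and the actual fix may restore the mode with different instructions.

```cpp
#include <cassert>
#include <cstdint>

// Bits 12..19 inclusive: an 8-bit field shifted left by 12.
constexpr std::uint32_t kFieldMask = 0xFFu << 12; // 0x000FF000

// Hypothetical helper: take every bit from Written except the field at
// bits 12..19, which is kept from Current.
constexpr std::uint32_t preserveField(std::uint32_t Current,
                                      std::uint32_t Written) {
  return (Written & ~kFieldMask) | (Current & kFieldMask);
}
```

For example, with `Current = 0x000AB000` and `Written = 0xFFF00FFF`, the result keeps `0xAB` in the preserved field while all other bits come from `Written`.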
[flang] Add traits to several AST nodes, NFC
There are quite a few AST nodes that don't have any of the standard
traits (Wrapper/Tuple/etc). Because of that they require special
handling in the parse tree visitor.
Convert a subset of these nodes to the typical format, and remove
the special cases from the parse tree visitor.
Remove more filesystem.mount_info usage
This commit replaces `filesystem.mount_info` calls with more direct
`filesystem.statfs` calls where possible and removes the path
restriction for `filesystem.statfs`.