[Flang][OpenMP][Offload] Modify MapInfoFinalization to handle attach mapping and 6.1's ref_* and attach map keywords
This PR is one of four required to implement the attach mapping semantics in Flang, alongside the
ref_ptr/ref_ptee/ref_ptr_ptee map modifiers and the attach(always/never/auto) modifiers.
This PR contains the MapInfoFinalization changes required to support these features. It mainly
deals with applying the correct attach map type and manipulating the descriptor maps for the base
address and descriptor, so that when ref_ptr or ref_ptee is specified we emit only one of the two
maps, and when ref_ptr_ptee is specified we emit our usual default maps. In all cases we add the
"glue" of a new attach map, except where the user has provided attach(never). When the user
provides attach(always), we apply the always map type to our attach maps.
It's important to note the runtime has a toggle for the auto map behaviour, which switches the
attach behaviour between the newer and the older semantics for backwards compatibility (outside
the purview of this PR, but worth mentioning).
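The selection rules described above can be sketched roughly as follows. This is an illustration only, not the MapInfoFinalization code: the enums, the `plannedMaps` helper and the string labels are all hypothetical, and it assumes that ref_ptr keeps the descriptor map while ref_ptee keeps the base address map.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enums: illustrative names, not identifiers used by Flang
// or the OpenMP dialect.
enum class RefModifier { RefPtr, RefPtee, RefPtrPtee };
enum class AttachModifier { Auto, Always, Never };

// Returns labels for the maps that would be emitted for one
// descriptor-backed variable under the rules described above.
std::vector<std::string> plannedMaps(RefModifier ref, AttachModifier attach) {
  std::vector<std::string> maps;
  // Assumption: ref_ptr keeps only the descriptor map, ref_ptee keeps only
  // the base address map, and ref_ptr_ptee keeps the usual pair.
  if (ref != RefModifier::RefPtee)
    maps.push_back("descriptor");
  if (ref != RefModifier::RefPtr)
    maps.push_back("base_address");
  // The attach map is the "glue" and is added unless the user asked for
  // attach(never); attach(always) additionally carries the always map type.
  if (attach != AttachModifier::Never)
    maps.push_back(attach == AttachModifier::Always ? "attach|always"
                                                    : "attach");
  // At least one of the descriptor/base address maps is always present.
  assert(!maps.empty());
  return maps;
}
```

For example, `plannedMaps(RefModifier::RefPtr, AttachModifier::Never)` yields only the descriptor label, while the default ref_ptr_ptee case with attach(auto) yields all three.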
[CIR] Add ASTVarDeclInterface for AST attribute access
Add the ASTVarDeclInterface which provides methods to access clang AST
VarDecl information from CIR attributes. This interface enables:
- mangleStaticGuardVariable: Mangle guard variable names using clang's
  MangleContext
- isLocalVarDecl: Check if a variable is function-local
- getTLSKind: Get thread-local storage kind
- isInline: Check if the variable is inline
- getTemplateSpecializationKind: Get template specialization info
- getVarDecl: Direct access to the underlying VarDecl pointer
This infrastructure is needed for proper handling of static local
variables with guard variables in LoweringPrepare.
[SROA] Avoid redundant `.oldload` generation when `memset` fully covers a partition (#179643)
In our internal (ByteDance) builds we frequently hit very large
`DeadPhiWeb`s that cause serious compile-time slowdowns, especially in
some auto-generated code where a single file can take 20+ minutes to
compile. There were previous attempts to reduce `DeadPhiWeb` in
`InstCombine` (e.g. llvm/llvm-project#108876 and
llvm/llvm-project#158057), but in our workload we still see a lot of
time spent later in the pipeline (notably `JumpThreading` and
`CorrelatedValuePropagation`).
After digging into our cases, a big chunk of the `DeadPhiWeb` comes from
SROA rewriting `memset`s. We often end up with patterns like:
```
%.sroa.xxx.oldload = load <ty>, ptr %.sroa.xxx
%unused = ptrtoint ptr %.sroa.xxx.oldload to i64 ; or a bitcast-like use
store <ty> <new_value>, ptr %.sroa.xxx
```
Even if `%unused` is cleaned up by later DCE-style passes, the
[33 lines not shown]
[llvm-profgen] Support loading symbols from symtab for COFF (#179175)
PE has strict size constraints. The DWARF sections can occupy a
significant amount of space. When using pseudo probes, the symtab
already contains all the required info except symbol size. This
patch teaches llvm-profgen to load symbol size from the PDB file.
[CIR] Add static_local attribute to GlobalOp and GetGlobalOp
This attribute marks function-local static variables that require
guarded initialization (e.g., C++ static local variables with
non-constant initializers). It is used by CIRGen to communicate
to LoweringPrepare which globals need guard variable emission.
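As a source-level illustration of the kind of variable this attribute targets, here is a minimal sketch of a function-local static with a non-constant initializer. The names `computeInitialValue`, `initCount` and `counter` are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical helper: records how many times the initializer runs.
int initCount = 0;
int computeInitialValue() {
  ++initCount;
  return 42;
}

int &counter() {
  // Non-constant initializer: it is evaluated on the first call only, so
  // the compiler has to guard the initialization.
  static int value = computeInitialValue();
  // By this point the initializer has run exactly once, on any call.
  assert(initCount == 1);
  return value;
}
```

Calling `counter()` repeatedly returns a reference to the same object, and `initCount` stays at 1 after the first call.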
[CIR][LoweringPrepare] Emit guard variables for static local initialization
This implements the lowering of static local variables with the Itanium C++ ABI
guard variable pattern in LoweringPrepare.
When a GlobalOp has the static_local attribute and a ctor region, this pass:
1. Creates a guard variable global (mangled name from AST)
2. Inserts the guard check pattern at each GetGlobalOp use site:
   - Load guard byte with acquire ordering
   - If zero, call __cxa_guard_acquire
   - If acquire returns non-zero, inline the ctor region code
   - Call __cxa_guard_release
3. Clears the static_local attribute and ctor region from the GlobalOp
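The guard check in step 2 has roughly the following shape when written out by hand. This is a sketch under stated assumptions, not the emitted CIR: `MockGuard`, `mockGuardAcquire` and `mockGuardRelease` are hypothetical stand-ins for the guard variable and the __cxa_guard_acquire/__cxa_guard_release runtime calls, and they model only the successful path (no aborted initialization).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the guard variable; the real guard layout is
// defined by the Itanium C++ ABI.
struct MockGuard {
  std::atomic<unsigned char> initialized{0}; // models the first guard byte
  std::mutex lock;
};

// Mock of __cxa_guard_acquire: returns 1 if the caller must run the
// initializer, 0 if initialization has already completed.
int mockGuardAcquire(MockGuard &g) {
  g.lock.lock();
  if (g.initialized.load(std::memory_order_relaxed) != 0) {
    g.lock.unlock();
    return 0;
  }
  return 1; // the lock stays held until mockGuardRelease
}

// Mock of __cxa_guard_release: marks the variable as initialized.
void mockGuardRelease(MockGuard &g) {
  // Only the caller that won the acquire may release.
  assert(g.initialized.load(std::memory_order_relaxed) == 0);
  g.initialized.store(1, std::memory_order_release);
  g.lock.unlock();
}

MockGuard guardForValue; // plays the role of the guard variable global
int valueStorage;        // plays the role of the static local's storage
int ctorRuns = 0;        // counts executions of the "ctor region"

int &getValue() {
  // Load guard byte with acquire ordering.
  if (guardForValue.initialized.load(std::memory_order_acquire) == 0) {
    // If zero, call the acquire function.
    if (mockGuardAcquire(guardForValue) != 0) {
      // Acquire returned non-zero: run the inlined ctor region code.
      valueStorage = 7;
      ++ctorRuns;
      // Then call the release function.
      mockGuardRelease(guardForValue);
    }
  }
  return valueStorage;
}
```

After the first call to `getValue()`, later calls take the fast path: the acquire load sees a non-zero guard byte and the initializer is skipped.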
[CIR] Add CIRGen support for static local variables with non-constant initializers
This adds CIRGen infrastructure for C++ function-local static variables
that require guarded initialization (Itanium C++ ABI).
Changes:
- Add ASTVarDeclAttr to carry VarDecl AST through the pipeline
- Add emitGuardedInit() to CIRGenCXXABI for guarded initialization
- Add emitCXXGuardedInit() to CIRGenFunction
- Replace NYI in addInitializerToStaticVarDecl() with ctor region emission
- Set static_local attribute on GlobalOp and GetGlobalOp
The global's ctor region contains the initialization code, which will be
lowered by LoweringPrepare to emit the actual guard variable pattern with
__cxa_guard_acquire/__cxa_guard_release calls.
[Flang][MLIR][OpenMP] Add distinct var_ptr_ptr_type to omp.map.info operations
This is a precursor patch to attach and ref_ptr/ptee mapping that I intend to upstream
over the next few weeks. The attach maps require both the type of the descriptor and
the pointed-to data to calculate the appropriate offload/base pointers and size. In
the base case of ref_ptr_ptee all of this information can be gathered from the pointer
and pointee maps, but in cases where we have only one (i.e. ref_ptr/ref_ptee) we will
be missing one of the key elements required to create a corresponding attach map.
So, this PR adds the ability to ferry around the type of both var_ptr and
var_ptr_ptr as opposed to just var_ptr, so that we can emit attach maps as separate
map.info's that carry all the prerequisite information for lowering to LLVM-IR. Beyond
that, it seems reasonable to have var_ptr_ptr mirror var_ptr in all aspects for
consistency.
[Flang][OpenMP][MLIR] Add attach and ref map type lowering to MLIR
This doesn't implement the functionality, just the relevant map type
lowering to MLIR's omp.map.info. The more complicated changes to
MapInfoFinalizationPass.cpp and OpenMPToLLVMIRTranslation.cpp to support
attach maps and the various ref/attach semantics will come in a subsequent
set of PRs. This just helps compartmentalize the changeset.
[msan][NFCI] Remove redundant tests from aarch64-bf16-dotprod-intrinsics.ll (#178832)
https://github.com/llvm/llvm-project/pull/178510#discussion_r2739401507
requested simplifying test cases by using parameters directly for the
intrinsic calls. Doing that reduces the test case to duplicates of
existing tests, thus this patch deletes the redundant tests.
[msan] Add intermediate verbosity instruction dump (#178771)
This patch does not change MSan's instrumentation.
-msan-dump-{heuristic,strict}-instructions currently prints out two
lines per instruction:
1) instruction name only e.g., `call llvm.aarch64.neon.uqsub.v16i8`
2) the full instruction, including actual variables e.g., `%vqsubq_v.i15
= call noundef <16 x i8> @llvm.aarch64.neon.uqsub.v16i8(<16 x i8>
%vext21.i, <16 x i8> splat (i8 1)), !dbg !66`
Option 1) is too sparse for some uses, because it does not contain the
return types or parameter types (although `.v16i8` is part of the
function name in this example, in general, the function name does not
describe the types completely; e.g., `<16 x float>
llvm.x86.avx512.mask.scalef.ps.512(<16 x float>, <16 x float>, <16 x
float>, i16, i32)`). OTOH option 2) can be too verbose because it
contains the actual variables.
[4 lines not shown]
[CIR][NFC] Cleanup some stale missing features markers (#179822)
This deletes a few missing features markers where the missing code had
actually been implemented and deletes a handful that were not being used
anywhere.