[SROA] Avoid redundant `.oldload` generation when `memset` fully covers a partition (#179643)
In our internal (ByteDance) builds we frequently hit very large
`DeadPhiWeb`s that cause serious compile-time slowdowns, especially in
some auto-generated code where a single file can take 20+ minutes to
compile. There were previous attempts to reduce `DeadPhiWeb` in
`InstCombine` (e.g. llvm/llvm-project#108876 and
llvm/llvm-project#158057), but in our workload we still see a lot of
time spent later in the pipeline (notably `JumpThreading` and
`CorrelatedValuePropagation`).
After digging into our cases, a big chunk of the `DeadPhiWeb` comes from
SROA rewriting `memset`s. We often end up with patterns like:
```
%.sroa.xxx.oldload = load <ty>, ptr %.sroa.xxx
%unused = ptrtoint ptr %.sroa.xxx.oldload to i64 ; or a bitcast-like use
store <ty> <new_value>, ptr %.sroa.xxx
```
Even if `%unused` is cleaned up by later DCE-style passes, the
[33 lines not shown]
nfs: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the final patch in the series that enables
the extension of NFSv4.2 to support POSIX draft ACLs.
At this time, only UFS mounted with the "acls" option
will work, and only for FreeBSD built with these patches.
Patches for client and server for the Linux kernel are
in the works. (I'll admit my next little project is
cleaning the Linux patches up for submission for upstream.)
To make these changes really useful, the FreeBSD port
of OpenZFS needs to be patched to add POSIX draft ACL
support. (Support for POSIX draft ACLs is already in
the Linux port of OpenZFS.)
[4 lines not shown]
nfs: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the fifth of several patches that implement the
above draft.
This one mostly adds an extra argument to two functions
in nfscommon.ko. Unfortunately, these functions are
called in many places, so the changes are numerous, but
straightforward.
Since the internal KAPI between the NFS modules is changed
by this commit, all of nfscommon.ko, nfscl.ko and nfsd.ko
must be rebuilt from sources.
There should be no semantics change for the series at
[3 lines not shown]
nfscl: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the fourth of several patches that implement the
above draft.
There should be no semantics change for the series at
this point.
(cherry picked from commit 0e724de9ed6f2d2914cb79686a4ceee7f6dd31a1)
nfscommon: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the third of several patches that implement the
above draft.
There should be no semantics change for the series at
this point.
(cherry picked from commit 949cff4dceffdbee70fa7741c1d61cf6c5255aeb)
nfsd: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the second of several patches that implement the
above draft.
The only semantics change would be if you have exported
a UFS file system mounted with the "acls" option.
In that case, you would see the acl attribute supported.
This is bogus, but will be handled in the next commit.
(cherry picked from commit 8e3fd450cc53d37fcf4e7f460f559d03c22c0d84)
nfscommon: Add some support for POSIX draft ACLs
An internet draft (expected to become an RFC someday)
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-posix-acls
describes an extension to NFSv4.2 to handle POSIX draft ACLs.
This is the first of several patches that implement the
above draft.
This patch should not result in a semantics change.
(cherry picked from commit a35bbd5d9f5f887a6f3de15cfe61fcc73fe22dc8)
[llvm-profgen] Support loading symbols from symtab for COFF (#179175)
PE has strict size constraints, and the DWARF sections can occupy a
significant amount of space. When using pseudo probes, the symtab
already contains all the required info except symbol size. This
patch teaches llvm-profgen to load symbol size from the PDB file.
[CIR] Add static_local attribute to GlobalOp and GetGlobalOp
This attribute marks function-local static variables that require
guarded initialization (e.g., C++ static local variables with
non-constant initializers). It is used by CIRGen to communicate
to LoweringPrepare which globals need guard variable emission.
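As a rough illustration of which variables this covers, the C++ sketch below contrasts a static local with a constant initializer against one whose initializer must run at first use. The function and variable names are hypothetical and do not come from the patch; the sketch only shows the source-level distinction, not how CIRGen sets the attribute.

```cpp
#include <cassert>

// Hypothetical helper used only for illustration: counts initializer runs.
static int initCalls = 0;
static int computeInitialValue() {
  ++initCalls;
  return 42;
}

int constantInit() {
  // Constant initializer: the value can be emitted directly into the
  // global, so no guarded initialization is required.
  static int counter = 7;
  return counter;
}

int guardedInit() {
  // Non-constant initializer: it must run exactly once, on the first call,
  // which is the case that requires guarded initialization.
  static int value = computeInitialValue();
  return value;
}

// Small usage check: the initializer runs once across repeated calls.
void runStaticLocalDemo() {
  assert(constantInit() == 7);
  assert(guardedInit() == 42);
  assert(guardedInit() == 42);
  assert(initCalls == 1);
}
```

Only `guardedInit` contains a variable of the kind the commit message describes as needing guarded initialization.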
[CIR][LoweringPrepare] Emit guard variables for static local initialization
This implements the lowering of static local variables with the Itanium C++ ABI
guard variable pattern in LoweringPrepare.
When a GlobalOp has the static_local attribute and a ctor region, this pass:
1. Creates a guard variable global (mangled name from AST)
2. Inserts the guard check pattern at each GetGlobalOp use site:
   - Load guard byte with acquire ordering
   - If zero, call __cxa_guard_acquire
   - If acquire returns non-zero, inline the ctor region code
   - Call __cxa_guard_release
3. Clears the static_local attribute and ctor region from the GlobalOp
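The guard check pattern listed above can be sketched by hand in plain C++. This is a simplified model, not the code the pass emits: `Guard`, `sketchGuardAcquire`, `sketchGuardRelease`, `makeValue`, and `getValue` are hypothetical stand-ins, the single global mutex is an assumption made for brevity, and exception handling (`__cxa_guard_abort`) is left out.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the guard variable; the real guard is an
// ABI-defined object whose first byte records "initialized".
struct Guard {
  std::atomic<unsigned char> initialized{0};
};

// Assumption: one global lock serializes all initializations.
static std::mutex guardMutex;

// Stand-in for __cxa_guard_acquire: returns 1 if the caller must run the
// initializer (lock is then held), 0 if initialization already happened.
int sketchGuardAcquire(Guard &g) {
  guardMutex.lock();
  if (g.initialized.load(std::memory_order_relaxed) != 0) {
    guardMutex.unlock();
    return 0;
  }
  return 1;
}

// Stand-in for __cxa_guard_release: publish completion, then unlock.
void sketchGuardRelease(Guard &g) {
  g.initialized.store(1, std::memory_order_release);
  guardMutex.unlock();
}

static int ctorRuns = 0;
static int makeValue() { ++ctorRuns; return 42; }

static Guard valueGuard;  // models the guard variable global
static int valueStorage;  // models the static local's storage

int &getValue() {
  // Load the guard byte with acquire ordering (fast path when non-zero).
  if (valueGuard.initialized.load(std::memory_order_acquire) == 0) {
    // Guard byte was zero: try to acquire the guard.
    if (sketchGuardAcquire(valueGuard) != 0) {
      // Acquire returned non-zero: run the initializer (the "ctor region").
      valueStorage = makeValue();
      // Mark the variable initialized and release the guard.
      sketchGuardRelease(valueGuard);
    }
  }
  return valueStorage;
}
```

Calling `getValue()` repeatedly runs `makeValue()` only on the first call; later calls take the fast path after the acquire load.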
[CIR] Add CIRGen support for static local variables with non-constant initializers
This adds CIRGen infrastructure for C++ function-local static variables
that require guarded initialization (Itanium C++ ABI).
Changes:
- Add ASTVarDeclAttr to carry VarDecl AST through the pipeline
- Add emitGuardedInit() to CIRGenCXXABI for guarded initialization
- Add emitCXXGuardedInit() to CIRGenFunction
- Replace NYI in addInitializerToStaticVarDecl() with ctor region emission
- Set static_local attribute on GlobalOp and GetGlobalOp
The global's ctor region contains the initialization code, which will be
lowered by LoweringPrepare to emit the actual guard variable pattern with
__cxa_guard_acquire/__cxa_guard_release calls.
[CIR] Add ASTVarDeclInterface for AST attribute access
Add the ASTVarDeclInterface which provides methods to access clang AST
VarDecl information from CIR attributes. This interface enables:
- mangleStaticGuardVariable: Mangle guard variable names using clang's
  MangleContext
- isLocalVarDecl: Check if a variable is function-local
- getTLSKind: Get thread-local storage kind
- isInline: Check if the variable is inline
- getTemplateSpecializationKind: Get template specialization info
- getVarDecl: Direct access to the underlying VarDecl pointer
This infrastructure is needed for proper handling of static local
variables with guard variables in LoweringPrepare.
[msan][NFCI] Remove redundant tests from aarch64-bf16-dotprod-intrinsics.ll (#178832)
https://github.com/llvm/llvm-project/pull/178510#discussion_r2739401507
requested simplifying test cases by using parameters directly for the
intrinsic calls. Doing that reduces the test case to duplicates of
existing tests, thus this patch deletes the redundant tests.
[msan] Add intermediate verbosity instruction dump (#178771)
This patch does not change MSan's instrumentation.
-msan-dump-{heuristic,strict}-instructions currently prints out two
lines per instruction:
1) instruction name only e.g., `call llvm.aarch64.neon.uqsub.v16i8`
2) the full instruction, including actual variables e.g., `%vqsubq_v.i15
= call noundef <16 x i8> @llvm.aarch64.neon.uqsub.v16i8(<16 x i8>
%vext21.i, <16 x i8> splat (i8 1)), !dbg !66`
Option 1) is too sparse for some uses, because it does not contain the
return types or parameter types (although `.v16i8` is part of the
function name in this example, in general, the function name does not
describe the types completely; e.g., `<16 x float>
llvm.x86.avx512.mask.scalef.ps.512(<16 x float>, <16 x float>, <16 x
float>, i16, i32)`). OTOH option 2) can be too verbose because it
contains the actual variables.
[4 lines not shown]
[CIR][NFC] Cleanup some stale missing features markers (#179822)
This deletes a few missing features markers where the missing code had
actually been implemented and deletes a handful that were not being used
anywhere.