[C++20] [Modules] Write comments in C++20 modules' module file (#192398)
Previously we avoid writing the comments in C++20 modules' module file.
But this prevents LSP tools to read the comments in it. Although we
thought to add a new option for it and ask LSP to use the new option,
the cost of comments seems to be low and new option raises complexity,
so I prefer to write comments in C++20 modules' module file by default
now.
[MacroFusion] Early return when insts already clustered (#191710)
This patch adds an early return to `fuseInstructionPair()` when macro
fused instructions are already clustered, either by an earlier fusion or
another clustering like ld/st clustering, removing the assert.
The assert is generally wrong - there are edge cases where an earlier
ld/st clustering (before macro fusion) reached the assert because it
sets `ParentClusterIdx` and fails. For example, ADRP+LOAD/STORE on
AArch64, thought it seems to be a rare case because the addresses are
ususally unkown at compile time.
It doesn't effectively change how fusions are prioritized - early
fusions still win on fusion-fusion conflicts, like before. But it
changes how we resolve the edge case of ld/st-fusion conflicts:
Previously, fusions would effectively override ld/st clustering in this
case, given that we currently limits instruction membership to at most a
single cluster through `ParentClusterIdx`. Macro fusion runs after ld/st
[10 lines not shown]
Revert "[flang][cuda] Avoid false positive on multi device symbol with components" (#192393)
Reverts llvm/llvm-project#192177
this breaks some downstream testing
[HWASan] [compiler-rt] Add tag_bits option to HWASan alloc (#192386)
This can be used to make sure the allocator does not use the top bit of
the pointer. This is useful when HWASan is used in combination with
signed-integer-overflow detection. Some code uses arithmetic on intptr_t
that overflows for sufficiently large pointers.
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format: each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
[HWASan] [compiler-rt] Add tag_bits option to HWASan alloc (#191089)
This can be used to make sure the allocator does not use the top bit of
the pointer. This is useful when HWASan is used in combination with
signed-integer-overflow detection. Some code uses arithmetic on intptr_t
that overflows for sufficiently large pointers.
[clang] implement CWG2064: ignore value dependence for decltype
The 'decltype' for a value-dependent (but non-type-dependent) should be known,
so this patch makes them non-opaque instead.
This patch also implements what's neceessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.
This also readds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.
Fixes #8740
Fixes #61818
Fixes #190388
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format: each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format: each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
[X86][APX] Reset SubReg for dst and check isVirtual before getInterval/getPhys (#191765)
We have made sure dst operand never has a SubReg. We need to make sure
register is virtual when calling getInterval/getPhys.
[clang] implement CWG2064: ignore value dependence for decltype
The 'decltype' for a value-dependent (but non-type-dependent) should be known,
so this patch makes them non-opaque instead.
This patch also implements what's neceessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.
This also readds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.
Fixes #8740
Fixes #61818
Fixes #190388
[RISCV] Prefer LUI over PLUI.H on RV32. (#192340)
I don't think any of the cases PLUI.H can handle would be eligible for
C.LUI, but still figured it was best to use base ISA instructions when
possible.
[clang] Make serenity.cpp tests pass on clang-with-thin-lto-ubuntu (#192231)
LTO_FULL-NOT was definitely too generic and prone to matching unrelated
content. It would, as an example, match against the build path on
clang-with-thin-lto-ubuntu builder [1].
Making the match more restrictive should avoid this kind of issues.
[1] https://lab.llvm.org/buildbot/#/builders/127/builds/6956