[SlotIndexes] Make IndexListEntry/slot constructor private
This was made public only for some unit tests introduced in
e5e3dccd0741c2cf6e1885f0b6053fcfc6684102 that have now been removed.
Since they have been removed, make the method private to prevent misuse,
remove the warning now that misuse is prevented by visibility, and
remove the description of the destructor given it is redundant with the
code.
[flang-rt] Temporarily disable destructor call in OwningPtr::delete_ptr. (#182635)
This is causing failures in CUF testing, because the device compiler
cannot identify the static stack size for kernels.
[ThinLTO][MemProf] Support remark emission for thin link and use in MemProf (#182570)
Enable optimization remark emission during the ThinLTO thin link phase.
This is useful for global analysis passes like MemProf context
disambiguation which operate on the summary index and may need to
report diagnostics before any IR modules are available.
Key changes:
- Create a dummy function ("thinlto_remark_dummy") in a private Module
within the LTO class to provide the necessary Function context for
OptimizationRemarkEmitter.
- Update MemProfContextDisambiguation to use a callback for remark
emission, allowing it to report hinted sizes and other diagnostics
during the thin link.
- Ensure the dummy module and function are safely cleaned up at the end
of the LTO session via the LTO::cleanup mechanism.
[RISCV] Remove srl from (srl (and X, (1 << C)), C) used as czero.eqz/nez condition. (#182598)
(setne (and X, 1 << C), 0) is canonicalized to (srl (and X, (1 << C)),
C).
If this is later used as a czero.eqz/nez condition, we can remove
the srl if the and can be represented as an ANDI.
[HLSL] Enable `-Wconversion`, `-Wvector-conversion`, and `-Wmatrix-conversion` warnings for HLSL by default (#182607)
Fixes #180038 by enabling `-Wconversion`, `-Wvector-conversion`, and
`-Wmatrix-conversion` warnings for HLSL by default, both in the HLSL
clang driver and when fixing up clang invocations under HLSL in
CompilerInvocation.cpp (so that they are enabled even with clang -cc1).
This PR also updates existing tests to expect warnings that weren't
expected before, and removes the `-Wconversion` flags from existing HLSL
tests since it is now redundant due to being enabled by default.
Note that no existing HLSL tests use or exercise `-Wvector-conversion`
or `-Wmatrix-conversion`.
[AMDGPU] Use a general form of intrinsic for tensor load/store (#182334)
The intrinsic has five arguments for the tensor descriptor (D#), while the fifth one is reserved for future targets, and it will be silently ignored in codegen for gfx1250.
For tensor up to 2D, only the first two D# groups are meaningful and the rest should be zero-initialized.
[Hexagon] Handle subreg copies between DoubleRegs and IntRegs (#181360)
ISel can generate truncating COPYs from DoubleRegs to IntRegs when a
64-bit result (e.g., C2_mask) is used in a 32-bit context. Several
passes crashed on this pattern:
BitTracker asserted WD >= WS for COPY instructions. Handle the WD < WS
case by extracting the low WD bits from the source.
HexagonInstrInfo::copyPhysReg had no case for IntRegs <- DoubleRegs or
DoubleRegs <- IntRegs. Add both directions, respecting the subreg index
on the operand (isub_lo/isub_hi) when present.
HexagonTfrCleanup asserted that source and destination register sizes
match. Replace with proper subreg resolution on both operands and a
hasNoVRegs() guard since the pass runs post-RA.
HexagonGenMux asserted no subregs on physical register operands.
Preserve subreg information when building mux instructions and resolve
[6 lines not shown]
[HLSL][Matrix] Make matrix single element accessor return a scalar instead of vector (#182609)
Fixes #182599 by making `SemaHLSL::checkMatrixComponent` return the
element type instead of a vector when the number of vector components is
exactly 1.
[clang-doc] Improve complexity of Index construction
The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.
| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10 | 9,977 | 11,004 | 0.91x | +10.3% |
| 64 | 69,249 | 69,166 | 1.00x | -0.1% |
| 512 | 1,932,714 | 525,877 | 3.68x | -72.8% |
| 4,096 | 92,411,535 | 4,589,030 | 20.1x | -95.0% |
| 10,000 | 577,384,945 | 12,998,039 | 44.4x | -97.7% |
The patch delivers significant improvements to scalability. At 10,000
[13 lines not shown]
[clang-doc] Add basic benchmarks for library functionality
clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of GoogleBenchmarks that
exercise portions of the library. To start we try to track high level
items that we monitor via the TimeTrace functions, and give them their
own micro benchmarks. This should give us more confidence that switching
out data structures or updating algorthms will have a positive
performance impact.
Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
[AArch64][MachineOutliner] Clear debug locations on bundled instructions (#175655)
When the machine outliner duplicates instructions, it clears their debug
locations to avoid having the outlined function reference DISubprograms
from the original functions. However, this only cleared the debug
location on the bundle header, not on the individual instructions inside
the bundle.
This caused assertion failures in `LexicalScopes::getOrCreateRegularScope`,
because the bundled instructions still had debug locations pointing to
the original function's.
Fix this by iterating through all instructions in a bundle and clearing
their debug locations as well.
AMDGPU: Try to fix leak in AMDGPULibFunc (#182583)
I don't know why this was trying to do placement do. I guess
this was overriding the unique_ptr, bypassing its destructor.