[lldb/ScriptInterpreter] Fix typo in AbstractMethodCheckerPayload (NFC) (#170187)
This fixes a typo in ScriptedPythonInterface and changes
`AbstrackMethodCheckerPayload` to `AbstractMethodCheckerPayload`.
Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
[AMDGPU] Refactor hazard recognizer for VALU-pipeline hazards. NFCI. (#168801)
This is in preparation for handling these hazards in the scheduler. I do
not expect any changes to the produced code here; it is just infrastructure.
Our current problem with the VALU pipeline hazards is that we only
insert V_NOP instructions in the hazard recognizer mode, but ignore
them during scheduling. This patch is meant to create a mechanism to
actually account for that during scheduling.
Fix __apple_XXX iterator that iterates over all entries. (#157538)
The previous iterator for __apple_XXX sections was assuming that all
entries in the table would be contiguous and it wasn't using the offsets
table to access each chain of entries for a given name. This patch fixes
it so the iterator does the right thing.
This issue became apparent after a modification to strip template names
from DW_AT_name entries to allow adding both the template class base
name as an entry and also include the name with template names. The
commit hash is 2e7ee4dc21430b0fe4c9ee306dc1d8c7986a6646. The problem is
that if the name starts with a "<", it will try to split the name. So if
the name is `"<get-size>"`, it will return an empty string as the function
name, and this empty string gets added to the __apple_names table and
causes large delays when using the iterators.
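The difference between walking the data as one contiguous run and walking it through the offsets table can be sketched with a toy model. This is a minimal illustration, not the LLVM implementation: the `Entry` tuple, the `None` chain terminator, and the function names `iter_chain` and `iter_all_entries` are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical entry shape: (name, DIE offset). Not the real on-disk layout.
Entry = Tuple[str, int]


def iter_chain(data: List[Optional[Entry]], start: int) -> Iterator[Entry]:
    """Yield the entries of one chain, from `start` up to its None terminator."""
    i = start
    while i < len(data) and data[i] is not None:
        yield data[i]
        i += 1


def iter_all_entries(
    offsets: List[int], data: List[Optional[Entry]]
) -> Iterator[Entry]:
    """Visit every chain through the offsets table instead of assuming
    that all entries follow each other without gaps."""
    for start in offsets:
        yield from iter_chain(data, start)
```

With `data = [("foo", 16), ("foo", 32), None, None, ("bar", 48), None]` and `offsets = [0, 4]`, a walk that starts at index 0 and stops at the first terminator only sees the `foo` chain, while `iter_all_entries` also reaches `bar`.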
[lldb][windows] fix environment handling in CreateProcessW setup (#168733)
This patch refactors and documents the setup of the `CreateProcessW`
invocation in `ProcessLauncherWindows`. It's a dependency of
https://github.com/llvm/llvm-project/pull/168729.
`CreateEnvironmentBufferW` now sorts the environment variable keys
before concatenating them into a string. From [the `CreateProcess`
documentation](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw):
> An application must manually pass the current directory information to
the new process. To do so, the application must explicitly create these
environment variable strings, sort them alphabetically (because the
system uses a sorted environment), and put them into the environment
block. Typically, they will go at the front of the environment block,
due to the environment block sort order.
`GetFlattenedWindowsCommandStringW` now returns an error which will be
surfaced, instead of failing silently.
[2 lines not shown]
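The sorted environment block described above can be sketched as follows. This is an illustrative model only, not the code of `CreateEnvironmentBufferW`: the function name `build_environment_block` is hypothetical, and the case-insensitive sort order is an assumption (the commit message only says the keys are sorted).

```python
from typing import Mapping

NUL = "\0"


def build_environment_block(env: Mapping[str, str]) -> str:
    """Flatten `env` into NUL-terminated KEY=VALUE strings, sorted by key,
    followed by one extra NUL that terminates the whole block."""
    # Assumption: keys are ordered case-insensitively.
    keys = sorted(env, key=str.upper)
    if not keys:
        # An empty block is still terminated by two NUL characters.
        return NUL + NUL
    return "".join(f"{key}={env[key]}{NUL}" for key in keys) + NUL
```

For example, `build_environment_block({"b": "2", "A": "1"})` places `A=1` before `b=2` regardless of the order in which the mapping was populated.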
[TySan] Attempt to unbreak build after #169036
If tysan was not in COMPILER_RT_SANITIZERS_TO_BUILD, we used to
get an error after #169036, see comments there for details.
Reapply "[clangd] Make lit tests work with the internal shell" (#169972)
This reverts commit bd04ef6df50e8e6e5212762fc798ea9fbdcfc897.
This reapply fixes the broken case where we would fail at CMake
configuration time if LLVM_INCLUDE_BENCHMARKS was explicitly turned off.
[HLSL] Update indexed vector elements individually (#169144)
When an individual element of a vector is updated via indexing into the vector, it needs to be handled as a store operation on that one vector element.
Clang treats vectors as one unit, so when a vector element needs to be updated, the whole vector is loaded, the element is modified, and then the whole vector is stored. In HLSL, vector elements are handled individually. We need to avoid this load/modify/store sequence to prevent overwriting other vector elements that might be getting updated in parallel.
Fixes #167729
Contributes to #160208.
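Why the whole-vector load/modify/store sequence is a problem can be shown with a small deterministic simulation. This is a sketch in plain Python, not HLSL or Clang codegen; the two "threads" are simulated by a fixed interleaving, and all function names are hypothetical.

```python
from typing import List


def whole_vector_update(memory: List[int], index: int, value: int) -> List[int]:
    """Load the whole vector, modify one element, and return the copy
    that the caller will store back as a unit."""
    loaded = list(memory)   # load the whole vector
    loaded[index] = value   # modify a single element
    return loaded


def simulate_whole_vector_race() -> List[int]:
    """Two updaters both load before either one stores."""
    memory = [0, 0, 0, 0]
    a = whole_vector_update(memory, 0, 1)  # updater A writes element 0
    b = whole_vector_update(memory, 1, 2)  # updater B writes element 1
    memory[:] = a                          # A stores the whole vector
    memory[:] = b                          # B's store overwrites A's element
    return memory


def simulate_element_stores() -> List[int]:
    """Each updater stores only the element it owns."""
    memory = [0, 0, 0, 0]
    memory[0] = 1
    memory[1] = 2
    return memory
```

In this model `simulate_whole_vector_race()` loses the update to element 0, while `simulate_element_stores()` keeps both updates.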
[lldb] Fix a bug when disabling the statusline. (#169127)
Currently, disabling the statusline with `settings set show-statusline
false` leaves LLDB in a broken state. The same is true when trying to
toggle the setting again.
The issue was that setting the scroll window to 0 is apparently not
identical to setting it to the correct number of rows, even though some
documentation online incorrectly claims so.
Fixes #166608
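As a rough illustration of resetting the scroll window to an explicit row count rather than 0, the sketch below builds the standard DECSTBM escape sequence (`CSI top ; bottom r`). The helper names are hypothetical, and whether LLDB emits exactly this sequence is an assumption.

```python
ESC = "\x1b"


def set_scroll_region(top: int, bottom: int) -> str:
    """Return the DECSTBM sequence for a scroll region using 1-based rows."""
    return f"{ESC}[{top};{bottom}r"


def reset_scroll_region(rows: int) -> str:
    """Restore the scroll region to the full terminal height explicitly,
    instead of relying on how a terminal interprets a 0 parameter."""
    return set_scroll_region(1, rows)
```

A caller could obtain `rows` from `shutil.get_terminal_size().lines` and write the result of `reset_scroll_region(rows)` to the terminal.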
[LLDB][NativePDB] Look for PDBs in `target.debug-file-search-paths` (#169719)
Similar to DWARF's DWO, we should look for PDBs in
`target.debug-file-search-paths` if the PDB isn't at the original
location or next to the executable.
With this PR, the search order is as follows:
1. PDB path specified in the PE/COFF file
2. Next to the executable
3. In `target.debug-file-search-paths`
This roughly matches [the order Visual Studio
uses](https://learn.microsoft.com/en-us/visualstudio/debugger/specify-symbol-dot-pdb-and-source-files-in-the-visual-studio-debugger?view=vs-2022#where-the-debugger-looks-for-symbols),
except that we don't have a project folder and don't support symbol
servers.
Closes #125355 (though I think this is already fixed in the native
plugin).
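The search order listed above can be sketched as a simple candidate walk. This is an illustration under stated assumptions, not the plugin's code: `find_pdb` and its parameters are hypothetical, and the `exists` callback stands in for whatever file-system check the real implementation performs.

```python
import os
from pathlib import PureWindowsPath
from typing import Callable, Iterable, Optional


def find_pdb(
    recorded_pdb_path: str,
    exe_path: str,
    search_paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first existing candidate, trying in order:
    1. the PDB path recorded in the PE/COFF file,
    2. a file with the same name next to the executable,
    3. a file with the same name in each debug-file search path."""
    pdb_name = PureWindowsPath(recorded_pdb_path).name
    candidates = [
        recorded_pdb_path,
        str(PureWindowsPath(exe_path).parent / pdb_name),
    ]
    candidates += [str(PureWindowsPath(d) / pdb_name) for d in search_paths]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

Passing `exists` as a parameter keeps the ordering logic testable without touching the file system.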
[flang][cuda][NFC] Split allocation related operation conversion from other cuf operations (#169740)
Split AllocOp, FreeOp, AllocateOp and DeallocateOp from the other
conversions. Patterns are currently added to the base CUFOpConversion
when the option is enabled.
This split is a prerequisite for being more flexible about where in the
pipeline we do the allocation-related operation conversion.
AMDGPU/GlobalISel: Report RegBankLegalize errors using reportGISelFailure
Use standard GlobalISel error reporting with reportGISelFailure
and have the pass return false instead of calling llvm_unreachable.
Also enables -global-isel-abort=0 or 2 for -global-isel -new-reg-bank-select.
Note: new-reg-bank-select with abort 0 or 2 runs LCSSA,
while "intended use" without abort or with abort 1 does not run LCSSA.
[AMDGPU] Allow hazard checks for WMMA co-exec
Right now we are just inserting V_NOP instructions; try to schedule
something into the shadow instead.
It is still somewhat imprecise, for example AdvanceCycle() will
use TII.getNumWaitStates() anyway, but in a scheduling mode
we are not required to be precise. We must ultimately be precise
in the hazard recognizer mode. The EmittedInstrs buffer is also
limited to MaxLookAhead, even though VALU-only hazards may actually
never expire and would require an endless buffer. But that's OK, we can
at least mitigate what the buffer can hold. The buffer is also
currently much bigger than any of the VALU hazards may need.
That said, the rest of the 'fix*' functions here that use V_NOPs
can be changed the same way. This one is just the
worst because it may require up to 9 nops.
[AMDGPU] Refactor hazard recognizer for VALU-pipeline hazards. NFCI.
This is in preparation for handling these hazards in the scheduler. I do
not expect any changes to the produced code here; it is just infrastructure.
Our current problem with the VALU pipeline hazards is that we only
insert V_NOP instructions in the hazard recognizer mode, but ignore
them during scheduling. This patch is meant to create a mechanism to
actually account for that during scheduling.
GlobalISel: Stop using TPC to check if GlobalISelAbort is enabled
The new pass manager does not use TargetPassConfig.
GlobalISel requires TargetPassConfig for reportGISelFailure,
and its only actual use is to check if GlobalISelAbort is enabled.
TargetPassConfig uses TargetMachine to check if GlobalISelAbort is
enabled, but TargetMachine is also available from MachineFunction.
[VPlan] Improve code in VPInstruction::generate (NFC) (#169470)
Make miscellaneous improvements including inlining some expressions and
re-using the existing State.Builder reference.
[CUDA] Add device-side kernel launch support (#165519)
- CUDA's dynamic parallelism extension allows device-side kernel
launches, which share identical syntax with host-side launches, e.g.,
    kernel<<<Dg, Db, Ns, S>>>(arguments);
but differ in code generation. A device-side kernel launch is
eventually translated into the following sequence:
    config = cudaGetParameterBuffer(alignment, size);
    // set up arguments by copying them into `config`.
    cudaLaunchDevice(func, config, Dg, Db, Ns, S);
- To support the device-side kernel launch, 'CUDAKernelCallExpr' is
reused but its config expr is set to a call to 'cudaLaunchDevice'.
During code generation, 'CUDAKernelCallExpr' is expanded into the
aforementioned sequence.
[2 lines not shown]
[mlir][spirv] Enable block splitting for `spirv.Switch` (#170147)
This is not strictly necessary, since selection regions can now yield
values; however, splitting the block simplifies the code as it avoids
unnecessary values being sunk just to be later yielded.
[VPlan] Use wide IV if scalar lanes > 0 are used with scalable vectors. (#169796)
For scalable vectors, VPScalarIVStepsRecipe cannot create all scalar
step values. At the moment, it creates a vector, in addition to the
first lane. The only supported case for this is when only the last lane
is used. A recipe should not set both scalar and vector values.
Instead, we can simply use a vector induction. It would also be possible
to preserve the current vector code-gen, by creating VPInstructions
based on the first lane of VPScalarIVStepsRecipe, but using a vector
induction seems simpler.
PR: https://github.com/llvm/llvm-project/pull/169796
[SPIRV] Add legalization for long vectors (#169665)
This patch introduces the necessary infrastructure to legalize vector
operations on vectors that are longer than what the SPIR-V target
supports. For instance, shaders only support vectors up to 4 elements.
The legalization is done by splitting the long vectors into smaller
vectors of a legal size.
Specifically, this patch does the following:
- Introduces `vectorElementCountIsGreaterThan` and
`vectorElementCountIsLessThanOrEqualTo` legality predicates.
- Adds legalization rules for `G_SHUFFLE_VECTOR`,
`G_EXTRACT_VECTOR_ELT`,
`G_BUILD_VECTOR`, `G_CONCAT_VECTORS`, `G_SPLAT_VECTOR`, and
`G_UNMERGE_VALUES`.
- Handles `G_BITCAST` of long vectors by converting them to
`@llvm.spv.bitcast` intrinsics which are then legalized.
- Updates `selectUnmergeValues` to handle extraction of both scalars
[3 lines not shown]
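The splitting idea behind this legalization can be sketched on plain lists. This is a toy model rather than GlobalISel code: `split_vector` and `add_long_vectors` are hypothetical names, and the default limit of 4 elements is taken from the shader example in the commit message.

```python
from typing import List, Sequence


def split_vector(elements: Sequence[int], max_len: int = 4) -> List[List[int]]:
    """Split a long vector into consecutive pieces of at most `max_len` elements."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [
        list(elements[i : i + max_len]) for i in range(0, len(elements), max_len)
    ]


def add_long_vectors(
    a: Sequence[int], b: Sequence[int], max_len: int = 4
) -> List[int]:
    """Perform an element-wise add piece by piece on legal-sized vectors,
    then concatenate the partial results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pieces = [
        [x + y for x, y in zip(pa, pb)]
        for pa, pb in zip(split_vector(a, max_len), split_vector(b, max_len))
    ]
    return [value for piece in pieces for value in piece]
```

A 6-element vector becomes one 4-element piece and one 2-element piece, and the operation is applied to each piece separately before the results are joined.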
[AMDGPU][NPM] Preserve analyses in AMDGPURewriteAGPRCopyMFMA for NPM (#170130)
The pass preserved LiveStacksAnalysis but failed to preserve
LiveIntervalsAnalysis, LiveRegMatrixAnalysis, VirtRegMapAnalysis, and
SlotIndexesAnalysis under NPM. This caused these analyses to be
invalidated and recomputed, leading to incorrect behavior in subsequent
passes like VirtRegRewriter.
Fix by explicitly preserving all required analyses in the NPM version,
matching the legacy pass manager behavior.
---------
Co-authored-by: vikhegde <vikram.hegde at amd.com>