[lldb][Windows] Invalidate cached register values on thread stop (#192430)
Invalidate cached values in register context data structures on every
thread stop.
NativeRegisterContextRegisterInfo::InvalidateAllRegisters performs no
operation by default. Subclasses may override it to clear cached values
within their register context data structures whenever a thread stops.
This change sets up the infrastructure needed to cache the thread
context in NativeRegisterContextWindows_arm64, which will improve read
performance. Currently, the thread context is retrieved on every read or
write operation.
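As a rough illustration of the pattern described above, the following is a
minimal sketch, not the lldb implementation. RegisterContextBase,
CachingRegisterContext, ThreadContext and FetchFromOS are hypothetical
stand-ins; only the InvalidateAllRegisters name comes from the commit message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a thread context fetched from the OS.
struct ThreadContext {
  std::uint64_t pc = 0;
};

// Hypothetical base class: invalidation does nothing by default.
class RegisterContextBase {
public:
  virtual ~RegisterContextBase() = default;
  virtual void InvalidateAllRegisters() {}
};

// Hypothetical subclass that caches the context between thread stops.
class CachingRegisterContext : public RegisterContextBase {
public:
  // Returns the cached context, fetching it only when the cache is empty.
  const ThreadContext &GetContext() {
    if (!m_cached)
      m_cached = FetchFromOS();
    return *m_cached;
  }

  // Called on every thread stop: drop the cached context.
  void InvalidateAllRegisters() override { m_cached.reset(); }

  int fetch_count() const { return m_fetch_count; }

private:
  // Simulates the expensive OS query; each fetch yields a distinct pc.
  ThreadContext FetchFromOS() {
    ++m_fetch_count;
    return ThreadContext{0x1000u + static_cast<std::uint64_t>(m_fetch_count)};
  }

  std::optional<ThreadContext> m_cached;
  int m_fetch_count = 0;
};
```

In this sketch, repeated reads between stops reuse one fetch, and the first
read after InvalidateAllRegisters() triggers a fresh one.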
[AArch64] Improve post-inc stores of SIMD/FP values
Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32).
This avoids transferring the value through a GPR when storing.
Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.
[LoopPeel] Peel last iteration to enable load widening
In loops that contain multiple consecutive small loads (e.g., three
adjacent i8 loads), peeling the last iteration makes it safe to read
beyond the accessed region in the remaining iterations, enabling the use
of a wider load (e.g., i32) for the first N-1 iterations.
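The idea can be sketched at the source level as follows. This is a
hand-written C++ analogy under stated assumptions, not output of the pass:
SumRecords and its 3-byte record layout are hypothetical, and the buffer is
assumed to hold exactly 3*n bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sums the bytes of n consecutive 3-byte records stored in buf (3*n bytes).
std::uint32_t SumRecords(const std::uint8_t *buf, std::size_t n) {
  std::uint32_t sum = 0;
  if (n == 0)
    return sum;

  // First n-1 records: a 4-byte read stays in bounds because the fourth
  // byte belongs to the next record.
  for (std::size_t i = 0; i + 1 < n; ++i) {
    std::uint8_t wide[4];
    std::memcpy(wide, buf + 3 * i, sizeof(wide)); // one wide read
    sum += wide[0] + wide[1] + wide[2];           // fourth byte is ignored
  }

  // Peeled last record: only three byte-sized reads, so nothing past the
  // end of the buffer is touched.
  const std::uint8_t *last = buf + 3 * (n - 1);
  sum += last[0] + last[1] + last[2];
  return sum;
}
```

Without the peeled final iteration, the 4-byte read for the last record would
touch one byte past the end of the buffer.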
Patterns such as:
```
%a = load i8, ptr %p
%b = load i8, ptr %p+1
%c = load i8, ptr %p+2
...
%p.next = getelementptr i8, ptr %p, 3
```
can be transformed to:
```
%wide = load i32, ptr %p ; Read 4 bytes
[9 lines not shown]
Slightly adjust BUGS section for X509_addr_add_range()
Since x509_addr.c r1.95, X509_addr_add_range() clears the unused bits in
the maximum, so this is only true in some implementations.