-
Notifications
You must be signed in to change notification settings - Fork 69
Multiple LSU pipeline support #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@MayneMei Please reach out to Knute and see if he is available for a meeting to address your questions about debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you run make regress
the last message from ctest
is printed at the end of the run. Something like /path/to/LastTest.log
. Examine that file, it will tell which tests failed and how to run them.
core/LSU.cpp
Outdated
@@ -31,7 +33,7 @@ namespace olympia | |||
cache_read_stage_(cache_lookup_stage_ | |||
+ 1), // Get data from the cache in the cycle after cache lookup | |||
complete_stage_( | |||
cache_read_stage_ | |||
cache_read_stage_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please set up your editor to remove extraneous (dead) whitespace from end of lines.
core/LSU.cpp
Outdated
inst_ptr->isStoreInst() && (inst_ptr->getStatus() != Inst::Status::RETIRED); | ||
const bool cache_bypass = is_already_hit || !phy_addr_is_ready || is_unretired_store; | ||
//check if we can forward from store buffer first | ||
uint64_t load_addr = inst_ptr->getTargetVAddr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64_t
core/LSU.cpp
Outdated
ILOG("Store added to store buffer: " << inst_ptr); | ||
} | ||
|
||
LoadStoreInstInfoPtr LSU::findYoungestMatchingStore_(uint64_t addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method can be const
core/LSU.cpp
Outdated
LoadStoreInstInfoPtr matching_store = nullptr; | ||
|
||
for (auto it = store_buffer_.begin(); it != store_buffer_.end(); ++it) | ||
{ | ||
auto & store = *it; | ||
if (store->getInstPtr()->getTargetVAddr() == addr) | ||
{ | ||
matching_store = store; | ||
} | ||
} | ||
return matching_store; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::find_if
core/LSU.cpp
Outdated
auto delete_iter = sb_iter++; | ||
store_buffer_.erase(delete_iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto delete_iter = sb_iter++; | |
store_buffer_.erase(delete_iter); | |
sb_iter = store_buffer_.erase(sb_iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Knute, thank you for you suggestions. One question I have is that whether I need to make "data_forwarding" as an variable similar to "allow_speculative_load_exec_", so that user can manually choose whether to use this feature?
@MayneMei I fixed the macOS build. BTW I am going to release my VLSU branch, which may conflict with some of your LSU changes. Let me know if you have any trouble updating your branch. |
When I added print message inside the LSU.cpp, I can see the store buffer size was modified. But in the test it's always 0. However, the cycle time met the expectation of data forwarding, so I assume it's correct |
@MayneMei my recent PR introduces some merge conflicts for you. If you have any trouble with the merge, please let me know and I can help resolve it. |
@kathlenemagnus I resolved conflict. However during the regression test, there is an environment issue related to MacOS, and also test "BranchPred_test_Run" introduced a segfault. Other tests are all passed. Do you mind take a look at it? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a few changes, I think this will be good to go!
@MayneMei it looks like you rebased your branch incorrectly. The PR contains all of the fixes that are already in master -- perhaps you rebased all of the updates since you pulled last? |
@klingaard I reseted the commit and rebased it. |
Signed-off-by: MayneMei <[email protected]>
…for mem speculation false Signed-off-by: MayneMei <[email protected]>
Signed-off-by: MayneMei <[email protected]>
…ly can test by using auto regression test
core/lsu/LSU.cpp
Outdated
|
||
auto it = std::find_if(store_buffer_.rbegin(), store_buffer_.rend(), | ||
[addr](const auto& store) { | ||
return store->getInstPtr()->getTargetVAddr() == addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually incorrect. You're missing the size. If the store is only 1 byte and the load is expecting 4... boom.
The way this should be done is with a proper store queue or combining buffer that can mask which bytes overlap. If the overlap is partial, it's a miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @klingaard, is there any attribute I can use for the load/store inst's size? Attached is my idea of modifying this function.
LoadStoreInstInfoPtr LSU::findYoungestMatchingStore_(const uint64_t load_addr, const uint32_t load_size) const
{
// Implementation with find_if
auto it = std::find_if(store_buffer_.rbegin(), store_buffer_.rend(),
[load_addr, load_size](const auto& store) {
const auto& store_inst = store->getInstPtr();
uint64_t store_addr = store_inst->getTargetVAddr();
uint32_t store_size = store_inst->getMemAccessSize();
// Check if store fully covers the load
return (store_addr <= load_addr) &&
(store_addr + store_size >= load_addr + load_size);
});
return (it != store_buffer_.rend()) ? *it : nullptr;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will have to add the method to Inst.hpp
that gets the size:
// Get the data size in bytes
uint32_t getMemAccessSize() const { opcode_info->getDataSize() / 8; } // opcode_info's data size is in bits
Your implementation almost works. 😄
What if I have two stores that can supply the data to one load?
block-beta
columns 1
block:ID
StoreA ["Store A of 2 bytes"]
StoreB ["Store B of 2 bytes"]
end
space
LoadC ["Load C of 4 bytes"]
ID --> LoadC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @klingaard , I added a new function called tryStoreToLoadForwarding. Basically is adding a mask so that we make sure only all the bits needed by load are covered by store, then we mark could forward. Also added a paragraph under LSU.adoc to explain how it works
However, there is a build error,
"make[2]: Leaving directory '/home/runner/work/riscv-perf-model/riscv-perf-model/Release'
make[1]: *** [CMakeFiles/Makefile2:2159: test/CMakeFiles/regress.dir/rule] Error 2
make[1]: Leaving directory '/home/runner/work/riscv-perf-model/riscv-perf-model/Release'
make: *** [Makefile:839: regress] Error 2
BUILD_OLYMPIA=2
'[' 2 -ne 0 ']'
echo 'ERROR: build/regress of olympia FAILED!!!'
exit 1
ERROR: build/regress of olympia FAILED!!!"
Could you point me to where I could check this and try to fix it? I checked save artifacts but don't know what to look at. Really appreciate it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like either a syntax issue or you're missing an update: https://github.com/riscv-software-src/riscv-perf-model/actions/runs/15093255238/job/42424580055?pr=216#step:8:2035
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Everything should be fixed now. :)
Co-authored-by: Knute Lingaard (MIPS) <[email protected]> Signed-off-by: MayneMei <[email protected]>
Signed-off-by: MayneMei <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the proper fix. We can move forward with this, but I do have a suggestion for improving the performance/readability of it. Feel free to merge and make a subsequent PR to change the alg if you want.
const uint32_t store_size = store_inst_ptr->getMemAccessSize(); | ||
|
||
if (store_size == 0) { | ||
continue; // Skip stores that don't actually write data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which ones are those?
return false; | ||
} | ||
|
||
std::vector<bool> coverage_mask(load_size, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defaults to a specialization of std::vector
which is really a dynamic std::bitset
. Suggest using a boost::dynamic_bitset
for the implementation. See my next points...
Since we don't have a concept of a store buffer class, which could maintain the valid bits per cache line, you can kinda simulate that here.
- First, you can make a rule that store to load forwarding cannot cross a cache line. Makes things a little easier.
- If the load addr + size is good create a bitmask of the bytes that are needed by the load based on the size of the cache line
- Iterate the store buffer like you're doing
- Create a bitmask for the store (like you did for the load) and clear the bits in the load bit mask that match:
load_bytes_needed &= ~store_bytes; if (load_bytes_needed.none()) { return true; // we found all the stores that can forward to this load within the cache line }
Thanks for the suggestion! I should have a following PR in next week.
Knute Lingaard ***@***.***>于2025年5月26日 周一12:37写道:
… ***@***.**** approved this pull request.
Thanks for making the proper fix. We can move forward with this, but I do
have a suggestion for improving the performance/readability of it. Feel
free to merge and make a subsequent PR to change the alg if you want.
------------------------------
In core/lsu/LSU.cpp
<#216 (comment)>
:
> + std::vector<bool> coverage_mask(load_size, false);
+ uint32_t bytes_covered_count = 0;
+
+ // Iterate through the store_buffer_ from youngest to oldest.
+ // This ensures that if multiple stores write to the same byte,
+ // the data from the youngest store is effectively used.
+ for (auto it = store_buffer_.rbegin(); it != store_buffer_.rend(); ++it)
+ {
+ const auto& store_info_ptr = *it; // LoadStoreInstInfoPtr
+ const InstPtr& store_inst_ptr = store_info_ptr->getInstPtr();
+
+ const uint64_t store_addr = store_inst_ptr->getTargetVAddr();
+ const uint32_t store_size = store_inst_ptr->getMemAccessSize();
+
+ if (store_size == 0) {
+ continue; // Skip stores that don't actually write data.
Which ones are those?
------------------------------
In core/lsu/LSU.cpp
<#216 (comment)>
:
> +
+ store_buffer_.push_back(store_info_ptr);
+ ILOG("Store added to store buffer: " << inst_ptr);
+ }
+
+ bool LSU::tryStoreToLoadForwarding(const InstPtr& load_inst_ptr) const
+ {
+ const uint64_t load_addr = load_inst_ptr->getTargetVAddr();
+ const uint32_t load_size = load_inst_ptr->getMemAccessSize();
+
+ // A load must have a non-zero size to access memory.
+ if (load_size == 0) {
+ return false;
+ }
+
+ std::vector<bool> coverage_mask(load_size, false);
This defaults to a specialization of std::vector which is really a
dynamic std::bitset. Suggest using a boost::dynamic_bitset for the
implementation. See my next points...
Since we don't have a concept of a store buffer class, which could
maintain the valid bits per cache line, you can kinda simulate that here.
1. First, you can make a rule that store to load forwarding cannot
cross a cache line. Makes things a little easier.
2. If the load addr + size is good create a bitmask of the bytes that
are needed by the load based on the size of the cache line
3. Iterate the store buffer like you're doing
4. Create a bitmask for the store (like you did for the load) and
clear the bits in the load bit mask that match:
load_bytes_needed &= ~store_bytes;
if (load_bytes_needed.none()) {
return true; // we found all the stores that can forward to this load within the cache line
}
—
Reply to this email directly, view it on GitHub
<#216 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AQSAIYBUXLDB7WPTNGHL3M33ANUPLAVCNFSM6AAAAABRAUMZJKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDQNRZGA3DGOJXGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***
com>
|
Hi Arup and Knute, after I talked with Arup, I decided to implement store buffer first to enable data forwarding and then implement multi pipelines. Here is the draft pull. For my implementation, 30% tests passed, 69 tests failed out of 99, when I look those results I feel really lost on how to begin debug. Could you give me some guidance? Thank you!