Multiple LSU pipeline support #216

Open
wants to merge 36 commits into
base: master

MayneMei
Collaborator

@MayneMei MayneMei commented Nov 1, 2024

Hi Arup and Knute, after talking with Arup, I decided to implement the store buffer first to enable data forwarding, and then implement multiple pipelines. Here is the draft pull request. With my implementation, 30% of the tests pass (69 of 99 fail). Looking at those results, I feel lost on how to begin debugging. Could you give me some guidance? Thank you!

@arupc
Collaborator

arupc commented Nov 10, 2024

@MayneMei Please reach out to Knute and see if he is available for a meeting to address your questions about debugging.

Collaborator

@klingaard klingaard left a comment

When you run make regress, the last message from ctest, printed at the end of the run, points to a log file, something like /path/to/LastTest.log. Examine that file; it tells you which tests failed and how to run them.

core/LSU.cpp Outdated
@@ -31,7 +33,7 @@ namespace olympia
cache_read_stage_(cache_lookup_stage_
+ 1), // Get data from the cache in the cycle after cache lookup
complete_stage_(
cache_read_stage_
cache_read_stage_
Collaborator

Please set up your editor to remove extraneous (dead) whitespace from end of lines.

core/LSU.cpp Outdated
inst_ptr->isStoreInst() && (inst_ptr->getStatus() != Inst::Status::RETIRED);
const bool cache_bypass = is_already_hit || !phy_addr_is_ready || is_unretired_store;
//check if we can forward from store buffer first
uint64_t load_addr = inst_ptr->getTargetVAddr();
Collaborator

const uint64_t

core/LSU.cpp Outdated
ILOG("Store added to store buffer: " << inst_ptr);
}

LoadStoreInstInfoPtr LSU::findYoungestMatchingStore_(uint64_t addr)
Collaborator

Method can be const

core/LSU.cpp Outdated
Comment on lines 951 to 961
LoadStoreInstInfoPtr matching_store = nullptr;

for (auto it = store_buffer_.begin(); it != store_buffer_.end(); ++it)
{
    auto & store = *it;
    if (store->getInstPtr()->getTargetVAddr() == addr)
    {
        matching_store = store;
    }
}
return matching_store;
Collaborator

Use std::find_if
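
A minimal sketch of what that suggestion could look like, assuming the store buffer keeps entries in program order with the youngest at the back. StoreEntry, its fields, and findYoungestMatchingStore are hypothetical stand-ins for illustration, not the Olympia types. Searching with reverse iterators returns the last matching entry, which is what the hand-written loop computes by overwriting matching_store on every hit. Like that loop, it matches on address only.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <memory>

// Hypothetical stand-in for a store-buffer entry; not the real Olympia type.
struct StoreEntry
{
    uint64_t vaddr;  // target virtual address of the store
    uint64_t uid;    // illustrative age tag: larger means younger
};

using StoreEntryPtr = std::shared_ptr<StoreEntry>;

// Returns the youngest store whose address equals addr, or nullptr if none.
// Assumes entries are appended in program order, so the youngest is at the back.
StoreEntryPtr findYoungestMatchingStore(const std::deque<StoreEntryPtr> & store_buffer,
                                        const uint64_t addr)
{
    // Walking from rbegin() visits the youngest entries first,
    // so the first hit is the youngest match.
    const auto it = std::find_if(store_buffer.rbegin(), store_buffer.rend(),
                                 [addr](const StoreEntryPtr & store)
                                 { return store->vaddr == addr; });
    return (it != store_buffer.rend()) ? *it : nullptr;
}
```

The search stops at the first hit instead of scanning the whole buffer, and the function takes the buffer by const reference, which fits the earlier note that the method can be const.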

core/LSU.cpp Outdated
Comment on lines 1452 to 1453
auto delete_iter = sb_iter++;
store_buffer_.erase(delete_iter);
Collaborator

Suggested change
- auto delete_iter = sb_iter++;
- store_buffer_.erase(delete_iter);
+ sb_iter = store_buffer_.erase(sb_iter);
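
For context, a small sketch of the erase-while-iterating idiom behind this suggestion. BufferedStore, its retired flag, and eraseRetiredStores are hypothetical names, and std::deque is an assumed container type; the thread does not say what store_buffer_ actually is. The standard sequence containers' erase() returns the iterator following the removed element, so assigning it back keeps the loop iterator valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical store-buffer entry, for illustration only.
struct BufferedStore
{
    uint64_t vaddr;
    bool retired;
};

// Removes every retired entry and returns how many were removed.
std::size_t eraseRetiredStores(std::deque<BufferedStore> & store_buffer)
{
    std::size_t erased = 0;
    for (auto sb_iter = store_buffer.begin(); sb_iter != store_buffer.end();)
    {
        if (sb_iter->retired)
        {
            // erase() hands back the iterator after the removed element,
            // so no separate increment is needed on this path.
            sb_iter = store_buffer.erase(sb_iter);
            ++erased;
        }
        else
        {
            ++sb_iter;
        }
    }
    return erased;
}
```

The post-increment form (copy the iterator, advance, then erase the copy) is only safe for node-based containers such as std::list. With std::vector or std::deque, erasing can invalidate the already-advanced iterator, which is one reason to prefer the assignment form regardless of container.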

Collaborator Author

Hi Knute, thank you for your suggestions. One question: should I make "data_forwarding" a variable, similar to "allow_speculative_load_exec_", so that users can choose whether to enable this feature?

@kathlenemagnus
Collaborator

@MayneMei I fixed the macOS build.

BTW I am going to release my VLSU branch, which may conflict with some of your LSU changes. Let me know if you have any trouble updating your branch.

@MayneMei
Collaborator Author

MayneMei commented Dec 6, 2024

When I added a print message inside LSU.cpp, I could see that the store buffer size was being modified, but in the test it is always 0. However, the cycle time matched what I expect with data forwarding, so I assume it's correct.

@MayneMei MayneMei marked this pull request as ready for review December 16, 2024 04:02
@kathlenemagnus
Collaborator

@MayneMei my recent PR introduces some merge conflicts for you. If you have any trouble with the merge, please let me know and I can help resolve it.

@MayneMei
Collaborator Author

@kathlenemagnus I resolved the conflict. However, during the regression test there is an environment issue related to macOS, and the test "BranchPred_test_Run" segfaults. All other tests pass. Do you mind taking a look at it? Thank you!

@klingaard klingaard changed the title from "Mayne lsu" to "Multiple LSU pipeline support" Mar 14, 2025
Collaborator

@klingaard klingaard left a comment

After a few changes, I think this will be good to go!

@klingaard
Collaborator

@MayneMei it looks like you rebased your branch incorrectly. The PR contains all of the fixes that are already in master -- perhaps you rebased all of the updates since you pulled last?

@MayneMei
Collaborator Author

MayneMei commented Mar 23, 2025

@klingaard I reset the commit and rebased it.

core/lsu/LSU.cpp Outdated

auto it = std::find_if(store_buffer_.rbegin(), store_buffer_.rend(),
[addr](const auto& store) {
return store->getInstPtr()->getTargetVAddr() == addr;
Collaborator

This is actually incorrect. You're missing the size. If the store is only 1 byte and the load is expecting 4... boom.

The way this should be done is with a proper store queue or combining buffer that can mask which bytes overlap. If the overlap is partial, it's a miss.

Collaborator Author

Hi @klingaard, is there an attribute I can use to get a load/store instruction's size? Below is my idea for modifying this function.

LoadStoreInstInfoPtr LSU::findYoungestMatchingStore_(const uint64_t load_addr, const uint32_t load_size) const
    {
        // Implementation with find_if
        auto it = std::find_if(store_buffer_.rbegin(), store_buffer_.rend(),
            [load_addr, load_size](const auto& store) {
                const auto& store_inst = store->getInstPtr();
                uint64_t store_addr = store_inst->getTargetVAddr();
                uint32_t store_size = store_inst->getMemAccessSize();

                // Check if store fully covers the load
                return (store_addr <= load_addr) && 
                    (store_addr + store_size >= load_addr + load_size);
            });

        return (it != store_buffer_.rend()) ? *it : nullptr;
    }

Collaborator

You will have to add the method to Inst.hpp that gets the size:

    // Get the data size in bytes
    uint32_t getMemAccessSize() const { return opcode_info->getDataSize() / 8; }  // opcode_info's data size is in bits

Your implementation almost works. 😄

What if I have two stores that can supply the data to one load?

block-beta
columns 1
  block:ID
    StoreA ["Store A of 2 bytes"]
    StoreB ["Store B of 2 bytes"]
  end
  space
  LoadC ["Load C of 4 bytes"]
  ID --> LoadC
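
To make the two-store case concrete, here is a small sketch under assumed names (ByteRange, storeFullyCoversLoad, and storesTogetherCoverLoad are all hypothetical). It contrasts the single-store test from the proposed implementation with a per-byte coverage check over every store. It only answers whether the load's bytes are covered; it does not pick which store supplies each byte.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical address/size pair describing one memory access.
struct ByteRange
{
    uint64_t addr;
    uint32_t size;  // in bytes
};

// Single-store test: true only if this one store covers every byte of the load.
bool storeFullyCoversLoad(const ByteRange & store, const ByteRange & load)
{
    return (store.addr <= load.addr)
        && (store.addr + store.size >= load.addr + load.size);
}

// Per-byte test: true if the union of all stores covers every byte of the load.
bool storesTogetherCoverLoad(const std::vector<ByteRange> & stores, const ByteRange & load)
{
    std::vector<bool> covered(load.size, false);
    for (const auto & store : stores)
    {
        for (uint32_t i = 0; i < load.size; ++i)
        {
            const uint64_t byte_addr = load.addr + i;
            if (byte_addr >= store.addr && byte_addr < store.addr + store.size)
            {
                covered[i] = true;
            }
        }
    }
    return std::all_of(covered.begin(), covered.end(), [](const bool b) { return b; });
}
```

With Store A as {0x100, 2}, Store B as {0x102, 2}, and Load C as {0x100, 4}, storeFullyCoversLoad is false for A and for B individually, while storesTogetherCoverLoad is true for the pair. That is the case the single-store search misses.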

Collaborator Author

Hi @klingaard, I added a new function called tryStoreToLoadForwarding. It builds a mask and marks the load as forwardable only when every byte the load needs is covered by stores in the buffer. I also added a paragraph to LSU.adoc explaining how it works.

However, there is a build error,
"make[2]: Leaving directory '/home/runner/work/riscv-perf-model/riscv-perf-model/Release'
make[1]: *** [CMakeFiles/Makefile2:2159: test/CMakeFiles/regress.dir/rule] Error 2
make[1]: Leaving directory '/home/runner/work/riscv-perf-model/riscv-perf-model/Release'
make: *** [Makefile:839: regress] Error 2

BUILD_OLYMPIA=2
'[' 2 -ne 0 ']'
echo 'ERROR: build/regress of olympia FAILED!!!'
exit 1
ERROR: build/regress of olympia FAILED!!!"
Could you point me to where I can check this so I can try to fix it? I checked the saved artifacts but don't know what to look for. I really appreciate it!

Contributor

Collaborator Author

Thanks! Everything should be fixed now. :)

Collaborator

@klingaard klingaard left a comment

Thanks for making the proper fix. We can move forward with this, but I do have a suggestion for improving the performance/readability of it. Feel free to merge and make a subsequent PR to change the alg if you want.

const uint32_t store_size = store_inst_ptr->getMemAccessSize();

if (store_size == 0) {
continue; // Skip stores that don't actually write data.
Collaborator

Which ones are those?

return false;
}

std::vector<bool> coverage_mask(load_size, false);
Collaborator

This defaults to a specialization of std::vector which is really a dynamic std::bitset. Suggest using a boost::dynamic_bitset for the implementation. See my next points...

Since we don't have a concept of a store buffer class, which could maintain the valid bits per cache line, you can kinda simulate that here.

  1. First, you can make a rule that store to load forwarding cannot cross a cache line. Makes things a little easier.
  2. If the load addr + size is good (it does not cross a cache line), create a bitmask of the bytes needed by the load, based on the size of the cache line
  3. Iterate the store buffer like you're doing
  4. Create a bitmask for the store (like you did for the load) and clear the bits in the load bit mask that match:
         load_bytes_needed &= ~store_bytes;
         if (load_bytes_needed.none()) {
             return true; // we found all the stores that can forward to this load within the cache line
         }
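
The four steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: std::bitset<64> is swapped in for the suggested boost::dynamic_bitset to stay dependency-free, a 64-byte cache line is assumed, and MemAccess, byteMaskInLine, and canForwardFromStores are hypothetical names.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

constexpr uint32_t kCacheLineBytes = 64;  // assumed cache line size
using LineMask = std::bitset<kCacheLineBytes>;

// Hypothetical address/size pair; size is in bytes.
struct MemAccess
{
    uint64_t addr;
    uint32_t size;
};

// One bit per byte of cache line `line` that the access touches.
// Bytes that fall in a different line are ignored.
LineMask byteMaskInLine(const MemAccess & access, const uint64_t line)
{
    LineMask mask;
    for (uint32_t i = 0; i < access.size; ++i)
    {
        const uint64_t byte_addr = access.addr + i;
        if (byte_addr / kCacheLineBytes == line)
        {
            mask.set(byte_addr % kCacheLineBytes);
        }
    }
    return mask;
}

// store_buffer is assumed to be in program order, youngest at the back.
bool canForwardFromStores(const std::vector<MemAccess> & store_buffer, const MemAccess & load)
{
    // Step 1: forwarding may not cross a cache line.
    if (load.size == 0 || (load.addr % kCacheLineBytes) + load.size > kCacheLineBytes)
    {
        return false;
    }
    const uint64_t line = load.addr / kCacheLineBytes;

    // Step 2: bytes of the line that the load needs.
    LineMask load_bytes_needed = byteMaskInLine(load, line);

    // Steps 3 and 4: walk the stores and clear the bytes each one supplies.
    for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it)
    {
        const LineMask store_bytes = byteMaskInLine(*it, line);
        load_bytes_needed &= ~store_bytes;
        if (load_bytes_needed.none())
        {
            return true;  // every needed byte is covered by buffered stores
        }
    }
    return false;  // some bytes are not covered: treat as a miss
}
```

Because both masks are expressed relative to the same cache line, two 2-byte stores at 0x100 and 0x102 clear all four bits of a 4-byte load at 0x100, while an 8-byte load at the same address leaves bits set and falls through to a miss.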
    

@MayneMei
Collaborator Author

MayneMei commented May 26, 2025 via email

5 participants