author     Davit Grigoryan <[email protected]>   2024-09-29 20:46:04 -0700
committer  Davit Grigoryan <[email protected]>   2024-09-29 20:46:04 -0700
commit     6278c41e2771c7b8d63a3cc5e1f8313fd52cdb87 (patch)
tree       d11916cedf416cbe3d45b2ace5c02251a03cada5
parent     4f7f19f1f42871c080b9450394dd828e14953e82 (diff)
add notes for memory system
-rw-r--r--   spec/notes.txt   38
1 file changed, 38 insertions(+), 0 deletions(-)
diff --git a/spec/notes.txt b/spec/notes.txt
index db518b9..dc753db 100644
--- a/spec/notes.txt
+++ b/spec/notes.txt
@@ -263,6 +263,43 @@
 +-----------------------+
+* Scratchpad/Shared Memory & L1 Data Cache
+
+  "scratchpad memory" refers to the memory accessible to all threads in the CTA
+
+  NOTE: since each SIMT Core has its own L1 cache,
+        the L1 cache should only store "local memory" data and global read-only mem data
+
+  Memory access request:
+  1. A mem access req is sent from the load/store unit to the L1 cache.
+     Mem access reqs consist of mem addrs (one for each thread in the warp)
+     and the operation type (ld/st)
+  2. If the reqs cause bank conflicts, the arbiter splits them into two groups:
+     threads' reqs that do NOT cause bank conflicts and those that do;
+     only the conflict-free reqs are executed, and the instructions of the other group are replayed
+  3. Memory is direct mapped => the tag unit then determines the bank each thread will be using
+  4. The data is looked up in the bank and then returned to the thread's lane via the data crossbar
+  5. If a cache miss is registered, the req goes to the Pending Request Table (PRT)
+  6. Reqs from the PRT go to the MMU, which translates the virt. addr to a physical one
+     and sends it to the L2 cache -> DRAM;
+     the response is sent back to the Fill Unit, which fills the L1 cache with the results
+     and passes the info to the arbiter, which then re-reqs the loaded value (guaranteed hit this time)
+  7. All data write reqs go to a "Write Buffer", which forwards the requests via the
+     data crossbar to the local reg file or to the MMU for global memory writes
+
+
+* L1 Texture Cache
+
+  For now, make this memory unified with the L1 Data Cache (since only read-only values are cached)
+
+
+* On-Chip Interconnection Network
+
+  For high mem bandwidth, GPUs are connected to multiple DRAMs
+
+  Mem traffic gets distributed across the mem partitions using addr interleaving (crossbar)
+
+
+* Memory Partition Unit
+
+  Contains a portion of the L2 cache
+
+  Contains a mem access scheduler ("frame buffer")
+
+  Note: each mem partition is responsible for an exclusive addr range
+        => the L2 cache can be written to and read from (not limited to read-only data)
+
+
 ===================================
@@ -281,6 +318,7 @@
 + Will make the assembler
 + Have never built a compiler before, so will be learning it for this project
+
 ===================================
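
A rough C++ sketch of the arbiter's bank-conflict split from step 2 of the memory access flow above. The 32-bank count, 4-byte bank interleaving, and the broadcast of same-word accesses are illustrative assumptions, not details taken from the notes.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch of the arbiter's bank-conflict split (step 2 above).
// Assumptions (not from the notes): 32 banks, 4-byte bank interleaving,
// one request per active lane of the warp.
struct BankConflictSplit {
    std::vector<uint64_t> issue_now;  // conflict-free subset, executed this cycle
    std::vector<uint64_t> replay;     // conflicting subset, replayed later
};

BankConflictSplit split_by_bank(const std::vector<uint64_t>& lane_addrs,
                                unsigned num_banks = 32,
                                unsigned word_size = 4) {
    BankConflictSplit out;
    std::unordered_map<unsigned, uint64_t> bank_owner;  // bank -> first addr mapped to it
    for (uint64_t addr : lane_addrs) {
        unsigned bank = (addr / word_size) % num_banks;
        auto it = bank_owner.find(bank);
        if (it == bank_owner.end()) {
            bank_owner[bank] = addr;        // first use of this bank: no conflict
            out.issue_now.push_back(addr);
        } else if (it->second == addr) {
            out.issue_now.push_back(addr);  // same word: can be broadcast, no conflict
        } else {
            out.replay.push_back(addr);     // different word in the same bank: conflict
        }
    }
    return out;
}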
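
A minimal sketch of what the Pending Request Table in steps 5-6 could track, assuming misses to the same cache line are merged MSHR-style so only one request per line goes out to the MMU; the notes do not spell out this merging, so treat it as an assumption for illustration.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal Pending Request Table sketch (steps 5-6 above).
// Assumption (not from the notes): misses to the same cache line are merged,
// so only one request per line is outstanding at the MMU/L2.
class PendingRequestTable {
public:
    explicit PendingRequestTable(unsigned line_size) : line_size_(line_size) {}

    // Returns true if this miss created a new entry (a request must be sent
    // to the MMU); false if it was merged into an existing entry.
    bool add_miss(uint64_t addr, unsigned warp_id) {
        uint64_t line = addr / line_size_;
        bool is_new = entries_.find(line) == entries_.end();
        entries_[line].push_back(warp_id);
        return is_new;
    }

    // Called once the Fill Unit has written the line into L1: returns the
    // warps whose instructions the arbiter should replay (guaranteed hits now).
    std::vector<unsigned> fill(uint64_t addr) {
        uint64_t line = addr / line_size_;
        auto it = entries_.find(line);
        if (it == entries_.end()) return {};
        std::vector<unsigned> waiters = std::move(it->second);
        entries_.erase(it);
        return waiters;
    }

private:
    unsigned line_size_;
    std::unordered_map<uint64_t, std::vector<unsigned>> entries_;  // line -> waiting warps
};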
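
The addr interleaving across mem partitions (On-Chip Interconnection Network / Memory Partition Unit sections) comes down to one mapping function; the 256-byte interleaving granularity below is an assumed parameter, not something stated in the notes.

#include <cstdint>

// Sketch of address interleaving across memory partitions.
// Assumption (not from the notes): 256-byte interleaving granularity.
unsigned partition_of(uint64_t phys_addr,
                      unsigned num_partitions,
                      unsigned granularity = 256) {
    // Consecutive 256-byte chunks of the physical address space are spread
    // round-robin across the partitions, so each partition owns an exclusive,
    // strided slice of the address range, as the notes require.
    return static_cast<unsigned>((phys_addr / granularity) % num_partitions);
}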