author Davit Grigoryan <[email protected]> 2024-09-29 20:46:04 -0700
committer Davit Grigoryan <[email protected]> 2024-09-29 20:46:04 -0700
commit 6278c41e2771c7b8d63a3cc5e1f8313fd52cdb87
tree d11916cedf416cbe3d45b2ace5c02251a03cada5 /spec/notes.txt
parent 4f7f19f1f42871c080b9450394dd828e14953e82
add notes for memory system
Diffstat (limited to 'spec/notes.txt')
-rw-r--r-- spec/notes.txt 38
1 files changed, 38 insertions, 0 deletions
diff --git a/spec/notes.txt b/spec/notes.txt
index db518b9..dc753db 100644
--- a/spec/notes.txt
+++ b/spec/notes.txt
@@ -263,6 +263,43 @@
+-----------------------+
+* Scratchpad/Shared Memory & L1 Data Cache
+ + "scratchpad memory" refers to the memory accessible to all threads in the CTA
+ + NOTE: because each SIMT core has its own L1 cache (not kept coherent with the others),
+ L1 cache should only store "local memory" data and read-only global mem data
+ + Memory access request:
+ 1. Mem access req is sent from the load/store unit to the L1 cache
+ Mem access reqs consist of mem addrs (one for each thread in the warp) + operation type (ld/st)
+ 2. If the reqs cause bank conflicts -> split the reqs into two groups: [Arbiter]
+ threads' reqs that do NOT cause bank conflicts and those that do;
+ execute only the conflict-free group and replay the instruction for the other group
+ 3. Memory is direct mapped => tag unit then determines the bank each thread will be using
+ 4. The data is looked up in the bank and then returned to the thread's lane via the data crossbar
+ 5. If a cache miss is registered, the req goes to the Pending Request Table (PRT)
+ 6. Reqs from the PRT go to the MMU, which translates the virt. addr to a physical one
+ and then sends the req to L2 cache -> DRAM
+ the response is sent back to the Fill Unit, which fills the L1 cache with the results
+ and notifies the arbiter, which then re-issues the req for the loaded value (guaranteed hit this time)
+ 7. All data write reqs go to a "Write Buffer", which forwards the requests via
+ data crossbar to local reg file or to the MMU for global memory writes
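Review note on step 2: a minimal sketch of how the arbiter's split could look. The bank count, the word width, and the names `bank_of` and `split_by_bank_conflict` are all assumptions for illustration and do not come from the notes. The sketch also assumes that lanes reading the same word in a bank can be served together (broadcast), which the notes do not state.

```python
from typing import List, Optional, Tuple

NUM_BANKS = 32   # assumed number of banks
WORD_BYTES = 4   # assumed bank width in bytes


def bank_of(addr: int) -> int:
    """Map a byte address to a bank (consecutive words go to consecutive banks)."""
    return (addr // WORD_BYTES) % NUM_BANKS


def split_by_bank_conflict(
    addrs: List[Optional[int]],
) -> Tuple[List[int], List[int]]:
    """Split one warp's request into (accepted lanes, lanes to replay).

    addrs[i] is the address requested by lane i, or None if the lane is inactive.
    The first lane to claim a bank wins; later lanes that need a different
    word from the same bank are deferred to a replay.
    """
    claimed = {}  # bank -> word index being served this cycle
    accepted: List[int] = []
    replay: List[int] = []
    for lane, addr in enumerate(addrs):
        if addr is None:
            continue
        bank = bank_of(addr)
        word = addr // WORD_BYTES
        if bank not in claimed or claimed[bank] == word:
            claimed[bank] = word
            accepted.append(lane)
        else:
            replay.append(lane)
    return accepted, replay
```

For example, `split_by_bank_conflict([0, 4, 128, 8])` accepts lanes 0, 1 and 3 and defers lane 2, because addresses 0 and 128 fall into bank 0 but name different words.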
+
+
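Review note on steps 3 to 6: a rough sketch of the direct-mapped lookup, the Pending Request Table, and the fill path. The line size, the number of lines, and the `L1Cache` class are hypothetical, and address translation in the MMU is left out entirely.

```python
LINE_BYTES = 128  # assumed cache line size
NUM_LINES = 64    # assumed number of lines in the L1


class L1Cache:
    """Tag-only model of a direct-mapped L1 with a pending request table."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # one tag per line (direct mapped)
        self.prt = {}                   # line address -> lanes waiting on it

    def access(self, lane, addr):
        """Return True on a hit; on a miss, park the lane in the PRT."""
        line_addr = addr // LINE_BYTES
        index, tag = line_addr % NUM_LINES, line_addr // NUM_LINES
        if self.tags[index] == tag:
            return True
        self.prt.setdefault(line_addr, []).append(lane)
        return False

    def fill(self, line_addr):
        """Fill unit: install the line and return the lanes to replay."""
        index, tag = line_addr % NUM_LINES, line_addr // NUM_LINES
        self.tags[index] = tag
        return self.prt.pop(line_addr, [])
```

A miss followed by `fill` for the same line makes the replayed access hit, which matches the "guaranteed hit" in step 6 as long as no other fill evicts the line in between.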
+* L1 Texture Cache
+ + For now, make the memory unified with L1 Data Cache (since only read-only values are cached)
+
+
+* On-Chip Interconnection Network
+ + For high mem bandwidth, GPUs are connected to multiple DRAMs
+ + Mem traffic gets distributed across the mem partitions using addr interleaving (crossbar)
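Review note on address interleaving: one simple way to do it, sketched under assumed parameters. The partition count, the chunk size, and the function names are illustrative only. Consecutive chunks of the address space rotate across partitions, so each address belongs to exactly one partition.

```python
NUM_PARTITIONS = 6      # assumed number of memory partitions
INTERLEAVE_BYTES = 256  # assumed interleaving granularity


def partition_of(addr: int) -> int:
    """Pick the memory partition that owns this physical address."""
    return (addr // INTERLEAVE_BYTES) % NUM_PARTITIONS


def local_addr(addr: int) -> int:
    """Address of the same byte inside the owning partition."""
    chunk = addr // INTERLEAVE_BYTES
    offset = addr % INTERLEAVE_BYTES
    return (chunk // NUM_PARTITIONS) * INTERLEAVE_BYTES + offset
```

With these values, a linear sweep through memory in 256-byte steps visits partitions 0, 1, 2, 3, 4, 5 and then wraps back to 0, which spreads sequential traffic evenly.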
+
+
+* Memory Partition Unit
+ + Contains a portion of L2 cache
+ + Contains a mem access scheduler ("frame buffer")
+ + Note: each mem partition is responsible for an exclusive addr range
+ => L2 cache can be written to and read from (no need for const data only)
+
+
===================================
@@ -281,6 +318,7 @@
+ Will make the assembler
+ Have never built a compiler before, so will be learning it for this project
+
===================================