author     Davit Grigoryan <[email protected]>   2024-09-29 20:46:04 -0700
committer  Davit Grigoryan <[email protected]>   2024-09-29 20:46:04 -0700
commit     6278c41e2771c7b8d63a3cc5e1f8313fd52cdb87 (patch)
tree       d11916cedf416cbe3d45b2ace5c02251a03cada5
parent     4f7f19f1f42871c080b9450394dd828e14953e82 (diff)
add notes for memory system
-rw-r--r--   spec/notes.txt   38
1 file changed, 38 insertions(+), 0 deletions(-)
diff --git a/spec/notes.txt b/spec/notes.txt
index db518b9..dc753db 100644
--- a/spec/notes.txt
+++ b/spec/notes.txt
@@ -263,6 +263,43 @@
 +-----------------------+
+* Scratchpad/Shared Memory & L1 Data Cache
+
+  "scratchpad memory" refers to the memory accessible to all threads in the CTA
+
+  NOTE: since each SIMT Core has its own L1 cache,
+        the L1 cache should only store "local memory" data and global read-only mem data
+
+  Memory access request:
+  1. A mem access req is sent from the load/store unit to the L1 cache.
+     Mem access reqs consist of mem addrs (one for each thread in the warp)
+     and the operation type (ld/st)
+  2. If the reqs cause bank conflicts, the arbiter splits them into two groups:
+     threads' reqs that do NOT cause bank conflicts and those that do;
+     only the conflict-free reqs are executed, and the instructions of the other group are replayed
+  3. Memory is direct mapped => the tag unit then determines the bank each thread will be using
+  4. The data is looked up in the bank and then returned to the thread's lane via the data crossbar
+  5. If a cache miss is registered, the req goes to the Pending Request Table (PRT)
+  6. Reqs from the PRT go to the MMU, which translates the virt. addr to a physical one
+     and sends it to the L2 cache -> DRAM;
+     the response is sent back to the Fill Unit, which fills the L1 cache with the results
+     and passes the info to the arbiter, which then re-reqs the loaded value (guaranteed hit this time)
+  7. All data write reqs go to a "Write Buffer", which forwards the requests via the
+     data crossbar to the local reg file or to the MMU for global memory writes
+
+
+* L1 Texture Cache
+
+  For now, make this memory unified with the L1 Data Cache (since only read-only values are cached)
+
+
+* On-Chip Interconnection Network
+
+  For high mem bandwidth, GPUs are connected to multiple DRAMs
+
+  Mem traffic gets distributed across the mem partitions using addr interleaving (crossbar)
+
+
+* Memory Partition Unit
+
+  Contains a portion of the L2 cache
+
+  Contains a mem access scheduler ("frame buffer")
+
+  Note: each mem partition is responsible for an exclusive addr range
+        => the L2 cache can be written to and read from (not limited to read-only data)
+
+
 ===================================
@@ -281,6 +318,7 @@
 + Will make the assembler
 + Have never built a compiler before, so will be learning it for this project
+
 ===================================
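
A rough C++ sketch of the arbiter's bank-conflict split from step 2 of the memory access flow above. The 32-bank count, 4-byte bank interleaving, and the broadcast of same-word accesses are illustrative assumptions, not details taken from the notes.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch of the arbiter's bank-conflict split (step 2 above).
// Assumptions (not from the notes): 32 banks, 4-byte bank interleaving,
// one request per active lane of the warp.
struct BankConflictSplit {
    std::vector<uint64_t> issue_now;  // conflict-free subset, executed this cycle
    std::vector<uint64_t> replay;     // conflicting subset, replayed later
};

BankConflictSplit split_by_bank(const std::vector<uint64_t>& lane_addrs,
                                unsigned num_banks = 32,
                                unsigned word_size = 4) {
    BankConflictSplit out;
    std::unordered_map<unsigned, uint64_t> bank_owner;  // bank -> first addr mapped to it
    for (uint64_t addr : lane_addrs) {
        unsigned bank = (addr / word_size) % num_banks;
        auto it = bank_owner.find(bank);
        if (it == bank_owner.end()) {
            bank_owner[bank] = addr;        // first use of this bank: no conflict
            out.issue_now.push_back(addr);
        } else if (it->second == addr) {
            out.issue_now.push_back(addr);  // same word: can be broadcast, no conflict
        } else {
            out.replay.push_back(addr);     // different word in the same bank: conflict
        }
    }
    return out;
}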
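
A minimal sketch of what the Pending Request Table in steps 5-6 could track, assuming misses to the same cache line are merged MSHR-style so only one request per line goes out to the MMU; the notes do not spell out this merging, so treat it as an assumption for illustration.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal Pending Request Table sketch (steps 5-6 above).
// Assumption (not from the notes): misses to the same cache line are merged,
// so only one request per line is outstanding at the MMU/L2.
class PendingRequestTable {
public:
    explicit PendingRequestTable(unsigned line_size) : line_size_(line_size) {}

    // Returns true if this miss created a new entry (a request must be sent
    // to the MMU); false if it was merged into an existing entry.
    bool add_miss(uint64_t addr, unsigned warp_id) {
        uint64_t line = addr / line_size_;
        bool is_new = entries_.find(line) == entries_.end();
        entries_[line].push_back(warp_id);
        return is_new;
    }

    // Called once the Fill Unit has written the line into L1: returns the
    // warps whose instructions the arbiter should replay (guaranteed hits now).
    std::vector<unsigned> fill(uint64_t addr) {
        uint64_t line = addr / line_size_;
        auto it = entries_.find(line);
        if (it == entries_.end()) return {};
        std::vector<unsigned> waiters = std::move(it->second);
        entries_.erase(it);
        return waiters;
    }

private:
    unsigned line_size_;
    std::unordered_map<uint64_t, std::vector<unsigned>> entries_;  // line -> waiting warps
};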
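
The addr interleaving across mem partitions (On-Chip Interconnection Network / Memory Partition Unit sections) comes down to one mapping function; the 256-byte interleaving granularity below is an assumed parameter, not something stated in the notes.

#include <cstdint>

// Sketch of address interleaving across memory partitions.
// Assumption (not from the notes): 256-byte interleaving granularity.
unsigned partition_of(uint64_t phys_addr,
                      unsigned num_partitions,
                      unsigned granularity = 256) {
    // Consecutive 256-byte chunks of the physical address space are spread
    // round-robin across the partitions, so each partition owns an exclusive,
    // strided slice of the address range, as the notes require.
    return static_cast<unsigned>((phys_addr / granularity) % num_partitions);
}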