Parallel Computing
Lecture # 6

Parallel Computer Memory Architectures
Shared Memory
 General Characteristics:

 • Shared memory parallel computers vary widely, but generally
   have in common the ability for all processors to access all
   memory as a global address space.
 • Multiple processors can operate independently but share the
   same memory resources.
 • Changes in a memory location effected by one processor are
   visible to all other processors.
 • Shared memory machines can be divided into two main classes
   based upon memory access times: UMA and NUMA.
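
 A minimal sketch of the shared address space idea, using Python threads as
 stand-ins for processors (an assumption made only for illustration; in CPython
 the threads do not execute bytecode truly in parallel, but they do share one
 address space). The names `shared` and `worker` are hypothetical.

```python
import threading

# One list object, reachable from every thread in the process.
shared = [0] * 4


def worker(index):
    # Each thread writes to its own slot of the same list.
    shared[index] = index * index


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread sees every write without any explicit data transfer.
print(shared)  # [0, 1, 4, 9]
```

 No copy or message is needed: a write by one thread is simply there for the
 others to read, which is the property the bullets above describe.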
[Figure: Shared Memory (UMA)]
[Figure: Shared Memory (NUMA)]
Uniform Memory Access (UMA):
 • Most commonly represented today by Symmetric
   Multiprocessor (SMP) machines.
 • Identical processors.
 • Equal access and access times to memory.
 • Sometimes called CC-UMA (Cache Coherent UMA).
   Cache coherent means that if one processor updates a
   location in shared memory, all the other processors
   know about the update. Cache coherency is
   accomplished at the hardware level.
Non-Uniform Memory Access (NUMA):
 • Often made by physically linking two or more SMPs.
 • One SMP can directly access the memory of another SMP.
 • Not all processors have equal access time to all
   memories.
 • Memory access across the link is slower.
 • If cache coherency is maintained, it may also be
   called CC-NUMA (Cache Coherent NUMA).
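
 The effect of slower access across the link can be estimated with a simple
 weighted average. The sketch below is a rough model only; the function name
 and the latency figures are hypothetical and were chosen for illustration,
 not taken from any real machine.

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Expected latency when `local_fraction` of accesses stay on the local SMP."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies in nanoseconds (assumed values).
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (1.0, 0.75, 0.5):
    print(fraction, average_access_time(fraction, LOCAL_NS, REMOTE_NS))
```

 Under this model the average cost rises as more accesses cross the link,
 which is why data placement matters on NUMA machines.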
Advantages:

 • Global address space provides a user-friendly
   programming perspective to memory.
 • Data sharing between tasks is both fast and
   uniform due to the proximity of memory to CPUs.
Disadvantages:
 • The primary disadvantage is the lack of scalability between
   memory and CPUs. Adding more CPUs can geometrically
   increase traffic on the shared memory-CPU path and, for
   cache coherent systems, geometrically increase traffic
   associated with cache/memory management.
 • The programmer is responsible for synchronization constructs
   that ensure "correct" access of global memory.
 • Expense: it becomes increasingly difficult and expensive to
   design and produce shared memory machines with ever
   increasing numbers of processors.
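
 As an illustration of a synchronization construct, the sketch below protects
 a shared counter with a lock. It assumes Python threads as the tasks; the
 names `counter`, `counter_lock`, `add_many`, and `run` are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        # `counter += 1` is a read-modify-write, so only one thread
        # at a time is allowed to perform it.
        with counter_lock:
            counter += 1


def run(n_threads=4, times=10_000):
    """Start `n_threads` threads and return the final counter value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())
```

 Without the lock, two threads could read the same old value and one update
 would be lost; the hardware keeps caches coherent, but it does not make a
 multi-step update atomic.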
Distributed Memory
General Characteristics:
 • Like shared memory systems, distributed memory systems
   vary widely but share a common characteristic. Distributed
   memory systems require a communication network to
   connect inter-processor memory.
 • Processors have their own local memory. Memory
   addresses in one processor do not map to another
   processor, so there is no concept of global address space
   across all processors.
 • Because each processor has its own local memory, it
   operates independently. Changes it makes to its local
   memory have no effect on the memory of other processors.
   Hence, the concept of cache coherency does not apply.
Distributed Memory (cont.)
 • When a processor needs access to data in another
   processor, it is usually the task of the programmer
   to explicitly define how and when data is
   communicated. Synchronization between tasks is
   likewise the programmer's responsibility.
 • The network "fabric" used for data transfer varies
   widely, though it can be as simple as Ethernet.
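
 Explicit communication can be sketched with separate operating system
 processes that exchange messages over pipes. This is a small stand-in for a
 real network and message passing library, not a production pattern; the
 names `WORKER_SOURCE` and `distributed_sum` and the JSON message format are
 assumptions made for the example.

```python
import json
import subprocess
import sys

# Program run by each worker. A worker is a separate process with its own
# memory, so the only data it sees is what arrives as a message on stdin.
WORKER_SOURCE = """
import json, sys
chunk = json.loads(sys.stdin.read())   # explicit receive
print(json.dumps(sum(chunk)))          # explicit send of the partial result
"""


def distributed_sum(data, n_workers=2):
    """Scatter `data` across worker processes and gather their partial sums."""
    chunks = [data[i::n_workers] for i in range(n_workers)]
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in chunks
    ]
    partial_sums = []
    for worker, chunk in zip(workers, chunks):
        # communicate() sends the message, closes the pipe, and waits
        # for the worker's reply.
        reply, _ = worker.communicate(json.dumps(chunk))
        partial_sums.append(json.loads(reply))
    return sum(partial_sums)


print(distributed_sum(list(range(10)), n_workers=3))
```

 The programmer decides what is sent, to whom, and when the results are
 collected; nothing is shared implicitly.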
Distributed Memory (cont.)
Advantages:
 • Memory is scalable with the number of processors.
   Increase the number of processors and the size of
   memory increases proportionately.
 • Each processor can rapidly access its own memory
   without interference and without the overhead
   incurred in trying to maintain cache coherency.
 • Cost effectiveness: can use commodity, off-the-shelf
   processors and networking.
Distributed Memory (cont.)
 Disadvantages:
 • The programmer is responsible for many of the
   details associated with data communication
   between processors.
 • It may be difficult to map existing data structures,
   based on global memory, to this memory
   organization.
 • Memory access times are non-uniform: data held by a
   remote processor must cross the network and takes
   longer to reach than local data.
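
 Mapping a global data structure onto distributed memory usually starts with
 an index translation. The sketch below assumes a simple block distribution of
 a one-dimensional array; the function names and the choice of block layout
 are hypothetical, and other layouts (for example cyclic) are equally valid.

```python
def block_size(n_items, n_procs):
    """Items per processor, rounded up so every item has an owner."""
    return -(-n_items // n_procs)  # ceiling division


def block_owner(global_index, n_items, n_procs):
    """Translate a global index into (rank, local_index)."""
    if not 0 <= global_index < n_items:
        raise IndexError("global_index out of range")
    size = block_size(n_items, n_procs)
    return global_index // size, global_index % size


def to_global(rank, local_index, n_items, n_procs):
    """Translate (rank, local_index) back into the global index."""
    return rank * block_size(n_items, n_procs) + local_index


# 10 items over 4 processors: blocks of 3, so item 7 lives on rank 2, slot 1.
print(block_owner(7, 10, 4))
```

 Every access that used to be a plain array index now has to ask which
 processor owns the element, which is part of the mapping effort noted above.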
Hybrid Distributed-Shared Memory
 • The largest and fastest computers in the world today
   employ both shared and distributed memory
   architectures.
Hybrid Distributed-Shared Memory (cont.)
 • The shared memory component is usually a cache
   coherent SMP machine. Processors on a given SMP
   can address that machine's memory as global.
 • The distributed memory component is the
   networking of multiple SMPs. SMPs know only
   about their own memory, not the memory on
   another SMP. Therefore, network communications
   are required to move data from one SMP to
   another.
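
 The two levels can be combined in one small sketch: each "SMP" is modeled as
 a separate process whose threads share that process's memory, and the
 processes exchange data only through messages. This is an illustration under
 those assumptions, not how production systems are built; `NODE_SOURCE`,
 `hybrid_sum`, and the two-threads-per-node choice are hypothetical.

```python
import json
import subprocess
import sys

# Program run on each simulated SMP node. Inside a node, both threads read
# the same `data` list and write to the same `partials` list directly.
NODE_SOURCE = """
import json, sys, threading

data = json.loads(sys.stdin.read())      # message received from outside
partials = [0, 0]                        # shared by the threads on this node

def work(tid):
    partials[tid] = sum(data[tid::2])    # direct access to node-local memory

threads = [threading.Thread(target=work, args=(t,)) for t in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(json.dumps(sum(partials)))         # reply sent back as a message
"""


def hybrid_sum(data, n_nodes=2):
    """Sum `data` using message passing between nodes, threads within them."""
    chunks = [data[i::n_nodes] for i in range(n_nodes)]
    nodes = [
        subprocess.Popen(
            [sys.executable, "-c", NODE_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in chunks
    ]
    total = 0
    for node, chunk in zip(nodes, chunks):
        # Data moves between nodes only through this explicit exchange.
        reply, _ = node.communicate(json.dumps(chunk))
        total += json.loads(reply)
    return total


print(hybrid_sum(list(range(100)), n_nodes=3))
```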
Hybrid Distributed-Shared Memory (cont.)
 • Current trends seem to indicate that this type of
   memory architecture will continue to prevail and
   increase at the high end of computing for the
   foreseeable future.
 • Advantages and Disadvantages: whatever is
   common to both shared and distributed memory
   architectures.
