DIGITAL TECH PRESENTATION

Pham Khanh

Created on July 12, 2022

Transcript

Multicore Computer


PRESENTATION by Group 8

Multicore Processor

(Diagram: a multicore processor chip composed of multiple cores)

Hardware Performance

Phạm Minh Khánh

Increase in Parallelism and Complexity

(Diagram: an instruction pipeline with Fetch, Decode, Execute, and Store stages)

Pipelining

Pipelining is a technique of decomposing a sequential process into sub-processes, so that the Fetch, Decode, and Execute stages of successive instructions overlap in time. A task of the same complexity can thus be implemented by a pipelined processor with higher instruction throughput.

(Diagram: several instructions progressing through the Fetch, Decode, and Execute stages over time)
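The benefit of overlapping stages can be sketched with a toy cycle-count model (the stage count and the one-cycle-per-stage latency are illustrative assumptions, not figures from the presentation):

```python
# Toy model of pipelining: a k-stage pipeline overlaps instructions,
# so n instructions take roughly k + (n - 1) cycles instead of k * n.

def sequential_cycles(n_instructions, stages=4):
    """Each instruction runs Fetch/Decode/Execute/Store to completion."""
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages=4):
    """Stages overlap: fill the pipe once, then retire one per cycle."""
    return stages + (n_instructions - 1)

n = 100
print(sequential_cycles(n))                         # 400
print(pipelined_cycles(n))                          # 103
print(sequential_cycles(n) / pipelined_cycles(n))   # ~3.9x throughput gain
```

For large n the speedup approaches the number of stages, which is why deeper pipelines were an early route to higher performance.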

Superscalar

A superscalar processor constructs multiple pipelines by replicating execution resources. This allows multiple instructions to be executed in parallel pipelines at the same time, as long as hazards are avoided.

Simultaneous Multithreading (SMT)

SMT is a technique in which a CPU presents each of its physical cores as multiple virtual cores, known as hardware threads. This is done to increase performance by letting each core run two instruction streams at once.
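Why a second instruction stream helps can be sketched with a made-up utilization model (the `util` parameter and all numbers are illustrative assumptions, not vendor data):

```python
# Toy utilization model of SMT: a single stream only issues an
# instruction on `util` of its cycles (the rest are stalls), so a
# second hardware thread can fill the idle issue slots.

def cycles_alone(ops, util):
    """Cycles for one stream running alone on the core."""
    return ops / util

def cycles_smt(ops, util, n_threads=2):
    """Cycles for n identical streams sharing one core's issue slots."""
    total_util = min(1.0, n_threads * util)
    return n_threads * ops / total_util

print(2 * cycles_alone(1000, 0.6))  # ~3333 cycles: two streams back-to-back
print(cycles_smt(1000, 0.6))        # 2000 cycles: SMT interleaves the streams
```

The model also shows SMT's limit: once the combined utilization hits 100% of the issue slots, adding more threads no longer helps.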

Multicore processor

(Diagram: a multicore CPU with cores 1 through n, each core — superscalar or SMT — with its own L1-I and L1-D caches, and all cores sharing an L2 cache)

Power Consumption

(Chart: power density in watts/cm² versus process geometry — 0.25, 0.18, and 0.13 µm — for logic and memory; logic power density rises steeply from around 10 toward 100 W/cm² as features shrink)

How to use all those logic transistors?

Pollack's rule

Pollack's Rule

Performance increase is roughly proportional to the square root of the increase in complexity. In other words: if the transistor logic doubles (×2), performance increases by roughly 40% (√2 ≈ 1.41).
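Pollack's rule as a one-liner (the function name is ours):

```python
import math

# Pollack's rule: performance ~ sqrt(complexity), with complexity
# proxied by the logic transistor count (or logic area).
def pollack_speedup(complexity_factor):
    return math.sqrt(complexity_factor)

print(pollack_speedup(2))   # ~1.41: doubling logic gives ~40% more speed
print(pollack_speedup(4))   # 2.0: quadrupling logic only doubles speed
```

This diminishing return is the argument for spending transistors on more cores rather than on ever-larger single cores.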

Software Performance Issues

Phạm Minh Khánh

Software on Multicore

Amdahl's Law

Used to calculate how much a computation can be sped up by running part of a program in parallel.

A program consists of a part which cannot be parallelized and a part which can be parallelized.

Let's say:

T = total time of serial execution
B = total time of the non-parallelizable part
T - B = total time of the parallelizable part
N = the number of threads or CPUs

Note

Normalize T = 1. Amdahl's law then gives the speedup as:

Speedup = T / (B + (T - B) / N) = 1 / (B + (1 - B) / N)

More threads per CPU equals faster execution time.

Speedup = Original Execution Time / Execution Time after Enhancement

Example: the execution time of half of the program can be accelerated by a factor of 2. What is the overall program speedup?

With B = 0.5 and N = 2:

Speedup = 1 / (0.5 + (1 - 0.5) / 2) = 1 / 0.75 ≈ 1.33
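The formula and the worked example can be checked with a small helper (the function name is ours):

```python
def amdahl_speedup(B, N):
    """Amdahl's law with total serial time normalized to T = 1:
    B is the non-parallelizable fraction, N the number of threads/CPUs."""
    return 1 / (B + (1 - B) / N)

print(round(amdahl_speedup(0.5, 2), 2))     # 1.33, the worked example above
print(round(amdahl_speedup(0.5, 1000), 2))  # ~2.0: capped near 1/B however large N gets
```

The second call illustrates the law's key consequence: the serial fraction B bounds the achievable speedup at 1/B, no matter how many cores are added.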

As the number of processors increases, the amount of time required for the parallel portion of each program decreases, but the serial portion of each program stays the same.

Kinds of applications that can benefit from multicore:

Multithreaded native applications
Multiprocess applications
Java applications
Multi-instance applications

From Valve's perspective, threading granularity options are defined as follows:

• Coarse-grained threading
• Fine-grained threading
• Hybrid threading
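The difference between coarse and fine granularity can be illustrated with a hypothetical game-style workload (the function names and the use of Python's ThreadPoolExecutor are purely illustrative; Valve's engine is C++ and its subsystems are far more complex):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-frame subsystems, used only for illustration.
def run_physics(frame): return f"physics:{frame}"
def run_ai(frame):      return f"ai:{frame}"

# Coarse-grained: one thread per whole subsystem (physics, AI, ...).
def frame_coarse(frame):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_physics, frame), pool.submit(run_ai, frame)]
        return [f.result() for f in futures]

# Fine-grained: one subsystem's work split across many workers over its data.
def frame_fine(entities):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_ai, entities))

print(frame_coarse(1))        # ['physics:1', 'ai:1']
print(frame_fine([1, 2, 3]))  # ['ai:1', 'ai:2', 'ai:3']
```

Hybrid threading mixes the two: a few subsystem threads, with the heaviest subsystems internally data-parallel.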

Hardware Performance

Nguyễn Tiến Hưng

Four general organizations for multicore systems:

(a) Dedicated L1 cache
(b) Dedicated L2 cache
(c) Shared L2 cache
(d) Shared L3 cache

The use of a shared L2 cache on the chip has several advantages over exclusive reliance on dedicated caches:

• Constructive interference can reduce overall miss rates
• Data shared by multiple cores is not replicated at the shared cache level
• Threads that exhibit less locality can employ more of the cache
• Interprocessor communication is easy to implement, via shared memory locations
• Confining the cache coherence problem to the L1 level can provide an additional performance advantage

INTEL x86 MULTICORE ORGANIZATION

Phạm Huy Hoàng

Intel Core Duo

First introduced in 2006.
Implements two x86 superscalar processors with a shared L2 cache; each core has its own dedicated L1 cache, consisting of a 32-KB instruction cache and a 32-KB data cache.
The 2-MB L2 cache logic allows for a dynamic allocation of cache space based on current core needs, so that one core can be assigned up to 100% of the L2 cache.
MESI (Modified, Exclusive, Shared, Invalid) support for the L1 caches, extended to support multiple Core Duo chips in a symmetric multiprocessor (SMP) configuration.
Each core has an independent thermal control unit, designed to manage chip heat dissipation to maximize processor performance.

The Advanced Programmable Interrupt Controller (APIC) performs a number of functions:
• Provides interprocessor interrupts, which allow any processor to interrupt any other processor or set of processors
• Accepts I/O interrupts and routes them to the appropriate core
• Includes a timer in each APIC, which can be set by the OS to generate an interrupt to the local core

The power management logic is responsible for reducing power consumption when possible. In essence, it monitors thermal conditions and CPU activity and adjusts voltage levels and power consumption appropriately.
The bus interface connects to the external bus, known as the Front Side Bus, which connects to main memory, I/O controllers, and other processor chips.

The Intel Core i7-990X

The Core i7 line was first introduced in November 2008.
The i7-990X implements six x86 simultaneous multithreading (SMT) processors, each with a dedicated L2 cache, plus a shared L3 cache.
The Core i7-990X chip supports two forms of external communication to other chips: the DDR3 memory controller and the QuickPath Interconnect.
• The DDR3 memory controller brings the memory controller for the DDR main memory onto the chip. The interface supports three channels that are 8 bytes wide, for a total bus width of 192 bits and an aggregate data rate of up to 32 GB/s.
• The QuickPath Interconnect (QPI) enables high-speed communications among connected processor chips. The QPI link operates at 6.4 GT/s (gigatransfers per second).
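The quoted bus width and bandwidth figures can be sanity-checked with a back-of-the-envelope calculation, assuming a DDR3-1333 transfer rate per channel (an assumption on our part; the slide gives only the aggregate numbers):

```python
# Sanity check of the quoted Core i7-990X memory figures,
# assuming DDR3-1333 (~1.333e9 transfers/s per channel).
channels = 3
channel_width_bytes = 8          # each channel is 8 bytes (64 bits) wide
transfers_per_second = 1.333e9

bus_width_bits = channels * channel_width_bytes * 8
bandwidth_gb_per_s = channels * channel_width_bytes * transfers_per_second / 1e9

print(bus_width_bits)             # 192, matching the quoted total bus width
print(round(bandwidth_gb_per_s))  # 32, matching the quoted ~32 GB/s
```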

Cache Latency Comparison

CPU           Clock Frequency   L1 Cache   L2 Cache    L3 Cache
Core 2 Quad   2.66 GHz          3 cycles   15 cycles   n/a
Core i7       2.66 GHz          4 cycles   11 cycles   39 cycles

ARM11 MPCORE

Nguyễn Tiến Hưng

The ARM11 MPCore is a multicore product based on the ARM11 processor family

The ARM11 MPCore can be configured with up to four processors per chip, each with its own L1 instruction and data caches. Its key components:

• Distributed interrupt controller (DIC): Handles interrupt detection and interrupt prioritization, and distributes interrupts to individual processors
• Timer: Each CPU has its own private timer that can generate interrupts
• Watchdog: Issues warning alerts in the event of software failures
• CPU interface: Handles interrupt acknowledgment, interrupt masking, and interrupt completion acknowledgement
• CPU: A single ARM11 processor; individual CPUs are referred to as MP11 CPUs
• Vector floating-point (VFP) unit: A coprocessor that implements floating-point operations in hardware
• L1 cache: Each CPU has its own dedicated L1 data cache and L1 instruction cache
• Snoop control unit (SCU): Responsible for maintaining coherency among the L1 data caches

Interrupt Handling

The Distributed Interrupt Controller (DIC) collates interrupts from a large number of sources. It provides:
• Distribution of the interrupts to the target MP11 CPUs
• Masking of interrupts
• Tracking the status of interrupts
• Prioritization of the interrupts
• Generation of interrupts by software

The DIC is designed to satisfy two functional requirements:
• Provide a means of routing an interrupt request to a single CPU or multiple CPUs, as required
• Provide a means of interprocessor communication so that a thread on one CPU can cause activity by a thread on another CPU

The DIC can route an interrupt to one or more CPUs in the following three ways:
• An interrupt can be directed to a specific processor only
• An interrupt can be directed to a defined group of processors. The MPCore views the first processor to accept the interrupt, typically the least loaded, as being best positioned to handle the interrupt
• An interrupt can be directed to all processors
The DIC is configurable to support between 0 and 255 hardware interrupt inputs
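The three routing modes above can be sketched as a toy dispatcher (this illustrates the idea only; it is not ARM's actual programming interface, and the "least loaded accepts first" heuristic stands in for real hardware arbitration):

```python
# Toy dispatcher for the three DIC routing modes described above.
def route(cpu_load, mode, target=None, group=None):
    """Return the list of CPUs that end up handling the interrupt."""
    if mode == "specific":            # directed to one processor only
        return [target]
    if mode == "group":               # first (here: least loaded) CPU accepts
        return [min(group, key=lambda cpu: cpu_load[cpu])]
    if mode == "broadcast":           # directed to all processors
        return list(cpu_load)
    raise ValueError(f"unknown routing mode: {mode}")

load = {"cpu0": 5, "cpu1": 1, "cpu2": 3, "cpu3": 0}
print(route(load, "specific", target="cpu2"))        # ['cpu2']
print(route(load, "group", group=["cpu0", "cpu1"]))  # ['cpu1'] (least loaded)
print(route(load, "broadcast"))                      # all four CPUs
```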

The Interrupt Distributor transmits to each CPU Interface the highest-priority pending interrupt for that interface.
It receives back the information that the interrupt has been acknowledged, and can then change the status of the corresponding interrupt. The CPU Interface also transmits End of Interrupt Information (EOI), which enables the Interrupt Distributor to update the status of this interrupt from Active to Inactive
Thanks

Pham Minh Khanh
Ho Hoang Dung
Nguyen Tien Hung
Pham Huy Hoang