## DESIGNCON<sup>®</sup> 2015

## CRITICAL MEMORY PERFORMANCE METRICS FOR DDR4 SYSTEMS

Barbara Aichinger, Vice President, New Business Development, FuturePlus Systems Corporation




January 27-30, 2015 | Santa Clara Convention Center | Santa Clara, CA

### Outline

- How can we monitor very fast, low-voltage DDR4 memory without affecting the system?
- Traditional Performance Metrics
  - Bandwidth, Latency and Power Management
- New Performance Metrics
  - Bus Mode Analysis
  - Page Hit Analysis
  - Multiple Open Banks Analysis
  - Bank Utilization Analysis
  - Row Hammer (excessive Activate) Detection
- Measured DDR4 Performance in a real target






### How to Monitor the DDR4 Memory

- Use a slot interposer to 'listen' to the traffic between the DIMM and the Memory Controller
  - A small amount of current is 'tapped' off the bus
  - Only the Address, Command and Control bus needs to be monitored
  - Same probing method can be used for SODIMM and 'memory down' using a BGA adapter






#### **DDR4 DIMM Interposer**








### Monitoring the DDR4 Memory

- Highly attenuated signals are then amplified and the bus is 'replicated' inside the equipment








### The system boots and runs, never knowing the equipment is present







### **Performance Metrics**

- Billions of clock cycles every second
- Old methods of measurement capture a small amount of time in large trace buffers
- New method uses sophisticated event counters to:
  - Never miss a clock cycle
  - Count performance metrics for long periods of time, in real time






### Work Smarter not Harder

- For performance metrics, the DDR Detective<sup>®</sup> uses counters instead of the traditional trace memory
  - Capturing one second of DDR4 traffic would take 4.5 Gbytes of logic analyzer/protocol analyzer trace depth (\$\$\$\$!)
    - At that rate, 1 minute = 270 Gbytes and 1 hour = 16.2 Tbytes of trace depth, plus the time to sift through it and post-process
  - By using large counters to count events and the time between events, we can collect hours or days' worth of metrics with no trace buffer memory and no time-consuming post-processing
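
The trace-depth argument is simple scaling, sketched below in Python. Only the 4.5 Gbytes per second figure comes from the slide; the counter width and the 1200 MHz clock (DDR4-2400) are assumptions chosen for illustration, not DDR Detective specifications.

```python
# Trace memory consumed per second of DDR4 traffic (figure from the slide).
TRACE_GBYTES_PER_SECOND = 4.5


def trace_depth_gbytes(seconds):
    """Trace memory, in Gbytes, needed to store `seconds` of bus traffic."""
    return TRACE_GBYTES_PER_SECOND * seconds


def counter_rollover_seconds(bits, clock_hz):
    """Seconds until a counter incremented once per clock would overflow."""
    return (2 ** bits) / clock_hz


# Assumed values, for illustration only.
ASSUMED_COUNTER_BITS = 64
ASSUMED_CLOCK_HZ = 1.2e9  # DDR4-2400 command clock

for label, secs in [("1 second", 1), ("1 minute", 60), ("1 hour", 3600)]:
    print(f"{label}: {trace_depth_gbytes(secs):,.1f} Gbytes of trace")

years = counter_rollover_seconds(ASSUMED_COUNTER_BITS, ASSUMED_CLOCK_HZ) / (365 * 24 * 3600)
print(f"A {ASSUMED_COUNTER_BITS}-bit per-clock counter would run for roughly {years:,.0f} years")
```

Trace storage grows linearly with observation time, whereas a counter's cost is fixed by its width.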






4 memory channels, 2 slots per channel



ASUS X99 DDR4 Motherboard




### Traditional Measurements

- Bandwidth
  - Command Bus Utilization
  - Data Bus Utilization
- Power Management
- Latency





### Bandwidth

- Overhead
  - Any use of the bus other than a Read or a Write
  - Command Bus Utilization
- Data Bus
  - Utilization: the percentage of time that Read or Write data is being transferred
  - Bandwidth: the amount of data transferred per second
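
The definitions above can be turned into arithmetic on event counts. The sketch below is a minimal illustration, not the tool's implementation. It assumes a 64-bit data bus (8 bytes per transfer), burst length 8 (so each Read or Write occupies the data bus for 4 clocks), and one command slot per clock. The function names and counter inputs are hypothetical.

```python
def data_bus_metrics(reads, writes, total_clocks,
                     data_rate_mts=2400, bus_bytes=8, burst_length=8):
    """Return (utilization, bandwidth in bytes/s) from Read/Write counts.

    Assumes every Read/Write moves one full burst and bursts never overlap.
    """
    clocks_per_burst = burst_length // 2          # DDR: two transfers per clock
    busy_clocks = (reads + writes) * clocks_per_burst
    utilization = busy_clocks / total_clocks
    peak_bytes_per_s = data_rate_mts * 1e6 * bus_bytes
    return utilization, utilization * peak_bytes_per_s


def command_bus_overhead(other_commands, total_clocks):
    """Fraction of command slots used by anything other than a Read or Write."""
    return other_commands / total_clocks


# Hypothetical counter values, for illustration only.
util, bw = data_bus_metrics(reads=100, writes=50, total_clocks=1200)
print(f"data bus utilization: {util:.1%}, bandwidth: {bw / 1e9:.2f} GB/s")
print(f"command bus overhead: {command_bus_overhead(120, 1200):.1%}")
```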






### **Command Bus Utilization**




#### **Command Bus Utilization**








#### Data Bus Utilization

2400MT/s






#### DDR4 Bandwidth






#### 4 Channels 1 DIMM per Channel






#### Power Management while running Google StressApp




## Power Management Analysis




**Embedded Video** 


#### Power Management

- ~50M servers worldwide
- Each server averages 16-24 DIMMs
  - 800M to 1.2B DIMMs
- Even small power savings per DIMM can add up
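
The DIMM count above is a straightforward product, and the same arithmetic shows how a per-DIMM saving scales. In the sketch below, the server count and DIMM range come from the slide; the 0.1 W per-DIMM saving is a purely hypothetical input, not a measured result.

```python
SERVERS = 50_000_000          # ~50M servers (from the slide)
DIMMS_PER_SERVER = (16, 24)   # average range per server (from the slide)


def dimm_count_range():
    """Low and high estimates of DIMMs deployed worldwide."""
    return tuple(SERVERS * n for n in DIMMS_PER_SERVER)


def fleet_savings_megawatts(watts_saved_per_dimm):
    """Low and high total saving, in MW, for a given per-DIMM saving."""
    return tuple(n * watts_saved_per_dimm / 1e6 for n in dimm_count_range())


HYPOTHETICAL_SAVING_W = 0.1   # illustrative assumption only

low, high = dimm_count_range()
print(f"DIMMs worldwide: {low:,} to {high:,}")
mw_low, mw_high = fleet_savings_megawatts(HYPOTHETICAL_SAVING_W)
print(f"Saving at {HYPOTHETICAL_SAVING_W} W per DIMM: {mw_low:.0f} to {mw_high:.0f} MW")
```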

Every time **Facebook**'s data center engineers figure out a way to **reduce** server consumption **by a single watt**, the improvement, at Facebook's scale, has the potential to add **millions of dollars** to the company's bottom line.



Yevgeny Sverdlik, Editor in Chief, Data Center Knowledge



### Latency

- Several JEDEC parameters apply:
  - RD to WR same Rank: tSR\_RTW
  - RD to PRE/PREA same Rank: tRTP
  - WR to PRE (SB) or PREA (SR): tWR
  - RD to RD different Rank: tDR\_RTR
  - RD to WR different Rank: tDR\_RTW
  - WR to RD different Rank: tDR\_WTR
  - WR to WR different Rank: tDR\_WTW






[Figure: list of monitored JEDEC timing checks (bank group and rank command spacing, tFAW, PRE/PREA, MRS, ZQCL/ZQCS, ODT, SRX, power-down entry/exit)]






#### Latency Measurements

Measurements made at 1867 MT/s

| V#  | Parameter | Description                | Spec | Measured |
|-----|-----------|----------------------------|------|----------|
| V2  | tSR_RTW   | RD to WR same Rank         | 8    | 10       |
| V11 | tRTP      | RD to PRE same Rank        | 8    | 8        |
| V12 | tWR       | WR to PRE (SB) or PREA (SR) | 31   | 31       |
| V53 | tDR_RTR   | RD to RD diff Rank         | 5    | 6        |
| V57 | tDR_WTR   | WR to RD diff Rank         | 3    | 6        |
| V59 | tDR_WTW   | WR to WR diff Rank         | 5    | 8        |
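
One way to read the table is as margin: measured minimum minus spec minimum, in the table's own units. The sketch below copies the rows above and computes it; the helper names are hypothetical.

```python
# (V#, parameter, spec, measured) copied from the table above.
ROWS = [
    ("V2", "tSR_RTW", 8, 10),
    ("V11", "tRTP", 8, 8),
    ("V12", "tWR", 31, 31),
    ("V53", "tDR_RTR", 5, 6),
    ("V57", "tDR_WTR", 3, 6),
    ("V59", "tDR_WTW", 5, 8),
]


def margins(rows):
    """Map each parameter to (measured - spec)."""
    return {param: measured - spec for _, param, spec, measured in rows}


def at_spec_edge(rows):
    """Parameters the controller runs with no margin at all."""
    return [param for _, param, spec, measured in rows if measured == spec]


for param, margin in margins(ROWS).items():
    print(f"{param:8s} margin = {margin}")
print("running at the edge of the spec:", at_spec_edge(ROWS))
```

A zero margin means the controller issues the command pair as tightly as the spec allows. A positive margin is latency that a different controller setting could, in principle, recover.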




### Intervening Commands

[Screenshot: waveform view around a trigger, showing the Command row (DES, WR, PRE and ACT commands to ranks 0 and 1), Bank Group, Bank Address, row and column address, per-rank state (R0-R3), and ViolationID 59 flagged on a WR to rank 1]


#### Latency Measurements

| V#  | Parameter | Description                | Spec | Measured |
|-----|-----------|----------------------------|------|----------|
| V1  | tCCD_L    | RD to RD same Bank Group   | 5    | 6        |
| V3  | tCCD_L    | WR to WR same Bank Group   | 5    | 6        |
| V4  | tCCD_S    | RD to RD diff Bank Group   | 4    | 4        |
| V5  | tCCD_S    | WR to WR diff Bank Group   | 4    | 4        |
| V6  | tRRD_L    | ACT to ACT same Bank Group | 5    | 5        |
| V7  | tRRD_S    | ACT to ACT diff Bank Group | 4    | 4        |
| V9  | tWTR_L    | WR to RD same Bank Group   | 22   | 23       |
| V10 | tWTR_S    | WR to RD diff Bank Group   | 17   | 19       |


### Latency

- Good designs operate on the edge of the spec
- Architectural tradeoffs will occur
- Do I need margin?
  - Design for the worst case and buy quality parts






## New Performance Metrics

#### Page Hit Analysis

- Read Hit: Page was Open
- Read Miss: Page was not Open, Transaction was preceded by an ACT
- Write Hit: Page was Open
- Write Miss: Page was not Open, Transaction was preceded by an ACT
- Unused: Page was opened and closed and never accessed
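
The five categories above can be counted from the command stream alone. The sketch below is one hedged interpretation, not the DDR Detective's algorithm. It treats the first Read or Write after an ACT as a miss, later accesses to the still-open page as hits, and a PRE with no access since the ACT as unused. The `(op, bank)` tuple format is hypothetical; `bank` can be any hashable key, such as a (rank, bank group, bank) tuple.

```python
from collections import Counter


def page_hit_analysis(commands):
    """Classify accesses in a stream of (op, bank) commands.

    op is one of "ACT", "RD", "WR", "PRE".
    """
    stats = Counter()
    accesses_since_act = {}  # open bank -> RD/WR count since its ACT
    for op, bank in commands:
        if op == "ACT":
            accesses_since_act[bank] = 0
        elif op in ("RD", "WR"):
            if bank not in accesses_since_act:
                continue  # access with no open page: malformed trace, skipped
            kind = "Read" if op == "RD" else "Write"
            outcome = "Miss" if accesses_since_act[bank] == 0 else "Hit"
            stats[f"{kind} {outcome}"] += 1
            accesses_since_act[bank] += 1
        elif op == "PRE":
            # pop() returns None if the bank was not open
            if accesses_since_act.pop(bank, None) == 0:
                stats["Unused"] += 1
    return stats


trace = [("ACT", 0), ("RD", 0), ("RD", 0), ("WR", 0), ("PRE", 0),
         ("ACT", 1), ("PRE", 1),
         ("ACT", 2), ("WR", 2)]
print(dict(page_hit_analysis(trace)))
```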

#### Multiple Open Banks

- Open banks make for faster access IF you're going to that bank on the next access; it's a performance hit if you're not
- Power hit when banks are open
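
To see how many banks are open at any one time, one approach is to track the set of open banks across ACT and PRE commands and record how many clocks were spent at each occupancy level. The sketch below assumes a hypothetical event format of `(clock, op, bank)` tuples sorted by clock; `PREA` closes every bank.

```python
from collections import Counter


def open_bank_histogram(events, total_clocks):
    """Return Counter {number of open banks: clocks spent at that level}."""
    hist = Counter()
    open_banks = set()
    last_clock = 0
    for clock, op, bank in events:
        # Account for the interval that just ended before applying the command.
        hist[len(open_banks)] += clock - last_clock
        last_clock = clock
        if op == "ACT":
            open_banks.add(bank)
        elif op == "PRE":
            open_banks.discard(bank)
        elif op == "PREA":
            open_banks.clear()
    hist[len(open_banks)] += total_clocks - last_clock
    return hist


events = [(10, "ACT", 0), (20, "ACT", 1), (50, "PRE", 0), (80, "PREA", None)]
for level, clocks in sorted(open_bank_histogram(events, 100).items()):
    print(f"{level} banks open for {clocks} clocks")
```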

#### Bank Group Analysis

- New for DDR4: back-to-back accesses to the same bank group are a performance hit
- Faster to have back-to-back accesses to different bank groups






#### Multiple Open Banks

#### How many are open at any one time




#### Bank Group Access Analysis

- tCCD\_L
  - Takes longer for back to back RD/WR accesses to the same bank group
- tCCD\_S
  - Can reduce latency by going to different bank groups






#### Bank Group Access Analysis

Relative to the previous transaction, how many times did the following transaction go to the same or a different bank group?
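
This count can be sketched as a pass over consecutive RD/WR transactions, comparing each one's bank group with its predecessor's. The input format (a sequence of bank group numbers) is hypothetical. The default tCCD values are the spec figures from the latency table in this deck, and they are used only to estimate the minimum command spacing the access pattern implies.

```python
def bank_group_transitions(bank_groups):
    """Count consecutive accesses that stay in, or leave, the same bank group."""
    seq = list(bank_groups)
    same = sum(1 for prev, cur in zip(seq, seq[1:]) if prev == cur)
    different = max(len(seq) - 1, 0) - same
    return {"same": same, "different": different}


def min_command_spacing(counts, tccd_l=5, tccd_s=4):
    """Lower bound on total column-command spacing implied by the pattern."""
    return counts["same"] * tccd_l + counts["different"] * tccd_s


accesses = [0, 0, 1, 2, 2, 2, 3]   # hypothetical bank group per RD/WR
counts = bank_group_transitions(accesses)
print(counts, "minimum spacing:", min_command_spacing(counts))
```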




### More Performance Metrics!

- Bank Utilization
  - What happens during a chip kill or page retirement scenario?
  - How does the traffic reallocate?
  - What are the performance implications?
- Do I have system hot spots?
  - Row Hammer (excessive Activates)
- Fast Boot
  - Why does the system take so long to boot?





#### **Bank Utilization**

Percentage of total cycles the banks are open (running Google Stress App)
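
As a minimal sketch of this metric, the function below accumulates, per bank, the clocks between each ACT and the following PRE and reports them as a percentage of total clocks. The `(clock, op, bank)` event format is hypothetical, and banks still open at the end of the run are counted up to `total_clocks`.

```python
def bank_utilization(events, total_clocks):
    """Return {bank: percentage of total clocks during which it was open}."""
    opened_at = {}    # bank -> clock of the ACT that opened it
    open_clocks = {}  # bank -> accumulated open time
    for clock, op, bank in events:
        if op == "ACT" and bank not in opened_at:
            opened_at[bank] = clock
        elif op == "PRE" and bank in opened_at:
            open_clocks[bank] = open_clocks.get(bank, 0) + clock - opened_at.pop(bank)
    # Banks never precharged stay open until the end of the run.
    for bank, start in opened_at.items():
        open_clocks[bank] = open_clocks.get(bank, 0) + total_clocks - start
    return {bank: 100.0 * c / total_clocks for bank, c in open_clocks.items()}


events = [(0, "ACT", 0), (50, "PRE", 0), (60, "ACT", 1),
          (70, "ACT", 0), (80, "PRE", 0)]
for bank, pct in sorted(bank_utilization(events, 100).items()):
    print(f"bank {bank}: open {pct:.1f}% of the time")
```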






### **Bank Utilization**






#### **Boot Analysis**






#### Row Hammer

Excessive ACTIVATE commands to a single Row

- An identified failure mechanism in DDR3 memory
- The result of charge leakage
- Current workaround is to increase the refresh rate
  - Burns power, lowers performance
  - Lowers the statistical probability of an error






### Critical Applications cannot tolerate Row Hammer failures

- Memory vendors are specifying how many ACTIVATEs per retention cycle to a single row the memory can tolerate
- However, there is no way for the memory controller or the memory to count this
- Critical software applications should be tested to ensure that the Row Hammer threshold is not exceeded
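
An external monitor can do the counting that the controller and the memory cannot. The sketch below is an illustration only, not the DDR Detective's method. It counts ACTIVATEs per (rank, bank, row) within consecutive fixed-length windows that stand in for the retention cycle, and flags any row that exceeds a threshold. The window length and threshold are hypothetical inputs that would come from the memory vendor. Fixed windows are a simplification: a sliding window would also catch bursts that straddle a window boundary.

```python
from collections import Counter


def find_hammered_rows(activates, window_clocks, threshold):
    """Return {(rank, bank, row): peak ACT count} for rows over the threshold.

    activates: iterable of (clock, rank, bank, row), sorted by clock.
    """
    flagged = {}
    counts = Counter()
    current_window = 0
    for clock, rank, bank, row in activates:
        window = clock // window_clocks
        if window != current_window:
            counts.clear()            # new retention window: restart counting
            current_window = window
        key = (rank, bank, row)
        counts[key] += 1
        if counts[key] > threshold:
            flagged[key] = max(flagged.get(key, 0), counts[key])
    return flagged


# Hypothetical ACT stream: one row activated five times within one window.
acts = [(t, 0, 2, 0x2515) for t in (10, 20, 30, 40, 50)]
acts += [(60, 1, 4, 0x0500), (70, 1, 4, 0x0500)]
for (rank, bank, row), n in find_hammered_rows(acts, 1000, 3).items():
    print(f"Rank {rank} Bank {bank} RA {row:04X}: {n} ACTs in one window")
```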






#### One method to detect Row Hammer






#### Detecting Excessive ACTIVATE Commands

[Screenshot: Row Hammer output view, showing a Bank 0-7 by Rank 0-3 grid alongside a run log of data packets that list Rank, Bank and row address (RA) for each entry; total run time 15 seconds]




### Critical DDR4 Performance Metrics

- Memory Controller/System Architecture
  - Can this insight lead to better designs?
  - Benchmark Servers Memory Performance
- Workload Analysis
  - Should the Memory Controller settings be based on criteria set by the workload?
  - Can compilers be made better?
  - Can critical applications be written to avoid Row Hammer?
- Do we all need a DDR5?

Work Smarter not Harder and understand what we have





### Summary

- Power Management, Bandwidth, Latency
- NEW Metrics:
  - Page Hit Analysis
  - Multiple Open Banks
  - Bank Group Analysis
  - Bank Utilization
  - Boot Analysis
  - Row Hammer Detection
- New measurements give insight into new designs and better architectures





### **Contact Information**

#### Barbara Aichinger

Vice President New Business Development

#### FuturePlus Systems

Barb.Aichinger@FuturePlus.com

www.FuturePlus.com

Check out our new website dedicated to DDR Memory! www.DDRDetective.com



