#### Meeting Latency and Jitter Demands of Beyond 5G Networking Era: **Are CNFs Up to the Challenge?**

Adil Bin Bhutto<sup>1</sup>, Ryota Kawashima<sup>2</sup>, Yuzo Taenaka<sup>1</sup>, Youki Kadobayashi<sup>1</sup>

<sup>1</sup>Nara Institute of Science and Technology, Japan
<sup>2</sup>Nagoya Institute of Technology, Japan

Contact: Adil Bin Bhutto <adil-b@ieee.org>

July 4th, 2024







#### Goal

• Explore the latency, jitter, and bandwidth characteristics of CNFs.

#### Idea

• Focus on CPU Power and Frequency Scaling Configurations.

#### Result

• Insight for predictable low latency/jitter and high throughput.



- ► Introduction
- CPU Configuration and CNF
- Our Study
- Results
- Conclusion
- Contributions

### Compute Inter Connect for Beyond 5G




Figure: Compute-Inter-Connect platform with heterogeneous technologies.<sup>1</sup>

<sup>1</sup> Chafii, M., Bariah, L., Muhaidat, S., & Debbah, M. (2023). Twelve scientific challenges for 6G: Rethinking the foundations of communications theory. IEEE Communications Surveys & Tutorials, 25(2), 868-904. Note: The figure shown here is a modified version of one from this publication.

## **Evolving Network Functions**




Figure: Evolving Network Functions

## **Containerized Network Functions**


#### Why CNF?

- 1. Agility
  - Lightweight
  - Fast spin-up time
- 2. Portability
  - Open standards
  - Wide adoption
- 3. Resource efficiency
- 4. Supporting ecosystem



Figure: Container architecture and its orchestration.

## Network I/O Acceleration and Virtualization




#### Why DPDK?

- Polling mode
- User-space driver
- Core affinity
- Optimized memory
- Network virtualization
- Cloud-native acceleration

(B) Network Function on Bare-metal

Figure: Performance acceleration and vNet I/O technologies.



- Introduction
- ► CPU Configuration and CNF
- Our Study
- Results
- Conclusion
- Contributions

### Why Focus on CPU P&C States?


#### **P-States (Performance States)**

$\checkmark$ Operating frequency control  $\checkmark$ Voltage control

- Advantages
  - + Dynamic performance scaling
  - + Energy Efficiency
  - + Thermal management
- Disadvantages
  - Potential performance impact
  - Transition latency

#### **C-States (Idle States)**

$\checkmark$ Turns off parts of the CPU  $\checkmark$ Manages power consumption

- Advantages
  - + Power savings
  - + Extended battery life
  - + Reduced heat generation
- Disadvantages
  - Wake-up latency
  - Power management complexity
  - Potential performance penalty

### **CPU P&C State in CNF**


#### Effect of P&C states on Container

- CPU behavior **impacts** container **performance**.
- Dynamic frequency scaling → Unpredictable process execution speed
- Idle CPUs take longer to wake up and run processes.
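To check which P&C settings a host actually runs with, the Linux kernel exposes them through sysfs. The following is a minimal sketch, assuming a Linux host with the standard `cpufreq` and `cpuidle` sysfs entries; the helper names `read_text` and `cpu_power_state` are illustrative and not part of the study's tooling.

```python
from pathlib import Path

# Standard Linux sysfs location for per-CPU settings (assumption: Linux host).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def cpu_power_state(cpu=0, root=SYSFS_CPU):
    """Collect P-state (cpufreq) and C-state (cpuidle) settings for one CPU."""
    base = Path(root) / f"cpu{cpu}"
    # Idle states appear as cpuidle/state0, state1, ...; sort them numerically.
    idle_dirs = sorted(
        (base / "cpuidle").glob("state[0-9]*"),
        key=lambda d: int(d.name[len("state"):]),
    )
    return {
        "governor": read_text(base / "cpufreq" / "scaling_governor"),
        "cur_khz": read_text(base / "cpufreq" / "scaling_cur_freq"),
        # Each tuple: (state name, exit latency in microseconds, disable flag).
        "idle_states": [
            (read_text(d / "name"), read_text(d / "latency"), read_text(d / "disable"))
            for d in idle_dirs
        ],
    }


if __name__ == "__main__":
    print(cpu_power_state(0))
```

Entries that the platform or firmware does not expose come back as `None` or an empty list, which is itself a hint that the corresponding feature is disabled.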



Figure: Relation between containerized application and CPU P&C states.



- Introduction
- CPU Configuration and CNF
- ► Our Study
- Results
- Conclusion
- Contributions



Understand the correlation between CNF performance and the P&C states of modern CPUs.

Address the possibility of using CNF in latency-sensitive applications.



#### **Exhaustive Experiments and Analysis**

- 10 Experiments + 152 Evaluations
- DPDK-powered packet generation and forwarding CNF
  - + Variable packet sizes
  - + Variable packet rates
  - + $\pm$PC
- Real H/W Devices



Figure: Abstract testbed

# Evaluation Design


#### +PC (P&C State Enabled)

- $\checkmark$ CPU C-States Support (enabled)
- $\checkmark$ SpeedStep Technology (enabled)
- $\checkmark$ Turbo Boost Technology (enabled)
- $\checkmark$ Speed Shift Technology (enabled)
- $\checkmark$ Thermal Velocity Boost Voltage Optimizations (enabled)

#### -PC (P&C State Disabled)

- × CPU C-States Support (disabled)
- × SpeedStep Technology (disabled)
- × Turbo Boost Technology (disabled)
- × Speed Shift Technology (disabled)
- × Thermal Velocity Boost Voltage Optimizations (disabled)

#### Hardware Virtualization (+PC / –PC)

× VT-d (disabled)

× SR-IOV (disabled)

#### Others

- $\checkmark$ Variable packet sizes
- $\checkmark$ Variable packet rates
- $\checkmark$ Simple L2 forwarding with MAC swap of UDP packets
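The forwarding workload itself is small: an L2 forwarder with MAC swap exchanges the destination and source addresses in the Ethernet header and sends the frame back out. The sketch below illustrates that operation on a raw frame in plain Python; it is not the DPDK code used in the experiments, and the function name `mac_swap` is hypothetical.

```python
def mac_swap(frame: bytes) -> bytes:
    """Return a copy of an Ethernet frame with destination and source MACs exchanged.

    Ethernet header layout: bytes 0-5 destination MAC, bytes 6-11 source MAC,
    bytes 12-13 EtherType. Everything after the two addresses is left untouched.
    """
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header (14 bytes)")
    dst, src = frame[0:6], frame[6:12]
    return src + dst + frame[12:]


if __name__ == "__main__":
    # Made-up addresses and payload, for illustration only.
    frame = bytes.fromhex("aabbccddeeff" "112233445566" "0800") + b"payload"
    print(mac_swap(frame).hex())
```

Because the payload, including the UDP datagram, is not modified, the per-packet work stays constant across packet sizes, which makes this workload convenient for isolating I/O path and CPU configuration effects.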

#### Testbed

Table 1: Machine specification

| Component | Physical Machine (Tester & DuT)                      |
|-----------|------------------------------------------------------|
| CPU       | Intel® Core™ i7-13700, 5.20 GHz (16 cores w/o HT)    |
| Memory    | 16 GB (DDR4-3200)                                    |
| PCIe      | PCI Express 4.0 [x16]                                |
| NIC       | Intel XL710 (i40e)                                   |



Figure: Testbed. Panels: (a) Loopback, (b) Baremetal, (c) Containerized. Components shown: MoonGen (traffic generator on bare-metal), Tester and DuT exchanging UDP traffic over 40GbE, Container/App (DPDK), vSwitch (DPDK), virtio ↔ vhost over a shared Unix socket, veth0/veth1.



- Introduction
- CPU Configuration and CNF
- Our Study
- ► Results
- Conclusion
- Contributions

#### Throughput



 $\begin{array}{ll} \textbf{64-800} & \rightarrow \mathsf{BNF(+PC)} \approx \mathsf{BNF(-PC)} > \mathsf{CNF(+PC)} > \mathsf{CNF(-PC)} \\ \textbf{800-1500} & \rightarrow \mathsf{BNF(+PC)} \approx \mathsf{BNF(-PC)} > \mathsf{CNF(-PC)} \approx \mathsf{CNF(+PC)} \\ & \rightarrow \mathsf{BNF(\pm PC)} \approx \mathsf{LIMIT(40GbE)} \end{array}$ 

\*BNF: Baremetal Network Function
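For context on the 40GbE limit, the theoretical maximum packet rate of a link follows from the frame size plus the fixed per-frame overhead on the wire. The sketch below works through that arithmetic; it assumes the packet sizes count the full Ethernet frame including the FCS, which the slides do not state, and the function name `line_rate_pps` is illustrative.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# preamble (7 B) + start-of-frame delimiter (1 B) + inter-frame gap (12 B).
ETH_OVERHEAD_BYTES = 20


def line_rate_pps(frame_bytes, link_bps=40e9):
    """Theoretical maximum packets per second for a given frame size.

    Assumption: frame_bytes is the full Ethernet frame, FCS included.
    """
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for size in (64, 800, 1500):
        print(f"{size:>5} B: {line_rate_pps(size) / 1e6:.2f} Mpps")
```

Under these assumptions, 64-byte frames on 40GbE top out at roughly 59.52 Mpps, so small packets stress the per-packet processing path far more than large ones at the same bit rate.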



# Latency and Jitter: CNF at 100 Kpps



+PC : Latency ( $\downarrow$ ) Jitter ( $\uparrow$ )

-PC : Latency ( $\uparrow$ ) Jitter ( $\downarrow$ )
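The slides do not define how jitter is computed. As one common convention, the sketch below takes jitter to be the mean absolute difference between consecutive per-packet latencies; both that definition and the function name `latency_stats` are assumptions for illustration, and the sample values are made up rather than measured.

```python
from statistics import mean


def latency_stats(latencies_us):
    """Summarize per-packet latencies given in microseconds.

    Jitter here is the mean absolute difference between consecutive
    samples (an assumed definition, not necessarily the one used in the study).
    """
    if len(latencies_us) < 2:
        raise ValueError("need at least two latency samples")
    diffs = [abs(b - a) for a, b in zip(latencies_us, latencies_us[1:])]
    return {
        "mean_latency_us": mean(latencies_us),
        "max_latency_us": max(latencies_us),
        "jitter_us": mean(diffs),
    }


if __name__ == "__main__":
    # Illustrative samples only, not experimental data.
    print(latency_stats([10.0, 12.0, 11.0, 15.0]))
```

With a definition like this, a configuration can show lower mean latency yet higher jitter, which is the kind of trade-off the +PC and -PC summaries describe.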

# Latency and Jitter: BNF at 100 Kpps



+PC / -PC :

# Latency and Jitter: CNF at Max Incoming Rate



# Latency and Jitter: BNF at Max Incoming Rate



+**PC** : Latency ( $\uparrow$ ) Jitter ( $\uparrow$ )

-**PC** : Latency  $(\downarrow)$  Jitter  $(\downarrow)$ 



#### Findings

- 1. The default CNF implementation is not suitable for latency-sensitive NFs.
- 2. DPDK shines on bare metal, delivering high throughput and low latency with predictable jitter.
- 3. Introducing vNet I/O in CNFs $\rightarrow$ may be the culprit for the poor performance.



- Introduction
- CPU Configuration and CNF
- Our Study
- Results
- ► Conclusion
- Contributions

## **Conclusion and Future Work**


- CNFs are more sensitive to system settings than bare-metal NFs.
- Further study is needed to explain the observed performance variations.

#### **Future Direction**

- Explain the observed performance disparity in CNFs through further analysis and instrumentation.
- Develop an **improved** version of **vNet I/O**.

## CPU Probing using RDTSC [Ongoing]


- Q2. Why does CNF suffer from poor performance compared to bare-metal NF?
  - RO2.1. To determine the network component in the CNF architecture causing the bottleneck.

- Measure the performance of Virtio's networking components and locate the bottleneck with a low-overhead method such as reading the CPU's time-stamp counter (TSC) via RDTSC.
- Propose a solution to address the bottleneck.
- Conduct **combinational experiments** to narrow down the effect of P&C states further.
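The probing idea can be sketched as timestamping the entry and exit of each stage in the packet path and comparing the per-stage cost. Python cannot issue RDTSC directly, so the sketch below substitutes `time.perf_counter_ns` as the clock; the class `StageProbe` and the stage names are hypothetical, and a real probe would sit in the C data path and read the TSC.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageProbe:
    """Accumulate elapsed time per named stage using a monotonic clock.

    The clock is injectable; perf_counter_ns stands in for a TSC read here.
    """

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock
        self.total_ns = defaultdict(int)
        self.count = defaultdict(int)

    @contextmanager
    def measure(self, stage):
        start = self.clock()
        try:
            yield
        finally:
            # Record the cost even if the measured code raises.
            self.total_ns[stage] += self.clock() - start
            self.count[stage] += 1

    def slowest(self):
        """Return the stage with the highest average cost per invocation."""
        return max(self.total_ns, key=lambda s: self.total_ns[s] / self.count[s])


if __name__ == "__main__":
    probe = StageProbe()
    # Hypothetical stage names with placeholder work, for illustration only.
    with probe.measure("vhost_enqueue"):
        sum(range(1000))
    with probe.measure("virtio_dequeue"):
        sum(range(100000))
    print(probe.slowest(), dict(probe.total_ns))
```

Keeping the probe to two clock reads and two dictionary updates per stage mirrors the low-overhead goal stated above, since heavy instrumentation would itself distort latency and jitter.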





- Introduction
- CPU Configuration and CNF
- Our Study
- Results
- Conclusion
- ► Contributions

### **Understanding CNF Performance**


#### **Key Points**

- + P&C-State  $\Rightarrow$  Throughput  $\uparrow$  Jitter  $\uparrow$
- Low Latency/Jitter Application 🗸

$\sim$ Packet Size $\times$ Traffic Rate  $\sim$ CPU Configuration.