#### Network Processor and Its Applications

Prof. Yan Luo

4/22/2003

# Network Processor Architecture and Applications

- Introduction to networking
- Network applications
  - IPv4 routing, classification, etc. (traditional)
  - URL-based switching, transcoding, etc. (new)
- Network Processor Architectures
  - Cisco Toaster
  - IBM (Hifn) PowerNP
  - Intel IXP

#### What Does the Internet Need?

A huge and increasing volume of packets, plus routing, packet classification, encryption, QoS, new applications and protocols, etc.







- The ISO OSI (Open Systems Interconnection) model is not fully implemented
- The Presentation and Session layers are not present in TCP/IP

Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik







### **Application Categorization**

- Control-plane tasks
  - Less time-critical
  - Control and management of device operation
    - Table maintenance, port states, etc.
- Data-plane tasks
  - Operations occurring in real time on the "packet path"
  - Core device operations
    - Receive, process, and transmit packets



#### Data Plane Tasks

- Media Access Control
  - Low-level protocol implementation
    - Ethernet, SONET framing, ATM cell processing, etc.
- Data Parsing
  - Parsing cell or packet headers for address or protocol information
- Classification
  - Match packets against criteria (filtering/forwarding decisions, QoS, accounting, etc.)
- Data Transformation
  - Transformation of packet data between protocols
- Traffic Management
  - Queuing, scheduling and policing packet data
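To make the data-parsing step concrete, here is a minimal Python sketch that extracts the fixed IPv4 header fields from raw bytes. It assumes a well-formed IPv4 packet, skips options and checksum verification, and uses a hypothetical `parse_ipv4_header` helper; an NP would perform this step in microcode or a hardware assist rather than in Python.

```python
import socket
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version: int
    ihl: int           # header length in 32-bit words
    total_length: int
    ttl: int
    protocol: int      # e.g. 6 = TCP, 17 = UDP
    src: str
    dst: str


def parse_ipv4_header(packet: bytes) -> IPv4Header:
    """Pull out the fields a forwarding or classification stage typically needs."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    # Fixed 20-byte header, network (big-endian) byte order.
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return IPv4Header(
        version=ver_ihl >> 4,
        ihl=ver_ihl & 0x0F,
        total_length=total_length,
        ttl=ttl,
        protocol=protocol,
        src=socket.inet_ntoa(src),
        dst=socket.inet_ntoa(dst),
    )


# Usage: build a hypothetical TCP/IPv4 header and parse it back.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                  socket.inet_aton("10.0.0.1"), socket.inet_aton("192.168.1.7"))
print(parse_ipv4_header(hdr))
```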

#### **Other Network Processor Applications**

- Routing table lookup
  - Determine the next hop for incoming packets
- Packet Classification
  - Classify packets by matching header fields against a set of rules
- URL-based Switching
  - Distribute HTTP requests based on URLs
- Transcoding
  - Convert data into a format the receiver can handle
- Security
  - Encryption/decryption, intrusion detection, firewall, access control checking, denial-of-service protection



Routers determine the next hop and forward packets
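Routing table lookup is usually a longest-prefix match: among all prefixes that contain the destination address, the most specific one wins. The sketch below assumes IPv4, a plain linear scan, and hypothetical next-hop names; production routers use tries, hashing, or TCAMs to keep up with line rate.

```python
import ipaddress
from typing import List, Optional, Tuple


class RoutingTable:
    """Longest-prefix-match table using a linear scan (illustrative only)."""

    def __init__(self) -> None:
        self.routes: List[Tuple[ipaddress.IPv4Network, str]] = []

    def add(self, prefix: str, next_hop: str) -> None:
        self.routes.append((ipaddress.IPv4Network(prefix), next_hop))

    def lookup(self, dst: str) -> Optional[str]:
        """Return the next hop of the most specific matching prefix, or None."""
        addr = ipaddress.IPv4Address(dst)
        best_len, best_hop = -1, None
        for net, hop in self.routes:
            # Keep the matching prefix with the most leading bits.
            if addr in net and net.prefixlen > best_len:
                best_len, best_hop = net.prefixlen, hop
        return best_hop


table = RoutingTable()
table.add("0.0.0.0/0", "default-gw")
table.add("10.0.0.0/8", "port1")
table.add("10.1.0.0/16", "port2")
print(table.lookup("10.1.2.3"))  # port2 (the /16 is more specific than the /8)
```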

#### Packet Classification

- Routers are required to distinguish packets for
  - Flow identification
  - Fair sharing of bandwidth
  - QoS
  - Security
  - Accounting, billing
  - etc.
- Packets are classified by rules
  - Source IP, destination IP, source port, destination port, etc.
- Classification Algorithm Metrics
  - Search speed
  - Storage cost
  - Scalability
  - Updates
  - Etc.
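A minimal sketch of rule-based classification, assuming first-match semantics (rules listed in priority order), prefix rules on addresses, inclusive port ranges, and an optional protocol field. Linear search illustrates the metrics above: storage and updates are cheap, but search time grows with the number of rules. The `Rule` fields and action names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    src: str                    # source prefix, e.g. "10.0.0.0/8"
    dst: str                    # destination prefix
    src_ports: Tuple[int, int]  # inclusive range
    dst_ports: Tuple[int, int]  # inclusive range
    proto: Optional[int]        # None means "any protocol"
    action: str

    def matches(self, pkt: dict) -> bool:
        return (ipaddress.IPv4Address(pkt["src"]) in ipaddress.IPv4Network(self.src)
                and ipaddress.IPv4Address(pkt["dst"]) in ipaddress.IPv4Network(self.dst)
                and self.src_ports[0] <= pkt["sport"] <= self.src_ports[1]
                and self.dst_ports[0] <= pkt["dport"] <= self.dst_ports[1]
                and (self.proto is None or self.proto == pkt["proto"]))


def classify(rules: List[Rule], pkt: dict, default: str = "deny") -> str:
    """Return the action of the first (highest-priority) matching rule."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


ANY = (0, 65535)
rules = [
    Rule("0.0.0.0/0", "192.168.1.0/24", ANY, (80, 80), 6, "web-qos"),
    Rule("10.0.0.0/8", "0.0.0.0/0", ANY, ANY, None, "permit"),
]
pkt = {"src": "172.16.0.5", "dst": "192.168.1.20", "sport": 40000, "dport": 80, "proto": 6}
print(classify(rules, pkt))  # web-qos
```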



#### URL-based Switching

- Tasks
  - Scan the packet data (HTTP request) of each arriving packet and classify it:
    - Contains '.jpg' -> to image server
    - Contains 'cgi-bin/' -> to application server
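The URL-based switching task can be sketched as follows, assuming the whole HTTP request line arrives in a single packet (a real switch must reassemble the TCP stream first) and using hypothetical server names.

```python
from typing import Optional


def extract_url(payload: bytes) -> Optional[str]:
    """Return the request target from an HTTP request line, or None if absent."""
    request_line = payload.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    # Expect "METHOD TARGET HTTP/x.y".
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    return parts[1]


def select_server(payload: bytes) -> str:
    """Apply the content rules in order and pick a back-end server."""
    url = extract_url(payload)
    if url is None:
        return "default-server"
    if ".jpg" in url:
        return "image-server"
    if "cgi-bin/" in url:
        return "application-server"
    return "default-server"


print(select_server(b"GET /photos/cat.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```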

#### Transcoders

- Two main motivations
  - The receiver cannot interpret the stored data format (multimedia transcoding)
    - Wireless receivers, hand-held devices, etc.
  - Compression for bandwidth and storage efficiency
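For the compression requirement, a general-purpose lossless compressor shows the bandwidth trade-off. This sketch uses Python's standard `zlib` (DEFLATE) on a hypothetical, highly repetitive payload; it is not a multimedia transcoder, and real savings depend heavily on the content.

```python
import zlib


def compress_payload(data: bytes, level: int = 6) -> bytes:
    """Losslessly compress a payload before sending it over a constrained link."""
    return zlib.compress(data, level)


def decompress_payload(blob: bytes) -> bytes:
    return zlib.decompress(blob)


# Hypothetical payload: markup with lots of repetition compresses well.
payload = b"<tr><td>packet</td><td>forwarded</td></tr>\n" * 200
blob = compress_payload(payload)
ratio = len(payload) / len(blob)
print(f"{len(payload)} -> {len(blob)} bytes (ratio {ratio:.1f}x)")
```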



Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik



#### Why Network Processors

- Current Situation
  - Data rates are increasing
  - Protocols are becoming more dynamic and sophisticated
  - Protocols are being introduced more rapidly
- Processing Elements
  - GP (General-Purpose Processor)
    - Programmable, but not optimized for networking applications
  - ASIC (Application-Specific Integrated Circuit)
    - High processing capacity, but long development time and lack of flexibility
  - NP (Network Processor)
    - High processing performance
    - Programming flexibility
    - Cheaper than GPs

### Organizing Processor Resources

- Design decisions:
  - High-level organization
  - ISA and microarchitecture
  - Memory and I/O integration
- Today's commercial NPs:
  - Chip multiprocessors
  - Most are multithreaded
  - Most exploit little ILP (Cisco's does exploit ILP)
  - No caches
  - Micro-programmed



#### Cisco Toaster

- Almost all data-plane operations execute on the programmable XMC
- Pipeline stages are assigned tasks, e.g., classification, routing, firewall, MPLS
  - A classic software load-balancing problem
- External SDRAM is shared by the processors in the same pipeline stage
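Because a pipeline's throughput is set by its slowest stage, assigning an ordered list of tasks to stages amounts to minimizing the largest per-stage cost. The sketch below assumes each task has a fixed, known cycle cost (the numbers are hypothetical), tasks must stay in order, and memory contention is ignored; it binary-searches the bottleneck and checks each candidate greedily.

```python
from typing import List


def min_bottleneck(costs: List[int], stages: int) -> int:
    """Smallest achievable max per-stage cost when the ordered task list is cut
    into at most `stages` contiguous pipeline stages."""

    def fits(limit: int) -> bool:
        used, load = 1, 0
        for c in costs:
            if load + c > limit:      # start a new stage
                used, load = used + 1, c
            else:
                load += c
        return used <= stages

    lo, hi = max(costs), sum(costs)
    while lo < hi:                    # binary search on the bottleneck value
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def partition(costs: List[int], stages: int) -> List[List[int]]:
    """Greedy cut that achieves the bottleneck found by min_bottleneck."""
    limit = min_bottleneck(costs, stages)
    groups: List[List[int]] = [[]]
    load = 0
    for c in costs:
        if load + c > limit:
            groups.append([])
            load = 0
        groups[-1].append(c)
        load += c
    return groups


# Hypothetical per-packet cycle costs: parse, classify, firewall, route, MPLS, queue.
costs = [20, 50, 40, 60, 30, 20]
print(min_bottleneck(costs, 4), partition(costs, 4))
```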

#### IBM PowerNP

- 16 pico-processors and 1 PowerPC
- Each pico-processor
  - Supports 2 hardware threads
  - 3-stage pipeline: fetch/decode/execute
- Dyadic Processing Unit
  - Two pico-processors
  - 2 KB shared memory
  - Tree search engine
- Focus is on layers 2-4
- PowerPC 405 for control-plane operations
  - 16 KB instruction and data caches
- Target is OC-48 (about 2.5 Gbps)
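A quick budget calculation shows what the OC-48 target implies. The sketch assumes minimum-size 40-byte packets, ignores SONET framing overhead, spreads packets evenly over the 16 pico-processors, and uses a hypothetical 133 MHz clock purely for illustration.

```python
OC48_BPS = 2_488_320_000  # OC-48 line rate (48 x 51.84 Mb/s); SONET overhead ignored


def per_packet_budget(line_bps: float, pkt_bytes: int, engines: int, clock_hz: float):
    """Return (packets/s, ns per packet at the link, cycles available per packet
    on each engine when packets are spread evenly over `engines` processors)."""
    pps = line_bps / (pkt_bytes * 8)
    ns_per_pkt = 1e9 / pps
    cycles_per_pkt = engines * clock_hz / pps
    return pps, ns_per_pkt, cycles_per_pkt


# 40-byte packets, 16 pico-processors, hypothetical 133 MHz clock.
pps, ns, cycles = per_packet_budget(OC48_BPS, 40, 16, 133e6)
print(f"{pps / 1e6:.3f} Mpps, {ns:.1f} ns/packet, ~{cycles:.0f} cycles/packet per engine")
```

A new minimum-size packet arrives roughly every 129 ns, so a single engine could never keep up; parallelism across the pico-processors is what turns that into a budget of a few hundred cycles per packet.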



#### C-Port C-5 Chip Architecture

Figure: C-5 block diagram. An Executive Processor, Fabric Processor, Table Lookup Unit, Buffer Management Unit, and Queue Management Unit connect over 60 Gbps buses to 16 Channel Processors (CP-0 to CP-15) grouped in clusters, which attach to PHY interfaces; external connections include SRAM, PROM, PCI, and the switch fabric.

#### Some Challenges

- Intelligent Design
  - Given a set of programs and a target network link speed, find the 'best' processor design:
    - Least area
    - Least power
    - Most performance
- Write efficient multithreaded programs
  - NPs have
    - Heterogeneous compute resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
  - Make use of resources
    - How to use special instructions and hardware assists
      - Compilers
      - Hand-coded
  - Multithreaded programs
    - Manage access to shared state
    - Synchronization between threads
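The shared-state problem can be illustrated with ordinary threads: per-flow counters updated by several workers need an atomic read-modify-write. This Python sketch uses `threading.Lock`; on an NP the same role would typically be played by hardware mechanisms such as the atomic SRAM and scratch operations listed for the IXP memories. `FlowStats` and the flow names are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class FlowStats:
    """Per-flow packet counters shared by several worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = defaultdict(int)

    def record(self, flow_id: str) -> None:
        # The read-modify-write must be atomic; without the lock two threads
        # could read the same old value and one increment would be lost.
        with self._lock:
            self._counts[flow_id] += 1

    def snapshot(self) -> Dict[str, int]:
        with self._lock:
            return dict(self._counts)


def worker(stats: FlowStats, packets: List[str]) -> None:
    for flow_id in packets:
        stats.record(flow_id)


stats = FlowStats()
traffic = ["flowA", "flowB"] * 5000
threads = [threading.Thread(target=worker, args=(stats, traffic)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats.snapshot())
```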



### IXP1200 Microengine

- 4 hardware contexts
  - Single-issue processor
  - Explicit, optional context switch on SRAM access
- Registers
  - All are single-ported
  - Separate GPR
  - 256 per microengine × 6 microengines = 1536 registers total
- 32-bit ALU
  - Can access GPR or XFER registers
- Shared hash unit
  - Hashes 1, 2, or 3 values of 48 or 64 bits
  - Used for hashing in IP routing
- Standard 5-stage pipeline
- 4 KB SRAM instruction store (not a cache!)
- Barrel shifter





### IXP2400

- XScale core replaces StrongARM
- Microengines
  - Faster
  - More: 2 clusters of 4 microengines each
- Local memory
- Next-neighbor routes added between microengines
- Hardware to accelerate CRC operations and random number generation
- 16-entry CAM
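A small CAM is typically used to cache lookups, for example to remember which flows currently have state held in local memory. The sketch below models a 16-entry, fully associative CAM that reports the least-recently-used entry on a miss; the LRU-on-miss behavior is an assumption of this model, and the class and method names are hypothetical rather than the IXP instruction interface.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class LruCam:
    """Software model of a small, fully associative CAM with LRU replacement."""

    def __init__(self, entries: int = 16) -> None:
        self.tags: List[Optional[int]] = [None] * entries
        # Entry indices ordered from least to most recently used.
        self.lru = OrderedDict.fromkeys(range(entries))

    def _touch(self, idx: int) -> None:
        self.lru.move_to_end(idx)  # mark entry as most recently used

    def lookup(self, tag: int) -> Tuple[bool, int]:
        """Return (True, index) on a hit, or (False, LRU index) on a miss."""
        for idx, stored in enumerate(self.tags):  # hardware compares all entries at once
            if stored == tag:
                self._touch(idx)
                return True, idx
        return False, next(iter(self.lru))

    def write(self, idx: int, tag: int) -> None:
        self.tags[idx] = tag
        self._touch(idx)


# Usage: cache 20 hypothetical flow IDs in a 16-entry CAM.
cam = LruCam(16)
for flow in range(20):
    hit, idx = cam.lookup(flow)
    if not hit:
        cam.write(idx, flow)  # e.g. load this flow's state into local-memory slot idx
```

After the loop the four oldest flows (0 to 3) have been evicted and their slots reused.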

#### Different Types of Memory

| Type            | Width (bytes) | Size (bytes) | Approx. unloaded latency (cycles) | Notes                                            |
|-----------------|---------------|--------------|-----------------------------------|--------------------------------------------------|
| Local           | 4             | 2560         | 1                                 | Indexed addressing with post-increment/decrement |
| On-chip scratch | 4             | 16K          | 60                                | Atomic ops                                       |
| SRAM            | 4             | 256M         | 150                               | Atomic ops                                       |
| DRAM            | 8             | 2G           | 300                               | Direct path to/from MSF                          |
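Multithreading is how these uncached engines hide the latencies in the table. A simple round-robin model, assuming each thread computes for a fixed number of cycles and then waits on one memory reference, with no context-switch cost or memory contention, gives the utilization below. The 20-cycle compute burst and 8 threads are hypothetical parameters.

```python
from typing import Dict

# Approximate unloaded latencies (cycles) taken from the memory table.
LATENCY: Dict[str, int] = {"local": 1, "scratch": 60, "sram": 150, "dram": 300}


def utilization(threads: int, compute_cycles: int, mem_latency: int) -> float:
    """Fraction of cycles doing useful work when each thread runs `compute_cycles`
    and then waits `mem_latency` cycles on a single memory reference."""
    return min(1.0, threads * compute_cycles / (compute_cycles + mem_latency))


def threads_needed(compute_cycles: int, mem_latency: int) -> int:
    """Smallest thread count that keeps the engine fully busy in this model."""
    return -(-(compute_cycles + mem_latency) // compute_cycles)  # ceiling division


for mem in ("scratch", "sram", "dram"):
    lat = LATENCY[mem]
    print(mem, f"utilization={utilization(8, 20, lat):.2f}",
          f"threads_needed={threads_needed(20, lat)}")
```

Under these assumptions, eight threads fully hide scratch latency, nearly hide SRAM latency, and leave the engine idle about half the time when every reference goes to DRAM, which is why data placement across the memory types matters.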

#### IXA Software Framework



## Summary

- NPs are developing rapidly and are an active research area
- Multithreaded NP architectures provide tremendous packet-processing capability
- NPs can be applied in various network layers and applications
  - Traditional apps: forwarding, classification
  - Advanced apps: transcoding, URL-based switching, security, etc.
  - New apps