- No category
advertisement
Troubleshooting Cisco Nexus
7000 Series Switches
Jarod Xueyi Zhu, Premium Engineer
VIP TAC
技术支持专家 ;7年TAC技术支持中心工作经验;Catalyst 6500 Team Leader
VCP,RHCE
© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
Agenda
Before You Get Started
•
NX-OS Troubleshooting Approach
•
Nexus 7000 Built-in Troubleshooting Tools
•
Architecture Overview
Troubleshooting
•
CPU
•
Control Plane
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
2
Before You Get Started
Troubleshooting Mind Map
What .. is broken or not functioning as expected
Why .. is it broken and is there a workaround
When .. the functionality broke or started to misbehave
Accurate
Problem Description
Successful
Troubleshooting
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
Platform.. knowledge: Hardware,
Software, Features, and
Troubleshooting capabilities
Topology .. knowledge and data path through topology
Interaction .. between Cisco devices and other vendors equipment's
3
Before You Get Started
NX-OS Approach
Facts
• NX-OS debugging and troubleshooting tools are very rich, and allow engineers to accurately assess the situation
• Customized ‘show techs” make the collection of related information accurate and quick.
Problem triggered
Problem detected
*Data*
*Collection*
90% cases problem identified
IF
Cisco VIP TAC engaged
Problem resolved
Case closed
Problem not Identified no sufficient data
VIP TAC recreate additional data
Problem identified
Special Code additional data
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
4
Before You Get Started
Troubleshooting Approach (Cont.)
Suggestions
Identify detection and trigger time as accurately as possible to set ‘good’ start up point for collected data search and analysis
Minimize delta time between trigger/detection time and data collection time
Try to recall all activities before trigger/detection time
Get proficient as much as possible with built-in tool box
Get familiar with specific feature troubleshooting cli, feature show tech-support output for on-thefly troubleshooting and analysis
Remember ..
Internal data logs have limited size, adjust them ahead of time for relevant features you have deployed
Even max-ed log size may not prevent data wrap up
Use configuration rollback or other configuration backup method while troubleshooting and making configuration changes
Forensic data survives reload or switchover via
‘
Onboard logging
’, ‘accounting-log’, ‘ nvram
’
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
5
Agenda
Before You Get Started
•
Traditional Versus NX-OS Troubleshooting Approach
•
Nexus 7000 Built-in Troubleshooting Tools
•
Architecture Overview
Troubleshooting
•
CPU
•
Control Plane
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
6
Built-In Troubleshooting Tools
Make Troubleshooting Easier and more Effective
— Almost Fun to Do
Powerful show cli
Standard CLI:
• Platform independent (PI) and dependent (PD) output
• Hardware keyword indicates platform hardware specific output
Engineering CLI
• Internal keyword
• No XML or SNMP support
Event-history logging
• Extensive feature and software component eventhistory logging
• Permanent engineering debugs output of process
Finite State Machine (FSM)
Logflash logging
• Extensive system activity logging to dedicated logflash with filtering to display only
‘what I want to see
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
7
Built-In Troubleshooting Tools
Make Troubleshooting Easier and more Effective
— Almost Fun to Do
Onboard & Accounting
Onboard logging, accounting log logging (config and exec)
• Forensic data surviving reload and switchover
• Hardware component events and manipulation activity
• Use it to ‘recall’ all activity around ‘trigger and detection’ time
GOLD system
• A diagnostic framework to detect hardware failures while the system is online and operational
• Test types:
• Bootup
• Health Monitoring
• On-demand
• Scheduled
Standard tools
• Ping, Traceroute
• Span, Netflow, XML, EEM
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
8
Built-In Troubleshooting Tools
Make Troubleshooting Easier and more Effective
— Almost Fun to Do
Debugs
• Traditional feature related debugs e.g. debug ip packet protocol igmp , debug ipv6 icmp, debug icmp
• NX-OS debugs with debugfilter, e.g. debug-filter ip packet direction inbound
ASIC info
• Easy to read asic counters and registers
• Software copy not clear-onread, must use clear cli to clear them
• Comprehensive per module,
ASIC, port, counter category filtering
ELAM & Ethanalyzer
• Embedded Logic Analyzer
Module ( ELAM capture ) provides detailed frame’s internal header info
• Built-in wireshark analyzer capturing mgmt interface and CPU traffic. The output can be redirected to a text file with no performance impact
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
9
Agenda
Before You Get Started
•
Traditional Versus NX-OS Troubleshooting Approach
•
Nexus 7000 Built-in Troubleshooting Tools
•
Architecture Overview
Troubleshooting
•
CPU
•
Control Plane
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
10
Built-In Troubleshooting Tools
System Architecture
— Multistage Switch Fabric
Facts
• Nexus 7000 implements 3-stage switch fabric
• Stages 1 and 3 on I/O modules
• Stage 2 on xbar modules
2nd stage
1
Crossbar
Fabric
ASIC
2
Crossbar
Fabric
ASIC
Fabric Modules
3 4
Crossbar
Fabric
ASIC
Crossbar
Fabric
ASIC
5
Crossbar
Fabric
ASIC
2 x 23Gbps per I/O slot per fabric module
Up to 230Gbps per I/O module with 5 fabric modules installed
Fabric
ASIC
Fabric
ASIC
20 x 23Gbps channels per fabric module
1st stage
3rd stage
Ingress Module Egress Module
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
11
Built-In Troubleshooting Tools
Architecture
— Unicast Routing Software Architecture
Facts
• uRIB digests all routing related information and builds the final routing table.
• Unicast Forwarding
Distribution Module
(UFDM) distributes forwarding information to Modules.
• FIB programs forwarding info on
Modules.
RIP IS-IS EIGRP Static OSPF v2
OSPF v3 BGP
u4RIB u6RIB
Unicast Routing Information Base (uRIB)
ARP
AM mRIB uFDM
Supervisor
I/O Module
FIB Manager
Forwarding Hardware
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
12
Built-In Troubleshooting Tools
Architecture
— Multicast Routing Software Architecture
Facts
• mRIB adds routes,
OIFs and handles updates when RPF changes
• mFDM distributes forwarding information to Modules.
• FIB programs forwarding info and
MET tables on
Modules
IGMP MSDP PIM PIM6 m4RIB m6RIB
Multicast Routing Information Base (mRIB) mFDM
FIB Manager
Forwarding / Replication Hardware
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
ICMPv6 / MLD uRIB
Supervisor
I/O Module
13
Agenda
Before You Get Started
•
Traditional Versus NX-OS Troubleshooting Approach
•
Nexus 7000 Built-in Troubleshooting Tools
•
Architecture Overview
Troubleshooting
•
CPU
•
Control Plane
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
14
Troubleshooting
CPU
— Is there a Problem?
Should I Panic?
High CPU utilization is not automatically problem indication!
NEXUS 7000 is dual core linux based system with robust preemptive scheduler
(one functional unit for both rp and sp)
Strict control-plane and data-plane separation
Scheduler assures fair access to CPU for all processes
Lower level processes (drivers) run in FIFO or non-preemptive mode
© 2014 Cisco and/or its affiliates. All rights reserved.
2GB
Internal CF
slot0
: log-flash:
Fabric
ASIC
Dedicated
Arbitration
Path
VOQ
1GE Inband
System Controller
4GB
DRAM
2MB
NVRAM
Central
Arbiter
1.66GHz
Dual-Core
Main
CPU
Cisco Public
15
Troubleshooting
CPU
—
Causes
Process
Misbehaving process(s)
Consume CPU cycles which impact normally-functioning processes
Delay or prevent CPU from processing control traffic
Usually triggered by a software bug, but it might be a product of a network event
Traffic
Unexpected traffic
Excessive CPU bound traffic, control-plane churn
Acess-list processing, hardware programming
Possible typical data center traffic (arp, ipv6 nd, etc)
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
16
Troubleshooting
CPU
—
Supervisor, General Health Check
N7k-3-VDC3# show system resources
Load average: 1 minute: 0.64 5 minutes: 1.08 15 minutes: 1.30
Processes : 3912 total, 2 running
CPU states : 4.5% user, 5.0% kernel, 90.5% idle
Memory usage: 4115232K total, 3434268K used, 680964K free
N7k-3-VDC3# show processes cpu history
1 2 111 11111211233 1 1 111 1 1 1 6 112 1 1 21132 1 111 123
919275058862141899918384800583739174756080779143297264026770
100
90
80
70
60 #
50 #
40 # #
30 ### # ## ##
20 # # # ## ###### # # ### # # ## ###
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
# = average CPU%
© 2014 Cisco and/or its affiliates. All rights reserved.
How many processes were scheduled to run in average per whole system in last 1, 5 and
15 minutes
How much of CPU cycles are used by user configured processes and kernel processes
Output IS calibrated for 2 cores
CPU utilization 60 seconds ago
Cisco Public
17
Troubleshooting
CPU
—
Identify the offending process(s)
N7K-3-VDC3# show system internal processes cpu top - 14:01:06 up 21 days, 15:35, 4 users, load average: 0.77, 0.73, 1.07
Tasks: 3257 total, 1 running, 422 sleeping, 0 stopped, 2834 zombie
Cpu(s): 5.8%us, 6.0%sy, 0.1%ni, 84.1%id, 0.4%wa, 0.1%hi, 3.4%si,
0.0%st
Mem: 4115232k total, 3875988k used, 239244k free, 82400k buffers
Swap: 0k total, 0k used, 0k free, 1817776k cached
Use X | no-more , where X is interval in seconds to get more snapshots
• Equivalent of Linux TOP monitoring tool output showing system processes across all vDCs
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22683 root 20 0 182m 63m 14m S 93.7 1.6 636:17.84 netstack
• Use it to cross check accuracy of
4149 root 20 0 111m 41m 19m S 4.5 1.0 994:43.26 stp
• Output is NOT calibrated for 2 cores
23028 root 20 0 101m 23m 9968 S 3.0 0.6 598:14.57 stp
3181 root 20 0 77684 4564 3352 S 1.5 0.1 0:30.35 securityd
processes using 100% CPU
4753 root 20 0 162m 45m 16m S 1.5 1.1 34:59.22 netstack
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
18
Troubleshooting
CPU
—
Examine the offending process(s)
N7K-3-VDC3# show processes cpu | egrep "PID|--|ospf"
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
9337 102 72 1418 0.0% ospfv3
22916 118 62 1905 13.1% ospf
N7K-3-VDC3# show system internal sysmgr service pid 22916
Service "__inst_001__ospf" ("ospf", 58):
UUID = 0x41000119, PID = 22916, SAP = 320
State: SRV_STATE_HANDSHAKED (entered at time Thu Mar 3 21:53:59 2012).
Restart count: 1
Time of last restart: Thu Mar 3 21:53:58 2011.
The service never crashed since the last reboot.
Tag = 6467
Plugin ID: 1
PID
– Process ID
Runtime
– total non-idle time process has been actively using CPU
Invoked
– number of times process has been context switched voluntary
(finished job) and involuntary (scheduler interrupt) uSecs - average amount of time process was running during a single context switch
N7K-3-VDC3# show system internal sysmgr service name ospfv3 tag 8893
Service "__inst_001__ospfv3" ("ospfv3", 59):
UUID = 0x4100011A, PID = 9337, SAP = 328
State: SRV_STATE_HANDSHAKED (entered at time Fri Mar 25 22:33:10 2012).
Restart count: 2
Time of last restart: Fri Mar 25 22:33:09 2011.
The service never crashed since the last reboot.
Tag = 8893
Plugin ID: 1
© 2014 Cisco and/or its affiliates. All rights reserved.
Useful process level details
For testing purposes, process was manually restarted using ‘
8893
’ cli
Cisco Public restart ospfv3
19
Troubleshooting
CPU
—
Traffic Causes High CPU Utilization and Control-Plane Instability
Attackers Defense
Typical
“offending” datacenter
•
ARP, ND (IPv6)
•
DHCP traffic
•
Glean traffic (no ARP or ND)
•
Malicious traffic to 224.0.0.0/24 subnet
•
Fragments or malicious L2 mcast or ‘other’ traffic
Remember: misbehaving “expected” traffic, such as OSPF packets, might be a dangerous attacker as well
•
CPU protection via CoPP policers
•
CPU protection via L2/L3 hardware rate-limiters
(RL)
•
CoPP and RL default settings may need tweaking based on network requirement specifics
•
Both are configured/enabled per M1 I/O
Module
•
Total inband traffic allowed is the sum across all M1 I/O Modules
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
20
Troubleshooting
CPU
—
Traffic Causes High CPU Utilization and Control-Plane Instability
Problem
OSPF neighbor’s failing to come up.
•
Syslog messages report
OSPF neighbor failures
•
CPU states show high utilization caused by
OSPF and Netstack process.
N7K-1-VDC2# show system resources
Load average: 1 minute: 2.92 5 minutes: 2.38 15 minutes: 2.27
Processes : 1267 total, 4 running
CPU states : 34.0% user, 42.5% kernel, 23.5% idle
Memory usage: 4115232K total, 3638780K used, 476452K free
N7K-1-VDC2# show processes cpu sort
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
3981 127 276 462 43.2% ospf
3841 267 78 3427 16.4% netstack
2941 34146488 7377876 4628 0.9% platform
3982 118 245 485 0.9% ospfv3
2011 Mar 26 15:38:56.395 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process
6467, Nbr 192.251.19.22 on Vlan19 from INIT to DOWN, DEADTIME
2011 Mar 26 15:38:56.584 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process
6467, Nbr 192.251.19.22 on Vlan19 from DOWN to INIT, HELLORCVD
2011 Mar 26 15:39:33.865 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process
6467, Nbr 192.251.19.22 on Vlan19 from INIT to DOWN, DEADTIME
2011 Mar 26 15:39:35.754 N7K-1-VDC2 %OSPF-5-NBRSTATE: ospf-6467 [3981] Process
6467, Nbr 192.251.19.22 on Vlan19 from DOWN to INIT, HELLORCVD
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
21
Troubleshooting
CPU
Traffic
—
Inband stats
N7K-1# show hardware internal cpu-mac inband stats | egrep " Rx|
Tx|counters|Throttle|Tick|rate|total|good|XOFF p|XON p"
RMON counters Rx Tx total packets 779905245 1421785114 good packets 779905245 1421650279 good octets (hi) 0 0 good octets (low) 172303021767 192965708376 total octets (hi) 0 0 total octets (low) 172302724342 192974265660
XON packets 0 67627
XOFF packets 0 67208
Interrupt counters
Error counters
Throttle statistics
Throttle interval ........... 2 * 100ms
Packet rate limit ........... 32000 pps
Tick counter ................ 12414130
Rx packet rate (current/max) 4993 / 20296 pps
Tx packet rate (current/max) 60 / 3474 pps
--snip--
The Challenge
how to identify offending traffic type and its source
Total number of frames received and send by CPU
Hard coded maximum limit, with larger packet size, this number may not be reached
How many times did throttling kicked in
CPU bound traffic current pps /maximum pps reached
Cisco Public
22
© 2014 Cisco and/or its affiliates. All rights reserved.
Troubleshooting
CPU Traffic
— Pktmgr debugs
N7K-1-VDC2# show system internal pktmgr interface vlan 64
Vlan64, ordinal: 117
SUP-traffic statistics: (sent/received)
Packets: 3771848 / 40687558
Bytes: 304360445 / 36018498390
Instant packet rate: 0 pps / 4951 pps
-- snip --
N7K-1-VDC2# debug pktmgr frame
2011 Mar 26 21:22:30.599670 netstack: In Vlan 64 0x0800 992 7
0000.1301.1301
-> 0100.5e00.0005
Vlan64
N7K-1-VDC2# show ip arp vlan 64 | i 0000.1301.1301
N7K-1-VDC2# show mac address-table address 0000.1301.1301 vlan 64
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+-----------------
-
64 0000.1301.1301 dynamic 0 F F Eth2/9
Use this cli first without specific interface to identify the ‘offending’ traffic - the one with the highest rate.
Alternatively, use ‘ show system internal pktmgr internal vdc inband
’ which identifies vDC interfaces and number of packet sent to the CPU debug-filter pktmgr vlan 64
Offending host mac
No ARP entry??
Source Port
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
23
Troubleshooting
CPU Traffic
—
Other Capture methods
Debug the offending process
N7K-1-VDC2# debug-filter ip ospf interface vlan 64
N7K-1-VDC2# debug logfile offending_traffic
N7K-1-VDC2# show debug logfile offending_traffic
2011 Mar 26 23:33:25.992586 ospf: 6467 [3981]
(default) rcvd: prty:7 ver:2 t:HELLO len:44 rid:0.0.0.0 area:0.0.0.0 crc:0xfdd2 aut:0 aukid:0 from
192.253.64.254/Vlan64
2011 Mar 26 23:33:25.992780 ospf: 6467 [3981] Invalid src address 192.253.64.254, should not be seen on
Vlan64
Ethanalyzer
•
Ethanalyzer can be used to capture the traffic that is taking the inband interface to the CPU.
•
Write the capture output to a pcap file and open it using wireshark for analysis
•
If still more digging is needed, use a more specific trigger to narrow down the search offending host (s)
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
24
Agenda
Before You Get Started
•
Traditional Versus NX-OS Troubleshooting Approach
•
Nexus 7000 Built-in Troubleshooting Tools
•
Architecture Overview
Troubleshooting
•
CPU
•
Control Plane
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
25
Troubleshooting
CoPP
— Essentials
Goal
CoPP protects the SUP against the following classes of traffic
•
Control Plane packets , such as Protocols Hellos and other Receives
•
Data Plan transit packets , such as Glean, Exceptions, and Redirects
•
Management Plane packets , such as SNMP, and SSH
Operation
•
NX-OS device segregates different packets destined to the inband interface into different classes.
•
Once these classes are identified, the NX-OS device polices or marks down packets, which ensure that the supervisor module is not overwhelmed.
•
CoPP policer is attached to the interface “control-plane”
Implementation
CoPP Policing is implemented on each forwarding engine independently:
• the configured policer’s values apply on a per forwarding engine basis and the aggregate traffic prone to hit the CPU is the sum of the conformed/transmit traffic on all of the forwarding engines
• CoPP can be modified from the default VDC only.
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
26
Troubleshooting
CoPP
—
Tighten the grip on Received packets (OSPF example)
Problem
Flapping OSPF neighbors!!
•
A faulty OSPF neighbor or an offending server is blasting the switch with
Hello packets.
•
Default CoPP is ratelimiting as designed, but that results on dropping legitimate neighbors packets as well.
N7K-1# show policy-map interface control-plane module 2 | egrep "servicepolicy|critical|ospf|police cir 39600|malicious"
service-policy input: copp-system-policy
class-map copp-system-class-critical (match-any)
match access-grp name copp-system-acl-ospf
No “malicious” class to block malicious traffic
match access-grp name copp-system-acl-ospf6
police cir 39600 kbps , bc 250 ms
N7K-1# show class-map type control-plane copp-system-class-critical | egrep class|ospf
class-map type control-plane match-any copp-system-class-critical
match access-grp name copp-system-acl-ospf
match access-grp name copp-system-acl-ospf6
N7K-1# show ip access-lists copp-system-acl-ospf
IP access list copp-system-acl-ospf
10 permit ospf any any
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
27
Troubleshooting
CoPP
—
Tighten the grip on Received packets (OSPF example) Cont.
Modify
copp-system-acl-ospf to permit the neighbors only
N7K-1# show ip access-lists copp-system-acl-ospf
IP access list copp-system-acl-ospf
10 permit ospf any any
20 permit ip 40.9.0.0/16 224.0.0.5/32
30 permit ip 40.9.0.0/16 224.0.0.6/32
Create
copp-system- acl-malicious access-list
N7K-1# show ip access-lists copp-system-acl-malicious
IP access list copp-system-acl-malicious
10 permit ip any 224.0.0.0/24
Remove
Add neighbors
Add
copp-system-classmalicious class, right before the last class default, with zero-rate policer to block all malicious traffic .
N7K-1# show policy-map interface control-plane module 2 | egrep
"service-policy|critical|ospf|police cir 39600|malicious|police cir 1 "
service-policy input: copp-system-policy
class-map copp-system-class-critical (match-any)
match access-grp name copp-system-acl-ospf
match access-grp name copp-system-acl-ospf6
police cir 39600 kbps , bc 250 ms class-map copp-system-class-malicious (match-any)
match access-grp name copp-system-acl-malicious
police cir 1 bps , bc 200 ms
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
28
Troubleshooting
CoPP
—
Tighten the grip on Received packets (OSPF example) Cont.
Verify
Check the CoPP policer for drops
•
The new class-map shows high rate of dropped packets.
•
Furthermore, the statistics results point to the module where the offending device is connected .
N7K-1# show policy-map interface control-plane module 2 class copp-system-classmalicious control Plane
service-policy input: copp-system-policy
class-map copp-system-class-malicious (match-any)
match access-grp name copp-system-acl-malicious
police cir 1 bps , bc 200 ms
module 2 :
conformed 0 bytes; action: drop violated 1799505072 bytes; action: drop
N7K-1# show policy-map interface control-plane module 1 class copp-system-classmalicious control Plane
service-policy input: copp-system-policy
class-map copp-system-class-malicious (match-any)
match access-grp name copp-system-acl-malicious
police cir 1 bps , bc 200 ms
module 1 :
conformed 0 bytes; action: drop violated 0 bytes; action: drop
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
29
Troubleshooting
Control Plan
—
Hardware Rate-limiters
Essentials
•
Rate-limiters can prevent redirected packets for egress exceptions from overwhelming the supervisor module
•
As with CoPP policers, modifying the default rates should be carefully planned before any configuration changes.
N7K-1# show hardware rate-limiter ?
[snip]
access-list-log Packets copied to supervisor for access-list logging
copy Data and control packets copied to supervisor
f1 Control packets from F1 modules to supervisor
layer-2 Layer-2 control and Bridged packets
layer-3 Layer-3 control and Routed packets
module Optionally specify a module number
receive Packets redirected to supervisor
| Pipe command output to filter
N7K-1# show hardware rate-limiter layer-2 mcast-snooping module 1
Units for Config: packets per second
Allowed, Dropped & Total: aggregated since last clear counters
Rate Limiter Class Parameters
------------------------------------------------------------ layer-2 mcast-snooping Config : 1500
Allowed : 302128
Dropped : 0
Total : 302128
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
30
Troubleshooting
Control Plan
—
Verifying Software Services health (OSPF example)
Sysmgr
The System Manager handles processes and monitors their health. It keeps the mapping of PIDs to UUIDs.
N7K-1-PeerA# show system internal sysmgr service name ospf
Service "__inst_001__ospf" ("ospf", 14):
UUID = 0x41000119 , PID = 3725, SAP = 320
State: SRV_STATE_HANDSHAKED (entered at time Wed Mar 14 15:47:34 2012).
Restart count: 1
Time of last restart: Wed Mar 14 15:47:33 2012.
The service never crashed since the last reboot.
Tag = 1
Plugin ID: 1
N7K-1-PeerA# show system internal sysmgr service all | egrep -i netstack|name
Name UUID PID SAP state Start count Tag Plugin ID netstack 0x00000221 5588 246 s0009 1 N/A 0
© 2014 Cisco and/or its affiliates. All rights reserved.
NOTE
Remember this:
SAP = 320
Cisco Public
31
Troubleshooting
Control Plan
—
Verifying Software Services health (OSPF example) Cont.
NetStack
Netstack is a full Network
Stack designed with
Modularity, High availability, and Virtualization implementation goals.
N7K-1-PeerA# show ip client ospf
Client: ospf-6467, uuid: 1090519321, pid: 3981, extended pid: 3981
Protocol: 89, client-index: 19, routing VRF id: 65535
Data MTS-SAP: 2339
Data messages, send successful: 209867328, failed: 13263152
N7K-1-PeerA# show system internal pktmgr client 0x221
Client uuid: 545, 4 filters, pid 3841
Check for OSPF IP client failures
Filter 1: EthType 0x0800,
Rx: 299923608, Drop: 0
Filter 2: EthType 0x86dd,
Rx: 1412579, Drop: 0
[snip]
Check for L2 client packet drops
Total Rx: 301346464, Drop: 0 , Tx: 144295338, Drop: 0
COS=0 Rx: 15993531, Tx: 87699456 COS=1 Rx: 1903980, Tx: 0
COS=2 Rx: 0, Tx: 0 COS=3 Rx: 0, Tx: 0
COS=4 Rx: 0, Tx: 0 COS=5 Rx: 3694169, Tx: 1
COS=6 Rx: 56191519, Tx: 56595881 COS=7 Rx: 223563265, Tx: 0
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
32
Troubleshooting
Control Plan
—
Verifying Software Services health (OSPF example) Cont.
MTS
•
"Messages and
Transactional Services
”.
MTS offers SAPs (Service
Access Points) to allow services to exchange messages
•
MTS provides complete fault isolation by handling data structure communications.
N7K-1-PeerA# show system internal mts sup sap 320 stats msg tx: 3328 byte tx: 396657 msg rx: 527 byte rx: 65045 opc sent to myself: 8927 max_q_size q_len limit (soft q limit): 1024 max_q_size q_bytes limit (soft q limit): 15% max_q_size ever reached: 17 max_fast_q_size (hard q limit): 4096 rebind count: 0
Waiting for response: none buf in transit: 0 bytes in transit: 0
N7k# show system internal mts buffers summary node sapno recv_q pers_q npers_q log_q sup 320 0 0 4592 0
Make sure the counters are incrementing (no memory leak) npers high value indicates OSPF MTS buffer leak
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
33
Unicast L2 and L3 Forwarding, ARP
Control Plan
— Golden rule
In case the issue you have encountered is urgent, complicated or you can’t figure it out, collect show tech-support output asap!
Related show tech(s)
N7K-1-VDC2# show tech-support sysmgr
N7K-1-VDC2# show tech-support netstack detail
N7K-1-VDC2# show tech-support pktmgr
N7K-1-VDC2# show tech-support <service>
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
34
Thank you.
© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public
advertisement
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Related manuals
advertisement