This issue was first noticed during a routine check of the switch while I was working on another task. Ping times to the switch’s management interface were higher than expected, and “sh proc cpu sort” showed the Catalyst 4500 in question pegged at 99% CPU. The main culprit was the process “Cat4k Mgmt LoPri”, which accounted for over 90% of the CPU on its own. After a bit of research I found the Cisco document linked above. I will now walk you through what I did and what I found. After this fix, overall CPU usage dropped back to an average of 34%.
The first commands I ran from the Cisco troubleshooting guide were:
Switch#debug platform packet all receive buffer
platform packet debugging is on
Switch#show platform cpu packet buffered | i Src
*Note: By appending “| i Src” to the command, I clean up the output to show only the source and destination MAC addresses of traffic hitting and being processed by the CPU, i.e. the source of our issue.
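If the buffered-packet output is long, it helps to tally which source MAC appears most often rather than eyeball it. Below is a minimal sketch, assuming you have saved the command output to a text file; the sample lines and their layout are hypothetical (the exact output format varies by platform and IOS version), so the script just grabs anything that looks like a MAC address on each “Src” line:

```python
import re
from collections import Counter

# Match colon-separated (aa:bb:cc:dd:ee:ff) or Cisco dotted (aaaa.bbbb.cccc) MACs.
MAC_RE = re.compile(
    r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}|(?:[0-9a-f]{4}\.){2}[0-9a-f]{4}", re.I
)

def top_source_macs(text, n=5):
    """Count MAC addresses appearing on lines that contain 'Src'."""
    counts = Counter()
    for line in text.splitlines():
        if "Src" in line:
            counts.update(m.lower() for m in MAC_RE.findall(line))
    return counts.most_common(n)

# Hypothetical saved output -- real lines will look different.
sample = """\
Src 11 1111.2222.3333 -> 3333.3300.0000
Src 11 1111.2222.3333 -> 3333.3300.0000
Src 12 aaaa.bbbb.cccc -> 3333.3300.0000
"""
print(top_source_macs(sample))
# The repeated destination floats to the top of the tally.
```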
I noticed that every single packet was going to the same destination MAC address. For reference, we will call that MAC address 33:33:33:00:00:00. I did the usual to try to trace this, with the idea in the back of my head that this wasn’t a normal unicast MAC (addresses starting with 33:33 are IPv6 multicast MACs), but nonetheless I pressed on. I ran the usual “show mac address-table address 33:33:33:00:00:00”, hoping for an easy one where I was handed the culprit workstation’s port on a silver platter, but we’re never that lucky, are we? :)
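The reason the MAC-table lookup comes up empty is that 33:33:xx:xx:xx:xx addresses are derived, not learned: per RFC 2464, an IPv6 multicast address maps to an Ethernet MAC of 33:33 followed by the low 32 bits of the IPv6 address, and switches never learn multicast MACs as source entries. A small sketch of that mapping (the function name is mine, not from any library):

```python
import ipaddress

def ipv6_multicast_mac(addr: str) -> str:
    """Map an IPv6 multicast address to its Ethernet MAC (RFC 2464):
    33:33 followed by the low 32 bits of the IPv6 address."""
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    low32 = ip.packed[-4:]  # last 4 bytes of the 16-byte address
    return "33:33:" + ":".join(f"{b:02x}" for b in low32)

print(ipv6_multicast_mac("ff02::1"))   # all-nodes      -> 33:33:00:00:00:01
print(ipv6_multicast_mac("ff02::16"))  # MLDv2 routers  -> 33:33:00:00:00:16
```

Note that ff02::16 is the all-MLDv2-capable-routers address, which is exactly where MLDv2 Multicast Listener Reports are sent, so a flood of reports shows up as a flood to one 33:33 destination MAC.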
Cisco also mentioned running a SPAN session as another troubleshooting step, so that’s what I did next. If you need a reference on running a SPAN session, check my other article here: Embedded Packet Capture. I took one of the source MAC addresses found in the work above (under the Src field) and traced it back to a physical port. The first one I tried traced back cleanly, as expected, since it was a true physical host. My SPAN session looked like this:
Switch(config)#monitor session 1 source interface Gi2/24
Switch(config)#monitor session 1 destination interface Gi2/48
I had a laptop connected to the same VLAN as my problem traffic, plugged into Gi2/48. I opened Wireshark and started capturing the traffic mirrored from one of the problem ports in question, Gi2/24. Here’s an idea of what I saw:
There was an absolute flood of these Multicast Listener Report messages. When I say flood, I mean that out of a capture of 500,000 packets, roughly 490,000 of them (98%) were the messages above. No wonder the CPU was running hot!
I tossed that query into Google (“Cisco Multicast Listener Report”) and found some interesting information. Reading through a few of the results, you will find mentions of the Intel I217-LM NIC. It seems there was a bad driver out there in the past, and a lot of people have seen this same issue. The recommended fix was to do these three things:
- Disable Intel AMT in the BIOS
- Update Intel I217-LM Driver to the latest one
- Disable IPv6 traffic on the individual network connection.
I started by identifying the culprit workstation model in use. I then found and downloaded the latest driver, which had come out a couple of months earlier. I wanted to fix this with the fewest alterations to our company configuration, and here’s what resolved the issue for me. On the first few workstations, I disabled IPv6 in the connection properties and updated the driver. Boom: that MAC dropped out of my packet captures. Good. Next I tried a few more (I had about 12 culprits in total). On those I only updated the driver, nothing else. Once updated and rebooted, those machines dropped off my report as well.
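With a dozen culprits to work through, I found it easiest to track progress by diffing the set of offending source MACs between captures taken before and after each round of fixes. A trivial sketch of that bookkeeping (the MAC values are made up; extracting the MAC sets from your captures is assumed, e.g. via a Wireshark export):

```python
def remaining_culprits(before: set, after: set) -> set:
    """MACs seen flooding in both captures, i.e. machines still needing a fix."""
    return before & after

# Hypothetical MACs from captures before and after a round of driver updates.
before = {"aaaa.bbbb.0001", "aaaa.bbbb.0002", "aaaa.bbbb.0003"}
after = {"aaaa.bbbb.0002"}
print(remaining_culprits(before, after))  # only aaaa.bbbb.0002 still needs work
```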
I checked back with my switch and here is an idea of what I saw once all machines had been worked on:
Switch# show processes cpu sorted
CPU utilization for five seconds: 34%/0%; one minute: 35%; five minutes: 37%
So that’s all there is to it. That process helped me resolve the specific issue I had, and even if your root cause turns out to be different from mine, you should be able to track it down the same way. If you have any questions, leave a comment and I’d be happy to help where I can.
The original Cisco document referenced for this “fix” is located here: