Cisco 3850 – High CPU Usage (fed and iosd processes)

Recently, out of nowhere, one of my 3850 stacks spiked in CPU usage. This particular stack had no recent configuration changes and had been up and running for around 20 weeks since its last reboot. Symptoms included high ping times for anything passing through the switch and significant lag when trying to SSH into the stack. The average CPU for each core on switch 1 (the master switch) was in the 90% range, and it stayed pegged there with little change throughout the day.

After checking the processes and the CPU each one was using, I noticed that the “fed” and “iosd” processes were consuming a large amount of CPU. To dig deeper, I ran the following commands:

  • show process cpu detail process fed sorted | ex 0.0
  • show process cpu detail process iosd sorted | ex 0.0
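The sorted output can run long on a busy stack. As a sketch, the Python below scans pasted `show process cpu detail ... sorted` output and lists any thread whose 5-second CPU figure is at or above a threshold. It assumes the row layout seen in the outputs posted in the comments; the function name, regex and 50% default are illustrative choices, not anything Cisco provides.

```python
import re

# Hypothetical helper (not a Cisco tool). In each process/thread row, the
# 5Sec, 1Min and 5Min figures are the only decimal columns; they are followed
# by the TTY number and the process name, which may contain spaces.
ROW = re.compile(r"(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+(.+?)\s*$")


def busy_entries(output: str, threshold: float = 50.0) -> list:
    """Return (name, 5s, 1m, 5m) tuples for rows whose 5-second CPU
    figure is at or above `threshold`, busiest first."""
    hits = []
    for line in output.splitlines():
        match = ROW.search(line)
        if match is None:
            continue  # "Core N:" lines, column headers, "(%)" line, prompts
        five_sec, one_min, five_min = (float(match.group(i)) for i in (1, 2, 3))
        if five_sec >= threshold:
            hits.append((match.group(5), five_sec, one_min, five_min))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Rows copied from the "before" iosd output a commenter posted.
SAMPLE = """\
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
13082 L 965710 2253554 709 23.46 23.43 23.60 0 iosd
172 I 287247 121971 0 95.00 94.11 93.88 0 NGWC L2M
253 I 3981685 3438306 0 1.22 1.66 1.66 0 Spanning Tree
"""

for name, five_sec, one_min, five_min in busy_entries(SAMPLE):
    print(f"{name}: {five_sec}% (5s), {one_min}% (1m), {five_min}% (5m)")
# NGWC L2M: 95.0% (5s), 94.11% (1m), 93.88% (5m)
```

Lowering `threshold` (for example to 5.0) also surfaces the secondary consumers; the 50% default is an arbitrary cut-off.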

That gave me more insight into what was happening, and something big jumped out at me: a single process, “NGWC L2M”, was running at 97%. I consulted with TAC, and they said this was caused by IPv6 requests being flooded to this stack. Since we don’t use IPv6, we ran a command to keep these requests from affecting the system software: enabling MLD snooping allows them to be handled by the hardware instead. This is done by entering the following command in global configuration mode:

  • ipv6 mld snooping

This gave us IMMEDIATE relief: CPU went from the 90s down to the 30s. If you come across a similar issue, give this a try and see whether you are affected by the same thing I was.
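To put a number on a drop like this, you can average the per-core summary lines at the top of each `show process cpu detail` output. The sketch below is a hypothetical Python helper that assumes the exact "Core N: CPU utilization for five seconds: ...%" wording; the sample lines are the before and after figures one commenter posted for a 3650 stack.

```python
import re

# Hypothetical helper: matches the per-core summary line printed at the top of
# "show process cpu detail" output, e.g.
# "Core 0: CPU utilization for five seconds: 70%; one minute: 37%; five minutes: 39%"
CORE_LINE = re.compile(
    r"Core (\d+): CPU utilization for five seconds: (\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def core_averages(output: str) -> dict:
    """Average the 5-second, 1-minute and 5-minute figures across all cores."""
    rows = [
        [int(value) for value in match.groups()[1:]]  # drop the core number
        for match in CORE_LINE.finditer(output)
    ]
    if not rows:
        raise ValueError("no 'Core N:' lines found in the output")
    count = len(rows)
    return {
        "five_sec": sum(row[0] for row in rows) / count,
        "one_min": sum(row[1] for row in rows) / count,
        "five_min": sum(row[2] for row in rows) / count,
    }


BEFORE = """\
Core 0: CPU utilization for five seconds: 70%; one minute: 37%; five minutes: 39%
Core 1: CPU utilization for five seconds: 57%; one minute: 46%; five minutes: 51%
Core 2: CPU utilization for five seconds: 32%; one minute: 47%; five minutes: 52%
Core 3: CPU utilization for five seconds: 45%; one minute: 63%; five minutes: 54%
"""

AFTER = """\
Core 0: CPU utilization for five seconds: 29%; one minute: 31%; five minutes: 31%
Core 1: CPU utilization for five seconds: 35%; one minute: 32%; five minutes: 32%
Core 2: CPU utilization for five seconds: 24%; one minute: 24%; five minutes: 25%
Core 3: CPU utilization for five seconds: 16%; one minute: 19%; five minutes: 21%
"""

print(core_averages(BEFORE)["five_min"], core_averages(AFTER)["five_min"])
# 49.0 27.25
```

The 5-minute average is the steadiest of the three figures, so it is a reasonable one to compare before and after a change; the 5-second value can swing widely between runs.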

Hope that helps!

Kevin Blackburn

Cisco CCNP, Senior Network Engineer in the Healthcare Industry. Currently working on my CCIE R&S which is the focus of most of my latest blog posts. #NFD15 Delegate.

17 thoughts on “Cisco 3850 – High CPU Usage (fed and iosd processes)”

  • September 22, 2015 at 11:30 am

    MANY thanks, mate! I came across the same issue. Fixed now.

    • kevin
      September 24, 2015 at 12:23 am

      Glad it worked! It really bothered me that it was something so easy to fix, but I know how glad I was to have it resolved!

  • December 7, 2015 at 2:42 pm

    Exactly the same symptoms and solution. Thanks!

  • December 11, 2015 at 5:13 pm

    Just experienced this today, same resolution. Thanks for posting this.

    The Cisco engineer I spoke with mentioned he’d seen this a few times, each time caused by a software bug with Intel NICs. Apparently there’s an issue where they flood IPv6 traffic once connected.

    I wasn’t able to get a capture of the traffic before I applied the fix. Did you ever track down the source of your problem?

    • kevin
      December 24, 2015 at 2:57 pm

      Unfortunately not. In this case, we did not go further in tracking down the source of the flooding. I did read up on it a bit and found other material about Intel NICs causing issues, but nothing concrete that I could use to pinpoint the culprit. Glad it helped!

  • December 17, 2015 at 7:02 am

    Hi,

    I would like to ask those who have fixed it: does your output also show NGWC L2M as the culprit?

    We are having the same issue.

    But on our side, the main CPU consumption is from fed-ots-main. Can the ipv6 mld snooping command solve the same problem?

    • kevin
      December 24, 2015 at 2:58 pm

      Are you running IPv6 on your network? This command was not disruptive on ours, since we are IPv4-only at this time, so it could be worth trying; your issue seems similar in nature.

  • April 29, 2016 at 11:13 pm

    Works well, thanks. But I don’t understand why this command helps if my LAN runs exclusively in IPv4 mode. Does anyone know?

    • kevin
      April 30, 2016 at 7:07 pm

      I’m in the same boat as you. That said, some devices will send out IPv6 traffic even on an exclusively IPv4 network. For instance, one of my other posts talks about a bad Intel driver that starts spamming IPv6 multicast traffic. That’s why I think this helps in our situations.

  • May 2, 2016 at 5:33 pm

    Same thing here. CPU drops like a stone when you put that command in.

    This just happened. We are running cat3k_caa-base.SPA.03.02.03.SE.pkg on this stack.

    We have two other stacks running 3.6 (3.2 is a little old; we need to upgrade those) and they have not shown this issue.

    Is this a bug? Is Cisco telling people about this or trying to fix it?

    • kevin
      May 6, 2016 at 12:12 am

      Sometimes it’s not a bug; as I mentioned, it can be caused by IPv6 traffic from devices on your network that you might not even be aware of. In our case at this specific site, it was an Intel driver. Just like you, I have other sites with no issue at all. That’s the frustrating part!

  • June 15, 2016 at 3:16 pm

    Hi team,

    I have the same issue of high CPU utilization on a 3850 stack.

    But on our side, the main CPU consumption is from fed.

    3850#sh processes cpu sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 30%; one minute: 30%; five minutes: 28%
    Core 1: CPU utilization for five seconds: 83%; one minute: 31%; five minutes: 29%
    Core 2: CPU utilization for five seconds: 18%; one minute: 41%; five minutes: 41%
    Core 3: CPU utilization for five seconds: 35%; one minute: 39%; five minutes: 46%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    5666 1524454 31555501 409 26.83 26.71 26.86 1088 fed
    12912 2930673 29927026 233 6.63 6.54 7.18 34816 iosd
    6208 2727373 31704221 1932 5.05 0.63 0.44 0 eicored
    6199 3758880 28193270 1543 0.48 0.62 0.62 0 pdsd
    5668 4262855 27079103 47 0.43 0.38 0.34 0 stack-mgr
    12908 3151748 25289615 18 0.29 0.17 0.16 0 wcm

    Sar-srvm-3850# show process cpu detail process fed sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 27%; one minute: 31%; five minutes: 28%
    Core 1: CPU utilization for five seconds: 18%; one minute: 35%; five minutes: 31%
    Core 2: CPU utilization for five seconds: 93%; one minute: 27%; five minutes: 31%
    Core 3: CPU utilization for five seconds: 9%; one minute: 58%; five minutes: 54%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    5666 L 591104 3153820 409 26.41 26.78 26.74 1088 fed
    5666 L 2 13604 1202240 8063681 0 23.76 23.90 23.98 0 PunjectRx
    5666 L 3 6131 291328 3349063 0 1.30 1.25 1.14 0 IntrDrv
    5666 L 1 6108 1321686 3428682 0 0.53 0.51 0.49 0 fed-ots-nfl
    5666 L 1 13605 2779899 1569305 0 0.43 0.60 0.56 0 PunjectTx
    5666 L 3 6105 2163267 4861843 0 0.24 0.24 0.24 0 fed-ots-main
    5666 L 0 11101 1429268 1837584 0 0.14 0.22 0.25 0 Xcvr

    I also tried configuring “ipv6 mld snooping” in global configuration mode, but it did not help. Please advise.

  • June 15, 2016 at 3:25 pm

    3850#show processes cpu detailed process iosd sort | ex 0.00
    Core 0: CPU utilization for five seconds: 50%; one minute: 25%; five minutes: 26%
    Core 1: CPU utilization for five seconds: 17%; one minute: 23%; five minutes: 29%
    Core 2: CPU utilization for five seconds: 12%; one minute: 22%; five minutes: 30%
    Core 3: CPU utilization for five seconds: 58%; one minute: 67%; five minutes: 57%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    12912 L 3046543 2993137 233 7.27 6.24 7.12 34816 iosd
    12912 L 1 14131 4129071 3416646 0 4.37 3.86 4.70 34816 iosd.fastpath
    12912 L 1 12912 4004280 3462402 0 2.90 2.33 2.36 34816 iosd
    243 I 911510 1096371 0 1.00 0.66 0.66 0 Spanning Tree
    201 I 1572423 1218806 0 0.77 1.22 1.44 0 IP Host Track Proce
    98 I 1713736 1342998 0 0.77 0.22 0.22 0 PLATFORM_MGR SPI in
    68 I 80916 2179130 0 0.66 0.22 0.22 0 Net Background
    244 I 1459120 3536814 0 0.22 0.11 0.11 0 UDLD
    64 I 694362 3533164 0 0.22 0.11 0.11 0 IPC Bootstrap
    286 I 3169390 1270789 0 0.11 0.22 0.22 0 DAI Packet Process
    221 I 1117452 7036651 0 0.11 0.22 0.22 0 CDP Protocol

    We are running the Cisco IOS image “cat3k_caa-universalk9.SPA.03.03.05.SE.150-1.EZ5.bin”. Please advise.

  • August 22, 2016 at 9:24 pm

    It worked for me.

    Before:

    WRc3650-V120-STK120#show process cpu detail process fed sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 18%; one minute: 32%; five minutes: 40%
    Core 1: CPU utilization for five seconds: 18%; one minute: 39%; five minutes: 52%
    Core 2: CPU utilization for five seconds: 52%; one minute: 47%; five minutes: 53%
    Core 3: CPU utilization for five seconds: 94%; one minute: 72%; five minutes: 53%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    5838 L 3888990 5801644 199 20.99 22.09 22.51 1088 fed
    5838 L 3 13834 1218571 3646855 0 19.52 20.40 20.71 0 PunjectTx
    5838 L 0 6280 1035338 789887 0 0.37 0.49 0.50 0 fed-ots-nfl

    WRc3650-V120-STK120#show process cpu detail process iosd sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 70%; one minute: 37%; five minutes: 39%
    Core 1: CPU utilization for five seconds: 57%; one minute: 46%; five minutes: 51%
    Core 2: CPU utilization for five seconds: 32%; one minute: 47%; five minutes: 52%
    Core 3: CPU utilization for five seconds: 45%; one minute: 63%; five minutes: 54%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    13082 L 965710 2253554 709 23.46 23.43 23.60 0 iosd
    172 I 287247 121971 0 95.00 94.11 93.88 0 NGWC L2M
    253 I 3981685 3438306 0 1.22 1.66 1.66 0 Spanning Tree
    100 I 3192860 2340372 0 0.44 0.33 0.33 0 PLATFORM_MGR SPI in
    70 I 499470 1653348 0 0.44 0.22 0.22 0 Net Background
    ——————————————————————————————————————————-
    After:

    WRc3650-V120-STK120#show process cpu detail process fed sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 27%; one minute: 31%; five minutes: 31%
    Core 1: CPU utilization for five seconds: 36%; one minute: 32%; five minutes: 32%
    Core 2: CPU utilization for five seconds: 23%; one minute: 24%; five minutes: 25%
    Core 3: CPU utilization for five seconds: 15%; one minute: 18%; five minutes: 21%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    5838 L 208893 5853920 198 12.36 11.69 11.77 1088 fed
    5838 L 2 13833 1302720 5791908 0 6.25 5.83 5.66 0 PunjectRx
    5838 L 3 6306 629260 5839261 0 3.69 3.77 3.63 0 IntrDrv
    5838 L 0 6277 3724672 1813681 0 0.85 0.80 0.83 0 fed-ots-main
    5838 L 3 13834 1376561 3653525 0 0.57 0.44 0.84 0 PunjectTx
    5838 L 1 11524 2341322 4082066 0 0.38 0.25 0.24 0 Xcvr

    WRc3650-V120-STK120#show process cpu detail process iosd sorted | ex 0.0
    Core 0: CPU utilization for five seconds: 29%; one minute: 31%; five minutes: 31%
    Core 1: CPU utilization for five seconds: 35%; one minute: 32%; five minutes: 32%
    Core 2: CPU utilization for five seconds: 24%; one minute: 24%; five minutes: 25%
    Core 3: CPU utilization for five seconds: 16%; one minute: 19%; five minutes: 21%
    PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    (%) (%) (%)
    13082 L 1580710 2259871 710 13.02 12.20 12.34 0 iosd
    13082 L 1 13082 1435147 2059491 0 9.26 7.99 8.27 0 iosd
    13082 L 0 14366 3176350 8887148 0 3.67 4.17 4.02 34816 iosd.fastpath
    301 I 230650 424824 0 25.00 20.44 19.66 0 MLD_SNOOP
    70 I 503710 1655159 0 1.11 0.44 0.33 0 Net Background
    229 I 599190 1902601 0 1.00 0.77 0.66 0 CDP Protocol
    219 I 255290 6811628 0 0.55 0.55 0.55 0 Tunnel IOSd shim DB
    100 I 3197950 2344294 0 0.33 0.44 0.44 0 PLATFORM_MGR SPI in
    237 I 64350 1735553 0 0.33 0.11 0.11 0 IP ARP Retry Ager
    22 I 328562 7675661 0 0.33 0.55 0.55 0 CMI IOSd task
    254 I 205430 7129422 0 0.22 0.22 0.22 0 UDLD

    Within 15-20 seconds I could see my SSH connection going back to normal. Before that, I could barely type a command here. Great information. Thank you for sharing!

    • kevin
      August 24, 2016 at 1:36 pm

      Awesome news, glad it worked out for you!

  • October 10, 2016 at 10:26 am

    What could be the reason if my CPU utilization is high due to PunjectRx? Actually, I want to know what PunjectRx is.

    • kevin
      November 11, 2016 at 1:29 am

      Haven’t seen that one myself, unfortunately. Sorry!
