- We have been using PnP for a while now. We have had some issues, but we have been figuring them out one by one and giving feedback when possible. Recently, however, I have been running into an issue where my deployments reach the controller, the controller updates the rule status to "Getting Device Info", and nothing happens after that point. After 16 minutes the deployment fails. I mentioned this issue before in a long thread and wanted to move it to a new one. Old thread for reference: {PNP}Streamlining Zero-Touch
- We are using option 43 for deployment:
- "5A1D;B2;K5;I10.255.72.116;J443"
- We were recently experimenting with the PnP startup VLAN and ran into issues because our 6Ks are still running 15.1 for now. I am also seeing this issue in my lab with a 6K running 15.4.
- Output of sh pnp trace: (Excluded)
What happens if you change the option 43 to: 5A1N;B2;K4;I10.255.72.116;J80
First contact is over HTTP; then it switches to HTTPS.
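To make the difference between the two option 43 strings explicit, here is a minimal sketch of a parser. It assumes the field meanings from Cisco's PnP DHCP option 43 documentation (a "5A1" header followed by D for debug on or N for debug off; B1/B2/B3 for hostname/IPv4/IPv6; K4/K5 for HTTP/HTTPS; I for the controller address; J for the port). The function name and the output keys are hypothetical, and only this subset of codes is interpreted.

```python
from typing import Dict, List

# Subset of Cisco PnP option 43 codes (assumed from Cisco's documentation).
ADDRESS_TYPES = {"B1": "hostname", "B2": "IPv4", "B3": "IPv6"}
TRANSPORTS = {"K4": "HTTP", "K5": "HTTPS"}


def parse_option43(value: str) -> Dict[str, str]:
    """Parse a PnP option 43 string such as '5A1N;B2;K4;I10.0.0.1;J80'."""
    fields: List[str] = [f for f in value.split(";") if f]
    if not fields:
        raise ValueError("empty option 43 value")
    header = fields[0]
    # '5' = DHCP type code, 'A' = active operation, '1' = template version,
    # last character: 'D' = debug on, 'N' = debug off.
    if len(header) != 4 or not header.startswith("5A1") or header[3] not in ("D", "N"):
        raise ValueError(f"unexpected header: {header!r}")
    parsed = {"debug": "on" if header[3] == "D" else "off"}
    for field in fields[1:]:
        if field in ADDRESS_TYPES:
            parsed["address_type"] = ADDRESS_TYPES[field]
        elif field in TRANSPORTS:
            parsed["transport"] = TRANSPORTS[field]
        elif field.startswith("I"):
            parsed["server"] = field[1:]  # controller IP address or hostname
        elif field.startswith("J"):
            parsed["port"] = field[1:]  # controller port
        else:
            # Codes this sketch does not interpret are kept raw.
            parsed["raw_" + field[0]] = field[1:]
    return parsed


current = parse_option43("5A1D;B2;K5;I10.255.72.116;J443")
suggested = parse_option43("5A1N;B2;K4;I10.255.72.116;J80")
for key in sorted(set(current) | set(suggested)):
    if current.get(key) != suggested.get(key):
        print(f"{key}: {current.get(key)} -> {suggested.get(key)}")
```

Running it prints three differences (debug on -> off, port 443 -> 80, transport HTTPS -> HTTP); the controller address and address type are unchanged.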
This also does not work; I see the same symptoms in the GUI. The only recent change to the controller that I can think of is that we were finally able to add a cert for accessing the controller GUI, after a lengthy TAC case to resolve the issues we saw when trying to add it. Here is the trace when using port 80:
Switch#sh pnp trace <Output Excluded>
This looks very suspicious:
[03/13/17 17:15:32.908 UTC 16 398] pnpa_ntp_sync : Unable to configure NTP Server IP[10.255.72.116]
[03/13/17 17:15:32.935 UTC 2A 398] pnpa_disc_trustpool_install: NTP sync unsuccessful
[03/13/17 17:15:32.936 UTC 2B 398] 10.stdby Disabled;
Can you verify that the NTP server is OK and that its IP address is correct? PKI is impossible without correct NTP, because certificate validity is checked against the device clock.
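To illustrate why NTP matters here: certificate validation rejects any certificate whose notBefore/notAfter window does not contain the device's current time, so a switch with an unsynchronized clock can fail the trustpool install even when the certificate itself is fine. This is a minimal sketch of just that one check. The validity dates and the unsynchronized clock value are hypothetical; the synced time is taken from the trace above.

```python
from datetime import datetime, timezone


def within_validity(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """Return True if 'now' falls inside the certificate's validity window."""
    return not_before <= now <= not_after


# Hypothetical certificate validity window.
NOT_BEFORE = datetime(2017, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2019, 1, 1, tzinfo=timezone.utc)

# Clock synced via NTP (time taken from the trace timestamps).
synced = datetime(2017, 3, 13, 17, 15, 32, tzinfo=timezone.utc)
# Hypothetical unsynchronized clock that is years behind.
unsynced = datetime(2010, 1, 1, tzinfo=timezone.utc)

print(within_validity(synced, NOT_BEFORE, NOT_AFTER))    # True
print(within_validity(unsynced, NOT_BEFORE, NOT_AFTER))  # False
```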
Is this regarding the NTP server that the deployment switch receives from DHCP, or the NTP server that the controller is using?
I looked at my DHCP scope options, and the deployment switches should be receiving the correct NTP server, which is 10.255.72.50.
What certificate has been installed on APIC-EM? If that certificate's issuing CA is not part of the trust bundle, I do not think port 443 will work for the initial connection. Port 80 should work, because we just install the server certificate (self-signed or not) on the device and then switch to HTTPS.
Is the NTP server reachable from the device?
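The trust-bundle point can be illustrated with a name-only chain walk. This is a sketch under stated assumptions: `Cert`, `chains_to_trusted_root`, and every CA name are hypothetical, and real validation also verifies signatures, validity dates, and extensions, none of which is modeled. It shows why an HTTPS first contact fails when the controller certificate's root CA is missing from the device's bundle, whereas an HTTP first contact avoids that check until the server certificate has been installed.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Cert:
    """Only the names needed to follow issuer links; no keys or signatures."""
    subject: str
    issuer: str


def chains_to_trusted_root(
    leaf: Cert, intermediates: Dict[str, Cert], trust_bundle: Set[str], max_depth: int = 5
) -> bool:
    """Follow issuer names upward; succeed once an issuer is in the trust bundle."""
    current = leaf
    for _ in range(max_depth):
        if current.issuer in trust_bundle:
            return True
        parent = intermediates.get(current.issuer)
        if parent is None:  # chain breaks before reaching a trusted CA
            return False
        current = parent
    return False


# Hypothetical three-tier chain: root CA -> issuing CA -> controller certificate.
LEAF = Cert(subject="apic-em.example.com", issuer="Example Issuing CA")
INTERMEDIATES = {"Example Issuing CA": Cert("Example Issuing CA", "Example Root CA")}

print(chains_to_trusted_root(LEAF, INTERMEDIATES, {"Example Root CA"}))       # True
print(chains_to_trusted_root(LEAF, INTERMEDIATES, {"Other Public Root CA"}))  # False
```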
- The cert was installed in the "Certificate" section under "Network Settings", and the trust pool was updated through the GUI after installing the cert. We did have plenty of issues getting the cert installed and had a TAC case open for some time to get the controller to accept it.
- I tried again with option 43 set to K4 and port 80, and I see the same issue.
- Yes, the device can ping the NTP server.
Can you try the unplanned workflow? From the log, it seems the device contacted the PnP server.
The device does not show up as unplanned when I use the unplanned workflow. Trace below:
Switch#sh pnp trace
<Output Excluded>
[03/20/17 12:54:32.003 UTC 6E 284] send_timeout_response: Timeout status response sent for client 'PID:WS-C3650-24PS,VID:V03,SN:FDO2007Q084' : OPRESPSTATUS:OP1140:PID:WS-C3650-24PS,VID:V03,SN:FDO2007Q084OP123:1.0OP13OPTRANSS19:TRANSACTION TIMEOUTOPSTATUS33:PROXY CLIENT RETRY COUNT EXCEEDEDOPERRMSG75:Proxy client retry count exceeded. Proxy client process is no longer active
What is your switch code version? It seems you only waited about two and a half minutes before getting into the console.
Next time, can you wait a little longer, say at least 6 minutes?
Also, what is your APIC-EM version? Can you send me the server certificate installed on APIC-EM?
I actually let that switch sit for a couple of hours. The trace doesn't really show that, but it sat from 8:00 EST this morning until around 10:30 EST.
We are using 3.7.3, which has worked in the past, with several successful production deployments.
APIC-EM is on 1.4.0.1959.
I will dig out the cert that we used and PM it to you. We had a TAC case open when we were having issues installing the cert. Here is the case number if you want to reference it and get a better picture of what was going on: SR 681437834
According to the release notes for PnP 1.4:
- 3650: 3.6.5E, 3.7.4E, 16.1.3, 16.2.2, 16.3.1
- 3650-24PDM: 16.2.2
- 3650-48FQM: 16.2.2
- I am going to try one of those versions; however, I believe the C881 I was testing had a compatible version, but I'll have to double-check. What was the reason for raising the supported versions? I ask because our environment is currently all on 3.7.3, possibly moving to 3.6.6, and I would like to know whether this will be a common occurrence in the future.
From the log, it seems that after the cert is installed on the deployment switch, the switch never communicates back with the controller.
What is the device?
FDO2038Z03P
Did you configure this device in a project (pre-provision workflow)? If so, what do you see on the APIC-EM side?
I configured this device rule in a project, and this is all I see in the GUI:
2017-03-21 13:43:03 (Eastern Daylight Time) | Failed health check since device is stuck in non-terminal state FILESYSTEM_INFO_REQUESTED for more than threshold time: 0 hours, 16 minutes, 0 seconds |
2017-03-21 13:26:24 (Eastern Daylight Time) | Device first contact |
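The two timestamps are consistent with the 16-minute threshold. A quick check of the arithmetic, using only the times from the log above (both in Eastern Daylight Time), also bounds when the device entered FILESYSTEM_INFO_REQUESTED: since it sat there for more than 16 minutes, it must have entered that state within about 39 seconds of first contact, i.e. it stalled almost immediately. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Timestamps copied from the APIC-EM rule history (Eastern Daylight Time).
FIRST_CONTACT = datetime(2017, 3, 21, 13, 26, 24)
HEALTH_CHECK_FAILED = datetime(2017, 3, 21, 13, 43, 3)
THRESHOLD = timedelta(minutes=16)

elapsed = HEALTH_CHECK_FAILED - FIRST_CONTACT
print(elapsed)              # 0:16:39
print(elapsed > THRESHOLD)  # True

# The device could not enter FILESYSTEM_INFO_REQUESTED before first contact and
# stayed there for more than THRESHOLD, so it entered the state no later than:
print(HEALTH_CHECK_FAILED - THRESHOLD)  # 2017-03-21 13:27:03
```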
My rule is set up as follows (screenshot: 2017-03-21_1446).
The EULA is accepted; it's just cut out of the screenshot.
Is it a 2-member stack, Zak? Also, can you uncheck "device certificate" and test again?
Yes, this example is a 2-member stack; however, I have tried this with a single-member stack and had the same result.
With that box unchecked, I get the same result. In the logs, the health check fails at the same place, right after the cert is pushed to the switch.
Can you send the latest pnp-service.log from the server and the "show pnp tech" output from the switch from when you ran the test?
We are still exploring this issue with Peng and in a TAC case. Our theory is that it is an issue with how the three-tier cert chain we installed on our controller is validated on the switch during the PnP process.
The latest analysis suggests it is related to the name constraints field in the certificate. I will keep you posted.
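The thread does not say which constraint tripped validation. For orientation only, here is a minimal sketch of how RFC 5280 evaluates permitted subtrees for DNS and IP subject alternative names. The certificate names, subnets, and helper functions are hypothetical; excluded subtrees and other name forms are left out, and the switch's own validator may apply different or stricter rules.

```python
import ipaddress
from typing import Iterable


def dns_within(name: str, subtree: str) -> bool:
    """RFC 5280 DNS rule: a name matches if it equals the subtree or adds
    labels to its left, so 'host.example.com' is within 'example.com'."""
    name = name.lower().rstrip(".")
    subtree = subtree.lower().rstrip(".")
    return name == subtree or name.endswith("." + subtree)


def ip_within(address: str, subtree: str) -> bool:
    """iPAddress constraints are an address plus mask, i.e. a network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subtree)


def satisfies_permitted(
    dns_names: Iterable[str],
    ip_addresses: Iterable[str],
    permitted_dns: Iterable[str],
    permitted_ip: Iterable[str],
) -> bool:
    """Check SAN entries against permitted subtrees only. A name type with
    no permitted subtree of that type is left unconstrained."""
    permitted_dns = list(permitted_dns)
    permitted_ip = list(permitted_ip)
    if permitted_dns and not all(
        any(dns_within(n, s) for s in permitted_dns) for n in dns_names
    ):
        return False
    if permitted_ip and not all(
        any(ip_within(a, s) for s in permitted_ip) for a in ip_addresses
    ):
        return False
    return True


# Hypothetical controller certificate SANs.
SAN_DNS = ["apic-em.example.com"]
SAN_IP = ["10.255.72.116"]

# Intermediate permits only a DNS subtree: the IP SAN is unconstrained.
print(satisfies_permitted(SAN_DNS, SAN_IP, ["example.com"], []))  # True
# Intermediate also permits an IP subtree that excludes the controller.
print(satisfies_permitted(SAN_DNS, SAN_IP, ["example.com"], ["192.168.0.0/16"]))  # False
```

A real validator also applies excluded subtrees and other name forms, so a chain that passes this sketch can still be rejected on the device.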