Unable to access the APIC-EM GUI after upgrading to 1.4. Unable to access the development console either; it just hangs on login. I previously tried an upgrade and hit the same issue, and had to restore to a previous snapshot to get it back up and running. I have completed multiple reboots. (It is a 3-node cluster; each node has 32 GB of RAM and 500 GB of disk space, and disk I/O is 316 MB/s.)
All the services seem to be running:
[Mon Feb 27 11:08:49 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
$ sudo service grapevine status
grapevine is running
grapevine_capacity_manager RUNNING pid 4138, uptime 0:53:36
grapevine_capacity_manager_lxc_plugin RUNNING pid 4143, uptime 0:53:36
grapevine_cassandra RUNNING pid 3273, uptime 0:54:26
grapevine_client RUNNING pid 3268, uptime 0:54:26
grapevine_coordinator_service RUNNING pid 3282, uptime 0:54:26
grapevine_dlx_service RUNNING pid 3279, uptime 0:54:26
grapevine_log_collector RUNNING pid 3283, uptime 0:54:26
grapevine_root RUNNING pid 4154, uptime 0:53:35
grapevine_supervisor_event_listener RUNNING pid 3267, uptime 0:54:26
grapevine_ui RUNNING pid 3434, uptime 0:54:25
reverse-proxy=4.0.2.509 RUNNING pid 3272, uptime 0:54:26
router=4.0.2.509 RUNNING pid 3277, uptime 0:54:26
(grapevine)
[Mon Feb 27 11:12:46 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
$ dd if=/dev/zero of=/tmp/foo bs=1M count=512 conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.70002 s, 316 MB/s
(grapevine)
[Mon Feb 27 11:14:24 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
I suspect that not all of the services started. Did you remove the hosts from the cluster prior to the upgrade?
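A quick way to check that suspicion is to filter the status output for anything that is not in the RUNNING state. This is only a minimal sketch: `not_running` is a hypothetical helper name, and it assumes the status lines keep the `name STATE ...` layout shown in the question.

```shell
# Hypothetical helper: print "name STATE" for every status line whose
# second column is an all-uppercase state other than RUNNING.
# Reads the status text on stdin, so it also works on saved output.
not_running() {
  awk 'NF >= 2 && $2 ~ /^[A-Za-z]+$/ && $2 == toupper($2) && $2 != "RUNNING" { print $1, $2 }'
}

# Usage on a host (the status command is the one from the question):
#   sudo service grapevine status | not_running
```

No output means every listed process reports RUNNING at that level; application services such as the ones grown by the cluster may still be failing and would need to be checked separately.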
You can try running /home/grapevine/bin/reset_grapevine. Answer no to all the prompts and wait about an hour.
If this is a single-host cluster, then here is a shot in the dark:
Log in as grapevine
cd bin
./harvest_all_clients
grape config update enable_policy true
./grow_all_services
Wait
From the 1.4 release notes though:
In case a failure occurs on a multi-host cluster during any software updates (Linux files) and you have not increased the idle timeout using the GUI, then perform the following steps:
1) Log into each host and enter the following command: $ sudo cat /proc/net/xt_recent/ROGUE | awk '{print $1}'
Note: This command will list all IP addresses that have been automatically blocked by the internal firewall because requests from these IP addresses have exceeded a predetermined threshold.
2) If the command in Step 1 returns an IP address, then perform a reboot on the host where the above command has been entered (same host as the user is logged in).
Note: The hosts should be rebooted in a synchronous order and never two hosts rebooted at the same time. After the host or hosts reboot, upload the software update package file to the controller again using the GUI.
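The per-host check in steps 1 and 2 could be wrapped roughly like this. `check_rogue` is a hypothetical helper, not something shipped with APIC-EM; it only post-processes the command quoted from the release notes and prints a reminder, leaving the reboot itself to the operator.

```shell
# Hypothetical helper: read the ROGUE list on stdin and report whether
# this host has any blocked entries (first column of each line).
check_rogue() {
  blocked=$(awk '{print $1}')
  if [ -n "$blocked" ]; then
    echo "blocked entries found; reboot this host, then check the next one:"
    echo "$blocked"
  else
    echo "no blocked entries"
  fi
}

# Usage on each host, one host at a time:
#   sudo cat /proc/net/xt_recent/ROGUE | check_rogue
```

Running it on one host at a time keeps to the note above about never rebooting two hosts at once.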
The fault seems to be with the task service, which is constantly failing:
Service could not be started. Refer to the service logs for more details (service=task-service, version=4.1.2.37, client_id=9361a77b-0968-41ed-bdd6-e39b276b18f3)
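When collecting details for the service logs or a support case, the fields in that message can be pulled out with a small filter. This is a sketch only: `field` is a hypothetical helper, and it assumes the message keeps the `(key=value, key=value)` layout shown above.

```shell
# Hypothetical helper: print the value of one key=value field from a
# "Service could not be started" message read on stdin.
field() {
  sed -n "s/.*[( ]$1=\([^,)]*\).*/\1/p"
}

# Usage, with the message saved in a file of your choosing:
#   field service   < message.txt
#   field client_id < message.txt
```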
I suggest opening a case at this point. I have seen a few other upgrade problems, and it is best to track them.