We are running UCCE 8.0.3 and CVP 8.0.1 and are facing issues with CVP call flows erroring out after executing action elements that do a DB lookup. Please note the affected modules have been running in production for around 2 years now and no changes have been made to them recently.
For background, our contact center servers had a McAfee patch (8.8.6) applied 3 weeks ago, which caused memory leaks on the servers. As a result, nearly all the contact center servers became unresponsive over time and finally crashed. The initial fix provided by the server team along with McAfee for the memory leak issue was to reboot all the servers with this McAfee patch. By then most of the servers had already crashed because of this issue, which led to multiple application outages.
A change window was created and all the contact center servers were rebooted as per Cisco’s recommendations. After that we received complaints of some calls not accepting user input. We worked on the issue and identified that a particular CVP module was causing the problems. On further analysis we found the call flow defaulting to ICM at some menu options without recognizing the inputs. We found the error 'A VoiceXML error occurred of type "error.com.cisco.media.resource.unavailable.tts"' in all affected calls. The calls would take the default/failure path to ICM after this error. Below are the troubleshooting steps we followed.
1) Stopped and started the CVP call server and VXML services. Rebooted the CVP combo machines.
2) Deactivated the McAfee AV, as this error started increasing only after the memory leak issues started. Reactivated McAfee after validating that turning it off did not fix the issue.
3) Engaged Cisco TAC. The TAC engineer went through the same troubleshooting steps, made test calls and collected the VXML application, call server and GW logs, but no error other than "error.com.cisco.media.resource.unavailable.tts" showed up in them. TAC also tried replacing the wavefiles around the failure point and reloading the VXML gateways, but with no luck.
4) We released and re-deployed the application again on the production VXML servers.
5) When this did not fix the issue we redeployed the application from the lab source code.
6) We then suspected that the Action element node that was doing the lookups before the failure point could be causing the issue. We then deleted the Action elements in the flow, rewrote and redeployed the code to production. This seemed to do the trick and the call flow started working again.
But 3 days ago we again received similar reports about another module. This module also has an action element doing a DB lookup, and in a similar way the call errors out after executing the action element. Please note that the DB connection, the data being passed to the DB and the data returned are all fine. The DB connection also gets closed properly. The failure happens at the first executable node after the action element, for example a voice element, in the call flow. We have been able to replicate the error with menu option voice elements, normal audio elements (wavefiles/TTS) and audio elements with Say It Smart functionality.
The call flow executes the action element doing the DB lookup and then, when it encounters the first voice element in the flow, errors out with "error.com.cisco.media.resource.unavailable.tts". If there is a menu with a failure path further down the flow after the error, it takes the failure path; if there is no menu, it just disconnects the call.
We tried all the above mentioned troubleshooting steps (except the McAfee one) along with some new ones below but none of them were able to fix the issue.
7) Created new Java class files and replaced the existing ones in the action element.
8) Forced the DB connection to be made directly instead of using the JDBC pool.
9) Rewrote the affected part of the code with different element names.
None of them have resolved the issue. Has anyone come across this issue in CVP with an action element performing a DB lookup? Have we hit a Cisco bug in the application, or was there memory corruption on the servers due to the McAfee memory leak issues?
Any thoughts or ideas would be very helpful.
Could it be that the gateway's 'http client response timeout' is expiring while it's waiting for the VXMLServer to return the next vxml page? To prevent the timeout, you need to set the VXMLProperty in the Settings tab of an Audio element just prior to the Database element. Usually when the gateway timer expires, you'd get the error.badfetch event - but it's worth checking.
I don't know if this is the same issue, but I had similar problems with database lookups when the connection had not been used for a while and the first connection attempt failed.
Review your JDBC configuration and confirm that it sets testOnBorrow and validationQuery.
Also use this as a reference: Tomcat JDBC Connection Pool configuration for production and development | Codingpedia.org
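To illustrate what testOnBorrow buys you, here is a minimal sketch in plain JDBC of "validate before use": borrow a connection, check it with Connection.isValid, and discard it and retry if it turns out to be stale. The class name ValidatedBorrow, the method borrow and the parameters maxAttempts and timeoutSeconds are made up for this example; this is not CVP or Tomcat code, and a properly configured pool does the equivalent for you.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper: hand-rolled equivalent of a pool's testOnBorrow check. */
public final class ValidatedBorrow {
    private ValidatedBorrow() {}

    /**
     * Borrows a connection and validates it before returning it.
     * Stale or broken connections are closed and another attempt is made.
     * maxAttempts and timeoutSeconds are illustrative values chosen by the caller.
     */
    public static Connection borrow(DataSource ds, int maxAttempts, int timeoutSeconds)
            throws SQLException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Connection conn = null;
            try {
                conn = ds.getConnection();
                // isValid asks the driver to confirm the connection is still usable.
                if (conn.isValid(timeoutSeconds)) {
                    return conn;
                }
                last = new SQLException("Connection failed validation on attempt " + attempt);
            } catch (SQLException e) {
                last = e;
            }
            // Only reached when the connection was missing or invalid.
            closeQuietly(conn);
        }
        throw last != null ? last : new SQLException("maxAttempts must be at least 1");
    }

    private static void closeQuietly(Connection conn) {
        if (conn == null) {
            return;
        }
        try {
            conn.close();
        } catch (SQLException ignored) {
            // Nothing useful can be done if closing a dead connection fails.
        }
    }
}
```

In an action element you would call something like ValidatedBorrow.borrow(ds, 2, 5) instead of ds.getConnection(), so that a connection that went stale while idle does not fail the first lookup.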
If you do not actually use TTS, I reckon you should change the input mode of every voice element from "both" to "dtmf", then reload the app.