dmacminn
10-12-07, 05:01 PM
Some staff admits to problems, some don't. Some don't have a clue! Read some of my rants about email support (ones that weren't edited, moved or deleted) and realize that I know why you are upset.
[/B]
ALL Staff admit problems - every single one of our agents in this company has been informed that there are issues with the Powweb MySQL services at present, and they have been told to acknowledge the issue.
The difficulty is that some agents, while completely comfortable with the system management/support tools, are not correctly isolating what the actual issue is ...
They are not expected to be able to analyze code or applications and determine where the issue is arising -- we simply do NOT support coding issues or applications, at all.
SO, someone contacts us, and the first thing out of their mouth is "My website is slow" and consequently the ticket gets opened with the subject - "Slow website" --
NONSENSE, the webservers are running under 5% utilization, the CGI gives occasional issues (..overly frequent, 500 - Internal Server errors due to MySQL stuff), but the ROOT issue is that someone is using ... Joomla, phpBB, phpNuke, SMF, OsCommerce, ____your app here___, which is directly tied to the performance of the MySQL system.
"Houston, this is not a slow website" -- it is slow/stalling MySQL, and that's what should have been identified/communicated right up front ---
I mean, if it's just slow getting to your website, then you should:
1) clear your browser cache
2) run a trace route and check for routing problems
and, if you've made recent changes, then you should:
3) recheck your code and optimize your databases, etc.
About training on MySQL Issues:
Of course, you offer then, that we should better train our T1 agents to spot the issue ...
well, we do .. and we retrain ... and workshop ... and refresh... and revisit...
But, the reality is that MySQL is a small portion of the support load ... as I've noted before, if only 8-12% of Powweb customers ever contact us about the issue, and we have multiple centers with several hundred support staff, and Powweb total calls account for, say, 8%-10% of the calls, that means that generally less than 1 in 100 calls relates to a Powweb MySQL issue -- so assume, on a busy day, we take 15 calls about slow MySQL sites -- how many days, on average, does a Tier1 support agent go before he gets a Powweb MySQL call? 3 to 4 days? 5 to 6 days? 1 a week, 1 in a week and a half ...
So .. let's finish the math ... maybe 1 or 2 calls in, say, 360 calls handled, or at most about 0.6% of calls related to the issue (yes... that's less than 1% of calls a T1 agent sees).
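To make that arithmetic concrete, here is a minimal sketch in Python. It only re-computes the rough figures quoted above (1 or 2 MySQL calls out of about 360 handled); the function name is made up for illustration, and the inputs are the post's own estimates, not measured data.

```python
def mysql_call_share(mysql_calls, calls_handled):
    """Return the share of handled calls that were MySQL calls, in percent."""
    return mysql_calls / calls_handled * 100


# Assumption: 360 calls handled and "1 or 2" MySQL calls are the
# rough estimates from the post, not figures from a call log.
for mysql_calls in (1, 2):
    share = mysql_call_share(mysql_calls, 360)
    print(f"{mysql_calls} in 360 calls = {share:.2f}% of calls")
```

Either way the share stays well under 1%, which is the point being made: a given T1 agent rarely sees one of these calls.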
For comparison, they would handle email calls about 40% of the time ...
So, EVERY agent T1,T2,T3 KNOWS about the MySQL problem ....
but not every T1 will catch that the issue IS a MySQL problem for the call...
Now, let's look at handling a MySQL call:
What needs to be done?
1) Analyze the site quickly and determine if the "website" that isn't coming up is actually a MySQL application -- if it is a MySQL issue, check to see if there are other reports within the last 15 minutes (trend analysis) and, if possible, rule out database corruption (which can also cause a MySQL site to respond slowly)
2) Capture the report for trending analysis based on the CGI Pool and affected MySQL server
3) Check with Network Operations if they have a recent/current high connection/high load condition, or have shut down sites or issued a service restart in the last 10-15 minutes
4) Recommend a course of action for the specific account --- generally, no action at all, since the issue is systemic and is not related to the account.
5) Resolve the report once it is trended in and notify the customer.
That's the reality of the support process...
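As a rough illustration only, the five steps above could be sketched like this in Python. Everything here is hypothetical: the `Report` fields, the function names and the returned wording are invented for the example and do not describe the actual Powweb support tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    # Hypothetical fields; the real ticket system is not described in the post.
    received: datetime
    cgi_pool: str
    mysql_server: str
    uses_mysql_app: bool


def recent_matches(report, earlier_reports, window=timedelta(minutes=15)):
    """Count earlier reports on the same MySQL server inside the time window."""
    return sum(
        1
        for r in earlier_reports
        if r.mysql_server == report.mysql_server
        and timedelta(0) <= report.received - r.received <= window
    )


def triage(report, earlier_reports, ops_reports_load):
    """Return a suggested next step for a 'slow website' report."""
    # Step 1: is the site actually a MySQL application?
    if not report.uses_mysql_app:
        return "not a MySQL issue: handle as a general site problem"
    # Steps 1-3: look for a trend, and ask Network Operations about load.
    if recent_matches(report, earlier_reports) > 0 or ops_reports_load:
        # Steps 4-5: systemic problem, so usually no action on the account.
        return "systemic: log for trending, no account action"
    # No trend and no known load: check the account's own databases.
    return "isolated: rule out database corruption"
```

The design choice worth noting is that the account itself is only examined when there is no trend and no known load condition, which matches step 4: most of the time the issue is systemic.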
On the communication front ... we've almost cried "Wolf" several times, when we honestly thought that more of the issue would be corrected by upgrading both the software and the hardware in the system, from stem to stern --- as you know, the CGI upgrade made it worse! It drove even more connections/queries to the MySQL system ... and the MySQL41 servers sometimes ended up having 40 connections open from a single user .... which brought things to a crawl very quickly.
So, like you (the customer), we are ALSO COMMITTED to resolving the MySQL Performance issues on a PERMANENT basis (or as nearly so as can be done in a rapidly growing/changing webhosting environment).
For that reason, we are also committed to not "announcing" change until it is sandboxed, evaluated, live tested and sometimes even implemented on a server in a "blind" comparison [even the support team are not told which system has been upgraded, to avoid implementing things that people only perceive (placebo effect) to have helped].
When any significant proven change is implemented across the platform, we communicate it.
We do not announce every setting change or tuning change that the project team try and test...
And, since the "complete" solution is not yet in view -- there will be no sweeping announcement made until we are fairly certain that we can deliver a noticeable, significant improvement.
Until that solution (or set of them) is implemented, we will continue to improve monitoring, reduce outages, aggressively shut down sites that cause issues and work forward on incremental changes that present themselves as likely to improve things.
Again, that is NOT what you want to hear ... (nor, I can assure you, is it what I want to hear as a support team member), but it is the reality ... and, I think we can all agree, that knowing where things are, without "pie-in-the-sky" promises, allows you to evaluate whether Powweb meets the requirements you need to host your website.