In my first article, I reviewed how easy it is to view complex objects returned by Web services. In the follow-up article, I explained a few of the pitfalls to watch out for when using ColdFusion MX to communicate with BEA WebLogic Web services. You may think that once you can easily see your data and are sure the code works, you're done, but unfortunately, you're not.
Sometimes even finishing a project to spec and on time is not enough to satisfy managers in an enterprise environment. The biggest complaint you will get is response time, or the lack thereof. On the Internet, time is of the essence. Take a few seconds too long to load your site, and the customer is gone. So an Internet site calling data from a Web service must do it correctly, dependably, and quickly. As the deadline approached for my project, I had met two of those three requirements. Despite our reasoning that "it takes a while to run through a few hundred million records," the project management team didn't see it that way.
The rest of the story
If you missed my previous articles, catch up on your reading:
- "An easy way to view complex Web service objects"
- "Avoid CFMX/BEA WebLogic pitfalls in your Web service"
After receiving our directive to bring a 1.5- to 2-minute response time to under 30 seconds, the WebLogic developers, system administrators, and Web programmers got together to see how we could achieve our new goal. We needed to find out where the bottleneck was, what was causing it, and how to get rid of it.
Time the actions: Execution time measurements
Looking at the process, there were three areas where the delays could be occurring: the BEA environment, the network, or the ColdFusion servers.
The WebLogic programmer found two points of speed degradation, both in the database. A couple of the columns we sorted on had no indices applied to them, which caused a noticeable slowdown when scanning tens of millions of records. The second problem area was the data the CFMX programmers were using for aggregation; if that data wasn't sent, some of the execution times could be cut in half.
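The fix for the first problem is straightforward once the unindexed sort columns are identified. As a minimal sketch (the table and column names here are invented for illustration, since the article doesn't name the actual schema), the DBA adds indices on the columns the queries sort on:

```sql
-- Hypothetical example: the report queries sorted on account_date and
-- region, but neither column was indexed, forcing full scans and sorts
-- across tens of millions of rows. Indexing the sort columns lets the
-- database read rows in order instead.
CREATE INDEX idx_txn_account_date ON transactions (account_date);
CREATE INDEX idx_txn_region ON transactions (region);
```

Whether an index actually helps depends on the query plan, so a step like this should always be verified against the database's own explain-plan output.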
The system administrators also found two items: network data transport and a configuration difference between systems. One quarter of the Web call execution time was spent transporting data from one system (the WebLogic servers) to the other (the ColdFusion servers); our calls peaked at five megabytes of data. Upgrading the wire to Fibre Channel was an option, but it wasn't feasible given the cost and time constraints. They also noticed that the production servers were slower than the staging servers, even though the two environments were supposed to have exactly the same setup and configuration.
The CF programmers added various <cfoutput> tags to display timestamps throughout the pages. This enabled everyone to see how long things were taking and where the improvements could or could not be made.
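A minimal sketch of that instrumentation looks like the following (the variable and method names are placeholders, not the actual project's code). CFML's getTickCount() returns elapsed milliseconds, so bracketing the Web service call with two readings isolates its cost from the rest of the page:

```cfml
<!--- Record the time immediately before the Web service call. --->
<cfset startTime = getTickCount()>

<!--- Hypothetical Web service invocation; wsURL, getRecords, and the
      argument are stand-ins for the real service. --->
<cfinvoke webservice="#wsURL#" method="getRecords" returnvariable="result">
    <cfinvokeargument name="accountID" value="#accountID#">
</cfinvoke>

<!--- Record the time immediately after, and display the difference. --->
<cfset endTime = getTickCount()>
<cfoutput><p>Web service call took #endTime - startTime# ms</p></cfoutput>
```

Sprinkling checkpoints like this between each major block of a page quickly shows whether the time is going to the service call, the data processing, or the output rendering.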
Tame the beast: Revisit specs and system setups
The first task was to revisit the spec and design. The most significant time savings would be made by removing the columns used for aggregating data. Since it would cut our load times by at least half, the IT folks saw it as a necessary cut. Luckily, we were able to convince upper management that the time savings was worth the redesign excluding the aggregate data output.
Next, we took on the speed difference between the production and staging environments. The system administrators immediately began researching possible causes (e.g., the firewall, or the data being routed differently through the network), but they still couldn't find the difference between the two.
With the deadline rapidly approaching, everyone was at a loss. We thought the two environments were exact duplicates, so we decided to go through the system-level settings and look for any discrepancies. During this process, the system administrator found the culprit: the memory settings on the Java and JVM page in the ColdFusion Administrator didn't match between the two environments (see Figure A). Once that discrepancy was fixed, both environments performed the same.
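In ColdFusion MX, the Administrator's Java and JVM page writes its values into the server's jvm.config file, which makes this kind of drift easy to diff directly on disk. The excerpt below is illustrative only; the heap values are assumptions, not the project's actual settings:

```
# Excerpt from cf_root/runtime/bin/jvm.config (ColdFusion MX).
# The -Xms/-Xmx heap arguments correspond to the memory fields in the
# Administrator's Java and JVM page. The sizes shown are examples; the
# point is that staging and production must carry identical values.
java.args=-server -Xms512m -Xmx512m -DJINTEGRA_NATIVE_MODE
```

Comparing this file across environments (or putting it under version control) catches mismatches that a visual check of the Administrator screens can miss.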
Figure A: ColdFusion Administrator memory settings
Trim the fat: Cut down queries and unnecessary code
With the hardware systems fully optimized, the WebLogic programmers began to implement the query changes. The improvements were immediate and dramatic: load times hovered around the forty-second mark for our largest dataset calls, and after reviewing and refining the queries, we had narrowed six different data calls down to three.
With the WebLogic team and system administrators done, only the CFMX code itself remained. The page calling the service was taking seven seconds on its own to run, even after CFMX had compiled and optimized the Web service call. Upon closer examination, we realized that a lot of the logic was written using CFML tags. We converted that logic to CFScript instead, which immediately brought the run time down to under one second.
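The conversion looks roughly like this sketch (the variable names and the aggregation logic are placeholders, not the project's actual code). In CFMX, heavy looping logic written in tags carries per-tag processing overhead that the same logic in a CFScript block avoids:

```cfml
<!--- Before: loop and conditional logic written with CFML tags. --->
<cfset total = 0>
<cfloop index="i" from="1" to="#arrayLen(records)#">
    <cfif records[i].amount GT 0>
        <cfset total = total + records[i].amount>
    </cfif>
</cfloop>

<!--- After: the same logic in CFScript. CFMX-era CFScript uses the
      GT/LTE operator keywords rather than > and <=. --->
<cfscript>
total = 0;
for (i = 1; i LTE arrayLen(records); i = i + 1) {
    if (records[i].amount GT 0)
        total = total + records[i].amount;
}
</cfscript>
```

The behavior is identical in both versions; only the execution path through the CFML engine changes, which is why the gain shows up purely as reduced page run time.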
As you can see, getting a Web service up and running correctly is still only half the solution. If the execution time is unbearably slow, users simply won't use the tool, and management may not let you roll it out at all. A slow tool also risks souring users on every future tool you build because of one bad experience.
Optimization is a very important aspect of the development process. Sometimes, as in my case, it's the deciding factor in a go/no-go implementation decision. Luckily, management was open to design changes that helped us get the tool out; IT departments aren't usually that lucky.
A week away from the go-live date, we had taken two disparate systems, connected them via Web services, and made them run in an acceptable execution timeframe. Or so we thought. Much like everything on the Internet and in managerial thinking, we were thrown this: "If you can take it from two minutes to thirty seconds, you can easily take it from thirty seconds down to zero. Correct?" We did, but I'll explain that process in a future article on caching Web services datasets.