Oracle Portal uses Oracle Internet Directory (OID) to power authentication (single-sign-on, SSO) and authorization. OID is an LDAP directory server, and even though the underlying implementation is an Oracle database, the only supported interface to access the directory information is thru LDAP. So to port some of the authentication and authorization features from our old homegrown “portal” into Oracle Portal required that we write a few LDAP packages.
It turns out that we wrote (okay, I wrote) some really poor LDAP code, which had a negative impact on our Portal. So much so that when we recently changed those packages to use a Materialized View instead of LDAP, the average late-afternoon CPU usage on our Portal database dropped from around 15% to just 1%.
Our sad story begins at the beginning: the front door of our Portal. On that front door there’s a little greeting widget. If you’re not logged in, the widget displays a “Log On” button. Once you are logged in (autheticated), it displays a greeting: “Hello, Rex Baldazo”. In order to obtain your name, I wrote my very first LDAP package (built around Oracle’s DBMS_LDAP package). It takes in your login ID, opens a connection to the LDAP server, queries with the ID to obtain all your user details, extracts first and last name, then closes the connection.
We don’t cache this front door; the theory is that if we make a change (like putting up an alert that the data warehouse is down) we want everyone’s front door to reflect it immediately. The little LDAP call happens every time you load or refresh the front door. That’s right — it’s setting up and bringing down an LDAP connection each and every time any authenticated user hits the front door.
It gets worse. There’s a Request Access page I also wrote, where users can come in and request access to Groups, which in turn we use to control access to various pages on the site. Any time somebody clicks on Request Access it opens and closes an incredible flurry of LDAP connections.
First, there’s a method that finds which groups you are already in — that’s one LDAP connection. Then it queries to find all available groups — that’s a second LDAP. The page uses those two lists to decide which groups you aren’t in, and then it has to display a list of those groups for you to choose from.
Now here’s where I really got into trouble: Even though I didn’t really understand how the LDAP data structures worked, I wrote a method (which again opened and closed an LDAP connection) that takes in each group name and spits back various attributes like description and who the administrators are for that group. In other words, once the page has determined you are not a member of say 20 groups, it has to call this detail method for each group to obtain the description attribute that it can display; thus, 20 LDAP connections could open and close just to show the groups you’re not in.
When you click to request access, it has to iterate thru the list of administrators and send them each an e-mail indicating you have requested access. Guess what? I reused the method I had written for the front door greeting, passing it each administrator’s ID in turn to pull back attributes like their name and e-mail address. So if you request access to five groups (each of which has two administrators), my approach would open and close 10 separate LDAP connections to obtain that information before sending the e-mails.
We recently hired an Oracle consultant to help us work thru the plethora of problems we’ve been having with the Portal. One of his suggestions was to replace this LDAP code with a few Materialized Views. So every morning at 2:00 A.M., the views update themselves by making just a few LDAP calls. Then I’ve rewritten my code to query those views instead of making LDAP calls.
The results were dramatic. The Request Access page went from taking five seconds to generate down to 200 milliseconds. I thought the slowness was latency from the LDAP server, and this was part of it. But there’s no question in my mind now that the way too many LDAP connections I was making was the biggest reason for the page’s slowness. I just had no idea those connections were so expensive.
Even more significant was the impact on the Portal’s database server. On an average quiet Friday afternoon, our DBA says she sees at least 15% CPU utilization on that server. Today it was down to 1 or 2 %.
If you have an LDAP newbie like me write a lot of your LDAP code, you should definitely have somebody with real LDAP experience audit the code. The potential performance benefits are tremendous.