Problems with Dynamic Remastering (DRM) on Linux Itanium


One of my customers is having severe RAC performance issues, which have occurred three times so far. Each time, the performance impact lasted around 10 minutes and effectively hung the application. ASH investigation revealed that the time frame of the performance issues exactly matches a DRM operation on the biggest segment of the database. During the problematic period, there are more than 50 active sessions instead of the usual 2-3, and they are mostly waiting on gc related events: "gc buffer busy", "gc cr block busy", "gc cr block 2-way", "gc current block 2-way", "gc current request", "gc current grant busy", etc.
In addition, a single session on instance 1 is waiting on "kjbdrmcvtq lmon drm quiesce: ping completion", and 1-3 sessions on instance 2 are waiting on "gc remaster".
Does anybody have any experience with DRM problems on Linux Itanium?
I know that it is possible to deactivate DRM, but usually it should be beneficial to have it enabled. I could not find any reports of performance impact during DRM operations on Metalink. Support is involved but clueless.
Oracle Support has requested stack traces of the lms processes during the period of performance degradation. We decided to enable OSWatcher to get system-wide Linux data and procwatcher to get lms process stack traces. We created a Grid Control User Defined Metric (UDM) to check whether the symptoms of a DRM performance problem are present. We then triggered the lms stack traces with a Grid Control Response Action script attached to the UDM.
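The symptom check behind such a metric can be sketched roughly as follows. This is not the actual UDM (which would normally be a SQL query against the session views); it is a minimal Python illustration of the decision logic only. The function name `drm_symptoms`, the input format, and the threshold of 20 sessions are assumptions made for the example; only the wait event names come from the observations above.

```python
# Wait events seen only during the problem periods described in this post.
DRM_MARKER_EVENTS = {
    "gc remaster",
    "kjbdrmcvtq lmon drm quiesce: ping completion",
}


def drm_symptoms(samples, gc_session_threshold=20):
    """Decide whether a snapshot of active sessions looks like the DRM problem.

    samples: iterable of (session_id, wait_event) pairs for active sessions.
    gc_session_threshold: hypothetical cut-off; normal load here is 2-3
    active sessions, the problem periods showed more than 50.
    """
    # Keep one event per session in case a session appears twice.
    event_by_session = {sid: event for sid, event in samples}

    # Sessions waiting on any global cache ("gc ...") event.
    gc_waiters = sum(
        1 for event in event_by_session.values() if event.startswith("gc ")
    )
    # Sessions waiting on an event that points directly at remastering.
    drm_markers = sum(
        1 for event in event_by_session.values() if event in DRM_MARKER_EVENTS
    )

    return {
        "gc_waiters": gc_waiters,
        "drm_markers": drm_markers,
        # Alert only when both conditions hold, to avoid firing on
        # ordinary short gc contention.
        "alert": gc_waiters >= gc_session_threshold and drm_markers >= 1,
    }
```

Requiring both a high number of gc waiters and at least one remastering-specific wait keeps the response action (the stack trace collection) from firing during ordinary interconnect contention.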
Oracle Support has also requested global hanganalyze and system state dumps, but we decided not to collect system state dumps because of the large additional performance impact.
The OSWatcher data showed that during the DRM period the lms processes had very high CPU utilization.
In the meantime, Oracle Support has confirmed that we are hitting bug 6960699. We have received patch 8516675, which includes the bug fix, and have installed it. Now we are waiting to see whether this indeed fixes the issue.