The objective of this three-part blog series is to summarize the configuration changes most commonly made to improve the performance and operation of a large Enterprise Manager 12c environment. A “large” environment is characterized by its number of agents, targets, and users. See the Sizing chapter of the Oracle Enterprise Manager Cloud Control Advanced Installation and Configuration Guide for more details on sizing your environment properly.
- Part 1 of this series covers recommended configuration changes for the OMS and Repository
- Part 2 will cover recommended changes for the WebLogic server
- Part 3 will cover general configuration recommendations and a few known issues
The entire series can be found in the My Oracle Support note titled Oracle Enterprise Manager 12c Configuration Best Practices [1553342.1].
OMS Recommendations
Increase Java Heap Size
Larger enterprises may need to increase the amount of memory available to the OMS. One symptom of insufficient memory is “sluggish” performance on the OMS. If you determine that the OMS needs more memory, increase the Java heap size parameters. It is very important to increase the heap incrementally and to avoid consuming all of the memory on the server. Also, Java does not always perform better with more memory.
Verify: The parameters for the Java heap size are stored in the following file:
<MW_HOME>/user_projects/domains/GCDomain/bin/startEMServer.sh
Recommendation: If you have more than 250 agents, increase the -Xmx parameter, which specifies the maximum size of the Java heap, to 2 GB. As the number of agents grows, it can be increased incrementally. Note: Do not increase it beyond 4 GB without contacting Oracle. Change only the -Xmx value in the line containing USER_MEM_ARGS="-Xms256m -Xmx1740m …options…", as seen in the example below. Do not change the -Xms or -XX:MaxPermSize values. Note: Change both lines as shown below; the second occurrence is used when running in debug mode.
Before
if [ "${SERVER_NAME}" != "EMGC_ADMINSERVER" ] ; then
  USER_MEM_ARGS="-Xms256m -Xmx1740m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled"
  if [ "${JAVA_VENDOR}" = "Sun" ] ; then
    if [ "${PRODUCTION_MODE}" = "" ] ; then
      USER_MEM_ARGS="-Xms256m -Xmx1740m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:CompileThreshold=8000 -XX:PermSize=128m"
    fi
  fi
  export USER_MEM_ARGS
fi
After
if [ "${SERVER_NAME}" != "EMGC_ADMINSERVER" ] ; then
  USER_MEM_ARGS="-Xms256m -Xmx2560m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled"
  if [ "${JAVA_VENDOR}" = "Sun" ] ; then
    if [ "${PRODUCTION_MODE}" = "" ] ; then
      USER_MEM_ARGS="-Xms256m -Xmx2560m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:CompileThreshold=8000 -XX:PermSize=128m"
    fi
  fi
  export USER_MEM_ARGS
fi
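Because the same -Xmx value appears on two lines, it is easy to change one and miss the other. The following is a minimal sketch of how the edit could be checked and prepared on a copy of the script. The helper names show_xmx and set_xmx are hypothetical, and the usage comments assume an MW_HOME environment variable standing in for <MW_HOME>; neither is part of Enterprise Manager.

```shell
# Hypothetical helper: print every -Xmx setting found in a script,
# one per line, so you can confirm how many occurrences exist.
show_xmx() {
    grep -o -e '-Xmx[0-9]*[kKmMgG]' "$1"
}

# Hypothetical helper: write a copy of the script with every -Xmx value
# replaced. Only the -Xmx token is rewritten; -Xms and MaxPermSize are
# left untouched.
#   $1 = input file, $2 = new value (for example 2560m), $3 = output file
set_xmx() {
    sed "s/-Xmx[0-9]*[kKmMgG]/-Xmx$2/g" "$1" > "$3"
}

# Example usage (assumes MW_HOME points at the middleware home):
#   f="$MW_HOME/user_projects/domains/GCDomain/bin/startEMServer.sh"
#   cp "$f" "$f.bak"                 # keep a backup before editing
#   show_xmx "$f"                    # list the current values
#   set_xmx "$f" 2560m "$f.new"      # write the edited copy
#   diff "$f" "$f.new"               # review before replacing the original
```

Writing to a separate file and reviewing the diff keeps the original script intact until you have confirmed that only the two -Xmx values changed.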
Repository Recommendations
Repvfy execute optimize
This command can be executed to establish a baseline and set the environment to the “recommended” values based on the configuration of that environment. The following command will check the existing settings and modify them if needed.
$ repvfy execute optimize
This command does several things, including the following:
1. Internal task system:
- Verify there are at least 2 short running and 2 long running worker threads
- Verify that the availability worker threads are disabled since these threads are now obsolete
2. Repository settings:
- Set the retention time for the MGMT_SYSTEM_ERROR_LOG table to 7 days (unless this setting has already been changed)
- Disable PL/SQL and metric tracing to reduce logging when not necessary
- Recompile any invalid SYSMAN objects
3. Target system:
- Disable the metric partitioned table cleanup to prevent unnecessary work when removing targets of ‘large’ systems
- Tune the PING grace period to allow the OMS to wait a longer period of time after startup before checking the heartbeat of the agents
Increase Task Workers
Task worker threads pick up tasks from the dbms_scheduler job queue based on their type. These jobs calculate metrics, roll up metrics for clusters, and provide the self-monitoring metrics for EM. Tasks are classified as short or long. Many larger systems need more than one short and one long task worker to complete the housekeeping jobs in a timely manner without creating a backlog. The recommendation is to have at least 2 short-running worker threads and 2 long-running worker threads.
Verify: To determine if you have a backlog:
$ repvfy verify repository -test 1001
If you have a backlog, execute the command below to gather more details on the performance data for the task workers.
$ repvfy dump task_health
Recommendation: If the output from repvfy dump task_health indicates a backlog, execute the following command to set the recommended number of task workers for both short-running tasks (type 0) and long-running tasks (type 1). This raises the settings to the recommended values for your environment (the command is not necessary if you already ran it in the first recommended step above).
$ repvfy execute optimize
If, after applying the recommended settings, the site has grown to such a size that there is still a task worker backlog, use this routine to increase the number of workers above 2:
$ sqlplus /nolog
SQL> connect SYSMAN;
SQL> exec gc_diag2_ext.SetWorkerCounts(<number>);
The number can be 3 or 4 (the routine will not accept values larger than 4). If you need to go higher than 4, contact Oracle Support.
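If the call is scripted, it may help to reject out-of-range values before they reach SQL*Plus. The sketch below only builds the statement text for a valid count and does not connect to the repository. The function name worker_count_sql is hypothetical, and the accepted range of 3 or 4 is taken from the text above.

```shell
# Hypothetical wrapper: validate the requested worker count and print the
# statement to run as SYSMAN. Nothing is executed against the database.
worker_count_sql() {
    case "$1" in
        3|4)
            printf 'exec gc_diag2_ext.SetWorkerCounts(%s);\n' "$1"
            ;;
        *)
            echo "worker count must be 3 or 4; contact Oracle Support to go higher" >&2
            return 1
            ;;
    esac
}

# Example usage:
#   worker_count_sql 3    # prints the statement to paste into SQL*Plus
```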
Disable Metric Cleanup on Target Delete
When a target is deleted, all of its metric history data must be purged. In larger systems, or in systems where targets are deleted frequently, this can take a while and cause performance problems. Disabling immediate metric deletion defers the target data cleanup in partitioned tables until partition maintenance runs as scheduled. This reduces the performance impact of deleting many targets.
Verify: To determine whether metric deletion is enabled:
$ repvfy verify repository -test 6030
Recommendation: Execute the following statement to disable immediate metric deletion (this command is not necessary if you already ran it on your environment):
$ repvfy execute optimize
Alternatively, you can execute the following statement:
SQL> exec mgmt_admin.disable_metric_deletion;
Increase Ping Grace Period
Upon system startup, the OMS must ping each agent to get a current heartbeat and update the availability state of every agent. In systems with hundreds or thousands of agents, this can take a long time. Increasing the grace period before the ping/heartbeat system kicks in and contacts the agents allows more time for the agents to start uploading first.
Recommendation: Execute the following statement. This command will evaluate the system and set the appropriate value for the Ping Grace Period to give the majority of the agents a chance to begin their upload upon system startup (this command is not necessary if you already ran it on your environment).
$ repvfy execute optimize
If you still see a high number of pending agents for a prolonged period after an OMS restart, this value may need to be set higher. Execute the following command and contact Oracle Support, providing the output from the dump ping_health command.
$ repvfy dump ping_health
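Since the output has to be passed on to Oracle Support, one option is to save it to a file as it is generated. The following is a minimal sketch of a generic capture helper. The function name capture_output and the OUT_DIR variable are hypothetical and not part of repvfy.

```shell
# Hypothetical helper: run a command, save stdout and stderr to a
# timestamped log file, print the log file name, and return the
# command's exit status.
#   $1 = label used in the file name, remaining arguments = command to run
capture_output() {
    label=$1
    shift
    out_file="${OUT_DIR:-.}/${label}_$(date +%Y%m%d_%H%M%S).log"
    "$@" > "$out_file" 2>&1
    status=$?
    echo "$out_file"
    return "$status"
}

# Example usage (assumes repvfy is on the PATH):
#   capture_output ping_health repvfy dump ping_health
```

The same helper could be used for the other dump and verify commands in this post, for example repvfy dump task_health.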