r8536 - 2000 Scripted Objects

Overview

A scripted object creates 2000 additional scripted objects. Each object updates its color and rotation at random intervals. In build 8536, the scene took 108 seconds to generate the objects and CPU utilization was ~160% (on a 4-core processor) at steady state. By adding a Dictionary in SceneGraph.cs and Timer.cs, the same scene is now generated in 90 seconds and the steady state CPU utilization is now ~25%.

Files

The OAR file and example patches I refer to below can be found at http://sciencesim.com/repository/vwperf/profiling/r8536/

Details

This workload creates 2000 objects. Each object contains the following script to rotate and change its color every 1-5 seconds.

// Object Rotation Script
default
{
    state_entry()
    {
        llSetTimerEvent(llFrand(4.0)+1.0);
    }
    
    timer()
    {
        vector eul = <0.0, 0.0, llFrand(TWO_PI)>;
        llSetRot(llEuler2Rot(eul));
        float r = llFrand(1);
        float g = llFrand(1);
        float b = llFrand(1);
        llSetColor(<r,g,b>, ALL_SIDES);
        llSetTimerEvent(llFrand(4.0)+1.0);
    }
}

The initial profiling shows that as the number of objects in the scene increases, the time it takes to access any of those objects by the scripting engine (and presumably any other area of the sim) take longer and longer due to several linear searches within loops over all entities, object groups, and parts.

One example is shown here. Once there are 2000 objects in the scene, simply looking up the SceneObjectPart based on a given localID consumes almost 50% of all CPU cycles.

Click to enlarge image

By adding a dictionary into SceneGraph.cs which remembers the localID→SceneObjectPart mappings, the CPU cycles associated with running the attached scripts was reduced by ~35%.

Note: The example patch is not a complete implementation because I did not handle removing items from the dictionary when they are removed from the scene. There may be other cases I didn't think of but it worked for this example workload.

The profiler output after the dictionary modification is shown here.

Click to enlarge image

Each time the script attempts to set a timer, SetTimerEvent calls UnSetTimerEvent which performs a linear search through every time in a list, removing the timer if it is found. Then, SetTimerEvent adds the new timer to the end of the list. The above output indicates that >50% of the cpu cycles for the steady state portion of the run are spent doing this linear search. This is another area where a dictionary could be useful. The current semantics of timer.cs allows for multiple timers for each (localID,objectID) pair. To simplify my testing, I assumed that there would be only 1 timer for each unique pair of IDs.

This graph shows the timeline of the workload before and after the previously described modifications. CPU utilization with 2000 scripted objects generating updates has been reduced by 80% (at steady state).

Click to enlarge image

The below profiler output shows the hot functions after adding dictionaries to both SceneGraph.cs and Timer.cs. It still shows more cycles are spent managing the list of updates than executing LSL functions such as llFrand(double). It depends on the exact timeslice of the run examined, but SetTimerEvent still shows up as 5-15% of cycles. Most time is spent generating DateTime.Now.Ticks and adding it to the interval to compute when the timer should trigger again. A MakeTimerKey that was not string-based would also be more efficient.

A top performance issue still seems to be SceneGraph.AddToUpdateList.

Click to enlarge image

Update Mar 19. 2009

The next step I took was to add a Dictionary to SceneGraph.cs which allows for updates to be added and queried much more quickly than a linear search through all pending updates. I also completed the implementation of the Dictionaries in SceneGraph for GetGroupByPrim(uint localID) and GetGroupByPrim(UUID fullID). These dictionaries replaced linear searches through every EntityBase in 9 different places within SceneGraph.cs.

The results of this new set of optimizations is shown in the updated graph below. While the CPU utilization associated with rezzing the objects has not been reduced, the phase of the workload where the simulator is managing the objects, timers, and updates has been reduced from ~150% to ~5%.

Click to enlarge image

Of the remaining cycles, more than half is now associated with LSL functions such as llFrand and llRot.

Click to enlarge image

Additional Investigations

The original intent of this workload was to create 2000 updating objects and then scale the number of clients viewing the scene. Although the steady state performance has increased dramatically, and the FPS remains solidly at 50 FPS with a single client, it drops precipitously to ~5 FPS with 10 clients connected to the sim. Additional profiling will help to understand the cause of the slowdown with additional clients.

I will also attempt to understand the high CPU usage during the phase where the objects are being created. It's apparent that running the LSL to CS compiler on the script text gets expensive when the same script is converted and compiled 2000 times. It would be extremely nice if the compiled script could be stored based on a hash and simply retrieved each time another copy of the same object was rezzed.

 
 
 
 
opensim/profiling_r8536.txt · Last modified: 2009/03/26 00:02 by dslake
 
RSS - 2008 © ScienceSim