r8854 - 40,000 Scripted Objects

Overview

In my previous post (profiling_r8536), I used a workload which rezzed 2,000 objects as quickly as possible. Each object a script and timer to rotate and change color every 1-5 seconds. After those optimizations to r8536, it took about 1.5 minutes to rez 2,000 objects with an 80% reduction in stead state CPU utilization.

Today's update describes the changes which enable the same scene to rez 10,000 scripted objects in the same 1.5 minutes (5X rez rate) and rez up to 40,000 objects in 8 minutes.

I also describe a change to script state persistence that may enable up to 10,000 objects to be rezzed per minute, and a much higher scene maximum with respect to state. I have tested that idea out and was able to rez 50,000 scripted objects in under 5 minutes. The sim was between 25-30 FPS and only 3 of 4 processor cores were utilized.

There are still a number of performance limitations with managing lists of updates, timers, and scene object inventories.

OAR File

The region archive for these experiments can be downloaded from here:

http://sciencesim.com/repository/vwperf/profiling/r8854/create_40000.oar.tgz

To start the experiment, login a client and shout “start”.

Patches

The performance improvements shown below are based on unsubmitted patches available at:

http://sciencesim.com/repository/vwperf/profiling/r8854/

and also patches submitted for the following Mantis entries:

Mantis 3311: UnSetTimerEvent() uses linear search which limits timer performance

Mantis 3312: Linear searches in SceneGraph create a performance problem

Mantis 3321: Race condition in XEngine DoOnRezScript causes scripts to fail compilation with exception

mantis 3325: Compiler function DeleteOldFiles() very slow when there are lots of state files in script engine directory

Mantis 3344: Debug output in DoOnRezScript causes slight performance problem

For these tests, I set physics to basicphysics in opensim.ini because there are no moving objects and I am focusing on scripted object creation, not physics. This didn't actually seem to make much difference in the results presented here.

My goal was to reduce the length of time it was taking to rez objects.

Redundant Script Conversion and Compilation

In opensim.ini, I set ScriptDelayTime = 0.0 for XEngine. This actually made the workload take 20% longer to rez the objects and the CPU was at 100% on all cores the entire time objects were being rezzed. When I profiled the section of the workload where the objects are being added to the scene, it revealed that compiling all of those LSL scripts for each and every rezzed object was > 40% of the total CPU time.

Click to enlarge image

Inside PerformScriptCompile() in Compiler.cs, the LSL script is first converted to CS and then compiled to generate a line mapping from LSL to CS. Then, if the compiled script already exists, it is simply returned an all of the conversion and compilation was for nothing except the line map. There are two solutions to this that I considered. The first and correct way would be to cache/remember the line map for each script asset after it is compiled, alongside the compiler output. The second, and easier route was to only compile each script one time and ignore the line map on subsequent compilation requests. The line map is not required for the script engine to work (from what I can tell). This is the option I took to test out this optimization. I may implement the linemap cache and submit it at a later time.

This change made a huge difference. Here is a comparison of:

  1. Build r8854 + scene dictionary patch from Mantis 3312
  2. Same build with compile_scripts_only_once.patch applied.
Click to enlarge image

There was also a race condition in the linemap handling in XEngine.cs which I patched in:

Mantis 3321: Race condition in XEngine DoOnRezScript causes scripts to fail compilation with exception

Debug output in DoOnRezScript()

Calling DebugFormat() 40,000 times showed up in profiling of XEngine as 1-2% of total cycles. That is with logging set to WARN. If the console actually tried to output this string 40,000 times, things would be a lot worse. Even so, I believe it should be commented out.

A patch was sumitted

Mantis 3344: Debug output in DoOnRezScript causes slight performance problem

It looks like this debug output is important in some situations. It would be nice if it could be wrapped in a #if DEBUG at least.

Old Compiler File Cleanup

I found that after creating a scene with 40,000 objects that the next time I started OpenSim, it took several minutes to start up. The reason is that in the Compiler.cs function DeleteOldFiles(), it was searching every filename in the ScriptEngine directory for the region looking for existing source and compiled output for scripts in the region. That's 40,000 (or usually many times that many) string comparisons to filenames before the sim could start up.

I patched this problem using a wildcard search in:

mantis 3325: Compiler function DeleteOldFiles() very slow when there are lots of state files in script engine directory

Region startup now takes only a few seconds regardless of the number of object state files in the ScriptEngine directory. Since there is only a single *_compiled and *_source file for each unique script, finding and deleting them is very fast.

DateTime.Now.Ticks

Previously, I added a dictionary to track timers in:

Mantis 3311: UnSetTimerEvent() uses linear search which limits timer performance

I suspected that more work was needed to efficiently track thousands of timers and now that some other bottlenecks have been cleared, SetTimerEvent has bubbled back up to the top of the list. The profiler results below show the first 8000 objects being rezzed. SetTimerEvent is > 12% of the total CPU cycles.

Click to enlarge image

Experiments seemed to indicate that using UtcNow instead of Now would eliminate the conversion to local time, reducing the cost of Now by 96%. That savings is apparently masked by the other costs of Ticks and converting various values to Int64 and comparing/subtracting them. I will continue to work on timer performance. There are a number of places which still linearly search the list of timers searching for a particular UUID. Those could become fast lookups through a dictionary but have not yet become performance bottlenecks.

Saving script state in files

When a script is first rezzed, its initial state is written to a file in the ScriptEngine direction with the .state extension. The following profiler output shows that creating these files and writing the serialized script state to them is a full 63% of total cycles. Rezzing a lot of objects quickly cannot involve creating files and writing to them at the same rate. It does not scale. I am not familiar with the reason why the filesystem was chosen to store script state. The database or memory mapped files would be a much more efficient solution. The database would allow the script state data to be stored efficiently as data and not tie it directly to the filesystem.

Click to enlarge image

If these calls to the filesystem could be batched or eliminated, it is quite possible that the number of scripted objects in a scene could become essentially unlimited and that they could be generated at a rate of 10,000 per minute or more. This is something to seriously consider along with the new applications it might enable for the OpenSim platform.

In the graph below, I have added the results from running OpenSim without saving the script state to files. Everything is exactly the same and the state is serialized to XML but not written to a file. This could be considered best case performance if a lower overhead alternative to saving state could be implemented (compared to thousands of 500 byte XML files).

Click to enlarge image
 
 
 
 
opensim/profiling_r8854.txt · Last modified: 2009/05/04 04:10 by dslake
 
RSS - 2008 © ScienceSim