Multithreading libmocha

The bug itself is fairly straightforward to understand. Although the problem is more generalizable, the OPS problem is easy to describe. First, an HTML page requests information about the user via the OPS interface. The OPS interface is a Java interface accessed via LiveConnect. The function call in question brings up a user dialog, which prompts the user to confirm or deny the request for information, much like the Java security dialogs. On the Java dialog, there is a help button, which launches a NetHelp window. When the NetHelp window opens, the browser hangs.

2. Problem Description

x=1; y=2

x

1

y

There is only one JavaScript thread in the client. This thread can process only one script at a time. Using LiveConnect and other mechanisms, there are ways to make blocking calls in JavaScript. When a call blocks, the script that is currently being parsed stops until the call returns. If another script needs evaluation while the original script is blocked, the second script blocks until the JavaScript thread lock is available.

Note: The terms "Mocha thread" and "JavaScript thread" are used interchangeably here; the code refers to the thread as the "Mocha thread" because JavaScript was originally named "Mocha".

2a. Details of JavaScript execution in browser

Code flow

Handling of a <SCRIPT> tag begins in ns/lib/layout/layscrip.c, with lo_ProcessScriptTag() and lo_ParseScriptLanguage().

First, lo_create_script_blockage() blocks the layout state machine for this page so that no further processing of the HTML happens until the <SCRIPT> tag finishes its evaluation. It fills the line buffer with the script data from the standard <SCRIPT></SCRIPT> pair. (javascript: URLs are handled via a different code path, although style sheets end up in layscrip.c eventually.) It then calls ET_EvaluateScript() to hand off the script to the JavaScript thread. lo_ScriptEvalExitFn() is called when the evaluation is finished, which then calls lo_unblock_script_tag() to tell layout to continue on.
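The block-then-unblock handshake described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: every Sketch/sketch_ name is hypothetical, the layout state is reduced to a flag and a counter, and the real functions in layscrip.c certainly carry more state than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page layout state (not the real structure). */
typedef struct SketchLayout {
    int blocked;          /* nonzero while a <SCRIPT> tag awaits evaluation */
    int elements_placed;  /* how many elements layout has processed so far */
} SketchLayout;

/* Called when evaluation finishes; plays the role of lo_ScriptEvalExitFn(). */
typedef void (*SketchExitFn)(SketchLayout *layout);

/* A script that has been handed off and is remembered until it completes. */
typedef struct SketchPendingScript {
    SketchLayout *layout;
    const char   *source;
    SketchExitFn  on_exit;
} SketchPendingScript;

/* Plays the role of lo_unblock_script_tag(): let layout continue. */
static void sketch_unblock(SketchLayout *layout)
{
    assert(layout != NULL);
    layout->blocked = 0;
}

/* Block layout, then hand the script off (the ET_EvaluateScript() step). */
static SketchPendingScript sketch_process_script_tag(SketchLayout *layout,
                                                     const char *source)
{
    SketchPendingScript pending;
    assert(layout != NULL);
    layout->blocked = 1;            /* the lo_create_script_blockage() step */
    pending.layout  = layout;
    pending.source  = source;
    pending.on_exit = sketch_unblock;
    return pending;
}

/* Layout refuses to place elements while blocked. Returns 1 if placed, else 0. */
static int sketch_place_element(SketchLayout *layout)
{
    if (layout->blocked)
        return 0;
    layout->elements_placed++;
    return 1;
}

/* The JavaScript side calls this once the script has been evaluated. */
static void sketch_script_finished(SketchPendingScript *pending)
{
    pending->on_exit(pending->layout);
}
```

In this sketch, a call to sketch_place_element() made between sketch_process_script_tag() and sketch_script_finished() returns 0, which mirrors layout refusing to process further HTML until the exit function runs.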

ET_EvaluateScript() in ns/lib/libmocha/et_mocha.c creates a new event via PR_InitEvent() with the source code buffer and an event destructor, sets the event handler functions, translates the buffer to Unicode if the charset is non-ASCII, and then adds the event to the Mocha event queue by calling et_event_to_mocha() on it. (Note that a comment above the code suggests that perhaps the buffer should always be translated in the 5.x timeframe.) There are two queues to deal with: lm_InterpretQueue and et_TopQueue. If document.write() is not involved, the top queue is the same as the interpret queue. Posting the event enters the queue monitor, notifies the queue that a new event has arrived, and releases the monitor.
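As a rough illustration of the hand-off just described, the following C sketch builds an event that carries a script buffer, a handler, and a destructor, and appends it to a "top queue" that aliases the interpret queue. All Sketch/sketch_ names are hypothetical and do not reproduce the NSPR event API; the Unicode translation step and the queue monitor are deliberately left out, and the notification is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchEvent SketchEvent;
typedef void (*SketchHandler)(SketchEvent *ev);
typedef void (*SketchDestructor)(SketchEvent *ev);

struct SketchEvent {
    SketchHandler    handler;     /* run by the JavaScript thread */
    SketchDestructor destructor;  /* frees the event and its buffer */
    char            *source;      /* private copy of the script text */
    SketchEvent     *next;
};

typedef struct SketchQueue {
    SketchEvent *head, *tail;
    int notified;                 /* counts "new event" notifications */
} SketchQueue;

static SketchQueue sketch_interpret_queue;
/* Without document.write(), the top queue is the interpret queue itself. */
static SketchQueue *sketch_top_queue = &sketch_interpret_queue;

static int sketch_handled;        /* how many events the handler has seen */

/* Hypothetical handler: a real one would evaluate ev->source. */
static void sketch_eval_handler(SketchEvent *ev)
{
    (void)ev;
    sketch_handled++;
}

static void sketch_destroy_event(SketchEvent *ev)
{
    free(ev->source);
    free(ev);
}

/* Build the event (the PR_InitEvent() step). Returns NULL if allocation fails. */
static SketchEvent *sketch_new_eval_event(const char *source, SketchHandler handler)
{
    SketchEvent *ev;
    size_t len;
    assert(source != NULL && handler != NULL);
    ev = malloc(sizeof *ev);
    if (ev == NULL)
        return NULL;
    len = strlen(source) + 1;
    ev->source = malloc(len);
    if (ev->source == NULL) {
        free(ev);
        return NULL;
    }
    memcpy(ev->source, source, len);
    ev->handler    = handler;
    ev->destructor = sketch_destroy_event;
    ev->next       = NULL;
    return ev;
}

/* Append to the top queue and notify (the et_event_to_mocha() step). */
static void sketch_event_to_mocha(SketchEvent *ev)
{
    SketchQueue *q = sketch_top_queue;
    if (q->tail != NULL)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    q->notified++;
}

/* Remove and return the oldest event, or NULL if the queue is empty. */
static SketchEvent *sketch_pop(SketchQueue *q)
{
    SketchEvent *ev = q->head;
    if (ev != NULL) {
        q->head = ev->next;
        if (q->head == NULL)
            q->tail = NULL;
        ev->next = NULL;
    }
    return ev;
}
```

The event owns a copy of the buffer so that the poster can release its own copy immediately; whether the real code copies or transfers ownership is not stated here, so treat that as an assumption of the sketch.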

In ns/lib/libmocha/et_mocha.c, lm_wait_for_events() is the function that sits in the JavaScript thread and waits for events to occur. It calls et_SubEventLoop() on the top queue in an infinite loop.

et_SubEventLoop() locks the JS thread, enters the queue monitor, and gets the next event. If it gets an event, it exits the queue monitor and begins evaluation. It sets the lm_owner_lock context to the MWContext found in the ETEvent structure so that the code to deal with a script being interrupted (either via a dropped network connection or a user hitting the stop button) is able to tell if the current script should be stopped in response to the interruption. PR_HandleEvent() handles the event synchronously, then et_SubEventLoop() unlocks the JS thread. (If there are no waiting events, it just unlocks the JS thread and waits for the next event.)
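The locking order in that paragraph (JS lock first, queue monitor only while dequeuing, handler run synchronously, JS lock released last) can be sketched as a single pass of the loop. This is a minimal sketch, assuming POSIX mutexes in place of the NSPR lock and monitor; all Sketch/sketch_ names are hypothetical, and where the real loop would wait for the next event this version simply returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct SketchContext { int id; } SketchContext;  /* stand-in for MWContext */

typedef struct SketchEvent {
    SketchContext *context;
    void (*handler)(struct SketchEvent *ev);
    struct SketchEvent *next;
} SketchEvent;

typedef struct SketchQueue {
    pthread_mutex_t monitor;      /* stand-in for the queue monitor */
    SketchEvent    *head;
} SketchQueue;

static pthread_mutex_t sketch_js_lock = PTHREAD_MUTEX_INITIALIZER;
/* Context of the script being run; plays the role of the lm_owner_lock context. */
static SketchContext *sketch_owner_context;

/* Interruption check: does the interrupted context own the running script? */
static int sketch_should_interrupt(const SketchContext *interrupted)
{
    return sketch_owner_context != NULL && sketch_owner_context == interrupted;
}

static int sketch_seen_owner_id = -1;

/* Hypothetical handler: records which context owned the lock while it ran. */
static void sketch_record_handler(SketchEvent *ev)
{
    sketch_seen_owner_id = sketch_should_interrupt(ev->context) ? ev->context->id : -1;
}

/* One pass of the loop. Returns 1 if an event was handled, 0 if the queue was empty. */
static int sketch_sub_event_loop_once(SketchQueue *q)
{
    SketchEvent *ev;
    assert(q != NULL);

    pthread_mutex_lock(&sketch_js_lock);      /* lock the JS thread */

    pthread_mutex_lock(&q->monitor);          /* enter the queue monitor */
    ev = q->head;
    if (ev != NULL)
        q->head = ev->next;
    pthread_mutex_unlock(&q->monitor);        /* exit before evaluating */

    if (ev == NULL) {
        pthread_mutex_unlock(&sketch_js_lock);
        return 0;                             /* the real loop would wait here */
    }

    sketch_owner_context = ev->context;       /* remember who owns this script */
    ev->handler(ev);                          /* synchronous, like PR_HandleEvent() */
    pthread_mutex_unlock(&sketch_js_lock);
    return 1;
}
```

Because the handler runs with the JS lock held, a handler that blocks keeps every other script waiting on the same lock, which is the hang described in the problem statement.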

et_evalbuffer_handler() was registered by ET_EvaluateScript() in the Mozilla thread as the event handler, so it is called by PR_HandleEvent(). It gets the MochaDecoder from the MWContext in the ETEvent structure, and passes that to LM_EvaluateBuffer().

LM_EvaluateBuffer()

js_context

JS_EvaluateScriptForPrincipals() or JS_EvaluateUCScriptForPrincipals() if it's Unicode.

JS_EvaluateScriptForPrincipals()

JS_EvaluateUCScriptForPrincipals()

et_mocha.c

In ns/js/src/jsapi.c we find JS_EvaluateUCScriptForPrincipals(), which compiles the script first, then executes it via js_Execute(). Object calls are resolved by initializing the objects and reflecting them into JavaScript: LM_GetMochaDecoder() in ns/lib/libmocha/lm_init.c calls lm_InitWindowContent() to fill in the decoder structures if they haven't been filled in already. See lm_screen.c for an example of objects reflected into JavaScript. lm_DefineScreen() is the function that initializes and creates a new screen object; it is called from lm_DefineWindowProps(). The window-level objects and Navigator objects are kept on the MochaDecoder object. A complete list of objects can be found in ns/include/libmocha.h. (Note that there is only one navigator object, which is kept on the lm_crippled_decoder. The lm_crippled_decoder is a bare-bones decoder used as a default, which keeps the shared navigator object.)

After js_Execute() is called, the stack unwinds as mentioned above, lo_ScriptEvalExitFn() is called, lo_unblock_script_tag() is called, and the layout engine continues laying out the page.

Structures

MochaDecoder

ns/include/libmocha.h

JSContext,

MWContext

JSObject

MochaDecoder

LMWindow.

MochaDecoder

lm_NewWindow()

ns/lib/libmocha/lm_win.c

MWContext

MochaDecoder

JSContext

JS_NewContext()

JSContext is private to ns/js/src/jscntxt.h and contains much of the information the JavaScript engine needs to execute and evaluate the bytecode. Others pass the context around as an opaque type. A JSContext can only have one active JSScript running at a time. It contains such things as the version of the script, the runtime data, and the current stack.

MWContext is in ns/include/structs.h, and it's a bit of a dumping ground. It appears to have a reverse link to the JSContext that is kept in the MochaDecoder. There's one MWContext per window (MW stands for Mozilla Window.)

3. Proposed Solutions

1. Multiple JS threads per window group

MWContext

Problems: A new thread per window may be overkill in terms of overhead (memory, CPU). This is especially true under Windows 3.x and on the Macintosh. We would also need a method for one context to access another context's data: a way to magically join another JS thread so the scripts can share data. (This problem is non-trivial and involves many fun race conditions and possible deadlocks.)

2. New JS thread upon request

Problems: The programmer still needs to know when a deadlock is going to occur. Conceivably, if we could detect a deadlock, we could spin a new thread automatically to avoid it, but we may be doing the user a favor they don't want. We would most likely still need a way to rejoin another thread to share data anyway, or perhaps this is an acceptable compromise.

3. Suspend JS thread context, run other script, resume previous JS context

Problems: It's unclear whether we can really do this, because saving the entire thread context might end up being more work than just creating new threads. Need more data. It's equivalent to doing a stack save and restore for an entire thread.

4. Implementation details

Overview

window.spawn()

window.open()

In order to help with the multithreading and keep thread overhead down, we create a new structure called an LMWindowGroup. Instead of having one JSContext per MWContext, we keep a single JSContext per window group and set the correct context before we evaluate the script. Each LMWindowGroup has a PRThread, an InterpretQueue, and a JSLock.
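To make the shape of a window group concrete, here is a minimal sketch, assuming POSIX threads in place of PRThread and the JS lock. Every Sketch/sketch_ name is hypothetical and the field layout is a guess for illustration, not the proposed declaration. It shows one interpreter context shared by the group, with the current window set before each evaluation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct SketchWindow { int id; } SketchWindow;   /* stand-in for MWContext */

/* Stand-in for the single JSContext shared by a group. */
typedef struct SketchInterp {
    SketchWindow *current;     /* window whose script is being evaluated */
    int           scripts_run;
} SketchInterp;

typedef struct SketchEventQueue { int pending; } SketchEventQueue;

/* Hypothetical layout of a window group: thread, queue, lock, shared context. */
typedef struct SketchWindowGroup {
    pthread_t        thread;           /* thread that runs the group's scripts */
    SketchEventQueue interpret_queue;
    pthread_mutex_t  js_lock;
    SketchInterp     interp;
} SketchWindowGroup;

static void sketch_group_init(SketchWindowGroup *group)
{
    assert(group != NULL);
    group->thread = pthread_self();    /* the sketch runs on the caller's thread */
    group->interpret_queue.pending = 0;
    pthread_mutex_init(&group->js_lock, NULL);
    group->interp.current = NULL;
    group->interp.scripts_run = 0;
}

/* Point the shared context at the right window, then "evaluate".
 * Returns the id of the window the script ran against. */
static int sketch_group_evaluate(SketchWindowGroup *group, SketchWindow *window,
                                 const char *source)
{
    int ran_for;
    assert(group != NULL && window != NULL);
    (void)source;                      /* a real evaluator would run this text */
    pthread_mutex_lock(&group->js_lock);
    group->interp.current = window;    /* set the correct context first */
    group->interp.scripts_run++;
    ran_for = group->interp.current->id;
    pthread_mutex_unlock(&group->js_lock);
    return ran_for;
}
```

Sharing one interpreter context keeps the per-window cost low, at the price of having to switch the current window under the group's lock on every evaluation.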

MochaDecoder objects are associated with an LMWindowGroup indirectly, via an entry in the structure that records which LMWindowGroup they are in. When evaluation occurs, et_event_to_mocha() calls a function in libmocha that determines which thread's interpret queue to put the event on. The code to exit a script already takes an MWContext parameter and shouldn't need any changes.

Note that in the 99% case, the browser will continue to only have a single JavaScript thread, but the same problems will still need to be solved as if we were creating new ones all the time. We can, however, document some behaviors of the new thread spawning rather than having to fix them transparently to mimic previous behavior.

The function window.spawn() would take the same arguments as window.open(), and would basically create a new thread and call the existing window.open() code, with the exception that it will not return the window object to the caller. If the newly spawned window attempts to do a window.open() on an existing window (which is now in another thread), we can either give them back the window from the other thread (which is subject to a race condition if other scripts in that thread are running and modifying that window structure), or we can define the function to always return null if you attempt to cross a thread boundary.

Code Flow

MWContext

LM_MWContextToGroup

In the libmocha implementations of win_open and win_spawn, we add the window to the parent's window group when the new window is created. This way subwindows opened via JavaScript inherit the correct thread.

In et_event_to_mocha, we make sure there is an MWContext for the event, and then use that MWContext to figure out which group to send the event to.
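The two steps above, inheriting the parent's group when a window is opened and routing each event by its context, can be sketched as follows. All Sketch/sketch_ names are hypothetical; the group's queue is reduced to a counter, and the lookup function only plays the role of a context-to-group mapping rather than reproducing any real signature.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchGroup {
    int id;
    int queued;                 /* events placed on this group's queue */
} SketchGroup;

typedef struct SketchWindow {   /* stand-in for MWContext */
    SketchGroup *group;
} SketchWindow;

typedef struct SketchEvent {
    SketchWindow *context;
} SketchEvent;

/* Map a window to its group; NULL if the window is missing or ungrouped. */
static SketchGroup *sketch_context_to_group(const SketchWindow *window)
{
    return window != NULL ? window->group : NULL;
}

/* A window opened from script joins its parent's group, so it inherits the thread. */
static void sketch_open_window(SketchWindow *child, const SketchWindow *parent)
{
    assert(child != NULL && parent != NULL);
    child->group = parent->group;
}

/* Route an event to the queue of its context's group.
 * Returns 0 on success, -1 if the event has no context or no group. */
static int sketch_event_to_mocha(SketchEvent *ev)
{
    SketchGroup *group;
    if (ev == NULL || ev->context == NULL)
        return -1;
    group = sketch_context_to_group(ev->context);
    if (group == NULL)
        return -1;
    group->queued++;
    return 0;
}
```

Rejecting events without a context is one possible policy; the text only says that we make sure a context exists, so falling back to a default group would be an equally plausible reading.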

Code then proceeds again in the same fashion that it does today, with the exception that the scripts are now running in separate threads, and therefore avoid the deadlock mentioned above.

5. Notes (Random questions and observations I had when reading the code)

ET_ReflectObject()

ns/lib/libmocha/et_mocha.c

PR_InitEvent().

ns/lib/layout/layblock.c

ET_ReflectFormElement().

LO_EnumerateForms() in ns/lib/layout/laymocha.c comments that it can only be safely called while the JS_Lock is being held. Why, I wonder? What if there's more than one JS thread? Same for LO_EnumerateFormElements() in the same file. Perhaps this is because there may be JS code in the middle of monkeying with the form data. (Chouck says that it's because it adds objects to the JSAtom tables. If Mocha threads need their own tasks, we need to figure out how to join and split them dynamically; if not, we need to lock the atom table.)

Why does Moz ever want the JS lock? Maybe to add items to the interpret queue? (It's for plugins which can execute JavaScript code.)