|
3-15-02

I may have come up with something that alleviates the symptoms of the CarbonLib/FireWire issue. I was going to post a pointer to the file on the forums, with a pointer to os9forever.com for this extended documentation, but the forums appear to be down right now so I'm sending this doc first.

The proposed fix is in:

ftp://ftp.gps.caltech.edu/pub/kby/Old%20World%209.2.x/test/FireWire%20Support.bin

It is a patched version of the FireWire Support extension.

I phrased my enthusiasm somewhat cautiously because, although it seems to "fix" the problem for me, I have to do a little hand waving to explain why it works. On the one hand, it's not a shot in the dark; on the other hand, there are some pieces missing in the path between what I believe the fix actually does and my knowledge of the symptoms that cause the crash. Below are the details of what I know and what I don't.

I figured out what are probably the relevant unique aspects of the issue, i.e. why FireWire and CarbonLib both seem to be required. I don't know why it's 9.2.x specific, however, and it MAY just be a timing issue (but not likely; it would also take way too much effort to figure this out definitively). I also have the gap in the causality path, mentioned above, between the two ends of what I know. I do also know why VM exacerbates the problem.

As I previously mentioned, the crash occurs because of a return to code in CarbonLib after it's no longer around, because the last application using it has released the fragments. VM exacerbates the problem because with VM on, most of CarbonLib is file mapped (this is more or less a good thing), and when the code fragments are released, the page maps simply "destroy" the address space.
With VM not turned on, CarbonLib code fragments are instead loaded into "real" memory, and the memory blocks are deallocated by being returned to the free memory list when the fragments are released. They are probably not overwritten unless the system is fairly busy, so there's enough time left for the code to execute completely and get out.

FireWire Support installs a particularly crocky kind of patch to WaitNextEvent (or the trap table) called a "come-from" patch, which is only supposed to be used by Apple to fix or augment broken or functionally deficient ROM code. This is a patch that's hidden from most patch routines and is always called first in the patch chain, so it can figure out where it "comes from" by examining the stack. It's basically guaranteed not to be tail-patched. But it complicates matters, in that the SetTrapAddress routines that subsequent software calls alter not the address of the entry in the trap table (which points to the come-from patch), but the target of the exit JMP instruction of the come-from patch. Note that this is all 68K emulated code. The patch FireWire Support installs is in its gpch resource (there's only one). This has nothing in particular to do with the gpchs in the system file other than by name/functionality, and has nothing to do with the stuff we do to the system file.

Things get more complicated with something like CarbonLib, which also patches WaitNextEvent, but at the application level. MacOS apparently allows this but defines (as one would expect) that these patches apply only to the application that patches them (as discerned by the fact that the patch code itself resides in the application's heap and not the system heap). How does the system guarantee this is the case?
Presumably each application has its own copies of the trap dispatch tables that get patched, and whenever a context switch occurs, part of that entails pointing the hardware trap table vector to the application's version of the trap table. (I saw hints of this but not a clear indication; however, it is the way I would have implemented it and I can think of no other way.) This makes it quite convenient when the application goes away: it's no longer context switched to, so effectively the application-patched trap table never gets pointed to again.

Incidentally, Apple's doc recommends not patching at the application level if possible. However, it is not surprising that CarbonLib needs to do this, since it is basically usurping the functionality of some traps to give the carbonized app Carbon versions of those traps. How this actually works with respect to native code, I am not sure; I could not find in my search how the traps are actually done for PowerPC native code, which doesn't really "trap" in the same way the 68K code does. I have some theories on this, but there is too much missing info for me to speculate here.

Now enter the special patch I mentioned earlier. Note that the trap table must always point to the come-from patch, so it's the exit JMP instruction of the come-from patch that has to be changed each time there's a context switch or the application gets destroyed. I don't know how the system does this, and I am unlikely to find publicly available documentation on it, since this type of patch is only for Apple to do. The come-from patch itself, as far as I know, exists in globally-addressed code; there's not a separate physical copy for each app. Again, note that the exact mechanism of all this also involves the way the 68K emulator handles things, not just the real hardware and clock interrupts.

Note that this alteration of the JMP instruction is basically self-modifying code. Worse, it's code that's being modified by someplace else.
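
A rough way to picture the arrangement described above is the small C model below. It is only a sketch of the guessed mechanism, not real 68K code and not anything taken from Apple: every name in it (exit_jmp_target, come_from_patch, context_switch, set_trap_address, the App struct, and the two stand-in WaitNextEvent routines) is made up for illustration. It assumes a single trap, a single shared come-from patch, and that the system reloads the exit JMP target on each context switch.

```c
#include <assert.h>
#include <stddef.h>

/* Every name below is hypothetical; this models the idea, not Apple's code. */
enum { ROM_WNE = 1, CARBON_WNE = 2 };

typedef int (*Handler)(void);

static int rom_wait_next_event(void)    { return ROM_WNE; }    /* system version */
static int carbon_wait_next_event(void) { return CARBON_WNE; } /* per-app patch  */

/* Operand of the come-from patch's exit JMP. There is one shared copy. */
static Handler exit_jmp_target = rom_wait_next_event;

/* The come-from patch runs first, then leaves through its exit JMP. */
static int come_from_patch(void) { return exit_jmp_target(); }

/* The trap table entry always points at the come-from patch. */
static const Handler trap_table_entry = come_from_patch;

typedef struct {
    Handler wne_target;  /* head of this application's own patch chain */
} App;

static App *current_app = NULL;

/* Context switch: the table entry stays put; only the exit JMP is retargeted. */
static void context_switch(App *to) {
    assert(to != NULL);
    current_app = to;
    exit_jmp_target = to->wne_target;
}

/* Model of SetTrapAddress at application level: it alters the exit JMP
   target for the current application, not the trap table entry. */
static void set_trap_address(Handler h) {
    assert(current_app != NULL);
    current_app->wne_target = h;
    exit_jmp_target = h;
}

/* What a call to the WaitNextEvent trap ends up running. */
static int call_wait_next_event(void) { return trap_table_entry(); }
```

In this model, switching to an App that has called set_trap_address(carbon_wait_next_event) makes call_wait_next_event() return CARBON_WNE, and switching to any other App makes it return ROM_WNE again. Once an application quits and is never switched to, its wne_target is simply never loaded again. The assignment to exit_jmp_target from outside the patch is the model's stand-in for rewriting the JMP instruction.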
On a real machine, self-modifying code like this can cause all sorts of caching issues, since presumably the alteration is done by accessing the patched instruction as data, whereas it executes as part of the instruction stream pipeline/cache. How the emulator implements this, and to what degree, I don't know, but the potential for pitfalls is enormous.

Anyway, the only thing that's changed in the patched FireWire Support is that it calls the cache-flush routine before executing the JMP.

What bothers me about the fix, though, is how the code got as far as it does (presumably to some place that blocks to give things a chance to go away). Although it is possible to construct timing scenarios where this is the case, it seems unlikely that it would be as consistent as it is in those. The fix also may have some performance impact, although since we are talking about emulated code, I'm not sure how bad the impact really is relative to the performance of the emulator anyway.

-kby |