Gaz's 32-bit DOS Protected Mode routines

Contents (changes since the last release are marked with a *):
1. Previous users / changes
2. How to use the routines / general notes
3. Subsystems -
     Memory management
    *SoundBlaster detection
     Video
     Mouse
     Keyboard
     File
     CD-Audio
4. Standard Library -
     Character type testing
     String manipulation
     Argc/argv/psp string manipulation
     StringList class
     System functions
     Miscellaneous functions

Previous users / changes:
    If you've not downloaded any previous versions then feel free to
    skip this section unless you're _really_ interested :).

    Changes since version 0.1b:

        o Sound subsystem (detection only)

        o Malloc/Free improved and bug-fixed (properly)

        o Reorganised directory structure

        o Improved macro files and fixed them to work under NASM 0.98

        o Some other minor bug fixes

    Changes since version 0.1:

        o Realloc, HeapBlockSpace, HeapBlockSize added to memory subsystem.

        o Dynamic string functions added to string functions.

        o Returned values now use val32, allowing things like [_var1] to be
          used to hold a return value.

        o File functions added

        o StringList object added

        o Code speed tester source added

        o Some slight bug fixes

        o New email address

        Yes I know I said that this would only be a maintenance release but I
        really wanted to at least get the library to a useable state (and the
        additions are very useful I think).

How to use the routines / general notes:
    One of the most useful things about most HLLs is the toolchain's ability
    to see code that's never used and leave it out (eg with your standard C
    header files). I got so annoyed with writing lots of functions that I
    would only need now and then, yet which always ended up in my .exe, that
    I devised a way of ensuring that only the functions I actually use get
    included. This means I don't end up with a hundred 1K files of little
    functions and loads of %include's for them, and I don't end up with the
    opposite either: a few include files full of unneeded code.

    All this is accomplished by the .mac (macro) files. Simply include the
    macro files for the subsystems you want to use (eg memory.mac for the
    memory subsystem) before you use any of the functions and then, at the
    _very end_ of your source file, include the file 'routs.in' which will
    then include all the functions required. See any of the provided source
    for examples of this.

    Having decided on the subsystems you want to use, you'll need to know how
    to actually call the functions themselves. Every function has an
    easy-to-use macro defined, which makes calling it much more HLL-like and
    much easier. Every function is described in a Pascal type way - first
    comes the macro name, then any parameters it takes (these are given in
    brackets) and then any results it returns. For those that wish to call
    the routine directly, you'll have to check the macro file as I can't
    see people wanting to do this too often. Parameters use the following
    format:

        regx  - a register where x denotes the size. ie, reg8, reg16, reg32
        immx  - an immediate value, again where x denotes the size.
        mem32 - memory reference (offset or indirect)
        val8  - any 8-bit value (ie reg8, imm8 or [mem32])
        val32 - any 32-bit value (ie reg32, imm32, mem32 or [mem32]),
                _except_ if the value is a return value from a function.
                In this case it can only be reg32 or [mem32].

    Return values are expressed using the same format except they come after
    a colon (':'). So, a full definition for a function would be like:

        Function1(start, end : reg32) : reg32

    The above description would show it takes two 32-bit registers as
    parameters, returns the result in a 32-bit register and the macro is
    called Function1. As another example, suppose we had the following
    function description:

        Function2(value : val32) : val32
    
    Then we could call this function by:

        Function2 eax,eax
        Function2 _var1,eax
        Function2 [_var2],[_var3]
        Function2 $01234,[esi]

    All the functions work in this way, and most of them share some
    common traits:

        o Carry is used for every function that it makes sense to have a
          return indicator for. It's set on error, clear otherwise.

        o Unless otherwise noted, all functions preserve any registers
          that they use.

        o Several functions expect es=ds so set this at the start of your
          code, eg by push ds, pop es (This shouldn't be a problem though)

    There are a few common problems that it might help to be aware of:

        NASM generating 'Parser: instruction expected' errors
            This usually means that you've got a function name wrong; check
            the line the error reports it at.

        NASM generating a load of 'symbol undefined' errors
            Code and data is only included for functions that get used, and
            for functions that have a common initialisation function it is
            that function which includes the code and data. Because it
            hasn't been called all the errors get produced. For example,
            try calling any of the CD audio functions without calling
            CDInit.

        'Must call HeapInit to use xxx' errors
            Some of the functions use the heap for dynamic allocation and the
            heap requires initialising via a call to HeapInit. Call HeapInit
            in your code. (This actually applies to some other xxxInit's)

    It should be noted that NASM will often generate several other errors as
    a result of other errors (much like C compilers do), and that these will
    often disappear once the original errors are corrected. My approach is
    to deal with the obvious errors and then re-assemble.

    Set your tab size to 8 when viewing the source code.

-----------------------------------------------------------------------------

Subsystems:

Memory Management Subsystem - memory.mac:

    There are many occasions where it is useful to allocate small amounts of
    memory for say, strings, buffers, etc. or, at other times, large amounts
    for, eg, arrays. The allocation of memory under protected mode can be a
    little difficult at times though - the DPMI spec allows for allocation of
    blocks but its use is unreliable, especially as the number of blocks you
    can allocate is difficult to determine. Hence, it would be nice to have
    some functions that can do this for us. In C, you'd use the malloc and
    related functions to allocate memory from the heap so this subsystem
    provides heap functions.

    The basic operation is to initialise the heap on startup (or whenever
    else you need it - I just find it more convenient to do it at the start)
    and then call the functions as required. All functions use the carry flag
    as a result indicator - carry will be clear on success or set on failure:

    HeapInit(size : val32)
        Initialises the heap by allocating the amount specified (in bytes)
        to use for the heap, which must be at least 256 bytes. If you
        specify 0 as the size, all available free memory is allocated. 
    HeapExit
        Deallocates the heap and frees up the memory (or at least makes a
        request to the DPMI to free it). Called automatically by Exit.
    HeapReset
        Resets the heap to its initial state, with all of its memory free.
    HeapSize : val32
        Returns the total size of the heap (in bytes).
    HeapSpace : val32
        Returns the total amount of free space on the heap (though this might
        not be the largest free _block_ available).
    HeapBlockSpace(size : val32)
        Checks whether there's a free block of the specified size available.
    HeapBlockCheck(block : val32) : val32
        Checks whether a block is a valid heap block or not. If successful
        (ie the block is valid) then the block's size is returned. On
        failure, carry is set and the returned size is zero.
    Malloc(size : val32) : val32
        Allocates a block of memory from the heap (the size is specified in
        bytes) and returns the address of the first byte of the block. The
        returned address should be passed to Free to deallocate the block
        later. On failure the address returned will be zero. On success the
        block's contents are undefined.
    Calloc(size : val32) : val32
        Works exactly the same way as Malloc except it clears the allocated
        block by setting all its bytes to zero.
    Realloc(block, size : val32) : val32
        Resizes a previously allocated block to the new size specified,
        returning the block's new address. If the block can't be resized it
        is unchanged and the returned address is the block's address.
    Free(block : val32)
        Deallocates a previously allocated block.

    It is the programmer's responsibility to ensure that their code doesn't
    write outside the block allocated. If your code does, the heap's integrity
    will be lost and the results of further heap functions are undefined.

    I've tried to make the free routine as robust as possible. It performs a
    series of checks on passed addresses that should enable it to only work
    on valid block addresses (it uses the heap's structure itself to verify
    the validity of the block). Those interested in the actual heap
    implementation should feel free to look at the code, which contains
    details of the block formats used in the heap. However, do _not_ write
    code that takes advantage, or assumes knowledge of, the internal heap
    structure. The functions are designed so you don't need to know anything
    about this and the format may change in future.

    I've also tried to minimize the overhead in the heap blocks themselves
    and have worked out a fairly efficient system - free blocks take up 16 bytes
    (the free block header itself) but allocated blocks have a mere 8 byte
    header. All heap blocks will be allocated on aligned 8-byte boundaries.
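
    To get a feel for what those figures mean for small allocations, here
    is a rough C sketch of how much heap a request might consume. The 8 and
    16 byte header sizes and the 8-byte alignment come from the paragraph
    above; the rounding rule and the name heap_footprint are assumptions
    made for illustration only, and the real heap code may well differ.

```c
#include <stddef.h>

/* Figures quoted in the text. How the heap combines them is a guess. */
enum { ALLOC_HEADER = 8, FREE_HEADER = 16, BLOCK_ALIGN = 8 };

/* Hypothetical helper: estimated bytes of heap used by one allocation. */
size_t heap_footprint(size_t request)
{
    size_t total = request + ALLOC_HEADER;

    /* Round up to the next multiple of the block alignment. */
    total = (total + BLOCK_ALIGN - 1) & ~(size_t)(BLOCK_ALIGN - 1);

    /* Assumed: a block must be big enough to hold a free header later. */
    if (total < FREE_HEADER)
        total = FREE_HEADER;
    return total;
}
```

    Under these assumptions a 1-byte request costs 16 bytes of heap, so
    lots of tiny allocations are relatively expensive.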

    Realloc can be used to reallocate a block, though because this involves
    copying the old block's contents to the new it might be slow.
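
    C programmers should note one difference: the standard C realloc
    returns NULL when it fails, whereas Realloc (as described above) hands
    back the original address and signals failure through carry. The sketch
    below imitates that behaviour on top of the C library so the two can be
    compared. The name realloc_or_keep and the 'failed' output parameter
    (standing in for the carry flag) are inventions for this example.

```c
#include <stdlib.h>

/* Hypothetical wrapper around C's realloc that never loses the block:
   on failure the old address is returned and *failed is set to 1. */
void *realloc_or_keep(void *block, size_t size, int *failed)
{
    void *moved;

    if (size == 0) {          /* zero-size resizes are rejected here */
        *failed = 1;
        return block;
    }
    moved = realloc(block, size);
    if (moved == NULL) {
        *failed = 1;          /* plays the role of carry set */
        return block;         /* the old block is still valid */
    }
    *failed = 0;              /* plays the role of carry clear */
    return moved;
}
```

    The practical upshot is that the returned address can always be stored
    back over the old one, as long as the failure indicator is checked
    before the extra space is used.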

-----------------------------------------------------------------------------

SoundBlaster detection - sound.mac:

    I had originally intended to develop a full sound subsystem but with the
    move to Win32 this doesn't really seem worth it anymore. Since I was
    working on it, I've included the code I did for SoundBlaster detection.
    The actual detection is probably more tricky than actually playing a
    sample (in fact you can work out the code needed to play a sample from
    the detection code). You could extend this into a proper multi-sample
    player by having an area of memory which is being continuously played
    by the SoundBlaster card (via auto-initialised DMA; you can find source
    to play a sample by this method on the web). When the end of the memory
    block is reached, an interrupt is triggered, and at this point:

      o You are keeping track of which samples are playing, eg by having
        a structure in memory like:

        _sample_playing    resd 1
        _sample_address    resd 1
        _sample_cur_addr   resd 1

        You have say, eight of these to allow for eight samples to be played
        at once and on the interrupt you go through each one, adding the
        next bit of each sample to be played to the buffer which is about
        to be played by the sound card.

      o Since this takes time, you actually have two buffers, and when the
        first finishes playing, you immediately tell the card to start the second
        buffer, and then prepare the first buffer again. When the second
        buffer finishes, you tell it to play the first buffer (which is
        now prepared) and then do the second buffer.

      o You can add effects like sample volume, etc. and maybe support for
        .MOD type files if you want to.
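
    The per-interrupt mixing step suggested above might look something like
    this C sketch. It is not part of the library. The Voice structure
    mirrors the three variables shown earlier, with an 'end' pointer added
    so a voice knows when to stop; the 8-bit unsigned sample format (128 =
    silence) and all the names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VOICES 8

/* Mirrors _sample_playing / _sample_address / _sample_cur_addr.
   'end' is an addition of this sketch. */
typedef struct {
    int            playing;
    const uint8_t *address;
    const uint8_t *cur;
    const uint8_t *end;
} Voice;

/* Fill one half of the double buffer with the next 'len' bytes of every
   playing voice, clamping the mixed result to the 8-bit range. */
void mix_block(Voice voices[MAX_VOICES], uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int sum = 0;
        for (int v = 0; v < MAX_VOICES; v++) {
            Voice *vc = &voices[v];
            if (!vc->playing)
                continue;
            if (vc->cur >= vc->end) {      /* nothing left to play */
                vc->playing = 0;
                continue;
            }
            sum += (int)*vc->cur++ - 128;  /* convert to signed */
            if (vc->cur == vc->end)
                vc->playing = 0;           /* sample has finished */
        }
        if (sum > 127)  sum = 127;
        if (sum < -128) sum = -128;
        out[i] = (uint8_t)(sum + 128);     /* back to unsigned */
    }
}
```

    The interrupt handler would call something like this on whichever
    buffer has just finished playing, while the card plays the other one.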

    These are just some suggestions if you're into this sort of thing. I
    really don't see the point now when I can move to DirectSound under
    Win32 for it instead (though you never know...)

-----------------------------------------------------------------------------

Video Subsystem - video.mac:

    VBE is now the video standard to use to enable us to access all those
    nice high-resolution modes. However, its use is complicated by VBE's
    history and it would be nice to have an easy-to-use interface. This
    subsystem provides support for VBE 1.2 or above, either using the
    video card's linear framebuffer or providing a virtual one instead.
    Hardware double-buffering is used if possible to eliminate flicker
    (combined with an LFB, this gives direct screen access without the need
    for a screen buffer). When the screen does have to be copied, Pentium
    optimised copying functions are used if the CPU is a Pentium class or
    better (approximately 5-6 times faster!).

    VESA? VBE? What the hell...?:
    
        Those readers that already know about VBE can feel free to skip this
        part, which is just a brief history of how VBE came about and why it's
        so useful for us all. A while back 320x200x256 resolution was the
        business. It's still used quite a lot but looks _very_ dated now. It
        was great except that people began to want more and in stepped the
        then king of the hill, IBM (oh how times change!). IBM brought out
        the first of the SVGA cards, it was capable of a much better resolution
        than standard 320x200x256. But IBM never actually released any hardware
        details and instead provided an API to the card. However, IBM's time
        as king was over and other manufacturers brought out cards capable of
        the same or higher resolutions, and, naturally, they all had different
        ways of controlling the card. Writing a program to take advantage of
        the SVGA resolutions was a nightmare - you had to have a different
        routine for _every_ card on the market. Unsurprisingly, very few
        applications actually supported SVGA. IBM brought out their XGA
        standard to try and solve the problem but it was too late.

        So, out of all this mess, VESA, the Video Electronics Standards
        Association, stepped in with their VESA BIOS Extension (or VBE for
        short). It defined a set of mode numbers for standard resolutions
        (this wasn't a great idea but fortunately they realised this and
        corrected it in version 2) and also standardised other information,
        so it was easy to get card details such as where the memory was
        accessed from, how to switch banks, etc.

        Most cards that were released in the last five or six years support
        VBE 1.2 (basically the first release), and almost all released in the
        last year or two support VBE 2.0 (I'm getting to it) _properly_.

        The main problem that VBE solved was bank switching. If you think about
        segments in real-mode (I sincerely hope you have at least a basic grasp
        of this sort of thing to be coding in protected mode) they're only 64K
        and your typical higher-res screen takes up 300K or more (eg the
        640x480x256 mode used everywhere is 300K), so how do you access this
        300K with only 64K? The answer lies in bank switching, which also
        provides one of the most annoying features of SVGA. Imagine your 300K
        screen as a piece of paper. Now imagine we've got another piece of
        paper, the same size, on top of it but with a hole in it so we can see
        through. Now no matter where we position the top paper we can never
        see the whole screen at once, but we can see bits of the screen and
        hence by moving the top paper about we can get to any part of the
        screen we want. This is the process known as bank switching - we have
        a 64K window that we can move about our 300K (or whatever) to see any
        part of the screen.
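
        In arithmetic terms, finding a pixel means working out which bank it
        lives in and how far into the 64K window it sits. A small C sketch,
        assuming an 8 bits/pixel mode and a window that moves in 64K steps
        (real cards can report a finer granularity); the names are made up
        for the example:

```c
#include <stdint.h>

#define BANK_SIZE 65536u   /* assumed: 64K window moved in 64K steps */

typedef struct {
    uint32_t bank;     /* which bank to switch to */
    uint32_t offset;   /* byte offset inside the 64K window */
} BankAddr;

/* Locate pixel (x, y) in an 8 bits/pixel banked mode. */
BankAddr pixel_to_bank(uint32_t x, uint32_t y, uint32_t bytes_per_line)
{
    uint32_t linear = y * bytes_per_line + x;
    BankAddr a = { linear / BANK_SIZE, linear % BANK_SIZE };
    return a;
}
```

        For 640x480x256 the last pixel is at linear offset 307199, which
        lands in bank 4, so the screen spans five banks in total.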

        So what's the problem? Until VBE there was no standard method to change
        banks.

        So problem solved? Not quite, bank switching is _very_ slow, and I mean
        slow. We're talking snail pace slow here. And this is made even worse
        by using protected mode - why? Well, the bank switching procedure is
        a real-mode procedure which means that every time we want to change a
        bank we have to switch to real-mode, call the routine and then switch
        back to protected mode.

        But then came VBE 2.0 which solved this problem! VBE 2.0 defines a
        function for getting the address of a protected mode version of the bank
        switch routine (and others) and it also gives you the size so you could
        copy it locally to maybe improve speed. But even better than this VBE 2.0
        provided support for using a linear framebuffer. This is just a fancy
        term for saying that rather than use banks we actually have only one
        bank of (for a 640x480x256 screen) 300K. The trick then is to support
        VBE 1.2 but also take advantage of VBE 2.0 features if available.

        So where does all this leave us? Well, we want functions to detect and
        initialise VBE and also to set a VBE mode, but we don't want to have to
        worry about mode numbers and simply tell the function what resolution we
        want (especially since VBE 2.0 mode numbers are not static anymore, but
        are up to the implementor). Since we're protected mode coders we don't
        want to have to deal with 64K blocks but just a straight area of memory
        that we'll use as the screen, regardless of what the card can actually
        support.
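
        The idea of drawing into one straight block of memory and letting a
        routine deal with the banks can be sketched in C as below. The
        "card" here is simulated with an ordinary array so the example can
        run anywhere; FakeCard, set_bank and copy_to_banked are invented
        names, and the library's own copy routine is certainly organised
        differently.

```c
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 65536u

/* Stand-in for the video card: 'vram' is all of display memory and
   'window' points at whichever 64K bank is currently mapped in. */
typedef struct {
    uint8_t *vram;
    uint8_t *window;
    unsigned switches;   /* number of (slow) bank switches made */
} FakeCard;

static void set_bank(FakeCard *card, uint32_t bank)
{
    card->window = card->vram + (size_t)bank * BANK_SIZE;
    card->switches++;
}

/* Copy a linear screen buffer to the card one bank at a time. */
void copy_to_banked(FakeCard *card, const uint8_t *screen, uint32_t size)
{
    uint32_t done = 0;
    for (uint32_t bank = 0; done < size; bank++) {
        uint32_t chunk = size - done;
        if (chunk > BANK_SIZE)
            chunk = BANK_SIZE;
        set_bank(card, bank);
        memcpy(card->window, screen + done, chunk);
        done += chunk;
    }
}
```

        Copying a whole frame this way needs only one switch per bank (five
        for 640x480x256), which is far kinder than switching per pixel.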

        There are a few issues with VBE 2.0 that it might be worth being aware
        of. For one, many early implementations were pretty buggy and hence
        early support was limited. Not all VBE2 implementations implement VBE2.0
        fully, and some appear to implement features of VBE2.0 but claim to be
        an earlier version (I presume because they don't fully implement VBE2.0).

        I've also come across at least one program that requires an LFB (well,
        it actually says it requires VBE2.0 so it can use an LFB) but says that
        my Matrox Mystique doesn't support it. I presume this is because it
        doesn't return the VBE2.0 info... but it's annoying that it does actually
        have an LFB that some programs won't use! As far as I can tell,
        these functions should work on any card supporting any of the bank
        switching functions, regardless of the VBE revision (I don't actually check
        what VBE version is in use as it's not actually needed).

        Of course, you don't actually _need_ to know any of this, and can safely
        use the functions provided and not worry about a thing. Isn't that nice?

        All the functions use the carry flag as a result indicator - carry is clear
        on success, set on failure.

    Variables:
    _VESAScreen
        This dword variable holds the address of the start of the screen.
    _VESAScreenFlag
        This dword holds which screen is currently being displayed - it's 0 when
        screen 1 is being displayed and something else when screen 2's being
        displayed. (This only applies if a hardware double buffer and an LFB
        are being used; it's 0 at all other times.)
    _VESADoubleFlag
        This dword is set to 1 if a hardware double buffer's in use, 0 otherwise.
    _VESALFBDoubleFlag
        This dword is set to 1 if a hardware double buffer and an LFB are both
        in use, 0 otherwise.
    _VESAOptimisedFlag
        This dword is set to 1 if using the optimised copying functions, else 0.
    _VESABankFlag
        This dword holds the type of bank switching in use:
            1 - Linear FrameBuffer
            2 - Protected mode bank switching
            3 - Real mode bank switching

    Functions:
    VBEInit(flag : val32)
        Tries to detect the VBE. If the flag is 1 any information is
        reported to the screen, otherwise it isn't. The heap will be
        initialised if it hasn't already been.
    VBEExit
        Deallocates resources and returns to text mode (if a mode was set).
        Called automatically by Exit.
    VBESetMode(x resolution, y resolution, bits/pixel, report flag : val32)
        Sets a VBE mode of the specified resolution and colour depth. If
        the flag is 1 any information is reported to the screen, otherwise
        it isn't. Note that it only uses int $21 for reporting information
        so this won't work in all modes, notably LFB modes.

        Only the lower words of the x and y resolution are used as the
        resolution itself; the upper words are used as flags, and only the
        upper word of the x resolution is used at present:

            bit 16 - set to only use an LFB
            bit 17 - set to only use protected mode bank switching
            bit 18 - set to only use real-mode bank switching
            bit 31 - set to use suggestion mode

            All other bits in both upper words should be zero.

        By setting one of the first three bits, you can force a particular
        type of bank switching to be used, and the function will fail if it
        can't use this type (even if another type is available). Suggestion
        mode allows the function to try lower types of bank switching if the
        forced mode isn't available. The function will detect whether there's
        enough memory for a hardware double-buffer and use it if available.
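
        As an illustration of how the x resolution parameter is put
        together, here is a C sketch that packs the flags into the upper
        word. The bit positions are the ones listed above; the constant
        and function names are hypothetical and don't exist in the macro
        files.

```c
#include <stdint.h>

/* Bit positions from the table above; the names are made up. */
#define VBE_ONLY_LFB     (1u << 16)   /* only use an LFB */
#define VBE_ONLY_PMBANK  (1u << 17)   /* only protected mode banking */
#define VBE_ONLY_RMBANK  (1u << 18)   /* only real-mode banking */
#define VBE_SUGGEST      (1u << 31)   /* suggestion mode */

/* Build the value to pass as the x resolution parameter. */
uint32_t vbe_x_param(uint32_t x_res, uint32_t flags)
{
    return (x_res & 0xFFFFu) | (flags & 0xFFFF0000u);
}
```

        So asking for 640 pixels across, preferring an LFB but allowing a
        fallback, would mean passing $80010280 as the x resolution.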

        Assuming the function succeeds, the start address of the screen will
        be held in _VESAScreen, and the screen itself is a linear block of
        memory. This means that you only ever have to deal with a buffer the
        same size as the screen, regardless of what the video card is
        actually using. The buffer itself is allocated dynamically from the
        heap except for one case - if you have an LFB and enough memory on
        the card for a hardware double-buffer, rather than use a buffer
        (which needs to be copied to the screen) you will instead write
        directly to the LFB (on the non-visible part). All of this happens
        transparently to your program (except for a small effect, detailed
        below), and brings us nicely to the next function:
    VBEDoScreen
        This function does the hard work of actually displaying the screen.
        How this works varies depending on the type of bank switching in
        use and whether a hardware double-buffer is also being used:

            Hardware double-buffer in use:
                LFB:
                    Swaps the currently displayed screen with the other
                    (currently not visible) screen, and updates _VESAScreen
                    to hold the start address of the now non-visible screen
                    in the LFB.
                Real/PMode bank switching:
                    Copies the screen starting at _VESAScreen to the other
                    (currently not visible) screen, and then displays this
                    screen at the next VBL (vertical blank).

            Hardware double-buffer not in use:
               LFB/Real/PMode bank switching:
                    Waits for the VBL using the VGA registers and then
                    copies the screen starting at _VESAScreen to the actual
                    displayed screen.

        If you've got a Pentium or better CPU then the copying functions will
        use a Pentium-optimised copy method which is far superior to the
        usual method. (See below for more information).

        Using an LFB and hardware double-buffer means that all writes go
        directly to the card, which has one noticeable effect - each time
        you call this function, the screen you're currently drawing on gets
        swapped around. Usually this will have no effect because your main
        loop will be continually updating the screen. However, in some cases
        (see the demo code for an example), it can have an effect - remember
        that the buffer you're writing to actually holds the contents of the
        screen two frames ago, rather than the last frame. In the demo, which
        shows lines being drawn one by one, this has the effect of making it
        look like alternate lines were drawn (because the screens aren't
        being swapped now so only one of them is being shown).

        This will normally not be a problem. It will only ever show up if you
        rely on some kind of incremental display. One solution is to know of
        this and then draw one line at the start, and then two lines from then
        on (the current line and the previous). Another is to Malloc a block
        of memory for the screen and do the copy/display yourself. (You could
        use the other functions for setting the display).
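
        The alternate-lines effect, and the "draw the previous line too"
        fix, are easy to reproduce in a small simulation. This C sketch
        models the two hardware pages as arrays of flags (one per line)
        and flips between them each frame. It is only a model of the
        behaviour described above, with invented names, not code from the
        demo.

```c
#include <string.h>

#define LINES 8

/* Two hardware pages; page[p][i] is non-zero once line i is drawn. */
typedef struct {
    unsigned char page[2][LINES];
    int back;                 /* page being drawn on (not visible) */
} Flipper;

static void flip(Flipper *f) { f->back ^= 1; }

/* Draw one new line per frame, flipping after each. If redraw_previous
   is set, the previous line is drawn as well. Returns the number of
   lines present on the page that is visible after the last flip. */
int run_frames(int redraw_previous)
{
    Flipper f;
    int visible, count = 0;

    memset(&f, 0, sizeof f);
    f.back = 1;
    for (int i = 0; i < LINES; i++) {
        f.page[f.back][i] = 1;
        if (redraw_previous && i > 0)
            f.page[f.back][i - 1] = 1;
        flip(&f);
    }
    visible = f.back ^ 1;
    for (int i = 0; i < LINES; i++)
        count += f.page[visible][i];
    return count;
}
```

        With redraw_previous set to 0 the visible page ends up with only
        every other line; with it set to 1 all the lines are there.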

        One situation where this sort of thing can become problematic is where
        you save part of the display, call VBEDoScreen and then restore the
        saved part of the display. In this situation you're not restoring the
        background of the current screen, but the frame before it. The mouse
        functions, which could be affected by this, check for this situation
        and adapt to it by keeping the background saved from the last frame
        and the frame before. They use _VESAScreenFlag to see which screen
        is currently being displayed. A similar situation would be where you
        draw in the background, save it, then put the sprites in over the
        top. After displaying the screen you restore the background and put
        in the sprites in the new positions. But the same solution exists for
        this problem, and I'm not convinced how often you'd use this technique
        anyway. If enough people want it though, I'll put in a bit to force
        VBESetMode to always use a Malloc'd buffer.

        Note that this function preserves no registers and might use any of
        eax, ebx, ecx, edx, esi and edi. (The actual ones used vary
        depending on the bank switching, etc).

    VBEGetDisplayStart : reg32 * 2 (x offset, y offset)
        Returns the current display position. Both the offset values are
        returned in pixels.
    VBESetDisplayStart(x offset, y offset : reg32)
        Sets the display start to the pixel offsets specified. Note that
        if this function succeeds, this _doesn't_ necessarily mean that
        the display got set to the new position. For example, my Matrox card
        reports success even though it didn't set it to the new co-ords. My
        solution was to try and set a new position and then use the get
        display start function to see if it was actually set.
    VBESetDisplayStartVBL(x offset, y offset : reg32)
        Identical to VBESetDisplayStart except it waits for the VBL. Note that
        this function isn't supported by many cards, and it's difficult to
        determine whether it is or not - after a call, my card returns info
        saying that the function's supported and that it succeeded, even
        though it didn't wait for the VBL at all...
    VBEGetPalette256(destination : val32)
        Copies the current 256-colour VGA palette to the address specified.
    VBESetPalette256(source : val32)
        Sets a new 256-colour VGA palette. The source should be the address
        of a 768-byte array of 256 palette entries. Each entry consists of
        three bytes holding the Red, Green and Blue values.
    VBEWaitVbl
        Waits for the vertical blank, using the VGA registers.
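
    As an example of the layout VBESetPalette256 expects, this C sketch
    fills a 768-byte array with a grey ramp. It assumes each component is
    a 6-bit value (0 to 63), which is what the standard VGA DAC uses; the
    text above doesn't say what range the function wants, so check the
    source before relying on it. The function name is made up.

```c
#include <stdint.h>

/* Fill a 256-entry palette (3 bytes per entry: red, green, blue) with
   a grey ramp. Assumes 6-bit components, 0..63. */
void make_grey_palette(uint8_t palette[768])
{
    for (int i = 0; i < 256; i++) {
        uint8_t level = (uint8_t)(i >> 2);   /* 0..255 -> 0..63 */
        palette[i * 3 + 0] = level;          /* red   */
        palette[i * 3 + 1] = level;          /* green */
        palette[i * 3 + 2] = level;          /* blue  */
    }
}
```

    The address of the filled array is what would be passed as the source
    parameter.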

    Coding details:
        Writing VBE functions isn't difficult, just irritating and boring.
        The set mode routine is the only complicated one, but that's
        because of all the things it has to do.

        Some may be asking whether all the extra hassle is really worth it. Well,
        eliminating the flicker is a good point in my opinion (I was beginning
        to detect a hint of it in high-res modes) and the Pentium-optimised
        functions are well worth it - for those that are interested, yes, I did
        actually speed test the screen copying functions. Using a standard rep
        movsd for a 640x480x256 screen took about 1.5 million cycles (under both
        DOS and Win), which is 3/4 of a frame on a 100MHz machine (assuming all
        other things being equal) whilst using the Pentium optimised version
        took around a mere 250,000 cycles - 6 times faster!

        Note that the actual time for the Pentium optimised routine can be up to
        300,000 cycles, but usually stayed around 250,000. Even so, this is still
        5 times faster than normal... I was unable to determine the reason for the
        massive fluctuation every now and then. It appeared that the routine was
        taking around 250K cycles most of the time but would jump to 300K every
        now and then. I attribute this to the effect of the level 2 cache but
        only timing within a full application would give more info. I never saw
        it go above around 300K cycles so why worry? So many other factors can
        affect the speed anyway (eg Windoze, interrupts, sound playing, etc.).
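
        For anyone wanting to check the arithmetic behind these figures,
        here it is as a few lines of C. The cycle counts and the 100MHz
        clock are the ones quoted above. The 50 frames per second used to
        arrive at "3/4 of a frame" is an assumption, since the frame rate
        isn't stated.

```c
/* Convert a cycle count to milliseconds at a given clock speed. */
double cycles_to_ms(double cycles, double mhz)
{
    return cycles / (mhz * 1000.0);
}

/* Fraction of one frame that a given time takes up. */
double frame_fraction(double ms, double frames_per_second)
{
    return ms * frames_per_second / 1000.0;
}

/* How many times faster the quick routine is than the slow one. */
double speedup(double slow_cycles, double fast_cycles)
{
    return slow_cycles / fast_cycles;
}
```

        So 1.5 million cycles is 15ms at 100MHz against 2.5ms for the
        optimised copy, and even the 300,000 cycle worst case is only 3ms.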

-----------------------------------------------------------------------------

Mouse Subsystem - mouse.mac:

    Using the mouse under DOS is fairly straightforward, as is using it in
    mode $13. However, using it for VBE modes is a little bit tricky - mainly
    because there's no in-built support for the mouse cursor in these modes.
    These functions provide support for initialisation and de-initialisation
    of the mouse driver/handler, 16x16 and 32x32 cursors in any resolution
    and almost any color depth, clipped to the screen. Supported color depths
    are 8, 15, 16 and 32 bits (that's all of them as far as I'm concerned -
    who really wants to use 3 bytes/pixel?). Note that these functions are
    all designed to work in VBE modes and hence require the video subsystem.
    
    Variables:
        _mymouse_xpos    The current x position (word)
        _mymouse_ypos    The current y position (word)
        _mymouse_state   The state of the buttons (word, each bit=1 button)
        _mymouse_cursor_default16_8    Some (poor) default cursors
        _mymouse_cursor_default16_16
        _mymouse_cursor_default16_32
        _mymouse_cursor_default32_8
        _mymouse_cursor_default32_16
        _mymouse_cursor_default32_32
    
    Functions:
    MouseInit(x resolution, y resolution, install flag : val32)
        Initialises the mouse driver and handler. The new handler is
        installed only if the install flag isn't 0. Carry is clear on
        success, set otherwise (no mouse driver).
    MouseExit
        De-installs the mouse handler (if one was installed) and resets
        the driver. Called automatically by Exit.
    MouseGetState
        If you've not installed the mouse handler you'll want to call this
        function to get the current mouse state. This may be useful if you
        don't want the handler being called every time the mouse moves (as
        it does usually).
    MouseSetCursor16_8(cursor : val32)
    MouseSetCursor16_16(cursor : val32)
    MouseSetCursor16_32(cursor : val32)
    MouseSetCursor32_8(cursor : val32)
    MouseSetCursor32_16(cursor : val32)
    MouseSetCursor32_32(cursor : val32)
        These functions all set up a new cursor for use by the (un)drawing
        functions. The new cursor should simply be an array of color values,
        with a color value of 0 being considered transparent. The last number
        (8, 16 or 32) is the bits/pixel. None of these functions preserve any
        registers (and they use all of them).
    MouseUndraw16_8
    MouseUndraw16_16
    MouseUndraw16_32
    MouseUndraw32_8
    MouseUndraw32_16
    MouseUndraw32_32
        These functions restore the background of the cursor. Again, the
        last number is the bits/pixel. None of these functions preserve any
        registers (and use all).
    MouseDraw16_8
    MouseDraw16_16
    MouseDraw16_32
    MouseDraw32_8
    MouseDraw32_16
    MouseDraw32_32
        These functions draw the cursor on screen (saving the background
        in the process). The last number is the bits/pixel. None of these
        functions preserve any registers (and use all).

    Considerations:
        When you call the initialisation routine, be sure that you are already
        in the resolution you want. For example, setting the driver in the
        usual MS-DOS text mode to x=640,y=480 and then switching to this
        resolution will cause your cursor to jump across the screen...

        The 16 bits/pixel functions should be used for 15 bits/pixel color
        depths (why you'd _want_ to use 15bpp is another question entirely...)

        Why no 24-bit color support? Well, because I don't think many people
        actually want to use 24-bit color depth but mainly because it's a fiddly
        color depth to write functions for (I'm lazy) and if you want to use this
        many colors use 32-bit instead!

    Coding details:
        I'll detail the 256 color functions first.

        Each cursor is in fact stored as 4 copies, each copy shifted
        right by 0, 1, 2 and 3 pixels respectively. The 4 copies also hold
        the mask information so we can have transparent cursors. Why have
        four cursors though? I originally coded this with only one cursor
        but realised that by having the four copies I can ensure that all
        the data copying is aligned. You might be thinking that this
        really isn't worth it but each misaligned access requires at least
        3 extra clock cycles. For a 32x32 cursor this is at least 3456
        cycles extra for drawing and undrawing (assuming no clipping).
        Admittedly, this is only about 0.2%/frame on a 100MHz machine
        (at 50 frames/second) but it seems pointless to just waste cycles
        when there's no need for it, especially as using shifted versions
        makes the clipping (in the x direction) _much_ easier. Besides,
        you're using _assembler_ aren't you?
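
        As a rough check of the figures above, here is a small sketch in C
        (not part of the library; the function and parameter names are made
        up for illustration). It expresses the quoted 3456 extra cycles as a
        share of the cycles available per frame at 100MHz and 50
        frames/second.

```c
#include <stdint.h>

/* Cycles available for one frame at the given clock speed and frame rate. */
uint32_t cycles_per_frame(uint32_t clock_hz, uint32_t frames_per_sec)
{
    return clock_hz / frames_per_sec;
}

/* Share of one frame taken by extra_cycles, in hundredths of a percent
   (so 17 means 0.17%). Integer maths only, rounded down. */
uint32_t frame_share_x100(uint32_t extra_cycles, uint32_t clock_hz,
                          uint32_t frames_per_sec)
{
    uint64_t scaled = (uint64_t)extra_cycles * 10000u;
    return (uint32_t)(scaled / cycles_per_frame(clock_hz, frames_per_sec));
}
```

        frame_share_x100(3456, 100000000, 50) gives 17, ie about 0.17% of a
        frame, which rounds to the 0.2% quoted above.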

        When the _mymouse_set_cursor routine is called then the passed
        cursor is copied and its mask worked out. To keep things easy (for
        me anyway) internally the cursor is made up of a dword of a mask,
        a dword of pixels, etc. There is a routine that shifts the cursor
        by 1 pixel, but it's pretty boring so I'm not going to discuss it.

        The drawing functions are fairly easy, first we get to the right
        screen offset, work out which shifted cursor to use and see if we
        need to clip at all. Y clipping is easy - the loop count (usually
        the cursor height) is simply altered to whatever is necessary so we
        don't draw off the screen. X clipping is slightly harder. For
        speed this is handled by another routine, otherwise it involves
        looping and checking overhead which is only required at the right
        hand side of the screen. The loop consists of saving the background,
        masking out the background and then oring in the cursor. And that's
        it, except that the x clipping routine has a loop to draw each
        cursor scanline (since it doesn't know how many dwords to do...)
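
        The inner step described above can be sketched in C as follows. This
        is an illustration only, not the library's code: all the names are
        hypothetical, and it assumes that the mask has all bits set wherever
        the cursor pixel is 0 (transparent), that copy n of the cursor is
        shifted right by n pixels, and that each screen row starts on a dword
        boundary.

```c
#include <stdint.h>

/* Build the mask for one dword holding four 8-bit pixels: each mask byte
   is 0xFF where the cursor pixel is 0 (transparent), 0x00 otherwise. */
uint32_t make_mask(uint32_t pixels)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t byte_bits = 0xFFu << (i * 8);
        if ((pixels & byte_bits) == 0)
            mask |= byte_bits;
    }
    return mask;
}

/* Draw one dword of the cursor: save the background, mask out the pixels
   the cursor covers, then OR the cursor pixels in. */
void draw_dword(uint32_t *screen, uint32_t *saved,
                uint32_t mask, uint32_t pixels)
{
    *saved  = *screen;
    *screen = (*screen & mask) | pixels;
}

/* Undrawing only has to put the saved background back. */
void undraw_dword(uint32_t *screen, const uint32_t *saved)
{
    *screen = *saved;
}

/* Which of the four pre-shifted copies to use for a cursor at pixel x. */
int shifted_copy(int x)
{
    return x & 3;
}

/* Y clipping: the number of scanlines to draw so that the cursor never
   runs off the bottom of the screen. */
int clip_rows(int y, int cursor_height, int screen_height)
{
    int rows = screen_height - y;
    if (rows > cursor_height)
        rows = cursor_height;
    if (rows < 0)
        rows = 0;
    return rows;
}
```

        For example, a 32 pixel high cursor at y=470 on a 480 line screen
        gets a loop count of 10 from clip_rows.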

        The undrawing functions are _very_ similar, except that they only
        restore the background instead of masking, etc.

        Having taken care of 8 bits/pixel, I eventually got round to doing
        functions for 15/16/32 bits/pixel. These functions work in a very
        similar way except they utilise rep in most of them (it's now
        quicker to save/restore registers and do a rep). I should probably
        do _mymouse_undraw_mouse32_8 with reps but I can't be bothered
        at the moment (I've not done a cycle count, but looking at it
        tells me that it would benefit from reps). The 16 bits/pixel
        functions use two cursors - one shifted, one not - whilst the 32
        bits/pixel just have one cursor (they're already aligned doh).
        Apart from these minor changes, they work in exactly the same
        way as the 8 bits/pixel functions, except they have to worry about
        the screen width not equalling the byte width, but the set cursor
        functions take care of (some of) this.

        I'm pretty sure these functions could be made faster but I don't
        really feel like going on a huge optimisation spree right now. I
        keep getting the nagging feeling that the functions are too long
        and too time-consuming for their own good. Any suggestions on how
        to improve them would be much appreciated.

-----------------------------------------------------------------------------

Keyboard Subsystem - kb.mac:

    If all you do is read single keypresses, then the standard DOS functions
    are fine. However, if you want to be able to detect any of those weird
    keys and be able to cope with multiple keypresses at once then you're
    out of luck. This subsystem provides a new keyboard handler which allows
    you to detect almost all keys on the keyboard, multiple keypresses at
    once, buffered input of up to sixteen characters at once and correct
    updating of the keyboard LEDs (except under NT which doesn't allow it).

    To use the subsystem, first call KBInit to install the new handler, and
    call KBExit before program exit to re-install the old one. Once the
    handler's installed there are two ways of reading the keyboard:

    1. Testing if a key's held down or not
        The table of longwords (dwords, I still think in 680x0) starting
        at _kb_key_table contains the current state of the entire keyboard.
        Test the key you want by seeing whether its entry in the table
        is 0 or not. If it's 0 then it's not pressed, otherwise it is. Eg:

            cmp [_kb_key_table + _kb_space],dword 0        ; space pressed?
            jnz _space_pressed                             ; yes, handle it

        Note that I compare against 0, the value for a key that _isn't_
        pressed. Don't test for a specific value representing a pressed key
        as these values might change! See kb.mac for the other scancode
        equates. You can test to see if capslock or any other weird key like
        left shift is held down just as easily. The example program
        demonstrates this.

    2. Read them using the buffered input

        Every key that is pressed gets put into the key buffer which can hold
        up to 16 keys. Once full, the keyboard state is still updated but no
        more keys are put into the buffer until some have been read out. There
        is a function available to read a key from the buffer with ax holding
        the ASCII code of the key pressed and also the scancode. Again, the
        example program provides a simple demonstration of this.

        There is an issue with the buffered input that it pays to be aware of.
        Under normal use, the keyboard LEDs are not updated and in this state,
        _every_ key is placed in the buffer, including as they repeat. And in
        this state even capslock and shift repeat and fill the buffer very
        quickly. Also, the shift state is _ignored_ when the LEDs aren't being
        updated. All in all, if you just want someone to type their name in
        it's a real bummer. However, if the LEDs are being updated, capslock
        et al are _not_ placed in the buffer and the shift states _are_
        updated. There are two functions to control the LEDs (see below), see
        the example program for a simple demonstration of this.
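
        The buffer behaviour described above (16 keys at most, new keys
        dropped while it's full) can be sketched in C like this. It is only
        an illustration of the idea: the type and function names are
        hypothetical, and it returns 1/0 where the library itself reports
        its result in the carry flag.

```c
#include <stdint.h>

#define KEY_BUFFER_SIZE 16

typedef struct {
    uint16_t keys[KEY_BUFFER_SIZE];
    unsigned head;   /* index of the oldest key in the buffer */
    unsigned count;  /* number of keys currently stored */
} KeyBuffer;

/* Called for every keypress. Returns 1 if the key was stored, or 0 if
   the buffer was full and the key was dropped. */
int key_buffer_put(KeyBuffer *kb, uint16_t key)
{
    if (kb->count == KEY_BUFFER_SIZE)
        return 0;
    kb->keys[(kb->head + kb->count) % KEY_BUFFER_SIZE] = key;
    kb->count++;
    return 1;
}

/* Reads the oldest key into *key. Returns 1 on success, or 0 if the
   buffer is empty. */
int key_buffer_get(KeyBuffer *kb, uint16_t *key)
{
    if (kb->count == 0)
        return 0;
    *key = kb->keys[kb->head];
    kb->head = (kb->head + 1) % KEY_BUFFER_SIZE;
    kb->count--;
    return 1;
}
```

        Keys come back out in the order they went in, and reading one out
        makes room for one more.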

    All the functions use the carry flag as a result indicator - carry is
    clear on success, set on failure.

    KBInit
        Installs the new handler. By default the handler doesn't update the
        LEDs so call KBLedsOn to turn this on if required. If carry is set on
        return it means the handler's already been installed.
    KBExit
        Deinstalls the handler. Called automatically by Exit.
    KBReadKey : val32
        Reads the next key from the buffer. A failure result (carry set)
        indicates there are no keys in the buffer; otherwise the register
        specified contains the key information. The low byte holds the ASCII
        code (eg al for eax) and the next byte up holds the scancode (eg ah
        for eax).
    KBWaitKey : val32
        Waits for a keypress, with the register holding the key pressed (as in
        KBReadKey). Note that the key buffer is cleared before waiting for a
        key so any keys already in the buffer will be discarded.
    KBLedsOn
        Turns the LED updating on. Only useful if you want a user to type
        something in.
    KBLedsOff
        Turns the LED updating off. This is the handler's default behaviour.
    KBBufferClear
        Clears the key buffer
    KBBufferCount : val32
        Returns the number of keys in the buffer. I must have been feeling
        really object-oriented to put this in.
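
    As a small illustration of the value KBReadKey and KBWaitKey hand back,
    here is how the two parts could be split out in C. The helper names are
    hypothetical and the key value used in the note below is just an
    example, not taken from kb.mac.

```c
#include <stdint.h>

/* The ASCII code lives in the low byte (al if the value is in eax). */
uint8_t key_ascii(uint32_t key)
{
    return (uint8_t)(key & 0xFFu);
}

/* The scancode lives in the next byte up (ah if the value is in eax). */
uint8_t key_scancode(uint32_t key)
{
    return (uint8_t)((key >> 8) & 0xFFu);
}
```

    For a returned value of 0x1E61, key_ascii gives 0x61 ('a') and
    key_scancode gives 0x1E.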

    Considerations:
        It is _not_ a bug that there is a limit on the number of keys that
        can be detected simultaneously. This is a limitation of the keyboard
        itself. The actual number detected varies from keyboard to keyboard,
        and can also vary depending on what keys are being held down. I think
        most can detect four or five, and all I've come across can detect a
        couple more so this shouldn't really be a problem.

        The buffer only holds 16 characters at most. Once the limit is
        reached it continues to update the keyboard state as normal, but no
        more characters are added to the buffer. I can't really see this
        being a huge problem (DOS has a 16 character limit anyway).

        The arrow keys share the same scancodes as the corresponding keypad
        arrow keys (ie 8 is up, etc). This means that if you hold down the
        up arrow key and then press and release the keypad 8 key, it will
        be interpreted the same as pressing and releasing the up arrow key,
        even if you've still got the up arrow key pressed down.

        When running under plain DOS, even CTRL-ALT-DEL won't work. A reset
        will be required if your program goes rogue with no way of getting
        out. Of course, under Win95 the sequence will be trapped.

        Under plain DOS you can even test for the left and right Win keys
        and the menu key. Obviously, under Win95 these will be trapped.

-----------------------------------------------------------------------------

File Subsystem - file.mac:

    These provide some functions for file manipulation. All these functions
    use the carry flag as a result indicator - it's clear on success, set
    on failure.

    FileCreate(filename : val32) : val16
        Tries to create a file using the name pointed to by filename. If it
        succeeds, it returns a handle to the (open) file. If the file already
        exists, it will be reset to a length of 0 bytes.
    FileOpen(filename : val32) : val16
        Opens the specified file, using the path pointed to by filename (which
        should be a null-terminated string), and returns a handle to the file.
        This function is WFSE aware and will try and open a standard DOS file
        first. If this fails, it will then try and open a WFSE file.
    FileClose(handle : val16)
        Closes a file.

    FileEOF(handle : val16)
        Carry is set on return if the end of the file has been reached.
    FileExists(filename : val32)
        Checks whether a file exists or not - it doesn't if the call fails.
    FileSeek(handle : val16, offset : val32)
        Moves the file pointer for the file by the specified offset.
    FileSeekAbsolute(handle : val16, offset : val32)
        Seeks to the specified offset from the beginning of the file.
    FilePosition(handle : val16) : val32 (offset)
        Returns the current file pointer position.
    FileSize(handle : val16) : val32 (size)
        Returns the size of the specified file (bytes).

    FileRead(handle : val16, destination, size : val32)
        Reads the specified number of bytes from the file to the destination.
    FileWrite(handle : val16, source, size : val32)
        Writes the specified number of bytes from the source to the file.

    FileLoad(filename, destination : val32)
        Loads the specified file (from the null-terminated string pointed to
        by filename) into the destination address. This function is WFSE
        aware and will first try and load a standard DOS file. If this fails,
        it will then try and load a WFSE file.
    FileSave(filename, source, size : val32)
        Saves a file of the size specified from the address pointed to by
        source to a file of the name pointed to by filename (null-terminated).

-----------------------------------------------------------------------------

CD-Audio functions - cdaudio.mac:

    These provide the user with all the usual functions for accessing and
    playing audio tracks on CDs. You should call CDInit to detect the CD-ROM
    driver and drive, and then call other functions as required. If the call
    to CDInit fails, calls to other functions will fail as well (ie you can
    still call them even if there's no CD drive there). All functions use the
    carry flag as a result indicator - carry will be clear if the function
    succeeded, set if it failed.

    Variables:
    _mycd_flag - dword holding whether the CD drive and driver have been
        detected. It will be 0 if undetected, otherwise will hold the driver
        version number.

    Functions:
    CDInit
        Call to detect the CD drive and driver. If successful, _mycd_flag will be
        some value other than 0 and carry will be clear (carry is set on failure).
    CDExit
        Deallocates resources. Called automatically by Exit.
    CDDoorOpen / CDDoorClose
        Open and close the tray.
    CDGetVolume : val32
        Call to get the current volume level of the drive.
    CDSetVolume(Level : val32)
        Call to set the volume level of the drive, with a value between 0 and 255.
        Values outside this range will be set to 255. Note that some drives can
        only interpret this as off (0) or on (1+).
    CDStop
        Stops any audio currently playing (if any is).
    CDResume
        Resumes playing any audio stopped by the above command. (The stop command
        to CD drives works more like a pause button).
    CDInfo : val32 * 3 (Number of tracks, start track, lead-out point)
        Returns info on the current CD. The lead-out point is in Red Book format.
    CDTrackInfo(Track number : val32) : val32 * 2 (start point, length)
        Returns info on the track specified.  This function will fail if the
        track isn't an audio track and the return values are in Red Book format.
    CDAudioPlaying
        Returns whether the drive is playing audio or not. For this function the
        carry flag indicates if it's playing (clear) or if it's not/error (set).
    CDPlayTrack(Track number : val32)
        Plays the specified track.
    CDPlayTrackLooped(Track number : val32)
        Plays the specified track looped. You'll need to invoke CDPoll periodically
        so that the track will keep looping (see below)
    CDPlayTracks(Start track, end track : val32)
        If the start track specified is before the first track on the CD or it's
        a data track then the function will look for the first useable track. For
        the end track, if it's beyond the last track it'll default to the last
        track on the CD. Hence, calling with a start track of 0 and an end track
        of $ff will play the entire CD. If the end track is before the start
        track then the function will swap them around.
    CDPlayTracksLooped(Start track, end track : val32)
        Same usage as with CDPlayTracks but it plays them looped. You'll have to
        invoke CDPoll periodically though.
    CDPoll
        This function should be called periodically (say in your main loop) and
        if you've called any functions that loop tracks it will do the work for
        you and keep the track(s) looping. If you're not looping any tracks then
        it won't do anything so you can call this in your main loop all the time
        without worry.
    CDRedBookToHSG(value : val32) : val32
        Takes the Red Book value and converts it to a High Sierra sector number.
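
    For reference, the conversion can be sketched in C as below. This is
    not the library's code; it assumes the usual MSCDEX layout for a Red
    Book address (frame in the lowest byte, then second, then minute, top
    byte unused) and the usual 75 frames/second with a 2 second (150
    frame) offset. The function name is hypothetical.

```c
#include <stdint.h>

/* Convert a packed Red Book address (minute/second/frame) into a High
   Sierra (HSG) sector number. Addresses before 00:02:00 have no valid
   sector number and are not handled here. */
uint32_t redbook_to_hsg(uint32_t redbook)
{
    uint32_t frame  = redbook & 0xFFu;
    uint32_t second = (redbook >> 8) & 0xFFu;
    uint32_t minute = (redbook >> 16) & 0xFFu;

    return minute * 60u * 75u + second * 75u + frame - 150u;
}
```

    So a Red Book value of 00:02:00 maps to sector 0, and 01:00:00 maps to
    sector 4350.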

-----------------------------------------------------------------------------

Standard Library:

Overview:

    The standard library is an attempt to provide many of the commonly used and
    required functions that assembly programmers would like, but don't have,
    much like the standard C functions. Also included are various other
    functions that I've found useful or don't fit into any of the subsystems.

    Some parts are described as classes. This doesn't mean that they're classes
    in the proper OO sense, but that they really ought to be. However, until
    NASM becomes an OO assembler they'll remain as functions (but with a
    class-like function definition).

-----------------------------------------------------------------------------

Character type testing - ctype.mac

    These provide several functions for testing bytes for different values. All
    the following routines use the carry flag as a result indicator - it's clear
    on success, set on failure - and they all take a single val8 parameter:

    Function:                  Returns success if the byte is a:
    IsAlphaNumeric             Alphanumeric character (a-z, A-Z, 0-9)
    IsAlpha                    Alphabetical character (a-z, A-Z)
    IsAscii                    ASCII character
    IsControl                  ASCII control character
    IsDigit                    Decimal digit (0-9)
    IsHexDigit                 Hexadecimal digit (0-9, a-f, A-F)
    IsLowerCase                Lowercase character (a-z)
    IsUpperCase                Uppercase character (A-Z)

    The remaining functions perform some simple conversions. These functions
    always succeed but the carry flag is undefined:

    Function:                  Result:
    ToAscii                    Converts byte to an ASCII character.
    ToLower                    Lowercases the character
    ToUpper                    Uppercases the character
    _ToLower                   As ToLower, but assumes it's uppercase to start
                               with - ie if it wasn't, results are unpredictable
    _ToUpper                   As ToUpper, but assumes it's lowercase to start
                               with - ie if it wasn't, results are unpredictable
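
    To show why the underscore versions are only safe on known input, here
    is a sketch in C of the difference between a checked and an unchecked
    conversion. It assumes the unchecked version simply adds 32 (the
    distance between 'A' and 'a' in ASCII); the library may well do it
    differently, and the function names are hypothetical.

```c
#include <stdint.h>

int is_upper(uint8_t c)
{
    return c >= 'A' && c <= 'Z';
}

/* Checked: anything that isn't A-Z is passed through untouched. */
uint8_t to_lower_checked(uint8_t c)
{
    return is_upper(c) ? (uint8_t)(c + 32) : c;
}

/* Unchecked: quicker, but only meaningful if c really is A-Z. */
uint8_t to_lower_unchecked(uint8_t c)
{
    return (uint8_t)(c + 32);
}
```

    Both turn 'A' into 'a', but given '1' the checked version returns '1'
    whilst the unchecked one returns 'Q'.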

-----------------------------------------------------------------------------

String manipulation - string.mac

    Most HLLs provide a useful set of functions for string manipulation, but
    from assembler the lack of any standard functions makes many simple tasks
    difficult. Here are the basic functions for manipulation of null-terminated
    strings.

    Unless otherwise noted, these functions always succeed and the carry flag
    is undefined:

    Length(source : val32) : val32
        Returns the length of the string pointed to by source.
    UpperCase(source : val32)
        Uppercases the string pointed to by source.
    LowerCase(source : val32)
        Lowercases the string pointed to by source.
    Copy(source, destination : val32)
        Copies the string pointed to by source to the address pointed to by
        destination (including the null terminator). It is the programmer's
        responsibility to ensure there is enough space for the copy.
    CopyPart(source, destination, start index, count : val32)
        Copies part of the string pointed to by source to the address pointed
        to by destination, starting from the start index and copying count
        characters.
    CompareStrings(source, destination : val32)
        Compares the two strings byte for byte, failing if they're not the
        same. (The carry flag is set on failure, clear on success).
    CompareText(source, destination : val32)
        Compares the two strings case insensitively, failing if they're not
        the same. (eg 'hello' will be counted the same as 'HeLlO'). (The
        carry flag is set on failure, clear on success).
    PosChar(source : val32, character : val8) : val32
        Looks for the specified byte in the source string, returning the
        index of its first occurrence. This function will fail if it isn't
        found and the index returned will be zero. (The carry flag is set
        on failure, clear on success).
    PosLetter(source : val32, character : val8) : val32
        Works exactly like PosChar, but is case-insensitive.
    PosStr(source, substring : val32) : val32
        Searches for the first occurrence of substring in the source string,
        returning the index of the start of the substring in the source, or
        failing (the returned index will be zero in this case). (The carry
        flag is set on failure, clear on success).
    PosText(source, substring : val32) : val32
        Works exactly like PosStr, but is case-insensitive.
    Trim(source : val32)
        Removes all leading and trailing characters whose value is less
        than or equal to that of a space (ie 32 and below).
    TrimLeft(source : val32)
        Removes all leading characters whose value is less than or equal
        to that of a space.
    TrimRight(source : val32)
        Removes all trailing characters whose value is less than or equal
        to that of a space.
    Reverse(source : val32)
        Reverses all the characters in the source string.
    Print(source : val32)
        WDOSX's Int $21 function 9 only works with $-terminated strings, and
        can also only cope with 16K strings at most. This function can be
        used to display a null-terminated string of any length (though why
        you'd _want_ to display a string that's > 16K is another matter...)
    PrintLine(source : val32)
        Works just like Print, except it also prints a newline afterwards.
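
    As an illustration of the rule Trim uses (anything with a value of a
    space or below counts as whitespace), here is a sketch in C that trims
    a null-terminated string in place. It is not the library's code and
    the function name is hypothetical.

```c
#include <string.h>

/* Remove leading and trailing characters whose value is <= ' ' from the
   null-terminated string s, in place. */
void trim(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* Drop trailing characters first so the string only gets shorter. */
    while (len > 0 && (unsigned char)s[len - 1] <= ' ')
        len--;

    /* Skip leading characters. */
    while (start < len && (unsigned char)s[start] <= ' ')
        start++;

    /* Slide what's left down to the start and terminate it. */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
}
```

    Tabs, carriage returns and linefeeds are all below 32, so they are
    removed along with the spaces.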

    The following functions provide conversion functions, and those where the
    source is a string use the carry flag as a result indicator - it's set on
    error (ie number too large, invalid string), clear otherwise. On error,
    the result will always be zero.

    StrToInt(source : val32) : val32
        Converts a decimal string to a 32-bit signed integer.
    StrToIntU(source : val32) : val32
        Converts an unsigned decimal string to a 32-bit integer.
    HexToInt(source : val32) : val32
        Converts a hexadecimal string to a 32-bit integer.
    BinToInt(source : val32) : val32
        Converts a binary string to a 32-bit integer.
    IntToStr(source, destination : val32)
        Converts the source (signed) value to a string, storing the contents
        in the buffer pointed to by the destination. The buffer must be 12
        bytes minimum in size.
    IntToStrU(source, destination : val32)
        Converts the source value to a string, storing the contents in the
        buffer pointed to by the destination. The buffer must be 11 bytes
        minimum in size. Note that this function is unsigned.
    IntToHex(source, destination : val32)
        Converts the source value to a hex string, storing the contents in
        the buffer pointed to by the destination. The buffer must be 9
        bytes minimum in size.
    IntToBin(source, destination : val32)
        Converts the source value to a binary string, storing the contents
        in the buffer pointed to by destination. The buffer must be 33 bytes
        minimum in size.
    IntToStrFull(source, destination : val32)
    IntToStrUFull(source, destination : val32)
    IntToHexFull(source, destination : val32)
    IntToBinFull(source, destination : val32)
        These work exactly like the originals (ie without the Full at the
        end) except they return the full string (eg '0000FFFF' not 'FFFF').
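
    The minimum buffer sizes quoted above follow from the longest string
    each conversion can produce, plus one byte for the null terminator.
    The sketch below checks them using the standard C library rather than
    the library's own functions; the function names are made up for the
    example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for the longest signed decimal ("-2147483648"),
   including the null terminator. */
int signed_decimal_bytes(void)
{
    return snprintf(NULL, 0, "%" PRId32, INT32_MIN) + 1;
}

/* Bytes needed for the longest unsigned decimal ("4294967295"). */
int unsigned_decimal_bytes(void)
{
    return snprintf(NULL, 0, "%" PRIu32, UINT32_MAX) + 1;
}

/* Bytes needed for the longest hex string ("FFFFFFFF"). */
int hex_bytes(void)
{
    return snprintf(NULL, 0, "%" PRIX32, UINT32_MAX) + 1;
}

/* The equivalent of a "Full" conversion: always 8 hex digits,
   padded with leading zeros. */
void int_to_hex_full(uint32_t value, char out[9])
{
    snprintf(out, 9, "%08" PRIX32, value);
}
```

    These come out as 12, 11 and 9 bytes respectively. The binary case is
    simply 32 digits plus the terminator, giving 33.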

    Dynamically Allocated versions:

    All these functions work exactly like their originals (ie without the
    'd' in front of them) except the strings they generate are allocated
    from the heap. The reg32 returned holds the address of the string (which
    should be used to Free it later on) or zero if it couldn't be allocated
    for some reason (heap full or not initialised). Carry is also set if the
    string couldn't be allocated:

    dUpperCase      (source : val32) : val32
    dLowerCase      (source : val32) : val32
    dCopy           (source : val32) : val32
    dCopyPart       (source, start index, count : val32) : val32
    dTrim           (source : val32) : val32
    dTrimLeft       (source : val32) : val32
    dTrimRight      (source : val32) : val32
    dReverse        (source : val32) : val32
    dIntToStr       (source : val32) : val32
    dIntToStrU      (source : val32) : val32
    dIntToHex       (source : val32) : val32
    dIntToBin       (source : val32) : val32
    dIntToStrFull   (source : val32) : val32
    dIntToStrUFull  (source : val32) : val32
    dIntToHexFull   (source : val32) : val32
    dIntToBinFull   (source : val32) : val32

-----------------------------------------------------------------------------

Argc/Argv/PSP string manipulation - string.mac

    These functions provide easy access to the arguments passed to your program,
    plus easy access to the environment variables. The functions use the carry
    flag as a result indicator - it's clear on success, set on error.

    Variables:
    _myargs_c
        A dword holding the number of passed parameters.
    _myargs_v
        A dword holding the address of the array of pointers to the parameters.
    _myargs_psp
        A dword holding the address of the array of pointers to the
        environment variables.

        You shouldn't really have any need to access these.

    Functions:
    ArgsInit
        Initialises the functions. This must be called at the very start of your
        program so it can get the addresses from esi, edi and ebp. This function
        always succeeds (carry is undefined however).
    Argc : val32
        Returns the number of passed arguments, which is always 1 or more. The
        first argument is always the full path of the program executed. This
        function always succeeds (carry is undefined however).
    Argv(index : val32) : val32
        Returns the address of a null-terminated string holding the parameter
        required. Parameters are numbered from 1 and on failure (ie parameter
        out of range) the returned address will be zero.
    GetEnv(name : val32) : val32
        Call with the address of a null-terminated string holding the name of
        the environment variable you want, to return the address of a (null-
        terminated) string holding the variable's contents. Note that the
        variable's name is considered case-insensitive and that if it isn't
        found the returned address will be zero.

-----------------------------------------------------------------------------

StringList class - strlist.mac

    The StringList is a convenient object that comes in useful in
    many different situations. It is a list of (null-terminated) strings,
    and functions are available to manipulate the list, sort it, load/save
    it to a file, etc. The list itself is not of a fixed size and will grow
    and shrink dynamically during its use. This is a powerful object of the
    sort not normally seen in a low-level language, and it makes many
    tasks that are usually quite difficult much easier.

    All the functions use the carry flag as a result indicator - it's clear
    on success, set on failure. Items in the list are always numbered from 0.

    StringList.Create : val32
        Creates the list and returns a pointer to the list that should be
        used for other list functions.
    StringList.Free(list : val32)
        Deallocates any resources used by the list.
    StringList.Count(list : val32) : val32
        Returns the number of strings in the list.
    StringList.Sorted(list : val32) : val32
        Returns whether the list is sorted or not. (0=no, 1=yes).
    StringList.SetSorted(list, flag : val32)
        Sets whether the list should be sorted or not (0=no, 1=yes). Setting
        this flag on an unsorted list will cause it to be sorted.
    StringList.Duplicates(list : val32) : val32
        Returns the duplicates flag - what should happen when a duplicate
        string is added to a list:

            0 - Ignore the add
            1 - Flag it as an error
            2 - Add the duplicate string to the list

    StringList.SetDuplicates(list, flag : val32)
        Sets the duplicates flag, see above for the flag values.

    StringList.Add(list, string : val32)
        Adds a string to the list.
    StringList.Clear(list : val32)
        Removes all the strings from the list.
    StringList.Delete(list, index : val32)
        Deletes the string at the index specified.
    StringList.Exchange(list, index1, index2 : val32)
        Swaps the two list entries at the given indexes. Do not call this
        function on sorted lists.
    StringList.IndexOf(list, string : val32) : val32 (index)
        Returns the first occurrence of the string specified in the list.
        The returned index will be -1 if it wasn't found.
    StringList.IndexOfText(list, string : val32) : val32 (index)
        Same as IndexOf except that this is case insensitive.
    StringList.Insert(list, string, index : val32)
        Inserts a string at the index specified. Do not call this function
        on sorted lists.
    StringList.LoadFromFile(list, filename : val32)
        Loads the file specified by the (null-terminated) filename into the
        list. Each line of the file, delimited by a CR/LF combination, is
        added as a string (without the CR/LF combination at the end).
    StringList.Move(list, index1, index2 : val32)
        Moves a string from index1 to index2 in the list.
    StringList.SaveToFile(list, filename : val32)
        Saves the entire list as a file. Each string is written to the file
        with a CR/LF pair appended to it.
    StringList.Sort(list : val32)
        Sets the list's sorted flag to true and sorts the list.
    StringList.String(list, index : val32) : val32
        Returns the address of the string at the index specified.
    StringList.SetString(list, index, string : val32)
        Sets the string at the given index to the new string specified.

    Coding details:
        The list itself is made up of two main structures - the main structure
        holding all the details of the object and the list structure which holds
        pointers to the strings themselves. The list structure will grow and
        shrink dynamically in 4K sections, and the strings themselves are
        dynamically allocated from the heap. This makes sorting, moving,
        setting, etc. strings much easier than having just one buffer for them. The
        drawback is the overhead of the heap blocks themselves though.

        The functions are not robust and do not perform any error checking on
        the parameters passed to them. Hence, if you end up passing an
        address that doesn't point to a StringList don't be surprised when
        page faults come back.
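
        The layout described above can be sketched in C as follows. This is
        a simplified illustration and not the library's code: every name is
        hypothetical, the list only grows (it never shrinks), sorting is
        left out, and treating an ignored duplicate as a success is an
        assumption. With 32-bit pointers a 4K section holds 1024 entries.

```c
#include <stdlib.h>
#include <string.h>

#define SECTION_BYTES 4096
#define SECTION_ITEMS (SECTION_BYTES / sizeof(char *))

enum { DUP_IGNORE = 0, DUP_ERROR = 1, DUP_ACCEPT = 2 };

typedef struct {
    char  **items;      /* the list structure: pointers to the strings */
    size_t  count;      /* number of strings held */
    size_t  capacity;   /* slots allocated, a multiple of SECTION_ITEMS */
    int     duplicates; /* one of the DUP_ values above */
} StringList;

/* Returns the index of s in the list, or -1 if it isn't there. */
long list_index_of(const StringList *list, const char *s)
{
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->items[i], s) == 0)
            return (long)i;
    return -1;
}

/* Adds a copy of s. Returns 0 on success, -1 on failure. */
int list_add(StringList *list, const char *s)
{
    if (list->duplicates != DUP_ACCEPT && list_index_of(list, s) >= 0)
        return list->duplicates == DUP_IGNORE ? 0 : -1;

    if (list->count == list->capacity) {
        /* Pointer table is full: grow it by one 4K section. */
        size_t new_capacity = list->capacity + SECTION_ITEMS;
        char **grown = realloc(list->items, new_capacity * sizeof(char *));
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_capacity;
    }

    /* Each string gets its own heap block. */
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len);
    list->items[list->count++] = copy;
    return 0;
}

/* Frees every string and then the pointer table itself. */
void list_free(StringList *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

        Because only pointers live in the table, moving or exchanging two
        strings is just a matter of swapping two pointers.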

-----------------------------------------------------------------------------

System functions - system.mac

    This contains general system functions.
    
    AtExit(address : val32)
        Adds the address specified to the list of functions to call upon exit.
        When Exit is called, each address added is called in turn, providing
        an easy way to deallocate resources, tidy up, etc. Several points
        should be borne in mind:

          o Functions added are called on a first-in, last-out basis (ie in
            the reverse order they were added).

          o There is enough space for 64 exit functions. After 64 have been
            added further calls to this function will fail.

          o Exit functions may use any registers they like.

          o Subsystems which have exit functions, eg the VBE subsystem
            (VBEExit), have them added automatically by the relevant
            initialisation routine (VBEInit in this example).
    Exit
        Terminates the program, calling any exit functions requested.
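
    The behaviour described for AtExit and Exit (a table of 64 entries,
    called in reverse order) can be sketched in C like this. All names here
    are hypothetical, and reporting a full table by returning -1 is an
    assumption, since the text above only says that further calls fail. A
    real Exit would also terminate the program after the loop.

```c
#include <stddef.h>

#define MAX_EXIT_FUNCS 64       /* space for 64 exit functions */

typedef void (*exit_func)(void);

exit_func exit_table[MAX_EXIT_FUNCS];
size_t    exit_count = 0;

/* Adds `fn` to the table. Returns 0 on success, -1 if the table is full. */
int at_exit_sketch(exit_func fn)
{
    if (exit_count == MAX_EXIT_FUNCS)
        return -1;
    exit_table[exit_count++] = fn;
    return 0;
}

/* Calls every registered function, most recently added first. */
void run_exit_funcs_sketch(void)
{
    while (exit_count > 0)
        exit_table[--exit_count]();
}

/* Two dummy exit functions that record the order they were called in. */
char   call_order[8];
size_t call_pos = 0;

void free_memory(void) { call_order[call_pos++] = 'M'; }
void close_video(void) { call_order[call_pos++] = 'V'; }
```

    Registering free_memory and then close_video, and then running the exit
    functions, calls close_video first and free_memory second.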

-----------------------------------------------------------------------------

Miscellaneous functions - misc.mac

    A collection of functions that don't fit into any category at all, but are
    useful nonetheless. Carry is used as a result indicator - clear on
    success, set on failure.

    Variables:
    _os_type
        Dword holding the OS your application is running under.
    _os_version
        Word holding the major version number of the OS.
    _os_subversion
        Word holding the minor version (subversion) number of the OS.
    _cpu_type
        Dword holding the CPU type.
    _cpu_mmx
        Dword holding whether the CPU supports MMX or not.
    _cpu_fpu
        Dword holding whether an FPU is present or not.
    
    Functions:
    DetectOS
        Tries to detect the OS your application is running under. After
        calling, _os_type will hold the OS type. See below for the equates.
        If it's Windows, _os_version will hold the major version number and
        _os_subversion the minor one (both words). This function will detect
        Windows 95 even if the "prevent MS-DOS programs from detecting Windows"
        option is checked (though the version won't be detected in this case).
    DetectCPU
        Detects the CPU type. After calling, _cpu_type holds the CPU type,
        _cpu_mmx holds whether it supports MMX or not and _cpu_fpu holds
        whether an FPU is present. See below for the equates.
    MicrosecondDelay(time : val32)
        Pauses for the number of microseconds passed. According to the
        Interrupt List the resolution on most systems is 977 microseconds
        (1/1024 of a second), so you shouldn't rely on this function for
        accurate sub-millisecond delays.
    MillisecondDelay(time : val32)
        Pauses for the number of milliseconds passed. As above, note that
        the resolution on most systems is 977 microseconds, but this should
        be accurate enough for millisecond delays.
  
    For convenience, the OS and CPU equates are presented below:

        _OS_TYPE_DOS             0
        _OS_TYPE_WIN3            1
        _OS_TYPE_WIN95           2
        _OS_TYPE_WINNT           3
        _OS_TYPE_OS2             4
        _OS_TYPE_WARP            5
        _OS_TYPE_DOSEMU          6
        _OS_TYPE_OPENDOS         7

        _CPU_TYPE_386         $386
        _CPU_TYPE_486         $486
        _CPU_TYPE_586         $586
        _CPU_TYPE_PENTIUM     $586
        _CPU_TYPE_686         $686
        _CPU_TYPE_PENTIUMPRO  $686

        _CPU_MMX_UNSUPPORTED     0
        _CPU_MMX_SUPPORTED       1

        _CPU_FPU_UNSUPPORTED     0
        _CPU_FPU_SUPPORTED       1
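
    As a rough illustration of how these values might be tested after
    calling DetectOS and DetectCPU, here is a small sketch in C. The helper
    names are hypothetical and not part of the library. Note that $386 is
    NASM's hexadecimal notation, so the C equivalent is 0x386, and that for
    the values listed above a plain comparison is enough to check for a
    minimum CPU type.

```c
#include <stddef.h>

/* Equate names indexed by their values, copied from the table above. */
const char *const os_equate_names[] = {
    "_OS_TYPE_DOS",  "_OS_TYPE_WIN3",   "_OS_TYPE_WIN95",  "_OS_TYPE_WINNT",
    "_OS_TYPE_OS2",  "_OS_TYPE_WARP",   "_OS_TYPE_DOSEMU", "_OS_TYPE_OPENDOS"
};

/* Hypothetical helper: maps an _os_type value to its equate name. */
const char *os_equate_name(unsigned long os_type)
{
    size_t known = sizeof os_equate_names / sizeof os_equate_names[0];
    return os_type < known ? os_equate_names[os_type] : "unknown";
}

/* C equivalents of the CPU equates ($386 in NASM is 0x386 in C). */
enum {
    CPU_TYPE_386 = 0x386,
    CPU_TYPE_486 = 0x486,
    CPU_TYPE_586 = 0x586,   /* same value as _CPU_TYPE_PENTIUM */
    CPU_TYPE_686 = 0x686    /* same value as _CPU_TYPE_PENTIUMPRO */
};

/* Hypothetical helper: non-zero if cpu_type is `minimum` or newer. */
int cpu_at_least(unsigned long cpu_type, unsigned long minimum)
{
    return cpu_type >= minimum;
}
```

    For example, cpu_at_least(cpu_type, CPU_TYPE_586) could be used to
    guard a Pentium-only code path, with cpu_type read from _cpu_type.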
