NAME

Tie::DistHash -- a tie implementation facilitating a shared cache.


SYNOPSIS

  use Tie::DistHash;
  my(%hash);
  my($cache) = tie(%hash,'Tie::DistHash',
                   ckptfile => 'mycache.dat',
                   gc_at_ckpt => 0);
  $hash{session_key} = {arbitrary=>[$data,\%structure],including=>$objects}; 
  ...keys %hash...values %hash...
  while ( my($key,$item_as_string) = $cache->eachString ) {
    print "$key likely has objects!\n"
      if $item_as_string =~ m/bless\(/;
  }


DESCRIPTION

Tie::DistHash is a tie() implementation with some useful extras for implementing a cache, such as checkpointing, garbage collection, and serialization of arbitrary data structures into strings that can be passed to eval.
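To illustrate what "strings that can be passed to eval" means, here is a minimal sketch of such a round trip. It uses the core Data::Dumper module as a stand-in; whether Tie::DistHash uses Data::Dumper internally is an assumption, and My::Session is a made-up class name.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: Data::Dumper stands in for whatever serializer Tie::DistHash uses.
$Data::Dumper::Terse  = 1;   # emit a bare expression, without "$VAR1 = "
$Data::Dumper::Indent = 0;   # keep the string on one line

my $value  = { name => "who", tags => [ "a", "b" ] };
my $string = Dumper($value);      # structure -> Perl source text
my $copy   = eval $string;        # Perl source text -> structure
die "eval failed: $@" if $@;

print "round trip ok\n" if $copy->{tags}[1] eq "b";

# Blessed references appear as bless( ... ) in the string form, which is
# what the pattern match in the SYNOPSIS looks for.
my $object     = bless { id => 7 }, "My::Session";   # hypothetical class
my $obj_string = Dumper($object);
print "has objects\n" if $obj_string =~ m/bless\(/;
```

Only eval strings that come from a source you trust, since eval runs whatever code the string contains.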


CONSTRUCTOR

Tying a hash to this class returns an initialized DistHash object. Use this object to call methods (see METHODS) and set options (see OPTIONS). Operations on the tied hash change this same object.


METHODS

Most of what you are likely to do with DistHash is as straightforward as using a normal hash (once that hash has been tied). Some things that are likely to get done fairly regularly, though, are difficult to implement purely through a tie interface, so it is handy that tie returns an object, which can have methods. Tying to DistHash returns an object with the following methods:

set_timeout $timeout [,$key]
Sets the expiration interval to $timeout. Without $key, this changes the default applied to items added subsequently; with $key, it changes the interval for that one item only.

get_timeout [$key]
Returns the default expiration interval, or the expiration interval for just one item if $key is supplied.

get_atime $key
Returns the last time $key was accessed. If the current time is greater than get_atime($key) + get_timeout($key), the next garbage collection will remove $key.

checkpoint [options]
Performs a checkpoint. options is a hash that can override the default options affecting checkpoints. See OPTIONS.

collect_garbage
Runs garbage collection, removing items whose expiration interval has passed. Usually taken care of during checkpoints.

integrity_check
Performs an integrity check. This should only be necessary during development, or conceivably once in a while for mission-critical apps.

toString
Returns the entire hash as a string that can be passed to eval. Be aware that it always returns the entire hash...

eachString
Like each, but returns the next item as a string that can be passed to eval. Two out of three dentists prefer this over toString.

init_from FH
Takes the filehandle of an open checkpoint file and initializes the hash from it.

undump
Initializes the database from a string like the ones created by toString. Try to use eachString and eval rather than toString and undump for large hashes.

sync KEY
Sends a sync update of KEY to the sync peers. Useful if $db{KEY} is a ref to something (i.e., $db{KEY} = {name=>"who",phone=>5551234}), and something inside that something has changed (i.e., $db{KEY}{name}="whoelse"). In this case, the value of $db{KEY} stays the same, and the tie implementation sees only the access of KEY, so only a TOUCH sync happens. After making a change like that, $dbobj->sync(KEY) makes sure that everyone in the sync pool sees the change.
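The expiry rule described under get_atime can be sketched as a small predicate. This is an illustration only, not the module's code; is_expired is a hypothetical helper, and treating a per-item timeout of 0 as "never expire" is an assumption carried over from the def_expint option.

```perl
use strict;
use warnings;

# Returns 1 if an item last accessed at $atime, with expiration interval
# $timeout, would be eligible for removal at time $now; 0 otherwise.
# Assumption: a $timeout of 0 means "never expire", mirroring def_expint.
sub is_expired {
    my ($now, $atime, $timeout) = @_;
    return 0 if $timeout == 0;
    return $now > $atime + $timeout ? 1 : 0;
}

my $atime = 1_000;
print is_expired($atime + 7200, $atime, 7200) ? "expired\n" : "live\n";  # exactly at the limit
print is_expired($atime + 7201, $atime, 7200) ? "expired\n" : "live\n";  # one second past it
```

With a tied object, the same check would presumably read is_expired(time, $cache->get_atime($key), $cache->get_timeout($key)).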


OPTIONS

Most of the behavior of a hash tied to DistHash is controlled by options. Options are usually set at initialization, when the object is tied. Simply pass a hash to tie() after the first two arguments. See SYNOPSIS.

v
Verbose flag. Warning messages are on if true. Defaults to false.

def_expint
The default expiration interval for items in this hash. 0 means never expire anything. Defaults to 7200 (2 hours).

basedir
Working directory. Default location for checkpoint files. Defaults to the contents of the PWD environment variable, or "/tmp".

ckptfile
Checkpoint file. '' means never write checkpoints. Defaults to CH_dump-PID.dat.

gc_at_ckpt
Whether to collect garbage at checkpoints. Defaults to 1.

ic_at_gc
Whether to check integrity at garbage-collection. This does NOT apply to the garbage collection at checkpoints! Defaults to 1.

ic_at_ckpt
Whether to check integrity at checkpoints. (gc_at_ckpt has no effect on this.) Defaults to 0.

sync
An arrayref of hashes, one per host. See SYNCHRONIZATION. Takes the form of
 sync => [ { host => "hostname1", 
             addr => "1.1.1.1:1234",
             sync_to => 10
           },
           { host => "hostname1", 
             port => 1234, addr => "1.1.1.1",
             sync_to => 10
           }
         ]

The contents of {addr} will overwrite those of {host} (and {port}, if that part of {addr} is supplied).

One of the elements of this array will usually specify a local address, and there will usually be one or more remote specifications. If there is no local spec that can be bound to, we will only send syncs (see sync_checkint).

sync_port
Default sync port. Defaults to 8789. See SYNCHRONIZATION.

sync_to
Default UDP timeout for syncing. Defaults to 1. See SYNCHRONIZATION.

sync_atime
Minimum time between touches. Defaults to 60. See SYNCHRONIZATION.

sync_checkint
Time between checks for inbound sync commands. Defaults to 3. Set to 0 to enable send-only mode. See SYNCHRONIZATION.

sync_state_retrieval_hook
A CODEREF to run if no peers respond to our full sync requests upon startup. If it returns 0 or doesn't exist, state is retrieved from a checkpoint file.

sync_events
A hashref of whether to sync various events. See SYNCHRONIZATION. Defaults to
 sync_events => { ACCESS  => 1, # to refresh access time for timeouts 
                  DELETE  => 1,
                  MODIFY  => 1,
                  ADD     => 1,
                  SETTO   => 1, # when setting a timeout
                  CLEAR   => 0, # eg, before assigning to the hash
                  DESTROY => 0, # end of tie
                  CKPT    => 0, # at checkpoints
                  INTCK   => 0, # at integrity checks
                  GC      => 0  # at garbage collection
                }

NOTE: CKPT, INTCK and GC are not yet implemented.
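The precedence of addr over host and port described under the sync option can be sketched as follows. normalize_peer is a hypothetical helper, and the assumption is that addr takes the form "host" or "host:port"; the fallback port is the documented sync_port default.

```perl
use strict;
use warnings;

# Normalize one entry of the sync list into a host/port pair.
# Assumptions: addr is "host" or "host:port"; the fallback port is the
# documented sync_port default of 8789.
sub normalize_peer {
    my ($spec, $default_port) = @_;
    $default_port = 8789 unless defined $default_port;

    my $host = $spec->{host};
    my $port = defined $spec->{port} ? $spec->{port} : $default_port;

    if (defined $spec->{addr}) {
        my ($addr_host, $addr_port) = split /:/, $spec->{addr}, 2;
        $host = $addr_host;                                   # addr overrides host
        $port = $addr_port
            if defined $addr_port && length $addr_port;       # and port, if supplied
    }
    return { host => $host, port => $port };
}

my $peer = normalize_peer({ host => "hostname1", addr => "1.1.1.1:1234", sync_to => 10 });
print "$peer->{host}:$peer->{port}\n";
```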


SYNCHRONIZATION

At startup, the DistHash tries to retrieve state via a full sync from one of the servers in the pool. It will accept sync commands from other servers while this full sync completes. If no server in the pool is reachable, the CODEREF sync_state_retrieval_hook is run. If it returns 0 or doesn't exist, then state is retrieved from the checkpoint file. Only after a full state retrieval is complete will the tie operation return.
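The startup order described above (peers first, then the hook, then the checkpoint file) can be sketched like this. The function and all three callbacks are hypothetical stand-ins that simply return true on success; they are not part of the module's interface.

```perl
use strict;
use warnings;

# Try each state source in the documented order and report which one was used.
sub retrieve_state {
    my (%src) = @_;

    # 1. Full sync from the first peer that answers.
    for my $peer (@{ $src{peers} || [] }) {
        return "peer:$peer" if $src{full_sync}->($peer);
    }

    # 2. The user-supplied hook, if it exists and returns a true value.
    return "hook" if ref($src{hook}) eq 'CODE' && $src{hook}->();

    # 3. Otherwise fall back to the checkpoint file.
    $src{load_checkpoint}->();
    return "checkpoint";
}

my $source = retrieve_state(
    peers           => [ "hostname1", "hostname2" ],
    full_sync       => sub { 0 },    # pretend no peer is reachable
    hook            => sub { 0 },    # hook declines
    load_checkpoint => sub { 1 },
);
print "state came from: $source\n";
```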

When an event occurs that requires synchronization per sync_events, a command is sent to each server in the sync pool.

Note: Be aware that only events actually handled by the tie can sync automatically. If a value is a reference, and something in what's referenced changes, there's no way for a tie to know automatically. See sync for a way to tell Tie::DistHash manually.
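The limitation is a property of tie itself and can be seen with any tied hash. The sketch below uses a tiny logging class built on the core Tie::StdHash; Logging::Hash is a made-up name and has nothing to do with Tie::DistHash's internals.

```perl
use strict;
use warnings;

# A tiny tied-hash class that records which tie methods get called.
package Logging::Hash;          # hypothetical name, for illustration only
require Tie::Hash;              # provides Tie::StdHash
our @ISA = ('Tie::StdHash');
our @log;

sub STORE { my ($self, $key, $value) = @_; push @log, "STORE $key"; $self->{$key} = $value }
sub FETCH { my ($self, $key) = @_;         push @log, "FETCH $key"; return $self->{$key} }

package main;

tie my %db, 'Logging::Hash';
$db{KEY} = { name => "who", phone => 5551234 };   # goes through STORE
@Logging::Hash::log = ();                          # start with a clean log

$db{KEY}{name} = "whoelse";                        # changes the inner hash only

# The log shows the access to KEY but no STORE for it.
print join(", ", @Logging::Hash::log), "\n";
```

Because the tie only sees a fetch of KEY, the change to the inner hash has to be announced by hand, which is what the sync method is for.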

For fetches and other ATIME-only updates, a touch is sent; no more than one every sync_atime seconds. For data inserts, updates and deletes, an appropriate command is sent, basically duplicating the command received by the server performing the sync.
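The touch throttling can be pictured as below. This is a sketch under assumptions: should_touch and %last_touch are hypothetical, and tracking the last touch per key (rather than globally) is a guess, since the text does not say which applies.

```perl
use strict;
use warnings;

my $sync_atime = 60;     # documented default: minimum seconds between touches
my %last_touch;          # hypothetical bookkeeping: key => time the last touch was sent

# Returns 1 if a touch for $key should be sent at time $now, 0 if it is
# still within $sync_atime seconds of the previous one.
sub should_touch {
    my ($key, $now) = @_;
    my $last = $last_touch{$key};
    return 0 if defined $last && $now - $last < $sync_atime;
    $last_touch{$key} = $now;
    return 1;
}

for my $t (0, 10, 59, 60, 61) {
    printf "t=%d touch=%d\n", $t, should_touch("session_key", $t);
}
```

In this run only the accesses at t=0 and t=60 produce a touch; the others fall inside the 60-second window.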

Sync commands are checked for every sync_checkint seconds. This mechanism is built using SIGALRM, so if you use sync, be careful when you set or modify $SIG{ALRM}!
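One way to be careful is to chain to whatever ALRM handler is already installed instead of replacing it. The sketch below assumes the existing handler is a code reference and stands in for it with a dummy; whether Tie::DistHash tolerates being wrapped this way is not confirmed by this document. chain_alrm is a hypothetical helper.

```perl
use strict;
use warnings;

# Install $new as the ALRM handler while still calling the handler that was
# there before. Assumption: the earlier handler is a code reference; the
# strings 'DEFAULT' and 'IGNORE' are simply skipped.
sub chain_alrm {
    my ($new) = @_;
    my $previous = $SIG{ALRM};
    $SIG{ALRM} = sub {
        $previous->(@_) if ref($previous) eq 'CODE';
        $new->(@_);
    };
    return;
}

my @calls;
$SIG{ALRM} = sub { push @calls, "existing" };   # stands in for a handler set elsewhere
chain_alrm(sub { push @calls, "mine" });

$SIG{ALRM}->("ALRM");    # invoke the handler directly instead of waiting for a real alarm
print join(", ", @calls), "\n";
```

Note also that a process has only one alarm timer, so calling alarm() yourself replaces any timer that is already pending.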

Synchronization at this time is accomplished in the blind UDP way. How trustworthy this is depends on how trustworthy your network is. This also implies a limit to the size of a value before you get errors about too-large message sizes. Messages as large as 11K work fine, and this limitation will hopefully be lifted in the future.
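Until then, a caller could guard against oversized values before storing them. The sketch below assumes values travel in a Data::Dumper-style string form and uses 11K as a conservative threshold only because that size is reported to work; the real limit depends on the network and the module. fits_in_sync_message is a hypothetical helper.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: 11K is a safe threshold, not a documented hard limit.
my $MAX_LENGTH = 11 * 1024;

# Returns 1 if the string form of $value is within the threshold.
# length() counts characters, which equals bytes for plain ASCII data.
sub fits_in_sync_message {
    my ($value) = @_;
    local $Data::Dumper::Indent = 0;
    local $Data::Dumper::Terse  = 1;
    return length(Dumper($value)) <= $MAX_LENGTH ? 1 : 0;
}

print fits_in_sync_message({ note => "x" x 100 })    ? "small value fits\n" : "small value too big\n";
print fits_in_sync_message({ note => "x" x 20_000 }) ? "large value fits\n" : "large value too big\n";
```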