Unique Object Instances

Update: ticket:280 has been created for this task.

Clarification

Note that this feature is called "unique object instances", not "object cache". The latter is quite different and more difficult (because of the classic "stale data" problem). However, unique instances are probably a prerequisite for any object caching, so as we decide exactly how to implement them we should keep future object caching in mind.

The Current State

At the moment, when you do this

$obj1 = SomePeer::retrieveByPK(1);
$obj2 = SomePeer::retrieveByPK(1);

you get 2 distinct instances containing the same information from the same DB row. This is bad for the following reasons:

  • if you change both $obj1 and $obj2 (e.g. with setSomeField('some value')), there is no way to resolve the conflict when you save them
  • memory is wasted on duplicate copies
  • the second retrieveByPK call should not require a round trip to the DB
  • DB rows are unique (by PK), so their object representations should be too
  • if your class/table is on the "one" side of a one-to-many relationship, Propel builds getChildren()-type methods for you. These methods use "in-object caching", so that if you ask the same question again, the set of child records is not requeried from the DB. This in-object cache is much less effective when there are multiple instances of the same object/row, because each instance has its own in-object cache, which has to be filled separately.
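
The first point can be illustrated without Propel at all. The following is a minimal sketch, assuming a hypothetical DemoRow class and demoRetrieveByPK() function that only mimic the current behaviour (a new object on every call); neither is real generated code.

```php
<?php
// Hypothetical stand-in for a generated Propel object; not real Propel code.
class DemoRow
{
    public $title;

    public function __construct($title)
    {
        $this->title = $title;
    }
}

// Stand-in for the database: one row with primary key 1.
$table = array(1 => array('title' => 'original'));

// Mimics the current retrieveByPK: every call builds a brand-new object.
function demoRetrieveByPK(array $table, $pk)
{
    return new DemoRow($table[$pk]['title']);
}

$obj1 = demoRetrieveByPK($table, 1);
$obj2 = demoRetrieveByPK($table, 1);

// A change made through one instance is invisible to the other.
$obj1->title = 'changed via obj1';

var_dump($obj1 === $obj2); // identity check: are both variables the same instance?
var_dump($obj2->title);    // value still held by the second copy
```

Whichever of the two copies is saved last silently wins, which is the conflict described above.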

Implementation Plan

We want to change as few of the existing generated methods as possible while ensuring that unique instances are always returned to client code. This should be totally transparent to Propel users and should break BC only in very obscure cases where people are purposely manipulating two instances of the same row and performing their own conflict resolution before saving them.

The 2 key methods that need to change are:

  • retrieveByPK
  • populateObjects

The former is easy because we know the PK: if we already have an instance for that row, we just return it without querying the DB again.

The latter is the central place where new objects are created (hydrate() is too late, because by then the instance already exists). The current plan is "don't try to be too clever". In practice this means: "don't try to figure out whether we need to run some complex select query at all, just because we may already have all the objects in the answer".

So unless we have the PK (see above), we will always run the query, but then return existing instances from the instance pool or new ones as required. This still saves the (relatively costly) hydrate() and keeps the code simple and reliable.

Below is a code sample showing how all this can be done.

But first, some TODOs and points to be clarified:

a) find a tidy way to access the primary key (or compound primary key) from the result set, perhaps via a method such as the one suggested in #187:

public static function getPrimaryKeyHash(ResultSet $rs, $startcol = 1)

b) provide a new version of retrieveByPKs which checks the instance pool before running the doSelect, because we may already have all the keys in the instance pool

c) the doSelectJoin* methods need to be changed too, because they don't use populateObjects; the change is not complex and is very similar to what we are doing in populateObjects

d) consider moving the instance pool to a central class which would sit as a static property of Propel like Propel::ObjectInstances, rather than implementing static $instances on each peer class. This may have some advantages for future object caching, because it would provide one centralised array of instances, which in turn has advantages because of the way serialize() works with references.

e) any other tweaks anyone can come up with...?
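
For TODO (a), one possible shape for the key computation is sketched here. It is only an illustration under assumptions: demoPrimaryKeyHash() is a hypothetical helper that works on a plain array row rather than on a ResultSet, and the column names are made up. It is not the method proposed in #187.

```php
<?php
/**
 * Hypothetical helper: builds one pool key from the primary key column(s)
 * of a fetched row. $row is a plain array here, not a Creole ResultSet.
 */
function demoPrimaryKeyHash(array $row, array $pkColumns)
{
    $parts = array();
    foreach ($pkColumns as $col) {
        // cast to string so that 7 and '7' produce the same key
        $parts[] = (string) $row[$col];
    }
    // serialize() records the length of every part, so the compound keys
    // ('1', '23') and ('12', '3') cannot produce the same string
    return serialize($parts);
}

// single-column primary key
$single = demoPrimaryKeyHash(array('id' => 7, 'name' => 'x'), array('id'));

// two different compound keys that a naive concatenation would confuse
$compound = demoPrimaryKeyHash(array('book_id' => 1, 'tag_id' => 23), array('book_id', 'tag_id'));
$other    = demoPrimaryKeyHash(array('book_id' => 12, 'tag_id' => 3), array('book_id', 'tag_id'));

var_dump($compound === $other); // do the two compound keys collide?
```

The resulting string can be used directly as the array key of the instance pool, for single and compound keys alike.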

class KeywordPeer extends BaseKeywordPeer {

  private static $instances = array();

  /**
   * Empties the instance pool.
   * needed in some rare circumstances
   */
  public static function clearInstancePool()
  {
    self::$instances = array();
  }

  /**
   * The returned array will contain objects of the default type or
   * objects that inherit from the default.
   * overridden here to implement a unique object instance pool
   *
   * @throws PropelException Any exceptions caught during processing will be
   *            rethrown wrapped into a PropelException.
   */
  public static function populateObjects(ResultSet $rs)
  {
    $results = array();

    // set the class once to avoid overhead in the loop
    $cls = KeywordPeer::getOMClass();
    $cls = Propel::import($cls);
    // populate the object(s)
    while($rs->next()) {
      // returning unique instances is faster and more efficient
      // on memory. uniqueness also helps in other ways: any caching
      // that the objects themselves implement is much more effective
      // when there is only one instance, and client code can rest
      // assured that it has the only object for that row in existence.
      // the doSelectJoin* methods should also implement unique instances.

      // following line could be pulled out into a separate method
      // which could also cope with compound primary keys
      $pk = $rs->getInt(1);
      if (array_key_exists($pk, self::$instances))
      {
        $results[] = self::$instances[$pk];
      }
      else
      {
        $obj = new $cls();
        $obj->hydrate($rs);
        $results[] = $obj;
        self::$instances[$pk] = $obj;
      }
    }
    return $results;
  }

  /**
   * Retrieve a single object by pkey.
   * overridden here to provide a Hibernate-style unique instance pool
   *
   * @param mixed $pk the primary key.
   * @param Connection $con the connection to use
   * @return Keyword
   */
  public static function retrieveByPK($pk, $con = null)
  {
    // for the retrieve by PK method we know what we are getting
    // so don't bother querying in the first place if we have the instance already
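    if (!isset(self::$instances[$pk]))
    {
      $obj = parent::retrieveByPK($pk, $con);
      if ($obj === null)
      {
        // no such row: do not pool the miss, so that a row
        // inserted later can still be found
        return null;
      }
      self::$instances[$pk] = $obj;
    }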
    return self::$instances[$pk];
  }
}
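
TODOs (b) and (d) could be combined roughly as follows. This is a sketch under stated assumptions, not a proposal for the final API: DemoInstancePool, its method names and the $fetchMissing callback are all hypothetical, and the "query" is faked with a closure so that the example runs without Propel.

```php
<?php
/**
 * Hypothetical central pool, keyed first by class name and then by primary key.
 * This is not an existing Propel class.
 */
class DemoInstancePool
{
    private static $instances = array();

    public static function get($class, $pk)
    {
        return isset(self::$instances[$class][$pk]) ? self::$instances[$class][$pk] : null;
    }

    public static function add($class, $pk, $obj)
    {
        self::$instances[$class][$pk] = $obj;
    }

    public static function clear()
    {
        self::$instances = array();
    }

    /**
     * Returns the objects for all $pks, calling $fetchMissing only for keys
     * that are not pooled yet. $fetchMissing receives the missing keys and
     * must return array(pk => object).
     */
    public static function getMany($class, array $pks, $fetchMissing)
    {
        $missing = array();
        foreach ($pks as $pk) {
            if (self::get($class, $pk) === null) {
                $missing[] = $pk;
            }
        }
        if ($missing) {
            foreach (call_user_func($fetchMissing, $missing) as $pk => $obj) {
                self::add($class, $pk, $obj);
            }
        }
        $results = array();
        foreach ($pks as $pk) {
            $obj = self::get($class, $pk);
            if ($obj !== null) {
                $results[] = $obj;
            }
        }
        return $results;
    }
}

// Fake "query" that records which keys it was asked for.
$queried = array();
$fetch = function (array $pks) use (&$queried) {
    $queried[] = $pks;
    $objects = array();
    foreach ($pks as $pk) {
        $obj = new stdClass();
        $obj->id = $pk;
        $objects[$pk] = $obj;
    }
    return $objects;
};

$first  = DemoInstancePool::getMany('Keyword', array(1, 2), $fetch);
$second = DemoInstancePool::getMany('Keyword', array(2, 3), $fetch);
```

In this sketch the second call only asks the fake query for key 3, and the object for key 2 is the same instance in both result arrays. When every requested key is already pooled, the callback is not invoked at all, which is the saving TODO (b) is after.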