P'unk Avenue Window

Doctrine 2: Jonathan Wage at #sflive2010

February 16th, 2010 by Tom

I’m attending Symfony Live. Below are my notes from Jon Wage’s presentation on the upcoming Doctrine 2.0. My more opinionated, less explanatory comments tend to be in parentheses. Everything paraphrased brutally to keep up.

“Not the same old PHP ORM”

100% re-written codebase for PHP 5.3

“Are you scared? You shouldn’t be! Change is a good thing!”

5,000 records hydrate in 1.4 seconds. Whee!

10,000 records in 3.5 seconds.

Twice the data and still faster than Doctrine 1.

The percentage improvement over Doctrine 1 is big. PHP 5.3 is a big win with a heavily OO framework. Better optimized hydration algorithm. New query and result caching implementations. All around more explicit and less magical code. Killed the magical aspect of Doctrine 1.

“The only valid measurement of code quality: WTFs/minute”

Good code: “WTF….. WTF”

Bad code: “WTF WTF WTF WTF WTF WTF”


Doctrine 1 magical features are both a blessing and a curse

Magic is great when it works but also causes pain. When it doesn’t work it is hard to debug. Edge cases are hard to fix. Edge cases are hard to work around. Everything is okay until you try to go outside the magic box. Magic is slow.

Replacement: OOP. Object composition, inheritance, aggregation, containment, encapsulation, etc.

Will it have behaviors? Yes and no. No “model behaviors.” Made up to work with Doctrine’s “intrusive architecture.” Tries to do things PHP doesn’t allow, results in a lot of problems (like sfMixin did).

If you could do it in 1 you can do it in 2, just differently. Natural OOP that wraps/extends Doctrine code or is meant to be wrapped or extended by your entities. (Injection? Factories? How do you apply two extensions?)

What did we use to build it? PHPUnit 3.4.10 for unit testing. Phing. Symfony YAML. Sismo. Subversion. Jira. And Trac.

Doctrine 2 Architecture

“Everything is an entity.” A lightweight persistent domain object, a regular PHP class. You don’t extend a base doctrine class. No final methods.

namespace Entities;
class User
{
  private $id;
  private $name;
  private $address;
}

Note use of namespaces.

EntityManager is the central access point to the ORM. Used to query for persistent objects. Employs a transactional write-behind strategy that delays execution of SQL so it can be executed in the most efficient way.
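To make the write-behind idea concrete, here is a toy sketch. It is not Doctrine's implementation: ToyUnitOfWork, ToyUser and the log format are made-up names for illustration. It only shows the shape of the strategy, where persist() queues work and flush() sends it all at once inside a single transaction.

```php
<?php
// Toy stand-in for a write-behind unit of work. All names here are
// hypothetical; this is not how Doctrine itself is written.
class ToyUnitOfWork
{
    private $pending = [];  // objects scheduled for insert
    public $log = [];       // statements "executed", in order

    public function persist(object $entity): void
    {
        // Nothing is sent to the database yet; the object is only queued.
        $this->pending[] = $entity;
    }

    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }
        // All queued work goes out together, wrapped in one transaction.
        $this->log[] = 'BEGIN';
        foreach ($this->pending as $entity) {
            $this->log[] = 'INSERT ' . get_class($entity);
        }
        $this->log[] = 'COMMIT';
        $this->pending = [];
    }
}

class ToyUser
{
}

$uow = new ToyUnitOfWork();
$uow->persist(new ToyUser());
$uow->persist(new ToyUser());
// At this point $uow->log is still empty.
$uow->flush();
// Now $uow->log holds BEGIN, two INSERT entries, then COMMIT.
```

Because the work is deferred, the manager is free to order and group statements however is cheapest before anything touches the database.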

Tests run against multiple databases. SQLite, MySQL, Oracle, PostgreSQL, more to come.

Unit Testing

Impressive #s of test cases and assertions. Tests run in a few seconds instead of 30-40 seconds. Very granular, explicit unit tests, not many functional tests. Easier to debug. Continuously integrated by Sismo.

No, Sismo is not available yet, bug Fabien! (We have a pkcommit script of our own that doesn’t let you commit without a warning unless you pass symfony tests, which is a nice poor man’s version)

Database Abstraction Layer

Standalone package and namespace (Doctrine\DBAL).

Can be used standalone.

Much improved over Doctrine 1 in regards to the API for database introspection and schema management. (Can you alter a table in a more db-portable way wrt type names?)

Hopefully the de facto standard DBAL for PHP 5.3 in the future, like MDB was for 4.x

prepare($sql)
executeUpdate($sql, $params)
fetchAll... fetchBoth($sql, $params) - both assoc and flat...

DBAL Introspection

listDatabases
listFunctions
listSequences
listTableColumns($tableName)

Schema Representation

$schema = new \Doctrine\DBAL\Schema\Schema(); // namespaces are nutty
$myTable = $schema->createTable("my_table");

Replacing diff tool with the ability to compare schema objects.

Doctrine 2 Annotations

/** @Id @Column(type="integer") @GeneratedValue */
private $id;

This gets parsed out. Alternative to schema.yml. (Lets you write everything in one place, so perhaps more convenient.)
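For a feel of what "parsed out" means: annotations live in ordinary doc comments, which PHP exposes through reflection. The sketch below is an assumption-laden illustration, not Doctrine's annotation parser. AnnotatedUser and annotationNames() are hypothetical names, and the helper only extracts annotation names with a regular expression.

```php
<?php
// Illustrative entity; it is not wired to Doctrine in this sketch.
class AnnotatedUser
{
    /** @Id @Column(type="integer") @GeneratedValue */
    private $id;
}

/**
 * Hypothetical helper: list the annotation names found in a property's
 * doc comment. A real annotation parser also reads the arguments.
 */
function annotationNames(string $class, string $property): array
{
    $doc = (new ReflectionProperty($class, $property))->getDocComment();
    if ($doc === false) {
        return []; // the property has no doc comment
    }
    preg_match_all('/@([A-Za-z]+)/', $doc, $matches);
    return $matches[1];
}

$names = annotationNames('AnnotatedUser', 'id');
// $names lists Id, Column and GeneratedValue, in that order.
```

The mapping sits right next to the property it describes, which is the "everything in one place" convenience compared with a separate schema.yml.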

Things to Notice

Entities no longer require extending a heavy base class
Domain model has no magic, is not imposed on by Doctrine and is defined by raw PHP objects and normal OO
Big performance impact
Easier to understand due to less magic. As Fabien says, “Kill the magic…”

(We’ve introduced a little magic here and there in our own stuff, maybe it’s less evil within a shop than without.)

Doctrine 2 *does* allow YAML.

Entities\Address:
  type: entity
  table: addresses
  id:
    id:
      type: integer
etc

You can do XML too

PHP “use” all necessary namespaces and classes

Recommended very long use statement, but it’s in the bootstrap function
Then require the Doctrine ClassLoader
Core classes, entity classes, proxy classes
Configure your Doctrine implementation by talking to the ORM configuration object

(A class called Configuration? Won’t this tend to clash with other stuff, like Symfony 2? I know about namespaces but Configuration means we’ll have to disambiguate every time)

Entity manager…

$connectionOptions = array('driver' => ... )
$em = EntityManager::create($connectionOptions, $config);

Use a closure to lazily load the EntityManager…?

$user = new User;
$user->setName('Jon Wage');
$em->persist($user);
$em->flush(); // Performance optimizes all that has gone before. Multiple row insert in MySQL? OooOooo

No more magic methods!

Inserting 20 records with Doctrine: way fast... compared it to raw MySQL... Doctrine 2 is faster because he's rolling a single transaction around the 20 inserts compared to naive PHP that has no transaction calls.

(I didn't realize transactions had such a big performance benefit, I thought of them solely as a consistency and concurrency issue... without them MySQL must do an implicit transaction for every INSERT which slows you down.)

Doctrine blog piece on transactions and performance

DQL parser re-written from scratch. DQL is parsed by a top-down recursive descent parser that constructs an AST, which is then used to generate the SQL to execute for your DBMS. DQL has a real BNF now and tells you what your errors are.

(You could more easily teach it to manage a non-SQL backend now. Or accidentally de-optimize your SQL. Either way)

$q = $em->createQuery('select u from MyProject\Model\User u');
$users = $q->execute();
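To show what "recursive descent parser that constructs an AST" looks like in miniature, here is a toy that handles exactly one statement shape, the one in the query above. The grammar, the array-based AST layout, the class-to-table map and the names ToyDqlParser and toySql() are all assumptions for illustration; the real DQL grammar is far larger.

```php
<?php
// Toy recursive descent parser for a single DQL-like shape:
//   select <alias> from <ClassName> <alias>
// One method per grammar rule; each returns an AST node as an array.
class ToyDqlParser
{
    private $tokens;
    private $pos = 0;

    public function __construct(string $dql)
    {
        $this->tokens = preg_split('/\s+/', trim($dql));
    }

    // SelectStatement ::= SelectClause FromClause
    public function parse(): array
    {
        $select = $this->selectClause();
        $from = $this->fromClause();
        if ($this->pos < count($this->tokens)) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->tokens[$this->pos]);
        }
        if ($select['alias'] !== $from['alias']) {
            throw new InvalidArgumentException('Unknown alias in select clause: ' . $select['alias']);
        }
        return ['type' => 'SelectStatement', 'select' => $select, 'from' => $from];
    }

    // SelectClause ::= "select" Identifier
    private function selectClause(): array
    {
        $this->expectKeyword('select');
        $alias = $this->identifier();
        return ['type' => 'SelectClause', 'alias' => $alias];
    }

    // FromClause ::= "from" Identifier Identifier
    private function fromClause(): array
    {
        $this->expectKeyword('from');
        $class = $this->identifier();
        $alias = $this->identifier();
        return ['type' => 'FromClause', 'class' => $class, 'alias' => $alias];
    }

    private function expectKeyword(string $keyword): void
    {
        $token = $this->tokens[$this->pos] ?? '';
        if (strtolower($token) !== $keyword) {
            throw new InvalidArgumentException("Expected '$keyword', got '$token'");
        }
        $this->pos++;
    }

    private function identifier(): string
    {
        if (!isset($this->tokens[$this->pos]) || $this->tokens[$this->pos] === '') {
            throw new InvalidArgumentException('Expected an identifier, got end of input');
        }
        return $this->tokens[$this->pos++];
    }
}

// Walk the AST to produce SQL. $tableMap (class name => table name) is a
// stand-in for real mapping metadata.
function toySql(array $ast, array $tableMap): string
{
    $class = $ast['from']['class'];
    if (!isset($tableMap[$class])) {
        throw new InvalidArgumentException("No mapping for class $class");
    }
    $alias = $ast['from']['alias'];
    return "SELECT $alias.* FROM {$tableMap[$class]} $alias";
}

$ast = (new ToyDqlParser('select u from MyProject\Model\User u'))->parse();
$sql = toySql($ast, ['MyProject\Model\User' => 'users']);
```

Because each rule knows exactly what it expects next, a typo such as "frm" fails with a message naming the expected keyword, which is the kind of error reporting a real grammar makes possible.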

Cache Drivers

fetch, contains, save, delete methods similar to sfCache classes

Wrap existing Symfony, ZF, etc cache driver instances with the Doctrine interface

Slick ways to clear part of the cache:

deleteByRegex($regex)
deleteByPrefix($prefix)
deleteBySuffix($suffix)

These are used with the result caching feature. (Which we should probably leverage in Apostrophe, and use the above delete methods to avoid inconsistency. This stuff has been backported to 1.2)

$query->useResultCache(true, 3600, 'my_query_name'); // third argument is the cache key. These are easily purged fast with the above
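Here is a rough sketch of why prefix deletion pairs well with named cache keys. ToyArrayCache is a made-up in-memory class that borrows the method names from the talk; the return values and the key naming scheme are assumptions, not the behavior of Doctrine's cache drivers.

```php
<?php
// Toy in-memory cache using the method names mentioned in the talk.
// The integer returned by deleteByPrefix() is an assumption of this sketch.
class ToyArrayCache
{
    private $data = [];

    public function save(string $id, $value): void
    {
        $this->data[$id] = $value;
    }

    public function contains(string $id): bool
    {
        return array_key_exists($id, $this->data);
    }

    public function fetch(string $id)
    {
        return $this->data[$id] ?? false;
    }

    public function delete(string $id): void
    {
        unset($this->data[$id]);
    }

    // Remove every entry whose key starts with $prefix; return how many went.
    public function deleteByPrefix(string $prefix): int
    {
        $deleted = 0;
        foreach (array_keys($this->data) as $id) {
            if (strncmp((string) $id, $prefix, strlen($prefix)) === 0) {
                unset($this->data[$id]);
                $deleted++;
            }
        }
        return $deleted;
    }
}

$cache = new ToyArrayCache();
$cache->save('users_list', ['Jon', 'Tom']);
$cache->save('users_count', 2);
$cache->save('posts_list', ['Hello world']);

// After a user changes, drop every cached result related to users.
$removed = $cache->deleteByPrefix('users_');
```

If result cache keys share a prefix per entity type, one call can invalidate everything that might be stale while leaving unrelated results cached.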

New CLI

dbal:run-sql executes a manually written raw SQL statement or file. (We could make sfSyncContentPlugin db-portable with this perhaps... it's half of it)
orm:clear-cache clears query, result and metadata cache, has options to be more specific
orm:convert-mapping converts metadata information between formats. Convert between YAML and XML and annotated PHP.
orm:ensure-production-settings verifies that Doctrine is properly configured for a production environment, throws an exception if the environment is not suitable for production. (? What's suitable?)
orm:generate-proxies
Proxy objects are objects that are put in place or used instead of the real object, adding behavior to the object being proxied without that object knowing. Makes lazy-loading features possible. (Is this how you add multiple "behaviors" to one object?)
orm:schema-tool --update compares local schema to database and updates database accordingly, period. What about fancy migrations? "They don't exist yet but they will as an extension"

Inheritance

  • Mapped Superclasses
  • Single Table Inheritance
  • Class Table Inheritance

A mapped superclass is an abstract class with no table in the database, it is a parent that can have multiple children, you get one table for each child class. "Concrete inheritance" in Doctrine 1, I believe.

Single table inheritance = column aggregation inheritance in Doctrine 1. A single table with the columns of all of the children. One of the fields is used to discriminate between types and hydrate the right class of object. (Will there be a prefix naming scheme of some sort to prevent column name conflicts when subclasses are written by multiple people? There isn't in Doctrine 1.)
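The discriminator idea can be sketched in a few lines. This is a toy, not Doctrine's hydrator: the "type" column name, the discriminator map, and the ToyPerson/ToyEmployee classes are all hypothetical. Each row from the single shared table is turned into an instance of whichever class its discriminator value names.

```php
<?php
// Toy classes sharing one table; the names are made up for this sketch.
class ToyPerson
{
    public $id;
    public $name;
}

class ToyEmployee extends ToyPerson
{
    public $salary;
}

// Build the right class for a row, based on its discriminator column.
function hydrateRow(array $row, array $discriminatorMap): object
{
    $type = $row['type'];
    if (!isset($discriminatorMap[$type])) {
        throw new UnexpectedValueException("Unknown discriminator value: $type");
    }
    $class = $discriminatorMap[$type];
    $object = new $class();
    foreach ($row as $column => $value) {
        // Skip the discriminator itself and columns owned by other subclasses.
        if ($column !== 'type' && property_exists($object, $column)) {
            $object->$column = $value;
        }
    }
    return $object;
}

$map = ['person' => 'ToyPerson', 'employee' => 'ToyEmployee'];

// Every row carries the columns of all children; unused ones are NULL.
$rows = [
    ['id' => 1, 'type' => 'person', 'name' => 'Ann', 'salary' => null],
    ['id' => 2, 'type' => 'employee', 'name' => 'Bob', 'salary' => 50000],
];

$objects = [];
foreach ($rows as $row) {
    $objects[] = hydrateRow($row, $map);
}
```

Note how the first row still has a salary column it never uses: that is the cost of keeping every child's columns in one table.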

Class table inheritance. This is new. Employee extends Person, BlogPost extends Node. MyCustomBlogPost extends BlogPost. Node is a base table with an ID. BlogPost has a foreign key back to Node, MyCustomBlogPost has a foreign key back to BlogPost. Doctrine builds a join across all three tables.

Jon's example: a CMS with many node types. BlogPostCustom inherits from BlogPost inherits from node. BlogPostCustom has a blogpost_id field, which is a foreign key on blogpost, which has a node_id field, which is a foreign key on node. Again you still need a field to identify the right class to hydrate. And you need to know which subclasses exist so you know how many LEFT JOINs are needed.

(This is elegant, but is it fast? Going with Jon's CMS example, which seems inspired by Drupal, if you have ten node types on a page you have at least 11 classes in your join. In Apostrophe we use Doctrine column aggregation inheritance for slots, but generally serialize() things into the value column unless we need a foreign key. For the media repository we do use a refclass connecting slots to media objects, which is similar to what Jon proposes here. We'll be adding an API that lets you add additional joins of this kind. It's probably a win to avoid joins for simpler subclasses and use them only when there's a big payoff like ON DELETE CASCADE. Of course I'm prepared to be convinced otherwise by good benchmarks.)

Batch Processing

Batch processing takes advantage of the transactional write-behind behavior of the entity manager: insert 10,000 objects with a batch size of 20, calling flush() and clear() every 20 objects. Calling clear() detaches all of the objects from Doctrine. I can also detach single objects with detach(). If I don't detach them, Doctrine must keep a reference to them in case they change and therefore need to be included in the next flush(). So to avoid memory leaks in long-running tasks, one must detach or clear appropriately.
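The batching loop described above looks roughly like this. To keep the sketch runnable on its own, ToyEntityManager is a made-up stand-in that only counts calls; it borrows the persist(), flush() and clear() names from the talk but does no database work. insertInBatches() is likewise a hypothetical helper.

```php
<?php
// Stand-in entity manager: tracks how many objects it holds and how often
// it is flushed. It does no real persistence.
class ToyEntityManager
{
    public $managed = [];
    public $flushes = 0;
    public $peakManaged = 0;

    public function persist(object $entity): void
    {
        $this->managed[] = $entity;
        $this->peakManaged = max($this->peakManaged, count($this->managed));
    }

    public function flush(): void
    {
        $this->flushes++;
    }

    public function clear(): void
    {
        // Drop every reference so the objects can be garbage collected.
        $this->managed = [];
    }
}

function insertInBatches(ToyEntityManager $em, int $total, int $batchSize): void
{
    for ($i = 1; $i <= $total; $i++) {
        $em->persist(new stdClass());
        if ($i % $batchSize === 0) {
            $em->flush(); // write this batch
            $em->clear(); // then detach it to keep memory flat
        }
    }
    // Write any leftovers when $total is not a multiple of $batchSize.
    if ($em->managed !== []) {
        $em->flush();
        $em->clear();
    }
}

$em = new ToyEntityManager();
insertInBatches($em, 10000, 20);
```

The number to watch is $em->peakManaged: it never exceeds the batch size, which is the whole point of calling clear() after each flush().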

Query::iterate() method avoids loading everything into memory at once, bringing things into memory one at a time. Call flush() to execute your updates, then clear(). Note that you can change fields in your already-persisted object and call flush() and they save. save() is implicit and no longer exists as a method on an object. Remember, your objects are just objects now, and they do not inherit from Doctrine_Record anymore.

Bulk Delete

Same thing, you can iterate and then call remove() on things, and flush() does the actual work, call it at sensible intervals for performance.

Native Queries

You can now define an explicit mapping between a native SQL query and the fields of your objects to hydrate them exactly as you wish.

Proxies

Proxy objects seem to be wrappers for your original objects that come into play in a way that probably helps to implement stuff like what behaviors do in Doctrine 1, but this wasn't discussed at length. I'd like to know how one does NestedSet, Timestampable and the like elegantly with Doctrine 2.0.
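Since proxies were not covered in depth, the following is only a guess at the general pattern, not Doctrine's generated proxy code. ToyArticle, ToyArticleProxy and the loader callback are hypothetical. The proxy subclasses the entity, so callers cannot tell the difference, and it defers the expensive load until a field is actually needed.

```php
<?php
// Plain entity with no knowledge of proxies.
class ToyArticle
{
    protected $id;
    protected $title;

    public function __construct(int $id, string $title = '')
    {
        $this->id = $id;
        $this->title = $title;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getTitle(): string
    {
        return $this->title;
    }
}

// Hypothetical proxy: knows only the id up front and loads the rest on demand.
class ToyArticleProxy extends ToyArticle
{
    private $loader;
    private $loaded = false;

    public function __construct(int $id, callable $loader)
    {
        parent::__construct($id);
        $this->loader = $loader;
    }

    public function getTitle(): string
    {
        if (!$this->loaded) {
            // First access: fetch the row, then behave like the real object.
            $row = ($this->loader)($this->id);
            $this->title = $row['title'];
            $this->loaded = true;
        }
        return parent::getTitle();
    }
}

$queries = 0;
$proxy = new ToyArticleProxy(7, function (int $id) use (&$queries): array {
    $queries++; // stands in for a SELECT against the database
    return ['title' => "Article $id"];
});
// No query has run yet; one runs on the first getTitle() call only.
```

This covers lazy loading. Whether the same mechanism is meant to replace Doctrine 1 behaviors is exactly the open question in the notes above.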

Jon is thinking a stable version is 6-12 months away, possibly dovetailed with the next major releases of Symfony and Zend.

Code Generation

Jon was asked how you would add a phone number to a customer, or other one-to-many relation, if there is "no magic" in Doctrine 2.0. It's clear how columns of a table are handled (a persisted Doctrine object's properties are grabbed and saved on flush()) but relations seem to require some magic.

Jon replied that before the stable release there will be code generation available for such purposes.

Some people felt this amounted to an admission that Propel does things the right way.

An audience member asked Jon if he'd just reinvented the latest version of Propel. This is the most-tweeted comment of Symfony Live 2010 so far. But the answer is no.

Propel 1.5 resembles Doctrine today because Propel 1.5 borrowed much of the DQL query language for its new convenient, chainable query syntax. If Propel can lift concepts from Doctrine, then it doesn't make sense to knock Doctrine for adopting code generation if that has proven to be the right way to go.

What's more important is that, based on my reading of the "What's new in Propel 1.5" documentation, Propel has not adopted the flush()/clear() technique that makes Doctrine 2.0 so fast.

Propel 1.5 seems to have caught up with Doctrine 1.3, no more or less.

Doctrine 2.0 does need to do a good job of replacing the behaviors feature it will apparently abandon, especially since Propel 1.5 has a number of behaviors (again imitating several seen in Doctrine). I look forward to hearing Jon articulate his plans for things like nested sets, sortable, and timestampable tables in Doctrine.

(Incidentally, limitations of Propel's nested set feature are one of the reasons why we didn't choose it for Apostrophe. A nested set without a level column can't be used to efficiently fetch children of a page only to a certain depth. This is a potentially fatal flaw in a CMS application with thousands of pages.)


