The MongoDB OML allows storing objects which are subclasses of the MongoDBObject class in a MongoDB database. However, only the object attributes defined using the facets described in this section are actually stored in the database. All other object facets are simply discarded when the object is stored in the database.
All of the special MongoDB OML facets can be imported from facets.extra.mongodb.api and have names of the form DBxxx to help make their special database semantics stand out better when reading a class definition. In addition, most of the facets have semantics identical to other, standard facets that they are based on. In cases such as this, the facet name is formed simply by prepending the DB prefix to the standard facet name (e.g. DBStr).
The remainder of this section provides a brief description of all of the available MongoDB OML facets. A follow-on section then provides additional details about those MongoDB OML facets that have more complex database specific semantics.
Any simple value (Bool, Int, Float, Str, ...). It has the same semantics as the standard Any facet.
Note however that it should not be used for saving object references. Use the DBObject, DBRef or DBLink facet instead.
A list of heterogeneous simple data values (Bool, Int, Float, Str, ...). It has the same semantics as the standard List facet.
Note however that it should not be used to store object references. Use a DBObjects, DBRefs or DBLinks facet instead.
A dictionary of heterogeneous simple data values. It has the same semantics as the standard Dict facet.
Note however that it should not be used to store object references. Currently there is no way using the MongoDB OML to store database object references using a dictionary-based value.
A set of heterogeneous simple data values. It has the same semantics as the standard Set facet.
Note however that it should not be used to store object references. Currently there is no way using the MongoDB OML to store database object references using a set-based value.
If you’ve already read the preceding section on the MongoDB OML facets, you’ve probably noticed the perhaps confusing variety of facets provided for storing database object references. In this section we hope to dispel as much of that confusion as possible by explaining in detail the various nuances between the various object reference facets and why they exist.
As you’ve presumably already gathered, there are three basic object reference types: DBObject, DBRef and DBLink, along with their corresponding list forms: DBObjects, DBRefs, DBLinks.
In each case the difference between the singular and plural forms is the same. The singular form holds a single object reference, while the list form holds a list of object references which may contain zero, one or more object references. That difference affects how you use them within your Python code, but as far as the MongoDB OML layer is concerned, the singular and plural form both have the same semantics. Since it is the MongoDB OML semantics we are concerned with in this section, we will not make any further distinction between the singular and plural forms in the remainder of this discussion.
So now the question becomes what is the semantic difference between DBObject, DBRef and DBLink. All of them accept the same types of values, namely MongoDBObject (or one of its subclasses) instances. The differences lie in how they are stored in the underlying MongoDB database and how the MongoDB OML layer handles them.
If we were simply pickling a large object hierarchy and storing the result in a file, we wouldn’t need to make any distinction between a DBObject, DBRef or DBLink. We would simply serialize each object in the object graph as we encountered it, and write the entire resulting byte stream out to a file. When we unpickle the object graph, we would simply reverse the process by reading the entire file and reconstructing the original object graph.
The problem with this approach is time and memory. As the size of the object graph increases, the time required to load, and amount of memory needed to hold, all of the objects might eventually reach unacceptable levels.
This is one of many reasons why developers often turn to storing their data in a database of some sort. Databases are typically designed to store huge amounts of data and provide quick access to only those parts of the data needed by an application at any particular instant.
One of the major design goals of the MongoDB OML is to provide quick access to huge amounts of stored data while still preserving the convenience of defining and using an object oriented data model. And the approach it takes is to recognize that there is no “one size fits all” solution to mapping an object model onto an underlying database, and to provide several solutions that developers can choose from when creating their data model. Hence the need for the DBObject, DBRef and DBLink facets. Each provides nearly identical data modeling semantics, but provide different performance and size trade offs when looked at from the underlying database storage layer point of view.
So in what follows we’ll be looking at various common use cases and what effect the different MongoDB OML object reference facets might have when applied in those cases.
As we’ve seen, owned data is logically part of the owning object and is deleted when the owning object is deleted. However, in practical terms, we can make a further distinction between two different types of owned data: immediate and deferred.
As an example, let’s take the case of an application containing information about a server farm. Each object representing a server might contain information about the server’s hostname, IP address and a log of all server activity for the last month.
Now each of these three pieces of information is data owned by the server object, since it all applies to the specific server it is associated with. If we ever delete a server object, we should delete all of its owned data as well. The problem is that the log data may be very large and seldom used by most applications using the MongoDB database containing the server data, and so it seems very inefficient in terms of time and memory to load all of the log data every time an application instantiates a server object from the database.
The MongoDB OML helps you address this problem by providing both immediate and deferred object references. An immediate reference is an object whose data is loaded into storage at the same time its owning object is loaded. A deferred reference is an object which is not loaded into storage until the application code that loads the owning object actually references the deferred object.
With this distinction in mind, it should now be clear that we would probably want the server hostname and IP address to be defined as immediate data, since they are fairly small and used often by applications, while the server log data should be deferred, since it is large and infrequently used by applications.
In MongoDB OML, all facets except for DBRef(s) and DBLink(s) are immediate mode facets. Both DBRef(s) and DBLink(s) facets are deferred mode, meaning that their associated object data is not loaded until explicitly referenced by application code.
So now we can proceed with a possible definition of our hypothetical server object classes:
class LogData ( MongoDBObject ):
data = List( Str )
class Server ( MongoDBObject ):
hostname = DBStr
ip_address = DBStr
log = DBLink( LogData )
We’ve used a DBLink facet rather than a DBRef facet because the LogData object is owned by the Server object, and we want it to be deleted automatically when a Server object is deleted. But because DBLink is also a deferred facet, we know that the LogData object will never be loaded into memory unless some application code explicitly references it. For example:
for item in server.log.data:
print item
Now that we’ve explained the distinctions between owned and shared data, and between immediate and deferred access, it might be useful to provide a table showing how these concepts apply to the various MongoDB OML facets:
Facet | Data Is | Access Is |
---|---|---|
DB... | Owned | Immediate |
DBObject | Owned | Immediate |
DBObjects | Owned | Immediate |
DBLink | Owned | Deferred |
DBLinks | Owned | Deferred |
DBRef | Shared | Deferred |
DBRefs | Shared | Deferred |
where DB... refers to the DBAny, DBBool, DBInt, DBFloat, DBStr, DBList, DBDict, DBSet and DBRange facets.
Finally, let’s finish up this section with a couple of additional tips about the trade-offs in using the various MongoDB OML object reference types:
The facets of DBObject(s) object references can be used in query strings (e.g. “employee.first_name == ‘Jack’”, where first_name is a facet of the employee DBObject facet), while the facets of DBRef(s) and DBLink(s) object references cannot.
The reason for this is that DBObject(s) facets store their data directly in the owning object’s MongoDB database representation, where it can be accessed directly by the MongoDB query engine. Both DBRef(s) and DBLink(s) facets only store object reference information in the referencing object’s MongoDB database representation. The actual object data is stored separately in the database and is not directly accessible to the query engine.
Since DBLink(s) facets defer the storage and time penalty of loading referenced objects until they are actually used by an application, it might seem like using DBLink(s) facets would always be more efficient than using DBObject(s) facets. In practice, the actual storage and time efficiency probably depends heavily on the size and access patterns of the data involved.
If any of the object data is always accessed by applications, then deferring the object load may actually be slower than loading it with the owning object since multiple database requests are required to load the separate objects. Plus there is the additional database storage overhead required for saving the database object reference in the owning object.