Implementing Java Persistence with MongoDB and NoSQL Databases

From Opentaps Wiki
Revision as of 19:24, 25 July 2012 by Oandreyev (talk | contribs) (Switching Persistence Methods)

Why NoSQL Databases?

Wow, opentaps Notes is such a useful application! If only I could use it to take notes on my:

  • Customers?
  • Suppliers?
  • Orders?
  • Quotes?
  • Products?
  • Articles?
  • Blog Posts?
  • Recipes?
  • Comic books?

The list goes on and on. The fact is, you probably want to write notes on anything, don't you?

But a relational database is not so flexible. You have to define all the fields of your tables in advance, before you can use them. So, you're stuck with three options, each with its own problems:

  1. You can define all the fields that your notes are related to. For example, you can add orderId, customerId, quoteId, productId, etc. fields to your Note table. This approach is either very limiting -- you can only use notes for those fields that you've defined -- or very inefficient -- you will define hundreds of optional fields, most of which are never used. In either case, if you want to attach notes to a new kind of thing, you have to add a new field first.
  2. You can create a fully normalized data model with an additional table to join your Notes to things like customers, orders, quotes. (This is what the data model in opentaps 1.x does.) This is very flexible, but querying gets difficult very quickly. For example, it's easy to find all the notes of customer X or quote Y. But what if I want only the notes that are relevant to customer X and quote Y and order Z? Pretty soon you'll be creating a data warehouse for your notes just so you can do these queries.
  3. You can hack it by adding unspecified attribute-value pairs for your notes, and then tell yourself "Remember that attribute1 is the order number, attribute2 is the customer number, etc." This may scale programmatically (Yay! No complex queries or nearly empty tables), but it won't scale programmer-matically (Did he say attribute1 or attribute10 was the order number? I thought it was attribute10 too...)

Associating notes to other data is a trivial example. When you have a large scale enterprise system that combines data from many parts of your enterprise, choices like these would force either serious limitations or complexity, and often both, on the system.

This is where a NoSQL database like MongoDB comes in. Because it is schema-free, you are not constrained by an initial specification of the model. If new fields are needed, they can be added as you need them.
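To make the contrast concrete, here is a minimal sketch in plain Java that uses a Map<String, Object> as a stand-in for a schema-free document. It is not opentaps code and does not use the MongoDB driver: the class name SchemaFreeNotesSketch, the helpers note and find, and the "customer-only note" entry are all made up for illustration. It shows that notes with different fields can sit side by side, and that the "customer X and quote Y and order Z" question becomes a single key/value match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema-free collection: each note is just a map
// of field names to values, so notes do not have to share the same fields.
final class SchemaFreeNotesSketch {

    // Builds a note document from its text plus alternating key/value pairs.
    static Map<String, Object> note(String text, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("keyValues must come in pairs");
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("noteText", text);
        for (int i = 0; i < keyValues.length; i += 2) {
            doc.put(keyValues[i], keyValues[i + 1]);
        }
        return doc;
    }

    // Returns every note whose fields equal ALL of the criteria (a logical AND).
    static List<Map<String, Object>> find(List<Map<String, Object>> notes,
                                          Map<String, Object> criteria) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : notes) {
            if (doc.entrySet().containsAll(criteria.entrySet())) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> notes = new ArrayList<>();
        notes.add(note("test another note"));
        notes.add(note("some note with many attributes",
                "customer", "555", "order", "12345", "quote", "98765"));
        notes.add(note("customer-only note", "customer", "555"));

        // "Notes relevant to customer X and order Z and quote Y" in one query.
        Map<String, Object> criteria = new LinkedHashMap<>();
        criteria.put("customer", "555");
        criteria.put("order", "12345");
        criteria.put("quote", "98765");

        System.out.println(find(notes, criteria));
    }
}
```

No join table and no pre-declared columns are involved; a note that lacks a queried field simply does not match.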

Implementing Persistence with MongoDB

First, we created a new repository bundle for MongoDB, to separate it from the openJPA/MySQL bundle. In the pom.xml, we specified that we need the mongodb driver:


<!-- ... -->

Then we implemented the repository to use MongoDB. Some key things to note:

  • com.mongodb.Mongo is analogous to the database server
  • com.mongodb.DB is analogous to a particular database on your server
  • com.mongodb.DBCollection is a collection of data in your database and analogous to a table in a relational database, though of course it is not a table and not limited by a predefined schema
  • com.mongodb.BasicDBObject is a database object and analogous to a row in a relational database table. Of course, it is also not limited by a predefined schema.

Persisting a Note looks like this:

        DBCollection coll = getNotesCollection();
        // this converts a Note to a DBObject
        BasicDBObject noteDoc = noteToDbObject(note);
        // now we put additional fields here
        noteDoc.put(Note.Fields.dateTimeCreated.getName(), note.getDateTimeCreated());
        // more of the same, then insert it into the DBCollection

Look at how ridiculously easy it is to accommodate custom fields:

        for (String field : note.getAttributeNames()) {
            noteDoc.put(field, note.getAttribute(field));
        }

That's it -- there is no additional code needed to alter tables, etc.
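The conversion helpers themselves are not shown on this page, so the following is only a guess at their general shape, written in plain Java so it runs without the MongoDB driver. SimpleNote, NoteDocumentSketch, noteToDocument and documentToNote are hypothetical names, and a Map<String, Object> stands in for BasicDBObject (which, in the legacy driver, is itself a LinkedHashMap<String, Object> subclass).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical converter between a note object and a schema-free document.
final class NoteDocumentSketch {
    static final String NOTE_TEXT = "noteText";

    // Fixed fields first, then each custom attribute under its own name.
    static Map<String, Object> noteToDocument(SimpleNote note) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(NOTE_TEXT, note.getNoteText());
        for (String field : note.getAttributeNames()) {
            doc.put(field, note.getAttribute(field));
        }
        return doc;
    }

    // Reverse direction: any key that is not a known fixed field is treated
    // as a custom attribute.
    static SimpleNote documentToNote(Map<String, Object> doc) {
        SimpleNote note = new SimpleNote();
        note.setNoteText((String) doc.get(NOTE_TEXT));
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (!NOTE_TEXT.equals(entry.getKey())) {
                Object value = entry.getValue();
                note.setAttribute(entry.getKey(), value == null ? null : value.toString());
            }
        }
        return note;
    }

    public static void main(String[] args) {
        SimpleNote note = new SimpleNote();
        note.setNoteText("some note with many attributes");
        note.setAttribute("customer", "555");
        note.setAttribute("order", "12345");
        System.out.println(noteToDocument(note));
    }
}

// Hypothetical, simplified note: one fixed field plus free-form attributes.
final class SimpleNote {
    private String noteText;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    String getNoteText() { return noteText; }
    void setNoteText(String noteText) { this.noteText = noteText; }
    Set<String> getAttributeNames() { return attributes.keySet(); }
    String getAttribute(String name) { return attributes.get(name); }
    void setAttribute(String name, String value) { attributes.put(name, value); }
}
```

One thing this sketch makes visible: a custom attribute that happens to be named like a fixed field (here "noteText") would overwrite it, so a real implementation probably needs to reserve or reject those names.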

Searching looks like this:

        DBCollection coll = getNotesCollection();
        BasicDBObject query = new BasicDBObject(NoteMongo.MONGO_ID_FIELD, new ObjectId(noteId));
        DBObject noteDoc = coll.findOne(query);
        return dbObjectToNote(noteDoc);

Note that the query parameter is a BasicDBObject of key/value pairs.
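Two details of the lookup above are easy to miss: in the MongoDB Java driver, the ObjectId constructor throws an IllegalArgumentException when the string is not a valid id, and findOne returns null when nothing matches. The sketch below imitates both cases in plain Java so it can run without a server. FindOneSketch and its methods are hypothetical, a List of Maps stands in for the DBCollection, ids are kept as plain strings, and the id field name "_id" is an assumption about what NoteMongo.MONGO_ID_FIELD holds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory imitation of a key/value lookup by id.
final class FindOneSketch {
    // An ObjectId is written as 24 hexadecimal characters.
    private static final Pattern OBJECT_ID_HEX = Pattern.compile("[0-9a-fA-F]{24}");

    static boolean looksLikeObjectId(String id) {
        return id != null && OBJECT_ID_HEX.matcher(id).matches();
    }

    // First document whose fields equal every query entry, or null if none does.
    static Map<String, Object> findOne(List<Map<String, Object>> collection,
                                       Map<String, Object> query) {
        for (Map<String, Object> doc : collection) {
            if (doc.entrySet().containsAll(query.entrySet())) {
                return doc;
            }
        }
        return null;
    }

    // Returns null both for a malformed id and for an id that is not stored,
    // so callers have a single "not found" case to handle.
    static Map<String, Object> findNoteById(List<Map<String, Object>> collection,
                                            String noteId) {
        if (!looksLikeObjectId(noteId)) {
            return null;
        }
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("_id", noteId);
        return findOne(collection, query);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "4fea13964728ec978c290124");
        doc.put("noteText", "test another note");
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(doc);

        System.out.println(findNoteById(collection, "4fea13964728ec978c290124"));
        System.out.println(findNoteById(collection, "not-an-id"));
    }
}
```

In the real repository, the equivalent guard would be a null check on the result of findOne before it is converted back to a Note.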

Take a look in the modules/notes/impl/repository.mongo/src/java directory for more details.

Configuring MongoDB

MongoDB configuration is done with blueprint.xml (in the repository.mongo/src/main/resources/OSGI-INF/blueprint/ directory) using straightforward dependency injection:

    <bean id="mongoUri" class="com.mongodb.MongoURI">
        <argument value="mongodb://"/>
    </bean>

    <bean id="mongoDb" class="com.mongodb.Mongo">
        <argument ref="mongoUri"/>
    </bean>

    <bean id="noteRepositoryImpl" class="org.opentaps.notes.repository.impl.NoteRepositoryImpl">
        <property name="mongo" ref="mongoDb"/>
        <property name="noteFactory" ref="NoteFactoryBean"/>
    </bean>

We use the MongoURI class to configure the URI, use the Mongo class to set the database server from the URI, and then pass mongo to our NoteRepositoryImpl class.
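If Blueprint is unfamiliar, it may help to see roughly what the container does with that XML: an argument element becomes a constructor argument, and a property element becomes a call to the matching setter. The sketch below spells this out by hand. Every class in it (WiringSketch and the Fake... types) is a stand-in invented so the example compiles without the MongoDB driver or an OSGi container, and the URI "mongodb://localhost" is only a placeholder.

```java
// Hand-written equivalent of the dependency injection described in the XML.
final class WiringSketch {

    // Stand-in for com.mongodb.MongoURI: built from a URI string.
    static final class FakeMongoUri {
        private final String value;
        FakeMongoUri(String value) { this.value = value; }
        String getValue() { return value; }
    }

    // Stand-in for com.mongodb.Mongo: built from the URI object.
    static final class FakeMongo {
        private final FakeMongoUri uri;
        FakeMongo(FakeMongoUri uri) { this.uri = uri; }
        FakeMongoUri getUri() { return uri; }
    }

    // Stand-in for the bean referenced as NoteFactoryBean.
    static final class FakeNoteFactory { }

    // Stand-in for NoteRepositoryImpl: dependencies arrive through setters.
    static final class FakeNoteRepository {
        private FakeMongo mongo;
        private FakeNoteFactory noteFactory;
        void setMongo(FakeMongo mongo) { this.mongo = mongo; }
        void setNoteFactory(FakeNoteFactory noteFactory) { this.noteFactory = noteFactory; }
        FakeMongo getMongo() { return mongo; }
        boolean isReady() { return mongo != null && noteFactory != null; }
    }

    static FakeNoteRepository wire(String uri) {
        FakeMongoUri mongoUri = new FakeMongoUri(uri);   // <argument value="..."/>
        FakeMongo mongoDb = new FakeMongo(mongoUri);     // <argument ref="mongoUri"/>
        FakeNoteRepository repository = new FakeNoteRepository();
        repository.setMongo(mongoDb);                    // <property name="mongo" .../>
        repository.setNoteFactory(new FakeNoteFactory()); // <property name="noteFactory" .../>
        return repository;
    }

    public static void main(String[] args) {
        FakeNoteRepository repository = wire("mongodb://localhost");
        System.out.println("repository ready: " + repository.isReady());
    }
}
```

Because the repository only sees what is injected, pointing it at a different server means changing the XML, not the Java code.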

Accessing Your Data

You must download and install MongoDB first and start the server to use opentaps 2 Notes with MongoDB. See tutorials for more details.

Once you have created some notes, you can start a mongodb shell and do a query to see them:

$ ./mongo
MongoDB shell version: 2.0.6
connecting to: test
> use notedb;
switched to db notedb
> db.notes.find().forEach(printjson);
{
	"_id" : ObjectId("4fea13964728ec978c290124"),
	"noteId" : null,
	"noteText" : "test another note",
	"createdByUserId" : null,
	"userIdType" : null,
	"clientDomain" : "localhost",
	"dateTimeCreated" : ISODate("2012-06-26T19:55:02.542Z"),
	"sequenceNum" : NumberLong(1)
}
{
	"_id" : ObjectId("4fea13d64728ec978c290125"),
	"noteId" : null,
	"noteText" : "some note with many attributes",
	"createdByUserId" : null,
	"userIdType" : null,
	"clientDomain" : "localhost",
	"customer" : "555",
	"order" : "12345",
	"quote" : "98765",
	"dateTimeCreated" : ISODate("2012-06-26T19:56:06.611Z"),
	"sequenceNum" : NumberLong(2)
}

In the second object, the attributes customer, order, and quote are stored as they were entered -- not as "attribute1, 2, 3", and not with other unnecessary attributes either. This is the advantage of a schema-free database: data is stored as it is, without the tricks and hacks we're used to with SQL databases.

Switching Persistence Methods

By default, MongoDB is used for persisting notes. If you want to switch to using openJPA and MySQL, copy the openJPA implementation first:

$ cp modules/notes/impl/repository.jpa/target/org.opentaps.notes.repository.impl.jpa-2.0.1-SNAPSHOT.jar ~/geronimo-tomcat7-javaee6-3.0.0/hotbundles/

You will then have both bundles loaded, and you can switch between the two persistence backends by stopping one and starting the other. For example, in this case, the openJPA/MySQL bundle (#1700) is stopped, and the MongoDB bundle (#1674) is running (screenshot: Opentaps-2-switch-openjpa-mongo-bundles.png).

Style Considerations

If you love key-value programming, you're going to love MongoDB. Any time you need a new data field, just use a new key. You don't even have to modify your table with SQL any more. Think about how much faster you'll be able to code!

But if you're the project manager/architect, you're probably thinking -- "Wait, wait, wait! How do I make sure everybody is using the same field names, let alone keeping my code object-oriented?" Good point.

On larger (i.e. more than one programmer) projects, we recommend that you declare the fields which will be used by more than one person as Java class members, instead of using literal strings. For example, in the notes code you see:

    private String noteId;
    private String noteText;

You see us accessing the field like this:

        note.setNoteText((String) noteDoc.get(Note.Fields.noteText.getName()));

We are using set/get methods to access the members of the NoteMongo class, and we are using the declared field names to interact with the MongoDB BasicDBObject. This will lengthen your development cycle slightly, since you will have to rebuild your code when important fields are added to your classes, but it offers a few big advantages:

  1. It communicates a standard set of fields all the developers should be using.
  2. It helps eliminate spelling errors like "orrderId"
  3. It allows you to use object inheritance on fields whose values may need to be overridden in a child class. For example, you may have a field called "totalAmount" which is stored in the database, but a child class may want to override it and calculate the total amount. If all your data is accessed as key-value pairs, this would be impossible, and you will have a mess when you try to override the meaning of the parent class.
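The three points above can be sketched in a few lines of plain Java. This is an illustration, not opentaps code: FieldNamesSketch, Order and ItemizedOrder are hypothetical, the Fields enum is only a guess at how something like Note.Fields might be declared, and a Map stands in for the BasicDBObject.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldNamesSketch {

    // Shared field names are declared once; a misspelling such as
    // Fields.orrderId would fail to compile instead of silently storing a new key.
    enum Fields {
        orderId, totalAmount;

        String getName() { return name(); }
    }

    // Parent class: totalAmount is read straight from the stored document.
    static class Order {
        protected final Map<String, Object> doc;

        Order(Map<String, Object> doc) { this.doc = doc; }

        String getOrderId() {
            return (String) doc.get(Fields.orderId.getName());
        }

        BigDecimal getTotalAmount() {
            return (BigDecimal) doc.get(Fields.totalAmount.getName());
        }
    }

    // Child class: ignores the stored value and calculates the total instead.
    static class ItemizedOrder extends Order {
        private final BigDecimal[] itemAmounts;

        ItemizedOrder(Map<String, Object> doc, BigDecimal... itemAmounts) {
            super(doc);
            this.itemAmounts = itemAmounts.clone();
        }

        @Override
        BigDecimal getTotalAmount() {
            BigDecimal sum = BigDecimal.ZERO;
            for (BigDecimal amount : itemAmounts) {
                sum = sum.add(amount);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(Fields.orderId.getName(), "12345");
        doc.put(Fields.totalAmount.getName(), new BigDecimal("100.00"));

        Order stored = new Order(doc);
        Order computed = new ItemizedOrder(doc,
                new BigDecimal("40.00"), new BigDecimal("2.50"));

        System.out.println(stored.getTotalAmount());
        System.out.println(computed.getTotalAmount());
    }
}
```

Callers that go through getTotalAmount() get the right behavior for either class; callers that read doc.get("totalAmount") directly would bypass the override, which is the mess the third point warns about.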

For more casual data fields which are used by just one developer, it's probably fine to use literal strings to store and retrieve them. MongoDB won't mind. If more people start using the data, you can always refactor at that time and make them class members.