Wednesday, September 29, 2010

Storing data as JSON in a field VS multiple fields

I've started to convert a CMS I'm maintaining at work to XPages. I'll probably do it over a longer time span, when I'm not doing other more important work. In the process, I'm brainstorming a little bit on how to improve some of the things in the application.

One of the things our customers request is more fields for links/the ability to sort them (they are displayed in "categorized" sections on the right side of the page in read mode). The form currently have 20 rows of links. There are five fields for each link (hidden?, category, title, url, target). This takes up quite a bit of real estate on the form.

In the XPages version, I'm considering changing the way links are stored. I'm thinking about storing them in a single field as JSON. I've made a list of pros/cons. There's no traditional Notes interface for the app, so I don't have the need for the values to be "notes view compatible".
Less fields to deal with
Takes up a lot less real estate on the form/xpage
Easy to change the order of the link values
Adds processing overhead(?), as the JSON has to be converted from/to JS objects when storing/retrieving
More complex
Needs a widget/logic to add/edit/remove
Currently, I'm in favor of implementing the change. I might implement this for other "repeating" fields as well if it works out well. If you can see any more pros/cons, please feel free to leave a comment :)


Jerry Carter said...

Hi Tommy,

I'm contemplating something similar with XML at the moment. My thought was to store the XML in one field, having parsed it when received to crate a field for each of my different node types. Caveat being that it is simple predefined XML that won't blow things up.

That's the big draw back that comes to my mind is that you wind up with JSON or XML that doesn't play nice with a text field. Can't remember off hand in field length limits are removed in 8.5 or not but out of old habit this is a concern to me. Meanwhile, rich text field handling has improved in the last few revisions to make it a viable data store for potentially lengthy JSON or XML.

If you find you have performance overhead parsing your data with LotusScript / SSJS, try Java. I had some seemingly simple functions like URLDecode/Encode that were LS based improve significantly when I moved to Java.

Good topic to think on. Thanks!

Tim Tripcony said...

I've been playing a bit in my spare time with a "secret" application (I'll reveal more about this when the time is right) that stores what would be a document collection in a traditional Notes application as a single document, and what would traditionally be a separate document in incrementally named items (i.e. link_1, link_2, etc.), with the value of each item stored as JSON. Thus far I've been extremely satisfied with the results.

Because, as you mentioned, this data doesn't make sense to be displayed in views, I don't need these to be summary items, so I specifically set summary to false when storing the value... this, in turn, increases the amount of data I can store in each item, so the concern that Jerry expressed hasn't bitten me yet.

Jerry Carter said...

Hi Tim,

I was doing something similar a while back and will tell you what did bite me - querying the data. Doing field level queries gets tricky. If you're building a class to take over Domino functions for parsing and searching your JSON objects stored in the document, you might have a way around this and these days performance there might not be a problem. My big problem was that when I designed the app, nobody wanted reports. Then they wanted reports, and calculations, and I didn't yet have a set of tools developed to decode the data so instead of just running normal / ftsearch, I had to parse and process everything to get anything. The tools I came up with for that instance eventually helped some but being as it was R5 and all LS, performance was pretty poor. As you say, there are some things you know up front you won't need, so hopefully this will not be a concern either.

Tommy Valand said...

@Jerry: I don't know about writing XML that doesn't blow up, but writing JSON that doesn't blow up should be quite easy if you're working with a solid JSON serializer/parser.

@Tim: Are you making a simplified document-based CouchDB emulator? :P

It would be great if IBM could be able to implement an alternate, more flexible storage solution for XPages (MongoDB/CouchDB). It's probably a herculean task, but one can dream :)

Dan Sickles said...

"more flexible storage solution for XPages (MongoDB/CouchDB). It's probably a herculean task"

Adding XML as a native type in DB2 was a herculean tasks but they did it. It adds a hierarchical on-disk structure.

Even if they don't enhance the NSF ODS for JSON, they could give us a nice OO API for accessing it in plain ol' Notes fields. The view indexer could use that API too ;-)

Tommy Valand said...

@Dan: I have no idea on NSF vs NoSQL (e.g. CouchDB) performance, but I would think having a storage solution that was built for the web would be more efficient/better in the long run if implemented properly, than sticking with NSF.

If IBM still uses a storage solution with 16k(with summary)/32k limited fields by 2015, I would not surprised, but I would be disappointed.

Erik Brooks said...

@Tim - Sounds like somebody might be tackling JOINs in NSF? :-)

Patpicos said...

This is a very interesting idea. Short of creating 1 doc for each link and have the fields on that form....u have to create a bunch of fields on a form and its a pain to keep each synchronized.
Personally its one of the most annoying thing with domino dev.

Erik Brooks said...

Oh, I realized I didn't add my thoughts:

We've got a class here for "big" data. It can read/write huge globs of data to multiple fields -- basically a "bigReplaceItemValue()" and "bigGetItemValue()". It handles removal of Summary flags, and can fallback to RichText in extreme cases.

If you're only setting/reading pure text, the AppendText and GetUnformattedText RT methods are very reliable.

On the topic of large numbers of fields: you have to keep in mind UNK table space (~26k fields max in the entire NSF.) There's also a limit of ~2000-3000 fields visible in the properties window for a doc -- you'll have to access the other fields programmatically.