In an earlier post I noted that Berkeley DB Java Edition cleaner performance had improved significantly in release 5.x. From an Oracle NoSQL Database point of view, this is important because Berkeley DB Java Edition is the core storage engine for Oracle NoSQL Database.
Many contemporary NoSQL databases use log-based (i.e., append-only) storage systems, and it is well understood that these architectures also require a "cleaning" or "compaction" mechanism (effectively a garbage collector) to free up unused space. Ten years ago, when we set out to write a new Berkeley DB storage architecture for the BDB Java Edition ("JE"), we knew that the corresponding compaction mechanism would take years to perfect. "Cleaning", or GC, is a hard problem to solve, and it has taken all of those years of experience, bug fixes, tuning exercises, user deployment, and user feedback to bring it to the mature point it is at today. Reports like Vinoth Chandar's, in which he observes a 20x improvement, validate the maturity of JE's cleaner.
Cleaner performance has a direct impact on predictability and throughput in Oracle NoSQL Database. A cleaner that is too aggressive will consume too many resources and reduce system throughput. A cleaner that is not aggressive enough will let obsolete data accumulate, so disk usage becomes less and less efficient over time. It has to:
- work well out of the box, and
- be configurable so that customers can tune it for their specific workloads and requirements.
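To make the trade-off concrete, here is a minimal sketch of a utilization-driven cleaner for an append-only log. It is a toy model, not JE's actual algorithm: the class `CleanerSketch`, its `LogFile` and `Result` types, and the `minUtilization` argument are all hypothetical names chosen for illustration (the threshold is only loosely in the spirit of JE's `je.cleaner.minUtilization` setting). The cleaner repeatedly picks the least-utilized file, copies its live bytes into a fresh file, and deletes the original, until overall utilization reaches the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of a utilization-driven log cleaner. NOT JE's real implementation. */
final class CleanerSketch {

    /** One append-only log file: bytes written and bytes still live (not obsolete). */
    static final class LogFile {
        final long totalBytes;
        final long liveBytes;

        LogFile(long totalBytes, long liveBytes) {
            this.totalBytes = totalBytes;
            this.liveBytes = liveBytes;
        }

        double utilization() {
            return (double) liveBytes / totalBytes;
        }
    }

    /** Outcome of one cleaning run: work done versus disk space left in use. */
    static final class Result {
        final long bytesMigrated;
        final long diskBytes;

        Result(long bytesMigrated, long diskBytes) {
            this.bytesMigrated = bytesMigrated;
            this.diskBytes = diskBytes;
        }
    }

    /** Overall utilization: live bytes divided by total bytes on disk. */
    static double utilization(List<LogFile> files) {
        long live = 0;
        long total = 0;
        for (LogFile f : files) {
            live += f.liveBytes;
            total += f.totalBytes;
        }
        return total == 0 ? 1.0 : (double) live / total;
    }

    /** Cleans least-utilized files first until overall utilization reaches the threshold. */
    static Result clean(List<LogFile> input, double minUtilization) {
        List<LogFile> files = new ArrayList<>(input);
        long migrated = 0;
        while (!files.isEmpty() && utilization(files) < minUtilization) {
            LogFile victim = files.stream()
                    .min(Comparator.comparingDouble(LogFile::utilization))
                    .get();
            if (victim.liveBytes == victim.totalBytes) {
                break; // every file is fully live; nothing left to reclaim
            }
            files.remove(victim);
            if (victim.liveBytes > 0) {
                // Live data is re-appended to a new, fully utilized file.
                files.add(new LogFile(victim.liveBytes, victim.liveBytes));
                migrated += victim.liveBytes;
            }
        }
        long disk = 0;
        for (LogFile f : files) {
            disk += f.totalBytes;
        }
        return new Result(migrated, disk);
    }

    public static void main(String[] args) {
        List<LogFile> files = Arrays.asList(
                new LogFile(100, 90), new LogFile(100, 50), new LogFile(100, 10));
        for (double threshold : new double[] {0.5, 0.6, 0.9}) {
            Result r = clean(files, threshold);
            System.out.println("minUtilization=" + threshold
                    + " migrated=" + r.bytesMigrated + " disk=" + r.diskBytes);
        }
    }
}
```

With these three sample files (300 bytes on disk, 150 of them live), a threshold of 0.5 does no work at all, 0.6 migrates 10 bytes and leaves 210 on disk, and 0.9 migrates 60 bytes and leaves 160. A higher threshold buys a smaller disk footprint at the cost of more copying, which is the tension described above; a real cleaner also has to do this concurrently with application traffic, which this sketch ignores.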
The JE cleaner has been field-tested in production for many years, managing instances with hundreds of GBs to TBs of data. The maturity of the cleaner and of the entire underlying JE storage system is one of the key advantages that Oracle NoSQL Database brings to the table: we haven't had to reinvent the wheel.