Why and how we use MongoDB at Licenseware

September 19, 2023 / Alex Cojocaru

When it comes to selecting the right database layer for your application, MongoDB is undeniably a polarizing technology. It offers an accessible API, commendable performance, and the enticing prospect of eliminating the challenges associated with managing foreign keys, a common pain point in relational database systems. However, it does arrive with a substantial list of caveats and pitfalls if not wielded correctly, potentially resulting in a subpar developer experience and the costly ordeal of refactoring when transitioning to an RDBMS.

Our decision to adopt MongoDB was, at its core, similar to the path taken by many startups. We needed a robust database layer but had an aversion to the incessant schema changes that relational systems tend to demand. MongoDB resolved this dilemma. While some may argue that IT asset management (ITAM) data is inherently highly relational, we found a workable approach: storing related data within a single document, using nested fields. This allowed us to enjoy the best of both worlds.
Another compelling reason for embracing a document-based database was our startup’s perpetual need for agile data models, particularly in reporting.

Let’s delve into what we’ve found beneficial and the lessons we’ve gleaned from our MongoDB journey:

1. Deliberate Updates

Initially, we had high hopes of updating thousands of records at once, matching them through compound indexes such as device name plus database name. However, this approach proved counterproductive and made processing painfully slow. We’ve since adopted two strategies, depending on the application. For some, we perform only inserts and then select the latest records using window functions, which also preserves the history of changes. For very large datasets, we delete the affected records and insert fresh ones, which significantly speeds up processing without overly complicating our application code.
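To make the two strategies concrete, here is a minimal sketch rather than Licenseware's actual code. It assumes MongoDB 5.0 or later (for the `$setWindowFields` stage) and a PyMongo-style collection object. The field names, the temporary `_row` field, and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def latest_records_pipeline(
    partition_fields: list[str], timestamp_field: str
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that keeps the newest document per key.

    Documents are numbered inside each partition, newest first, and only
    the document numbered 1 survives. Older versions stay in the collection.
    """
    return [
        {
            "$setWindowFields": {
                # One partition per combination of the key fields.
                "partitionBy": {name: f"${name}" for name in partition_fields},
                "sortBy": {timestamp_field: -1},
                "output": {"_row": {"$documentNumber": {}}},
            }
        },
        {"$match": {"_row": 1}},
        {"$unset": "_row"},  # drop the helper field from the result
    ]


def delete_and_insert(
    collection: Any, scope: dict[str, Any], documents: Iterable[dict[str, Any]]
) -> int:
    """Replace every document matching `scope` with a fresh batch.

    `collection` is assumed to expose PyMongo's `delete_many` and
    `insert_many`. Returns the number of documents inserted.
    """
    docs = list(documents)
    collection.delete_many(scope)
    if docs:  # PyMongo rejects an empty insert_many call
        collection.insert_many(docs)
    return len(docs)
```

With PyMongo, the first function would be used as `collection.aggregate(latest_records_pipeline(["device", "database_name"], "updated_at"))`. Note that the delete-and-insert helper is not atomic as written; wrapping both calls in a transaction is one option if readers must never see the gap.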

2. Magnificent Aggregation Pipelines

Aggregation pipelines have emerged as our go-to tool for constructing report components. The range of possibilities is staggering, from straightforward grouping and filtering to intricate map-reduce-style operations. What sets MongoDB apart is the elegance of the code: a clean JSON document with a syntax that can only be described as beautiful. MongoDB Compass further simplifies the process, letting us define each stage individually and preview how the data is transformed at every step. As a long-time SQL user, I find MongoDB aggregation pipelines easier to write and maintain. Notably, instead of storing raw SQL code in unwieldy strings, our queries are structured as Python dictionaries, which makes syntax checking easier and lets us reference variables and functions directly within the query.
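As an illustration of pipelines written as plain Python data, the sketch below assembles a small report query from variables and helper functions. The collection layout (`tenant_id`, `vendor`, `updated_at`) and all function names are assumptions made for the example, not the real schema.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def since(cutoff: datetime) -> dict[str, Any]:
    """Reusable $match stage: keep documents updated at or after `cutoff`."""
    return {"$match": {"updated_at": {"$gte": cutoff}}}


def count_by(field: str) -> dict[str, Any]:
    """Reusable $group stage: count documents per distinct value of `field`."""
    return {"$group": {"_id": f"${field}", "total": {"$sum": 1}}}


def report_pipeline(
    tenant_id: str, cutoff: datetime, limit: int = 10
) -> list[dict[str, Any]]:
    """Top vendors by record count for one tenant.

    The pipeline is an ordinary list of dictionaries, so function arguments
    and helper functions slot in without any string formatting.
    """
    return [
        {"$match": {"tenant_id": tenant_id}},
        since(cutoff),
        count_by("vendor"),
        {"$sort": {"total": -1}},
        {"$limit": limit},
    ]
```

With PyMongo, the result would be passed straight to `collection.aggregate(report_pipeline("acme", datetime(2023, 1, 1)))`. Because the stages are regular Python objects, linters and unit tests can inspect them before anything reaches the database.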

3. Navigating Document Size and Aggregation Stage Limits

MongoDB imposes limits on document size and on the memory available to aggregation stages, so data extraction and data modeling both need a methodical approach. Consider the $unwind aggregation stage, a valuable tool for dealing with nested data. When it is applied to arrays with thousands of elements, MongoDB promptly reminds us to reassess our grain level or to reconsider whether the query really needs the entire dataset. The 16 MB document size limit, while seemingly sufficient, can be restrictive, especially when the prevailing instinct is to consolidate everything into a single document to avoid joins. We’ve tackled these limitations by changing the data model, storing one document for each entity that would otherwise be nested, or by exploring alternatives like DuckDB for storing and querying raw data. The key takeaway is to reserve our main MongoDB collections for data we genuinely require and routinely query.
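One way to picture the "one document per nested entity" change is the small transformation below. It is a sketch under assumed names: the `installations` array, the `parent_id` back-reference, and the function itself are hypothetical.

```python
from __future__ import annotations

from typing import Any


def split_nested(
    parent: dict[str, Any], array_field: str, parent_key: str = "_id"
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a document with a large nested array into separate documents.

    Returns the parent without the array, plus one child document per
    array element. Each child carries a `parent_id` reference so the
    relationship can still be resolved at query time.
    """
    slim_parent = {k: v for k, v in parent.items() if k != array_field}
    children = [
        {**child, "parent_id": parent[parent_key]}
        for child in parent.get(array_field, [])
    ]
    return slim_parent, children


device = {
    "_id": "srv-1",
    "name": "db-host",
    "installations": [{"product": "A"}, {"product": "B"}],
}
slim_device, installation_docs = split_nested(device, "installations")
```

Each child can then live in its own collection, which keeps every document far below the 16 MB limit and removes the need to `$unwind` a huge array just to read a few elements.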

4. Data Duplication Over Joins

Departing from the conventional wisdom of keeping data lean and normalized, we’ve embraced data duplication within MongoDB. While in the SQL world, a change from “Active” to “Enabled” in a “Statuses” table would require a single record update, in MongoDB, this necessitates modifying every record where the status equals “Active” to “Enabled.” It may seem cumbersome, but it aligns with our database optimization strategy geared toward expeditious read operations.
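The status rename mentioned above can be sketched as a single bulk update. The `status` field name and the helper are assumptions for illustration; `update_many` and `modified_count` are the standard PyMongo names.

```python
from typing import Any


def rename_status(
    collection: Any, old: str, new: str, field: str = "status"
) -> int:
    """Rewrite a duplicated value on every document that carries it.

    `collection` is assumed to be a PyMongo-style collection. Returns the
    number of documents the server reports as modified.
    """
    result = collection.update_many({field: old}, {"$set": {field: new}})
    return result.modified_count
```

A call such as `rename_status(devices, "Active", "Enabled")` touches every matching document, which is the write-side cost accepted in exchange for reads that never need a join.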

5. Adapting to Loose Schema Validation

MongoDB does not enforce a schema by default, so strict validation takes a backseat. You have the flexibility to define and evolve your data structures with a degree of freedom that might be unfamiliar to those accustomed to rigid relational databases. This liberty has its pros and cons.
Rather than relying on an object-document mapper (the MongoDB counterpart of an ORM) to enforce schema conformity, we’ve chosen to take control of data validation within our application code. It’s a conscious choice, one that places the responsibility squarely on our shoulders, and those of our developers, to ensure that data models align with our expectations.

To streamline this process, we use libraries like Marshmallow and Pydantic. These Python libraries make defining and validating data models a breeze. With them, we can define the structure of our data, set constraints, and validate incoming data before it ever touches the database. This approach ensures data integrity while affording us the flexibility to adapt our schemas as needed.
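The idea of validating before writing can be shown with the standard library alone. The sketch below deliberately uses a plain dataclass as a stand-in, not Marshmallow or Pydantic, which offer the same pattern with far less hand-written checking. The model, its fields, and the allowed statuses are hypothetical.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any

ALLOWED_STATUSES = {"Active", "Retired"}  # hypothetical value set


@dataclass(frozen=True)
class DeviceRecord:
    device_name: str
    database_name: str
    status: str = "Active"
    cpu_cores: int = 1

    def __post_init__(self) -> None:
        # Constraints are checked once, at construction time.
        for name in ("device_name", "database_name"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
        cores = self.cpu_cores
        if isinstance(cores, bool) or not isinstance(cores, int) or cores < 1:
            raise ValueError("cpu_cores must be a positive integer")


def to_document(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate incoming data and return a dict that is safe to insert.

    Unknown keys raise TypeError and constraint violations raise
    ValueError, so malformed input never reaches the database.
    """
    return asdict(DeviceRecord(**raw))
```

Only the output of `to_document` would be handed to an insert call. Changing the schema then means editing one class instead of altering a table.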

6. Prudent Use of Database Migrations

Database migrations are a familiar concept in the world of relational databases, where changes to the database schema necessitate careful planning and execution. While MongoDB takes a different approach to schema management, we’ve found a unique use case for database migrations in our MongoDB ecosystem. Rather than employing migrations to tweak database schemas, as is the norm in traditional databases, we’ve repurposed this tool to orchestrate changes within the data itself. This unconventional approach has proven valuable in several scenarios.
For instance, when a fundamental change in data structure is required, we turn to database migrations. These migrations serve as a mechanism to update and transform existing data to align with the new schema. It’s a way to ensure a smooth transition without compromising data integrity or causing data loss.
Additionally, database migrations become indispensable when we need to apply specific data transformations or updates across a large dataset. Whether it’s adjusting data formats, recalculating values, or reorganizing documents, migrations provide a structured and controlled means to enact these changes.
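A data migration of this kind can be modeled as an ordered list of document transformations, each applied at most once. The sketch below is illustrative only: the `schema_version` marker, the field names, and the two example steps are assumptions, not the actual migration tooling.

```python
from __future__ import annotations

from typing import Any, Callable

Migration = Callable[[dict], dict]


def rename_hostname(doc: dict[str, Any]) -> dict[str, Any]:
    """Step 1: restructure by moving `hostname` to `device_name`."""
    doc = dict(doc)  # work on a copy, never on the caller's document
    if "hostname" in doc:
        doc["device_name"] = doc.pop("hostname")
    return doc


def cores_to_int(doc: dict[str, Any]) -> dict[str, Any]:
    """Step 2: adjust a data format by storing `cpu_cores` as an integer."""
    doc = dict(doc)
    if "cpu_cores" in doc:
        doc["cpu_cores"] = int(doc["cpu_cores"])
    return doc


# Ordered (version, step) pairs; new steps are appended, never reordered.
MIGRATIONS: list[tuple[int, Migration]] = [
    (1, rename_hostname),
    (2, cores_to_int),
]


def migrate(doc: dict[str, Any], version_field: str = "schema_version") -> dict[str, Any]:
    """Apply every step newer than the document's recorded version."""
    current = doc.get(version_field, 0)
    for version, step in MIGRATIONS:
        if version > current:
            doc = step(doc)
            doc[version_field] = version
    return doc
```

Running it over a collection could be as simple as reading each document, calling `migrate`, and writing the result back with PyMongo's `replace_one`. Because each document records its version, a rerun after a partial failure skips the work that was already done.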

By adapting these unconventional practices to suit our MongoDB setup, we’ve found innovative ways to maintain data integrity and agility within our database, aligning it with the unique demands of our application.

In the realm of databases, the choice is laden with trade-offs, contingent on the specific application’s purpose and data characteristics. MongoDB may not be the ideal fit for an ERP system where changes in one entity have widespread ripple effects or for a system demanding stringent schema control. Our affinity for MongoDB is deeply intertwined with the nature of our application—data processing in all its complexity. Our data processors continuously adapt to evolving customer requirements, and our system’s prowess lies in our nimbleness. Consequently, duplicating the same attribute across thousands of records is a minor trade-off when it translates into report components loading in the blink of an eye. MongoDB may evoke mixed sentiments, but within the context of our operation, it’s the secret ingredient that elevates our application’s performance.


Alex Cojocaru

Alex has been active in the software world since he started his career as an Analyst in 2011. He had various roles in software asset management, data analytics, and software development. He walked in the shoes of an analyst, auditor, advisor, and software engineer, being involved in building SAM tools, amongst other data-focused projects. In 2020, Alex co-founded Licenseware and is currently leading the company as CEO.
