Data consistency: key concepts and implications

In DatoCMS, you are free to edit your project schema at any time. While this is great news for you, it also complicates the situation quite a bit on our part!

Suppose you have an Article model, and you already have a number of articles stored. What happens to these existing articles in one of the following situations?

  • You add a new mandatory field.

  • You transform a non-localized field into a localized one (or vice versa).

  • You add a new locale in your project settings.

Well, the existing articles (including those already published) suddenly become invalid: the data they contain does not comply with the new schema.

In this section, we will try to explore together how DatoCMS manages these and other similar cases. To avoid simply having an endless list of unclear rules, we will start by explaining the mental model that underlies these rules, so that hopefully they will become more intuitive.

How DatoCMS internally stores your content: a mental model

This is a simplified version of the mental model to keep in mind when working with DatoCMS:

The record's meta-information (creation date, publication date, record creator, etc.) is stored directly at the record level. The record also contains all the details about its position in the collection, whether the collection uses simple or tree-like sorting.

The actual values of the record's fields, however, are versioned, so that the history of the record's changes can be tracked over time. You can think of these versions as rows in a separate table, each linked to its record.

In addition to field values, every version also keeps track of the editor who made the changes, and whether the data is valid or not, based on the compliance with the model's latest field-level validation rules.
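As a rough illustration of this mental model, the sketch below uses two hypothetical Python dataclasses, `Record` and `Version`. Every name in it is an assumption made for the example; it is not DatoCMS's actual internal schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Version:
    """One snapshot of a record's field values (hypothetical shape)."""
    field_values: dict[str, Any]
    editor: str      # who made the change
    is_valid: bool   # result of checking against the latest validation rules


@dataclass
class Record:
    """Meta-information lives on the record; field values live in versions."""
    id: str
    created_at: datetime
    creator: str
    position: Optional[int] = None  # sorting information is record-level
    versions: list[Version] = field(default_factory=list)


record = Record(id="article-1", created_at=datetime(2024, 1, 1), creator="alice")
# Each update appends a new version instead of overwriting the previous one.
record.versions.append(Version({"title": "Hello"}, editor="alice", is_valid=True))
record.versions.append(Version({"title": "Hello world"}, editor="bob", is_valid=True))

print(len(record.versions))                      # 2
print(record.versions[-1].field_values["title"])  # Hello world
```

Note that `position` sits on `Record`, not on `Version`: this is why sorting changes cannot be kept "in draft".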

Current and published versions

Out of all the historical record versions, two are particularly important: the current version and the published version:

  • The current version represents the latest available version: every time a record is updated, a new version is generated and marked as the new current version.

  • The published version represents the version currently marked as published. It might coincide with the current version, or it might not. It might also not exist at all!

The status of a record

The status of a record precisely represents the relationship between its current and published versions:

  • Record is in draft: only has the current version, and no published version;

  • Record is published: current version and published version coincide;

  • Record is updated: record has both current and published versions, but they differ.
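The three statuses above can be derived from the two version pointers alone. Here is a minimal sketch, assuming each version has an identifier and that a missing published version is represented by `None`; the function name `record_status` is made up for the example.

```python
from typing import Optional


def record_status(current_version_id: str,
                  published_version_id: Optional[str]) -> str:
    """Derive the status from the current and published version pointers."""
    if published_version_id is None:
        return "draft"        # no published version exists
    if published_version_id == current_version_id:
        return "published"    # the two versions coincide
    return "updated"          # both exist, but they differ


print(record_status("v3", None))  # draft
print(record_status("v3", "v3"))  # published
print(record_status("v3", "v2"))  # updated
```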

How our APIs expose this data structure

All our APIs have been designed to be pragmatic and simplify the lives of developers by hiding some of this complexity. How?

When you fetch records through an API, you can specify whether you are referring to the current version or the published version of your records (if you don't specify this explicitly, a default is applied implicitly).

With this information at hand, all APIs can now represent a record as a single entity, encompassing both the meta-information present at the record level, and the model fields data that is present at the version level. This greatly simplifies the logic of 99% of web projects that interface with DatoCMS, which can therefore work considering a single entity instead of two.

What if a specific record does not have a published version, and the APIs are asked to refer to the published versions? Then that record simply won't be returned, as if it did not exist, which is exactly how one would normally want to handle this case on the app side.
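To make the "single entity" idea concrete, here is a small sketch of how a record and one of its versions might be flattened together. The dictionary layout and the `resolve` helper are hypothetical and only model the behavior described above; they are not the actual API implementation.

```python
from typing import Any, Optional


def resolve(record: dict[str, Any], which: str) -> Optional[dict[str, Any]]:
    """Merge record-level meta with the field values of the chosen version."""
    if which not in ("current", "published"):
        raise ValueError("which must be 'current' or 'published'")
    version = record.get(which)
    if version is None:
        # No such version: the record is skipped, as if it did not exist.
        return None
    return {"id": record["id"], **record["meta"], **version["fields"]}


draft = {
    "id": "a1",
    "meta": {"created_at": "2024-01-01"},
    "current": {"fields": {"title": "Work in progress"}},
    "published": None,  # never published
}

print(resolve(draft, "current"))
# {'id': 'a1', 'created_at': '2024-01-01', 'title': 'Work in progress'}
print(resolve(draft, "published"))  # None
```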

With CMA, you can also access all other past versions

We've optimized our system to mainly work with the current and published versions of a record, as these are typically the ones of interest. However, our Content Management API can also return the full version history of a record if needed!

Data consistency rules guaranteed by the system

DatoCMS maintains two important guarantees:

  • The structure of the data contained in any version of a record (even past versions) is guaranteed to be consistent with the settings of its model and fields.

  • The validity of a current/published version always reflects the current validation rules.

Consequences on the published and current version of a record

It is crucial to understand a significant outcome of these guarantees and data setup: there are cases where the published version can change without a specific "publish" action on the record, and the current version can be modified without a distinct "update" action:

  • Changes in the sort order of a record in the collection are immediately reflected online: it is not possible to keep these changes "in draft", because this information lives directly at the record level. When the position is changed, the "published" version will also display the updated information. The same applies to other meta-information: creator, creation/publication dates, etc.

  • There are situations where a change to a record/asset can have repercussions on the published and current versions of other records that reference them:

    • Imagine a record whose current or published version references an asset in the Media Area, and the field that contains it has validations (e.g., "the asset must be an image"). If the asset is subsequently replaced with a new file, the new file could change the validity status of the current or published version, which is therefore updated.

    • Imagine a record whose current or published version references another record via a Single Link, Multiple Links, or Structured Text field:

      • If the field has the setting "When deletion is requested for a record referenced by this field" set to "Try to remove the reference to the deleted record", then the system must respect this setting, altering the current and/or published version.

      • Similarly, if the field has the setting "When unpublishing is requested for a record referenced by this field" set to "Try to remove the reference to the unpublished record", then the system must respect this setting, altering the published version.

  • Changing the schema of a model, or the locales of a DatoCMS project can also cause an automatic update of multiple versions because:

    • When a field is added to or removed from the model, it is immediately added to or removed from all record versions of that model (including the published and current versions).

    • If a field that can hold a reference to another record (Single Link, Multiple Links, Structured Text) is altered by removing a model from the list of linkable models, then all record versions of that model will be updated by eliminating any references to those models.

    • If a model is deleted, but there are records of that model which are referenced by other records, then all these record versions are updated by eliminating any references to the deleted model.

    • The same principle applies to model fields that can hold blocks (Modular Content, Structured Text). If these are altered by removing a block type from the list of embeddable options, then all record versions of that model will be adjusted by eliminating any blocks of that type.

    • If a model is modified, enabling the "All locales required?" setting, then all previously unspecified locales will be added to all record versions of that model.

    • If a field is modified from localized to non-localized (or vice versa), then all record versions of that model will be modified to reflect this change.

    • If you add a new validation rule to a field, then all existing record versions of that model will be re-checked against the new validation rules, and potentially marked as invalid.

    • If a locale is added/removed from the project, all record versions of all models that contain localized fields will be adjusted accordingly.
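One of the schema changes listed above, adding a new mandatory field, can be sketched as follows. This is a simplified model under stated assumptions: versions are plain dictionaries, the new field starts out empty (`None`), and validation rules are ordinary Python functions. The names `add_field` and `summary_is_present` are hypothetical.

```python
from typing import Any, Callable

Fields = dict[str, Any]
Validator = Callable[[Fields], bool]


def add_field(versions: list[dict[str, Any]], name: str,
              validators: list[Validator]) -> None:
    """Add a field to every version, then re-check validity of each one."""
    for version in versions:
        # The structure of every version, past ones included, follows the schema.
        version["fields"].setdefault(name, None)
        # Validity is recomputed against the rules currently in force.
        version["is_valid"] = all(check(version["fields"]) for check in validators)


def summary_is_present(fields: Fields) -> bool:
    """Hypothetical 'required' rule for the new field."""
    return fields.get("summary") is not None


versions = [
    {"fields": {"title": "Old draft"}, "is_valid": True},
    {"fields": {"title": "Published"}, "is_valid": True},
]
add_field(versions, "summary", [summary_is_present])

print([v["is_valid"] for v in versions])  # [False, False]
```

No editor touched these versions, yet all of them changed shape and validity. This is the kind of automatic update the list above describes.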

Consequences in Webhooks

Webhooks allow you to be notified of changes to the records in your project. Based on the considerations made so far, it is important to make a few clarifications here:

  • The "Record creation" event is triggered when a record is generated for the first time (and consequently its current version).

  • The "Record update" event is triggered when the current version changes (due to an explicit modification of the record, or for some of the reasons listed above).

  • The "Record publish" event is triggered when the published version of a record changes (due to an explicit publication of the record, or for some of the reasons listed above).

  • In the webhook payload, the meta.status field of the record entity always reflects the relationship between the current and published versions of the item itself at the moment the webhook is triggered.

As a consequence:

  • You can still get "Record publish"/"Record update" events without an explicit new publish/update request from an editor or an API call. This occurs when the system needs to automatically adjust an existing published/current version to keep it consistent with a schema change.

  • When these automatic adjustments occur, it is completely normal for the meta.status of a record in the webhook payload of a "Record publish" event to be "updated" instead of "published". This is because the record might already be in an "updated" state when the adjustment happens, and the operation does not change this condition.
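In practice, this means a webhook receiver should not assume that a "Record publish" event always carries a "published" status. The sketch below contrasts a robust check with a fragile one. The payload is simulated, and apart from `meta.status` the key names used (`event_type`, `entity`) are assumptions about the payload shape that you should verify against a real delivery.

```python
from typing import Any


def published_content_changed(payload: dict[str, Any]) -> bool:
    """Robust check: rely on the event type only, whatever the status is."""
    # Assumption: the event name arrives under "event_type" as "publish".
    return payload.get("event_type") == "publish"


def naive_check(payload: dict[str, Any]) -> bool:
    """Fragile check: additionally requires meta.status to be 'published'."""
    status = payload["entity"]["meta"]["status"]
    return payload.get("event_type") == "publish" and status == "published"


# Simulated payload for an automatic adjustment on a record that also has
# unpublished changes: the event is "publish", but the status is "updated".
payload = {
    "event_type": "publish",
    "entity": {"id": "123", "meta": {"status": "updated"}},
}

print(published_content_changed(payload))  # True
print(naive_check(payload))                # False
```

The fragile variant would silently skip this event, even though the published version did change.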