Schema Validation

Felis uses Pydantic to validate the schema and all of its defined objects, including tables and columns. Pydantic is a data validation library that uses Python type annotations to define the structure of the data being validated.

Pydantic validators are defined as occuring either “before” conversion to the model objects or “after.” When a validation error is raised in the “before” stage while processing the raw YAML input data, the “after” validators will not run. This can be important to keep in mind, as fixing these “before” error may reveal others that occur in the “after” stage.

After installing Felis, you can use the felis validate command to validate one or more schema files. These should be in YAML format and conform to the Felis schema. The command line documentation provides more information on how to use the validate command.

Validation Errors

If there are any errors in the schema, the validation process will raise an exception called a validation error containing a list of all the errors that were found. These errors will be printed to the console, and the command line process will return a non-zero exit code. An error message will be printed to the log indicating how many errors were found. For example:

ERROR:felis:1 validation error for Schema

When a field was included in the data which is not valid for the data model, as in the case of a misspelled field name, the error message will indicate the field name that was not recognized. For example:

tables.0.columns.0.not_a_field
  Extra inputs are not permitted [type=extra_forbidden, input_value=12345, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden

The first line points to the location of the error in the data, which here is the first table and the first column of the schema, indexed from zero. The second line states that the field not_a_field was found in the schema but that this is not a valid field according to the data model, indicated using the extra_forbidden error type and “Extra inputs are not permitted” message. The error message also includes the value of the field and the type of the value, as well as a link to relevant information in the Pydantic documentation.

Errors may also occur when a field is assigned a value which is not considered valid according to the model. For example:

tables.0.columns.0.description
  String should have at least 3 characters [type=string_too_short, input_value='xx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/string_too_short

Here the description field is too short; it must be at least 3 characters, which is indicated by the string_too_short error type and “String should have at least 3 characters” message.

An error will also occur if an object is missing a required field or has other issues that prevent it from being valid.

For example:

tables.0.columns.0.@id
  Field required [type=missing, input_value={'name': 'customer_id', '...ue customer identifier'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

In this case, the column is missing an @id field, which is required on every object in the schema data.

Errors may also occur which generate a reference to an object like a column, as in:

tables.0.columns.1
  Value error, Length must be provided for type 'string' in column '#customers.name' [type=value_error, input_value={'name': 'name', '@id': '...ame', 'nullable': False}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error

This error indicates that the column is missing a length field, which is required on string columns.

Optional Checks

There are a number of optional checks that may be performed during validation by turning on the corresponding command line flags. These include the following:

--check-description:

Check that all objects in the schema, including the schema itself, have a valid description field.

--check-redundant-datatypes:

Check that any column datatype overrides for specific database variants, such as mysql:datatype and postgresql:datatype, do not appear to be redundant, resulting in identical SQL being emitted compared with the datatype.

--check-tap-table-indexes:

Check that the tap:table_index field is unique for each table in the schema.

--check-tap-principal:

Check that the tap:principal field is set for at least one column in each table.

Simply include one or more of these flags on the command line to enable them:

felis validate --check-description --check-redundant-datatypes --check-tap-table-indexes --check-tap-principal schema.yaml

Validating with the Python API

The Python API can also be used to validate a schema by creating a Schema object and then calling the model_validate method:

import yaml
from pydantic import ValidationError

from felis.datamodel import Schema

data = yaml.safe_load(open("schema.yaml", "r"))

try:
    schema = Schema.model_validate(data)
except ValidationError as e:
    print(e)

If the schema contains validation errors, a message will be printed to the console/stdout printing all of them individually. Please see the detailed error descriptions for more information on the format of these error messages. If there are no errors, the schema object will be returned and can be used to create a database or perform other operations.