JSON Schema is a powerful tool for validating the structure of JSON data.

What is JSON Schema

To define what JSON Schema is, we should probably first define what JSON is.

JSON stands for “JavaScript Object Notation”, a simple data interchange format. JSON is built on the following data structures:

JS type    Python 3 type    JSON example
string     str              "Hello World"
number     int / float      2 / 3.14
boolean    bool             true / false
null       None             null
object     dict             {"key": "value"}
array      list             ["ABC", 3.14, false]
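
As a quick illustration of the mapping above, Python's standard json module can parse a document and report which Python type it chose for each value. The document string and its key names are made up for this sketch.

```python
import json

# A hypothetical JSON document that uses every basic JSON type once.
document = """
{
  "text": "Hello World",
  "count": 2,
  "ratio": 3.14,
  "flag": true,
  "nothing": null,
  "nested": {"key": "value"},
  "items": ["ABC", 3.14, false]
}
"""

parsed = json.loads(document)

# Map each key to the name of the Python type json.loads produced for it.
python_types = {name: type(value).__name__ for name, value in parsed.items()}

for name, type_name in python_types.items():
    print(f"{name}: {type_name}")
```

Note that JSON has a single number type, so whether a value arrives as int or float in Python depends only on how the number is written in the document.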

With these simple data types, all kinds of structured data can be represented. For example, you could imagine representing information about a person in JSON in different ways:

{
  "name": "George Washington",
  "birthday": "February 22, 1732",
  "address": "Mount Vernon, Virginia, United States"
}

{
  "first_name": "George",
  "last_name": "Washington",
  "birthday": "1732-02-22",
  "address": {
    "street_address": "3200 Mount Vernon Memorial Highway",
    "city": "Mount Vernon",
    "state": "Virginia",
    "country": "United States"
  }
}

However, when an application says “give me a JSON record for a person”, it’s important to know exactly how that record should be organized. For example, we need to know what fields are expected, and how the values are represented. That’s where JSON Schema comes in. The following JSON Schema fragment describes how the second example above is structured.

{
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "birthday": { "type": "string", "format": "date" },
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "country": { "type" : "string" }
      }
    }
  }
}
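
To make the difference between the two person records concrete, here is a minimal sketch of a checker that understands only the keywords used in the fragment above (type with the values "string" and "object", properties, and format "date"). The names conforms and is_full_date are hypothetical helpers written for this post, not part of any JSON Schema library, and the sketch treats format as a hard check even though real validators often make that optional. For real work, a full validator library is the better choice.

```python
import re
from datetime import date

PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "birthday": {"type": "string", "format": "date"},
        "address": {
            "type": "object",
            "properties": {
                "street_address": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                "country": {"type": "string"},
            },
        },
    },
}


def is_full_date(text):
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def conforms(instance, schema):
    """Check an instance against the small keyword subset described above."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(instance, str):
        return False
    if expected == "object" and not isinstance(instance, dict):
        return False
    # Assumption: "format" is enforced here; many validators only annotate it.
    if schema.get("format") == "date" and isinstance(instance, str):
        if not is_full_date(instance):
            return False
    # Properties that are absent are fine, because nothing is marked required.
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not conforms(instance[name], subschema):
                return False
    return True


first_record = {
    "name": "George Washington",
    "birthday": "February 22, 1732",
    "address": "Mount Vernon, Virginia, United States",
}
second_record = {
    "first_name": "George",
    "last_name": "Washington",
    "birthday": "1732-02-22",
    "address": {
        "street_address": "3200 Mount Vernon Memorial Highway",
        "city": "Mount Vernon",
        "state": "Virginia",
        "country": "United States",
    },
}

first_ok = conforms(first_record, PERSON_SCHEMA)
second_ok = conforms(second_record, PERSON_SCHEMA)
```

The first record is rejected for two reasons: its birthday is not in YYYY-MM-DD form, and its address is a string rather than an object.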

The Basics

Hello, World

When learning any new language, it’s often helpful to start with the simplest thing possible. In JSON Schema, an empty object is a completely valid schema that will accept any valid JSON.

// This accepts anything, as long as it’s valid JSON
{}

The type keyword

The most common thing to do in a JSON Schema is to restrict to a specific type. The type keyword is used for that.

// For example, in the following, only strings are accepted:
{ "type": "string" }

The type keyword is described in more detail in Type-specific keywords.

  • string
    • Length : minLength, maxLength
    • Regular Expressions : pattern
    • Format : date-time, date, time, email, hostname, ipv4, ipv6, uri, json-pointer, regex
  • Numeric types
    • integer
    • number
    • multipleOf
    • Range : minimum, maximum, exclusiveMinimum, exclusiveMaximum
  • object
    • properties
    • additionalProperties : controls how properties that are not listed under properties (or matched by patternProperties) are handled; may be either a boolean or a schema object.
    • required
    • propertyNames
    • Size : minProperties, maxProperties
    • dependencies
    • patternProperties
  • array
    • List validation : items, contains
    • Tuple validation : additionalItems
    • Length : minItems, maxItems
    • uniqueItems
  • boolean
  • null

The type keyword may either be a string or an array:

  1. If it’s a string, it is the name of one of the basic types above.
  2. If it is an array, it must be an array of strings, where each string is the name of one of the basic types, and each element is unique. In this case, the JSON snippet is valid if it matches any of the given types.

{ "type": ["number", "string"] }
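
The two forms of the type keyword can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how any particular validator is implemented. The names JSON_TYPE_CHECKS and matches_type are invented for this example, and the treatment of 1.0 as an integer follows newer drafts of the specification (older drafts differ).

```python
# One predicate per basic JSON Schema type name.
# bool is excluded from the numeric checks because Python's bool subclasses int.
JSON_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: (isinstance(v, int) and not isinstance(v, bool))
    or (isinstance(v, float) and v.is_integer()),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def matches_type(value, type_keyword):
    """Return True if value matches a type keyword given as a string or a list."""
    names = [type_keyword] if isinstance(type_keyword, str) else type_keyword
    # With a list of names, matching any single one of them is enough.
    return any(JSON_TYPE_CHECKS[name](value) for name in names)


number_or_string = ["number", "string"]
results = [
    matches_type(42, number_or_string),
    matches_type("Life, the universe, and everything", number_or_string),
    matches_type(["Life", "the universe", "and everything"], number_or_string),
]
```

Here the number and the string are accepted, while the array is rejected because it matches neither of the listed types.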

Declaring a JSON Schema

Since JSON Schema is itself JSON, it’s not always easy to tell when something is JSON Schema or just an arbitrary chunk of JSON. The $schema keyword is used to declare that something is JSON Schema. It’s generally good practice to include it, though it is not required.

{ "$schema": "http://json-schema.org/schema#" }

Declaring a unique identifier

It is also best practice to include an $id property as a unique identifier for each schema. For now, just set it to a URL at a domain you control, for example:

{ "$id": "http://yourdomain.com/schemas/myschema.json" }

What are protocol buffers

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Why use protocol buffers

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data. Because the data structure is defined once and the code that reads and writes it is generated from that definition, the same messages can be exchanged over a variety of data streams and between a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the “old” format. A message definition looks like this (the required and optional labels below are proto2 syntax):

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

Compare with REST (JSON)

Language Guide (proto3)

Defining A Message Type

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

  • The first line of the file specifies that you’re using proto3 syntax. If you don’t do this, the protocol buffer compiler will assume you are using proto2. This must be the first non-empty, non-comment line of the file.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

Assigning Field Numbers

As you can see, each field in the message definition has a unique number. These field numbers are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that field numbers in the range 1 through 15 take one byte to encode, including the field number and the field’s type (you can find out more about this in Protocol Buffer Encoding). Field numbers in the range 16 through 2047 take two bytes. So you should reserve the numbers 1 through 15 for very frequently occurring message elements. Remember to leave some room for frequently occurring elements that might be added in the future.
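
The one-byte and two-byte ranges follow from how a field's key is written on the wire: the field number is shifted left by three bits, combined with a three-bit wire type, and the result is stored as a base-128 varint (seven payload bits per byte). The sketch below reproduces that arithmetic in plain Python. The function names encode_varint and encode_tag are chosen for this example and are not part of the protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low_seven_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow, so set the continuation bit.
            out.append(low_seven_bits | 0x80)
        else:
            out.append(low_seven_bits)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """Encode the key that precedes a field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


# Size in bytes of the key for a few field numbers (wire type 0 = varint).
tag_sizes = {number: len(encode_tag(number, 0)) for number in (1, 15, 16, 2047, 2048)}
print(tag_sizes)
```

Field number 15 is the largest whose shifted value still fits in seven bits, and 2047 is the largest that fits in fourteen, which is where the two boundaries in the text come from.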

Specifying Field Rules

Message fields can be one of the following:

  • singular: a well-formed message can have zero or one of this field (but not more than one). This is the default field rule for proto3 syntax.
  • repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

JSON Mapping

By walking through this example you’ll learn how to:

  • Define a service in a .proto file.
  • Generate server and client code using the protocol buffer compiler.
  • Use the Python gRPC API to write a simple client and server for your service.

Why use gRPC

This example is a simple route mapping application that lets clients get information about features on their route, create a summary of their route, and exchange route information such as traffic updates with the server and other clients.

With gRPC you can define your service once in a .proto file and implement clients and servers in any of gRPC’s supported languages, which in turn can be run in environments ranging from servers inside Google to your own tablet - all the complexity of communication between different languages and environments is handled for you by gRPC. You also get all the advantages of working with protocol buffers, including efficient serialization, a simple IDL, and easy interface updating.

Example code and setup

# Install the requirements
$ pip3 install grpcio

# Python’s gRPC tools include the protocol buffer compiler protoc and the
# special plugin for generating server and client code from .proto service definitions.
$ pip3 install grpcio-tools

# Clone the repository to get the example code:
$ git clone -b v1.28.1 https://github.com/grpc/grpc

Example: helloworld

# Navigate to the "hello, world" Python example:
$ cd grpc/examples/python/helloworld

# Run the server
$ python greeter_server.py

# From another terminal, run the client:
$ python greeter_client.py
Greeter client received: Hello, you!

Update a gRPC service

Now let’s look at how to update the application with an extra method on the server for the client to call. Our gRPC service is defined using protocol buffers.

Let’s update this so that the Greeter service has two methods. Edit examples/protos/helloworld.proto and update it with a new SayHelloAgain method, with the same request and response types:

// The greeting service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
  // Sends another greeting
  rpc SayHelloAgain (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}

Generate gRPC code

Next we need to update the gRPC code used by our application to use the new service definition.

From the examples/python/helloworld directory, run:

python -m grpc_tools.protoc -I../../protos --python_out=. --grpc_python_out=. ../../protos/helloworld.proto

This regenerates helloworld_pb2.py, which contains our generated request and response classes, and helloworld_pb2_grpc.py, which contains our generated client and server classes.
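
Once the code is regenerated, a client can call the new method through the generated stub. The following is a minimal sketch rather than the example's actual client. It assumes that grpcio is installed, that the regenerated helloworld_pb2 and helloworld_pb2_grpc modules are importable from the current directory, and that a server implementing both SayHello and SayHelloAgain is listening on localhost:50051. The function name greet_twice and the default address are assumptions made for this illustration.

```python
def greet_twice(name, target="localhost:50051"):
    """Call SayHello and then SayHelloAgain on a running Greeter server."""
    # Imported lazily so this file can be loaded before grpcio is installed
    # and before the helloworld_pb2 modules have been regenerated.
    import grpc
    import helloworld_pb2
    import helloworld_pb2_grpc

    with grpc.insecure_channel(target) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        request = helloworld_pb2.HelloRequest(name=name)
        first = stub.SayHello(request)
        # Assumption: the server side has been extended with SayHelloAgain.
        second = stub.SayHelloAgain(request)
    return first.message, second.message


if __name__ == "__main__":
    for message in greet_twice("you"):
        print("Greeter client received: " + message)
```

If the server has not been updated to implement SayHelloAgain, the second call raises grpc.RpcError with an UNIMPLEMENTED status, so the server's Greeter class needs a matching method before this client will succeed.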