Developing analytic apps is a bold new direction for product teams. The Toolbox is where we talk development best practices, tips, tricks, and success stories to help you build the future of analytics and empower your users with the insights and actions they need.

JSON is probably the most common format nowadays for open data interchange. Although it’s designed as a lightweight JavaScript-object-like format, JSON documents can get quite large, especially if they contain deeply nested objects and arrays. 

There’s a real need to be able to run general processing queries on JSON documents for filtering, shaping, and transforming JSON data. 

Some document databases like MongoDB and PostgreSQL have their own query language that allows running complex queries on JSON, but that’s usually irrelevant when our JSON data is outside of the context of database records (although you can use MingoJS, a JavaScript implementation of MongoDB query language).

What if we could provide users with built-in support for such queries? 

The use case:

  1. Your software outputs some JSON and you want to give the user an easy, standardized way to filter or manipulate the data (examples: APIs, SDKs, CLIs, online playgrounds).
  2. Your software takes JSON as input and you want to programmatically filter or manipulate the data in a uniform and reusable way. (Think JSON documents with arrays of objects nesting more arrays and objects.) Yes, you can write your own logic using map, filter, and reduce functions, but maybe there’s a better, more declarative way.

This post focuses mainly only on the first use case, where you want to give your users a built-in option to process the output (but in fact, the two cases are similar).

Candidates for JSON query languages

XML has XPath for querying and traversing XML nodes. What is the equivalent for JSON? 

JSONPath

JSONPath — “XPath for JSON” — is the first query language that comes to mind. It’s one of the earliest implementations of a JSON query language, and it gets the job done. One nice thing about it is that it allows traversing the entire JSON tree (accessing parent nodes, for example) and can output path locations in the tree as JSON-pointers (keys). 

The problem is that JSONPath’s syntax is not very intuitive, if we wish to provide an interface based on it to our users. However, a bigger issue is that it does not have a tight specification, which means that there are numerous implementations of JSONPath out there that might give different results. This makes it hard to rely on JSONPath for many scenarios. 

jq

jq is command-line JSON processor that uses a powerful JSON query language. You can do almost anything with jq! This  general purpose language is popular among DevOps people, probably because its CLI makes it easy to pipe another process’s output through, and then use jq’s pipe system to process that formatted output through additional pipes. 

Two problems, though: jq is a command-line tool, not an interface we can provide to users in our own APIs, SDKs, and so on. Libraries like node-jq and java-jq are simply wrappers around the jq binary, and may suffer from performance issues, which could be crucial for real-time performance. Secondly, with great power comes great complexity — and some users claim that jq is too complex of a language and may be overkill for our use case, since it allows much more than just manipulating existing JSON. 

JMESPath

JMESPath is a query language for JSON that allows complex filtering, extraction, and transformation on JSON documents. Unlike JSONPath and jq, it has a complete specification with precise grammar, so the syntax is well defined in all implementations. It has fully compliant libraries in a host of languages (Java, Python, JavaScript, Go, PHP and more). It is a powerful query language and yet the syntax is quite simple. 

For these reasons, JMESPath is our language of choice when it comes to integrating it in our own services, as a user-facing interface. It has a friendly playground and tutorial that we can provide to users as reference.

The examples on jmespath.org/examples.html can give you an idea of the power and simplicity of the language. 

jmespath

Power to the User

Thanks to libraries, adding JMESPath to our service’s code is very simple. By integrating JMESPath we can now provide our API, SDK, or CLI users with a powerful, simple, standardized method to shape the JSON response returned from our service without needing external tooling.

As an example, we’ll assume our service is a public REST API. By giving the user the ability to specify a JMESPath query on the data, they end up saving time, bytes on the wire, and lines of code doing post-processing, since the processing happens on the server and the response contains exactly what the user needs.

Integrating JMESPath in our API handler code is quite simple, as the library has only one function (“search”). Here, as an example in Node.js / JavaScript:

const jmespath = require(‘jmespath’);


const requestHandler = (req, res, next) => {
   // response res was passed here from previous handler
   if (req.query && req.query.filter) {
       res = applyJmesPathFilter(res, req.query.filter);
   } // ...more logic and eventually res.send()
}

const applyJmesPathFilter = (res, jmespathExpression) => {
   try {
       res = jmespath.search(res, jmespathExpression) || {};
   }
   catch (e) {
       res = {};
       log.error(e);
   }
   return res;
}

This way, users can immediately use the filter query parameter to specify a JMESPath expression to process the initial JSON result:

GET /api/players?filter=[JMESPATH_EXPRESSION]

Let’s assume our service’s output is the NBA players API, where information on some legendary basketball players live. A GET /api/players request returns the following JSON array of objects:

[
 {
   "playerId": "7c3f73dd-0b38-48dc-9347-c78811bd80c4",
   "playerName": "Scottie Pippen",
   "yearOfBirth": "1965",
   "collegeId": "77302082-2758-48cc-ab3a-7b811a8bdf80",
   "jerseyNumber": "33",
   "playerStats": {
       "points": 18940,
       "rebounds": 7494,
       "assists": 6135
   },
   "teamNames": [
      "Chicago Bulls",
      "Portland Trail Blazers",
      "Houston Rockets",
      "Chicago Bulls"
   ]
  },
  {
   "playerId": "8d75bb0f-a444-4264-a583-4ca5799169cf",
   "playerName": "Patrick Ewing",
   "yearOfBirth": "1962",
   "collegeId": "0456a17b-320d-4ddc-bfc2-011670af2b77",
   "jerseyNumber": "33",
   "playerStats": {
      "points": 24815,
      "rebounds": 11617,
      "blocks": 2894
   },
   "teamNames": [
      "New York Knicks",
      "Seattle SuperSonics",
      "Orlando Magic"
   ]
  }
]

Since the original result is an array, the JMESPath expressions below start with “[]” to reference that array.

GET /api/players?filter=[].playerName will return the following array:

[
 "Scottie Pippen",
 "Patrick Ewing"
]

We can also ask for objects instead, and modify the original key names:

GET /api/players?filter=[].{name: playerName, number: jerseyNumber} returns:


[
 {
   "name": "Scottie Pippen",
   "number": "33"
 },
 {
   "name": "Patrick Ewing",
   "number": "33"
 }
]

We can run filter queries based on one or more keys, and pipe the result to format it (here as plain string): 

GET /api/players?filter=[?yearOfBirth > `1964`].playerName | [0] returns:

“Scottie Pippen”

And a more advanced logic, which uses the sort() and join() functions:

GET /api/players?filter=[].playerName | sort(@) | { nbaLegends: join(‘, ‘, @) }

{
  "nbaLegends": "Patrick Ewing, Scottie Pippen"
}

The user can shape the result so it is tailored to their exact needs. For example:

GET /api/players?filter=[].{name: playerName, firstTeam: teamNames | [0], points: playerStats.points }

[
 {
   "name": "Scottie Pippen",
   "firstTeam": "Chicago Bulls",
   "points": 18940
 },
 {
   "name": "Patrick Ewing",
   "firstTeam": "New York Knicks",
   "points": 24815
 }
]

And these are just a few of the JSON processing options available through JMESPath. 

Some readers may notice that enabling the user to “get exactly the data they want” can be achieved by using GraphQL. True, but this particular option describes a REST API (and the scenario could be replaced by an SDK or CLI). In fact, GraphQL doesn’t give you the processing power that JSON query languages like JMESPath provide. Of course, the JSON processing runs after the initial dataset has been fetched entirely, which is not as efficient as filtering early with resolvers on the Data Access Layer, like in GraphQL APIs. 

I’ve integrated JMESPath into some of Sisense public REST APIs and as a feature in the GraphQL2REST package.

Limitations

JMESPath has some limitations compared to jq and JSONPath. The biggest limitation for me is that there is no easy way to get the list of paths of a given JSON document (aka node names or JSON pointers), and you can’t express keys as JMESPath expressions (so as to run operations on a set of keys which satisfy a test). Recursive traversal is not possible — you have to specify the full path to the JSON key. Another issue is that JMESPath does not allow referencing parent nodes when iterating. These are limitations that sometimes restrict usage for developers, but usually do not affect the common use cases discussed here.

Additional Resources

For adventurers — more query languages: 

Roy Mor is a technical lead in Sisense’s R&D group. Prior to working at Sisense, he worked as a full-stack developer in startups as well as at Microsoft and Intel, and as an independent consultant. He has over 15 years of professional experience writing software.

Tags: |