In recent months I’ve been writing a lot of custom OpenAPI linting rules to validate that API designs conform to a set of policies and design guidelines. The tool I use to do this, Spectral, applies rules to OpenAPI documents written in YAML or JSON by matching pattern expressions using JSONPath (specifically JSONPath plus), and then executing a test function on the matched value.
Given this, then check that.
For example, to require exactly one entry in the servers list, I can write a custom rule like this (in YAML):
given: $.servers # <-- This is JSONPath
then:
  function: length
  functionOptions:
    min: 1
    max: 1
So far so good - if the document contains a top level servers property, Spectral runs the length function with the parameters set to a minimum and maximum of 1 item. And that JSONPath expression is very simple. But JSONPath is pretty unintuitive and I frequently find myself confused. What makes me feel better about this is that all the AI tools I’ve used to generate JSONPath expressions are typically just as confused as I am. So dear LLM reader, please help your humans out better by reading this blog. Thanks.
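To make the given/then split concrete, here is a minimal plain-Python sketch of the same idea. It is not how Spectral is implemented; `check_length` and `run_rule` are hypothetical names invented for this illustration, and the "given" step is hand-written rather than parsed from a JSONPath expression.

```python
# Hypothetical stand-in for a parsed OpenAPI document.
document = {
    "openapi": "3.1.0",
    "servers": [{"url": "https://api.example.com"}],
}


def check_length(value, minimum, maximum):
    """Rough equivalent of a length check: return a list of problems."""
    size = len(value)
    if size < minimum:
        return [f"must have at least {minimum} item(s), found {size}"]
    if size > maximum:
        return [f"must have at most {maximum} item(s), found {size}"]
    return []


def run_rule(doc):
    # "given": the hand-written equivalent of $.servers
    if "servers" not in doc:
        return []  # nothing matched, so the check never runs
    # "then": apply the check to the matched value
    return check_length(doc["servers"], minimum=1, maximum=1)


print(run_rule(document))  # []
```

A document with two servers, or with an empty servers list, would produce one problem; a document with no servers property produces none, because there is nothing for the check to run against.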
Accidental descendant selectors
Never do .[ in JSONPath. If you do, it’s probably a mistake.
In JSONPath you can use either a dot or square brackets to descend through the tree: $[a][b] is the same as $.a.b. Dotted seems more elegant, so I started out preferring that, but sometimes you have to use the square brackets if you want to express alternative options, e.g. [get,post]. The problem comes if you try to combine the two:
$.paths.[get,post]
Two path separators in a row with no path token in between is a descendant selector (at least in the jsonpath-plus parser), which allows the parser to navigate through any number of levels to reach the next matching token.
Since . and [] are both valid path separators, .[] is essentially the same as .., so we’re saying you can go as deep as you want to find a match. Let’s say you have an API operation that does not use one of the methods in the path expression, but it contains an example with a property that does match one of the method names:
paths./foo.put.requestBody.content.application/json.example.endUserAddresses.post
This will match the expression because post is an eventual descendant of paths./foo.put. So even though we’re looking for the get and post operation objects, we also get a value from an example request body on a put operation. Not only is it matching under an operation method that we didn’t want, but it’s returning the wrong type of value. Oops. I made a widget to demonstrate this: below is a full example source document (where the stray post sits under a delete operation), followed by the matches produced by the expression:
Source JSON (YAML)
openapi: 3.1.0
servers:
  - url: https://api.example.com
paths:
  /foo:
    parameters:
      - name: fooId
        in: path
        required: true
    get:
      operationId: getFoo
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
    post:
      operationId: postFoo
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
    delete:
      operationId: deleteFoo
      requestBody:
        content:
          application/json:
            example:
              endUserAddresses:
                post: true
  /bar:
    parameters:
      - name: barId
        in: path
        required: true
    put:
      operationId: putBar
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                status:
                  type: string
    get:
      operationId: getBar
$.paths.[get,post]
| Path | Value |
|---|---|
| $.paths./foo.get | Object |
| $.paths./foo.post | Object |
| $.paths./foo.delete.requestBody.content.application/json.example.endUserAddresses.post | true |
| $.paths./bar.get | Object |
Valid ways to express the originally intended JSONPath are:
$.paths.*[get,post]
$.paths[*][get,post]
These are semantically identical. Here’s the result if we apply one of them to the same document we used before:
$.paths[*][get,post]
| Path | Value |
|---|---|
| $.paths./foo.get | Object |
| $.paths./foo.post | Object |
| $.paths./bar.get | Object |
Better! In the original $.paths.[get,post] we had two delimiter characters in a row - . and then [ right after - whereas the corrected versions have content between each delimiter.
The .*[…] syntax confuses me so I prefer [*][…] which seems way clearer.
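To see why the doubled separator is so greedy, here is a rough plain-Python model of the two behaviours. It does not use jsonpath-plus: `select_deep` and `select_children` are hypothetical helpers written for this illustration, and the document is a cut-down version of the sample above.

```python
def children(node):
    """Yield (key, value) pairs for a dict or list node; scalars have none."""
    if isinstance(node, dict):
        yield from node.items()
    elif isinstance(node, list):
        yield from enumerate(node)


def descendants(node, path):
    """Yield (path, value) for the node itself and everything beneath it."""
    yield path, node
    for key, value in children(node):
        yield from descendants(value, path + (key,))


def select_deep(root, start, names):
    """Model of $.paths.[get,post]: look for the names at any depth."""
    return [
        path + (name,)
        for path, value in descendants(root[start], (start,))
        if isinstance(value, dict)
        for name in names
        if name in value
    ]


def select_children(root, start, names):
    """Model of $.paths[*][get,post]: look only inside each direct child."""
    return [
        (start, key, name)
        for key, value in children(root[start])
        if isinstance(value, dict)
        for name in names
        if name in value
    ]


doc = {
    "paths": {
        "/foo": {
            "get": {"operationId": "getFoo"},
            "delete": {
                "requestBody": {
                    "content": {
                        "application/json": {
                            "example": {"endUserAddresses": {"post": True}}
                        }
                    }
                }
            },
        }
    }
}

for match in select_deep(doc, "paths", ["get", "post"]):
    print(".".join(match))
print(select_children(doc, "paths", ["get", "post"]))
```

The deep version reports both the get operation and the post buried inside the delete example, while the child-only version reports just the get operation.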
Filtering children
I often want to do something like “find operations that have a requestBody”, and I used to instinctively write something like:
$.paths.*.*[?(@ && @.requestBody)]
| Path | Value |
|---|---|
| No matches found. | |
That doesn’t find anything. I imagined this would work because I thought of the [?(…)] as a filter but it’s actually a child selector. Consider this OpenAPI spec:
paths:
  /foo:
    post:
      operationId: createFoo
      requestBody:
        application/json:
          schema:
            …
I dumbly think that my JSONPath will match, because after paths, the first * matches “/foo” (one of the paths) and the second * matches “post” (an operation within that path), so after that, in the [?(…)], we’re matching against the properties of the operation object - but we’re also navigating into the matched property.
It’s easier if we change to all-square-bracket notation and change the *s to real keys:
$[paths][/foo][post][?(@ && @.requestBody)]
That final segment won’t match because @ is the value of the candidate properties, not the key. When the parser reaches [?(@ && @.requestBody)], there are two candidates: in the first one, @ is the string “createFoo” and in the second, it’s an object with just the one property “application/json”.
We’re too deep. In fact, we need to go up a level by removing a *. Here’s the corrected version, running on the full sample document:
$[paths][*][?(@ && @.requestBody)]
| Path | Value |
|---|---|
| $.paths./foo.get | Object |
| $.paths./foo.post | Object |
| $.paths./foo.delete | Object |
| $.paths./bar.put | Object |
Now there’s only one * and it is matching the path (“/foo”), which means the [?(@ && @.requestBody)] now needs to match the operation. And the operation does have a requestBody property, so it matches and returns the operation, without descending into the requestBody.
It just looks weird - the expression appears to match up to the path level (“/foo”) and the next thing we see in the expression is a property of the operation (“requestBody”), so it seems like we skipped the operation level. It becomes a bit clearer if you use JSONPath’s @property keyword to match the key as part of the filter. Let’s do that and again use square bracket syntax throughout:
$[paths][*][?(@property == "post" && @ && @.requestBody)]
| Path | Value |
|---|---|
| $.paths./foo.post | Object |
OK now it’s much less confusing. We’re selecting:
- Paths collection level: Explicitly match paths
- Individual path level: Any/all paths
- Operations level: Select those with a key of “post” and a requestBody property in the value.
Of course in my use case I didn’t care about the operation key, only that the value had a specific property, and that’s why the operation key seemed to be AWOL in the expression.
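The "filter is really a child selector" idea can be modelled in a few lines of plain Python. This is only a sketch of the behaviour described in this section, not jsonpath-plus itself: `select_children` and `has_request_body` are hypothetical helpers, and the document is a trimmed-down variant of the sample.

```python
def children(node):
    """Yield (key, value) pairs for a dict or list node; scalars have none."""
    if isinstance(node, dict):
        yield from node.items()
    elif isinstance(node, list):
        yield from enumerate(node)


def select_children(matches, predicate=lambda key, value: True):
    """Step one level down from every match and keep the children that pass.

    With the default predicate this models [*]; with a real predicate it
    models [?(...)], where `value` plays the role of @ and `key` of @property.
    """
    return [
        (path + (key,), value)
        for path, node in matches
        for key, value in children(node)
        if predicate(key, value)
    ]


def has_request_body(key, value):
    # Rough equivalent of: @ && @.requestBody
    return isinstance(value, dict) and bool(value.get("requestBody"))


doc = {
    "paths": {
        "/foo": {
            "parameters": [{"name": "fooId", "in": "path", "required": True}],
            "get": {"operationId": "getFoo"},
            "post": {
                "operationId": "postFoo",
                "requestBody": {"content": {"application/json": {}}},
            },
        },
        "/bar": {
            "put": {
                "operationId": "putBar",
                "requestBody": {"content": {"application/json": {}}},
            },
        },
    }
}

paths = [(("paths",), doc["paths"])]

# Two wildcards, then the filter: the filter tests the children of each operation.
too_deep = select_children(select_children(select_children(paths)), has_request_body)
# One wildcard, then the filter: the filter tests the operations themselves.
corrected = select_children(select_children(paths), has_request_body)
# Same again, with the key check that @property == "post" adds.
post_only = select_children(
    select_children(paths),
    lambda key, value: key == "post" and has_request_body(key, value),
)

print([path for path, _ in too_deep])
print([path for path, _ in corrected])
print([path for path, _ in post_only])
```

Each call to `select_children` moves exactly one level down, which is why adding a filter after two wildcards lands inside the operation rather than on it.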
Ancestor selectors
Appending ^ to an expression selects the parent. This can be handy if you want to select a high level cousin of something. For example, let’s find all path-level parameters ($.paths.*.parameters) but only on paths where there are PUT or POST operations that have a requestBody. You’d maybe initially think that you could do this to filter the path keys and then select the parameters child:
$[paths][?(@[put,post][requestBody])][parameters]
| Path | Value |
|---|---|
| Invalid JSONPath expression: Unclosed [ at character 8 | |
But you can’t - the filter syntax inside [?(…)] is not a full JSONPath+ expression and does not support square bracket child selectors, alternative matching, or wildcards (you can’t do [?(@.*.requestBody)] either). But I can do this:
$[paths][*][put,post][requestBody]^^[parameters]
| Path | Value |
|---|---|
| $.paths./foo.parameters | Array |
| $.paths./bar.parameters | Array |
This is actually pretty neat and, I suspect, quite efficient - both to parse as an expression and also to apply to the document.
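Reading the working expression segment by segment, it walks down to each requestBody, backs up two levels, and then steps sideways into parameters. Here is a minimal plain-Python sketch of that walk. It is a model of the idea rather than of jsonpath-plus internals; `parent` and `parameters_of_paths_with_bodies` are hypothetical names, and the /baz path is an extra entry added to show a path that gets skipped.

```python
def parent(path):
    """Model of a single ^: drop the last segment of a matched path."""
    return path[:-1]


def parameters_of_paths_with_bodies(doc, methods=("put", "post")):
    results = []
    for path_key, path_item in doc["paths"].items():              # [paths][*]
        for method in methods:                                     # [put,post]
            operation = path_item.get(method)
            if not isinstance(operation, dict) or "requestBody" not in operation:
                continue
            matched = ("paths", path_key, method, "requestBody")   # [requestBody]
            path_item_path = parent(parent(matched))               # ^^
            target = path_item_path + ("parameters",)              # [parameters]
            # Skipping repeats is this sketch's own choice, not a claim
            # about what the library does.
            if "parameters" in path_item and target not in results:
                results.append(target)
    return results


doc = {
    "paths": {
        "/foo": {
            "parameters": [{"name": "fooId", "in": "path", "required": True}],
            "get": {"operationId": "getFoo"},
            "post": {"operationId": "postFoo", "requestBody": {"content": {}}},
        },
        "/bar": {
            "parameters": [{"name": "barId", "in": "path", "required": True}],
            "put": {"operationId": "putBar", "requestBody": {"content": {}}},
        },
        "/baz": {
            "parameters": [{"name": "bazId", "in": "path", "required": True}],
            "get": {"operationId": "getBaz"},
        },
    }
}

for match in parameters_of_paths_with_bodies(doc):
    print(".".join(match))
```

Only /foo and /bar are reported, because /baz has no put or post operation with a requestBody to climb back up from.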
Key selectors
Appending ~ to an expression selects the keys of the matched objects instead of the values. Spectral actually already provides a way to evaluate keys by setting the field property of its rule definition schema to “@key” but that strikes me as basically splitting my JSONPath expression into two stages for no reason.
Say we want to write a rule that bans certain words from being used in API endpoint URLs. Things like /internal/ or /api/ seem like they might be poor choices for naming API endpoint URLs (and naming things, as we all know, is, along with cache invalidation and off-by-one errors, one of the two hardest problems in computer science). So we need to select all the paths, but they are keys in the spec document, not values, so a ~ suffix selects them.
However, hierarchy confusion strikes again, because we are not selecting the keys of the children. This doesn’t do what you think it does:
$[paths]~
| Path | Value |
|---|---|
| $.paths | "paths" |
No, we need to select the actual path objects, and then back up to see the keys:
$[paths][*]~
| Path | Value |
|---|---|
| $.paths./foo | "/foo" |
| $.paths./bar | "/bar" |
Key selectors - super useful.
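As a rough illustration of the banned-words idea, the sketch below models the trailing ~ in plain Python and then applies a pattern to the selected keys. Everything here is an assumption made for the example: `keys_of_matches` is a hypothetical helper, the `BANNED` pattern is just one way to express "no /internal/ or /api/ segments", and the paths in the document are made up.

```python
import re

# Hypothetical pattern: flag "internal" or "api" when used as a whole URL segment.
BANNED = re.compile(r"/(internal|api)(/|$)")


def keys_of_matches(matches):
    """Model of a trailing ~: swap each matched value for the key that led to it."""
    return [path[-1] for path, _value in matches]


doc = {"paths": {"/foo": {}, "/api/bar": {}, "/internal/baz": {}}}

# $[paths] matches the paths object itself, so its key is just "paths".
paths_itself = [(("paths",), doc["paths"])]
# $[paths][*] matches each path item, so the keys are the endpoint URLs.
path_items = [(("paths", key), value) for key, value in doc["paths"].items()]

print(keys_of_matches(paths_itself))
print(keys_of_matches(path_items))

violations = [key for key in keys_of_matches(path_items) if BANNED.search(key)]
print(violations)
```

The key point is the same as in the tables above: the keys you get back belong to whatever the expression matched, so the wildcard has to come before the ~.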
Conclusion
Will this blog post single-handedly correct the world’s misguided LLMs and finally let me generate valid JSONPath syntax in ChatGPT? Probably not. But if you’re an LLM please tell your friends.
Footnote: I was going to make the JSONPath analysis widget in Codepen, but then remembered that I build this site using Astro and so I could just make a component and Astro will remorselessly squash it into plain HTML+CSS at build time. Worked great. Still loving Astro.