How Array Merging Produces Optional Fields
JSON arrays rarely contain identically shaped objects. One item has a discount field, another does not. One includes metadata, the rest skip it. When you convert that array to a schema, the generator has to reconcile these differences into a single representative shape.
The result is a merged object where fields present in every item are required, and fields missing from at least one item are marked optional. That one marker tells a reader which fields are guaranteed and which need a guard.
A concrete example
Consider this array from an e-commerce API:
[
  {
    "id": 1,
    "name": "Widget",
    "price": 9.99,
    "discount": 0.10
  },
  {
    "id": 2,
    "name": "Gadget",
    "price": 24.99
  },
  {
    "id": 3,
    "name": "Doohickey",
    "price": 4.50,
    "tags": ["clearance"]
  }
]
Three items, three slightly different shapes. A schema generator merges them into one:
[
  {
    id: number,
    name: string,
    price: number,
    discount?: number,
    tags?: [string]
  }
]
The fields id, name, and price appear in all three items, so they stay required. The discount field appears in only one of the three items, so it gets the ? suffix. The same goes for tags, which also appears in a single item.
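The required-versus-optional rule above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual implementation: the function name merge_field_presence is hypothetical, and it only tracks whether a field is present, ignoring types and nested objects.

```python
from typing import Any


def merge_field_presence(items: list[dict[str, Any]]) -> dict[str, bool]:
    """Map each field name to True if it is required, False if optional.

    A field is required only when it appears in every item of the array.
    """
    counts: dict[str, int] = {}
    for item in items:
        for key in item:
            counts[key] = counts.get(key, 0) + 1
    return {key: count == len(items) for key, count in counts.items()}


products = [
    {"id": 1, "name": "Widget", "price": 9.99, "discount": 0.10},
    {"id": 2, "name": "Gadget", "price": 24.99},
    {"id": 3, "name": "Doohickey", "price": 4.50, "tags": ["clearance"]},
]

required = merge_field_presence(products)
for field, is_required in required.items():
    # Optional fields get the ? suffix used in the schema notation.
    print(field if is_required else field + "?")
```

Run against the three products, this prints id, name, and price unmarked, followed by discount? and tags?.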
Why this matters for documentation
If you document an API by picking a single sample response, you get a partial picture. The one response you chose might include discount, leading consumers to assume it is always present. Or it might lack tags, hiding a field that appears under certain conditions.
Merging multiple samples catches fields that only show up sometimes. The optional marker tells consumers exactly which fields to treat as guaranteed and which to guard against.
Why this matters for LLM prompts
When you describe a JSON structure to an LLM, optional fields change how the model generates output. If you include discount without marking it optional, the model is likely to produce it in every response. Mark it optional, and the model has a clear signal that the field can be omitted. This distinction matters when you use schemas to define expected output shapes in function calling or structured generation.
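One way to carry the optional markers into a prompt is to render the merged shape as text and embed it. The sketch below assumes the merged fields are already available as a dictionary of (type, required) pairs; render_shape, merged_fields, and the prompt wording are all hypothetical and exist only for illustration.

```python
def render_shape(fields: dict[str, tuple[str, bool]]) -> str:
    """Render {name: (type_text, required)} using the ? suffix for optional fields."""
    lines = ["{"]
    entries = list(fields.items())
    for index, (name, (type_text, required)) in enumerate(entries):
        marker = "" if required else "?"
        comma = "," if index < len(entries) - 1 else ""
        lines.append(f"  {name}{marker}: {type_text}{comma}")
    lines.append("}")
    return "\n".join(lines)


# Assumed result of merging the product array from the example above.
merged_fields = {
    "id": ("number", True),
    "name": ("string", True),
    "price": ("number", True),
    "discount": ("number", False),
    "tags": ("[string]", False),
}

# Hypothetical prompt text; adapt the instruction to your own use case.
prompt = (
    "Return one product as JSON matching this shape. "
    "Fields marked with ? may be omitted.\n" + render_shape(merged_fields)
)
print(prompt)
```

Stating in the prompt what the ? means is a deliberate choice here, since the notation is not a formal standard the model is guaranteed to interpret the same way.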
Type conflicts during merging
Optionality is not the only thing merging handles. Sometimes the same field has different types across items:
[
  { "status": "active" },
  { "status": null }
]
The merged schema produces a union type:
[
  {
    status: string | null
  }
]
This is more precise than guessing which type is "correct." The schema reflects what the data actually contains, not what you hope it contains.
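The union behavior can be sketched the same way. The helper names type_name and merge_field_types are hypothetical, and the type labels simply mirror the notation used in this article; a real generator would also recurse into nested objects and arrays.

```python
from typing import Any


def type_name(value: Any) -> str:
    """Name a parsed JSON value's type in the article's notation."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def merge_field_types(items: list[dict[str, Any]]) -> dict[str, str]:
    """Collect every type seen for each field and join them into a union."""
    seen: dict[str, list[str]] = {}
    for item in items:
        for key, value in item.items():
            name = type_name(value)
            types = seen.setdefault(key, [])
            if name not in types:
                types.append(name)  # keep first-seen order, no duplicates
    return {key: " | ".join(types) for key, types in seen.items()}


records = [{"status": "active"}, {"status": None}]
print(merge_field_types(records))  # {'status': 'string | null'}
```

Because types are recorded in first-seen order, a field that is consistently a string across all items stays a plain string rather than becoming a union.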
How many samples are enough
One sample gives you the structure of that specific response. Two samples often expose the most common optional fields. Three to five samples from different scenarios (different users, different states, edge cases) usually cover most of the shape. Beyond that, you hit diminishing returns. Keep in mind that a field that never appears in any sample cannot appear in the schema, so variety matters more than volume.
A practical workflow: hit the endpoint a few times with different parameters, paste all the responses as an array, and let the generator merge them. In seconds you get a schema that covers every field seen in those samples.
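If you collect responses with a script rather than by hand, the combining step might look like the following sketch. The function name combine_samples and the sample strings are hypothetical; the only assumption about the responses is that each body is valid JSON, either a single object or an array of objects.

```python
import json


def combine_samples(raw_responses: list[str]) -> str:
    """Parse each response body and return one pretty-printed JSON array."""
    samples = []
    for raw in raw_responses:
        parsed = json.loads(raw)
        # A response that is already an array contributes each of its items.
        if isinstance(parsed, list):
            samples.extend(parsed)
        else:
            samples.append(parsed)
    return json.dumps(samples, indent=2)


# Stand-ins for response bodies captured from separate requests.
raw_responses = [
    '{"id": 1, "name": "Widget", "price": 9.99, "discount": 0.10}',
    '{"id": 2, "name": "Gadget", "price": 24.99}',
    '[{"id": 3, "name": "Doohickey", "price": 4.50, "tags": ["clearance"]}]',
]

combined = combine_samples(raw_responses)
print(combined)
```

The printed array is what you would paste into the generator as its input.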
Try it
Paste a JSON array with varying item shapes into the converter and check which fields come back optional. The result is a single, accurate representation of your data's real structure.