Build LLM Function Schemas from Sample Data
LLM function calling requires you to describe each function's parameters as a schema. OpenAI, Anthropic, and Google all expect a structured definition of what the function accepts: field names, types, which fields are required, how objects nest. Writing these by hand is tedious and error-prone, especially for functions that accept complex inputs.
If you already have sample data that matches the shape you want, you can generate the schema instead.
The manual pain
Say you are building a tool that lets an LLM create calendar events. The input looks like this:
{
  "title": "Team standup",
  "start": "2026-04-16T09:00:00Z",
  "end": "2026-04-16T09:15:00Z",
  "attendees": ["alice@example.com", "bob@example.com"],
  "location": "Room 4B",
  "recurring": {
    "frequency": "daily",
    "until": "2026-06-01"
  }
}
To write the function-calling schema for this, you need to specify every field, its type, the nested object structure, and the array item type. For six fields with one nested object, that is around 40 lines of JSON Schema. For a function with 15 fields and multiple levels of nesting, it becomes a real time sink.
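For a sense of scale, here is roughly what that hand-written definition looks like, sketched as a Python dict in standard JSON Schema form. Which fields go in `required`, and details like the `format` annotations, are assumptions for illustration:

```python
# Hand-written JSON Schema for the calendar-event input above.
# The required list and format annotations are illustrative choices.
event_parameters = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "start": {"type": "string", "format": "date-time"},
        "end": {"type": "string", "format": "date-time"},
        "attendees": {
            "type": "array",
            "items": {"type": "string"},
        },
        "location": {"type": "string"},
        "recurring": {
            "type": "object",
            "properties": {
                "frequency": {"type": "string"},
                "until": {"type": "string"},
            },
            "required": ["frequency", "until"],
        },
    },
    "required": ["title", "start", "end", "attendees", "location", "recurring"],
}
```

Every one of those lines is a chance to mistype a field name or forget a nesting level.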
From sample to schema in one step
Paste the sample JSON into the converter and switch to schema mode. The output:
{
  title: string,
  start: string,
  end: string,
  attendees: [string],
  location: string,
  recurring: {
    frequency: string,
    until: string
  }
}
This gives you the structural blueprint. Every field name, its type, and the nesting are captured. You can drop this directly into your system prompt as the parameter description, or use it as a reference to write the formal JSON Schema definition.
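The conversion itself is mechanical. This is not the converter's actual implementation, just a minimal standard-library sketch of the idea: walk the parsed JSON and emit a type name for each value:

```python
import json

def infer(value) -> str:
    """Infer a compact type description from a parsed JSON value."""
    if isinstance(value, bool):  # bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assume a homogeneous array; describe the first element's type.
        return f"[{infer(value[0])}]" if value else "[unknown]"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {infer(v)}" for k, v in value.items())
        return "{" + fields + "}"
    return "null"

sample = json.loads('{"title": "Team standup", "attendees": ["alice@example.com"]}')
print(infer(sample))
# {title: string, attendees: [string]}
```

A real converter also has to merge heterogeneous arrays and multiple samples, but the core is this one recursive walk.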
Using the CLI in your workflow
If you are iterating on function definitions, the CLI version is faster than the browser. Suppose you have a file sample-event.json with your test data:
cat sample-event.json | npx @maisondigital/jsontoschema
The output goes to stdout. Pipe it into your prompt template, a script that generates the OpenAI function spec, or just read it in the terminal. When the sample data changes shape during development, re-run the command and the schema updates instantly.
Handling optional parameters
Function parameters are rarely all required. Some fields are optional depending on context. If you have two sample inputs where one includes location and the other does not, pass them as an array:
[
  { "title": "Standup", "start": "2026-04-16T09:00:00Z", "attendees": ["alice@example.com"] },
  { "title": "Lunch", "start": "2026-04-16T12:00:00Z", "location": "Cafe" }
]
The converter merges the array items and marks attendees and location as optional with a ? suffix. This maps directly to which parameters should be required vs. optional in your function definition.
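The merge behaviour can be sketched in a few lines as well. Assuming, as above, that a `?` suffix marks fields absent from some samples, the core rule is: a field is required only if it appears in every sample:

```python
def merge_required(samples: list[dict]) -> dict[str, bool]:
    """Map each field name to whether it appears in every sample."""
    all_keys = set().union(*(s.keys() for s in samples))
    return {k: all(k in s for s in samples) for k in sorted(all_keys)}

samples = [
    {"title": "Standup", "start": "2026-04-16T09:00:00Z", "attendees": ["alice@example.com"]},
    {"title": "Lunch", "start": "2026-04-16T12:00:00Z", "location": "Cafe"},
]
for field, required in merge_required(samples).items():
    print(field if required else field + "?")
# attendees?
# location?
# start
# title
```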
Why this beats writing schemas from scratch
Writing function schemas by hand introduces two kinds of errors. First, typos and structural mistakes that cause the LLM to misinterpret the function. Second, drift between the actual data your code handles and the schema you described to the model. Starting from real sample data eliminates both. The schema reflects what the function actually accepts, not what you remember it accepting.
For functions with simple inputs, hand-writing the schema is fine. For anything with nested objects, arrays, or more than five parameters, generating it from sample data saves time and catches shapes you might miss.
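As a concrete end point, the generated structure slots into a standard OpenAI-style tool definition, since the `parameters` object is ordinary JSON Schema. The function name, description, and trimmed field set here are illustrative:

```python
# Illustrative tool definition: the generated schema becomes the
# "parameters" object; optional fields are simply left out of "required".
create_event_tool = {
    "type": "function",
    "function": {
        "name": "create_calendar_event",
        "description": "Create a calendar event from the given details.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string"},
                "attendees": {"type": "array", "items": {"type": "string"}},
                "location": {"type": "string"},
            },
            "required": ["title", "start"],  # attendees/location were optional in the samples
        },
    },
}
```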
Try it with your next function definition at the converter.