Structured output response support #11

Open
dchuk opened this issue Mar 13, 2025 · 47 comments
Labels
enhancement New feature or request

Comments

@dchuk

dchuk commented Mar 13, 2025

Great-looking project, very natural Ruby approach!

Thinking about how to get official structured outputs supported (as in, not just validating a JSON schema against a text response from the LLM, but actually using the model's officially supported output response formatting), it looks like there are two good projects that could either be leveraged in this project or at least borrowed from for ideas:

https://github.com/nicieja/lammy

https://github.com/instructor-ai/instructor-rb (official clone of Instructor from Python, but it doesn't look to have been updated recently)

Do you have initial thoughts on how you want to approach this?

@kieranklaassen
Contributor

I've got one too, maybe for more inspiration: https://github.com/kieranklaassen/structify

@axcochrane

Nice, I was also thinking about looking into this. Happy to help out if needed.

@onurozer

Amazing gem! Structured output support would be very helpful indeed. Maybe it could be as simple as providing the schema as a Ruby hash. Here's another example: https://mdoliwa.com/how-to-use-structured-outputs-with-ruby-openai-gem

@adenta

adenta commented Mar 15, 2025

I’ve been using alexrudall/ruby-openai#508 (comment) for quite some time.

Whatever direction we take should be different, because this is really brittle.

@kieranklaassen your thing looks great, I vote we use it as a dependency in this repo

@dchuk
Author

dchuk commented Mar 15, 2025

I think we’d want an implementation for schema definition and structured responses that doesn’t have to rely on Rails, given this is a Ruby gem that can optionally be used with Rails. I’ve also used this before and it works well ( alexrudall/ruby-openai#508 (comment) ), but I really like the syntax of structify @kieranklaassen shared, especially the native chain-of-thought capability (which really helps improve output quality).

Seems like maybe the essence of structify’s approach can be tapped into in this gem without the dependency on Rails/Rails models?

@adenta

adenta commented Mar 15, 2025

@kieranklaassen are you comfortable if I point Claude 3.7 at your repo and have it extract/remove activerecord as a dependency?

Also, is this worth referencing?

@dchuk
Author

dchuk commented Mar 15, 2025

I actually just used repomix to pack both RubyLLM's and structify's repos to use with Claude 3.7 and ask how to combine the capabilities; it created a reasonably good plan from my quick skimming. I don't have time right now to create a PR, but I'll share its output here for now:

RubyLLM Gem Code Changes

lib/ruby_llm/schema.rb

module RubyLLM
  # Schema class for defining structured output formats
  class Schema
    attr_reader :title, :description, :version, :fields, :thinking_enabled
    
    # Class-level DSL methods for defining schemas
    class << self
      attr_reader :schema_title, :schema_description, :schema_version, 
                  :schema_fields, :schema_thinking_enabled
      
      # Set the schema title
      # @param name [String] The title
      def title(name)
        @schema_title = name
      end
      
      # Set the schema description
      # @param desc [String] The description
      def description(desc)
        @schema_description = desc
      end
      
      # Set the schema version
      # @param num [Integer] The version number
      def version(num)
        @schema_version = num
      end
      
      # Enable or disable thinking mode
      # @param enabled [Boolean] Whether to enable thinking mode
      def thinking(enabled)
        @schema_thinking_enabled = enabled
      end
      
      # Define a field in the schema
      # @param name [Symbol] Field name
      # @param type [Symbol] Field type (:string, :integer, :number, :boolean, :array, :object)
      # @param required [Boolean] Whether field is required
      # @param description [String] Field description
      # @param enum [Array] Possible values for the field
      # @param items [Hash] For array type, schema for array items
      # @param properties [Hash] For object type, properties of the object
      # @param options [Hash] Additional options
      def field(name, type, required: false, description: nil, enum: nil,
                items: nil, properties: nil, **options)
        @schema_fields ||= []
        
        field_def = {
          name: name,
          type: type,
          required: required,
          description: description
        }
        
        field_def[:enum] = enum if enum
        field_def[:items] = items if items && type == :array
        field_def[:properties] = properties if properties && type == :object
        
        # Add any additional options
        options.each { |k, v| field_def[k] = v }
        
        @schema_fields << field_def
      end
      
      # Create an instance from class definition
      def create_instance
        new(
          title: @schema_title,
          description: @schema_description,
          version: @schema_version || 1,
          thinking: @schema_thinking_enabled || false,
          fields: @schema_fields || []
        )
      end
    end
    
    def initialize(title: nil, description: nil, version: 1, thinking: false, fields: nil, &block)
      @title = title
      @description = description
      @version = version
      @thinking_enabled = thinking
      @fields = fields || []
      
      instance_eval(&block) if block_given?
    end
    
    # Define a field in the schema
    # @param name [Symbol] Field name
    # @param type [Symbol] Field type (:string, :integer, :number, :boolean, :array, :object)
    # @param required [Boolean] Whether field is required
    # @param description [String] Field description
    # @param enum [Array] Possible values for the field
    # @param items [Hash] For array type, schema for array items
    # @param properties [Hash] For object type, properties of the object
    # @param options [Hash] Additional options
    def field(name, type, required: false, description: nil, enum: nil,
              items: nil, properties: nil, **options)
      field_def = {
        name: name,
        type: type,
        required: required,
        description: description
      }
      
      field_def[:enum] = enum if enum
      field_def[:items] = items if items && type == :array
      field_def[:properties] = properties if properties && type == :object
      
      # Add any additional options
      options.each { |k, v| field_def[k] = v }
      
      @fields << field_def
    end
    
    # Enable or disable thinking mode
    # @param enabled [Boolean] Whether to enable thinking mode
    def thinking(enabled)
      @thinking_enabled = enabled
    end
    
    # Get or set the schema title
    # @param name [String, nil] The title
    def title(name = nil)
      @title = name unless name.nil?
      @title
    end
    
    # Get or set the schema description
    # @param desc [String, nil] The description
    def description(desc = nil)
      @description = desc unless desc.nil?
      @description
    end
    
    # Get or set the schema version
    # @param num [Integer, nil] The version number
    def version(num = nil)
      @version = num unless num.nil?
      @version
    end
    
    # Convert to JSON schema for various LLM providers
    def to_json_schema
      required_fields = @fields.select { |f| f[:required] }.map { |f| f[:name].to_s }
      
      properties = {}
      
      # Add chain_of_thought field if thinking mode is enabled
      if @thinking_enabled
        properties["chain_of_thought"] = {
          type: "string",
          description: "Explain your thinking process step by step before determining the final values."
        }
      end
      
      # Add all other fields
      @fields.each do |f|
        prop = { type: f[:type].to_s }
        prop[:description] = f[:description] if f[:description]
        prop[:enum] = f[:enum] if f[:enum]
        
        # Handle array specific properties
        if f[:type] == :array && f[:items]
          prop[:items] = f[:items]
        end
        
        # Handle object specific properties
        if f[:type] == :object && f[:properties]
          prop[:properties] = {}
          object_required = []
          
          f[:properties].each do |prop_name, prop_def|
            prop[:properties][prop_name] = prop_def.dup
            
            if prop_def[:required]
              object_required << prop_name
              prop[:properties][prop_name].delete(:required)
            end
          end
          
          prop[:required] = object_required unless object_required.empty?
        end
        
        properties[f[:name].to_s] = prop
      end
      
      # Return the complete schema
      {
        name: @title,
        description: @description,
        parameters: {
          type: "object",
          required: required_fields,
          properties: properties
        }
      }
    end
  end

  # Response parser class that converts structured JSON responses to Ruby objects
  class StructuredResponse
    attr_reader :raw_data, :schema
    
    def initialize(raw_data, schema)
      @raw_data = raw_data
      @schema = schema
      parse_response
    end
    
    def method_missing(method, *args, &block)
      key = method.to_s
      if @parsed_data && @parsed_data.key?(key)
        @parsed_data[key]
      else
        super
      end
    end
    
    def respond_to_missing?(method, include_private = false)
      key = method.to_s
      (@parsed_data && @parsed_data.key?(key)) || super
    end
    
    private
    
    def parse_response
      @parsed_data = {}
      
      # Skip parsing if empty response
      return if @raw_data.nil? || @raw_data.empty?
      
      # For each field in schema, extract and convert the value
      @schema.fields.each do |field|
        field_name = field[:name].to_s
        next unless @raw_data.key?(field_name)
        
        value = @raw_data[field_name]
        @parsed_data[field_name] = convert_value(value, field)
      end
      
      # Include chain_of_thought if present
      if @schema.thinking_enabled && @raw_data.key?("chain_of_thought")
        @parsed_data["chain_of_thought"] = @raw_data["chain_of_thought"]
      end
    end
    
    def convert_value(value, field)
      case field[:type]
      when :integer
        value.to_i
      when :number
        value.to_f
      when :boolean
        !!value
      when :array
        Array(value)
      when :object
        value.is_a?(Hash) ? value : {}
      else
        value.to_s
      end
    end
  end
end

lib/ruby_llm/chat_extensions.rb

require 'json'

module RubyLLM
  class Chat
    # Add a method for structured output
    def with_structured_output(schema_or_definition = nil, &block)
      schema = if schema_or_definition.is_a?(Schema)
                 schema_or_definition
               elsif schema_or_definition.is_a?(Class) && schema_or_definition < Schema
                 schema_or_definition.create_instance
               else
                 Schema.new(&block)
               end
      
      StructuredChat.new(self, schema)
    end
  end
  
  # StructuredChat wraps a regular Chat but adds structured output capabilities
  class StructuredChat
    def initialize(chat, schema)
      @chat = chat
      @schema = schema
    end
    
    # Forward most methods to the underlying chat
    def method_missing(method, *args, &block)
      @chat.send(method, *args, &block)
    end
    
    def respond_to_missing?(method, include_private = false)
      @chat.respond_to?(method, include_private) || super
    end
    
    # Override ask to handle structured responses
    def ask(prompt, **options)
      # Get the schema
      schema = @schema.to_json_schema
      
      # Determine which LLM provider we're using and adapt accordingly
      case @chat.provider.class.name
      when /OpenAI/
        # For OpenAI, use JSON mode with response_format
        # First, prepare our system message with schema information
        system_message = "You will extract structured information according to a specific schema."
        if schema[:description]
          system_message += " " + schema[:description]
        end
        
        # Prepare all messages including the system message
        all_messages = [
          {role: 'system', content: system_message}
        ] + @chat.conversation.messages + [{role: 'user', content: prompt}]
        
        # Call the API with JSON response format
        response = @chat.send(:perform_request, 
          model: options[:model] || @chat.provider.default_model,
          messages: all_messages,
          response_format: { 
            type: "json_object", 
            schema: schema[:parameters] 
          },
          temperature: options[:temperature] || 0.7
        )
        
        # Extract JSON from response
        if response['choices'] && response['choices'][0]['message']['content']
          begin
            # Parse the JSON response
            json_response = JSON.parse(response['choices'][0]['message']['content'])
            
            # Add response to conversation
            @chat.conversation.add('assistant', response['choices'][0]['message']['content'])
            
            # Create structured response object
            StructuredResponse.new(json_response, @schema)
          rescue JSON::ParserError => e
            # Handle parsing error
            response_content = response['choices'][0]['message']['content']
            @chat.conversation.add('assistant', response_content)
            # Log error but return the raw response
            puts "Warning: Could not parse JSON response: #{e.message}"
            response_content
          end
        else
          # Handle unexpected response format
          response_content = response['choices'][0]['message']['content'] rescue "No response content"
          @chat.conversation.add('assistant', response_content)
          response_content
        end
        
      when /Anthropic/
        # For Anthropic (Claude), use their tool calling
        tool = {
          name: "extract_structured_data",
          description: schema[:description] || "Extract structured data based on the provided schema",
          input_schema: schema[:parameters]
        }
        
        # Prepare the messages
        messages = @chat.conversation.messages.map do |msg|
          { role: msg[:role], content: msg[:content] }
        end
        messages << { role: 'user', content: prompt }
        
        # Prepare tool choice (force the model to call our extraction tool)
        tool_choice = { type: "tool", name: "extract_structured_data" }
        
        # Prepare request options
        request_options = options.merge(
          messages: messages,
          tools: [tool],
          tool_choice: tool_choice
        )
        
        # Perform the request
        response = @chat.send(:perform_request, request_options)
        
        # Extract tool outputs from response
        tool_use = response['content']&.find { |c| c['type'] == 'tool_use' }
        if tool_use
          # Anthropic returns the tool arguments as an already-parsed hash under 'input'
          function_args = tool_use['input']
          
          # Add response to conversation
          text_content = response['content'].find { |c| c['type'] == 'text' }
          @chat.conversation.add('assistant', text_content ? text_content['text'] : '')
          
          # Create structured response object
          StructuredResponse.new(function_args, @schema)
        else
          # Handle regular response
          response_content = response['content'][0]['text']
          @chat.conversation.add('assistant', response_content)
          response_content
        end
        
      else
        # For other providers, use standard prompt engineering
        system_prompt = "You must respond with a valid JSON object that follows this schema: #{schema.to_json}. " +
                        "Do not include any explanatory text outside the JSON."
        
        # Add system prompt
        @chat.conversation.add('system', system_prompt)
        
        # Call the API
        response = @chat.ask(prompt, **options)
        
        # Try to parse response as JSON
        begin
          json_response = JSON.parse(response)
          StructuredResponse.new(json_response, @schema)
        rescue JSON::ParserError
          # If parsing fails, return raw response
          response
        end
      end
    end
  end
end

Usage Examples

require 'ruby_llm'

# Example 1: Define a schema class
class ArticleSchema < RubyLLM::Schema
  title "Article Extraction"
  description "Extract key information from article text"
  
  field :title, :string, required: true, description: "The article's title"
  field :summary, :string, description: "A brief summary of the article"
  field :category, :string, enum: ["tech", "business", "science"]
  field :tags, :array, items: { type: "string" }
  field :author_info, :object, properties: {
    "name" => { type: "string", required: true },
    "email" => { type: "string" }
  }
end

# Use the schema class
chat = RubyLLM::Chat.new
response = chat.with_structured_output(ArticleSchema)
               .ask("Here's an article about AI developments: [article text...]")

# Now you can access fields as methods
puts "Title: #{response.title}"
puts "Summary: #{response.summary}"
puts "Category: #{response.category}"
puts "Tags: #{response.tags.join(', ')}"
puts "Author: #{response.author_info['name']}"

# Example 2: Define a schema inline
response = RubyLLM::Chat.new.with_structured_output do
  title "Article Extraction"
  description "Extract key information from article text"
  thinking true # Enable chain of thought reasoning
  
  field :title, :string, required: true, description: "The article's title"
  field :summary, :string, description: "A brief summary of the article"
  field :sentiment, :string, enum: ["positive", "neutral", "negative"]
end.ask("Here's a news article about climate change: [article text...]")

# Access the chain of thought reasoning
puts "Reasoning: #{response.chain_of_thought}"
puts "Title: #{response.title}"
puts "Summary: #{response.summary}"
puts "Sentiment: #{response.sentiment}"

# Example 3: Combining with tools
calculator = RubyLLM::Tool.new("Calculator", "Performs calculations") do |expression|
  eval(expression).to_s
end

product_schema = RubyLLM::Schema.new do
  field :name, :string, required: true
  field :price, :number, required: true
  field :quantity, :integer, required: true
  field :total_cost, :number, required: true
  field :categories, :array, items: { type: "string" }
end

# Combine tools with structured output
response = chat.with_tool(calculator)
               .with_structured_output(product_schema)
               .ask("I need a product entry for a MacBook Air that costs $1299. I'm ordering 3 units.")

puts "Product: #{response.name}"
puts "Price: $#{response.price}"
puts "Quantity: #{response.quantity}"
puts "Total Cost: $#{response.total_cost}" # The LLM likely used the calculator to compute this

Integrating Structured Output into RubyLLM

This guide explains how to integrate the structured output capabilities into the existing RubyLLM gem.

Step 1: Add New Files

Add these new files to the gem:

  • lib/ruby_llm/schema.rb: Contains the Schema and StructuredResponse classes
  • lib/ruby_llm/structured_chat.rb: Contains the StructuredChat class

Step 2: Update the Main File

Modify lib/ruby_llm.rb to require the new files:

require "ruby_llm/version"
require "ruby_llm/providers"
require "ruby_llm/conversation"
require "ruby_llm/chat"
require "ruby_llm/tool"
require "ruby_llm/schema"          # Add this line
require "ruby_llm/structured_chat" # Add this line

module RubyLLM
  class Error < StandardError; end
  # ... rest of the file
end

Step 3: Add Method to Chat Class

Add the with_structured_output method to the Chat class in lib/ruby_llm/chat.rb:

module RubyLLM
  class Chat
    # ... existing methods
    
    # Add a method for structured output
    def with_structured_output(schema_or_definition = nil, &block)
      schema = if schema_or_definition.is_a?(Schema)
                 schema_or_definition
               elsif schema_or_definition.is_a?(Class) && schema_or_definition < Schema
                 schema_or_definition.create_instance
               else
                 Schema.new(&block)
               end
      
      StructuredChat.new(self, schema)
    end
  end
end

Step 4: Update Tests

Add new test files:

  • spec/ruby_llm/schema_spec.rb: Tests for Schema class
  • spec/ruby_llm/structured_chat_spec.rb: Tests for StructuredChat class

Step 5: Update README and Documentation

Update the README.md with examples of using structured output, similar to the usage examples provided.

Step 6: Version Update

Increment the gem version in lib/ruby_llm/version.rb to reflect the new functionality.

Step 7: Update Changelog

Add an entry to CHANGELOG.md describing the new structured output feature.

Step 8: Integration with Existing Features

Ensure that structured output works well with existing features like tools, by testing combinations of methods like .with_tool(...).with_structured_output(...).
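
For Step 4, a skeletal spec against the Schema class sketched above could start like this (assuming RSpec, as the spec/ paths imply):

# spec/ruby_llm/schema_spec.rb
RSpec.describe RubyLLM::Schema do
  let(:schema) do
    described_class.new do
      field :title, :string, required: true, description: "The article's title"
      field :tags, :array, items: { type: "string" }
    end
  end

  it "marks required fields in the generated JSON schema" do
    expect(schema.to_json_schema[:parameters][:required]).to eq(["title"])
  end

  it "passes item definitions through for array fields" do
    expect(schema.to_json_schema[:parameters][:properties]["tags"][:items]).to eq({ type: "string" })
  end
end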

@dchuk
Author

dchuk commented Mar 15, 2025

(completely untested btw, not sure how well it would work, but it looks like a good starting point)

@crmne
Owner

crmne commented Mar 17, 2025

Structured outputs are definitely on the roadmap. I like Kieran's approach with structify, but RubyLLM is committed to staying Rails-optional.

The code sample shared looks promising but has some rough edges - "thinking enabled" doesn't fit our design philosophy, and the response parsing should live within provider modules rather than being generic.

If anyone wants to tackle this, I'd love to see a PR that maintains our core principles:

  1. Beautiful, expressive Ruby code
  2. Minimal dependencies
  3. Provider-specific logic contained in provider modules

Check out sergiobayona/easy_talk and other alternatives before reinventing the wheel - but let's make sure our implementation is the cleanest in the ecosystem.

@adenta

adenta commented Mar 17, 2025

Can you elaborate on:

response parsing should live within provider modules rather than being generic.

@beroneblitz

Would love to have this feature implemented!!!!!!!!

@kieranklaassen
Contributor

I'm personally going to use kieranklaassen/structify with ruby_llm, most likely because I need it to integrate with Rails. I would love to see BYO (bring your own schema) support here. What I do not like with most libraries is that they do too many things at once; I'd rather pick my faves and plug them together.

@crmne I like the feel of your lib a lot. I have also looked at every single other schema definition lib but have not found any that I like. Maybe there is a way to extract something from kieranklaassen/structify or be inspired by it.

I'll keep you all posted on how it goes using the two together.

@crmne
Owner

crmne commented Mar 17, 2025

@adenta In the code sample shared earlier, there's parsing logic (JSON.parse, digs, etc.) in chat_extensions.rb. This doesn't align with RubyLLM's architecture, where parsing logic belongs in provider modules, as it's most likely provider-specific.

I'd want to see that parsing code moved to provider-specific modules like lib/ruby_llm/providers/openai/structured_outputs.rb to maintain our clean separation of concerns.

@kieranklaassen Thank you!

I've been thinking more about this, and I believe we can create something even better than what's out there. What if we had a schema definition that felt truly Ruby-native?

class Delivery < RubyLLM::Schema
  datetime :timestamp
  array :dimensions
  string :address, required: true
end

# Then use it like:
response = chat.with_structured_output(Delivery)
               .ask("Extract delivery info from: Next day delivery to 123 Main St...")

puts response.timestamp   # => 2025-03-20 14:30:00
puts response.dimensions  # => [12, 8, 4] 
puts response.address     # => "123 Main St, Springfield"

This approach would be lightweight, expressive, and completely framework-agnostic. RubyLLM would handle adapting it to each provider's specific structured output mechanism.

I'd be very interested in collaborating on extracting something like this from structify. The core schema definition could be a standalone gem that other libraries could build upon. If you're up for it, let's think about what that might look like!
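
For what it's worth, here is a minimal sketch of how such a class-level DSL could compile down to a JSON schema hash. Everything below is hypothetical, not existing RubyLLM API:

module RubyLLM
  # Subclasses declare fields with class-level methods (string/number/array/datetime)
  # and the class can emit a JSON schema hash for the provider to use.
  class Schema
    TYPE_MAP = {
      string:   "string",
      integer:  "integer",
      number:   "number",
      boolean:  "boolean",
      array:    "array",
      datetime: "string" # JSON Schema has no datetime type; string + format instead
    }.freeze

    class << self
      def properties
        @properties ||= {}
      end

      def required_fields
        @required_fields ||= []
      end

      # One DSL method per supported type
      TYPE_MAP.each do |dsl_name, json_type|
        define_method(dsl_name) do |field_name, required: false, **opts|
          prop = { type: json_type }.merge(opts)
          prop[:format] = "date-time" if dsl_name == :datetime
          properties[field_name.to_s] = prop
          required_fields << field_name.to_s if required
        end
      end

      def json_schema
        { type: "object", properties: properties, required: required_fields }
      end
    end
  end
end

# Delivery (as defined above) would then produce:
# Delivery.json_schema
# # => { type: "object",
# #      properties: { "timestamp"  => { type: "string", format: "date-time" },
# #                    "dimensions" => { type: "array" },
# #                    "address"    => { type: "string" } },
# #      required: ["address"] }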

@adenta

adenta commented Mar 17, 2025 via email

@crmne
Owner

crmne commented Mar 17, 2025

After a quick search, I think we should make use of RBS https://www.honeybadger.io/blog/ruby-rbs-type-annotation/ as it's Ruby-native and very pretty:

class Delivery
  @timestamp : DateTime?  # ? means it can be nil
  @dimensions : Array[Integer]?
  @address : String
end

@kieranklaassen
Contributor

kieranklaassen commented Mar 17, 2025

I like the PORO route.

What about more complex nested JSON schemas? Enums, max/min, etc.?

For me it also should be easy to use in Rails, since that is where most people would use it and store it. That's why I created structify. But it would be great if we could create some kind of Ruby Pydantic that works well for LLMs. I have not seen anything whose syntax I like. I'm not sure if I love RBS type annotations. Are they extensible enough?

Love the collaboration here!

@beroneblitz

I wouldn't couple Rails with the RubyLLM interface for structured outputs. I would like to be able to pass in some data structure that represents the schema I am looking for, which then gets used in the LLM API calls and feeds back a JSON/hash structure with the response from the LLM.

Moving in and out of Rails objects is something I could do on my own side, decoupled from the interaction with the LLM, for the purposes of coding my own business logic, etc.
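
As a rough illustration of that decoupled flow (the with_output_schema method name is purely hypothetical at this point):

# Pass a plain JSON-schema hash in, get a plain hash back, and map it to
# ActiveRecord objects (or anything else) on the application side.
schema = {
  type: "object",
  required: ["address"],
  properties: {
    address:    { type: "string" },
    dimensions: { type: "array", items: { type: "number" } }
  }
}

response = chat.with_output_schema(schema)
               .ask("Extract delivery info from: Next day delivery to 123 Main St...")

data = JSON.parse(response.content)
# => { "address" => "123 Main St, Springfield", "dimensions" => [12, 8, 4] }

Delivery.create!(address: data["address"]) # business logic stays outside the gem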

@sergiobayona

Hey there, glad to see my gem mentioned (sergiobayona/easy_talk). I would gladly help integrate it with ruby_llm.

@kieranklaassen
Contributor

kieranklaassen commented Mar 18, 2025

I'm thinking it would be really great if we could bring our own system too, since we all need something slightly different. For example, we could use @sergiobayona's sergiobayona/easy_talk or my more Rails-focused kieranklaassen/structify.

Maybe an interface like this:

# RubyLLM
class Delivery
  @timestamp : DateTime?  # ? means it can be nil
  @dimensions : Array[Integer]?
  @address : String
end

# Structify
class Delivery < ApplicationRecord
  include Structify::Model
  
  schema_definition do
    field :timestamp, :datetime
    field :dimensions, :array, items: { type: "number" }
    field :address, :string, required: true
  end
end

# Easytalk
class Delivery
  include EasyTalk::Model
  
  define_schema do
    property :timestamp, DateTime
    property :dimensions, T::Array[Float], optional: true
    property :address, String
  end
end

# Then use it like:
response = chat.with_structured_output(Delivery)
               .ask("Extract delivery info from: Next day delivery to 123 Main St...")

puts response.timestamp   # => 2025-03-20 14:30:00
puts response.dimensions  # => [12, 8, 4] 
puts response.address     # => "123 Main St, Springfield"

We can use an adapter pattern. Thoughts?

Also maybe we can add a raw_response, so you can do your own processing?
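
A rough sketch of what that adapter layer could look like. All names are illustrative, and the EasyTalk/Structify calls are placeholders for whatever those gems actually expose:

module RubyLLM
  # Registry mapping a marker module (e.g. EasyTalk::Model) to a converter
  # that turns a schema class into a JSON-schema hash.
  module SchemaAdapters
    @adapters = {}

    def self.register(marker_module, converter)
      @adapters[marker_module] = converter
    end

    def self.json_schema_for(schema_class)
      _, converter = @adapters.find { |mod, _| schema_class.include?(mod) }
      raise ArgumentError, "no schema adapter registered for #{schema_class}" unless converter

      converter.call(schema_class)
    end
  end
end

RubyLLM::SchemaAdapters.register(EasyTalk::Model,  ->(klass) { klass.json_schema })    # placeholder call
RubyLLM::SchemaAdapters.register(Structify::Model, ->(klass) { klass.to_json_schema }) # placeholder call

# with_structured_output(Delivery) could then resolve the schema via
# RubyLLM::SchemaAdapters.json_schema_for(Delivery) before calling the provider.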

@adenta

adenta commented Mar 18, 2025

I support the above proposal. At the end of the day the schema gets evaluated down to JSON Schema, so however one wants to get there should be fine.

@kieranklaassen
Contributor

kieranklaassen commented Mar 19, 2025

This is how Langchain does it:

parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
prompt = Langchain::Prompt::PromptTemplate.new(template: "Generate details of a fictional character.\n{format_instructions}\nCharacter description: {description}", input_variables: ["description", "format_instructions"])
prompt_text = prompt.format(description: "Korean chemistry student", format_instructions: parser.get_format_instructions)

While I do not like the long name Langchain::OutputParsers::StructuredOutputParser, the idea of a parser is a good abstraction. Sometimes you want to extract using regex or something custom as well.

@kieranklaassen
Contributor

kieranklaassen commented Mar 19, 2025

Question for the community: can you all post some JSON schemas to get an idea of all the shapes?

This is one of mine:

{
  "name": "newsletter_summary",
  "description": "Extracts essential metadata from a newsletter or digest-style email and returns a detailed summary. Follows these writing rules: always use past tense (e.g., 'Matt asked'), be concrete and specific, and if there is a question in the subject, provide an answer in the summary. For digest emails, summarize the main story in one extensive paragraph, then list the three most important stories or points in three concise sentences (or lines).",
  "parameters": {
    "type": "object",
    "required": [
      "title",
      "summary",
      "priority",
      "labels"
    ],
    "properties": {
      "chain_of_thought": {
        "type": "string",
        "description": "Explain your thought process step by step before determining the final values."
      },
      "title": {
        "type": "string"
      },
      "summary": {
        "type": "text"
      },
      "priority": {
        "type": "integer",
        "description": "Priority rating for the newsletter: 0 means it's mostly promotional or unimportant, 1 indicates it has moderate usefulness, and 2 indicates high value and relevance.",
        "enum": [
          0,
          1,
          2
        ]
      },
      "newsletter_url": {
        "type": "string",
        "description": "Direct URL to the web version of the newsletter if available, otherwise null. Do NOT include any unsubscribe link."
      },
      "cover_image_url": {
        "type": "string",
        "description": "URL of the newsletter's cover image if present, otherwise null."
      },
      "labels": {
        "type": "array",
        "description": "Relevant thematic tags describing the newsletter content, e.g. 'AI', 'Productivity', 'Leadership'.",
        "items": {
          "type": "string"
        }
      }
    }
  }
}

@crmne crmne added the enhancement New feature or request label Mar 23, 2025
@crmne
Owner

crmne commented Mar 24, 2025

Hey folks, just reviving the discussion here. Can you post some JSON schemas you are using as @kieranklaassen said? That'll help us prioritize and implement it better.

@adenta

adenta commented Mar 24, 2025

Here is one where I was generating fake Twitch chat messages to test my Pokémon Showdown bot:

showdown-realtime(dev)> puts FakeChatReasoning.new.to_json
{"name":"fakechatreasoning","description":"Schema for the structured response","schema":{"type":"object","properties":{"chat_messages":{"type":"array","items":{"$ref":"#/$defs/chat_message"}}},"required":["chat_messages"],"additionalProperties":false,"strict":true,"$defs":{"chat_message":{"type":"object","properties":{"username":{"type":"string"},"body":{"type":"string"}},"required":["username","body"],"additionalProperties":false,"strict":true}}}}

Picking a button in a game of Pokémon FireRed:

fire-red-agent(dev)> puts ButtonSequenceReasoning.new.to_json
{"name":"buttonsequencereasoning","description":"Schema for the structured response","schema":{"type":"object","properties":{"button":{"type":"string","enum":["up","down","left","right","a","b"]}},"required":["button"],"additionalProperties":false,"strict":true,"$defs":{}}}

Choosing a place to go on a map, based on a list of map coordinates:

fire-red-agent(dev)> puts CharterReasoning.new.to_json
{"name":"charterreasoning","description":"Schema for the structured response","schema":{"type":"object","properties":{"x":{"type":"number","description":"The x-coordinate of the destination"},"y":{"type":"number","description":"The y-coordinate of the destination"},"description":{"type":"string","description":"the description field from the x and y coordinate chosen"}},"required":["x","y","description"],"additionalProperties":false,"strict":true,"$defs":{}}}

@crmne
Owner

crmne commented Mar 24, 2025

@adenta could you send us the JSON version of all of them?

@adenta

adenta commented Mar 24, 2025

Done!

@kieranklaassen
Contributor

I started implementing this a bit and will see what I can do this week, pushing as I go to #65.

@danielfriis

danielfriis commented Mar 26, 2025

Just adding this gist for inspiration. IMO a clean way to define a schema, similar to how tools are defined now in RubyLLM.

class MathReasoning < StructuredOutputs::Schema
  def initialize
    super do
      define :step do
        string :explanation
        string :output
      end
      array :steps, items: ref(:step)
      string :final_answer
    end
  end
end
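
Presumably that definition compiles down to an OpenAI-style schema with $defs/$ref, roughly like the following (my guess at the generated output, not taken from the gist):

{
  "name": "math_reasoning",
  "schema": {
    "type": "object",
    "properties": {
      "steps": { "type": "array", "items": { "$ref": "#/$defs/step" } },
      "final_answer": { "type": "string" }
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": false,
    "$defs": {
      "step": {
        "type": "object",
        "properties": {
          "explanation": { "type": "string" },
          "output": { "type": "string" }
        },
        "required": ["explanation", "output"],
        "additionalProperties": false
      }
    }
  }
}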

@axcochrane

Here's an example JSON schema I'm looking to implement:

{
  "name": "question_extractor",
  "description": "Extract structured data from a list of questions",
  "parameters": {
    "type": "object",
    "required": [
      "questions"
    ],
    "properties": {
      "questions": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "text": {
              "type": "string"
            },
            "speaker": {
              "type": "string"
            },
            "start_time": {
              "type": "number",
              "multipleOf": 0.01
            },
            "end_time": {
              "type": "number",
              "multipleOf": 0.01
            }
          },
          "required": [
            "text",
            "speaker",
            "start_time",
            "end_time"
          ]
        }
      }
    }
  }
}

@crmne
Owner

crmne commented Apr 3, 2025

Thanks for the excitement about structured output support and the draft PRs @kieranklaassen and @danielfriis .

I need to take a deep dive into this before deciding anything since it's not something I personally use, so don't expect quick answers here or in the PRs. However, you all can help with information:

  1. Any JSON schema gems that you love?
  2. Any DSL proposals that don't follow any gems?
  3. What about RBS?
  4. Any quirks of OpenAI JSON schema?
  5. What about other providers? How do they define structured outputs? Is there something we can learn from their way?

My preference - as you may have guessed - goes to clean, elegant, beautiful DSLs.

@kieranklaassen
Contributor

kieranklaassen commented Apr 3, 2025

Long response coming in; I have thoughts from needing this in my app and trying to find a solution for two years. I think RubyLLM is the one! Thanks @crmne

I'm completely open to however we decide to implement this. As the repo owner, you're the one who should make the final call on the approach - I'm just providing options and considerations to help inform that decision.

Design Considerations

When implementing structured output support for RubyLLM, several key considerations should guide the design:

  1. Ruby-Native Feel: The API should follow Ruby idioms and conventions
  2. Framework Independence: No dependencies on Rails or other frameworks
  3. Extensibility: Support for custom schema definitions
  4. Developer Experience: Intuitive, low-friction usage
  5. Separation of Concerns: Provider-specific logic contained in provider modules

Proposed DSL Approaches

1. Simple Class-based Schema Definition

Define schemas using plain Ruby objects with a json_schema method:

class Delivery
  attr_accessor :timestamp, :dimensions, :address

  def self.json_schema
    {
      type: "object",
      properties: {
        timestamp: { type: "string", format: "date-time" },
        dimensions: { type: "array", items: { type: "number" } },
        address: { type: "string" }
      },
      required: ["address"]
    }
  end
end

# Usage
response = chat.with_response_format(Delivery)
               .ask("Extract delivery info from: Next day delivery to 123 Main St...")

# Response object
puts response.timestamp  # => "2025-03-20T14:30:00Z"
puts response.dimensions # => [12, 8, 4]
puts response.address    # => "123 Main St, Springfield"

Advantages:

  • Simple implementation
  • Works with existing Ruby classes
  • Full JSON Schema control

Disadvantages:

  • Verbose schema definitions
  • Requires JSON Schema knowledge

2. RBS Type Signatures

Leverage Ruby's type system using RBS syntax:

class Delivery
  @timestamp : DateTime?  # ? means optional
  @dimensions : Array[Float]?
  @address : String
end

# Usage
response = chat.with_response_format(Delivery)
               .ask("Extract delivery info from: Next day delivery to 123 Main St...")

# Response object
puts response.timestamp  # => #<DateTime: 2025-03-20T14:30:00>
puts response.dimensions # => [12.0, 8.0, 4.0]
puts response.address    # => "123 Main St, Springfield"

Advantages:

  • Leverages standard Ruby type system
  • Clean, minimalist syntax
  • Future-proof as Ruby's type system evolves

Disadvantages:

  • Does not capture all JSON Schema features
  • Less control over validation specifics

3. Schema DSL with Method Chaining

(as proposed by @danielfriis)

A more expressive DSL for schema definition:

class Delivery < RubyLLM::Schema
  string :name, required: true
  number :price
  array :dimensions, items: { type: "number" }
  object :address do
    string :street
    string :city
    string :zip
  end
end

# Usage
response = chat.with_structured_output(Delivery)
               .ask("Extract: John bought a package for $25, size 12x8x4, sent to 123 Main St, Springfield, 12345")

# Response object
puts response.name       # => "John"
puts response.price      # => 25
puts response.dimensions # => [12, 8, 4]
puts response.address.street # => "123 Main St"
puts response.address.city   # => "Springfield"
puts response.address.zip    # => "12345"

Advantages:

  • Expressive and readable
  • Captures nested structure elegantly
  • Type-focused rather than schema-focused
  • Similar to how tool schemas are defined

Disadvantages:

  • Requires inheriting from base class
  • More complex implementation

4. Adapter Pattern for Multiple Schema Systems

Support multiple schema definition styles:

# RubyLLM's native schema definition
class Delivery < RubyLLM::Schema
  string :address, required: true
  array :dimensions
end

# Or use your own schema library:
require 'easy_talk'

class DeliveryET
  include EasyTalk::Model
  
  define_schema do
    property :address, String
    property :dimensions, T::Array[Float], optional: true
  end
end

# Usage with any schema system
response = chat.with_response_format(DeliveryET)
               .ask("Extract delivery info...")

# Response returns a DeliveryET instance
puts response.address    # => "123 Main St, Springfield"
puts response.dimensions # => [12.0, 8.0, 4.0]

Advantages:

  • Maximum flexibility
  • Works with existing schema libraries
  • Easier adoption for projects with existing schema definitions

Disadvantages:

  • Requires adapter implementations
  • Potential inconsistencies between adapters

Real-World Examples

Product Information Extraction

class Product < RubyLLM::Schema
  string :name, required: true
  string :category, enum: ["Electronics", "Clothing", "Food", "Other"]
  number :price, required: true
  array :features, items: { type: "string" }
  boolean :in_stock
end

response = chat.with_response_format(Product)
               .ask("Extract details: iPhone 15, $999, Electronics, features: A17 chip, 48MP camera. In stock: yes")

puts response.name      # => "iPhone 15"
puts response.category  # => "Electronics"
puts response.price     # => 999
puts response.features  # => ["A17 chip", "48MP camera"]
puts response.in_stock  # => true

Customer Sentiment Analysis

class FeedbackAnalysis < RubyLLM::Schema
  string :sentiment, enum: ["positive", "neutral", "negative"], required: true
  number :satisfaction_score, minimum: 1, maximum: 10
  array :key_concerns do
    string
  end
  array :positive_points do
    string
  end
  string :summary, required: true
end

feedback = "The product works well but shipping took too long and the packaging was damaged."

response = chat.with_response_format(FeedbackAnalysis)
               .ask("Analyze this customer feedback: #{feedback}")

puts response.sentiment           # => "neutral"
puts response.satisfaction_score  # => 6
puts response.key_concerns        # => ["slow shipping", "damaged packaging"]
puts response.positive_points     # => ["product works well"]
puts response.summary             # => "Customer is satisfied with product functionality but disappointed with shipping experience"

Method Naming Options

# Option 1: Focus on response format
chat.with_response_format(Schema)
    .ask("...")  # => Returns a Schema-based object

# Option 2: Focus on structured data
chat.with_structured_output(Schema)
    .ask("...")  # => Returns a Schema-based object

# Option 3: Focus on schema
chat.with_schema(Schema)
    .ask("...")  # => Returns a Schema-based object

# Option 4: Focus on extraction
chat.extract_as(Schema)
    .ask("...")  # => Returns a Schema-based object

Custom Parsers

Since we're already building a parser system for JSON, it makes sense to extend this framework to support additional response formats. The custom parser system allows for future extensibility without changing the core API.

Why Support Custom Parsers?

While structured JSON output handles most data extraction needs, LLMs can generate content in many other formats:

  1. XML responses for legacy systems or specific structured data needs
  2. Markdown code blocks that need extraction
  3. CSV data for tabular information
  4. Custom patterns that need specialized extraction via regex

By leveraging the same parsing architecture we use for JSON, we can provide a consistent interface for all response formats.

Custom Parser Examples

# XML Parser
response = chat.with_parser(:xml, tag: 'data')
               .ask("Can you provide the answer in XML? <data>42</data>")

puts response.content  # => "42"

# Regex Parser
email = chat.with_parser(:regex, pattern: 'Email: ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})')
            .ask("My email is: Email: [email protected]")

puts email # => "[email protected]"

# CSV Parser
RubyLLM::ResponseParser.register(:csv, CsvParser)

result = chat.with_parser(:csv)
             .ask("Give me a CSV with name,age,city for 3 people")

puts result.first  # => {"name"=>"John", "age"=>"30", "city"=>"New York"}

Building Your Own Parser

Custom parsers are simple modules that implement a standard interface:

module CsvParser
  def self.parse(response, options)
    # Skip processing if not a string
    return response unless response.content.is_a?(String)
    
    rows = response.content.strip.split("\n")
    headers = rows.first.split(',')
    
    rows[1..-1].map do |row|
      values = row.split(',')
      headers.zip(values).to_h
    end
  end
end

# Register your parser
RubyLLM::ResponseParser.register(:csv, CsvParser)

This extensible design ensures RubyLLM can evolve to handle any response format needed in the future, while sharing the same infrastructure we build for structured JSON output.
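
The RubyLLM::ResponseParser registry referenced above doesn't exist yet; a minimal sketch of it might look like:

module RubyLLM
  # Hypothetical registry backing the with_parser / register calls above.
  module ResponseParser
    @parsers = {}

    def self.register(name, parser)
      @parsers[name.to_sym] = parser
    end

    def self.parse(name, response, options = {})
      parser = @parsers.fetch(name.to_sym) { raise ArgumentError, "unknown parser: #{name}" }
      parser.parse(response, options)
    end
  end
end

# chat.with_parser(:csv) would then run the returned message through
# RubyLLM::ResponseParser.parse(:csv, message) before handing it back.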

Recommended Approach

A hybrid approach combining clean Ruby-native DSL with adapter support offers the best balance:

  1. Primary Schema Definition Method: Use RBS-inspired syntax for clean, Ruby-idiomatic schema definitions

    class Person < RubyLLM::Schema
      string :name, required: true
      number :age
      array :hobbies, items: { type: "string" }
    end
  2. Support for Plain Ruby Objects: Allow simple classes with json_schema method

    class Person
      attr_accessor :name, :age, :hobbies
      
      def self.json_schema
        {} # JSON schema definition goes here
      end
    end
  3. Adapter Support: Enable integration with external schema libraries

    # Register adapters for popular libraries
    RubyLLM::SchemaAdapters.register(EasyTalk::Model, EasyTalkAdapter)
    RubyLLM::SchemaAdapters.register(Structify::Model, StructifyAdapter)
  4. Method Name: Use with_response_format as it clearly indicates that we're specifying the structure of the response

This approach provides a clean, flexible API that feels natural to Ruby developers while allowing integration with existing code.

Open Questions

Rails Integration and Persistence

Since RubyLLM already has persistence built-in with chat storage, how should we approach structured output persistence with Rails?

The key question is whether we should store the raw response, the parsed response, or both. Storing only the parsed response is more space-efficient but loses information. Storing both provides the most flexibility but requires more storage. The best approach may depend on specific use cases and how often users need to reprocess historical responses with new schemas.
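
If we ended up storing both (one possible answer to the question above), the Rails side could be as simple as two extra columns on the message record. Purely illustrative:

class AddStructuredOutputToMessages < ActiveRecord::Migration[7.1]
  def change
    # Raw provider payload, so old responses can be reprocessed with new schemas later
    add_column :messages, :raw_response, :jsonb
    # Schema-validated result the application actually reads
    add_column :messages, :parsed_response, :jsonb
  end
end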

@danielfriis

danielfriis commented Apr 4, 2025

@kieranklaassen I'm on board with your recommended approach, but I suggest we limit the first PR to just the first method (Primary Schema Definition) and then expand from there in other PRs. Just to keep changes as small and contained as possible.

Similarly, I would argue for adding custom parsers in new, separate PRs. Also — while I like the idea of parsers — I'm not sure if @crmne would consider them within the scope of RubyLLM?

Edit: Worth mentioning for you @crmne, as you asked about quirks with OpenAI schemas, that both json_mode and structured outputs are features of OpenAI only. Hence, most of the logic will have to reside under providers. Other LLM providers (e.g. Anthropic) work with those concepts through prompt engineering, which could favour the implementation of parsers as suggested by @kieranklaassen.
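
On the OpenAI quirks: strict structured outputs expect a response_format block like the one below, and strict mode requires additionalProperties: false plus every property listed under required (shown here as a Ruby hash; the model name is just an example):

{
  model: "gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Extract delivery info..." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "delivery",
      strict: true,
      schema: {
        type: "object",
        properties: {
          address:    { type: "string" },
          dimensions: { type: "array", items: { type: "number" } }
        },
        required: ["address", "dimensions"], # strict mode: every property must be listed
        additionalProperties: false          # strict mode: must be false
      }
    }
  }
}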

@kieranklaassen
Contributor

kieranklaassen commented Apr 4, 2025

Google also supports structured outputs. And agreed on taking small steps. I just want to make sure I can actually use RubyLLM for my real-world use case, so I want to agree on the direction we go in.

Regarding parsers, I think the idea aligns well with the batteries-included philosophy.

@frmsaul

frmsaul commented Apr 9, 2025

+1 for @kieranklaassen's approach!

@kieranklaassen
Contributor

kieranklaassen commented Apr 10, 2025

Example of code before and after a possible parser addition from above:

# frozen_string_literal: true

# Extracts industry and time-sensitive categories from an account's emails
# and stores them as memories for future reference
class FactExtractorService
  # @param account [Account] The account to extract facts from
  def initialize(account)
    @account = account
    Current.account = @account # Set Current.account for tools
  end

  # Runs the fact extraction process and stores the results as memories
  # @return [Boolean] true if extraction was successful, false otherwise
  def run
    result = extract_facts.content
    return false unless result

    # Parse the result to extract industry and time-sensitive categories
    industry, explanation, categories = parse_result(result)

    # Store the extracted information as memories
    store_industry_memory(industry, explanation)
    store_category_memories(categories)

    true
  rescue StandardError => e
    Rails.logger.error("Error in FactExtractorService: #{e.message}")
    false
  ensure
    Current.account = nil # Clean up Current.account after use
  end

  private

  # Extracts facts about the account using the fact_extractor_prompt
  # @return [String, nil] The raw response from the LLM or nil if extraction failed
  def extract_facts
    # Use PromptReader to read the prompt with the account's context
    prompt_content = PromptReader.read("fact_extractor_prompt", **@account.to_llm_context)

    # Create an ephemeral RubyLLM instance - no chat persistence
    chat = RubyLLM.chat(model: "gemini-2.0-flash-exp")
    # Ask the question and return the response
    chat.ask(prompt_content)
  end

  # Parses the raw LLM response to extract structured information
  # @param result [String] The raw LLM response
  # @return [Array<String, String, Array>] Industry, explanation, and categories
  def parse_result(result)
    # Extract industry
    industry_match = result.match(/<determined_industry>(.*?)<\/determined_industry>/m)
    industry = industry_match ? industry_match[1].strip : "Unknown"

    # Extract explanation
    explanation_match = result.match(/<explanation>(.*?)<\/explanation>/m)
    explanation = explanation_match ? explanation_match[1].strip : ""

    # Extract time-sensitive categories
    categories_match = result.match(/<time_sensitive_categories>(.*?)<\/time_sensitive_categories>/m)
    categories_text = categories_match ? categories_match[1].strip : ""

    # Split categories text into individual category definitions
    categories = categories_text.split(/(?=- [A-Z])/).map(&:strip).reject(&:empty?)

    [industry, explanation, categories]
  end

  # Stores the industry as a memory
  # @param industry [String] The detected industry
  # @param explanation [String] The explanation for the industry detection
  # @return [Memory] The created memory
  def store_industry_memory(industry, explanation)
    # Stuff
  end

  # Stores time-sensitive categories as memories
  # @param categories [Array<String>] Array of category descriptions
  # @return [Array<Memory>] Array of created memories
  def store_category_memories(categories)
    # Stuff
  end
end

With parsers:

# frozen_string_literal: true

# Extracts industry and time-sensitive categories from an account's emails
# and stores them as memories for future reference
class FactExtractorService
  # @param account [Account] The account to extract facts from
  def initialize(account)
    @account = account
    Current.account = @account # Set Current.account for tools
  end

  # Runs the fact extraction process and stores the results as memories
  # @return [Boolean] true if extraction was successful, false otherwise
  def run
    response = extract_facts
    return false unless response

    # Store the extracted information as memories
    store_industry_memory(response.industry, response.explanation)
    store_category_memories(response.categories)

    true
  rescue StandardError => e
    Rails.logger.error("Error in FactExtractorService: #{e.message}")
    false
  ensure
    Current.account = nil # Clean up Current.account after use
  end

  private

  # Extracts facts about the account using the fact_extractor_prompt with regex parser
  # @return [ParsedResponse, nil] The parsed response from the LLM or nil if extraction failed
  def extract_facts
    # Use PromptReader to read the prompt with the account's context
    prompt_content = PromptReader.read("fact_extractor_prompt", **@account.to_llm_context)

    # Create an ephemeral RubyLLM instance - no chat persistence
    chat = RubyLLM.chat(model: "gemini-2.0-flash-exp")
    
    # Use the regex parser pattern to extract structured data from the response
    chat.with_parser(:regex, patterns: {
      industry: /<determined_industry>(.*?)<\/determined_industry>/m,
      explanation: /<explanation>(.*?)<\/explanation>/m,
      categories_text: /<time_sensitive_categories>(.*?)<\/time_sensitive_categories>/m
    }).ask(prompt_content)
  end

  # Stores the industry as a memory
  # @param industry [String] The detected industry
  # @param explanation [String] The explanation for the industry detection
  # @return [Memory] The created memory
  def store_industry_memory(industry, explanation)
    # Stuff
  end

  # Stores time-sensitive categories as memories
  # @param categories [Array<String>] Array of category descriptions
  # @return [Array<Memory>] Array of created memories
  def store_category_memories(categories)
    # Stuff
  end
end

@adenta

adenta commented Apr 10, 2025

I’m having a hard time following the above code.

Should we be thinking of formatting prompts as a separate concern from formatting/structuring outputs?

I love the idea of standard ways to format prompt data! Just unsure if we should try to tackle both at once.

@kieranklaassen
Contributor

The example is just for the parser; the rest is just code I have. I'm just sharing real-world code and the problems it could solve.

Eric-Guo added a commit to Eric-Guo/ruby_llm that referenced this issue Apr 16, 2025
@Eric-Guo

I just switched from ruby-openai to ruby_llm. I think we can just add with_response_format({type: 'json_object'}), very similar to with_temperature(0.2).


@crmne
Owner

crmne commented Apr 18, 2025

Okay, given the awesome amount of interest let's do the following to merge it quickly:

  1. Add .with_output_schema(json_schema_hash) to Chat.
  2. Put provider-specific logic (OpenAI, Gemini, etc.) in their modules.
  3. DSLs, parsers, adapters? Later. Let's get the core working first.
  4. Needs to do only this, across relevant providers.

Let's make it simple. Who's up for it?
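
A rough sketch of that shape, with hypothetical internals (the actual Chat/provider plumbing will differ):

module RubyLLM
  class Chat
    # Chainable, mirroring with_temperature
    def with_output_schema(json_schema_hash)
      @output_schema = json_schema_hash
      self
    end
  end

  module Providers
    module OpenAI
      module StructuredOutputs
        # Merged into the request payload by the OpenAI provider only; other
        # providers translate the schema their own way (or fall back to prompting).
        def apply_output_schema(payload, schema)
          return payload unless schema

          payload.merge(
            response_format: {
              type: "json_schema",
              json_schema: { name: "response", strict: true, schema: schema }
            }
          )
        end
      end
    end
  end
end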

@kieranklaassen
Contributor

I can do it! Just want to make sure we do not do double work

@kieranklaassen
Contributor

@danielfriis Maybe you can take the RubyLLM::Schema in a different PR? As long as we have a .json_schema method on it, they can work together.

@danielfriis

@kieranklaassen sure, I'll follow your lead.

@kieranklaassen
Contributor

kieranklaassen commented Apr 18, 2025

I think what you had is already almost done, I'll make it work like this:

response = chat.with_output_schema(Product.json_schema)
               .ask("Extract details: iPhone 15, $999, Electronics, features: A17 chip, 48MP camera. In stock: yes")
JSON.parse(response.content)

  • I will do the with_output_schema in my PR.
  • You can do the Product.json_schema in the PR you already started.

They can work nicely together

@rkh

rkh commented Apr 18, 2025

Awesome stuff btw, just been lurking on the PR, would love to get this working with Gemini.

@kieranklaassen
Contributor

Sadly we need #121 before we can get this working on Gemini.

It works for OpenAI:

[screenshot showing it working with OpenAI]

@jayelkaake

After some collaboration with @tpaulshippy I've posted PR #124 to add support for more complex param schemas without changing the way the gem currently does things. It is backward compatible.

I really like the .with_structured_response(json_schema) idea by @kieranklaassen in #122.

If we can get both these PRs merged it'll be awesome!
