Use base64 format to improve performance of embeddings #211

Open
jdubois opened this issue Feb 10, 2025 · 4 comments

Comments

@jdubois

jdubois commented Feb 10, 2025

When embeddings are created, they have an EncodingFormat, which can be either float or base64, as seen in

Unless this is configured by the user, base64 should be the default, as this will provide a significant performance improvement.

Here's a similar issue:

As this SDK is generated, we can't contribute directly to it, but I'd be happy to help fix this.

@TomerAberbach
Collaborator

Feel free to send a PR!

Note that both the Python and JS SDKs are also generated, but we are able to add custom code (which will be preserved)

@yoshioterada

@TomerAberbach

Results of Performance Measurements Using Java SDK

After conducting measurements with the Java SDK, the following results were observed:

Base64 (File Size: 84,390 bytes)

  • 1179 ms
  • 1241 ms
  • 1246 ms
  • 1209 ms
  • 1151 ms

Float (File Size: 209,678 bytes)

  • 1241 ms
  • 2216 ms
  • 1199 ms
  • 1242 ms
  • 1226 ms

Summary of Results:

I generated 10 random strings and measured both the latency and file size when retrieving these strings in two different formats: Base64 and Float (Java Double). The results showed that there were no significant differences in latency across 10 calls. However, the file size between the two formats was significantly different, with the Float format being approximately 2.5 times larger than the Base64 format. Based on these findings, adjusting the default embedding format could lead to improved network bandwidth efficiency.
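
The size gap follows from how many characters each format spends per value. The sketch below works through that arithmetic under two assumptions that are not stated in this thread: each value is a 4-byte float32, and the model returns 1536 dimensions (the documented default for text-embedding-3-small). The class and method names are illustrative only.

```java
import java.util.Base64;

final class EmbeddingPayloadSize {

    // Assumption: an embedding is `dimensions` float32 values, 4 bytes each.
    static int base64Chars(int dimensions) {
        int rawBytes = dimensions * Float.BYTES;
        // Base64 emits 4 characters for every 3 input bytes, rounding up for padding.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        int dims = 1536; // assumed default dimension count for text-embedding-3-small

        int predicted = base64Chars(dims);
        int actual = Base64.getEncoder().encodeToString(new byte[dims * Float.BYTES]).length();
        System.out.println("Base64 characters per embedding: " + predicted + " (encoder says " + actual + ")");

        // Average text cost of one value in Base64.
        System.out.println("Base64 characters per value: " + (double) predicted / dims);

        // A decimal value such as -0.0012530729 already needs 13 characters, before separators.
        System.out.println("Characters in one decimal value: " + "-0.0012530729".length());

        // Ratio of the two file sizes reported above.
        System.out.println("Measured float/base64 ratio: " + 209_678.0 / 84_390.0);
    }
}
```

The measured files contain the output of toString() on the response rather than the raw JSON, so the ratio of roughly 2.5 is indicative of the wire saving rather than an exact figure for it.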


Proposed Implementation Plan:

  • Default Behavior: When no format is specified, the system would internally use Base64 for processing and return the result in Base64 format, but would internally convert it to float for confirmation.
  • Float Option: If the Float format is explicitly specified, the system would perform the operations using Float and return the result in Float format.
  • Base64 Option: If the Base64 format is explicitly specified, the system would use Base64 for processing and return the result in Base64 format.

It would be beneficial to retain the option to output results in Base64, as some developers may prefer to work with values in the Base64 format for their specific use cases.


This proposed approach aims to strike a balance between flexibility for developers and optimizing network bandwidth by allowing the selection of the most appropriate data format for their needs.
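
The internal conversion from Base64 to float mentioned in the plan could be sketched as follows. This is a minimal illustration, not the SDK's implementation: it assumes the Base64 payload is a sequence of little-endian float32 values, and the class and method names are hypothetical. That assumption is consistent with the sample data later in this thread, where the first four bytes of the Base64 response ("m/0rPA==" once padded) decode to about 0.010497476, the first value of the float response.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

final class Base64EmbeddingCodec {

    // Assumption: the payload is little-endian float32 values packed back to back.
    static float[] decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        if (raw.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("Payload length is not a multiple of 4 bytes");
        }
        float[] values = new float[raw.length / Float.BYTES];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(values);
        return values;
    }

    // Inverse of decode, included so the round trip can be checked locally.
    static String encode(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    public static void main(String[] args) {
        float[] original = {0.5f, -1.25f, 3.0f};
        float[] roundTripped = decode(encode(original));
        System.out.println("Round trip preserved values: " + Arrays.equals(original, roundTripped));

        // First four bytes of the Base64 sample response shown later in this thread.
        System.out.println("First sample value: " + decode("m/0rPA==")[0]);
    }
}
```

Decoding on the client keeps the smaller payload on the wire while callers still receive numeric values.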

I used the following sample code for the evaluation.

package com.openai.example;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import com.openai.azure.AzureOpenAIServiceVersion;
import com.openai.azure.credential.AzureApiKeyCredential;
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.CreateEmbeddingResponse;
import com.openai.models.EmbeddingCreateParams;
// Only needed when the encodingFormat line below is uncommented.
import com.openai.models.EmbeddingCreateParams.EncodingFormat;
import com.openai.services.blocking.EmbeddingService;

public final class EmbeddingsExample {
    private EmbeddingsExample() {}

    public static void main(String[] args) {

        OpenAIClient client = OpenAIOkHttpClient.builder()
                .baseUrl("https://AOAI-INSTANCE.openai.azure.com")
                .credential(AzureApiKeyCredential.create(""))
                .azureServiceVersion(AzureOpenAIServiceVersion.getV2024_02_15_PREVIEW()).build();
        // Created Dummy data
        List<String> poems = new ArrayList<>();
        poems.add("In the quiet night, stars whisper secrets, dreams take flight.");
        poems.add("Beneath the moon's glow, shadows dance, hearts begin to know.");
        poems.add("Waves crash on the shore, time stands still, love forevermore.");
        poems.add("Autumn leaves fall, painting the ground, nature's final call.");
        poems.add("Morning dew glistens, a new day dawns, hope always listens.");
        poems.add("Mountains stand tall, silent guardians, witnessing it all.");
        poems.add("In a field of green, flowers bloom bright, a serene scene.");
        poems.add("Winter's chill bites, fireside warmth, cozy, long nights.");
        poems.add("Spring's gentle breeze, life awakens, hearts find ease.");
        poems.add("Sunset hues blend, day meets night, a perfect end.");

        EmbeddingService embeddings = client.embeddings();

        try {
            long startTime = System.nanoTime();
            List<String> results = poems.stream().map(poem -> {
                EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
                        .input(poem)
                        // Uncomment the next line to request Base64; leave it commented out for the default float format.
                        // .encodingFormat(EncodingFormat.BASE64)
                        .model("text-embedding-3-small")
                        .build();
                CreateEmbeddingResponse createEmbeddingResponse =
                        embeddings.create(embeddingCreateParams);
                return createEmbeddingResponse.toString();
            }).collect(Collectors.toList());

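            // TRUNCATE_EXISTING keeps stale bytes from an earlier, larger run from inflating the file size.
            Files.write(Paths.get("result-base64.txt"), results, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);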

            long endTime = System.nanoTime();
            long durationInMillis = (endTime - startTime) / 1_000_000;
            System.out.println("Execution time: " + durationInMillis + " ms");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

@yoshioterada

This is the actual result data of the above program.
result-base64.txt
result-float.txt

@yoshioterada

yoshioterada commented Mar 6, 2025

I also noticed a bug in the current implementation.

If I execute the following code, it runs successfully.

        EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
                .input("Sunset hues blend, day meets night, a perfect end.")
                // .encodingFormat(EncodingFormat.BASE64)
                .model("text-embedding-3-small").build();
        CreateEmbeddingResponse createEmbeddingResponse = embeddings.create(embeddingCreateParams);
        createEmbeddingResponse.data().forEach(embedding -> {
            embedding.embedding().forEach(System.out::println);
        });

The output looks like this:

....
0.015402689
-0.0012530729
0.01653222
0.026133228
-0.0046496866
0.0055866833
-0.004704238
-0.014337336

However, if I use Base64, the following exception occurs:

        EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
                .input("Sunset hues blend, day meets night, a perfect end.")
                .encodingFormat(EncodingFormat.BASE64)
                .model("text-embedding-3-small").build();
        CreateEmbeddingResponse createEmbeddingResponse = embeddings.create(embeddingCreateParams);
        createEmbeddingResponse.data().forEach(embedding -> {
            embedding.embedding().forEach(System.out::println);
        });

The actual exception is as follows:

6RDmu3nXEb0qnk670cZNPXhoHTuUbgc6yEDLumBO+zw50dg8kMpwPdSHjbzoofE7HUQ/uiOC0ry8ctk8JedqPKssAr0rDUO7R5pcPFfIeLvKkpY8xfhbO2uy5rwOhZe8GlMJPDblED1ioEY9akNyvAwzzLzwGRU8UJTBvEYraLyA8409a5+ZPKYGKrvGT5W7V8j4vNsvpzz+UQ25pnoMPBrfJrzXYAg8TMWiPJKoWTykEAY7AGDsvITf1TzEcaw8zWuROibKwTy+1/E8otZ1PNE6MDxMNBe8PKrTPDm5HTy76ym8J60YvBevcjwvOIk85pMSPeVBR7xPJU08G/LzPOrzvDslQxK9o7lMPY+3o7vtQJq8UJRBPBlwsjgId/q8v7pIu0JXWzxXmAI8uwNlPK2uwzyGMaG8UXeYOtcEYTxGE608zvLAPMfWRL0/axO78IgJvI9bfDwmPqS6lG6HPFoV1jxmXJi7fxC3OwImmrsl52q8
        at com.openai.core.JsonField.getRequired$openai_java_core(Values.kt:133)
        at com.openai.models.Embedding.embedding(Embedding.kt:38)
        at com.openai.example.EmbeddingsExample.lambda$0(EmbeddingsExample.java:64)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at com.openai.example.EmbeddingsExample.main(EmbeddingsExample.java:63)

This happens because the embedding.embedding() method currently returns only List<Double>, whereas specifying .encodingFormat(EncodingFormat.BASE64) makes the service return a String (the Base64 string).
As a result, the exception above occurs.

Proposal

Would it be better to remove the current embedding() method and implement new embeddingDouble() and embeddingBase64() methods instead?

Updated Proposal (3/10)

Based on the actual behavior of the REST response, I have thought about this a bit more.

The following are sample REST invocations:

curl -X POST "https://AOAI.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: $API_KEY" \
-d '{
  "input": "SOME DATA",
  "model": "text-embedding-3-small"
}'
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.010497476,
        -0.030576246,
        -0.007097663,

curl -X POST "https://AOAI.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: $API_KEY" \
-d '{
  "input": "SOME DATA",
  "model": "text-embedding-3-small",
  "encoding_format": "base64"
}'
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": "m/0rPAl7+ryDk+i77Q0gvHsqizwBYSc79xjtOZrqyjy8THo8XvE0vSxaG7wFrGc8prluuxM/Yrxyk8Y8rOMLu3QIE71tsdI8KpbEus22rDw7y6489iASvJ4I6DwmFjy9Qu0mPet+kTwfv3s8zImJPDYeAz3Pcl48Vx7Hux7HoLy7VB89yi5/vOjcITslA1s7fhNgvNinN7whg9K8FWUkvc56gz1oz149o5tRPS/HQj1HMQY8L8fCPN8rG7zwekc8U9OGPVcEBTxGHiU6wt9jPPOYZLx6MWy8Gz9zvIeqJDvYcu+8SSlhvEkp4TxE3qC8+K+gujQLIjsbJTE76y+HPGPtars/Fm88HFJUPH1qj7vzfiK8bbHSPOZnVbysTNi64jbXvD28KD0y5V89iAwQPHp5lTx/qhM70IW/vLy2Cr1wz2+8w6O6PLcjIb3tDSA9guLyOxQ4gbyLk3m9UDEXPM0YGLoPIUU92uc7vU1AHTz9kRS9oZAVPRpHGDzFZxE8MuXfPMpJhbxSPNM8SF4pvMLFoTosWps8dhNPPGUTrT2VCNc7bE9nOnVP+Ds2HgM83FwIPMdyTTzhhWG9WovuvIzborxbnk+9VgvmPODckLyivTg9I1oKPWcmDj08+FE77mdmPFB4fD1gtQs9o4GPvUOW9zwWeAU9y3YoPKI5Zjtqk7W88T6evBpHmDx7RE29gIgsvIZi+zpKIgA9OF4

Since both formats are returned under the same "embedding" field, I believe it would be better to handle both within the embedding() method rather than splitting it into two separate methods (the proposal above).
Therefore, I suggest mapping the response to EmbeddingResponse and EmbeddingValue types to retrieve the result.
Implemented in Java, it might look like the following. (Sorry, I don't have enough knowledge of Kotlin.)

import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import java.util.List;
import java.util.Optional;

@JsonDeserialize(using = EmbeddingValueDeserializer.class)
public class EmbeddingValue {
    private Optional<List<Double>> floatEmbedding = Optional.empty();
    private Optional<String> base64Embedding = Optional.empty();

    public Optional<List<Double>> getFloatEmbedding() {
        return floatEmbedding;
    }
    public void setFloatEmbedding(Optional<List<Double>> floats) {
        this.floatEmbedding = floats;
    }

    public Optional<String> getBase64Embedding() {
        return base64Embedding;
    }
    public void setBase64Embedding(Optional<String> base64) {
        this.base64Embedding = base64;
    }
}

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.*;
import com.fasterxml.jackson.databind.node.ArrayNode;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class EmbeddingValueDeserializer extends JsonDeserializer<EmbeddingValue> {
    @Override
    public EmbeddingValue deserialize(JsonParser jp, DeserializationContext ctxt) throws IOException {
        EmbeddingValue embeddingValue = new EmbeddingValue();
        JsonNode node = jp.getCodec().readTree(jp);

        if (node.isArray()) {
            List<Double> floats = new ArrayList<>();
            for (JsonNode item : (ArrayNode) node) {
                floats.add(item.asDouble());
            }
            embeddingValue.setFloatEmbedding(Optional.of(floats));
        } else if (node.isTextual()) {
            embeddingValue.setBase64Embedding(Optional.of(node.asText()));
        }
        return embeddingValue;
    }
}

import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.List;

public class EmbeddingResponse {
    private String object;
    private List<DataItem> data;

    public String getObject() {
        return object;
    }
    public void setObject(String object) {
        this.object = object;
    }

    public List<DataItem> getData() {
        return data;
    }
    public void setData(List<DataItem> data) {
        this.data = data;
    }

    public static class DataItem {
        private String object;
        private int index;

        @JsonProperty("embedding")
        private EmbeddingValue embeddingValue;

        public String getObject() {
            return object;
        }
        public void setObject(String object) {
            this.object = object;
        }

        public int getIndex() {
            return index;
        }
        public void setIndex(int index) {
            this.index = index;
        }

        public EmbeddingValue getEmbeddingValue() {
            return embeddingValue;
        }
        public void setEmbeddingValue(EmbeddingValue embeddingValue) {
            this.embeddingValue = embeddingValue;
        }
    }
}

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Optional;

public final class EmbeddingsExample {
    private EmbeddingsExample() {}

    public static void main(String[] args) {
        // ...existing code...

        // Sample JSON Response
        String jsonResponse = "{ \"object\": \"list\", \"data\": [ { \"object\": \"embedding\", \"index\": 0, \"embedding\": [ 0.010497476, -0.030576246 ] } ] }";

        try {
            ObjectMapper mapper = new ObjectMapper();
            EmbeddingResponse customResponse =
                    mapper.readValue(jsonResponse, EmbeddingResponse.class);

            customResponse.getData().forEach(item -> {
                EmbeddingValue embeddingValue = item.getEmbeddingValue();
                Optional<List<Double>> floatEmbedding = embeddingValue.getFloatEmbedding();
                Optional<String> base64Embedding = embeddingValue.getBase64Embedding();

                floatEmbedding.ifPresent(floats -> {
                    System.out.println("Float embedding: " + floats);
                });

                base64Embedding.ifPresent(base64 -> {
                    System.out.println("Base64 embedding: " + base64);
                });
            });
        } catch (Exception e) {
            e.printStackTrace();
        }

        // ...existing code...
    }
}
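
Callers of a type like EmbeddingValue would still need to branch on which representation is present. One possible convenience, sketched here with only the standard library, is a helper that returns numeric values in either case. The helper name is hypothetical and not part of openai-java, and the Base64 branch assumes the payload is little-endian float32 values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; takes the two Optionals exposed by an EmbeddingValue-style type.
final class EmbeddingValues {

    static List<Float> toFloats(Optional<List<Double>> floatEmbedding,
                                Optional<String> base64Embedding) {
        List<Float> result = new ArrayList<>();
        if (floatEmbedding.isPresent()) {
            // Float format: narrow each JSON number to float32.
            for (Double value : floatEmbedding.get()) {
                result.add(value.floatValue());
            }
            return result;
        }
        String base64 = base64Embedding.orElseThrow(
                () -> new IllegalStateException("Neither representation is present"));
        // Assumption: Base64 format carries little-endian float32 values.
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(base64))
                .order(ByteOrder.LITTLE_ENDIAN);
        while (buffer.remaining() >= Float.BYTES) {
            result.add(buffer.getFloat());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toFloats(Optional.of(List.of(1.0, -0.5)), Optional.empty()));
        // "AACAPw==" is the Base64 form of the single float32 value 1.0.
        System.out.println(toFloats(Optional.empty(), Optional.of("AACAPw==")));
    }
}
```

With a helper along these lines, code that only needs numbers would not have to care which encoding format was requested, while the raw Base64 string stays available for callers who want it.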

yoshioterada pushed a commit to yoshioterada/openai-java that referenced this issue Mar 11, 2025
This commit includes the fix described in Issue openai#211.

* Addressed the issue where Base64 encoding could not be handled.
* Improved performance by using Base64 encoding by default.
yoshioterada pushed a commit to yoshioterada/openai-java that referenced this issue Mar 12, 2025
First commit to fix Issue openai#211
This commit includes the fix described in Issue openai#211.

* Addressed the issue where Base64 encoding could not be handled.
* Improved performance by using Base64 encoding by default.