Commit ec254cc

Merge pull request #6 from Barqawiz/text-to-speech
support text to speech models
2 parents e4fd0c1 + e8d19e3 commit ec254cc

23 files changed: +956 -45 lines changed

README.md

+29-29
@@ -1,19 +1,21 @@
 # Intelligent Java
-[![Maven Central](https://img.shields.io/maven-central/v/io.github.barqawiz/intellijava.core?style=for-the-badge)](https://central.sonatype.com/artifact/io.github.barqawiz/intellijava.core/0.6.2)
-![GitHub](https://img.shields.io/github/license/Barqawiz/IntelliJava?style=for-the-badge)
+[![Maven Central](https://img.shields.io/maven-central/v/io.github.barqawiz/intellijava.core?style=for-the-badge)](https://central.sonatype.com/artifact/io.github.barqawiz/intellijava.core/0.7.0)
+[![GitHub release (latest by date)](https://img.shields.io/github/v/release/Barqawiz/IntelliJava?style=for-the-badge)](https://github.com/Barqawiz/IntelliJava/releases)
+[![GitHub](https://img.shields.io/github/license/Barqawiz/IntelliJava?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)


-Intelligent java (IntelliJava) is the ultimate tool for Java developers looking to integrate with the latest language models and deep learning frameworks. The library provides a simple and intuitive API with convenient methods for sending text input to models like GPT-3 and DALL·E, and receiving generated text or images in return. With just a few lines of code, you can easily access the power of cutting-edge AI models to enhance your projects.
+Intelligent java (IntelliJava) is the ultimate tool for Java developers looking to integrate with the latest language models and deep learning frameworks. The library provides a simple and intuitive API with convenient methods for sending text input to models like GPT-3 and DALL·E, and receiving generated text, speech or images in return. With just a few lines of code, you can easily access the power of cutting-edge AI models to enhance your projects.

 The supported models:
 - **OpenAI**: Access GPT-3 to generate text and DALL·E to generate images. OpenAI is preferred when you want quality results without tuning.
 - **Cohere.ai**: Generate text; Cohere allows you to generate your language model to suit your specific needs.
+- **Google AI**: Generate audio from text; Access DeepMind’s speech models.

 # How to use

 1. Import the core jar file OR maven dependency (check the Integration section).
 2. Add Gson dependency if using the jar file; otherwise, it's handled by maven or Gradle.
-3. Call the ``RemoteLanguageModel`` for the language models and ``RemoteImageModel`` for image generation.
+3. Call the ``RemoteLanguageModel`` for the language models, ``RemoteImageModel`` for image generation and ``RemoteSpeechModel`` for text to speech models.

 ## Integration
 The package is released to Maven Central Repository:
@@ -23,25 +25,25 @@ Maven:
 <dependency>
 <groupId>io.github.barqawiz</groupId>
 <artifactId>intellijava.core</artifactId>
-<version>0.6.2</version>
+<version>0.7.0</version>
 </dependency>
 ```

 Gradle:

 ```
-implementation 'io.github.barqawiz:intellijava.core:0.6.2'
+implementation 'io.github.barqawiz:intellijava.core:0.7.0'
 ```

 Gradle(Kotlin):
 ```
-implementation("io.github.barqawiz:intellijava.core:0.6.2")
+implementation("io.github.barqawiz:intellijava.core:0.7.0")
 ```

 Jar download:
-[intellijava.jar](https://repo1.maven.org/maven2/io/github/barqawiz/intellijava.core/0.6.2/intellijava.core-0.6.2.jar).
+[intellijava.jar](https://repo1.maven.org/maven2/io/github/barqawiz/intellijava.core/0.7.0/intellijava.core-0.7.0.jar).

-For ready integration: try the [sample_code](https://github.com/Barqawiz/IntelliJava/tree/main/sample_code).
+For ready integration: [try the sample_code](https://github.com/Barqawiz/IntelliJava/tree/main/sample_code).

 ## Code Example
 **Language model code** (2 steps):
@@ -69,31 +71,30 @@ List<String> images = imageModel.generateImages(imageInput);
 ```
 Output:<br>
 <img src="images/response_image.png" height="220px">
+<br><br>
+**Text to speech code** (2 steps):
+```java
+// 1- initiate the remote speech model
+RemoteSpeechModel model = new RemoteSpeechModel(apiKey, SpeechModels.google);
+
+// 2- call generateEnglishText with any text
+SpeechInput input = new SpeechInput.Builder("Hi, I am Intelligent Java.").build();
+byte[] decodedAudio = model.generateEnglishText(input);
+```
+Output:<br>
+```Java
+// save temporary audio file for testing
+AudioHelper.saveTempAudio(decodedAudio);
+```

 For a full example, check the code inside the sample_code project.

 ## Third-party dependencies
 The only dependency is **GSON**.
 *Required to add manually when using IntelliJava jar. However, if you imported this repo through Maven, it will handle the dependencies.*

-For Maven:
-```
-<dependency>
-<groupId>com.google.code.gson</groupId>
-<artifactId>gson</artifactId>
-<version>2.8.9</version>
-</dependency>
-```
-
-For Gradle:
-```
-dependencies {
-implementation 'com.google.code.gson:gson:2.8.9'
-}
-```
-
 For jar download:
-[gson download repo](https://search.maven.org/artifact/com.google.code.gson/gson/2.8.9/jar)
+[gson download repo](https://search.maven.org/artifact/com.google.code.gson/gson/2.10.1/jar)

 ## Documentation
 [Go to Java docs](https://barqawiz.github.io/IntelliJava/javadocs/)
@@ -105,12 +106,11 @@ Call for contributors:
 - [ ] Add support to other OpenAI functions.
 - [x] Add support to cohere generate API.
 - [ ] Add support to Google language models.
+- [x] Add support to Google speech models.
 - [ ] Add support to Amazon language models.
-- [ ] Add support to Azure models.
+- [ ] Add support to Azure nlp models.
 - [ ] Add support to Midjourney image generation.
 - [ ] Add support to WuDao 2.0 model.
-- [ ] Add support to an audio model.
-

 # License
 Apache License

core/com.intellijava.core/pom.xml

+2-2
@@ -6,7 +6,7 @@

     <groupId>io.github.barqawiz</groupId>
     <artifactId>intellijava.core</artifactId>
-    <version>0.6.3</version>
+    <version>0.7.0</version>

     <name>Intellijava</name>
     <description>IntelliJava allows java developers to easily integrate with the latest language models, image generation, and deep learning frameworks.</description>
@@ -66,7 +66,7 @@
         <dependency>
             <groupId>com.google.code.gson</groupId>
             <artifactId>gson</artifactId>
-            <version>2.8.9</version>
+            <version>2.10.1</version>
         </dependency>
     </dependencies>

core/com.intellijava.core/src/main/java/com/intellijava/core/controller/RemoteLanguageModel.java

+2-2
@@ -156,7 +156,7 @@ public String generateText(LanguageModelInput langInput) throws IOException {
                     langInput.getPrompt(), langInput.getTemperature(),
                     langInput.getMaxTokens(), langInput.getNumberOfOutputs()).get(0);
         } else {
-            throw new IllegalArgumentException("This version support openai keyType only");
+            throw new IllegalArgumentException("the keyType is not supported");
         }

     }
@@ -185,7 +185,7 @@ public List<String> generateMultiText(LanguageModelInput langInput) throws IOExc
                     langInput.getPrompt(), langInput.getTemperature(),
                     langInput.getMaxTokens(), langInput.getNumberOfOutputs());
         } else {
-            throw new IllegalArgumentException("This version support openai keyType only");
+            throw new IllegalArgumentException("the keyType is not supported");
         }

     }
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,159 @@
+/**
+ * Copyright 2023 Github.com/Barqawiz/IntelliJava
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.intellijava.core.controller;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import com.intellijava.core.model.AudioResponse;
+import com.intellijava.core.model.SpeechModels;
+import com.intellijava.core.model.input.SpeechInput;
+import com.intellijava.core.model.input.SpeechInput.Gender;
+import com.intellijava.core.utils.AudioHelper;
+import com.intellijava.core.wrappers.GoogleAIWrapper;
+
+/**
+ * RemoteSpeechModel class provides a remote speech model implementation.
+ * It generates speech from text using the Wrapper classes.
+ *
+ * This version supports Google speech models only.
+ *
+ * To use Google speech services:
+ * 1- Go to console.cloud.google.com.
+ * 2- Enable "Cloud Text-to-Speech API".
+ * 3- Generate an API key from the "Credentials" page.
+ *
+ * @author github.com/Barqawiz
+ */
+public class RemoteSpeechModel {
+
+    private SpeechModels keyType;
+    private GoogleAIWrapper wrapper;
+
+    /**
+     *
+     * Constructs a new RemoteSpeechModel object with the specified key value and key type string.
+     * If keyTypeString is empty, it is set to "google" by default.
+     *
+     * @param keyValue the API key value to use.
+     * @param keyTypeString the string representation of the key type.
+     */
+    public RemoteSpeechModel(String keyValue, String keyTypeString) {
+
+        if (keyTypeString.isEmpty()) {
+            keyTypeString = SpeechModels.google.toString();
+        }
+
+        List<String> supportedModels = this.getSupportedModels();
+
+
+        if (supportedModels.contains(keyTypeString)) {
+            this.initiate(keyValue, SpeechModels.valueOf(keyTypeString));
+        } else {
+            String models = String.join(" - ", supportedModels);
+            throw new IllegalArgumentException("The received keyTypeString is not supported. Send any model from: " + models);
+        }
+    }
+
+    /**
+     *
+     * Constructs a new RemoteSpeechModel object with the specified key value and key type.
+     *
+     * @param keyValue The API key value to use.
+     * @param keyType The SpeechModels enum value representing the key type.
+     */
+    public RemoteSpeechModel(String keyValue, SpeechModels keyType) {
+        this.initiate(keyValue, keyType);
+    }
+
+    /**
+     * Initiate the object with the specified key value and key type.
+     *
+     * @param keyValue the API key value to use.
+     * @param keyType the SpeechModels enum value representing the key type.
+     */
+    private void initiate(String keyValue, SpeechModels keyType) {
+
+        this.keyType = keyType;
+        wrapper = new GoogleAIWrapper(keyValue);
+    }
+
+    /**
+     * Get a list of supported key type models.
+     *
+     * @return list of the supported SpeechModels enum names.
+     */
+    public List<String> getSupportedModels() {
+        SpeechModels[] values = SpeechModels.values();
+        List<String> enumValues = new ArrayList<>();
+
+        for (int i = 0; i < values.length; i++) {
+            enumValues.add(values[i].name());
+        }
+
+        return enumValues;
+    }
+
+    /**
+     * Generates speech from text using the supported models.
+     *
+     * You can save the returned bytes to an audio file using FileOutputStream("path/audio.mp3").
+     *
+     * @param input SpeechInput object containing the text and gender to use.
+     * @return byte array of the decoded audio content.
+     * @throws IOException in case of communication error.
+     */
+    public byte[] generateEnglishText(SpeechInput input) throws IOException {
+
+        if (this.keyType == SpeechModels.google) {
+            return this.generateGoogleText(input.getText(), input.getGender(), "en-gb");
+        } else {
+            throw new IllegalArgumentException("the keyType is not supported");
+        }
+    }
+
+    /**
+     * Generates speech from text using the Google Speech service API.
+     *
+     * @param text text to generate the speech.
+     * @param gender gender to use (male or female).
+     * @param language language code of the voice, for example en-gb.
+     * @return byte array of the decoded audio content.
+     * @throws IOException in case of communication error.
+     */
+    private byte[] generateGoogleText(String text, Gender gender, String language) throws IOException {
+        byte[] decodedAudio = null;
+
+        Map<String, Object> params = new HashMap<>();
+        params.put("text", text);
+        params.put("languageCode", language);
+
+        if (gender == Gender.FEMALE) {
+            params.put("name", "en-GB-Standard-A");
+            params.put("ssmlGender", "FEMALE");
+        } else {
+            params.put("name", "en-GB-Standard-B");
+            params.put("ssmlGender", "MALE");
+        }
+
+        AudioResponse resModel = (AudioResponse) wrapper.generateSpeech(params);
+        decodedAudio = AudioHelper.decode(resModel.getAudioContent());
+
+        return decodedAudio;
+    }
+}
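
The Javadoc for `generateEnglishText` suggests saving the returned bytes to a file such as `path/audio.mp3`. A minimal sketch of that step follows, using `java.nio.file.Files` rather than `FileOutputStream`. The class `AudioFileSketch` and the method `saveAudio` are hypothetical names introduced here for illustration and are not part of IntelliJava.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of IntelliJava): writes synthesized audio bytes to disk. */
final class AudioFileSketch {

    private AudioFileSketch() {}

    /**
     * Writes the audio bytes to the target path, creating the file or
     * overwriting it if it already exists.
     *
     * @param audio  decoded audio bytes, e.g. the result of generateEnglishText
     * @param target destination file, e.g. Paths.get("audio.mp3")
     * @return the path that was written
     */
    static Path saveAudio(byte[] audio, Path target) throws IOException {
        if (audio == null || audio.length == 0) {
            throw new IllegalArgumentException("audio content is empty");
        }
        // Files.write uses CREATE, TRUNCATE_EXISTING and WRITE by default.
        return Files.write(target, audio);
    }
}
```

With the byte array from the diff above, a call could look like `AudioFileSketch.saveAudio(decodedAudio, Paths.get("audio.mp3"))`. The empty-input check is a choice made for this sketch, not behavior confirmed by the commit.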
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
+/**
+ * Copyright 2023 Github.com/Barqawiz/IntelliJava
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.intellijava.core.model;
+
+import com.google.gson.annotations.SerializedName;
+
+/**
+ *
+ * AudioResponse represents the response from the speech API that contains the audio content.
+ *
+ * @author github.com/Barqawiz
+ *
+ */
+public class AudioResponse extends BaseRemoteModel {
+
+    /**
+     * Default AudioResponse constructor.
+     */
+    public AudioResponse() {}
+
+    /**
+     * The audio content generated from a text.
+     */
+    @SerializedName("audioContent")
+    private String audioContent;
+
+    /**
+     * Gets the audio content generated from a text.
+     * @return audio content as a base64 string.
+     */
+    public String getAudioContent() {
+        return audioContent;
+    }
+
+    /**
+     * Sets the audio content generated from a text.
+     *
+     * @param audioContent audio content as a base64 string.
+     */
+    public void setAudioContent(String audioContent) {
+        this.audioContent = audioContent;
+    }
+
+}
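
`AudioResponse` carries the audio as a base64 string, and `RemoteSpeechModel` passes it through `AudioHelper.decode`, whose body is not shown in this part of the diff. The sketch below assumes that the decode step is plain Base64 decoding with the JDK's `java.util.Base64`. `AudioContentSketch` and `decodeAudioContent` are hypothetical names, not the library's API.

```java
import java.util.Base64;

/** Hypothetical stand-in for the decode step; the real AudioHelper is not shown in the diff. */
final class AudioContentSketch {

    private AudioContentSketch() {}

    /**
     * Decodes a base64 audioContent string into raw audio bytes.
     *
     * @param audioContent base64 text as returned in the response model
     * @return the decoded bytes
     * @throws IllegalArgumentException if the input is empty or not valid base64
     */
    static byte[] decodeAudioContent(String audioContent) {
        if (audioContent == null || audioContent.isEmpty()) {
            throw new IllegalArgumentException("audioContent is empty");
        }
        // The basic decoder rejects characters outside the standard base64 alphabet.
        return Base64.getDecoder().decode(audioContent);
    }
}
```

For example, `decodeAudioContent("SGk=")` yields the two ASCII bytes of `Hi`.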
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
+package com.intellijava.core.model;
+
+/**
+ * Supported speech models.
+ *
+ * @author github.com/Barqawiz
+ *
+ */
+public enum SpeechModels {
+    /** google model */ google
+}
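
The string-based `RemoteSpeechModel` constructor checks the key type against the names of this enum before calling `valueOf`. A self-contained sketch of that lookup is shown below. It uses a local copy of the enum so it compiles on its own. `SpeechModelLookupSketch` and `resolve` are hypothetical names. Treating `null` like an empty string is an assumption of this sketch, since the constructor in the diff would throw a `NullPointerException` for `null`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of resolving a key type string to an enum value. */
final class SpeechModelLookupSketch {

    /** Local copy of the SpeechModels enum from the diff, so the sketch is self-contained. */
    enum SpeechModels { google }

    private SpeechModelLookupSketch() {}

    /** Returns the enum constant names, e.g. ["google"]. */
    static List<String> supportedModels() {
        return Arrays.stream(SpeechModels.values())
                .map(Enum::name)
                .collect(Collectors.toList());
    }

    /**
     * Maps a key type string to an enum value. Empty (and, in this sketch,
     * null) input falls back to google; unknown names are rejected with a
     * message that lists the supported names.
     */
    static SpeechModels resolve(String keyTypeString) {
        if (keyTypeString == null || keyTypeString.isEmpty()) {
            return SpeechModels.google;
        }
        List<String> supported = supportedModels();
        if (!supported.contains(keyTypeString)) {
            throw new IllegalArgumentException("Unsupported key type '" + keyTypeString
                    + "'. Send any model from: " + String.join(" - ", supported));
        }
        return SpeechModels.valueOf(keyTypeString);
    }
}
```

Checking the name list first lets the error message name the valid options, where a bare `valueOf` call would only report the missing constant.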
