[Pipeline Refactor] Update routes, text generation initial functionality #1348

dsikka · 2023-10-25T00:38:49Z

Summary

This PR adds the operators, pipeline, and routes implemented so far for text_generation. The operators and the flow between them are shown in the diagram in this PR description.

Testing

Testing script to evaluate various situations:

Running only the single-token engine, when prompt_sequence_length > number of prompt tokens
Running only the multi-token engine, when number of prompt tokens % prompt_sequence_length == 0
Running a combination of the multi-token and single-token engines, when prompt_sequence_length <= number of prompt tokens and number of prompt tokens % prompt_sequence_length != 0
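
The three cases can be summarized by how a prompt is divided between the two engines. The following is a minimal sketch, not the pipeline's actual code: `split_prefill` is a hypothetical helper, and it assumes the multi-token engine consumes as many full `prompt_sequence_length` chunks as fit, with the single-token engine handling each leftover token one at a time.

```python
from typing import Tuple


def split_prefill(num_prompt_tokens: int, prompt_sequence_length: int) -> Tuple[int, int]:
    """Hypothetical helper: return (multi_token_passes, single_token_passes).

    Assumes full chunks go to the multi-token engine and the remainder
    is processed token by token with the single-token engine.
    """
    if num_prompt_tokens < 0 or prompt_sequence_length < 1:
        raise ValueError("expected num_prompt_tokens >= 0 and prompt_sequence_length >= 1")
    # divmod gives the number of full chunks and the leftover token count
    return divmod(num_prompt_tokens, prompt_sequence_length)


# Case 1: prompt shorter than one chunk, single-token engine only
print(split_prefill(7, 16))   # (0, 7)
# Case 2: prompt is an exact multiple of the chunk size, multi-token engine only
print(split_prefill(32, 16))  # (2, 0)
# Case 3: one full chunk plus leftovers, both engines run
print(split_prefill(20, 16))  # (1, 4)
```

Changing `prompt_sequence_length` for a fixed prompt moves the run between these cases, which is how the script further down is meant to be used.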

All 3 cases are tested using the test script below, by changing the prompt_sequence_length.
Prompt logits are evaluated against the ground truth logits, produced using the transformers model.

import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline


def create_tokenizer(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    return tokenizer


def get_ground_truth(prompt):
    model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-1M")
    tokenizer = create_tokenizer("roneneldan/TinyStories-1M")

    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    out = model(input_ids=input_ids)
    prompt_logits = out.logits.detach().numpy()
    return prompt_logits

# Vary cur_len (the prompt_sequence_length) to exercise each of the three cases.
cur_len = 16
prompt = "Hello there, how are you?"
model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(
    model_path,
    engine_kwargs={"engine_type": "onnxruntime"},
    prompt_sequence_length=cur_len,
)
input_values = TextGenerationInput(prompt=prompt)
# The pipeline output is compared directly against the transformers prompt logits.
logits = pipeline(input_values)
ground_truth = get_ground_truth(prompt)

print("All Close?", np.allclose(logits, ground_truth, atol=0.0001))
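
When the check prints False, it helps to know how far apart the two logit arrays are. The sketch below is an optional debugging aid, not part of the PR: `compare_logits` is a hypothetical helper that reports the largest absolute difference alongside the `np.allclose` result. Note that `np.allclose` also applies a relative tolerance (`rtol`, default 1e-5) on top of `atol`.

```python
import numpy as np


def compare_logits(actual, expected, atol=1e-4):
    """Hypothetical helper: return (all_close, max_abs_diff) for two logit arrays."""
    actual = np.asarray(actual, dtype=np.float64)
    expected = np.asarray(expected, dtype=np.float64)
    # Mismatched shapes usually mean a chunk of prompt logits was dropped
    if actual.shape != expected.shape:
        return False, float("inf")
    max_abs_diff = float(np.max(np.abs(actual - expected))) if actual.size else 0.0
    return bool(np.allclose(actual, expected, atol=atol)), max_abs_diff


# Synthetic data standing in for (batch, tokens, vocab) logits
expected = np.zeros((1, 3, 4))
print(compare_logits(expected + 5e-5, expected))  # within tolerance
print(compare_logits(expected + 1e-3, expected))  # outside tolerance
```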

dbogunowicz

The general design makes sense to me. Added my comments. After we finish honing the logic, let's spend some time on the structure and aesthetics of the code. Since everyone will be building on top of those abstractions, I think it is very important to introduce prudent and good aesthetics that other pipelines will carry over.

src/deepsparse/v2/operators/engine_operator.py

src/deepsparse/v2/pipeline.py

src/deepsparse/v2/text_generation/multi_engine_prefill_operator.py

src/deepsparse/v2/text_generation/pipeline.py

src/deepsparse/v2/utils/state.py

src/deepsparse/v2/routers/router.py

rahul-tuli

The overall structure looks good to me. There is a little cleanup left pertaining to docstrings and typing (I did not comment there), and Damian left good comments. One pain point: reviewing this PR is slightly difficult, so I would at least like to see some more unit tests, with coverage reports if possible.

(P.S: Great work on this)

src/deepsparse/v2/operators/operator.py

src/deepsparse/v2/routers/router.py

src/deepsparse/v2/schedulers/scheduler.py

src/deepsparse/v2/text_generation/autoregressive_preprocess_operator.py

bfineran · 2023-10-30T16:35:23Z

src/deepsparse/v2/text_generation/nl_engine_operator.py

+            # we want to pass the empty kv cache inputs
+            # (batch_size=0) to the engine. Therefore,


do we still have the logic that handles this?

as in is the comment still relevant?

src/deepsparse/v2/text_generation/nl_engine_operator.py

bfineran · 2023-10-30T16:37:17Z

src/deepsparse/v2/text_generation/pipeline.py

+        pipeline_state = PipelineState()
+        pipeline_state_vals = {}
+
+        # TODO: The code below will be replaced with a transformers set-up Operator.


not sure if we should ever be setting up operators with other operators outside of construction - especially with stuff like engines that should be compiled on construction.

oh sorry, the comment just means we should add in an operator that handles things specific to transformers (such as setting up the config, tokenizer, model).

anything in setup should happen on init, not an operator that gets run every inference though, right?
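
The distinction raised in this thread can be sketched as follows. This is an illustration only, with hypothetical names (`SetupOnInitOperator`, `_setup`, `setup_calls`) and a dictionary standing in for the config, tokenizer, and compiled engine: the expensive setup runs once at construction, and `run` only reuses what was built.

```python
class SetupOnInitOperator:
    """Hypothetical operator: all setup happens in __init__, never in run()."""

    def __init__(self, model_path: str):
        self.setup_calls = 0
        # Expensive work (config, tokenizer, engine compilation) belongs here
        self.resources = self._setup(model_path)

    def _setup(self, model_path: str) -> dict:
        self.setup_calls += 1
        return {"model_path": model_path}

    def run(self, prompt: str) -> str:
        # Per-inference work only; nothing is rebuilt on each call
        return f"{self.resources['model_path']}::{prompt}"


op = SetupOnInitOperator("hf:mgoin/TinyStories-1M-deepsparse")
for prompt in ("Hello", "there"):
    op.run(prompt)
print(op.setup_calls)  # 1
```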

src/deepsparse/v2/text_generation/pipeline.py

bfineran · 2023-10-30T16:39:30Z

src/deepsparse/v2/text_generation/prep_for_prefill.py

+__all__ = ["PrepareforPrefill"]
+
+
+class PrepareforPrefill(Operator):


will we need these "prep" operators after the refactor?

so far they're responsible for doing things needed before prompt_inference (create a kv_cache) and before generation (set-up the token_generator for example) so they should still be here.

src/deepsparse/v2/text_generation/process_inputs.py

bfineran

looks great, like the updated diagram, could maybe use a separate graphic that breaks down the KVCache interactions (understand that it may make the main diagram messy)

bfineran · 2023-11-02T13:40:41Z

src/deepsparse/v2/text_generation/pipeline.py

+        )
+        compile_prompt_logits = CompilePromptLogits()
+        """
+        prep_for_single_engine = PrepareforSingleEngine(


intentional?

fixed in follow-up PR

bfineran · 2023-11-02T13:42:13Z

src/deepsparse/v2/text_generation/pipeline.py

+        pipeline_state = PipelineState()
+        pipeline_state_vals = {}
+
+        # TODO: The code below will be replaced with a transformers set-up Operator.


anything in setup should happen on init, not an operator that gets run every inference though, right?

rahul-tuli

The structure is very good! I really appreciate the diagram in the PR description. Most of my comments now are nits; I'll leave it up to you to address them now or in a follow-up PR.

More tests would obviously be nicer, but I ran through the script attached in the PR description and it works for me. Great work on this.

rahul-tuli · 2023-11-02T21:00:15Z

src/deepsparse/v2/operators/engine_operator.py

        num_cores: int = None,
        num_streams: int = None,
        scheduler: Scheduler = None,
        input_shapes: List[List[int]] = None,
-        engine_context: Optional[Context] = None,
+        engine_context: Optional[EngineContext] = None,
+        engine_kwargs: Dict = None,


Suggested change

-        engine_kwargs: Dict = None,
+        engine_kwargs: Optional[Dict] = None,
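
Beyond the annotation, a `None` default also needs handling before it is unpacked with `**`, since unpacking `None` raises a `TypeError`. A minimal sketch of one way to do that, using a hypothetical `normalize_engine_kwargs` helper and a stand-in `create_engine` that simply echoes its arguments (neither is the PR's code):

```python
from typing import Any, Dict, Optional


def normalize_engine_kwargs(engine_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: turn a None default into an empty dict (copying otherwise)."""
    return dict(engine_kwargs) if engine_kwargs is not None else {}


def create_engine(**kwargs: Any) -> Dict[str, Any]:
    """Stand-in for the real method; it only returns what it received."""
    return kwargs


print(create_engine(**normalize_engine_kwargs(None)))  # {}
print(create_engine(**normalize_engine_kwargs({"engine_type": "onnxruntime"})))
```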

src/deepsparse/v2/routers/router.py

src/deepsparse/v2/operators/engine_operator.py

rahul-tuli · 2023-11-02T21:09:48Z

src/deepsparse/v2/operators/engine_operator.py

@@ -87,7 +87,7 @@ def __init__(
        self._engine_args = engine_args
        self._engine_type = engine_type

-        self.engine = self.create_engine()
+        self.engine = self.create_engine(**engine_kwargs)


I don't like that we are passing kwargs to create_engine, which also accepts **kwargs; there is no clear way to figure out what arguments should be passed to create_engine without looking at the implementation.
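
One possible way to address this concern, sketched under assumptions: replace the open-ended dict with an explicit options object, so the accepted arguments are visible in one place and a misspelled name fails at construction. `EngineOptions` and its fields are hypothetical (the "deepsparse" default in particular is an assumption), and `create_engine` here is a stand-in rather than the real method.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class EngineOptions:
    """Hypothetical, illustrative option set; field names are assumptions."""

    engine_type: str = "deepsparse"
    num_cores: Optional[int] = None
    num_streams: Optional[int] = None


def create_engine(options: EngineOptions) -> Dict[str, Any]:
    """Stand-in that echoes the options an engine would be built with."""
    return asdict(options)


print(create_engine(EngineOptions(engine_type="onnxruntime")))

try:
    EngineOptions(num_core=4)  # typo: rejected immediately instead of being passed along
except TypeError as err:
    print("rejected:", err)
```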

rahul-tuli · 2023-11-02T21:12:05Z

src/deepsparse/v2/operators/engine_operator.py

@@ -87,7 +87,7 @@ def __init__(
        self._engine_args = engine_args


we have engine_args, and engine_kwargs, but both are dicts; aren't the names slightly misleading?

rahul-tuli · 2023-11-02T21:23:32Z

src/deepsparse/v2/text_generation/autoregressive_preprocess_operator.py

+        return False
+
+    def run(self, tokens: Any, kv_cache: Any, pipeline_state: PipelineState, **kwargs):
+


rahul-tuli · 2023-11-02T21:27:10Z

src/deepsparse/v2/text_generation/compile_logits.py

+    Combine the prompt logits. Currently relying on the inference state to store the
+    prompt logits for each token or multi-token batch processed. This operator will
+    take prompt logits from each iteration run and update the inference state.
+    """
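
The accumulation this docstring describes can be sketched as below. This is an illustration under assumptions, not the operator's implementation: a plain dict stands in for the inference state, the function names are hypothetical, and logits are assumed to have shape (batch, tokens, vocab) so that chunks join along the token axis.

```python
import numpy as np


def store_prompt_logits(state: dict, logits: np.ndarray) -> dict:
    """Hypothetical: append this iteration's logits to the state under "prompt_logits"."""
    collected = state.get("prompt_logits", [])
    return {**state, "prompt_logits": collected + [logits]}


def join_prompt_logits(state: dict) -> np.ndarray:
    """Hypothetical: concatenate the stored chunks along the token axis."""
    return np.concatenate(state["prompt_logits"], axis=1)


state = {}
state = store_prompt_logits(state, np.zeros((1, 4, 8)))  # one multi-token pass
state = store_prompt_logits(state, np.ones((1, 1, 8)))   # one single-token pass
print(join_prompt_logits(state).shape)  # (1, 5, 8)
```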


rahul-tuli · 2023-11-02T21:27:18Z

src/deepsparse/v2/text_generation/compile_logits.py

+    """
+
+    def run(self, logits, inference_state: InferenceState, **kwargs):
+        logit_type = "prompt_logits"


rahul-tuli · 2023-11-02T21:28:29Z

src/deepsparse/v2/text_generation/kv_cache_operator.py

+
+
+class KVCacheCreatorOutput(BaseModel):
+    kv_cache: Any = Field(description="KV Cache Created")  # DecoderKVCache


Wouldn't setting arbitrary_types_allowed also let this field be typed as DecoderKVCache?
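
A minimal sketch of what this question is getting at, assuming pydantic is installed. The `DecoderKVCache` class below is an empty stand-in, not the real one, and the `class Config` form is the pydantic v1 style (pydantic v2 is expected to accept it with a deprecation warning). With `arbitrary_types_allowed` set, the field is validated with an isinstance check.

```python
from pydantic import BaseModel, Field


class DecoderKVCache:
    """Empty stand-in for the real cache class, for illustration only."""


class KVCacheCreatorOutput(BaseModel):
    kv_cache: DecoderKVCache = Field(description="KV Cache Created")

    class Config:
        # Lets pydantic accept a plain Python class as a field type
        arbitrary_types_allowed = True


output = KVCacheCreatorOutput(kv_cache=DecoderKVCache())
print(type(output.kv_cache).__name__)

try:
    KVCacheCreatorOutput(kv_cache="not a cache")
except ValueError as err:  # pydantic's ValidationError subclasses ValueError
    print("rejected:", type(err).__name__)
```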

rahul-tuli · 2023-11-02T21:30:01Z

src/deepsparse/v2/text_generation/multi_engine_prefill_operator.py

+class MultiEnginePrefill(Operator):
+    def __init__(self, prompt_sequence_length, sequence_length):
+        """
+        Prepare the tokens for the multi-token engine. This requires creating the


I'm stopping here with docstring and param comments, kindly check and update

* Pipelines Refactor - Initial Impl (#1287) * [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325) * initial functionality and working example with image classification * remove testing image * update args * initial functionality and working example with image classification * remove testing image * pr comments * defines schemas for operators and test * add image classification test, PR comments * fix input/output handling in pipeline and operator base classes to be more generic; remove context * add additional operator input message * typo fix * [v2] EngineOperator updates to make continuous batching easier (#1371) * [v2] EngineOperator updates to make continuous batching easier * test fixes * [Pipeline Refactor] Update routes, text generation initial functionality (#1348) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further breakdown operators * add 
operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * fix capacity settting again * typo fixes * [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * move map to base class * [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392) * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * fix name * [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373) * [Continuous Batching] Queue Implementation to support batching grouping and prioritization * has_key method * thread safety * add blocking option for pop_batch * update docstring * allow mutex to be shared across continuous batching objects * revert last commit * [Continuous Batching] Executor thread for running continuous batching (#1374) * [Continuous Batching] Executor thread for running continuous batching * quality * ensure that executor stops when main thread does - clean up test hack * [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375) * [ContinuousBatching] ContinuousBatchingScheduler Implementation * cleanup unnecessary stop condition * [continuous batching] singleton pattern for scheduler (#1391) * [continuous batching] singleton pattern for scheduler * catch from review * [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364) * rebasing off my initial commit * cleanups * unit testing for text generation operators * additional changes * unit testing completion * remove debug * 
fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py --------- Co-authored-by: Dipika Sikka <[email protected]> * [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394) * add split/join functionality * update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function * process multiple generations * initial commit * fix error * unit testing for text generation operators * additional changes * unit testing completion * remove debug * fix * add todo * more clean-up * fix test * add docstrings/comments * break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed * Delete tests/deepsparse/v2/unit/text_generation/test_msic.py * pipeline runs, but incorrectly * Revert "pipeline runs, but incorrectly" This reverts commit 51c4ee6. 
* PR review comments --------- Co-authored-by: Dipika Sikka <[email protected]> * [Text Generation][V2] End-to-end tests (#1402) * initial commit * initial commit * its working now * beautification * thank you Dipika <3 * ready to review * [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409) * update split/join * use map * update * run end-to-end * clean-up * fix bug with batch size, introduce SplitRoute dataclass * update tests to use new inputs/outputs * use the normal scheduler for internal kv_cache * add pipeline inpuits * clean-up * change engine type, update docstrings, update override function to be more generic * move subgraph functionality to its own function; clean-up cont batching in text gen pipeline * update linear pathway to also use subgraph execution * rebase fix * fix tests * [Pipeline Refactor] Operator Registry (#1420) * initial registry functionality * use sparsezoo mixin * [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution (#1453) * fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type * fix warning message * [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457) * add pipeline create method for pipeline creation using the operator registry * add instance check * [Pipeline Refactor] async (#1380) * initial functionality and working example with image classification * remove testing image * rebase fixes * initial functionality and working example with image classification * text gen * updates func * prompt inference, initial functionality * remove image; update state docstring * Fix typo * add todo for split/join * remove context, clean-up args, remove prefill_preprocess_operaator * fix docstrings * initial functionality and working example with image classification * updates func * prompt inference, initial functionality * finish generation operators and update routes * further 
breakdown operators * add operators * fix can_operate condition * update can_operate to not rely on the inference_state * rebase + update * fix condition * async initial functionality * fix capacity settting again * add blocking * more testing * update to use split/join * fix * rebase fix * remove index * change event loop * rebase fix * update async run to use new operator scheduling properly * rebase fixes (#1458) * more fixes (#1459) --------- Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Dipika Sikka <[email protected]>


dsikka changed the title ~~Features/v2/prompt inference~~ [Pipeline Refactor] Update routes, text generation initial functionality Oct 25, 2023

dsikka marked this pull request as ready for review October 25, 2023 01:12

dbogunowicz reviewed Oct 25, 2023


rahul-tuli reviewed Oct 26, 2023


rahul-tuli reviewed Oct 26, 2023


dsikka requested review from bfineran and Satrat October 27, 2023 03:13

dsikka mentioned this pull request Oct 27, 2023

[Pipeline Refactor] Feature branch for v2 text-generation #1358

Closed

dsikka changed the base branch from feature/v2/text_generation to ds_refactor October 27, 2023 20:56

dsikka changed the base branch from ds_refactor to feature/v2/text_generation October 27, 2023 20:57

bfineran suggested changes Oct 30, 2023


dbogunowicz reviewed Oct 30, 2023


dsikka force-pushed the feature/v2/text_generation branch from fe9ed2a to d909992 Compare October 31, 2023 21:51

dsikka force-pushed the features/v2/prompt_inference branch from 3721907 to 6007a75 Compare November 1, 2023 00:50

dsikka added 3 commits November 1, 2023 12:36

initial functionality and working example with image classification

6c75b65

remove testing image

75de103

rebase fixes

aa5d885

dsikka force-pushed the feature/v2/text_generation branch from 4eb4ab7 to aa5d885 Compare November 1, 2023 16:37

dsikka added 8 commits November 1, 2023 12:37

initial functionality and working example with image classification

8cc63ee

text gen

ab2b711

updates func

00cb85e

prompt inference, initial functionality

5cf4b3f

remove image; update state docstring

1b951dc

Fix typo

809cfc1

add todo for split/join

6336d8e

remove context, clean-up args, remove prefill_preprocess_operaator

3f2193d

dsikka force-pushed the features/v2/prompt_inference branch from 625a1c3 to 3f2193d Compare November 1, 2023 16:45

dsikka changed the base branch from feature/v2/text_generation to v2 November 1, 2023 16:49

fix docstrings

216ceea

dsikka requested a review from rahul-tuli November 1, 2023 20:08

dsikka requested review from dbogunowicz and bfineran November 1, 2023 20:08

bfineran approved these changes Nov 2, 2023


rahul-tuli approved these changes Nov 2, 2023


dsikka merged commit e1ff108 into v2 Nov 3, 2023

dsikka deleted the features/v2/prompt_inference branch November 3, 2023 00:47
