
Feature Request: reasoning_effort parameter for reasoning models like R1 #11351


Closed
4 tasks done
zakkor opened this issue Jan 22, 2025 · 10 comments
Labels
enhancement New feature or request stale

Comments

@zakkor
Contributor

zakkor commented Jan 22, 2025

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Similar to how the o1 family of models accepts a reasoning_effort parameter (low, medium, high), it's conceivable that OSS reasoning models should support this too.

Deepseek mentions that this param will soon be available via their API: https://api-docs.deepseek.com/guides/reasoning_model#api-parameters

Note that the CoT output can reach up to 32K tokens, and the parameter to control the CoT length (reasoning_effort) will be available soon.

Motivation

This would allow us to run reasoning models for longer periods of time, hopefully in exchange for better results.

Possible Implementation

I have no idea of the best implementation, though I've seen a few early options in the wild:

what if the lever is just a penalty on the </think> token? that would be so stupid and therefore must be correct.
-- https://x.com/doomslide/status/1881310996515680411

If </think> appears before enough thinking tokens have been generated, the sampler replaces it with a random string like "Wait, but". Works surprisingly well for how simple it is!
-- https://x.com/voooooogel/status/1881966969043464365
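The second trick above can be sketched as a post-sampling hook. This is a hypothetical standalone sketch; the names, the token threshold, and the driver loop are illustrative, not the llama.cpp sampler API:

```python
# Sketch of the "replace an early </think>" trick quoted above.
# All names and thresholds here are illustrative assumptions.

MIN_THINK_TOKENS = 200          # floor on thinking length (assumed knob)
END_THINK = "</think>"
CONTINUATION = "Wait, but"      # injected in place of an early </think>

def guard_thinking(sampled, tokens_since_think_open):
    """If the model tries to close its thinking block too early,
    swap the close tag for a continuation phrase instead."""
    if sampled == END_THINK and tokens_since_think_open < MIN_THINK_TOKENS:
        return CONTINUATION
    return sampled

# Toy driver: a fake token stream that wants to stop after 3 thinking tokens.
stream = ["Okay,", "so", "x=2.", END_THINK, "more", "thought."]
out, n = [], 0
for tok in stream:
    out.append(guard_thinking(tok, n))
    n += 1
print(" ".join(out))
```

In a real server the counter would reset when `<think>` is emitted, and the injected continuation would be tokenized and fed back as forced tokens.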

@zakkor zakkor added the enhancement New feature or request label Jan 22, 2025
@jukofyork
Collaborator

jukofyork commented Jan 22, 2025

what if the lever is just a penalty on the token

You could probably do this to some extent using the --logit-bias option:

    {
        "id": 128799,
        "content": "</think>",
        "single_word": false,
        "lstrip": false,
        "rstrip": false,
        "normalized": true,
        "special": false
    },
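For quick experiments without a server restart, the same bias can be passed per-request through the OpenAI-compatible `logit_bias` field (a map of token id to bias). A hedged sketch; the token id 128799 is taken from the tokenizer snippet above and is model-specific, and the model name is a placeholder:

```python
import json

# Per-request bias against the </think> token. The id 128799 comes from
# the tokenizer snippet above; look up the id for your own model first.
payload = {
    "model": "deepseek-r1-distill",   # placeholder model name
    "messages": [{"role": "user", "content": "Prove there are infinitely many primes."}],
    "logit_bias": {"128799": -5.0},   # negative bias postpones </think>
}
body = json.dumps(payload)
print(body)
```

A negative bias makes the end-of-thinking token less likely, nudging the model to think longer; a positive bias does the opposite.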

@steampunque

steampunque commented Jan 22, 2025

A simple system prompt seems to be able to limit the thinking. Distill R1 Qwen 7B IQ4_XS, speculatively decoded with Distill R1 Qwen 1.5B IQ4_XS as the draft model, running on a 1070:

star.txt :
How much time in years would pass on earth and for the traveler when a traveler moving at 0.9c travels from earth to the star nearest to the sun?
Use step-by-step reasoning to answer the question.

Freestyle think:

lm star.txt

<think>

Okay, so I need to figure out how much time passes on Earth and for the traveler when they move at 0.9c from Earth to the nearest star. Hmm, let me start by recalling some basic concepts from special relativity because this seems like a time dilation problem.

First, I remember that when an object is moving at a significant fraction of the speed of light, time dilation occurs. That means the time experienced by the traveler (proper time) will be different from the time experienced on Earth. The formula for time dilation is given by the Lorentz factor, which is gamma (γ). The formula is:

γ = 1 / sqrt(1 - v²/c²)

Where v is the velocity of the traveler and c is the speed of light. So, I need to calculate gamma first.

Given that v is 0.9c, let me plug that into the formula. So, v² would be (0.9c)², which is 0.81c². Then, 1 - v²/c² becomes 1 - 0.81, which is 0.19. Taking the square root of 0.19, I get approximately 0.4359. Therefore, gamma is 1 divided by 0.4359, which is roughly 2.294.

So, gamma is about 2.294. This means that time for the traveler will be shorter than the time experienced on Earth by a factor of gamma.

Next, I need to find out how long the journey takes from Earth's perspective. The nearest star to the Sun is about 4.24 light-years away. So, the distance is 4.24 light-years, and the speed is 0.9c. Time is distance divided by speed, so time on Earth (t_earth) is 4.24 / 0.9. Let me calculate that: 4.24 divided by 0.9 is approximately 4.71 years. So, from Earth's perspective, the journey takes about 4.71 years.

Now, for the traveler, the time experienced (t_traveller) is t_earth divided by gamma. So, that's 4.71 / 2.294. Let me compute that: 4.71 divided by 2.294 is approximately 2.05 years. So, the traveler experiences about 2.05 years during the trip.

Wait, let me double-check my calculations to make sure I didn't make any mistakes. The distance is 4.24 light-years, speed is 0.9c, so time on Earth is 4.24 / 0.9, which is indeed approximately 4.71 years. Gamma is 1 / sqrt(1 - 0.81) = 1 / sqrt(0.19) ≈ 2.294. So, dividing 4.71 by 2.294 gives roughly 2.05 years. That seems correct.

I also remember that from the traveler's perspective, the distance is contracted. The length contraction formula is L = L0 * sqrt(1 - v²/c²), where L0 is the proper length (Earth's frame). So, L = 4.24 * sqrt(0.19) ≈ 4.24 * 0.4359 ≈ 1.85 light-years. Then, time is distance divided by speed, so 1.85 / 0.9 ≈ 2.06 years, which is consistent with what I got before. So, that cross-checks and confirms the time experienced by the traveler is about 2.05 to 2.06 years.

Therefore, summarizing, from Earth, the journey takes about 4.71 years, and for the traveler, it's approximately 2.05 years. I think that's the answer.

Final Answer
The time that passes on Earth is \boxed{4.71} years, and for the traveler, it is \boxed{2.05} years.

</think>

To determine how much time passes on Earth and for the traveler when moving at 0.9c from Earth to the nearest star, we use the concepts of time dilation and length contraction from special relativity.

  1. Calculate the Lorentz factor (γ):
    [
    \gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} = \frac{1}{\sqrt{1 - (0.9)^2}} = \frac{1}{\sqrt{1 - 0.81}} = \frac{1}{\sqrt{0.19}} \approx 2.294
    ]

  2. Time experienced on Earth (t_earth):
    The distance to the nearest star is 4.24 light-years. The time from Earth's perspective is:
    [
    t_{\text{earth}} = \frac{\text{distance}}{\text{speed}} = \frac{4.24 \text{ light-years}}{0.9c} \approx 4.71 \text{ years}
    ]

  3. Time experienced by the traveler (t_traveller):
    Using the Lorentz factor, the time experienced by the traveler is:
    [
    t_{\text{traveller}} = \frac{t_{\text{earth}}}{\gamma} = \frac{4.71 \text{ years}}{2.294} \approx 2.05 \text{ years}
    ]

Final Answer

The time that passes on Earth is \boxed{4.71} years, and for the traveler, it is \boxed{2.05} years.

Good answer.
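The model's arithmetic checks out; a quick numerical verification of the transcript's figures:

```python
import math

v = 0.9                          # speed as a fraction of c
d = 4.24                         # distance to the nearest star in light-years

gamma = 1 / math.sqrt(1 - v**2)  # Lorentz factor, 1/sqrt(1 - 0.81)
t_earth = d / v                  # years elapsed in Earth's frame
t_traveler = t_earth / gamma     # proper time for the traveler

print(round(gamma, 3), round(t_earth, 2), round(t_traveler, 2))
```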

Hint to think less using system prompt:

SYSTEM="Think briefly." lm star.txt

<think>

First, I need to determine the distance to the nearest star from Earth. The closest star is Proxima Centauri, which is approximately 4.2465 light-years away.

Next, I'll calculate the time it takes for the traveler to reach Proxima Centauri at a speed of 0.9c. Time is equal to distance divided by speed, so the time experienced by the traveler (proper time) is 4.2465 light-years divided by 0.9c, which equals approximately 4.718 years.

Then, I'll use the time dilation formula from Einstein's Theory of Special Relativity to find the time experienced on Earth. The formula is Earth time = traveler time divided by the square root of (1 minus the square of the speed divided by the speed of light squared). Plugging in the numbers, Earth time is 4.718 years divided by the square root of (1 - 0.81), which equals approximately 8.66 years.

Finally, I'll summarize the results: the traveler experiences about 4.72 years, while approximately 8.66 years pass on Earth.

</think>

Sure, let's break down the problem step by step.

Problem Statement:
A traveler moves at a speed of ( 0.9c ) from Earth to the nearest star, Proxima Centauri, which is approximately ( 4.2465 ) light-years away. We need to determine:

  1. The time experienced by the traveler.
  2. The time experienced on Earth.

Step 1: Calculate the Time Experienced by the Traveler (Proper Time)

The proper time (( \tau )) experienced by the traveler can be calculated using the formula:

[
\tau = \frac{d}{v}
]

Where:

  • ( d ) = Distance to Proxima Centauri = ( 4.2465 ) light-years
  • ( v ) = Speed of the traveler = ( 0.9c )

[
\tau = \frac{4.2465 \text{ light-years}}{0.9c} = \frac{4.2465}{0.9} \text{ years} \approx 4.718 \text{ years}
]

Step 2: Calculate the Time Experienced on Earth

Due to time dilation, the time experienced on Earth (( t )) is related to the proper time by the Lorentz factor (( \gamma )):

[
t = \gamma \tau
]

Where:
[
\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} = \frac{1}{\sqrt{1 - (0.9)^2}} = \frac{1}{\sqrt{1 - 0.81}} = \frac{1}{\sqrt{0.19}} \approx 2.294
]

Now, calculate ( t ):

[
t = 2.294 \times 4.718 \text{ years} \approx 8.66 \text{ years}
]

Final Answer:

  • Time experienced by the traveler: (\boxed{4.72 \text{ years}})
  • Time experienced on Earth: (\boxed{8.66 \text{ years}})

Thought less, but gave a wrong answer (it applied time dilation in the wrong direction: the traveler, not Earth, should experience less time). It doesn't seem to handle limited thinking well on some other questions either.

EDIT:
SYSTEM="Do not overthink."

seems to work better at stopping infinite reflection generation with greedy sampling, also gave the right answer to star.txt.

It would most likely be both more versatile and more reliable to keep track of the token count after the <think> phase starts, then use a word trigger on something like "Wait," to inject a forced generation of "Wait, I am done thinking\n</think>".

EDIT2:
This system prompt enabled R1 Qwen 7B to correctly solve star and two other fairly tricky problems with greedy sampling while minimizing tokens in the <think> block.

SYSTEM="Minimize thoughts. Do not overcomplicate."

@jukofyork
Collaborator

jukofyork commented Jan 22, 2025

I found adding "think carefully - this is a trick question!" to the end of a prompt could improve a lot of models from 6-12 months ago (even if it's not actually a trick question!).

This could often be the difference between solving and not solving many of the puzzles people post on Reddit.

@zakkor
Contributor Author

zakkor commented Jan 23, 2025

Indeed, system prompts would be a way of doing this, although I don't believe they can guarantee that you'll ALWAYS get the model to generate at least a minimum amount of tokens, so maybe a more technical solution would be good here.

On the other hand, AFAIK for o1 they tell it in the system prompt that every few words consumes "gas" and it starts with "N gas", where N is the reasoning effort (100, 1000, etc). But I believe they specifically trained that into it, which is why it works.

Edit: Indeed, I played a bit with this and R1 doesn't seem to respect the system prompt when it comes to what it outputs in <think> at all.

@steampunque

steampunque commented Jan 27, 2025

I experimented more with R1 distill Qwen 7B and found it is not very reliable when just letting it freestyle think with greedy sampling. On some problems it goes into infinite reflection-thrashing mode (never stops), and on other problems it stops but doesn't generate good context. I think the quality of the think context is far more important than how long it thinks. So I tried bootstrapping the start of generation with a forced prompt and got quite good results. Using this I have never seen the model go into infinite reflection mode with greedy sampling, and it can also tackle some fairly tricky problems correctly even using the Qwen 7B distill.

The think bootstrap I use is:

"<think>\nLets use the reasoning principle which states that the simplest solution to a problem is often the correct one. First lets create a small amount of clear and accurate reasoning context to help solve the problem correctly. Here,"

Note that <think> is a special token and must be tokenized as such. Upstream, it might be possible to implement this bootstrap just by modifying the assistant prompt template for the model. In my downstream server patch I have a forced-generation option. I give an example harmonic-series problem to illustrate its use. With freestyle thinking the model just thrashes infinitely, but with the bootstrap it converges to the correct solution:

cars.txt:
You have 10 cars, each with a full tank of gas and a range of 300 miles. What is the maximum distance you can drive away from a starting point?

bash-5.1$ FORCE=1 lm @cars.txt "<think>\nLets use the reasoning principle which states that the simplest solution to a problem is often the correct one. First lets create a small amount of clear and accurate reasoning context to help solve the problem correctly. Here,"
<think>
Lets use the reasoning principle which states that the simplest solution to a problem is often the correct one. First lets create a small amount of clear and accurate reasoning context to help solve the problem correctly. Here, we have 10 cars, each with a full tank of gas and a range of 300 miles. The goal is to determine the maximum distance we can drive away from the starting point.

To begin, each car can travel 300 miles on a full tank. With 10 cars, we can initially consider that the total distance might be extended by using multiple cars to support each other during the journey. This is similar to scenarios where multiple vehicles are used to extend the range of a single vehicle, often referred to as the "jeep problem" or "car carrier problem."

In such scenarios, the idea is to transfer fuel from one car to another to extend the range. For example, if we have multiple cars, they can drive together, and at certain points, some cars can transfer their remaining fuel to a central car, allowing the central car to continue further into the journey. This process can be repeated, gradually extending the maximum distance.

Let's break it down step by step:

  1. Initial Setup:

    • We have 10 cars, each with a full tank of 300 miles.
    • Total fuel available is 10 cars * 300 miles = 3000 miles worth of fuel.
  2. First Leg of the Journey:

    • All 10 cars start together.
    • They drive a certain distance, say 'x' miles.
    • At this point, some cars will transfer their remaining fuel to others to support the remaining journey.
  3. Fuel Transfer:

    • The idea is to have as many cars as possible support the remaining distance by transferring their fuel to a central car.
    • For each mile driven, each car consumes 1 mile of fuel.
    • Therefore, after driving 'x' miles, each car has 300 - x miles of fuel left.
  4. Optimizing Fuel Usage:

    • To maximize the distance, we need to minimize the fuel consumed in transferring.
    • This involves having as few cars as possible support the remaining distance, thereby conserving fuel for the central car.
  5. Recursive Application:

    • The process is repeated for each segment of the journey, with fewer cars supporting the remaining distance as the journey progresses.
    • This continues until only one car is left with enough fuel to reach the final destination.
  6. Calculating the Maximum Distance:

    • The total distance is the sum of all segments where fuel is transferred.
    • Each segment's contribution to the total distance is calculated based on the number of cars supporting that segment.
  7. Final Calculation:

    • By systematically applying the fuel transfer strategy, we can determine the maximum distance achievable with 10 cars, each having a range of 300 miles.
  8. Conclusion:

    • The maximum distance is the sum of all the segments where fuel was transferred, optimized to minimize fuel consumption and maximize the range.
</think>

To determine the maximum distance you can drive away from the starting point using 10 cars, each with a full tank of gas (range of 300 miles), we can use a strategy similar to the "jeep problem" or "car carrier problem." The key idea is to use multiple cars to support each other during the journey, transferring fuel from some cars to others to extend the range.

Step-by-Step Solution:

  1. Initial Setup:

    • Number of Cars (N): 10
    • Range per Car (R): 300 miles
    • Total Fuel Available: 10 cars * 300 miles = 3000 miles worth of fuel
  2. Fuel Transfer Strategy:

    • The strategy involves driving a certain distance with all cars, then transferring fuel from some cars to a central car to support the remaining journey.
    • This process is repeated, reducing the number of supporting cars as the journey progresses.
  3. Calculating the Maximum Distance:

    • The maximum distance can be calculated using the formula for the jeep problem:
      [
      \text{Maximum Distance} = R \times \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N}\right)
      ]
    • For ( N = 10 ):
      [
      \text{Maximum Distance} = 300 \times \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10}\right)
      ]
    • The harmonic series sum ( H_{10} ) is approximately 2.928968.
  4. Final Calculation:
    [
    \text{Maximum Distance} = 300 \times 2.928968 \approx 878.69 \text{ miles}
    ]

Conclusion:

By strategically transferring fuel from one car to another, you can extend the range beyond the capability of a single car. The maximum distance you can drive away from the starting point is approximately 878.69 miles.

[
\boxed{878.69 \text{ miles}}
]
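The harmonic-series figure the model lands on can be checked directly:

```python
# Jeep-problem bound: with N cars of range R, the lead car can reach
# R * (1 + 1/2 + ... + 1/N) miles from the starting point.
N, R = 10, 300
h = sum(1 / k for k in range(1, N + 1))   # harmonic number H_10
max_distance = R * h
print(round(h, 6), round(max_distance, 2))
```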

The model gave a correct answer, but some of the thinking process is bogus. The harmonic-series solution is based on traveling a distance dx at which one car can exactly use all its remaining gas to top up the other cars, leaving that car stranded with 0 gas while the remaining cars continue with full tanks, repeating this recursively until only one car is left. So I would say it's hallucinating in the thinking phase but correctly applies the harmonic series in the solution based on its latent space. It got the right answer without really showing a true understanding of the solution: clearly not actually intelligent, but relying on learned solution patterns from other problems. Still, it's amazing that it can handle a fairly tricky problem like this at all with 7B params and a fairly simple bootstrap prompt to help kick it into a good reply.
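As an aside, for servers without a forced-generation option, the prompt-template approach mentioned above can be approximated on the raw completion endpoint by ending the prompt mid-assistant-turn, so the model continues from inside the bootstrap. A minimal sketch; the `<|user|>`/`<|assistant|>` turn markers are illustrative placeholders, not R1's real template:

```python
BOOTSTRAP = (
    "<think>\nLets use the reasoning principle which states that the simplest "
    "solution to a problem is often the correct one. First lets create a small "
    "amount of clear and accurate reasoning context to help solve the problem "
    "correctly. Here,"
)

def build_prompt(user_msg):
    """Assemble a raw completion prompt that ends mid-assistant-turn,
    so generation continues from inside the bootstrap. The turn markers
    are illustrative, not the model's actual chat template."""
    return f"<|user|>{user_msg}<|assistant|>{BOOTSTRAP}"

print(build_prompt("You have 10 cars, each with a range of 300 miles...")
      .endswith("Here,"))
```

The key point is only that the prompt must end exactly where forced generation should resume, with `<think>` tokenized as its special token.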

@github-actions github-actions bot added the stale label Feb 27, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

@zakkor
Contributor Author

zakkor commented Mar 13, 2025

Bump

@Interpause

Interpause commented Apr 4, 2025

https://github.com/and270/thinking_effort_processor might be a good approach; the main improvement would be to make it eventually API-controllable in llama-server, rather than requiring a restart every time as applying it via the current logit-bias arg does.
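The linked processor's core idea, as I understand it, is to make the end-of-thinking token more or less likely as a function of an effort knob. A rough standalone sketch of that idea, not the repo's actual code:

```python
import math

def scale_end_think(logits, effort, end_token="</think>"):
    """Shift the end-of-thinking token's logit by log(effort):
    effort < 1 makes </think> less likely (think longer),
    effort > 1 makes it more likely (think less).
    `logits` is a toy token->logit dict standing in for a real logit row."""
    out = dict(logits)
    out[end_token] += math.log(max(effort, 1e-6))
    return out

logits = {"Wait": 1.0, "</think>": 2.0}
low = scale_end_think(logits, 0.1)    # push toward longer thinking
high = scale_end_think(logits, 10.0)  # push toward shorter thinking
print(low["</think>"] < logits["</think>"] < high["</think>"])
```

Exposing `effort` as a per-request sampling parameter would give exactly the API-controllable behavior described above.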

@jukofyork
Collaborator

jukofyork commented Apr 4, 2025

https://github.com/and270/thinking_effort_processor might be a good approach, the main improvement would be to make it eventually API-controllable in llama-server rather than requiring a restart every time if applying via the current logit bias arg.

Interesting: I actually tried using the logit-bias option to try to make it think longer, and either it didn't work or the model just wrote </think> using multiple tokens. I even tried the (non-chat) completion endpoint with mikupad and still couldn't get it to work?

I wonder if there was a bug in llama-server back then as around the same time I tried using grammars to block Chinese characters getting output by qwq-preview and these didn't seem to work either (even though others seemed to successfully get them working...!?).

@steampunque

steampunque commented Apr 5, 2025

Interesting: I actually tried using the logit bias option to try to make it think longer and it didn't seem to work or it just wrote <\think> using multiple tokens. I even tried using the (non-chat) completion endpoint with mikupad and still couldn't get it to work?

Triggered prompt injection does seem to work to guide the model through different solutions. Explanation of the below: TRIG is a prompt-injection trigger similar to the STOP function, which searches for a phrase and then halts; TRIG instead searches for a phrase and, when it is found, injects a specified prompt after the trigger sequence, with an option to replace the last token of the trigger sequence. Since </think> is a single special token, it can be overwritten with anything.

TRIG (default null), single trig record or array of trig records:
trig record = [pattern, force, range, replace, oneshot]
  pattern : trigger pattern string
  force   : string forced into the output after the trigger
  range   : 0 : trigger anywhere (default)
            >0 : trigger only after range tokens
            <0 : trigger only before range tokens
  replace : true, false (default); replace the last pattern token
  oneshot : true, false (default); this trigger can activate only once

bash-5.1$ TRIG='[["</think>","\nHold the phone, Chuck. Let me cross check that using another approach.",0,true,true],["</think>","\nWait, I think that is right but let me just do one more sanity check to make sure.",0,true,true]]' lm sum one and two in a mod three basis
<think>
First, I need to understand what it means to sum the numbers one and two within a modulo three system. Modulo three arithmetic involves performing operations where the result is the remainder when divided by three.

Next, I'll add the two numbers: 1 + 2 equals 3.

Finally, I'll apply the modulo three operation to the sum. Since 3 divided by 3 leaves a remainder of 0, the result of 1 + 2 modulo three is 0.

Hold the phone, Chuck. Let me cross check that using another approach. If I consider the numbers on a clock with three hours, adding one and two would bring me back to the starting point, which is zero. That confirms that 1 + 2 modulo three is indeed 0.

Wait, I think that is right but let me just do one more sanity check to make sure. If I have one apple and someone gives me two more, I end up with three apples. But if I can only keep track of apples modulo three, I would have none left because three divided by three is exactly one with no remainder. So, yes, 1 + 2 modulo three is 0.
</think>

Sure! Let's solve the problem of summing 1 and 2 in a modulo 3 basis step by step.

Step 1: Understand Modulo Operation

The modulo operation finds the remainder after division of one number by another. In this case, we're working modulo 3, which means we're interested in the remainder when numbers are divided by 3.

Step 2: Add the Numbers

First, add the two numbers:
[
1 + 2 = 3
]

Step 3: Apply Modulo Operation

Now, find the remainder when the sum is divided by 3:
[
3 \div 3 = 1 \text{ with a remainder of } 0
]
So,
[
3 \mod 3 = 0
]

Final Answer

[
\boxed{0}
]
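The TRIG mechanism above can be sketched as a small stream filter. This is a simplified re-implementation of the described behavior (range handling omitted), not the actual server patch:

```python
class Trigger:
    """One TRIG record: when `pattern` is sampled, inject `force`.
    replace=True drops the matched token; oneshot=True disarms the
    trigger after its first firing."""
    def __init__(self, pattern, force, replace=False, oneshot=False):
        self.pattern, self.force = pattern, force
        self.replace, self.oneshot = replace, oneshot
        self.armed = True

    def apply(self, token):
        if not (self.armed and token == self.pattern):
            return [token]
        if self.oneshot:
            self.armed = False
        return [self.force] if self.replace else [token, self.force]

def filter_stream(tokens, triggers):
    """Run each sampled token through the trigger list; the first
    trigger that fires consumes the token."""
    out = []
    for tok in tokens:
        produced = [tok]
        for trig in triggers:
            if produced == [tok]:
                produced = trig.apply(tok)
        out.extend(produced)
    return out

demo = filter_stream(
    ["1 + 2 mod 3 is 0.", "</think>", "Recheck: still 0.", "</think>"],
    [Trigger("</think>", "\nHold the phone, Chuck. Let me cross check.",
             replace=True, oneshot=True),
     Trigger("</think>", "\nWait, one more sanity check.",
             replace=True, oneshot=True)],
)
print(demo)
```

With both oneshot triggers spent, a third `</think>` would pass through untouched and end the thinking block, which matches the transcript above.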
