Feature Request: `reasoning_effort` parameter for reasoning models like R1 #11351
Comments
You could probably do this to some extent using the `</think>` token entry from the tokenizer config:

```json
{
  "id": 128799,
  "content": "</think>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": true,
  "special": false
}
```
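As a sketch of that idea: llama-server's `/completion` endpoint accepts a `logit_bias` field (a list of `[token_id, bias]` pairs), so a request could down-weight the `</think>` token to delay the end of the think phase, or up-weight it to shorten thinking. The token id 128799 is taken from the tokenizer entry above; the bias value here is an arbitrary illustration.

```python
import json

# Hypothetical request body for llama-server's /completion endpoint.
# 128799 is the </think> token id from the tokenizer entry above;
# a negative bias makes ending the think phase less likely,
# a positive bias makes the model close it sooner.
payload = {
    "prompt": "<think>\n",
    "n_predict": 1024,
    "logit_bias": [[128799, -4.0]],
}
body = json.dumps(payload)
```

This only shifts the probability of emitting `</think>`; as later comments in this thread note, it does not guarantee a minimum or maximum number of thinking tokens.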
A simple system prompt seems to be able to limit the thinking. Distill R1 Qwen 7B IQ4_XS speculated with Distill R1 Qwen 1.5B IQ4_XS, running on a 1070. star.txt, freestyle think: `lm star.txt`
Okay, so I need to figure out how much time passes on Earth and for the traveler when they move at 0.9c from Earth to the nearest star. Hmm, let me start by recalling some basic concepts from special relativity because this seems like a time dilation problem. First, I remember that when an object is moving at a significant fraction of the speed of light, time dilation occurs. That means the time experienced by the traveler (proper time) will be different from the time experienced on Earth. The formula for time dilation is given by the Lorentz factor, which is gamma (γ). The formula is: γ = 1 / sqrt(1 - v²/c²) Where v is the velocity of the traveler and c is the speed of light. So, I need to calculate gamma first. Given that v is 0.9c, let me plug that into the formula. So, v² would be (0.9c)², which is 0.81c². Then, 1 - v²/c² becomes 1 - 0.81, which is 0.19. Taking the square root of 0.19, I get approximately 0.4359. Therefore, gamma is 1 divided by 0.4359, which is roughly 2.294. So, gamma is about 2.294. This means that time for the traveler will be shorter than the time experienced on Earth by a factor of gamma. Next, I need to find out how long the journey takes from Earth's perspective. The nearest star to the Sun is about 4.24 light-years away. So, the distance is 4.24 light-years, and the speed is 0.9c. Time is distance divided by speed, so time on Earth (t_earth) is 4.24 / 0.9. Let me calculate that: 4.24 divided by 0.9 is approximately 4.71 years. So, from Earth's perspective, the journey takes about 4.71 years. Now, for the traveler, the time experienced (t_traveller) is t_earth divided by gamma. So, that's 4.71 / 2.294. Let me compute that: 4.71 divided by 2.294 is approximately 2.05 years. So, the traveler experiences about 2.05 years during the trip. Wait, let me double-check my calculations to make sure I didn't make any mistakes. The distance is 4.24 light-years, speed is 0.9c, so time on Earth is 4.24 / 0.9, which is indeed approximately 4.71 years. 
Gamma is 1 / sqrt(1 - 0.81) = 1 / sqrt(0.19) ≈ 2.294. So, dividing 4.71 by 2.294 gives roughly 2.05 years. That seems correct. I also remember that from the traveler's perspective, the distance is contracted. The length contraction formula is L = L0 * sqrt(1 - v²/c²), where L0 is the proper length (Earth's frame). So, L = 4.24 * sqrt(0.19) ≈ 4.24 * 0.4359 ≈ 1.85 light-years. Then, time is distance divided by speed, so 1.85 / 0.9 ≈ 2.06 years, which is consistent with what I got before. So, that cross-checks and confirms the time experienced by the traveler is about 2.05 to 2.06 years. Therefore, summarizing, from Earth, the journey takes about 4.71 years, and for the traveler, it's approximately 2.05 years. I think that's the answer. Final Answer
To determine how much time passes on Earth and for the traveler when moving at 0.9c from Earth to the nearest star, we use the concepts of time dilation and length contraction from special relativity.
Final Answer: The time that passes on Earth is \boxed{4.71} years, and for the traveler, it is \boxed{2.05} years.

Good answer. Hint to make it think less using a system prompt: SYSTEM="Think briefly." `lm star.txt`
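The arithmetic in the freestyle transcript above checks out; a few lines of Python reproduce the quoted numbers (4.24 light-years and 0.9c are the values used in the transcript):

```python
import math

v = 0.9   # speed as a fraction of c
d = 4.24  # distance to the nearest star in light-years

gamma = 1 / math.sqrt(1 - v**2)  # Lorentz factor, ~2.294
t_earth = d / v                  # Earth-frame time, ~4.71 years
t_traveler = t_earth / gamma     # traveler's proper time, ~2.05 years
```

The second ("Think briefly.") transcript divides by sqrt(1 - v²/c²) in the wrong direction, which is why it reports more time passing on Earth than the correct 4.71 years while labeling 4.72 years as the traveler's time.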
First, I need to determine the distance to the nearest star from Earth. The closest star is Proxima Centauri, which is approximately 4.2465 light-years away. Next, I'll calculate the time it takes for the traveler to reach Proxima Centauri at a speed of 0.9c. Time is equal to distance divided by speed, so the time experienced by the traveler (proper time) is 4.2465 light-years divided by 0.9c, which equals approximately 4.718 years. Then, I'll use the time dilation formula from Einstein's Theory of Special Relativity to find the time experienced on Earth. The formula is Earth time = traveler time divided by the square root of (1 minus the square of the speed divided by the speed of light squared). Plugging in the numbers, Earth time is 4.718 years divided by the square root of (1 - 0.81), which equals approximately 8.66 years. Finally, I'll summarize the results: the traveler experiences about 4.72 years, while approximately 8.66 years pass on Earth.
Sure, let's break down the problem step by step. Problem Statement:

Step 1: Calculate the Time Experienced by the Traveler (Proper Time). The proper time (τ) experienced by the traveler can be calculated using the formula: [equation omitted]

Step 2: Calculate the Time Experienced on Earth. Due to time dilation, the time experienced on Earth (t) is related to the proper time by the Lorentz factor (γ): [equation omitted] Now, calculate t: [equation omitted]

Final Answer:
Thought less, but gave the wrong answer. It doesn't seem to like limited thinking on some other questions either. EDIT: this seems to work better at stopping infinite reflection generation with greedy sampling, and it also gave the right answer to star.txt. It would most likely be both more versatile and reliable to keep track of the token count after the <think> phase starts, then use a word trigger on something like "Wait," to inject a forced generation of "Wait, I am done thinking\n</think>". EDIT2: SYSTEM="Minimize thoughts. Do not overcomplicate."
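The token-budget idea above can be sketched roughly as follows. This is a hypothetical helper operating on decoded token strings for clarity; a real implementation would work on token ids inside the sampler loop.

```python
def cap_thinking(tokens, budget,
                 trigger="Wait,",
                 injection="Wait, I am done thinking\n</think>"):
    """Count tokens after <think>; once the budget is exceeded, replace
    the next token starting with the trigger word by a forced
    end-of-think injection, as suggested in the comment above."""
    out, thinking, count = [], False, 0
    for tok in tokens:
        if tok == "<think>":
            thinking, count = True, 0
        elif tok == "</think>":
            thinking = False  # model ended thinking on its own
        elif thinking:
            if count >= budget and tok.startswith(trigger):
                out.append(injection)
                thinking = False
                continue  # the injection replaces the triggering token
            count += 1
        out.append(tok)
    return out
```

Triggering only on a reflection word like "Wait," (rather than cutting off at an exact token count) lets the model finish its current line of reasoning before the forced exit.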
I found that adding "think carefully - this is a trick question!" to the end of a prompt could improve a lot of models from 6-12 months ago (even if it's not actually a trick question!). This could often be the difference between solving and not solving many of the puzzles people post on Reddit.
Indeed, system prompts would be a way of doing this, although I don't believe they can guarantee that you'll ALWAYS get the model to generate at least a minimum amount of tokens, so maybe a more technical solution would be good here. On the other hand, AFAIK for

Edit: Indeed, I played a bit with this and R1 doesn't seem to respect the system prompt when it comes to what it outputs in
I experimented more with R1 distill Qwen 7B and found it is not very reliable just letting it freestyle think with greedy sampling. On some problems it goes into infinite reflection thrashing mode (never stops), and on other problems it will stop but doesn't generate good context. I think the quality of the think context is far more important than how long it thinks. So I tried bootstrapping the start of generation with a forced prompt and got quite good results. Using this I have never seen the model go into infinite reflection mode with greedy sampling, and it can also tackle some fairly tricky problems correctly, even using the Qwen 7B distill.

The think bootstrap I use is: "<think>\nLets use the reasoning principle which states that the simplest solution to a problem is often the correct one. First lets create a small amount of clear and accurate reasoning context to help solve the problem correctly. Here,"

Note that <think> is a special token and must be tokenized as such. Upstream, it might be possible to implement this bootstrap by just modifying the assistant prompt template for the model. In my downstream server patch I have a forced-gen option. I give an example harmonic series problem to illustrate use. With freestyle think the model will just thrash infinitely, but with a bootstrap it converges to the correct solution. cars.txt:

bash-5.1$ FORCE=1 lm @cars.txt "<think>\nLets use the reasoning principle which states that the simplest solution to a problem is often the correct one. First lets create a small amount of clear and accurate reasoning context to help solve the problem correctly. Here,"

To begin, each car can travel 300 miles on a full tank. With 10 cars, we can initially consider that the total distance might be extended by using multiple cars to support each other during the journey. This is similar to scenarios where multiple vehicles are used to extend the range of a single vehicle, often referred to as the "jeep problem" or "car carrier problem."
In such scenarios, the idea is to transfer fuel from one car to another to extend the range. For example, if we have multiple cars, they can drive together, and at certain points, some cars can transfer their remaining fuel to a central car, allowing the central car to continue further into the journey. This process can be repeated, gradually extending the maximum distance. Let's break it down step by step:
To determine the maximum distance you can drive away from the starting point using 10 cars, each with a full tank of gas (range of 300 miles), we can use a strategy similar to the "jeep problem" or "car carrier problem." The key idea is to use multiple cars to support each other during the journey, transferring fuel from some cars to others to extend the range. Step-by-Step Solution:
Conclusion: By strategically transferring fuel from one car to another, you can extend the range beyond the capability of a single car. The maximum distance you can drive away from the starting point is approximately 878.69 miles.

The model gave a correct answer, but some of the thinking process is bogus. The harmonic series solution is based on traveling a distance dx where one car can exactly use all its remaining gas to fill up the remainder of the cars to the top of their capacity, then leaving that car stranded with 0 gas while the remaining cars continue with full tanks, repeating this recursively until only one car is left. Thus I would say it's hallucinating in the thinking phase but correctly applies the harmonic series in the solution based on its latent space. So it got the right answer without really showing a true understanding of the solution. Clearly not actually intelligent, but relying on learned solution patterns from other problems. Still amazing that it can handle a fairly tricky problem like this at all with 7B params and a fairly simple bootstrap prompt to help kick it into a good reply.
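For reference, the 878.69 figure matches the standard jeep-problem answer, 300 · H₁₀ (the 10th harmonic number): while k cars are still driving, they travel 300/k miles, after which one car's remaining fuel, 300·(k-1)/k, exactly refills the other k-1 tanks. A quick check:

```python
# Jeep-style fuel transfer: with k cars driving, travel 300/k miles, then
# one car's remaining fuel (300 * (k-1)/k) exactly tops up the other k-1.
# Total distance is 300 * (1/10 + 1/9 + ... + 1/1) = 300 * H_10.
max_dist = 300 * sum(1 / k for k in range(1, 11))
print(round(max_dist, 2))  # 878.69
```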
This issue was closed because it has been inactive for 14 days since being marked as stale.
Bump
https://github.com/and270/thinking_effort_processor might be a good approach; the main improvement would be to make it eventually API-controllable in llama-server, rather than requiring a restart every time when applied via the current logit-bias arg.
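The idea behind that repo, as I understand it, is a logits processor that biases the end-of-think token according to a scalar effort level. A minimal sketch (hypothetical helper, not that repo's actual API; the token id is taken from the tokenizer entry quoted earlier in this thread):

```python
END_THINK_ID = 128799  # </think> token id from the tokenizer entry above

def thinking_effort_bias(logits, effort, strength=5.0):
    """effort in [0, 1]: 0 strongly boosts </think> so the model stops
    thinking almost immediately; 1 leaves the logits untouched."""
    out = list(logits)
    out[END_THINK_ID] += (1.0 - effort) * strength
    return out
```

Exposing `effort` as a per-request sampling parameter in llama-server would give the API-controllable behavior discussed above, instead of a fixed `--logit-bias` set at startup.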
Interesting: I actually tried using the logit bias option to try to make it think longer and it didn't seem to work, or it just wrote

I wonder if there was a bug in
Triggered prompt injection does seem to work to guide the model through different solutions. Explanation of the below: TRIG is a prompt-injection trigger similar to the STOP function, which seeks a phrase and then halts, except that TRIG seeks a phrase and, when found, injects a specified prompt after the trigger sequence, with an option to replace the last token in the trigger sequence. Since </think> is a single special token, it can be overwritten with anything. TRIG (default null), single or array of trig records:
Sure! Let's solve the problem of summing 1 and 2 in a modulo 3 basis step by step.

Step 1: Understand the Modulo Operation. The modulo operation finds the remainder after division of one number by another. In this case, we're working modulo 3, which means we're interested in the remainder when numbers are divided by 3.

Step 2: Add the Numbers. First, add the two numbers: 1 + 2 = 3.

Step 3: Apply the Modulo Operation. Now, find the remainder when the sum is divided by 3: 3 mod 3 = 0.

Final Answer: 0
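The seek-and-inject behavior described in the comment above can be sketched on a streamed token sequence. The helper name and record shape here are assumptions, since the actual trig record from that comment wasn't captured:

```python
def stream_with_trig(tokens, seek, inject, replace_last=False):
    """Scan a token stream for the `seek` sequence; when it completes,
    optionally drop its last token, then splice in the `inject` tokens."""
    out, buf = [], []
    for tok in tokens:
        buf.append(tok)
        if buf[-len(seek):] == seek:
            if replace_last:
                buf.pop()  # overwrite the final trigger token
            buf.extend(inject)
            out.extend(buf)
            buf = []
        elif len(buf) >= len(seek):
            out.append(buf.pop(0))  # token can no longer start a match
    out.extend(buf)
    return out
```

With `seek=["</think>"]` and `replace_last=True`, the single `</think>` special token is overwritten by the injected prompt, forcing the model back into its thinking phase with new guidance.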
Prerequisites
Feature Description
Similar to how the o1 family of models accepts a `reasoning_effort` parameter (low, medium, high), it's conceivable that OSS reasoning models should have this too. DeepSeek mentions that this param will soon be available via their API: https://api-docs.deepseek.com/guides/reasoning_model#api-parameters
Motivation
This would allow us to run reasoning models continuously for long periods of time, hopefully in exchange for better results.
Possible Implementation
I have no idea on the best implementation, though I've seen a few early options in the wild: