Skip to content

Adding custom instruction to the elf tool-chain #837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
subhajit26 opened this issue Feb 3, 2021 · 24 comments
Closed

Adding custom instruction to the elf tool-chain #837

subhajit26 opened this issue Feb 3, 2021 · 24 comments
Labels

Comments

@subhajit26
Copy link

Hi,
I am trying to get familiar with the whole flow of adding custom instruction to the elf tool-chain and I am planning to implement a mod instruction since it is easily accessible over the internet. Since tool-chain building is very time consuming process I need to confirm certain things before I go ahead with custom instruction flow. First, there are 2 riscv-opc.c and riscv-opc.h files in riscv-gdb and riscv-binutils folder. Do I need to update both of them in order to get the custom instruction implemented successfully.
Second, do I need to run this command "./configure --prefix=/opt/riscv
make"
again after adding the custom instruction because while building for the first time I could see that as soon as make is executed the tool-chain started cloning and building. So my doubt is that if I start building again will the opc files contain those updated information . Please suggest.
Please also suggest do I need to make any modification in the Makefile regarding the addition of custom instruction.

@Nelson1225
Copy link

Nelson1225 commented Feb 3, 2021

You can check this issue at first, this is helpful,
#782

Since tool-chain building is very time consuming process I need to confirm certain things before I go ahead with custom instruction flow.

If you want to add the instructions in gcc, then it's more complicate. If you just want to add the instructions in binutils, then only build the binutils is enough, you don't need to build the entire toolchain.

First, there are 2 riscv-opc.c and riscv-opc.h files in riscv-gdb and riscv-binutils folder. Do I need to update both of them in order to get the custom instruction implemented successfully.

If you want to disassemble the instructions by gdb, then yes, you should update both of them. If you need new operands or relocation for your custom instructions, then you need to update more files. Otherwise, just update the files you mentioned should be enough.

Second, do I need to run this command "./configure --prefix=/opt/riscv
make"
again after adding the custom instruction because while building for the first time I could see that as soon as make is executed the tool-chain started cloning and building. So my doubt is that if I start building again will the opc files contain those updated information.

Generally, no need to configure again. But it is better to run "make clean" in your build folder before running "make". As the previous said, if you don't need to add the custom instructions in gcc, then you can just rebuild the binutils/gdb.

cd /build-[binutils|gdb]-[newlib|linux]/
make && make install

@subhajit26
Copy link
Author

Hi Nelson ,
Thank you very much for this useful information. I am gonna try to implement all the approaches that you have mentioned.

@subhajit26
Copy link
Author

Hi,
In order to add the mod instruction in gcc I added this 2 line #define MATCH_MOD 0x200006b #define MASK_MOD 0xfe00707f and DECLARE_INSN(mod, MATCH_MOD, MASK_MOD) to the riscv-opc.h file in both binutils and gdb directory. for riscv-opc.c files in both the directories I added {"mod", "I", "d,s,t", MATCH_MOD, MASK_MOD, match_opcode, 0 } somewhere in between in "const struct riscv_opcode riscv_opcodes[] = ".
Then I did make clean and then Make.
After the build is done I tried to compile a file but it shows me the error below:
internal error: can't hash `sll': exists
Assembler messages:
Fatal error: Broken assembler. No assembly attempted.
Please suggest how to fix this issue because I am not being able to understand that where am I going wrong.

@subhajit26
Copy link
Author

Hi,
I fixed the error and now it works . I got the mod instruction in the dump file. I believe the error was because of putting the mod instruction between the subsets in the riscv-opc.c file although I am not sure but somehow I figured it out from the word sll in the error.

@subhajit26
Copy link
Author

subhajit26 commented Feb 4, 2021

Hi,
I am trying to add the functionality of the custom instruction mod in the below code. But the problem the issue is that in the dump file mod is coming only in the place of inline assembly instruction but not for the "c = a % b" which makes me realize that the instruction has been added bit the functionality is not implemented rather for "c = a % b" rem instruction is coming which is already added beforehand. Can you please tell me where am I going wrong or it will be helpful if you can point me to the file where the functionality needs to be implemented.

 `#include <stdio.h>
int main(){
  int a,b,c;
  a = 5;
  b = 2;
  asm volatile
  (
    "mod   %[z], %[x], %[y]\n\t"
    : [z] "=r" (c)
    : [x] "r" (a), [y] "r" (b)
  );
  c = a % b
  printf("The modulus is:%d", c);  
 /*
  if ( c != 1 ){
     printf("\n[[FAILED]]\n");
     return -1;
  }
  
  printf("\n[[PASSED]]\n");
*/
  return 0;
}`

snippets of dump file:
asm volatile
   10122:	fec42783          	lw	a5,-20(s0)
   10126:	fe842703          	lw	a4,-24(s0)
   1012a:	02e787eb          	mod	a5,a5,a4
   1012e:	fef42223          	sw	a5,-28(s0)
    "mod   %[z], %[x], %[y]\n\t"
    : [z] "=r" (c)
    : [x] "r" (a), [y] "r" (b)
  );
  
  c = a%b;
   10132:	fec42703          	lw	a4,-20(s0)
   10136:	fe842783          	lw	a5,-24(s0)
   1013a:	02f767b3          	rem	a5,a4,a5
   1013e:	fef42223          	sw	a5,-28(s0)
  printf("The modulus is:%d", c);  
   10142:	fe442583          	lw	a1,-28(s0)
   10146:	67f9                	lui	a5,0x1e
   10148:	3d078513          	addi	a0,a5,976 # 1e3d0 <__clzsi2+0x3c>
   1014c:	229d                	jal	102b2 <printf>
     return -1;
  }

@jim-wilson
Copy link
Collaborator

If you want the compiler to emit new instructions, you have to modify the compiler. You have probably only modified binutils so far.

The compiler work is much more involved than the binutils work, and depends on the complexity of the instruction. For simple instructions, you can just add a pattern to the machine description file. For complex instructions you may need to write a new optimization pass or modify an existing one. For a mod instruction, this can be done with a md file pattern. See the gcc/config/riscv/riscv.md file and search for "DIVISION and REMAINDER". The patterns here use macros "any_div" and "insn" that are defined above. If mod is same as rem, then you can modify the define_code_attr insn and substitute mod for rem. Otherwise you might need to write a new pattern.

You can find docs for this file in gcc/docs/md.texi or read the online internals docs
https://gcc.gnu.org/onlinedocs/gccint/Machine-Desc.html#Machine-Desc

@subhajit26
Copy link
Author

Hi Jim ,
I got your point as I have already gone through the files you have mentioned. In the .md file I could see that in define_code_attr insn section mod functionality is implemented but supposedly with the naming of rem:

(define_code_attr insn [(ashift "sll")
			(ashiftrt "sra")
			(lshiftrt "srl")
			(div "div")
			(mod "rem")
			(udiv "divu")
			(umod "remu")
			(ior "or")
			(xor "xor")
			(and `"and")`
			(plus "add")
			(minus "sub")])

My purpose was to implement custom instruction to reduce the cycle but replacing the mod with rem will not solve my purpose it seems since it will only change the name. In the riscv-opc.h files in both the binutils and gdb folder I also found that there are lot of instruction declaration named custom. Are they already implemented with some functionality because I can see the match and mask bit for them or in other words can I use them for custom instruction purpose if they are unused, because in that way I do not need to define a pattern myself rather I can use the functionality if it solves my purpose.
Please suggest.

@jim-wilson
Copy link
Collaborator

If you want a custom instruction you have to do a lot of work. There is no easy solution here.

I don't know what the "custom" stuff is for. I think it is some obsolete broken feature. Perhaps a feature that was replaced by the .insn feature. .insn is documented in the gas manual, and allows you to generate custom instruction encodings without modifying the gas sources.

@subhajit26
Copy link
Author

Hi Jim,
Thanks a lot for guiding in a right way regarding the actual implementation of custom instruction. Regarding the custom instruction till now I have seen only video from British Computer Society Open Source Specialists named "Adding instruction to GNU assembler" which somewhat resembles what you have mentioned about the gas manual. Along with riscv-opc.h an driscv-opc.c, do I have to make certain modification and additions in riscv-opc.dis , tc-riscv.h, tc-riscv.c. I believe the last 2 files mentioned in the riscv-gdb/gas/config path where I can see there are encode type is mentioned and some pattern is defined.
Since, I am absolutely new to this flow I am just worried that I do not break the tool-chain in any way but then I also want to learn the flow of adding custom instruction and make sure that the functionality implemented is bringing certain changes to the code.
If there is any resources or any method that you can refer to apart from the document that you have already mentioned above, it will really help me to get the correct flow of adding custom instruction.

@jim-wilson
Copy link
Collaborator

I filed
https://sourceware.org/bugzilla/show_bug.cgi?id=27348
to get the "custom" stuff fixed. It is obsolete as I expected.

Binutils has testsuites that you can use to check for errors. Try "make check". It is a good idea to add new testcases when adding new features. You can also test binutils by building the compiler and running the compiler testsuite.

How many files you need to modify depends on the complexity of the instruction you are adding. In the simplest case, you only need to modify riscv-opc.h and riscv-opc.c. Basically, write a testcase, and when you can assemble and disassemble it then you are done. You continue making changes until you reach this point.

@Nelson1225
Copy link

I filed
https://sourceware.org/bugzilla/show_bug.cgi?id=27348
to get the "custom" stuff fixed. It is obsolete as I expected.

Fixed and committed to mainline, thanks.

Nelson

@subhajit26
Copy link
Author

Hi Jim,
I want to implement the absolute instruction for integer number but in the riscv.md file I can see this part for the floating type.

(define_insn "abs<mode>2"
  [(set (match_operand:ANYF           0 "register_operand" "=f")
	(abs:ANYF (match_operand:ANYF 1 "register_operand" " f")))]
  "TARGET_HARD_FLOAT"
  "fabs.<fmt>\t%0,%1"
  [(set_attr "type" "fmove")
   (set_attr "mode" "<UNITMODE>")])

I suppose that the functionality of the absolute has already been implemented. Can you please help me out with in which section of the file should I look to confirm the absolute functionality since I want to implement the functionality for integer.

(define_insn "abs<mode>2"
  [(set (match_operand:ANYI           0 "register_operand" "=r")
	(abs:ANYI (match_operand:ANYI 1 "register_operand" " r")))]

I was trying to define in this way for the integer part after going through the "parameterized name" from the docs that you have mentioned previously but then I am confused about how to write the rest of the portion of the define_insn.
Since I am absolutely new to this compiler and tool-chain stuffs and dealing with machine description files for the first time I am getting bit confused on how to proceed with the instructions that are not defined or naming patterns that are not hard coded in compiler.
I have one more request: Can you please give me the flow for any custom instruction as an example that needs to build from scratch because I am really lost between the exact number of files that needs to be modified. In the mean time I am going through the machine description document that you have referred and trying to understand the .md file.

@subhajit26
Copy link
Author

Hi Jim,
I have tried to write the template for 32 bit hardware supported integer mode as per my understanding of the riscv.md file. Please correct me if I am wrong.

(define_insn "abs<mode>2"
  [(set (match_operand:SI          0 "register_operand" "=r")
	(abs:SI (match_operand:SI 1 "register_operand" " r")))]
  "TARGET_32_BIT"
  "abs.<fmt>\t%0,%1"
  [(set_attr "type" "move")
   (set_attr "mode" "<UNITMODE>")])

I am confused about this line: "abs.\t%0,%1" and set_attr "mode" " . What should be in place of UNITMODE.

@LevyHsu
Copy link

LevyHsu commented Feb 8, 2021

mode definition is also in the riscv.md

;; Main data type used by the insn (define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,SF,DF,TF" (const_string "unknown"))

;; This attribute gives the upper-case mode name for one unit of a ;; floating-point mode. (define_mode_attr UNITMODE [(SF "SF") (DF "DF")])

Since both of you op0 and op1 are in SImode, and your target is 32bit, I believe it would be ok to use SI

Some extra info from gccint you probably need:
https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html

@LevyHsu
Copy link

LevyHsu commented Feb 8, 2021

if like you said, you just want an abssi2, then adjust (set_attr "mode" "SI")]), also remove the <fmt>, then it should be fine.

Further more, if you want something like abssi2, absdi2 for SI and DImode for (define_insn "abs<mode>2"
Then it doesn't seem to be a great idea to restrict match_operand:SI 0 and match_operand:SI to SImode only. otherwise absdi with DI operand won't go through this pattern. You may need to have a look at define_mode_iteratorfor complex mode, if none of them fits, write your own mode iterator.
Also, the "TARGET_32_BIT" can be removed if it needs work for DImode under 64bit. just place it with "", then (set_attr "mode" "<MODE>")])

@subhajit26
Copy link
Author

Hi Levy,
Does this look Ok?

(define_insn "abssi2"
  [(set (match_operand:SI          0 "register_operand" "=r")
	(abs:SI (match_operand:SI 1 "register_operand" " r")))]
  "abs\t%0,%1"
  [(set_attr "type" "move")
   (set_attr "mode" "SI")])

Regarding restriction of the of the match operand that you have mentioned , I want the custom instruction only for 32 bit data. That is the reason I have replaced ANYI with SI in particular as was mentioned in the define_mode_iterator .

Now according to my understanding from the overview section of the Machine-Desc document the template given above is inserted into instruction list. So is that all if I want get an absolute functionality implemented in the toolchain or do I need to make any more modifications in this file or any other file like if u can figure out my doubt from the chain mail above.
This is my disassembly:

0001010e <main>:
#include <stdio.h>
#include <stdlib.h>
int main()
{
   1010e:	1101                	addi	sp,sp,-32
   10110:	ce06                	sw	ra,28(sp)
   10112:	cc22                	sw	s0,24(sp)
   10114:	1000                	addi	s0,sp,32
   int m = 200;     // m is assigned to 200
   10116:	0c800793          	li	a5,200
   1011a:	fef42623          	sw	a5,-20(s0)
   int n = -400;    // n is assigned to -400
   1011e:	e7000793          	li	a5,-400
   10122:	fef42423          	sw	a5,-24(s0)
   
   int c = abs(m);
   10126:	fec42783          	lw	a5,-20(s0)
   1012a:	87fd                	srai	a5,a5,0x1f
   1012c:	fec42703          	lw	a4,-20(s0)
   10130:	8f3d                	xor	a4,a4,a5
   10132:	40f707b3          	sub	a5,a4,a5
   10136:	fef42223          	sw	a5,-28(s0)
   int d =abs(n);
   1013a:	fe842783          	lw	a5,-24(s0)
   1013e:	87fd                	srai	a5,a5,0x1f
   10140:	fe842703          	lw	a4,-24(s0)
   10144:	8f3d                	xor	a4,a4,a5
   10146:	40f707b3          	sub	a5,a4,a5
   1014a:	fef42023          	sw	a5,-32(s0)
 

As you see abs function is taking instructions srai, xor and sub. But all I want is that when the compiler sees the abs function in C code it should show abs instead of all the three instructions. So how do I implement the functionality for the compiler to recognize it. Any suggestion will be of great help.

@LevyHsu
Copy link

LevyHsu commented Feb 9, 2021

The md file you wrote should work now, but like mentioned by Jim and Nelson, "If you want a custom instruction you have to do a lot of work. There is no easy solution here."

Because C code and assembly code are not that 1 on 1 match (or say, equal), therefore we need a compiler to do a series of "translation".

First, you need to know where a certain function is implemented, for your case, abs() is defined in /glibc(or newlib)/stdlib/abs.c:
`
int
abs (int i)
{
return i < 0 ? -i : i;
}
Then how does this c code translated to AST, GIMPLE, the match the MD template to produce RTL code? maybe dump some info:
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html
If still, you find the dump info is not detailed enough, modify gcc to produce the info you need.

Narrowing down this a bit, how do other patterns in the .md file get matched (have a look at optabs.def), then trace it backward.

in your case I believe the most feasible way is to call the inline assembly:
https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html#Using-Assembly-Language-with-C
Then modify other necessary parts.

And my other suggestion includes using make -jnproc`` to accelerate the compiling process if you really want the hard way. It should take about 5~15 mins to build a new tool chain if you have 10+ cores with enough RAM and a nvme SSD.

Jim says a lot of work and no easy, he means it. :)

@AnushaGoel14
Copy link

Hi, I tried Adding the custom instruction to spike ISA simulator, and have done below given steps:

In the riscv-isa-sim/riscv/encoding.h add the following lines:
#define MATCH_MOD 0x200006b
#define MASK_MOD 0xfe00707f
...
DECLARE_INSN(mod, MATCH_MOD, MASK_MOD)

Create a file riscv-isa-sim/riscv/insns/mod.h and add these lines:
WRITE_RD(sext_xlen(RS1 % RS2));

Add this file to riscv-isa-sim/riscv/riscv.mk.in
riscv_insn_list =
...
mod
...

In riscv-isa-sim/spike_main/disasm.cc add the following lines:
DEFINE_RTYPE(mod);

And then I again build it, but its showing the following error:
Building project riscv-isa-sim
../spike_main/disasm.cc: In constructor ‘disassembler_t::disassembler_t(int)’:
../spike_main/disasm.cc:295:38: error: ‘match_mod’ was not declared in this scope
add_insn(new disasm_insn_t(name, match_##code, mask_##code | (extra), VA_ARGS));
^
../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’
#define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})
^~~~~~~~~~~
../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’
DEFINE_RTYPE(mod);
^~~~~~~~~~~~
../spike_main/disasm.cc:295:38: note: suggested alternative: ‘match_fsd’
add_insn(new disasm_insn_t(name, match_##code, mask_##code | (extra), VA_ARGS));
^
../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’
#define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})
^~~~~~~~~~~
../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’
DEFINE_RTYPE(mod);
^~~~~~~~~~~~
../spike_main/disasm.cc:295:52: error: ‘mask_mod’ was not declared in this scope
add_insn(new disasm_insn_t(name, match_##code, mask_##code | (extra), VA_ARGS));
^
../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’
#define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})
^~~~~~~~~~~
../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’
DEFINE_RTYPE(mod);
^~~~~~~~~~~~
../spike_main/disasm.cc:295:52: note: suggested alternative: ‘mask_fsd’
add_insn(new disasm_insn_t(name, match_##code, mask_##code | (extra), VA_ARGS));
^
../spike_main/disasm.cc:298:30: note: in expansion of macro ‘DISASM_INSN’
#define DEFINE_RTYPE(code) DISASM_INSN(#code, code, 0, {&xrd, &xrs1, &xrs2})
^~~~~~~~~~~
../spike_main/disasm.cc:425:3: note: in expansion of macro ‘DEFINE_RTYPE’
DEFINE_RTYPE(mod);
^~~~~~~~~~~~
make: *** [disasm.o] Error 1

Please let me know if am I missing any steps that I need to do?Or anything else that can be done to add new instructions in riscv

@jim-wilson
Copy link
Collaborator

jim-wilson commented Mar 31, 2021 via email

@chenc6
Copy link

chenc6 commented May 13, 2021

Hi @subhajit26
Did you see the Fatal error: duplicate add (maybe other instructions) when you put your new instructions in the subset?

@subhajit26
Copy link
Author

subhajit26 commented May 13, 2021 via email

@iams07
Copy link

iams07 commented Jun 23, 2021

Hi @subhajit26
Did you see the Fatal error: duplicate add (maybe other instructions) when you put your new instructions in the subset?

I believe that error is because your opcode is matching with the opcode of the add instruction, so there are basically two instructions which have the same opcode now. Just change the funct7 opcode and you should be fine

@leduchuybk
Copy link

Hi @subhajit26,
I believe that you solved this problem. Can you share steps to add a new instruction to compiler?

@TommyMurphyTM1234
Copy link
Collaborator

This is a list of all issues logged that I can find that relate to adding new instructions to the toolchain:

At this stage in the evolution of RISC-V,the majority of instruction addition will be done under the auspices of the relevant groups working on standardising, prototyping, and implementing/enhancing RISC-V extensions and their corresponding instructions. As such, there should be less of a demand for others to do so.

However, where somebody still wants to implement a non stadard custom extension/instruction here are some useful resources:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants