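Tags: CSAPP
Preparation
Download the handout from the official site.
Unpack the tar file with tar xvf archlab-handout.tar
This extracts README, Makefile, sim.tar, archlab.ps, archlab.pdf, and simguide.pdf.
You may then run into the following problems.
If apt-get fails with an "Unable to locate package" error, the configured mirror is the problem. Look up another mirror (Aliyun's, for example) and replace the URLs in /etc/apt/sources.list with it.
After switching mirrors, remember to run sudo apt-get update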
/usr/bin/ld: cannot find -lfl
sudo apt-get install flex
/usr/bin/ld: cannot find -ltk
/usr/bin/ld: cannot find -ltcl
sudo apt-get install tk8.5
sudo apt-get install tcl8.5
Also edit the Makefile in your lab directory.
The changes look like this:
# Comment this out if you don't have Tcl/Tk on your system
#GUIMODE=-DHAS_GUI
# Modify the following line so that gcc can find the libtcl.so and
# libtk.so libraries on your system. You may need to use the -L option
# to tell gcc which directory to look in. Comment this out if you
# don't have Tcl/Tk.
# Changed to the following:
TKLIBS=-L/usr/lib -ltk8.5 -ltcl8.5
# Modify the following line so that gcc can find the tcl.h and tk.h
# header files on your system. Comment this out if you don't have
# Tcl/Tk.
TKINC=-isystem /usr/include/tcl8.5
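Finally, run make clean; make again and the build should succeed.
If the same errors show up later, repeat these steps. One of the later parts requires deleting the GUI line from the Makefile.
First, hand-written Y86 assembly. The functions to implement are in example.c. I had hoped to be lazy and convert the disassembly of the compiled C into Y86, but the disassembled code turned out to be more trouble, so I wrote it by hand.
I modeled the programs on the large worked example in chapter 4 of the book.
Create a new file: vim sum_list.ys
All three programs leave their result in %rax; if %rax never changes, the code has a bug. The expected value of %rax is 0xcba in all three cases.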
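As a sanity check on that expected value, here is a rough Python model of the three example.c functions, run on the same sample data that the .ys listings define. It is only a sketch: `Node` is a hypothetical stand-in for the C list struct, and Python lists stand in for the src and dest blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for the linked-list element in example.c."""
    val: int
    next: Optional["Node"] = None


def sum_list(ls):
    """Iteratively sum the values of a linked list."""
    total = 0
    while ls is not None:
        total += ls.val
        ls = ls.next
    return total


def rsum_list(ls):
    """Recursively sum the values of a linked list."""
    if ls is None:
        return 0
    return ls.val + rsum_list(ls.next)


def copy_block(src, dest, length):
    """Copy length words from src to dest and return the XOR of the words."""
    result = 0
    for i in range(length):
        dest[i] = src[i]
        result ^= src[i]
    return result


# Same sample data as the .ys listings.
ele3 = Node(0xc00)
ele2 = Node(0x0b0, ele3)
ele1 = Node(0x00a, ele2)
src = [0x00a, 0x0b0, 0xc00]
dest = [0x111, 0x222, 0x333]

results = (sum_list(ele1), rsum_list(ele1), copy_block(src, dest, 3))
print([hex(r) for r in results])
```

The three sample values have no overlapping bits, which is why the sum and the XOR give the same number.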
The commands to assemble and run the program are:
unix > ./yas sum_list.ys
unix > ./yis sum_list.yo
# sum_list.ys: iterative sum_list from example.c
# Execution begins at address 0
.pos 0
irmovq stack, %rsp
call main
halt
# Sample linked list
.align 8
ele1:
.quad 0x00a
.quad ele2
ele2:
.quad 0x0b0
.quad ele3
ele3:
.quad 0xc00
.quad 0
main:
irmovq ele1,%rdi
call sum_list
ret
sum_list:
xorq %rax,%rax #rax=0
jmp test
loop:
mrmovq (%rdi),%r10
addq %r10,%rax
mrmovq 8(%rdi),%rdi
test:
andq %rdi,%rdi
jne loop
ret
#Stack starts here and grows to lower addresses
.pos 0x100
stack:
The recursive version: save the current value on the stack, then recurse.
# Recursive version (rsum_list in example.c)
# Execution begins at address 0
.pos 0
irmovq stack, %rsp
call main
halt
# Sample linked list
.align 8
ele1:
.quad 0x00a
.quad ele2
ele2:
.quad 0x0b0
.quad ele3
ele3:
.quad 0xc00
.quad 0
main:
irmovq ele1,%rdi
call sum_list
ret
sum_list:
xorq %rax,%rax #rax=0
andq %rdi,%rdi
je return
mrmovq (%rdi),%r10 # long val = ls->val
pushq %r10
mrmovq 8(%rdi),%rdi
call sum_list
popq %rbx
addq %rbx,%rax
ret
return:
ret
#Stack starts here and grows to lower addresses
.pos 0x1000
stack:
# copy_block from example.c
# Execution begins at address 0
.pos 0
irmovq stack, %rsp
call main
halt
.align 8
#Source block
src:
.quad 0x00a
.quad 0x0b0
.quad 0xc00
# Destination block
dest:
.quad 0x111
.quad 0x222
.quad 0x333
main:
xorq %rax,%rax #long result=0
irmovq src,%rdi
irmovq dest,%rsi
irmovq $1,%r9
irmovq $3,%r8
irmovq $8,%r11
andq %r8,%r8
jmp test
loop:
mrmovq (%rdi),%rcx
addq %r11,%rdi
rmmovq %rcx,(%rsi)
addq %r11,%rsi
xorq %rcx,%rax
subq %r9,%r8
test:
jne loop
ret
#Stack starts here and grows to lower addresses
.pos 0x100
stack:
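Following the pipeline discussion in chapter 4, I derived iaddq by combining the stage tables for OPq and irmovq.
The resulting stages for iaddq are:
Stage        iaddq V, rB
Fetch        icode:ifun <-- M1[PC]
             rA:rB <-- M1[PC+1]
             valC <-- M8[PC+2]
             valP <-- PC+10
Decode       valB <-- R[rB]
Execute      valE <-- valB + valC
             Set CC
Memory       (none)
Write back   R[rB] <-- valE
PC update    PC <-- valP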
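The stage table above can be read as a small state update. The following Python sketch walks through the same stages for a single iaddq; the register file as a dict, the helper `to_signed`, and the function name `iaddq` are all illustrative choices of mine, not part of the simulator.

```python
MASK = (1 << 64) - 1  # Y86-64 registers hold 64-bit words


def to_signed(x):
    """Interpret a 64-bit word as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def iaddq(regs, pc, valC, rB):
    """Apply iaddq V,rB stage by stage; return (new_pc, condition_codes)."""
    # Fetch: 1 byte icode:ifun + 1 byte rA:rB + 8 bytes of constant
    valP = pc + 10
    # Decode
    valB = regs[rB]
    # Execute
    valE = (valB + valC) & MASK
    a, b, e = to_signed(valB), to_signed(valC), to_signed(valE)
    cc = {
        "ZF": e == 0,
        "SF": e < 0,
        # Signed overflow: operands share a sign and the result does not
        "OF": (a < 0) == (b < 0) and (e < 0) != (a < 0),
    }
    # Memory: nothing to do
    # Write back
    regs[rB] = valE
    # PC update
    return valP, cc


regs = {"%rdx": 5}
new_pc, cc = iaddq(regs, 0x20, -6, "%rdx")
print(hex(new_pc), to_signed(regs["%rdx"]), cc)
```

With %rdx = 5 and V = -6 the result is negative, so SF is set; this is the flag that a following jl or jge would test.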
We add IIADDQ to sim/seq/seq-full.hcl. For each control signal, use the book to decide whether iaddq belongs in it.
#/* $begin seq-all-hcl */
####################################################################
# HCL Description of Control for Single Cycle Y86-64 Processor SEQ #
# Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2010 #
####################################################################
## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work
####################################################################
# C Include's. Don't alter these #
####################################################################
quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'word_t gen_pc(){return 0;}'
quote 'int main(int argc, char *argv[])'
quote ' {plusmode=0;return sim_main(argc,argv);}'
####################################################################
# Declarations. Do not change/remove/delete any of these #
####################################################################
##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 'I_NOP'
wordsig IHALT 'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ 'I_ALU'
wordsig IJXX 'I_JMP'
wordsig ICALL 'I_CALL'
wordsig IRET 'I_RET'
wordsig IPUSHQ 'I_PUSHQ'
wordsig IPOPQ 'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ 'I_IADDQ'
##### Symbolic represenations of Y86-64 function codes #####
wordsig FNONE 'F_NONE' # Default function code
##### Symbolic representation of Y86-64 Registers referenced explicitly #####
wordsig RRSP 'REG_RSP' # Stack Pointer
wordsig RNONE 'REG_NONE' # Special value indicating "no register"
##### ALU Functions referenced explicitly #####
wordsig ALUADD 'A_ADD' # ALU should add its arguments
##### Possible instruction status values #####
wordsig SAOK 'STAT_AOK' # Normal execution
wordsig SADR 'STAT_ADR' # Invalid memory address
wordsig SINS 'STAT_INS' # Invalid instruction
wordsig SHLT 'STAT_HLT' # Halt instruction encountered
##### Signals that can be referenced by control logic ####################
##### Fetch stage inputs #####
wordsig pc 'pc' # Program counter
##### Fetch stage computations #####
wordsig imem_icode 'imem_icode' # icode field from instruction memory
wordsig imem_ifun 'imem_ifun' # ifun field from instruction memory
wordsig icode 'icode' # Instruction control code
wordsig ifun 'ifun' # Instruction function
wordsig rA 'ra' # rA field from instruction
wordsig rB 'rb' # rB field from instruction
wordsig valC 'valc' # Constant from instruction
wordsig valP 'valp' # Address of following instruction
boolsig imem_error 'imem_error' # Error signal from instruction memory
boolsig instr_valid 'instr_valid' # Is fetched instruction valid?
##### Decode stage computations #####
wordsig valA 'vala' # Value from register A port
wordsig valB 'valb' # Value from register B port
##### Execute stage computations #####
wordsig valE 'vale' # Value computed by ALU
boolsig Cnd 'cond' # Branch test
##### Memory stage computations #####
wordsig valM 'valm' # Value read from memory
boolsig dmem_error 'dmem_error' # Error signal from data memory
####################################################################
# Control Signal Definitions. #
####################################################################
################ Fetch Stage ###################################
# Determine instruction code
word icode = [
imem_error: INOP;
1: imem_icode; # Default: get from instruction memory
];
# Determine instruction function
word ifun = [
imem_error: FNONE;
1: imem_ifun; # Default: get from instruction memory
];
bool instr_valid = icode in
{ INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ ,IIADDQ };
# Does fetched instruction require a regid byte?
bool need_regids =
icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ };
# Does fetched instruction require a constant word?
bool need_valC =
icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL,IIADDQ };
################ Decode Stage ###################################
## What register should be used as the A source?
word srcA = [
icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : rA;
icode in { IPOPQ, IRET } : RRSP;
1 : RNONE; # Don't need register
];
## What register should be used as the B source?
word srcB = [
icode in { IOPQ, IRMMOVQ, IMRMOVQ,IIADDQ } : rB;
icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
1 : RNONE; # Don't need register
];
## What register should be used as the E destination?
word dstE = [
icode in { IRRMOVQ } && Cnd : rB;
icode in { IIRMOVQ, IOPQ,IIADDQ } : rB;
icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
1 : RNONE; # Don't write any register
];
## What register should be used as the M destination?
word dstM = [
icode in { IMRMOVQ, IPOPQ } : rA;
1 : RNONE; # Don't write any register
];
################ Execute Stage ###################################
## Select input A to ALU
word aluA = [
icode in { IRRMOVQ, IOPQ } : valA;
icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ } : valC;
icode in { ICALL, IPUSHQ } : -8;
icode in { IRET, IPOPQ } : 8;
# Other instructions don't need ALU
];
## Select input B to ALU
word aluB = [
icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
IPUSHQ, IRET, IPOPQ,IIADDQ } : valB;
icode in { IRRMOVQ, IIRMOVQ } : 0;
# Other instructions don't need ALU
];
## Set the ALU function
word alufun = [
icode == IOPQ : ifun;
1 : ALUADD;
];
## Should the condition codes be updated?
bool set_cc = icode in { IOPQ,IIADDQ };
################ Memory Stage ###################################
## Set read control signal
bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };
## Set write control signal
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };
## Select memory address
word mem_addr = [
icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
icode in { IPOPQ, IRET } : valA;
# Other instructions don't need address
];
## Select memory input data
word mem_data = [
# Value from register
icode in { IRMMOVQ, IPUSHQ } : valA;
# Return PC
icode == ICALL : valP;
# Default: Don't write anything
];
## Determine instruction status
word Stat = [
imem_error || dmem_error : SADR;
!instr_valid: SINS;
icode == IHALT : SHLT;
1 : SAOK;
];
################ Program Counter Update ############################
## What address should instruction be fetched at
word new_pc = [
# Call. Use instruction constant
icode == ICALL : valC;
# Taken branch. Use instruction constant
icode == IJXX && Cnd : valC;
# Completion of RET instruction. Use value from stack
icode == IRET : valM;
# Default: Use incremented PC
1 : valP;
];
#/* $end seq-all-hcl */
This last part of the lab left me a bit speechless. First, carry the iaddq instruction from above over to this part's HCL by modifying pipe-full.hcl.
#/* $begin pipe-all-hcl */
####################################################################
# HCL Description of Control for Pipelined Y86-64 Processor #
# Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2014 #
####################################################################
## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work
####################################################################
# C Include's. Don't alter these #
####################################################################
quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "pipeline.h"'
quote '#include "stages.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'
####################################################################
# Declarations. Do not change/remove/delete any of these #
####################################################################
##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 'I_NOP'
wordsig IHALT 'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ 'I_ALU'
wordsig IJXX 'I_JMP'
wordsig ICALL 'I_CALL'
wordsig IRET 'I_RET'
wordsig IPUSHQ 'I_PUSHQ'
wordsig IPOPQ 'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ 'I_IADDQ'
##### Symbolic represenations of Y86-64 function codes #####
wordsig FNONE 'F_NONE' # Default function code
##### Symbolic representation of Y86-64 Registers referenced #####
wordsig RRSP 'REG_RSP' # Stack Pointer
wordsig RNONE 'REG_NONE' # Special value indicating "no register"
##### ALU Functions referenced explicitly ##########################
wordsig ALUADD 'A_ADD' # ALU should add its arguments
##### Possible instruction status values #####
wordsig SBUB 'STAT_BUB' # Bubble in stage
wordsig SAOK 'STAT_AOK' # Normal execution
wordsig SADR 'STAT_ADR' # Invalid memory address
wordsig SINS 'STAT_INS' # Invalid instruction
wordsig SHLT 'STAT_HLT' # Halt instruction encountered
##### Signals that can be referenced by control logic ##############
##### Pipeline Register F ##########################################
wordsig F_predPC 'pc_curr->pc' # Predicted value of PC
##### Intermediate Values in Fetch Stage ###########################
wordsig imem_icode 'imem_icode' # icode field from instruction memory
wordsig imem_ifun 'imem_ifun' # ifun field from instruction memory
wordsig f_icode 'if_id_next->icode' # (Possibly modified) instruction code
wordsig f_ifun 'if_id_next->ifun' # Fetched instruction function
wordsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
wordsig f_valP 'if_id_next->valp' # Address of following instruction
boolsig imem_error 'imem_error' # Error signal from instruction memory
boolsig instr_valid 'instr_valid' # Is fetched instruction valid?
##### Pipeline Register D ##########################################
wordsig D_icode 'if_id_curr->icode' # Instruction code
wordsig D_rA 'if_id_curr->ra' # rA field from instruction
wordsig D_rB 'if_id_curr->rb' # rB field from instruction
wordsig D_valP 'if_id_curr->valp' # Incremented PC
##### Intermediate Values in Decode Stage #########################
wordsig d_srcA 'id_ex_next->srca' # srcA from decoded instruction
wordsig d_srcB 'id_ex_next->srcb' # srcB from decoded instruction
wordsig d_rvalA 'd_regvala' # valA read from register file
wordsig d_rvalB 'd_regvalb' # valB read from register file
##### Pipeline Register E ##########################################
wordsig E_icode 'id_ex_curr->icode' # Instruction code
wordsig E_ifun 'id_ex_curr->ifun' # Instruction function
wordsig E_valC 'id_ex_curr->valc' # Constant data
wordsig E_srcA 'id_ex_curr->srca' # Source A register ID
wordsig E_valA 'id_ex_curr->vala' # Source A value
wordsig E_srcB 'id_ex_curr->srcb' # Source B register ID
wordsig E_valB 'id_ex_curr->valb' # Source B value
wordsig E_dstE 'id_ex_curr->deste' # Destination E register ID
wordsig E_dstM 'id_ex_curr->destm' # Destination M register ID
##### Intermediate Values in Execute Stage #########################
wordsig e_valE 'ex_mem_next->vale' # valE generated by ALU
boolsig e_Cnd 'ex_mem_next->takebranch' # Does condition hold?
wordsig e_dstE 'ex_mem_next->deste' # dstE (possibly modified to be RNONE)
##### Pipeline Register M #########################
wordsig M_stat 'ex_mem_curr->status' # Instruction status
wordsig M_icode 'ex_mem_curr->icode' # Instruction code
wordsig M_ifun 'ex_mem_curr->ifun' # Instruction function
wordsig M_valA 'ex_mem_curr->vala' # Source A value
wordsig M_dstE 'ex_mem_curr->deste' # Destination E register ID
wordsig M_valE 'ex_mem_curr->vale' # ALU E value
wordsig M_dstM 'ex_mem_curr->destm' # Destination M register ID
boolsig M_Cnd 'ex_mem_curr->takebranch' # Condition flag
boolsig dmem_error 'dmem_error' # Error signal from instruction memory
##### Intermediate Values in Memory Stage ##########################
wordsig m_valM 'mem_wb_next->valm' # valM generated by memory
wordsig m_stat 'mem_wb_next->status' # stat (possibly modified to be SADR)
##### Pipeline Register W ##########################################
wordsig W_stat 'mem_wb_curr->status' # Instruction status
wordsig W_icode 'mem_wb_curr->icode' # Instruction code
wordsig W_dstE 'mem_wb_curr->deste' # Destination E register ID
wordsig W_valE 'mem_wb_curr->vale' # ALU E value
wordsig W_dstM 'mem_wb_curr->destm' # Destination M register ID
wordsig W_valM 'mem_wb_curr->valm' # Memory M value
####################################################################
# Control Signal Definitions. #
####################################################################
################ Fetch Stage ###################################
## What address should instruction be fetched at
word f_pc = [
# Mispredicted branch. Fetch at incremented PC
M_icode == IJXX && !M_Cnd : M_valA;
# Completion of RET instruction
W_icode == IRET : W_valM;
# Default: Use predicted value of PC
1 : F_predPC;
];
## Determine icode of fetched instruction
word f_icode = [
imem_error : INOP;
1: imem_icode;
];
# Determine ifun
word f_ifun = [
imem_error : FNONE;
1: imem_ifun;
];
# Is instruction valid?
bool instr_valid = f_icode in
{ INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ,IIADDQ };
# Determine status code for fetched instruction
word f_stat = [
imem_error: SADR;
!instr_valid : SINS;
f_icode == IHALT : SHLT;
1 : SAOK;
];
# Does fetched instruction require a regid byte?
bool need_regids =
f_icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ };
# Does fetched instruction require a constant word?
bool need_valC =
f_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL,IIADDQ };
# Predict next value of PC
word f_predPC = [
f_icode in { IJXX, ICALL } : f_valC;
1 : f_valP;
];
################ Decode Stage ######################################
## What register should be used as the A source?
word d_srcA = [
D_icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : D_rA;
D_icode in { IPOPQ, IRET } : RRSP;
1 : RNONE; # Don't need register
];
## What register should be used as the B source?
word d_srcB = [
D_icode in { IOPQ, IRMMOVQ, IMRMOVQ,IIADDQ } : D_rB;
D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
1 : RNONE; # Don't need register
];
## What register should be used as the E destination?
word d_dstE = [
D_icode in { IRRMOVQ, IIRMOVQ, IOPQ,IIADDQ} : D_rB;
D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
1 : RNONE; # Don't write any register
];
## What register should be used as the M destination?
word d_dstM = [
D_icode in { IMRMOVQ, IPOPQ } : D_rA;
1 : RNONE; # Don't write any register
];
## What should be the A value?
## Forward into decode stage for valA
word d_valA = [
D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
d_srcA == e_dstE : e_valE; # Forward valE from execute
d_srcA == M_dstM : m_valM; # Forward valM from memory
d_srcA == M_dstE : M_valE; # Forward valE from memory
d_srcA == W_dstM : W_valM; # Forward valM from write back
d_srcA == W_dstE : W_valE; # Forward valE from write back
1 : d_rvalA; # Use value read from register file
];
word d_valB = [
d_srcB == e_dstE : e_valE; # Forward valE from execute
d_srcB == M_dstM : m_valM; # Forward valM from memory
d_srcB == M_dstE : M_valE; # Forward valE from memory
d_srcB == W_dstM : W_valM; # Forward valM from write back
d_srcB == W_dstE : W_valE; # Forward valE from write back
1 : d_rvalB; # Use value read from register file
];
################ Execute Stage #####################################
## Select input A to ALU
word aluA = [
E_icode in { IRRMOVQ, IOPQ } : E_valA;
E_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ } : E_valC;
E_icode in { ICALL, IPUSHQ } : -8;
E_icode in { IRET, IPOPQ } : 8;
# Other instructions don't need ALU
];
## Select input B to ALU
word aluB = [
E_icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
IPUSHQ, IRET, IPOPQ,IIADDQ } : E_valB;
E_icode in { IRRMOVQ, IIRMOVQ } : 0;
# Other instructions don't need ALU
];
## Set the ALU function
word alufun = [
E_icode == IOPQ : E_ifun;
1 : ALUADD;
];
## Should the condition codes be updated?
bool set_cc = E_icode in {IIADDQ,IOPQ} &&
# State changes only during normal operation
!m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT };
## Generate valA in execute stage
word e_valA = E_valA; # Pass valA through stage
## Set dstE to RNONE in event of not-taken conditional move
word e_dstE = [
E_icode == IRRMOVQ && !e_Cnd : RNONE;
1 : E_dstE;
];
################ Memory Stage ######################################
## Select memory address
word mem_addr = [
M_icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : M_valE;
M_icode in { IPOPQ, IRET } : M_valA;
# Other instructions don't need address
];
## Set read control signal
bool mem_read = M_icode in { IMRMOVQ, IPOPQ, IRET };
## Set write control signal
bool mem_write = M_icode in { IRMMOVQ, IPUSHQ, ICALL };
#/* $begin pipe-m_stat-hcl */
## Update the status
word m_stat = [
dmem_error : SADR;
1 : M_stat;
];
#/* $end pipe-m_stat-hcl */
## Set E port register ID
word w_dstE = W_dstE;
## Set E port value
word w_valE = W_valE;
## Set M port register ID
word w_dstM = W_dstM;
## Set M port value
word w_valM = W_valM;
## Update processor status
word Stat = [
W_stat == SBUB : SAOK;
1 : W_stat;
];
################ Pipeline Register Control #########################
# Should I stall or inject a bubble into Pipeline Register F?
# At most one of these can be true.
bool F_bubble = 0;
bool F_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVQ, IPOPQ } &&
E_dstM in { d_srcA, d_srcB } ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
# Should I stall or inject a bubble into Pipeline Register D?
# At most one of these can be true.
bool D_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVQ, IPOPQ } &&
E_dstM in { d_srcA, d_srcB };
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Cnd) ||
# Stalling at fetch while ret passes through pipeline
# but not condition for a load/use hazard
!(E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB }) &&
IRET in { D_icode, E_icode, M_icode };
# Should I stall or inject a bubble into Pipeline Register E?
# At most one of these can be true.
bool E_stall = 0;
bool E_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Cnd) ||
# Conditions for a load/use hazard
E_icode in { IMRMOVQ, IPOPQ } &&
E_dstM in { d_srcA, d_srcB};
# Should I stall or inject a bubble into Pipeline Register M?
# At most one of these can be true.
bool M_stall = 0;
# Start injecting bubbles as soon as exception passes through memory stage
bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT };
# Should I stall or inject a bubble into Pipeline Register W?
bool W_stall = W_stat in { SADR, SINS, SHLT };
bool W_bubble = 0;
#/* $end pipe-all-hcl */
Build and test:
make VERSION=full
./correctness.pl # checks that the results are correct
./benchmark.pl # reports the score
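To make the scores below easier to interpret, here is a small Python sketch of how the score relates to the average CPE reported by benchmark.pl. The thresholds are the ones I recall from the handout (archlab.pdf): 0 points above 10.5 CPE, 60 points below 7.5, linear in between. Treat them as an assumption and check your copy of the handout. The function names are mine.

```python
def ncopy_score(cpe):
    """Score for a given average CPE, per the assumed handout formula."""
    if cpe > 10.5:
        return 0.0
    if cpe < 7.5:
        return 60.0
    return 20.0 * (10.5 - cpe)


def cpe_for_score(score):
    """Invert the linear part; only meaningful for scores between 0 and 60."""
    return 10.5 - score / 20.0


for cpe in (11.0, 8.5, 7.4):
    print(cpe, ncopy_score(cpe))
```

Under this formula a score of 40 corresponds to an average CPE of 8.5, and a score of 0 means the CPE is above 10.5.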
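I first tried 6-way loop unrolling and replaced the conditional jumps with conditional moves.
After testing, it scored a glorious 0 points.
The reason is that the conditional-move version needs more instructions.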
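A back-of-the-envelope model shows why. It assumes the PIPE design from the book: one instruction per cycle, conditional jumps predicted as taken, and a 2-cycle penalty on a misprediction. It only counts the instructions that update the positive count for one element, so it is a rough sketch rather than a measurement.

```python
MISPREDICT_PENALTY = 2  # bubbles injected after a mispredicted jump in PIPE


def cmov_cycles(positive):
    """rrmovq, iaddq, andq, cmovle: four instructions regardless of the data."""
    return 4


def branch_cycles(positive):
    """andq, jle skip, then iaddq only when the element is positive."""
    if positive:
        # jle is not taken, but PIPE predicted taken: pay the penalty, then iaddq
        return 2 + MISPREDICT_PENALTY + 1
    # jle is taken as predicted and the iaddq is skipped
    return 2


def average_cycles(cost, fraction_positive):
    """Expected cycles per element for a given share of positive values."""
    return (fraction_positive * cost(True)
            + (1 - fraction_positive) * cost(False))


print(average_cycles(cmov_cycles, 0.5), average_cycles(branch_cycles, 0.5))
```

Under these assumptions the jump version averages 3.5 cycles when half the elements are positive, against a constant 4 for the conditional-move version. The jump version only loses once more than about two thirds of the elements are positive.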
The 0-point code:
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src, %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
# Loop header
xorq %rax,%rax # count = 0;
Loop:
iaddq $-6,%rdx
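jl Remain # if fewer than 6 elements remain, handle them separately; otherwise run the unrolled body
iaddq $6,%rdx # restore the length; it is subtracted again at the end of the body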
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rrmovq %rax,%r13
iaddq $1,%rax
andq %r8,%r8
cmovle %r13,%rax
rmmovq %r8,(%rsi)
iaddq $8,%rsi
jmp S2
S2:
rrmovq %rax,%r13
iaddq $1,%rax
andq %r9,%r9
cmovle %r13,%rax
rmmovq %r9,(%rsi)
iaddq $8,%rsi
jmp S3
S3:
mrmovq 16(%rdi),%r10
mrmovq 24(%rdi),%r11
rrmovq %rax,%r13
iaddq $1,%rax
andq %r10,%r10
cmovle %r13,%rax
rmmovq %r10,(%rsi)
iaddq $8,%rsi
jmp S4
S4:
rrmovq %rax,%r13
iaddq $1,%rax
andq %r11,%r11
cmovle %r13,%rax
rmmovq %r11,(%rsi)
iaddq $8,%rsi
jmp S5
S5:
mrmovq 32(%rdi),%r12
mrmovq 40(%rdi),%r14
rrmovq %rax,%r13
iaddq $1,%rax
andq %r12,%r12
cmovle %r13,%rax
rmmovq %r12,(%rsi)
iaddq $8,%rsi
jmp S6
S6:
rrmovq %rax,%r13
iaddq $1,%rax
andq %r14,%r14
cmovle %r13,%rax
rmmovq %r14,(%rsi)
iaddq $8,%rsi
iaddq $-6,%rdx
iaddq $48,%rdi
jmp Loop
#####################################################################
Solveremain:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rrmovq %rax,%r13
iaddq $1,%rax # conditional-move pattern
andq %r8,%r8
cmovle %r13,%rax
rmmovq %r8,(%rsi)
iaddq $8,%rsi
jmp Solver1
Solver1:
iaddq $-1,%rdx
jl Done
rrmovq %rax,%r13
iaddq $1,%rax
andq %r9,%r9
cmovle %r13,%rax
rmmovq %r9,(%rsi)
iaddq $8,%rsi
jmp Solver2
Solver2:
mrmovq 16(%rdi),%r10
mrmovq 24(%rdi),%r11
iaddq $-1,%rdx
jl Done
rrmovq %rax,%r13
iaddq $1,%rax
andq %r10,%r10
cmovle %r13,%rax
rmmovq %r10,(%rsi)
iaddq $8,%rsi
jmp Solver3
Solver3:
iaddq $-1,%rdx
jl Done
rrmovq %rax,%r13
iaddq $1,%rax
andq %r11,%r11
cmovle %r13,%rax
rmmovq %r11,(%rsi)
iaddq $8,%rsi
jmp Solver4
Solver4:
mrmovq 32(%rdi),%r12
iaddq $-1,%rdx
jl Done
rrmovq %rax,%r13
iaddq $1,%rax
andq %r12,%r12
cmovle %r13,%rax
rmmovq %r12,(%rsi)
iaddq $8,%rsi
jmp Done
Remain:
iaddq $5,%rdx # negative here means the length was 0; otherwise %rdx holds the last index, 0..4
jl Done
jmp Solveremain # jump to the code that handles the remaining elements
Done:
ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
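Then I went out for a meal, came back, and read other people's blogs, which gave me an idea: unroll 6 ways directly and keep looping while at least 6 elements remain, using plain conditional jumps inside the unrolled body. The part with fewer than 6 elements is split in half by one comparison and handled from there. That remainder handling is still not good enough, so this version only scored 40 points.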
##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
xorq %rax,%rax
jmp StartLoop1
Loop6:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
mrmovq 16(%rdi),%r10
mrmovq 24(%rdi),%r11
mrmovq 32(%rdi),%r12
mrmovq 40(%rdi),%r13
rmmovq %r8,(%rsi)
andq %r8,%r8
jle L61
iaddq $1,%rax
L61:
rmmovq %r9,8(%rsi)
andq %r9,%r9
jle L62
iaddq $1,%rax
L62:
rmmovq %r10,16(%rsi)
andq %r10,%r10
jle L63
iaddq $1,%rax
L63:
rmmovq %r11,24(%rsi)
andq %r11,%r11
jle L64
iaddq $1,%rax
L64:
rmmovq %r12,32(%rsi)
andq %r12,%r12
jle L65
iaddq $1,%rax
L65:
rmmovq %r13,40(%rsi)
andq %r13,%r13
jle L66
iaddq $1,%rax
L66:
iaddq $48,%rdi
iaddq $48,%rsi
StartLoop1:
iaddq $-6,%rdx
jge Loop6
iaddq $6,%rdx
jmp StartLoop2
Loop2:
iaddq $3,%rdx
iaddq $-1,%rdx
jl Done
rmmovq %r8,(%rsi)
andq %r8,%r8
jle L21
iaddq $1,%rax
L21:
iaddq $-1,%rdx
jl Done
rmmovq %r9,8(%rsi)
andq %r9,%r9
jle L22
iaddq $1,%rax
L22:
iaddq $-1,%rdx
jl Done
rmmovq %r10,16(%rsi)
andq %r10,%r10
jle Done
iaddq $1,%rax
jmp Done
Loop3:
iaddq $-1,%rdx
rmmovq %r8,(%rsi)
andq %r8,%r8
jle L31
iaddq $1,%rax
L31:
iaddq $-1,%rdx
rmmovq %r9,8(%rsi)
andq %r9,%r9
jle L32
iaddq $1,%rax
L32:
iaddq $-1,%rdx
rmmovq %r10,16(%rsi)
andq %r10,%r10
jle L33
iaddq $1,%rax
L33:
iaddq $-1,%rdx
jl Done
rmmovq %r11,24(%rsi)
andq %r11,%r11
jle L34
iaddq $1,%rax
L34:
iaddq $-1,%rdx
jl Done
rmmovq %r12,32(%rsi)
andq %r12,%r12
jle L35
iaddq $1,%rax
L35:
iaddq $-1,%rdx
jl Done
rmmovq %r13,40(%rsi)
andq %r13,%r13
jle Done
iaddq $1,%rax
jmp Done
StartLoop2:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
mrmovq 16(%rdi),%r10
iaddq $-3,%rdx
jle Loop2
iaddq $3,%rdx
mrmovq 24(%rdi),%r11
mrmovq 32(%rdi),%r12
jmp Loop3
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
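I went back and studied more blog posts.
The available optimizations are the pipeline-level ones from CSAPP section 4.5.8 and, from chapter 5, loop unrolling plus better parallelism. (Personally I think these are the only two things left to optimize in the code this part asks for.)
Placing rmmovq %r8,(%rsi) directly below mrmovq (%rdi),%r8 causes a load/use hazard. Putting another useful instruction between the two avoids the stall: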
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rmmovq %r8,(%rsi)
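The reordering above can be checked with a tiny Python model of the load/use condition. It mirrors the stall test in pipe-full.hcl (the instruction in execute is mrmovq or popq and its dstM is a source of the instruction in decode), but the tuple encoding of instructions and the function name are assumptions made for this sketch.

```python
LOADS = {"mrmovq", "popq"}  # instructions whose result arrives from memory


def count_load_use_stalls(program):
    """Count adjacent pairs where a load feeds the very next instruction.

    program is a list of (opcode, loaded_register_or_None, source_registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dst, _ = prev
        _, _, cur_srcs = cur
        if prev_op in LOADS and prev_dst in cur_srcs:
            stalls += 1
    return stalls


naive = [
    ("mrmovq", "%r8", ("%rdi",)),
    ("rmmovq", None, ("%r8", "%rsi")),
    ("mrmovq", "%r9", ("%rdi",)),
    ("rmmovq", None, ("%r9", "%rsi")),
]
interleaved = [
    ("mrmovq", "%r8", ("%rdi",)),
    ("mrmovq", "%r9", ("%rdi",)),
    ("rmmovq", None, ("%r8", "%rsi")),
    ("rmmovq", None, ("%r9", "%rsi")),
]

print(count_load_use_stalls(naive), count_load_use_stalls(interleaved))
```

Each stall costs one bubble, so interleaving the loads saves one cycle per element in this model.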
For the part with fewer than 6 elements, I unrolled 2 ways. That scored 47.3.
##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
xorq %rax,%rax
jmp Start1
Loop6:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rmmovq %r8,(%rsi)
andq %r8,%r8
jle L61
iaddq $1,%rax
L61:
mrmovq 16(%rdi),%r10
rmmovq %r9,8(%rsi)
andq %r9,%r9
jle L62
iaddq $1,%rax
L62:
mrmovq 24(%rdi),%r11
rmmovq %r10,16(%rsi)
andq %r10,%r10
jle L63
iaddq $1,%rax
L63:
mrmovq 32(%rdi),%r12
rmmovq %r11,24(%rsi)
andq %r11,%r11
jle L64
iaddq $1,%rax
L64:
mrmovq 40(%rdi),%r13
rmmovq %r12,32(%rsi)
andq %r12,%r12
jle L65
iaddq $1,%rax
L65:
rmmovq %r13,40(%rsi)
andq %r13,%r13
jle L66
iaddq $1,%rax
L66:
iaddq $48,%rdi
iaddq $48,%rsi
Start1:
iaddq $-6,%rdx
jge Loop6
iaddq $6,%rdx
jmp Start2
Loop2:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rmmovq %r8,(%rsi)
andq %r8,%r8
jle L21
iaddq $1,%rax
L21:
rmmovq %r9,8(%rsi)
andq %r9,%r9
jle L22
iaddq $1,%rax
L22:
iaddq $16,%rdi
iaddq $16,%rsi
Start2:
iaddq $-2,%rdx # 2-way loop
jge Loop2
mrmovq (%rdi),%r8
iaddq $1,%rdx
jne Done
rmmovq %r8,(%rsi)
andq %r8,%r8
jle Done
iaddq $1,%rax
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
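The control flow of this version (a 6-way loop, then a 2-way loop, then at most one leftover element) can be sketched in Python as follows. This is a behavioral model only: the inner `for` stands in for the six hand-unrolled copies, and `copy_one` is a hypothetical helper, so it says nothing about cycle counts.

```python
def ncopy(src, dst, length):
    """Copy length words from src to dst; return how many of them are > 0."""
    count = 0
    i = 0

    def copy_one(k):
        nonlocal count
        dst[k] = src[k]
        if src[k] > 0:
            count += 1

    while length - i >= 6:      # Loop6: six elements per iteration
        for k in range(i, i + 6):
            copy_one(k)
        i += 6
    while length - i >= 2:      # Loop2: two elements per iteration
        copy_one(i)
        copy_one(i + 1)
        i += 2
    if length - i == 1:         # at most one element is left
        copy_one(i)
    return count


src = [3, -1, 0, 7, 2, -5, 9, 4, -2]
dst = [0] * len(src)
print(ncopy(src, dst, len(src)), dst == src)
```

Checking lengths around the loop boundaries (0, 1, 5, 6, 7, 8) is a quick way to catch off-by-one errors before running correctness.pl.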
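A Zhihu article says that extending its code to 6-way unrolling gets above 50 points. In my test, that code scored 48 with 4-way unrolling.
A 2016 article scored 60 with 4-way unrolling, which puzzled me; I suspect the test data back then was weaker. I copied that code and fixed some of its build errors, but it still would not build.
That's it for now.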