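1. Preface

1.1 Executables and debug files

elf, dwarf and Golang. That was meant to be the title of the English version; translated into Chinese it becomes "Elves, Dwarves and Golang" (which suddenly sounds a lot grander), but in the end I decided not to use it.

ELF, the standard binary format on Linux, stands for Executable and Linkable Format. DWARF is commonly expanded as Debugging With Attributed Record Formats.

Whatever language a program is written in, compiling it to native code on Linux produces an ELF file. ELF is the standard file format for executables, object code, shared libraries, and core dumps. A Go program built with the default build command contains debug information, stored compressed in DWARF format; debuggers such as gdb and delve provide their debugging features by reading it. All sample output below was produced on Linux on the amd64 architecture.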
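To make the point about debug information concrete, here is a minimal sketch that uses Go's standard debug/elf package to list the sections of an ELF file and report whether any DWARF sections are present. The helper hasDebugSections and the choice to inspect the running binary through os.Executable are assumptions made for this example; matching only on the .debug_ and .zdebug_ name prefixes is a simplification.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// hasDebugSections reports whether any section name looks like a DWARF
// section, either uncompressed (.debug_*) or compressed (.zdebug_*).
// This helper is hypothetical and exists only for this example.
func hasDebugSections(names []string) bool {
	for _, name := range names {
		if strings.HasPrefix(name, ".debug_") || strings.HasPrefix(name, ".zdebug_") {
			return true
		}
	}
	return false
}

func main() {
	// Inspect the running binary by default, or the path given as argument.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Println("cannot locate executable:", err)
		return
	}

	f, err := elf.Open(path)
	if err != nil {
		// Expected on systems whose executables are not ELF files.
		fmt.Println("not a readable ELF file:", err)
		return
	}
	defer f.Close()

	var names []string
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	fmt.Printf("%d sections, DWARF present: %v\n", len(names), hasDebugSections(names))
}
```

Run against a binary built with -ldflags "-s -w", it should report that no DWARF sections are present.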

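ELF itself is not covered in more detail here; further material is listed below:

1.2 Compiler

Going from source code to an executable usually involves lexical analysis, syntax analysis, compilation, assembly, and linking. In Go, the commands visible to us are:

  • go tool compile: processes .go files, performing lexical analysis, syntax analysis, compilation, and assembly, and outputs an object file
  • go tool asm: processes assembly files (.s files) and outputs an object file
  • go tool pack: packs all object files of a package into a .a archive
  • go tool link: links the .a archives of the different packages and outputs an executable
  • go tool objdump: disassembles object files and executables
  • go tool nm: lists the symbols defined in an object file, .a archive, or executable

Go assembly is written in a Plan 9 style. We can even run go tool objdump on a compiled C program and get Plan 9 style assembly; conversely, running GNU objdump on a Go program gives conventional x86 assembly (AT&T syntax by default).

For more on the compiler, the following two documents are both worth reading:

2. Program entry point

The commands below build, respectively, a minimal executable with no debug information, an executable with compressed debug information (the default), and an executable with uncompressed debug information: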

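# Build the executable with debug information stripped
go build -ldflags "-s -w" -o hello cmd/hello/main.go
# Build the executable with compressed DWARF data (the default)
go build -o hello cmd/hello/main.go
# Build the executable with DWARF compression disabled
go build -ldflags "-compressdwarf=false" -o hello cmd/hello/main.go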

Running readelf -e hello on the executable that contains debug information prints the complete set of ELF headers:

ELF Header:                                                               
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00                
  Class:                             ELF64                                
  Data:                              2's complement, little endian        
  Version:                           1 (current)                          
  OS/ABI:                            UNIX - System V                      
  ABI Version:                       0                                    
  Type:                              EXEC (Executable file)               
  Machine:                           Advanced Micro Devices X86-64        
  Version:                           0x1                                  
  Entry point address:               0x4605e0                             
  Start of program headers:          64 (bytes into file)                 
  Start of section headers:          456 (bytes into file)                
  Flags:                             0x0                                  
  Size of this header:               64 (bytes)                           
  Size of program headers:           56 (bytes)                           
  Number of program headers:         7                                    
  Size of section headers:           64 (bytes)                           
  Number of section headers:         25                                   
  Section header string table index: 3                                    
                                                                          
Section Headers:                                                          
  [Nr] Name              Type             Address           Offset        
       Size              EntSize          Flags  Link  Info  Align        
  [ 0]                   NULL             0000000000000000  00000000      
       0000000000000000  0000000000000000           0     0     0         
  [ 1] .text             PROGBITS         0000000000401000  00001000      
       00000000009ef97a  0000000000000000  AX       0     0     16        
  [ 2] .rodata           PROGBITS         0000000000df1000  009f1000      
       000000000046b19a  0000000000000000   A       0     0     32        
  [ 3] .shstrtab         STRTAB           0000000000000000  00e5c1a0      
       00000000000001a1  0000000000000000           0     0     1         
  [ 4] .typelink         PROGBITS         000000000125c360  00e5c360      
       0000000000008174  0000000000000000   A       0     0     32        
  [ 5] .itablink         PROGBITS         00000000012644d8  00e644d8      
       0000000000002868  0000000000000000   A       0     0     8         
  [ 6] .gosymtab         PROGBITS         0000000001266d40  00e66d40      
       0000000000000000  0000000000000000   A       0     0     1         
  [ 7] .gopclntab        PROGBITS         0000000001266d40  00e66d40      
       0000000000652074  0000000000000000   A       0     0     32        
  [ 8] .go.buildinfo     PROGBITS         00000000018b9000  014b9000      
       0000000000000020  0000000000000000  WA       0     0     16        
  [ 9] .noptrdata        PROGBITS         00000000018b9020  014b9020      
       000000000004e920  0000000000000000  WA       0     0     32        
  [10] .data             PROGBITS         0000000001907940  01507940      
       0000000000015cd0  0000000000000000  WA       0     0     32        
  [11] .bss              NOBITS           000000000191d620  0151d620      
       000000000002a170  0000000000000000  WA       0     0     32        
  [12] .noptrbss         NOBITS           00000000019477a0  015477a0      
       0000000000003af8  0000000000000000  WA       0     0     32        
  [13] .zdebug_abbrev    PROGBITS         000000000194c000  0151e000      
       0000000000000119  0000000000000000           0     0     8         
  [14] .zdebug_line      PROGBITS         000000000194c119  0151e119      
       000000000011074c  0000000000000000           0     0     8         
  [15] .zdebug_frame     PROGBITS         0000000001a5c865  0162e865      
       000000000004c5c3  0000000000000000           0     0     8         
  [16] .zdebug_pubnames  PROGBITS         0000000001aa8e28  0167ae28      
       00000000000089ed  0000000000000000           0     0     8         
  [17] .zdebug_pubtypes  PROGBITS         0000000001ab1815  01683815      
       00000000000228ed  0000000000000000           0     0     8         
  [18] .debug_gdb_script PROGBITS         0000000001ad4102  016a6102      
       0000000000000024  0000000000000000           0     0     1         
  [19] .zdebug_info      PROGBITS         0000000001ad4126  016a6126      
       00000000001fa247  0000000000000000           0     0     8         
  [20] .zdebug_loc       PROGBITS         0000000001cce36d  018a036d      
       0000000000159fa8  0000000000000000           0     0     8         
  [21] .zdebug_ranges    PROGBITS         0000000001e28315  019fa315      
       0000000000071a60  0000000000000000           0     0     8         
  [22] .note.go.buildid  NOTE             0000000000400f9c  00000f9c      
       0000000000000064  0000000000000000   A       0     0     4         
  [23] .symtab           SYMTAB           0000000000000000  01a6c000      
       00000000000b5440  0000000000000018          24   542     8         
  [24] .strtab           STRTAB           0000000000000000  01b21440      
       0000000000171f69  0000000000000000           0     0     1         
Key to Flags:                                                             
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),    
  L (link order), O (extra OS processing required), G (group), T (TLS),   
  C (compressed), x (unknown), o (OS specific), E (exclude),              
  l (large), p (processor specific)                                       
                                                                          
Program Headers:                                                          
  Type           Offset             VirtAddr           PhysAddr           
                 FileSiz            MemSiz              Flags  Align      
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040 
                 0x0000000000000188 0x0000000000000188  R      1000       
  NOTE           0x0000000000000f9c 0x0000000000400f9c 0x0000000000400f9c 
                 0x0000000000000064 0x0000000000000064  R      4          
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000 
                 0x00000000009f097a 0x00000000009f097a  R E    1000       
  LOAD           0x00000000009f1000 0x0000000000df1000 0x0000000000df1000 
                 0x0000000000ac7db4 0x0000000000ac7db4  R      1000       
  LOAD           0x00000000014b9000 0x00000000018b9000 0x00000000018b9000 
                 0x0000000000064620 0x0000000000092298  RW     1000       
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000 
                 0x0000000000000000 0x0000000000000000  RW     8          
  LOOS+5041580   0x0000000000000000 0x0000000000000000 0x0000000000000000 
                 0x0000000000000000 0x0000000000000000         8          
                                                                          
 Section to Segment mapping:                                              
  Segment Sections...                                                     
   00                                                                     
   01     .note.go.buildid                                                
   02     .text .note.go.buildid                                          
   03     .rodata .typelink .itablink .gosymtab .gopclntab                
   04     .go.buildinfo .noptrdata .data .bss .noptrbss                   
   05                                                                     
   06                                                                     

Running readelf -e hello on the executable without debug information prints the complete set of ELF headers:

ELF Header:                                                              
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00               
  Class:                             ELF64                               
  Data:                              2's complement, little endian       
  Version:                           1 (current)                         
  OS/ABI:                            UNIX - System V                     
  ABI Version:                       0                                   
  Type:                              EXEC (Executable file)              
  Machine:                           Advanced Micro Devices X86-64       
  Version:                           0x1                                 
  Entry point address:               0x4605e0                            
  Start of program headers:          64 (bytes into file)                
  Start of section headers:          456 (bytes into file)               
  Flags:                             0x0                                 
  Size of this header:               64 (bytes)                          
  Size of program headers:           56 (bytes)                          
  Number of program headers:         7                                   
  Size of section headers:           64 (bytes)                          
  Number of section headers:         14                                  
  Section header string table index: 3                                   
                                                                         
Section Headers:                                                         
  [Nr] Name              Type             Address           Offset       
       Size              EntSize          Flags  Link  Info  Align       
  [ 0]                   NULL             0000000000000000  00000000     
       0000000000000000  0000000000000000           0     0     0        
  [ 1] .text             PROGBITS         0000000000401000  00001000     
       00000000009ef97a  0000000000000000  AX       0     0     16       
  [ 2] .rodata           PROGBITS         0000000000df1000  009f1000     
       000000000046b19a  0000000000000000   A       0     0     32       
  [ 3] .shstrtab         STRTAB           0000000000000000  00e5c1a0     
       000000000000008a  0000000000000000           0     0     1        
  [ 4] .typelink         PROGBITS         000000000125c240  00e5c240     
       0000000000008174  0000000000000000   A       0     0     32       
  [ 5] .itablink         PROGBITS         00000000012643b8  00e643b8     
       0000000000002868  0000000000000000   A       0     0     8        
  [ 6] .gosymtab         PROGBITS         0000000001266c20  00e66c20     
       0000000000000000  0000000000000000   A       0     0     1        
  [ 7] .gopclntab        PROGBITS         0000000001266c20  00e66c20     
       0000000000652074  0000000000000000   A       0     0     32       
  [ 8] .go.buildinfo     PROGBITS         00000000018b9000  014b9000     
       0000000000000020  0000000000000000  WA       0     0     16       
  [ 9] .noptrdata        PROGBITS         00000000018b9020  014b9020     
       000000000004e920  0000000000000000  WA       0     0     32       
  [10] .data             PROGBITS         0000000001907940  01507940     
       0000000000015cd0  0000000000000000  WA       0     0     32       
  [11] .bss              NOBITS           000000000191d620  0151d620     
       000000000002a170  0000000000000000  WA       0     0     32       
  [12] .noptrbss         NOBITS           00000000019477a0  015477a0     
       0000000000003af8  0000000000000000  WA       0     0     32       
  [13] .note.go.buildid  NOTE             0000000000400f9c  00000f9c     
       0000000000000064  0000000000000000   A       0     0     4        
Key to Flags:                                                            
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),   
  L (link order), O (extra OS processing required), G (group), T (TLS),  
  C (compressed), x (unknown), o (OS specific), E (exclude),             
  l (large), p (processor specific)                                      
                                                                         
Program Headers:                                                         
  Type           Offset             VirtAddr           PhysAddr          
                 FileSiz            MemSiz              Flags  Align     
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000188 0x0000000000000188  R      1000      
  NOTE           0x0000000000000f9c 0x0000000000400f9c 0x0000000000400f9c
                 0x0000000000000064 0x0000000000000064  R      4         
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000009f097a 0x00000000009f097a  R E    1000      
  LOAD           0x00000000009f1000 0x0000000000df1000 0x0000000000df1000
                 0x0000000000ac7c94 0x0000000000ac7c94  R      1000      
  LOAD           0x00000000014b9000 0x00000000018b9000 0x00000000018b9000
                 0x0000000000064620 0x0000000000092298  RW     1000      
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8         
  LOOS+5041580   0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000         8         
                                                                         
 Section to Segment mapping:                                             
  Segment Sections...                                                    
   00                                                                    
   01     .note.go.buildid                                               
   02     .text .note.go.buildid                                         
   03     .rodata .typelink .itablink .gosymtab .gopclntab               
   04     .go.buildinfo .noptrdata .data .bss .noptrbss                  
   05                                                                    
   06                                                                    

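As the output shows, the biggest difference between the two is the number of sections (25 versus 14): the executable with debug information additionally carries the debug sections (.zdebug_* and .debug_gdb_script) as well as .symtab and .strtab. The DWARF data can be dumped with the dwarfdump tool.

From material available online we already know that:

  • Program Headers tell the operating system how to create the run-time memory image; the operating system uses mmap to map these segments into the virtual address space
  • Section Headers describe the data used for linking and relocation; each section has a start address and a length, and sections are classified by content as
    • .text: executable code
    • .data: initialized data, readable and writable, e.g. initialized global variables
    • .rodata: initialized data, read-only, e.g. global constants
    • .bss: uninitialized data, readable and writable, e.g. uninitialized global variables
    • others

Taking the ELF file with debug information above as the example, let us now find the program's entry point.

The ELF header (Entry point address) gives the virtual address of the entry point: 0x4605e0. It falls inside section 1, .text (virtual addresses 0x401000 to 0xdf097a, i.e. start address 0x401000 plus size 0x9ef97a), which holds the executable code and is not writable (flags AX).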
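The containment check is simple arithmetic, sketched below with the numbers copied from the readelf output. The section type and its contains method are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// section holds the two section-header fields needed for the check.
// It is a hypothetical type for this example, not part of any library.
type section struct {
	name string
	addr uint64 // virtual address of the first byte
	size uint64 // length in bytes
}

// contains reports whether addr lies in the half-open range
// [s.addr, s.addr+s.size).
func (s section) contains(addr uint64) bool {
	return addr >= s.addr && addr < s.addr+s.size
}

func main() {
	// Values copied from the readelf output above.
	text := section{name: ".text", addr: 0x401000, size: 0x9ef97a}
	entry := uint64(0x4605e0)

	fmt.Printf("%s ends at %#x\n", text.name, text.addr+text.size)
	fmt.Printf("entry %#x in %s: %v\n", entry, text.name, text.contains(entry))
}
```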

Next, use readelf to dump the symbol table and look for 0x4605e0 by running readelf -s hello|grep 4605e0:

9049: 00000000004605e0     5 FUNC    GLOBAL DEFAULT    1 _rt0_amd64_linux

So the entry point of a Go program on Linux (amd64) is _rt0_amd64_linux. Now we can go back to the source code.

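3. Runtime startup code

“I could be bounded in a nutshell, and count myself a king of infinite space, were it not that I have bad dreams.” - Hamlet

After reading the runtime code, this line from Hamlet suddenly came to mind: without reading the startup code, you would never know that all of your code is enclosed in individual goroutines.

The entry code from the previous section is defined in runtime/rt0_linux_amd64.s in the Go standard library: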

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
    JMP _rt0_amd64(SB)

Execution jumps to the startup code in runtime/asm_amd64.s:

// _rt0_amd64 is common startup code for most amd64 systems when using
// internal linking. This is the entry point for the program from the
// kernel for an ordinary -buildmode=exe program. The stack holds the
// number of arguments and the C-style argv.
TEXT _rt0_amd64(SB),NOSPLIT,$-8
    MOVQ    0(SP), DI   // argc
    LEAQ    8(SP), SI   // argv
    JMP runtime·rt0_go(SB)

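The code above loads argc and argv into registers and jumps to another assembly routine in the same file. That routine is very long, but it boils down to:

  • reading the command-line arguments
  • gathering CPU and operating system information
  • initializing the runtime
  • starting the main function: runtime's main function in turn starts the user's main function
  • waiting for the main function to exit and handling debug information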

    TEXT runtime·rt0_go(SB),NOSPLIT,$0
        // copy arguments forward on an even stack
        MOVQ    DI, AX      // argc
        MOVQ    SI, BX      // argv
        SUBQ    $(4*8+7), SP        // 2args 2auto
        ANDQ    $~15, SP
        MOVQ    AX, 16(SP)
        MOVQ    BX, 24(SP)
        // create istack out of the given (operating system) stack.
        // _cgo_init may update stackguard.
        MOVQ    $runtime·g0(SB), DI
        LEAQ    (-64*1024+104)(SP), BX
        MOVQ    BX, g_stackguard0(DI)
        MOVQ    BX, g_stackguard1(DI)
        MOVQ    BX, (g_stack+stack_lo)(DI)
        MOVQ    SP, (g_stack+stack_hi)(DI)
        // find out information about the processor we're on
        MOVL    $0, AX
        CPUID
        MOVL    AX, SI
        CMPL    AX, $0
        JE  nocpuinfo
        // Figure out how to serialize RDTSC.
        // On Intel processors LFENCE is enough. AMD requires MFENCE.
        // Don't know about the rest, so let's do MFENCE.
        CMPL    BX, $0x756E6547  // "Genu"
        JNE notintel
        CMPL    DX, $0x49656E69  // "ineI"
        JNE notintel
        CMPL    CX, $0x6C65746E  // "ntel"
        JNE notintel
        MOVB    $1, runtime·isIntel(SB)
        MOVB    $1, runtime·lfenceBeforeRdtsc(SB)
    notintel:
        // Load EAX=1 cpuid flags
        MOVL    $1, AX
        CPUID
        MOVL    AX, runtime·processorVersionInfo(SB)
    nocpuinfo:
        // if there is an _cgo_init, call it.
        MOVQ    _cgo_init(SB), AX
        TESTQ   AX, AX
        JZ  needtls
        // arg 1: g0, already in DI
        MOVQ    $setg_gcc<>(SB), SI // arg 2: setg_gcc
    #ifdef GOOS_android
        MOVQ    $runtime·tls_g(SB), DX  // arg 3: &tls_g
        // arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
        // Compensate for tls_g (+16).
        MOVQ    -16(TLS), CX
    #else
        MOVQ    $0, DX  // arg 3, 4: not used when using platform's TLS
        MOVQ    $0, CX
    #endif
    #ifdef GOOS_windows
        // Adjust for the Win64 calling convention.
        MOVQ    CX, R9 // arg 4
        MOVQ    DX, R8 // arg 3
        MOVQ    SI, DX // arg 2
        MOVQ    DI, CX // arg 1
    #endif
        CALL    AX
        // update stackguard after _cgo_init
        MOVQ    $runtime·g0(SB), CX
        MOVQ    (g_stack+stack_lo)(CX), AX
        ADDQ    $const__StackGuard, AX
        MOVQ    AX, g_stackguard0(CX)
        MOVQ    AX, g_stackguard1(CX)
    #ifndef GOOS_windows
        JMP ok
    #endif
    needtls:
    #ifdef GOOS_plan9
        // skip TLS setup on Plan 9
        JMP ok
    #endif
    #ifdef GOOS_solaris
        // skip TLS setup on Solaris
        JMP ok
    #endif
    #ifdef GOOS_illumos
        // skip TLS setup on illumos
        JMP ok
    #endif
    #ifdef GOOS_darwin
        // skip TLS setup on Darwin
        JMP ok
    #endif
        LEAQ    runtime·m0+m_tls(SB), DI
        CALL    runtime·settls(SB)
        // store through it, to make sure it works
        get_tls(BX)
        MOVQ    $0x123, g(BX)
        MOVQ    runtime·m0+m_tls(SB), AX
        CMPQ    AX, $0x123
        JEQ 2(PC)
        CALL    runtime·abort(SB)
    ok:
        // set the per-goroutine and per-mach "registers"
        get_tls(BX)
        LEAQ    runtime·g0(SB), CX
        MOVQ    CX, g(BX)
        LEAQ    runtime·m0(SB), AX
        // save m->g0 = g0
        MOVQ    CX, m_g0(AX)
        // save m0 to g0->m
        MOVQ    AX, g_m(CX)
        CLD             // convention is D is always left cleared
        CALL    runtime·check(SB)
        MOVL    16(SP), AX      // copy argc
        MOVL    AX, 0(SP)
        MOVQ    24(SP), AX      // copy argv
        MOVQ    AX, 8(SP)
        CALL    runtime·args(SB)
        CALL    runtime·osinit(SB)
        CALL    runtime·schedinit(SB)
        // create a new goroutine to start program
        MOVQ    $runtime·mainPC(SB), AX     // entry
        PUSHQ   AX
        PUSHQ   $0          // arg size
        CALL    runtime·newproc(SB)
        POPQ    AX
        POPQ    AX
        // start this M
        CALL    runtime·mstart(SB)
        CALL    runtime·abort(SB)   // mstart should never return
        RET
        // Prevent dead-code elimination of debugCallV1, which is
        // intended to be called by debuggers.
        MOVQ    $runtime·debugCallV1(SB), AX
        RET
    DATA    runtime·mainPC+0(SB)/8,$runtime·main(SB)
    GLOBL   runtime·mainPC(SB),RODATA,$8

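3.1 Copying the arguments

Copy argc and argv from the registers onto the realigned stack:

  • MOVQ moves a quadword (8 bytes, i.e. four 16-bit words); in Go's Plan 9 style assembly the first operand is the source and the second is the destination
  • SUBQ and ANDQ likewise operate on quadwords, with the first operand as the source and the second as the destination; here they reserve stack space and round SP down to a 16-byte boundary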

    // copy arguments forward on an even stack
        MOVQ    DI, AX      // argc
        MOVQ    SI, BX      // argv
        SUBQ    $(4*8+7), SP        // 2args 2auto
        ANDQ    $~15, SP
        MOVQ    AX, 16(SP)
        MOVQ    BX, 24(SP)
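The SUBQ/ANDQ pair is plain integer arithmetic, so it can be replayed in Go. This is only a sketch: alignedSP is a hypothetical function name and the sample stack pointer values are made up for illustration.

```go
package main

import "fmt"

// alignedSP replays "SUBQ $(4*8+7), SP" followed by "ANDQ $~15, SP":
// reserve 39 bytes, then clear the low four bits so that the result
// is a multiple of 16.
func alignedSP(sp uint64) uint64 {
	sp -= 4*8 + 7
	return sp &^ 15
}

func main() {
	// Hypothetical incoming stack pointer values.
	for _, sp := range []uint64{0x7ffd1000, 0x7ffd1008, 0x7ffd100f} {
		got := alignedSP(sp)
		fmt.Printf("sp=%#x -> %#x (reserved %d bytes)\n", sp, got, sp-got)
	}
}
```

Whatever the incoming value, the result is 16-byte aligned and leaves at least 32 bytes for the four 8-byte slots at 0(SP) through 24(SP).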

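3.2 Setting up the main thread's system stack

  • The runtime defines the global variables g0 and m0; m0 is the main thread, on which the user's main function is started
  • Because the runtime is not initialized yet, m0 and g0 have to be bound by hand, and g0 is made to use the main thread's stack that was set up when the process was created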

        // create istack out of the given (operating system) stack.
        // _cgo_init may update stackguard.
        MOVQ    $runtime·g0(SB), DI
        LEAQ    (-64*1024+104)(SP), BX
        MOVQ    BX, g_stackguard0(DI)
        MOVQ    BX, g_stackguard1(DI)
        MOVQ    BX, (g_stack+stack_lo)(DI)
        MOVQ    SP, (g_stack+stack_hi)(DI)
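In Go terms, the bounds written into g0 look roughly like this. The stack struct below is a simplified, hypothetical stand-in for the runtime's own type, and the sample SP value is made up.

```go
package main

import "fmt"

// stack is a simplified stand-in for the stack bounds kept in g.
type stack struct {
	lo, hi uint64
}

// bootstrapStack replays "LEAQ (-64*1024+104)(SP), BX": the low bound
// sits 64 KiB minus 104 bytes below the current SP, and the current SP
// becomes the high bound. In the assembly the same low value is also
// stored into stackguard0 and stackguard1.
func bootstrapStack(sp uint64) stack {
	return stack{lo: sp - 64*1024 + 104, hi: sp}
}

func main() {
	s := bootstrapStack(0x7ffd0fe0) // hypothetical SP value
	fmt.Printf("lo=%#x hi=%#x size=%d bytes\n", s.lo, s.hi, s.hi-s.lo)
}
```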

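3.3 Getting CPU information

  • Execute CPUID with AX=0 to try to obtain CPU information; if it returns 0 in AX, jump to nocpuinfo
  • Compare the vendor string returned in BX, DX, and CX with "GenuineIntel", which distinguishes three cases: no CPU information, Intel, and non-Intel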

    	// find out information about the processor we're on
    	MOVL	$0, AX
    	CPUID
    	MOVL	AX, SI
    	CMPL	AX, $0
    	JE	nocpuinfo
    
    	// Figure out how to serialize RDTSC.
    	// On Intel processors LFENCE is enough. AMD requires MFENCE.
    	// Don't know about the rest, so let's do MFENCE.
    	CMPL	BX, $0x756E6547  // "Genu"
    	JNE	notintel
    	CMPL	DX, $0x49656E69  // "ineI"
    	JNE	notintel
    	CMPL	CX, $0x6C65746E  // "ntel"
    	JNE	notintel
    	MOVB	$1, runtime·isIntel(SB)
    	MOVB	$1, runtime·lfenceBeforeRdtsc(SB)
    notintel:
    
    	// Load EAX=1 cpuid flags
    	MOVL	$1, AX
    	CPUID
    	MOVL	AX, runtime·processorVersionInfo(SB)
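The three magic constants are just the vendor string packed into little-endian 32-bit registers. A small sketch makes this visible; vendorString is a hypothetical helper written for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vendorString reassembles the 12-byte CPUID vendor string from the
// register values in the order the assembly compares them: BX, DX, CX.
func vendorString(bx, dx, cx uint32) string {
	var b [12]byte
	binary.LittleEndian.PutUint32(b[0:4], bx)
	binary.LittleEndian.PutUint32(b[4:8], dx)
	binary.LittleEndian.PutUint32(b[8:12], cx)
	return string(b[:])
}

func main() {
	// The constants compared against BX, DX, and CX in rt0_go.
	fmt.Println(vendorString(0x756E6547, 0x49656E69, 0x6C65746E))
}
```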

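3.4 Configuring TLS (Thread Local Storage)

TLS refers to static or global variables that are local to a single thread; for the concept, see the Wikipedia article on Thread Local Storage.

The file runtime/go_tls.h is also involved; its contents are: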

#ifdef GOARCH_arm
#define LR R14
#endif

#ifdef GOARCH_amd64
#define	get_tls(r)	MOVQ TLS, r
#define	g(r)	0(r)(TLS*1)
#endif

#ifdef GOARCH_amd64p32
#define	get_tls(r)	MOVL TLS, r
#define	g(r)	0(r)(TLS*1)
#endif

#ifdef GOARCH_386
#define	get_tls(r)	MOVL TLS, r
#define	g(r)	0(r)(TLS*1)
#endif

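As you can see, the file defines TLS-related macros, mainly for the x86 family; other architectures presumably use the tls field of the m struct directly.

The code below mainly does the following:

  • tries to initialize cgo
  • configures TLS according to the operating system
  • updates the stack guard of g0's system stack
  • writes through TLS and reads the value back to check that it works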

    nocpuinfo:
    	// if there is an _cgo_init, call it.
    	MOVQ	_cgo_init(SB), AX
    	TESTQ	AX, AX
    	JZ	needtls
    	// arg 1: g0, already in DI
    	MOVQ	$setg_gcc<>(SB), SI // arg 2: setg_gcc
    #ifdef GOOS_android
    	MOVQ	$runtime·tls_g(SB), DX 	// arg 3: &tls_g
    	// arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
    	// Compensate for tls_g (+16).
    	MOVQ	-16(TLS), CX
    #else
    	MOVQ	$0, DX	// arg 3, 4: not used when using platform's TLS
    	MOVQ	$0, CX
    #endif
    #ifdef GOOS_windows
    	// Adjust for the Win64 calling convention.
    	MOVQ	CX, R9 // arg 4
    	MOVQ	DX, R8 // arg 3
    	MOVQ	SI, DX // arg 2
    	MOVQ	DI, CX // arg 1
    #endif
    	CALL	AX
    
    	// update stackguard after _cgo_init
    	MOVQ	$runtime·g0(SB), CX
    	MOVQ	(g_stack+stack_lo)(CX), AX
    	ADDQ	$const__StackGuard, AX
    	MOVQ	AX, g_stackguard0(CX)
    	MOVQ	AX, g_stackguard1(CX)
    
    #ifndef GOOS_windows
    	JMP ok
    #endif
    needtls:
    #ifdef GOOS_plan9
    	// skip TLS setup on Plan 9
    	JMP ok
    #endif
    #ifdef GOOS_solaris
    	// skip TLS setup on Solaris
    	JMP ok
    #endif
    #ifdef GOOS_illumos
    	// skip TLS setup on illumos
    	JMP ok
    #endif
    #ifdef GOOS_darwin
    	// skip TLS setup on Darwin
    	JMP ok
    #endif
    
    	LEAQ	runtime·m0+m_tls(SB), DI
    	CALL	runtime·settls(SB)
    
    	// store through it, to make sure it works
    	get_tls(BX)
    	MOVQ	$0x123, g(BX)
    	MOVQ	runtime·m0+m_tls(SB), AX
    	CMPQ	AX, $0x123
    	JEQ 2(PC)
    	CALL	runtime·abort(SB)

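3.5 Binding g0 and m0, checking basic data types

  • With TLS configured, set the per-goroutine and per-thread (M) virtual "registers"
  • Bind g0 and m0 to each other
  • Run the basic data type checks; the check function is in runtime/runtime1.go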

    ok:
    	// set the per-goroutine and per-mach "registers"
    	get_tls(BX)
    	LEAQ	runtime·g0(SB), CX
    	MOVQ	CX, g(BX)
    	LEAQ	runtime·m0(SB), AX
    
    	// save m->g0 = g0
    	MOVQ	CX, m_g0(AX)
    	// save m0 to g0->m
    	MOVQ	AX, g_m(CX)
    
    	CLD				// convention is D is always left cleared
    	CALL	runtime·check(SB)
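Stripped of the assembly, the binding is two pointer stores. The sketch below uses heavily simplified, hypothetical g and m types that keep only the two fields involved; the real runtime structs have many more.

```go
package main

import "fmt"

// m and g are simplified stand-ins for runtime.m and runtime.g.
type m struct {
	g0 *g // goroutine that owns the scheduling (system) stack
}

type g struct {
	m *m // thread currently running this goroutine
}

var (
	g0 g
	m0 m
)

// bind replays the two stores "m->g0 = g0" and "g0->m = m0".
func bind(mp *m, gp *g) {
	mp.g0 = gp
	gp.m = mp
}

func main() {
	bind(&m0, &g0)
	fmt.Println(m0.g0 == &g0, g0.m == &m0)
}
```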

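3.6 Initializing the scheduler and starting the main function

Although the main thread started long ago and we have executed a large number of assembly instructions and function calls, everything remains static until mstart is called on m0 to start scheduling. The code below mainly does the following:

  • copies the arguments
  • runtime.args: initializes the arguments; located in runtime/runtime1.go
  • runtime.osinit: performs OS-specific initialization; located in the runtime files whose names start with os; it mainly obtains the number of CPUs and the huge page size
  • runtime.schedinit: initializes the scheduler; located in runtime/proc.go
  • creates a goroutine that will run the main function and puts it on the run queue of the P bound to m0; it will be the first goroutine to run
  • runtime.mstart: starts m0 and begins scheduling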

    	MOVL	16(SP), AX		// copy argc
    	MOVL	AX, 0(SP)
    	MOVQ	24(SP), AX		// copy argv
    	MOVQ	AX, 8(SP)
    	CALL	runtime·args(SB)
    	CALL	runtime·osinit(SB)
    	CALL	runtime·schedinit(SB)
    
    	// create a new goroutine to start program
    	MOVQ	$runtime·mainPC(SB), AX		// entry
    	PUSHQ	AX
    	PUSHQ	$0			// arg size
    	CALL	runtime·newproc(SB)
    	POPQ	AX
    	POPQ	AX
    
    	// start this M
    	CALL	runtime·mstart(SB)
    
    	CALL	runtime·abort(SB)	// mstart should never return
    	RET
    
    	// Prevent dead-code elimination of debugCallV1, which is
    	// intended to be called by debuggers.
    	MOVQ	$runtime·debugCallV1(SB), AX
    	RET

The mainPC in the code above is simply the address of runtime's main function:

DATA    runtime·mainPC+0(SB)/8,$runtime·main(SB)
GLOBL   runtime·mainPC(SB),RODATA,$8

runtime.main is located in runtime/proc.go.
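One way to see this chain from user code is to print the call stack from inside main. The sketch below uses the standard runtime.Callers and runtime.CallersFrames functions; callChain is a hypothetical helper for this example. On the Go versions I have looked at, the frames below main.main are runtime.main and runtime.goexit, but the exact list should be treated as an observation rather than a guarantee.

```go
package main

import (
	"fmt"
	"runtime"
)

// callChain returns the function names on the current goroutine's
// stack, innermost first, starting with callChain itself.
func callChain() []string {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(1, pcs) // skip runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	for _, name := range callChain() {
		fmt.Println(name)
	}
}
```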

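3.7 Scheduler initialization

The scheduler initialization function needs to be taken apart here. As always, every function call in it could fill a section of its own, but in summary:

  • get g0; if the race detector is enabled, call raceinit
  • sched.maxmcount: limits the scheduler to at most 10,000 threads
  • tracebackinit: located in runtime/traceback.go; initializes traceback, which prints the call stack on abnormal exit
  • moduledataverify: located in runtime/symtab.go; verifies the module data, reading the Go-internal data stored in the sections
  • stackinit: located in runtime/stack.go; initializes the goroutine stack pools; the two global variables stackpool and stackLarge cache small and large stacks respectively, taken from the current process's heap memory
  • mallocinit: located in runtime/malloc.go; initializes the memory allocator. Go's allocator is implemented along the lines of tcmalloc: it predefines a set of small block sizes, requests above the largest size are served from a cache, and if no suitable cache exists, memory is taken from the current process's heap; the garbage collector is responsible for freeing this memory
  • mcommoninit: located in runtime/proc.go; as the name suggests, performs the common initialization of an m, except that here the m being initialized is m0
  • cpuinit: located in runtime/proc.go; detects the CPU's instruction set features
  • alginit: located in runtime/alg.go; initializes the AES and hash components according to the supported CPU instructions; because map is a hash table, maps can only be used after this call
  • modulesinit: located in runtime/symtab.go; initializes all loaded modules
  • typelinksinit: located in runtime/type.go; scans all loaded modules and builds a type map of the module data, used to deduplicate type pointers
  • itabsinit: located in runtime/iface.go; builds itabTable, initializing the data that associates concrete types with interface types
  • msigsave: located in runtime/signal_unix.go on Unix-like systems (other platforms define it in their os-prefixed files); saves the current thread's signal mask into m
  • goargs: located in runtime/runtime1.go; stores the command-line arguments in Go form
  • goenvs: located in the runtime files whose names start with os; stores the environment variables in Go form
  • parsedebugvars: located in runtime/runtime1.go; processes the settings carried by the GODEBUG environment variable
  • gcinit: located in runtime/mgc.go; initializes the garbage collector
  • GOMAXPROCS: tries to read the GOMAXPROCS environment variable to override the default, which is the ncpu value already obtained; it limits the number of Ps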

    // The bootstrap sequence is:
    //
    //	call osinit
    //	call schedinit
    //	make & queue new G
    //	call runtime·mstart
    //
    // The new G calls runtime·main.
    func schedinit() {
    	// raceinit must be the first call to race detector.
    	// In particular, it must be done before mallocinit below calls racemapshadow.
    	_g_ := getg()
    	if raceenabled {
    		_g_.racectx, raceprocctx0 = raceinit()
    	}
    
    	sched.maxmcount = 10000
    
    	tracebackinit()
    	moduledataverify()
    	stackinit()
    	mallocinit()
    	mcommoninit(_g_.m)
    	cpuinit()       // must run before alginit
    	alginit()       // maps must not be used before this call
    	modulesinit()   // provides activeModules
    	typelinksinit() // uses maps, activeModules
    	itabsinit()     // uses activeModules
    
    	msigsave(_g_.m)
    	initSigmask = _g_.m.sigmask
    
    	goargs()
    	goenvs()
    	parsedebugvars()
    	gcinit()
    
    	sched.lastpoll = uint64(nanotime())
    	procs := ncpu
    	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
    		procs = n
    	}
    	if procresize(procs) != nil {
    		throw("unknown runnable goroutine during bootstrap")
    	}
    
    	// For cgocheck > 1, we turn on the write barrier at all times
    	// and check all pointer writes. We can't do this until after
    	// procresize because the write barrier needs a P.
    	if debug.cgocheck > 1 {
    		writeBarrier.cgo = true
    		writeBarrier.enabled = true
    		for _, p := range allp {
    			p.wbBuf.reset()
    		}
    	}
    
    	if buildVersion == "" {
    		// Condition should never trigger. This code just serves
    		// to ensure runtime·buildVersion is kept in the resulting binary.
    		buildVersion = "unknown"
    	}
    	if len(modinfo) == 1 {
    		// Condition should never trigger. This code just serves
    		// to ensure runtime·modinfo is kept in the resulting binary.
    		modinfo = ""
    	}
    }
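
The GOMAXPROCS step near the end of schedinit can be mimicked from user code. The following is a minimal sketch, not the runtime's implementation: resolveProcs is a hypothetical helper, and strconv.Atoi stands in for the runtime's internal atoi32, so edge cases in number parsing may differ. It only illustrates the rule "start from the CPU count, let a positive integer in the environment override it".

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// resolveProcs mirrors the selection logic shown in schedinit: start from
// the detected CPU count and let a positive integer from the GOMAXPROCS
// environment variable override it. Hypothetical helper for illustration.
func resolveProcs(env string, ncpu int) int {
	procs := ncpu
	if n, err := strconv.Atoi(env); err == nil && n > 0 {
		procs = n
	}
	return procs
}

func main() {
	// Value derived by the sketch from the environment and the CPU count.
	fmt.Println("sketch: ", resolveProcs(os.Getenv("GOMAXPROCS"), runtime.NumCPU()))
	// Value the runtime actually uses; an argument of 0 queries it
	// without changing the setting.
	fmt.Println("runtime:", runtime.GOMAXPROCS(0))
}
```

Running the program with and without GOMAXPROCS set (for example `GOMAXPROCS=2 ./hello`) shows the override taking effect. For the Go version listed above the two lines are expected to agree; other releases may apply additional rules.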

3.8 The runtime main function

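main is not the program's direct entry point. It is better understood as a function call established by convention, which is linked to a designated location at build time. In Go, the main function we write is linked to the declaration func main_main() in runtime/proc.go through the //go:linkname main_main main.main directive. By the time the runtime's main function executes, it has already left the system stack: it runs in an ordinary goroutine (the main goroutine) that was queued for execution on m0. It mainly performs the following operations: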

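  • Clears racectx on m0's g0, which serves only as the parent of the main goroutine
  • Limits the maximum stack size according to pointer size (not the CPU): 1 GB on 64-bit, 250 MB on 32-bit, counted in decimal units
  • Sets the mainStarted flag, allowing newproc to start new m
  • Starts an m that is not bound to any p to run sysmon (skipped on wasm); sysmon periodically forces garbage collection and polls the network (netpoll)
  • Locks the main goroutine to the main thread
  • Runs the init tasks of the runtime package
  • Enables the garbage collector (gcenable)
  • When cgo is in use, checks that the required cgo hooks are present so that cgo initialization can succeed
  • When cgo is in use, starts the template thread and notifies cgo that runtime initialization is done
  • Runs the init functions of the user's main package and of every package it imports
  • Unlocks the main thread
  • Calls the user's main function and waits for it to return (skipped for c-archive and c-shared builds)
  • Shuts down the race detector (racefini) if it is enabled
  • Checks for panics in progress on other goroutines, gives them time to finish printing their trace, then exits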
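
The stack limit from the second item can be observed from user code. The sketch below relies on runtime/debug.SetMaxStack, which returns the previous limit when a new one is set; defaultMaxStack and currentMaxStack are hypothetical helpers introduced only for this illustration, and the program assumes nothing else has changed the limit beforehand.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"unsafe"
)

// defaultMaxStack reproduces the pointer-size rule described above:
// decimal 1 GB when pointers are 8 bytes wide, decimal 250 MB otherwise.
// Hypothetical helper, not a runtime function.
func defaultMaxStack() int {
	if unsafe.Sizeof(uintptr(0)) == 8 {
		return 1000000000
	}
	return 250000000
}

// currentMaxStack reads the limit currently in effect. SetMaxStack returns
// the previous setting, so the value is set once and restored right away.
func currentMaxStack() int {
	prev := debug.SetMaxStack(defaultMaxStack())
	debug.SetMaxStack(prev)
	return prev
}

func main() {
	fmt.Println("expected default:", defaultMaxStack())
	fmt.Println("runtime limit:   ", currentMaxStack())
}
```

Both lines should print the same number in a program that never calls SetMaxStack itself, which is also the figure that appears in the runtime's stack overflow message.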

    //go:linkname runtime_inittask runtime..inittask
    var runtime_inittask initTask
    
    //go:linkname main_inittask main..inittask
    var main_inittask initTask
    
    // main_init_done is a signal used by cgocallbackg that initialization
    // has been completed. It is made before _cgo_notify_runtime_init_done,
    // so all cgo calls can rely on it existing. When main_init is complete,
    // it is closed, meaning cgocallbackg can reliably receive from it.
    var main_init_done chan bool
    
    //go:linkname main_main main.main
    func main_main()
    
    // mainStarted indicates that the main M has started.
    var mainStarted bool
    
    // runtimeInitTime is the nanotime() at which the runtime started.
    var runtimeInitTime int64
    
    // Value to use for signal mask for newly created M's.
    var initSigmask sigset
    
    // The main goroutine.
    func main() {
    	g := getg()
    
    	// Racectx of m0->g0 is used only as the parent of the main goroutine.
    	// It must not be used for anything else.
    	g.m.g0.racectx = 0
    
    	// Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
    	// Using decimal instead of binary GB and MB because
    	// they look nicer in the stack overflow failure message.
    	if sys.PtrSize == 8 {
    		maxstacksize = 1000000000
    	} else {
    		maxstacksize = 250000000
    	}
    
    	// Allow newproc to start new Ms.
    	mainStarted = true
    
    	if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
    		systemstack(func() {
    			newm(sysmon, nil)
    		})
    	}
    
    	// Lock the main goroutine onto this, the main OS thread,
    	// during initialization. Most programs won't care, but a few
    	// do require certain calls to be made by the main thread.
    	// Those can arrange for main.main to run in the main thread
    	// by calling runtime.LockOSThread during initialization
    	// to preserve the lock.
    	lockOSThread()
    
    	if g.m != &m0 {
    		throw("runtime.main not on m0")
    	}
    
    	doInit(&runtime_inittask) // must be before defer
    	if nanotime() == 0 {
    		throw("nanotime returning zero")
    	}
    
    	// Defer unlock so that runtime.Goexit during init does the unlock too.
    	needUnlock := true
    	defer func() {
    		if needUnlock {
    			unlockOSThread()
    		}
    	}()
    
    	// Record when the world started.
    	runtimeInitTime = nanotime()
    
    	gcenable()
    
    	main_init_done = make(chan bool)
    	if iscgo {
    		if _cgo_thread_start == nil {
    			throw("_cgo_thread_start missing")
    		}
    		if GOOS != "windows" {
    			if _cgo_setenv == nil {
    				throw("_cgo_setenv missing")
    			}
    			if _cgo_unsetenv == nil {
    				throw("_cgo_unsetenv missing")
    			}
    		}
    		if _cgo_notify_runtime_init_done == nil {
    			throw("_cgo_notify_runtime_init_done missing")
    		}
    		// Start the template thread in case we enter Go from
    		// a C-created thread and need to create a new thread.
    		startTemplateThread()
    		cgocall(_cgo_notify_runtime_init_done, nil)
    	}
    
    	doInit(&main_inittask)
    
    	close(main_init_done)
    
    	needUnlock = false
    	unlockOSThread()
    
    	if isarchive || islibrary {
    		// A program compiled with -buildmode=c-archive or c-shared
    		// has a main, but it is not executed.
    		return
    	}
    	fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
    	fn()
    	if raceenabled {
    		racefini()
    	}
    
    	// Make racy client program work: if panicking on
    	// another goroutine at the same time as main returns,
    	// let the other goroutine finish printing the panic trace.
    	// Once it does, it will exit. See issues 3934 and 20018.
    	if atomic.Load(&runningPanicDefers) != 0 {
    		// Running deferred functions should not take long.
    		for c := 0; c < 1000; c++ {
    			if atomic.Load(&runningPanicDefers) == 0 {
    				break
    			}
    			Gosched()
    		}
    	}
    	if atomic.Load(&panicking) != 0 {
    		gopark(nil, nil, waitReasonPanicWait, traceEvGoStop, 1)
    	}
    
    	exit(0)
    	for {
    		var x *int32
    		*x = 0
    	}
    }
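
The ordering enforced by runtime.main, where doInit(&main_inittask) completes before main_main is called, is visible from an ordinary program. The following is a small sketch under that reading of the listing; events and record are hypothetical names used only here.

```go
package main

import "fmt"

// events records the order in which the parts of this program run.
var events []string

// record appends a label to events and returns it, so that it can also be
// used as a package-level variable initializer.
func record(label string) string {
	events = append(events, label)
	return label
}

// Package-level variables are initialized before any init function runs.
var _ = record("package variable")

// init runs as part of the main package's init task, before main.main.
func init() {
	record("init")
}

func main() {
	record("main.main")
	fmt.Printf("%q\n", events)
}
```

The printed slice lists the variable initializer first, then init, then main.main. Everything before the last entry happens inside doInit(&main_inittask), while the main goroutine is still locked to the main thread.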