php-profiler is a PHP profiling tool. Like debugging tools such as gdb and strace, it can show which line of code a running PHP script is currently executing.
Unlike profilers such as xdebug, it is non-intrusive. It needs no PHP extension, and it has essentially no impact on PHP's performance. (xdebug, by contrast, does slow PHP down.)
Studying how it works also shows how debugging tools like these are implemented.
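On Linux, even when the program is a compiled binary, we can still recover a good deal of its internal structure (loosely, its "source"). The starting point is the memory map printed by cat /proc/{pid}/maps.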
The maps output looks roughly like this:
    ➜ cat /proc/14/maps | head -n 20
    561609fa9000-56160a0b5000 r--p 00000000 fe:01 797066 /usr/local/bin/php
    56160a1a9000-56160a53c000 r-xp 00200000 fe:01 797066 /usr/local/bin/php
    ...
    7f49c4686000-7f49c4687000 r--p 00018000 fe:01 1580983 /usr/lib/x86_64-linux-gnu/libzip.so.4.0
    7f49c4687000-7f49c4688000 rw-p 00019000 fe:01 1580983 /usr/lib/x86_64-linux-gnu/libzip.so.4.0
    7f49c4688000-7f49c468d000 r--p 00000000 fe:01 1710251 /usr/local/lib/php/extensions/no-debug-non-zts-20190902/zip.so
What does the maps output tell us?
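1. The path of the php binary that is running.
2. Whether libpthread is loaded and, if so, its path.
3. The base address of the executable (used later), i.e. the smallest value in the start column of the maps file.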
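A minimal C sketch of pulling these three facts out of a maps file. It follows the article's rule that the base address is the smallest start value; treating the path of that lowest mapping as the executable, and detecting libpthread by a "libpthread" substring in the path, are assumptions of this sketch. The type and function names (map_entry, maps_summary, parse_maps_line, summarize_maps) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/{pid}/maps (hypothetical helper type). */
struct map_entry {
    unsigned long start, end, offset;
    char perms[5];
    char path[4096];            /* empty for anonymous mappings */
};

/* The three facts the article reads from the maps file. */
struct maps_summary {
    char exe_path[4096];        /* path of the lowest mapping (assumed to be the executable) */
    unsigned long base_address; /* smallest start address */
    int has_libpthread;
    char libpthread_path[4096];
};

/* Parse "start-end perms offset dev inode [pathname]"; returns 1 on success. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    int consumed = 0;
    if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
               &e->start, &e->end, e->perms, &e->offset, &consumed) != 4 ||
        consumed == 0)
        return 0;
    /* Everything after the inode column is the pathname (it may contain spaces). */
    snprintf(e->path, sizeof e->path, "%s", line + consumed);
    e->path[strcspn(e->path, "\n")] = '\0';
    return 1;
}

/* Scan a whole maps file; returns 1 if at least one line was parsed. */
static int summarize_maps(FILE *f, struct maps_summary *s)
{
    assert(f != NULL && s != NULL);
    memset(s, 0, sizeof *s);
    char line[8192];
    struct map_entry e;
    int seen = 0;
    while (fgets(line, sizeof line, f)) {
        if (!parse_maps_line(line, &e))
            continue;
        if (!seen || e.start < s->base_address) {
            s->base_address = e.start;
            snprintf(s->exe_path, sizeof s->exe_path, "%s", e.path);
        }
        if (!s->has_libpthread && strstr(e.path, "libpthread")) {
            s->has_libpthread = 1;
            snprintf(s->libpthread_path, sizeof s->libpthread_path, "%s", e.path);
        }
        seen = 1;
    }
    return seen;
}
```

On a live system the input would come from fopen("/proc/<pid>/maps", "r"), where <pid> is the target process.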
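Knowing the path of the php binary, we can read its ELF information and look up the symbol of a given variable from the PHP source. The command readelf -a /path/to/bin/php shows the same information, for example: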
    ➜ readelf -a `which php` | grep executor_globals
      2279: 0000000001222d20 1664 OBJECT GLOBAL DEFAULT 26 executor_globals
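readelf finds this entry by walking the ELF section headers to the symbol tables (.symtab and .dynsym) and matching names in their string tables. A minimal C sketch of the same lookup using the structures from <elf.h>, assuming a 64-bit ELF file; it skips the bounds checks real code should perform on offsets read from the file, and elf_find_symbol is a hypothetical name. Run against the binary above, it should report the Value (0x1222d20) and Size (1664) columns.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find `name` in the .symtab/.dynsym of a 64-bit ELF file.
 * On success stores st_value (relative to the load base for PIE binaries)
 * and st_size, and returns 1; returns 0 if absent or on error.
 * Sketch only: offsets taken from the file are not bounds-checked. */
static int elf_find_symbol(const char *path, const char *name,
                           unsigned long *value, unsigned long *size)
{
    assert(path && name && value && size);
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    int found = 0;
    unsigned char *buf = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        len = ftell(f);
    rewind(f);
    if (len >= (long)sizeof(Elf64_Ehdr))
        buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) == (size_t)len) {
        const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
        if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
            eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shoff != 0) {
            const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
            for (int i = 0; i < eh->e_shnum && !found; i++) {
                if (sh[i].sh_type != SHT_SYMTAB && sh[i].sh_type != SHT_DYNSYM)
                    continue;
                /* sh_link of a symbol table is the index of its string table. */
                const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
                const char *strtab = (const char *)(buf + sh[sh[i].sh_link].sh_offset);
                size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
                for (size_t j = 0; j < count; j++) {
                    if (syms[j].st_shndx == SHN_UNDEF || syms[j].st_name == 0)
                        continue;
                    if (strcmp(strtab + syms[j].st_name, name) == 0) {
                        *value = syms[j].st_value;
                        *size = syms[j].st_size;
                        found = 1;
                        break;
                    }
                }
            }
        }
    }
    free(buf);
    fclose(f);
    return found;
}
```

For example, elf_find_symbol("/usr/local/bin/php", "executor_globals", &value, &size) would be the programmatic counterpart of the readelf | grep pipeline.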
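At this point we have the "relative address" of the executor_globals symbol from the PHP source. To use it we also need the base address: base address + relative address is the symbol's actual address in memory.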
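As a worked example with the two sample outputs above, 0x561609fa9000 + 0x1222d20 = 0x56160b1cbd20, assuming both samples were taken from the same php binary (the article does not say they were). The addition applies to position-independent executables (ELF type ET_DYN), whose symbol values are relative to where the binary is loaded; for a classic non-PIE executable (ET_EXEC), st_value is already an absolute address. A small C helper capturing that distinction; symbol_runtime_address is a hypothetical name, and it assumes the first loadable segment of the PIE binary starts at virtual address 0, which is the usual layout.

```c
#include <assert.h>
#include <elf.h>

/* Runtime address of a symbol in a process's main executable.
 * e_type:   the ELF header's e_type field
 * base:     lowest start address of the executable in /proc/{pid}/maps
 * st_value: the symbol's value from the symbol table */
static unsigned long symbol_runtime_address(unsigned e_type, unsigned long base,
                                            unsigned long st_value)
{
    assert(e_type == ET_DYN || e_type == ET_EXEC);
    /* PIE (ET_DYN): st_value is relative to where the binary was loaded. */
    if (e_type == ET_DYN)
        return base + st_value;
    /* Non-PIE (ET_EXEC): st_value is already an absolute virtual address. */
    return st_value;
}
```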
Getting the base address
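1. If the program loads libpthread, the base address has to be obtained in a specific way:
   1.1. Read the fs_base register of the target. Reading another process's registers requires ptrace.
   1.2. Then add the relative addresses of other libpthread symbols to obtain the base address. For details, see https://github.com/sj-i/php-profiler/blob/2f47fdfba183ed31507064aa4df2d95867a12375/src/Lib/Elf/Tls/LibThreadDbTlsFinder.php#L66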
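Step 1.1 as a minimal C sketch, assuming Linux on x86-64, where glibc's struct user_regs_struct (from <sys/user.h>) exposes fs_base. PTRACE_ATTACH stops the target thread until PTRACE_DETACH, and it requires ptrace permission over the target (the same user, subject to Yama's ptrace_scope, or CAP_SYS_PTRACE). read_fs_base is a hypothetical name, and step 1.2 (adding the libpthread offsets) is not shown; the linked php-profiler code covers that part.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Read the fs_base register of thread `tid` (x86-64 only).
 * Returns 0 on success, -1 on failure with errno set. */
static int read_fs_base(pid_t tid, unsigned long *fs_base)
{
    assert(fs_base != NULL);
#if defined(__x86_64__)
    /* Attaching sends SIGSTOP; wait until the thread has actually stopped. */
    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;
    if (waitpid(tid, NULL, __WALL) == -1) {
        int saved = errno;
        ptrace(PTRACE_DETACH, tid, NULL, NULL);
        errno = saved;
        return -1;
    }
    struct user_regs_struct regs;
    long rc = ptrace(PTRACE_GETREGS, tid, NULL, &regs);
    int saved = errno;
    ptrace(PTRACE_DETACH, tid, NULL, NULL);   /* let the target run again */
    errno = saved;
    if (rc == -1)
        return -1;
    *fs_base = (unsigned long)regs.fs_base;
    return 0;
#else
    (void)tid;
    errno = ENOSYS;   /* this sketch only knows the x86-64 register layout */
    return -1;
#endif
}
```

Because every call stops the target briefly, a profiler would want to read fs_base once and cache the result rather than attach on every sample.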
Once we have a variable's address, we need to read the value stored at that address. On Linux this is done with the process_vm_readv system call:
    #include <sys/uio.h>

    ssize_t process_vm_readv(pid_t pid,
                             const struct iovec *local_iov, unsigned long liovcnt,
                             const struct iovec *remote_iov, unsigned long riovcnt,
                             unsigned long flags);

    ssize_t process_vm_writev(pid_t pid,
                              const struct iovec *local_iov, unsigned long liovcnt,
                              const struct iovec *remote_iov, unsigned long riovcnt,
                              unsigned long flags);

    struct iovec {
        void  *iov_base;    /* Starting address */
        size_t iov_len;     /* Number of bytes to transfer */
    };
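A thin C wrapper over process_vm_readv that copies a block of bytes out of another process, retrying when only part of the range was transferred. The name read_remote is hypothetical. Unlike ptrace, process_vm_readv does not stop the target, but the caller still needs ptrace-level access permission to it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>   /* getpid(), handy for trying the wrapper on the current process */

/* Copy `len` bytes starting at `remote_addr` in process `pid` into `buf`.
 * Returns 0 when every byte was copied, -1 otherwise (errno set). */
static int read_remote(pid_t pid, unsigned long remote_addr, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t done = 0;
    while (done < len) {
        struct iovec local = { (char *)buf + done, len - done };
        struct iovec remote = { (void *)(remote_addr + done), len - done };
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n == -1)
            return -1;
        if (n == 0) {          /* no progress: treat as a bad address */
            errno = EFAULT;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

For example, read_remote(pid, addr, buf, 1664) would copy the whole executor_globals object (size 1664 in the readelf output above) into a local buffer in one call.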
With executor_globals in hand, we can parse $zend_executor_globals->current_execute_data to get the information we want.
Note that $zend_executor_globals->current_execute_data is itself an address (a pointer), so we have to call process_vm_readv again to read the current_execute_data it points to.
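Reading a field through a pointer therefore takes two reads: first the pointer, then the memory it points to. A minimal C sketch of following such a chain with process_vm_readv. The offsets are parameters because the position of current_execute_data inside zend_executor_globals depends on the PHP version and build, so any concrete value here would be an assumption. follow_pointers is a hypothetical name, and the sketch assumes the target has the same pointer size as the profiler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>   /* getpid(), handy for trying the walk on the current process */

/* Starting from `addr` in process `pid`, repeatedly read the pointer stored
 * at addr + offsets[i] and continue from it; store the final pointer in *out.
 * Returns 0 on success, -1 if a read fails. A NULL pointer ends the walk
 * early with *out = 0 (for example, when no PHP code is executing). */
static int follow_pointers(pid_t pid, uintptr_t addr,
                           const size_t *offsets, size_t n, uintptr_t *out)
{
    assert(out != NULL && (offsets != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uintptr_t next = 0;
        struct iovec local = { &next, sizeof next };
        struct iovec remote = { (void *)(addr + offsets[i]), sizeof next };
        if (process_vm_readv(pid, &local, 1, &remote, 1, 0) != (ssize_t)sizeof next)
            return -1;
        if (next == 0) {
            *out = 0;
            return 0;
        }
        addr = next;
    }
    *out = addr;
    return 0;
}
```

With offsets = { <offset of current_execute_data> } and addr set to the runtime address of executor_globals, *out would be the address of the current zend_execute_data, which can then be read with another process_vm_readv call.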
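current_execute_data holds the function PHP is currently running. If we repeat this parsing every 10 ms, we know which function PHP was executing at each 10 ms sample, which gives a rough picture of how the PHP script executes.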
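The sampling loop itself can be sketched in C independently of how a single sample is taken. Here the sampler is a callback (in a real profiler it would perform the process_vm_readv walk described above), and hits are aggregated per function name into a flat profile. Sleeping until an absolute deadline (clock_nanosleep with TIMER_ABSTIME) is a design choice of this sketch, so the time spent taking a sample does not stretch the 10 ms period. All names, including the replay_sampler stand-in for trying the loop without a live PHP process, are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_FUNCS 64
#define NAME_LEN  128

/* Flat profile: how many samples landed in each function. */
struct profile {
    char names[MAX_FUNCS][NAME_LEN];
    unsigned long hits[MAX_FUNCS];
    size_t count;
    unsigned long dropped;      /* failed samples or table overflow */
};

/* Writes the name of the currently running function into `name`; 0 on success. */
typedef int (*sample_fn)(void *ctx, char *name, size_t cap);

static void profile_add(struct profile *p, const char *name)
{
    for (size_t i = 0; i < p->count; i++) {
        if (strcmp(p->names[i], name) == 0) {
            p->hits[i]++;
            return;
        }
    }
    if (p->count == MAX_FUNCS) {
        p->dropped++;
        return;
    }
    strncpy(p->names[p->count], name, NAME_LEN - 1);
    p->names[p->count][NAME_LEN - 1] = '\0';
    p->hits[p->count++] = 1;
}

/* Take `samples` samples, one every `interval_ns` (10000000 for 10 ms). */
static void run_sampler(struct profile *p, sample_fn sample, void *ctx,
                        int samples, long interval_ns)
{
    assert(p && sample && interval_ns > 0 && interval_ns < 1000000000L);
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < samples; i++) {
        char name[NAME_LEN];
        if (sample(ctx, name, sizeof name) == 0)
            profile_add(p, name);
        else
            p->dropped++;
        /* Advance an absolute deadline so the period does not drift. */
        next.tv_nsec += interval_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec++;
            next.tv_nsec -= 1000000000L;
        }
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
    }
}

/* Stand-in sampler: replays a fixed list of names in a cycle. */
struct replay {
    const char **names;
    size_t n, pos;
};

static int replay_sampler(void *ctx, char *name, size_t cap)
{
    struct replay *r = ctx;
    if (r->n == 0 || cap == 0)
        return -1;
    strncpy(name, r->names[r->pos++ % r->n], cap - 1);
    name[cap - 1] = '\0';
    return 0;
}
```

Sorting the resulting hits in descending order gives the usual "where is the time going" view of a sampling profiler.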
For reference, the format of /proc/{pid}/maps (from https://stackoverflow.com/questions/1401359/understanding-linux-proc-id-maps):
address perms offset dev inode pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
1. address - This is the starting and ending address of the region in the process's address space
2. permissions - This describes how pages in the region can be accessed. There are four different permissions: read, write, execute, and shared. If read/write/execute are disabled, a - will appear instead of the r/w/x. If a region is not shared, it is private, so a p will appear instead of an s. If the process attempts to access memory in a way that is not permitted, a segmentation fault is generated. Permissions can be changed using the mprotect system call.
3. offset - If the region was mapped from a file (using mmap), this is the offset in the file where the mapping begins. If the memory was not mapped from a file, it's just 0.
4. device - If the region was mapped from a file, this is the major and minor device number (in hex) where the file lives.
5. inode - If the region was mapped from a file, this is the file number.
6. pathname - If the region was mapped from a file, this is the name of the file. This field is blank for anonymous mapped regions. There are also special regions with names like [heap], [stack], or [vdso]. [vdso] stands for virtual dynamic shared object. It's used by system calls to switch to kernel mode. Here's a good article about it: "What is linux-gate.so.1?"