XV6(1) Introduction and Examples

2023-08-09 14:53:19 # MIT # 6.S081

Chapter 1 Introduction and Examples

操作系统应该提供的功能

抽象硬件 Abstract Hardware
- 硬件是底层级的资源，一些应用程序实现了高层级的接口和抽象，例如进程、文件系统。这些高层级的接口和抽象（Abstraction）方便了应用的开发，也提供了更好的移植性。
多进程支持 Multiplex
进程间隔离 Isolation
数据的共享 Sharing
权限系统 Security
帮助应用程序获得高性能 Performance
支持大量不同应用程序

操作系统结构

user application: VI、CC、Shell等，这里程序都运行在同一个空间中，这个空间通常会被称为用户空间（User space）。
kernel: 为运行的程序提供服务的一种特殊程序。只能有一个
- 文件系统
- 进程管理系统：每个运行着的程序叫做进程，它们有自己的内存(存储指令、数据和堆栈)和共享的CPU时间。Kernel会管理内存的分配。
- Access Control (security)
h/w: CPU, RAM, disk, net, &c

什么是 application / kernel interface?

每当进程需要调用内核服务时，它会触发一个system call(系统调用)，system call进入内核执行相应的服务然后返回。因此，进程在user space和kernel space中交替执行

shell: 一个普通的程序，其功能是读取用户输入的命令并执行它们，shell不是内核的一部分

1.1 Processes and memory

每个进程拥有自己的用户空间内存以及内核空间状态，当进程不再执行时xv6将存储和这些进程相关的CPU寄存器直到下一次运行这些进程。kernel将每一个进程用一个PID(process identifier)指代。

常用syscall

fork: 形式: int fork()。
- 其作用是让一个进程生成另外一个和这个进程的内存内容相同的子进程。在父进程中，fork的返回值是这个子进程的PID，在子进程中，返回值是0
exit: 形式: int exit(int status)。
- 让调用它的进程停止执行并且将内存等占用的资源全部释放。需要一个整数形式的状态参数，0代表以正常状态退出，1代表以非正常状态退出
wait: 形式: int wait(int *status)。
- 返回当前进程的已退出（或已终止）子进程的PID，子进程的退出状态存储到status这个地址中。如果没有已退出的子进程则等待。如果调用者没有子进程，wait立即返回-1。如果不关心子进程的退出状态，可以传递0地址
- 如果有任何子进程退出，wait就会直接返回。因此如果有多个子进程，需要调用多个wait

int pid = fork();
if (pid > 0) {
    printf("parent: child=%d\n", pid);
    pid = wait((int *) 0);
    printf("child %d is done\n", pid);
} else if (pid == 0) {
    printf("child: exiting\n");
    exit(0);
} else {
    printf("fork error\n");
}

前两行输出可能是

1 2	`parent: child=1234 child: exiting`

也可能是

1 2	`child: exiting parent: child=1234`

这是因为在fork了之后，父进程和子进程将同时开始判断PID的值，在父进程中，PID为1234，而在子进程中，PID为0。输出取决于哪个进程先调用了printf

子进程在判断完pid == 0之后将exit，父进程发现子进程exit之后，wait执行完毕，打印输出

1	`parent: child 1234 is done`

尽管fork了之后子进程和父进程有相同的内存内容，但是内存地址和寄存器是不一样的，也就是说在一个进程中改变变量并不会影响另一个进程。

exec: 形式: int exec(char *file, char *argv[])。
- 加载一个文件，获取执行它的参数，执行。将调用进程的内存替换为从文件加载的内存映像。如果执行错误返回-1，执行成功则不会返回(内存已经被替换，没有地方返回)，而是开始从文件入口位置开始执行命令。文件必须是ELF格式。
- 如果直接调用exec，这里会用echo指令来替代Shell进程，当echo退出了，一切就结束了。因此常见方法是先执行fork，子进程再执行exec
- 参数
  - 第一个参数是包含可执行文件的文件名
  - 第二个参数是字符串参数数组

char *argv[3];

argv[0] = "echo";
argv[1] = "hello";
argv[2] = 0; //标记数组的结尾
exec("/bin/echo", argv);
printf("exec error\n");

大多数程序忽略参数数组的第一个元素，通常是该程序的名称

执行流程

xv6 shell使用以上四个system call来为用户执行程序。在shell进程的main中主循环先通过getcmd来从用户获取命令，然后调用fork来运行一个和当前shell进程完全相同的子进程。父进程调用wait等待子进程exec执行完（在runcmd中调用exec）

/* sh.c */
int
main(void)
{
  static char buf[100];
  int fd;

  // Ensure that three file descriptors are open.
  while((fd = open("console", O_RDWR)) >= 0){
    if(fd >= 3){
      close(fd);
      break;
    }
  }

  // Read and run input commands.
  while(getcmd(buf, sizeof(buf)) >= 0){
    if(buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' '){
      // Chdir must be called by the parent, not the child.
      buf[strlen(buf)-1] = 0;  // chop \n
      if(chdir(buf+3) < 0)
        fprintf(2, "cannot cd %s\n", buf+3);
      continue;
    }
    if(fork1() == 0)
      runcmd(parsecmd(buf));
    wait(0);
  }
  exit(0);
}

Xv6隐式分配大部分用户空间内存：fork分配父内存的子副本所需的内存，exec分配足够的内存来保存可执行文件。在运行时需要更多内存的进程（可能是malloc）可以调用sbrk(n)将其数据内存增加 n 个字节；sbrk返回新内存的位置。

System call	Description
int fork()	创建进程，返回子进程PID
int exit(int status)	终止当前进程；status报告给wait()，没有返回值
int wait(int *status)	等待子进程结束；结束状态存放在status，返回子进程PID
int kill(int pid)	终止进程号为pid的进程，返回0，出现错误返回-1
int getpid()	获取当前进程PID
int sleep(int n)	暂停n个时钟滴答
int exec(char file, char argv[])	加载文件并按参数执行，只在出现错误才返回
char *sbrk(int n)	进程内存增长n字节，返回新内存的起始位置
int open(char *file, int flags)	打开文件，flags指示读/写；返回fd
int write(int fd, char *buf, int n)	从buf写n个字节到fd，返回n
int read(int fd, char *buf, int n)	从fd读n个字节到buf，返回读到的数量，文件结尾返回0
int close(int fd)	关闭打开的fd
int dup(int fd)	返回一个新的fd，和参数fd引用相同对象
int pipe(int p[])	创建管道，把读、写fd放到p[0],p[1]
int chdir(char *dir)	改变当前目录
int mkdir(char *dir)	创建新目录
int mknod(char *file, int, int)	创建设备文件
int fstat(int fd, struct stat *st)	将打开文件的信息存放到st
int stat(char file, struct stat st)	根据文件名将文件信息存放到st
int link(char file1, char file2)	给 f1 创建一个新名字(f2)
int unlink(char *file)	删除文件

1.2 I/O and File descriptors

copy.c

#include "kernel/types.h"
#include "user/user.h"

int main() {
    char buf[64];
    
    while(1) {
        int n = read(0, buf, sizeof(buf));
        if(n <= 0)
            break;
        write(1, buf, n);
    }
    
    exit(0);
}

file descriptor：文件描述符，用来表示一个被内核管理的、可以被进程读/写的对象的一个整数，通常将对象称之为文件。表现形式类似于字节流，通过打开文件、目录、设备、创建管道等方式获得。一个文件被打开得越早，文件描述符就越小。

每个进程都拥有自己独立的文件描述符列表，其中0是标准输入，1是标准输出，2是标准错误。shell将保证总是有3个文件描述符是可用的
1
2
3
4
5
6
while((fd = open("console", O_RDWR)) >= 0){ if(fd >= 3){ close(fd); break; } }

read和write：形式：int read(int fd, char *buf, int n)和int write(int fd, char *buf, int n)

read从文件描述符 fd 读 n 字节的内容写入buf，返回值是成功读取的字节数。
write向文件描述符 fd 写 n 字节buf的内容，返回值是成功写入的字节数。
每个文件描述符有一个offset，read会从这个offset开始读取内容，读完n个字节之后将这个offset后移n个字节，下一个read将从新的offset开始读取字节。write也有类似的offset
参数
- 第一个参数是文件描述符
- 第二个参数是指向某段内存的指针，程序可以通过指针对应的地址读取内存中的数据
- 第三个参数是代码想读取/写入的最大长度
read如果到达了文件的结尾没有更多的内容了，read会返回0。如果出现了一些错误，比如文件描述符不存在，read会返回-1。

/* essence of cat program */
char buf[512];
int n;
  
for (;;) {
    n = read(0, buf, sizeof buf);
    if (n == 0)
        break;
    if (n < 0){
        fprintf(2, "read errot\n");
        exit(1);
    }
    if (write(1, buf, n) != n){
        fprintf(2, "write error\n");
        exit(1);
    }
}

cat不知道它是从文件、控制台还是管道中读取。类似地，cat也不知道它是否正在打印到控制台、文件或其他任何东西。通过使用输入文件描述符0 以及输出文件描述符1 可以简单地实现cat。

close。形式：int close(int fd)。

将打开的文件fd释放，使该文件描述符可以被后面的open、pipe、dup等system call使用。新分配的文件描述符始终是当前进程中编号最低的未使用描述符

使用close来修改file descriptor table能够实现I/O重定向

/* implementation of I/O redirection,
 * more specifically, cat < input.txt
 */
char *argv[2];

argv[0] = "cat";
argv[1] = 0;  
if (fork() == 0) {
    // in the child process
    close(0);  // this step is to release the stdin file descriptor
    open("input.txt", O_RDONLY); // the newly allocated fd for input.txt is 0, since the previous fd 0 is released
    exec("cat", argv); // execute the cat program, by default takes in the fd 0 as input, which is input.txt
}

父进程的fd table将不会被子进程fd table的变化影响，但是文件中的offset将被共享。

if(fork() == 0) {
    write(1, "hello ", 6);
    exit(0);
} else {
    wait(0);
    write(1, "world\n", 6);
}
/* hello world */

open: 形式: int open(char *file, int flags)。

返回fd(当前进程中编号最低的未使用描述符)
参数
- 第一个参数是要打开的文件名
- 第二个参数由一组标志组成，用位表示，控制open的功能，可能的值有O_RDONLY, O_WRONLY, O_RDWR, O_CREATE, O_TRUNC。指示open打开文件进行读取；写入；读写；如果文件不存在，则创建文件；将文件截断为零长度。

// open.c: create a file, write to it.

#include "kernel/types.h"
#include "user/user.h"
#include "kernel/fcntl.h"

int main() {
  int fd = open("output.txt", O_WRONLY | O_CREATE);
  write(fd, "ooo\n", 4);

  exit(0);
}

dup: 形式：int dup(int fd)。

复制现有的fd，返回一个引用相同底层I/O对象的新fd。两个fd共享一个offset
1
2
3
4
fd = dup(1); write(1, "hello ", 6); write(fd, "world\n", 6); // outputs hello world
除了dup和fork之外，其他方式不能使两个I/O对象共享同一个offset，比如同时open相同的文件

xv6本身是不支持标准错误文件描述符重定向的，我们可以通过dup来实现
1
ls existing-file non-existing-file > tmp1 2>&1
2>&1告诉shell fd2 是 fd1 的复制，这样存在文件的文件名和不存在文件的错误信息都会出现在文件tmp1中。

1.3 Pipes

pipe: 管道是一个小的内核缓冲区，它以文件描述符对的形式提供给进程，一个用于写操作，一个用于读操作。从管道的一端写的数据可以从管道的另一端读取。管道提供了一种进程间交互的方式。

pipe: 形式为int pipe(int p[])，p[0]为读取的文件描述符，p[1]为写入的文件描述符

/* 示例代码运行了程序 wc，它的标准输入绑定到了一个管道的读端口。 */
int p[2];
char *argv[2];
  
argv[0] = "wc";
argv[1] = 0;
  
pipe(p); // read fd put into p[0], write fd put into p[1]
if (fork() == 0) {
    close(0);
    dup(p[0]); // 子进程将管道的读端口拷贝在描述符0(标准输入)上
    close(p[0]); // original read end of pipe is closed
    close(p[1]); // fd p[1] is closed in child process, but not closed in the parent process. 注意这里关闭p[1]非常重要，因为如果不关闭p[1]，管道的读取端会一直等待读取，wc就永远也无法等到EOF
    exec("/bin/wc", argv); // by default wc will take fd 0 as the input, which is the read end of pipe in this case
} else {
    close(p[0]); // close the read end of pipe in parent process will not affect child process
    write(p[1], "hello world\n", 12); 
    close(p[1]); // write end of pipe closed, the pipe shuts down
}

如果数据没有准备好，那么对管道执行的read会一直等待，直到有数据了或者其他绑定在这个管道写端口的描述符都已经关闭了。在后一种情况中，read 会返回 0，就像是一份文件读到了最后。
xv6中的实现和上述的类似

case PIPE:
pcmd = (struct pipecmd*)cmd;
if(pipe(p) < 0)
    panic("pipe");
if(fork1() == 0){
    // in child process
    close(1); // close stdout
    dup(p[1]); // make the fd 1 as the write end of pipe
    close(p[0]);
    close(p[1]);
    runcmd(pcmd->left); // run command in the left side of pipe |, output redirected to the write end of pipe
}
if(fork1() == 0){
    // in child process
    close(0); // close stdin
    dup(p[0]); // make the fd 0 as the read end of pipe
    close(p[0]);
    close(p[1]);
    runcmd(pcmd->right); //  run command in the right side of pipe |, input redirected to the read end of pipe
}
close(p[0]);
close(p[1]);
wait(0); // wait for child process to finish
wait(0); // wait for child process to finish
break;

管道看上去也可以用临时文件来替代，但有四个不同点
1
2
echo hello world | wc echo hello world > /tmp/xyz; wc < /tmp/xyz
- 管道会进行自我清扫，如果是 shell 重定向的话，我们必须要在任务完成后删除 /tmp/xyz
- 管道可以传输任意长度的数据
- 管道允许同步：两个进程可以使用一对管道来进行二者之间的信息传递，每一个读操作都阻塞调用进程，直到另一个进程用 write 完成数据的发送。
- 实现进程间通信，管道效率更高

1.4 File system

xv6文件系统包含了文件(byte arrays)和目录(对其他文件和目录的引用)。目录生成了一个树，树从根目录/开始。对于不以/开头的路径，认为是是相对路径

mkdir("/dir");
fd = open("/dir/file", O_CREATE|O_WRONGLY);
close(fd);
mknod("/console", 1, 1);

mknod：形式：int mknod(char *file, int, int)
- 创建设备文件，一个设备文件有一个major device和一个minor device，用来唯一确定这个设备。当一个进程打开了这个设备文件时，内核会将read和write的system call重新定向到设备上。
一个文件的名称和文件本身是不一样的，文件本身，也叫inode。可以有多个名字，也叫link，每个link包括了一个文件名和一个对inode的引用。一个inode存储了文件的元数据，包括该文件的类型(file, directory or device)、大小、文件在硬盘中的存储位置以及指向这个inode的link的个数

fstat。形式：int fstat(int fd, struct stat *st)

将inode中的相关信息存储到st中。

#define T_DIR 1 // Directory
#define T_FILE 2 // File
#define T_DEVICE 3 // Device
struct stat {
    int dev; // File system’s disk device
    uint ino; // Inode number
    short type; // Type of file
    short nlink; // Number of links to file
    uint64 size; // Size of file in bytes
};

link。形式：int link(char *file1, char *file2)
- 创建一个指向同一个inode的文件名。unlink则是将一个文件名从文件系统中移除，只有当指向这个inode的文件名的数量为0时这个inode以及其存储的文件内容才会被从硬盘上移除
1
2
3
4
open("a", O_CREATE|O_WRONLY); link("a", "b"); unlink("a");
- 读写 a 就相当于读写 b。每一个 inode 都由一个唯一的 inode 号 直接确定。在上面这段代码中，我们可以通过 fstat 知道 a 和 b 都指向同样的内容：a 和 b 都会返回同样的 inode 号（ino），并且 nlink 数会设置为2。

注意：Unix提供了许多在用户层面的程序来执行文件系统相关的操作，比如mkdir、ln、rm等，而不是将其放在shell或kernel内，这样可以使用户比较方便地在这些程序上进行扩展。

但是cd是一个例外，它是在shell程序内构建的，因为它必须要改变这个calling shell本身指向的路径位置，如果是一个和shell平行的程序，那么它必须要调用一个子进程，在子进程里起一个新 shell 再进行cd，cd只会改变子进程的当前工作目录。父进程的工作目录保持原样。

2023-08-09 14:53:19 # MIT # 6.S081