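This issue's topic:
Advanced Programming in the UNIX Environment: File I/O
0. Introduction
On UNIX systems, most file I/O can be done with just five functions: open, read, write, lseek, and close. These functions are often called unbuffered I/O, in contrast to standard I/O. "Unbuffered" here means that every call to read or write invokes a system call in the kernel.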
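1. File descriptors
To the kernel, everything is a file, and every open file is referred to by a file descriptor.
A file descriptor is a non-negative integer. When a file is opened, the kernel returns a file descriptor to the process. For example, to read or write a file, we first call open to obtain a descriptor for that file, and then pass the descriptor as an argument to read or write.
By convention, UNIX shells associate file descriptor 0 with standard input, file descriptor 1 with standard output, and file descriptor 2 with standard error.
The descriptors for standard input, output, and error are defined in /usr/include/unistd.h: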
#define STDIN_FILENO 0
#define STDOUT_FILENO 1
#define STDERR_FILENO 2
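As a small illustration of these constants, the sketch below writes a string to whichever descriptor it is given. The helper name put_str is made up for this example and does not come from any library; it issues a single write and does not handle partial writes.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): write a C string to fd.
 * Returns the number of bytes written, or -1 on error. */
ssize_t put_str(int fd, const char *s)
{
    return write(fd, s, strlen(s));
}
```

Calling put_str(STDOUT_FILENO, "to stdout\n") and put_str(STDERR_FILENO, "to stderr\n") would send the two messages to standard output and standard error respectively, so they can be redirected separately from the shell.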
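2. Common API functions for I/O programming
1. The open function
Purpose: open opens or creates a file. On success it returns the lowest-numbered unused file descriptor; on failure it returns -1.
Prototype: int open(const char *pathname, int flags, ... /* mode_t mode */ );
arg1: pathname: the path name of the file to open or create
arg2: flags: the access mode, one of O_RDONLY (read-only), O_WRONLY (write-only), or O_RDWR (read-write), optionally ORed with other flags such as O_CREAT
arg3: mode: the permission bits for a newly created file; it is needed only when the call may create a file (for example, with O_CREAT)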
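The sketch below shows one way the return value and the arguments might be handled in practice. The wrapper name open_or_report is hypothetical; it only adds an error message on top of open and leaves errno as open set it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical wrapper: call open() and report failures on stderr.
 * Returns the new descriptor, or -1 with errno left as open() set it. */
int open_or_report(const char *pathname, int flags, mode_t mode)
{
    int fd = open(pathname, flags, mode);
    if (fd == -1) {
        int saved = errno;          /* fprintf may change errno */
        fprintf(stderr, "open %s: %s\n", pathname, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

For instance, open_or_report("./temp", O_RDONLY, 0) opens an existing file for reading, while open_or_report("./new_file", O_WRONLY | O_CREAT | O_TRUNC, 0644) creates or truncates a file for writing; the file names here are only placeholders.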
On Linux, man 2 open shows the manual page for the open function:
gary@ubuntu:~/workspaces/0.Server_Workspace$ man 2 open
OPEN(2) Linux Programmer's Manual OPEN(2)
NAME
open, openat, creat - open and possibly create a file
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
int openat(int dirfd, const char *pathname, int flags);
int openat(int dirfd, const char *pathname, int flags, mode_t mode);
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
openat():
Since glibc 2.10:
_XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L
Before glibc 2.10:
_ATFILE_SOURCE
I won't go through the whole page here; you can read the rest yourself.
2. The close function
Every open should be paired with a corresponding close.
CLOSE(2) Linux Programmer's Manual CLOSE(2)
NAME
close - close a file descriptor
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close() closes a file descriptor, so that it no longer refers to any file and may be reused. Any record locks (see fcntl(2)) held on the file it was associated
with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
If fd is the last file descriptor referring to the underlying open file description (see open(2)), the resources associated with the open file description are freed;
if the descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.
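To see that a descriptor really stops referring to a file after close, the following sketch closes the same descriptor twice and returns the error from the second call. The function name second_close_errno is made up for this illustration, and the sketch assumes a single-threaded process.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: open pathname, close it twice, and return the errno
 * from the second close() (0 if it unexpectedly succeeded, -1 if the
 * open or the first close failed). */
int second_close_errno(const char *pathname)
{
    int fd = open(pathname, O_RDONLY);
    if (fd == -1)
        return -1;
    if (close(fd) == -1)        /* first close: releases the descriptor */
        return -1;
    if (close(fd) == -1)        /* second close: fd is no longer open */
        return errno;
    return 0;
}
```

On a typical system the second close fails with EBADF. Real code should not close a descriptor twice: in a multithreaded program the number may already have been reused for a different file.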
3. The read function
Call read to read data from an open file.
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
On files that support seeking, the read operation commences at the current file offset, and the file offset is incremented by the number of bytes read. If the current file offset is at or past the end of file, no bytes are read, and read() returns zero.
If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0
returns zero and has no other effects.
If count is greater than SSIZE_MAX, the result is unspecified.
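The number of bytes actually read may be less than the number requested, for example in the following case:
The end of the file is reached before the requested number of bytes has been read.
- For example, if the file holds only 10 bytes and 100 bytes are requested, read returns just 10;
- the next call to read returns 0, because the offset is already at the end of the file.
Here is an example to make this concrete: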
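#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
    int fd;
    ssize_t size;   /* read() returns ssize_t, not int */
    char buf[100];
    fd = open("./temp", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    size = read(fd, buf, sizeof(buf));   /* first read: returns what the file holds */
    printf("size is %zd\n", size);
    size = read(fd, buf, sizeof(buf));   /* second read: already at end of file */
    printf("size is %zd\n", size);
    close(fd);
    return 0;
}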
gary@ubuntu:~/workspaces/0.Server_Workspace/2.Unix_Advanced_Program/1.Unix_IO/1.Unbufferd_IO$ od -c temp
0000000 h e l l o w o r l d ! \n
gary@ubuntu:~/workspaces/0.Server_Workspace/2.Unix_Advanced_Program/1.Unix_IO/1.Unbufferd_IO$ ./a.out
size is 13
size is 0
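temp is a regular file I created. Its content is hello world! followed by a newline, 13 bytes in total. The first read asks for 100 bytes, but the file holds only 13, so it returns 13. The second read returns 0 because the offset is already at the end of the file.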
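Because read may return fewer bytes than requested (at end of file as shown above, and also on pipes, terminals, and sockets), callers that need a fixed amount of data usually loop. The following is a minimal sketch of such a loop; the name read_full is an assumption of this example, not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read until count bytes have arrived, end of file is
 * reached, or an error occurs. Returns the bytes read (possibly fewer than
 * count at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)                 /* end of file */
            break;
        if (n == -1) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With the 13-byte temp file, read_full(fd, buf, 100) would still return 13, since end of file is reached first; a further call would return 0.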
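Linux redirection tips
- ">" or "1>", output redirection: sends the command's standard output to the file that follows, replacing the file's existing contents
- ">>" or "1>>", appending output redirection: appends the command's standard output to the end of the file that follows, keeping the existing contents
- "<" or "0<", input redirection: makes the command read its standard input from the file named after the operator
- "<<" or "0<<", here-document: followed by a delimiter string that marks the end of the input; Ctrl + D can also be used to end the input
So the example above can also be verified like this: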
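#include <stdio.h>
#include <unistd.h>
int main(void)
{
    ssize_t size;
    char buf[100];
    /* No open() call: the shell has already connected standard input to the file */
    size = read(STDIN_FILENO, buf, sizeof(buf));
    printf("size is %zd\n", size);
    return 0;
}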
gary@ubuntu:~/workspaces/0.Server_Workspace/2.Unix_Advanced_Program/1.Unix_IO/1.Unbufferd_IO$ ./a.out < temp
size is 13
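The "<" redirection can be imitated inside a program by relying on the rule that open returns the lowest unused descriptor. The sketch below is only an illustration of that rule under the assumption of a single-threaded process; the function name redirect_stdin_from is made up, and real shells generally use dup2(2) instead.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch: make standard input refer to pathname by relying on
 * open() returning the lowest unused descriptor. Returns 0 on success, -1
 * otherwise. Assumes a single-threaded process. */
int redirect_stdin_from(const char *pathname)
{
    int fd;

    (void)close(STDIN_FILENO);      /* descriptor 0 is now the lowest free one */
    fd = open(pathname, O_RDONLY);  /* so this open() should return 0 */
    if (fd == -1)
        return -1;                  /* note: stdin stays closed in this case */
    if (fd != STDIN_FILENO) {       /* only possible if another thread raced us */
        close(fd);
        return -1;
    }
    return 0;
}
```

After redirect_stdin_from("./temp") succeeds, read(STDIN_FILENO, buf, 100) reads from the file, just as it does when the program is started with ./a.out < temp.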
4. The write function
Call write to write data to an open file.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
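Like read, write returns the number of bytes it actually transferred, which can be less than count. A common pattern, sketched here under the hypothetical name write_all, is to loop until everything has been written.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes are
 * written. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)     /* interrupted before writing anything: retry */
                continue;
            return -1;
        }
        p += n;                     /* skip the part that was written */
        count -= (size_t)n;
    }
    return 0;
}
```

For regular files a single write usually transfers everything, so the loop mostly matters for pipes, sockets, and terminals.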
5. The lseek function
Every open file has a "current file offset". lseek sets this offset explicitly.
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
DESCRIPTION
The lseek() function repositions the offset of the open file associated with the file descriptor fd to the argument offset according to the directive whence as follows:
SEEK_SET
The offset is set to offset bytes.
SEEK_CUR
The offset is set to its current location plus offset bytes.
SEEK_END
The offset is set to the size of the file plus offset bytes.
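Here is an example. Seeking beyond the end of the file and then writing leaves a gap that reads back as zero bytes (\0), as the od(1) output after the program shows.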
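#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
    int fd;
    ssize_t size;
    off_t ret;   /* lseek() returns off_t */
    char test_buf[] = "ABCDEFG";
    char buf[100];
    fd = open("./temp", O_RDWR);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    size = read(fd, buf, sizeof(buf));
    printf("size is %zd\n", size);
    ret = lseek(fd, 20, SEEK_SET);   /* move past the end of the 13-byte file */
    if (ret == -1)
        printf("cannot seek !\n");
    else
        printf("seek OK!\n");
    /* sizeof(test_buf) is 8: it includes the terminating '\0' */
    if (write(fd, test_buf, sizeof(test_buf)) == -1)
        perror("write");
    close(fd);
    return 0;
}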
gary@ubuntu:~/workspaces/0.Server_Workspace/2.Unix_Advanced_Program/1.Unix_IO/1.Unbufferd_IO$ od -c temp
0000000 h e l l o w o r l d ! \n \0 \0 \0
0000020 \0 \0 \0 \0 A B C D E F G \0
0000034
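The final offset printed by od, 0000034 in octal, is 28 in decimal: 20 bytes up to the seek position plus the 8 bytes written. The same number can be obtained with lseek itself, as in the sketch below; the helper name file_size is an assumption of this example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size of the file open on fd, leaving the
 * current file offset unchanged. Returns -1 if fd cannot seek. */
off_t file_size(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);   /* remember the current offset */
    if (cur == -1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);   /* offset of end of file == size */
    if (end == -1)
        return -1;
    if (lseek(fd, cur, SEEK_SET) == -1)   /* put the offset back */
        return -1;
    return end;
}
```

Descriptors that do not support seeking, such as pipes, make lseek fail, so file_size returns -1 for them.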
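3. The sync, fsync, and fdatasync functions
UNIX implementations keep a buffer cache in the kernel, and most disk I/O passes through it. When we write data to a file, the kernel normally copies the data into a buffer, queues it, and writes it to disk at some later time. This is called delayed write. UNIX provides the following functions for flushing that data: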
int fsync(int fd);
int fdatasync(int fd);
void sync(void);
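fsync applies only to the file referred to by fd and waits for the disk write to complete; fdatasync is similar but is only required to flush the file's data, not all of its metadata; sync schedules all modified buffers for writing. A system daemon, traditionally called update, calls sync periodically, which ensures that the kernel's block buffers are flushed at regular intervals.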
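When a program cannot wait for the periodic flush, it can call fsync itself after writing. The following is a minimal sketch under that assumption; the function name write_and_sync is hypothetical, and a short write is simply treated as an error to keep the example small.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to pathname and ask the kernel to
 * push them to the storage device before returning. Returns 0 on success,
 * -1 on error. */
int write_and_sync(const char *pathname, const void *buf, size_t len)
{
    int fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len ||  /* data may still sit in the kernel cache */
        fsync(fd) == -1) {                      /* wait until it has been written out */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Replacing fsync with fdatasync in this sketch would skip flushing metadata that is not needed to read the data back, which can be cheaper on some file systems.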