[20171121] The filesystemio_options=asynch parameter.txt
--// First, the official Oracle explanation:
https://docs.oracle.com/cd/E11882_01/server.112/e41573/os.htm#PFGRF94410
9.1.1.2 FILESYSTEMIO_OPTIONS Initialization Parameter
You can use the FILESYSTEMIO_OPTIONS initialization parameter to enable or disable asynchronous I/O or direct I/O on   
file system files. This parameter is platform-specific and has a default value that is best for a particular platform.
FILESYSTEMIO_OPTIONS can be set to one of the following values:
ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.   
DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.    
SETALL: enable both asynchronous and direct I/O on file system files.    
NONE: disable both asynchronous and direct I/O on file system files.
1. Environment:
--// Note: the database has been moved entirely to local disk.
SYS@book> @ &r/ver1    
PORT_STRING          VERSION    BANNER    
-------------------- ---------- ----------------------------------------------------------------------------    
x86_64/Linux 2.4.xx  11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
SYS@book> alter system set filesystemio_options=asynch scope=spfile;   
System altered.
--// Restart the database.
2. Set up the test environment:
--// Direct path reads were tested earlier; turn them off here so that they do not interfere.
SCOTT@book> alter system set "_serial_direct_read"=never scope=memory;    
System altered.
SCOTT@book> create table t as select rownum id from dual connect by level<=2;   
Table created.
SCOTT@book> ALTER TABLE t MINIMIZE RECORDS_PER_BLOCK ;   
Table altered.    
--// This limits each block to 2 rows.
SCOTT@book> insert into t select rownum+2 from dual connect by level <=8e4-2;   
79998 rows created.
SCOTT@book> commit ;   
Commit complete.
--// Gathering table statistics is omitted.
SCOTT@book> select OWNER,SEGMENT_NAME,SEGMENT_TYPE,HEADER_FILE,HEADER_BLOCK,BYTES,BLOCKS from dba_segments where owner=user and segment_name='T';   
OWNER  SEGMENT_NAME         SEGMENT_TYPE       HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS    
------ -------------------- ------------------ ----------- ------------ ---------- ----------    
SCOTT  T                    TABLE                        4          546  335544320      40960
--// The segment occupies 335544320/1024/1024 = 320 MB, 40960 blocks.
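--// The arithmetic above can be cross-checked with a short sketch. The figures come from the dba_segments output and the
--// insert statements; the variable names are only illustrative.

```python
# Figures taken from the dba_segments output and the test setup above.
segment_bytes = 335544320
segment_blocks = 40960
rows = 80000
rows_per_block = 2  # enforced by MINIMIZE RECORDS_PER_BLOCK

size_mb = segment_bytes / 1024 / 1024
implied_block_size = segment_bytes // segment_blocks  # bytes per block
data_blocks = rows // rows_per_block  # blocks that actually hold rows

print(f"segment size: {size_mb:.0f} MB")
print(f"block size  : {implied_block_size} bytes")
print(f"data blocks : {data_blocks} of {segment_blocks} allocated")
```

--// The number of blocks that hold rows is close to the v$bh counts seen in the tests that follow.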
SCOTT@book> select object_id,data_object_id from dba_objects where owner=user and object_name='T';   
 OBJECT_ID DATA_OBJECT_ID    
---------- --------------    
     90736          90736
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     37771    
     
--// Only part of the table is cached.
3. Test: OS cache populated, Oracle buffer cache partially populated:
$ cachestats /mnt/ramdisk/book/users01.dbf    
/mnt/ramdisk/book/users01.dbf            pages in cache: 553418/554498 (99.8%)  [filesize=2217992.0K, pagesize=4K]
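--// To compare cachestats results across the tests, the output line can be parsed into numbers. This is a minimal sketch that
--// assumes the line format is exactly as printed above; parse_cachestats is a hypothetical helper, not part of the tool.

```python
import re

# Matches one cachestats line in the format shown in this document.
PATTERN = re.compile(
    r"^(?P<path>\S+)\s+pages in cache:\s+(?P<cached>\d+)/(?P<total>\d+)"
    r"\s+\((?P<pct>[\d.]+)%\)\s+\[filesize=(?P<size_k>[\d.]+)K, pagesize=(?P<page_k>\d+)K\]"
)


def parse_cachestats(line):
    """Return path, page counts, cached ratio and cached size in MB."""
    m = PATTERN.match(line.strip())
    if m is None:
        raise ValueError("unrecognized cachestats line: %r" % line)
    cached = int(m.group("cached"))
    total = int(m.group("total"))
    page_k = int(m.group("page_k"))
    return {
        "path": m.group("path"),
        "cached_pages": cached,
        "total_pages": total,
        "ratio": cached / total,
        "cached_mb": cached * page_k / 1024,
    }


line = ("/mnt/ramdisk/book/users01.dbf            pages in cache: "
        "553418/554498 (99.8%)  [filesize=2217992.0K, pagesize=4K]")
stats = parse_cachestats(line)
print(stats["path"], f"{stats['ratio']:.1%}", f"{stats['cached_mb']:.1f} MB cached")
```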
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     37764
SCOTT@book> set timing on   
SCOTT@book> select count(*) from t;    
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:00.19    
--// The scan takes 0.19 seconds.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     40446    
Elapsed: 00:00:00.07
4. Test: OS cache empty, Oracle buffer cache partially populated:
# cachedel /mnt/ramdisk/book/users01.dbf    
$ cachestats /mnt/ramdisk/book/users01.dbf    
/mnt/ramdisk/book/users01.dbf            pages in cache: 0/554498 (0.0%)  [filesize=2217992.0K, pagesize=4K]
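--// The source of cachedel is not shown here. Tools of this kind typically ask the kernel to drop a file's pages with
--// posix_fadvise(POSIX_FADV_DONTNEED); the sketch below assumes that mechanism on Linux. drop_file_cache is a hypothetical
--// name, and this is not necessarily what cachedel itself does.

```python
import os


def drop_file_cache(path):
    """Ask the kernel to evict the cached pages of one file (Linux/Unix only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages are not evicted by DONTNEED, so flush them first.
        os.fsync(fd)
        # offset=0, length=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

--// The call is only advice to the kernel, so checking the result with cachestats afterwards, as done above, is still needed.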
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     40446    
Elapsed: 00:00:00.07
SCOTT@book> select count(*) from t;   
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:00.06    
--// This time the scan takes 0.06 seconds.
$ cachestats /mnt/ramdisk/book/users01.dbf   
/mnt/ramdisk/book/users01.dbf            pages in cache: 6/554498 (0.0%)  [filesize=2217992.0K, pagesize=4K]
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     40448
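--// After the full table scan the OS has cached only 6 pages of /mnt/ramdisk/book/users01.dbf (0.0%): nearly every block of T
--// was already in the buffer cache (40446 before the scan, 40448 after), so almost no physical reads were issued.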
SCOTT@book> select count(*) from t;    
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:00.06
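--// Running the full table scan again confirms my earlier conclusion: the OS cache can mask bad SQL statements, especially
--// I/O-heavy ones.
5. Test: OS cache empty, Oracle buffer cache empty: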
SCOTT@book> alter system flush buffer_cache;   
System altered.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
         0
$ cachedel /mnt/ramdisk/book/users01.dbf   
$ cachestats /mnt/ramdisk/book/users01.dbf    
/mnt/ramdisk/book/users01.dbf            pages in cache: 0/554498 (0.0%)  [filesize=2217992.0K, pagesize=4K]
SCOTT@book> select count(*) from t;   
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:02.12    
--// With nothing cached by either the OS or the database, the scan takes more than 2 seconds.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';    
  COUNT(*)    
----------    
     40196
$ cachestats /mnt/ramdisk/book/users01.dbf   
/mnt/ramdisk/book/users01.dbf            pages in cache: 435524/554498 (78.5%)  [filesize=2217992.0K, pagesize=4K]
--// Run it again:
SCOTT@book> select count(*) from t;    
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:00.08
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     40196
--// Because both the OS and the database now cache the data, the second run completes quickly.
6. Test: OS cache populated, Oracle buffer cache empty:
$ cachestats /mnt/ramdisk/book/users01.dbf   
/mnt/ramdisk/book/users01.dbf            pages in cache: 435524/554498 (78.5%)  [filesize=2217992.0K, pagesize=4K]
SCOTT@book> alter system flush buffer_cache;   
System altered.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
         0
SCOTT@book> select count(*) from t;   
  COUNT(*)    
----------    
     80000    
Elapsed: 00:00:00.33
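--// This again shows the effect of the OS cache on block reads: with filesystemio_options=asynch, the OS file cache masks
--// I/O-heavy SQL statements. The database starts to show problems once free memory is no longer enough to hold that cache.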
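--// The elapsed times of the full table scans in tests 3 to 6 can be put side by side. The sketch below only restates the
--// measurements above; the labels are my own shorthand, and each figure is a single run, so the ratio is indicative only.

```python
# (OS cache state, buffer cache state) -> elapsed seconds of "select count(*) from t"
measurements = {
    ("populated", "partial"): 0.19,  # test 3
    ("empty", "partial"): 0.06,      # test 4, nearly all blocks already in v$bh
    ("empty", "empty"): 2.12,        # test 5, first run
    ("populated", "empty"): 0.33,    # test 6
}

cold = measurements[("empty", "empty")]
os_only = measurements[("populated", "empty")]
speedup = cold / os_only

for (os_cache, buffer_cache), secs in measurements.items():
    print(f"OS cache {os_cache:<10} buffer cache {buffer_cache:<8} {secs:.2f}s")
print(f"OS cache alone makes the cold scan about {speedup:.1f}x faster")
```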
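--// The tests so far cover only reads, not writes. Redo is written by the lgwr process and dirty blocks by the dbwN processes.
--// Modifying a block still means reading it into the buffer cache first (direct path reads excepted), and the OS caches it
--// too, so the OS cache has the same effect there.
7. Test with updates:
--// OS cache populated, buffer cache empty: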
$ cachestats /mnt/ramdisk/book/users01.dbf    
/mnt/ramdisk/book/users01.dbf            pages in cache: 435526/554498 (78.5%)  [filesize=2217992.0K, pagesize=4K]
SCOTT@book> alter system flush buffer_cache;   
System altered.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
         0
SCOTT@book> update t set id=id  where id=1;   
1 row updated.    
Elapsed: 00:00:00.31
SCOTT@book> commit ;   
Commit complete.
SCOTT@book> select count(*) from v$bh where OBJD=90736 and STATUS<>'free';   
  COUNT(*)    
----------    
     40196
--// OS cache empty, buffer cache empty:
$ cachedel /mnt/ramdisk/book/users01.dbf    
$ cachestats /mnt/ramdisk/book/users01.dbf    
/mnt/ramdisk/book/users01.dbf            pages in cache: 0/554498 (0.0%)  [filesize=2217992.0K, pagesize=4K]
SCOTT@book> alter system flush buffer_cache;   
System altered.
SCOTT@book> update t set id=id  where id=2;   
1 row updated.    
Elapsed: 00:00:02.88
SCOTT@book> commit ;   
Commit complete.
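--// The effect of the OS cache on elapsed time is visible here as well.
8. Open questions:
--// One question kept bothering me during these tests:
--// If redo writes go through the same caching mechanism, that would be a serious problem (I have always understood the OS
--// cache to be a shadow of the corresponding file on disk), because on a power failure the content of the OS cache may never
--// have reached the log file on disk. Redo is the lifeline of Oracle's recovery mechanism; without the redo records, recovery fails.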
$ find /mnt/ramdisk/book/ -name "redo0[123].*" -print0  | xargs -0 -I{} cachestats {}   
/mnt/ramdisk/book/redo03.log             pages in cache: 12256/12801 (95.7%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo01.log             pages in cache: 11301/12801 (88.3%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo02.log             pages in cache: 12801/12801 (100.0%)  [filesize=51200.5K, pagesize=4K]    
--// After running enough DML to switch the 50 MB log files several times, the OS still caches these files. Run this query:
SELECT file_no   
      ,filetype_id    
      ,filetype_name    
      ,ASYNCH_IO    
      ,access_method    
      ,RETRIES_ON_ERROR    
  FROM V$IOSTAT_FILE order by 1;
FILE_NO FILETYPE_ID FILETYPE_NAME                ASYNCH_IO ACCESS_METH RETRIES_ON_ERROR   
------- ----------- ---------------------------- --------- ----------- ----------------    
      0           0 Other                        ASYNC_OFF OS_LIB                     0    
      0           1 Control File                 ASYNC_OFF                            0    
      0          18 Data Pump Dump File          ASYNC_OFF                            0    
      0          17 Flashback Log                ASYNC_OFF                            0    
      0          12 Data File Copy               ASYNC_OFF                            0    
      0          11 Archive Log Backup           ASYNC_OFF                            0    
      0          10 Data File Incremental Backup ASYNC_OFF                            0    
      0           9 Data File Backup             ASYNC_OFF                            0    
      0           3 Log File                     ASYNC_OFF                            0    
      0           4 Archive Log                  ASYNC_OFF                            0    
      1           6 Temp File                    ASYNC_ON  OS_LIB                     0    
      1           2 Data File                    ASYNC_ON  OS_LIB                     0    
      2           2 Data File                    ASYNC_ON  OS_LIB                     0    
      3           2 Data File                    ASYNC_ON  OS_LIB                     0    
      4           2 Data File                    ASYNC_ON  OS_LIB                     0    
      5           2 Data File                    ASYNC_ON  OS_LIB                     0    
      6           2 Data File                    ASYNC_ON  OS_LIB                     0    
17 rows selected.
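--// According to this view, asynchronous I/O is off for Log File, yet tracing the lgwr process shows that asynchronous I/O
--// is in fact being used.
--// Note: with ASYNCH_IO=ASYNC_ON and ACCESS_METH='OS_LIB', does that mean the access method is OS_LIB? And what does OS_LIB mean?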
$ ps -ef | grep ora_lgwr_boo[k] | awk '{print $2}'    
45350
$ strace -p $(ps -ef | grep ora_lgwr_boo[k] | awk '{print $2}') -c   
Process 45350 attached - interrupt to quit    
^CProcess 45350 detached    
% time     seconds  usecs/call     calls    errors syscall    
------ ----------- ----------- --------- --------- ----------------    
   nan    0.000000           0        14           getrusage    
   nan    0.000000           0        61           times    
   nan    0.000000           0        15           io_getevents    
   nan    0.000000           0        15           io_submit    
   nan    0.000000           0        20         5 semtimedop    
------ ----------- ----------- --------- --------- ----------------    
100.00    0.000000                   125         5 total
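--// The io_submit and io_getevents calls are still there, which shows that asynchronous I/O is still being used.
--// I found this link: http://www.cnblogs.com/sopost/p/3589731.html. An excerpt:
LGWR bypasses the operating system's buffering and writes directly to the files, so that redo information that must reach the
REDO LOG files is not lost when the operating system fails (for example, on a crash).
--// This is important! But how can it be verified? My impression is that with filesystemio_options=asynch this is not what
--// happens, and the redo writes go through the OS cache as well. My test: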
$ find  /mnt/ramdisk/book/ -name "redo0[123].log" -print | xargs -I{} cachedel {}    
$ find  /mnt/ramdisk/book/ -name "redo0[123].log" -print | xargs -I{} cachestats {}    
/mnt/ramdisk/book/redo03.log             pages in cache: 0/12801 (0.0%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo01.log             pages in cache: 1/12801 (0.0%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo02.log             pages in cache: 0/12801 (0.0%)  [filesize=51200.5K, pagesize=4K]
SCOTT@book> create table t1 tablespace tea as select rownum id ,lpad('A',32,'A') name from dual connect by level<=1e5;   
Table created.
SCOTT@book> @ &r/logfile   
GROUP# STATUS TYPE       MEMBER                          IS_ GROUP# THREAD# SEQUENCE#       BYTES BLOCKSIZE MEMBERS ARC STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME    
------ ------ ---------- ------------------------------- --- ------ ------- --------- ----------- --------- ------- --- ---------- ------------- ------------------- ------------ -------------------    
     1        ONLINE     /mnt/ramdisk/book/redo01.log    NO       1       1       959    52428800       512       1 NO  CURRENT      13279770540 2017-11-17 17:16:45 2.814750E+14    
     2        ONLINE     /mnt/ramdisk/book/redo02.log    NO       2       1       957    52428800       512       1 YES INACTIVE     13279760694 2017-11-17 17:04:36  13279770469 2017-11-17 17:16:37    
     3        ONLINE     /mnt/ramdisk/book/redo03.log    NO       3       1       958    52428800       512       1 YES INACTIVE     13279770469 2017-11-17 17:16:37  13279770540 2017-11-17 17:16:45    
     4        STANDBY    /mnt/ramdisk/book/redostb01.log NO    
     5        STANDBY    /mnt/ramdisk/book/redostb02.log NO    
     6        STANDBY    /mnt/ramdisk/book/redostb03.log NO    
     7        STANDBY    /mnt/ramdisk/book/redostb04.log NO    
7 rows selected.
$ find  /mnt/ramdisk/book/ -name "redo0[123].log" -print | xargs -I{} cachestats {}   
/mnt/ramdisk/book/redo03.log             pages in cache: 0/12801 (0.0%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo01.log             pages in cache: 1269/12801 (9.9%)  [filesize=51200.5K, pagesize=4K]    
/mnt/ramdisk/book/redo02.log             pages in cache: 0/12801 (0.0%)  [filesize=51200.5K, pagesize=4K]
--// The OS cache does hold part of /mnt/ramdisk/book/redo01.log, about 10%.
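--// Pages showing up in the OS cache do not by themselves show whether a write was durable: a file opened with O_SYNC or
--// O_DSYNC still populates the page cache, but write() returns only after the data has been handed to the device. One way
--// to check how a process opened a file on Linux is the flags line in /proc/<pid>/fdinfo/<fd>. The sketch below assumes
--// that interface; open_flags and sync_flag_names are hypothetical helpers. Whether lgwr opens the redo logs with any of
--// these flags in this setup is something to verify, not something the test above establishes.

```python
import os


def open_flags(pid, fd):
    """Read the open(2) flags of one file descriptor from /proc (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        for line in f:
            if line.startswith("flags:"):
                return int(line.split()[1], 8)  # the value is printed in octal
    raise ValueError("no flags line found")


def sync_flag_names(flags):
    """Name the flags that affect caching and durability."""
    names = []
    if flags & os.O_DIRECT:
        names.append("O_DIRECT")  # bypasses the page cache
    if (flags & os.O_SYNC) == os.O_SYNC:
        names.append("O_SYNC")    # data and metadata synced on each write
    elif flags & os.O_DSYNC:
        names.append("O_DSYNC")   # data synced on each write
    return names
```

--// For lgwr, the pid would be the one returned by the ps command above and the fd numbers can be listed with
--// ls -l /proc/<pid>/fd. An empty list for a redo log would support the concern raised here; O_SYNC or O_DSYNC would mean
--// the cached pages are only a copy of data already written.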
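In practice, although Oracle Database writes the REDO LOG files directly, bypassing the buffering, to avoid data loss caused by
operating system failures, we still cannot be sure that the data has really reached the physical disk. Most storage systems used
for an RDBMS have a write cache. The write cache improves write performance, but it brings another problem: if the storage
fails, REDO LOG information may be lost, or the REDO LOG may even be badly corrupted. Storage failures are rare, but when one
does occur some database transactions can be lost. So although Oracle's internal algorithms ensure that a transaction is
considered saved once its commit succeeds, a successfully committed transaction can still be lost.
In fact, Oracle designed the REDO LOG files with safety as a priority. The REDO LOG block size is entirely different from the
database block size; it is the same as the operating system's I/O block size. This design ensures that one REDO LOG block is
written in a single physical I/O, so a REDO LOG block cannot be fractured.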
----------------------------
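9. Summary:
The comparisons above show that with filesystemio_options=asynch, block reads make full use of the OS cache, so when the server
has plenty of memory the cost of I/O-heavy SQL statements is masked. Only when heavy I/O coincides with a shortage of server
memory do bad SQL statements seriously hurt database performance.
Setting filesystemio_options=none, the default, only gives up asynchronous I/O; otherwise it behaves like asynch, so I did not test it.
Does filesystemio_options=asynch risk losing data? My understanding is that the OS cache is a shadow of the file on disk: writing
a dirty block actually updates the OS cache, and the OS then writes it to disk (I am not sure this is correct, or how to prove
it). If power is lost before that write reaches the disk, Oracle reads the redo on restart to recover, so everything depends on
whether the redo accurately recorded the DML. If that redo exists, in theory no data is lost.
A DBA must understand what this parameter actually does rather than rely on hearsay. Next time I will cover filesystemio_options=setall.
Finally, I have to admit that my understanding of much of this is still incomplete and largely guesswork. Corrections from experts are welcome.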
--// The Oracle Database 11g DBA Handbook (PDF) says:
Using Raw Devices P291
Raw devices are available with most Unix operating systems. When they are used, Oracle bypasses   
the Unix buffer cache and eliminates the file system overhead. For I/O-intensive applications,    
they may result in a performance improvement of around 20 percent over traditional file systems    
(and a slightly smaller improvement over Automatic Storage Management). Recent file system    
enhancements have largely overcome this performance difference.
Raw devices cannot be managed with the same commands as file systems. For example, the   
tar command cannot be used to back up individual files; instead, the dd command must be used.    
This is a much less flexible command to use and limits your recovery capabilities.
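--// My own view is that this has to be weighed from both sides: do not underestimate how well the OS cache can mask database
--// I/O performance problems.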