原创,如果有误请指出
今天同事讨论关于主键使用varchar和int的区别。
我现在总结的3个问题:
1、tablespace中空间浪费
   当然我们知道使用varchar可能会导致辅助索引比较大,因为用到varchar可能存储的字符较多,同时
   在行头也存在一个可变字段字符区域(1-2)字节
   而辅助索引叶子结点毕竟都存储了主键值,这样至少会多varchar数据字节数量+1(或者2) 字节- 4(int)字节空间。
   如果辅助索引比较多空间浪费是可想而知的。
2、辅助索引B+树扫描性能
    由于辅助索引B+树的空间要求更大,虽然在B+树层次一般都是3层-4层,索引单值定位I/O消耗并不明显,如果涉及到
    范围查询(比如PAGE_CUR_G),需要访问的块就更多,同时比如例如辅助索引的using index,需要访问的块自然
    更多
3、比较更加复杂
   innodb 在进行元组比较的时候,不管是DML,select都会涉及到元组的比较,同时回表的时候也涉及
   到比较操作。而varchar类型的比较比int类型更为复杂一些。
那么我们就来分析第三个问题,第一个问题和第二个问题是显而易见的。
我这里数据库字符集为latin1/latin1_swedish_ci
其实在innodb底层进行比较的时候都调用cmp_data这个函数
在innodb中有自己的定义的数据类型如下:
	
	
		
			- 
				/*-------------------------------------------*/
 
- 
				/* The 'MAIN TYPE' of a column */
 
- 
				#define DATA_MISSING    0    /* missing column */
 
- 
				#define    DATA_VARCHAR    1    /* character varying of the
 
- 
				latin1_swedish_ci charset-collation; note
 
- 
				that the MySQL format for this, DATA_BINARY,
 
- 
				DATA_VARMYSQL, is also affected by whether the
 
- 
				'precise type' contains
 
- 
				DATA_MYSQL_TRUE_VARCHAR */
 
- 
				#define DATA_CHAR    2    /* fixed length character of the
 
- 
				latin1_swedish_ci charset-collation */
 
- 
				#define DATA_FIXBINARY    3    /* binary string of fixed length */
 
- 
				#define DATA_BINARY    4    /* binary string */
 
- 
				#define DATA_BLOB    5    /* binary large object, or a TEXT type;
 
- 
				if prtype & DATA_BINARY_TYPE == 0, then this is
 
- 
				actually a TEXT column (or a BLOB created
 
- 
				with < 4.0.14; since column prefix indexes
 
- 
				came only in 4.0.14, the missing flag in BLOBs
 
- 
				created before that does not cause any harm) */
 
- 
				#define    DATA_INT    6    /* integer: can be any size 1 - 8 bytes */
 
- 
				#define    DATA_SYS_CHILD    7    /* address of the child page in node pointer */
 
- 
				#define    DATA_SYS    8    /* system column */ 
			
 
 
我们熟悉的int类型属于DATA_INT而varchar属于DATA_VARCHAR,rowid属于DATA_SYS
在函数cmp_data根据各种类型的不同进行了不同比较的方式,这里就将int和varchar
判断的方式进行说明:
1、innodb int类型比较
实际上是在cmp_data中进行了大概的方式如下
	
	
		
			- 
				if (len) {
 
- 
				#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
 
- 
				/* Compare the first bytes with a loop to avoid the call
 
- 
				overhead of memcmp(). On x86 and x86-64, the GCC built-in
 
- 
				(repz cmpsb) seems to be very slow, so we will be calling the
 
- 
				libc version. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
 
- 
				tracks the slowness of the GCC built-in memcmp().
 
- 
				
 
- 
				
 
- 
				We compare up to the first 4..7 bytes with the loop.
 
- 
				The (len & 3) is used for "normalizing" or
 
- 
				"quantizing" the len parameter for the memcmp() call,
 
- 
				in case the whole prefix is equal. On x86 and x86-64,
 
- 
				the GNU libc memcmp() of equal strings is faster with
 
- 
				len=4 than with len=3.
 
- 
				
 
- 
				
 
- 
				On other architectures than the IA32 or AMD64, there could
 
- 
				be a built-in memcmp() that is faster than the loop.
 
- 
				We only use the loop where we know that it can improve
 
- 
				the performance. */
 
- 
				for (ulint i = 4 + (len & 3); i > 0; i--) {
 
- 
				cmp = int(*data1++) - int(*data2++);
 
- 
				if (cmp) {
 
- 
				return(cmp);
 
- 
				}
 
- 
				
 
- 
				
 
- 
				if (!--len) {
 
- 
				break;
 
- 
				}
 
- 
				}
 
- 
				my_strnncollsp_simple
 
- 
				
 
- 
				
 
- 
				if (len) {
 
- 
				#endif /* IA32 or AMD64 */
 
- 
				cmp = memcmp(data1, data2, len);
 
- 
				
 
- 
				
 
- 
				if (cmp) {
 
- 
				return(cmp);
 
- 
				}
 
- 
				
 
- 
				
 
- 
				data1 += len;
 
- 
				data2 += len;
 
- 
				#if defined __i386__ || defined __x86_64__ || defined _M_IX86 || defined _M_X64
 
- 
				}
 
- 
				#endif /* IA32 or AMD64 */
 
- 
				}
 
- 
				
 
- 
				
 
- 
				cmp = (int) (len1 - len2);
 
- 
				
 
- 
				
 
- 
				if (!cmp || pad == ULINT_UNDEFINED) {
 
- 
				return(cmp);
 
- 
				} 
			
 
 
可以看到整个方式比较简洁,对于我们常用的x86_64模型并没有直接使用memcpy进行而是
进行了优化在注释中也有说明,才出现了for (ulint i = 4 + (len & 3); i > 0; i--)
部分,如果是IA32 or AMD64则直接使用memcpy进行比较。感兴趣的可以仔细阅读一下
2、innodb varchar类型比较
实际上这个比较会通过cmp_data->cmp_whole_field->my_strnncollsp_simple调用最终调用
my_strnncollsp_simple完成,而比如order by 会调用my_strnxfrm_simple他们都在一个
文件中。
下面是整个my_strnncollsp_simple函数
	
	
		
			- 
				/*
 
- 
				  Compare strings, discarding end space
 
- 
				
 
- 
				
 
- 
				  SYNOPSIS
 
- 
				    my_strnncollsp_simple()
 
- 
				    cs    character set handler
 
- 
				    a    First string to compare
 
- 
				    a_length    Length of 'a'
 
- 
				    b    Second string to compare
 
- 
				    b_length    Length of 'b'
 
- 
				    diff_if_only_endspace_difference
 
- 
				       Set to 1 if the strings should be regarded as different
 
- 
				                        if they only difference in end space
 
- 
				
 
- 
				
 
- 
				  IMPLEMENTATION
 
- 
				    If one string is shorter as the other, then we space extend the other
 
- 
				    so that the strings have equal length.
 
- 
				
 
- 
				
 
- 
				    This will ensure that the following things hold:
 
- 
				
 
- 
				
 
- 
				    "a" == "a "
 
- 
				    "a/0" < "a"
 
- 
				    "a/0" < "a "
 
- 
				
 
- 
				
 
- 
				  RETURN
 
- 
				    < 0    a < b
 
- 
				    = 0    a == b
 
- 
				    > 0    a > b
 
- 
				*/
 
- 
				
 
- 
				
 
- 
				int my_strnncollsp_simple(const CHARSET_INFO *cs, const uchar *a,
 
- 
				                          size_t a_length, const uchar *b, size_t b_length,
 
- 
				                          my_bool diff_if_only_endspace_difference)
 
- 
				{
 
- 
				  const uchar *map= cs->sort_order, *end;
 
- 
				  size_t length;
 
- 
				  int res;
 
- 
				
 
- 
				
 
- 
				#ifndef VARCHAR_WITH_DIFF_ENDSPACE_ARE_DIFFERENT_FOR_UNIQUE
 
- 
				  diff_if_only_endspace_difference= 0;
 
- 
				#endif
 
- 
				
 
- 
				
 
- 
				  end= a + (length= MY_MIN(a_length, b_length));
 
- 
				  while (a < end)
 
- 
				  {
 
- 
				    if (map[*a++] != map[*b++])
 
- 
				      return ((int) map[a[-1]] - (int) map[b[-1]]);
 
- 
				  }
 
- 
				  res= 0;
 
- 
				  if (a_length != b_length)
 
- 
				  {
 
- 
				    int swap= 1;
 
- 
				    if (diff_if_only_endspace_difference)
 
- 
				      res= 1; /* Assume 'a' is bigger */
 
- 
				    /*
 
- 
				      Check the next not space character of the longer key. If it's < ' ',
 
- 
				      then it's smaller than the other key.
 
- 
				    */
 
- 
				    if (a_length < b_length)
 
- 
				    {
 
- 
				      /* put shorter key in s */
 
- 
				      a_length= b_length;
 
- 
				      a= b;
 
- 
				      swap= -1; /* swap sign of result */
 
- 
				      res= -res;
 
- 
				    }
 
- 
				    for (end= a + a_length-length; a < end ; a++)
 
- 
				    {
 
- 
				      if (map[*a] != map[' '])
 
- 
				return (map[*a] < map[' ']) ? -swap : swap;
 
- 
				    }
 
- 
				  }
 
- 
				  return res;
 
- 
				} 
			
 
 
其中*map= cs->sort_order比较关键这是内存中已经存储好的字符集的顺序,
循环进行
map[*a++] != map[*b++]
*a++和*b++ 会得到的字符集编码,然后在整个排序好的字符数组中找,
则得到了实际字符集编码进行比较,不管是比较的复杂度还是需要比较的
长度 varchar很可能都远远大于int类型,下面是打印cs->sort_order这片
内存区域前128字节得到的结果,
(gdb) x/128bx 0x258b000
0x258b000 :          0x00    0x01    0x02    0x03    0x04    0x05    0x06    0x07
0x258b008 :        0x08    0x09    0x0a    0x0b    0x0c    0x0d    0x0e    0x0f
0x258b010 :       0x10    0x11    0x12    0x13    0x14    0x15    0x16    0x17
0x258b018 :       0x18    0x19    0x1a    0x1b    0x1c    0x1d    0x1e    0x1f
0x258b020 :       0x20    0x21    0x22    0x23    0x24    0x25    0x26    0x27
0x258b028 :       0x28    0x29    0x2a    0x2b    0x2c    0x2d    0x2e    0x2f
0x258b030 :       0x30    0x31    0x32    0x33    0x34    0x35    0x36    0x37
0x258b038 :       0x38    0x39    0x3a    0x3b    0x3c    0x3d    0x3e    0x3f
0x258b040 :       0x40    0x41    0x42    0x43    0x44    0x45    0x46    0x47
0x258b048 :       0x48    0x49    0x4a    0x4b    0x4c    0x4d    0x4e    0x4f
0x258b050 :       0x50    0x51    0x52    0x53    0x54    0x55    0x56    0x57
0x258b058 :       0x58    0x59    0x5a    0x5b    0x5c    0x5d    0x5e    0x5f
0x258b060 :       0x60    0x41    0x42    0x43    0x44    0x45    0x46    0x47
0x258b068 :      0x48    0x49    0x4a    0x4b    0x4c    0x4d    0x4e    0x4f
0x258b070 :      0x50    0x51    0x52    0x53    0x54    0x55    0x56    0x57
0x258b078 :      0x58    0x59    0x5a    0x7b    0x7c    0x7d    0x7e    0x7f
而从内存的地址0x258b000我们也能看到他确实是存在于堆内存空间中,它是一片堆内存区域。
下面是varchar比较的调用栈帧以备后用
	
	
		
			- 
				#0  my_strnncollsp_simple (cs=0x2d4b9c0, a=0x7fff57a71f93 "gaopeng", a_length=7, b=0x7fffbd7e807f "gaopeng", b_length=7, diff_if_only_endspace_difference=0 '/000')
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/strings/ctype-simple.c:165
 
- 
				#1  0x0000000001ab8ec2 in cmp_whole_field (mtype=1, prtype=524303, a=0x7fff57a71f93 "gaopeng", a_length=7, b=0x7fffbd7e807f "gaopeng", b_length=7)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:374
 
- 
				#2  0x0000000001aba827 in cmp_data (mtype=1, prtype=524303, data1=0x7fff57a71f93 "gaopeng", len1=7, data2=0x7fffbd7e807f "gaopeng", len2=7)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:468
 
- 
				#3  0x0000000001ab9a05 in cmp_dtuple_rec_with_match_bytes (dtuple=0x7fff48ed3280, rec=0x7fffbd7e807f "gaopeng", index=0x7fff48ec78a0, offsets=0x7fff57a6bc50, 
 
- 
				    matched_fields=0x7fff57a6bf80, matched_bytes=0x7fff57a6bf78) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/rem/rem0cmp.cc:880
 
- 
				#4  0x0000000001a87fe2 in page_cur_search_with_match_bytes (block=0x7fffbcceafc0, index=0x7fff48ec78a0, tuple=0x7fff48ed3280, mode=PAGE_CUR_GE, 
 
- 
				    iup_matched_fields=0x7fff57a6cdf8, iup_matched_bytes=0x7fff57a6cdf0, ilow_matched_fields=0x7fff57a6cde8, ilow_matched_bytes=0x7fff57a6cde0, cursor=0x7fff57a713f8)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/page/page0cur.cc:850
 
- 
				#5  0x0000000001c17a3e in btr_cur_search_to_nth_level (index=0x7fff48ec78a0, level=0, tuple=0x7fff48ed3280, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fff57a713f0, 
 
- 
				    has_search_latch=0, file=0x2336938 "/root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc", line=5744, mtr=0x7fff57a70ee0)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:1478
 
- 
				#6  0x0000000001c222bf in btr_estimate_n_rows_in_range_low (index=0x7fff48ec78a0, tuple1=0x7fff48ed3280, mode1=PAGE_CUR_GE, tuple2=0x7fff48ed32e0, mode2=PAGE_CUR_G, 
 
- 
				    nth_attempt=1) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:5744
 
- 
				#7  0x0000000001c22a09 in btr_estimate_n_rows_in_range (index=0x7fff48ec78a0, tuple1=0x7fff48ed3280, mode1=PAGE_CUR_GE, tuple2=0x7fff48ed32e0, mode2=PAGE_CUR_G)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/btr/btr0cur.cc:6044
 
- 
				#8  0x00000000019b3e0e in ha_innobase::records_in_range (this=0x7fff48e7e3b0, keynr=1, min_key=0x7fff57a71680, max_key=0x7fff57a716a0)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:13938
 
- 
				#9  0x0000000000f6ed5b in handler::multi_range_read_info_const (this=0x7fff48e7e3b0, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges_arg=0, 
 
- 
				    bufsz=0x7fff57a71780, flags=0x7fff57a71784, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:6440
 
- 
				#10 0x0000000000f70662 in DsMrr_impl::dsmrr_info_const (this=0x7fff48e7e820, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges=0, 
 
- 
				    bufsz=0x7fff57a71d70, flags=0x7fff57a71d74, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:7112
 
- 
				#11 0x00000000019be22f in ha_innobase::multi_range_read_info_const (this=0x7fff48e7e3b0, keyno=1, seq=0x7fff57a71b90, seq_init_param=0x7fff57a71850, n_ranges=0, 
 
- 
				    bufsz=0x7fff57a71d70, flags=0x7fff57a71d74, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:21351
 
- 
				#12 0x000000000178c9e4 in check_quick_select (param=0x7fff57a71e30, idx=0, index_only=false, tree=0x7fff48e700e0, update_tbl_stats=true, mrr_flags=0x7fff57a71d74, 
 
- 
				    bufsize=0x7fff57a71d70, cost=0x7fff57a71d10) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:10030
 
- 
				#13 0x0000000001783305 in get_key_scans_params (param=0x7fff57a71e30, tree=0x7fff48e70058, index_read_must_be_used=false, update_tbl_stats=true, 
 
- 
				    cost_est=0x7fff57a74190) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:5812
 
- 
				#14 0x000000000177ce43 in test_quick_select (thd=0x7fff4801f4d0, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, 
 
- 
				    interesting_order=st_order::ORDER_NOT_RELEVANT, tab=0x7fff48eacf20, cond=0x7fff48eacd50, needed_reg=0x7fff48eacf60, quick=0x7fff57a744c8)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/sql/opt_range.cc:3066
 
- 
				#15 0x000000000158b9bc in get_quick_record_count (thd=0x7fff4801f4d0, tab=0x7fff48eacf20, limit=18446744073709551615)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5942
 
- 
				#16 0x000000000158b073 in JOIN::estimate_rowcount (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5689
 
- 
				#17 0x00000000015893b5 in JOIN::make_join_plan (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:5046
 
- 
				#18 0x000000000157d9b7 in JOIN::optimize (this=0x7fff48eac980) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:387
 
- 
				#19 0x00000000015fab71 in st_select_lex::optimize (this=0x7fff48aa45c0, thd=0x7fff4801f4d0) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:1009
 
- 
				#20 0x00000000015f9284 in handle_query (thd=0x7fff4801f4d0, lex=0x7fff48021ab0, result=0x7fff48aa5dc8, added_options=0, removed_options=0)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:164
 
- 
				#21 0x00000000015ac159 in execute_sqlcom_select (thd=0x7fff4801f4d0, all_tables=0x7fff48aa54b8) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5391
 
- 
				#22 0x00000000015a4774 in mysql_execute_command (thd=0x7fff4801f4d0, first_level=true) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:2889
 
- 
				#23 0x00000000015ad12a in mysql_parse (thd=0x7fff4801f4d0, parser_state=0x7fff57a76600) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5836
 
- 
				#24 0x00000000015a0fe9 in dispatch_command (thd=0x7fff4801f4d0, com_data=0x7fff57a76d70, command=COM_QUERY)
 
- 
				    at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1447
 
- 
				#25 0x000000000159fe1a in do_command (thd=0x7fff4801f4d0) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1010
 
- 
				#26 0x00000000016e1d6c in handle_connection (arg=0x6320740) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/conn_handler/connection_handler_per_thread.cc:312
 
- 
				---Type <return> to continue, or q <return> to quit--- 
 
- 
				#27 0x0000000001d723f4 in pfs_spawn_thread (arg=0x6320530) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/perfschema/pfs.cc:2188
 
- 
				#28 0x0000003ca62079d1 in start_thread () from /lib64/libpthread.so.0
 
- 
				#29 0x0000003ca5ee8b6d in clone () from /lib64/libc.so.6
			
 
 
作者微信:
