MySQL: Transactions, Index Internals, and Slow Query Optimization

Z时代
2024-01-10
Category: General

python

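I. Transactions

1. A database transaction is a series of operations (SQL statements) executed as a single logical unit of work. Either all of the operations take effect or none of them do.

2. Transaction management is a core feature of relational databases (Oracle, MySQL, and so on). In MySQL it depends on the storage engine: InnoDB supports transactions, MyISAM does not.

3. Purpose: transactions keep data safe when one logical change spans several statements.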

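# Example: repaying a debt

A transfers 1000 from a bank card to B's Alipay account

1 Subtract 1000 from A's bank card balance

2 Add 1000 to B's Alipay balance

When several rows are modified, any one of the operations can fail. If one fails, none of them should take effect; otherwise A loses 1000 that B never receives.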

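4. The four properties of a transaction

ACID
  A: Atomicity
      A transaction is an indivisible unit. The operations it contains
      either all succeed or all fail.
  C: Consistency
      A transaction must move the database from one consistent state to another consistent state.
      Consistency is closely tied to atomicity.
  I: Isolation
      The execution of one transaction must not be disturbed by other transactions.
      (The operations and data used inside one transaction are isolated from other concurrent transactions, and concurrently executing transactions do not interfere with each other.)
  D: Durability
      Also called "permanence".
      Once a transaction has been committed, its changes to the database are permanent.
      Later operations or failures must not undo them.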

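5. How to use transactions

# Transaction keywords
# 1 Start a transaction
  start transaction;
# 2 Roll back (return to the state before the transaction started)
  rollback;
# 3 Commit (after a commit the changes can no longer be rolled back)
  commit;
# Simulating a transfer
  create table user(
      id int primary key auto_increment,
      name char(16),
      balance int
  );
  insert into user(name,balance) values
  ("jason",1000),
  ("egon",1000),
  ("tank",1000);
# 1 Start the transaction first
  start transaction;
# 2 Run several SQL statements (net effect: jason -100, egon +10, tank +90; the total stays 3000)
  update user set balance=900 where name="jason";
  update user set balance=1010 where name="egon";
  update user set balance=1090 where name="tank";
# 3 Finish with exactly one of the two keywords above
#   rollback;  discards all three updates, every balance is 1000 again
#   commit;    makes all three updates permanent
"""
  Summary
      When several SQL statements must stay consistent, either all succeeding or all failing,
      you should consider using a transaction.
"""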

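The all-or-nothing behavior described above can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere); the helper names `transfer` and `balances` and the `fail` flag are hypothetical and exist only for illustration.

```python
import sqlite3

def transfer(conn, updates, fail=False):
    """Apply every balance update in one transaction; undo all of them on error."""
    try:
        with conn:  # commits if the block finishes, rolls back if an exception escapes
            for name, balance in updates:
                conn.execute(
                    "UPDATE user SET balance = ? WHERE name = ?", (balance, name)
                )
            if fail:
                # Simulate a crash after the updates ran but before the commit.
                raise RuntimeError("simulated failure before commit")
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {name: balance} for every row."""
    return dict(conn.execute("SELECT name, balance FROM user ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO user (name, balance) VALUES (?, ?)",
    [("jason", 1000), ("egon", 1000), ("tank", 1000)],
)
conn.commit()

updates = [("jason", 900), ("egon", 1010), ("tank", 1090)]

failed_result = transfer(conn, updates, fail=True)
after_failure = balances(conn)   # nothing changed: the whole transaction was rolled back

ok_result = transfer(conn, updates)
after_success = balances(conn)   # all three updates are now permanent
```

The failed attempt leaves every balance untouched even though all three UPDATE statements had already run, which is the atomicity property in practice.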
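II. Index Internals and Slow Query Optimization

ps: Data is stored on disk, so querying it unavoidably involves I/O operations.

1. Index: a data structure, similar to the table of contents of a book. With an index, a query first consults the table of contents and then goes to the data, instead of flipping through the book page by page. This speeds up queries and reduces I/O.

2. In MySQL an index is also called a "key": a data structure the storage engine uses to find records quickly.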

　　 * primary key

　　 * unique key

　　 * index key

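Note: foreign key is not meant for speeding up queries, so it is outside the scope of this discussion.

Of the three keys above, the first two add a constraint on top of speeding up queries (primary key: not null and unique; unique key: unique), while index key carries no constraint at all and only helps you find data faster.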

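3. Essence

An index works by repeatedly narrowing the range of candidate data until only the final result remains, turning a random event (flipping page by page)
into a sequential event (table of contents first, then the data).
In other words, with an index we can always look data up in the same fixed way.

4. A table can have multiple indexes (multiple tables of contents)

5. Indexes speed up queries, but they also have drawbacks

　　1 Creating an index on a table that already holds a large amount of data is slow

　　2 Once an index exists, queries that can use it get much faster, but writes get slower, because every insert, update, and delete must also maintain the index

ps: Do not create indexes casually!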

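6. B+ tree

Why is a B+ tree better suited than a B-tree for database indexes and operating-system file indexes?
(1) A B+ tree has a lower disk read/write cost
Internal nodes of a B+ tree hold no pointers to the full record for each key, so they are smaller than B-tree internal nodes. More keys fit in one disk block, and a lookup needs fewer I/O operations.
(2) B+ tree lookups are more stable
Non-leaf nodes never point to the final record; they only index the keys stored in the leaf nodes. Every lookup therefore follows a path from the root to a leaf. All lookups have the same path length, so query cost is nearly uniform.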

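### B+ tree

"""

Only leaf nodes store the real data. The other nodes store no row data, only keys that point the way.

The more levels the tree has, the more steps a lookup takes (a tree with n levels needs n steps), so for fast queries the tree should have as few levels as possible.

A disk block can only hold a limited amount of data.

Why is the id column recommended as the index?

It takes little space, so one disk block can hold more entries,

which lowers the height of the tree and reduces the number of lookups.

"""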

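The link between key size and tree height can be made concrete with a little arithmetic. This is a rough sketch, not a model of InnoDB's real page layout: it assumes a 16 KB page (InnoDB's default page size), a 6-byte child pointer, rows of about 1 KB, and that only the key width changes between the two cases. All constants and the function name `bplus_tree_height` are illustrative assumptions.

```python
PAGE_SIZE = 16 * 1024     # bytes per page (InnoDB default page size)
POINTER_BYTES = 6         # assumed size of a child-page pointer
ROW_BYTES = 1024          # assumed average row size in a leaf page

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def bplus_tree_height(total_rows, key_bytes):
    """Estimate how many levels a B+ tree needs for `total_rows` rows.

    Leaf pages hold whole rows; each internal page holds as many
    (key, pointer) pairs as fit in one page.
    """
    rows_per_leaf = PAGE_SIZE // ROW_BYTES
    fanout = PAGE_SIZE // (key_bytes + POINTER_BYTES)
    nodes = ceil_div(total_rows, rows_per_leaf)  # number of leaf pages
    height = 1                                   # the leaf level itself
    while nodes > 1:
        nodes = ceil_div(nodes, fanout)          # pages needed one level up
        height += 1
    return height

ROWS = 3_000_000  # same order of magnitude as the s1 test table in this post

height_small_key = bplus_tree_height(ROWS, key_bytes=8)    # e.g. an 8-byte integer id
height_wide_key = bplus_tree_height(ROWS, key_bytes=255)   # e.g. a long string key
```

Under these assumptions the 8-byte key needs three levels and the 255-byte key needs four, so every lookup on the wide key pays one extra page read.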
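### Clustered index (primary key)

"""

In InnoDB the clustered index is the primary key: the leaf nodes of the primary key B+ tree hold the full rows.

InnoDB uses two files per table (.frm and .ibd in MySQL 5.7); the primary key index is stored together with the data in the .ibd file.

MyISAM uses three files per table (.frm, .MYD, .MYI); the index is stored separately in its own file (.MYI).

"""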

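### Secondary index (unique, index)

Queries cannot always filter on the primary key; they may also filter on other columns such as name or password.

In that case the clustered index cannot be used. Depending on the situation, you can create secondary indexes on those columns (each is also a B+ tree).

"""

The leaf nodes store the primary key value of the matching row.

The secondary index is searched first to get the primary key value,

then the clustered index on the primary key is searched to fetch the row.

"""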

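### Covering index

The query gets everything it needs from the leaf nodes of the secondary index, so no lookup in the clustered index is required.

# With a secondary index on name

select name from user where name="jason";

# Not covered: balance is not stored in the index on name, so the row must be fetched from the clustered index

select balance from user where name="jason";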
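The difference between a covered and a non-covered query shows up in the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; SQLite's plan wording differs from MySQL's, where a covered query shows "Using index" in the Extra column of explain). The helper `plan` and the simplified `user` table are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.execute("CREATE INDEX idx_name ON user (name)")  # secondary index on name

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Only name is requested, and name lives in the index: no table lookup needed.
covered = plan("SELECT name FROM user WHERE name = 'jason'")

# balance is not in the index: the index finds the row, then the table is read.
not_covered = plan("SELECT balance FROM user WHERE name = 'jason'")
```

In the first plan SQLite reports a covering index; in the second it reports an ordinary index search followed by a row fetch, which mirrors the extra trip to the clustered index described above.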

7. Code for testing whether an index helps

#1. Prepare the table
  create table s1(
  id int,
  name varchar(20),
  gender char(6),
  email varchar(50)
  );
#2. Create a stored procedure that inserts records in bulk
  # use $$ as the statement terminator while the procedure is defined
  delimiter $$
  create procedure auto_insert1()
  BEGIN
      declare i int default 1;
      while(i<3000000)do
          insert into s1 values(i,"jason","male",concat("jason",i,"@oldboy"));
          set i=i+1;
      end while;
  END$$
  # restore the semicolon as the statement terminator
  delimiter ;
#3. Inspect the stored procedure
  show create procedure auto_insert1\G
#4. Call the stored procedure
  call auto_insert1();
# With no index on the table
  select * from s1 where id=30000;
# Use count to avoid the time spent printing rows
  select count(id) from s1 where id = 30000;
  select count(id) from s1 where id = 1;
# Make id the primary key
  alter table s1 add primary key(id);  # very slow
  select count(id) from s1 where id = 1;  # orders of magnitude faster than before the index existed
  select count(id) from s1 where name = "jason";  # still slow
# Range conditions
# Adding an index does not guarantee that every query on that column is fast
  select count(id) from s1 where id > 1;  # much slower than id = 1
  select count(id) from s1 where id > 1 and id < 3;
  select count(id) from s1 where id > 1 and id < 10000;
  select count(id) from s1 where id != 3;
  alter table s1 drop primary key;  # drop the primary key and study the name column on its own
  select count(id) from s1 where name = "jason";  # slow again
  create index idx_name on s1(name);  # create an index on the name column of s1
  select count(id) from s1 where name = "jason";  # still slow!
# Recall how a B+ tree works: it needs data with high selectivity, but every row in this table is "jason", so nothing can be told apart
# The tree degenerates into "a single stick"
  select count(id) from s1 where name = "xxx";
# This one is fast: with a single stick, the first mismatch means there is no need to go any further
  select count(id) from s1 where name like "xxx";
  select count(id) from s1 where name like "xxx%";
  select count(id) from s1 where name like "%xxx";  # slow: a leading wildcard defeats leftmost-prefix matching
# Columns with low selectivity should not be indexed
  drop index idx_name on s1;
# Create an ordinary index on id
  create index idx_id on s1(id);
  select count(id) from s1 where id = 3;  # fast
  select count(id) from s1 where id*12 = 3;  # slow: never wrap an indexed column in a calculation
  drop index idx_id on s1;
  select count(id) from s1 where name="jason" and gender = "male" and id = 3 and email = "xxx";
# For a chain of AND conditions like this, the written order does not matter: MySQL picks an indexed column with high selectivity first, shrinks the candidate set, and then checks the remaining conditions
  create index idx_name on s1(name);
  select count(id) from s1 where name="jason" and gender = "male" and id = 3 and email = "xxx";  # no speedup
  drop index idx_name on s1;
# Indexing low-selectivity columns such as name or gender does not speed up queries
  create index idx_id on s1(id);
  select count(id) from s1 where name="jason" and gender = "male" and id = 3 and email = "xxx";  # fast: id narrows the data down to a single row right away
  select count(id) from s1 where name="jason" and gender = "male" and id > 3 and email = "xxx";  # slow: the id condition still matches many rows, and each one must be checked against the other columns
  drop index idx_id on s1;
  create index idx_email on s1(email);
  select count(id) from s1 where name="jason" and gender = "male" and id > 3 and email = "xxx";  # fast: the email condition alone pins down the result
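#### Composite index
  select count(id) from s1 where name="jason" and gender = "male" and id > 3 and email = "xxx";
# If all four columns had high selectivity, indexing any one of them would speed up the query
# Index email, but the query does not use email
  select count(id) from s1 where name="jason" and gender = "male" and id > 3;
# Index name, but the query does not use name
  select count(id) from s1 where gender = "male" and id > 3;
# Index gender, but the query does not use gender
  select count(id) from s1 where id > 3;
# The problem: every column ends up with its own index, none of them gets used, and the build time is paid four times
  create index idx_all on s1(email,name,gender,id);  # leftmost-prefix rule: put the most selective column on the left
  select count(id) from s1 where name="jason" and gender = "male" and id > 3 and email = "xxx";  # faster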
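Why a composite index only helps when the leftmost columns are present can be illustrated with a sorted list. The sketch below is a toy model, not MySQL's implementation: the index is a Python list of `(email, name, gender, id)` tuples kept in sorted order, and the sample rows and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows, stored in the column order of idx_all: (email, name, gender, id).
rows = [
    ("c@oldboy", "jason", "male", 3),
    ("a@oldboy", "jason", "male", 1),
    ("b@oldboy", "egon", "male", 2),
    ("a@oldboy", "tank", "female", 4),
]
index = sorted(rows)  # entries are ordered by email first, then name, and so on

def seek_prefix(index, prefix):
    """Find entries whose leading columns equal `prefix`.

    A binary search jumps to the first match (the "seek"), then matching
    entries are read in order (the "scan"), because they sit next to each other.
    """
    n = len(prefix)
    pos = bisect_left(index, prefix)
    matches = []
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches

def scan_without_prefix(index, column, value):
    """Without the leading column the sort order is useless: check every entry."""
    return [entry for entry in index if entry[column] == value]

by_email = seek_prefix(index, ("a@oldboy",))                 # uses the sort order
by_email_name = seek_prefix(index, ("a@oldboy", "jason"))    # uses two leading columns
by_name_only = scan_without_prefix(index, 1, "jason")        # must look at all entries
```

Entries with the same email are adjacent, so a search on email (or email plus name) can stop early. Entries with the same name are scattered across the list, so a search on name alone gains nothing from the ordering.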

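8. The query optimization tool: explain

Execution plan: asks MySQL to estimate how it will execute a statement (usually accurate)
    type values, from slowest to fastest:
    all < index < range < index_merge < ref_or_null < ref < eq_ref < system/const
    id,email
    Slow:
        select * from userinfo3 where name="alex";
        explain select * from userinfo3 where name="alex";
        type: ALL (full table scan)
            select * from userinfo3 limit 1;  # still fast despite the full scan, because limit 1 stops at the first row
    Fast:
        select * from userinfo3 where email="alex";
        type: const (uses an index; const requires a primary key or unique index on the column)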

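9. Basic steps for optimizing a slow query

0. Run the query first to check whether it is really slow, and set SQL_NO_CACHE so the query cache does not hide the cost
1. Query each table on its own with the where conditions to find the table that returns the fewest rows. That is, apply every where clause of the statement starting from the table that returns the fewest rows, and query each column of a single table separately to see which column has the highest selectivity
2. Use explain to check whether the execution plan matches the expectation from step 1 (start from the table that narrows down to the fewest rows)
3. For SQL of the form order by ... limit, let the table being sorted be queried first
4. Understand how the business side uses the query
5. When adding indexes, follow the main principles of index design
6. Observe the result; if it does not meet expectations, start again from step 0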
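Step 1 asks which column has the highest selectivity. One common way to put a number on it is distinct values divided by total rows; in MySQL that could be measured with something like `select count(distinct email)/count(*) from s1;`. The sketch below computes the same ratio in plain Python over hypothetical sample data shaped like the s1 table from this post (1000 rows instead of three million, purely as an assumption to keep it small).

```python
def selectivity(values):
    """Distinct values divided by total rows.

    1.0 means every value is unique (ideal for an index);
    a value near 0 means almost all rows share the same value.
    """
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Hypothetical sample shaped like s1: name and gender never vary, id and email always do.
sample = [
    {"id": i, "name": "jason", "gender": "male", "email": f"jason{i}@oldboy"}
    for i in range(1, 1001)
]

selectivity_by_column = {
    column: selectivity(row[column] for row in sample)
    for column in ("id", "name", "gender", "email")
}

# Columns worth indexing first, most selective at the front.
ranked = sorted(selectivity_by_column, key=selectivity_by_column.get, reverse=True)
```

With this data id and email score 1.0 while name and gender score 0.001, which matches the observation earlier that indexing name or gender did not speed anything up.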

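10. Managing the slow query log

Slow query log
- What gets logged:
    - statements whose execution time > 10 (seconds)
    - statements that do not use an index
- Log file path
- Configuration:
    - In memory (takes effect immediately, lost on restart)
        show variables like "%query%";
        show variables like "%queries%";
        set global variable_name = value;
    - Configuration file
        mysqld --defaults-file="E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\my-default.ini"
        my.conf contents:
            slow_query_log = ON
            slow_query_log_file = D:/....
        Note: after changing the configuration file, the service must be restarted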