mysql笔记 - SELECT 语句
SELECT [ALL | DISTINCT | DISTINCTROW ] * [HIGH_PRIORITY] [STRAIGHT_JOIN] [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT] [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS] select_expr [, select_expr ...] [FROM table_references [WHERE where_condition] [GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]] [HAVING where_condition] [ORDER BY {col_name | expr | position} [ASC | DESC], ...] [LIMIT {[offset,] row_count | row_count OFFSET offset}] [PROCEDURE procedure_name(argument_list)] [INTO OUTFILE 'file_name' [CHARACTER SET charset_name] export_options | INTO DUMPFILE 'file_name' | INTO var_name [, var_name]] [FOR UPDATE | LOCK IN SHARE MODE]] select * from employees; -- 取出所有数据 desc employees; -- 查看表信息
通常会要求开发人员不要使用select * 即使是select所有字段推荐写上所有的列名,虽然很麻烦但是规范上推荐这样做
一个重要的原因是alter table会对表结构进行修改的很多操作,alter table后select * from取出的字段可能比一开始涉及的要多,可能会遇到各种各样的问题。另外select alter table可以加列,可以加在某个字段中间的位置,那么程序那边取得的数据并不是一开始想要的数据,容易出错。所以建议select的时候建议把所有的列都带上
LIMIT 限制取出来的数据量
不加LIMIT限制会一次取出所有的数据,数据量大的时候可能是灾难性的事情,通常都会加一个LIMIT,图形化工具一般默认会加上一个LIMIT
select * from employees limit 0,3; -- 从第0条开始取出3条数据 select * from employees limit 1,3; -- 从第1条开始取出3条数据
LIMIT常用于分页,但是LIMIT offset数字越大性能越差,为什么呢?
select * from employees limit 30; -- 取30条,扫30条,查询快(0.00 sec) select * from employees limit 1000000,30; -- 同样取30条,扫1000030行记录,查询变很慢(14.55 sec)
好的做法
select emp_no,birth_date,first_name,last_name,gender from employees limit 10; +--------+------------+------------+-----------+--------+ | emp_no | birth_date | first_name | last_name | gender | +--------+------------+------------+-----------+--------+ | 10001 | 1953-09-02 | Georgi | Facello | M | | 10002 | 1964-06-02 | Bezalel | Simmel | F | | 10003 | 1959-12-03 | Parto | Bamford | M | | 10004 | 1954-05-01 | Chirstian | Koblick | M | | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | | 10006 | 1953-04-20 | Anneke | Preusig | F | | 10007 | 1957-05-23 | Tzvetan | Zielinski | F | | 10008 | 1958-02-19 | Saniya | Kalloufi | M | | 10009 | 1952-04-19 | Sumant | Peac | F | | 10010 | 1963-06-01 | Duangkaew | Piveteau | F | +--------+------------+------------+-----------+--------+ select emp_no,birth_date,first_name,last_name,gender from employees where emp_no > 10010 limit 10; +--------+------------+------------+-------------+--------+ | emp_no | birth_date | first_name | last_name | gender | +--------+------------+------------+-------------+--------+ | 10011 | 1953-11-07 | Mary | Sluis | F | | 10012 | 1960-10-04 | Patricio | Bridgland | M | | 10013 | 1963-06-07 | Eberhardt | Terkki | M | | 10014 | 1956-02-12 | Berni | Genin | M | | 10015 | 1959-08-19 | Guoxiang | Nooteboom | M | | 10016 | 1961-05-02 | Kazuhito | Cappelletti | M | | 10017 | 1958-07-06 | Cristinel | Bouloucos | F | | 10018 | 1954-06-19 | Kazuhide | Peha | F | | 10019 | 1953-01-23 | Lillian | Haddadi | M | | 10020 | 1952-12-24 | Mayuko | Warwick | M | +--------+------------+------------+-------------+--------+
从之前取得数据的最大值开始取数据,这样的一个好处是不管你的值多大,因为是通过大于等于来定位的,所以只是取了10条数据不会再去扫这么多的数据
下面的语句表示从mysql中随机取出3条数据,select语句不带order by表示随机取3条数据
select emp_no,first_name,last_name from employees limit 3; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 10001 | Georgi | Facello | | 10002 | Bezalel | Simmel | | 10003 | Parto | Bamford | +--------+------------+-----------+
数据量小的时候不容易看出来,数据量大的时候容易发现这个问题,或者alter table一下后再试
alter table employees add index idx_name(first_name,last_name); select emp_no,first_name,last_name from employees limit 3; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 69256 | Aamer | Anger | | 486584 | Aamer | Armand | | 237165 | Aamer | Azevdeo | +--------+------------+-----------+
order by 表示根据哪个字段进行排序,可以按照物理数据顺序取出数据
select emp_no,first_name,last_name from employees order by emp_no limit 3; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 10001 | Georgi | Facello | | 10002 | Bezalel | Simmel | | 10003 | Parto | Bamford | +--------+------------+-----------+
查询分区表
select * from titles partition (p19) limit 1; +--------+--------------+------------+------------+ | emp_no | title | from_date | to_date | +--------+--------------+------------+------------+ | 10052 | Senior Staff | 2002-01-31 | 9999-01-01 | +--------+--------------+------------+------------+
ORDER BY 对取出来的数据做排序
order by根据某个字段进行分页
select emp_no,first_name,last_name from employees order by last_name limit 10; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 11761 | Bartek | Aamodt | | 15427 | Aluzio | Aamodt | | 18182 | Dekang | Aamodt | | 16572 | Matt | Aamodt | | 12791 | Mokhtar | Aamodt | | 12516 | Sreenivas | Aamodt | | 12982 | Sachem | Aamodt | | 17400 | Basim | Aamodt | | 19898 | Vidar | Aamodt | | 17885 | Takanari | Aamodt | +--------+------------+-----------+
WHERE 过滤数据
select emp_no,first_name,last_name from employees where emp_no = 10001; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 10001 | Georgi | Facello | +--------+------------+-----------+ select emp_no,first_name,last_name from employees where emp_no >= 10001 limit 10; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 10001 | Georgi | Facello | | 10002 | Bezalel | Simmel | | 10003 | Parto | Bamford | | 10004 | Chirstian | Koblick | | 10005 | Kyoichi | Maliniak | | 10006 | Anneke | Preusig | | 10007 | Tzvetan | Zielinski | | 10008 | Saniya | Kalloufi | | 10009 | Sumant | Peac | | 10010 | Duangkaew | Piveteau | +--------+------------+-----------+ select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like 'Am%' limit 10; +--------+------------+---------------+ | emp_no | first_name | last_name | +--------+------------+---------------+ | 100860 | Amabile | Aamodt | | 285669 | Amabile | Akiyama | | 276002 | Amabile | Albarhamtoshy | | 454340 | Amabile | Alencar | | 86625 | Amabile | Anger | | 416143 | Amabile | Antonisse | | 491486 | Amabile | Antonisse | | 451988 | Amabile | Apsitis | | 409363 | Amabile | Atchley | | 208844 | Amabile | Baar | +--------+------------+---------------+ select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like '%Am%' limit 10; +--------+------------+--------------+ | emp_no | first_name | last_name | +--------+------------+--------------+ | 20044 | Kiam | Gist | | 20062 | Uinam | Heuser | | 20114 | Ramalingam | Zyda | | 20118 | Mohammed | Schneeberger | | 20142 | Arumugam | Emmart | | 20159 | Isamu | Valiente | | 20167 | Stamatina | Kobara | | 20217 | Tamiya | Ruemmler | | 20265 | Amalendu | Willoner | | 20285 | Pramod | Escriba | +--------+------------+--------------+ select emp_no,first_name,last_name from employees where emp_no >= 20001 or first_name like '%Am%' limit 10; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 69256 | Aamer | Anger | | 486584 | Aamer | Armand | | 237165 | Aamer | Azevdeo | | 413688 | Aamer | Azuma | | 281363 | Aamer | Baak | | 242368 | Aamer | Baaleh | | 206549 | Aamer | Baar | | 259089 | Aamer | Baba | | 60922 | Aamer | Bahl | | 283280 | Aamer | Bahl | +--------+------------+-----------+
where 1 = 1表示没有任何条件全部成立,因为不知道第一个where应该怎么写,如果提前把where 1 = 1写好,这样程序那边拼接后面的条件就比较方便
select emp_no,first_name,last_name from employees where 1 = 1 limit 10; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 69256 | Aamer | Anger | | 486584 | Aamer | Armand | | 237165 | Aamer | Azevdeo | | 413688 | Aamer | Azuma | | 281363 | Aamer | Baak | | 242368 | Aamer | Baaleh | | 206549 | Aamer | Baar | | 259089 | Aamer | Baba | | 60922 | Aamer | Bahl | | 283280 | Aamer | Bahl | +--------+------------+-----------+
比如
select emp_no,first_name,last_name from employees where 1 = 1 and emp_no = 20000; +--------+------------+-----------+ | emp_no | first_name | last_name | +--------+------------+-----------+ | 20000 | Jenwei | Matzke | +--------+------------+-----------+
GROUP BY
分组就是根据某个字段当中相同的值来进行分组,并且分组完之后是要加一个聚集(统计)函数
use test; create table t (a int auto_increment primary key, b int, c int); insert into t select NULL,1,1; insert into t select NULL,1,7; insert into t select NULL,1,8; insert into t select NULL,2,8; insert into t select NULL,2,10; insert into t select NULL,3,8; select * from t; +---+------+------+ | a | b | c | +---+------+------+ | 1 | 1 | 1 | | 2 | 1 | 7 | | 3 | 1 | 8 | | 4 | 2 | 8 | | 5 | 2 | 10 | | 6 | 3 | 8 | +---+------+------+ select b,sum(b),avg(b) from t group by b; +------+--------+--------+ | b | sum(b) | avg(b) | +------+--------+--------+ | 1 | 3 | 1.0000 | | 2 | 4 | 2.0000 | | 3 | 3 | 3.0000 | +------+--------+--------+
HAVING
HAVING表示对group by中的聚合函数进行过滤
select b,sum(b),avg(b) from t where avb(b)>2 group by b; -- 这样是会报错的 ERROR 1305 (42000): FUNCTION test.avb does not exist
分组中需要对分组的条件进行过滤需要使用HAVING
select b,sum(b),avg(b) from t group by b having avg(b) > 2; +------+--------+--------+ | b | sum(b) | avg(b) | +------+--------+--------+ | 3 | 3 | 3.0000 | +------+--------+--------+
不是说使用了group by就代表不能使用where了,这样也是可以使用where进行过滤的,只不过where过滤的是非聚合的结果(对查询的记录进行过滤),而having是用来过滤聚合的结果
select b,sum(b),avg(b) from t where b < 3 group by b; +------+--------+--------+ | b | sum(b) | avg(b) | +------+--------+--------+ | 1 | 3 | 1.0000 | | 2 | 4 | 2.0000 | +------+--------+--------+
双重过滤
select b,sum(b),avg(b) from t where b < 3 group by b having avg(b) < 2; +------+--------+--------+ | b | sum(b) | avg(b) | +------+--------+--------+ | 1 | 3 | 1.0000 | +------+--------+--------+
所以where是一开始选数据的时候就开始过滤了,having是在聚集结果出来之后才开始过滤
having也可以过滤指定的一列而不是聚集函数,但是这样写没有什么意义,使用where可以获得更好的性能,所以having一般都跟聚集函数搭配使用
select b,sum(b),avg(b) from t where b < 3 group by b having b < 2; +------+--------+--------+ | b | sum(b) | avg(b) | +------+--------+--------+ | 1 | 3 | 1.0000 | +------+--------+--------+
select字段中没有选择的字段,having中使用会报错
select b,sum(b),avg(b) from t where b < 3 group by b having a < 2; ERROR 1054 (42S22): Unknown column 'a' in 'having clause'
这样的语句没有意义,因为取出来的a字段数据不一定是146,mysql会随机取a字段的数据,而且这种写法在orcal中好像不支持
select a,b,sum(b) from t group by b; +---+------+--------+ | a | b | sum(b) | +---+------+--------+ | 1 | 1 | 3 | | 4 | 2 | 4 | | 6 | 3 | 3 | +---+------+--------+
JOIN
基本的多表关联查询
SELECT concat(first_name,' ',last_name), dept_name -- concat连接了2个字段的字符串 FROM employees e, -- e是别名,下面3个雷同 dept_emp de, departments d WHERE e.emp_no = de.emp_no -- 关联条件 AND de.dept_no = d.dept_no -- 再一次关联条件 LIMIT 3; +----------------------------------+------------------+ | concat(first_name,' ',last_name) | dept_name | +----------------------------------+------------------+ | Mary Sluis | Customer Service | | Huan Lortz | Customer Service | | Basil Tramer | Customer Service | +----------------------------------+------------------+
新建一些例子
create table a (a int); create table b (a int); insert into a select 1; insert into a select 2; insert into a select 3; insert into b select 1; insert into b select 2; select * from a;select * from b; +------+ | a | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (0.00 sec) +------+ | a | +------+ | 1 | | 2 | +------+ 2 rows in set (0.03 sec)
最简单的两表关联,并不是2张表关联就会做笛卡尔积,如果关联的时候没有写where关联条件就会产生笛卡尔积
select * from a,b where a.a = b.a; +------+------+ | a | a | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+ select * from a,b; -- 这样会产生笛卡尔积 +------+------+ | a | a | +------+------+ | 1 | 1 | | 1 | 2 | | 2 | 1 | | 2 | 2 | | 3 | 1 | | 3 | 2 | +------+------+
三张表只有1个过滤条件会出现图示效果,会产生笛卡尔积的效果,出现了4条记录
create table c (a int); insert into c select 2; insert into c select 3; select * from a,b,c where a.a = b.a; -- 这样会产生笛卡尔积 +------+------+------+ | a | a | a | +------+------+------+ | 1 | 1 | 2 | | 2 | 2 | 2 | | 1 | 1 | 3 | | 2 | 2 | 3 | +------+------+------+
还可以这样写,关联条件可以非等值
select * from a,b where a.a < b.a; +------+------+ | a | a | +------+------+ | 1 | 2 | +------+------+
修改一下字段名先。。。
alter table a change a x int; alter table b change a y int; alter table c change a z int;
INNER JOIN
这3个写法没有区别
select * from a inner join b on a.x = b.y; select * from a join b on a.x = b.y; select * from a,b where a.x = b.y; -- 都返回 +------+------+ | x | y | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+
以下语句的关联,通常来说会选择关联过滤度高的条件进行关联,这里b.y > 1的关联度会高一些。
select * from a,b where a.x = b.y and b.y > 1; +------+------+ | x | y | +------+------+ | 2 | 2 | +------+------+
假设a表100w记录b表100w记录,其中的x,y一一对应,优化机通畅都会先从b>100 and b<200条件先进行过滤,这样就只需要100w和100条记录进行join就可以了,如果先从a.x = b.y进行过滤,那就是100w和100w的记录进行join,所以一般优化机是会把b>100 and b<200的过滤条件设置得更高
INNER JOIN的时候过滤条件即可写在on里面也可以写在where条件里面,对结果来说是没有区别的
select * from a inner join b on (a.x = b.y and b.y > 1); select * from a inner join b on a.x = b.y where b.y > 1; 都返回 +------+------+ | x | y | +------+------+ | 2 | 2 | +------+------+
OUTER JOIN
outer join 分成left/right两种,outer可以关键词可以省略,left/right join代表左/右外连接
左连接中左边的表为保留表,保留表中所有的字段都是要出现的,如果关联条件存在的话就是一对一的情况,右表中关联条件不存在则关联出来的结果就是NULL值
select * from a left join b on a.x = b.y; +------+------+ | x | y | +------+------+ | 1 | 1 | | 2 | 2 | | 3 | NULL | +------+------+ select * from a right join b on a.x = b.y; +------+------+ | x | y | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+
a表中存在的但是b表中不存在的数据,左连接实现
select * from a left join b on a.x = b.y where b.y is NULL; +------+------+ | x | y | +------+------+ | 3 | NULL | +------+------+
a表中但是不在b表中,不使用left join 使用not in + 子查询,不过会存在一些性能上的问题
select * from a where a.x not in (select y from b); +------+ | x | +------+ | 3 | +------+
这样的查询b.y会全部返回NULL,因为a.x = b.y and b.y is NULL不成立,所以b.y全部用NULL填充
select * from a left join b on a.x = b.y and b.y is NULL; +------+------+ | x | y | +------+------+ | 1 | NULL | | 2 | NULL | | 3 | NULL | +------+------+
过滤条件写在on后还是where后是有讲究的,推荐使用on来进行两张表之间的关联,where用来进行数据的过滤。
外连接中where不能写在on前面,内连接则可以,left join是用到比较多的一个方式
多次left join
select * from a left join b on a.x = b.y left join c on a.x = c.z; +------+------+------+ | x | y | z | +------+------+------+ | 2 | 2 | 2 | | 1 | 1 | NULL | | 3 | NULL | 3 | +------+------+------+ select * from a left join b on a.x = b.y left join c on b.y = c.z order by a.x; +------+------+------+ | x | y | z | +------+------+------+ | 1 | 1 | NULL | | 2 | 2 | 2 | | 3 | NULL | NULL | +------+------+------+
第二次join使用等值连接尝试
select * from a left join b on a.x = b.y inner join c on b.y = c.z order by a.x; +------+------+------+ | x | y | z | +------+------+------+ | 2 | 2 | 2 | +------+------+------+
一些例子
查询不是经理的员工
SELECT e.emp_no FROM employees e LEFT JOIN dept_manager dm ON e.emp_no = dm.emp_no WHERE dm.emp_no IS NULL LIMIT 3;
分页技巧
使用where而不是使用limit 10000,10 因为limit会扫10000前面的1w条数据
select * from employees where emp_no > 10000 limit 10;
根据生日分页
alter table employees add index idx_birth(birth_date,emp_no); -- 建立联合索引 select * from employees where birth_date > '1952-02-02' order by birth_date limit 10; +--------+------------+------------+-----------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-----------+--------+------------+ | 16093 | 1952-02-03 | Luise | Tramer | M | 1992-02-28 | | 16447 | 1952-02-03 | Zhiguo | Savasere | F | 1987-04-28 | | 23857 | 1952-02-03 | Kristen | Frijda | M | 1986-05-31 | | 27259 | 1952-02-03 | Michaela | Lipner | M | 1997-09-05 | | 32094 | 1952-02-03 | Mohd | Buchter | F | 1986-03-16 | | 32641 | 1952-02-03 | Tua | Tetzlaff | F | 1988-12-01 | | 41374 | 1952-02-03 | JiYoung | Schurmann | M | 1988-01-25 | | 43737 | 1952-02-03 | Debatosh | Beerel | F | 1987-12-07 | | 56225 | 1952-02-03 | Miquel | Rusmann | F | 1990-06-17 | | 59577 | 1952-02-03 | Adel | Swiler | F | 1986-11-27 | +--------+------------+------------+-----------+--------+------------+ -- 错误的结果,日期直接跳到了02-03 select * from employees where (birth_date,emp_no) > ('1952-02-02', 10010) order by birth_date limit 10; +--------+------------+------------+-------------+--------+------------+ | emp_no | birth_date | first_name | last_name | gender | hire_date | +--------+------------+------------+-------------+--------+------------+ | 12282 | 1952-02-02 | Tadahiro | Delgrange | M | 1997-01-09 | | 13944 | 1952-02-02 | Takahito | Maierhofer | M | 1989-01-18 | | 22614 | 1952-02-02 | Dung | Madeira | M | 1989-01-24 | | 29456 | 1952-02-02 | Barun | Krohm | F | 1992-11-23 | | 33131 | 1952-02-02 | Reinhold | Savasere | M | 1998-01-30 | | 40660 | 1952-02-02 | Piyush | Erbe | F | 1988-04-04 | | 48910 | 1952-02-02 | Zhongwei | DuBourdieux | M | 1999-12-19 | | 51486 | 1952-02-02 | Jianwen | Sigstam | F | 1989-07-20 | | 59884 | 1952-02-02 | Fan | Przulj | M | 1991-09-25 | | 61382 | 1952-02-02 | Kristof | Ranft | M | 1989-04-21 | +--------+------------+------------+-------------+--------+------------+ -- 正确的结果 select o_orderkey,o_orderstatus,o_totalprice from orders where(o_orderdate,o_orderkey) > ('1992-01-01',88199) order by o_orderdate limit 10; select o_orderkey,o_orderstatus,o_totalprice from orders order by o_orderdate limit 10,10; -- 结果一致
当然分页推荐使用redis来做,用数据库实现其实意义不大