一个比较清晰的SQL编写风格

1,683 阅读5分钟

前言


SQL (Structured Query Language) 是具有数据操纵和数据定义等多种功能的数据库语言,这种语言具有交互性特点,能为用户提供极大的便利,数据库管理系统应充分利用SQL语言提高计算机应用系统的工作质量与效率。

身边很多人工作中经常和SQL打交道, 可是每人的编写SQL风格都独树一帜。

刚好在githup上看到一个不错的编码风格, 在这里给大家推荐一下。

mattm/sql-style-guide

介绍


使用小写的SQL

小写SQL比大写SQL易读,而且不必一直按住shift键。

-- Good
select * from users

-- Bad
SELECT * FROM users

-- Bad
Select * From users

单行查询vs多行查询

以下情况最好将SQL写在同一行中:

  • 查询所有列(*)或者只查询1列或者两列
  • 查询语句没有额外复杂性
-- Good
select * from users

-- Good
select id from users

-- Good
select id, email from users

-- Good
select count(*) from users

这样做的原因很简单,当所有内容都在一行时,仍然很容易阅读。但一旦你开始添加更多的列或更复杂的代码,如果是多行代码就更容易阅读:

-- Good
select
    id,
    email,
    created_at
from users

-- Good
select *
from users
where email = 'example@domain.com

对于具有1或2列的查询,可以将这些列放在同一行上。对于3+列,将每个列名放在它自己的行上,包括第一项:

-- Good
select id, email
from users
where email like '%@gmail.com'

-- Good
select user_id, count(*) as total_charges
from charges
group by user_id

-- Good
select
    id,
    email,
    created_at
from users

-- Bad
select id, email, created_at
from users

-- Bad
select id,
    email
from users

向左对齐

-- Good
select id, email
from users
where email like '%@gmail.com'

-- Bad
select id, email
  from users
 where email like '%@gmail.com'

使用单引号

一些SQL方言,如BigQuery支持使用双引号,但是对于大多数方言,双引号最终将引用列名。因此,单引号更可取:

-- Good
select *
from users
where email = 'example@domain.com'

-- Bad
select *
from users
where email = "example@domain.com"

使用 != 而不用 <>

因为!=读起来像“不等于”,更接近我们大声说出来的方式。

-- Good
select count(*) as paying_users_count
from users
where plan_name != 'free'

-- Bad
select count(*) as paying_users_count
from users
where plan_name <> 'free'

逗号应该放在行尾

-- Good
select
    id,
    email
from users

-- Bad
select
    id
    , email
from users

缩进条件

当只有一个where条件时,将它保留在与where相同的行上:

select email
from users
where id = 1234

当有多个缩进时,将每个缩进比where更深一层。将逻辑运算符放在前一个条件的末尾:

select id, email
from users
where 
    created_at >= '2019-03-01' and 
    vertical = 'work'

避免括号内的空格

-- Good
select *
from users
where id in (1, 2)

-- Bad
select *
from users
where id in ( 1, 2 )

in值的长列表分成多个缩进行

-- Good
select *
from users
where email in (
    'user-1@example.com',
    'user-2@example.com',
    'user-3@example.com',
    'user-4@example.com'
)

表名应该是名词的复数形式

-- Good
select * from users
select * from visit_logs

-- Bad
select * from user
select * from visitLog

列名应该是xxx_xxx格式

-- Good select id, email, timestamp_trunc(created_at, month) as signup_month from users

-- Bad select id, email, timestamp_trunc(created_at, month) as SignupMonth from users

列名的约定

  • 布尔字段的前缀应该是is_、has_或does_。例如,is_customer、has_unsubscribe等。
  • 仅限日期的字段应该以_date作为后缀。例如,report_date。
  • Date+time字段应该以_at作为后缀。例如,created_at、posted_at等。

列排序约定

首先放置主键,然后是外键,然后是所有其他列。如果表中有任何系统列(created_at、updated_at、is_deleted等),那么将它们放在最后。

-- Good
select
    id,
    name,
    created_at
from users

-- Bad
select
    created_at,
    name,
    id,
from users

inner join 处理

最好显式,以便连接类型非常清楚:

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on users.id = charges.user_id

-- Bad
select
    email,
    sum(amount) as total_revenue
from users
join charges on users.id = charges.user_id

对于join条件,将首先引用的表放在on之后

-- Good
select
    ...
from users
left join charges on users.id = charges.user_id
-- primary_key = foreign_key --> one-to-many --> fanout
  
select
    ...
from charges
left join users on charges.user_id = users.id
-- foreign_key = primary_key --> many-to-one --> no fanout

-- Bad
select
    ...
from users
left join charges on charges.user_id = users.id

单个连接条件应该与连接位于同一行

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on users.id = charges.user_id
group by email

-- Bad
select
    email,
    sum(amount) as total_revenue
from users
inner join charges
on users.id = charges.user_id
group by email

当你有多个连接条件时,把它们放在各自的缩进行:

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on 
    users.id = charges.user_id and
    refunded = false
group by email

除非必要,否则不要包含表名

-- Good
select
    id,
    name
from companies

-- Bad
select
    companies.id,
    companies.name
from companies

始终重命名聚合和函数包装的参数

-- Good
select count(*) as total_users
from users

-- Bad
select count(*)
from users

-- Good
select timestamp_millis(property_beacon_interest) as expressed_interest_at
from hubspot.contact
where property_beacon_interest is not null

-- Bad
select timestamp_millis(property_beacon_interest)
from hubspot.contact
where property_beacon_interest is not null

在布尔条件需显示写出

-- Good
select * from customers where is_cancelled = true
select * from customers where is_cancelled = false

-- Bad
select * from customers where is_cancelled
select * from customers where not is_cancelled

as 重命名列名

-- Good
select
    id,
    email,
    timestamp_trunc(created_at, month) as signup_month
from users

-- Bad
select
    id,
    email,
    timestamp_trunc(created_at, month) signup_month
from users

group by 时写出列名而不要用序号

-- Good
select user_id, count(*) as total_charges
from charges
group by user_id

-- Bad
select
    user_id,
    count(*) as total_charges
from charges
group by 1

分组的列首先显示出来

-- Good
select
  timestamp_trunc(com_created_at, year) as signup_year,
  count(*) as total_companies
from companies
group by signup_year

-- Bad
select
  count(*) as total_companies,
  timestamp_trunc(com_created_at, year) as signup_year
from companies
group by signup_year

调整对齐case-when语句

每个when应该在它自己的行上(case行上没有任何内容),并且应该比case行缩进更深一层。then可以在同一条线上,也可以在它下面的自己的线上。

-- Good
select
    case
        when event_name = 'viewed_homepage' then 'Homepage'
        when event_name = 'viewed_editor' then 'Editor'
    end as page_name
from events

-- Good too
select
    case
        when event_name = 'viewed_homepage'
            then 'Homepage'
        when event_name = 'viewed_editor'
            then 'Editor'
    end as page_name
from events

-- Bad 
select
    case when event_name = 'viewed_homepage' then 'Homepage'
        when event_name = 'viewed_editor' then 'Editor'
    end as page_name
from events

尽量使用CTEs, 避免使用子查询

-- Good
with ordered_details as (

    select
        user_id,
        name,
        row_number() over (partition by user_id order by date_updated desc) as details_rank
    from billingdaddy.billing_stored_details

),

final as (

    select user_id, name
    from ordered_details
    where details_rank = 1

)

select * from final

-- Bad
select user_id, name
from (
    select
        user_id,
        name,
        row_number() over (partition by user_id order by date_updated desc) as details_rank
    from billingdaddy.billing_stored_details
) ranked
where details_rank = 1

使用有意义的CTE名称

-- Good
with ordered_details as (

-- Bad
with d1 as (

使用over()函数

你可以把它放在同一行上,也可以根据它的长度把它分成多行:

-- Good
select
    user_id,
    name,
    row_number() over (partition by user_id order by date_updated desc) as details_rank
from billingdaddy.billing_stored_details

-- Good
select
    user_id,
    name,
    row_number() over (
        partition by user_id
        order by date_updated desc
    ) as details_rank
from billingdaddy.billing_stored_details

喜欢可以关注公众号: 终身幼稚园

终身幼稚园