Rails ActiveRecord 的三种数据预加载形式 - includes, preload, eager_load

michael_roshen · 2014年10月22日 · 最后由 wenbohan 回复于 2018年04月26日 · 6962 次阅读
本帖已被设为精华帖!

== Rails中的三种数据预加载方式

在试用rails和ActiveRecord的时候,你可能已经非常熟练的试用includes来进行数据 的预加载了,但是你知道为什么有时候你会得到一个非常小非常漂亮的sql,有时候却得到 一个非常大的sql吗?你知道preload和eager_load哪一个可以帮助你达到同样的效果吗? 你知道这些预加载的处理方式在rails4中有什么变化吗?如果你还不知道,那么今天这篇 文章来告诉你那些你不知道的预加载方式

== 数据准备

class User < ActiveRecord::Base
  attr_accessible :email, :name
  has_many :addresses, dependent: :destroy
end
class Address < ActiveRecord::Base
  attr_accessible :city, :country, :postal_code, :street, :user_id
  belongs_to :user
end

== db seed

rob = User.create!(name: "Robert", email: "robert@example.org")
bob = User.create!(name: "Bobert", email: "robert@exampleb.org")

rob.addresses.create!(country: "poland", city: "wrocal", postal_code: "22", street: "Rynek")
rob.addresses.create!(country: "Paris", city: "wrocal2", postal_code: "2542", street: "8 rue chambiges")
bob.addresses.create!(country: "Bobert", city: "wroceal", postal_code: "22232", street: "tiergarten")

== rails3

当你要试用预加载这个特性的时候,通常会选择includes方法

  1. 不是用includes方法

    users = User.all
    users[0].addresses
    users[1].addresses
    

    SELECT "users".* FROM "users" SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" = 1 SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" = 2

  2. 使用includes方法,当查找某个user的地址时就不需要再做查询

    users = User.includes(:addresses)
    

    SELECT "users".* FROM "users" SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2)

users[0].addresses
users[1].addresses

那么另外两个方法是干什么用的呢?

users = User.preload(:addresses)

SELECT "users".* FROM "users" SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1, 2) 可以看出来,preload方法的执行效果和includes非常相似, 那么还有什么区别吗?继续往下看吧

现在来看看eager_load

users = User.eager_laod(:addresses)

SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id"

eager_load与前两种方法完全不同,最神秘的是rails有两种方式来进行数据的预加载 我们会得出下面的结论

  1. 通过拆分sql查询语句,就像 includes preload一样
  2. 使用一条查询语句,通过left join来获取数据

实际上,当我们要使用preload时,意味着我们要拆分查询语句 如果使用eager_load ,则会生成一条left join查询语句, 那includes是干什么的呢?includes则是rails用来动态选择两种 方式的一个方法,取决于查询条件,下面让我们看一下includes什么情况下会 代理eager_load生成一条查询语句

User.includes(:addresses).where("addresses.country = ?", "Poland")
User.eager_load(:addresses).where("addresses.country = ?", "Poland")

SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id" WHERE (addresses.country = 'Poland')

那么当使用preload会发生什么情况呢?

User.preload(:addresses).where("addresses.country = ?", "poland")

SELECT "users".* FROM "users" WHERE (addresses.country = 'poland') SQLite3::SQLException: no such column: addresses.country rails会抛出异常,因为preload没有进行left join

你可能会想,这段代码的真正意图是什么?

  1. 获取所有地址为poland的用户,并预加载conntry为poland的addresses
  2. 获取所有地址为poland的用户,并预加载这些用户所有的addresses
  3. 获取所有用户和conntry为poland的addresses ruby User.includes(:addresses).where("addresses.country = ?", "poland") 你知道着段代码对应的是哪个目标吗? 没错,第一个 那我们来看一下是否可以达到第二个和第三个

==获取所有地址为poland的用户,并预加载这些用户关联的所有addresses 首先获取地址为poland的用户

User.joins(:addresses).where("addresses.country = ?", "Poland")

然后去预加载着些用户所对应的addresses, 使用includes

r = User.joins(:addresses).
    where("addresses.country = ?", "Poland").
    includes(:addresses)
r[0].addresses

[ #

] 这个结果显然不是我们想要的,因为它只预加载了rob的一个地址,而我们要的 是rob所对应的所有地址

对比一下输出结果,发现只不过是join的方式不同,第一个是inner join, 第二个是left join 但是对最终结果没有影响

r = User.joins(:addresses).
    where("addresses.country = ?", "Poland").
    includes(:addresses)
r = User.includes(:addresses).where("addresses.country = ?", "poland")

下面看一些preload的执行效果

u = User.joins(:addresses).
    where("addresses.country = ?", "poland").
    preload(:addresses)
u[0].addresses

SELECT "users".* FROM "users" INNER JOIN "addresses" ON "addresses"."user_id" = "users"."id" WHERE (addresses.country = 'Poland') SELECT "addresses".* FROM "addresses" WHERE "addresses"."user_id" IN (1)

[

, ] ] 这个才是我们想要的结果

==获取所有用户,并预加载这些用户关联的所有addresses 在关联关系中添加条件过滤

class User < ActiveRecord::Base
  attr_accessible :email, :name
  has_many :addresses, dependent: :destroy
  has_many :polish_addresses, conditions: {country: "poland"}, 
            class_name: "Address"
end

###preload & includes

r = User.preload(:polish_addresses)
r = User.includes(:polish_addresses)
 SELECT "users".* FROM "users" 
 SELECT "addresses".* FROM "addresses" WHERE "addresses"."country" = 'Poland' AND "addresses"."user_id" IN (1, 2)

r[0].polish_addresses

 [
   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
 ] 

r[1].polish_addresses

 []

###eager_load

r = User.eager_load(:polish_addresses)
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, 
        "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
 FROM "users" 
 LEFT OUTER JOIN "addresses" 
 ON "addresses"."user_id" = "users"."id" AND "addresses"."country" = 'Poland'

r[0].polish_addresses
 [
   #<Address id: 1, user_id: 1, country: "Poland", street: "Rynek", postal_code: "55-555", city: "Wrocław", created_at: "2013-12-08 11:26:50", updated_at: "2013-12-08 11:26:50">
 ]
r[1].polish_addresses
 []

注意:预加载的是polish_addresses,而不是addresses

== Rails4

Rails4现在鼓励使用新的lambda语法用于定义带条件的关联关系 因为在类加载的时候,关联关系只被加载了一次,新的语法用于解决这个问题

class User < ActiveRecord::Base
  attr_accessible :email, :name
  has_many :addresses, dependent: :destroy
  has_many :polish_addresses, -> { where(country: 'poland')},
            class_name: "Address"
end

像下面定义scope的时候,Time.now应该在每次类加载的时候都重新获取系统时间 在开发环境下,它是自动重新加载的,所以在开发环境下不会发现这个问题,现在更倾向于 后面两种写法

# Bad
scope :from_the_past, where("happens_at <= ?", Time.now) 

# OK
scope :from_the_past, -> { where("happens_at <= ?", Time.now) }

# OK
def self.from_the_past
  where("happens_at <= ?", Time.now)
end

rails4为关联关系和scope提供了一致的语法,防止我们的应用出现这样的bug

不带条件的预加载和rails3效果是一样的

User.includes(:addresses)
User.preload(:addresses)
User.eager_load(:addresses)

稍微有点区别的是带有条件的预加载, 在使用includes的时候,需要我们自己来 指定关联的表,rails不再帮我们处理,通过references(:addresses)指定

User.includes(:addresses, :places).
where("addresses.country = ?", "Poland").
references(:addresses)

和rails3一样,也可以使用eager_load来做同样的事情,但是不需要用 references(:addresses)指定关联表

User.eager_load(:addresses).where("addresses.country = ?", "Poland")

使用preload也和rails3一样,会抛出异常,找不到关联表

User.preload(:addresses).where("addresses.country = ?", "Poland")

在rails4中下面两种写法与rails3一样

User.joins(:addresses).
where("addresses.country = ?", "Poland").
preload(:addresses)

User.preload(:polish_addresses)

==总结 1.rails3 includes: 代理preload和eager_load,取决过滤条件中是否出现关联表 preload: 用分解了的查询语句获取数据 eager_load: 通过left join合成一条查询语句进行加载 2.rails4 在rails4中,使用includes时,如果过滤条件中出现关联字段,则需要使用 references来指定

微信:ruby程序员 博文:http://michael-roshen.iteye.com/blog/2146511 原文:http://blog.arkency.com/2013/12/rails4-preloading/

共收到 27 条回复

长知识了~~ :thumbsup:

好东西,收藏!

细节的东西才是精华~

长知识了

好厉害!

写的好清晰 之前也有一个类似的帖子 赞一个

#6楼 @sun528 最近在做rails升级,碰到了,顺便整理一下

唔,看一遍留个印象,等我用到了再细细考究,谢谢啦

rails4 不能使用attr_accessible 了吧

#9楼 @rdongxie 嗯,是的,去掉了,我再Gemfile中加了protected_attributes这个gem包。

11楼 已删除

ORM的东西确实很烦恼。需要限制查询语句数量的时候我都直接用join了。

#12楼 @sickate 各有利弊吧,没有orm对于sql不熟练的更烦恼,哈哈。

#13楼 @michael_roshen 是的。我说的join也是用activerecord提供的join,不是plain sql。有了你这篇文章以后用include和eager_load更有把握了。

受益匪浅呀,真没注意这些细节!

很赞,不过建议题目改一下,改成 ActiveRecord 的三种数据预加载形式

我看到标题的时候,以为是关于 Rails 4 类加载的 eager loading 和 lazy loading,以及 Rails 4 针对 Ruby 的 Autoload 的扩展和解决 concurrency 问题,看完才明白内容全部是针对 ActiveRecord。

弱弱问一句,Rails4中的includes或者eager_load语句可不可以像joins那样支持自定义话的join语句?

#19楼 @simonykq eager_load和includes都是通过外建关联做的预加载,我猜是不支持join,我看了mongoid的源代码是这样的,rails源代码我没看,只看了执行语句和执行结果,你可以自己实验一下

class User < ActiveRecord::Base attr_accessible :email, :name, ** : birthplace** has_many :addresses, dependent: :destroy has_many :polish_addresses, -> { where(country: 'poland')}, class_name: "Address" end

就这个类我想请教一下问题:

对于 has_many :polish_addresses, -> { where(country: 'poland')}, class_name: "Address" 这句话中country: 'poland' 等于一个string。 但我想如何找出country等于该user的birthplace的地址? 类似于: has_many :polish_addresses, -> { where(country: self.birthplace )}, class_name: "Address" 但该self显示不是user instance

#21楼 @rainsz User.eager_load(:addresses).where("addresses.country = users.birthplace") ?

#22楼 @michael_roshen (User.eager_load(:addresses).where("addresses.country = users.birthplace")).first.address 这样做是可以达到目的。但也失去的has_many :polish_addresses 的存在价值,如果能实现:User.first.polish_addresses 再好不过了。

#23楼 @rainsz 你是要查一个出生地=地址的user? 那就写一个实例方法吧,self.birthplace就可以用了,做一个join就加个where条件过滤一下,放在has_many里的话..我写不出来

#24楼 @michael_roshen 我的想法就是查询出某个user包含了birthplace的所有addresses,比如我的bithplace是北京,那么找出我所有address中包含北京的地址。 当然用其它方法总是可以获取到,只是上次也遇到了相同的问题,一直想在has_many 中去实现,也没有成功。

你的这个topic总结的特别好,引发了关于缓存的思考。

好东西

你好,问下includes可以指定需要查询的项吗, 就是指定select?

需要 登录 后方可回复, 如果你还没有账号请点击这里 注册