How OpenStack integrates with Ceph


What's Ceph

Ceph uniquely delivers object, block, and file storage in one unified system.

According to the official description, Ceph is a unified system that provides object storage, block storage, and file system services, with good performance, reliability, and scalability.

Ceph originated from a paper by Sage Weil (Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn: "Ceph: A Scalable, High-Performance Distributed File System." OSDI 2006: 307-320). It was later contributed to the open source community and, after years of development, is now supported by many cloud vendors and widely deployed.

Ceph 101

Whether you want to provide object storage or block storage services in a cloud platform, or deploy a Ceph File System, every Ceph storage cluster deployment starts with Ceph Nodes, a network (such as a storage network), and the Ceph storage cluster itself.

A Ceph storage cluster contains the following components (or services):

  • Ceph Monitor - ceph-mon, maintains maps of the cluster state, including the monitor map, manager map, OSD map, and CRUSH map. These maps are critical pieces of cluster state that the Ceph daemons need in order to coordinate with each other; monitors are also responsible for authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
  • Ceph Manager - ceph-mgr, keeps track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager also hosts Python modules that manage and expose cluster information, including the Ceph Dashboard and the REST API. At least two managers are required for high availability.
  • Ceph OSD (Object Storage Daemon) - ceph-osd, stores data, handles data replication, recovery, and rebalancing, and provides monitoring information to the Ceph Monitors and Managers by checking the heartbeats of other Ceph OSD daemons. At least three Ceph OSDs are normally required for redundancy and high availability.
  • Ceph Metadata Server (required when running Ceph File System clients) - ceph-mds, stores metadata on behalf of the Ceph File System (Ceph Block Device and Ceph Object Storage do not use MDS). The Ceph Metadata Servers allow POSIX file system users to run basic commands (such as ls and find) without placing that load on the Ceph storage cluster.

Ceph stores data as objects in logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group should contain an object, and then which Ceph OSD should store that placement group. The CRUSH algorithm is what enables a Ceph storage cluster to scale, rebalance, and recover dynamically.
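
For illustration, you can ask the monitors which placement group and OSDs Ceph has chosen for a given object, which is what the ceph osd map CLI does. The sketch below issues the same request through the rados Python bindings; the pool name volumes and the object name foo are placeholders, the JSON argument names follow the osd map mon command, and it assumes an admin keyring is available on the host.

import json
import rados

# Minimal sketch: ask the monitors where an object is (or would be) placed.
with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    cmd = json.dumps({'prefix': 'osd map', 'pool': 'volumes',
                      'object': 'foo', 'format': 'json'})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    # The reply contains the pool id, the PG id and the acting set of OSDs.
    print(json.loads(outbuf))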

The Ceph Storage Cluster is the foundation of every Ceph deployment. A RADOS-based Ceph storage cluster consists of two kinds of daemons: Ceph OSD Daemons, which store data as objects on storage nodes, and Ceph Monitors, which maintain a master copy of the cluster map. The Ceph File System, Ceph Object Storage, and Ceph Block Device all read data from and write data to the Ceph Storage Cluster.

The main concepts in Ceph storage are:

  • Pools - Ceph stores data in pools, which are logical partitions. A pool controls the number of placement groups, the number of replicas, and the CRUSH rule. To store data in a pool you must supply user credentials and complete authentication; for more operations, see Pools.
  • Placement Groups - Ceph maps objects to placement groups. A placement group is a shard or fragment of a logical object pool that places its objects as a group onto OSDs. Placement groups were introduced to distribute and locate data more effectively.
  • CRUSH Maps - CRUSH is the data distribution algorithm used by Ceph, similar to consistent hashing, and it places data where it is expected to be.

Ceph installation

Installing and deploying a Ceph cluster is usually fairly involved; refer to the official installation guide for the details.

Architecture

This section gives a rough overview of the Ceph architecture and its basic components. In the standard architecture diagram, RADOS forms the bottom layer, librados sits on top of it, and RADOSGW, RBD, and CephFS are built on top of librados.

The stack contains the following components (or service interfaces):

  • RADOS - RADOS is the foundation of Ceph: a reliable, autonomous, distributed object store built from self-healing, self-managing, intelligent storage nodes. See: RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters.
  • Librados - the library exposed by RADOS that lets applications talk to RADOS directly, with bindings for C, C++, Java, Python, Ruby, and PHP (a small Python example follows this list).
  • RADOSGW - the RADOS gateway, abbreviated RADOSGW or RGW: Ceph's object storage service, whose RESTful API is compatible with S3 and Swift.
  • RBD - the RADOS block device: Ceph's block storage service, supporting resizing, thin provisioning, snapshots, and cloning. Ceph supports both a kernel object (KO) client and QEMU hypervisors that use librbd directly, which avoids the kernel overhead of a virtualized system.
  • CephFS - the Ceph File System: a POSIX-compliant file system, usable with mount or as a file system in user space (FUSE).
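
To make the librados layer concrete, here is a minimal sketch, using the Python bindings that the Code section below relies on, that writes and reads an object directly in RADOS without going through RBD, RGW, or CephFS. The pool and object names are placeholders; it assumes /etc/ceph/ceph.conf and a usable keyring on the local host.

import rados

# Minimal sketch: store and fetch a raw RADOS object through librados.
with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('volumes') as ioctx:
        ioctx.write_full('hello-object', b'hello rados')  # create or overwrite the object
        print(ioctx.read('hello-object'))                 # read it back
        ioctx.remove_object('hello-object')               # clean up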

RBD interfaces and tools

As described above, the block storage service uses librbd, built on top of librados, to provide the API through which Ceph clients interact with RBD. The main interfaces and tools are the following:

cephx

cephx is enabled by default and provides user authentication and authorization. So when accessing Ceph with the rbd command, we need to supply a user name or ID together with a keyring file, as in the following commands:

rbd --id {user-ID} --keyring=/path/to/secret [commands]
rbd --name {username} --keyring=/path/to/secret [commands]

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --keyring=/etc/ceph/ceph.client.glance.keyring ls images
01b9995a-e212-42f7-b11f-1beda23a24b8

To specify the pool at the same time, use --pool <pool_name> or -p <pool_name>:

[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images --keyring=/etc/ceph/ceph.client.glance.keyring ls
01b9995a-e212-42f7-b11f-1beda23a24b8

If an ID is specified, the keyring file can be omitted; by default the keyring of that user is looked up in the standard directory:

[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
01b9995a-e212-42f7-b11f-1beda23a24b8
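
The same cephx credentials apply when using the Python bindings: the client name maps to rados_id and the keyring can be supplied as a configuration override. A minimal sketch, mirroring the rbd --id glance --keyring=... ls images command above:

import rados
import rbd

# Minimal sketch: authenticate as client.glance with an explicit keyring.
cluster = rados.Rados(rados_id='glance', conffile='/etc/ceph/ceph.conf')
cluster.conf_set('keyring', '/etc/ceph/ceph.client.glance.keyring')
cluster.connect()
ioctx = cluster.open_ioctx('images')
print(rbd.RBD().list(ioctx))  # same output as `rbd --id glance ls images`
ioctx.close()
cluster.shutdown()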

pool

  • Initialize a block device pool (the pool itself is created first with ceph osd pool create; see Create pools below):
rbd pool init <pool-name>
  • Create a block device user:
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'

image

  • Create an image (a librbd sketch of these image operations follows at the end of this subsection):
rbd create --size {megabytes} {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance create --size 1024 images/f-test
  • List images, and list images that have been moved to trash:
rbd ls
rbd ls {poolname}
rbd trash ls 
rbd trash ls {poolname}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
f-test
  • Show image information:
rbd info {image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
  • Resize an image:
rbd resize --size 2048 foo (to increase)
rbd resize --size 2048 foo --allow-shrink (to decrease)

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images resize --size 2048 f-test
Resizing image: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 2GiB in 512 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images resize --size 1024 f-test --allow-shrink
Resizing image: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test
rbd image 'f-test':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e756a86b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 14:36:47 2019
  • Remove an image, or move it to trash:
rbd rm {image-name}
rbd rm {pool-name}/{image-name}
# move an image to trash
rbd trash mv {pool-name}/{image-name}
# remove an image that is already in the trash
rbd trash rm {pool-name}/{image-id}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash mv f-test
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash ls
e756a86b8b4567 f-test
  • Restore an image from trash:
rbd trash restore {image-id}
rbd trash restore {pool-name}/{image-id}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash restore e756a86b8b4567
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images trash ls
[root@gd02-control-11e115e64e13 ~]#
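
The image operations above can also be driven from Python through librbd, which is what Nova's rbd_utils does in the Code section later. Below is a minimal sketch of the create/list/resize/remove cycle; the cinder user, the volumes pool, and the image name are placeholders.

import rados
import rbd

# Minimal sketch of the image life cycle, equivalent to `rbd create/ls/resize/rm`.
with rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('volumes') as ioctx:
        rbd.RBD().create(ioctx, 'f-test-librbd2', 1 * 1024 ** 3)  # rbd create --size 1024
        print(rbd.RBD().list(ioctx))                              # rbd ls
        with rbd.Image(ioctx, 'f-test-librbd2') as image:
            image.resize(2 * 1024 ** 3)                           # rbd resize --size 2048
            print(image.size())
        rbd.RBD().remove(ioctx, 'f-test-librbd2')                 # rbd rm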

snapshot

  • Create a snapshot of the specified image (a librbd sketch of this snapshot workflow follows at the end of this subsection):
rbd snap create {pool-name}/{image-name}@{snap-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap create f-test@f-test-snap

Because the pool was already specified with --pool in the command, the pool name is not repeated in the image argument.

  • List the snapshots of an image:
rbd snap ls {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap ls f-test
SNAPID NAME        SIZE TIMESTAMP
    67 f-test-snap 1GiB Thu Dec  5 15:24:17 2019
  • Roll back a snapshot. Rolling back means overwriting the current image with the data from the snapshot. The time this takes grows with the size of the image, so it is usually faster to clone a new image from the snapshot than to roll the image back:
rbd snap rollback {pool-name}/{image-name}@{snap-name}
  • Delete a snapshot:
rbd snap rm {pool-name}/{image-name}@{snap-name}
  • Delete all snapshots of an image:
rbd snap purge {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images snap purge f-test
Removing all snapshots: 100% complete...done.

For more on image layering, see rbd snapshot layering; copy-on-write deserves its own write-up later, so let's continue with snapshots here.

  • Protect/unprotect a snapshot. Snapshots are protected because clones need access to their parent snapshot: if a user accidentally deletes the parent snapshot, all of its clones break. To avoid data loss, a snapshot must be protected before it can be cloned:
rbd snap protect {pool-name}/{image-name}@{snapshot-name}
rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}
  • Clone a snapshot. Note that the clone must specify the pool name of the target image:
rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images clone f-test@f-test-snap images/f-test-child
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images ls
f-test
f-test-child
  • List the children of a snapshot:
rbd children {pool-name}/{image-name}@{snapshot-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance -p images children f-test@f-test-snap
images/f-test-child
  • Flatten a cloned image. A cloned image keeps a reference to its parent snapshot; removing that reference is called flattening, which copies the data of the parent snapshot into the clone. The time this takes grows with the size of the snapshot. To delete the parent snapshot, the child images must be flattened first:
rbd flatten {pool-name}/{image-name}

e.g:
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test-child
rbd image 'f-test-child':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e766cf6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 16:32:29 2019
	parent: images/f-test@f-test-snap
	overlap: 1GiB
[root@gd02-control-11e115e64e13 ~]# rbd --id glance -p images flatten f-test-child
Image flatten: 100% complete...done.
[root@gd02-control-11e115e64e13 ~]# rbd --id glance --pool images info f-test-child
rbd image 'f-test-child':
	size 1GiB in 256 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.e766cf6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	flags:
	create_timestamp: Thu Dec  5 16:32:29 2019
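
The snapshot, protect, clone, and flatten workflow above maps directly onto librbd calls. Here is a minimal sketch, using placeholder snapshot and clone names so it does not collide with the objects created in the CLI examples (a rollback would be image.rollback_to_snap(...) in the same style):

import rados
import rbd

# Minimal sketch: snapshot -> protect -> clone -> flatten, then clean up the snapshot.
with rados.Rados(rados_id='glance', conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('images') as ioctx:
        with rbd.Image(ioctx, 'f-test') as image:
            image.create_snap('f-test-snap2')       # rbd snap create
            image.protect_snap('f-test-snap2')      # rbd snap protect
        # rbd clone images/f-test@f-test-snap2 images/f-test-child2
        rbd.RBD().clone(ioctx, 'f-test', 'f-test-snap2', ioctx, 'f-test-child2')
        with rbd.Image(ioctx, 'f-test-child2') as child:
            child.flatten()                         # rbd flatten
        with rbd.Image(ioctx, 'f-test') as image:
            image.unprotect_snap('f-test-snap2')    # rbd snap unprotect
            image.remove_snap('f-test-snap2')       # rbd snap rm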

Integrations

In OpenStack, libvirt configures QEMU to interface with librbd, which is how the Ceph block storage service is consumed; the stack is roughly OpenStack services -> libvirt/QEMU -> librbd -> librados -> RADOS.

Ceph block storage is used in three places in OpenStack:

  • The image service, Glance, which manages VM images; images are immutable.
  • The volume service, Cinder, which manages the block devices of VMs.
  • Guest disks, i.e. the block devices of the VM itself, including the system disk, config drive, ephemeral disks, and so on. By default these are stored on the compute node under /var/lib/nova/instances/<uuid>; previously, the only way to keep guest disks on Ceph was to boot the VM from a volume and let Cinder manage them. Now, by setting images_type=rbd, Nova can store guest disks directly on Ceph, which makes live migration and evacuation very easy to perform.

So let's look at how Nova, Cinder, and Glance integrate with Ceph.

Configuration

Create pools

By default, Ceph block devices use the rbd pool, but it is recommended to create a separate pool for each service, for example:

ceph osd pool create volumes
ceph osd pool create images
ceph osd pool create backups
ceph osd pool create vms

For more configuration, such as setting the number of placement groups, refer to the official documentation.

A newly created pool must be initialized before it can be used (you do not necessarily need all four pools below; follow your actual production deployment):

rbd pool init volumes
rbd pool init images
rbd pool init backups
rbd pool init vms

Also, to see which pools exist in the cluster, run ceph osd lspools on a Ceph admin node:

[root@hb02-other-10e114e194e61 ~]# ll /etc/ceph/
total 12
-rw------- 1 ceph ceph  71 Mar 15  2019 ceph.client.admin.keyring
-rw-r--r-- 1 root root 722 Mar 15  2019 ceph.conf
-rw-r--r-- 1 root root  92 Jan 31  2019 rbdmap
[root@hb02-other-10e114e194e61 ~]# ceph osd lspools
1 images,2 backups,3 volumes,4 .rgw.root,5 default.rgw.control,6 default.rgw.meta,7 default.rgw.log,8 default.rgw.buckets.index,9 default.rgw.buckets.data,

Configure OpenStack Ceph clients

The nodes running glance-api, cinder-volume, cinder-backup, and nova-compute all act as Ceph clients, so each of them needs the ceph.conf configuration file:

ssh {your-openstack-server} sudo tee /etc/ceph/ceph.conf </etc/ceph/ceph.conf

For example, the ceph.conf of a compute node in our environment:

[global]
fsid = a7849998-270b-40d0-93e8-6d1106a5b799
public_network = ****
cluster_network = ****
mon_initial_members =****
mon_host = ****
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

rbd_cache = false
# BEGIN ANSIBLE MANAGED BLOCK
[client]
rbd cache = false
rbd cache writethrough until flush = false
cache size =  67108864
rbd cache max dirty = 0
rbd cache max dirty age = 0
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/qemu/qemu-guest-$pid.log
rbd concurrent management ops = 20
# END ANSIBLE MANAGED BLOCK

Note that we have not enabled the RBD cache.
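
A quick way to confirm what a client actually picked up from the [client] section is to ask librados for the effective value, using the same conf_get call that Nova's RBDDriver uses later for rbd_default_features. A minimal sketch:

import rados

# Minimal sketch: verify the effective client-side option values.
with rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf') as cluster:
    print(cluster.conf_get('rbd_cache'))             # expected to be 'false' with the config above
    print(cluster.conf_get('rbd_default_features'))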

Install the required packages, such as the Python bindings and the command-line tools:

sudo yum install python-rbd
sudo yum install ceph-common

Set up client authentication:

ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'

Add the keyring files of client.cinder, client.glance, and client.cinder-backup to the appropriate nodes:

# ceph auth get-or-create client.glance | ssh {your-glance-api-server} sudo tee /etc/ceph/ceph.client.glance.keyring
# ssh {your-glance-api-server} sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
# ceph auth get-or-create client.cinder | ssh {your-volume-server} sudo tee /etc/ceph/ceph.client.cinder.keyring
# ssh {your-cinder-volume-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
# ceph auth get-or-create client.cinder-backup | ssh {your-cinder-backup-server} sudo tee /etc/ceph/ceph.client.cinder-backup.keyring
# ssh {your-cinder-backup-server} sudo chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring

Copy the cinder keyring to the nova-compute nodes as well:

ceph auth get-or-create client.cinder | ssh {your-nova-compute-server} sudo tee /etc/ceph/ceph.client.cinder.keyring

The key of the client.cinder user also needs to be stored in libvirt, because the libvirt process uses it to access the block devices served by Cinder (the key is typically registered with libvirt using virsh secret-define and virsh secret-set-value):

ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key

Then, to integrate Ceph into Nova, the following configuration options are required:

# /etc/nova/nova.conf

[libvirt]
images_type = rbd
images_rbd_pool = volumes
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder

Code

The sections above only covered, at the configuration level, how OpenStack integrates Ceph. Let's now look at the code and see how Nova interacts with Ceph.

First of all, OpenStack operates on RBD through the librbd Python module.

Connect to RADOS and open an IO context:

import rados
import rbd

cluster = rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf')
cluster.conf_set('key','AQBkfUldodI2NBAAJcf7VnwebGWc1YNH0Njisg==')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

Instantiate an RBD object and create an image:

rbd_inst = rbd.RBD()
size = 1 * 1024**3  # 1 GiB
rbd_inst.create(ioctx, 'f-test-librbd', size)

To perform I/O on the image, for example writing 600 bytes of data (note that data must be a byte string, not unicode):

image = rbd.Image(ioctx, 'f-test-librbd')
data = 'foo' * 200
image.write(data, 0)

Finally, close the image, the IO context, and the RADOS connection:

image.close()
ioctx.close()
cluster.shutdown()

The complete script looks like this:

import rados
import rbd

cluster = rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf')
cluster.conf_set('key','AQBkfUldodI2NBAAJcf7VnwebGWc1YNH0Njisg==')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')

rbd_inst = rbd.RBD()
size = 1 * 1024**3  # 1 GiB
rbd_inst.create(ioctx, 'f-test-librbd', size)

image = rbd.Image(ioctx, 'f-test-librbd')
data = 'foo' * 200
image.write(data, 0)

image.close()
ioctx.close()
cluster.shutdown()

After running it, inspect the image with the rbd command:

[root@gd02-compute-11e115e64e11 fan]# rbd --id cinder -p volumes info f-test-librbd
2019-12-05 20:03:10.064 7fdcbbfa8b00 -1 asok(0x5633c89f3290) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.cinder.4171166.94780409205944.asok': (2) No such file or directory
rbd image 'f-test-librbd':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: e76d524885b8e9
	block_name_prefix: rbd_data.e76d524885b8e9
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features:
	flags:
	create_timestamp: Thu Dec  5 19:59:46 2019
	access_timestamp: Thu Dec  5 19:59:46 2019
	modify_timestamp: Thu Dec  5 19:59:46 2019

To be safe, each of the close calls can be wrapped in a finally block:

cluster = rados.Rados(conffile='my_ceph_conf')
try:
    cluster.connect()
    ioctx = cluster.open_ioctx('my_pool')
    try:
        rbd_inst = rbd.RBD()
        size = 4 * 1024**3  # 4 GiB
        rbd_inst.create(ioctx, 'myimage', size)
        image = rbd.Image(ioctx, 'myimage')
        try:
            data = 'foo' * 200
            image.write(data, 0)
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

In addition, the Rados, Ioctx, and Image classes can be used as context managers, which close themselves automatically, for example:

with rados.Rados(conffile='my_ceph.conf') as cluster:
    with cluster.open_ioctx('mypool') as ioctx:
        rbd_inst = rbd.RBD()
        size = 4 * 1024**3  # 4 GiB
        rbd_inst.create(ioctx, 'myimage', size)
        with rbd.Image(ioctx, 'myimage') as image:
            data = 'foo' * 200
            image.write(data, 0)

API Reference

The full rados and rbd Python APIs are documented in the official Ceph documentation.

Nova

In Nova, the file nova/nova/virt/libvirt/storage/rbd_utils.py wraps a set of common rbd helpers, such as the RBDDriver and the RADOSClient used to connect to Ceph:

class RbdProxy(object):
    """A wrapper around rbd.RBD class instance to avoid blocking of process.

    Offloads all calls to rbd.RBD class methods to native OS threads, so that
    we do not block the whole process while executing the librbd code.

    """

    def __init__(self):
        self._rbd = tpool.Proxy(rbd.RBD())

    def __getattr__(self, attr):
        return getattr(self._rbd, attr)


class RBDVolumeProxy(object):
    """Context manager for dealing with an existing rbd volume.

    This handles connecting to rados and opening an ioctx automatically, and
    otherwise acts like a librbd Image object.

    The underlying librados client and ioctx can be accessed as the attributes
    'client' and 'ioctx'.
    """
    def __init__(self, driver, name, pool=None, snapshot=None,
                 read_only=False):
        client, ioctx = driver._connect_to_rados(pool)
        try:
            self.volume = tpool.Proxy(rbd.Image(ioctx, name,
                                                snapshot=snapshot,
                                                read_only=read_only))
        except rbd.ImageNotFound:
            with excutils.save_and_reraise_exception():
                LOG.debug("rbd image %s does not exist", name)
                driver._disconnect_from_rados(client, ioctx)
        except rbd.Error:
            with excutils.save_and_reraise_exception():
                LOG.exception(_("error opening rbd image %s"), name)
                driver._disconnect_from_rados(client, ioctx)

        self.driver = driver
        self.client = client
        self.ioctx = ioctx

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        try:
            self.volume.close()
        finally:
            self.driver._disconnect_from_rados(self.client, self.ioctx)

    def __getattr__(self, attrib):
        return getattr(self.volume, attrib)


class RBDDriver(object):

    def __init__(self, pool, ceph_conf, rbd_user, rbd_key=None):
        self.pool = pool
        # NOTE(angdraug): rados.Rados fails to connect if ceph_conf is None:
        # https://github.com/ceph/ceph/pull/1787
        self.ceph_conf = ceph_conf or ''
        self.rbd_user = rbd_user or None
        self.rbd_key = rbd_key or None
        if rbd is None:
            raise RuntimeError(_('rbd python libraries not found'))

    # Connect to RADOS and return the client and the ioctx
    def _connect_to_rados(self, pool=None):
        client = rados.Rados(rados_id=self.rbd_user,
                                  conffile=self.ceph_conf)
        if self.rbd_key:
            client.conf_set('key', self.rbd_key)
        try:
            client.connect()
            pool_to_open = pool or self.pool
            # NOTE(luogangyi): open_ioctx >= 10.1.0 could handle unicode
            # arguments perfectly as part of Python 3 support.
            # Therefore, when we turn to Python 3, it's safe to remove
            # str() conversion.
            ioctx = client.open_ioctx(str(pool_to_open))
            return client, ioctx
        except rados.Error:
            # shutdown cannot raise an exception
            client.shutdown()
            raise
    #...

class RADOSClient(object):
    """Context manager to simplify error handling for connecting to ceph."""
    def __init__(self, driver, pool=None):
        self.driver = driver
        self.cluster, self.ioctx = driver._connect_to_rados(pool)

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        self.driver._disconnect_from_rados(self.cluster, self.ioctx)

    @property
    def features(self):
        features = self.cluster.conf_get('rbd_default_features')
        if ((features is None) or (int(features) == 0)):
            features = rbd.RBD_FEATURE_LAYERING
        return int(features)

It also constructs rbd.RBD() and rbd.Image() objects to carry out operations on volumes and images, for example:

# RbdProxy wraps an rbd.RBD() object for volume operations. nova.virt.libvirt.storage.rbd_utils.RBDDriver#cleanup_volumes
    def cleanup_volumes(self, filter_fn):
        with RADOSClient(self, self.pool) as client:
            volumes = RbdProxy().list(client.ioctx)
            for volume in filter(filter_fn, volumes):
                self._destroy_volume(client, volume)

# RBDVolumeProxy actually wraps an rbd.Image() object to create snapshots. nova.virt.libvirt.storage.rbd_utils.RBDDriver#create_snap
    def create_snap(self, volume, name, pool=None, protect=False):
        """Create a snapshot of an RBD volume.

        :volume: Name of RBD object
        :name: Name of snapshot
        :pool: Name of pool
        :protect: Set the snapshot to "protected"
        """
        LOG.debug('creating snapshot(%(snap)s) on rbd image(%(img)s)',
                  {'snap': name, 'img': volume})
        with RBDVolumeProxy(self, str(volume), pool=pool) as vol:
            vol.create_snap(name)
            if protect and not vol.is_protected_snap(name):
                vol.protect_snap(name)

Reference

  1. Ceph official documentation
  2. 理解Ceph (Understanding Ceph)
  3. Ceph from scratch