Storing data at scale isn’t like saving a file on your hard drive. It requires a software manager to keep track of all the bits that make up your company’s files. That’s where distributed storage management packages like Ceph and Gluster come into play.
Ceph and Gluster are both systems used for managing distributed storage. Both are considered software-defined storage, meaning they’re largely hardware-agnostic. They organize the bits that make up your data using their own underlying infrastructure, which is what defines this choice: what underlying framework do you want supporting your data?
That’s a decision you want to make based on the type of data you’re storing, how that data is accessed, and where that data lives. Ceph and GlusterFS are both good choices, but their ideal applications are subtly different.
Object-Based Storage for Unstructured Data: Ceph
Ceph is an object-based system, meaning it manages stored data as objects rather than as a file hierarchy, spreading binary data across the cluster. Similar object storage methods are used by Facebook to store images and Dropbox to store client files. In general, object storage supports massive unstructured data, so it’s well suited to large-scale data storage. The system is maintained by a network of daemons: cluster monitors, metadata servers, and journaled object storage daemons. Together these make Ceph capable, but also more complex than the competition.
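To make the object model a little more concrete, here is a minimal Python sketch of spreading objects across a cluster by hashing their names. It is a simplified stand-in, not Ceph’s actual placement logic (Ceph uses its own algorithm, CRUSH), and the node names and the `place_object` function are hypothetical.

```python
import hashlib


def place_object(name: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct nodes for an object by hashing its name.

    Simplified illustration only: real systems such as Ceph use a more
    sophisticated, topology-aware placement algorithm.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # Hash the flat object name; there is no directory tree to walk.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take consecutive nodes (wrapping around) so the copies are distinct.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Hypothetical four-node cluster.
NODES = ["node-a", "node-b", "node-c", "node-d"]
placement = place_object("photos/2024/cat.jpg", NODES)
print(placement)
```

The point of the sketch is that any client can compute where an object lives from its name alone, without consulting a central directory tree.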
Because its object and block-based model is uncommon, Ceph uses its own tools for managing storage, and system administrators have to become familiar with them. The inner workings of Ceph can be hard to grasp at first glance, so you need to be willing to learn how it works to gain the benefits. In return, the self-managed, self-healing system can reduce ongoing operating expenses over time, and Ceph can run on industry-standard server hardware.
The system can also create block storage, providing access to block device images that can be striped and replicated across the cluster. Applications can access Ceph Object Storage through a RESTful interface that supports the Amazon S3 and OpenStack Swift APIs. The goal is high performance, massive storage, and compatibility with legacy code.
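Striping a block device image can be pictured as cutting it into fixed-size pieces that are then stored as separate objects. The following Python sketch shows only that idea, under stated assumptions: the `stripe` and `reassemble` helpers and the 300-byte object size are made up for illustration and are not Ceph’s API or defaults.

```python
def stripe(image: bytes, object_size: int) -> list[bytes]:
    """Split a block device image into fixed-size chunks (hypothetical helper).

    The last chunk may be shorter than `object_size`.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return [image[i:i + object_size] for i in range(0, len(image), object_size)]


def reassemble(objects: list[bytes]) -> bytes:
    """Rebuild the original image by concatenating the chunks in order."""
    return b"".join(objects)


# A toy 1024-byte "image" cut into 300-byte objects (illustrative sizes only).
image = bytes(range(256)) * 4
objects = stripe(image, object_size=300)
print(len(objects), [len(chunk) for chunk in objects])
```

Each chunk could then be placed and replicated independently across the cluster, which is what lets reads and writes to one image be served by many machines at once.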
File Storage in Hierarchical Trees: GlusterFS
GlusterFS, better known as Gluster, is a more traditional file store. It’s easy to set up, and a properly compiled build can be used on any system that has a folder. That flexibility and ease of use are major advantages of the system. While it can scale to enormous capacities, performance tends to degrade quickly as it does. It’s best suited for large average file sizes (greater than 4 MB) and sequential access. A cluster can spread across physical, virtual, and cloud servers, allowing for flexible storage virtualization.
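One rough way to judge whether a data set fits that profile is to measure its average file size. The Python sketch below walks a directory tree and compares the average against the 4 MB figure mentioned above. It assumes 4 MB means 4 × 1024 × 1024 bytes, and the function names are hypothetical. It says nothing about access patterns, which you would need to assess separately.

```python
import os

# Assumption: "4 MB" is interpreted as 4 * 1024 * 1024 bytes.
LARGE_FILE_THRESHOLD = 4 * 1024 * 1024


def average_file_size(root: str) -> float:
    """Return the mean size in bytes of regular files under `root` (0.0 if none)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append(os.path.getsize(path))
    return sum(sizes) / len(sizes) if sizes else 0.0


def favors_large_files(root: str) -> bool:
    """True when the average file size exceeds the 4 MB rule of thumb."""
    return average_file_size(root) > LARGE_FILE_THRESHOLD
```

For example, `favors_large_files("/srv/media")` (a hypothetical path) would return `True` for a directory of multi-gigabyte video files and `False` for a tree of small configuration files.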
Gluster stores data as files on the local file systems of the servers in the cluster, arranging them in familiar hierarchical trees. It aggregates geographically distinct servers over Ethernet to create a scalable parallel network file system. Clients typically reach it through FUSE or NFS, which gives most system administrators a familiar architecture to work with. It’s intended to be simple, maintainable, and widely usable, but it doesn’t have the speed of access that Ceph can offer under the right circumstances.
Ceph is best suited to rapid access of unstructured data, which constitutes the vast majority of files in the world. Gluster is better for sequential data access, like streaming video, or for applications where speed isn’t as important, like backup.
Which file storage system are you using?