Digital Storage Projections For 2021, Part 3
The great pandemic of 2020 taught us about the importance of digital connections and compute, memory and networking in data centers that enabled us to work from home. The trends toward remote work created during the pandemic will continue (to some extent) even after the pandemic has ended. According to a recent survey report from INAP, “54% say… the pandemic has motivated their organization to move applications and workloads off-premise. IT pros also shared the biggest challenges they expect to face in 2021: adapting infrastructure and networking strategies for long-term remote work or a return to the office.”
Some industries were particularly impacted by the pandemic and had to scramble to find ways to stay in business, aided by storage technology. For instance, live video production was shut down or significantly constrained throughout much of 2020. According to David Feller, VP of product management and solutions engineering at Spectra Logic, “The pandemic has resulted in media and entertainment (one of the industries hardest hit by COVID-19), becoming almost completely dependent on reusing pre-existing content, making access to archive absolutely critical.”
The move to support new working models and provide ready access to archives, plus the growth of machine-to-machine data generation and data movement (with the growth of IoT, AI and other big data applications), will drive many trends in the development of digital storage and memory systems. This third and last part of my Digital Storage Projections for 2021 blog will explore trends in storage/memory systems in 2021 and beyond.
We will explore developments in general storage architectures (NAS, SAN and object storage), NVMe and NVMe-oF (over fabric), as well as CXL and GenZ and talk about how these technologies will enable disaggregated and composable data centers. We will also discuss computational storage and in-memory processing and the general growth of distributed processing using special and general-purpose accelerators. Finally, we will look at developments and needs for storage security to address ransomware and other cybersecurity attacks. We will include some quotes from industry players as well.
NAS (where data is accessed as files) is generally associated with unstructured data such as video and medical images, while SAN is generally associated with defined (structured) workloads such as databases. High-performance scalable NAS also has an important role in storage for AI and big data analytics applications. However, object storage, the most common storage in large data centers, is also used for unstructured data. Scale-out NAS is sometimes able to compete with object storage scalability. In addition, many software platforms designed for file-level access give some advantage to NAS, or to file access through gateways or other approaches for reaching object storage. We expect growth in both object storage and scale-out NAS in 2021.
Object storage in data center applications is often used in containerized workflows that can be spun up or spun down as needed. Creating and managing persistent storage for containerized applications is becoming popular, with many different tools available. Some of these tools are also part of storage solutions that can span multiple hybrid data centers or cloud storage. According to Zadara, “The need to move data from on-premises to cloud or between clouds is a challenge that can be overcome with containers – and having a common platform will enable containerization to go to new levels in 2021”. 2021 will see growth in storage and memory developments for containerized applications as well as multi-cloud environments.
Although cloud storage is growing, it is not entirely displacing on-premises storage. According to Jon Toor, CMO for Cloudian, “All public cloud providers now offer on-prem solutions, which positions public cloud and on-prem as environments that should work in combination, rather than being viewed as an either/or decision. In addition, enterprise storage providers have upped their cloud game, building new solutions that work with the public cloud rather than competing with it. As both sides move towards the center, the inevitable result is that organizations will come to view public cloud and on-prem as two sides of the enterprise storage coin.” In addition to traditional on-prem and cloud storage, the rise of IoT and higher-speed wireless networks means that more storage and processing at the network edge and at endpoints will become common in 2021 and beyond.
Storage tiering that optimizes storage cost and performance puts only the most frequently accessed data on the fastest (usually SSD) storage devices. As the amount of data grows, the amount of data in accessible, or active, archives will increase. According to Shawn O. Brume, Global Hypergrowth Storage OM at IBM, “Only 33% of IT budgets shrunk as a result of the 2020 challenges, most budgets aligned spending to mobile access and security enhancements. While some infrastructure changes are on hold, storage requirements continue to grow. Active Archives are the most efficient method of reducing cost of infrastructure without recognizing penalties. The growth created in cloud storage during the pandemic will lead to an increased spend on active archives by Hyperscale and Hyperscale-lite storage providers.”
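As a rough illustration of the tiering idea described above, the following minimal sketch places data on a tier according to how recently it was read. The tier names, the `Item` type and the day thresholds are hypothetical choices made for this example and do not come from any particular product.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
HOT_DAYS = 7        # read within the last week: keep on the fastest tier
ACTIVE_DAYS = 365   # read within the last year: keep in an active archive


@dataclass
class Item:
    name: str
    days_since_access: int


def choose_tier(item: Item) -> str:
    """Return a tier name based on how recently the item was read."""
    if item.days_since_access <= HOT_DAYS:
        return "ssd"
    if item.days_since_access <= ACTIVE_DAYS:
        return "active_archive"
    return "cold_archive"


def place(items):
    """Group item names by the tier each one is assigned to."""
    tiers = {"ssd": [], "active_archive": [], "cold_archive": []}
    for item in items:
        tiers[choose_tier(item)].append(item.name)
    return tiers
```

Real tiering software typically weighs more signals than recency (access frequency, object size, retrieval cost), but the structure is the same: a policy maps each piece of data to the cheapest tier that still meets its access needs.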
NVMe is now the dominant storage interface for new SSD storage systems. NVMe runs on the PCIe physical bus, so its performance will continue to advance with future PCIe generations. NVMe over fabrics such as Fibre Channel and Ethernet is allowing wider use of remote direct memory access (RDMA) of various sorts, as well as helping to enable storage pooling and server disaggregation (including distributed computing approaches, often called computational storage, discussed below). NVMe-based storage solutions will continue to dominate in 2021, with faster connectivity as PCIe Gen4 and early Gen5 buses become available. The figure below shows how disaggregated and composable infrastructure differs from traditional converged infrastructure.
Comparison of converged versus composable infrastructure
Intel Presentation, August 2020
The CXL interface also runs on the PCIe bus and, unlike the traditional memory channel on computers, supports memories with differing characteristics (heterogeneous memories), for instance Intel’s Optane as well as DRAM, and possibly other memories as well. CXL is being touted as providing the means to disaggregate and pool memory and will also lead to distributed processing close to memory, often called in-memory compute.
The CXL initiative has also joined with the GenZ initiative, with a general consensus on using CXL inside a box and GenZ for box-to-box and rack-to-rack connectivity. CXL and GenZ will likely remain in the early development and demonstration phase through 2021, with early products available in 2022 or later. The figure below, from a VMware presentation at the 2020 Persistent Memory Summit, shows CXL or GenZ used for persistent memory pooling serving a couple of servers.
Use of CXL and GenZ to pool heterogeneous memory
Image from VMware Presentation at the 2020 PM Summit
Computational storage drives (CSDs) bring processing closer to where the data is stored, either in the storage media itself or in the media controllers. In addition, adding intelligence to a storage network, e.g. with Mellanox’s latest generation of SmartNICs, provides new distributed processing options. Moving processing closer to the storage reduces the time to process data and also reduces the energy consumed in moving data between processors and storage. All these computational storage approaches use NVMe and NVMe over fabrics. The image below from SNIA shows various approaches to bringing processing closer to the stored data.
Examples of computational storage
Image from SNIA
Computational storage was originally developed to off-load storage-centric functions from a computer system's CPU. These initial uses include data compression, data encryption and RAID control. However, computational storage with more sophisticated processors can perform more advanced data analysis functions closer to the data, and thus faster than if the host CPU were used for this analysis.
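To make the data-movement argument concrete, here is a toy sketch that counts how many bytes cross the bus when a filter runs on the host versus on the drive. It is a simulation only: the function names and the byte-counting model are assumptions made for this example and do not represent any real CSD interface.

```python
def host_side_filter(records, predicate):
    """Conventional path: every record crosses the bus, then the host CPU filters."""
    bytes_moved = sum(len(r) for r in records)
    matches = [r for r in records if predicate(r)]
    return matches, bytes_moved


def drive_side_filter(records, predicate):
    """Computational storage path: the drive filters, so only matches cross the bus."""
    matches = [r for r in records if predicate(r)]
    bytes_moved = sum(len(r) for r in matches)
    return matches, bytes_moved
```

Both functions return the same matches. The difference is in the second value: the host-side path always moves the whole data set, while the drive-side path moves only the records that satisfy the predicate, so the saving grows as the filter becomes more selective.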
Computational storage can use ASICs or FPGAs as domain-specific processors (or accelerators). SNIA defines fixed computational storage services and programmable computational storage services. The fixed services perform dedicated functions such as data compression, data encryption or simple RAID management. The programmable version can run a host operating system, usually Linux, and is more flexible in the local processing that it can provide. In 2021 we expect to see greater use of storage systems with programmable computational storage services and continued use of fixed services in both hyperscale and enterprise applications.
In addition to changes in basic storage hardware, storage software will increasingly use AI tools for storage management. According to Shridar Subramanian, CMO of StorageCraft, “Cutting-edge storage tools increasingly rely on AI and machine learning to automate the data backup process. Given the exploding size of enterprise data, these intelligent tools will become vital for maintaining an efficient backup process that can quickly and effortlessly react to changing requirements while saving untold hours on manual backups.”
With many employees working at home, organizations with large amounts of valuable data have been more vulnerable to various hacking and malware attacks. Among these are ransomware attacks, where the user’s data is encrypted and is only released (if at all) after a ransom is paid. Storage companies have responded by pointing out that having an air gap between some storage and the networked storage, such as with magnetic tape backup, can prevent the loss of large amounts of data during an attack.
Not having sensitive data on accessible hardware is an important way to protect that data. Drew Daniels, CISO and CIO of Druva, said, “A strong data protection architecture will be key to ensure endpoints aren’t cluttered unnecessarily with sensitive or confidential data like PII. Instead, the focus should be on backing up such data, and then restoring it temporarily at a future time, if and when required.”
Other solutions are WORM (write once, read many) and immutable data backups, where encryption and other changes to the data are prevented. According to Cathleen Southwick, CIO for Pure Storage, “In 2021, regulatory bodies will enact stricter data privacy laws for consumer protection, and CIOs will make security their topmost priority and invest in confidential computing, the virtual private cloud, and other secure infrastructure.”
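The write-once behavior behind WORM and immutable backups can be sketched in a few lines. This is an in-memory toy under stated assumptions: the `WormStore` class and its method names are hypothetical, and real products enforce immutability in firmware, the file system or the object store rather than in application code.

```python
class WormStore:
    """Write-once, read-many store: a key can be written once and never changed."""

    def __init__(self):
        self._data = {}

    def write(self, key: str, value: bytes) -> None:
        # Reject any attempt to overwrite, which is what ransomware encryption would need.
        if key in self._data:
            raise PermissionError(f"{key!r} is immutable and cannot be overwritten")
        self._data[key] = bytes(value)

    def read(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        # Deletion is refused as well, so a backup cannot be silently removed.
        raise PermissionError("deletion is not permitted on a WORM store")
```

Because an attacker who gains write access still cannot replace the stored copy with an encrypted one, the original backup remains readable after an attack.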
More generally, since data is kept on digital storage, improving the security of data movement is an important area of development, as is finding ways to assure that no backdoor is built into the hardware. Control of the component supply chain and ensuring the provenance of components using technologies such as distributed ledgers will be an important element in storage system security. In addition, there is much work going on to establish roots of trust in storage components and storage systems. We think that, after the security breaches and malware attacks in 2020, 2021 will see a major emphasis on developing more comprehensive storage system security.
2021 will see continued growth of storage in the data center, on-premises, at the edge and at endpoints to support remote work and the growth of IoT and AI workloads. NVMe and NVMe-oF will be commonly used in storage systems, with continuing development of CXL, GenZ and other memory fabrics. Computational storage will demonstrate useful applications, and storage security will be a big focus to protect valuable institutional data.