Processing Moves To The Edge
Definitions vary by
market and by vendor, but an explosion of data requires more processing to be
done locally.
April
12th, 2018 - By: Kevin Fogarty
Edge computing is
evolving from a relatively obscure concept into an increasingly complex
component of a distributed computing architecture, in which processing is being
shifted toward end devices and satellite data facilities and away from the
cloud.
Edge computing has
gained attention in two main areas. One is the industrial
IoT, where it serves as a do-it-yourself infrastructure for on-site data
centers. The second involves autonomous vehicles, where there is simply not
enough time to ask the cloud for solutions.
But ask two people
to describe it and you are likely to get two very different answers. On one
hand, it is understood well enough that it can be used in satellite IIoT data
centers and in machine learning-enabled iPhones. On the other, most of those
designing it can’t say what it looks like.
Much of the
confusion stems from the fact that edge computing is not a technology. It’s
more of a technological coping mechanism. It represents a series of efforts to
deal with the exponential growth of data from billions of endpoint devices by
digesting at least some of that data wherever it is created. That requires
building massive compute performance into everything from sensors to smart
phones, and all of this has to happen within an even tighter power budget.
“We are moving to an
intelligent edge,” said Lip-Bu Tan, president and CEO of Cadence, in a recent speech. “This is going to be a new era for semiconductors. We want
data at our fingertips to be able to make decisions on the fly.”
This approach stands
in stark contrast to the general consensus of several years ago, which held that simple sensors would collect data from the physical world and send it to the cloud for processing. That concept failed to take into account that the amount of data
being collected by sensors is growing too large to move around quickly. The
best solution is to pre-process that data, because most of it is useless.
“The IoT represents
an exponential increase in the number of devices in the world, and the amount
of data generated by these devices could swamp the data center’s ability to
process it,” according to Steven Woo, distinguished inventor and vice president
of enterprise solutions technology at Rambus. “It’s
likely you can do aggregation, filtering and some rudimentary processing,
depending on how complex your computations are.”
This is the growing
responsibility of edge devices. But how the edge evolves, and how quickly,
depends upon the readiness of end markets that will drive it. So while the edge
began taking off last year in the IIoT, it is still on hold in the automotive space
because it’s not clear at this point how quickly fully autonomous vehicles will
begin to ramp up.
“If there isn’t an
immediate production target, you might get away with something that’s a lot
less advanced,” said Ty Garibay, CTO at ArterisIP. “You
might be able to aggregate this kind of functionality into multiple smaller
chips made by different companies. There will be LiDAR, radar, and possibly a
sensor fusion hub, which may be an FPGA. And
then you might need enough compute power for the car controller, which also may
have to figure out which data to process and what to send back to the cloud.
The question now is how you make it smart enough to send back the right data.”
What is the edge?
Many chipmakers and
systems companies struggle with the variety of ways it is possible to shift
computing to the edge. There are no demarcation lines between the many levels
that may or may not be included in this distributed computing model.
“There is a lot of
difference of opinion on the point of what the edge looks like,” according to
Jeff Miller, product marketing manager at Mentor, a Siemens
Business. “The cloud is where the really high-powered machine learning or
computational resources will continue to be, but bandwidth to get it there is
expensive and shared spectrum is a finite resource. So just streaming all that
data to the cloud from thousands of devices without some pre-processing at the
edge is not practical.”
It doesn’t help that
the language and explanations vary among carriers, networking
providers, integrators, datacenter OEMs and cloud providers—all of which are
competing for what might be billions of dollars in additional sales in a market
described by a term that doesn’t mean anything specific enough to package under
a single brand name, according to Tom Hackenberg, principal analyst for
embedded systems at IHS Markit.
“Edge computing” is
a common but non-specific term referring to the addition of computing resources
anywhere close to the endpoint of an IT infrastructure. The definition has been
narrowed colloquially to mean compute resources installed specifically to support
IoT installations. “It’s a set of architectural strategies, not a product, not
a technology,” Hackenberg said.
Even limiting the
definition of edge to its function as the compute power for IoT installations
doesn’t focus the picture much, according to Shane Rau, research vice president
for computing semiconductors at IDC. “There is no one IoT. There are thousands,
each in a different industry with a different level of acceptance and
capability. It may not be possible to see what the edge looks like because it
looks like the edge of everything.”
Still, there are
benefits to getting this right. Gopal Raghavan, CEO of startup Eta Compute,
said that edge computing improves both privacy and security because it keeps
data local. And it improves response time by eliminating the round trip of sending data to the cloud and waiting for results to come back.
“You want to sense,
infer, and act without going to the cloud, but you also want the ability to
learn on the edge,” he said, noting that the cochlea in the ear already does
this today, allowing it to identify speech in a noisy environment. The same
happens with the retina in the eye, which can decipher images and movement
before the brain can process those images.
Fig. 1: Edge computing platform. Source: NTT
Why the edge is getting so much attention
One of the initial
drivers behind the edge computing model was the industrial IoT, where a desire
to see projects succeed prompted industrial organizations to try to solve both
the cost-efficiency and data-deluge problems on their own.
“In the industrial
space there is a need for factory automation and intelligence at the edge, and
the risk is comparatively smaller because it is possible to demonstrate value
in accomplishing those things,” said Anush Mohandass, vice president of marketing
and business development at NetSpeed Systems.
“The IIoT will lead the charge to build out IoT infrastructure for very
practical reasons.”
That, in turn, led
to a push to keep compute resources near the physical plants. But the benefits
go much deeper than just keeping IoT devices off the Internet, according to
Rambus’ Woo. More processing power means greater ability to pre-process data to
eliminate repetitions of the same temperature reading, for example, or render
the data feed from hundreds of sensors as a single status report.
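What that pre-processing looks like in practice is easy to sketch. The snippet below is a minimal, illustrative example, not taken from any vendor's stack, and every function and sensor name in it is hypothetical: it drops consecutive duplicate temperature readings and collapses many sensor streams into one compact status report before anything is sent upstream.

```python
from statistics import mean

def deduplicate(readings, tolerance=0.1):
    """Drop consecutive readings that repeat the previous value within a tolerance."""
    kept = []
    for value in readings:
        if not kept or abs(value - kept[-1]) > tolerance:
            kept.append(value)
    return kept

def status_report(sensor_streams):
    """Collapse many per-sensor streams into one summary dict to send upstream."""
    report = {}
    for sensor_id, readings in sensor_streams.items():
        filtered = deduplicate(readings)
        report[sensor_id] = {
            "samples_received": len(readings),
            "samples_kept": len(filtered),
            "min": min(filtered),
            "max": max(filtered),
            "mean": round(mean(filtered), 2),
        }
    return report

# Example: three sensors, mostly repeating the same temperature
streams = {
    "temp_01": [21.0, 21.0, 21.0, 21.4, 21.4, 22.0],
    "temp_02": [19.8] * 10,
    "temp_03": [23.1, 23.1, 23.5, 26.9],
}
print(status_report(streams))
```

Only the summary leaves the edge node; the raw samples never cross the network, which is the point Woo is making.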
Apple’s announcement
in 2017 that it would put machine
learning accelerators into its top-end iPhone touched off a rush that
Gartner predicts will see 80% of smartphones equipped with AI by 2022.
Those will be powerful, latency-sensitive edge devices, but will focus on
functions aimed at individual consumers – augmented reality and biometric authentication, for example – which will limit their impact in the short term,
said IDC’s Rau.
The addition of ML
capabilities into other consumer devices – and autonomous vehicles and other
smart devices – is likely to create an ecosystem on which all kinds of powerful
applications can be built, using edge data centers for support, said Mohandass.
“We saw in the
mainframe era that having a central brain and everything else being dumb didn’t
work,” he said. “There was a lot more computing power with PCs, even if they
were limited. Now, with central cloud, hyperscale datacenters have a lot more
power. Clients aren’t quite a dumb terminal, but they are not too smart. We’re
heading for another inflection point where the edge devices, the clients, have
the capacity to have a lot more intelligence. We’re not there yet, but it’s
coming.”
Until then, the
focus should be on developing ways to use that deluge of data from IoT devices
to accomplish things that wouldn’t be possible otherwise, said Mentor’s Miller.
“The core value of the IoT is in bringing together large data sets, not so much
monitoring so you know immediately when there’s a leak in tank 36 out of 1000
tanks somewhere. The value is in identifying things that are about to fail, or in activating actuators in the field before a problem actually comes up.”
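Miller's distinction between reporting a failure and anticipating one usually comes down to simple trend detection at the edge. The sketch below is a toy illustration under assumed numbers (the threshold, sampling interval and pressure values are all hypothetical), not a production algorithm.

```python
def hours_to_threshold(levels, threshold, interval_hours=1.0):
    """Estimate time until a slowly drifting reading crosses a threshold,
    using a naive average rate of change over the observed window."""
    if len(levels) < 2:
        return None
    rate = (levels[-1] - levels[0]) / ((len(levels) - 1) * interval_hours)
    if rate <= 0:
        return None  # not trending toward the threshold
    return (threshold - levels[-1]) / rate

# Tank pressure creeping upward; warn before the assumed relief threshold is hit
pressure = [4.10, 4.14, 4.19, 4.25, 4.32]
eta = hours_to_threshold(pressure, threshold=5.0)
if eta is not None and eta < 24:
    print(f"Maintenance alert: threshold expected in ~{eta:.1f} hours")
```

A real deployment would use a proper model, but even this naive rate-of-change estimate shows how an edge node can raise an alert before the limit is reached.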
Other pieces of the puzzle
Much of the edge
model is based on the way the human body processes information. A person’s hand
will recoil from a hot stove, for example, before signals reach the brain. The
brain then can explain what just happened and avoid such situations in the future.
That sounds simple
enough in concept, but from a chip design standpoint this is difficult to
achieve. “A lot of IoT devices actually present an interesting dilemma because
they don’t need a lot of memory, but what they need is a very small power
signature,” said Graham Allan, product marketing manager for memory interfaces
at Synopsys.
“That is a particular application that is not yet well served by the DRAM
industry. It remains to be seen whether or not that market will be big enough
to warrant having its own product, or whether it will continue to be served by
the two generations of older LPDDR technology and you just have to live with
what’s there.”
In some cases, there
may be a middle step, as well. In 2015, Cisco proposed the idea of Fog
computing, extending the reach of cloud-based applications to the edge
using boxes that combined routing and Linux-based application servers to
analyze sensor data using Cisco’s IOx operating system. Fog has its own open consortium and reference
architecture for what it calls a “cloud-to-Thing continuum of services,” and
NIST was interested enough to put out Fog
guidelines. (The IEEE Standards Association announced in October that it will use the OpenFog Reference Architecture as the basis for its work on fog standards under the IEEE P1934 Standards Working Group on Fog Computing and Networking Architecture Framework.)
This also is aimed
at keeping the Internet from drowning in things. Initial plans for the IoT
included building IoT control centers at or near the site of IoT installations,
with enough compute resources to store the data flowing from devices, provide
sub-second response to devices where it was needed, and boil masses of raw data
down to statistical reports that could be digested easily. These
principles were traditional best practices for embedded systems installed as
endpoints near the edge of the organization’s IT infrastructure, but the scale
and variety of functions involved turned the decision to add computing
resources at the edge into edge computing. That has evolved still further into
the “intelligent edge.”
Regardless of the
moniker, edge computing appears to be icing on the cake for technology
providers. For one thing, it won’t cannibalize public cloud spending. IDC
predicts a 23%
increase this year compared to last, and 21.9% annual growth until 2021.
And it can only help sales of the IoT, a market in which IDC predicts spending will
rise 15% in 2018 compared to 2017, to a total of $772 billion, $239 billion
of which will go to modules, sensors, infrastructure and security. IoT spending will grow about 14% per year and pass the $1 trillion mark in 2020, according
to IDC.
Gartner predicts semiconductor revenue will
rise 7.5% to $451 billion in 2018, far above the record $411 billion in 2017.
And by 2021 51% of all devices connecting to the Internet will be IoT. Their
chatter will rise from 2% of all global IP traffic during 2016 to 5% of all IP
traffic, according to Cisco Systems (Cisco
VNI Global IP Traffic Forecast).
Humans will interact
with those devices an average of 4,800 times per day in 2025, helping to drive
the volume of digital data created every year up by a factor of 10, from 16.1
zettabytes in 2016 to 163 zettabytes during 2025, according to IDC’s August,
2017 report Data Age 2025.
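For a sense of scale, that projected tenfold increase implies compound growth of roughly 29% per year, a quick back-of-the-envelope check:

```python
# IDC's figures: 16.1 ZB of data created in 2016, 163 ZB projected for 2025
start_zb, end_zb, years = 16.1, 163.0, 2025 - 2016
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")  # ~29.3% per year
```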
While reports from
IDC and IHS Markit show the cloud market continuing to grow, they have trouble
showing the increasing dominance of edge computing, which may not exist in a
formal sense. Moreover, it is difficult to define well enough for those who
design the intelligence to make it happen.
IHS Markit’s most
recent estimate is that there were about 32 billion IoT devices online during
2017; there will be 40 billion by 2020, 50 billion by 2022 and 72.5 billion by
2025. “The IoT exists because microcontrollers and other controllers came down
in price enough to make it feasible to connect a wider range of embedded devices, but we didn’t have the infrastructure to support that,” Hackenberg
said. “That is what edge computing addresses. Once a stronger infrastructure is
in place, growth in the IoT explodes.”
That’s not bad for a
concept that is still ill-defined. “Everyone gets very excited about the edge,
but no one knows what it means,” according to Stephen Mellor, CTO of the
Industrial Internet Consortium (IIC), a standards- and best-practices
consortium that is heavily supported by Industrial Internet of Things
providers. The group put out its own guide to IoT analytics and data issues
last year. “You can do some controlled analysis and processing at the edge, but
you still need the cloud for analytics on larger datasets that can help you
decide on a plan of attack that you then execute closer to the edge.”
Fig. 2: Market impact of Edge, IoT growth. Source:
Cisco Systems
Datacenters, Data Closets, Data Containers
Not surprisingly,
there is some variability in what building blocks and configurations might work
best as edge data centers. Edge data centers have to be more flexible and more
focused on immediate response than traditional glass-house data centers. They
also have to be able to combine many data streams into one usable base that can
be acted upon quickly.
From a hardware
perspective, however, the edge can be anything from a collection of servers and
storage units housed under a co-location agreement in a local cloud or data
processing facility, to a hyperconverged data center-infrastructure module
housed in a cryogenically cooled shipping container.
The scale of some
IoT installations will force some organizations to build full-scale data
centers even at the edge, or use a piece of one owned by a service provider,
according to Michael Howard, executive director of research and analysis for
carrier networks at IHS Markit. Some carriers are interested in accelerating the
conversion of the 17,000 or so telco wiring centers in almost every community in
the U.S. so they can offer richer IT services, including edge services. Central Office
Rearchitected as a Datacenter (CORD) programs have converted only a few
facilities, however, and most will see more use in the conversion to 5G than in
edge computing, Howard said.
Other options
include the smaller, more modular and more easily scalable products that make
it easier to assemble resources to fit the size and function of the devices
they support, Hackenberg said. That could mean hyper-converged datacenter
solutions like Cisco’s UCS, or pre-packaged 3kVA to 8kVA DCIM-compliant Micro
Data Centers from Schneider Electric, HPE and others. There also are VM-based
self-contained server/application “cloudlets”
described by Mahadev Satyanarayanan of Carnegie Mellon University, and the
nascent Open Edge Computing
consortium.
—Ed Sperling contributed to this story
Navigating The Foggy Edge Of Computing
It’s not just cloud
and edge anymore as a new layer of distributed computing closer to end devices
picks up steam.
April
12th, 2018 - By: Aharon Etengoff
The
National Institute of Standards and Technology (NIST) defines fog computing
as a horizontal, physical or virtual resource paradigm that resides between
smart end-devices and traditional cloud or data centers. This model supports
vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable,
layered, federated and distributed
computing, storage
and network connectivity. Put simply, fog computing extends
the cloud to be closer to the things that produce and act on Internet of
Things (IoT) data.
According to Business
Matters, moving computing and storage resources closer to the user is
critical to the success of the Internet of Everything (IoE), with new processes
decreasing response time and working more efficiently in a fog environment.
Indeed, as Chuck
Byers of the OpenFog Consortium confirms, fog computing is “rapidly gaining
momentum” as the architecture that bridges the current gap in IoT, 5G and
embedded AI systems.
As mentioned above,
5G networks are one area in which fog computing is expected to play a major
role. As
RCR Wireless reports, the convergence of 5G and fog computing is
anticipated to be an “inevitable consequence” of bringing processing tasks
closer to the edge of an enterprise’s network. For example, in certain
scenarios, 5G will require very dense antenna deployments – perhaps even less
than 20 kilometers from one another. According to Network
World, a fog computing architecture could be created among stations that
include a centralized controller. This centralized controller would manage
applications running on the 5G network, while handling connections to back-end
data centers or clouds.
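That division of labor can be pictured as a simple placement policy: a controller-like component keeps latency-critical or data-heavy work at fog nodes near the radio sites and forwards the rest to back-end clouds. The sketch below is purely illustrative; the class names and thresholds are assumptions, not part of any 5G or OpenFog specification.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    max_latency_ms: float   # tightest response time the task can tolerate
    payload_mb: float       # how much data it carries

class FogController:
    """Toy placement policy: latency-critical or bulky tasks stay in the fog layer,
    everything else is forwarded to the back-end cloud."""
    def __init__(self, cloud_latency_ms=80, uplink_mb_limit=50):
        self.cloud_latency_ms = cloud_latency_ms
        self.uplink_mb_limit = uplink_mb_limit

    def place(self, task: Task) -> str:
        if task.max_latency_ms < self.cloud_latency_ms:
            return "fog"      # a cloud round trip would miss the deadline
        if task.payload_mb > self.uplink_mb_limit:
            return "fog"      # too much data to ship upstream raw
        return "cloud"

controller = FogController()
for t in [Task("beam steering", 5, 0.1),
          Task("video analytics", 200, 500),
          Task("billing report", 60_000, 2)]:
    print(t.name, "->", controller.place(t))
```

In a real deployment the policy would also weigh backhaul cost, privacy and node load, but the shape of the decision is the same.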
Edge computing
There are a number
of important distinctions between fog and edge computing. Indeed, fog computing
works with the cloud, while edge is typically defined by the exclusion of cloud
and fog.
Moreover, as
NIST points out, fog is hierarchical, whereas the edge is often limited to a
small number of peripheral layers. In practical terms, the edge can be defined
as the network layer encompassing the smart end devices and their users. This
allows the edge to provide local computing capabilities for IoT devices.
According to Bob
O’Donnell, the founder and chief analyst of Technalysis Research LLC, connected
autonomous (or semi-autonomous) vehicles are perhaps the best example of
an advanced-edge computing element.
“Thanks to a
combination of enormous amounts of sensor data, critical local processing power
and an equally essential need to connect back to more advanced data analysis
tools in the cloud, autonomous cars are seen as the poster child of
advanced-edge computing,” he
states in a recent Recode article.
Indeed, according
to AT&T, self-driving vehicles could generate as much as 3.6 terabytes
of data per hour from the clusters of cameras and other sensors, although
certain functions (such as braking, turning and acceleration) will likely
always be managed by the computer systems in cars themselves. Nevertheless,
AT&T sees some of the secondary systems being offloaded to the cloud with
edge computing.
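AT&T's 3.6 terabytes per hour is easier to appreciate as a sustained link rate. Treating a terabyte as 10^12 bytes, a quick calculation:

```python
# 3.6 TB per hour, expressed as a sustained bit rate
bytes_per_hour = 3.6e12
bits_per_second = bytes_per_hour * 8 / 3600
print(f"{bits_per_second / 1e9:.1f} Gbps sustained")  # ~8 Gbps per vehicle
```

Roughly 8 Gbps per vehicle is why the safety-critical loops stay in the car and only selected data moves upstream.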
“We’re shrinking the
distance,” AT&T
states in a 2017 press release. “Instead of sending commands hundreds of
miles to a handful of data centers scattered around the country, we’ll send
them to the tens of thousands of central offices, macro towers and small cells
usually never farther than a few miles from our customers.”
Silicon and services: At the edge of the foggy cloud
Fog and edge
computing are impacting chip designs, strategies and roadmaps across the
semiconductor industry. As Ann Steffora
Mutschler of Semiconductor Engineering notes, an explosion in cloud
services is making chip design for the server market more challenging, diverse
and competitive.
“Unlike data center
number crunching of the past, the cloud addresses a broad range of applications
and data types,” she explains.
“So, while a server
chip architecture may work well for one application, it may not be the optimal
choice for another. And the more those tasks become segmented within a cloud
operation, the greater that distinction becomes.”
With regard to
services, the
National Institute of Standards and Technology sees fog computing as an
extension of the traditional cloud-based computing model where implementations
of the architecture can reside in multiple layers of a network’s topology. As
with the cloud, multiple
service models are implementable, including Software as a Service (SaaS),
Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
From our
perspective, edge computing offers similar opportunities, particularly with
regard to SaaS and PaaS. As an example, both
could be applied to the automotive sector, with companies deploying
sensor-based vehicle systems that proactively detect potential issues and
malfunctions. This solution, which, in its optimal configuration, would
combine silicon and services, could be sold as a hardware and software product,
or deployed as a service with subscription fees generated on a monthly or
annual basis.
In conclusion, fog
and edge computing will continue to evolve to meet the demands of a diverse
number of verticals, including the IoT, autonomous/connected vehicles,
next-generation mobile networks and data centers.
Exponentials At The Edge
By Ed Sperling, semiengineering.com
The age of portable
communication has set off a scramble for devices that can achieve almost
anything a desktop computer could handle even five years ago. But this is just
the beginning.
The big breakthrough
with mobile devices was the ability to combine voice calls, text and eventually
e-mail, providing the rudiments of a mobile office, all on a single charge of a
battery that was light enough to carry and unobtrusive enough that it didn’t
have to be strapped onto a belt. Mobile electronics have evolved far beyond
that, of course. A smartphone today can plot the best route through traffic in
real-time, download full documents for editing, record and send videos, take
high-resolution photographs, and serve as a platform for interactive
multi-player games. It even can be attached to headgear as part of a virtual
reality system.
This is just phase
one. The next phase will add intelligent screening for a growing flood of data
across more devices. Most of the data being collected is completely useless.
Some of it is useful only when combined with data from thousands or millions of
other users and mined in a cloud for patterns and anomalies. The remainder will have to be dealt
with inside a number of individual or networked edge devices, which can filter
out what needs immediate attention and what does not.
This all sounds
logical enough. If you partition data according to compute resources, then the
usefulness of that data can be maximized on a number of fronts. It can be used
to understand traffic patterns and develop new ways of capitalizing on them,
which is much of the impetus behind AI and deep learning. If this sounds
insidious, there’s really nothing new here other than the methods of acquiring the data and the ability to centralize some of the screening processes. This is why
there currently is a war being waged among IBM, Amazon, Google, Microsoft,
Facebook, Alibaba and Apple, not to mention a number of government agencies that
are building their own cloud infrastructures. It’s also likely there will be
many more private clouds built in the future, which will either democratize or
protect that data, or both.
Developments at the
edge are not just another rev of Moore’s Law, where processors double in density
every couple of years. The term being used more frequently these days is
exponentials. It’s all
about exponential improvements in power, performance, processing, throughput
and communication. The main reason why companies are looking at advanced
packaging options, including fan-out on substrate, 2.5D and 3D-ICs, as well as
pouring money into 3nm transistors that can be patterned with high-NA EUV and
directed self-assembly, is that multiple approaches will be needed and combined
to achieve these kinds of exponential gains.
The payoff from all
these efforts ultimately will be enormous, though. The entire smartphone/tablet
market has driven much of the innovation in semiconductor design for more than
a decade, and that was just one market. Collectively, all of these new applications
will dwarf the size of the mobility market. And each also will add some unique
elements that ultimately can be leveraged across market segments, driving new
technologies and approaches and even new markets.
We are just at the
beginning of this explosion, and not all markets are moving at the same pace.
But the focus on power and
performance is central to all of this, and for any of these new markets
to live up to their potential, huge gains will be required over the next
decade.
Technology is moving
from the office or the home out into the rest of the world, where interactions
are complex, unpredictable (at least so far) and continuous. All of this will
require new tooling, different architectural approaches, and a massive amount
of innovation in semiconductors to make this work. The chip market is about to
get very interesting.
The AI revolution has spawned a new chips arms race
For years, the
semiconductor world seemed to have settled into a quiet balance: Intel
vanquished virtually all of the RISC processors in the server world, save IBM’s
POWER line. Elsewhere AMD
had self-destructed, making it pretty much an x86 world. Then Nvidia mowed
down all of its many competitors in the 1990s. Suddenly only ATI, now a part of
AMD, remained, with about half of Nvidia’s market share.
On the newer mobile
front, it looked to be a similar near-monopolistic story: ARM
ruled the world. Intel tried mightily with the Atom processor, but the
company met repeated rejection before finally giving up in 2015.
Then just like that,
everything changed. AMD resurfaced as a viable x86 competitor; the advent of
field-programmable gate array (FPGA) processors for specialized tasks like Big
Data created a new niche. But really, the colossal shift in the chip world came
with the advent of artificial intelligence (AI) and machine learning (ML). With
these emerging technologies, a flood of new processors has arrived—and they are
coming from unlikely sources.
- Intel got into the market with its purchase of startup Nervana Systems in 2016. It bought a second company, Movidius, for image processing AI.
- Microsoft is preparing an AI chip for its HoloLens VR/AR headset, and there’s potential for use in other devices.
- Google has a special AI chip for neural networks called the Tensor Processing Unit, or TPU, which is available for AI apps on the Google Cloud Platform.
- Amazon is reportedly working on an AI chip for its Alexa home assistant.
- Apple is working on an AI processor called the Neural Engine that will power Siri and FaceID.
- ARM Holdings recently introduced two new processors, the ARM Machine Learning (ML) Processor and ARM Object Detection (OD) Processor. Both specialize in image recognition.
- IBM is developing a specific AI processor, and the company also licensed NVLink from Nvidia for high-speed data throughput specific to AI and ML.
- Even non-traditional tech companies like Tesla want in on this area, with CEO Elon Musk acknowledging last year that former AMD and Apple chip engineer Jim Keller would be building hardware for the car company.
That macro-view
doesn’t even begin to account for the startups. The
New York Times puts the number of AI-dedicated startup chip
companies—not software companies, silicon companies—at 45
and growing, but even that estimate may be incomplete. It’s tricky to get a
complete picture since some are in China being funded by the government and
flying under the radar.
Why the sudden
explosion in hardware after years of chip maker stasis? After all, there is
general consensus that Nvidia’s GPUs are excellent for AI and are widely used
already. Why do we need more chips now, and so many different ones at that?
The answer is a bit
complex, just like AI itself.
Follow the money (and usage and efficiency)
While x86 currently
remains a dominant chip architecture for computing, it’s too general purpose
for a highly specialized task like AI, says Addison Snell, CEO of Intersect360
Research, which covers HPC and AI issues.
“It was built to be
a general server platform. As such it has to be pretty good at everything,” he
says. “With other chips, [companies are] building something that specializes in
one app without having to worry about the rest of the infrastructure. So leave
the OS and infrastructure overhead to the x86 host and farm things out to
various co-processors and accelerators.”
The actual task of
processing AI is a very different process from standard computing or GPU
processing, hence the perceived need for specialized chips. An x86 CPU can do
AI, but it does a task in 12 steps when only three are required; a GPU in some
cases can also be overkill.
Generally,
scientific computation is done in a deterministic fashion. You want to know two
plus three equals five and calculate it to all of its decimal places—x86 and
GPU do that just fine. But the nature of AI is to say 2.5 + 3.5 is observed to
be six almost all of the time without actually running the calculation. What
matters with artificial intelligence today is the pattern
found in the data, not the deterministic calculation.
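The "observed to be six" idea can be made concrete with a toy model that never executes an addition: it fits a pattern to example pairs and their observed sums, then predicts. This is a minimal numpy sketch for illustration only, not how any production AI chip or framework works.

```python
import numpy as np

# Training examples: pairs of numbers and their observed sums (no "+" rule given)
X = np.array([[1, 2], [3, 4], [2.5, 3.5], [10, 7], [0, 9]], dtype=float)
y = np.array([3, 7, 6, 17, 9], dtype=float)

# Fit a linear pattern y ~ w1*a + w2*b from the observations alone
w, *_ = np.linalg.lstsq(X, y, rcond=None)

a, b = 2.5, 3.5
prediction = np.array([a, b]) @ w
print(f"Learned pattern predicts {a} + {b} ~= {prediction:.2f}")
```

The fitted weights come out close to (1, 1), so the model predicts the sum from the pattern in the data rather than calculating it, which is the kind of workload these chips target.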
In simpler terms,
what defines AI and machine learning is that they draw upon and improve from
past experience. The famous AlphaGo
simulates tons of Go matches to improve. Another example you use every day is Facebook’s
facial recognition AI, trained for years so it can accurately tag your
photos (it should come as no surprise that Facebook has also made three major
facial recognition acquisitions in recent years: Face.com [2012], Masquerade
[2016], and Faciometrics [2016]).
Once a lesson is
learned with AI, it does not necessarily always have to be relearned. That is
the hallmark of Machine Learning, a subset of the greater definition of AI. At
its core, ML is the practice of using algorithms to parse data, learn from it,
and then make a determination or prediction based on that data. It’s a
mechanism for pattern recognition—machine learning software remembers that two
plus three equals five so the overall AI system can use that information, for
instance. You can get into splitting hairs over whether that recognition is AI
or not.
AI for self-driving
cars, for another example, doesn’t use deterministic physics to determine the
path of other things in its environment. It’s merely using previous experience
to say this other car is here traveling this way, and all other times I observed
such a vehicle, it traveled this way. Therefore, the system expects a certain
type of action.
The result of this
predictive problem solving is that AI calculations can be done with single
precision calculations. So while CPUs and GPUs can both do it very well, they
are in fact overkill for the task. A single-precision chip can do the work and
do it in a much smaller, lower power footprint.
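The footprint argument is easy to quantify. Below is a small numpy sketch comparing the same arbitrary weight matrix stored at double, single and half precision (the 4096 x 4096 size is an assumption chosen only to make the numbers visible; lower-precision formats are commonly used for inference).

```python
import numpy as np

weights = np.random.randn(4096, 4096)           # float64 by default
for dtype in (np.float64, np.float32, np.float16):
    w = weights.astype(dtype)
    print(f"{w.dtype}: {w.nbytes / 1e6:.0f} MB")
# float64: ~134 MB, float32: ~67 MB, float16: ~34 MB
```

Each step down in precision halves the memory that has to be stored and moved, and data movement is where much of the power goes.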
Make no mistake,
power and scope are a big deal when it comes to chips—perhaps especially for
AI, since one size does not fit all in this area. Within AI is machine
learning, and within that is deep learning, and all those can be deployed for
different tasks through different setups. “Not every AI chip is equal,” says
Gary Brown, director of marketing at Movidius, an Intel company. Movidius made
a custom chip just for deep learning processes because the steps involved are
highly restricted on a CPU. “Each chip can handle different intelligence at
different times. Our chip is visual intelligence, where algorithms are using
camera input to derive meaning from what’s being seen. That’s our focus.”
Brown says there is
even a need to differentiate at the network edge as well as in
the data center—companies in this space are simply finding they need to use
different chips in these different locations.
“Chips on the edge
won’t compete with chips for the data center,” he says. “Data center chips like
Xeon have to have high performance capabilities for that kind of AI, which is
different for AI in smartphones. There you have to get down below one watt. So
the question is, ‘Where is [the native processor] not good enough so you need
an accessory chip?’”
After all, power is
an issue if you want AI on your smartphone or augmented reality headset.
Nvidia’s Volta processors are beasts at AI processing but draw up to 300 watts.
You aren’t going to shoehorn one of those in a smartphone.
Sean Stetson,
director of technology advancement at Seegrid,
a maker of self-driving industrial vehicles like forklifts, also feels AI and
ML have been ill served by general processors thus far. “In order to make any
algorithm work, whether it’s machine learning or image processing or graphics
processing, they all have very specific workflows,” he says. “If you do not
have a compute core set up specific to those patterns, you do a lot of wasteful
data loads and transfers. It’s when you are moving data around that you are most inefficient; that’s where you incur a lot of signaling and transient power. The efficiency of a processor is measured in energy used per
instruction.”
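Stetson's metric, energy per instruction, is dominated by data movement. The figures below are rough order-of-magnitude per-operation energy estimates of the kind often cited for older process nodes; treat them, and the operation counts, as assumptions for illustration rather than Seegrid's data.

```python
# Rough, commonly cited order-of-magnitude energy costs in picojoules per
# 32-bit operation. These are illustrative assumptions, not measured values.
ENERGY_PJ = {
    "fp32 multiply-add": 4,
    "on-chip SRAM read": 5,
    "off-chip DRAM read": 640,
}

def inference_energy(ops, dram_reads, sram_reads):
    """Toy energy model: compute cost vs. data-movement cost for one inference."""
    compute = ops * ENERGY_PJ["fp32 multiply-add"]
    movement = (dram_reads * ENERGY_PJ["off-chip DRAM read"]
                + sram_reads * ENERGY_PJ["on-chip SRAM read"])
    return compute, movement

compute_pj, movement_pj = inference_energy(ops=1_000_000,
                                           dram_reads=50_000,
                                           sram_reads=900_000)
print(f"compute: {compute_pj/1e6:.1f} uJ, data movement: {movement_pj/1e6:.1f} uJ")
```

Even with these rough numbers, moving the operands costs far more than computing on them, which is why keeping data close to a purpose-built core pays off.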
A desire for more
specialization and increased energy efficiency isn’t the whole reason these
newer AI chips exist, of course. Brad McCredie, an IBM fellow and vice
president of IBM Power systems development, adds one more obvious incentive for
everyone seemingly jumping on the bandwagon: the prize is so big. “The IT industry is seeing growth for the
first time in decades, and we’re seeing an inflection in exponential growth,”
he says. “That whole inflection is new money expected to come to IT industry,
and it’s all around AI. That is what has caused the flood of VC into that
space. People see a gold rush; there’s no doubt.”
A whole new ecosystem
AI-focused chips are
not being designed in a vacuum. Accompanying them are new means of throughput
to handle the highly parallel nature of AI and ML processing. If you build an
AI co-processor and then use the outdated technologies of your standard PC, or
even a server, that’s like putting a Ferrari engine in a Volkswagen Beetle.
“When people talk
about AI and chips for AI, building an AI solution involves quite a lot of
non-AI technology,” says Amir Khosrowshahi, vice president and CTO of the AI
product group at Intel and co-founder of Nervana. “It involves CPUs, memory,
SSD, and interconnects. It’s really critical to have all of these for getting
it to work.”
When IBM designed
its Power9 processor for mission critical systems, for example, it used
Nvidia’s high-speed NVLink for core interconnects, PCI Express Generation 4,
and its own interface called OpenCAPI (Coherent Accelerator Processor
Interface). OpenCAPI is a new connection type that provides a high bandwidth,
low latency connection for memory, accelerators, network, storage, and other
chips.
The x86 ecosystem,
says McCredie, isn’t keeping up. He points to the fact that PCI Express Gen 3
has been on the market seven years without a significant update (the first, Gen 4,
only happened recently, and IBM was one of the first to adopt it). x86 servers are
still shipping with PCIe Gen 3, which has half the bandwidth of Gen 4.
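The "half the bandwidth" comparison follows directly from the signaling rates: PCIe Gen 3 runs at 8 GT/s per lane and Gen 4 at 16 GT/s, both with 128b/130b encoding. A quick check for a x16 slot:

```python
def pcie_x16_gbps(gigatransfers_per_s):
    """Approximate usable bandwidth of a x16 link with 128b/130b encoding, in GB/s."""
    lanes, payload_fraction = 16, 128 / 130
    return gigatransfers_per_s * payload_fraction * lanes / 8

print(f"PCIe Gen 3 x16: {pcie_x16_gbps(8):.1f} GB/s")   # ~15.8 GB/s
print(f"PCIe Gen 4 x16: {pcie_x16_gbps(16):.1f} GB/s")  # ~31.5 GB/s
```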
“This explosion of
compute capabilities will require a magnitude more of computational capacity,”
he says. “We need processors to do all they can do and then some. The industry
is finally getting into memory bandwidth and I/O bandwidth performance. These
things are becoming first order constraints on system performance.”
“I think the set of
accelerators will grow,” McCredie continues. “There are going to be more
workloads that need more acceleration. We’re even going to go back and
accelerate common workloads like databases and ERP (enterprise resource
planning). I think we are seeing the start of a solid trend in the industry
where we shift to more acceleration and more becoming available on the market.”
But hardware alone doesn’t do the learning in machine learning; software plays a major part. And
in all of this rush for new chips, there is little mention of the software to
accompany it. Luckily, that’s because the software is largely already there—it was
waiting for the chips to catch up, argues Tom Doris, CEO of OTAS Technologies,
a financial analytics and AI developer.
“I think that if you
look at longer history, it’s all hardware-driven,” he says. “Algorithms haven’t
changed much. Advances are all driven by advances in hardware. That was one of
the surprises for me, having been away from the field for a few years. Things
haven’t changed a whole lot in software and algorithms since the late 90s. It’s
all about the compute power.”
David Rosenberg,
data scientist in the Office of the CTO for Bloomberg, also feels the software
is in good shape. “There are areas where the software has a long way to go, and
that has to do with distributed computing, it has to do with the science of distributed
neural computing,” he says. “But for the things we already know how to do, the
software has been improved pretty well. Now it’s a matter of can the hardware
execute the software fast enough and efficiently enough.”
With some use cases
today, in fact, hardware and software are now being developed on parallel
tracks with the aim of supporting this new wave of AI chips and use cases. At
Nvidia, the software and hardware teams are roughly the same size, notes Ian
Buck, the former Stanford University professor who developed what would become
the CUDA programming language (CUDA allows developers to write apps to use the
Nvidia GPU for parallel processing instead of a CPU). Buck now heads AI efforts
at the chip company.
“We co-develop new
architectures with system software, libraries, AI frameworks, and compilers,
all to take advantage of new methods and neural networks showing up every day,”
he says. “The only way to be successful in AI is not just to build great silicon but also to be tightly integrated all the way through the software stack,
to implement and optimize these new networks being invented every day.”
So for Buck, one of
the reasons why AI represents a new kind of computing is because he believes it
really does constitute a new type of relationship between hardware and
software. “We don’t need to think of backwards compatibility, we’re reinventing
the kinds of processors good at these kinds of tasks and doing it in
conjunction with the software to run on them.”
The future of this horserace
While there is a laundry list of potential AI chip developers today, one of the biggest questions surrounding all of these initiatives is how many will come to market, how many will be kept for the vendor’s own use, and how many will be scrapped
entirely. Most AI chips today are still vapor.
When it comes to the
many non-CPU makers designing AI chips, like Google, Facebook, and Microsoft,
it seems like those companies are making custom silicon for their own use and
will likely never bring them to market. Such entities have the billions in revenue
that can be plowed into R&D of custom chips without the need for immediate
and obvious return on investment. So users may rely on Google’s Tensor
Processing Unit as part of its Google Cloud service, but the company won’t sell
it directly. That is a likely outcome for Facebook and Microsoft’s efforts as
well.
Other chips are
definitely coming to market. Nvidia
recently announced three new AI-oriented chips: the Jetson Xavier
system-on-chip designed for smarter robots; Drive Pegasus, which is designed
for deep learning in autonomous taxis; and Drive Xavier for semi-autonomous
cars. Powering all of that is Isaac Sim, a simulation environment that
developers can use to train robots and perform tests with Jetson Xavier.
Meanwhile, Intel has
promised that its first ML processor based on the Nervana technology it bought
in 2016 will
reach market in 2019 under the code name of Spring Crest. The company also
currently has a Nervana chip for developers to get their feet wet with AI,
called Lake Crest. Intel says Spring Crest will eventually offer three to four
times the performance of Lake Crest.
Can all those
survive? “I think in the future, we’re going to see an evolution of where AI
manifests itself,” says Movidius’ Brown. “If you want it in a data center, you
need a data center chip. If you want a headset, you find a chip for it. How
this will evolve is we may see where different chips have different strengths,
and those will possibly get merged into CPUs. What we may also see are chips
coming out with multiple features.”
If all that feels a
bit like deja vu, maybe it is. The progression of the AI chip could in some
ways match how chips of the past evolved—things started with high
specialization and many competitors, but eventually some offerings gained
traction and a few market leaders encompassed multiple features. Thirty years
ago, the 80386 was the premier desktop chip and if you were doing heavy
calculations in Lotus 1-2-3, you bought an 80387 math co-processor for your IBM
PC-AT. Then came the 80486, and Intel made all kinds of noises about the math
co-processor being integrated into the CPU. The CPU then slowly gained things like security extensions, a memory controller, and a GPU.
So like every other
technology, this emerging AI chip industry likely won’t sustain its current
plethora of competitors. For instance, OTAS’ Doris notes many internal-use
chips that don’t come to market become pet projects for senior technologists,
and a change of regime often means adopting the industry standard instead.
Intersect360’s Snell points out that today’s army of AI chip startups will also
diminish—“There’s so many competitors right now it has to consolidate,” as he
puts it. Many of those companies will simply hope to carve out a niche that
might entice a big player to acquire them.
“There will be a
tough footrace, I agree,” IBM’s McCredie says. “There has to be a narrowing
down.” One day, that may mean this new chip field looks a lot like those old
chip fields—the x86, Nvidia GPU, ARM-worlds. But for now, this AI chip race has
just gotten off the starting line, and its many entrants intend to keep
running.
Andy Patrizio is a freelance technology journalist
based in Orange County, California, not entirely by choice. He prefers building
PCs to buying them, has played too many hours of Where’s My Water on his
iPhone, and collects old coins when he has some to spare.
IoT Was Interesting, But Follow the Money to AI Chips
By Kurt Shuler
02.20.2019
By 2025, a full five sixths of the growth in
semiconductors is going to be the result of AI.
A few years ago
there was a lot of buzz about IoT, and indeed it continues to serve a role, but
looking out to 2025 the real
dollar growth for the semiconductor industry is in algorithm-specific ASICs,
ASSPs, SoCs, and accelerators for Artificial Intelligence (AI), from the data
center to the edge.
In fact, the
upcoming change in focus will be so radical that by the 2025 timeframe, a
full five sixths of the growth in semiconductors is going to be the result of
AI.
Figure 1: By 2025, a full five sixths of the growth
in semiconductors will be geared towards enabling AI/deep learning algorithms.
(Image source: Tractica)
Anyone tracking the
industry closely knows how we got to this point. Designers were implementing
IoT before it even became a “thing.” Deploying sensors and communicating on a
machine-to-machine level to perform data analysis and implement functions based
on structural or ambient environment and other parameters just seemed like a
smart thing to do. The Internet just helped to do it remotely. Then someone
latched onto the term “the Internet of things” and suddenly everyone’s an IoT
silicon, software, or systems player.
From the IC
suppliers’ perspective,
simply pulling already available silicon blocks together to form a sensing
signal chain, processor, memory, and an RF interface was enough to make them a
“leading provider of IoT solutions.”
While the hype was
destined to fade, there remains a good deal of innovation around low-cost, low-power data
acquisition, with the ensuing low margins. There may be higher margins
at the software and system level for deployers of IoT networks, but not for
semiconductor manufacturers. But that’s about to change, as the focus shifts from generating
data to analyzing data using the explosion of deep-learning algorithms that are
enabling what we now call artificial intelligence, or AI.
This shift in focus
from generating data to making practical use of it through analysis and the application of
AI algorithms has stretched the limits of classic processor architectures such
as CPUs, GPUs, and FPGAs. While all have been useful in their own distinct
way, the need for faster neural network training, greater inference efficiency, and
more analysis at the edge for lower latencies has pushed silicon providers and OEMs to
change their modus operandi. Now architectures comprising the optimum mix of
processing elements to run specific algorithms for AI are necessary, make that
demanded, for applications such as autonomous vehicles, financial markets,
weather forecasts, agriculture, and someday smart cities.
The applications
have given rise to many AI
function market segments, which can be roughly divided into data center
training and inference, and edge training and inference.
Figure 2: Efficient, fast, and powerful inference engines will be required both at the data center and at the edge, where
localized processing can reduce latencies. (Image source: Arteris)
However, the bad
news for many is that, like IoT, there’ll be a shakeout and many won’t make it
in applications like autonomous vehicles. The good news is that they’ll be able
to take their learnings and apply them somewhere else, like tracking passers-by
at street windows for marketing campaigns.
Those who last will have made the best use of heterogeneous processing elements, memory, I/O and on-chip interconnect architectures to achieve the gains in efficiency and performance required
for the next generation of AI solutions.
Until that shakeout
happens, both OEMs and dedicated chip houses will be spending a lot of cash and
IP capital on developing SoCs, ASICs/ASSPs, and accelerators that will
best implement the most advanced algorithms at the data center and at the edge.
Figure 3: The total dollars spent on inference (2x)
and training (4x to 5x) at the data center will grow sharply between now and
2025, reaching up to $10 billion and $5 billion, respectively. However, the rate of
growth in dollars spent on inference at the edge is >40x, reaching $4
billion by 2025. (Image source: McKinsey & Company)
The smart silicon
providers have already moved off the old “28 nm sweet spot” where there was a
temporary “time out” to develop silicon to make the most of IoT principles.
That emphasis on the sweet spot may have been more about a lack of vision as to
where things were really heading. Now we know what’s coming: are you ready?
— Kurt Shuler is vice president of marketing at Arteris IP
and has extensive IP, semiconductor, and software marketing experience in the
mobile, consumer, and enterprise segments working for Intel and Texas
Instruments. He is a member of the U.S. Technical Advisory Group (TAG) to the
ISO 26262/TC22/SC3/WG16 working group, thereby helping create safety standards
for semiconductors and semiconductor IP.
ATTACKING THE DATACENTER FROM THE EDGE INWARD
For
much of the decade, a debate around Arm was whether it would fulfill its
promise to become a silicon designer with suppliers of any significance to
datacenter hardware. The company initially saw an opportunity in the trend
among enterprises in buying energy-efficient servers that could run their
commercial workloads but not sabotage their budgets by gobbling up huge amounts
of power while doing that. Arm’s low-power architecture that dominates the
mobile device market seemed a good fit for those situations, despite the
challenge of building up a software ecosystem that could support it.
And every step –
forward or back – along the way was noted and scrutinized, from major OEMs like
Dell EMC and Hewlett Packard Enterprise rolling out systems powered by
Arm-based SoCs, and the rise of hyperscalers like Google, Facebook, Microsoft and Amazon with their massive datacenters and their need
to keep a lid on power consumption, to the early exit of pioneer Calxeda, the
backing away by AMD and Samsung, the sharp left turn by Qualcomm to exit the server chip space after
coming out with its Centriq system-on-a-chip (SoC), the consolidation that
saw Marvell buy Cavium, and the embrace by the HPC crowd such as Cray and Fujitsu.
Through all this, Arm has gained a degree of traction, from
major cloud providers and system makers adopting the Arm architecture to
various degrees to chip makers like Marvell (now with Cavium) and Ampere – led
by a group of ex-Intel executives, including CEO Renee James – putting together
products to go into the systems.
While all this was going on, the industry saw the rise
of edge computing, driven by the ongoing
decentralization of IT that has been fueled by not only the cloud but the
proliferation of mobile devices, the Internet
of Things (IoT), big data, analytics and
automation, and other trends like artificial intelligence (AI) and machine
learning. There is a drive to put as much compute, storage, virtualization and
analytics capabilities as close as possible to the devices that are generating
massive amounts of data and to gain crucial insights into that data as close to
real time as possible.
Arm over the past couple of years has put a sharp focus on the
edge, IoT, 5G and other emerging trends, a concentration that was evident at
last month’s TechCon show. There was more discussion of the company’s Pelion
IoT platform and Neoverse – an
edge and hyperscale infrastructure platform that includes everything from
silicon to reference designs.
The chip designer talked about expanding its Platform
Security Architecture (PSA) that Arm
partners and third parties can leverage to build more security into their IoT
devices out to the infrastructure edge, part of a larger effort called Project
Cassini. Launched in partnership with ecosystem partners, Arm is looking to leverage
its strong presence in endpoints to drive the evolution of infrastructure and
cloud-native software at the edge through Arm technologies and the development
of platform standards and reference systems.
It’s
part of Arm’s effort to take a leadership role in how the edge develops, a
delicate balancing act that includes other technology vendors and essentially
sets the direction while enabling broad participation in how things move in
that direction, according to Drew Henry, the company’s one-time head of the
infrastructure business and now senior vice president of IPG and operations.
It’s a different role than Arm has taken in the past in the datacenter and
uncommon in the industry as a whole, Henry tells The Next Platform.
“What we’re doing is carefully stepping with our ecosystem a little in
front of it, saying, ‘Hey, this is the view we have. Let’s all go along this
together,’” he says. “You see this beginning to show up. There’s this industry
consortium – that’s the Autonomous
Vehicle Computing Consortium that we’re doing in the autonomy space. Project
Cassini, which is about how to create a standard platform for edge computing
that respects the diversity of silicon and some of the designs around those types
of devices, going from low power to high power, small amounts of compute to
large amounts of compute, all kinds of locations and industrial IoT locations
to 5G base stations, whatever. Realizing that’s a strength, that you want to
enable a software ecosystem to be able to deploy [solutions], how you marry
those things. We stepped in with that ecosystem and said, ‘Alright, let’s just
agree on some standards on a way these platforms are going to boot, let’s agree
with the way security is going to be held in it. If we do that well, then the
cloud-native software companies will be able to come in and deploy software on
top of it in a cloud-native stack fairly easily to do the things that people
want to do. That’s that balance.”
That’s
a contrast to what has driven computing with Intel, Henry added, “where there’s
been this ecosystem, but with one incredibly dominant viewpoint for it. There’s
just so much invention that has to happen over the next decade or so to
accomplish these rules of autonomy and Internet of Things and stuff that it’s
too much to expect that any one company is going to have all the right answers.
The ecosystem needs to [drive] it.”
A DIFFERENT ANIMAL
The datacenter compute environment for Arm continues to evolve,
driven not only by what the chip designer is doing with its architecture but
also with the efforts from manufacturing partners. Marvell is continuing to
develop the ThunderX2 SoCs that it inherited when it bought Cavium for about
$5.5 billion last year, and other chip makers like Ampere are coming to market
with offerings based on the X-Gene designs from Applied Micro, which the
company bought. At the same time, some tech vendors are taking Arm’s
architecture and creating their own chips. Fujitsu is developing the A64FX
chip, which will be the foundation for its Post-K supercomputer. Amazon Web Services (AWS) turned to
the Arm architecture –
with expertise from its acquisition of Annapurna Labs for $350 million in 2015
– for its Graviton chips. Huawei is also making a play in the Arm chip space.
Enterprise, supercomputer and cloud datacenters are served by
suppliers and companies that develop their own Arm-based chips, with Arm
innovation and investment, Henry says. Arm is not so much leading an evolution
but working with companies to grow the presence of its architecture in
datacenters. But the edge is different, and it calls for Arm to take a different – and more of a leadership – role.
“The spaces in compute in the large, aggregated compute areas, which
are datacenter and supercomputing, I feel really good about the portfolio that
is servicing those,” he says. “That’s why we’ve kind of shifted our focus now,
effectively saying, ‘Alright, we’ve got a lot of work to do to continue to help
with that group, but there’s also this emerging area of compute at the edge
that also needs to be invested in –
where if we don’t invest, collectively as an ecosystem to get it established,
it is going to take longer to mature than it should.”
The
edge is a different compute environment, where the “ecosystem broadens because
now you’ve got companies that have networking IP that you can combine together
with silicon,” Henry says. “This is where Broadcom enters into the marketplace,
and NXP enters into the marketplace and others, so we’ve got a pretty rich
ecosystem of being able to provide compute wherever you need compute. A lot of
people fixate on the classical server sitting in a datacenter. That’s a
relatively small unit amount in the marketplace, relatively small compute
that’s done across the ecosystem. We absolutely are doing great in that space,
but it’s not the only focus for us. Servicing the cloud is fairly well
understood. Serving compute at the infrastructure edge is more complicated, so
that is where we can be much more involved in leading and coordinating the
activities there. That’s what Cassini’s about.”
It Takes Liquidity To Make Infrastructure Fluid
Stranded capacity
has always been the biggest waste in the datacenter, and over the years, we
have added more and more clever kinds of virtualization – hardware partitions,
virtual machines and their hypervisors, and containers – as well as the systems
management tools that exploit them. There is a certain amount of hardware
virtualization going on these days, too, with the addition of virtual storage
and virtual switching to so-called SmartNICs.
The next step in
this evolution is disaggregation and composability, which can be thought of in
a number of different ways. The metaphor we like here at The Next Platform is smashing
all of the server nodes in a cluster and then stitching all of the components
back together again with software abstraction that works at the peripheral
transport and memory bus levels – what is commonly called composability. You
can also think of this as making the motherboard of the system extensible and
malleable, busting beyond the skin of one server to make a giant pool of
hardware that can allow myriad, concurrent physical hardware configurations –
usually over the PCI-Express bus – to be created on the fly and reconfigured as
workloads dictate. This way, CPUs, memory, flash storage, disk storage, and GPU
and FPGA accelerators are not tied so tightly to the nodes they happen to be
physically located within.
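For those who think better in code than in metaphors, here is a minimal sketch of that pool-and-compose pattern in Python; every class and device name is invented for illustration, and this is not any vendor’s actual API:

# Minimal sketch of disaggregation/composability as described above: devices
# live in a pool outside any one server, and a "logical server" is composed on
# the fly from that pool and released when the workload is done. All names are
# hypothetical, for illustration only.
from dataclasses import dataclass, field


@dataclass
class Device:
    kind: str       # e.g. "gpu", "fpga", "nvme"
    ident: str
    in_use: bool = False


@dataclass
class FabricPool:
    devices: list = field(default_factory=list)

    def compose(self, wanted: dict) -> list:
        """Claim free devices matching the requested counts, e.g. {"gpu": 4, "nvme": 2}."""
        claimed = []
        for kind, count in wanted.items():
            free = [d for d in self.devices if d.kind == kind and not d.in_use]
            if len(free) < count:
                # Not enough capacity: release anything claimed so far and fail.
                for d in claimed:
                    d.in_use = False
                raise RuntimeError(f"not enough free {kind} devices in the pool")
            for d in free[:count]:
                d.in_use = True
                claimed.append(d)
        return claimed

    def release(self, claimed: list) -> None:
        """Return devices to the pool so another workload can be composed."""
        for d in claimed:
            d.in_use = False


# Example: compose a GPU-heavy logical server, run a workload, then give it back.
pool = FabricPool([Device("gpu", f"gpu{i}") for i in range(8)] +
                  [Device("nvme", f"ssd{i}") for i in range(4)])
node = pool.compose({"gpu": 4, "nvme": 2})
# ... run the workload against `node` ...
pool.release(node)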
There are a lot of
companies that are trying to do this. Among the big OEMs, Hewlett Packard
Enterprise has its
Synergy line and Dell has its
PowerEdge MX line and its Kinetic strategy. Cisco Systems did an initial
foray into composability with its
UCS M Series machines. DriveScale has offered a level of server
composability through
a special network adapter that allows compute and storage to scale
independently at the rack scale, across nodes, akin to similar projects
under way at Intel, Dell, the Scorpio alliance of Baidu, Alibaba, and Tencent,
and the Open Compute Project spearheaded by Facebook. Juniper Networks acquired
HTBase to get some composability for its network gear, and Liqid dropped out of
stealth in June 2017 with its
own PCI-Express switch fabric to link bays of components together and make them
composable into logical servers. TidalScale, which
dropped out of stealth a few months later in October 2017, has created what
it calls a HyperKernel to glom together multiple servers into one giant system
that can then be carved up into logical servers with composable components;
rather than use VMs to break this hyperserver down, LXC or Docker containers
are used to create software isolation. GigaIO has been coming on strong in the
past year with
its own PCI-Express switches and FabreX fabric.
There are going to
be lots of different ways to skin this composability cat, and it is not clear
which way is going to dominate. But our guess is that the software approaches
from DriveScale, Liqid, and TidalScale are going to prevail over the proprietary
approaches that Cisco, Dell, and HPE have tried to use with their respective
malleable iron. Being the innovator, as HPE was here, may not be enough to win
the market, and we would not be surprised to see HPE snap up one of these other
companies, and then Dell snap up whichever one HPE doesn’t acquire. Then
again, the Synergy line of iron at HPE was already at an annualized revenue run
rate of $1.5 billion – with 3,000 customers – and growing at 78 percent in the
middle of this year, so maybe HPE thinks it already has the right answer.
Liqid, for one, is
not looking to be acquired and in fact has just brought in $28 million in its
second round of funding, bringing the total funds raised to date to $50
million; the funding was led by Panorama Point Partners, with Iron Gate Capital
and DH Capital kicking in some dough. After three years of hardware and
software development, Liqid needs more cash to build up its sales and marketing
teams to chase the opportunities and also needs to plow funds back into
research and development to keep the Liqid Fabric OS, managed fabric switch,
and Command Center management software moving ahead.
“We have a handful
of large customers that make up a good chunk of our revenues right now,” Sumit
Puri, co-founder and chief executive officer at Liqid, tells The Next Platform. “These are the customers we
started with back in the day, and we have ramped them to the size we want all
of our customers to be, and some of them are showing us projects out on the
horizon that are at massive scale. We have dozens of proofs of concept under
way, and some of them will be relatively small and never grow into a
seven-figure customer. Some of them will.”
Puri is not about to
get into specific pricing for the switches and software that turn a rack of
servers with peripherals into a stack of composable, logical servers, but says
that the adder over the cost of traditional clusters is on the order of 5 percent
to 10 percent of the total cost of the infrastructure. But the composability
means that every workload can be configured with the right logical server setup
– the right number of CPUs, GPUs, FPGAs, flash drives, and such – so that
utilization can be driven up by factors of 2X to 4X on the cluster compared to
the industry average. Datacenter utilization, says Puri, averages something on
the order of 12 percent worldwide (including compute and storage), and as best
as Liqid can figure, Google, which is the best at this in the industry, averages
30 percent utilization in its datacenters. The Liqid stack can drive it
as high as 90 percent utilization, according to Puri. That’s mainframe-class
right there, and about as good as it gets.
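The arithmetic behind that claim is simple enough to sketch out. Using only the figures cited above – the 5 percent to 10 percent hardware adder and the 12 percent versus 90 percent utilization levels – and a cost-per-useful-capacity model of our own devising, the back-of-the-envelope math looks like this:

# Back-of-the-envelope arithmetic for the trade-off Puri describes: pay a
# composability "adder" on the hardware, get more useful work out of it.
# The figures are the ones cited in the text (a 5-10 percent cost adder,
# ~12 percent average utilization, up to ~90 percent with composability);
# the cost-per-useful-capacity model itself is our own simplification.

base_cost = 1_000_000          # cost of a traditional cluster, arbitrary units
adder = 0.10                   # upper end of the quoted 5-10 percent adder
util_before = 0.12             # industry-average utilization cited by Puri
util_after = 0.90              # upper bound Liqid claims for a composed cluster

cost_per_useful_unit_before = base_cost / util_before
cost_per_useful_unit_after = base_cost * (1 + adder) / util_after

print(f"traditional: {cost_per_useful_unit_before:,.0f} per unit of useful capacity")
print(f"composable:  {cost_per_useful_unit_after:,.0f} per unit of useful capacity")
# Even with the full 10 percent adder, the cost per useful unit of capacity
# drops by roughly 7X in this toy model, which is the argument for composability.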
The prospect
pipeline is on the order of thousands of customers, and that is why funding is
necessary. It takes people to attack that opportunity, and even if HPE has been
talking about composability for the past five years, it is not yet a mainstream
approach for systems.
As with most
distributed systems, there is a tension between making one large pool of
infrastructure and making multiple isolated pools to limit the blast area in
the event that something goes wrong in the infrastructure. The typical large
enterprise might have pods of compute, networking, and storage that range in
size from a half rack or a full rack up to two or even three racks, but
rarely larger or smaller than that. They tend to deploy groups of applications
on pods and upgrade the infrastructure by the pod to make expanding the
infrastructure easier and more cost effective than doing it a few servers at a
time.
In a deal that Liqid
is closing right now, the customer wants to have a single 800-node cluster, but
only wants to have 200 of the nodes hanging off the Liqid PCI-Express fabric
because it does not want to pay the “composability tax,” as Puri put it, on all
of those systems. Over time, the company may expand the Liqid fabric into the
remaining 600 servers, but it is far more likely that the new servers added in
the coming years will have it, and
after a three or four year stint, the old machines that did not have
composability will simply be removed from the cluster.
There are a number
of different scenarios where composability is taking off, according to Liqid.
The important thing to note is that the basic assumption is that components are
aggregated into their own enclosures and then the PCI-Express fabric in the Liqid
switch can reaggregate them as needed, tying specific processors in servers to
specific flash or Optane storage, network adapters, or GPUs within enclosures.
You can never attach more devices to a given server than it allows, of course,
so don’t think that with the Liqid switch you can suddenly hang 128 GPUs off of
one CPU. Your can’t do more than the BIOS says. But you can do that much and
less as needed.
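To make that constraint concrete, here is a small, hypothetical extension of the sketch above that checks a composition request against per-host ceilings; the limit values are invented, since the real ones depend on the server’s firmware and PCI-Express topology:

# Whatever the fabric can pool, a composed request still has to respect the
# host's own limits (what the text calls "what the BIOS says"). The ceilings
# here are made up for illustration.

# Hypothetical per-host ceilings on attachable devices.
HOST_DEVICE_LIMITS = {"gpu": 16, "nvme": 24, "nic": 8}


def validate_request(wanted: dict, limits: dict = HOST_DEVICE_LIMITS) -> None:
    """Reject a composition request that exceeds what the host can enumerate."""
    for kind, count in wanted.items():
        ceiling = limits.get(kind, 0)
        if count > ceiling:
            raise ValueError(
                f"host can enumerate at most {ceiling} {kind} devices, "
                f"requested {count}")


validate_request({"gpu": 8, "nvme": 4})     # fine
# validate_request({"gpu": 128})            # raises: more than the host allows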
The Liqid fabric is
not just restricted to PCI-Express, but can also be extended with Ethernet and
InfiniBand attachment for those cases when distance and horizontal scale are
more important than the low latency that PCI-Express switching affords. Liqid’s
stack does require disaggregation at the physical level, meaning that the
peripherals are ganged up into their respective enclosures and then linked
together using the PCI-Express fabric or using NVM-Express over Ethernet or
perhaps GPUDirect over RDMA networks to link flash and GPUs to compute
elements.
Next week at the
SC19 supercomputer conference in Denver, Liqid will be showing off the next
phase of its product development, where the hardware doesn’t have to be pooled
at the physical layer and then composed, but rather standard servers using a
mix of CPUs and GPUs and FPGAs for compute and flash and Optane for storage
will be able to have their resources disaggregated, pooled, and composable
using only the Liqid software to sort it into pools and then ladle it all out
to workloads. The performance you get will, of course, be limited by the
network interface used to reaggregate the components – Ethernet will be slower
than InfiniBand will be slower than PCI-Express, and for many applications, the
only real impact will be the load time for the applications and the data. Any
application that requires a lot of back and forth chatter between compute and
storage elements will want to be on PCI-Express. But this new capability will
allow Liqid to go into so-called “brownfield” server environments and bring
composability to them.
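That guidance can be boiled down to a toy heuristic. The thresholds below are invented for illustration; only the latency ordering – PCI-Express, then InfiniBand, then Ethernet – comes from the discussion above:

# A toy heuristic for the guidance in the paragraph above: the more
# back-and-forth chatter between compute and storage, the lower-latency the
# reaggregation fabric needs to be. The message-rate thresholds are invented;
# the latency ordering (PCI-Express < InfiniBand < Ethernet) is from the text.

def pick_fabric(messages_per_second: float) -> str:
    if messages_per_second > 100_000:    # constant chatter: keep it on the bus
        return "pcie"
    if messages_per_second > 1_000:      # moderate chatter: RDMA-class network
        return "infiniband"
    return "ethernet"                    # mostly bulk loads: load time dominates


print(pick_fabric(500))        # ethernet
print(pick_fabric(50_000))     # infiniband
print(pick_fabric(1_000_000))  # pcie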
So where is
composability taking off? The first big area of success for Liqid was, not
surprisingly, for GPU-centric workloads, where the GPUs traditionally get
locked away inside of a server node and are unused most of the time.
Disaggregation and composability allow for them to be kept busy doing
workloads, and the hardware configuration can change rapidly as needed. If you
put a virtualization or container layer on top of the reaggregated hardware,
then you can move workloads around and change hardware as necessary. This is,
in fact, what companies are now interested in doing, with either a VMware
virtualization or Kubernetes container environment on top of the Liqid
hardware. Composable bare metal clouds are also on the rise.
Liqid has also
partnered recently with ScaleMP so it can offer virtual NUMA servers over
composable infrastructure and therefore be better able to compete with
TidalScale, which did this at the heart of its eponymous composable
architecture.
There is also talk
about using Liqid on 5G and edge infrastructure – but everybody is trying to
get a piece of that action.
Nvidia Arms Up Server OEMs And ODMs For Hybrid Compute
The one thing that
AMD’s return to the CPU market and its more aggressive moves in the GPU compute
arena have done, along with Intel’s plan to create a line of discrete Xe GPUs that can be used as companions to
its Xeon processors, is push Nvidia and Arm closer together.
Arm is the chip
development arm that in 1990 was spun out of British workstation maker Acorn
Computer, which created its own Acorn RISC Machine processor and, significantly
for client computing, was chosen by Apple for its Newton handheld computer
project. Over the years, Arm has licensed its eponymous RISC architecture to
others and also collected royalties on the devices that they make in exchange
for doing a lot of the grunt work in chip design as well as ensuring software
compatibility and instruction set purity across its licensees.
This business, among
other factors, is how and why Arm has become the largest semiconductor IP
peddler in the world, with $1.61 billion in sales in 2018. Arm is everywhere in
mobile computing, and this is why Japanese conglomerate SoftBank paid $32 billion
for the chip designer three years ago. With anywhere from hundreds of
billions to a trillion devices plugged into the Internet at some point in the
coming decade, depending on who you ask, and a very large portion of them
expected to use the Arm architecture, it seemed like a pretty safe bet that Arm
was going to make a lot of money.
Getting Arm’s
architecture into servers has been more problematic, and the reasons for this
are myriad; we are not going to get into the full litany here. One issue is the very
way that Arm licenses its architecture and makes its money, which works well but
which has relied on other chip makers, with far shallower pockets and without
the muscle of Arm, much less AMD or Intel, to extend it for server
platforms with features like threading, memory controllers, or peripheral
controllers. The software stack took too long to mature, although we are there
now with Linux and probably with Windows Server (only Microsoft knows for sure
on that last bit). And despite it all, the Arm collective has shown how hard it
is to sustain the effort to create a new server chip architecture, with a
multiple generation roadmap, that takes on Intel’s hegemony in the datacenter –
which is doubly difficult with an ascending AMD that has actually gotten its
X86 products and roadmap together with the Epyc family that launched in
2017 with the “Naples” processors and that has been substantially improved with
the “Rome” chips this year.
All of this is
background against what is the real news. And that is that Nvidia, which
definitely has a stake in helping Arm server chips be full-functioning peers to
X86 and Power processors, is doing something about it. Specifically, the
company is making a few important Arm-related announcements at the SC19
supercomputing conference in Denver this week.
The first thing is
that Nvidia is making good on its promise earlier this summer to make Arm a
peer with X86 and Power with regard to the entire Nvidia software stack,
including the full breadth of the CUDA programming environment with its
software development kit and its libraries for accelerating HPC and AI
applications. Ian Buck, vice president and general manager of accelerated
computing at Nvidia, tells The Next Platform
that most of the libraries for HPC and AI are actually available in the first
beta of the Arm distribution of CUDA – there are still a few that need some
work.
As we pointed out
last summer, this CUDA-X stack, as it is now called, may have started out as a
bunch of accelerated math libraries, but now it comprises tens of millions of
lines of code and is on the same order of magnitude in that regard as a basic operating
system. So moving that stack and testing all the possible different features in
the host is not trivial.
Last month, ahead of
the CUDA on Arm launch here at SC19, Nvidia gave it out to a number of key HPC
centers that are at the forefront of Arm in HPC, notably RIKEN in Japan, Oak
Ridge National Laboratory in the United States, and the University of Bristol in
the United Kingdom. They have been working on porting some of their codes to
run in accelerated mode on Arm-based systems using the CUDA stack. In fact, of
the 630 applications that have been accelerated already using X86 or Power
systems as hosts, Buck says that 30 of them have already been ported to
Arm hosts, which is not bad at all considering that it was pre-beta software
that the labs were using. This includes GROMACS, LAMMPS, MILC, NAMD, Quantum
Espresso, and Relion, just to name a few. The testing of the Arm ports was
done in conjunction with key partners that have Arm
processors – Marvell, Fujitsu, and Ampere are the ones that matter, with perhaps
HiSilicon in China, which was not mentioned – that make Arm servers – such as Cray,
Hewlett Packard Enterprise, and Fujitsu – and that make Linux on Arm
distributions – with Red Hat, SUSE Linux, and Canonical being the important
ones.
“Our experience is
that for most of these applications, it is just a matter of doing a recompile
of the code on the new host and it runs,” explains Buck. This stands to reason
since a lot of the code in a hybrid CPU-GPU system has, by definition, been ported
to actually run on the Tesla GPU accelerators in the box. “And as long as they
are not using some sort of bespoke library that only exists in the ecosystem
out of the control of the X86 platform, it has been working fine. And the
performance has been good. We haven’t released performance numbers, but it is
comparable to what we’ve seen on Intel Xeon platforms. And that makes sense since
so many of these applications get the bulk of their performance from the GPUs
anyway, and the ThunderX2, which most of these centers have, is performing well
because its memory system is good and its PCI-Express connectivity is good.”
Although Nvidia did
not say this, at some point, this CUDA-X stack on Arm will probably be made
available on those Cray Storm CS500 systems that some of the same HPC centers
mentioned above are
getting equipped with the Fujitsu A64FX Arm processor that Fujitsu has
designed for RIKEN’s “Fugaku” exascale system. Cray, of course, announced that
partnership with Fujitsu and RIKEN, Oak Ridge, and Bristol ahead of SC19, and
said that it was not planning to make the integrated Tofu D interconnect available
in the CS500 clusters with the A64FX iron. And that means that the single
PCI-Express 4.0 slot on the A64FX processor is going to be in contention, or
someone is going to have to create a Tofu D to InfiniBand
or Ethernet bridge to accelerate this server chip. A Tofu D to NVLink bridge
would be even better. . . . But perhaps this is just a perfect use case for
PCI-Express switching with disaggregation of accelerators and network
interfaces and dynamic composition with a fabric layer,
such as what GigaIO is doing.
That’s not Nvidia’s
concern today, though. What Nvidia does want to do is make it easier for any
Arm processor plugged into any server design to plug into a complex of GPU
accelerators, and this is being accomplished with a new reference design dubbed
EBAC – short for Everything But A CPU – that Nvidia is making available.
The EBAC design has
a modified GPU tray from the hyperscale HGX system design, which includes eight
“Volta” Tesla V100 accelerators with 32 GB of HBM2 memory on each. The GPUs are
cross-connected by NVLink so they can share data and memory atomics across
those links, and the tray of GPUs also has what amounts to an I/O mezzanine
card on the front that has four ConnectX-5 network interface cards running at
100 Gb/sec from Mellanox Technologies (which Nvidia is in the process of
buying) and four PCI-Express Mini SAS HD connectors that can lash any Arm
server to this I/O and GPU compute complex. In the reference design, it looks like a
quad of two-socket “Mustang” ThunderX2 system boards, in a pair of 1U rack
servers, would be ganged up with the Tesla Volta accelerators. Presumably there
is a PCI-Express switch chip complex within the EBAC system to link all of
this together, even if it is not, strictly speaking, composable.
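For reference, here is the EBAC design as described above, rendered as a simple data structure; it is only a summary of the text, not an official specification, and the internal PCI-Express switch is marked as presumed because Nvidia has not confirmed it:

# A plain data-structure rendering of the EBAC reference design as described
# in the text. Fields marked "presumed" reflect the article's own speculation.
EBAC_REFERENCE_DESIGN = {
    "gpu_tray": {
        "base_design": "hyperscale HGX tray (modified)",
        "gpus": {"count": 8, "model": "Tesla V100 (Volta)", "hbm2_gb": 32},
        "gpu_interconnect": "NVLink, cross-connected with shared memory atomics",
    },
    "io_mezzanine": {
        "nics": {"count": 4, "model": "Mellanox ConnectX-5", "speed_gbps": 100},
        "host_links": {"count": 4, "connector": "PCI-Express Mini SAS HD"},
    },
    "internal_switch": "PCI-Express switch complex (presumed, not confirmed)",
    "example_hosts": "quad of two-socket ThunderX2 boards in a pair of 1U servers",
}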
There is probably
not a reason it could not be made composable, or extended to support an A64FX
complex. We shall see. If anyone needs to build composability into its systems,
now that we think about it, it is Nvidia.