Online Articles - AI/ML





Processing Moves To The Edge



Definitions vary by market and by vendor, but an explosion of data requires more processing to be done locally.
April 12th, 2018 - By: Kevin Fogarty

Edge computing is evolving from a relatively obscure concept into an increasingly complex component of a distributed computing architecture, in which processing is being shifted toward end devices and satellite data facilities and away from the cloud.
Edge computing has gained attention in two main areas. One is the industrial IoT, where it serves as a do-it-yourself infrastructure for on-site data centers. The second involves autonomous vehicles, where there is simply not enough time to ask the cloud for solutions.
But ask two people to describe it and you are likely to get two very different answers. On one hand, it is understood well enough that it can be used in satellite IIoT data centers and in machine learning-enabled iPhones. On the other, most of those designing it can’t say what it looks like.
Much of the confusion stems from the fact that edge computing is not a technology. It’s more of a technological coping mechanism. It represents a series of efforts to deal with the exponential growth of data from billions of endpoint devices by digesting at least some of that data wherever it is created. That requires building massive compute performance into everything from sensors to smart phones, and all of this has to happen within an even tighter power budget.
“We are moving to an intelligent edge,” the president and CEO of Cadence said in a recent speech. “This is going to be a new era for semiconductors. We want data at our fingertips to be able to make decisions on the fly.”
This approach stands in stark contrast to the general consensus of several years ago, which held that simple sensors would collect data from the physical world and that data would be processed in the cloud. The original concept failed to take into account that the amount of data being collected by sensors is growing too large to move around quickly. The best solution is to pre-process that data, because most of it is useless.
“The IoT represents an exponential increase in the number of devices in the world, and the amount of data generated by these devices could swamp the data center’s ability to process it,” according to Steven Woo, distinguished inventor and vice president of enterprise solutions technology at Rambus. “It’s likely you can do aggregation, filtering and some rudimentary processing, depending on how complex your computations are.”
This is the growing responsibility of edge devices. But how the edge evolves, and how quickly, depends upon the readiness of end markets that will drive it. So while the edge began taking off last year in the IIoT, it is still on hold in the automotive space because it’s not clear at this point how quickly fully autonomous vehicles will begin to ramp up.
“If there isn’t an immediate production target, you might get away with something that’s a lot less advanced,” said Ty Garibay, CTO at ArterisIP. “You might be able to aggregate this kind of functionality into multiple smaller chips made by different companies. There will be LiDAR, radar, and possibly a sensor fusion hub, which may be an FPGA. And then you might need enough compute power for the car controller, which also may have to figure out which data to process and what to send back to the cloud. The question now is how you make it smart enough to send back the right data.”
What is the edge?
Many chipmakers and systems companies struggle with the variety of ways it is possible to shift computing to the edge. There are no demarcation lines between the many levels that may or may not be included in this distributed computing model.
“There is a lot of difference of opinion on the point of what the edge looks like,” according to Jeff Miller, product marketing manager at Mentor, a Siemens Business. “The cloud is where the really high-powered machine learning or computational resources will continue to be, but bandwidth to get it there is expensive and shared spectrum is a finite resource. So just streaming all that data to the cloud from thousands of devices without some pre-processing at the edge is not practical.”
It doesn’t help that carriers, networking providers, integrators, datacenter OEMs and cloud providers offer varying language and explanations—all of them competing for what might be billions of dollars in additional sales in a market described by a term that doesn’t mean anything specific enough to package under a single brand name, according to Tom Hackenberg, principal analyst for embedded systems at IHSMarkit.
“Edge computing” is a common but non-specific term referring to the addition of computing resources anywhere close to the endpoint of an IT infrastructure. The definition has been narrowed colloquially to mean compute resources installed specifically to support IoT installations. “It’s a set of architectural strategies, not a product, not a technology,” Hackenberg said.
Even limiting the definition of edge to its function as the compute power for IoT installations doesn’t focus the picture much, according to Shane Rau, research vice president for computing semiconductors at IDC. “There is no one IoT. There are thousands, each in a different industry with a different level of acceptance and capability. It may not be possible to see what the edge looks like because it looks like the edge of everything.”
Still, there are benefits to getting this right. Gopal Raghavan, CEO of startup Eta Compute, said that edge computing improves both privacy and security because it keeps data local. And it improves response time by eliminating the time it takes to send and receive data back from the cloud.
“You want to sense, infer, and act without going to the cloud, but you also want the ability to learn on the edge,” he said, noting that the cochlea in the ear already does this today, allowing it to identify speech in a noisy environment. The same happens with the retina in the eye, which can decipher images and movement before the brain can process those images.

Fig. 1: Edge computing platform. Source: NTT
Why the edge is getting so much attention
One of the initial drivers behind the edge computing model was the industrial IoT, where a desire to see projects succeed prompted industrial organizations to try to solve both the cost-efficiency and data-deluge problems on their own.
“In the industrial space there is a need for factory automation and intelligence at the edge, and the risk is comparatively smaller because it is possible to demonstrate value in accomplishing those things,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “The IIoT will lead the charge to build out IoT infrastructure for very practical reasons.”
That, in turn, led to a push to keep compute resources near the physical plants. But the benefits go much deeper than just keeping IoT devices off the Internet, according to Rambus’ Woo. More processing power means greater ability to pre-process data to eliminate repetitions of the same temperature reading, for example, or render the data feed from hundreds of sensors as a single status report.
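As a rough illustration of the kind of pre-processing Woo and others describe, the sketch below deduplicates repeated temperature readings and rolls many sensor streams into a single status report. The sensor names, the change threshold and the report format are assumptions for illustration, not any vendor's actual pipeline.

```python
from collections import defaultdict
from statistics import mean

def preprocess(readings, delta=0.5):
    """Drop repeated readings and summarize each sensor at the edge.

    `readings` is an iterable of (sensor_id, temperature) tuples as they
    arrive at the edge node; `delta` is the smallest change worth keeping.
    Both are illustrative assumptions, not values from the article.
    """
    last_seen = {}            # most recent value forwarded per sensor
    kept = defaultdict(list)

    for sensor_id, temp in readings:
        prev = last_seen.get(sensor_id)
        if prev is None or abs(temp - prev) >= delta:
            kept[sensor_id].append(temp)   # forward only meaningful changes
            last_seen[sensor_id] = temp

    # Render many per-sensor streams as one compact status report.
    return {
        sensor_id: {"min": min(vals), "max": max(vals), "avg": round(mean(vals), 2)}
        for sensor_id, vals in kept.items()
    }

if __name__ == "__main__":
    stream = [("t1", 21.0), ("t1", 21.1), ("t1", 23.4), ("t2", 19.8), ("t2", 19.8)]
    print(preprocess(stream))   # one report instead of five raw readings
```

The point of the sketch is simply that most raw readings never need to leave the edge device; only the summary does.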
Apple’s announcement in 2017 that it would put machine learning accelerators into its top-end iPhone touched off a rush that Gartner predicts will see 80% of smartphones AI-equipped by 2022. Those will be powerful, latency-sensitive edge devices, but they will focus on functions aimed at individual consumers – augmented reality and biometric authentication, for example – which will limit their impact in the short term, said IDC’s Rau.
The addition of ML capabilities into other consumer devices – and autonomous vehicles and other smart devices – is likely to create an ecosystem on which all kinds of powerful applications can be built, using edge data centers for support, said Mohandass.
“We saw in the mainframe era that having a central brain and everything else being dumb didn’t work,” he said. “There was a lot more computing power with PCs, even if they were limited. Now, with central cloud, hyperscale datacenters have a lot more power. Clients aren’t quite a dumb terminal, but they are not too smart. We’re heading for another inflection point where the edge devices, the clients, have the capacity to have a lot more intelligence. We’re not there yet, but it’s coming.”
Until then, the focus should be on developing ways to use that deluge of data from IoT devices to accomplish things that wouldn’t be possible otherwise, said Mentor’s Miller. “The core value of the IoT is in bringing together large data sets, not so much monitoring so you know immediately when there’s a leak in tank 36 out of 1000 tanks somewhere. The value is in identifying things that are about to fail or activate actuators in the field before a problem actually comes up.”
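To make Miller's point about prediction concrete, here is a minimal, hypothetical sketch of flagging a tank whose level is steadily trending down before it actually leaks dry, rather than alarming after the fact. The window size and slope threshold are made-up values for illustration only.

```python
def trending_toward_failure(levels, window=5, max_drop_per_step=0.2):
    """Flag a tank whose fill level is steadily dropping.

    `levels` is a list of recent fill-level readings for one tank; the
    window size and drop threshold are illustrative, not from the article.
    """
    if len(levels) < window:
        return False
    recent = levels[-window:]
    # Average change per reading over the recent window.
    slope = (recent[-1] - recent[0]) / (window - 1)
    return slope < -max_drop_per_step   # sustained drop suggests a developing leak

print(trending_toward_failure([98.0, 97.6, 97.1, 96.5, 96.0]))  # True
```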
Other pieces of the puzzle
Much of the edge model is based on the way the human body processes information. A person’s hand will recoil from a hot stove, for example, before signals reach the brain. The brain then can explain what just happened and avoid such situations in the future.
That sounds simple enough in concept, but from a chip design standpoint this is difficult to achieve. “A lot of IoT devices actually present an interesting dilemma because they don’t need a lot of memory, but what they need is a very small power signature,” said Graham Allan, product marketing manager for memory interfaces at Synopsys. “That is a particular application that is not yet well served by the DRAM industry. It remains to be seen whether or not that market will be big enough to warrant having its own product, or whether it will continue to be served by the two generations of older LPDDR technology and you just have to live with what’s there.”
In some cases, there may be a middle step, as well. In 2015, Cisco proposed the idea of fog computing, extending the reach of cloud-based applications to the edge using boxes that combined routing with Linux-based application servers to analyze sensor data using Cisco’s IOx operating system. Fog computing now has its own open consortium and reference architecture for what it calls a “cloud-to-Thing continuum of services,” and NIST was interested enough to put out fog guidelines. (The IEEE Standards Association announced in October that it will use the OpenFog Reference Architecture as the basis for its work on fog standards under the IEEE P1934 Working Group on Fog Computing and Networking Architecture Framework.)
This also is aimed at keeping the Internet from drowning in things. Initial plans for the IoT included building IoT control centers at or near the site of IoT installations, with enough compute resources to store the data flowing from devices, provide sub-second responses where needed, and boil masses of raw data down to statistical reports that could be digested easily. These principles were traditional best practices for embedded systems installed as endpoints near the edge of the organization’s IT infrastructure, but the scale and variety of functions involved turned the decision to add computing resources at the edge into edge computing. That has evolved still further into the “intelligent edge.”
Regardless of the moniker, edge computing appears to be icing on the cake for technology providers. For one thing, it won’t cannibalize public cloud spending, for which IDC predicts a 23% increase this year compared to last, and 21.9% annual growth through 2021. And it can only help sales of the IoT, a market in which IDC predicts spending will rise 15% in 2018 compared to 2017, to a total of $772 billion, $239 billion of which will go to modules, sensors, infrastructure and security. The IoT will see 14% annual growth and pass the $1 trillion mark in 2020, according to IDC.
Gartner predicts semiconductor revenue will rise 7.5% to $451 billion in 2018, far above the record $411 billion in 2017. And by 2021 51% of all devices connecting to the Internet will be IoT. Their chatter will rise from 2% of all global IP traffic during 2016 to 5% of all IP traffic, according to Cisco Systems (Cisco VNI Global IP Traffic Forecast).
Humans will interact with those devices an average of 4,800 times per day in 2025, helping to drive the volume of digital data created every year up by a factor of 10, from 16.1 zettabytes in 2016 to 163 zettabytes during 2025, according to IDC’s August, 2017 report Data Age 2025.
While reports from IDC and IHSMarkit show the cloud market continuing to grow, they have trouble capturing the increasing dominance of edge computing, which may not exist in a formal sense and is difficult to define well enough for those who design the intelligence to make it happen.
IHSMarkit’s most recent estimate is that there were about 32 billion IoT devices online during 2017; there will be 40 billion by 2020, 50 billion by 2022 and 72.5 billion by 2025. “The IoT exists because microcontrollers and other controllers came down in price enough to make it feasible to connect a wider range of embedded devices, but we didn’t have the infrastructure to support that,” Hackenberg said. “That is what edge computing addresses. Once a stronger infrastructure is in place, growth in the IoT explodes.”
That’s not bad for a concept that is still ill-defined. “Everyone gets very excited about the edge, but no one knows what it means,” according to Stephen Mellor, CTO of the Industrial Internet Consortium (IIC), a standards- and best-practices consortium that is heavily supported by Industrial Internet of Things providers. The group put out its own guide to IoT analytics and data issues last year. “You can do some controlled analysis and processing at the edge, but you still need the cloud for analytics on larger datasets that can help you decide on a plan of attack that you then execute closer to the edge.”



Fig. 2: Market impact of Edge, IoT growth. Source: Cisco Systems
Datacenters, Data Closets, Data Containers
Not surprisingly, there is some variability in what building blocks and configurations might work best as edge data centers. Edge data centers have to be more flexible and more focused on immediate response than traditional glass-house data centers. They also have to be able to combine many data streams into one usable base that can be acted upon quickly.
From a hardware perspective, however, the edge can be anything from a collection of servers and storage units housed under a co-location agreement in a local cloud or data processing facility, to a hyperconverged data center infrastructure module housed in a cryogenically cooled shipping container.
The scale of some IoT installations will force some organizations to build full-scale data centers even at the edge, or use a piece of one owned by a service provider, according to Michael Howard, executive director of research and analysis for carrier networks at IHSMarkit. Some carriers are interested in accelerating the conversion of the 17,000 or so telco wiring centers in almost every community in the U.S. so they can offer richer IT services, including edge services. Central Office Rearchitected as a Datacenter (CORD) programs have converted only a few facilities, however, and most will see more use in the conversion to 5G than in edge computing, Howard said.
Other options include the smaller, more modular and more easily scalable products that make it easier to assemble resources to fit the size and function of the devices they support, Hackenberg said. That could mean hyper-converged datacenter solutions like Cisco’s UCS, or pre-packaged 3 kVA to 8 kVA DCIM-compliant Micro Data Centers from Schneider Electric, HPE and others. There also are VM-based, self-contained server/application “cloudlets” described by Mahadev Satyanarayanan of Carnegie Mellon University and the nascent Open Edge Computing consortium.
—Ed Sperling contributed to this story

From <https://semiengineering.com/processing-moves-to-the-edge/>


Navigating The Foggy Edge Of Computing



It’s not just cloud and edge anymore as a new layer of distributed computing closer to end devices picks up steam.
April 12th, 2018 - By: Aharon Etengoff
The National Institute of Standards and Technology (NIST) defines fog computing as a horizontal, physical or virtual resource paradigm that resides between smart end-devices and traditional cloud or data centers. This model supports vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable, layered, federated and distributed computing, storage and network connectivity. Put simply, fog computing extends the cloud to be closer to the things that produce and act on Internet of Things (IoT) data.
According to Business Matters, moving computing and storage resources closer to the user is critical to the success of the Internet of Everything (IoE), with new processes decreasing response time and working more efficiently in a fog environment. Indeed, as Chuck Byers of the OpenFog Consortium confirms, fog computing is “rapidly gaining momentum” as the architecture that bridges the current gap in IoT, 5G and embedded AI systems.




As mentioned above, 5G networks are one area in which fog computing is expected to play a major role. As RCR Wireless reports, the convergence of 5G and fog computing is anticipated to be an “inevitable consequence” of bringing processing tasks closer to the edge of an enterprise’s network. For example, in certain scenarios, 5G will require very dense antenna deployments – perhaps even less than 20 kilometers from one another. According to Network World, a fog computing architecture could be created among stations that include a centralized controller. This centralized controller would manage applications running on the 5G network, while handling connections to back-end data centers or clouds.
Edge computing
There are a number of important distinctions between fog and edge computing. Indeed, fog computing works with the cloud, while edge is typically defined by the exclusion of cloud and fog.


 
Moreover, as NIST points out, fog is hierarchical, where edge is often limited to a small number of peripheral layers. In practical terms, the edge can be defined as the network layer encompassing the smart end devices and their users. This allows the edge to provide local computing capabilities for IoT devices.
According to Bob O’Donnell, the founder and chief analyst of Technalysis Research LLC, the connected autonomous (or semi-autonomous) vehicle is perhaps one of the best examples of an advanced-edge computing element.
“Thanks to a combination of enormous amounts of sensor data, critical local processing power and an equally essential need to connect back to more advanced data analysis tools in the cloud, autonomous cars are seen as the poster child of advanced-edge computing,” he states in a recent Recode article.
Indeed, according to AT&T, self-driving vehicles could generate as much as 3.6 terabytes of data per hour from the clusters of cameras and other sensors, although certain functions (such as braking, turning and acceleration) will likely always be managed by the computer systems in cars themselves. Nevertheless, AT&T sees some of the secondary systems being offloaded to the cloud with edge computing.
“We’re shrinking the distance,” AT&T states in a 2017 press release. “Instead of sending commands hundreds of miles to a handful of data centers scattered around the country, we’ll send them to the tens of thousands of central offices, macro towers and small cells usually never farther than a few miles from our customers.”
Silicon and services: At the edge of the foggy cloud
Fog and edge computing are impacting chip designs, strategies and roadmaps across the semiconductor industry. As Ann Steffora Mutschler of Semiconductor Engineering notes, an explosion in cloud services is making chip design for the server market more challenging, diverse and competitive.
“Unlike data center number crunching of the past, the cloud addresses a broad range of applications and data types,” she explains.

 
“So, while a server chip architecture may work well for one application, it may not be the optimal choice for another. And the more those tasks become segmented within a cloud operation, the greater that distinction becomes.”
With regard to services, the National Institute of Standards and Technology sees fog computing as an extension of the traditional cloud-based computing model where implementations of the architecture can reside in multiple layers of a network’s topology. As with the cloud, multiple service models are implementable, including Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
From our perspective, edge computing offers similar opportunities, particularly with regard to SaaS and PaaS. As an example, both could be applied to the automotive sector, with companies deploying sensor-based vehicle systems that proactively detect potential issues and malfunctions. This solution, which in its optimal configuration would combine silicon and services, could be sold as a hardware and software product, or deployed as a service with subscription fees generated on a monthly or annual basis.
In conclusion, fog and edge computing will continue to evolve to meet the demands of a diverse number of verticals, including the IoT, autonomous/connected vehicles, next-generation mobile networks and data centers.

From <https://semiengineering.com/navigating-the-foggy-edge-of-computing/>



Exponentials At The Edge
By Ed Sperling, semiengineering.com
The age of portable communication has set off a scramble for devices that can achieve almost anything a desktop computer could handle even five years ago. But this is just the beginning.
The big breakthrough with mobile devices was the ability to combine voice calls, text and eventually e-mail, providing the rudiments of a mobile office, all on a single charge of a battery that was light enough to carry and unobtrusive enough that it didn’t have to be strapped onto a belt. Mobile electronics have evolved far beyond that, of course. A smartphone today can plot the best route through traffic in real-time, download full documents for editing, record and send videos, take high-resolution photographs, and serve as a platform for interactive multi-player games. It even can be attached to headgear as part of a virtual reality system.
This is just phase one. The next phase will add intelligent screening for a growing flood of data across more devices. Most of the data being collected is completely useless. Some of it is useful only when combined with data from thousands or millions of other users and mined in a cloud for patterns and anomalies. The remainder will have to be dealt with inside a number of individual or networked edge devices, which can filter out what needs immediate attention and what does not.
This all sounds logical enough. If you partition data according to compute resources, then the usefulness of that data can be maximized on a number of fronts. It can be used to understand traffic patterns and develop new ways of capitalizing on them, which is much of the impetus behind AI and deep learning. If this sounds insidious, there’s really nothing new here other than the methods of acquiring data and the ability to centralize some of the screening processes. This is why there currently is a war being waged among IBM, Amazon, Google, Microsoft, Facebook, Alibaba and Apple, not to mention a number of government agencies that are building their own cloud infrastructures. It’s also likely there will be many more private clouds built in the future, which will either democratize or protect that data, or both.
Developments at the edge are not just another rev of Moore’s Law, where processors double density every couple of years. The term being used more frequently these days is exponentials. It’s all about exponential improvements in power, performance, processing, throughput and communication. The main reason why companies are looking at advanced packaging options, including fan-out on substrate, 2.5D and 3D-ICs, as well as pouring money into 3nm transistors that can be patterned with high-NA EUV and directed self-assembly, is that multiple approaches will be needed and combined to achieve these kinds of exponential gains.
The payoff from all these efforts ultimately will be enormous, though. The entire smartphone/tablet market has driven much of the innovation in semiconductor design for more than a decade, and that was just one market. Collectively, all of these new applications will dwarf the size of the mobility market. And each also will add some unique elements that ultimately can be leveraged across market segments, driving new technologies and approaches and even new markets.
We are just at the beginning of this explosion, and not all markets are moving at the same pace. But the focus on power and performance is central to all of this, and for any of these new markets to live up to their potential, huge gains will be required over the next decade.
Technology is moving from the office or the home out into the rest of the world, where interactions are complex, unpredictable (at least so far) and continuous. All of this will require new tooling, different architectural approaches, and a massive amount of innovation in semiconductors to make this work. The chip market is about to get very interesting.

From <https://app.getpocket.com/read/2105518166>



The AI revolution has spawned a new chips arms race


For years, the semiconductor world seemed to have settled into a quiet balance: Intel vanquished virtually all of the RISC processors in the server world, save IBM’s POWER line. Elsewhere AMD had self-destructed, making it pretty much an x86 world. Then Nvidia mowed down all of its many competitors in the 1990s. Suddenly only ATI, now a part of AMD, remained. It boasted just half of Nvidia’s prior market share.
On the newer mobile front, it looked to be a similar near-monopolistic story: ARM ruled the world. Intel tried mightily with the Atom processor, but the company met repeated rejection before finally giving up in 2015.
Then just like that, everything changed. AMD resurfaced as a viable x86 competitor; the advent of field-programmable gate array (FPGA) processors for specialized tasks like Big Data created a new niche. But really, the colossal shift in the chip world came with the advent of artificial intelligence (AI) and machine learning (ML). With these emerging technologies, a flood of new processors has arrived—and they are coming from unlikely sources.
  • Intel got into the market with its purchase of startup Nervana Systems in 2016. It bought a second company, Movidius, for image processing AI.
  • Microsoft is preparing an AI chip for its HoloLens VR/AR headset, and there’s potential for use in other devices.
  • Google has a special AI chip for neural networks called the Tensor Processing Unit, or TPU, which is available for AI apps on the Google Cloud Platform.
  • Amazon is reportedly working on an AI chip for its Alexa home assistant.
  • Apple is working on an AI processor called the Neural Engine that will power Siri and FaceID.
  • ARM Holdings recently introduced two new processors, the ARM Machine Learning (ML) Processor and ARM Object Detection (OD) Processor. Both specialize in image recognition.
  • IBM is developing its own AI processor, and the company also licensed NVLink from Nvidia for high-speed data throughput specific to AI and ML.
  • Even non-traditional tech companies like Tesla want in on this area, with CEO Elon Musk acknowledging last year that former AMD and Apple chip engineer Jim Keller would be building hardware for the car company.
That macro-view doesn’t even begin to account for the startups. The New York Times puts the number of AI-dedicated startup chip companies—not software companies, silicon companies—at 45 and growing, but even that estimate may be incomplete. It’s tricky to get a complete picture since some are in China being funded by the government and flying under the radar.
Why the sudden explosion in hardware after years of chip maker stasis? After all, there is general consensus that Nvidia’s GPUs are excellent for AI and are widely used already. Why do we need more chips now, and so many different ones at that?
The answer is a bit complex, just like AI itself.
Follow the money (and usage and efficiency)
While x86 currently remains a dominant chip architecture for computing, it’s too general purpose for a highly specialized task like AI, says Addison Snell, CEO of Intersect360 Research, which covers HPC and AI issues.
“It was built to be a general server platform. As such it has to be pretty good at everything,” he says. “With other chips, [companies are] building something that specializes in one app without having to worry about the rest of the infrastructure. So leave the OS and infrastructure overhead to the x86 host and farm things out to various co-processors and accelerators.”
The actual task of processing AI is very different from standard computing or GPU processing, hence the perceived need for specialized chips. An x86 CPU can do AI, but it does a task in 12 steps when only three are required; a GPU in some cases can also be overkill.
Generally, scientific computation is done in a deterministic fashion. You want to know two plus three equals five and calculate it to all of its decimal places—x86 and GPU do that just fine. But the nature of AI is to say 2.5 + 3.5 is observed to be six almost all of the time without actually running the calculation. What matters with artificial intelligence today is the pattern found in the data, not the deterministic calculation.
In simpler terms, what defines AI and machine learning is that they draw upon and improve from past experience. The famous AlphaGo simulates tons of Go matches to improve. Another example you use every day is Facebook’s facial recognition AI, trained for years so it can accurately tag your photos (it should come as no surprise that Facebook has also made three major facial recognition acquisitions in recent years: Face.com [2012], Masquerade [2016], and Faciometrics [2016]).
Once a lesson is learned with AI, it does not necessarily always have to be relearned. That is the hallmark of Machine Learning, a subset of the greater definition of AI. At its core, ML is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction based on that data. It’s a mechanism for pattern recognition—machine learning software remembers that two plus three equals five so the overall AI system can use that information, for instance. You can get into splitting hairs over whether that recognition is AI or not.
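A toy example of the distinction drawn above, written as my own sketch using plain least squares rather than any production ML framework: fit a model to observed (a, b, a+b) examples, then ask it about 2.5 and 3.5. It answers roughly six from the learned pattern, without ever executing an addition rule.

```python
import numpy as np

# Observed examples: pairs of inputs and the (slightly noisy) sum seen with them.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = X.sum(axis=1) + rng.normal(0, 0.05, size=200)

# "Learn" the pattern with least squares: find weights w so that X @ w ~ y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The prediction for 2.5 and 3.5 is close to 6, recalled from the learned
# pattern rather than produced by a deterministic addition instruction.
print(np.array([2.5, 3.5]) @ w)   # ~6.0
```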
In the future, maybe even "playing Go" will be a use case with a dedicated AI chip. (Image: STR/AFP/Getty Images)
AI for self-driving cars, for another example, doesn’t use deterministic physics to determine the path of other things in its environment. It’s merely using previous experience to say this other car is here traveling this way, and all other times I observed such a vehicle, it traveled this way. Therefore, the system expects a certain type of action.
The result of this predictive problem solving is that AI calculations can be done with single precision calculations. So while CPUs and GPUs can both do it very well, they are in fact overkill for the task. A single-precision chip can do the work and do it in a much smaller, lower power footprint.
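A quick way to see why single precision is attractive, as a generic NumPy sketch not tied to any particular chip: the same inference-style matrix multiply in float32 uses half the memory of float64, and for pattern-matching workloads the answers barely move.

```python
import numpy as np

rng = np.random.default_rng(1)
weights64 = rng.standard_normal((1024, 1024))   # stand-in for model weights
inputs64 = rng.standard_normal(1024)            # stand-in for one input vector

weights32 = weights64.astype(np.float32)
inputs32 = inputs64.astype(np.float32)

out64 = weights64 @ inputs64
out32 = weights32 @ inputs32

print(weights32.nbytes / weights64.nbytes)                     # 0.5: half the memory
print(np.max(np.abs(out64 - out32)) / np.max(np.abs(out64)))   # tiny relative difference
```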
Make no mistake, power and scope are a big deal when it comes to chips—perhaps especially for AI, since one size does not fit all in this area. Within AI is machine learning, and within that is deep learning, and all those can be deployed for different tasks through different setups. “Not every AI chip is equal,” says Gary Brown, director of marketing at Movidius, an Intel company. Movidius made a custom chip just for deep learning processes because the steps involved are highly restricted on a CPU. “Each chip can handle different intelligence at different times. Our chip is visual intelligence, where algorithms are using camera input to derive meaning from what’s being seen. That’s our focus.”
Brown says there is even a need and requirement to differentiate at the network edge as well as in the data center—companies in this space are simply finding they need to use different chips in these different locations.
“Chips on the edge won’t compete with chips for the data center,” he says. “Data center chips like Xeon have to have high performance capabilities for that kind of AI, which is different for AI in smartphones. There you have to get down below one watt. So the question is, ‘Where is [the native processor] not good enough so you need an accessory chip?’”
After all, power is an issue if you want AI on your smartphone or augmented reality headset. Nvidia’s Volta processors are beasts at AI processing but draw up to 300 watts. You aren’t going to shoehorn one of those in a smartphone.
Sean Stetson, director of technology advancement at Seegrid, a maker of self-driving industrial vehicles like forklifts, also feels AI and ML have been ill served by general processors thus far. “In order to make any algorithm work, whether it’s machine learning or image processing or graphics processing, they all have very specific workflows,” he says. “If you do not have a compute core set up specific to those patterns, you do a lot of wasteful data loads and transfers. It’s when you are moving data around that you are most inefficient; that’s where you incur a lot of signaling and transient power. The efficiency of a processor is measured in energy used per instruction.”
A desire for more specialization and increased energy efficiency isn’t the whole reason these newer AI chips exist, of course. Brad McCredie, an IBM fellow and vice president of IBM Power systems development, adds one more obvious incentive for everyone seemingly jumping on the bandwagon: the prize is so big. “The IT industry is seeing growth for the first time in decades, and we’re seeing an inflection in exponential growth,” he says. “That whole inflection is new money expected to come to IT industry, and it’s all around AI. That is what has caused the flood of VC into that space. People see a gold rush; there’s no doubt.”

You wouldn't put a Ferrari engine in something like this, right? The same may go for AI chips partnering with non-AI-focused hardware and software. (Image: Keystone-France/Gamma-Keystone via Getty Images)
A whole new ecosystem
AI-focused chips are not being designed in a vacuum. Accompanying them are new means of throughput to handle the highly parallel nature of AI and ML processing. If you build an AI co-processor and then use the outdated technologies of your standard PC, or even a server, that’s like putting a Ferrari engine in a Volkswagen Beetle.
“When people talk about AI and chips for AI, building an AI solution involves quite a lot of non-AI technology,” says Amir Khosrowshahi, vice president and CTO of the AI product group at Intel and co-founder of Nervana. “It involves CPUs, memory, SSD, and interconnects. It’s really critical to have all of these for getting it to work.”
When IBM designed its Power9 processor for mission critical systems, for example, it used Nvidia’s high-speed NVLink for core interconnects, PCI Express Generation 4, and its own interface called OpenCAPI (Coherent Accelerator Processor Interface). OpenCAPI is a new connection type that provides a high bandwidth, low latency connection for memory, accelerators, network, storage, and other chips.
The x86 ecosystem, says McCredie, isn’t keeping up. He points to the fact that PCI Express Gen 3 has been on the market seven years without a significant update (the first only happened recently), and IBM was one of the first to adopt it. x86 servers are still shipping with PCIe Gen 3, which has half the bandwidth of Gen 4.
“This explosion of compute capabilities will require a magnitude more of computational capacity,” he says. “We need processors to do all they can do and then some. The industry is finally getting into memory bandwidth and I/O bandwidth performance. These things are becoming first order constraints on system performance.”
“I think the set of accelerators will grow,” McCredie continues. “There are going to be more workloads that need more acceleration. We’re even going to go back and accelerate common workloads like databases and ERP (enterprise resource planning). I think we are seeing the start of a solid trend in the industry where we shift to more acceleration and more becoming available on the market.”
But hardware alone doesn’t do the learning in machine learning, software plays a major part. And in all of this rush for new chips, there is little mention of the software to accompany it. Luckily, that’s because the software is largely already there—it was waiting for the chips to catch up, argues Tom Doris, CEO of OTAS Technologies, a financial analytics and AI developer.
“I think that if you look at longer history, it’s all hardware-driven,” he says. “Algorithms haven’t changed much. Advances are all driven by advances in hardware. That was one of the surprises for me, having been away from the field for a few years. Things haven’t changed a whole lot in software and algorithms since the late 90s. It’s all about the compute power.”
David Rosenberg, data scientist in the Office of the CTO for Bloomberg, also feels the software is in good shape. “There are areas where the software has a long way to go, and that has to do with distributed computing, it has to do with the science of distributed neural computing,” he says. “But for the things we already know how to do, the software has been improved pretty well. Now it’s a matter of can the hardware execute the software fast enough and efficiently enough.”
With some use cases today, in fact, hardware and software are now being developed on parallel tracks with the aim of supporting this new wave of AI chips and use cases. At Nvidia, the software and hardware teams are roughly the same size, notes Ian Buck, the former Stanford University researcher who developed what would become the CUDA programming language (CUDA allows developers to write apps that use the Nvidia GPU for parallel processing instead of a CPU). Buck now heads AI efforts at the chip company.
“We co-develop new architectures with system software, libraries, AI frameworks, and compilers, all to take advantage of new methods and neural networks showing up every day,” he says. “The only way to be successful in AI is not just build great silicon but also be tightly integrated all the way through the stack on the software stack, to implement and optimize these new networks being invented every day.”
So for Buck, one of the reasons why AI represents a new kind of computing is because he believes it really does constitute a new type of relationship between hardware and software. “We don’t need to think of backwards compatibility, we’re reinventing the kinds of processors good at these kinds of tasks and doing it in conjunction with the software to run on them.”
The future of this horserace
While there is a laundry list of potential AI chip developers today, one of the biggest questions surrounding all of these initiatives is how many will come to market versus how many will be kept for the vendor versus how many will be scrapped entirely. Most AI chips today are still vapor.
When it comes to the many non-CPU makers designing AI chips, like Google, Facebook, and Microsoft, it seems like those companies are making custom silicon for their own use and will likely never bring them to market. Such entities have the billions in revenue that can be plowed into R&D of custom chips without the need for immediate and obvious return on investment. So users may rely on Google’s Tensor Processing Unit as part of its Google Cloud service, but the company won’t sell it directly. That is a likely outcome for Facebook and Microsoft’s efforts as well.
Other chips are definitely coming to market. Nvidia recently announced three new AI-oriented chips: the Jetson Xavier system-on-chip designed for smarter robots; Drive Pegasus, which is designed for deep learning in autonomous taxis; and Drive Xavier for semi-autonomous cars. Powering all of that is Isaac Sim, a simulation environment that developers can use to train robots and perform tests with Jetson Xavier.
Further Reading
Meanwhile, Intel has promised that its first ML processor based on the Nervana technology it bought in 2016 will reach market in 2019 under the code name of Spring Crest. The company also currently has a Nervana chip for developers to get their feet wet with AI, called Lake Crest. Intel says Spring Crest will eventually offer three to four times the performance of Lake Crest.
Can all those survive? “I think in the future, we’re going to see an evolution of where AI manifests itself,” says Movidius’ Brown. “If you want it in a data center, you need a data center chip. If you want a headset, you find a chip for it. How this will evolve is we may see where different chips have different strengths, and those will possibly get merged into CPUs. What we may also see are chips coming out with multiple features.”
If all that feels a bit like deja vu, maybe it is. The progression of the AI chip could in some ways match how chips of the past evolved—things started with high specialization and many competitors, but eventually some offerings gained traction and a few market leaders encompassed multiple features. Thirty years ago, the 80386 was the premier desktop chip and if you were doing heavy calculations in Lotus 1-2-3, you bought an 80387 math co-processor for your IBM PC-AT. Then came the 80486, and Intel made all kinds of noises about the math co-processor being integrated into the CPU. The CPU then slowly gained things like security extensions, a memory controller, and GPU.
So like every other technology, this emerging AI chip industry likely won’t sustain its current plethora of competitors. For instance, OTAS’ Doris notes many internal-use chips that don’t come to market become pet projects for senior technologists, and a change of regime often means adopting the industry standard instead. Intersect360’s Snell points out that today’s army of AI chip startups will also diminish—“There’s so many competitors right now it has to consolidate,” as he puts it. Many of those companies will simply hope to carve out a niche that might entice a big player to acquire them.
“There will be a tough footrace, I agree,” IBM’s McCredie says. “There has to be a narrowing down.” One day, that may mean this new chip field looks a lot like those old chip fields—the x86, Nvidia GPU, ARM-worlds. But for now, this AI chip race has just gotten off the starting line, and its many entrants intend to keep running.
Andy Patrizio is a freelance technology journalist based in Orange County, California, not entirely by choice. He prefers building PCs to buying them, has played too many hours of Where’s My Water on his iPhone, and collects old coins when he has some to spare.

From <https://arstechnica.com/gadgets/2018/07/the-ai-revolution-has-spawned-a-new-chips-arms-race/?amp=1>




IoT Was Interesting, But Follow the Money to AI Chips
By Kurt Shuler, 02.20.2019
By 2025, a full five sixths of the growth in semiconductors is going to be the result of AI.
A few years ago there was a lot of buzz about IoT, and indeed it continues to serve a role, but looking out to 2025 the real dollar growth for the semiconductor industry is in algorithm-specific ASICs, ASSPs, SoCs, and accelerators for Artificial Intelligence (AI), from the data center to the edge.
In fact, the upcoming change in focus will be so radical that by the 2025 timeframe, a full five sixths of the growth in semiconductors is going to be the result of AI.



Figure 1: By 2025, a full five sixths of the growth in semiconductors will be geared towards enabling AI/deep learning algorithms. (Image source: Tractica)
Anyone tracking the industry closely knows how we got to this point. Designers were implementing IoT before it even became a “thing.” Deploying sensors and communicating on a machine-to-machine level to perform data analysis and implement functions based on structural or ambient environment and other parameters just seemed like a smart thing to do. The Internet just helped to do it remotely. Then someone latched onto the term “the Internet of things” and suddenly everyone’s an IoT silicon, software, or systems player.
From the IC suppliers’ perspective, simply pulling already available silicon blocks together to form a sensing signal chain, processor, memory, and an RF interface was enough to make them a “leading provider of IoT solutions.”
While the hype was destined to fade, there remains a good deal of innovation around low-cost, low-power data acquisition, with the ensuing low margins. There may be higher margins at the software and system level for deployers of IoT networks, but not for semiconductor manufacturers. But that’s about to change, as the focus shifts from generating data to analyzing data using the explosion of deep-learning algorithms that are enabling what we now call artificial intelligence, or AI.
This shift in focus from generating data to making practical use of it through analysis and the application of AI algorithms has stretched the limits of classic processor architectures such as CPUs, GPUs and FPGAs. While all have been useful in their own distinct ways, the need for faster neural network training, greater inference efficiency, and more analysis at the edge for lower latencies has pushed silicon providers and OEMs to change their modus operandi. Now architectures comprising the optimum mix of processing elements to run specific AI algorithms are necessary, make that demanded, for applications such as autonomous vehicles, financial markets, weather forecasts, agriculture, and someday smart cities.
The applications have given rise to many AI function market segments, which can be roughly divided into data center training and inference, and edge training and inference.


Figure 2: Efficient, fast, and powerful inference engines will be required at both the data center, as well as at the edge, where localized processing can reduce latencies. (Image source: Arteris)
However, the bad news for many is that, like IoT, there will be a shakeout, and many won’t make it in applications like autonomous vehicles. The good news is that they’ll be able to take their learnings and apply them somewhere else, like tracking passers-by at street windows for marketing campaigns.
Those who last will have made the best use of heterogeneous processing elements, memory, I/O and on-chip interconnect architectures to achieve the gains in efficiency and performance required for the next generation of AI solutions.
Until that shakeout happens, both OEMs and dedicated chip houses will be spending a lot of cash and IP capital on developing SoCs, ASICs/ASSPs, and accelerators that will best implement the most advanced algorithms at the data center and at the edge.



Figure 3: The total dollars spent on inference (2x) and training (4x to 5x) at the data center will grow sharply between now and 2025, reaching up to $10 billion and $5 billion, respectively. However, the rate of growth in dollars spent on inference at the edge is >40x, reaching $4 billion by 2025. (Image source: McKinsey & Company)
The smart silicon providers have already moved off the old “28 nm sweet spot” where there was a temporary “time out” to develop silicon to make the most of IoT principles. That emphasis on the sweet spot may have been more about a lack of vision as to where things were really heading. Now we know what’s coming: are you ready?
Kurt Shuler is vice president of marketing at Arteris IP and has extensive IP, semiconductor, and software marketing experience in the mobile, consumer, and enterprise segments working for Intel and Texas Instruments. He is a member of the U.S. Technical Advisory Group (TAG) to the ISO 26262/TC22/SC3/WG16 working group, thereby helping create safety standards for semiconductors and semiconductor IP.

From <https://www.eetimes.com/iot-was-interesting-but-follow-the-money-to-ai-chips/#>




ATTACKING THE DATACENTER FROM THE EDGE INWARD

For much of the decade, a debate around Arm was whether the silicon designer would fulfill its promise and its licensees would become suppliers of any significance to datacenter hardware. The company initially saw an opportunity in the trend among enterprises toward buying energy-efficient servers that could run their commercial workloads without sabotaging their budgets by gobbling up huge amounts of power in the process. Arm’s low-power architecture, which dominates the mobile device market, seemed a good fit for those situations, despite the challenge of building up a software ecosystem that could support it.
And every step – forward or back – along the way was noted and scrutinized: major OEMs like Dell EMC and Hewlett Packard Enterprise rolling out systems powered by Arm-based SoCs; the rise of hyperscalers like Google, Facebook, Microsoft and Amazon, with their massive datacenters and their need to keep a lid on power consumption; the early exit of pioneer Calxeda; the backing away by AMD and Samsung; the sharp left turn by Qualcomm to exit the server chip space after coming out with its Centriq system-on-a-chip (SoC); the consolidation that saw Marvell buy Cavium; and the embrace by the HPC crowd, such as Cray and Fujitsu.
Through all this, Arm has gained a degree of traction, from major cloud providers and system makers adopting the Arm architecture to various degrees to chip makers like Marvell (now with Cavium) and Ampere – led by a group of ex-Intel executives, including CEO Renee James – putting together products to go into the systems.
While all this was going on, the industry saw the rise of edge computing, driven by the ongoing decentralization of IT that has been fueled by not only the cloud but the proliferation of mobile devices, the Internet of Things (IoT), big data, analytics and automation, and other trends like artificial intelligence (AI) and machine learning. There is a drive to put as much compute, storage, virtualization and analytics capabilities as close as possible to the devices that are generating massive amounts of data and to gain crucial insights into that data as close to real time as possible.
Arm over the past couple of years has put a sharp focus on the edge, IoT, 5G and other emerging trends, a concentration that was evident at last month’s TechCon show. There was more discussion of the company’s Pelion IoT platform and Neoverse – an edge and hyperscale infrastructure platform that includes everything from silicon to reference designs.


The chip designer talked about expanding its Platform Security Architecture (PSA) that Arm partners and third parties can leverage to build more security into their IoT devices out to the infrastructure edge, part of a larger effort called Project Cassini. Launched in partnership with ecosystem partners, Arm is looking to leverage its strong presence in endpoints to drive the evolution of infrastructure and cloud-native software at the edge through Arm technologies and the development of platform standards and reference systems.


It’s part of Arm’s effort to take a leadership role in how the edge develops, a delicate balancing act that includes other technology vendors and essentially sets the direction while enabling broad participation in how things move in that direction, according to Drew Henry, the company’s one-time head of the infrastructure business and now senior vice president of IPG and operations. It’s a different role than Arm has taken in the past in the datacenter and uncommon in the industry as a whole, Henry tells The Next Platform.
“What we’re doing is carefully stepping with our ecosystem a little in front of it, saying, ‘Hey, this is the view we have. Let’s all go along this together,’” he says. “You see this beginning to show up. There’s this industry consortium – that’s the Autonomous Vehicle Computing Consortium that we’re doing in the autonomy space. Project Cassini, which is about how to create a standard platform for edge computing that respects the diversity of silicon and some of the designs around those types of devices, going from low power to high power, small amounts of compute to large amounts of compute, all kinds of locations, from industrial IoT locations to 5G base stations, whatever. Realizing that’s a strength, that you want to enable a software ecosystem to be able to deploy [solutions], how you marry those things. We stepped in with that ecosystem and said, ‘Alright, let’s just agree on some standards on a way these platforms are going to boot, let’s agree with the way security is going to be held in it. If we do that well, then the cloud-native software companies will be able to come in and deploy software on top of it in a cloud-native stack fairly easily to do the things that people want to do.’ That’s that balance.”
That’s a contrast to what has driven computing with Intel, Henry added, “where there’s been this ecosystem, but with one incredibly dominant viewpoint for it. There’s just so much invention that has to happen over the next decade or so to accomplish these rules of autonomy and Internet of Things and stuff that it’s too much to expect that any one company is going to have all the right answers. The ecosystem needs to [drive] it.”
A DIFFERENT ANIMAL
The datacenter compute environment for Arm continues to evolve, driven not only by what the chip designer is doing with its architecture but also by the efforts of manufacturing partners. Marvell is continuing to develop the ThunderX2 SoCs that it inherited when it bought Cavium for about $5.5 billion last year, and other chip makers like Ampere are coming to market with offerings based on the X-Gene designs from Applied Micro, which the company bought. At the same time, some tech vendors are taking Arm’s architecture and creating their own chips. Fujitsu is developing the A64FX chip, which will be the foundation for its Post-K supercomputer. Amazon Web Services (AWS) turned to the Arm architecture – with expertise from its acquisition of Annapurna Labs for $350 million in 2015 – for its Graviton chips. Huawei is also making a play in the Arm chip space.
Enterprise, supercomputer and cloud datacenters are served by suppliers and companies that develop their own Arm-based chips, with Arm innovation and investment, Henry says. Arm is not so much leading an evolution but working with companies to grow the presence of its architecture in datacenters. But the edge is different and calls for Arm to take a different – and a more leadership – role.
“The spaces in compute in the large, aggregated compute areas, which are datacenter and supercomputing, I feel really good about the portfolio that is servicing those,” he says. “That’s why we’ve kind of shifted our focus now, effectively saying, ‘Alright, we’ve got a lot of work to do to continue to help with that group, but there’s also this emerging area of compute at the edge that also needs to be invested in – where if we don’t invest, collectively as an ecosystem to get it established, it is going to take longer to mature than it should.’”
The edge is a different compute environment, where the “ecosystem broadens because now you’ve got companies that have networking IP that you can combine together with silicon,” Henry says. “This is where Broadcom enters into the marketplace, and NXP enters into the marketplace and others, so we’ve got a pretty rich ecosystem of being able to provide compute wherever you need compute. A lot of people fixate on the classical server sitting in a datacenter. That’s a relatively small unit amount in the marketplace, relatively small compute that’s done across the ecosystem. We absolutely are doing great in that space, but it’s not the only focus for us. Servicing the cloud is fairly well understood. Serving compute at the infrastructure edge is more complicated, so that is where we can be much more involved in leading and coordinating the activities there. That’s what Cassini’s about.”

From <https://www.nextplatform.com/2019/11/06/attacking-the-datacenter-from-the-edge-inward/>




It Takes Liquidity To Make Infrastructure Fluid






Stranded capacity has always been the biggest waste in the datacenter, and over the years, we have added more and more clever kinds of virtualization – hardware partitions, virtual machines and their hypervisors, and containers – as well as the systems management tools that exploit them. There is a certain amount of hardware virtualization going on these days, too, with the addition of virtual storage and virtual switching to so-called SmartNICs.
The next step in this evolution is disaggregation and composability, which can be thought of in a number of different ways. The metaphor we like here at The Next Platform is smashing all of the server nodes in a cluster and then stitching all of the components back together again with software abstraction that works at the peripheral transport and memory bus levels – what is commonly called composability. You can also think of this as making the motherboard of the system extensible and malleable, busting beyond the skin of one server to make a giant pool of hardware that can allow myriad, concurrent physical hardware configurations – usually over the PCI-Express bus – to be created on the fly and reconfigured as workloads dictate. This way, CPUs, memory, flash storage, disk storage, and GPU and FPGA accelerators are not tied so tightly to the nodes they happen to be physically located within.
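To make that abstraction concrete, here is a minimal sketch, in Python, of the bookkeeping a composability layer performs: devices live in shared pools rather than inside fixed server nodes, and a logical server is just a reservation of pooled devices bound to a CPU host over the fabric. The class and method names are illustrative assumptions for this article, not any vendor’s actual API.

# Minimal sketch of disaggregation/composability bookkeeping.
# Illustrative only, not any vendor's actual API.
from dataclasses import dataclass, field

@dataclass
class Device:
    kind: str            # "gpu", "fpga", "nvme", "optane", "nic"
    ident: str
    in_use: bool = False

@dataclass
class LogicalServer:
    host: str                                   # CPU node the devices are bound to
    devices: list = field(default_factory=list)

class Fabric:
    """A shared pool of devices that can be bound to hosts on demand."""
    def __init__(self, devices):
        self.pool = list(devices)

    def compose(self, host, wants):
        """Reserve devices (e.g. {"gpu": 4, "nvme": 2}) for a host."""
        server = LogicalServer(host=host)
        for kind, count in wants.items():
            free = [d for d in self.pool if d.kind == kind and not d.in_use]
            if len(free) < count:
                raise RuntimeError(f"not enough free {kind} devices in the pool")
            for dev in free[:count]:
                dev.in_use = True
                server.devices.append(dev)
        return server

    def decompose(self, server):
        """Return a logical server's devices to the shared pool."""
        for dev in server.devices:
            dev.in_use = False
        server.devices.clear()

# Example: carve a GPU-heavy node out of the pool, then give the parts back.
fabric = Fabric([Device("gpu", f"gpu{i}") for i in range(8)] +
                [Device("nvme", f"ssd{i}") for i in range(16)])
training_node = fabric.compose("node01", {"gpu": 4, "nvme": 2})
fabric.decompose(training_node)

The point of the exercise is simply that the binding lives in software: the same physical GPUs and drives can belong to one logical server this hour and a different one the next, which is what “composed on the fly” means in practice.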
There are a lot of companies that are trying to do this. Among the big OEMs, Hewlett Packard Enterprise has its Synergy line and Dell has its PowerEdge MX line and its Kinetic strategy. Cisco Systems did an initial foray into composability with its UCS M Series machines. DriveScale has offered a level of server composability through a special network adapter that allows compute and storage to scale independently at the rack scale, across nodes, akin to similar projects under way at Intel, Dell, the Scorpio alliance of Baidu, Alibaba, and Tencent, and the Open Compute Project spearheaded by Facebook. Juniper Networks acquired HTBase to get some composability for its network gear, and Liqid dropped out of stealth in June 2017 with its own PCI-Express switch fabric to link bays of components together and make them composable into logical servers. TidalScale, which dropped out of stealth a few months later in October 2017, has created what it calls a HyperKernel to glom together multiple servers into one giant system that can then be carved up into logical servers with composable components; rather than use VMs to break this hyperserver down, LXC or Docker containers are used to create software isolation. GigaIO has been coming on strong in the past year with its own PCI-Express switches and FabreX fabric.
There are going to be lots of different ways to skin this composability cat, and it is not clear which way is going to dominate. But our guess is that the software approaches from DriveScale, Liqid, and TidalScale are going to prevail compared to the proprietary approaches that Cisco, Dell, and HPE have tried to use with their respective malleable iron. Being the innovator, as HPE was here, may not be enough to win the market, and we would not be surprised to see HPE snap up one of these other companies and then Dell to snap up whichever one HPE doesn’t acquire. Then again, the Synergy line of iron at HPE was already at an annualized revenue run rate of $1.5 billion – with 3,000 customers – and growing at 78 percent in the middle of this year, so maybe HPE thinks it already has the right answer.
Liqid, for one, is not looking to be acquired and in fact has just brought in $28 million in its second round of funding, bringing the total funds raised to date to $50 million; the funding was led by Panorama Point Partners, with Iron Gate Capital and DH Capital kicking in some dough. After three years of hardware and software development, Liqid needs more cash to build up its sales and marketing teams to chase the opportunities and also needs to plow funds back into research and development to keep the Liqid Fabric OS, managed fabric switch, and Command Center management software moving ahead.


“We have a handful of large customers that make up a good chunk of our revenues right now,” Sumit Puri, co-founder and chief executive officer at Liqid, tells The Next Platform. “These are the customers we started with back in the day, and we have ramped them to the size we want all of our customers to be, and some of them are showing us projects out on the horizon that are at massive scale. We have dozens of proofs of concept under way, and some of them will be relatively small and never grow into a seven-figure customer. Some of them will.”
Puri is not about to get into specific pricing for the switches and software that turn a rack of servers with peripherals into a stack of composable, logical servers, but he says that composability adds on the order of 5 percent to 10 percent to the total cost of the infrastructure compared with traditional clusters. But that composability means every workload can be configured with the right logical server setup – the right number of CPUs, GPUs, FPGAs, flash drives, and such – so that utilization can be driven up by factors of 2X to 4X on the cluster compared to the industry average. Datacenter utilization, says Puri, averages something on the order of 12 percent worldwide (including compute and storage), and as best as Liqid can figure, Google, which is the best at this in the industry, averages 30 percent utilization in its datacenters. The Liqid stack can drive utilization as high as 90 percent, according to Puri. That’s mainframe-class right there, and about as good as it gets.
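Taking Puri’s figures at face value, the arithmetic behind the pitch is easy to check. The sketch below uses an assumed round number for cluster cost (the dollar figure and the 10 percent adder are assumptions; the utilization numbers are the ones quoted above) and compares cost per utilized unit of capacity at 12 percent utilization against a composable cluster running 2X to 4X hotter.

# Back-of-the-envelope check of the composability pitch. The $1M cluster
# cost is an assumed round number; the utilization figures are those quoted above.
BASE_COST = 1_000_000          # assumed baseline cluster cost, dollars
COMPOSABILITY_ADDER = 0.10     # upper end of the quoted 5-10 percent premium
BASE_UTIL = 0.12               # quoted industry-average utilization

def cost_per_utilized_unit(cost, utilization):
    # Spend divided by the fraction of capacity actually doing work.
    return cost / utilization

baseline = cost_per_utilized_unit(BASE_COST, BASE_UTIL)
for gain in (2, 3, 4):                                  # the quoted 2X-4X range
    util = min(BASE_UTIL * gain, 0.90)                  # cap at the quoted 90 percent
    composed = cost_per_utilized_unit(BASE_COST * (1 + COMPOSABILITY_ADDER), util)
    print(f"{gain}X utilization: {baseline / composed:.1f}x lower cost per utilized unit")

Even with the full 10 percent adder, a 2X utilization gain works out to roughly 1.8X better cost per utilized unit, and a 4X gain to about 3.6X, which is the whole economic argument for paying the composability premium.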
The prospect pipeline is on the order of thousands of customers, and that is why funding is necessary. It takes people to attack that opportunity, and even if HPE has been talking about composability for the past five years, it is not yet a mainstream approach for systems.




As with most distributed systems, there is a tension between making one large pool of infrastructure and making multiple isolated pools to limit the blast area in the event that something goes wrong in the infrastructure. The typical large enterprise might have pods of compute, networking, and storage that range in size from a half rack or a full rack up to two or even three racks, but rarely larger or smaller than that. They tend to deploy groups of applications on pods and to upgrade the infrastructure by the pod, which makes expanding the infrastructure easier and more cost effective than doing it a few servers at a time.
In a deal that Liqid is closing right now, the customer wants a single 800-node cluster, but only wants 200 of the nodes hanging off the Liqid PCI-Express fabric because it does not want to pay the “composability tax,” as Puri put it, on all of those systems. Over time, the company may expand the Liqid fabric into the remaining 600 servers, but it is far more likely that the new servers added in the coming years will have it, and after a three or four year stint the old machines that did not have composability will simply be removed from the cluster.
There are a number of different scenarios where composability is taking off, according to Liqid. The important thing to note is that the basic assumption is that components are aggregated into their own enclosures, and then the PCI-Express fabric in the Liqid switch can reaggregate them as needed, tying specific processors in servers to specific flash or Optane storage, network adapters, or GPUs within those enclosures. You can never attach more devices to a given server than it allows, of course, so don’t think that with the Liqid switch you can suddenly hang 128 GPUs off of one CPU. You can’t do more than the BIOS says. But you can attach up to that limit, and anything less, as needed.
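The BIOS point is worth a concrete illustration. In the hedged sketch below, a composition request is checked against per-host attach limits before any devices are bound; the host names and limits are made-up numbers standing in for whatever a given server’s firmware and PCI-Express topology actually expose, not anything published by Liqid.

# Hypothetical per-host attach limits, standing in for whatever the BIOS and
# PCIe topology of each server actually allow. Numbers are illustrative only.
HOST_LIMITS = {
    "node01": {"gpu": 16, "nvme": 24, "nic": 4},
    "node02": {"gpu": 8,  "nvme": 32, "nic": 2},
}

def check_composition(host, wants):
    """Refuse a composition request that exceeds what the host can enumerate."""
    limits = HOST_LIMITS[host]
    for kind, count in wants.items():
        allowed = limits.get(kind, 0)
        if count > allowed:
            raise ValueError(f"{host} can enumerate at most {allowed} {kind} "
                             f"devices; {count} requested")
    return True

check_composition("node01", {"gpu": 16, "nvme": 8})   # fits within the limits
# check_composition("node01", {"gpu": 128})           # would raise: beyond the BIOS limit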


The Liqid fabric is not restricted to PCI-Express; it can also be extended with Ethernet and InfiniBand attachment for those cases where distance and horizontal scale are more important than the low latency that PCI-Express switching affords. Liqid’s stack does require disaggregation at the physical level, meaning that the peripherals are ganged up into their respective enclosures and then linked together using the PCI-Express fabric, or using NVM-Express over Ethernet or perhaps GPUDirect over RDMA networks to link flash and GPUs to compute elements.
Next week at the SC19 supercomputing conference in Denver, Liqid will be showing off the next phase of its product development, in which the hardware does not have to be pooled at the physical layer before it can be composed. Instead, standard servers using a mix of CPUs, GPUs, and FPGAs for compute and flash and Optane for storage will be able to have their resources disaggregated, pooled, and composed using only the Liqid software, which sorts them into pools and then ladles them out to workloads. The performance you get will, of course, be limited by the network interface used to reaggregate the components – Ethernet will be slower than InfiniBand, which will be slower than PCI-Express – and for many applications the only real impact will be the load time for the applications and the data. Any application that requires a lot of back-and-forth chatter between compute and storage elements will want to be on PCI-Express. But this new capability will allow Liqid to go into so-called “brownfield” server environments and bring composability to them.
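As a rough illustration of that trade-off, the sketch below ranks the transports the way the article describes them and picks one based on how chatty a workload is between its compute and storage elements. The latency ordering is the article’s; the thresholds and function name are invented for the example.

# Transport options ordered fastest to slowest, per the description above.
# The chattiness thresholds below are invented for illustration.
def pick_fabric(round_trips_per_sec, spans_racks):
    """Choose a reaggregation transport for a composed workload.

    round_trips_per_sec: rough rate of compute<->storage back-and-forth.
    spans_racks: True if the composed devices sit beyond a single rack.
    """
    if round_trips_per_sec > 10_000 and not spans_racks:
        return "pcie"            # chatty and local: stay on the PCI-Express fabric
    if spans_racks:
        return "ethernet" if round_trips_per_sec < 100 else "infiniband"
    return "infiniband"

print(pick_fabric(round_trips_per_sec=50_000, spans_racks=False))   # pcie
print(pick_fabric(round_trips_per_sec=10, spans_racks=True))        # ethernet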
So where is composability taking off? The first big area of success for Liqid was, not surprisingly, GPU-centric workloads, where the GPUs traditionally get locked away inside of a server node and sit unused most of the time. Disaggregation and composability allow them to be kept busy doing work, and the hardware configuration can change rapidly as needed. If you put a virtualization or container layer on top of the reaggregated hardware, then you can move workloads around and change hardware as necessary. This is, in fact, what companies are now interested in doing, with either a VMware virtualization or Kubernetes container environment on top of the Liqid hardware. Composable bare metal clouds are also on the rise.


Liqid has also partnered recently with ScaleMP so it can offer virtual NUMA servers over composable infrastructure and therefore better compete with TidalScale, which put this capability at the heart of its eponymous composable architecture.
There is also talk about using Liqid on 5G and edge infrastructure – but everybody is trying to get a piece of that action.

From <https://www.nextplatform.com/2019/11/14/it-takes-liquidity-to-make-infrastructure-fluid/>


Nvidia Arms Up Server OEMs And ODMs For Hybrid Compute


If there is one thing that AMD’s return to the CPU market and its more aggressive moves in the GPU compute arena, along with Intel’s plan to create a line of discrete Xe GPUs that can be used as companions to its Xeon processors, have done, it is to push Nvidia and Arm closer together.
Arm is the chip development arm that was spun out of British workstation maker Acorn Computer in 1990; Acorn created its own Acorn RISC Machine processor, which, significantly for client computing, was chosen by Apple for its Newton handheld computer project. Over the years, Arm has licensed its eponymous RISC architecture to others and also collected royalties on the devices they make, in exchange for doing a lot of the grunt work in chip design as well as ensuring software compatibility and instruction set purity across its licensees.
This business, among other factors, is how and why Arm has become the largest semiconductor IP peddler in the world, with $1.61 billion in sales in 2018. Arm is everywhere in mobile computing, and this is why Japanese conglomerate SoftBank paid $32 billion for the chip designer three years ago. With anywhere from hundreds of billions to a trillion devices plugged into the Internet at some point in the coming decade, depending on who you ask, and a very large portion of them expected to use the Arm architecture, it seemed like a pretty safe bet that Arm was going to make a lot of money.
Getting Arm’s architecture into servers has been more problematic, and the reasons for this are myriad; we are not going to get into the whole litany here. One issue is the very way that Arm licenses its architecture and makes its money, which works well but which has relied on other chip makers, with much shallower pockets and without the same muscle as Arm, much less AMD or Intel, to extend it for server platforms with features like threading or memory controllers or peripheral controllers. The software stack took too long to mature, although we are there now with Linux and probably with Windows Server (only Microsoft knows for sure on that last bit). And despite it all, the Arm collective has shown how hard it is to sustain the effort to create a new server chip architecture, with a multiple generation roadmap, that takes on Intel’s hegemony in the datacenter – which is doubly difficult with an ascending AMD that has actually gotten its X86 products and roadmap together with the Epyc family that launched in 2017 with the “Naples” processors and that has been substantially improved with the “Rome” chips this year.
All of this is background against what is the real news. And that is that Nvidia, which definitely has a stake in helping Arm server chips be full-functioning peers to X86 and Power processors, is doing something about it. Specifically, the company is making a few important Arm-related announcements at the SC19 supercomputing conference in Denver this week.
The first thing is that Nvidia is making good on its promise earlier this summer to make Arm a peer with X86 and Power with regard to the entire Nvidia software stack, including the full breadth of the CUDA programming environment with its software development kit and its libraries for accelerating HPC and AI applications. Ian Buck, vice president and general manager of accelerated computing at Nvidia, tells The Next Platform that most of the libraries for HPC and AI are actually available in the first beta of the Arm distribution of CUDA – there are still a few that need some work.
As we pointed out last summer, this CUDA-X stack, as it is now called, may have started out as a bunch of accelerated math libraries, but now it comprises tens of millions of lines of code and is on the same order of magnitude, in that regard, as a basic operating system. So moving that stack and testing all the possible different features in the host is not trivial.
Last month, ahead of the CUDA on Arm launch here at SC19, Nvidia gave it out to a number of key HPC centers that are at the forefront of Arm in HPC, notably RIKEN in Japan, Oak Ridge National Laboratory in the United States, and the University of Bristol in the United Kingdom. They have been working on porting some of their codes to run in accelerated mode on Arm-based systems using the CUDA stack. In fact, of the 630 applications that have been accelerated already using X86 or Power systems as hosts, Buck says that 30 have already been ported to Arm hosts, which is not bad at all considering that it was pre-beta software that the labs were using. This includes GROMACS, LAMMPS, MILC, NAMD, Quantum Espresso, and Relion, just to name a few, and the testing of the Arm ports was done in conjunction not only with key hardware partners that have Arm processors – Marvell, Fujitsu, and Ampere are the ones that matter, with maybe HiSilicon in China, though it was not mentioned – but also with companies that make Arm servers – such as Cray, Hewlett Packard Enterprise, and Fujitsu – and with those that make Linux-on-Arm distributions, with Red Hat, SUSE Linux, and Canonical being the important ones.
“Our experience is that for most of these applications, it is just a matter of doing a recompile of the code on the new host and it runs,” explains Buck. This stands to reason, since a lot of the code in a hybrid CPU-GPU system has, by definition, been ported to actually run on the Tesla GPU accelerators in the box. “And as long as they are not using some sort of bespoke library that only exists in the ecosystem out of the control of the X86 platform, it has been working fine. And the performance has been good. We haven’t released performance numbers, but it is comparable to what we’ve seen on Intel Xeon platforms. And that makes sense, since so many of these applications get the bulk of their performance from the GPUs anyway, and the ThunderX2, which most of these centers have, is performing well because its memory system is good and its PCI-Express connectivity is good.”
Although Nvidia did not say this, at some point this CUDA-X stack on Arm will probably be made available on those Cray CS500 systems that some of the same HPC centers mentioned above are getting, equipped with the A64FX Arm processor that Fujitsu has designed for RIKEN’s “Fugaku” exascale system. Cray, of course, announced that partnership with Fujitsu and RIKEN, Oak Ridge, and Bristol ahead of SC19, and said that it was not planning to make the integrated Tofu D interconnect available in the CS500 clusters with the A64FX iron. And that means that the single PCI-Express 4.0 slot on the A64FX processor is going to be in contention, or someone is going to have to create a Tofu D to InfiniBand or Ethernet bridge to accelerate this server chip. A Tofu D to NVLink bridge would be even better. . . . But perhaps this is just a perfect use case for PCI-Express switching, with disaggregation of accelerators and network interfaces and dynamic composition with a fabric layer, such as what GigaIO is doing.
That’s not Nvidia’s concern today, though. What Nvidia does want to do is make it easier for any Arm processor plugged into any server design to plug into a complex of GPU accelerators, and this is being accomplished with a new reference design dubbed EBAC – short for Everything But A CPU – that Nvidia is making available.

The EBAC design has a modified GPU tray from the hyperscale HGX system design, which includes eight “Volta” Tesla V100 accelerators, each with 32 GB of HBM2 memory. The GPUs are cross-connected by NVLink so they can share data and memory atomics across those links, and the tray of GPUs also has what amounts to an I/O mezzanine card on the front, with four ConnectX-5 network interface cards running at 100 Gb/sec from Mellanox Technologies (which Nvidia is in the process of buying) and four PCI-Express Mini SAS HD connectors that can lash any Arm server to this I/O and GPU compute complex. In the reference design, it looks like a quad of two-socket “Mustang” ThunderX2 system boards, in a pair of 1U rack servers, would be ganged up with the Tesla Volta accelerators. Presumably there is a PCI-Express switch complex within the EBAC system to link all of this together, even if it is not, strictly speaking, composable.
There is probably not a reason it could not be made composable, or extended to support an A64FX complex. We shall see. If anyone needs to build composability into its systems, now that we think about it, it is Nvidia.

From <https://www.nextplatform.com/2019/11/18/nvidia-arms-up-server-oems-and-odms-for-hybrid-compute/>




