Processing Moves To The Edge
Definitions vary by
market and by vendor, but an explosion of data requires more processing to be
done locally.
April
12th, 2018 - By: Kevin Fogarty
Edge computing is
evolving from a relatively obscure concept into an increasingly complex
component of a distributed computing architecture, in which processing is being
shifted toward end devices and satellite data facilities and away from the
cloud.
Edge computing has
gained attention in two main areas. One is the industrial
IoT, where it serves as a do-it-yourself infrastructure for on-site data
centers. The second involves autonomous vehicles, where there is simply not
enough time to ask the cloud for solutions.
But ask two people
to describe it and you are likely to get two very different answers. On one
hand, it is understood well enough that it can be used in satellite IIoT data
centers and in machine learning-enabled iPhones. On the other, most of those
designing it can’t say what it looks like.
Much of the
confusion stems from the fact that edge computing is not a technology. It’s
more of a technological coping mechanism. It represents a series of efforts to
deal with the exponential growth of data from billions of endpoint devices by
digesting at least some of that data wherever it is created. That requires
building massive compute performance into everything from sensors to smart
phones, and all of this has to happen within an even tighter power budget.
“We are moving to an
intelligent edge,” said Lip-Bu Tan, president and CEO of Cadence, in a recent speech. “This is going to be a new era for semiconductors. We want
data at our fingertips to be able to make decisions on the fly.”
This approach stands
in stark contrast to the general consensus of several years ago, which held that simple sensors would collect data from the physical world and send it to the cloud for processing. That concept failed to take into account that the amount of data
being collected by sensors is growing too large to move around quickly. The
best solution is to pre-process that data, because most of it is useless.
“The IoT represents
an exponential increase in the number of devices in the world, and the amount
of data generated by these devices could swamp the data center’s ability to
process it,” according to Steven Woo, distinguished inventor and vice president
of enterprise solutions technology at Rambus. “It’s
likely you can do aggregation, filtering and some rudimentary processing,
depending on how complex your computations are.”
This is the growing
responsibility of edge devices. But how the edge evolves, and how quickly,
depends upon the readiness of end markets that will drive it. So while the edge
began taking off last year in the IIoT, it is still on hold in the automotive space
because it’s not clear at this point how quickly fully autonomous vehicles will
begin to ramp up.
“If there isn’t an
immediate production target, you might get away with something that’s a lot
less advanced,” said Ty Garibay, CTO at ArterisIP. “You
might be able to aggregate this kind of functionality into multiple smaller
chips made by different companies. There will be LiDAR, radar, and possibly a
sensor fusion hub, which may be an FPGA. And
then you might need enough compute power for the car controller, which also may
have to figure out which data to process and what to send back to the cloud.
The question now is how you make it smart enough to send back the right data.”
What is the edge?
Many chipmakers and
systems companies struggle with the variety of ways it is possible to shift
computing to the edge. There are no demarcation lines between the many levels
that may or may not be included in this distributed computing model.
“There is a lot of
difference of opinion on the point of what the edge looks like,” according to
Jeff Miller, product marketing manager at Mentor, a Siemens
Business. “The cloud is where the really high-powered machine learning or
computational resources will continue to be, but bandwidth to get it there is
expensive and shared spectrum is a finite resource. So just streaming all that
data to the cloud from thousands of devices without some pre-processing at the
edge is not practical.”
It doesn’t help that
the language and explanations vary among carriers, networking
providers, integrators, datacenter OEMs and cloud providers—all of which are
competing for what might be billions of dollars in additional sales in a market
described by a term that doesn’t mean anything specific enough to package under
a single brand name, according to Tom Hackenberg, principal analyst for
embedded systems at IHS Markit.
“Edge computing” is
a common but non-specific term referring to the addition of computing resources
anywhere close to the endpoint of an IT infrastructure. The definition has been
narrowed colloquially to mean compute resources installed specifically to support
IoT installations. “It’s a set of architectural strategies, not a product, not
a technology,” Hackenberg said.
Even limiting the
definition of edge to its function as the compute power for IoT installations
doesn’t focus the picture much, according to Shane Rau, research vice president
for computing semiconductors at IDC. “There is no one IoT. There are thousands,
each in a different industry with a different level of acceptance and
capability. It may not be possible to see what the edge looks like because it
looks like the edge of everything.”
Still, there are
benefits to getting this right. Gopal Raghavan, CEO of startup Eta Compute,
said that edge computing improves both privacy and security because it keeps
data local. And it improves response time by eliminating the round trip of sending data to the cloud and waiting for results to come back.
“You want to sense,
infer, and act without going to the cloud, but you also want the ability to
learn on the edge,” he said, noting that the cochlea in the ear already does
this today, allowing it to identify speech in a noisy environment. The same
happens with the retina in the eye, which can decipher images and movement
before the brain can process those images.
Fig. 1: Edge computing platform. Source: NTT
Why the edge is getting so much attention
One of the initial
drivers behind the edge computing model was the industrial IoT, where a desire
to see projects succeed prompted industrial organizations to try to solve both
the cost-efficiency and data-deluge problems on their own.
“In the industrial
space there is a need for factory automation and intelligence at the edge, and
the risk is comparatively smaller because it is possible to demonstrate value
in accomplishing those things,” said Anush Mohandass, vice president of marketing
and business development at NetSpeed Systems.
“The IIoT will lead the charge to build out IoT infrastructure for very
practical reasons.”
That, in turn, led
to a push to keep compute resources near the physical plants. But the benefits
go much deeper than just keeping IoT devices off the Internet, according to
Rambus’ Woo. More processing power means greater ability to pre-process data to
eliminate repetitions of the same temperature reading, for example, or render
the data feed from hundreds of sensors as a single status report.
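What that pre-processing looks like in practice is easy to sketch. The snippet below is a minimal, illustrative example, not taken from any vendor's stack, and every function and sensor name in it is hypothetical: it drops consecutive duplicate temperature readings and collapses many sensor streams into one compact status report before anything is sent upstream.

```python
from statistics import mean

def deduplicate(readings, tolerance=0.1):
    """Drop consecutive readings that repeat the previous value within a tolerance."""
    kept = []
    for value in readings:
        if not kept or abs(value - kept[-1]) > tolerance:
            kept.append(value)
    return kept

def status_report(sensor_streams):
    """Collapse many per-sensor streams into one summary dict to send upstream."""
    report = {}
    for sensor_id, readings in sensor_streams.items():
        filtered = deduplicate(readings)
        report[sensor_id] = {
            "samples_received": len(readings),
            "samples_kept": len(filtered),
            "min": min(filtered),
            "max": max(filtered),
            "mean": round(mean(filtered), 2),
        }
    return report

# Example: three sensors, mostly repeating the same temperature
streams = {
    "temp_01": [21.0, 21.0, 21.0, 21.4, 21.4, 22.0],
    "temp_02": [19.8] * 10,
    "temp_03": [23.1, 23.1, 23.5, 26.9],
}
print(status_report(streams))
```

Only the summary leaves the edge node; the raw samples never cross the network, which is the point Woo is making.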
Apple’s announcement
in 2017 that it would put machine
learning accelerators into its top-end iPhone touched off a rush that
Gartner predicts will see 80% of smartphones equipped with AI by 2022.
Those will be powerful, latency-sensitive edge devices, but will focus on
functions aimed at individual consumers – augmented reality and biometric authentication, for example – which will limit their impact in the short term,
said IDC’s Rau.
The addition of ML
capabilities into other consumer devices – and autonomous vehicles and other
smart devices – is likely to create an ecosystem on which all kinds of powerful
applications can be built, using edge data centers for support, said Mohandass.
“We saw in the
mainframe era that having a central brain and everything else being dumb didn’t
work,” he said. “There was a lot more computing power with PCs, even if they
were limited. Now, with central cloud, hyperscale datacenters have a lot more
power. Clients aren’t quite a dumb terminal, but they are not too smart. We’re
heading for another inflection point where the edge devices, the clients, have
the capacity to have a lot more intelligence. We’re not there yet, but it’s
coming.”
Until then, the
focus should be on developing ways to use that deluge of data from IoT devices
to accomplish things that wouldn’t be possible otherwise, said Mentor’s Miller.
“The core value of the IoT is in bringing together large data sets, not so much
monitoring so you know immediately when there’s a leak in tank 36 out of 1000
tanks somewhere. The value is in identifying things that are about to fail, or in activating actuators in the field before a problem actually comes up.”
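Miller's distinction between reporting a failure and anticipating one usually comes down to simple trend detection at the edge. The sketch below is a toy illustration under assumed numbers (the threshold, sampling interval and pressure values are all hypothetical), not a production algorithm.

```python
def hours_to_threshold(levels, threshold, interval_hours=1.0):
    """Estimate time until a slowly drifting reading crosses a threshold,
    using a naive average rate of change over the observed window."""
    if len(levels) < 2:
        return None
    rate = (levels[-1] - levels[0]) / ((len(levels) - 1) * interval_hours)
    if rate <= 0:
        return None  # not trending toward the threshold
    return (threshold - levels[-1]) / rate

# Tank pressure creeping upward; warn before the assumed relief threshold is hit
pressure = [4.10, 4.14, 4.19, 4.25, 4.32]
eta = hours_to_threshold(pressure, threshold=5.0)
if eta is not None and eta < 24:
    print(f"Maintenance alert: threshold expected in ~{eta:.1f} hours")
```

A real deployment would use a proper model, but even this naive rate-of-change estimate shows how an edge node can raise an alert before the limit is reached.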
Other pieces of the puzzle
Much of the edge
model is based on the way the human body processes information. A person’s hand
will recoil from a hot stove, for example, before signals reach the brain. The
brain then can explain what just happened and avoid such situations in the future.
That sounds simple
enough in concept, but from a chip design standpoint this is difficult to
achieve. “A lot of IoT devices actually present an interesting dilemma because
they don’t need a lot of memory, but what they need is a very small power
signature,” said Graham Allan, product marketing manager for memory interfaces
at Synopsys.
“That is a particular application that is not yet well served by the DRAM
industry. It remains to be seen whether or not that market will be big enough
to warrant having its own product, or whether it will continue to be served by
the two generations of older LPDDR technology and you just have to live with
what’s there.”
In some cases, there
may be a middle step, as well. In 2015, Cisco proposed the idea of Fog
computing, extending the reach of cloud-based applications to the edge
using boxes that combined routing and Linux-based application servers to
analyze sensor data using Cisco’s IOx operating system. Fog has its own open consortium and reference
architecture for what it calls a “cloud-to-Thing continuum of services,” and
NIST was interested enough to put out Fog
guidelines. (The IEEE Standards Association announced in October that it will use the OpenFog Reference Architecture as the basis for its work on fog standards under the IEEE P1934 Standards Working Group on Fog Computing and Networking Architecture Framework.)
This also is aimed
at keeping the Internet from drowning in things. Initial plans for the IoT
included building IoT control centers at or near the site of IoT installations,
with enough compute resources to store the data flowing from devices, provide
sub-second response to devices where it was needed, and boil masses of raw data
down to statistical reports that could be digested easily. These
principles were traditional best practices for embedded systems installed as
endpoints near the edge of the organization’s IT infrastructure, but the scale
and variety of functions involved turned the decision to add computing
resources at the edge into edge computing. That has evolved still further into
the “intelligent edge.”
Regardless of the
moniker, edge computing appears to be icing on the cake for technology
providers. For one thing, it won’t cannibalize public cloud spending. IDC
predicts a 23%
increase this year compared to last, and 21.9% annual growth until 2021.
And it can only help sales of the IoT, a market in which IDC predicts spending will
rise 15% in 2018 compared to 2017, to a total of $772 billion, $239 billion
of which will go to modules, sensors, infrastructure and security. IoT spending will grow about 14% per year and pass the $1 trillion mark in 2020, according
to IDC.
Gartner predicts semiconductor revenue will
rise 7.5% to $451 billion in 2018, far above the record $411 billion in 2017.
And by 2021 51% of all devices connecting to the Internet will be IoT. Their
chatter will rise from 2% of all global IP traffic during 2016 to 5% of all IP
traffic, according to Cisco Systems (Cisco
VNI Global IP Traffic Forecast).
Humans will interact
with those devices an average of 4,800 times per day in 2025, helping to drive
the volume of digital data created every year up by a factor of 10, from 16.1
zettabytes in 2016 to 163 zettabytes during 2025, according to IDC’s August,
2017 report Data Age 2025.
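For a sense of scale, that projected tenfold increase implies compound growth of roughly 29% per year, a quick back-of-the-envelope check:

```python
# IDC's figures: 16.1 ZB of data created in 2016, 163 ZB projected for 2025
start_zb, end_zb, years = 16.1, 163.0, 2025 - 2016
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")  # ~29.3% per year
```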
While reports from
IDC and IHS Markit show the cloud market continuing to grow, they have trouble
showing the increasing dominance of edge computing, which may not exist in a
formal sense. Moreover, it is difficult to define well enough for those who
design the intelligence to make it happen.
IHS Markit’s most
recent estimate is that there were about 32 billion IoT devices online during
2017; there will be 40 billion by 2020, 50 billion by 2022 and 72.5 billion by
2025. “The IoT exists because microcontrollers and other controllers came down
in price enough to make it feasible to connect a wider range of embedded devices, but we didn’t have the infrastructure to support that,” Hackenberg
said. “That is what edge computing addresses. Once a stronger infrastructure is
in place, growth in the IoT explodes.”
That’s not bad for a
concept that is still ill-defined. “Everyone gets very excited about the edge,
but no one knows what it means,” according to Stephen Mellor, CTO of the
Industrial Internet Consortium (IIC), a standards- and best-practices
consortium that is heavily supported by Industrial Internet of Things
providers. The group put out its own guide to IoT analytics and data issues
last year. “You can do some controlled analysis and processing at the edge, but
you still need the cloud for analytics on larger datasets that can help you
decide on a plan of attack that you then execute closer to the edge.”
Fig. 2: Market impact of Edge, IoT growth. Source:
Cisco Systems
Datacenters, Data Closets, Data Containers
Not surprisingly,
there is some variability in what building blocks and configurations might work
best as edge data centers. Edge data centers have to be more flexible and more
focused on immediate response than traditional glass-house data centers. They
also have to be able to combine many data streams into one usable base that can
be acted upon quickly.
From a hardware
perspective, however, the edge can be anything from a collection of servers and
storage units housed under a co-location agreement in a local cloud or data
processing facility, to a hyperconverged data center-infrastructure module
housed in a cryogenically cooled shipping container.
The scale of some
IoT installations will force some organizations to build full-scale data
centers even at the edge, or use a piece of one owned by a service provider,
according to Michael Howard, executive director of research and analysis for
carrier networks at IHS Markit. Some carriers are interested in accelerating the
conversion of the 17,000 or so telco wiring centers in almost every community in
the U.S. so they can offer richer IT services, including edge services. Central Office
Rearchitected as a Datacenter (CORD) programs have converted only a few
facilities, however, and most will see more use in the conversion to 5G than in
edge computing, Howard said.
Other options
include the smaller, more modular and more easily scalable products that make
it easier to assemble resources to fit the size and function of the devices
they support, Hackenberg said. That could mean hyper-converged datacenter
solutions like Cisco’s UCS, or pre-packaged 3kVA to 8kVA DCIM-compliant Micro
Data Centers from Schneider Electric, HPE and others. There also are VM-based
self-contained server/application “cloudlets”
described by Mahadev Satyanarayanan of Carnegie Mellon University, and the
nascent Open Edge Computing
consortium.
—Ed Sperling contributed to this story
Navigating The Foggy Edge Of Computing
It’s not just cloud
and edge anymore as a new layer of distributed computing closer to end devices
picks up steam.
April
12th, 2018 - By: Aharon Etengoff
The
National Institute of Standards and Technology (NIST) defines fog computing
as a horizontal, physical or virtual resource paradigm that resides between
smart end-devices and traditional cloud or data centers. This model supports
vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable,
layered, federated and distributed
computing, storage
and network connectivity. Put simply, fog computing extends
the cloud to be closer to the things that produce and act on Internet of
Things (IoT) data.
According to Business
Matters, moving computing and storage resources closer to the user is
critical to the success of the Internet of Everything (IoE), with new processes
decreasing response time and working more efficiently in a fog environment.
Indeed, as Chuck
Byers of the OpenFog Consortium confirms, fog computing is “rapidly gaining
momentum” as the architecture that bridges the current gap in IoT, 5G and
embedded AI systems.
As mentioned above,
5G networks are one area in which fog computing is expected to play a major
role. As
RCR Wireless reports, the convergence of 5G and fog computing is
anticipated to be an “inevitable consequence” of bringing processing tasks
closer to the edge of an enterprise’s network. For example, in certain
scenarios, 5G will require very dense antenna deployments – perhaps even less
than 20 kilometers from one another. According to Network
World, a fog computing architecture could be created among stations that
include a centralized controller. This centralized controller would manage
applications running on the 5G network, while handling connections to back-end
data centers or clouds.
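That division of labor can be pictured as a simple placement policy: a controller-like component keeps latency-critical or data-heavy work at fog nodes near the radio sites and forwards the rest to back-end clouds. The sketch below is purely illustrative; the class names and thresholds are assumptions, not part of any 5G or OpenFog specification.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    max_latency_ms: float   # tightest response time the task can tolerate
    payload_mb: float       # how much data it carries

class FogController:
    """Toy placement policy: latency-critical or bulky tasks stay in the fog layer,
    everything else is forwarded to the back-end cloud."""
    def __init__(self, cloud_latency_ms=80, uplink_mb_limit=50):
        self.cloud_latency_ms = cloud_latency_ms
        self.uplink_mb_limit = uplink_mb_limit

    def place(self, task: Task) -> str:
        if task.max_latency_ms < self.cloud_latency_ms:
            return "fog"      # a cloud round trip would miss the deadline
        if task.payload_mb > self.uplink_mb_limit:
            return "fog"      # too much data to ship upstream raw
        return "cloud"

controller = FogController()
for t in [Task("beam steering", 5, 0.1),
          Task("video analytics", 200, 500),
          Task("billing report", 60_000, 2)]:
    print(t.name, "->", controller.place(t))
```

In a real deployment the policy would also weigh backhaul cost, privacy and node load, but the shape of the decision is the same.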
Edge computing
There are a number
of important distinctions between fog and edge computing. Indeed, fog computing
works with the cloud, while edge is typically defined by the exclusion of cloud
and fog.
Moreover, as
NIST points out, fog is hierarchical, whereas the edge is often limited to a
small number of peripheral layers. In practical terms, the edge can be defined
as the network layer encompassing the smart end devices and their users. This
allows the edge to provide local computing capabilities for IoT devices.
According to Bob
O’Donnell, the founder and chief analyst of Technalysis Research LLC, connected
autonomous (or semi-autonomous) vehicles are perhaps the best example of
an advanced-edge computing element.
“Thanks to a
combination of enormous amounts of sensor data, critical local processing power
and an equally essential need to connect back to more advanced data analysis
tools in the cloud, autonomous cars are seen as the poster child of
advanced-edge computing,” he
states in a recent Recode article.
Indeed, according
to AT&T, self-driving vehicles could generate as much as 3.6 terabytes
of data per hour from the clusters of cameras and other sensors, although
certain functions (such as braking, turning and acceleration) will likely
always be managed by the computer systems in cars themselves. Nevertheless,
AT&T sees some of the secondary systems being offloaded to the cloud with
edge computing.
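AT&T's 3.6 terabytes per hour is easier to appreciate as a sustained link rate. Treating a terabyte as 10^12 bytes, a quick calculation:

```python
# 3.6 TB per hour, expressed as a sustained bit rate
bytes_per_hour = 3.6e12
bits_per_second = bytes_per_hour * 8 / 3600
print(f"{bits_per_second / 1e9:.1f} Gbps sustained")  # ~8 Gbps per vehicle
```

Roughly 8 Gbps per vehicle is why the safety-critical loops stay in the car and only selected data moves upstream.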
“We’re shrinking the
distance,” AT&T
states in a 2017 press release. “Instead of sending commands hundreds of
miles to a handful of data centers scattered around the country, we’ll send
them to the tens of thousands of central offices, macro towers and small cells
usually never farther than a few miles from our customers.”
Silicon and services: At the edge of the foggy cloud
Fog and edge
computing are impacting chip designs, strategies and roadmaps across the
semiconductor industry. As Ann Steffora
Mutschler of Semiconductor Engineering notes, an explosion in cloud
services is making chip design for the server market more challenging, diverse
and competitive.
“Unlike data center
number crunching of the past, the cloud addresses a broad range of applications
and data types,” she explains.
“So, while a server
chip architecture may work well for one application, it may not be the optimal
choice for another. And the more those tasks become segmented within a cloud
operation, the greater that distinction becomes.”
With regard to
services, the
National Institute of Standards and Technology sees fog computing as an
extension of the traditional cloud-based computing model where implementations
of the architecture can reside in multiple layers of a network’s topology. As
with the cloud, multiple
service models are implementable, including Software as a Service (SaaS),
Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
From our
perspective, edge computing offers similar opportunities, particularly with
regard to SaaS and PaaS. As an example, both
could be applied to the automotive sector, with companies deploying
sensor-based vehicle systems that proactively detect potential issues and
malfunctions. This solution, which, in its optimal configuration, would
combine silicon and services, could be sold as a hardware and software product,
or deployed as a service with subscription fees generated on a monthly or
annual basis.
In conclusion, fog
and edge computing will continue to evolve to meet the demands of a diverse
number of verticals, including the IoT, autonomous/connected vehicles,
next-generation mobile networks and data centers.
Exponentials At The Edge
By Ed Sperling, semiengineering.com
The age of portable
communication has set off a scramble for devices that can achieve almost
anything a desktop computer could handle even five years ago. But this is just
the beginning.
The big breakthrough
with mobile devices was the ability to combine voice calls, text and eventually
e-mail, providing the rudiments of a mobile office, all on a single charge of a
battery that was light enough to carry and unobtrusive enough that it didn’t
have to be strapped onto a belt. Mobile electronics have evolved far beyond
that, of course. A smartphone today can plot the best route through traffic in
real-time, download full documents for editing, record and send videos, take
high-resolution photographs, and serve as a platform for interactive
multi-player games. It even can be attached to headgear as part of a virtual
reality system.
This is just phase
one. The next phase will add intelligent screening for a growing flood of data
across more devices. Most of the data being collected is completely useless.
Some of it is useful only when combined with data from thousands or millions of
other users and mined in a cloud for patterns and anomalies. The remainder will have to be dealt
with inside a number of individual or networked edge devices, which can filter
out what needs immediate attention and what does not.
This all sounds
logical enough. If you partition data according to compute resources, then the
usefulness of that data can be maximized on a number of fronts. It can be used
to understand traffic patterns and develop new ways of capitalizing on them,
which is much of the impetus behind AI and deep learning. If this sounds
insidious, there’s really nothing new here other than the methods of acquiring the data and the ability to centralize some of the screening processes. This is why
there currently is a war being waged among IBM, Amazon, Google, Microsoft,
Facebook, Alibaba and Apple, not to mention a number of government agencies that
are building their own cloud infrastructures. It’s also likely there will be
many more private clouds built in the future, which will either democratize or
protect that data, or both.
Developments at the
edge are not just another rev of Moore’s Law, where processors double in density
every couple of years. The term being used more frequently these days is
exponentials. It’s all
about exponential improvements in power, performance, processing, throughput
and communication. The main reason why companies are looking at advanced
packaging options, including fan-out on substrate, 2.5D and 3D-ICs, as well as
pouring money into 3nm transistors that can be patterned with high-NA EUV and
directed self-assembly, is that multiple approaches will be needed and combined
to achieve these kinds of exponential gains.
The payoff from all
these efforts ultimately will be enormous, though. The entire smartphone/tablet
market has driven much of the innovation in semiconductor design for more than
a decade, and that was just one market. Collectively, all of these new applications
will dwarf the size of the mobility market. And each also will add some unique
elements that ultimately can be leveraged across market segments, driving new
technologies and approaches and even new markets.
We are just at the
beginning of this explosion, and not all markets are moving at the same pace.
But the focus on power and
performance is central to all of this, and for any of these new markets
to live up to their potential, huge gains will be required over the next
decade.
Technology is moving
from the office or the home out into the rest of the world, where interactions
are complex, unpredictable (at least so far) and continuous. All of this will
require new tooling, different architectural approaches, and a massive amount
of innovation in semiconductors to make this work. The chip market is about to
get very interesting.
The AI revolution has spawned a new chips arms race
For years, the
semiconductor world seemed to have settled into a quiet balance: Intel
vanquished virtually all of the RISC processors in the server world, save IBM’s
POWER line. Elsewhere AMD
had self-destructed, making it pretty much an x86 world. Then Nvidia mowed
down all of its many competitors in the 1990s. Suddenly only ATI, now a part of
AMD, remained, with about half of Nvidia’s market share.
On the newer mobile
front, it looked to be a similar near-monopolistic story: ARM
ruled the world. Intel tried mightily with the Atom processor, but the
company met repeated rejection before finally giving up in 2015.
Then just like that,
everything changed. AMD resurfaced as a viable x86 competitor; the advent of
field-programmable gate array (FPGA) processors for specialized tasks like Big
Data created a new niche. But really, the colossal shift in the chip world came
with the advent of artificial intelligence (AI) and machine learning (ML). With
these emerging technologies, a flood of new processors has arrived—and they are
coming from unlikely sources.
- Intel got into the market with its purchase of startup Nervana Systems in 2016. It bought a second company, Movidius, for image processing AI.
- Microsoft is preparing an AI chip for its HoloLens VR/AR headset, and there’s potential for use in other devices.
- Google has a special AI chip for neural networks called the Tensor Processing Unit, or TPU, which is available for AI apps on the Google Cloud Platform.
- Amazon is reportedly working on an AI chip for its Alexa home assistant.
- Apple is working on an AI processor called the Neural Engine that will power Siri and FaceID.
- ARM Holdings recently introduced two new processors, the ARM Machine Learning (ML) Processor and ARM Object Detection (OD) Processor. Both specialize in image recognition.
- IBM is developing a specific AI processor, and the company also licensed NVLink from Nvidia for high-speed data throughput specific to AI and ML.
- Even non-traditional tech companies like Tesla want in on this area, with CEO Elon Musk acknowledging last year that former AMD and Apple chip engineer Jim Keller would be building hardware for the car company.
That macro-view
doesn’t even begin to account for the startups. The
New York Times puts the number of AI-dedicated startup chip
companies—not software companies, silicon companies—at 45
and growing, but even that estimate may be incomplete. It’s tricky to get a
complete picture since some are in China being funded by the government and
flying under the radar.
Why the sudden
explosion in hardware after years of chip maker stasis? After all, there is
general consensus that Nvidia’s GPUs are excellent for AI and are widely used
already. Why do we need more chips now, and so many different ones at that?
The answer is a bit
complex, just like AI itself.
Follow the money (and usage and efficiency)
While x86 currently
remains a dominant chip architecture for computing, it’s too general purpose
for a highly specialized task like AI, says Addison Snell, CEO of Intersect360
Research, which covers HPC and AI issues.
“It was built to be
a general server platform. As such it has to be pretty good at everything,” he
says. “With other chips, [companies are] building something that specializes in
one app without having to worry about the rest of the infrastructure. So leave
the OS and infrastructure overhead to the x86 host and farm things out to
various co-processors and accelerators.”
The actual task of
processing AI is a very different process from standard computing or GPU
processing, hence the perceived need for specialized chips. An x86 CPU can do
AI, but it does a task in 12 steps when only three are required; a GPU in some
cases can also be overkill.
Generally,
scientific computation is done in a deterministic fashion. You want to know two
plus three equals five and calculate it to all of its decimal places—x86 and
GPU do that just fine. But the nature of AI is to say 2.5 + 3.5 is observed to
be six almost all of the time without actually running the calculation. What
matters with artificial intelligence today is the pattern
found in the data, not the deterministic calculation.
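The "observed to be six" idea can be made concrete with a toy model that never executes an addition: it fits a pattern to example pairs and their observed sums, then predicts. This is a minimal numpy sketch for illustration only, not how any production AI chip or framework works.

```python
import numpy as np

# Training examples: pairs of numbers and their observed sums (no "+" rule given)
X = np.array([[1, 2], [3, 4], [2.5, 3.5], [10, 7], [0, 9]], dtype=float)
y = np.array([3, 7, 6, 17, 9], dtype=float)

# Fit a linear pattern y ~ w1*a + w2*b from the observations alone
w, *_ = np.linalg.lstsq(X, y, rcond=None)

a, b = 2.5, 3.5
prediction = np.array([a, b]) @ w
print(f"Learned pattern predicts {a} + {b} ~= {prediction:.2f}")
```

The fitted weights come out close to (1, 1), so the model predicts the sum from the pattern in the data rather than calculating it, which is the kind of workload these chips target.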
In simpler terms,
what defines AI and machine learning is that they draw upon and improve from
past experience. The famous AlphaGo
simulates tons of Go matches to improve. Another example you use every day is Facebook’s
facial recognition AI, trained for years so it can accurately tag your
photos (it should come as no surprise that Facebook has also made three major
facial recognition acquisitions in recent years: Face.com [2012], Masquerade
[2016], and Faciometrics [2016]).
Once a lesson is
learned with AI, it does not necessarily always have to be relearned. That is
the hallmark of Machine Learning, a subset of the greater definition of AI. At
its core, ML is the practice of using algorithms to parse data, learn from it,
and then make a determination or prediction based on that data. It’s a
mechanism for pattern recognition—machine learning software remembers that two
plus three equals five so the overall AI system can use that information, for
instance. You can get into splitting hairs over whether that recognition is AI
or not.
AI for self-driving
cars, for another example, doesn’t use deterministic physics to determine the
path of other things in its environment. It’s merely using previous experience
to say this other car is here traveling this way, and all other times I observed
such a vehicle, it traveled this way. Therefore, the system expects a certain
type of action.
The result of this
predictive problem solving is that AI calculations can be done with single
precision calculations. So while CPUs and GPUs can both do it very well, they
are in fact overkill for the task. A single-precision chip can do the work and
do it in a much smaller, lower power footprint.
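The footprint argument is easy to quantify. Below is a small numpy sketch comparing the same arbitrary weight matrix stored at double, single and half precision (the 4096 x 4096 size is an assumption chosen only to make the numbers visible; lower-precision formats are commonly used for inference).

```python
import numpy as np

weights = np.random.randn(4096, 4096)           # float64 by default
for dtype in (np.float64, np.float32, np.float16):
    w = weights.astype(dtype)
    print(f"{w.dtype}: {w.nbytes / 1e6:.0f} MB")
# float64: ~134 MB, float32: ~67 MB, float16: ~34 MB
```

Each step down in precision halves the memory that has to be stored and moved, and data movement is where much of the power goes.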
Make no mistake,
power and scope are a big deal when it comes to chips—perhaps especially for
AI, since one size does not fit all in this area. Within AI is machine
learning, and within that is deep learning, and all those can be deployed for
different tasks through different setups. “Not every AI chip is equal,” says
Gary Brown, director of marketing at Movidius, an Intel company. Movidius made
a custom chip just for deep learning processes because the steps involved are
highly restricted on a CPU. “Each chip can handle different intelligence at
different times. Our chip is visual intelligence, where algorithms are using
camera input to derive meaning from what’s being seen. That’s our focus.”
Brown says there is
even a need to differentiate at the network edge as well as in
the data center—companies in this space are simply finding they need to use
different chips in these different locations.
“Chips on the edge
won’t compete with chips for the data center,” he says. “Data center chips like
Xeon have to have high performance capabilities for that kind of AI, which is
different for AI in smartphones. There you have to get down below one watt. So
the question is, ‘Where is [the native processor] not good enough so you need
an accessory chip?’”
After all, power is
an issue if you want AI on your smartphone or augmented reality headset.
Nvidia’s Volta processors are beasts at AI processing but draw up to 300 watts.
You aren’t going to shoehorn one of those in a smartphone.
Sean Stetson,
director of technology advancement at Seegrid,
a maker of self-driving industrial vehicles like forklifts, also feels AI and
ML have been ill served by general processors thus far. “In order to make any
algorithm work, whether it’s machine learning or image processing or graphics
processing, they all have very specific workflows,” he says. “If you do not
have a compute core set up specific to those patterns, you do a lot of wasteful
data loads and transfers. It’s when you are moving data around that you are most inefficient; that’s where you incur a lot of signaling and transient power. The efficiency of a processor is measured in energy used per
instruction.”
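Stetson's metric, energy per instruction, is dominated by data movement. The figures below are rough order-of-magnitude per-operation energy estimates of the kind often cited for older process nodes; treat them, and the operation counts, as assumptions for illustration rather than Seegrid's data.

```python
# Rough, commonly cited order-of-magnitude energy costs in picojoules per
# 32-bit operation. These are illustrative assumptions, not measured values.
ENERGY_PJ = {
    "fp32 multiply-add": 4,
    "on-chip SRAM read": 5,
    "off-chip DRAM read": 640,
}

def inference_energy(ops, dram_reads, sram_reads):
    """Toy energy model: compute cost vs. data-movement cost for one inference."""
    compute = ops * ENERGY_PJ["fp32 multiply-add"]
    movement = (dram_reads * ENERGY_PJ["off-chip DRAM read"]
                + sram_reads * ENERGY_PJ["on-chip SRAM read"])
    return compute, movement

compute_pj, movement_pj = inference_energy(ops=1_000_000,
                                           dram_reads=50_000,
                                           sram_reads=900_000)
print(f"compute: {compute_pj/1e6:.1f} uJ, data movement: {movement_pj/1e6:.1f} uJ")
```

Even with these rough numbers, moving the operands costs far more than computing on them, which is why keeping data close to a purpose-built core pays off.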
A desire for more
specialization and increased energy efficiency isn’t the whole reason these
newer AI chips exist, of course. Brad McCredie, an IBM fellow and vice
president of IBM Power systems development, adds one more obvious incentive for
everyone seemingly jumping on the bandwagon: the prize is so big. “The IT industry is seeing growth for the
first time in decades, and we’re seeing an inflection in exponential growth,”
he says. “That whole inflection is new money expected to come to IT industry,
and it’s all around AI. That is what has caused the flood of VC into that
space. People see a gold rush; there’s no doubt.”
A whole new ecosystem
AI-focused chips are
not being designed in a vacuum. Accompanying them are new means of throughput
to handle the highly parallel nature of AI and ML processing. If you build an
AI co-processor and then use the outdated technologies of your standard PC, or
even a server, that’s like putting a Ferrari engine in a Volkswagen Beetle.
“When people talk
about AI and chips for AI, building an AI solution involves quite a lot of
non-AI technology,” says Amir Khosrowshahi, vice president and CTO of the AI
product group at Intel and co-founder of Nervana. “It involves CPUs, memory,
SSD, and interconnects. It’s really critical to have all of these for getting
it to work.”
When IBM designed
its Power9 processor for mission critical systems, for example, it used
Nvidia’s high-speed NVLink for core interconnects, PCI Express Generation 4,
and its own interface called OpenCAPI (Coherent Accelerator Processor
Interface). OpenCAPI is a new connection type that provides a high bandwidth,
low latency connection for memory, accelerators, network, storage, and other
chips.
The x86 ecosystem,
says McCredie, isn’t keeping up. He points to the fact that PCI Express Gen 3
has been on the market seven years without a significant update (the first, Gen 4,
only happened recently, and IBM was one of the first to adopt it). x86 servers are
still shipping with PCIe Gen 3, which has half the bandwidth of Gen 4.
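The "half the bandwidth" comparison follows directly from the signaling rates: PCIe Gen 3 runs at 8 GT/s per lane and Gen 4 at 16 GT/s, both with 128b/130b encoding. A quick check for a x16 slot:

```python
def pcie_x16_gbps(gigatransfers_per_s):
    """Approximate usable bandwidth of a x16 link with 128b/130b encoding, in GB/s."""
    lanes, payload_fraction = 16, 128 / 130
    return gigatransfers_per_s * payload_fraction * lanes / 8

print(f"PCIe Gen 3 x16: {pcie_x16_gbps(8):.1f} GB/s")   # ~15.8 GB/s
print(f"PCIe Gen 4 x16: {pcie_x16_gbps(16):.1f} GB/s")  # ~31.5 GB/s
```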
“This explosion of
compute capabilities will require a magnitude more of computational capacity,”
he says. “We need processors to do all they can do and then some. The industry
is finally getting into memory bandwidth and I/O bandwidth performance. These
things are becoming first order constraints on system performance.”
“I think the set of
accelerators will grow,” McCredie continues. “There are going to be more
workloads that need more acceleration. We’re even going to go back and
accelerate common workloads like databases and ERP (enterprise resource
planning). I think we are seeing the start of a solid trend in the industry
where we shift to more acceleration and more becoming available on the market.”
But hardware alone doesn’t do the learning in machine learning; software plays a major part. And
in all of this rush for new chips, there is little mention of the software to
accompany it. Luckily, that’s because the software is largely already there—it was
waiting for the chips to catch up, argues Tom Doris, CEO of OTAS Technologies,
a financial analytics and AI developer.
“I think that if you
look at longer history, it’s all hardware-driven,” he says. “Algorithms haven’t
changed much. Advances are all driven by advances in hardware. That was one of
the surprises for me, having been away from the field for a few years. Things
haven’t changed a whole lot in software and algorithms since the late 90s. It’s
all about the compute power.”
David Rosenberg,
data scientist in the Office of the CTO for Bloomberg, also feels the software
is in good shape. “There are areas where the software has a long way to go, and
that has to do with distributed computing, it has to do with the science of distributed
neural computing,” he says. “But for the things we already know how to do, the
software has been improved pretty well. Now it’s a matter of can the hardware
execute the software fast enough and efficiently enough.”
With some use cases
today, in fact, hardware and software are now being developed on parallel
tracks with the aim of supporting this new wave of AI chips and use cases. At
Nvidia, the software and hardware teams are roughly the same size, notes Ian
Buck, the former Stanford University professor who developed what would become
the CUDA programming language (CUDA allows developers to write apps to use the
Nvidia GPU for parallel processing instead of a CPU). Buck now heads AI efforts
at the chip company.
“We co-develop new
architectures with system software, libraries, AI frameworks, and compilers,
all to take advantage of new methods and neural networks showing up every day,”
he says. “The only way to be successful in AI is not just to build great silicon but also to be tightly integrated all the way through the software stack,
to implement and optimize these new networks being invented every day.”
So for Buck, one of
the reasons why AI represents a new kind of computing is because he believes it
really does constitute a new type of relationship between hardware and
software. “We don’t need to think of backwards compatibility, we’re reinventing
the kinds of processors good at these kinds of tasks and doing it in
conjunction with the software to run on them.”
The future of this horserace
While there is a laundry list of potential AI chip developers today, one of the biggest questions surrounding all of these initiatives is how many will come to market, how many will be kept for the vendor’s own use, and how many will be scrapped
entirely. Most AI chips today are still vapor.
When it comes to the
many non-CPU makers designing AI chips, like Google, Facebook, and Microsoft,
it seems like those companies are making custom silicon for their own use and
will likely never bring them to market. Such entities have the billions in revenue
that can be plowed into R&D of custom chips without the need for immediate
and obvious return on investment. So users may rely on Google’s Tensor
Processing Unit as part of its Google Cloud service, but the company won’t sell
it directly. That is a likely outcome for Facebook and Microsoft’s efforts as
well.
Other chips are
definitely coming to market. Nvidia
recently announced three new AI-oriented chips: the Jetson Xavier
system-on-chip designed for smarter robots; Drive Pegasus, which is designed
for deep learning in autonomous taxis; and Drive Xavier for semi-autonomous
cars. Powering all of that is Isaac Sim, a simulation environment that
developers can use to train robots and perform tests with Jetson Xavier.
Meanwhile, Intel has
promised that its first ML processor based on the Nervana technology it bought
in 2016 will
reach market in 2019 under the code name of Spring Crest. The company also
currently has a Nervana chip for developers to get their feet wet with AI,
called Lake Crest. Intel says Spring Crest will eventually offer three to four
times the performance of Lake Crest.
Can all those
survive? “I think in the future, we’re going to see an evolution of where AI
manifests itself,” says Movidius’ Brown. “If you want it in a data center, you
need a data center chip. If you want a headset, you find a chip for it. How
this will evolve is we may see where different chips have different strengths,
and those will possibly get merged into CPUs. What we may also see are chips
coming out with multiple features.”
If all that feels a
bit like deja vu, maybe it is. The progression of the AI chip could in some
ways match how chips of the past evolved—things started with high
specialization and many competitors, but eventually some offerings gained
traction and a few market leaders encompassed multiple features. Thirty years
ago, the 80386 was the premier desktop chip and if you were doing heavy
calculations in Lotus 1-2-3, you bought an 80387 math co-processor for your IBM
PC-AT. Then came the 80486, and Intel made all kinds of noises about the math
co-processor being integrated into the CPU. The CPU then slowly gained things like security extensions, a memory controller, and a GPU.
So like every other
technology, this emerging AI chip industry likely won’t sustain its current
plethora of competitors. For instance, OTAS’ Doris notes many internal-use
chips that don’t come to market become pet projects for senior technologists,
and a change of regime often means adopting the industry standard instead.
Intersect360’s Snell points out that today’s army of AI chip startups will also
diminish—“There’s so many competitors right now it has to consolidate,” as he
puts it. Many of those companies will simply hope to carve out a niche that
might entice a big player to acquire them.
“There will be a
tough footrace, I agree,” IBM’s McCredie says. “There has to be a narrowing
down.” One day, that may mean this new chip field looks a lot like those old
chip fields—the x86, Nvidia GPU, ARM-worlds. But for now, this AI chip race has
just gotten off the starting line, and its many entrants intend to keep
running.
Andy Patrizio is a freelance technology journalist
based in Orange County, California, not entirely by choice. He prefers building
PCs to buying them, has played too many hours of Where’s My Water on his
iPhone, and collects old coins when he has some to spare.
IoT Was Interesting, But Follow the Money to AI Chips
By Kurt Shuler
02.20.2019
By 2025, a full five sixths of the growth in
semiconductors is going to be the result of AI.
A few years ago
there was a lot of buzz about IoT, and indeed it continues to serve a role, but
looking out to 2025 the real
dollar growth for the semiconductor industry is in algorithm-specific ASICs,
ASSPs, SoCs, and accelerators for Artificial Intelligence (AI), from the data
center to the edge.
In fact, the
upcoming change in focus will be so radical that by the 2025 timeframe, a
full five sixths of the growth in semiconductors is going to be the result of
AI.
Figure 1: By 2025, a full five sixths of the growth
in semiconductors will be geared towards enabling AI/deep learning algorithms.
(Image source: Tractica)
Anyone tracking the
industry closely knows how we got to this point. Designers were implementing
IoT before it even became a “thing.” Deploying sensors and communicating on a
machine-to-machine level to perform data analysis and implement functions based
on structural or ambient environment and other parameters just seemed like a
smart thing to do. The Internet just helped to do it remotely. Then someone
latched onto the term “the Internet of things” and suddenly everyone’s an IoT
silicon, software, or systems player.
From the IC
suppliers’ perspective,
simply pulling already available silicon blocks together to form a sensing
signal chain, processor, memory, and an RF interface was enough to make them a
“leading provider of IoT solutions.”
While the hype was
destined to fade, there remains a good deal of innovation around low-cost, low-power data
acquisition, with the ensuing low margins. There may be higher margins
at the software and system level for deployers of IoT networks, but not for
semiconductor manufacturers. But that’s about to change, as the focus shifts from generating
data to analyzing data using the explosion of deep-learning algorithms that are
enabling what we now call artificial intelligence, or AI.
This shift in focus
from generating data to making practical use of it through analysis and the application of
AI algorithms has stretched the limits of classic processor architectures such
as CPUs, GPUs, and FPGAs. While all have been useful in their own distinct
way, the need for faster neural network training, greater inference efficiency, and
more analysis at the edge for lower latencies has pushed silicon providers and OEMs to
change their modus operandi. Now architectures comprising the optimum mix of
processing elements to run specific algorithms for AI are necessary, make that
demanded, for applications such as autonomous vehicles, financial markets,
weather forecasts, agriculture, and someday smart cities.
The applications
have given rise to many AI
function market segments, which can be roughly divided into data center
training and inference, and edge training and inference.
Figure 2: Efficient, fast, and powerful inference engines will be required both at the data center and at the edge, where
localized processing can reduce latencies. (Image source: Arteris)
However, the bad
news for many is that, like IoT, there’ll be a shakeout and many won’t make it
in applications like autonomous vehicles. The good news is that they’ll be able
to take their learnings and apply them somewhere else, like tracking passers-by
at street windows for marketing campaigns.
Those who last will have made the best use of heterogeneous processing elements, memory, I/O and on-chip interconnect architectures to achieve the gains in efficiency and performance required
for the next generation of AI solutions.
Until that shakeout
happens, both OEMs and dedicated chip houses will be spending a lot of cash and
IP capital on developing SoCs, ASICs/ASSPs, and accelerators that will
best implement the most advanced algorithms at the data center and at the edge.
Figure 3: The total dollars spent on inference (2x)
and training (4x to 5x) at the data center will grow sharply between now and
2025, reaching up to $10 billion and $5 billion, respectively. However, the rate of
growth in dollars spent on inference at the edge is >40x, reaching $4
billion by 2025. (Image source: McKinsey & Company)
The smart silicon
providers have already moved off the old “28 nm sweet spot” where there was a
temporary “time out” to develop silicon to make the most of IoT principles.
That emphasis on the sweet spot may have been more about a lack of vision as to
where things were really heading. Now we know what’s coming: are you ready?
— Kurt Shuler is vice president of marketing at Arteris IP
and has extensive IP, semiconductor, and software marketing experience in the
mobile, consumer, and enterprise segments working for Intel and Texas
Instruments. He is a member of the U.S. Technical Advisory Group (TAG) to the
ISO 26262/TC22/SC3/WG16 working group, thereby helping create safety standards
for semiconductors and semiconductor IP.
ATTACKING THE DATACENTER FROM THE EDGE INWARD
For
much of the decade, a debate around Arm was whether it would fulfill its
promise to become a silicon designer with suppliers of any significance to
datacenter hardware. The company initially saw an opportunity in the trend
among enterprises in buying energy-efficient servers that could run their
commercial workloads but not sabotage their budgets by gobbling up huge amounts
of power while doing that. Arm’s low-power architecture that dominates the
mobile device market seemed a good fit for those situations, despite the
challenge of building up a software ecosystem that could support it.
And every step –
forward or back – along the way was noted and scrutinized, from major OEMs like
Dell EMC and Hewlett Packard Enterprise rolling out systems powered by
Arm-based SoCs, and the rise of hyperscalers like Google, Facebook, Microsoft and Amazon with their massive datacenters and their need
to keep a lid on power consumption, to the early exit of pioneer Calxeda, the
backing away by AMD and Samsung, the sharp left turn by Qualcomm to exit the server chip space after
coming out with its Centriq system-on-a-chip (SoC), the consolidation that
saw Marvell buy Cavium, and the embrace by the HPC crowd such as Cray and Fujitsu.
Through all this, Arm has gained a degree of traction, from
major cloud providers and system makers adopting the Arm architecture to
various degrees to chip makers like Marvell (now with Cavium) and Ampere – led
by a group of ex-Intel executives, including CEO Renee James – putting together
products to go into the systems.
While all this was going on, the industry saw the rise
of edge computing, driven by the ongoing
decentralization of IT that has been fueled by not only the cloud but the
proliferation of mobile devices, the Internet
of Things (IoT), big data, analytics and
automation, and other trends like artificial intelligence (AI) and machine
learning. There is a drive to put as much compute, storage, virtualization and
analytics capabilities as close as possible to the devices that are generating
massive amounts of data and to gain crucial insights into that data as close to
real time as possible.
Arm over the past couple of years has put a sharp focus on the
edge, IoT, 5G and other emerging trends, a concentration that was evident at
last month’s TechCon show. There was more discussion of the company’s Pelion
IoT platform and Neoverse – an
edge and hyperscale infrastructure platform that includes everything from
silicon to reference designs.
The chip designer talked about expanding its Platform
Security Architecture (PSA) that Arm
partners and third parties can leverage to build more security into their IoT
devices out to the infrastructure edge, part of a larger effort called Project
Cassini. Launched in partnership with ecosystem partners, Arm is looking to leverage
its strong presence in endpoints to drive the evolution of infrastructure and
cloud-native software at the edge through Arm technologies and the development
of platform standards and reference systems.
It’s
part of Arm’s effort to take a leadership role in how the edge develops, a
delicate balancing act that includes other technology vendors and essentially
sets the direction while enabling broad participation in how things move in
that direction, according to Drew Henry, the company’s one-time head of the
infrastructure business and now senior vice president of IPG and operations.
It’s a different role than Arm has taken in the past in the datacenter and
uncommon in the industry as a whole, Henry tells The Next Platform.
“What we’re doing is carefully stepping with our ecosystem a little in
front of it, saying, ‘Hey, this is the view we have. Let’s all go along this
together,’” he says. “You see this beginning to show up. There’s this industry
consortium – that’s the Autonomous
Vehicle Computing Consortium that we’re doing in the autonomy space. Project
Cassini, which is about how to create a standard platform for edge computing
that respects the diversity of silicon and some of the designs around those types
of devices, going from low power to high power, small amounts of compute to
large amounts of compute, all kinds of locations and industrial IoT locations
to 5G base stations, whatever. Realizing that’s a strength, that you want to
enable a software ecosystem to be able to deploy [solutions], how you marry
those things. We stepped in with that ecosystem and said, ‘Alright, let’s just
agree on some standards on a way these platforms are going to boot, let’s agree
with the way security is going to be held in it. If we do that well, then the
cloud-native software companies will be able to come in and deploy software on
top of it in a cloud-native stack fairly easily to do the things that people
want to do. That’s that balance.”
That’s
a contrast to what has driven computing with Intel, Henry added, “where there’s
been this ecosystem, but with one incredibly dominant viewpoint for it. There’s
just so much invention that has to happen over the next decade or so to
accomplish these rules of autonomy and Internet of Things and stuff that it’s
too much to expect that any one company is going to have all the right answers.
The ecosystem needs to [drive] it.”
A DIFFERENT ANIMAL
The datacenter compute environment for Arm continues to evolve,
driven not only by what the chip designer is doing with its architecture but
also with the efforts from manufacturing partners. Marvell is continuing to
develop the ThunderX2 SoCs that it inherited when it bought Cavium for about
$5.5 billion last year, and other chip makers like Ampere are coming to market
with offerings based on the X-Gene designs from Applied Micro, which the
company bought. At the same time, some tech vendors are taking Arm’s
architecture and creating their own chips. Fujitsu is developing the A64FX
chip, which will be the foundation for its Post-K supercomputer. Amazon Web Services (AWS) turned to
the Arm architecture –
with expertise from its acquisition of Annapurna Labs for $350 million in 2015
– for its Graviton chips. Huawei is also making a play in the Arm chip space.
Enterprise, supercomputer and cloud datacenters are served by
suppliers and companies that develop their own Arm-based chips, with Arm
innovation and investment, Henry says. Arm is not so much leading an evolution
but working with companies to grow the presence of its architecture in
datacenters. But the edge is different, and it calls for Arm to take a different – and more of a leadership – role.
“The spaces in compute in the large, aggregated compute areas, which
are datacenter and supercomputing, I feel really good about the portfolio that
is servicing those,” he says. “That’s why we’ve kind of shifted our focus now,
effectively saying, ‘Alright, we’ve got a lot of work to do to continue to help
with that group, but there’s also this emerging area of compute at the edge
that also needs to be invested in –
where if we don’t invest, collectively as an ecosystem to get it established,
it is going to take longer to mature than it should.”
The
edge is a different compute environment, where the “ecosystem broadens because
now you’ve got companies that have networking IP that you can combine together
with silicon,” Henry says. “This is where Broadcom enters into the marketplace,
and NXP enters into the marketplace and others, so we’ve got a pretty rich
ecosystem of being able to provide compute wherever you need compute. A lot of
people fixate on the classical server sitting in a datacenter. That’s a
relatively small unit amount in the marketplace, relatively small compute
that’s done across the ecosystem. We absolutely are doing great in that space,
but it’s not the only focus for us. Servicing the cloud is fairly well
understood. Serving compute at the infrastructure edge is more complicated, so
that is where we can be much more involved in leading and coordinating the
activities there. That’s what Cassini’s about.”
It Takes Liquidity To Make Infrastructure Fluid
Stranded capacity
has always been the biggest waste in the datacenter, and over the years, we
have added more and more clever kinds of virtualization – hardware partitions,
virtual machines and their hypervisors, and containers – as well as the systems
management tools that exploit them. There is a certain amount of hardware
virtualization going on these days, too, with the addition of virtual storage
and virtual switching to so-called SmartNICs.
The next step in
this evolution is disaggregation and composability, which can be thought of in
a number of different ways. The metaphor we like here at The Next Platform is smashing
all of the server nodes in a cluster and then stitching all of the components
back together again with software abstraction that works at the peripheral
transport and memory bus levels – what is commonly called composability. You
can also think of this as making the motherboard of the system extensible and
malleable, busting beyond the skin of one server to make a giant pool of
hardware that can allow myriad, concurrent physical hardware configurations –
usually over the PCI-Express bus – to be created on the fly and reconfigured as
workloads dictate. This way, CPUs, memory, flash storage, disk storage, and GPU
and FPGA accelerators are not tied so tightly to the nodes they happen to be
physically located within.
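For those who think better in code than in metaphors, here is a minimal sketch of that pool-and-compose pattern in Python; every class and device name is invented for illustration, and this is not any vendor’s actual API:

# Minimal sketch of disaggregation/composability as described above: devices
# live in a pool outside any one server, and a "logical server" is composed on
# the fly from that pool and released when the workload is done. All names are
# hypothetical, for illustration only.
from dataclasses import dataclass, field


@dataclass
class Device:
    kind: str       # e.g. "gpu", "fpga", "nvme"
    ident: str
    in_use: bool = False


@dataclass
class FabricPool:
    devices: list = field(default_factory=list)

    def compose(self, wanted: dict) -> list:
        """Claim free devices matching the requested counts, e.g. {"gpu": 4, "nvme": 2}."""
        claimed = []
        for kind, count in wanted.items():
            free = [d for d in self.devices if d.kind == kind and not d.in_use]
            if len(free) < count:
                # Not enough capacity: release anything claimed so far and fail.
                for d in claimed:
                    d.in_use = False
                raise RuntimeError(f"not enough free {kind} devices in the pool")
            for d in free[:count]:
                d.in_use = True
                claimed.append(d)
        return claimed

    def release(self, claimed: list) -> None:
        """Return devices to the pool so another workload can be composed."""
        for d in claimed:
            d.in_use = False


# Example: compose a GPU-heavy logical server, run a workload, then give it back.
pool = FabricPool([Device("gpu", f"gpu{i}") for i in range(8)] +
                  [Device("nvme", f"ssd{i}") for i in range(4)])
node = pool.compose({"gpu": 4, "nvme": 2})
# ... run the workload against `node` ...
pool.release(node)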
There are a lot of
companies that are trying to do this. Among the big OEMs, Hewlett Packard
Enterprise has its
Synergy line and Dell has its
PowerEdge MX line and its Kinetic strategy. Cisco Systems did an initial
foray into composability with its
UCS M Series machines. DriveScale has offered a level of server
composability through
a special network adapter that allows compute and storage to scale
independently at the rack scale, across nodes, akin to similar projects
under way at Intel, Dell, the Scorpio alliance of Baidu, Alibaba, and Tencent,
and the Open Compute Project spearheaded by Facebook. Juniper Networks acquired
HTBase to get some composability for its network gear, and Liqid dropped out of
stealth in June 2017 with its
own PCI-Express switch fabric to link bays of components together and make them
composable into logical servers. TidalScale, which
dropped out of stealth a few months later in October 2017, has created what
it calls a HyperKernel to glom together multiple servers into one giant system
that can then be carved up into logical servers with composable components;
rather than use VMs to break this hyperserver down, LXC or Docker containers
are used to create software isolation. GigaIO has been coming on strong in the
past year with
its own PCI-Express switches and FabreX fabric.
There are going to
be lots of different ways to skin this composability cat, and it is not clear
which way is going to dominate. But our guess is that the software approaches
from DriveScale, Liqid, and TidalScale are going to prevail over the proprietary
approaches that Cisco, Dell, and HPE have tried to use with their respective
malleable iron. Being the innovator, as HPE was here, may not be enough to win
the market, and we would not be surprised to see HPE snap up one of these other
companies, and then Dell snap up whichever one HPE doesn’t acquire. Then
again, the Synergy line of iron at HPE was already at an annualized revenue run
rate of $1.5 billion – with 3,000 customers – and growing at 78 percent in the
middle of this year, so maybe HPE thinks it already has the right answer.
Liqid, for one, is
not looking to be acquired and in fact has just brought in $28 million in its
second round of funding, bringing the total funds raised to date to $50
million; the funding was led by Panorama Point Partners, with Iron Gate Capital
and DH Capital kicking in some dough. After three years of hardware and
software development, Liqid needs more cash to build up its sales and marketing
teams to chase the opportunities and also needs to plow funds back into
research and development to keep the Liqid Fabric OS, managed fabric switch,
and Command Center management software moving ahead.
“We have a handful
of large customers that make up a good chunk of our revenues right now,” Sumit
Puri, co-founder and chief executive officer at Liqid, tells The Next Platform. “These are the customers we
started with back in the day, and we have ramped them to the size we want all
of our customers to be, and some of them are showing us projects out on the
horizon that are at massive scale. We have dozens of proofs of concept under
way, and some of them will be relatively small and never grow into a
seven-figure customer. Some of them will.”
Puri is not about to
get into specific pricing for the switches and software that turn a rack of
servers with peripherals into a stack of composable, logical servers, but says
that the adder over the cost of traditional clusters is on the order of 5 percent
to 10 percent of the total cost of the infrastructure. But the composability
means that every workload can be configured with the right logical server setup
– the right number of CPUs, GPUs, FPGAs, flash drives, and such – so that
utilization can be driven up by factors of 2X to 4X on the cluster compared to
the industry average. Datacenter utilization, says Puri, averages something on
the order of 12 percent worldwide (including compute and storage), and as best
as Liqid can figure, Google, which is the best at this in the industry, averages
30 percent utilization in its datacenters. The Liqid stack can drive it
as high as 90 percent utilization, according to Puri. That’s mainframe-class
right there, and about as good as it gets.
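The arithmetic behind that claim is simple enough to sketch out. Using only the figures cited above – the 5 percent to 10 percent hardware adder and the 12 percent versus 90 percent utilization levels – and a cost-per-useful-capacity model of our own devising, the back-of-the-envelope math looks like this:

# Back-of-the-envelope arithmetic for the trade-off Puri describes: pay a
# composability "adder" on the hardware, get more useful work out of it.
# The figures are the ones cited in the text (a 5-10 percent cost adder,
# ~12 percent average utilization, up to ~90 percent with composability);
# the cost-per-useful-capacity model itself is our own simplification.

base_cost = 1_000_000          # cost of a traditional cluster, arbitrary units
adder = 0.10                   # upper end of the quoted 5-10 percent adder
util_before = 0.12             # industry-average utilization cited by Puri
util_after = 0.90              # upper bound Liqid claims for a composed cluster

cost_per_useful_unit_before = base_cost / util_before
cost_per_useful_unit_after = base_cost * (1 + adder) / util_after

print(f"traditional: {cost_per_useful_unit_before:,.0f} per unit of useful capacity")
print(f"composable:  {cost_per_useful_unit_after:,.0f} per unit of useful capacity")
# Even with the full 10 percent adder, the cost per useful unit of capacity
# drops by roughly 7X in this toy model, which is the argument for composability.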
The prospect
pipeline is on the order of thousands of customers, and that is why funding is
necessary. It takes people to attack that opportunity, and even if HPE has been
talking about composability for the past five years, it is not yet a mainstream
approach for systems.
As with most
distributed systems, there is a tension between making one large pool of
infrastructure and making multiple isolated pools to limit the blast area in
the event that something goes wrong in the infrastructure. The typical large
enterprise might have pods of compute, networking, and storage that range in
size from a half rack or a full rack up to two or even three racks, but
rarely larger or smaller than that. They tend to deploy groups of applications
on pods and upgrade the infrastructure by the pod to make expanding the
infrastructure easier and more cost effective than doing it a few servers at a
time.
In a deal that Liqid
is closing right now, the customer wants to have a single 800-node cluster, but
only wants to have 200 of the nodes hanging off the Liqid PCI-Express fabric
because it does not want to pay the “composability tax,” as Puri put it, on all
of those systems. Over time, the company may expand the Liqid fabric into the
remaining 600 servers, but it is far more likely that the new servers added in
the coming years will have it, and
after a three or four year stint, the old machines that did not have
composability will simply be removed from the cluster.
There are a number
of different scenarios where composability is taking off, according to Liqid.
The important thing to note is that the basic assumption is that components are
aggregated into their own enclosures and then the PCI-Express fabric in the Liqid
switch can reaggregate them as needed, tying specific processors in servers to
specific flash or Optane storage, network adapters, or GPUs within enclosures.
You can never attach more devices to a given server than it allows, of course,
so don’t think that with the Liqid switch you can suddenly hang 128 GPUs off of
one CPU. Your can’t do more than the BIOS says. But you can do that much and
less as needed.
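To make that constraint concrete, here is a small, hypothetical extension of the sketch above that checks a composition request against per-host ceilings; the limit values are invented, since the real ones depend on the server’s firmware and PCI-Express topology:

# Whatever the fabric can pool, a composed request still has to respect the
# host's own limits (what the text calls "what the BIOS says"). The ceilings
# here are made up for illustration.

# Hypothetical per-host ceilings on attachable devices.
HOST_DEVICE_LIMITS = {"gpu": 16, "nvme": 24, "nic": 8}


def validate_request(wanted: dict, limits: dict = HOST_DEVICE_LIMITS) -> None:
    """Reject a composition request that exceeds what the host can enumerate."""
    for kind, count in wanted.items():
        ceiling = limits.get(kind, 0)
        if count > ceiling:
            raise ValueError(
                f"host can enumerate at most {ceiling} {kind} devices, "
                f"requested {count}")


validate_request({"gpu": 8, "nvme": 4})     # fine
# validate_request({"gpu": 128})            # raises: more than the host allows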
The Liqid fabric is
not just restricted to PCI-Express, but can also be extended with Ethernet and
InfiniBand attachment for those cases when distance and horizontal scale are
more important than the low latency that PCI-Express switching affords. Liqid’s
stack does require disaggregation at the physical level, meaning that the
peripherals are ganged up into their respective enclosures and then linked
together using the PCI-Express fabric or using NVM-Express over Ethernet or
perhaps GPUDirect over RDMA networks to link flash and GPUs to compute
elements.
Next week at the
SC19 supercomputer conference in Denver, Liqid will be showing off the next
phase of its product development, where the hardware doesn’t have to be pooled
at the physical layer and then composed, but rather standard servers using a
mix of CPUs and GPUs and FPGAs for compute and flash and Optane for storage
will be able to have their resources disaggregated, pooled, and composable
using only the Liqid software to sort it into pools and then ladle it all out
to workloads. The performance you get will, of course, be limited by the
network interface used to reaggregate the components – Ethernet will be slower
than InfiniBand will be slower than PCI-Express, and for many applications, the
only real impact will be the load time for the applications and the data. Any
application that requires a lot of back and forth chatter between compute and
storage elements will want to be on PCI-Express. But this new capability will
allow Liqid to go into so-called “brownfield” server environments and bring
composability to them.
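That guidance can be boiled down to a toy heuristic. The thresholds below are invented for illustration; only the latency ordering – PCI-Express, then InfiniBand, then Ethernet – comes from the discussion above:

# A toy heuristic for the guidance in the paragraph above: the more
# back-and-forth chatter between compute and storage, the lower-latency the
# reaggregation fabric needs to be. The message-rate thresholds are invented;
# the latency ordering (PCI-Express < InfiniBand < Ethernet) is from the text.

def pick_fabric(messages_per_second: float) -> str:
    if messages_per_second > 100_000:    # constant chatter: keep it on the bus
        return "pcie"
    if messages_per_second > 1_000:      # moderate chatter: RDMA-class network
        return "infiniband"
    return "ethernet"                    # mostly bulk loads: load time dominates


print(pick_fabric(500))        # ethernet
print(pick_fabric(50_000))     # infiniband
print(pick_fabric(1_000_000))  # pcie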
So where is
composability taking off? The first big area of success for Liqid was, not
surprisingly, for GPU-centric workloads, where the GPUs traditionally get
locked away inside of a server node and are unused most of the time.
Disaggregation and composability allow for them to be kept busy doing
workloads, and the hardware configuration can change rapidly as needed. If you
put a virtualization or container layer on top of the reaggregated hardware,
then you can move workloads around and change hardware as necessary. This is,
in fact, what companies are now interested in doing, with either a VMware
virtualization or Kubernetes container environment on top of the Liqid
hardware. Composable bare metal clouds are also on the rise.
Liqid has also
partnered recently with ScaleMP so it can offer virtual NUMA servers over
composable infrastructure and therefore be better able to compete with
TidalScale, which did this at the heart of its eponymous composable
architecture.
There is also talk
about using Liqid on 5G and edge infrastructure – but everybody is trying to
get a piece of that action.
Nvidia Arms Up Server OEMs And ODMs For Hybrid Compute
The one thing that
AMD’s return to the CPU market and its more aggressive moves in the GPU compute
arena have done, along with Intel’s plan to create a line of discrete Xe GPUs that can be used as companions to
its Xeon processors, is push Nvidia and Arm closer together.
Arm is the chip
development arm that in 1990 was spun out of British workstation maker Acorn
Computer, which created its own Acorn RISC Machine processor and, significantly
for client computing, was chosen by Apple for its Newton handheld computer
project. Over the years, Arm has licensed its eponymous RISC architecture to
others and also collected royalties on the devices that they make in exchange
for doing a lot of the grunt work in chip design as well as ensuring software
compatibility and instruction set purity across its licensees.
This business, among
other factors, is how and why Arm has become the largest semiconductor IP
peddler in the world, with $1.61 billion in sales in 2018. Arm is everywhere in
mobile computing, and this is why Japanese conglomerate SoftBank paid $32 billion
for the chip designer three years ago. With anywhere from hundreds of
billions to a trillion devices plugged into the Internet at some point in the
coming decade, depending on who you ask, and a very large portion of them
expected to use the Arm architecture, it seemed like a pretty safe bet that Arm
was going to make a lot of money.
Getting Arm’s
architecture into servers has been more problematic, and the reasons for this
are myriad; we are not going to get into the full litany here. One issue is the very
way that Arm licenses its architecture and makes its money, which works well but
which has relied on other chip makers, with far shallower pockets and without
the muscle of Arm, much less AMD or Intel, to extend it for server
platforms with features like threading, memory controllers, or peripheral
controllers. The software stack took too long to mature, although we are there
now with Linux and probably with Windows Server (only Microsoft knows for sure
on that last bit). And despite it all, the Arm collective has shown how hard it
is to sustain the effort to create a new server chip architecture, with a
multiple generation roadmap, that takes on Intel’s hegemony in the datacenter –
which is doubly difficult with an ascending AMD that has actually gotten its
X86 products and roadmap together with the Epyc family that launched in
2017 with the “Naples” processors and that has been substantially improved with
the “Rome” chips this year.
All of this is
background against what is the real news. And that is that Nvidia, which
definitely has a stake in helping Arm server chips be full-functioning peers to
X86 and Power processors, is doing something about it. Specifically, the
company is making a few important Arm-related announcements at the SC19
supercomputing conference in Denver this week.
The first thing is
that Nvidia is making good on its promise earlier this summer to make Arm a
peer with X86 and Power with regard to the entire Nvidia software stack,
including the full breadth of the CUDA programming environment with its
software development kit and its libraries for accelerating HPC and AI
applications. Ian Buck, vice president and general manager of accelerated
computing at Nvidia, tells The Next Platform
that most of the libraries for HPC and AI are actually available in the first
beta of the Arm distribution of CUDA – there are still a few that need some
work.
As we pointed out
last summer, this CUDA-X stack, as it is now called, may have started out as a
bunch of accelerated math libraries, but now it comprises tens of millions of
lines of code and is on the same order of magnitude in that regard as a basic operating
system. So moving that stack and testing all the possible different features in
the host is not trivial.
Last month, ahead of
the CUDA on Arm launch here at SC19, Nvidia gave it out to a number of key HPC
centers that are at the forefront of Arm in HPC, notably RIKEN in Japan, Oak
Ridge National Laboratory in the United States, and the University of Bristol in
the United Kingdom. They have been working on porting some of their codes to
run in accelerated mode on Arm-based systems using the CUDA stack. In fact, of
the 630 applications that have been accelerated already using X86 or Power
systems as hosts, Buck says that 30 of them have already been ported to
Arm hosts, which is not bad at all considering that it was pre-beta software
that the labs were using. This includes GROMACS, LAMMPS, MILC, NAMD, Quantum
Espresso, and Relion, just to name a few. The testing of the Arm ports was
done in conjunction with key partners that have Arm
processors – Marvell, Fujitsu, and Ampere are the ones that matter, with perhaps
HiSilicon in China, which was not mentioned – that make Arm servers – such as Cray,
Hewlett Packard Enterprise, and Fujitsu – and that make Linux on Arm
distributions – with Red Hat, SUSE Linux, and Canonical being the important
ones.
“Our experience is
that for most of these applications, it is just a matter of doing a recompile
of the code on the new host and it runs,” explains Buck. This stands to reason
since a lot of the code in a hybrid CPU-GPU system has, by definition, been ported
to actually run on the Tesla GPU accelerators in the box. “And as long as they
are not using some sort of bespoke library that only exists in the ecosystem
out of the control of the X86 platform, it has been working fine. And the
performance has been good. We haven’t released performance numbers, but it is
comparable to what we’ve seen on Intel Xeon platforms. And that makes sense since
so many of these applications get the bulk of their performance from the GPUs
anyway, and the ThunderX2, which most of these centers have, is performing well
because its memory system is good and its PCI-Express connectivity is good.”
Although Nvidia did
not say this, at some point, this CUDA-X stack on Arm will probably be made
available on those Cray Storm CS500 systems that some of the same HPC centers
mentioned above are
getting equipped with the Fujitsu A64FX Arm processor that Fujitsu has
designed for RIKEN’s “Fugaku” exascale system. Cray, of course, announced that
partnership with Fujitsu and RIKEN, Oak Ridge, and Bristol ahead of SC19, and
said that it was not planning to make the integrated Tofu D interconnect available
in the CS500 clusters with the A64FX iron. And that means that the single
PCI-Express 4.0 slot on the A64FX processor is going to be in contention, or
someone is going to have to create a Tofu D to InfiniBand
or Ethernet bridge to accelerate this server chip. A Tofu D to NVLink bridge
would be even better. . . . But perhaps this is just a perfect use case for
PCI-Express switching with disaggregation of accelerators and network
interfaces and dynamic composition with a fabric layer,
such as what GigaIO is doing.
That’s not Nvidia’s
concern today, though. What Nvidia does want to do is make it easier for any
Arm processor plugged into any server design to plug into a complex of GPU
accelerators, and this is being accomplished with a new reference design dubbed
EBAC – short for Everything But A CPU – that Nvidia is making available.
The EBAC design has
a modified GPU tray from the hyperscale HGX system design, which includes eight
“Volta” Tesla V100 accelerators with 32 GB of HBM2 memory on each. The GPUs are
cross-connected by NVLink so they can share data and memory atomics across
those links, and the tray of GPUs also has what amounts to an I/O mezzanine
card on the front that has four ConnectX-5 network interface cards running at
100 Gb/sec from Mellanox Technologies (which Nvidia is in the process of
buying) and four PCI-Express Mini SAS HD connectors that can lash any Arm
server to this I/O and GPU compute complex. In the reference design, it looks like a
quad of two-socket “Mustang” ThunderX2 system boards, in a pair of 1U rack
servers, would be ganged up with the Tesla Volta accelerators. Presumably there
is a PCI-Express switch chip complex within the EBAC system to link all of
this together, even if it is not, strictly speaking, composable.
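For reference, here is the EBAC design as described above, rendered as a simple data structure; it is only a summary of the text, not an official specification, and the internal PCI-Express switch is marked as presumed because Nvidia has not confirmed it:

# A plain data-structure rendering of the EBAC reference design as described
# in the text. Fields marked "presumed" reflect the article's own speculation.
EBAC_REFERENCE_DESIGN = {
    "gpu_tray": {
        "base_design": "hyperscale HGX tray (modified)",
        "gpus": {"count": 8, "model": "Tesla V100 (Volta)", "hbm2_gb": 32},
        "gpu_interconnect": "NVLink, cross-connected with shared memory atomics",
    },
    "io_mezzanine": {
        "nics": {"count": 4, "model": "Mellanox ConnectX-5", "speed_gbps": 100},
        "host_links": {"count": 4, "connector": "PCI-Express Mini SAS HD"},
    },
    "internal_switch": "PCI-Express switch complex (presumed, not confirmed)",
    "example_hosts": "quad of two-socket ThunderX2 boards in a pair of 1U servers",
}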
There is probably
not a reason it could not be made composable, or extended to support an A64FX
complex. We shall see. If anyone needs to build composability into its systems,
now that we think about it, it is Nvidia.