    Enterprises are rethinking AI infrastructure as inference costs rise

By gvfx00@gmail.com | November 30, 2025


AI spending in Asia Pacific continues to rise, yet many companies still struggle to get value from their AI projects. Much of this comes down to the infrastructure that supports AI: most systems are not built to run inference at the speed or scale real applications need. Industry studies show that many projects miss their ROI goals even after heavy investment in GenAI tools, largely because of this infrastructure gap.

    The gap shows how much AI infrastructure influences performance, cost, and the ability to scale real-world deployments in the region.

    Akamai is trying to address this challenge with Inference Cloud, built with NVIDIA and powered by the latest Blackwell GPUs. The idea is simple: if most AI applications need to make decisions in real time, then those decisions should be made close to users rather than in distant data centres. That shift, Akamai claims, can help companies manage cost, reduce delays, and support AI services that depend on split-second responses.

    Jay Jenkins, CTO of Cloud Computing at Akamai, explained to AI News why this moment is forcing enterprises to rethink how they deploy AI and why inference, not training, has become the real bottleneck.

Table of Contents

      • Why AI projects struggle without the right infrastructure
      • Why inference now demands more attention than training
      • How edge infrastructure improves AI performance and cost
      • Where edge-based AI is gaining traction
      • Why cloud and GPU partnerships matter more now
      • The infrastructure needed to support agentic AI and automation
      • What companies need to prepare for next

    Why AI projects struggle without the right infrastructure

    Jenkins says the gap between experimentation and full-scale deployment is much wider than many organisations expect. “Many AI initiatives fail to deliver on expected business value because enterprises often underestimate the gap between experimentation and production,” he says. Even with strong interest in GenAI, large infrastructure bills, high latency, and the difficulty of running models at scale often block progress.

    Jay Jenkins, CTO of Cloud Computing at Akamai.

    Most companies still rely on centralised clouds and large GPU clusters. But as use grows, these setups become too expensive, especially in regions far from major cloud zones. Latency also becomes a major issue when models have to run multiple steps of inference over long distances. “AI is only as powerful as the infrastructure and architecture it runs on,” Jenkins says, adding that latency often weakens the user experience and the value the business hoped to deliver. He also points to multi-cloud setups, complex data rules, and growing compliance needs as common hurdles that slow the move from pilot projects to production.
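To make the latency point concrete, here is a minimal back-of-the-envelope sketch; the round-trip and compute times are illustrative assumptions, not Akamai measurements. A request that chains several inference calls pays the network round trip on every step, so distance multiplies:

```python
# Illustrative latency model for chained inference calls.
# All timings are assumptions for this sketch, not vendor figures.

def total_latency_ms(steps: int, rtt_ms: float, compute_ms: float) -> float:
    """Network round trip plus model compute, paid once per chained call."""
    return steps * (rtt_ms + compute_ms)

STEPS = 5          # chained inference calls in one user request (assumed)
COMPUTE_MS = 40.0  # assumed model compute time per call

# Assumed round trips: a distant centralised region vs. a nearby edge site.
central = total_latency_ms(STEPS, rtt_ms=180.0, compute_ms=COMPUTE_MS)
edge = total_latency_ms(STEPS, rtt_ms=15.0, compute_ms=COMPUTE_MS)

print(f"centralised: {central:.0f} ms, edge: {edge:.0f} ms")
# centralised: 1100 ms, edge: 275 ms -- the gap is pure network distance.
```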

    Why inference now demands more attention than training

    Across Asia Pacific, AI adoption is shifting from small pilots to real deployments in apps and services. Jenkins notes that as this happens, day-to-day inference – not the occasional training cycle – is what consumes most computing power. With many organisations rolling out language, vision, and multimodal models in multiple markets, the demand for fast and reliable inference is rising faster than expected. This is why inference has become the main constraint in the region. Models now need to operate in different languages, regulations, and data environments, often in real time. That puts enormous pressure on centralised systems that were never designed for this level of responsiveness.

    How edge infrastructure improves AI performance and cost

    Jenkins says moving inference closer to users, devices, or agents can reshape the cost equation. Doing so shortens the distance data must travel and allows models to respond faster. It also avoids the cost of routing huge volumes of data between major cloud hubs.

Physical AI systems – robots, autonomous machines, or smart city tools – depend on decisions made in milliseconds. When inference runs in a distant region, these systems cannot respond in time and fail to work as expected.

The savings from more localised deployments can also be substantial. Jenkins says Akamai analysis shows enterprises in India and Vietnam see large reductions in the cost of running image-generation models when workloads are placed at the edge rather than in centralised clouds. Better GPU utilisation and lower egress fees played a major role in those savings.
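One rough way to reason about those two levers is to model per-request cost as effective GPU time plus egress; every unit price and utilisation figure below is a made-up placeholder, not Akamai's analysis:

```python
# Toy per-request cost model: GPU seconds plus data egress.
# Unit prices and utilisation figures are illustrative assumptions only.

def cost_per_request(gpu_seconds: float, gpu_hourly_usd: float,
                     utilisation: float, egress_gb: float,
                     egress_usd_per_gb: float) -> float:
    # Low utilisation inflates effective GPU cost: idle capacity is still billed.
    gpu_cost = gpu_seconds * (gpu_hourly_usd / 3600.0) / utilisation
    return gpu_cost + egress_gb * egress_usd_per_gb

# Centralised: long-haul egress fees and an assumed 40% GPU utilisation.
central = cost_per_request(2.0, 4.0, 0.40, egress_gb=0.05, egress_usd_per_gb=0.09)
# Edge: responses stay local (cheap egress) and an assumed 70% utilisation.
edge = cost_per_request(2.0, 4.0, 0.70, egress_gb=0.05, egress_usd_per_gb=0.01)

print(f"central: ${central:.4f}/request, edge: ${edge:.4f}/request")
```

Under these invented numbers the edge request costs roughly a third as much, and splitting the total this way shows which lever, utilisation or egress, dominates for a given workload.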

    Where edge-based AI is gaining traction

    Early demand for edge inference is strongest from industries where even small delays can affect revenue, safety, or user engagement. Retail and e-commerce are among the first adopters because shoppers often abandon slow experiences. Personalised recommendations, search, and multimodal shopping tools all perform better when inference is local and fast.

    Finance is another area where latency directly affects value. Jenkins says workloads like fraud checks, payment approval, and transaction scoring rely on chains of AI decisions that should happen in milliseconds. Running inference closer to where data is created helps financial firms move faster and keeps data inside regulatory borders.

    Why cloud and GPU partnerships matter more now

    As AI workloads grow, companies need infrastructure that can keep up. Jenkins says this has pushed cloud providers and GPU makers into closer collaboration. Akamai’s work with NVIDIA is one example, with GPUs, DPUs, and AI software deployed in thousands of edge locations.

    The idea is to build an “AI delivery network” that spreads inference across many sites instead of concentrating everything in a few regions. This helps with performance, but it also supports compliance. Jenkins notes that almost half of large APAC organisations struggle with differing data rules across markets, which makes local processing more important. Emerging partnerships are now shaping the next phase of AI infrastructure in the region, especially for workloads that depend on low-latency responses.

    Security is built into these systems from the start, Jenkins says. Zero-trust controls, data-aware routing, and protections against fraud and bots are becoming standard parts of the technology stacks on offer.

    The infrastructure needed to support agentic AI and automation

    Running agentic systems – which make many decisions in sequence – needs infrastructure that can operate at millisecond speeds. Jenkins believes the region’s diversity makes this harder but not impossible. Countries differ widely in connectivity, rules, and technical readiness, so AI workloads must be flexible enough to run where it makes the most sense. He points to research showing that most enterprises in the region already use public cloud in production, but many expect to rely on edge services by 2027. That shift will require infrastructure that can hold data in-country, route tasks to the closest suitable location, and keep functioning when networks are unstable.
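As a minimal sketch of that placement logic, a scheduler can filter sites by the data's home country first, then pick the lowest-latency healthy site; the site names, latencies, and residency rule here are hypothetical:

```python
# Minimal placement sketch: keep data in-country, prefer the closest
# healthy site, and fall back when the nearest one is unreachable.
# Site names, latencies, and the residency rule are hypothetical.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    country: str
    rtt_ms: float
    healthy: bool

def pick_site(sites: list[Site], data_country: str) -> Site | None:
    # Residency first: only healthy sites inside the data's home country qualify.
    eligible = [s for s in sites if s.country == data_country and s.healthy]
    # Then latency: choose the closest of the remaining sites.
    return min(eligible, key=lambda s: s.rtt_ms, default=None)

sites = [
    Site("sg-edge-1", "SG", 8.0, healthy=False),   # nearest, but offline
    Site("sg-edge-2", "SG", 14.0, healthy=True),   # in-country fallback
    Site("jp-central", "JP", 70.0, healthy=True),  # fast enough, wrong country
]
chosen = pick_site(sites, data_country="SG")
print(chosen.name if chosen else "no eligible site")  # sg-edge-2
```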

    What companies need to prepare for next

    As inference moves to the edge, companies will need new ways to manage operations. Jenkins says organisations should expect a more distributed AI lifecycle, where models are updated across many sites. This requires better orchestration and strong visibility into performance, cost, and errors in core and edge systems.
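A hedged illustration of that visibility requirement, with site names, fields, and thresholds invented for the example: per-site inference metrics are rolled up centrally, and sites are flagged when their model version or error rate drifts from the fleet's target:

```python
# Sketch of fleet-wide visibility: per-site metrics rolled up centrally,
# flagging model-version drift and error-rate outliers.
# Site names, field names, and thresholds are illustrative assumptions.

SITES = {
    "sg-edge-2": {"model_version": "v12", "error_rate": 0.004, "p99_ms": 120},
    "in-edge-1": {"model_version": "v12", "error_rate": 0.031, "p99_ms": 140},
    "vn-edge-3": {"model_version": "v11", "error_rate": 0.005, "p99_ms": 115},
}

TARGET_VERSION = "v12"   # version the current rollout should converge on
ERROR_BUDGET = 0.01      # maximum tolerated error rate per site

for name, metrics in SITES.items():
    if metrics["model_version"] != TARGET_VERSION:
        print(f"{name}: stale model {metrics['model_version']}, needs rollout")
    if metrics["error_rate"] > ERROR_BUDGET:
        print(f"{name}: error rate {metrics['error_rate']:.1%} over budget")
```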

    Data governance becomes more complex but also more manageable when processing stays local. Half of the region’s large enterprises already struggle with the variance in regulations, so placing inference closer to where data is generated can help.

Security also needs more attention. While spreading inference to the edge can improve resilience, it also means every site must be secured. Firms need to protect APIs and data pipelines, and guard against fraud and bot attacks. Jenkins notes that many financial institutions already rely on Akamai's controls in these areas.

    (Photo by Igor Omilaev)

