Fractile is focused on AI hardware that runs LLM inference in memory to reduce compute overhead and drive scale
In December last year, Pat Gelsinger abruptly retired as CEO of Intel after the company’s turnaround strategy, largely marked by a separation of the semiconductor design and fabrication businesses, failed to convince investors. And while Intel apparently failed to sell its AI story to Wall Street, Gelsinger has continued his focus on scaling AI with an investment in a U.K. startup.
In a LinkedIn post published this week, Gelsinger announced his investment in a company called Fractile, which, according to its website, specializes in AI hardware that runs large language model (LLM) inference in memory rather than moving model weights from memory to a processor.
“Inference of frontier AI models is bottlenecked by hardware,” Gelsinger wrote. “Even before test-time compute scaling, cost and latency were huge challenges for large-scale LLM deployments. With the advent of reasoning models, which require memory-bound generation of thousands of output tokens, the limitations of existing hardware roadmaps [have] compounded. To achieve our aspirations for AI, we need radically faster, cheaper and much lower power inference.”
A few things to unpack there. The core AI scaling laws essentially show that model size, dataset size and underlying compute power need to scale together to increase the performance of an AI system. Test-time scaling is an emerging AI scaling law that refers to techniques applied during inference that enhance performance and drive efficiency without any retraining of the underlying LLM, such as dynamic model adjustment, input-specific scaling, quantization at inference and efficient batch processing.
This also touches on edge AI which, generally speaking, is all about moving inference onto personal devices like handsets or PCs, or onto infrastructure that’s one hop away from those devices: on-premises enterprise datacenters, mobile network operator base stations, and other distributed compute infrastructure that isn’t a hyperscaler or other centralized cloud. The idea is multi-faceted; in a nutshell, edge AI could improve latency, reduce compute costs, enhance personalization through contextual awareness, improve data privacy, and potentially make it easier to comply with data sovereignty rules and regulations.
Gelsinger’s interest in edge AI isn’t new. It’s something he studied at Stanford University, and it’s something he pushed during his stint as CEO of Intel. In fact, during CES in 2024, Gelsinger examined the benefits of edge AI in a keynote interview. The lead was the company’s then-latest CPUs for AI PCs, but the more important subtext was in his description of the three laws of edge computing.
“First is the laws of economics,” he said at the time. “It’s cheaper to do it on your device…I’m not renting cloud servers…Second is the laws of physics. If I have to round-trip the data to the cloud and back, it’s not going to be as responsive as I can do locally…And third is the laws of the land. Am I going to take my data to the cloud or am I going to keep it on my local device?”
Looking at Fractile’s approach, Gelsinger called out how the company’s “in-memory compute approach to inference acceleration jointly tackles two bottlenecks to scaling inference, overcoming both the memory bottleneck that holds back today’s GPUs, while decimating power consumption, the single biggest physical constraint we face over the next decade in scaling up data center capacity.”
Gelsinger continued in his recent post: “In the global race to build leading AI models, the role of inference performance is still under-appreciated. Being able to run any given model orders of magnitude faster, at a fraction of the cost and maybe most importantly at [a] dramatically lower power envelop[e] provides a performance leap equivalent to years of lead on model development. I look forward to advising the Fractile team as they tackle this vital challenge.”
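For readers who want a feel for the "memory-bound" point in Gelsinger's post, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which generating each output token requires reading every model weight from memory once, so memory bandwidth alone caps single-stream token rate. The function name and all of the numbers (a 70-billion-parameter model, 16-bit weights, 3 TB/s of memory bandwidth, a 5,000-token reasoning trace) are hypothetical values chosen for illustration; they do not describe Fractile's hardware or any specific GPU.

```python
def decode_tokens_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Rough upper bound on single-stream decode speed.

    Simplifying assumption: each generated token needs one full read of the
    model weights from memory, and nothing else limits throughput.
    """
    model_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical inputs, chosen only to illustrate the arithmetic.
PARAMS = 70e9            # 70 billion parameters
BYTES_PER_PARAM = 2      # 16-bit weights
BANDWIDTH = 3e12         # 3 TB/s of memory bandwidth
REASONING_TOKENS = 5000  # length of one long reasoning trace

rate = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
seconds_per_trace = REASONING_TOKENS / rate

print(f"Bandwidth-bound ceiling: {rate:.1f} tokens/s")
print(f"Time for {REASONING_TOKENS} tokens: {seconds_per_trace:.0f} s")
```

Under these assumptions the ceiling is set by bandwidth divided by model size rather than by raw compute, which is why approaches that avoid moving weights between memory and processor target this term directly. Real systems batch requests and also read other data per token, so this is an illustration of the bottleneck, not a performance estimate.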