Themata.AI
Themata.AI

Popular tags:

#developer-tools#ai-agents#llms#claude#code-generation#ai-ethics#openai#ai-safety#anthropic#open-source

AI is changing the world. Don't stay behind. Clear summaries, community insight, delivered without the noise. Subscribe to never miss a beat.

© 2026 Themata.AI • All Rights Reserved

Privacy

|

Cookies

|

Contact
🕒 Latest🔥 Top

Filtering by tag:

model-optimizationClear
NewsOpinionResearchToolClear
Quantization from the ground up | ngrok blog
llmsmodel-optimizationai-performancedeveloper-tools
Tool

Quantization from the Ground Up

Qwen-3-Coder-Next is an 80 billion parameter model that requires 159.4GB of RAM to run. Techniques exist to reduce the size of large language models by 4x and increase their speed by 2x.

ngrok.com

🔥🔥🔥🔥🔥

26 min

4d ago

Run a 1T parameter model on a 32gb Mac by streaming tensors from NVMe

Hypura is a storage-tier-aware LLM inference scheduler designed for Apple Silicon, allowing users to run large models that exceed their Mac's memory. It optimally distributes model tensors across GPU, RAM, and NVMe storage based on access patterns and hardware capabilities to prevent system crashes.

github.com

🔥🔥🔥🔥🔥

6 min

5d ago

Quantization from the Ground Up

Qwen-3-Coder-Next is an 80 billion parameter model that requires 159.4GB of RAM to run. Techniques exist to reduce the size of large language models by 4x and increase their speed by 2x.

ngrok.com

🔥🔥🔥🔥🔥

26 min

4d ago

Run a 1T parameter model on a 32gb Mac by streaming tensors from NVMe

Hypura is a storage-tier-aware LLM inference scheduler designed for Apple Silicon, allowing users to run large models that exceed their Mac's memory. It optimally distributes model tensors across GPU, RAM, and NVMe storage based on access patterns and hardware capabilities to prevent system crashes.

github.com

🔥🔥🔥🔥🔥

6 min

5d ago

Quantization from the Ground Up

Qwen-3-Coder-Next is an 80 billion parameter model that requires 159.4GB of RAM to run. Techniques exist to reduce the size of large language models by 4x and increase their speed by 2x.

ngrok.com

🔥🔥🔥🔥🔥

26 min

4d ago

Run a 1T parameter model on a 32gb Mac by streaming tensors from NVMe

Hypura is a storage-tier-aware LLM inference scheduler designed for Apple Silicon, allowing users to run large models that exceed their Mac's memory. It optimally distributes model tensors across GPU, RAM, and NVMe storage based on access patterns and hardware capabilities to prevent system crashes.

github.com

🔥🔥🔥🔥🔥

6 min

5d ago

No more articles to load