Tether’s AI Research Group has unveiled QVAC MedPsy, a new class of medical language models designed to run efficiently on smartphones, wearables, and other low-power devices. Tether says the models rival or surpass much larger systems in performance while keeping all data processing local and private.
Currently, most medical AI relies on large models hosted on remote servers, which requires transmitting sensitive patient data, diagnostic queries, and clinical notes over the internet. This raises significant privacy and compliance issues, especially in healthcare, where regulations like HIPAA are strict. With the medical AI market projected to grow from around $36 billion today to over $500 billion by 2033, this cloud-dependent architecture is becoming harder to sustain.
Smaller Models, Better Results
The QVAC MedPsy release challenges a core assumption in AI: that bigger models and more compute power are necessary for better performance. Instead, the team focused on efficiency. A 1.7 billion parameter version scored an average of 62.62 across seven closed-ended medical benchmarks, beating Google’s MedGemma-1.5-4B-it by 11.42 points, despite being less than half the size. In a more realistic clinical benchmark called HealthBench Hard, the 1.7B model even outperformed MedGemma 27B, a model roughly sixteen times larger.
The 4 billion parameter version scored 70.54 on those same benchmarks, exceeding models nearly seven times its size, including MedGemma-27B-text. It also performed better on clinical-style evaluations such as HealthBench Hard, HealthBench, and MedXpertQA. The evaluation covered eight benchmark suites, including MedQA-USMLE, MedMCQA, MMLU Health, PubMedQA, and AfriMedQA, the last of which focuses on underserved global healthcare contexts.
These gains come from a staged medical post-training pipeline that combines broad medical supervision with higher-value clinical reasoning data and reinforcement learning on harder cases. So maybe size really isn’t everything.
Faster, Cheaper, and Local
The models also reduce inference costs significantly. The QVAC MedPsy 4B model generates responses in about 909 tokens, compared to 2,953 tokens for comparable systems—a 3.2x reduction. The 1.7B model averages 1,110 tokens versus 1,901 tokens, a 1.7x saving. Shorter responses mean faster generation and lower per-query compute, which, combined with the models’ small size, lets them run locally without cloud dependency.
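The reported savings follow directly from the token counts: dividing a baseline model’s average response length by QVAC MedPsy’s gives the reduction factor. A quick check of the figures above:

```python
def reduction_factor(baseline_tokens: int, model_tokens: int) -> float:
    """Ratio of a baseline model's average response length to QVAC MedPsy's."""
    return baseline_tokens / model_tokens

# 4B model: 909 tokens vs. 2,953 for comparable systems
print(f"4B:   {reduction_factor(2953, 909):.1f}x")   # ~3.2x
# 1.7B model: 1,110 tokens vs. 1,901
print(f"1.7B: {reduction_factor(1901, 1110):.1f}x")  # ~1.7x
```

Since autoregressive decoding cost scales roughly linearly with output length, a 3.2x token reduction translates almost directly into a 3.2x drop in generation time and compute per response.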
Tether is releasing quantized GGUF formats for local deployment. The recommended Q4_K_M versions are about 1.2 GB for the 1.7B model and 2.6 GB for the 4B model. Testing showed these compressed versions retained most of the benchmark performance, making them practical for mobile and edge environments.
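Those file sizes are what make on-device use plausible: since llama.cpp-style runtimes memory-map the GGUF weights, a model fits comfortably when the file plus runtime overhead (KV cache, activations) stays within available memory. A minimal back-of-the-envelope sketch — the RAM figures and the 1 GB overhead here are illustrative assumptions, not official requirements:

```python
def fits_on_device(file_size_gb: float, ram_gb: float, overhead_gb: float = 1.0) -> bool:
    """Rough feasibility check: GGUF file size plus an assumed overhead
    budget for KV cache and runtime must fit in the device's memory."""
    return file_size_gb + overhead_gb <= ram_gb

# Sizes reported for the recommended Q4_K_M builds
assert fits_on_device(1.2, ram_gb=4.0)   # 1.7B model on a 4 GB phone
assert fits_on_device(2.6, ram_gb=6.0)   # 4B model on a 6 GB device
```

In practice, GGUF files like these are loaded with llama.cpp or its bindings (e.g. llama-cpp-python’s `Llama(model_path=...)`), the standard tooling for this format.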
Shifting Where Medical AI Can Be Used
This technology could change where and how medical AI is deployed. Previously, such systems required external cloud processing. Now, they can run within hospital systems for secure, local data analysis, on mobile devices, or in areas with limited connectivity or strict privacy constraints. This removes one of the major barriers to healthcare AI adoption: the need to send sensitive data outside controlled environments.
Paolo Ardoino, CEO of Tether, emphasized that the focus was on improving efficiency at the model level rather than scaling up size. He noted that the 1.7B model outperformed larger systems while using up to three times fewer tokens per response, which directly reduces compute requirements, latency, and cost. “It allows the model to run locally on standard hardware instead of relying on remote infrastructure,” he said. “In healthcare, that changes the constraints entirely.”
For the past decade, AI progress has been tied to cloud-based compute. QVAC MedPsy points in a different direction, where efficiency, locality, and privacy define performance. If these gains hold in real-world deployments, they could reshape the economics of medical AI infrastructure, shifting advantages toward local systems with lower cost, lower latency, and greater data control.
More details are available at https://qvac.tether.io/models/.
Tether Data, part of Tether’s broader vision, aims to advance decentralized infrastructure for privacy and efficiency. QVAC is its advanced AI research initiative focused on building open, adaptive intelligence systems that run locally on any device.
