Manifold Machines

I. Our Mission

We are dedicated to advancing Artificial Intelligence for low-resource contexts. Our research explores multimodality, safety, and sample-efficient learning, while our tools help bridge the data gap for small languages in just a few lines of code.

❧

II. Our Projects

Synglot, our data synthesis library

Python · Data Synthesis · Low-Resource Languages

Status: Active development

Goal: Easy data synthesis

Our attempt at raising the tide for low-resource languages: we make it easy to translate and generate rich, expressive, and diverse data.

Challenging Datasets

Macedonian · Mathematics · Medical

Status: Released

Goal: Reasoners in Macedonian

High-quality data is paramount for effective post-training. We provide novel, previously unavailable datasets in Macedonian, covering everything from mathematics to medical scenarios.

Open-Source Models

Vision · Fine-tuning · Reinforcement Learning

Status: Active development

Goal: Provide robust OSS capabilities in Macedonian

We fine-tune open models to see better, and reinforce them to be smarter.

❧

III. Our Philosophy

Large language models are trained and used at an unprecedented scale. As a result, the resource strain for open-source builders and users — be it GPU hours or trillions of tokens — is monumental. We believe much can be achieved with small, local models, by incorporating the latest research advances and doing the boring parts well.

This means:

experimenting with novel architectures and techniques
being meticulous about high-quality data
optimizing for efficiency without sacrificing capability

Safety is paramount in our work. As AI capabilities move into more agentic territory, we find it critical to understand and prevent misaligned behavior. For us, this means staying familiar with the most recent safety research, being transparent about our development practices, and testing rigorously.

❧

IV. Contribute to our Work

For Researchers & Engineers: If you are interested in any part of the LLM development pipeline, get in touch. We are particularly eager to collaborate with people interested in multimodality and LLM evaluations, as we are seeking to pursue and publish research in these areas.

For Organizations & Institutions: You can support our research by sharing our work with your networks, sponsoring compute credits, or exploring partnership opportunities with us.

❧

V. People

Stay tuned :)
