Qt AI Inference API
This project contains a proof of concept for a new Qt AI Inference API. The purpose of the API is to let you easily use different types of AI models for inference from your Qt code, either from C++ or directly from QML! The API abstracts away the details of the underlying model and framework implementations: you simply declare what type of input and output you want, and Qt sets things up for you! You can also chain different models together into pipelines.
Disclaimer: This API is at the proof-of-concept stage and under active development, and is not yet part of the Qt framework. Hence, Qt's compatibility promise does not apply: the API can still change in breaking ways. But this also makes it a great time to influence the direction it will take! For suggestions, feel free to create a ticket in the Qt project's JIRA; please use the label "QtAiApi" so we can easily find and collect them.
How it works
When you declare a model in your code, Qt infers from the given input and output types which backend to set up for the model. The backends are implemented as QPlugins. Currently, the backends are:
Input type | Output type | Qt backend | Description |
---|---|---|---|
Text, Image | Text | QtOllamaModel | Uses ollama to load LLM models and communicate with them via ollama's REST API |
Speech | Text | QtAsrModel | Uses Whisper for Automatic Speech Recognition (ASR), or speech-to-text |
Image | JSON | QtTritonModel | Uses Triton to load a model for object detection from images |
Image | JSON | QtYoloModel | Uses a YOLO model for object detection from images |
Text | Speech | QtTtsModel | Uses QtTextToSpeech (QtSpeech) to convert text into speech |
Text | Speech | QtPiperModel | Uses the Piper TTS model to convert text into speech |
Text | Image | QtDiffuserModel | Uses Diffusers to convert text into images |
Note that the Qt backends expect the underlying backend implementations (ollama, Whisper, ...) to be running, and will not take care of starting them up for you. You need to start them yourself, e.g. in the case of QtOllamaModel, by loading the intended model into ollama's memory by running:

```
ollama run <model>
```
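For example, to load the llama3.2 model used in some of the snippets below (assuming it is available in your local ollama installation):

```
ollama run llama3.2
```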
Building the library
To build the API library, you need to have a Qt kit (6.7 or newer). Additional dependencies for specific plugins are:
- The QtSpeech additional library for QtTtsModel
- OpenCV for QtTritonModel
- QtMultimedia for QtYoloModel
In Qt Creator, open the library project by choosing the CMakeLists.txt file under qt-ai-inference-api/aimodel/, configure it with your Qt kit, and build the library. You can also choose the top-level qt-ai-inference-api/CMakeLists.txt to build the whole project, including the example app that ships with the API. For an example of how to include the library project in your Qt app project, see the Qt AI App example app.
The API
Currently, the API consists of one class: QAiModel is the C++ implementation, and MultiModal is its QML counterpart. QAiModel inherits QObject. To use the QML API, add the following import statement to your QML code:

```qml
import qtaimodel
```
Properties
AiModelPrivateInterface::AiModelTypes type
A combination of AiModelType flags telling Qt what type of model to instantiate. Possible values are:
Name | Value | Description |
---|---|---|
InputText | 0x00001 | The model takes text as input |
InputAudio | 0x00002 | The model takes speech as input |
InputVideo | 0x00004 | The model takes video as input |
InputImage | 0x00008 | The model takes image as input |
InputJson | 0x00010 | The model takes JSON as input |
OutputText | 0x00100 | The model outputs text |
OutputAudio | 0x00200 | The model outputs speech |
OutputVideo | 0x00400 | The model outputs video |
OutputImage | 0x00800 | The model outputs image |
OutputJson | 0x01000 | The model outputs JSON |
For supported input-output combinations, see the table in the "How it works" section.
Example:

```qml
import qtaimodel

MultiModal {
    // instantiates an LLM which takes text input and produces text output
    type: MultiModal.InputText | MultiModal.OutputText
}
```
- Write method: `void setType(AiModelPrivateInterface::AiModelTypes)`
- Read method: `AiModelPrivateInterface::AiModelTypes type()`
- Notifier signal: `void typeChanged()`
QString prompt
The prompt for the model. Use this for a persistent prompt that is combined with anything passed to the model with pushData(): the prompt is prepended to the data before the request is sent. Note that setting the prompt alone does not send anything to the underlying model; you need to call pushData() to trigger a request.
Example:

```qml
import QtQuick
import qtaimodel

Item {
    MultiModal {
        id: model
        type: MultiModal.InputText | MultiModal.OutputText
        prompt: "Summarize the following text:"
    }

    function summarizeText() {
        // The actual prompt sent to the underlying model will be
        // "Summarize the following text: Lorem ipsum"
        model.pushData("Lorem ipsum")
    }
}
```
- Write method: `void setPrompt(QString)`
- Read method: `QString prompt()`
- Notifier signal: `void promptChanged()`
QString model
Tells the underlying framework which specific model to use.
Example:

```qml
import qtaimodel

MultiModal {
    id: model
    // Instantiates a QtOllamaModel, where ollama uses the deepseek-r1 model
    type: MultiModal.InputText | MultiModal.OutputText
    model: "deepseek-r1"
}
```
- Write method: `void setModel(QString)`
- Read method: `QString model()`
- Notifier signal: `void modelChanged()`
QVariantList documents
Retrieval-Augmented Generation (RAG) data to use with the model, if the model supports it. Currently, RAG is supported only with chromadb, which needs to be running in the background.
Example:

```qml
import qtaimodel

MultiModal {
    id: llamaModel
    type: (MultiModal.InputText | MultiModal.OutputText)
    model: "llama3.2"
    prompt: "Which item has best armor bonus?"
    documents: ["Cloth of Authority | Armour Class +1",
                "Drunken Cloth | Constitution +2 (up to 20)",
                "Icebite Robe | Resistance to Damage Types: Cold damage.",
                "Obsidian Laced Robe | Grants Resistance to Damage Types: Fire damage.",
                "Moon Devotion Robe | Advantage on Constitution Saving throws.",
               ]
}
```
- Write method: `void setDocuments(QVariantList)`
- Read method: `QVariantList documents()`
- Notifier signal: `void documentsChanged()`
int seed
The seed to use with model prompts. Setting a fixed seed reduces randomness in the model's answers, making them more reproducible.
Example:

```qml
import qtaimodel

MultiModal {
    id: llamaModel
    type: (MultiModal.InputText | MultiModal.OutputText)
    model: "gemma3"
    prompt: "Say hello?"
    seed: 3453654
}
```
- Write method: `void setSeed(int)`
- Read method: `int seed()`
- Notifier signal: `void seedChanged()`
QVector<QAiModel*> inputs
A list of models this model uses as its inputs. This allows chaining models together into pipelines. You can use the optional property of an input model to tell whether it is an optional or a mandatory input. For a mandatory input, this model will not process any other inputs before that input has something to offer. For an optional input, the other inputs are processed regardless of whether that input has data available.
Example:

```qml
import QtQuick
import qtaimodel

Item {
    // The ASR model converts speech to text and passes it to the LLM model.
    // Its "optional" property is set to true, so if the LLM model has other
    // ways of receiving input, such as typing, it will not block processing
    // those while waiting for the output of the ASR model.
    MultiModal {
        id: asrModel
        type: MultiModal.InputAudio | MultiModal.OutputText
        optional: true
    }

    MultiModal {
        id: llmModel
        type: MultiModal.InputText | MultiModal.OutputText
        inputs: [asrModel]
    }
}
```
- Write method: `void setInputs(QVector<QAiModel*>)`
- Read method: `QVector<QAiModel*> inputs()`
- Notifier signal: `void inputsChanged()`
bool processing
Whether the model is currently processing a request.
Example:

```qml
import QtQuick
import QtQuick.Controls
import qtaimodel

Item {
    MultiModal {
        id: model
        ...
    }

    BusyIndicator {
        running: model.processing
    }
}
```
- Write method: `void setProcessing(bool)`
- Read method: `bool processing()`
- Notifier signal: `void processingChanged()`
bool buffered
Whether the model should buffer the latest result for later use.
- Write method: `void setBuffered(bool)`
- Read method: `bool buffered()`
- Notifier signal: `void bufferedChanged()`
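For instance, a model whose latest result should stay available for later use might set the property as follows (a minimal sketch; the replay use case is an assumption, not from the API docs):

```qml
import qtaimodel

MultiModal {
    id: ttsModel
    type: MultiModal.InputText | MultiModal.OutputAudio
    // Keep the latest result around so it can be consumed later,
    // e.g. replayed without re-running the model (assumed use case).
    buffered: true
}
```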
bool optional
This boolean can be used when the model serves as an input for another model, telling the other model(s) to treat this one as an optional input. If set to true, the other model(s) will not wait for this model to produce output before processing their other inputs; if set to false, they will. Defaults to false. Has no effect if this model is not used as an input by any other model. See the example under the inputs property.
- Write method: `void setOptional(bool)`
- Read method: `bool optional()`
- Notifier signal: `void optionalChanged()`
Methods
Q_INVOKABLE void pushData(QVariant data)
Push data to the model. The argument can be of any type supported by QVariant. The underlying QAiModel implementation will convert it based on its expected input type.
QVariant data - The data to push to the model. If the prompt property has been set, it will be prepended to the data before sending the final request to the underlying model.
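For example, a minimal sketch of wiring UI input to a model (the TextField and Button are illustrative UI, not part of the API):

```qml
import QtQuick
import QtQuick.Controls
import qtaimodel

Column {
    MultiModal {
        id: llmModel
        type: MultiModal.InputText | MultiModal.OutputText
        prompt: "Answer briefly:"
    }

    TextField { id: input }

    Button {
        text: "Send"
        // Sends the typed text to the model. Since prompt is set, the
        // request becomes "Answer briefly: <typed text>".
        onClicked: llmModel.pushData(input.text)
    }
}
```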
Signals
void gotResult(QVariant result)
Emitted when the underlying model has finished processing and returns a result. The result is passed as QVariant, converted from the output type the underlying model provides.
Example:

```qml
import qtaimodel

MultiModal {
    id: model
    type: MultiModal.InputText | MultiModal.OutputText
    onGotResult: (result) => {
        someLabel.text = result
    }
}
```
Known issues
- Currently, the C++ API is not public, meaning the API is usable only from QML. This will change in a future patch.
Additional links
- The example app
- Qt Project JIRA - If you would like to leave ideas, suggestions or bug reports, please use the "QtAiApi" label so we can easily gather them!