Qt AI Inference API

This project contains a proof-of-concept for a new Qt AI Inference API. The purpose of the API is to let you easily use different types of AI models for inference from your Qt code, either from C++ or directly from QML! The API abstracts away the details of the underlying model and framework implementations: you simply declare what type of input and output you want to use, and Qt sets things up for you. You can also chain different models together into pipelines.

Disclaimer: This API is at the proof-of-concept stage and under active development, and is not yet part of the Qt framework. Hence, Qt's compatibility promise does not apply; the API can still change in breaking ways. But it is also a great time to influence the direction it will take! For suggestions, feel free to create a ticket in the Qt project's JIRA; please use the label "QtAiApi" so we can easily find and collect them.

How it works

When you declare a model in your code, Qt infers from the given input and output types which backend to set up for the model. The backends are implemented as QPlugins. Currently, the backends are:

| Input type | Output type | Qt backend | Description |
| --- | --- | --- | --- |
| Text, Image | Text | QtOllamaModel | Uses ollama to load LLM models and communicate with them via ollama's REST API |
| Speech | Text | QtAsrModel | Uses Whisper for Automatic Speech Recognition (ASR), i.e. speech-to-text |
| Image | JSON | QtTritonModel | Uses Triton to load a model for object detection from images |
| Image | JSON | QtYoloModel | Uses a YOLO model for object detection from images |
| Text | Speech | QtTtsModel | Uses QtTextToSpeech (QtSpeech) to convert text into speech |
| Text | Speech | QtPiperModel | Uses the Piper TTS model to convert text into speech |
| Text | Image | QtDiffuserModel | Uses Diffusers to convert text into images |

Note that the Qt backends expect the underlying backend implementation (ollama, Whisper, ...) to be running and will not start it for you. You need to start it yourself; e.g. in the case of QtOllamaModel, load the intended model into ollama's memory by running:

ollama run <model>
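
Once the backend is running, a model declaration is all Qt needs to pick the matching backend. For example, the following QML declaration (using the API documented below) maps to QtOllamaModel, since it takes text as input and produces text as output:

import qtaimodel

MultiModal {
    // Text input and text output: Qt sets up a QtOllamaModel backend
    type: MultiModal.InputText | MultiModal.OutputText
    model: "llama3.2"
}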

Building the library

To build the API library, you need to have a Qt kit (6.7 or newer). Additional dependencies for specific plugins are:

  • QtSpeech additional library for QtTtsModel
  • OpenCV for QtTritonModel
  • QtMultimedia for QtYoloModel

In Qt Creator, open the library project by choosing the CMakeLists.txt file under qt-ai-inference-api/aimodel/, configure it with your Qt kit, and build the library. You can also choose qt-ai-inference-api/CMakeLists.txt to build the whole project, including the example app that ships with the API. For an example of how to include the library project in your Qt app project, see the Qt AI App example app.

The API

Currently, the API consists of one class: QAiModel is the C++ implementation, and MultiModal its QML counterpart. QAiModel inherits QObject. To use the QML API, add the following import statement to your QML code:

import qtaimodel

Properties

AiModelPrivateInterface::AiModelTypes type

A combination of AiModelType flags telling Qt what type of model to instantiate. Possible values are:

| Name | Value | Description |
| --- | --- | --- |
| InputText | 0x00001 | The model takes text as input |
| InputAudio | 0x00002 | The model takes speech as input |
| InputVideo | 0x00004 | The model takes video as input |
| InputImage | 0x00008 | The model takes an image as input |
| InputJson | 0x00010 | The model takes JSON as input |
| OutputText | 0x00100 | The model outputs text |
| OutputAudio | 0x00200 | The model outputs speech |
| OutputVideo | 0x00400 | The model outputs video |
| OutputImage | 0x00800 | The model outputs an image |
| OutputJson | 0x01000 | The model outputs JSON |

For supported input-output combinations, see the table under the "How it works" section.

Example:

import qtaimodel

MultiModal {
    // instantiates an LLM model which takes text input and produces text output
    type: MultiModal.InputText | MultiModal.OutputText
}
Write method: void setType(AiModelPrivateInterface::AiModelTypes)
Read method: AiModelPrivateInterface::AiModelTypes type()
Notifier signal: void typeChanged()

QString prompt

The prompt for the model. Use it to provide a persistent prompt that is combined with anything passed to the model via pushData(); the prompt is prepended to the data. Note that setting the prompt alone does not send anything to the underlying model; you need to call pushData() to trigger that.

Example:

import qtaimodel

MultiModal {
    id: model
    type: MultiModal.InputText | MultiModal.OutputText
    prompt: "Summarize the following text:"
}

function summarizeText() {
    model.pushData("Lorem ipsum")
    // The actual prompt sent to the underlying model will be "Summarize the following text: Lorem ipsum"
}
Write method: void setPrompt(QString)
Read method: QString prompt()
Notifier signal: void promptChanged()

QString model

Tells the underlying framework which specific model to use.

Example:

import qtaimodel

MultiModal {
    id: model
    // Runs a QtOllamaModel, where ollama uses the deepseek-r1 model
    type: MultiModal.InputText | MultiModal.OutputText
    model: "deepseek-r1"
}
Write method: void setModel(QString)
Read method: QString model()
Notifier signal: void modelChanged()

QVariantList documents

Retrieval-Augmented Generation (RAG) data to use for the model, if the model supports it. Currently, RAG is supported only with chromadb, which must be running in the background.

Example:

import qtaimodel

MultiModal {
    id: llamaModel
    type: MultiModal.InputText | MultiModal.OutputText
    model: "llama3.2"
    prompt: "Which item has best armor bonus?"
    documents: ["Cloth of Authority | Armour Class +1",
                "Drunken Cloth | Constitution +2 (up to 20)",
                "Icebite Robe | Resistance to Damage Types: Cold damage.",
                "Obsidian Laced Robe | Grants Resistance to Damage Types: Fire damage.",
                "Moon Devotion Robe | Advantage on Constitution Saving throws."]
}
Write method: void setDocuments(QVariantList)
Read method: QVariantList documents()
Notifier signal: void documentsChanged()

int seed

The seed to use with model prompts. A fixed seed reduces randomness in the model's answers, making them more reproducible.

Example:

import qtaimodel

MultiModal {
    id: llamaModel
    type: MultiModal.InputText | MultiModal.OutputText
    model: "gemma3"
    prompt: "Say hello?"
    seed: 3453654
}
Write method: void setSeed(int)
Read method: int seed()
Notifier signal: void seedChanged()

QVector<QAiModel*> inputs

A list of models this model will use as its inputs. This allows chaining models together to create pipelines. Use the optional property on an input model to mark it as an optional rather than mandatory input. For mandatory inputs, this model will not process any other inputs before the mandatory one has something to offer; for optional ones, other inputs are processed regardless of whether that input has data available.

Example:

import qtaimodel

// The ASR model will convert speech to text and pass it to the LLM model. Its optional property is
// set to true, so if the LLM model has other ways of receiving input, such as typing, it will not
// block processing those while waiting for the output of the ASR model.
MultiModal {
    id: asrModel
    type: MultiModal.InputAudio | MultiModal.OutputText
    optional: true
}

MultiModal {
    id: llmModel
    type: MultiModal.InputText | MultiModal.OutputText
    inputs: [asrModel]
}
Write method: void setInputs(QVector<QAiModel*>)
Read method: QVector<QAiModel*> inputs()
Notifier signal: void inputsChanged()

bool processing

Whether the model is currently processing a request.

Example:

import qtaimodel

MultiModal {
    id: model
    ...
}

BusyIndicator {
    running: model.processing
}
Write method: void setProcessing(bool)
Read method: bool processing()
Notifier signal: void processingChanged()

bool buffered

Whether the model should buffer the latest result for later use.

Write method: void setBuffered(bool)
Read method: bool buffered()
Notifier signal: void bufferedChanged()
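
Example (a minimal sketch; how the buffered result is later consumed depends on your pipeline):

import qtaimodel

MultiModal {
    id: asrModel
    type: MultiModal.InputAudio | MultiModal.OutputText
    // Keep the latest transcription available for later use
    buffered: true
}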

bool optional

Use this when the model acts as an input for another model, to tell the consuming model(s) to treat this one as an optional input. When true, the consuming model(s) will not wait for this model to produce output before processing their other inputs; when false, they will. Default is false. Has no effect if this model is not used as input by any other model.

Write method: void setOptional(bool)
Read method: bool optional()
Notifier signal: void optionalChanged()

Methods

Q_INVOKABLE void pushData(QVariant data)

Push data to the model. The argument can be of any type supported by QVariant. The underlying QAiModel implementation will convert it based on its expected input type.

QVariant data - The data to push to the model. If the prompt property has been set, it will be prepended to the data before sending the final request to the underlying model.
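
Example (a sketch combining pushData() with the prompt property, mirroring the prompt example above):

import qtaimodel

MultiModal {
    id: model
    type: MultiModal.InputText | MultiModal.OutputText
    prompt: "Translate to French:"
}

function translate() {
    // The request sent to the underlying model will be "Translate to French: Good morning"
    model.pushData("Good morning")
}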

Signals

void gotResult(QVariant result)

Emitted when the underlying model has finished processing and returns a result. The result is passed as a QVariant, converted from the output type the underlying model provides.

Example:

import qtaimodel

MultiModal {
    id: model
    type: MultiModal.InputText | MultiModal.OutputText
    onGotResult: (result) => {
        someLabel.text = result
    }
}

Known issues

  • Currently the C++ API is not public, meaning the API is only usable from QML. This will change in a future patch.

Additional links

  • The example app
  • Qt Project JIRA - If you would like to leave ideas, suggestions or bug reports, please use the "QtAiApi" label so we can easily gather them!