Send a message (Claude compatible)

post

https://api.ainft.com/v1/messages

Accepts a list of messages and returns a model-generated response. Supports both single-turn and multi-turn conversations. Authenticate via x-api-key header or Bearer token. Responses can be streamed (SSE) or returned as a single JSON object.

Recent Requests

Time	Status	User Agent
Retrieving recent requests…

Loading…

Body Params

Anthropic Messages API compatible request format. See https://docs.anthropic.com/en/api/messages

model

string

required

ID of the model to use (e.g., claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5).

max_tokens

integer

required

The maximum number of tokens to generate before stopping. Note that our models may stop before reaching this maximum. This parameter only specifies the absolute maximum number of tokens to generate. Different models have different maximum values for this parameter.

messages

array of objects

required

Input messages. Our models are trained to operate on alternating user and assistant conversational turns. When creating a new Message, you specify the prior conversational turns with the messages parameter, and the model then generates the next Message in the conversation. There is a limit of 100,000 messages in a single request.

messages*

system

System prompt. A system prompt is a way of providing context and instructions to Claude, such as specifying a particular goal or role.

System prompt as a plain string.

stream

boolean

Defaults to false

Whether to incrementally stream the response using server-sent events. Default false.

temperature

number

0 to 1

Defaults to 1

Amount of randomness injected into the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. Note that even with temperature of 0.0, the results will not be fully deterministic.

top_p

number

Nucleus sampling: consider tokens with top_p probability mass. Default 1.

top_k

integer

Only sample from the top K options for each subsequent token. Default disabled.

stop_sequences

array of strings

Custom text sequences that will cause the model to stop generating. Our models will normally stop when they have naturally completed their turn, which will result in a response stop_reason of 'end_turn'. If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter.

stop_sequences

metadata

object

An object describing metadata about the request.

thinking

Configuration for enabling Claude's extended thinking. When enabled, responses include thinking content blocks showing Claude's thinking process before the final answer. Requires a minimum budget of 1,024 tokens and counts towards your max_tokens limit.

tools

array of objects

Definitions of tools that the model may use. If you include tools in your API request, the model may return tool_use content blocks that represent the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return results back to the model using tool_result content blocks.

tools

tool_choice

How the model should use the provided tools. The model can use a specific tool, any available tool, decide by itself, or not use tools at all.

Headers

string

enum

Defaults to application/json

Generated from available response content types

Allowed:

Responses

200Success. Returns SSE stream with Anthropic-compatible event format when streaming, or JSON response when not streaming.

400Bad Request - invalid parameters, malformed body, or invalid request

401Unauthorized - invalid or missing API key

403Forbidden - access denied, insufficient quota, or model access restricted

429Too Many Requests - rate limit exceeded

500Internal Server Error

502Bad Gateway - upstream service error

503Service Unavailable - overloaded or no available channel