This guide covers the exact setup — what to install, how to structure prompts, what went wrong first, and what actually moved the numbers.
What Clawdbot does in this workflow
Clawdbot is an open-source AI agent that runs on your machine. It has persistent memory, file access, and can use real tools — image generation APIs, social media APIs, your codebase.
For TikTok, the agent:
Generates slideshow images via an image API (OpenAI, Fal.ai, etc.)
Adds text overlays programmatically
Writes captions and picks hashtags
Uploads everything as a draft to TikTok via Postiz API
Logs what worked and what flopped
You add the trending sound and hit publish. That's your 60 seconds.
Setup
Hardware: Any machine running Linux. Old laptop, spare desktop, cheap VPS, Raspberry Pi. Minimum 2GB RAM, 20GB storage. No GPU needed — image generation happens via API.
Software:
Connect Clawdbot to your messaging app (WhatsApp, Telegram, Discord), give it API keys, done.
The format: photo carousels
TikTok is pushing photo content hard in 2026. Their data shows slideshows get roughly 3x more comments, 2x more likes, and 2.5x more shares compared to video.
Each slideshow:
Image generation
If your product shows exam answers, solution breakdowns, or cheat sheets, the TikTok images should match what users actually see in the app. No bait and switch.
The consistency problem
Showing the same exam question across 6 slides — with the AI solution being revealed step by step — means every slide needs to look like the same base with only the answer layer changing. Without careful prompting, you get 6 completely different papers. Different handwriting in slide 1, different question layout in slide 2, wrong subject in slide 3.
Fix: lock the base, change only what's being revealed. Write one hyper-detailed description of the exam sheet — paper type, question format, camera angle, lighting, pen style. Copy-paste it into every prompt. Only the answer overlay changes between slides.
Vague — different scene every time
"A student doing an exam"
Specific — same scene every time
"Close-up of college-ruled exam paper on wooden desk, shot from 45° above, blue ballpoint pen resting on right side, single overhead fluorescent light, question text in size-12 Times New Roman, white margins visible"
Be obsessively specific about the base. Only vary what the AI is revealing (the answer, the formula, the cheat code).
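Here is a minimal sketch of the locked-base idea in Python. It assumes you assemble the prompts yourself before calling whatever image API you use. The base string is the example from above. `STYLE_SUFFIX`, `build_slide_prompts`, and the reveal texts are hypothetical names for illustration.

```python
# Locked base: pasted verbatim into every prompt so the scene never drifts.
BASE_SCENE = (
    "Close-up of college-ruled exam paper on wooden desk, shot from 45° above, "
    "blue ballpoint pen resting on right side, single overhead fluorescent light, "
    "question text in size-12 Times New Roman, white margins visible"
)

# Hypothetical photorealism cues appended to every slide.
STYLE_SUFFIX = "iPhone photo, realistic lighting"


def build_slide_prompts(reveals: list[str]) -> list[str]:
    """One prompt per slide: identical base and style, only the reveal changes."""
    return [f"{BASE_SCENE}. {reveal}. {STYLE_SUFFIX}" for reveal in reveals]


if __name__ == "__main__":
    # Hypothetical reveal layers for a 6-slide post.
    reveals = [
        "Question only, no answer written yet",
        "First line of the worked solution written in the margin",
        "Second line of the worked solution added below the first",
        "Key formula circled in red",
        "Final answer boxed at the bottom of the page",
        "Full solution visible with a check mark next to the answer",
    ]
    for i, prompt in enumerate(build_slide_prompts(reveals), start=1):
        print(f"Slide {i}: {prompt}")
```

Any change to the scene goes into `BASE_SCENE` once, so all six slides pick it up together.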
Local generation doesn't work yet
Tempting if you have a GPU, but the photorealism gap between Stable Diffusion and gpt-image-1.5 is large for this use case. Local images look AI-generated. API images with prompts like "iPhone photo" and "realistic lighting" look like phone photos.
API cost
~$0.50 per 6-slide post · ~$0.25 with batch API
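Here is a quick back-of-the-envelope check on those numbers. It assumes the batch discount is the roughly 50% implied by $0.50 vs $0.25. It also uses the 3–6 posts/day starting cadence from later in this guide. Actual pricing depends on the model, image size, and quality settings.

```python
COST_PER_POST = 0.50    # ~$0.50 per 6-slide post at standard pricing (from this guide)
SLIDES_PER_POST = 6
BATCH_DISCOUNT = 0.5    # assumption: ~$0.25 with batch API means roughly half price


def daily_cost(posts_per_day: int, batch: bool = False) -> float:
    """Approximate image-generation spend per day in dollars."""
    per_post = COST_PER_POST * (1 - BATCH_DISCOUNT) if batch else COST_PER_POST
    return posts_per_day * per_post


cost_per_image = COST_PER_POST / SLIDES_PER_POST
print(f"per image: ${cost_per_image:.3f}")  # per image: $0.083
for posts in (3, 6):
    print(f"{posts}/day: ${daily_cost(posts):.2f} standard, "
          f"${daily_cost(posts, batch=True):.2f} batch")
```

That works out to about $1.50 to $3.00 a day at standard pricing, or $0.75 to $1.50 with batching.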
Text overlays: three things that go wrong immediately
Font too small
Set the font size to at least 6.5% of the image width. What looks fine on a desktop monitor is unreadable on a phone.
Text too high
TikTok's status bar covers the top ~15% of the screen, so hook text at the very top of the image gets hidden. Position it in the upper-middle zone.
Text gets compressed
Long lines exceed the max width, and canvas rendering squashes the text horizontally to fit. Set a maximum character count per line and force wrapping.
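The three overlay rules translate directly into layout numbers. Here is a minimal standard-library sketch. It assumes a 1080×1920 portrait canvas. The 25% text position and the 20-character line limit are hypothetical starting values to tune on your phone. Character count is also only a proxy for rendered width, because it ignores the font's actual glyph widths.

```python
import math
import textwrap

MIN_FONT_RATIO = 0.065    # font size >= 6.5% of image width
STATUS_BAR_RATIO = 0.15   # top ~15% of the screen is covered by the status bar
TEXT_Y_RATIO = 0.25       # hypothetical "upper-middle" starting point for the hook
MAX_CHARS_PER_LINE = 20   # hypothetical cap; tune per font and image width


def overlay_layout(width: int, height: int, text: str) -> dict:
    """Compute font size, vertical position, and wrapped lines for a hook overlay."""
    font_px = math.ceil(width * MIN_FONT_RATIO)
    safe_top = math.ceil(height * STATUS_BAR_RATIO)
    # Never start inside the status-bar zone, even if TEXT_Y_RATIO is set too low.
    text_y = max(safe_top, round(height * TEXT_Y_RATIO))
    # Force wrapping so no line gets squashed to fit the canvas width.
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return {"font_px": font_px, "safe_top": safe_top, "text_y": text_y, "lines": lines}


layout = overlay_layout(1080, 1920, "My professor said AI can't help you on this exam")
print(layout["font_px"], layout["text_y"])  # 71 480
for line in layout["lines"]:
    print(line)
```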
Put these rules in the skill file. The agent will get them wrong at first. You'll catch it on your phone. Update the rules. That's the loop.
Hooks
Everything else can be perfect — images, text, caption — and you still get 200 views if the hook is wrong.
What flopped
Feature-focused
"AI that solves any exam question instantly"
under 1K views
Self-focused
"Why I always blank on multiple choice"
under 1K views
Tool-focused
"The app that got me from a C to an A"
under 3K views
What hit
"My professor said AI can't help you on this exam so I showed him what happened"
234K views
"I showed my study group what AI does when you give it a closed-book test"
167K views
The formula: [Another person] + [conflict or doubt] → showed them [thing] → they changed their mind.
Every hook needs another person and a conflict. If there's no "other person," the hook probably won't work. The viewer pictures the reaction before they even swipe. That's what drives engagement.
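One way to hand the formula to the agent is as a fill-in template it can brainstorm against. Here is a minimal Python sketch. The `Hook` class, the phrasing in `render`, and the example inputs are all hypothetical. The validity check simply encodes the rule that a hook needs another person and a conflict.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hook:
    person: str  # the other person: professor, study group, roommate...
    doubt: str   # their conflict or doubt
    thing: str   # what you showed them

    def is_valid(self) -> bool:
        # No other person or no conflict: the hook probably won't work.
        return bool(self.person.strip()) and bool(self.doubt.strip())

    def render(self) -> str:
        return f"My {self.person} said {self.doubt}, so I showed them {self.thing}"


def brainstorm(people: list[str], doubts: list[str], things: list[str]) -> list[str]:
    """Every valid person x doubt x thing combination, rendered as a hook."""
    hooks = (Hook(p, d, t) for p, d, t in product(people, doubts, things))
    return [h.render() for h in hooks if h.is_valid()]


for hook in brainstorm(
    ["professor", "study group"],
    ["AI can't help on a closed-book test"],
    ["what happened"],
):
    print(hook)
```

The combinations are raw material. The agent, or you, still picks the ones that read naturally.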
Train the agent to brainstorm hooks using this formula. After 20–30 posts, it has real performance data to reference.
Posting workflow
1. Agent generates 6 images + text overlay on slide 1
2. Agent writes caption (story-style, product mention, max 5 hashtags)
3. Agent uploads as draft to TikTok via Postiz API with privacy_level: "SELF_ONLY"
4. Agent sends you the caption (draft API doesn't support captions in the upload)
5. You open TikTok, pick a trending sound, paste caption, publish
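The caption rules above (story-style, product mention, max 5 hashtags) are partly checkable by machine before the caption goes to you. This minimal sketch assumes hashtags are `#` followed by word characters. `check_caption` and the product name "ExamHelper" are hypothetical. "Story-style" still needs a human or model judgment, since a regex can't measure it.

```python
import re

MAX_HASHTAGS = 5
HASHTAG_RE = re.compile(r"#\w+")


def check_caption(caption: str, product_name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the caption passes."""
    problems = []
    tags = HASHTAG_RE.findall(caption)
    if len(tags) > MAX_HASHTAGS:
        problems.append(f"{len(tags)} hashtags (max {MAX_HASHTAGS})")
    if product_name.lower() not in caption.lower():
        problems.append(f"no mention of {product_name!r}")
    return problems


print(check_caption("Showed my study group what ExamHelper does #study #exam", "ExamHelper"))
# []
```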
Step 5 is the 60 seconds. Everything else runs on a cron job at peak times. Posts go up as drafts instead of being published directly because music matters on TikTok and you can't add it via the API. Trending sounds change constantly, so this piece still needs a human.
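When content is generated in a batch, each post needs a peak-time slot to land in. Here is a minimal sketch of that lookup. It assumes naive local times. The three peak times are placeholders, not recommendations, so swap in whatever your analytics show. `upcoming_slots` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta

# Placeholder peak times (local, naive); replace with your own analytics.
PEAK_TIMES = (time(7, 30), time(12, 0), time(19, 0))


def upcoming_slots(now: datetime, count: int, peaks=PEAK_TIMES) -> list[datetime]:
    """Return the next `count` peak-time slots strictly after `now`."""
    if not peaks:
        raise ValueError("need at least one peak time")
    ordered = sorted(peaks)
    slots: list[datetime] = []
    day = now.date()
    while len(slots) < count:
        for t in ordered:
            slot = datetime.combine(day, t)
            if slot > now and len(slots) < count:
                slots.append(slot)
        day += timedelta(days=1)
    return slots


for slot in upcoming_slots(datetime(2026, 1, 5, 13, 0), 3):
    print(slot)
```

The cron side is plain scheduling. For example, a crontab entry starting with `30 7 * * *` fires daily at 07:30.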
Skill files
Markdown documents that teach the agent how to do the job. This is where the leverage is. Write each one like you're onboarding a sharp new hire who has zero context about your product, and include examples.
Your TikTok skill file should contain:
Image dimensions and format specs
Prompt template with the locked base description
Text overlay rules (font size, position, max line length, colors)
Caption structure and hashtag strategy
Hook formulas that work in your niche
Hook formulas that flopped
Post schedule and peak times
Every mistake logged as a rule
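If the skill file uses one Markdown heading per item in the list above, a few lines of Python can flag what's still missing before the agent runs. The heading names here are hypothetical. Match them to whatever your own file uses.

```python
REQUIRED_SECTIONS = [
    "Image specs",
    "Prompt template",
    "Text overlay rules",
    "Captions and hashtags",
    "Hooks that work",
    "Hooks that flopped",
    "Schedule",
    "Mistakes",
]


def missing_sections(skill_md: str) -> list[str]:
    """List required sections that don't appear as a Markdown heading (any level)."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in skill_md.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


print(missing_sections("# Image specs\n## Prompt template\n"))
```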
This file gets rewritten 15–20 times in the first week. Every failure becomes a rule. Every success becomes a formula. The agent compounds. After a week or two it has more pattern data on your niche than you could hold in your head.
Memory files
Separate from skill files. The agent logs every post (date, hook, view count, engagement), what worked and why, what flopped and why, trending formats or sounds, and competitor hooks that performed well.
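The guide doesn't prescribe a format for memory files. One simple option, assumed here, is JSON Lines: one record per post, appended as results come in. That keeps the file easy for both you and the agent to read back. `PostRecord`, its fields, `log_post`, and `load_posts` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PostRecord:
    date: str        # ISO date, e.g. "2026-01-05"
    hook: str
    views: int
    likes: int
    comments: int
    shares: int
    notes: str = ""  # what worked or flopped, and why


def log_post(path: Path, record: PostRecord) -> None:
    """Append one post as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_posts(path: Path) -> list[PostRecord]:
    """Read every logged post back; a missing file means no history yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [PostRecord(**json.loads(line)) for line in f if line.strip()]
```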
When you plan content, the agent pulls from actual performance data instead of guessing. Batch your planning: brainstorm 10–15 hooks with the agent, pick the best ones, and set up the schedule. The agent pre-generates everything overnight using the batch API, so by morning a full day of content is queued.
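Pulling from performance data can be as simple as ranking logged posts by views. The best and worst hooks then go into the brainstorming prompt. This minimal sketch assumes each post is a dict with `hook` and `views` keys, for example lines parsed from a JSON Lines log. `planning_brief` and the prompt wording are hypothetical.

```python
def planning_brief(posts: list[dict], n: int = 5) -> str:
    """Summarise the best and worst hooks so brainstorming starts from real data."""
    ranked = sorted(posts, key=lambda p: p["views"], reverse=True)
    best = ranked[:n]
    worst = ranked[n:][-n:]  # taken from the remainder, so it never overlaps `best`
    lines = ["Hooks that performed best:"]
    lines += [f"- {p['hook']} ({p['views']} views)" for p in best]
    lines.append("Hooks that flopped:")
    lines += [f"- {p['hook']} ({p['views']} views)" for p in worst]
    lines.append(
        "Brainstorm 10-15 new hooks: another person + conflict or doubt "
        "-> showed them X -> they changed their mind."
    )
    return "\n".join(lines)
```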
What to expect
First 5–10 posts will be bad. Wrong image sizes, unreadable text, hooks nobody clicks. Normal.
The system works because it compounds. Each bad post teaches the agent something.
By post 20–30, you have a content engine that consistently produces 50K–100K view posts with occasional breakouts above 200K.
Cost per post
~$0.50
Your time per post
60 seconds
Starting cadence
3–6 posts/day
Checklist
Machine running Linux
Clawdbot installed, connected to messaging app
Image generation API key
Postiz account with TikTok connected
Skill file: image specs, prompt templates, text overlay rules, hook formulas
Memory file initialized for logging
First batch of posts generated and uploaded as drafts
Music added, captions pasted, published
Performance logged, skill file updated