PhotoFlow SEO Pro: AI Model Integration Guide
Overview
PhotoFlow SEO Pro uses OpenAI's GPT-4o, a vision-capable model, through the OpenRouter API to generate accurate alt text and captions for product images. This guide covers the complete integration setup.
AI Model Configuration
Current Model: openai/gpt-4o
Why GPT-4o?
- Vision Capabilities: analyzes the actual image content, which is passed to the model via the image_url content type
- Accuracy: generates descriptions based on what is visually present, not just on the surrounding context
- E-commerce Optimized: well suited to product photos, galleries, and catalog images
- SEO-Friendly: produces structured alt text and engaging captions
Server Configuration
```bash
# Environment Variables
OPENROUTER_API_KEY=sk-or-v1-your-key-here
AI_MODEL=openai/gpt-4o
AI_TIMEOUT_MS=15000
AI_RATE_LIMIT_MAX=20
AI_RATE_LIMIT_WINDOW_MS=60000
```
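A minimal sketch of how a server might read these variables at startup, applying the same defaults shown above and failing fast when the API key is missing. The loadAiConfig name and its validation rules are assumptions for illustration, not part of the Pro package.

```javascript
// Hypothetical helper: read and validate the AI settings listed above.
// Defaults mirror the documented values; only OPENROUTER_API_KEY is required.
function loadAiConfig(env = process.env) {
  // Parse a positive integer setting, falling back to the documented default.
  const toInt = (name, fallback) => {
    const raw = env[name] ?? String(fallback);
    const value = Number.parseInt(raw, 10);
    if (!Number.isFinite(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return value;
  };

  if (!env.OPENROUTER_API_KEY) {
    throw new Error('OPENROUTER_API_KEY is not set');
  }

  return {
    apiKey: env.OPENROUTER_API_KEY,
    model: env.AI_MODEL || 'openai/gpt-4o',
    timeoutMs: toInt('AI_TIMEOUT_MS', 15000),
    rateLimitMax: toInt('AI_RATE_LIMIT_MAX', 20),
    rateLimitWindowMs: toInt('AI_RATE_LIMIT_WINDOW_MS', 60000),
  };
}
```

Validating once at startup means a typo such as AI_TIMEOUT_MS=15s surfaces immediately rather than as a silent NaN timeout on the first request.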
API Endpoint
POST /api/ai/caption
Request:
```json
{
  "url": "https://example.com/product-image.jpg",
  "context": {
    "title": "Running Shoes"
  },
  "licenseKey": "pswp_demo_abcd1234"
}
```
Response:
```json
{
  "alt": "Red and black athletic shoe with white swoosh logo, displayed against solid red background",
  "caption": "Striking red and black athletic shoe with sleek design, set against bold red backdrop"
}
```
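As a sketch, a browser client could also call this endpoint directly with fetch. requestCaption is a hypothetical helper; the path and body fields come from the request example above, and the fetchImpl option exists only so the function can be exercised without a running server.

```javascript
// Hypothetical client helper for POST /api/ai/caption.
// fetchImpl defaults to the global fetch (browsers, Node 18+) and can be replaced in tests.
async function requestCaption(url, context, licenseKey, {
  endpoint = '/api/ai/caption',
  fetchImpl = globalThis.fetch,
} = {}) {
  const response = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, context, licenseKey }),
  });
  if (!response.ok) {
    throw new Error(`Caption request failed with HTTP ${response.status}`);
  }
  // The endpoint is documented to return { alt, caption }.
  const { alt, caption } = await response.json();
  return { alt, caption };
}
```

For example, requestCaption('https://example.com/product-image.jpg', { title: 'Running Shoes' }, 'pswp_demo_abcd1234') would send the same body as the request example.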
Integration Instructions
1. Client-Side Setup
```javascript
import { createAiSeoPlugin } from 'photoswipe-pro';

const aiPlugin = createAiSeoPlugin({
  baseUrl: '/api/ai',
  licenseKey: 'your-license-key',
  onSchema: (schema) => {
    // Inject ImageObject schema for SEO
    const script = document.createElement('script');
    script.type = 'application/ld+json';
    script.textContent = JSON.stringify(schema);
    document.head.appendChild(script);
  }
});
```
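The guide does not show the exact object the plugin passes to onSchema. Assuming it follows the schema.org ImageObject type, a hypothetical builder could map the generated alt text and caption onto standard ImageObject properties (contentUrl, description, caption, name). The function name and field mapping are illustrative assumptions.

```javascript
// Hypothetical builder for a schema.org ImageObject, the kind of object an
// onSchema handler could serialize into a <script type="application/ld+json"> tag.
// The fields the PhotoFlow plugin actually emits may differ.
function buildImageObjectSchema({ url, alt, caption, title }) {
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'ImageObject',
    contentUrl: url,
    description: alt, // generated alt text
    caption,          // generated caption
  };
  // Only include a name when the product title is known.
  if (title) schema.name = title;
  return schema;
}
```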
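2. Server-Side Setup
```javascript
// server/ai/router.js
import express from 'express';
import fetch from 'node-fetch';

const AI_MODEL = process.env.AI_MODEL || 'openai/gpt-4o';
const AI_TIMEOUT_MS = parseInt(process.env.AI_TIMEOUT_MS || '15000', 10);

async function callOpenRouter({ url, context }) {
  // System prompt, identical to the one documented under Prompt Engineering.
  const sys = 'You are an SEO and accessibility expert. Analyze ONLY the actual visual content of the image. Describe what you actually see in the image, not what the context suggests. If the context mentions products that are not visible in the image, ignore the context completely. Focus on describing the real visual elements: landscapes, objects, colors, composition, etc.';
  // The user prompt requests a fixed "ALT: ... / CAPTION: ..." format for parsing.
  const user = `Analyze this image and describe ONLY what you actually see. Ignore any context that doesn't match the visual content.
Context (ignore if it doesn't match the image): ${JSON.stringify(context)}
Provide:
ALT: [accurate description of what is visually present]
CAPTION: [engaging description based on what you actually see]`;

  const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`
    },
    // Abort the request if the model has not answered within AI_TIMEOUT_MS.
    signal: AbortSignal.timeout(AI_TIMEOUT_MS),
    body: JSON.stringify({
      model: AI_MODEL,
      messages: [
        { role: 'system', content: sys },
        {
          role: 'user',
          // Multimodal message: the prompt text plus the image passed by URL.
          content: [
            { type: 'text', text: user },
            { type: 'image_url', image_url: { url } }
          ]
        }
      ],
      max_tokens: 256,
      temperature: 0.3
    })
  });

  if (!response.ok) {
    throw new Error(`OpenRouter request failed with HTTP ${response.status}`);
  }

  const data = await response.json();
  const content = data.choices?.[0]?.message?.content;
  if (!content) {
    throw new Error('OpenRouter response contained no message content');
  }
  return parseAiResponse(content);
}
```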
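callOpenRouter ends by calling parseAiResponse, which this guide does not define. A minimal sketch follows, assuming the model honors the ALT:/CAPTION: format the prompt asks for; it also tolerates bold markers such as **ALT:**, which chat models sometimes add. This is a hypothetical implementation, not necessarily the one shipped in the Pro package.

```javascript
// Hypothetical parser for the "ALT: ... / CAPTION: ..." reply format.
function parseAiResponse(text) {
  const pick = (label) => {
    // Match the label at the start of a line (case-insensitive), allowing
    // surrounding whitespace and markdown '*' characters, and capture the rest of that line.
    const re = new RegExp(`^[\\s*]*${label}[\\s*]*:[\\s*]*(.+)$`, 'im');
    const match = typeof text === 'string' ? text.match(re) : null;
    return match ? match[1].trim() : '';
  };

  const alt = pick('ALT');
  const caption = pick('CAPTION');
  if (!alt) {
    throw new Error('Model response did not contain an ALT line');
  }
  // If the model omitted the caption, reuse the alt text rather than returning an empty string.
  return { alt, caption: caption || alt };
}
```

Throwing when no ALT line is found lets the caller fall back to mock data instead of publishing an empty alt attribute.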
Prompt Engineering
System Prompt
```text
You are an SEO and accessibility expert. Analyze ONLY the actual visual content of the image. Describe what you actually see in the image, not what the context suggests. If the context mentions products that are not visible in the image, ignore the context completely. Focus on describing the real visual elements: landscapes, objects, colors, composition, etc.
```
User Prompt Template
```text
Analyze this image and describe ONLY what you actually see. Ignore any context that doesn't match the visual content.
Context (ignore if it doesn't match the image): {context}
Provide:
ALT: [accurate description of what is visually present]
CAPTION: [engaging description based on what you actually see]
```
Key Improvements
- Visual-First: instructs the model to describe the actual image content
- Context Filtering: tells the model to disregard product titles and other context that don't match the image
- Structured Output: requests a fixed ALT/CAPTION format so replies can be parsed consistently
- E-commerce Focus: intended for product photography
Performance & Limits
Rate Limiting
- 20 requests per minute per IP address (AI_RATE_LIMIT_MAX=20, AI_RATE_LIMIT_WINDOW_MS=60000)
- 15-second timeout for vision processing (AI_TIMEOUT_MS=15000)
- Graceful degradation to mock data if the AI service is unavailable
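The per-IP limit can be sketched as a fixed-window counter. This is an illustrative assumption about how such a limiter might work, not the Pro server's actual middleware; createRateLimiter and its injectable clock are hypothetical.

```javascript
// Hypothetical fixed-window rate limiter keyed by client IP.
// Defaults match AI_RATE_LIMIT_MAX=20 and AI_RATE_LIMIT_WINDOW_MS=60000.
// Counters live in memory, so each server process keeps its own state.
function createRateLimiter({ max = 20, windowMs = 60000, now = Date.now } = {}) {
  const windows = new Map(); // ip -> { start, count }

  return function allow(ip) {
    const t = now();
    const entry = windows.get(ip);
    if (!entry || t - entry.start >= windowMs) {
      // First request from this IP, or the previous window has expired.
      windows.set(ip, { start: t, count: 1 });
      return true;
    }
    if (entry.count < max) {
      entry.count += 1;
      return true;
    }
    return false; // Over the limit: the caller should respond with HTTP 429.
  };
}
```

In an Express handler this would be called as allow(req.ip) before contacting OpenRouter. Because entries are never evicted and state is per process, a production version would need periodic cleanup or a shared store such as Redis when running several instances.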
Model Specifications
- Model: openai/gpt-4o
- Max Tokens: 256
- Temperature: 0.3 (consistent, factual output)
- Vision Support: full image analysis via the image_url content type
Security & Privacy
Data Handling
- No image storage: the server forwards only the image URL to OpenRouter and does not store the image
- No PII collection: requests contain only the image URL and basic context
- Rate limiting: prevents abuse and controls costs
- License validation: every request must include a valid license key
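Error Handling
```javascript
// Graceful fallback to mock data when the AI service is unavailable.
// getCaption is an illustrative wrapper name; aiPlugin comes from the client-side setup.
async function getCaption(slide, licenseKey) {
  try {
    return await aiPlugin({ slide, licenseKey });
  } catch (error) {
    console.warn('AI unavailable, using mock data:', error.message);
    return { alt: 'Product image', caption: 'Product description' };
  }
}
```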
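The fallback above only triggers when the call rejects. A sketch that also enforces the 15-second limit on the caller's side might race the AI call against a timer. captionWithFallback and its options are hypothetical; generate stands in for any async function resolving to { alt, caption }.

```javascript
// Hypothetical wrapper combining a timeout with the mock-data fallback.
const MOCK_RESULT = { alt: 'Product image', caption: 'Product description' };

async function captionWithFallback(generate, { timeoutMs = 15000, fallback = MOCK_RESULT } = {}) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`AI call timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the AI result, an AI error, or the timeout.
    return await Promise.race([generate(), timeout]);
  } catch (error) {
    console.warn('AI unavailable, using mock data:', error.message);
    return fallback;
  } finally {
    clearTimeout(timer); // Don't keep the event loop alive after the race settles.
  }
}
```

Note that a timed-out generate call keeps running in the background; to cancel the underlying request, it would also need an AbortSignal, as in the server-side fetch call.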
Deployment Checklist
Environment Setup
- OPENROUTER_API_KEY configured
- AI_MODEL=openai/gpt-4o set
- AI_TIMEOUT_MS=15000 configured
- Rate limiting enabled
Testing
- Vision model responds to image URLs
- Alt text accurately describes visual content
- Context is ignored when misleading
- Fallback to mock data works
- Rate limiting functions correctly
Monitoring
- AI response times < 15 seconds
- Success rate > 95%
- Error logging configured
- Cost monitoring enabled
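The two numeric targets above can be checked with a small in-memory tracker. This is a hypothetical sketch; createAiMetrics is not part of the Pro package, and a real deployment would more likely export these figures to an existing monitoring system.

```javascript
// Hypothetical tracker for the monitoring targets:
// response times under 15 seconds and a success rate above 95%.
function createAiMetrics({ maxLatencyMs = 15000, minSuccessRate = 0.95 } = {}) {
  let total = 0;
  let successes = 0;
  let slow = 0;

  return {
    // Record one AI call: whether it succeeded and how long it took.
    record({ ok, latencyMs }) {
      total += 1;
      if (ok) successes += 1;
      if (latencyMs >= maxLatencyMs) slow += 1;
    },
    // Summarize against the targets; with no calls yet, report 100% success.
    report() {
      const successRate = total === 0 ? 1 : successes / total;
      return {
        total,
        successRate,
        slowResponses: slow,
        healthy: successRate > minSuccessRate && slow === 0,
      };
    },
  };
}
```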
Results
Before (Generic Model)
- Alt Text: "Running shoes - example caption"
- Issue: generic, not based on actual image content
After (Vision Model)
- Alt Text: "Red and black athletic shoe with white swoosh logo, displayed against solid red background"
- Caption: "Striking red and black athletic shoe with sleek design, set against bold red backdrop"
- Result: Accurate, SEO-optimized descriptions based on actual visual content
Migration Guide
From GPT-4o-mini to GPT-4o
- Update AI_MODEL=openai/gpt-4o
- Increase AI_TIMEOUT_MS to 15000
- Update API calls to use the image_url content type
- Test with actual product images
- Update documentation and examples
Ready to deploy? The Pro package includes all the necessary AI integration code, and the server is configured with the vision-capable model for accurate e-commerce image analysis.