In this tutorial, we’ll build a modern web application that converts images to detailed text descriptions using Claude’s vision capabilities. We’ll leverage the power of Next.js 15’s App Router and the Vercel AI SDK to create a responsive, real-time streaming application.
Prerequisites
- Basic knowledge of React and TypeScript
- Node.js installed on your machine
- An Anthropic API key
- Basic familiarity with Next.js
Project Setup
First, let’s create a new Next.js project with TypeScript and Tailwind CSS support:
```shell
npx create-next-app@latest image-to-text --typescript --tailwind --app
cd image-to-text
```
Install the required dependencies:
```shell
npm install ai @anthropic-ai/sdk
```
Building the API Endpoint
Create a new API route at `app/api/chat/route.ts`. This endpoint will handle image uploads and communicate with Claude:
```typescript
import { AnthropicStream, StreamingTextResponse } from 'ai';
import { Anthropic } from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export const runtime = 'edge';

export async function POST(req: Request) {
  // useChat posts { messages, data }; the image data URL arrives under `data`
  const { data } = await req.json();
  const image: string = data.image;

  // A data URL looks like "data:image/png;base64,....": extract the media
  // type and the raw base64 payload instead of assuming every upload is JPEG
  const mediaType = image.substring(image.indexOf(':') + 1, image.indexOf(';'));
  const base64Data = image.split(',')[1];

  const response = await anthropic.messages.create({
    model: 'claude-3-haiku-20240307',
    max_tokens: 1024,
    // stream: true is required so AnthropicStream receives a streaming response
    stream: true,
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'image',
            source: {
              type: 'base64',
              // Claude accepts JPEG, PNG, GIF, and WebP images
              media_type: mediaType as 'image/jpeg' | 'image/png' | 'image/gif' | 'image/webp',
              data: base64Data,
            },
          },
          {
            type: 'text',
            text: 'Describe this image in detail.',
          },
        ],
      },
    ],
  });

  const stream = AnthropicStream(response);
  return new StreamingTextResponse(stream);
}
```
Creating the Frontend Interface
Replace the contents of `app/page.tsx` with a responsive UI that handles image uploads and displays Claude's responses:
```tsx
'use client';

import { useState } from 'react';
import { useChat } from 'ai/react';

export default function ImageToText() {
  const [image, setImage] = useState<string | null>(null);
  const { messages, isLoading, append } = useChat();

  // Read the selected file as a base64 data URL for both preview and upload
  const handleImageUpload = (e: React.ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (file) {
      const reader = new FileReader();
      reader.onload = (e) => {
        const base64 = e.target?.result as string;
        setImage(base64);
      };
      reader.readAsDataURL(file);
    }
  };

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!image) return;
    // Send the image in the request's `data` field, alongside the chat message
    await append(
      {
        role: 'user',
        content: 'Analyze this image',
        id: Date.now().toString(),
      },
      {
        data: { image },
      }
    );
  };

  return (
    <div className="max-w-2xl mx-auto p-4">
      <h1 className="text-2xl font-bold mb-4">Image to Text Converter</h1>
      <form onSubmit={handleSubmit} className="space-y-4">
        <div className="space-y-2">
          <label className="block">
            <span className="text-gray-700">Upload Image</span>
            <input
              type="file"
              accept="image/*"
              onChange={handleImageUpload}
              className="mt-1 block w-full"
            />
          </label>
        </div>
        {image && (
          <div className="mt-4">
            <img
              src={image}
              alt="Uploaded preview"
              className="max-w-md mx-auto rounded"
            />
          </div>
        )}
        <button
          type="submit"
          disabled={!image || isLoading}
          className="px-4 py-2 bg-blue-500 text-white rounded disabled:opacity-50"
        >
          {isLoading ? 'Analyzing...' : 'Analyze Image'}
        </button>
      </form>
      <div className="mt-8 space-y-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`p-4 rounded ${
              message.role === 'assistant'
                ? 'bg-gray-100'
                : 'bg-blue-100'
            }`}
          >
            {message.content}
          </div>
        ))}
      </div>
    </div>
  );
}
```
Deployment
- Create a `.env.local` file:

```
ANTHROPIC_API_KEY=your_api_key_here
```
- Deploy to Vercel:

```shell
vercel deploy
```
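Note that `.env.local` values stay on your machine; the key must also exist in the Vercel project's environment variables. One way to add it, assuming the Vercel CLI is installed and linked to the project (you can also use the project settings in the Vercel dashboard):

```shell
vercel env add ANTHROPIC_API_KEY
```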
Conclusion
We’ve built a modern, responsive image-to-text converter using cutting-edge technologies. The application demonstrates the power of combining Claude’s vision capabilities with Next.js and the Vercel AI SDK. This foundation can be extended to build more complex AI-powered image analysis tools.