Have you ever considered how automated image captions could improve your app? Image caption generation uses computer vision and natural language processing to create text descriptions for images. This technology is changing how we handle and use pictures in apps.
In this blog, we’ll discuss why image caption generation is important. We’ll explain what it is and the different ways it can be used. We’ll also look at the benefits of using automated captions.
Next, we’ll show you the best way to add image captions to your app, focusing on Filestack, a top service for this task. You’ll learn how developers can easily integrate Filestack’s image caption generation into their web applications. This includes getting and testing the API key and the steps to set up the feature. Let’s begin.
Image caption generation creates text descriptions for images automatically. It uses computer vision to understand what’s in the image and natural language processing to write the description.
Here’s how it works:
👉First, a special kind of artificial intelligence (AI) called a convolutional neural network (CNN) looks at the image. This AI identifies important parts like objects, actions, and settings.
👉It then turns this information into a numerical representation (a set of features) that another AI can work with.
👉Next, this information goes to another type of AI, usually a recurrent neural network (RNN) or a transformer. This AI creates sentences based on what it learned from the image.
The whole system is trained on many examples of images paired with their captions, so the AI learns how to match pictures with the right words.
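To make the two stages concrete, here is a toy sketch of the "features in, words out" idea. It is not how a real captioning model is built: the feature scores and the word table are hand-written, hypothetical stand-ins for what a trained encoder and decoder would learn, and `greedyCaption` is an illustrative name. It only shows greedy decoding, where the next word is chosen one step at a time based on the previous word and the image features.

```javascript
// Toy illustration only: a real model learns these numbers from training data.
// Hypothetical "encoder" output: how strongly each concept appears in the image.
const imageFeatures = { bird: 0.9, branch: 0.6, sky: 0.2 };

// Hypothetical "decoder" table: for each previous word, the candidate next
// words and the image concept that supports each one (null = always allowed).
const nextWords = {
  '<start>': [['a', null]],
  'a': [['bird', 'bird'], ['plane', 'plane']],
  'bird': [['on', 'branch'], ['in', 'sky']],
  'on': [['a branch', 'branch']],
  'in': [['the sky', 'sky']],
};

// Greedy decoding: at every step keep the single best-scoring next word.
function greedyCaption(features, table, maxSteps = 6) {
  const words = [];
  let previous = '<start>';
  for (let step = 0; step < maxSteps; step++) {
    const candidates = table[previous];
    if (!candidates) break; // no known continuation, so the caption ends here
    let best = null;
    let bestScore = -Infinity;
    for (const [word, concept] of candidates) {
      // Words with no supporting concept get a neutral score of 1.
      const score = concept === null ? 1 : (features[concept] ?? 0);
      if (score > bestScore) {
        best = word;
        bestScore = score;
      }
    }
    words.push(best);
    previous = best;
  }
  return words.join(' ');
}

console.log(greedyCaption(imageFeatures, nextWords)); // prints "a bird on a branch"
```

Changing the feature scores (for example, making `sky` stronger than `branch`) changes which words the decoder picks, which is the core intuition behind conditioning text on image features.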
Applications
Image caption generation has many uses, such as:
✔️Helping visually impaired people by describing images to them.
✔️Making it easier to find images with descriptive text in search engines.
✔️Automatically creating captions for photos on social media.
✔️Describing products in online stores for better searches.
✔️Organizing and finding pictures in large collections.
✔️Creating detailed educational materials for learning.
Automated image caption generation offers many benefits for apps.
- It helps visually impaired users by describing images to make content more accessible.
- It also improves search functions, allowing users to find images easily through descriptive captions.
- This technology saves time and effort by automatically creating captions. This is helpful for businesses with many images, like online stores.
- It also boosts social media engagement by making posts more informative and attractive.
- For apps with large image libraries, it makes organizing and finding images much easier.
The easiest and fastest way to generate image captions within your app is to use an API. Many are available on the market, and Filestack is one of the easiest and most reliable. Let’s explore it.
Filestack
Filestack’s Image Captioning service provides automatic image descriptions using the Processing API. It operates synchronously and requires a security policy and signature for usage.
Responses return captions like “a close up of a bird.” The service can be integrated into workflows to trigger tasks based on the captions.
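The policy and signature have to be created before any caption request can be made. The sketch below shows one way to do that in Node.js. It assumes Filestack's commonly documented scheme: the policy is a URL-safe base64 encoding of a small JSON object, and the signature is the hex HMAC-SHA256 of that policy string, keyed with your app secret. The function name `signPolicy`, the placeholder secret, and the `call` permissions are illustrative assumptions, so check the Filestack security documentation for the exact values the caption task needs.

```javascript
const crypto = require('node:crypto');

// Build a policy and signature pair.
// Assumption: policy = URL-safe base64 of the JSON, signature = hex
// HMAC-SHA256 of the policy string, keyed with the app secret.
function signPolicy(appSecret, { expiry, call }) {
  const policyJson = JSON.stringify({ expiry, call });
  const policy = Buffer.from(policyJson)
    .toString('base64')
    .replace(/\+/g, '-') // switch to the URL-safe alphabet
    .replace(/\//g, '_');
  const signature = crypto
    .createHmac('sha256', appSecret)
    .update(policy)
    .digest('hex');
  return { policy, signature };
}

// Hypothetical values: a placeholder secret and a one-hour expiry.
const oneHourFromNow = Math.floor(Date.now() / 1000) + 3600;
const { policy, signature } = signPolicy('YOUR_APP_SECRET', {
  expiry: oneHourFromNow,
  call: ['read', 'convert'], // assumed permissions, verify against the docs
});

console.log(`security=p:${policy},s:${signature}`);
```

Because signing needs the app secret, this kind of code belongs on a server. The browser should only ever receive the finished policy and signature.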
Here are some example API calls:
👉Get the caption of your uploaded image with its handle
https://cdn.filestackcontent.com/security=p:<POLICY>,s:<SIGNATURE>/caption/<HANDLE>
👉Use caption in a chain with other tasks, e.g., resize
https://cdn.filestackcontent.com/security=p:<POLICY>,s:<SIGNATURE>/resize=h:1000/caption/<HANDLE>
👉Use caption with an external URL
https://cdn.filestackcontent.com/<FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/<EXTERNAL_URL>
👉Use caption with Storage Aliases
https://cdn.filestackcontent.com/<FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/src://<STORAGE_ALIAS>/<PATH_TO_FILE>
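The four URL shapes above follow one pattern, so they can be assembled by a small helper. This is a minimal sketch, not part of the Filestack SDK: `buildCaptionUrl` and its option names are hypothetical, and it simply joins the segments in the order the examples show.

```javascript
const CDN_BASE = 'https://cdn.filestackcontent.com';

// Assemble a caption URL from its parts (hypothetical helper).
// - apiKey: only needed for external URLs and storage aliases
// - tasks: optional extra tasks placed before "caption", e.g. "resize=h:1000"
// - source: a file handle, an external URL, or a src://alias/path value
function buildCaptionUrl({ policy, signature, source, apiKey, tasks = [] }) {
  const parts = [CDN_BASE];
  if (apiKey) parts.push(apiKey);
  parts.push(`security=p:${policy},s:${signature}`);
  parts.push(...tasks);
  parts.push('caption');
  parts.push(source);
  return parts.join('/');
}

// Caption for an uploaded file, identified by its handle.
console.log(buildCaptionUrl({ policy: 'POLICY', signature: 'SIGNATURE', source: 'HANDLE' }));

// Caption chained after a resize task.
console.log(buildCaptionUrl({
  policy: 'POLICY',
  signature: 'SIGNATURE',
  source: 'HANDLE',
  tasks: ['resize=h:1000'],
}));

// Caption for an external image URL, which also needs the API key.
console.log(buildCaptionUrl({
  apiKey: 'API_KEY',
  policy: 'POLICY',
  signature: 'SIGNATURE',
  source: 'https://example.com/photo.jpg',
}));
```

Keeping the URL construction in one place makes it harder to forget the API key for external sources or to put a chained task in the wrong position.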
Follow the steps below to integrate Filestack image caption generation into a web application.
Getting the API key
- Visit the Filestack website. Click on the Sign-Up Free button and create an account.
- Log in to your account and copy the API key shown in the top right corner.
- Once you get the API key, let’s move to testing the API.
Testing the API
Take the URL below:
https://cdn.filestackcontent.com/<FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/<EXTERNAL_URL>
Replace the placeholders with your Filestack API key, policy, and signature.
Next, open Postman and create a new GET request with that URL.
Put an image URL in place of the external URL parameter and click the Send button. The response is JSON, and its `caption` field contains the generated description.
If you get a response like this, the API is working. Let’s move on to building a sample app.
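As an optional aside, the Postman check can also be scripted. The sketch below assumes the response is JSON with a `caption` field, which is what the sample app later in this post reads. `fetchCaption` is a hypothetical helper name, and the fetch function is passed in as a parameter so the logic can be exercised with a stub instead of a live request.

```javascript
// Request a caption URL and return the caption text.
// fetchImpl defaults to the global fetch (browsers and Node.js 18+).
async function fetchCaption(captionUrl, fetchImpl = fetch) {
  const res = await fetchImpl(captionUrl);
  // fetch only rejects on network errors, so check the HTTP status too.
  if (!res.ok) {
    throw new Error(`Caption request failed with HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.caption;
}

// Example usage (fill in the placeholders first):
// fetchCaption('https://cdn.filestackcontent.com/<FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/<EXTERNAL_URL>')
//   .then((caption) => console.log('Caption:', caption))
//   .catch((error) => console.error(error.message));
```

A rejected promise here usually points to an expired policy, a wrong signature, or a mistyped API key, so the HTTP status in the error message is a good first clue.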
Here are the steps to follow:
Step 1: Setting Up the HTML Structure
First, we define the basic structure of our HTML document:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Image Caption Generator</title>
<style>
/* CSS styles will go here */
</style>
</head>
<body>
<h1>Image Caption Generator by Filestack</h1>
<button onclick="openPicker()">Upload Image</button>
<p id="captionDisplay"></p>
<button id="copyButton" onclick="copyCaption()">Copy Caption</button>
<p id="copyMessage">Caption copied to clipboard!</p>
<script src="https://static.filestackapi.com/filestack-js/3.x.x/filestack.min.js"></script>
<script>
/* JavaScript code will go here */
</script>
</body>
</html>
- We start with the `<!DOCTYPE html>` declaration to specify the HTML5 document type.
- The `<html>` tag includes the `lang` attribute set to “en” for English.
- Inside the `<head>` tag, we set the character encoding to UTF-8 and make the page responsive with the `viewport` meta tag.
- The `<title>` tag sets the title of the web page.
- Inside the `<body>`, we add a header (`<h1>`) for the page title, a button to upload images, a paragraph to display the caption, another button to copy the caption, and a message indicating the caption was copied.
- We include the Filestack JavaScript library with a `<script>` tag.
Step 2: Adding CSS Styles
Next, we add some CSS to style our web page:
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
background: linear-gradient(135deg, #f0f0f0, #f8f9fa);
color: white;
text-align: center;
padding: 50px;
margin: 0;
background-image: url('https://img.freepik.com/free-photo/yellow-watercolor-paper_95678-446.jpg?size=626&ext=jpg&ga=GA1.1.2082370165.1717977600&semt=ais_user');
background-repeat: no-repeat;
background-size: cover;
}
h1 {
color: white;
font-size: 2.5em;
margin-bottom: 20px;
}
button {
padding: 12px 25px;
font-size: 16px;
color: #fff;
background-color: #3498db;
border: none;
border-radius: 5px;
cursor: pointer;
transition: background-color 0.3s ease;
}
button:hover {
background-color: #2980b9;
}
#captionDisplay {
margin-top: 30px;
font-size: 18px;
color: #555;
padding: 15px;
border: 1px solid #ddd;
border-radius: 10px;
background: #fff;
display: inline-block;
max-width: 500px;
position: relative;
}
#copyButton {
margin-top: 20px;
padding: 10px 20px;
font-size: 16px;
color: #fff;
background-color: #27ae60;
border: none;
border-radius: 5px;
cursor: pointer;
transition: background-color 0.3s ease;
display: none;
}
#copyButton:hover {
background-color: #219150;
}
#copyMessage {
display: none;
color: #27ae60;
margin-top: 10px;
}
</style>
- We set the font family on the body, center the text, and add padding.
- The body also gets a full-cover background image. Because `background-image` is declared after the `background` shorthand, it replaces the gradient set there.
- We give the header a larger font size and a white color.
- We style the buttons with padding, font size, a background color, and a hover effect.
- We style the caption display area with padding, a border, a white background, and rounded corners.
- The copy button and message are initially hidden and styled to match the theme.
Step 3: Adding JavaScript for Functionality
Finally, we add JavaScript to handle image uploads, generate captions, and copy captions to the clipboard:
<script src="https://static.filestackapi.com/filestack-js/3.x.x/filestack.min.js"></script>
<script>
const apiKey = 'AddYourAPIKey'; // Replace with your Filestack API key
const client = filestack.init(apiKey);
function openPicker() {
client.picker({
onUploadDone: (result) => {
const file = result.filesUploaded[0];
generateCaption(file.handle);
}
}).open();
}
function generateCaption(handle) {
// Demo only: in production, create the policy and signature on a server, because signing requires your app secret
const policy = 'AddYourPolicy'; // Replace with your Filestack policy
const signature = 'AddSignatureHere'; // Replace with your Filestack signature
const captionUrl = `https://cdn.filestackcontent.com/security=p:${policy},s:${signature}/caption/${handle}`;
fetch(captionUrl)
.then(res => {
// fetch() only rejects on network failures, so check the HTTP status explicitly
if (!res.ok) {
throw new Error(`Caption request failed with status ${res.status}`);
}
return res.json();
})
.then(data => {
document.getElementById('captionDisplay').innerText = data.caption;
document.getElementById('copyButton').style.display = 'inline-block';
})
.catch(error => {
console.error('Error generating caption:', error);
document.getElementById('captionDisplay').innerText = 'Error generating caption.';
});
}
function copyCaption() {
const captionText = document.getElementById('captionDisplay').innerText;
navigator.clipboard.writeText(captionText).then(() => {
const copyMessage = document.getElementById('copyMessage');
copyMessage.style.display = 'block';
setTimeout(() => {
copyMessage.style.display = 'none';
}, 2000);
}).catch(error => {
console.error('Error copying caption:', error);
});
}
</script>
- We initialize the Filestack client with our API key.
- The `openPicker` function opens the Filestack picker and calls `generateCaption` with the uploaded file’s handle.
- The `generateCaption` function constructs the URL for the Filestack caption API, fetches the caption, and displays it. If an error occurs, it logs the error and displays an error message.
- The `copyCaption` function copies the displayed caption to the clipboard and shows a message indicating the caption was copied successfully.
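The display step writes `data.caption` straight into the page, so a response without a caption would show up as "undefined". A small helper can guard against that. This is a sketch under the assumption that the response is an object with an optional string `caption` field. `captionFromResponse` and its fallback text are hypothetical names, not part of the Filestack API.

```javascript
// Turn an API response into display text (hypothetical helper).
// Falls back to a placeholder when the caption is missing or empty,
// and capitalizes the first letter for nicer display.
function captionFromResponse(data, fallback = 'No caption available.') {
  const raw = data && typeof data.caption === 'string' ? data.caption.trim() : '';
  if (raw === '') return fallback;
  return raw.charAt(0).toUpperCase() + raw.slice(1);
}

console.log(captionFromResponse({ caption: 'a close up of a bird' })); // prints "A close up of a bird"
console.log(captionFromResponse({})); // prints "No caption available."
```

Inside `generateCaption`, the result of `captionFromResponse(data)` could be assigned to the caption element instead of `data.caption`.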
Get the complete code here: https://github.com/devayesha23/ImageCaptionGenerator
Output
When you run the app, you get a web page where you can upload an image and generate a caption for it.
FAQs
What is a caption for an image?
A caption for an image is a short description that explains what is happening in the picture. For example, a photo of a sunset over the ocean might have the caption “Sunset over the ocean with orange and pink colours.”
Captions give context, highlight important details, or share emotions. Social media, news, and advertisements often use them to help people understand and connect with the image.
What is the best image caption tool?
The best image caption tool is Filestack’s Image Caption Generator. It creates accurate and meaningful captions quickly.
For example, if you upload a photo of a city street, it might return “A busy city street during the day.” Filestack is easy to use and works well with different apps. It’s perfect for anyone needing fast and reliable image captions.
How do you generate image captions?
Generating image captions involves using AI models that look at an image and describe it with a sentence. In online shopping, for example, you can upload a product image, and the tool might say, “A red handbag with leather straps,” making it easier to understand what the product is.
Is it safe to use Filestack to generate image captions?
Using Filestack to generate image captions is safe. Filestack uses strong encryption to protect your data when you upload and process images.
For example, images are sent over a secure HTTPS connection. You can also set security rules to control who can see or edit your files. This means you have extra protection when generating image captions with Filestack.
Sign up for free at Filestack to generate meaningful captions for your images.
Ayesha Zahra is a Geo Informatics Engineer with hands-on experience in web development (both frontend & backend). Also, she is a technical writer, a passionate programmer, and a video editor. She is always looking for opportunities to excel in her skills & build a strong career.