Image Caption Generation by Filestack API in 2024

Have you ever considered how automated image captions could improve your app? Image caption generation uses AI to create text descriptions for images automatically. This technology is changing how we handle and use pictures in apps.

In this blog, we’ll discuss why image caption generation is important. We’ll explain what it is and the different ways it can be used. We’ll also look at the benefits of using automated captions.

Next, we’ll show you the best way to add image captions to your app, focusing on Filestack, a top service for this task. You’ll learn how developers can easily integrate Filestack’s image caption generation into their web applications. This includes getting and testing the API key and the steps to set up the feature. Let’s begin. 

What is image caption generation?

Image caption generation creates text descriptions for images automatically. It uses computer vision to understand what’s in the image and natural language processing to write the description. 

Here’s how it works: 

👉First, a special kind of artificial intelligence (AI) called a convolutional neural network (CNN) looks at the image. This AI identifies important parts like objects, actions, and settings. 

👉It then turns this information into a format that another AI can understand. 

👉Next, this information goes to another type of AI, usually a recurrent neural network (RNN) or a transformer. This AI creates sentences based on what it learned from the image. 

The whole process is trained using many examples of images and their captions, so the AI learns how to match pictures with the right words.
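The three steps above can be sketched as a toy program. This is not a real CNN, RNN, or transformer: `encodeImage` and `nextWordScores` are hypothetical stand-ins built from hand-written lookup tables, and the image is represented only by a list of labels. The sketch is meant to show the shape of the loop, where the decoder picks one word at a time (greedy decoding) until it chooses an end marker.

```javascript
// Toy illustration of the encoder -> decoder captioning loop.
// Everything here is hypothetical: real systems learn these scores from data.

// "Encoder": turns an image (here just a list of labels) into features.
function encodeImage(image) {
  return { objects: new Set(image.labels) };
}

// "Decoder" step: score candidate next words given the features and the last word.
function nextWordScores(features, words) {
  const last = words[words.length - 1];
  const table = {
    '<start>': { a: 1.0 },
    a: features.objects.has('bird') ? { bird: 0.9, dog: 0.1 } : { dog: 0.9, bird: 0.1 },
    bird: { flying: 0.7, '<end>': 0.3 },
    dog: { running: 0.7, '<end>': 0.3 },
    flying: { '<end>': 1.0 },
    running: { '<end>': 1.0 },
  };
  return table[last] || { '<end>': 1.0 };
}

// Greedy decoding: repeatedly append the highest-scoring next word.
function generateToyCaption(image, maxWords = 10) {
  const features = encodeImage(image);
  const words = ['<start>'];
  while (words.length <= maxWords) {
    const scores = nextWordScores(features, words);
    const best = Object.keys(scores).reduce((a, b) => (scores[a] >= scores[b] ? a : b));
    if (best === '<end>') break;
    words.push(best);
  }
  return words.slice(1).join(' ');
}
```

With these made-up tables, an image labeled `bird` produces "a bird flying". A trained model replaces the tables with learned scores over a full vocabulary.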


Image caption generation has many uses, such as:

✔️Helping visually impaired people by describing images to them.

✔️Making it easier to find images with descriptive text in search engines.

✔️Automatically creating captions for photos on social media.

✔️Describing products in online stores for better searches.

✔️Organizing and finding pictures in large collections.

✔️Creating detailed educational materials for learning.
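For the accessibility use case, a generated caption can serve as the `alt` text of an image. The sketch below assumes you already have a caption string; `escapeHtml` and `imgWithAlt` are illustrative helper names, not part of any library. Escaping matters because a caption may contain quotes or angle brackets.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build an <img> tag whose alt text is the generated caption.
function imgWithAlt(src, caption) {
  return `<img src="${escapeHtml(src)}" alt="${escapeHtml(caption)}">`;
}
```

Screen readers read the `alt` attribute aloud, so the caption reaches users who cannot see the image.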

What are the benefits of automated image caption generation in your apps?

Automated image caption generation offers many benefits for apps. 

  • It helps visually impaired users by describing images to make content more accessible. 
  • It also improves search functions, allowing users to find images easily through descriptive captions. 
  • This technology saves time and effort by automatically creating captions. This is helpful for businesses with many images, like online stores. 
  • It also boosts social media engagement by making posts more informative and attractive. 
  • For apps with large image libraries, it makes organizing and finding images much easier.
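To make the search benefit concrete, here is a minimal sketch of caption-based search over an in-memory image list. The `library` array and its handles are made up for illustration, and a production app would more likely use a database or a search index.

```javascript
// Return the images whose caption contains every word of the query (case-insensitive).
function searchByCaption(images, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return images.filter((image) => {
    const caption = image.caption.toLowerCase();
    return terms.every((term) => caption.includes(term));
  });
}

// Hypothetical image library: each entry pairs a file handle with its caption.
const library = [
  { handle: 'h1', caption: 'a close up of a bird' },
  { handle: 'h2', caption: 'a red handbag with leather straps' },
  { handle: 'h3', caption: 'a busy city street during the day' },
];
```

Calling `searchByCaption(library, 'red handbag')` returns only the second entry, because it is the only caption that contains both words.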

What is the best method to generate image captions in your app?

The easiest and fastest way to generate image captions within your app is to use an API. There are many captioning APIs on the market, and Filestack is one of the easiest to use and most reliable. Let’s explore it.


Filestack’s Image Captioning service provides automatic image descriptions using the Processing API. It operates synchronously and requires a security policy and signature for usage. 
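The policy and signature are usually created on your server, because signing needs your app secret and that secret should never be shipped to the browser. The Node.js sketch below assumes the scheme described in Filestack's security documentation: the policy is a base64url-encoded JSON object with an `expiry` (Unix time in seconds) and a list of allowed `call` values, and the signature is the hex HMAC-SHA256 of the policy string keyed with the app secret. Treat these details, the `call` values you pass, and the helper name `createSecurity` as assumptions to verify against the official docs.

```javascript
const crypto = require('node:crypto');

// Build a policy (base64url-encoded JSON) and its HMAC-SHA256 signature.
// Assumed scheme; confirm field names and encoding in the Filestack security docs.
function createSecurity(appSecret, { expiry, call }) {
  const policyJson = JSON.stringify({ expiry, call });
  const policy = Buffer.from(policyJson, 'utf8').toString('base64url');
  const signature = crypto.createHmac('sha256', appSecret).update(policy).digest('hex');
  return { policy, signature };
}
```

A short expiry limits how long a leaked policy and signature pair stays usable.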

Responses return captions like “a close up of a bird.” The service can be integrated into workflows to trigger tasks based on the captions. 
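As a rough illustration of triggering tasks from a caption, the sketch below matches keywords in the caption text against a rule list. It runs in your own code and is not the Filestack Workflows API. The rule keywords and task names are hypothetical.

```javascript
// Hypothetical rules: each maps a keyword found in the caption to a task name.
const rules = [
  { keyword: 'bird', task: 'tag-as-wildlife' },
  { keyword: 'handbag', task: 'add-to-product-catalog' },
];

// Return the tasks whose keyword appears in the caption (case-insensitive).
function tasksForCaption(caption, ruleList) {
  const text = caption.toLowerCase();
  return ruleList
    .filter((rule) => text.includes(rule.keyword))
    .map((rule) => rule.task);
}
```

For the caption “a close up of a bird”, `tasksForCaption` returns only `tag-as-wildlife`.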

Here are examples of API calls:

👉Get the caption of your uploaded image with its handle: security=p:<POLICY>,s:<SIGNATURE>/caption/<HANDLE>

👉Use caption in a chain with other tasks, e.g., resize: security=p:<POLICY>,s:<SIGNATURE>/resize=h:1000/caption/<HANDLE>

👉Use caption with an external URL: <FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/<EXTERNAL_URL>

👉Use caption with Storage Aliases: <FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/src://<STORAGE_ALIAS>/<PATH_TO_FILE>
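The four call patterns share one structure, so a small helper can assemble them. This is a sketch: `buildCaptionUrl` and `securitySegment` are illustrative names, and `base` is a parameter because the examples above do not show the Processing API host. Pass the base URL from your Filestack account or the official docs.

```javascript
// Security segment shared by all four call patterns.
function securitySegment(policy, signature) {
  return `security=p:${policy},s:${signature}`;
}

// Build a caption URL.
// base:   Processing API base URL (assumption: supplied by the caller).
// apiKey: only needed for external URLs and storage aliases.
// tasks:  optional extra tasks placed before caption, e.g. 'resize=h:1000'.
// source: a file handle, an external URL, or a src://<alias>/<path> reference.
function buildCaptionUrl({ base, apiKey, policy, signature, tasks = [], source }) {
  const parts = [base.replace(/\/+$/, '')];
  if (apiKey) parts.push(apiKey);
  parts.push(securitySegment(policy, signature), ...tasks, 'caption', source);
  return parts.join('/');
}
```

For a handle, leave out `apiKey`. For an external URL or a storage alias, pass `apiKey` and put the URL or `src://` reference in `source`.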


How do developers integrate the Filestack image caption generation into web applications?

Follow the steps below to integrate Filestack image caption generation into your web application.

Getting the API key

  • Visit the Filestack website. Click on the Sign-Up Free button and create an account. 
  • Log in to your account and get the API key in the top right corner.
  • Once you get the API key, let’s move to testing the API. 

Testing the API

Take the URL below: <FILESTACK_API_KEY>/security=p:<POLICY>,s:<SIGNATURE>/caption/<EXTERNAL_URL>

Add your Filestack API key, policy, and signature.

Next, open Postman and create a new GET request.

New GET request

Next, click on the Send button after adding the required parameters. Suppose we add the image URL below in place of the external URL parameter. The response generated is given below:

Target image

Response generated
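The same check can be scripted instead of run in Postman. The sketch below assumes the response is JSON with a `caption` field, which is how the sample app later in this post reads it. `requestCaption` is an illustrative name, and `fetchImpl` exists only so a stub can replace the real network call in tests.

```javascript
// Request a caption URL and return the caption text.
// fetchImpl defaults to the global fetch (browsers, Node 18+).
async function requestCaption(captionUrl, fetchImpl = fetch) {
  const res = await fetchImpl(captionUrl);
  if (!res.ok) {
    // Surface HTTP errors such as an expired policy or a bad signature.
    throw new Error(`Caption request failed with HTTP ${res.status}`);
  }
  const data = await res.json();
  if (typeof data.caption !== 'string') {
    throw new Error('Response did not contain a caption string');
  }
  return data.caption;
}
```

Checking `res.ok` before parsing gives a clearer error when the policy, signature, or API key is wrong.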

If we get the response, our API is working fine. Let’s move to building a sample app:

Integrating the Filestack image caption generator

Here are the steps to follow: 

Step 1: Setting Up the HTML Structure

First, we define the basic structure of our HTML document:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Image Caption Generator</title>
  <style>
    /* CSS styles will go here */
  </style>
</head>
<body>
  <h1>Image Caption Generator by Filestack</h1>
  <button onclick="openPicker()">Upload Image</button>
  <p id="captionDisplay"></p>
  <button id="copyButton" onclick="copyCaption()">Copy Caption</button>
  <p id="copyMessage">Caption copied to clipboard!</p>
  <script src=""></script>
  <script>
    /* JavaScript code will go here */
  </script>
</body>
</html>


  • We start with the `<!DOCTYPE html>` declaration to specify the HTML5 document type.
  • The `<html>` tag includes the `lang` attribute set to “en” for English.
  • Inside the `<head>` tag, we set the character encoding to UTF-8 and make the page responsive with the `viewport` meta tag.
  • The `<title>` tag sets the title of the web page.
  • Inside the `<body>`, we add a header (`<h1>`) for the page title, a button to upload images, a paragraph to display the caption, another button to copy the caption, and a message indicating the caption was copied.
  • We include the Filestack JavaScript library with a `<script>` tag.

Step 2: Adding CSS Styles

Next, we add some CSS to style our web page:


<style>
  body {
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    background: linear-gradient(135deg, #f0f0f0, #f8f9fa);
    color: white;
    text-align: center;
    padding: 50px;
    margin: 0;
    background-image: url('');
    background-repeat: no-repeat;
    background-size: cover;
  }

  h1 {
    color: white;
    font-size: 2.5em;
    margin-bottom: 20px;
  }

  button {
    padding: 12px 25px;
    font-size: 16px;
    color: #fff;
    background-color: #3498db;
    border: none;
    border-radius: 5px;
    cursor: pointer;
    transition: background-color 0.3s ease;
  }

  button:hover {
    background-color: #2980b9;
  }

  #captionDisplay {
    margin-top: 30px;
    font-size: 18px;
    color: #555;
    padding: 15px;
    border: 1px solid #ddd;
    border-radius: 10px;
    background: #fff;
    display: inline-block;
    max-width: 500px;
    position: relative;
  }

  #copyButton {
    margin-top: 20px;
    padding: 10px 20px;
    font-size: 16px;
    color: #fff;
    background-color: #27ae60;
    border: none;
    border-radius: 5px;
    cursor: pointer;
    transition: background-color 0.3s ease;
    display: none;
  }

  #copyButton:hover {
    background-color: #219150;
  }

  #copyMessage {
    display: none;
    color: #27ae60;
    margin-top: 10px;
  }
</style>

  • We style the body with a gradient background and set the font family.
  • We center the text, add padding, and set a background image.
  • We style the header to have a larger font size and white color.
  • We style the buttons with padding, font size, background color, and hover effects.
  • We style the caption display area with padding, border, background color, and other properties.
  • The copy button and message are initially hidden and styled to match the theme.

Step 3: Adding JavaScript for Functionality

Finally, we add JavaScript to handle image uploads, generate captions, and copy captions to the clipboard:

<script src=""></script>
<script>
  const apiKey = 'AddYourAPIKey'; // Replace with your Filestack API key
  const client = filestack.init(apiKey);

  // Open the Filestack picker and caption the first uploaded file
  function openPicker() {
    client.picker({
      onUploadDone: (result) => {
        const file = result.filesUploaded[0];
        generateCaption(file.handle);
      }
    }).open();
  }

  // Request a caption for the uploaded file and display it
  function generateCaption(handle) {
    const policy = 'AddYourPolicy'; // Replace with your Filestack policy
    const signature = 'AddSignatureHere'; // Replace with your Filestack signature
    // Prefix this path with the Filestack Processing API base URL
    const captionUrl = `security=p:${policy},s:${signature}/caption/${handle}`;

    fetch(captionUrl)
      .then(res => res.json())
      .then(data => {
        document.getElementById('captionDisplay').innerText = data.caption;
        document.getElementById('copyButton').style.display = 'inline-block';
      })
      .catch(error => {
        console.error('Error generating caption:', error);
        document.getElementById('captionDisplay').innerText = 'Error generating caption.';
      });
  }

  // Copy the displayed caption and briefly show a confirmation message
  function copyCaption() {
    const captionText = document.getElementById('captionDisplay').innerText;
    navigator.clipboard.writeText(captionText).then(() => {
      const copyMessage = document.getElementById('copyMessage');
      copyMessage.style.display = 'block';
      setTimeout(() => {
        copyMessage.style.display = 'none';
      }, 2000);
    }).catch(error => {
      console.error('Error copying caption:', error);
    });
  }
</script>


  • We initialize the Filestack client with our API key.
  • The `openPicker` function opens the Filestack picker and calls `generateCaption` with the uploaded file’s handle.
  • The `generateCaption` function constructs the URL for the Filestack caption API, fetches the caption, and displays it. If an error occurs, it logs the error and displays an error message.
  • The `copyCaption` function copies the displayed caption to the clipboard and shows a message indicating the caption was copied successfully.

Get the complete code here: 


When you run the app, you get the web page below, where you can upload an image and generate a caption:

Screenshots: Output 1 to Output 4, followed by the copied-caption confirmation.


What is a caption for an image?

A caption for an image is a short description that explains what is happening in the picture. For example, a photo of a sunset over the ocean might have the caption “Sunset over the ocean with orange and pink colours.” 

Captions give context, highlight important details, or share emotions. Social media, news, and advertisements often use them to help people understand and connect with the image.

What is the best image caption tool?

The best image caption tool is Filestack’s Image Caption Generator. It creates accurate and meaningful captions quickly. 

For example, when you upload a photo of a city street, it might return “A busy city street during the day.” Filestack is easy to use and works well with different apps, which makes it a good fit for anyone who needs fast and reliable image captions.

How do you generate image captions?

Generating image captions involves using AI models that look at an image and describe it in a sentence. In online shopping, for example, you can upload a product image, and the tool might return “A red handbag with leather straps,” which makes it easier to understand what the product is.

Is it secure to generate image captions via Filestack?

Using Filestack to generate image captions is safe. Filestack uses strong encryption to protect your data when you upload and process images. 

For example, images are sent over a secure HTTPS connection. You can also set security rules to control who can see or edit your files. This means you have extra protection when generating image captions with Filestack.


Sign Up for free at Filestack to generate meaningful captions for your images. 

