In this note we are going to understand how to create subtitles automatically with Kdenlive.
Kdenlive is an open source video editor. It is free to use and download. It is very complete, and it has a great variety of options to work with audio and video.
All in all it is a very good option to develop our skills in the audiovisual world.
This software can be installed on different operating systems. I am writing this using Kdenlive on a Linux machine. If you use another system, the information will still be useful as long as you adapt some specific details, especially the terminal commands.
What is our goal today?
The idea is to automate the creation of subtitles.
But more importantly, the goal is to be able to have our own tool for converting spoken audio to text.
In effect, what we are going to do is what is commonly known as “converting audio to text”.
Getting an audio to text conversion done automatically has incredible advantages. For starters, subtitles improve the accessibility of the videos we create.
But that’s not all. We can use this tool to take dictation, transforming our voice into material that we can share online or develop later in a text editor. It can also be used, for example, to record quick spoken notes and then turn them into text to read later.
So without further introduction, let’s review what we need to accomplish this goal.
How to create subtitles automatically with Kdenlive
Even if we install Kdenlive, we still won’t be able to use this functionality without fulfilling a couple of requirements first.
The editor manages to do the transcription by using tools written in Python, which are installed as Python packages.
The interesting thing is that it does all this work “behind the scenes”. We don’t need to know Python programming to create the subtitles. That’s why I’m going to try to make this as straightforward as possible.
What we need to do first is…
Install Python and its necessary modules in Linux.
Let’s open the terminal.
First we need to install a Python3 interpreter on our computer. If we are using a Linux system, most likely we already have it. But we can make sure, by trying to install it with the command:
sudo apt install python3
Next we need to install the “python-is-python3” package with the following command:
sudo apt install python-is-python3
What does python-is-python3 do? As I understand it, the package makes the plain “python” command point to Python 3. Recall that there is also Python 2. Without this package, a program that simply calls “python” may not find an interpreter at all, or may leave the system wondering “which version of Python are you referring to?”
Third, we need to have pip. On Debian-based systems such as Linux Mint the package is called python3-pip, and we install it with the command:
sudo apt install python3-pip
The pip tool allows you to install libraries and dependencies for Python.
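To confirm that these three pieces are in place, we can ask the interpreter itself. What follows is only a minimal sketch: the function name check_environment is made up for this note, and Kdenlive does not depend on it in any way.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a small report about the Python setup described above."""
    return {
        # True when the interpreter running this script is Python 3
        "python3": sys.version_info.major == 3,
        # Path that the plain "python" command resolves to, or None if it is missing
        "python_command": shutil.which("python"),
        # True when the pip module can be found by this interpreter
        "pip": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If we save this in a file and run it with “python”, a missing path for python_command suggests that python-is-python3 is not installed, and a False next to pip suggests that pip is missing.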
Well, we have made good progress. Having all this installed on the computer is the base we need to create subtitles automatically.
The next thing is…
Install Kdenlive on Linux
We need Kdenlive version 21.04.0 or higher. Earlier versions do not include the speech recognition feature, so they are not going to help us at all.
It’s good to clarify that I’m writing all this while using Kdenlive version 23.04.2 on Linux Mint.
Although we can find the program in the Software Center, it may not be the latest version. That’s why we are going to install it from the official kdenlive site.
You can find more details about the installation following this link.
We can also install the program using Flatpak. For that, assuming the Flathub repository is already configured, I open the terminal and type:
flatpak install flathub org.kde.kdenlive
When we finish this, we will have the program installed on the computer. The next thing is, as you can imagine, to open the program.
Installing the necessary modules in Kdenlive
We have to install the voice models. This sounds more difficult than it really is. The truth is that this is almost automatic.
For this we open Kdenlive and go to:
Settings > Configure Kdenlive > Speech to Text.
We are going to select the Vosk Speech engine. At this point, in some versions of Kdenlive, the program will ask us to download two Python tools. This may change depending on whether or not we have these tools on the system.
As I understand it, the modules that the program uses are vosk (for speech recognition) and srt (to synchronize the subtitles with the timeline).
We could also install these modules using the terminal via pip. But since Kdenlive does it directly, let’s take the more relaxed way.
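For those who prefer to check by hand, here is a small sketch that reports which of the two modules are missing and builds the pip command that would install them. The helper names missing_modules and pip_command are invented for this example; the package names vosk and srt are the ones mentioned above.

```python
import importlib.util


def missing_modules(names=("vosk", "srt")):
    """Return the module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(names):
    """Build the pip command line that would install the given modules."""
    if not names:
        return ""
    return "python -m pip install " + " ".join(names)


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules. They could be installed with:")
        print(pip_command(missing))
    else:
        print("vosk and srt are already available.")
```

The script only prints the command instead of running it, so nothing gets installed without us deciding to do it.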
Loading the voice models/dictionaries into Kdenlive
This is the last step of the setup.
The software will ask us to load a dictionary. For that we have to go to the following site:
https://alphacephei.com/vosk/models
And download the model we are looking for. There are models for more than fifteen languages, but in this case we are going to download one for English.
The model comes as a zip file.
Then, back in the Speech to Text settings, we use the + sign on the left, choose the folder where the file is saved and load it. We do not need to unpack the file; it is enough to add it exactly as we downloaded it.
Create subtitles automatically with Kdenlive
Well, for this example I used an audio file that I recorded with my cell phone. In general terms, it makes no difference to the final result whether we use an audio file or a video.
The program uses a timeline. What we have to do is to drop our file into the timeline.
Then we go to the menu:
Project > Subtitles > Speech Recognition
We get a new window. Here we choose the language, in our case the English model that we installed before.
Then we can choose whether to analyze all the loaded files, only one, or only the selected clips. In my case there is only one clip, selected along its full length, so it does not make much difference.
When you click on process, the program will do its job and will return the subtitles.
They will appear at the top of the timeline. Better yet, they appear adjusted to our audio, so we won’t have to move them to adjust them.
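Why do the subtitles land in the right place? I am not reproducing Kdenlive's own script here, but the general idea can be sketched. A recognizer such as Vosk can report a start and an end time for every word, and those times are enough to build subtitle entries. The sketch below assumes a list of words with the fields word, start and end (times in seconds). The function names and the limit of seven words per subtitle are made up for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm form used in .srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, words_per_line=7):
    """Group timed words into numbered subtitle blocks and return SRT text.

    Assumption: each item is a dict with "word", "start" and "end" keys.
    """
    blocks = []
    starts = range(0, len(words), words_per_line)
    for number, first in enumerate(starts, start=1):
        chunk = words[first:first + words_per_line]
        text = " ".join(item["word"] for item in chunk)
        # A subtitle starts with its first word and ends with its last one
        begin = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        blocks.append(f"{number}\n{begin} --> {end}\n{text}\n")
    # Blocks are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.6, "end": 1.25},
    ]
    print(words_to_srt(sample))
```

Each block has a counter, a line with the start and end times, and the text. This is the same layout we find when opening an .srt file in a text editor.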
Perhaps most importantly: when we save the project, Kdenlive also saves an .srt document containing the text we transcribed. I am not talking about rendering the project, but about saving it with the “Save As” option.
That text is accompanied by the timestamps that keep it synchronized when we play it back.
But we can easily remove all these marks in a text editor, to keep only the transcription of the voice.
In this way we can transcribe university notes, write texts by dictation or anything else that comes to mind. All by automating the project.
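Removing the marks by hand gets tedious with long recordings, so here is a minimal sketch that does it for us. It assumes the usual .srt layout (a counter line, a timing line, then the text, with a blank line between blocks). The function name srt_to_text and the file name my_project.srt are only examples.

```python
import re

# A timing line looks like: 00:00:01,000 --> 00:00:04,200
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Return only the spoken text from SRT content, one subtitle per line."""
    result = []
    # Subtitle blocks are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_content.strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Drop the counter line that opens each block
        if rows and rows[0].isdigit():
            rows = rows[1:]
        # Drop the timing line and keep everything else as text
        rows = [row for row in rows if not TIMING.match(row)]
        if rows:
            result.append(" ".join(rows))
    return "\n".join(result)


if __name__ == "__main__":
    import sys

    # Hypothetical file name; pass the real path as the first argument
    path = sys.argv[1] if len(sys.argv) > 1 else "my_project.srt"
    with open(path, encoding="utf-8-sig") as handle:
        print(srt_to_text(handle.read()))
```

The result is the plain transcription, ready to paste into a text editor.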
Conclusion
Thus we saw how to create subtitles automatically with Kdenlive.
There are several details to keep in mind. The transcription may have problems if the voice is not clear, or if there is a lot of background noise, for example. Some parts may not be transcribed in their entirety.
Still, saving energy and time with the part that is processed correctly is incredibly helpful. And if there are parts that need extra work, they can be fixed using as a basis the text that the program was able to transcribe.
And, more important than anything else: there are sites on the internet that do this kind of work, but they usually ask us to complete several steps first. At the very least they ask us to register or leave an email address.
With Kdenlive we can do it directly on our own computer, without needing to be online to get the transcript.
Finally, if you find any errors in what I have written, I would be very grateful if you could let me know so that I can fix them.
We will continue in the next note.