Frequently Asked Questions

Definitely! As a new user you get 20 minutes completely free after registering your email address, so you can try the whole system with your own files. You only have to buy additional credits once you need more than those 20 minutes. Tip: start with shorter recordings, so you can try out several different ones.
At VoiceToScript, we value the trust you place in us and our services. That is why your privacy is our top priority and we do not store your data longer than necessary. Once the transcription has been completed, reviewed and emailed to you, we remove all your uploaded files from our systems. Uploaded files that have not been processed are deleted automatically after 24 hours.
It goes without saying that we also comply with the European General Data Protection Regulation (GDPR).
We are a relatively new service with the best price/quality ratio. Most transcription services have developed their own speech engine and models at high cost. VoiceToScript is built on the powerful speech models of Google, Amazon (Alexa) and Microsoft (Azure). Each of these has its specific strengths, and we always select the best fit for your recording.
In other words, you get the highest quality for the lowest price. Now and in the future!
Besides English and Welsh, we support virtually every European language and most North and South American languages. In total, we support over 50 spoken languages, with an accuracy of up to 95%!
  • Albanian (sq-AL)
  • American English (en-US)
  • American Spanish (es-US)
  • Argentinian Spanish (es-AR)
  • Australian English (en-AU)
  • Austrian German (de-AT)
  • Basque (eu-ES)
  • Belgian French (fr-BE)
  • Bosnian (bs-BA)
  • Brazilian Portuguese (pt-BR)
  • Bulgarian (bg-BG)
  • Canadian English (en-CA)
  • Canadian French (fr-CA)
  • Catalan (ca-ES)
  • Chilean Spanish (es-CL)
  • Chinese Cantonese (zh-HK)
  • Chinese Mandarin (zh-CN)
  • Croatian (hr-HR)
  • Czech (cs-CZ)
  • Danish (da-DK)
  • Dutch (nl-NL)
  • English (en-GB)
  • Estonian (et-EE)
  • Farsi (Persian) (fa-IR)
  • Finnish (fi-FI)
  • French (fr-FR)
  • Galician (gl-ES)
  • German (de-DE)
  • Greek (el-GR)
  • Gulf Arabic (ar-AE)
  • Hebrew (he-IL)
  • Hindi (hi-IN)
  • Hungarian (hu-HU)
  • Icelandic (is-IS)
  • Indian English (en-IN)
  • Indonesian (id-ID)
  • Irish (ga-IE)
  • Irish English (en-IE)
  • Italian (it-IT)
  • Japanese (ja-JP)
  • Korean (ko-KR)
  • Latvian (lv-LV)
  • Lithuanian (lt-LT)
  • Macedonian (mk-MK)
  • Malay (ms-MY)
  • Maltese (mt-MT)
  • Mexican Spanish (es-MX)
  • Modern Standard Arabic (ar-SA)
  • New Zealand English (en-NZ)
  • Norwegian (nb-NO)
  • Polish (pl-PL)
  • Portuguese (pt-PT)
  • Romanian (ro-RO)
  • Russian (ru-RU)
  • Serbian (sr-RS)
  • Slovak (sk-SK)
  • Slovenian (sl-SI)
  • South African English (en-ZA)
  • Spanish (es-ES)
  • Swedish (sv-SE)
  • Swiss French (fr-CH)
  • Swiss German (de-CH)
  • Swiss Italian (it-CH)
  • Tamil (ta-IN)
  • Telugu (te-IN)
  • Thai (th-TH)
  • Turkish (tr-TR)
  • Ukrainian (uk-UA)
  • Vietnamese (vi-VN)
  • Welsh (cy-GB)
You can upload any audio or video file. The format doesn't matter, as long as it contains sound. Whether it is an .mp3, .mp4, .mpeg, .avi, .aac, .m4a, .wma, .wav, .flac, .opus, .mov, .ogg or any other format, VoiceToScript analyzes the file and checks whether it contains an audio stream. So you don't have to worry about that; we'll do it for you.
Yes, you can! Besides uploading your own files, you can also upload files from most social media platforms. YouTube is the easiest: just paste the URL of the YouTube video on the upload page and the file will be uploaded automatically. For other social media platforms, such as Vimeo, Instagram and Twitter, the approach is slightly different. It is described in detail in one of our articles: click here to read the full article.
You can upload files of up to 2 GB (2000 MB), with a maximum of 3 hours of audio. Whether you can actually upload files of this size also depends on the upload speed of your own internet connection. To upload the full 2 GB, you need an upload speed of at least 5 MB/sec; otherwise the upload will be aborted after a window of 15 minutes.
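The limits above can be checked before starting an upload. The following is a minimal sketch, not part of VoiceToScript: the function name `upload_problems` and its parameters are hypothetical, and the time estimate assumes a constant upload speed without any protocol overhead.

```python
from typing import List

MAX_SIZE_MB = 2000          # 2 GB, as stated above
MAX_DURATION_HOURS = 3      # maximum length of the audio
UPLOAD_WINDOW_MINUTES = 15  # the upload is aborted after this window


def upload_problems(size_mb: float, duration_hours: float,
                    speed_mb_per_sec: float) -> List[str]:
    """Return the reasons why an upload would likely fail (empty if none)."""
    problems = []
    if size_mb > MAX_SIZE_MB:
        problems.append("file is larger than 2 GB")
    if duration_hours > MAX_DURATION_HOURS:
        problems.append("audio is longer than 3 hours")
    if speed_mb_per_sec <= 0:
        problems.append("upload speed must be positive")
    else:
        # Rough estimate: size divided by speed, converted to minutes.
        upload_minutes = size_mb / speed_mb_per_sec / 60
        if upload_minutes > UPLOAD_WINDOW_MINUTES:
            problems.append("upload would not finish within 15 minutes")
    return problems


print(upload_problems(2000, 2.5, 5))  # prints []
print(upload_problems(2000, 2.5, 1))  # prints ['upload would not finish within 15 minutes']
```

An empty list means that none of the stated limits is exceeded under these assumptions; real uploads may still be slower than the estimate.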
Yes, because we do not keep your files on our server, you will have to upload them again. We do not keep your files any longer than necessary, because we put security and confidentiality first. After all, it is your data and yours alone.
Transcribing is the conversion of spoken words to text. Typically, a transcript is made afterwards from an audio recording. It is often used by journalists to work out recorded interviews, and by scientists and students to document recordings for research purposes. As transcription services improve, it is increasingly used to create 'spoken' reports or to transcribe meetings automatically.
We only support non-verbatim transcriptions. This means that the spoken words are transcribed, but stuttering, intonation, interjections and repetitions are left out. A verbatim transcription would include these as well.
We use the best transcription engines currently available, namely those from Google, Microsoft, IBM and Amazon. They deliver very high accuracy, up to 98%, but they are not perfect. Transcription is a fully automatic process, in which the quality of the supplied recording largely determines the quality of the end result. What matters most is how clear the speech is and whether there is disturbing background noise. It is therefore always necessary to check the delivered text against the original recording and make corrections where necessary!
The file is sent to you by email and consists of a number of text blocks. Each block also shows the time, so you can quickly find the fragment in the original audio file. Some words may be highlighted, which indicates that the system found them more difficult to hear and understand. This helps you identify the passages that deserve extra attention. For interviews, a new block starts at every speaker change (max. 5 speakers).
The file can easily be edited with standard editors such as Microsoft Word.
The subtitle file you receive by email is an .srt file (SubRip format). It contains both your spoken text and the exact time codes that indicate when each line of text should be shown in your video. The structure of this file is explained on this website, where you can also find out how to add it to your video. Because the accuracy of the automatically generated subtitles depends on several factors, it is important to check the file and, if necessary, correct it before you start using it. You can easily edit the file with any standard text editor, such as WordPad on Microsoft Windows PCs.
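To make the structure of an .srt file concrete, here is a minimal Python sketch that reads the blocks of a SubRip file: a sequence number, a timing line of the form `start --> end`, and one or more lines of text, separated by blank lines. The sample subtitles and the names `Cue` and `parse_srt` are made up for illustration; this is not how VoiceToScript itself produces or reads the file.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int   # sequence number of the subtitle
    start: str   # time code at which the text appears
    end: str     # time code at which the text disappears
    text: str    # subtitle text (may span several lines)


# Made-up sample in SubRip format.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the interview.

2
00:00:04,500 --> 00:00:07,250
Thank you for having me.
"""

# Timing line: hours:minutes:seconds,milliseconds --> same format.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(content: str) -> List[Cue]:
    """Split SubRip text into cues, skipping blocks that are malformed."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        cues.append(Cue(int(lines[0].strip()), match.group(1),
                        match.group(2), "\n".join(lines[2:])))
    return cues


for cue in parse_srt(SAMPLE_SRT):
    print(cue.index, cue.start, cue.end, cue.text)
```

Because the file is plain text, the same blocks can be corrected by hand in a text editor, as long as the numbering and the timing lines are left intact.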
If you are logged in, you will see the '$Credits' tab at the top of the page. If you click on this tab, you will see the prices and the option to buy credits. You can pay with PayPal, credit card and other payment methods. Immediately after your payment you will receive a VAT invoice by email.
We round the time up to whole minutes. For a recording of, for example, 3 minutes and 15 seconds, 4 minutes will be deducted from your credits.
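The rounding rule can be illustrated with a minimal Python sketch. The function name `billed_minutes` is hypothetical and not part of VoiceToScript; it only assumes, as described above, that every started minute counts as a full minute.

```python
import math


def billed_minutes(duration_seconds: float) -> int:
    """Return the number of whole minutes deducted for a recording.

    Assumption: every started minute is billed as a full minute.
    """
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    return math.ceil(duration_seconds / 60)


# A recording of 3 minutes and 15 seconds costs 4 minutes of credit.
print(billed_minutes(3 * 60 + 15))  # prints 4
```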
That is certainly possible! We have a separate environment for larger companies. If you would like to make use of it, please contact us by email to make further arrangements.

Speech --> text

Automatically convert speech to text with AI and edit it in Word.

Audio and Video

Upload your (multilingual) recording and get the text by email.

Secure and Reliable.

Accurate up to 98%! Also supports bilingual transcriptions.
In over 50 languages.
