We Removed 335 LOC with one Go package
TLDR: Our new service (Podio) puts a Fluent API on top of audio management with features we couldn't find in FFmpeg.
At Prayershub, we were facing significant challenges with FFmpeg. The process of optimizing it for efficiency, along with managing complex audio pipelines for various audio processing tasks, was becoming increasingly tedious and hard to maintain.
As the codebase grew, the overhead of managing these processes was taking a toll on our productivity.
To solve this, we decided to abstract the difficult parts into a separate platform — Podio (as a service under Poxate).
The problem
Prayershub is a worship service helper that lets users create worship (practically, playlists) with songs, bible verses, and prayers. As the complexity grew, so did the need for efficient audio processing—looping tracks, applying fades, and managing background music.
However, we were running into walls with FFmpeg. Here's an illustration:
We'll have repeating background music, with 2 seconds before and after the speech/voice.
Click here to view a sample of what that sounds like.
That's not the end of it, however. Sometimes, there will be multiple "Long speeches", such as "a reading of a bible chapter" followed by a prayer.
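The layout described here comes down to simple arithmetic: 2 seconds of padding before the first clip, between clips, and after the last one. Below is a minimal sketch of that timeline; the function name `voiceOffsets` and the clip durations are illustrative assumptions, not code from our codebase.

```go
package main

import (
	"fmt"
	"time"
)

// gap is the padding placed before, between, and after voice clips.
const gap = 2 * time.Second

// voiceOffsets returns the start time of each voice clip and the total
// length of the padded track, given the clip durations in order.
func voiceOffsets(durations ...time.Duration) ([]time.Duration, time.Duration) {
	offsets := make([]time.Duration, len(durations))
	cursor := gap // leading padding
	for i, d := range durations {
		offsets[i] = cursor
		cursor += d + gap // the clip itself, then the padding that follows it
	}
	return offsets, cursor
}

func main() {
	// Hypothetical durations: a 90-second reading followed by a 30-second prayer.
	offsets, total := voiceOffsets(90*time.Second, 30*time.Second)
	fmt.Println("clip start times:", offsets)
	fmt.Println("total length:", total)
}
```

The background music has to loop for at least the total length returned here, which is why the pipeline needs both a concatenation step and a looping step.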
To handle this, we had one concat operation and a stream_loop, with the results of both piped into amerge, resulting in this (much has been stripped from this code for the sake of clarity; don't do most of this in production):
func writeVoiceFragment(ctx context.Context, resultFilePath string, group []models.WorshipContent, backgroundMusic string) error {
	tmpSilentFilename := fmt.Sprintf("silent-%s.mp3", uuid.NewString())
	{
		// Step 1: Download 1-second-of-silence file
		resp, err := http.Get("https://s3.us-east-2.amazonaws.com/cdn.prayershub.com/assets/1-second-of-silence.mp3")
		if err != nil {
			return fmt.Errorf("failed to download silent file: %w", err)
		}
		defer resp.Body.Close()
		contents, err := io.ReadAll(resp.Body)
		if err != nil {
			return fmt.Errorf("failed to read silent file: %w", err)
		}
		if err := os.WriteFile("tmp/"+tmpSilentFilename, contents, 0777); err != nil {
			return fmt.Errorf("os.WriteFile failed for silent file: %w", err)
		}
		defer os.Remove("tmp/" + tmpSilentFilename)
	}
	filenames := make([]string, len(group))
	{
		// Step 2: Reencode soundtracks (in parallel, one ffmpeg process per voice)
		g, _ := errgroup.WithContext(ctx)
		for i, content := range group {
			i := i
			content := content
			g.Go(func() error {
				tmpFilename := fmt.Sprintf("voice-%s.mp3", uuid.NewString())
				remoteUrl := models.SafelyGetSoundtrack(content.Type, content.Data, map[int]models.Song{})
				errBuf := bytes.NewBuffer(nil)
				cmd := exec.CommandContext(ctx, "ffmpeg",
					"-i", remoteUrl, "-ar", "44100", "-ac", "2", "-b:a", "192k", "tmp/"+tmpFilename,
				)
				cmd.Stderr = errBuf
				if err := cmd.Run(); err != nil {
					return fmt.Errorf("failed to reencode: %w || %s", err, errBuf.String())
				}
				filenames[i] = tmpFilename
				return nil
			})
		}
		if err := g.Wait(); err != nil {
			return fmt.Errorf("first g.Wait failed: %w", err)
		}
		for _, filename := range filenames {
			defer os.Remove("tmp/" + filename)
		}
	}
	tmpConcatTextFilepath := fmt.Sprintf("tmp/concat-%s.txt", uuid.NewString())
	{
		// Step 3: Create concat file (2 seconds of silence before, between, and after voices)
		concatText := strings.Repeat(fmt.Sprintf("file '%s'\n", tmpSilentFilename), 2)
		for _, filename := range filenames {
			// A single quote inside a quoted concat path is escaped as '\''
			concatText += "file '" + strings.ReplaceAll(filename, "'", `'\''`) + "'\n"
			concatText += strings.Repeat(fmt.Sprintf("file '%s'\n", tmpSilentFilename), 2)
		}
		if err := os.WriteFile(tmpConcatTextFilepath, []byte(concatText), 0777); err != nil {
			return fmt.Errorf("os.WriteFile failed to create tmpConcatTextFilepath: %w", err)
		}
		defer os.Remove(tmpConcatTextFilepath)
	}
	concattedBuf := bytes.NewBuffer(nil)
	{
		// Step 4: Concat demuxer
		errBuf := bytes.NewBuffer(nil)
		cmd := exec.CommandContext(ctx, "ffmpeg", "-f", "concat", "-i", tmpConcatTextFilepath, "-c", "copy", "-f", "mp3", "pipe:")
		cmd.Stdout = concattedBuf
		cmd.Stderr = errBuf
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("failed to concat: %w || %s", err, errBuf.String())
		}
	}
	{
		// Step 5: Overlay background audio
		errBuf := bytes.NewBuffer(nil)
		cmd := exec.CommandContext(ctx, "ffmpeg",
			"-stream_loop", "-1", "-i", backgroundMusic,
			"-i", "pipe:",
			"-shortest", "-filter_complex", "[0:a]volume=0.25[a0];[a0][1:a]amerge=inputs=2[out]",
			"-ar", "44100", "-ac", "2", "-b:a", "192k", "-map", "[out]", resultFilePath)
		cmd.Stdin = concattedBuf
		cmd.Stderr = errBuf
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("failed to overlay background music: %w || %s", err, errBuf.String())
		}
	}
	return nil
}
You'll notice a few problems immediately:
- Each step blocks until the previous one finishes; parallelizing them would require even more work
- There are many intermediate steps and temporary files
- The result is held in an in-memory buffer (this is not Go's fault, nor the fault of FFmpeg, but a decision made to get around restrictions that will not be discussed in this article).
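To make one of those intermediate steps concrete, here is a minimal sketch of the list file that the "Create concat file" step writes, pulled out into a pure function that can be tested without FFmpeg. The helper names `quoteConcatPath` and `buildConcatList` and the file names are assumptions for illustration; the quoting follows FFmpeg's documented rule of escaping a single quote inside a quoted string as `'\''`.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteConcatPath wraps a path in single quotes for a concat list file,
// escaping any embedded single quote as '\''.
func quoteConcatPath(path string) string {
	return "'" + strings.ReplaceAll(path, "'", `'\''`) + "'"
}

// buildConcatList returns the contents of a concat demuxer list file with
// `pad` copies of the silence clip before, between, and after the voices.
func buildConcatList(silence string, voices []string, pad int) string {
	var b strings.Builder
	writeSilence := func() {
		for i := 0; i < pad; i++ {
			fmt.Fprintf(&b, "file %s\n", quoteConcatPath(silence))
		}
	}
	writeSilence()
	for _, v := range voices {
		fmt.Fprintf(&b, "file %s\n", quoteConcatPath(v))
		writeSilence()
	}
	return b.String()
}

func main() {
	// Hypothetical file names; two 1-second silence clips give a 2-second gap.
	fmt.Print(buildConcatList("silent.mp3", []string{"voice-1.mp3", "voice-2.mp3"}, 2))
}
```

Even in this isolated form, the list file is only an input to another FFmpeg invocation, so it adds a temporary file and a cleanup step of its own.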
There is one feature still missing in this implementation: fading. With FFmpeg, fading in the first 2 seconds is simple enough.
Fading out proved to be more complicated, as FFmpeg requires the time in the audio at which to start fading out, which we don't know in advance. There's a whole StackOverflow post about this.
Workaround #1:
- Compile the audio
- fade the first 2 seconds
- Reverse the entire audio with areverse
- fade the first 2 seconds
- Reverse again with areverse
This was obviously an inefficient workaround, and proved infeasible within the time constraints of our application.
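As a rough sketch, workaround #1 can be expressed as a single FFmpeg audio filter chain using the `afade` and `areverse` filters. The function names and file paths below are hypothetical, and this is not the exact command we ran. Note that `areverse` has to buffer the entire clip in memory, which is a large part of why this approach is slow.

```go
package main

import (
	"fmt"
	"os/exec"
)

// reverseFadeFilter builds a filter chain that fades in the first `seconds`,
// reverses the track, fades in the first `seconds` of the reversed track,
// and reverses it back. The net effect is a fade-in plus a fade-out.
func reverseFadeFilter(seconds float64) string {
	fade := fmt.Sprintf("afade=t=in:st=0:d=%g", seconds)
	return fade + ",areverse," + fade + ",areverse"
}

// fadeBothEndsCmd prepares (but does not run) the ffmpeg command.
// input and output are hypothetical file paths.
func fadeBothEndsCmd(input, output string) *exec.Cmd {
	return exec.Command("ffmpeg", "-i", input, "-af", reverseFadeFilter(2), output)
}

func main() {
	cmd := fadeBothEndsCmd("compiled.mp3", "faded.mp3")
	fmt.Println(cmd.Args)
}
```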
Workaround #2:
- Compile the audio
- Find the duration through ffprobe
- Subtract 2 seconds from the duration, then fade out from that point with ffmpeg
Although much better than workaround #1, it still violated the time constraints of our application. In the end, we decided to drop fading entirely, which significantly degraded the user experience.
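For reference, workaround #2 can be sketched roughly as follows: ask `ffprobe` for the duration, subtract the fade length, and pass the result as the `st` (start time) of an `afade=t=out` filter. The helper names and file paths are assumptions for illustration, not our production code.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// probeDuration asks ffprobe for the container duration of the file at path.
func probeDuration(ctx context.Context, path string) (time.Duration, error) {
	out, err := exec.CommandContext(ctx, "ffprobe",
		"-v", "error",
		"-show_entries", "format=duration",
		"-of", "default=noprint_wrappers=1:nokey=1",
		path,
	).Output()
	if err != nil {
		return 0, fmt.Errorf("ffprobe failed: %w", err)
	}
	return parseSeconds(string(out))
}

// parseSeconds converts ffprobe's decimal-seconds output into a Duration.
func parseSeconds(s string) (time.Duration, error) {
	secs, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
	if err != nil {
		return 0, fmt.Errorf("unexpected ffprobe output %q: %w", s, err)
	}
	return time.Duration(secs * float64(time.Second)), nil
}

// fadeOutFilter builds an afade filter that fades out the last `fade` of a
// track lasting `total`. The start is clamped to zero for very short tracks.
func fadeOutFilter(total, fade time.Duration) string {
	start := total - fade
	if start < 0 {
		start = 0
	}
	return fmt.Sprintf("afade=t=out:st=%.3f:d=%.3f", start.Seconds(), fade.Seconds())
}

func main() {
	ctx := context.Background()
	total, err := probeDuration(ctx, "compiled.mp3") // hypothetical input path
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd := exec.CommandContext(ctx, "ffmpeg", "-i", "compiled.mp3",
		"-af", fadeOutFilter(total, 2*time.Second), "faded.mp3")
	if out, err := cmd.CombinedOutput(); err != nil {
		fmt.Printf("ffmpeg failed: %v\n%s", err, out)
	}
}
```

The cost is visible in the sketch: the audio must be fully compiled before it can be probed, and the fade then needs another full encoding pass.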
The solution
What if there was an easy way to work with audio in Golang, without the hassle of dealing with command-line tools, FFmpeg, or struggling with clunky filters and missing features?
We wanted something a bit more native to Go than Fluent FFmpeg, with the added benefit of offloading work to a distributed but transparent edge network so that our own server isn't overloaded.
That's why we built Podio – a simple, powerful audio processing library for Go. Podio abstracts the complexity of audio transformations and offloads the heavy lifting to the cloud, allowing you to easily concatenate, loop, fade, and manipulate audio. With an intuitive API and no server management required, you can focus on building your app while Podio handles the audio processing.
The above code rewritten with Podio:
func writeVoiceFragment(ctx context.Context, filepath string, group []models.WorshipContent, backgroundMusic string) error {
	voices := []podio.AudioBuilder{}
	for _, content := range group {
		remoteUrl := models.GetSoundtrack(content.Type, content.Data)
		voices = append(voices, podio.Remote(remoteUrl).PadLeft(2*time.Second))
	}
	ab := podio.Concat(voices...).PadRight(2 * time.Second).WithBackground(
		podio.Remote(backgroundMusic).Volume(0.25),
	)
	// os.Create opens the file for writing (os.Open would be read-only)
	outFile, err := os.Create(filepath)
	if err != nil {
		return err
	}
	defer outFile.Close()
	if err := podio.NewClient(os.Getenv("PODIO_API_KEY")).Compile(ctx, podio.MP3, ab, outFile); err != nil {
		return err
	}
	return nil
}
Carrying this across our entire codebase deleted about 335 lines of code, while improving our audio compilation speed by more than 2.4x.
That's not all: we also get features that would have been a headache with FFmpeg. Say, for example, we wanted the duration of the final result.
With Podio, it's simple: pass a *time.Duration to SaveDuration, and Podio will fill it in when compiling.
func writeVoiceFragment(ctx context.Context, filepath string, group []models.WorshipContent, backgroundMusic string) error {
	...
	var finalDuration time.Duration
	ab := podio.Concat(voices...).PadRight(2 * time.Second).WithBackground(
		podio.Remote(backgroundMusic).Volume(0.25),
	).SaveDuration(&finalDuration)
	...
	if err := podio.NewClient(os.Getenv("PODIO_API_KEY")).Compile(ctx, podio.MP3, ab, outFile); err != nil {
		return err
	}
	fmt.Println("The final duration is:", finalDuration)
	return nil
}
What about fading in/out? Just as simple:
ab := podio.
	Concat(voices...).PadRight(2 * time.Second).
	WithBackground(podio.Remote(backgroundMusic).Volume(0.25)).
	FadeIn(2 * time.Second).
	FadeOut(2 * time.Second).
	SaveDuration(&finalDuration)
What if, for some reason, we wanted the duration of each voice as well?
func writeVoiceFragment(ctx context.Context, filepath string, group []models.WorshipContent, backgroundMusic string) error {
	voiceDurations := make([]time.Duration, len(group))
	voices := []podio.AudioBuilder{}
	for i, content := range group {
		remoteUrl := models.GetSoundtrack(content.Type, content.Data)
		voices = append(voices,
			podio.
				Remote(remoteUrl).
				SaveDuration(&voiceDurations[i]).
				PadLeft(2*time.Second),
		)
	}
	var finalDuration time.Duration
	ab := podio.
		Concat(voices...).PadRight(2 * time.Second).
		WithBackground(podio.Remote(backgroundMusic).Volume(0.25)).
		FadeIn(2 * time.Second).
		FadeOut(2 * time.Second).
		SaveDuration(&finalDuration)
	// os.Create opens the file for writing (os.Open would be read-only)
	outFile, err := os.Create(filepath)
	if err != nil {
		return err
	}
	defer outFile.Close()
	if err := podio.NewClient(os.Getenv("PODIO_API_KEY")).Compile(ctx, podio.MP3, ab, outFile); err != nil {
		return err
	}
	fmt.Println("The final duration is:", finalDuration)
	return nil
}
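If you're curious how a pointer-based call like SaveDuration can fit into a fluent API, here is a toy sketch of the general pattern. It is not Podio's implementation: the `node` type and every function on it are hypothetical stand-ins, and clips are given a fixed length up front instead of being fetched and decoded.

```go
package main

import (
	"fmt"
	"time"
)

// node is a toy audio description: a leaf with a known length, or a
// sequence of children. It is unrelated to Podio's real types.
type node struct {
	length   time.Duration
	children []node
	saveTo   *time.Duration // where to write the resolved duration, if set
}

func clip(length time.Duration) node { return node{length: length} }

func concat(parts ...node) node { return node{children: parts} }

// padLeft and padRight wrap the node together with a stretch of silence.
func (n node) padLeft(d time.Duration) node  { return concat(clip(d), n) }
func (n node) padRight(d time.Duration) node { return concat(n, clip(d)) }

// saveDuration only records the destination; nothing is computed yet.
func (n node) saveDuration(dst *time.Duration) node {
	n.saveTo = dst
	return n
}

// compile walks the tree, sums the durations, and fills every pointer
// that was registered along the way.
func (n node) compile() time.Duration {
	total := n.length
	for _, c := range n.children {
		total += c.compile()
	}
	if n.saveTo != nil {
		*n.saveTo = total
	}
	return total
}

func main() {
	var voice, total time.Duration
	track := concat(
		clip(90*time.Second).saveDuration(&voice).padLeft(2*time.Second),
	).padRight(2 * time.Second).saveDuration(&total)
	track.compile()
	fmt.Println("voice:", voice, "total:", total)
}
```

The ordering matters in this sketch: the destination is registered before padding is added, so the saved voice duration excludes the silence around it, while the duration saved on the outer node covers everything.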
Conclusion
Switching to Podio has significantly streamlined our audio processing at Prayershub. What once took complex FFmpeg pipelines and 335 lines of code is now simplified into just a few method calls using Podio’s intuitive API.
By offloading heavy lifting to a distributed network, we avoided server strain while scaling effortlessly. This not only saved us time but also improved performance and the user experience.
If you're tired of dealing with FFmpeg complexities and want a more efficient, Go-native(ish) solution, Podio is the answer.