🚀 Speeding Up MP3 Compilation by 380%


We're excited to share that we've made audio compilation for Prayershub 3.8x faster (measured by median compile time), thanks to caching strategies implemented in Golang.

Why Is This Important?

Prayershub is designed to foster meaningful worship by letting users create, share, and join worship sessions effortlessly. Central to the platform are "worships": curated playlists that mix several content types, including:

  • Songs
  • Scriptures
  • Prayers

The Use Case for Audio Compilation

When users add content to a worship session, all audio files are mixed and saved into a single track. This serves three essential purposes:

  1. Reliable Playback: iOS Safari's autoplay restrictions make it unreliable to start the next track automatically, so a single continuous track avoids broken transitions between items.
  2. Integrated Content: Scriptures and prayers are interleaved with background music for a cohesive worship experience.
  3. Broadcasting Capability: Worships can be synchronized across users, so everyone follows along in real time.

The Current Design: 3 Stages

Stage 1: Concurrent Fragment Compilation: Each fragment of the worship session (a song, scripture, or prayer) is compiled concurrently. We use goroutines so that fragments are processed in parallel across multiple CPU cores.

Stage 2: Fragment Concatenation: Once all fragments are compiled, we use FFmpeg to concatenate them, in order, into a single audio file with smooth transitions between the different content types.

Stage 3: Upload to S3: The resulting audio file is then uploaded to Amazon S3 for storage and easy access.

At any point in this process, the user can edit their worship again. When that happens, ctx is canceled and a recompilation is triggered. This is another area we plan to improve in the future.

This structured approach ensures that we can deliver a high-quality worship experience while maintaining efficiency in our backend processes.

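// error handling omitted for brevity
// needs: context, io, golang.org/x/sync/errgroup (plus project packages)
func CompileWorship(ctx context.Context, worship models.Worship) string {
	contents := worship.Contents()

	// Pre-size the slice so each goroutine writes its own index
	// (assigning into an empty slice would panic).
	fragments := make([]io.Reader, len(contents))

	// Stage 1: compile each {song,scripture,prayer} fragment concurrently
	g, gCtx := errgroup.WithContext(ctx)
	for fragI, fragment := range contents {
		fragI, fragment := fragI, fragment // per-iteration copies (needed before Go 1.22)
		g.Go(func() error {
			fragments[fragI] = compileFragment(gCtx, fragment)
			return nil
		})
	}
	g.Wait()

	// Stage 2: concatenate fragments, in order, with ffmpeg
	concattedAudio, _ := ffmpegConcat(fragments)

	// Stage 3: upload to S3
	path, _ := utils.UploadToS3(concattedAudio, "audio/mpeg")
	return path
}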

Performance Measurement

To evaluate performance, we measure how long each worship takes to compile, in seconds, relative to its length. The baseline results:

  • Average Compilation Time: 8.62 seconds
  • Median Compilation Time: 14.102 seconds

Technique #1: Cache downloads (~1.2x speedup)

We introduced a caching mechanism that stores frequently accessed files for longer periods and removes infrequently used files to optimize storage.
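The post does not say which eviction policy it uses. Least-recently-used (LRU) eviction is one simple approximation of "keep frequently accessed files, drop rarely used ones"; this sketch assumes a fixed capacity counted in files, and lru and Touch are hypothetical names. The caller would delete each evicted file from disk.

```go
package main

import "container/list"

// lru (hypothetical) tracks cached paths in recency order and evicts the
// least recently used one once capacity is exceeded.
type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // path -> its node in order
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), entries: map[string]*list.Element{}}
}

// Touch marks path as used. If that pushes the cache over capacity, it
// returns the evicted path and ok=true.
func (c *lru) Touch(path string) (evicted string, ok bool) {
	if el, found := c.entries[path]; found {
		c.order.MoveToFront(el)
		return "", false
	}
	c.entries[path] = c.order.PushFront(path)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		key := oldest.Value.(string)
		delete(c.entries, key)
		return key, true
	}
	return "", false
}
```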

CacheClient Implementation

We created a generic CacheClient that checks for cached files before downloading new ones:

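// Cache counters, reported in "Cache results" below
var cacheHit, cacheMiss, downloadHits, mustDownload int

type CacheClient struct {
	// Map a remote path to its cached (or in-flight) download
	units map[string]*Unit
}

func Init() *CacheClient {
	client := &CacheClient{
		units: map[string]*Unit{},
	}
	return client
}

// Download returns the local path of the cached copy of path,
// downloading it first if needed.
func (client *CacheClient) Download(path string) string {
	// race handling omitted
	unit, ok := client.units[path]
	if !ok {
		// Neither cached nor downloading: start the download ourselves
		cacheMiss++
		mustDownload++
		unit = NewUnit(path)
		client.units[path] = unit
		unit.StartDownload(path)
		return unit.Wait()
	}
	if unit.IsDownloading() {
		// Another caller is already downloading it: wait for that result
		cacheMiss++
		downloadHits++
		return unit.Wait()
	}
	// Already downloaded: serve the cached file
	cacheHit++
	return unit.Get()
}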

Download returns the absolute path to a temporary local file. FFmpeg's -i accepts a local file path just as it accepts a URL, so the cached file drops straight into our existing FFmpeg commands.

After a test run that recompiled every worship:

  • Average Compilation Time: 7.41 seconds (1.16x speedup)
  • Median Compilation Time: 11.53 seconds (1.22x speedup)

That is roughly a 20% improvement.

Although one might expect caching downloads to save the most time, our measurements showed that most of the time is spent re-encoding (73.48% on average), not downloading:

  • Average time spent downloading: 0.163 seconds
  • Median time spent downloading: 0.042 seconds
  • Average time spent re-encoding: 4.027 seconds
  • Median time spent re-encoding: 2.411 seconds

Cache results

  • Cache hits (found resource in cache): 327
  • Cache misses (didn't find resource in cache): 416
  • Download hits (found resource already downloading): 101
  • Download failures (failed downloads): 0
  • Must download (resource neither cached nor downloading): 315

Every cache miss is either a download hit or a must-download: 101 + 315 = 416.
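The counters in the CacheClient snippet are plain ints, which would race when fragments compile concurrently. A small sketch using sync/atomic (atomic.Int64 needs Go 1.19 or newer) also derives the hit rate; with the numbers above it is 327 / (327 + 416) ≈ 44%. cacheStats is a hypothetical type.

```go
package main

import "sync/atomic"

// cacheStats (hypothetical) holds the cache counters; atomic.Int64 makes
// them safe to increment from concurrent goroutines.
type cacheStats struct {
	hits, misses, downloadHits, mustDownload atomic.Int64
}

// hitRate is the share of lookups served straight from the cache.
func (s *cacheStats) hitRate() float64 {
	hits, misses := s.hits.Load(), s.misses.Load()
	if hits+misses == 0 {
		return 0
	}
	return float64(hits) / float64(hits+misses)
}
```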

Every downloaded resource is then re-encoded with a hard-coded preset. So although we had eliminated redundant downloads, we were still re-encoding the same local files again and again.

Technique #2: Caching Downloads and Re-Encodes

To go further, we cached re-encodes along with downloads. Instead of copying the HTTP response straight to a file, we stream it through an encoder, so the file that lands in the cache is already re-encoded:

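package main

import (
	"io"
	"net/http"
	"os"
)

func (unit *Unit) StartDownload(path string) string {
	// ...
	// error handling omitted

	r, _ := http.Get(path)
	defer r.Body.Close()

	tmpFile, _ := os.CreateTemp("", "encoded-*.mp3")
	defer tmpFile.Close()

	// Previously: io.Copy(tmpFile, r.Body), which cached the raw download.
	// Now the response is re-encoded on its way to disk, so the cached
	// file is already in our preset format.
	if err := EncodeAudio(tmpFile, r.Body); err != nil {
		// ...
	}

	// ...
	return tmpFile.Name() // path of the cached, encoded file
}

// EncodeAudio re-encodes the audio read from in and writes it to out.
func EncodeAudio(out io.Writer, in io.Reader) error {
	// ...
}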

Golang's io.Reader and io.Writer interfaces made this a drop-in change: EncodeAudio reads the response body and writes to the file exactly where io.Copy did.

Recompiling again, compared with the original baseline:

  • Average Compilation Time: 5.05 seconds (1.71x speedup)
  • Median Compilation Time: 3.71 seconds (3.80x speedup)

Cache invalidation

Cache invalidation is notoriously hard, so we sidestep it: all remote files are treated as immutable. When an admin updates a song's soundtrack, the original file is left untouched and a new file is created instead.

Because of this, the cache can key resources by the soundtrack (a string) rather than by the song's entity ID. The old soundtrack stays in the cache but is no longer requested, so it is eventually evicted.
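A minimal sketch of keying by soundtrack rather than by entity ID, assuming a Song type with an immutable Soundtrack URL (both hypothetical). Hashing the URL is an extra assumption that keeps the key safe to use as a file name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Song is a hypothetical model: Soundtrack points at an immutable remote
// file, and updating a song's audio means pointing it at a new file.
type Song struct {
	ID         int
	Soundtrack string
}

// cacheKey depends only on the soundtrack, never on the song ID, so a new
// soundtrack gets a new key and the old entry simply stops being requested.
func cacheKey(s Song) string {
	sum := sha256.Sum256([]byte(s.Soundtrack))
	return hex.EncodeToString(sum[:])
}
```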

Future endeavors

Possible technique #3: Concurrently upload each fragment

Because our network is fast, we initially chose not to use S3 Multipart Upload. Instead, once all fragments are compiled, we run them through FFmpeg's concat demuxer and stream the final result sequentially to S3.

An alternative would be Multipart Upload: each fragment could be compiled independently and uploaded as its own part as soon as it is ready, without waiting for earlier uploads to finish, and S3 would then join the parts into one object.

However, there's a limitation: S3 joins parts by raw concatenation (essentially file1 + file2), the same operation as FFmpeg's concat protocol. That only yields a valid file for formats that support file-level concatenation, such as MPEG-2 transport streams (.ts files).

Since we target web browsers, our audio files must be .mp3 or .wav, which do not support proper seeking when concatenated this way. S3 also requires every part except the last to be at least 5 MiB, which short fragments may not reach. Consequently, this option is not feasible for our use case.

Possible technique #4: Task Interleaving

Task interleaving overlaps sequential steps, starting a later step before the earlier one has fully finished. Our current audio compilation process has three main steps:

  1. Compile all fragments
  2. Concat all fragments
  3. Upload to S3

Interleaving the second and third steps is straightforward: we can stream the output of the FFmpeg concat process directly into s3.Upload.

Interleaving the first and second steps is harder. Fragments finish compiling in no particular order, yet they must be concatenated in playlist order, so each completed fragment has to be fed into an already-running FFmpeg process at the right moment. Doing this while keeping the order intact and still maximizing throughput takes careful coordination.

This will be the focus of our next article, where we'll explore how to tightly integrate these steps to maximize speed and efficiency.


At Poxate, we specialize in backend optimization and efficiency improvements, particularly in Golang and Node.js. We're dedicated to improving the performance and stability of your backend systems so they run smoothly and efficiently. If you're looking to streamline your processes and boost your application's performance, we're here to help. Get in touch.

© 2024 Poxate. All rights reserved.