UI Automation: Synthesizing 2,616 MIDI Files to WAV at Scale
I'm the head of the technology team for a small Haitian church and the lead developer of a worship app called Prayershub, and we assist the Praise Team with their singing practices.
In a praise session, there are often four vocal parts sung by the singers: soprano, alto, tenor, and basse, with soprano being the highest range and basse the lowest.
To assist our singers with their performance, we create a separate audio track for each voice, like the examples below for the French adaptation of "Count Your Blessings":
Quand le vol de la tempête … - 527:
- Soprano:
- Alto:
- Tenor:
- Basse:
Credit to Troisanges.org, which hosted the MIDI files for all 654 songs in the SDA French hymnal (Hymnes et louanges) and made this possible.
However, during training, the producer noticed that it was difficult for our singers to follow their individual voice parts along with the music. For example, the Tenor track is nearly unrecognizable compared with the other voices, so the singers couldn't find their place and didn't know where each word began and ended.
So, to aid their training, we decided to have the tracks sung by an artificial human voice rather than played on a piano. Yes, the tenor track would still sound substantially different (which is okay; that's how music works), but hopefully it would be a bit easier to pick out the words and follow along.
Here are the four voices for the same song, synthesized with Synthesizer V Studio 2 Basic:
- Soprano:
- Alto:
- Tenor:
- Basse:
We tried this sample with the group, and it seemed to help quite a bit, so I got the go-ahead to create voices for every song in the hymnal.
To get a sense of how big this conversion job was, here are some back-of-the-envelope calculations:
- The Hymnes et louanges has 654 songs (.midis)
- Each .midi has 4 voices
- Each voice on average plays 4 verses
- Each verse on average is about 40 seconds
- So (654 .midis) * (4 voices) * (4 verses) * (40 seconds) = 418,560 seconds ≈ 116 hours of synthesized audio
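The same estimate as a few lines of C#, which makes it easy to plug in another hymnal's numbers. The constants are the rough averages from the list above, not measurements, and the class name ScaleEstimate is purely illustrative:

```csharp
using System;

public static class ScaleEstimate
{
    // Rough averages from the list above (not measured values).
    public const int Songs = 654;
    public const int VoicesPerSong = 4;    // soprano, alto, tenor, basse
    public const int VersesPerVoice = 4;
    public const int SecondsPerVerse = 40;

    // One MIDI file per voice after splitting.
    public static int MidiFiles() => Songs * VoicesPerSong;

    // Total seconds of audio to synthesize.
    public static int TotalSeconds() => MidiFiles() * VersesPerVoice * SecondsPerVerse;

    public static double TotalHours() => TotalSeconds() / 3600.0;
}
```

With these inputs it works out to 2,616 MIDI files and 418,560 seconds, a little over 116 hours.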
Preparations
As mentioned before, I already had the 654 MIDI files for all the songs in the hymnal. The next step was to import them into SynthV (Synthesizer V Studio 2 Basic).
However, in the free version of SynthV, the default voicebank only synthesizes one track of a MIDI file, and I needed all four tracks synthesized.
So I found a GUI tool built with C# (VirtuosicAI/MIDI-Splitter-Lite) that splits each track into its own MIDI file, then forked it and converted it into a CLI tool so the process could be automated.

I then used this CLI tool to split the 654 MIDI files into 2,616 MIDI files, laid out like this:
001/soprano.mid
001/alto.mid
001/basse.mid
001/tenor.mid
002/soprano.mid
002/alto.mid
002/basse.mid
002/tenor.mid
...
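The post calls GetJobs() and uses a Job type but never shows either. As a minimal sketch, the folder layout above could be turned into jobs like this, assuming a hypothetical Job record (Index, MidiPath, SavePath) and that each WAV should mirror its MIDI's song folder and voice name under a separate output root:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical job shape; the post only shows that Job has a SavePath and an index.
public record Job(int Index, string MidiPath, string SavePath);

public static class JobLoader
{
    // Maps <inputRoot>/001/soprano.mid to <outputRoot>/001/soprano.wav, and so on.
    public static List<Job> GetJobs(string inputRoot, string outputRoot)
    {
        return Directory
            .EnumerateFiles(inputRoot, "*.mid", SearchOption.AllDirectories)
            .OrderBy(path => path, StringComparer.Ordinal) // stable order across runs
            .Select((midiPath, index) =>
            {
                string relative = Path.GetRelativePath(inputRoot, midiPath);
                string savePath = Path.Combine(outputRoot, Path.ChangeExtension(relative, ".wav"));
                return new Job(index, midiPath, savePath);
            })
            .ToList();
    }
}
```

Sorting the paths keeps job indexes stable between runs, which matters if you ever resume a partially finished batch.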
The next step was to import these MIDI files into SynthV and synthesize them. Here's a short recording of that process done by hand.
Planning
I'm available for work! If you're looking to have some of your business processes automated, or have any general need of software, feel free to contact me at bookofcooks123@gmail.com.
When it comes to automation, there are three popular ways of going about it:
- API: the tool exposes SDKs, REST APIs, or libraries
- CLI: the application also ships a command-line version for direct tasks
- UI Automation: simulating user interactions on the GUI
APIs and CLIs are the best ways to interact with software automatically. You can run tasks in the background, with far lower overhead than driving a GUI.
However, most GUI applications don't offer programmatic access at all; the most widely used (and somewhat related) exception I can think of is the MuseScore CLI. So I wasn't surprised to find that SynthV offered neither option, and I moved on to UI Automation.
Libraries & Tools
I'm using FlaUI, a UI automation library for .NET, which provides a nice abstraction layer over UIA2 and UIA3. Basic usage looks like this:
using FlaUI.UIA3;
var app = FlaUI.Core.Application.Launch("notepad.exe");
using (var automation = new UIA3Automation())
{
var window = app.GetMainWindow(automation);
Console.WriteLine(window.Title);
...
}
To help me inspect applications, I used Accessibility Insights, a tool open-sourced by Microsoft for finding and fixing accessibility issues. Incidentally, it uses the same underlying UIA APIs as FlaUI to inspect software, which made it immensely helpful for debugging.
Here's a short preview of using this tool on File Explorer:
Execution
Now that I've covered the basics, tools, and libraries, I'll walk through each step of actually automating this. By the end, the actions I perform manually with the mouse will be fully simulated by the program.
The process breaks down into three main steps:
- Import the .midi file
- Change voice to the "Eri" voice bank
- Export a .wav
I'll break these steps down further, one for each mouse click (see video above):
- Import the .midi file
  - Click on File in the toolbar
  - Click on Import in the menu
  - Click on Discard in the "Discard unsaved changes" box (except for the first imported .midi)
  - Enter the .midi file path into the File name box in File Explorer (different from the video)
  - Press OK in the Import Midi box
- Change the voice to the "Eri" voice bank
  - Click on (No default voice)
  - Click on Eri (FLT)
- Export a .wav
  - Enter the directory path into the Destination Folder box directly (skips unnecessary interactions)
  - Enter the voice name into the File Name box
  - Press the Bounce to Files button
  - Wait until the button no longer says Abort (pending...), which signals completion
- Back to step 1
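The outline is essentially a loop over the jobs. The real RenderJobs and Worker.Do aren't shown in full, so here is a skeleton of that loop in which importMidi, changeVoice, and exportWav are hypothetical delegates standing in for the FlaUI code in the steps below:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Pipeline
{
    // Runs the three groups of clicks once per job, strictly in order.
    // The delegates are placeholders, not functions from the post.
    public static async Task RenderAll(
        IEnumerable<(string MidiPath, string SavePath)> jobs,
        Func<string, Task> importMidi,
        Func<Task> changeVoice,
        Func<string, Task> exportWav)
    {
        foreach (var (midiPath, savePath) in jobs)
        {
            await importMidi(midiPath); // File > Import... > Discard? > path > OK
            await changeVoice();        // (No default voice) > Eri (FLT)
            await exportWav(savePath);  // Destination Folder / File Name > Bounce to Files > wait
            // ...then back to step 1 with the next job
        }
    }
}
```

Keeping the steps behind delegates like this lets you dry-run the loop with fakes before pointing it at a live SynthV window.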
Launching SynthV
To launch Synthesizer V Studio 2 Basic, simply create an automation instance and launch the application:
private async Task Run()
{
// Get jobs (the .midis)
List<Job> jobs = GetJobs();
Console.WriteLine($"Fetched {jobs.Count} jobs");
var app = FlaUI.Core.Application.Launch("PATH_TO_SYNTHV.exe");
// Launch automation
using UIA3Automation automation = new();
Window window = app.GetMainWindow(automation);
await RenderJobs(automation, window, jobs);
}
It's not shown here, but I created a function PrepareWindow() that looks for an existing SynthV window and, if none is found, launches a new one. That way, I could hot-reload without waiting for SynthV to start up every time.
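Every step that follows waits for UI elements with a Tools.Until(...) helper that the post never shows. Judging from its call sites (each one is followed by `?? throw`, so it presumably returns null on failure), a minimal sketch could look like this. The 60-second timeout and 200 ms poll interval are assumptions borrowed from AttemptToDiscard further down, not the author's actual values:

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Tools
{
    // Polls `probe` until it returns a non-null value or the timeout elapses.
    // Returns null on timeout so call sites can write `?? throw new Exception(...)`.
    public static async Task<T?> Until<T>(Func<Task<T?>> probe, int timeoutMs = 60_000, int pollMs = 200)
        where T : class
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (true)
        {
            T? result = await probe();
            if (result != null)
                return result; // element found
            if (stopwatch.ElapsedMilliseconds >= timeoutMs)
                return null;   // give up and let the caller decide how to fail
            await Task.Delay(pollMs);
        }
    }
}
```

Polling beats a fixed sleep here because SynthV opens menus and dialogs at unpredictable speeds; the loop returns as soon as the element appears.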
Step #1: Click on 'File' in the toolbar
internal class Worker {
private Window window;
private UIA3Automation automation;
public Worker(Window window, UIA3Automation automation)
{
this.window = window;
this.automation = automation;
}
public async Task Do(Job job)
{
Directory.CreateDirectory(Path.GetDirectoryName(job.SavePath)!);
Console.WriteLine("1. Importing Midi file");
await ImportMidiFile();
}
// IMPORTANT PART OVER HERE
private async Task ImportMidiFile() {
// Focus is required for the "File" menu option to be available
window.Focus();
MenuItem fileMenuItem = await Tools.Until(async () => window.FindFirstChild(b => b.ByName("File"))?.AsMenuItem())
?? throw new Exception("Cannot find File menu");
fileMenuItem.Click();
}
}
Step #2: Click on Import in the menu
When we open the File menu in SynthV, all the other UI elements disappear, leaving only the menu options accessible.
This differs from applications like Camtasia Studio, which render the menu while keeping the elements behind it accessible.
Weird, but it should be inconsequential.
internal class Worker {
private async Task ImportMidiFile() {
// Focus is required for the "File" menu option to be available
window.Focus();
MenuItem fileMenuItem = await Tools.Until(async () => window.FindFirstChild(b => b.ByName("File"))?.AsMenuItem())
?? throw new Exception("Cannot find File menu");
fileMenuItem.Click();
// Find the "Import..." menu option
MenuItem importMenuItem = await Tools.Until(async () => window.FindFirstChild(b => b.ByName("Import..."))?.AsMenuItem())
?? throw new Exception("Cannot find File > Import... menu");
importMenuItem.Click();
}
}
However, after running this function, I got an exception: "Cannot find File > Import... menu". Strange. I tried logging all of the window's child elements, but got an empty list.
Perhaps the window needed to be refocused? That didn't work either. Accessibility Insights uses the same UIA APIs as FlaUI, so both tools should see the same elements, but for some reason, they didn't.
I searched through the UIA documentation and explored the window's ModalWindow, Popup, etc., but to no avail. Eventually, I had the idea of logging all the windows on the desktop while the File menu was open, which revealed that there were two SynthV windows.
foreach (var child in window.Parent.FindAllChildren())
{
Console.WriteLine(child);
}
Console.ReadKey();
// Output:
AutomationId:, Name:Synthesizer V Studio Basic, ControlType:window, FrameworkId:JUCE
AutomationId:Synthesizer V Studio Basic, Name:Synthesizer V Studio Basic, ControlType:window, FrameworkId:JUCE
AutomationId:, Name:, ControlType:pane, FrameworkId:Win32
AutomationId:, Name:, ControlType:pane, FrameworkId:Win32
AutomationId:, Name:, ControlType:pane, FrameworkId:Win32
AutomationId:, Name:, ControlType:pane, FrameworkId:Win32
AutomationId:, Name:Taskbar, ControlType:pane, FrameworkId:Win32
AutomationId:, Name:System tray overflow window., ControlType:pane, FrameworkId:Win32
AutomationId:, Name:Program.cs - auto_synth_v - Visual Studio Code, ControlType:window, FrameworkId:Win32
AutomationId:, Name:Program Manager, ControlType:pane, FrameworkId:Win32
Only then did I realize that my missing menu was hiding in a new window, separate from the main window. And because both windows share the same name, there was no easy way to tell which one contained the menu. So I had to search inside every window called Synthesizer V Studio Basic (I'd eventually have multiple parallel workers) until I found my menu item.
internal class Worker {
private async Task ImportMidiFile() {
// Focus is required for the "File" menu option to be available
window.Focus();
MenuItem fileMenuItem = await Tools.Until(async () => window.FindFirstChild(b => b.ByName("File"))?.AsMenuItem())
?? throw new Exception("Cannot find File menu");
fileMenuItem.Click();
// Find the "Import..." menu option
MenuItem importMenuItem = await Tools.Until(async () =>
{
// Search through all SynthV windows in the desktop
return automation.GetDesktop()
.FindAllChildren(b => b.ByName("Synthesizer V Studio Basic"))
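.Select(w => w.FindFirstChild(b => b.ByName("Import...")))
// Skip windows without the menu item instead of stopping at the first (possibly null) result
.FirstOrDefault(item => item != null)?
.AsMenuItem();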
})
?? throw new Exception("Cannot find File > Import... menu");
importMenuItem.Click();
}
}
Step #3: Click on Discard in the Discard unsaved changes box
There are two ways of going about this: the simple way and the hard way. You see, SynthV only asks you to discard changes if you've already imported another project.
Now, remember how I mentioned that my application looks for an existing window of SynthV to speed up development? Well, that also means it's possible I'll have unsaved changes (from a previous run) on that window, even if my current instance hasn't begun doing any work.
The simple solution is to always launch a new window on every run and dismiss the dialog only if job.index > 0. The complex solution is to check whether the dialog exists and, if it does, dismiss it.
I went with the complex method, as I wanted my app (AutoSynthV) to be resilient to these things.
private async Task ImportMidiFile() {
// Press File
// Press File > Import...
//...
await AttemptToDiscard();
}
// Every 200 milliseconds, it performs a series of checks:
// Check 1: If the "Discard" dialog exists, dismiss it and return true
// Check 2: If the File Explorer window exists, return false (there were no changes to dismiss)
// Check 3: If it's been more than a minute, throw an exception, otherwise, loop back to Check 1
//
// Returns true if it pressed the "Discard" button
private async Task<bool> AttemptToDiscard()
{
var desktop = automation.GetDesktop();
Stopwatch stopwatch = new();
stopwatch.Start();
while (stopwatch.ElapsedMilliseconds < 60_000)
{
await Task.Delay(200);
var discardBtn = window.Parent
.FindFirstChild(b => b.ByName("Import..."))?
.FindFirstDescendant(b => b.ByName("Discard"))?
.AsButton();
if (discardBtn != null)
{
discardBtn.Click();
return true;
}
var fileExplorer = desktop
.FindAllChildren()
.Where(w => w.ControlType == ControlType.Window && w.Name == "Import...")
.FirstOrNull()?
.AsWindow();
if (fileExplorer != null)
{
// No need to discard, continue happily :)
return false;
}
}
throw new TimeoutException("Failed to find Discard button, and File Explorer never popped up");
}
Step #4: Enter .midi filepath into Filename box in File Explorer
Originally, I did this by entering the file path into the File name box and then pressing the Open button. However, while writing this blog post, I realized it'd be easier and less error-prone to simply press Enter after typing the path. That way, I can skip searching for the button entirely.
private async Task EnterFileInExplorer(string filepath)
{
var desktop = automation.GetDesktop();
Window fileExplorer = await Tools.Until(async () =>
{
return desktop.FindAllChildren().Where(w => w.ControlType == ControlType.Window && w.Name == "Import...").FirstOrNull()?.AsWindow();
}) ?? throw new Exception("Could not find File Explorer 'Import' window");
fileExplorer.Focus();
AutomationElement[] importComboBox = await Tools.Until(async () =>
{
return fileExplorer
.FindAllChildren(b => b.ByControlType(ControlType.ComboBox))
.Where(b => b.Name == "File name:")
.FirstOrNull()?
.FindAllChildren();
}) ?? throw new Exception("Cannot find box that contains filepath box");
TextBox fileNameBox = await Tools.Until(async () => importComboBox.FirstOrNull()?.AsTextBox())
?? throw new Exception("Cannot find textbox to enter the Midi's File Path.");
// Wrap in quotes so File Explorer treats it as a single absolute path
fileNameBox.Text = $"\"{filepath}\"";
// Type() presses and releases the key; Press() alone only sends key-down
Keyboard.Type(FlaUI.Core.WindowsAPI.VirtualKeyShort.ENTER);
// No need to search for the Okay button anymore
// Button okayButton = await Tools.Until(async () =>
// {
// return fileExplorer
// .FindAllChildren(b => b.ByControlType(ControlType.SplitButton))
// .Where(b => b.Name == "Open")
// .FirstOrNull()?.AsButton();
// }) ?? throw new Exception("Could not find 'Okay' button in File Explorer");
// okayButton.DragClick();
}
Step #5: Press OK in Import Midi box
This one is self-explanatory.

Button importOkBtn = await Tools.Until(async () =>
{
return automation.GetDesktop()
.FindFirstChild(b => b.ByName("Import Midi"))?
.FindFirstDescendant(b => b.ByName("OK"))?
.AsButton();
}) ?? throw new Exception("Could not find 'Import Midi > Ok' button");
importOkBtn.DragClick();
Step #6: Change the voice to Eri (FLT)

Clicking the first text box was pretty easy, but the second click was a bit trickier. It's the same deal as the File menu: that menu opens in a completely separate window, so we have to search outside the main window. Other than that, nothing too complicated.
private async Task ChangeVoice()
{
// Click on "(No default voice)"
TextBox noDefaultVoice = await Tools.Until(async () =>
{
return window
.FindFirstChild((b) => b.ByText("(No default voice)"))?
.AsTextBox();
}) ?? throw new Exception("Cannot find (No default voice)");
noDefaultVoice.Click();
// Choose "Eri (FLT)"
MenuItem eriVoice = await Tools.Until(async () =>
{
return window.Parent
.FindAllChildren(b => b.ByName("Synthesizer V Studio Basic"))
.SelectMany(w => w.FindAllChildren(by => by.ByControlType(ControlType.MenuItem)))
.Where(menuItem => menuItem.AsMenuItem().Text == "Eri (FLT)")
.FirstOrNull()?
.AsMenuItem();
}) ?? throw new Exception("Cannot find ERI Voice");
eriVoice.Click();
}
Step #7: Toggling the Render Tab
This step was extraordinarily difficult to execute, because the input box is hidden behind the Render Tab.
So? Just click on the Render Tab?
What if it's already opened?
Well, check if it's opened.
You can't! The UI barely has any accessibility labels. It's like traveling with a map that has state borders but no labels.
This isn't shown in the video, but all of SynthV's 50 to 100 UI elements sit at the top level. There's no tree structure whatsoever, and most of the UI is made of unnamed custom boxes, which makes it nearly impossible to navigate programmatically.
I suspect this happens because SynthV doesn't use WPF or some other native Windows UI toolkit that comes with accessibility built in. Rather, it uses a C++ desktop framework called JUCE, which likely renders the UI with its own drawing engine. Accessibility support therefore has to be tacked on rather than coming out of the box.
P.S.: I haven't really studied JUCE's internals; I'm assuming it works roughly like Flutter's custom rendering engine for mobile apps.
In any case, this has to be solved. This problem can be split into two parts:
- How do we know the Render Tab is opened?
- How do we open the Render Tab if it's closed?
How do we know the Render Tab is opened?
As you can see from the video, we can't inspect the "Render" title bar, as it just shows up in Accessibility Insights as a label-less custom element, of which there are dozens.
In addition, multiple tabs can be open at once.

The best solution I could come up with was to check for the presence of the 'Bounce to Files' button. However, as I said before, SynthV's UI has no structure at all (accessibility-wise), so I can only check whether that button is present anywhere in the SynthV window.

Nevertheless, as long as 'Bounce to Files' never appears anywhere else, this is a pretty reliable way to determine whether the tab is open.
var bounceBtn = window.FindFirstChild(b => b.ByName("Bounce to Files"));
bool isTabOpened = bounceBtn != null;
if (!isTabOpened)
{
OpenRenderTab();
return null;
}
How do we open the Render Tab if it's closed?
As shown in the diagram below, all the tab buttons are rendered as GraphicalButton elements with no labels to tell them apart.

Zero, one, two, three, four! It's the 4th button!
That's a matter of perspective.

It took me a day of going back and forth on various methods before it finally hit me: all of the tab toggles are rendered sequentially, one after the other. Therefore, I could simply look for the first group of 7 consecutive GraphicalButtons. The application has other groups of adjacent buttons, but this is the only one that reaches 7, which makes the approach relatively safe.
List<AutomationElement>? FindToolbarBtns()
{
List<AutomationElement> consecutiveBtns = [];
foreach (var element in window.FindAllChildren())
{
if (element.ControlType == ControlType.Button && element.Name == "GraphicalButton")
consecutiveBtns.Add(element);
else
consecutiveBtns.Clear();
if (consecutiveBtns.Count == 7)
{
return consecutiveBtns;
}
}
return null;
}
Now that I have the list of toolbar buttons, I can get the button at index 4, click on it, and open the tab.
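FindToolbarBtns is one instance of a more general pattern: find the first streak of N consecutive items that satisfy a predicate. Here is a FlaUI-free sketch of the same logic, where plain strings stand in for automation elements (the names Runs and FindFirstRun are hypothetical):

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class Runs
{
    // Returns the first `length` consecutive items that all satisfy `match`,
    // or null if the sequence never contains such a streak.
    public static List<T>? FindFirstRun<T>(IEnumerable<T> items, Func<T, bool> match, int length)
    {
        var run = new List<T>();
        foreach (T item in items)
        {
            if (match(item))
                run.Add(item);
            else
                run.Clear(); // any non-matching element breaks the streak

            if (run.Count == length)
                return run;
        }
        return null;
    }
}
```

With FlaUI, the predicate would be the same ControlType.Button plus "GraphicalButton" check used above, the length would be 7, and the Render tab would be element 4 of the returned list.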
Conclusion (with video)
This blog post ended up much longer than expected, so I'm cutting it short and will write a Part 2 later. There's much more to cover, including optimizations to find UI elements faster and concurrency to run multiple workers in parallel.
Anyway, I love doing automation work like this, because it presents many small (and big) puzzles to solve while yielding a huge amount of value for the business or organization. Converting all these MIDI files by hand on the side would have taken months (perhaps a year) of work.
Here's a video of the AutoSynthV program in action: