Podcast.hu – Domain-Name-Driven Development

By Áron Igmándy

“Guys, we got the podcast.hu domain, what do you think the best way would be to utilize it?”

The first waves of the Hungarian podcast scene date back to the early 2010s – a few cult podcasts paved the way for the new medium, unsurprisingly with a strong tech focus and very limited reach among the general population.

Back in those times, there was no obvious entry point into the Hungarian pod scene: you were only able to find new content by following the right people, reading one of the very few articles about podcasting, and from time to time catching up with the recommendations of already known hosts.

Fast forward 10 years to 2021, and we were still in almost the same place: an emerging subculture with really valuable content, now with a bigger share of people’s attention, but still struggling with the same discoverability issues, spiced up with a global platform war in which Hungary is just a rounding error. It was a time when finding a Hungarian podcast was easier than ever, yet you never felt sure that you had seen all the content related to your interests.

That was the point when we got our opportunity from Telekom – in a very unusual form – to help the community address this and many more issues on the path to creating a mature domestic medium.

You have an unread chat message: podcast.hu?

As a creativity-first agency, we frequently talk about how the briefing process should evolve over the years for different clients. Insights, understandings, requirements, KPIs, one-pagers, long-form documents, presentations, PDFs, under- and over-researched: we have tried a lot of things and have an idea or two about what works and what does not. But this brief arrived in a very unusual form: as a WhatsApp message.

For a bit more context, it’s important to know that we have an ongoing, good and direct relationship with Telekom. We’ve created a great number of digital projects together over the years, and from time to time we pitch crazy ideas to each other, not just through the regular channels, but on the phone, between meetings at the cafeteria or over beers at TEB parties. So given our history, it came as no surprise when we found ourselves in a WhatsApp group with the Telekom marketing team, who had some very fresh news for us: “Guys, we got the podcast.hu domain, what do you think the best way would be to utilize it? Tight timeframe, limited but reasonable resources, and the goal: real impact on the industry.”

Based on the fact that you are reading these lines, you can guess that so far we are 100% on board with projects briefed this way, though I doubt the trendline will stay flat at that level forever. Anyway, let’s see how we pulled this gig off.

From a single domain to an idea

The infamous chat included only two decision makers, an account, and three field experts (creative, digital, social), and we arrived at the final vision within the first three messages after ‘hello’ (mainly building on the scars our Head of Social, Bálint, earned when he tried to find a quote he had heard in a podcast earlier).

We need a podcast hub, independent from the main platforms, which is able to search the spoken words in Hungarian podcasts. Is it possible?

Not yet, but with time, sure. The goal was ambitious, and we decided to do whatever it took to achieve it.

We didn’t have to do much research to find out that no one was even trying to do anything like this here in Hungary, and of course there’s a good reason for that. English-language transcription is now a widely available, multi-vendor market with more than impressive results: sentence separation, speaker separation, and even services that use a voice synthesizer to reverse the whole process. The market in our country, by contrast, is really thin. So much so that in our first round of tests of publicly available services, the “best” result was produced by Google’s transcription API – which is not specifically built for Hungarian, but takes a holistic linguistic approach that promises good results for more exotic languages like ours – and the second-best results were already barely readable or understandable. Rough start.

Probably the funniest transcription test result: the swear words were spot on but everything else is a complete mess.

However, we started to dig into the scene with Hungarian field experts to find industrial-grade solutions, and we got a great lead. It turned out that Magyar Telekom and T-Systems already had experience in the field: the speech interpretation behind their virtual assistant, Vanda, includes a transcription feature developed by the SpeechTex team, which had already proven itself in other areas such as transcribing state TV news for the hearing impaired. Right after the first meeting, we knew it was a match.


Things happened so quickly (as we found out, a tight timeframe meant 10 days from first contact to public reveal) that we had the first specification, cost analysis, and information architecture before we could even agree on the design of the T-shirt the Telekom team would wear when they stood up to make the announcement at Internet Hungary.

The epicenter

The news of the platform’s launch had already generated quite a lot of interest, but we managed to boost it even more with a teaser campaign lasting several weeks, using tactics that hadn’t really been used in the country before, such as tag-based targeting, which let us reach the people who wouldn’t want to miss such a launch with pinpoint accuracy – sliding right into their notifications.


When we launched podcast.hu in December, just 8 weeks after the announcement – for the time being only as a platform-independent podcast aggregator site – over 100 podcasts had already joined. It was perhaps at this point that we really got to grips with the mathematics of scaling the vision we had set out to achieve: more than 100 podcasts, two hours of content archived every two weeks, stretched over months or even years. At around 5,000 words per hour, it takes a weekly podcast a year to create content equal in size to the first four Harry Potter books. That’s a lot.
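For readers who like to see the numbers, here is a rough back-of-the-envelope sketch of that volume. Only the 5,000 words per hour, the 100 podcasts and the "two hours every two weeks" come from the text above; the one-hour weekly episode and the function name are assumptions made for illustration.

```python
# Back-of-the-envelope estimate of how much text a podcast archive produces.
# WORDS_PER_HOUR is the figure quoted in the article; everything else that is
# marked as an assumption is illustrative only.
WORDS_PER_HOUR = 5_000


def words_per_year(hours_per_release: float, releases_per_year: int,
                   words_per_hour: int = WORDS_PER_HOUR) -> int:
    """Estimated number of spoken words a single show produces in a year."""
    return round(hours_per_release * releases_per_year * words_per_hour)


# Assumption: a weekly show with one-hour episodes (52 releases a year).
weekly_show = words_per_year(hours_per_release=1.0, releases_per_year=52)

# From the article: two hours archived every two weeks (26 batches a year),
# multiplied across 100 podcasts.
whole_archive = 100 * words_per_year(hours_per_release=2.0, releases_per_year=26)

print(f"One weekly show: {weekly_show:,} words per year")
print(f"100 podcasts:    {whole_archive:,} words per year")
```

Under these assumptions a single show lands at 260,000 words a year and the whole archive at 26 million, which is the kind of scale the search feature has to be planned for.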

Podcasts, word-by-word at 1.5x speed

As we arrived at the next stage of planning with these numbers in hand, we quickly learned that transcribing a podcast is an unimaginably slow and difficult task for anyone used to millisecond-level optimizations on the web. Even with ideal hardware, a half to a third of the work involved in transcribing a couple of hours of voice-based content may consist of listening in real time. After a bit of maths, we concluded that even adding more machines and spreading the tasks across them would still yield time scales that put searching through spoken text impossibly far away. Realizing this, we decided to start with a smaller number of partner channels. That gave us the space both to evaluate the usefulness of the feature and to optimize our search at a manageable scale, so that it can cope with exponential growth in data as time goes on and new podcasts are added – a problem we are happy to have with an experimental development like this.
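The "bit of maths" can be sketched roughly as follows. This is not our actual planning sheet: it assumes that transcription time is a fixed fraction of the audio length (one reading of the "half or one third" above), and the backlog size, the worker counts and the function name are all hypothetical.

```python
def backlog_days(audio_hours: float, realtime_fraction: float,
                 workers: int, hours_per_day: float = 24.0) -> float:
    """Days needed to transcribe a backlog of audio.

    realtime_fraction is the assumed processing time per hour of audio
    (0.5 means one hour of audio takes half an hour to transcribe).
    """
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    compute_hours = audio_hours * realtime_fraction
    return compute_hours / (workers * hours_per_day)


# Hypothetical backlog: 100 podcasts with 100 hours of archive each.
backlog_hours = 100 * 100

for workers in (1, 4, 16):
    days = backlog_days(backlog_hours, realtime_fraction=0.5, workers=workers)
    print(f"{workers:>2} worker(s): about {days:.0f} days")
```

The point of the sketch is the shape of the curve rather than the exact figures: the time drops only linearly with the number of machines, while the backlog keeps growing as new episodes arrive.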

Of course, this was not the only obstacle: the model we use had been tried and tested on a wide range of vocabulary, but now it had to operate on “foreign territory”. We spent a lot of time making sure that, despite the limitations that are so far unavoidable – such as transcripts merging when speakers talk over each other – we could present search results in the UI in such a way that even a mistranscribed passage still makes sense to users searching for a topic, given the right context.

All of these shortcuts reveal how we just couldn’t wait for the AI to grow into the jacket, so we tailored one that will fit it until it grows into the task perfectly. And based on the analytics, users couldn’t wait either: we serve hundreds of transcription results every day. As I write these lines, we are close to reaching 700 fully searchable episodes. That is more than 6 million words that were buried in audio files and are now searchable by everyone, with results getting better every day.

Next episode, please

There is still a lot of room for growth on the site in terms of both usability and transcription, but seeing thousands of visitors every month tells us that the direction is right, and that together with Magyar Telekom we can finally fill the gaping hole that has been there for 10 years in one of the most valuable media in Hungary.

As of today, we are proud to say that more than three-quarters of the active Hungarian podcasts can be found in one place, on this site, easily searchable by anyone who types “Hungarian podcasts” into Google and clicks on the first link. That’s podcast.hu, the rightful epicenter of our local podcasts.