Tuesday 13 February 2024

Multi-lingual versions of natural language processing lectures

Last year I taught an introduction to Natural Language Processing at Macquarie University. I grabbed all the recordings and started editing them so that I could publish them. I’m not even halfway through the first two-hour lecture, but at least I’ve made some progress.
Here’s what’s working so far, and what isn’t.
I have “intro theme music” for the first few seconds of the videos that I publish.

https://soundcloud.com/greg-baker-574386084/another-solresol-company-theme


That theme music consists of several phrases in the musical language Solresol, an artificial language that lasted almost as long as Esperanto has. The alto flute opens with a call out (in Solresol) “today; tomorrow”, then there’s a gong because everyone needs a gong and it says something. The timpani then taps out (in Morse code) the name of my company. Meanwhile, the harp is saying (in Solresol) “wisely useful” then “wisdom - create a model”. The bells sing out (in Solresol) “behold: yours! Behold!”. The orchestral strings just fill in the background around all the pieces to make it sound complete.
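For anyone curious how a name becomes a timpani rhythm, here is a minimal sketch of Morse encoding in Python. The function name `to_morse` and the sample text "NLP" are my own placeholders for illustration (not the company name), and it only handles the letters A to Z.

```python
# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def to_morse(text: str) -> str:
    """Encode text as Morse: letters separated by spaces, words by ' / '.

    Characters outside A-Z are silently skipped (an assumption made to
    keep the sketch short).
    """
    words = text.upper().split()
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words
    )


# "NLP" is a placeholder, not the name the timpani actually plays.
print(to_morse("NLP"))  # -. .-.. .--.
```

Each dot would be a short tap and each dash a long one; the gaps between letters are what make it recognisable as Morse rather than just a drum fill.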


This all seems very appropriate theme music for a series of lectures on natural language processing, particularly for the first few, which focus on encoding text in different languages around the world.

I signed up to ElevenLabs (affiliate link http://elevenlabs.io/?from=partnerrobles3623 ) so that I could translate my lectures.
They seem to be using Whisper to do speech-to-text into English. This interacts strangely with the start of my videos, because it seems to think that I’m saying something (which indeed, I am! But in Solresol) and often hallucinates greetings or other comments like that. It then tries to overlay translated words onto the Solresol music, which just warps the sound and makes it slightly out of tune.
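If I were running the transcription myself rather than going through ElevenLabs, one possible workaround would be to discard any transcript segment that starts during the intro music. The sketch below is only that: it assumes segments shaped like the dictionaries the open-source `whisper` package returns under `result["segments"]` (with `start`, `end` and `text` keys), and the function name, the sample segments and the 10-second cutoff are all made up for illustration. I have no idea whether ElevenLabs could do anything like this internally.

```python
def drop_intro_segments(segments, intro_end_seconds):
    """Keep only the segments that start at or after the end of the intro."""
    return [seg for seg in segments if seg["start"] >= intro_end_seconds]


# Hypothetical segments, invented for this example.
segments = [
    # A greeting hallucinated over the theme music.
    {"start": 0.0, "end": 4.0, "text": " Hello and welcome."},
    # Actual speech once the lecture begins.
    {"start": 12.5, "end": 18.0, "text": " Today we look at text encoding."},
]

# The 10.0 second cutoff is an assumed intro length, not a measured one.
kept = drop_intro_segments(segments, intro_end_seconds=10.0)
print([seg["text"] for seg in kept])
```

With the sample data only the second segment survives, so the music would be left untouched and the dubbing would begin with the real speech.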


Other issues:

  • I’m finding that its algorithm for identifying the number of speakers is unreliable. It often thinks that there are two of me.
  • Translation into Indonesian and Malay (which are close to the same thing) was not recognisable as being in those languages. It’s not just me thinking “that doesn’t sound like Bahasa”; fluent speakers weren’t even sure if they were listening to Bahasa or random babble.
But overall, I’m impressed. It would be an enormous amount of effort for me to re-record these videos in each of these languages. I’m not sure I even could do it in Japanese or Arabic (which I have studied) let alone Hindi (which I’ve never studied).


English https://www.youtube.com/playlist?list=PLnUusltxXvTdi6q_uTj4zt4ArCRoGBBUA

Chinese https://www.youtube.com/playlist?list=PLnUusltxXvTeryUuujEiLkhPYwRmYyufj

Hindi https://www.youtube.com/playlist?list=PLnUusltxXvTfOXqfpc5aIbrcgjxSZ9CoN

Spanish https://www.youtube.com/playlist?list=PLnUusltxXvTczVNLSAUFwSoyOf-PWZAcl

Japanese https://www.youtube.com/playlist?list=PLnUusltxXvTeRozp7XbEd90Gh32GAgP9H

Arabic https://www.youtube.com/playlist?list=PLnUusltxXvTfUEtbOqGBL6chpMWx1sApM

Korean https://www.youtube.com/playlist?list=PLnUusltxXvTcE4kecg6x0wHpTOdDMoseK

If you or your colleagues or friends speak one of these languages and want to hear a bit about the history of text encoding, forward the links on and let me know if they are useful.

Friday 2 February 2024

Programming language theology

What does the Bible have to say about different programming languages?

  • Perl (Matthew 13:45-46)
  • Haskell (we are called to a life of purity)
  • OCaml (it’s easier for OCaml to pass through the eye of a needle and all that, Matthew 19:24)
  • Forth (it's part of the Great Commission in Mark 16:15)
  • Go (again, it's part of the Great Commission, but also Isaiah 18:2 seems to promote message passing between Golang and Swift)
  • Java (covered in He-brews)
  • Lazarus (which is cheating a bit, because it’s a framework for FreePascal, but John 11 covers it in some detail)
  • SQL isn't specifically mentioned, but Proverbs 2:3-5 seems to describe it
  • C/C++/C#: in Genesis 1:10 God gathered the waters together and called them the seas. But in Revelation 21:1 it says that there will be no more C, so presumably C++ and C# take over in a new heaven and new earth.
  • Ada: according to Genesis 36:4 Eliphas was bored by Ada, which does indeed speak holy truth about programming in Ada.
  • Python: this is one of those difficult topics. My best interpretation of John 3:14 says that Jesus should be lifted up like a snake was. I guess that means we should kill Python and see if it comes back to life: presumably we did that with the Python 2 -> 3 transition.
  • Swift: I'm unsure of what the Bible says about this generally. Isaiah 5:26 applies to programmers at the ends of the earth, and Romans 3:15 presumably refers to programming with one's feet so hard that they bleed.
  • Rust: it seems we should avoid Rust based on Matthew 6:19-20.