Last active
May 5, 2025 19:39
What are computers for?
- entertainment
- communication
- navigation
- science
+ unlocks a whole new frontier in what is computable, what is doable
+ "big data", but slightly off-topic
Why do computers need to understand language?
- Just like we use language to communicate meaning/intent to other people, language is one of the ways we can interact with our tools/computers
- Computers can even be a language aid
+ language translation
+ emails, voice transcription
+ voice recording/sending
+ hearing aids
- or we can use language to turn natural-language requests into computer queries that make certain tasks easier (information/action)
+ "What's the temperature at 2pm tomorrow?"
+ "What time is the next bus?"
+ "When does Presti's Bakery close on Wednesday?"
+ "Call Mom"
+ "Text Dad to buy cat food"
+ "Turn on the kitchen lights"
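Rough sketch of the "natural language into computer queries" idea, using three of the action examples above. This is only a toy, assuming simple regex rules; real assistants use trained models rather than hand-written patterns. The intent names (`call`, `text`, `device_on`) and the `parse_command` helper are made up for illustration.

```python
import re

# Hypothetical intent patterns, invented for this sketch (not from any real assistant).
INTENT_PATTERNS = [
    ("call", re.compile(r"^call (?P<contact>.+)$", re.IGNORECASE)),
    ("text", re.compile(r"^text (?P<contact>\w+) to (?P<message>.+)$", re.IGNORECASE)),
    ("device_on", re.compile(r"^turn on the (?P<device>.+)$", re.IGNORECASE)),
]


def parse_command(utterance):
    """Return (intent, slots) for the first matching pattern, else ("unknown", {})."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    cleaned = utterance.strip().rstrip(".!?")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return intent, match.groupdict()
    return "unknown", {}


for phrase in ["Call Mom", "Text Dad to buy cat food", "Turn on the kitchen lights"]:
    print(phrase, "->", parse_command(phrase))
```

Anything the rules don't cover (e.g. "What time is the next bus?") comes back as `unknown`, which is the gap that real intent understanding has to fill.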
Popular Examples
- Virtual Assistants (Siri/Alexa/Google Assistant)
- Watson
Speech Synthesis
- Vocaloids
Natural Language Processing (NLP)
- Google Translate
- turning voice samples into machine queries
- understanding user intent
Speech Recognition
Difficulties with V | |
What is Synthetic Speech? History
- text to speech
Common Types
- recording words, putting them together
- recording individual phonemes
- Miku --> Vocaloids
- how does Siri work?
- using ML to create realistic voices
- SOTA ML to recreate a voice from just a sample
+ https://vocloner.com/tts.php?voiceid=028fcee5c13c45c49d182566cc6d5f8c
+ show live demo?
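Rough sketch of the first type above (recording words, putting them together). Assumptions: the "recordings" are just sine tones standing in for real audio clips, and `SAMPLE_RATE`, `UNITS`, `fake_recording`, and `synthesize` are made-up names for illustration. A real system would load recorded clips and smooth the joins between them.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def fake_recording(freq_hz, duration_ms):
    """Stand-in for a recorded word: a plain sine tone as a list of samples."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory: one pre-recorded clip per word.
UNITS = {
    "next": fake_recording(220, 200),
    "stop": fake_recording(330, 300),
}


def synthesize(words, gap_ms=50):
    """Concatenate the stored clip for each word, with a short silence between words."""
    gap = [0.0] * (SAMPLE_RATE * gap_ms // 1000)
    samples = []
    for i, word in enumerate(words):
        if i > 0:
            samples.extend(gap)
        samples.extend(UNITS[word.lower()])
    return samples


audio = synthesize(["Next", "stop"])
print(len(audio), "samples,", len(audio) / SAMPLE_RATE, "seconds")
```

The same idea works with smaller units than words; the inventory just gets bigger and the joins get harder to hide.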
Where is it commonly used?
- trains
- smart assistants (hey Google/Siri/Alexa)
- smart devices
Popular example: Siri
- Siri did not originally come from Apple; it started as an app (SRI International AI Center)
- show Siri timeline (https://en.wikipedia.org/wiki/Siri)
- Siri's original voice actor didn't realize she was the voice of Siri (https://www.youtube.com/watch?v=QP-iVhdXjPk)
- show change of Siri's voice over time (https://www.youtube.com/watch?v=QP-iVhdXjPk)
- use cases, how "natural" it's become over the years
- https://www.youtube.com/watch?v=4ryQTkDWmBg
- side tangent: Apple fumbled user voice data to Google/Amazon, who were able to improve voice recognition/naturalness
+ even led to Apple being seen as "stagnant" (https://www.youtube.com/watch?v=4ryQTkDWmBg)
- Siri delay (https://www.youtube.com/watch?v=nSdvj6yphoY)
- Siri was not originally created as a voice assistant, but rather just for "voice recognition"
unrelated:
- https://qz.com/1222958/a-siri-creator-is-still-surprised-by-how-much-siri-cant-do
side tangent: Watson (Jeopardy!)
- https://www.youtube.com/watch?v=P18EdAKuC1U