{"id":2792,"date":"2024-09-03T18:09:48","date_gmt":"2024-09-03T18:09:48","guid":{"rendered":"https:\/\/blog.samarthya.me\/wps\/?p=2792"},"modified":"2024-09-03T18:10:55","modified_gmt":"2024-09-03T18:10:55","slug":"from-speech-to-text-a-python-deep-dive","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2024\/09\/03\/from-speech-to-text-a-python-deep-dive\/","title":{"rendered":"From Speech to Text: A Python Deep Dive"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Ever wished you could effortlessly convert your spoken words into written text? Look no further! This blog post will guide you through building a simple yet powerful speech-to-text application using Python. We&#8217;ll break down the code step-by-step, explaining the logic behind each stage, and even show you how to run the application yourself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting the Stage: Libraries and Setup<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we begin, ensure you have the necessary Python libraries installed. 
You can easily install them using pip (note that the PyPI package names differ from the names you import):<\/p>\n\n\n\n<pre class=\"wp-block-code has-white-color has-vivid-green-cyan-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-3ce34c865cdf52e035b9125eb6f6942b\"><code>pip install SpeechRecognition PyAudio python-docx pyspellchecker<\/code><\/pre>\n\n\n<div class=\"wp-block-image is-style-rounded\">\n<figure class=\"aligncenter size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t-1024x1024.jpeg\" alt=\"\" class=\"wp-image-2793\" srcset=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t-1024x1024.jpeg 1024w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t-150x150@2x.jpeg 300w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t-150x150.jpeg 150w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t.jpeg 1536w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/09\/s2t-300x300@2x.jpeg 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">These libraries are our tools of the trade:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>speech_recognition:<\/strong>&nbsp;Installed as <code>SpeechRecognition<\/code>. It captures audio and passes it to a recognition engine that converts it into text. Microphone input also requires <code>PyAudio<\/code>.<\/li>\n\n\n\n<li><strong>docx:<\/strong>&nbsp;Installed as <code>python-docx<\/code>. It lets us create Microsoft Word documents, which is how we save the transcribed text.<\/li>\n\n\n\n<li><strong>spellchecker:<\/strong>&nbsp;Installed as <code>pyspellchecker<\/code>. It suggests corrections for likely spelling mistakes in the output.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Building the Engine: The&nbsp;<code>SpeechToText<\/code>&nbsp;Class<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">At the heart of our application lies the&nbsp;<code>SpeechToText<\/code>&nbsp;class, which handles the entire process. 
Let&#8217;s dissect it:<\/p>\n\n\n\n<figure class=\"wp-block-pullquote has-black-color has-luminous-vivid-amber-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-ff7f0df4470056ce1c378ac8cd1a86fd\" style=\"border-width:4px;border-radius:24px\"><blockquote><p>Source code is available <a href=\"https:\/\/github.com\/samarthya\/py-speech\" data-type=\"link\" data-id=\"https:\/\/github.com\/samarthya\/py-speech\">here<\/a><\/p><cite>Saurabh<\/cite><\/blockquote><\/figure>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-feef98313efe9cf8969ce803556d96a4\"><code>import logging\n\nimport speech_recognition as sr\nfrom docx import Document\nfrom spellchecker import SpellChecker\n\n# Create and configure logger\nlogging.basicConfig(filename='speech_to_text.log',\n                    level=logging.DEBUG,\n                    format='%(asctime)s - %(levelname)s - %(message)s')\n\n\nclass SpeechToText:\n    # ... 
(rest of the class code)\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Initialization (<code>__init__<\/code>)<\/strong>:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-60674c70ed59d6d14f57e720912822ea\"><code>def __init__(self, language=\"en-US\"):\n    self.recognizer = sr.Recognizer()\n    self.microphone = sr.Microphone()\n    self.language = language\n    logging.debug(\n        'SpeechToText object initialized with language: %s', language)\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Here, we initialize our speech recognizer (<code>self.recognizer<\/code>), set up the microphone (<code>self.microphone<\/code>) as our audio source, and specify the language (<code>self.language<\/code>) for recognition (defaulting to US English).<\/li>\n\n\n\n<li><strong>Listening and Transcribing (<code>listen_and_transcribe<\/code>)<\/strong>:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-60f3fdd6ece72ce3f22d83e61172cf4e\"><code>def listen_and_transcribe(self):\n    logging.info(\"Starting listening session...\")\n    print(\"Listening... 
Say 'program stop' to end the session.\")\n\n    full_text = &#91;]\n\n    with self.microphone as source:\n        self.recognizer.adjust_for_ambient_noise(source)\n\n        while True:\n            try:\n                audio = self.recognizer.listen(source)\n                text = self.recognizer.recognize_google(\n                    audio, language=self.language).lower()\n\n                print(f\"Recognized: {text}\")\n                logging.debug('Recognized text: %s', text)\n\n                if \"program stop\" in text:\n                    print(\"Stopping the program...\")\n                    break\n\n                full_text.append(text)\n            except sr.UnknownValueError:\n                print(\"Could not understand audio\")\n                logging.warning(\"Could not understand audio\")\n            except sr.RequestError as request_error:\n                print(f\"Could not request results; {request_error}\")\n                logging.error(\"Could not request results: %s\", request_error)\n\n\n    return \" \".join(full_text)\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This function captures audio from your microphone, transcribes it using Google Speech Recognition, and neatly stores the recognized text. 
The loop continues until you say &#8220;program stop.&#8221;<\/li>\n\n\n\n<li><strong>Spell Checking (<code>spell_check<\/code>)<\/strong>:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-c90db5972b80d77e581751d3dd66c703\"><code>def spell_check(self, text):\n    logging.debug('Spell checking text: %s', text)\n    spell = SpellChecker()\n    words = text.split()\n    corrected_words = &#91;]\n\n    for word in words:\n        if \".\" in word and any(c.isalpha() for c in word):\n            corrected_words.append(word)  # Don't try to correct URLs\n        else:\n            # correction() can return None if no candidate is found,\n            # so fall back to the original word\n            corrected_word = spell.correction(word) or word\n            corrected_words.append(corrected_word)\n\n    logging.debug('Corrected text: %s', \" \".join(corrected_words))\n    return \" \".join(corrected_words)\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Our application spell-checks the transcribed text to catch likely mistakes. 
It skips any word that contains a period and at least one letter, a simple heuristic for URLs and domain names, so that they are not miscorrected.<\/li>\n\n\n\n<li><strong>Saving to Word Document (<code>save_to_word<\/code>)<\/strong>:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-9f5b1650a4820d9af209714557b98535\"><code>def save_to_word(self, text, filename=\"output.docx\"):\n    logging.info('Saving text to file: %s', filename)\n    doc = Document()\n    doc.add_paragraph(text)\n    doc.save(filename)\n    print(f\"Text saved to {filename}\")\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Finally, this function saves the corrected text as a single paragraph in a new Word document.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Putting It All Together: The&nbsp;<code>main<\/code>&nbsp;Function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The&nbsp;<code>main<\/code>&nbsp;function acts as the conductor, orchestrating the entire process:<\/p>\n\n\n\n<pre class=\"wp-block-code has-black-color has-pale-cyan-blue-background-color has-text-color has-background has-link-color has-medium-font-size wp-elements-d494cf8a23508e21b7a87529b1a5cf8f\"><code># The SpeechToText class is saved in sp.py\nfrom sp import SpeechToText\n\n\ndef main():\n    speech_to_text = SpeechToText()\n    transcribed_text = speech_to_text.listen_and_transcribe()\n    corrected_text = speech_to_text.spell_check(transcribed_text)\n    speech_to_text.save_to_word(corrected_text)\n\nif __name__ == \"__main__\":\n    main()\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">It creates a&nbsp;<code>SpeechToText<\/code>&nbsp;object, initiates the listening and transcription process, spell-checks the result, and finally saves it to a Word document.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Running the Application<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Save the code:<\/strong>&nbsp;Save the&nbsp;<code>SpeechToText<\/code>&nbsp;class code 
as&nbsp;<code>sp.py<\/code>&nbsp;and the&nbsp;<code>main<\/code>&nbsp;function code as&nbsp;<code>main.py<\/code>&nbsp;in the same directory.<\/li>\n\n\n\n<li><strong>Execute:<\/strong>&nbsp;Open your terminal or command prompt, navigate to the directory where you saved the files, and run the command:&nbsp;<code>python main.py<\/code><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Sample Output<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">After running the application, speak clearly into your microphone. Once you say &#8220;program stop,&#8221; the transcribed and corrected text will be saved in a Word document named &#8220;output.docx&#8221; in the same directory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Congratulations! You&#8217;ve successfully built a basic yet functional speech-to-text application using Python. This simple example demonstrates the power and flexibility of Python for tackling real-world tasks. Feel free to experiment with different languages, explore advanced speech recognition features, or even integrate this into a larger project. The possibilities are endless!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ever wished you could effortlessly convert your spoken words into written text? Look no further! This blog post will guide you through building a simple yet powerful speech-to-text application using Python. We&#8217;ll break down the code step-by-step, explaining the logic behind each stage, and even show you how to run the application yourself. 
Setting the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2794,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[34,239],"tags":[270],"class_list":["post-2792","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","category-technical-2","tag-python"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2792","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=2792"}],"version-history":[{"count":2,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2792\/revisions"}],"predecessor-version":[{"id":2796,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2792\/revisions\/2796"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/2794"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=2792"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=2792"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=2792"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}