{"id":375,"date":"2020-02-10T15:06:21","date_gmt":"2020-02-10T15:06:21","guid":{"rendered":"https:\/\/www.samarthya.me\/wps\/?p=375"},"modified":"2020-02-10T15:10:33","modified_gmt":"2020-02-10T15:10:33","slug":"golang-proteins-rna","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2020\/02\/10\/golang-proteins-rna\/","title":{"rendered":"GOLANG: Proteins &#038; RNA"},"content":{"rendered":"\n<p>It was an interesting exercise, and I solved it in three ways. More details are available at <a href=\"https:\/\/exercism.io\/my\/solutions\/94b768430cb84699bcbc07605d3bbe71\">exercism.io<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>RNA can be broken into three-nucleotide sequences called codons, and then translated to a polypeptide like so:<\/p>\n\n\n\n<p>RNA: <code>\"AUGUUUUCU\"<\/code> =&gt; translates to<\/p>\n\n\n\n<p>Codons: <code>\"AUG\", \"UUU\", \"UCU\"<\/code>\n=&gt; which become a polypeptide with the following sequence =&gt;<\/p>\n\n\n\n<p>Protein: <code>\"Methionine\", \"Phenylalanine\", \"Serine\"<\/code><\/p>\n\n\n\n<p>There are 64 codons which in turn correspond to 20 amino acids; \nhowever, all of the codon sequences and resulting amino acids are not \nimportant in this exercise.  
If it works for one codon, the program \nshould work for all of them.\nHowever, feel free to expand the list in the test suite to include them \nall.<\/p>\n\n\n\n<p>There are also three terminating codons (also known as &#8216;STOP&#8217; \ncodons); if any of these codons are encountered (by the ribosome), all \ntranslation ends and the protein is terminated.<\/p>\n\n\n\n<p>All subsequent codons are ignored, like this:<\/p>\n\n\n\n<p>RNA: <code>\"AUGUUUUCUUAAAUG\"<\/code> =&gt;<\/p>\n\n\n\n<p>Codons: <code>\"AUG\", \"UUU\", \"UCU\", \"UAA\", \"AUG\"<\/code> =&gt;<\/p>\n\n\n\n<p>Protein: <code>\"Methionine\", \"Phenylalanine\", \"Serine\"<\/code><\/p>\n\n\n\n<p>Note that the stop codon <code>\"UAA\"<\/code> terminates the translation and the final methionine is not translated into the protein sequence.<\/p>\n\n\n\n<p>Below are the codons and resulting amino acids needed for the exercise.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"\"><thead><tr><th>Codon<\/th><th>Protein<\/th><\/tr><\/thead><tbody><tr><td>AUG<\/td><td>Methionine<\/td><\/tr><tr><td>UUU, UUC<\/td><td>Phenylalanine<\/td><\/tr><tr><td>UUA, UUG<\/td><td>Leucine<\/td><\/tr><tr><td>UCU, UCC, UCA, UCG<\/td><td>Serine<\/td><\/tr><tr><td>UAU, UAC<\/td><td>Tyrosine<\/td><\/tr><tr><td>UGU, UGC<\/td><td>Cysteine<\/td><\/tr><tr><td>UGG<\/td><td>Tryptophan<\/td><\/tr><tr><td>UAA, UAG, UGA<\/td><td>STOP<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Version 1.0<\/h2>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-code\"><code>package protein\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n\t\"strings\"\n)\n\nvar (\n\t\/\/ ErrStop is returned when a STOP codon is encountered.\n\tErrStop = errors.New(\"STOP word has been found\")\n\t\/\/ ErrInvalidBase is returned when a codon is not recognized.\n\tErrInvalidBase = errors.New(\"the base is invalid\")\n)\n\n\/\/ FromRNA translates an RNA strand into a list of protein names,\n\/\/ stopping at the first STOP codon.\nfunc FromRNA(rna string) ([]string, error) 
{\n\ti := 0\n\tout := make([]string, 0)\n\n\tif rna != \"\" {\n\t\tnewString := rna[i:3]\n\t\tfor {\n\t\t\tfmt.Println(\" Element: \", strings.ToUpper(newString))\n\n\t\t\tprt, err := FromCodon(newString)\n\t\t\tif err != nil &amp;&amp; strings.Compare(err.Error(), ErrInvalidBase.Error()) == 0 {\n\t\t\t\treturn out, err\n\t\t\t}\n\n\t\t\tif prt == \"\" {\n\t\t\t\treturn out, nil\n\t\t\t}\n\n\t\t\tout = append(out, prt)\n\t\t\ti += 3\n\t\t\tif i+3 &lt;= len(rna) {\n\t\t\t\tnewString = rna[i : i+3]\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tbreak\n\t\t}\n\n\t\treturn out, nil\n\t}\n\treturn out, ErrInvalidBase\n}\n\n\/\/ FromCodon returns the protein name for a single codon.\nfunc FromCodon(codon string) (string, error) {\n\tif codon != \"\" &amp;&amp; len(codon)%3 == 0 {\n\t\tswitch strings.ToUpper(codon) {\n\t\tcase \"AUG\":\n\t\t\treturn \"Methionine\", nil\n\t\tcase \"UUU\":\n\t\t\treturn \"Phenylalanine\", nil\n\t\tcase \"UUC\":\n\t\t\treturn \"Phenylalanine\", nil\n\n\t\tcase \"UUA\":\n\t\t\treturn \"Leucine\", nil\n\t\tcase \"UUG\":\n\t\t\treturn \"Leucine\", nil\n\t\tcase \"UCU\":\n\t\t\treturn \"Serine\", nil\n\t\tcase \"UCC\":\n\t\t\treturn \"Serine\", nil\n\t\tcase \"UCA\":\n\t\t\treturn \"Serine\", nil\n\t\tcase \"UCG\":\n\t\t\treturn \"Serine\", nil\n\t\tcase \"UAU\":\n\t\t\treturn \"Tyrosine\", nil\n\t\tcase \"UAC\":\n\t\t\treturn \"Tyrosine\", nil\n\t\tcase \"UGU\":\n\t\t\treturn \"Cysteine\", nil\n\t\tcase \"UGC\":\n\t\t\treturn \"Cysteine\", nil\n\t\tcase \"UGG\":\n\t\t\treturn \"Tryptophan\", nil\n\n\t\tcase \"UAA\":\n\t\t\treturn \"\", ErrStop\n\t\tcase \"UAG\":\n\t\t\treturn \"\", ErrStop\n\t\tcase \"UGA\":\n\t\t\treturn \"\", ErrStop\n\t\tdefault:\n\t\t\treturn \"\", ErrInvalidBase\n\t\t}\n\t}\n\treturn \"\", ErrInvalidBase\n}<\/code><\/pre>\n<\/div><\/div>\n\n\n\n<p>This was the simplest possible approach, with the two functions the exercise expects:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>func FromCodon(codon string) (string, error)<\/code><\/li><li><code>func FromRNA(rna string) 
([]string, error)<\/code><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">FromCodon<\/h3>\n\n\n\n<p>It uses a simple <code>switch<\/code> statement that takes the input codon and returns the corresponding protein name.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">FromRNA<\/h3>\n\n\n\n<p>It loops over the input string, splits it into three-character codons, and returns the resulting slice of protein names.<\/p>\n\n\n\n<p>A simple <code>for<\/code> loop takes a slice of length 3 at a time and passes it to <code>FromCodon<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<p><code>newString := rna[i:3]<\/code><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> i += 3\n if i+3 &lt;= len(rna) {\n newString = rna[i : i+3]\n continue\n }<\/pre>\n<\/div><\/div>\n\n\n\n<p>The loop runs through the whole string, looks up the protein for each codon until a STOP codon or the end of the input is reached, and returns the final slice.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Input: <code>\"AUGUUUUAA\"<\/code><\/li><li>Output: <code>[]string{\"Methionine\", \"Phenylalanine\"}<\/code><\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Version 2.0<\/h2>\n\n\n\n<p>The first version was not well optimized, so I came up with the next one.<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-code\"><code>package protein\n\nimport (\n\t\"errors\"\n\t\"strings\"\n)\n\nvar (\n\t\/\/ ErrStop is returned when a STOP codon is encountered.\n\tErrStop = errors.New(\"STOP word has been found\")\n\n\t\/\/ ErrInvalidBase is returned when a codon is not recognized.\n\tErrInvalidBase = errors.New(\"the base is invalid\")\n\n\t\/\/ Proteins maps each codon to its protein name, or \"STOP\" for a stop codon.\n\tProteins = map[string]string{\n\t\t\"AUG\": \"Methionine\",\n\t\t\"UUU\": \"Phenylalanine\",\n\t\t\"UUC\": \"Phenylalanine\",\n\t\t\"UUA\": \"Leucine\",\n\t\t\"UUG\": \"Leucine\",\n\t\t\"UCU\": \"Serine\",\n\t\t\"UCC\": \"Serine\",\n\t\t\"UCA\": 
\"Serine\",\n\t\t\"UCG\": \"Serine\",\n\t\t\"UAU\": \"Tyrosine\",\n\t\t\"UAC\": \"Tyrosine\",\n\t\t\"UGU\": \"Cysteine\",\n\t\t\"UGC\": \"Cysteine\",\n\t\t\"UGG\": \"Tryptophan\",\n\t\t\"UAA\": \"STOP\",\n\t\t\"UAG\": \"STOP\",\n\t\t\"UGA\": \"STOP\",\n\t}\n)\n\n\/\/ FromRNA Returns proteins\nfunc FromRNA(rna string) ([]string, error) {\n\ti := 0\n\tout := make([]string, 0)\n\n\tif rna != \"\" {\n\t\tnewString := rna[i:3]\n\t\tfor {\n\n\t\t\tprt, err := FromCodon(newString)\n\t\t\tif err != nil &amp;&amp; strings.Compare(err.Error(), ErrInvalidBase.Error()) == 0 {\n\t\t\t\treturn out, err\n\t\t\t}\n\n\t\t\tif prt == \"\" {\n\t\t\t\treturn out, nil\n\t\t\t}\n\n\t\t\tout = append(out, prt)\n\t\t\ti += 3\n\t\t\tif i+3 &lt;= len(rna) {\n\t\t\t\tnewString = rna[i : i+3]\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tbreak\n\t\t}\n\n\t\treturn out, nil\n\t}\n\treturn out, ErrInvalidBase\n}\n\n\/\/ FromCodon Returns proteins\nfunc FromCodon(codon string) (string, error) {\n\tout := \"\"\n\tif codon != \"\" {\n\n\t\tcodon = strings.ToUpper(codon)\n\t\tout = Proteins[codon]\n\n\t\tif strings.Compare(out, \"STOP\") == 0 {\n\t\t\treturn \"\", ErrStop\n\t\t} else if out == \"\" {\n\t\t\treturn \"\", ErrInvalidBase\n\t\t}\n\t}\n\treturn out, nil\n}<\/code><\/pre>\n<\/div><\/div>\n\n\n\n<p>I did away with the Switch-Case and used Map.<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<p>Proteins = map[string]string{<br> <strong>        &#8220;AUG&#8221;: &#8220;Methionine&#8221;,<br>         &#8220;UUU&#8221;: &#8220;Phenylalanine&#8221;,<br>         &#8220;UUC&#8221;: &#8220;Phenylalanine&#8221;,<br>         &#8220;UUA&#8221;: &#8220;Leucine&#8221;,<br>         &#8220;UUG&#8221;: &#8220;Leucine&#8221;,<br>         &#8220;UCU&#8221;: &#8220;Serine&#8221;,<br>         &#8220;UCC&#8221;: &#8220;Serine&#8221;,<br>         &#8220;UCA&#8221;: &#8220;Serine&#8221;,<br>         &#8220;UCG&#8221;: 
&#8220;Serine&#8221;,<br>         &#8220;UAU&#8221;: &#8220;Tyrosine&#8221;,<br>         &#8220;UAC&#8221;: &#8220;Tyrosine&#8221;,<br>         &#8220;UGU&#8221;: &#8220;Cysteine&#8221;,<br>         &#8220;UGC&#8221;: &#8220;Cysteine&#8221;,<br>         &#8220;UGG&#8221;: &#8220;Tryptophan&#8221;,<br>         &#8220;UAA&#8221;: &#8220;STOP&#8221;,<br>         &#8220;UAG&#8221;: &#8220;STOP&#8221;,<br>         &#8220;UGA&#8221;: &#8220;STOP&#8221;,<\/strong><br>     }<\/p>\n<\/div><\/div>\n\n\n\n<p>To find the protein for a codon, it was then just a matter of looking up the key in the map:<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">    out = Proteins[codon]\n    if strings.Compare(out, \"STOP\") == 0 {\n        return \"\", ErrStop\n    } else if out == \"\" {\n        return \"\", ErrInvalidBase\n    }<\/pre>\n<\/div><\/div>\n\n\n\n<p>The code for <code>FromRNA<\/code> remained the same.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Version 3.0<\/h2>\n\n\n\n<p>I still felt it could be improved, so I came up with version 3.0.<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-code\"><code>\/\/ FromRNA translates an RNA strand into a list of protein names,\n\/\/ stopping at the first STOP codon.\nfunc FromRNA(rna string) ([]string, error) {\n\t\/\/ Match any three characters, i.e. one codon.\n\tvar rex = regexp.MustCompile(`...`)\n\tcodons := rex.FindAllString(rna, -1)\n\n\tvar proteins []string\n\n\tfor _, codon := range codons {\n\t\tprotein, err := FromCodon(codon)\n\n\t\tswitch err {\n\t\tcase ErrStop:\n\t\t\treturn proteins, nil\n\t\tcase ErrInvalidBase:\n\t\t\treturn proteins, err\n\t\t}\n\n\t\tproteins = append(proteins, protein)\n\t}\n\n\treturn proteins, nil\n}<\/code><\/pre>\n<\/div><\/div>\n\n\n\n<p>I used a regular expression to chunk the incoming data, calling <code>FindAllString<\/code> to get the 
codons.<\/p>\n\n\n\n<h2 class=\"has-text-align-center wp-block-heading\">&#8212; THE &#8211; END &#8212;<\/h2>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It was an interesting exercise, and I solved in three ways. More details available here @ exercism.io Introduction RNA can be broken into three nucleotide sequences called codons, and then translated to a polypeptide like so: RNA: &#8220;AUGUUUUCU&#8221; =&gt; translates to Codons: &#8220;AUG&#8221;, &#8220;UUU&#8221;, &#8220;UCU&#8221; =&gt; which become a polypeptide with the following sequence =&gt; [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":378,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[34],"tags":[23,49,50],"class_list":["post-375","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-golang","tag-maps","tag-regexp"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=375"}],"version-history":[{"count":0,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/375\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/378"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"
https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}