GOLANG: Proteins & RNA

Saurabh Sharma

It was an interesting exercise, and I solved it in three ways. More details are available at exercism.io.

Introduction

RNA can be broken into three-nucleotide sequences called codons, and then translated to a polypeptide like so:

RNA: "AUGUUUUCU" => translates to

Codons: "AUG", "UUU", "UCU" => which become a polypeptide with the following sequence =>

Protein: "Methionine", "Phenylalanine", "Serine"

There are 64 codons, which in turn correspond to 20 amino acids; however, not all of the codon sequences and resulting amino acids are needed in this exercise. If it works for one codon, the program should work for all of them. However, feel free to expand the list in the test suite to include them all.

There are also three terminating codons (also known as ‘STOP’ codons); if any of these codons are encountered (by the ribosome), all translation ends and the protein is terminated.

All subsequent codons are ignored, like this:

RNA: "AUGUUUUCUUAAAUG" =>

Codons: "AUG", "UUU", "UCU", "UAA", "AUG" =>

Protein: "Methionine", "Phenylalanine", "Serine"

Note the stop codon "UAA" terminates the translation and the final methionine is not translated into the protein sequence.

Below are the codons and resulting Amino Acids needed for the exercise.

Codon                 Protein
AUG                   Methionine
UUU, UUC              Phenylalanine
UUA, UUG              Leucine
UCU, UCC, UCA, UCG    Serine
UAU, UAC              Tyrosine
UGU, UGC              Cysteine
UGG                   Tryptophan
UAA, UAG, UGA         STOP

Version 1.0

package protein

import (
	"errors"
	"strings"
)

var (
	// ErrStop is returned when a STOP codon is encountered.
	ErrStop = errors.New("STOP word has been found")
	// ErrInvalidBase is returned when a codon is not recognized.
	ErrInvalidBase = errors.New("the base is invalid")
)

// FromRNA translates an RNA strand into a list of protein names.
// Translation ends at the first STOP codon.
func FromRNA(rna string) ([]string, error) {
	i := 0
	out := make([]string, 0)

	// Check the length, not just emptiness: rna[i:3] panics on shorter input.
	if len(rna) >= 3 {
		newString := rna[i:3]
		for {
			prt, err := FromCodon(newString)
			// Sentinel errors can be compared directly.
			if err == ErrInvalidBase {
				return out, err
			}

			// FromCodon returns "" together with ErrStop, so stop here.
			if prt == "" {
				return out, nil
			}

			out = append(out, prt)
			i += 3
			if i+3 <= len(rna) {
				newString = rna[i : i+3]
				continue
			}
			break
		}

		return out, nil
	}
	return out, ErrInvalidBase
}

// FromCodon returns the protein name for a single codon.
// It returns ErrStop for a STOP codon and ErrInvalidBase for any
// input that is not a known codon (including the empty string).
func FromCodon(codon string) (string, error) {
	switch strings.ToUpper(codon) {
	case "AUG":
		return "Methionine", nil
	case "UUU", "UUC":
		return "Phenylalanine", nil
	case "UUA", "UUG":
		return "Leucine", nil
	case "UCU", "UCC", "UCA", "UCG":
		return "Serine", nil
	case "UAU", "UAC":
		return "Tyrosine", nil
	case "UGU", "UGC":
		return "Cysteine", nil
	case "UGG":
		return "Tryptophan", nil
	case "UAA", "UAG", "UGA":
		return "", ErrStop
	default:
		return "", ErrInvalidBase
	}
}

This was the simplest possible approach, with the two functions the exercise expects:

  • func FromCodon(codon string) (string, error)
  • func FromRNA(rna string) ([]string, error)

FromCodon

It uses a simple switch statement that takes the input string and returns the corresponding protein.

FromRNA

It loops over the input string, splits it into codons, and returns the resulting slice of proteins.

A simple for loop takes a three-character slice of the string at each step and passes that codon to FromCodon:

	newString := rna[i:3]

	i += 3
	if i+3 <= len(rna) {
		newString = rna[i : i+3]
		continue
	}

This loop runs until it reaches the end of the string or a STOP codon, looks up the protein for each codon, and returns the final slice.

E.g.

  • Input : "AUGUUUUAA"
  • Output: []string{"Methionine", "Phenylalanine"}
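
The chunking step on its own can be sketched as below. This is a minimal illustration, not the code from the solution: chunkCodons is a hypothetical helper name, and dropping any leftover bases at the end is an assumption that mirrors how the loop above stops once fewer than three characters remain.

```go
package main

import "fmt"

// chunkCodons is a hypothetical helper (not part of the original solution)
// that splits an RNA strand into consecutive three-base codons.
// Trailing bases that do not fill a whole codon are dropped.
func chunkCodons(rna string) []string {
	codons := make([]string, 0, len(rna)/3)
	for i := 0; i+3 <= len(rna); i += 3 {
		codons = append(codons, rna[i:i+3])
	}
	return codons
}

func main() {
	fmt.Println(chunkCodons("AUGUUUUAA")) // prints [AUG UUU UAA]
}
```

Separating the chunking from the lookup keeps the index arithmetic in one place; the translation loop then only has to range over the codons.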

Version 2.0

The previous version was not well optimized, so I came up with the next version.

var (
	// ErrStop stop error
	ErrStop = errors.New("STOP word has been found")

	// ErrInvalidBase invalid base
	ErrInvalidBase = errors.New("the base is invalid")

	// Proteins maps each codon to its protein name (or "STOP")
	Proteins = map[string]string{
		"AUG": "Methionine",
		"UUU": "Phenylalanine",
		"UUC": "Phenylalanine",
		"UUA": "Leucine",
		"UUG": "Leucine",
		"UCU": "Serine",
		"UCC": "Serine",
		"UCA": "Serine",
		"UCG": "Serine",
		"UAU": "Tyrosine",
		"UAC": "Tyrosine",
		"UGU": "Cysteine",
		"UGC": "Cysteine",
		"UGG": "Tryptophan",
		"UAA": "STOP",
		"UAG": "STOP",
		"UGA": "STOP",
	}
)

// FromRNA translates an RNA strand into a list of protein names.
// Translation ends at the first STOP codon.
func FromRNA(rna string) ([]string, error) {
	i := 0
	out := make([]string, 0)

	// Check the length, not just emptiness: rna[i:3] panics on shorter input.
	if len(rna) >= 3 {
		newString := rna[i:3]
		for {
			prt, err := FromCodon(newString)
			// Sentinel errors can be compared directly.
			if err == ErrInvalidBase {
				return out, err
			}

			// FromCodon returns "" together with ErrStop, so stop here.
			if prt == "" {
				return out, nil
			}

			out = append(out, prt)
			i += 3
			if i+3 <= len(rna) {
				newString = rna[i : i+3]
				continue
			}
			break
		}

		return out, nil
	}
	return out, ErrInvalidBase
}

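// FromCodon returns the protein name for a single codon.
// It returns ErrStop for a STOP codon and ErrInvalidBase for any
// input that is not a key of Proteins (including the empty string).
func FromCodon(codon string) (string, error) {
	codon = strings.ToUpper(codon)
	// A missing key yields the zero value "", which is handled below.
	out := Proteins[codon]

	if out == "STOP" {
		return "", ErrStop
	} else if out == "" {
		return "", ErrInvalidBase
	}
	return out, nil
}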

I did away with the switch statement and used a map instead.

Proteins = map[string]string{
	"AUG": "Methionine",
	"UUU": "Phenylalanine",
	"UUC": "Phenylalanine",
	"UUA": "Leucine",
	"UUG": "Leucine",
	"UCU": "Serine",
	"UCC": "Serine",
	"UCA": "Serine",
	"UCG": "Serine",
	"UAU": "Tyrosine",
	"UAC": "Tyrosine",
	"UGU": "Cysteine",
	"UGC": "Cysteine",
	"UGG": "Tryptophan",
	"UAA": "STOP",
	"UAG": "STOP",
	"UGA": "STOP",
}

Finding the protein for a codon then became a simple map lookup by key:

    out = Proteins[codon]
    if out == "STOP" {
        return "", ErrStop
    } else if out == "" {
        return "", ErrInvalidBase
    }

The code for FromRNA remained the same.
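
The map lookup above relies on a missing key returning the empty string. A possible variant, sketched here under my own assumptions, uses Go's comma-ok form to make the "unknown codon" case explicit. The names lookup, codonTable, errStop and errInvalidBase are hypothetical stand-ins, and the table is trimmed to three entries for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the ErrStop and ErrInvalidBase sentinels.
var (
	errStop        = errors.New("STOP word has been found")
	errInvalidBase = errors.New("the base is invalid")
)

// codonTable is a trimmed, illustrative copy of the Proteins map.
var codonTable = map[string]string{
	"AUG": "Methionine",
	"UUU": "Phenylalanine",
	"UAA": "STOP",
}

// lookup is a hypothetical variant of FromCodon using the comma-ok form:
// ok is false when the codon is not a key of the map.
func lookup(codon string) (string, error) {
	protein, ok := codonTable[codon]
	switch {
	case !ok:
		return "", errInvalidBase
	case protein == "STOP":
		return "", errStop
	}
	return protein, nil
}

func main() {
	for _, c := range []string{"AUG", "UAA", "XYZ"} {
		p, err := lookup(c)
		fmt.Printf("%s -> %q, %v\n", c, p, err)
	}
}
```

The behavior matches the zero-value check for this table; the comma-ok form mainly documents the intent and would keep working even if an empty string were ever stored as a value.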

Version 3.0

I still felt it could be optimized further, so I came up with version 3.0.

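// rex matches any run of three characters. It is compiled once at
// package level (requires the "regexp" import) instead of on every call.
var rex = regexp.MustCompile(`...`)

// FromRNA translates an RNA strand into a list of protein names.
// Translation ends at the first STOP codon.
func FromRNA(rna string) ([]string, error) {
	// Leftover bases that do not fill a whole codon are not matched.
	codons := rex.FindAllString(rna, -1)

	var proteins []string

	for _, codon := range codons {
		protein, err := FromCodon(codon)

		switch err {
		case ErrStop:
			// A STOP codon ends translation without an error.
			return proteins, nil
		case ErrInvalidBase:
			return proteins, err
		}

		proteins = append(proteins, protein)
	}

	return proteins, nil
}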

I used a regular expression to chunk the incoming string into three-character pieces, with FindAllString returning the codons as a slice.
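
To see what the regular expression actually returns, here is a small sketch. splitCodons and codonRe are hypothetical names introduced for illustration; the pattern and the FindAllString call are the same as in the code above. Note that leftover characters are silently dropped, and an input with no match at all yields a nil slice.

```go
package main

import (
	"fmt"
	"regexp"
)

// codonRe matches any three characters (except newlines).
var codonRe = regexp.MustCompile(`...`)

// splitCodons is a hypothetical wrapper around FindAllString;
// n = -1 asks for all non-overlapping matches.
func splitCodons(rna string) []string {
	return codonRe.FindAllString(rna, -1)
}

func main() {
	fmt.Println(splitCodons("AUGUUUUCU")) // prints [AUG UUU UCU]
	fmt.Println(splitCodons("AUGUU"))     // prints [AUG]; the trailing "UU" is dropped
	fmt.Println(splitCodons("") == nil)   // prints true
}
```

One consequence worth keeping in mind: with this approach an empty strand produces no codons and therefore no error, which differs from the earlier versions that return ErrInvalidBase for an empty string.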

— THE – END —