Package strings

Saurabh Sharma

Just like Java or C++, Go comes with a standard library of packages. The one you will probably use most often is strings.

For this post, I wrote another practice exercise: a function that counts how often each word occurs in an input string.

A Go string is a read-only sequence of bytes that conventionally holds UTF-8 text. As a result, a single character (a rune) takes 1 to 4 bytes.

Definition of Word

  1. A number composed of one or more ASCII digits (e.g. “0” or “1234”), OR
  2. A simple word composed of one or more ASCII letters (e.g. “a” or “they”), OR
  3. A contraction of two simple words joined by a single apostrophe (e.g. “it’s” or “they’re”)
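The three rules can be expressed as one regular expression. The sketch below does this with a hypothetical isWord helper, which is not part of the solution further down. It assumes a plain ASCII apostrophe (') rather than the typographic ’ shown in the examples. It also treats letters case-insensitively by allowing both A-Z and a-z.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordPattern encodes the definition: ASCII digits, ASCII letters, or two
// letter-words joined by one ASCII apostrophe. Hypothetical, for illustration.
var wordPattern = regexp.MustCompile(`^(?:[0-9]+|[A-Za-z]+(?:'[A-Za-z]+)?)$`)

// isWord reports whether s is a word under the three rules above.
func isWord(s string) bool {
	return wordPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"0", "1234", "a", "they", "it's", "they're", "it''s", "abc1", "'tis"} {
		fmt.Printf("%q -> %v\n", s, isWord(s))
	}
}
```

Mixed tokens such as "abc1" and double apostrophes such as "it''s" are rejected, because none of the three rules covers them.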

Package: strings

Package strings exposes utility functions to manipulate UTF-8 encoded strings.

Strings are value types and immutable, which means that once created, you cannot modify the contents of the string. In other words, strings are immutable arrays of bytes.

— Official GoDocs

Solution

package wordcount

import (
    "strings"
    "unicode"
)

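// Frequency maps each word to the number of times it occurs.
type Frequency map[string]int

// WordCount counts how often each word occurs in the input string.
func WordCount(in string) Frequency {
    freq := make(Frequency)

    // Normalise case so "Go" and "go" are counted as the same word.
    in = strings.ToLower(in)
    // Split on spaces, newlines, tabs and commas.
    inStrings := strings.FieldsFunc(in, func(c rune) bool {
        return c == ' ' || c == '\n' || c == '\t' || c == ','
    })

    for _, val := range inStrings {
        // Strip leading and trailing runes that are neither letters nor digits.
        val = strings.TrimFunc(val, func(r rune) bool {
            return !unicode.IsLetter(r) && !unicode.IsDigit(r)
        })
        // A token made only of punctuation trims down to "", so skip it.
        if val == "" {
            continue
        }
        freq[val]++
    }
    return freq
}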

Functions used

  • FieldsFunc
  • TrimFunc

FieldsFunc

More details are in the official strings package documentation.

func FieldsFunc(s string, f func(rune) bool) []string

FieldsFunc splits the string s at each run of Unicode code points c satisfying f(c) and returns an array of slices of s. If all code points in s satisfy f(c) or the string is empty, an empty slice is returned. FieldsFunc makes no guarantees about the order in which it calls f(c). If f does not return consistent results for a given c, FieldsFunc may crash.

golang official documentation

In the solution, the predicate passed to FieldsFunc is:

func(c rune) bool {
    return c == ' ' || c == '\n' || c == '\t' || c == ','
}

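FieldsFunc takes the input string s and a predicate f. It walks through s rune by rune, where a rune is a Unicode code point. Every run of consecutive runes for which f returns true is treated as one separator, and FieldsFunc returns the non-empty pieces between separators. With the predicate above, spaces, newlines, tabs and commas act as separators.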

Example

Input: testing, 1, 2 testing

FieldsFunc splits it into:

{"testing", "1", "2", "testing"}

TrimFunc

More details are in the official strings package documentation.

func TrimFunc(s string, f func(rune) bool) string

TrimFunc returns a slice of the string s with all leading and trailing Unicode code points c satisfying f(c) removed.

golang documentation

We use TrimFunc to remove unwanted characters from both ends of each word, with this predicate:

func(r rune) bool {
    return !unicode.IsLetter(r) && !unicode.IsDigit(r)
}

In effect, we strip every leading and trailing rune that is neither a letter nor a digit. Characters in the middle of a word, such as the apostrophe in "it's", are kept.

Example

Input: car: carpet as java: javascript!!&@$%^&

After the input is split into words, TrimFunc turns "car:" into "car" and "java:" into "java". It also strips the trailing "!!&@$%^&" from "javascript".

— THE – END —