2021-10-03: Learning Python, Open Source at Work

I am not great at writing. Writing nicely structured stories and posting them regularly as blog posts is something I have not done much. For exactly this reason, I like doing these smaller posts about whatever I want to write about, with as much or as little detail as I want to provide, and I have been enjoying it now that I have been doing it for a few weeks.

Learning and working with Python

Last week I got to work on an internal tool at the company (more on this in the next section) which I wrote in Python. I was discussing with someone how I started to learn Python in 2013 when I was in college.

I remember learning about Python as the first programming language outside of my school’s curriculum. I was lucky to have great Computer Science teachers in school, which made CS more interesting. In school, we were taught OOP with Java. In late 2011, when I was still in school, I looked for other programming languages and came across the TIOBE index. I found out about the existence of Python through this index. Till then I was working with Java in the BlueJ environment, which I think is still used in schools. TBH, I liked it (I don’t know why we couldn’t get a more intuitive environment for C. We had Borland Turbo C).

I started spending more time learning Python in 2013 when I got my own laptop (which I formatted like 3 times within 14 days of purchasing it, with Debian, Ubuntu etc.). I learnt via the official docs but mostly through YouTube videos. I also subscribed to the Python Tutor mailing list, where new Python users could ask questions and learn. I unsubscribed from this list in 2019 when I was cleaning up my inbox, but I sincerely appreciate the people who have been helping newcomers for more than 2 decades now.

Here’s some code which I wrote in November 2013. For some reason I thought I’d write code to perform basic differentiation using Python. I had some older code as well, but it was mostly written by following some tutorial etc. I don’t have my other self-written code backed up now. Behold.

""" Input standards :
   # Input will be of string type (By default it accepts string)
   # All inputs are polynomials with each term space separated ! : eg : 'term1 term2 term3...'
   # Each term has its sign, coefficient, variable, and its power
   # Standard for writing each term : eg : 1.x^4, 4.x^-4 are valid or like -- <sign><coeff>.<variable>^<power>
   # Sample valid inputs : '+3.x^3 +4.x^5 -5.x^2'
   # to write constant for eg: +5 write as +5.x^0
"""
""" Algorithm :
    # Input as string
    # split from spaces each term
    # split each term - first part coeff(with sign) second part exponent
    # apply differentiation rules
    # append final evaluated expression and print
"""
class diff:
    def __init__(self):
        self.poly = input('Enter the polynomial : ')  # takes the input expression
        self.splitexp = []                            # the input expression split on spaces
        self.splitterm = []                           # one element of splitexp split into coeff and power
        self.final = []                               # stores the final evaluated terms

    def __calpoly__(self):
        senti = 0
        self.splitexp = self.poly.split(' ')          # split the input expression on spaces
        for elem in range(0,len(self.splitexp)):
            senti = 9999
            self.splitterm = []
            self.splitterm = self.splitexp[elem].split('.')
            # splitterm[1] looks like 'x^4' or 'x^-4' (single-digit powers only)
            if self.splitterm[1][2] == '-':
                pro = eval(self.splitterm[0]) * -eval(self.splitterm[1][3])
                pwr = -eval(self.splitterm[1][3]) - 1
            else:
                pro = eval(self.splitterm[0]) * eval(self.splitterm[1][2])
                pwr = eval(self.splitterm[1][2]) - 1
            #DEBUG 1 : print(self.final)
            # merge with an already evaluated term that has the same power
            for elem1 in range(0,len(self.final)):
                    #DEBUG 2 : print('-------------------------itering-----------------------------------')
                    sep = []
                    sep = self.final[elem1].split('.')
                    if sep[1] == (self.splitterm[1][0] + '^' + str(pwr)):
                        #DEBUG 3 : print('powers are equal ')
                        sep[0] = str(eval(sep[0]) + pro)
                        pos = elem1
                        # keep the '.' so the merged term stays in <coeff>.<variable>^<power> form
                        if eval(sep[0]) >= 0:
                            #DEBUG 4 : print('positive coeff ! ')
                            self.final[pos] = '+' + sep[0] + '.' + self.splitterm[1][0] + '^' + str(pwr)
                        else:
                            self.final[pos] = sep[0] + '.' + self.splitterm[1][0] + '^' + str(pwr)
                        senti = -9999
            if senti == 9999:
                #DEBUG 5 : print('sentinel unchanged ! ')
                if pro >= 0:
                    #DEBUG 6 : print('product >= 0 ')
                    if pro == 0:
                        pass                          # a zero term contributes nothing
                    else:
                        self.final.append('+' + str(pro) + '.' + self.splitterm[1][0] + '^' + str(pwr))
                else:
                    #DEBUG 7 : print('product < 0 ')
                    self.final.append(str(pro) + '.' + self.splitterm[1][0] + '^' + str(pwr))
        exp = ' '.join(self.final)
        print(exp)
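
For comparison, here is a rough sketch of how the same idea could look today. It is not the 2013 code: it keeps the same input format and the same power rule, but the regular expression, the function name differentiate and the dictionary used to merge terms are all assumptions made for this illustration.

```python
import re
from collections import defaultdict

# One term in the input format above: <sign><coeff>.<variable>^<power>
# (the pattern and all names in this sketch are illustrative assumptions)
TERM = re.compile(r"^([+-]?\d+)\.([a-zA-Z])\^(-?\d+)$")


def differentiate(poly: str) -> str:
    """Differentiate a space-separated polynomial term by term."""
    coeffs = defaultdict(int)  # (variable, new power) -> summed coefficient
    for term in poly.split():
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"bad term: {term!r}")
        coeff = int(match.group(1))
        var = match.group(2)
        power = int(match.group(3))
        # Power rule: d/dx (c * x^n) = (c * n) * x^(n - 1)
        coeffs[(var, power - 1)] += coeff * power
    # Terms whose coefficient ended up as zero are dropped.
    parts = [f"{c:+d}.{var}^{p}" for (var, p), c in coeffs.items() if c != 0]
    return " ".join(parts)


print(differentiate("+3.x^3 +4.x^5 -5.x^2"))  # prints: +9.x^2 +20.x^4 -10.x^1
```

Unlike the original, this version parses the numbers with int() instead of eval() and handles powers with more than one digit.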


Open source at work

Talking about Python, I was recently working on an internal tool at work called trail, written in Python for ML teams to use.

Nothing too crazy, just a CLI tool which calls an HTTP service with some dataset and generates reports from its response. The good part is the initiative we take at work to keep our projects open source on GitHub.

Doing this has also made me keep things clean, simple and documented at all times. It feels nice to do continuous releases for these projects, and people get to see what we work on.

Another project I worked on at work last week was sentinel, also open source: a system to filter the phone calls we get and flag them based on certain constraints. From this description you can tell that these filters will always evolve and will need to be continuously added over time. Hence, I considered it important to make this process easier for future filter plugin writers.

So the easiest workflow I could think of was:

  • Add/implement a filter logic.
  • Add its name in the config file you use.

and that’s it. There should be minimal code involved in adding features to the system.

1. New filters are made by implementing an ABC like the one below:

from abc import ABC
from typing import List

import pandas as pd


class FilterBase(ABC):
    """
    Base class for anomaly filters.

    Attributes:
        successor (FilterBase): filter function instances implemented from
                                  this base class.

    Methods:
        handle(df: pd.DataFrame): Calls filter functions chained as its successor.
        preprocess(df: pd.DataFrame): :TODO:
        process(df: pd.DataFrame): Process serving sample (untagged call/turn dataframe).
    """

    def __init__(self, successor: "FilterBase" = None, **kwargs):
        self.successor = successor

    def handle(self, df: pd.DataFrame, df_list: List[pd.DataFrame]):
        """
        Calls filter functions chained as its successor.

        Args:
            df (pd.DataFrame): dataframe.
            df_list (list): list of dataframes.
        """
        if self.successor:
            self.successor.handle(df, df_list)
        return df_list

    def preprocess(self, df: pd.DataFrame):
        pass

    def process(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Process serving sample (untagged call/turn dataframe).

        Args:
            df (pd.DataFrame): serving sample dataframe.
        """
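
To make the successor chaining a bit more concrete, here is a minimal sketch of the pattern. It is only an illustration: plain lists of dicts stand in for dataframes, the class names Filter, LowConfidence and EndState are made up, and it assumes that each filter appends its own process() result before handing the input to its successor, a step the snippet above does not show.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]  # stand-in for one dataframe row (assumption)


class Filter:
    """Hypothetical simplified filter: one link in the chain."""

    def __init__(self, successor: Optional["Filter"] = None):
        self.successor = successor

    def process(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError

    def handle(self, rows: List[Row], results: List[List[Row]]) -> List[List[Row]]:
        # Record this filter's output, then pass the same input down the chain.
        results.append(self.process(rows))
        if self.successor:
            self.successor.handle(rows, results)
        return results


class LowConfidence(Filter):
    def process(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r["confidence"] < 0.5]


class EndState(Filter):
    def process(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r["state"] == "COF"]


calls = [
    {"call_uuid": "a", "state": "COF", "confidence": 0.9},
    {"call_uuid": "b", "state": "DONE", "confidence": 0.2},
]
chain = LowConfidence(successor=EndState())
results = chain.handle(calls, [])
print([[r["call_uuid"] for r in rs] for rs in results])  # prints: [['b'], ['a']]
```

Each filter only knows about its own logic and an optional successor, so adding a filter never requires touching the others.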

2. They are then registered in the filter registry using a filter factory decorator like the one below:

@FilterFactory.register(name="call_end_state", description="Calls with a particular end state")
class EndStateFilter(FilterBase):
    def __init__(self, *args, **kwargs):
        self.end_state = kwargs.get("end_state", ["COF"])
        super().__init__(*args, **kwargs)

    def preprocess(self, df: pd.DataFrame):
        raise NotImplementedError

    def process(self, df: pd.DataFrame) -> pd.DataFrame:
        filtered_df = self._filter_last_turn(df)
        filtered_df = self.annotate(filtered_df, filtered_df.state,
                                    self._filter_state, "call_end_state")

        return filtered_df

    def _filter_state(self, item: str) -> bool:
        return item in self.end_state

    def _filter_last_turn(self, df: pd.DataFrame) -> pd.DataFrame:
        filtered_df = df.groupby(["call_uuid", "state"], as_index=False).first()
        filtered_df = filtered_df.drop_duplicates(subset=["call_uuid"], keep="last")

        return filtered_df

Implementation of the registry decorator here.
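
For readers who have not seen the pattern, a registry decorator of this kind can be sketched in a few lines. This is not sentinel's actual implementation: the registry and descriptions attributes, the duplicate-name check and the create helper are assumptions for illustration, and the filter class is cut down to its constructor.

```python
from typing import Any, Callable, Dict, Type


class FilterFactory:
    """Hypothetical registry mapping a filter name to its class."""

    registry: Dict[str, Type] = {}
    descriptions: Dict[str, str] = {}

    @classmethod
    def register(cls, name: str, description: str = "") -> Callable[[Type], Type]:
        def decorator(filter_cls: Type) -> Type:
            if name in cls.registry:
                raise ValueError(f"filter {name!r} is already registered")
            cls.registry[name] = filter_cls
            cls.descriptions[name] = description
            return filter_cls  # the class itself is returned unchanged

        return decorator

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> Any:
        """Instantiate a registered filter by name (hypothetical helper)."""
        return cls.registry[name](**kwargs)


@FilterFactory.register(name="call_end_state", description="Calls with a particular end state")
class EndStateFilter:
    def __init__(self, **kwargs: Any):
        self.end_state = kwargs.get("end_state", ["COF"])


f = FilterFactory.create("call_end_state", end_state=["REPROMPT_INTENT"])
print(type(f).__name__, f.end_state)  # prints: EndStateFilter ['REPROMPT_INTENT']
```

The decorator runs when the module is imported, so defining a filter class is enough to make it discoverable by name.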

3. Updating config

People use a config like the one below. Now they just need to add the name of the new filter they created and supply it via the CLI command.

name: anomaly
description: Get anomalous calls 

data_url: s3://bucket/path/to/dataframe.csv 

    webhook_url: ""
      - user@example.com
      - user2@example.com

    limit: 10
    annotation_key: "prediction_low_confidence"
      confidence_threshold: 0.50
    limit: 10
    annotation_key: "low_asr_confidence"
      confidence_threshold: 0.50
    limit: 10
    annotation_key: "no_alternatives"
    limit: 10
    annotation_key: "call_end_state"
      end_state: ["REPROMPT_INTENT"]

The idea here was that a filter writer should need minimal context about the other parts of the system. This might evolve over time when more complex bits are added, but this is its current state and usage pattern.
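
As a rough illustration of why a config entry can be enough, the sketch below builds a filter chain from a parsed config. Everything in it is assumed for the example: the config layout (a "filters" list with "name" and "kwargs" keys) is not the tool's actual schema, and a plain dict called REGISTRY stands in for the filter registry.

```python
from typing import Any, Dict, List, Optional


class NamedFilter:
    """Hypothetical stand-in for a registered filter class."""

    def __init__(self, successor: Optional["NamedFilter"] = None, **kwargs: Any):
        self.successor = successor
        self.kwargs = kwargs


class LowAsrConfidence(NamedFilter):
    pass


class CallEndState(NamedFilter):
    pass


# Plain dict standing in for the filter registry (assumption).
REGISTRY = {"low_asr_confidence": LowAsrConfidence, "call_end_state": CallEndState}

# Assumed shape of the parsed config; not the tool's actual schema.
config = {
    "filters": [
        {"name": "low_asr_confidence", "kwargs": {"confidence_threshold": 0.5}},
        {"name": "call_end_state", "kwargs": {"end_state": ["REPROMPT_INTENT"]}},
    ]
}


def build_chain(entries: List[Dict[str, Any]]) -> Optional[NamedFilter]:
    """Instantiate filters in config order and link each one to the next."""
    head = None
    # Walk backwards so every filter can receive its successor at construction.
    for entry in reversed(entries):
        head = REGISTRY[entry["name"]](successor=head, **entry.get("kwargs", {}))
    return head


head = build_chain(config["filters"])
names = []
node = head
while node is not None:
    names.append(type(node).__name__)
    node = node.successor
print(names)  # prints: ['LowAsrConfidence', 'CallEndState']
```

With a setup like this, enabling a new filter is one more entry in the list; the code that builds and runs the chain stays the same.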

GitHub as a social feed

Recently noticed that David Beazley has started his blog on GitHub. He writes markdown and publishes a release with a link to his markdown file.

GitHub releases show up on your followers’ homepage feed. On GitHub releases you can also react with emojis. So basically, you can create a repository, publish releases, add your text in the release notes, and it should show up in your followers’ feed, where people can react to it. Kinda basic social network?

Meeting people

Met a lot of colleagues last week, great people, had fun conversations. Celebrated the 5-year anniversary of the company last weekend 🎉