Photo by Brianna Santellan on Unsplash

I am starting a series of blog posts aiming to cover the basics of various data science and machine learning concepts. I’m mainly doing this to understand these concepts better myself. I hope that during this process, I can help others understand them too. Okay, let’s do it!

In the field of machine learning, a confusion matrix (also known as an error matrix) is a table that allows us to visualize the performance of an algorithm. It is used for classification tasks only.
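
To make the idea concrete, here is a minimal sketch that builds a confusion matrix in plain Python. The helper name `confusion_matrix`, the labels, and the sample predictions are made up for illustration, and the layout (rows are actual classes, columns are predicted classes) is one common convention rather than the only one.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical ground truth and model output for a two-class problem.
actual = ["cat", "dog", "cat", "cat", "dog", "dog"]
predicted = ["cat", "dog", "dog", "cat", "dog", "cat"]
labels = ["cat", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal cells count the correct predictions, and every other cell counts one specific kind of mistake.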


After reading this article you’ll know:

  • What are functions and how to define them
  • What are the parameters and arguments
  • How to return values from functions
  • How to document a function
  • What scope means in Python
  • What are keyword arguments
  • Flexible and default arguments
  • What Python exceptions are and how to handle and raise them
  • Assert Statements


A function is an organized reusable piece of code solving a specific task. Functions help us to keep our code clean and provide us with the power of code reusability.
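
As a small sketch of what that reusability looks like, the function below is defined once and called several times. The name `rectangle_area` and its parameters are hypothetical, picked only for this example.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height falls back to 1.0 when omitted."""
    return width * height


# The same piece of code is reused with different arguments.
small = rectangle_area(2, 3)
strip = rectangle_area(4)              # uses the default height
named = rectangle_area(width=5, height=2)  # keyword arguments
print(small, strip, named)
```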

Built-in Functions

Python has several built-in functions which are always…

After reading this blog post you’ll know:

  • What is the tuple data type in Python
  • How to initialize tuples
  • How to iterate over tuples
  • Common sequence operations over tuples
  • What is tuple unpacking
  • What’s the difference between tuples and lists


Sequences are a very common type of iterable. Some examples of built-in sequence types are lists, strings, and tuples. They support efficient element access using integer indices and define a method that returns the length of the sequence.
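
Here is a short sketch of those two properties, integer indexing and length, on a tuple and a string. The variable names and values are made up for illustration.

```python
point = (3, 4, 5)   # a tuple
word = "tuple"      # a string is a sequence too

first = point[0]          # access by integer index
last = point[-1]          # negative indices count from the end
size = len(point)         # len() relies on the sequence's __len__ method
third_letter = word[2]

x, y, z = point           # tuple unpacking
print(first, last, size, third_letter, x + y + z)
```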



After reading this blog post you’ll know:

  • What are an object’s identity, type, and value
  • What are mutable and immutable objects

Introduction (Objects, Values, and Types)

All the data in a Python program is represented by objects or by relations between objects. Every object has an identity, a type, and a value.
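
A minimal sketch of these three properties, using the built-in `id()` and `type()` functions and the `is` and `==` operators. The variable names are hypothetical.

```python
numbers = [1, 2, 3]
alias = numbers            # a second name for the same object
copy = list(numbers)       # a new object with an equal value

same_identity = alias is numbers       # identity: both names point to one object
equal_value = copy == numbers          # value: the contents compare equal
distinct_objects = copy is not numbers
list_type = type(numbers)              # type: decides which operations are allowed

# Lists are mutable, so a change made through one name is visible through the other.
alias.append(4)
print(id(numbers) == id(alias), numbers, copy)
```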



An object’s type defines the possible values and operations (e.g. “does it have a…


After reading this blog post, you’ll know:

  • How the iteration in Python works under the hood
  • What are iterables and iterators and how to create them
  • What is the iterator protocol
  • What is lazy evaluation
  • What are generator functions and generator expressions

Python’s for loop

Python doesn’t have traditional for loops. Let’s look at pseudocode for how a traditional for loop looks in many other programming languages.

Pseudocode of a for loop
  • The initializer section is executed only once, before entering the loop.
  • The condition section must be a boolean expression. …
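
To compare the two styles, here is a rough sketch that emulates a traditional counter-based loop with `while` and then writes the same loop the way Python's `for` does it. The update step is my own assumption about how such loops usually end, since the list above is cut off before it.

```python
# Traditional style, emulated with a while loop.
squares = []
i = 0                      # initializer: executed once, before entering the loop
while i < 5:               # condition: a boolean expression checked before each pass
    squares.append(i * i)
    i += 1                 # update step (assumed, not taken from the list above)

# Python style: the for loop walks over an iterable instead of managing a counter.
squares_for = []
for n in range(5):
    squares_for.append(n * n)

print(squares, squares_for)
```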


After reading this blog post, you’ll know some basic techniques to extract features from some text, so you can use these features as input for machine learning models.

What is NLP (Natural Language Processing)?

NLP is a subfield of computer science and artificial intelligence concerned with interactions between computers and human (natural) languages. It is used to apply machine learning algorithms to text and speech.

For example, we can use NLP to create systems like speech recognition, document summarization, machine translation, spam detection, named entity recognition, question answering, autocomplete, predictive typing and so on.

Nowadays, most of us have smartphones that have speech recognition. These smartphones…



As data scientists, we often work with tons of data. The data we want to load can be stored in different ways. The most common formats are CSV files, Excel files, and databases. The data can also be available through web services. Of course, there are many other formats. To work with the data, we need to represent it in a tabular structure, that is, arranged in a table with rows and columns.
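
As a minimal sketch of turning CSV text into rows and columns, the example below uses only Python's standard `csv` module. The column names and values are made up, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file on disk.
raw_csv = "name,age\nAda,36\nGrace,45\n"

reader = csv.DictReader(io.StringIO(raw_csv))
rows = list(reader)            # one dict per row, keyed by column name
columns = reader.fieldnames    # column names taken from the header line

# csv reads every value as a string, so numeric columns need converting.
ages = [int(row["age"]) for row in rows]
print(columns, rows, ages)
```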

In some cases, the data is already tabular and it’s easy to load it. In other cases, we work with unstructured data. The unstructured data


When a data scientist works with data, that data is typically stored in CSV files, Excel files, databases, and other formats. This data is also commonly loaded as a pandas DataFrame. For simplicity, the examples use Python lists that contain our data. I’m assuming that you have some knowledge about Python data types, functions, methods, and packages. If you don’t have that knowledge, I suggest you read my previous article that covers these topics.

Data Visualization

Data visualization is a very important part of data analysis. You can use it to explore your data. If you understand your data well…

Python Data Types

In Python, we have many data types. The most common ones are float (floating point), int (integer), str (string), bool (Boolean), list, and dict (dictionary).

  • float - used for real numbers.
  • int - used for integers.
  • str - used for text. We can define strings using single quotes 'value', double quotes "value", or triple quotes """value""". Triple-quoted strings can span multiple lines, and the new lines are included in the value of the variable. They’re also used for writing function documentation.
  • bool - used for truth values (True and False). Useful for performing filtering operations on data.
  • list - used to…
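
Here is a quick sketch of these types in action. All variable names and values are hypothetical, chosen only for illustration.

```python
height = 1.79                    # float
age = 30                         # int
name = 'Ada'                     # str, single quotes
note = """line one
line two"""                      # triple quotes keep the line break in the value
is_adult = age >= 18             # bool

ages = [15, 22, 30]              # list
adults = [a for a in ages if a >= 18]   # a bool condition used for filtering
person = {"name": name, "age": age}     # dict

print(type(height), type(age), type(name), type(is_adult), adults, person)
```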

Ventsislav Yordanov

An aspiring learner of Data Science and Machine Learning
