NGINX is a free, open-source web server well known for its high performance, stability, and low resource consumption. It can serve static and dynamic content, and act as a reverse proxy, load balancer, and HTTP cache. In this blog post, we will go through the step-by-step process of installing NGINX on Debian 11.
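On Debian 11, NGINX ships in the default APT repositories, so the core of the installation is just a few commands (these are the standard Debian package and service names; run them as root or via sudo):

```shell
# Refresh the package index and install NGINX from Debian's repositories
sudo apt update
sudo apt install -y nginx

# Confirm the service is running, and enable it at boot
sudo systemctl status nginx
sudo systemctl enable nginx
```

After this, browsing to the server's IP address should show the default NGINX welcome page.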
With the advent of tools like Docker, Linux Containers, and others, it has become super easy to isolate Linux processes into their own little system environments. This makes it possible to run a whole range of applications on a single physical Linux machine and ensure no two of them can interfere with each other, without having to resort to virtual machines. These tools have been a huge boon to PaaS providers. But what exactly happens under the hood?
These tools rely on a number of features and components of the Linux kernel. Some of these features were introduced fairly recently, while others still require you to patch the kernel itself. But one of the key components, Linux namespaces, has been a feature of Linux since version 2.6.24 was released in 2008.
Anyone familiar with chroot already has a basic idea of what Linux namespaces can do and how they are used. Just as chroot allows processes to see any arbitrary directory as the root of the system (independent of the rest of the processes), Linux namespaces allow other aspects of the operating system to be independently modified as well. This includes the process tree, networking interfaces, mount points, inter-process communication resources, and more.
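As a quick illustration, the `unshare` tool from util-linux exposes these namespaces on the command line. A minimal sketch (requires root, and a kernel with namespace support):

```shell
# Start a shell in new PID and mount namespaces; --fork is needed so the
# child becomes the first process of the new PID namespace, and
# --mount-proc remounts /proc so tools like ps see only this namespace.
sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace, the shell believes it is PID 1:
ps aux
```

The `ps` output inside the namespace lists only `bash` and `ps` itself; the host's process tree is invisible, even though no virtual machine is involved.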
Originally, data engineering meant loading external data sources and designing databases: designing and developing pipelines to collect, manipulate, store, and analyze data.
It has since grown to support the volume and complexity of big data, so data engineering now encapsulates a wide range of skills, ranging from web crawling and data cleansing to distributed computing and data storage and retrieval.
For data engineers, data storage and retrieval is the critical component of the pipeline, along with an understanding of how the data can be used and analyzed.
In recent years, many new data storage technologies have emerged. So which one is best suited, and has the most appropriate features, for data engineering?
By now, you have probably heard of the Hadoop Distributed File System (HDFS), especially if you are a data analyst or someone responsible for moving data from one system to another. But what benefits does HDFS have over relational databases?
HDFS is a scalable, open source solution for storing and processing large volumes of data. HDFS has been proven to be reliable and efficient across many modern data centers.
HDFS utilizes commodity hardware along with open source software to reduce the overall cost per byte of storage.
With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics. It does not require the underpinnings and overhead to support transaction atomicity, consistency, isolation, and durability (ACID) as is necessary with traditional relational database systems.
Moreover, when compared with enterprise and commercial databases, such as Oracle, utilizing Hadoop as the analytics platform avoids any extra licensing costs.
One of the first questions many people ask when learning about HDFS is: How do I get my existing data into HDFS?
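For plain files that already sit on a local disk, Hadoop's own `hdfs dfs` client is all you need; a sketch (the local and HDFS paths here are hypothetical):

```shell
# Create a target directory in HDFS and copy a local file into it
hdfs dfs -mkdir -p /data/raw
hdfs dfs -put /tmp/events.csv /data/raw/

# Confirm the file landed where we expect
hdfs dfs -ls /data/raw
```

Data that lives inside a relational database, however, needs a different tool.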
In this article, we will examine how to import data from a PostgreSQL database into HDFS using Apache Sqoop, currently the most efficient open-source solution for transferring data between HDFS and relational database systems. Sqoop is designed to bulk-load data from a relational database into HDFS (import) and to bulk-write data from HDFS back to a relational database (export).
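A typical Sqoop import of a single PostgreSQL table might look like the following sketch; the host, database, credentials, table, and target directory are placeholders:

```shell
# Bulk-load the "orders" table from PostgreSQL into HDFS.
# -P prompts for the password; -m 4 splits the import across
# 4 parallel map tasks for throughput.
sqoop import \
  --connect jdbc:postgresql://db.example.com:5432/sales \
  --username sqoop_user -P \
  --table orders \
  --target-dir /data/sales/orders \
  -m 4
```

Sqoop writes the result as a set of files (one per map task) under the target directory, ready for downstream analytics jobs.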
When I started to implement the Ultimate Hacking Keyboard, I wasn’t very marketing savvy. As an engineer, all I could see ahead was product development and technical challenges. However, marketing is just as important and must not be overlooked. A good landing page is a must-have.
Luckily for us, we realized that there’s a lot to do before we start our crowdfunding campaign, and an attractive site could turn this otherwise idle time to our advantage by capturing people’s attention, generating more subscribers and priming us for the campaign.