Data Science Series #3: Apache Kafka

The concept of “big data” has grown steadily in importance ever since the Internet entered our homes and pockets. A whole branch of science, data science, has developed to analyze data that is too large to be managed with traditional database tools. If you want to get an idea of what this science is used for, you can find the other articles in the data science series here.

Big data keeps growing as you read this article. Click, click: you follow link x to a site. Click, click: you stay on the site for more than a minute. Click, click... The common goal of collecting this data anonymously on every system you use over the Internet is to give you a better experience. So how do we store data that grows every instant? More importantly, how do we make sense of it? Various technologies exist to extract data from its source and make it analyzable. In this article we will talk about Apache Kafka, which is an important part of data science and which we use ourselves. Enjoy the read...

Apache Kafka, in its simplest definition, lets us place streaming data in a queue and transfer it to other systems such as Hadoop, Spark, and Elasticsearch.

Apache Kafka is an open source project that was developed at LinkedIn, published on GitHub in 2011, and written in Java and Scala. It is a scalable messaging system that can handle big data, is suited to distributed systems, and can transfer data collected in real time quickly and reliably.

Messaging system? Is Kafka a messaging app?
If you know software mainly as a user, it is perfectly natural for this question to be the first one that comes to mind. The answer is no. When we say “messaging”, we are actually referring to the transfer of data from one application to another. This way, applications do not have to concern themselves with whether the data arrived correctly and can focus on analyzing it. That is why the reliability of the transfer is so important in structures like this.

A “message queue” is used to transfer data reliably. A message queue puts the data to be transferred in line and delivers it in that order. Messaging has evolved into two patterns: point to point and publish-subscribe.

Point to point: This method is sometimes referred to as “producer-consumer”. The “producer” produces a message and writes it somewhere, and the “consumer” consumes it. Messages are queued in order and picked up by consumers on demand. Only one consumer can consume a given message, after which the message is deleted from the queue. You can think of a product on your e-commerce site as an example: once one customer has taken it, it is no longer available to anyone else. In the same way, a push notification goes away once it has been read.
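
The point-to-point pattern can be sketched in a few lines of Python. This is only a minimal in-memory illustration using the standard library, not Kafka code; the class name PointToPointQueue and the order messages are made up for the example.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def produce(self, message):
        # The producer appends to the tail of the queue.
        self._messages.append(message)

    def consume(self):
        # The consumer takes the oldest message. popleft() removes it,
        # so no other consumer can ever see it again.
        if not self._messages:
            return None
        return self._messages.popleft()


queue = PointToPointQueue()
for order in ["order-1", "order-2", "order-3"]:
    queue.produce(order)

consumer_a = [queue.consume(), queue.consume()]  # takes the first two messages
consumer_b = [queue.consume(), queue.consume()]  # only one message is left

print(consumer_a)  # ['order-1', 'order-2']
print(consumer_b)  # ['order-3', None]
```

Note that the two consumers never receive the same message: whatever one of them takes is gone for the other.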

Publish-subscribe: In this system, messages are kept in a topic. Let's explain it through an example. We (the publishers) upload posts to this blog, and you (the subscribers) learn about our articles by subscribing to our newsletter and then read them. Anyone who wishes can subscribe to the system and read the same post over and over again.
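
Publish-subscribe can be sketched in the same spirit. The following is a toy in-memory model under simple assumptions (one topic, a position tracked per subscriber); the Topic class and its method names are hypothetical and are not Kafka's API.

```python
class Topic:
    """Toy in-memory topic: messages are kept, and every subscriber reads all of them."""

    def __init__(self):
        self._log = []       # messages are appended and never removed by reads
        self._offsets = {}   # next position to read, tracked per subscriber

    def publish(self, message):
        self._log.append(message)

    def subscribe(self, name):
        # In this sketch a new subscriber starts at the beginning of the log.
        self._offsets[name] = 0

    def read(self, name):
        # Return everything this subscriber has not seen yet, then advance its position.
        start = self._offsets[name]
        self._offsets[name] = len(self._log)
        return self._log[start:]

    def rewind(self, name):
        # Reading does not delete anything, so a subscriber can start over.
        self._offsets[name] = 0


blog = Topic()
blog.subscribe("alice")
blog.subscribe("bob")
blog.publish("post-1")
blog.publish("post-2")

alice_first = blog.read("alice")   # ['post-1', 'post-2']
bob_first = blog.read("bob")       # ['post-1', 'post-2'], the same messages
alice_again = blog.read("alice")   # [] because nothing new was published

blog.rewind("alice")
alice_reread = blog.read("alice")  # ['post-1', 'post-2'] once more
```

Unlike the point-to-point case, reading here only moves the subscriber's own position, which is why several subscribers can read the same post and any of them can read it again.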



In conclusion...
Kafka's main role is to combine messaging, storage, and stream processing in a single system. It is not very common for these capabilities to come together, and that is Kafka's competitive advantage.
Architectures such as HDFS (Hadoop Distributed File System) offer the ability to store and process data from the past. A traditional messaging system delivers the messages that arrive after you have subscribed. Kafka combines these two qualities.
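
The idea of combining stored history with newly arriving messages can be illustrated with a small sketch. This is a simplified in-memory model, not Kafka itself; RetainedLog and the subscriber names are invented for the example, and the "earliest" and "latest" start positions merely echo the values of Kafka's auto.offset.reset consumer setting without reproducing its exact behavior.

```python
class RetainedLog:
    """Toy log that both stores past messages and delivers new ones."""

    def __init__(self):
        self._messages = []
        self._positions = {}

    def append(self, message):
        self._messages.append(message)

    def subscribe(self, name, start="latest"):
        # "earliest": replay everything stored so far (storage-style access).
        # "latest": receive only messages appended after subscribing
        #           (traditional messaging-style access).
        if start == "earliest":
            self._positions[name] = 0
        elif start == "latest":
            self._positions[name] = len(self._messages)
        else:
            raise ValueError(f"unknown start position: {start!r}")

    def poll(self, name):
        # Hand over everything after the subscriber's position and move it forward.
        position = self._positions[name]
        self._positions[name] = len(self._messages)
        return self._messages[position:]


log = RetainedLog()
log.append("click-1")
log.append("click-2")

log.subscribe("history_job", start="earliest")
log.subscribe("live_dashboard", start="latest")

log.append("click-3")

history = log.poll("history_job")   # ['click-1', 'click-2', 'click-3']
live = log.poll("live_dashboard")   # ['click-3']
```

The same log serves both needs: the history job sees the clicks recorded before it subscribed, while the live dashboard only sees what arrives afterwards.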

If you have questions about Apache Kafka or artificial intelligence, you can reach us using the contact information on our website. That brings us to the end of another post in the data science series. Subscribe to read more, or follow us on Instagram, Facebook, and LinkedIn! See you soon...

Disclaimer: All rights to all published articles and content belong to Efilli Software. No content, in whole or in part, including text, audio, and video, may be used, published, shared, or modified, even if the source is credited or an active link is provided.