Batch processing is a common requirement in big data. A batch job operates on a large, static dataset and returns its results only after the computation over the whole set has finished. This model has an obvious drawback: when facing large-scale data, results are not available in real time.
At present, batch processing excels at handling large volumes of persistent data, so it is most often used to analyze historical data.
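The batch model described above can be sketched in a few lines of plain Python. The dataset and field names here are illustrative, not from any specific framework: the whole static dataset is scanned, and the aggregate result only appears once the pass is complete.

```python
# Illustrative static dataset: page-view records with a latency in ms.
page_views = [
    {"page": "/home", "ms": 120},
    {"page": "/docs", "ms": 340},
    {"page": "/home", "ms": 95},
]

def batch_average_latency(records):
    """Aggregate over the complete dataset; nothing is emitted mid-run."""
    totals, counts = {}, {}
    for r in records:  # one full pass over the static data
        totals[r["page"]] = totals.get(r["page"], 0) + r["ms"]
        counts[r["page"]] = counts.get(r["page"], 0) + 1
    return {p: totals[p] / counts[p] for p in totals}

result = batch_average_latency(page_views)
print(result)  # results are available only after the whole pass finishes
```

Note that callers see nothing until the function returns; this "results at the end" behavior is exactly why batch systems suit historical analysis but not real-time use.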
2. Stream processing
The other common requirement alongside batch processing is stream processing, which processes data as it enters the system: results are available immediately and are continuously updated as new data arrives.
In terms of real-time capability, stream processing excels, but it handles only one record at a time (true stream processing) or a small group of records (micro-batch processing), and it keeps only minimal state between records, which places high demands on hardware.
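A minimal sketch of the streaming model, in plain Python rather than any real framework: each record is handled as it arrives, only a small amount of state (a count and a running total) is kept between records, and an updated result is emitted immediately after every element.

```python
def stream_running_mean(stream):
    """Process records one at a time, keeping only minimal state
    (count and running total) between records."""
    count, total = 0, 0.0
    for value in stream:      # each record is handled as it arrives
        count += 1
        total += value
        yield total / count   # an updated result is emitted immediately

# Results are available after every element, not only at the end.
means = list(stream_running_mean([10, 20, 30]))
print(means)  # [10.0, 15.0, 20.0]
```

Because the generator yields after each record, a consumer can act on fresh results while the stream is still flowing, which is the defining difference from the batch sketch above.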
3. Batch processing + stream processing
In practice, batch processing and stream processing often coexist, and hybrid processing frameworks are designed for exactly this situation. Such a framework offers a general-purpose solution for data processing: beyond the core processing primitives, it ships its own integrations, libraries, and tools, covering scenarios such as graph analysis, machine learning, and interactive queries.
Those are the commonly used big data processing frameworks, as shared here by the Qingteng editor. If you are interested in big data engineering, I hope this article has been helpful; to learn more about the skills data analysts and big data engineers need, see the other articles on this site.