
How to load and parse large data sets using STL in C++?

WBOY
2024-06-01 21:18:59

How do you load and parse large data sets with the STL? Open the data file with std::ifstream. For CSV files, read the data line by line with std::getline(), then split each line into fields with std::stringstream and std::getline(). Store the parsed fields in a container such as std::unordered_map and use them for further processing.


The STL (Standard Template Library) gives C++ programmers powerful tools for managing and processing many kinds of data structures. This article shows how to use the STL, together with the standard stream classes, to load and parse large data sets.

Loading the data set

The first step in loading the data set is to open the file using std::ifstream:

std::ifstream input("data.csv");
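std::ifstream does not throw when the file is missing; it silently enters a failed state, so a loader should check that the open succeeded before reading. The following is a minimal sketch, assuming you prefer an exception on failure. open_dataset is a hypothetical helper name, not a standard function:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: open a data file and fail loudly if it cannot be read.
std::ifstream open_dataset(const std::string& path) {
  std::ifstream input(path);
  if (!input.is_open()) {
    // Without this check, later std::getline() calls would just return false.
    throw std::runtime_error("cannot open " + path);
  }
  return input;  // streams are movable since C++11, so returning by value works
}
```

Usage: `std::ifstream input = open_dataset("data.csv");` then read from `input` as usual.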

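For large data sets, consider memory-mapping the file, which can improve performance by avoiding extra copies into stream buffers. Memory mapping is not part of the STL, and there are no std::memfd_create() or std::mmap() functions. On POSIX systems an existing file is mapped with open() and mmap() (declared in <sys/mman.h>); memfd_create() is a Linux-specific call that creates an anonymous in-memory file rather than mapping an existing one. On Windows, the equivalent APIs are CreateFileMapping() and MapViewOfFile().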
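A minimal POSIX-only sketch of memory mapping, assuming a Linux or other POSIX system and C++17 for std::string_view. MappedFile is a hypothetical class written for illustration; it maps the whole file read-only so its text can be scanned without copying it into a std::string:

```cpp
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical RAII wrapper (not part of the STL): maps a file read-only.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st{};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {  // mmap rejects a zero length, so skip empty files
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const char*>(p);
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }
  ~MappedFile() {
    if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  // View of the whole file; valid only while this object is alive.
  std::string_view view() const { return {data_, size_}; }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

The returned std::string_view can then be split on '\n' and ',' without further copies. Whether this is faster than std::ifstream depends on the platform and access pattern, so it is worth measuring.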

Parsing the data set

After the dataset is loaded, the next step is to parse it. For CSV files, we can use std::getline() to read the data line by line. We can then split each line into separate fields using std::stringstream and std::getline():

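#include <fstream>
#include <sstream>
#include <string>
#include <vector>

std::string line;
while (std::getline(input, line)) {        // read the file one line at a time
  std::stringstream ss(line);              // treat the line as its own stream
  std::string field;
  std::vector<std::string> fields;
  while (std::getline(ss, field, ',')) {   // split the line on commas
    fields.push_back(field);
  }
  // Process the parsed fields
}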
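The std::getline() loop above has two limitations worth knowing: it drops a trailing empty field (for "a,b," it yields two fields, not three), and it does not understand quoted fields such as "Smith, John". A minimal sketch that keeps empty fields, assuming unquoted data; split_line is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split one line on a delimiter, keeping empty fields
// (including a trailing one). Quoted fields are NOT handled.
std::vector<std::string> split_line(std::string_view line, char delim = ',') {
  std::vector<std::string> fields;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = line.find(delim, start);
    if (pos == std::string_view::npos) {
      fields.emplace_back(line.substr(start));  // last field, possibly empty
      break;
    }
    fields.emplace_back(line.substr(start, pos - start));
    start = pos + 1;  // continue after the delimiter
  }
  return fields;
}
```

For data that may contain quoted commas, a dedicated CSV parser is a safer choice than either approach.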

Practical case: Parsing a sales data set

Suppose we have a large CSV file containing sales data in the following format:

product_id,product_name,quantity_sold,price
1,iPhone 13 Pro,100,999
2,Apple Watch Series 7,50,399
3,MacBook Air M2,75,1299

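We can load and parse this data set with the STL. Note that the first line is a header row, so it must be skipped before the numeric fields are converted: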

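#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

int main() {
  std::ifstream input("sales.csv");
  // product_id -> (product_name, quantity_sold)
  std::unordered_map<int, std::pair<std::string, int>> sales;
  std::string line;
  std::getline(input, line);  // skip the header row
  while (std::getline(input, line)) {
    std::stringstream ss(line);
    // std::getline() only reads into std::string, so read every field as text
    std::string id_str, product_name, qty_str, price_str;
    std::getline(ss, id_str, ',');
    std::getline(ss, product_name, ',');
    std::getline(ss, qty_str, ',');
    std::getline(ss, price_str, ',');  // read, but not stored in this map
    // Convert numeric fields (std::stoi throws on malformed input)
    int product_id = std::stoi(id_str);
    int quantity_sold = std::stoi(qty_str);
    sales[product_id] = {product_name, quantity_sold};
  }
  // Use the parsed data
}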
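The map above stores only the name and quantity. Below is a sketch of a slightly more robust loader, assuming the four-column layout shown earlier with a header row and no quoted commas. It reads from any std::istream (so it can be tested with a std::istringstream instead of a file), keeps the price, reports malformed rows, and then uses the parsed data to compute total revenue. SaleRecord, load_sales and total_revenue are hypothetical names:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical record type that keeps all non-key columns, including price.
struct SaleRecord {
  std::string product_name;
  int quantity_sold = 0;
  double price = 0.0;
};

// Parse "product_id,product_name,quantity_sold,price" rows after a header.
// std::stoi/std::stod throw std::invalid_argument on non-numeric fields.
std::unordered_map<int, SaleRecord> load_sales(std::istream& input) {
  std::unordered_map<int, SaleRecord> sales;
  std::string line;
  std::getline(input, line);  // skip the header row
  std::size_t line_no = 1;
  while (std::getline(input, line)) {
    ++line_no;
    if (line.empty()) continue;  // tolerate blank lines
    std::istringstream ss(line);
    std::string id, name, qty, price;
    if (!std::getline(ss, id, ',') || !std::getline(ss, name, ',') ||
        !std::getline(ss, qty, ',') || !std::getline(ss, price, ',')) {
      throw std::runtime_error("malformed row at line " +
                               std::to_string(line_no));
    }
    sales[std::stoi(id)] = SaleRecord{name, std::stoi(qty), std::stod(price)};
  }
  return sales;
}

// Example use of the parsed data: sum of quantity_sold * price.
double total_revenue(const std::unordered_map<int, SaleRecord>& sales) {
  double total = 0.0;
  for (const auto& entry : sales) {
    total += entry.second.quantity_sold * entry.second.price;
  }
  return total;
}
```

For the three sample rows, total_revenue() returns 100 × 999 + 50 × 399 + 75 × 1299 = 217275. Calling `sales.reserve(n)` first can reduce rehashing when the approximate row count is known in advance.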

Conclusion

The STL and the wider C++ standard library provide convenient tools for loading and parsing data sets, including large ones: std::ifstream opens the file, std::getline() and std::stringstream split it into lines and fields, and containers such as std::unordered_map hold the parsed results.
