rayyildiz.com

CC2P - Convert CSV to Parquet Files

CC2P (Convert CSV To Parquet) is a high-performance command-line tool written in Rust that efficiently converts CSV files to the Apache Parquet format. Parquet is a columnar storage file format that offers efficient data compression and encoding schemes, making it ideal for big data processing.
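To make "columnar" concrete, here is a minimal sketch in plain Rust (standard library only). It transposes row-oriented CSV records into one vector per column, then run-length encodes a column. It illustrates the general idea only. The helpers rows_to_columns and run_length_encode are hypothetical and are not part of CC2P. Real Parquet files also add row groups, pages, dictionary and RLE encodings, and compression on top of this layout.

```rust
// Conceptual sketch only: not CC2P's implementation.
// Shows how row-oriented CSV records map onto a column-oriented layout.

/// Transpose row-major records into one vector per column.
/// The first row fixes the column count; extra fields in later rows are ignored.
fn rows_to_columns(rows: &[Vec<String>]) -> Vec<Vec<String>> {
    let width = rows.first().map_or(0, |r| r.len());
    let mut columns: Vec<Vec<String>> = vec![Vec::new(); width];
    for row in rows {
        for (i, value) in row.iter().enumerate().take(width) {
            columns[i].push(value.clone());
        }
    }
    columns
}

/// Run-length encode a column: consecutive equal values become (value, count).
/// Storing a column's values next to each other is what makes this effective.
fn run_length_encode(column: &[String]) -> Vec<(String, usize)> {
    let mut runs: Vec<(String, usize)> = Vec::new();
    for value in column {
        if let Some((last, count)) = runs.last_mut() {
            if *last == *value {
                *count += 1;
                continue;
            }
        }
        runs.push((value.clone(), 1));
    }
    runs
}

fn main() {
    let rows: Vec<Vec<String>> = vec![
        vec!["TR".into(), "10".into()],
        vec!["TR".into(), "12".into()],
        vec!["DE".into(), "7".into()],
    ];
    let columns = rows_to_columns(&rows);
    println!("{:?}", columns[0]); // ["TR", "TR", "DE"]
    println!("{:?}", run_length_encode(&columns[0])); // [("TR", 2), ("DE", 1)]
}
```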

Why Use CC2P?

Installation

From crates.io

If you have Rust installed, you can install CC2P directly from crates.io:

cargo install cc2p

From GitHub Releases

You can download pre-built binaries from the GitHub Releases page.

From Source

To build from source:

# Clone the repository
git clone https://github.com/rayyildiz/cc2p.git
cd cc2p

# Build in release mode
cargo build --release

# The binary will be in target/release/cc2p

Usage

Basic usage:

cc2p [OPTIONS] [PATH]

PATH is either the path to a CSV file or a glob pattern (default: *.csv).
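The default pattern *.csv selects every file name ending in .csv. The sketch below shows which names such a pattern matches, using a small hand-written matcher for * and ?. It rests on a few assumptions: matching is case-sensitive, it works on bare file names, and it does no directory walking. The functions wildcard_match and select_csv_files are hypothetical names, not CC2P's code.

```rust
// Illustrative sketch, not CC2P's code: a tiny wildcard matcher showing which
// file names a pattern such as "*.csv" selects. Assumptions: matching is
// case-sensitive, works on bare file names, and supports only `*` and `?`.

/// `*` matches any run of characters (including none); `?` matches exactly one.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    // Most recent `*` seen in the pattern, and the name index where the
    // pattern text after it was last tried.
    let mut star: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((star_pi, star_ni)) = star {
            // Mismatch: backtrack and let the last `*` absorb one more character.
            pi = star_pi + 1;
            ni = star_ni + 1;
            star = Some((star_pi, star_ni + 1));
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty remainder.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Keep only the names that match the pattern, preserving their order.
fn select_csv_files<'a>(pattern: &str, names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| wildcard_match(pattern, name))
        .collect()
}

fn main() {
    let files = ["data.csv", "notes.txt", "sales_2024.csv", "data.csv.bak"];
    println!("{:?}", select_csv_files("*.csv", &files)); // ["data.csv", "sales_2024.csv"]
}
```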

Examples

Convert a single CSV file:

cc2p data.csv

Convert all CSV files in the current directory:

cc2p

Convert all CSV files that use a semicolon delimiter (quoting the pattern lets CC2P, rather than the shell, expand it):

cc2p --delimiter ";" "*.csv"
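A delimiter option is more than a plain string split, because a field wrapped in double quotes may itself contain the delimiter. The following sketch splits one record on a configurable delimiter. It honours quotes and doubled quotes (""), roughly in the style of RFC 4180. It is a simplified, hypothetical helper (split_record), not CC2P's parser. It handles one line at a time and does not support newlines inside quoted fields.

```rust
// Simplified, hypothetical sketch (not CC2P's parser): split one CSV line on
// a configurable delimiter, treating delimiters inside double quotes as data
// and "" inside a quoted field as a literal quote. Newlines inside quoted
// fields are not handled because the input is a single line.
fn split_record(line: &str, delimiter: char) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '"' => {
                if in_quotes && chars.peek() == Some(&'"') {
                    // Doubled quote inside a quoted field: keep one literal quote.
                    current.push('"');
                    chars.next();
                } else {
                    // Opening or closing quote: toggle quoted mode, drop the quote itself.
                    in_quotes = !in_quotes;
                }
            }
            c if c == delimiter && !in_quotes => {
                // Field boundary: move the finished field out and start a new one.
                fields.push(std::mem::take(&mut current));
            }
            c => current.push(c),
        }
    }
    fields.push(current);
    fields
}

fn main() {
    // Semicolon-delimited line whose second field contains a semicolon.
    let fields = split_record("1;\"Istanbul; TR\";42", ';');
    println!("{:?}", fields); // ["1", "Istanbul; TR", "42"]
}
```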

Convert CSV files that have no header row:

cc2p --no-header "data_files/*.csv"
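Conceptually, --no-header decides whether the first record holds column names or data. The sketch below shows that decision. The generated names column_1, column_2, ... are a hypothetical placeholder scheme chosen for illustration. The names CC2P actually assigns to header-less columns may differ, so check the schema of the output file.

```rust
// Hypothetical sketch of the header decision behind --no-header: with a header
// row, the first record supplies the column names; without one, every record
// is data and names must be generated. The "column_N" scheme is an
// illustrative placeholder, not necessarily what CC2P produces.

/// Returns (column names, data records).
fn column_names(records: &[Vec<String>], has_header: bool) -> (Vec<String>, &[Vec<String>]) {
    match records.split_first() {
        // Header present: first record = names, the rest = data.
        Some((first, rest)) if has_header => (first.clone(), rest),
        // No header: all records are data; generate one name per field of the first record.
        Some((first, _)) => {
            let names: Vec<String> = (1..=first.len()).map(|i| format!("column_{i}")).collect();
            (names, records)
        }
        // Empty input: no columns and no data.
        None => (Vec::new(), records),
    }
}

fn main() {
    let records: Vec<Vec<String>> = vec![
        vec!["42".into(), "Ankara".into()],
        vec!["7".into(), "Izmir".into()],
    ];
    let (names, data) = column_names(&records, false);
    println!("{:?}, {} data rows", names, data.len()); // ["column_1", "column_2"], 2 data rows
}
```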

Use 4 worker threads for faster processing:

cc2p --worker 4 large_data.csv
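One plausible way to use --worker N is to let N threads pull files from a shared list until none remain. The sketch below does this with scoped threads and an atomic counter. It is an assumption about the general approach, not a description of CC2P's scheduler. The README does not say whether work is split across files or within one file, and the example above applies --worker 4 to a single file. The function convert_file is a hypothetical stand-in that only computes an output name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Assumption-level sketch of spreading files over N worker threads, in the
// spirit of --worker N. Not CC2P's scheduler.

/// Hypothetical stand-in for the real CSV-to-Parquet step: it only derives
/// the output file name and does not read or write any files.
fn convert_file(path: &str) -> String {
    match path.strip_suffix(".csv") {
        Some(stem) => format!("{stem}.parquet"),
        None => format!("{path}.parquet"),
    }
}

/// Run `convert_file` over all paths using `workers` threads (at least one).
fn convert_all(files: &[&str], workers: usize) -> Vec<String> {
    let next = AtomicUsize::new(0); // index of the next unclaimed file
    let results = Mutex::new(Vec::<String>::with_capacity(files.len()));

    // Scoped threads may borrow `files`, `next` and `results` directly.
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Atomically claim a file index; stop once the list is exhausted.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(path) = files.get(i) else { break };
                let output = convert_file(path);
                results.lock().unwrap().push(output);
            });
        }
    });

    // Threads finish in arbitrary order, so sort for a deterministic result.
    let mut outputs = results.into_inner().unwrap();
    outputs.sort();
    outputs
}

fn main() {
    let files = ["a.csv", "b.csv", "c.csv", "d.csv"];
    println!("{:?}", convert_all(&files, 4)); // ["a.parquet", "b.parquet", "c.parquet", "d.parquet"]
}
```

Handing out indices through fetch_add keeps every worker busy until the list is exhausted. It avoids pre-assigning a fixed share of files to each thread, which could leave some threads idle when file sizes differ.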

Options

$ cc2p --help

Convert a CSV to parquet file format

Usage: cc2p [OPTIONS] [PATH]

Arguments:
  [PATH]  Represents the folder path for CSV search. [default: *.csv]

Options:
  -d, --delimiter <DELIMITER>  Represents the delimiter used in CSV files. [default: ,]
  -n, --no-header              Indicates whether to include the header in the CSV search column.
  -w, --worker <WORKER>        Number of worker threads to use for performing the task. [default: 1]
  -s, --sampling <SAMPLING>    Number of rows to sample for inferring the schema. [default: 2048]
  -i, --interactive            Show an interactive UI.
  -h, --help                   Print help
  -V, --version                Print version
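The --sampling option controls how many rows are read to infer the schema. The following sketch shows the general technique: each column gets the narrowest type that fits every non-empty sampled value. The type set (Boolean, Int64, Float64, Utf8), the widening rules, and the Utf8 fallback are assumptions for illustration, not CC2P's documented behaviour.

```rust
// Illustrative sketch of sample-based schema inference, in the spirit of
// --sampling. The type set, widening rules and Utf8 fallback are assumptions,
// not CC2P's documented behaviour.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Boolean,
    Int64,
    Float64,
    Utf8,
}

/// Classify a single cell; empty cells are treated as nulls and return None.
fn value_type(value: &str) -> Option<ColType> {
    let v = value.trim();
    if v.is_empty() {
        None
    } else if v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("false") {
        Some(ColType::Boolean)
    } else if v.parse::<i64>().is_ok() {
        Some(ColType::Int64)
    } else if v.parse::<f64>().is_ok() {
        Some(ColType::Float64)
    } else {
        Some(ColType::Utf8)
    }
}

/// Narrowest type that can represent values of both input types.
fn widen(a: ColType, b: ColType) -> ColType {
    use ColType::*;
    match (a, b) {
        (x, y) if x == y => x,
        (Int64, Float64) | (Float64, Int64) => Float64,
        _ => Utf8, // any other mix (e.g. Boolean with Int64) falls back to text
    }
}

/// Infer one type per column from at most `sample_size` rows.
fn infer_schema(rows: &[Vec<String>], sample_size: usize) -> Vec<ColType> {
    let width = rows.first().map_or(0, |r| r.len());
    let mut schema: Vec<Option<ColType>> = vec![None; width];
    for row in rows.iter().take(sample_size) {
        for (slot, value) in schema.iter_mut().zip(row) {
            if let Some(t) = value_type(value) {
                *slot = Some(match *slot {
                    Some(prev) => widen(prev, t),
                    None => t,
                });
            }
        }
    }
    // Columns with no non-empty sampled values default to Utf8.
    schema.into_iter().map(|t| t.unwrap_or(ColType::Utf8)).collect()
}

fn main() {
    let rows: Vec<Vec<String>> = vec![
        vec!["1".into(), "2.5".into(), "true".into(), "x".into()],
        vec!["2".into(), "3".into(), "false".into(), "".into()],
    ];
    println!("{:?}", infer_schema(&rows, 2048)); // [Int64, Float64, Boolean, Utf8]
}
```

Rows beyond the sample are never inspected during inference. A value that does not fit the inferred type is therefore not detected at this stage, for example text in a column whose first 2048 rows are all integers. A larger --sampling value trades some speed for a more reliable schema.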

Features

Technical Features

Performance Benefits

Interactive Mode

CC2P includes an interactive Terminal User Interface (TUI) that allows you to browse CSV files in your directory, view their inferred schemas, and selectively export specific columns.

To start the interactive mode:

cc2p -i

Controls

Key      Action
↑/↓      Navigate through files or columns
Tab      Switch between the File List and Column List panels
Space    Select or unselect the highlighted column
Enter    Export the selected columns of the current file to Parquet
Q        Quit the application
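The controls above amount to a small piece of selection state. A cursor moves up and down through the list, Space toggles a set of selected column indices, and Enter applies a projection to the rows. The sketch below models that state. ColumnPanel and its methods are hypothetical names, and CC2P's actual TUI code (rendering, key handling, Parquet writing) is not shown.

```rust
use std::collections::BTreeSet;

// Hypothetical model of the column panel's state; not CC2P's TUI code.
struct ColumnPanel {
    columns: Vec<String>,
    cursor: usize,
    // BTreeSet keeps indices sorted, so exports preserve the file's column order.
    selected: BTreeSet<usize>,
}

impl ColumnPanel {
    fn new(columns: Vec<String>) -> Self {
        Self { columns, cursor: 0, selected: BTreeSet::new() }
    }

    /// Move the highlight down, stopping at the last column.
    fn move_down(&mut self) {
        if self.cursor + 1 < self.columns.len() {
            self.cursor += 1;
        }
    }

    /// Move the highlight up, stopping at the first column.
    fn move_up(&mut self) {
        self.cursor = self.cursor.saturating_sub(1);
    }

    /// Space: select the highlighted column, or unselect it if already selected.
    fn toggle(&mut self) {
        if self.columns.is_empty() {
            return;
        }
        if !self.selected.remove(&self.cursor) {
            self.selected.insert(self.cursor);
        }
    }

    /// Enter: keep only the selected columns (missing cells become empty strings).
    fn project(&self, rows: &[Vec<String>]) -> (Vec<String>, Vec<Vec<String>>) {
        let header: Vec<String> = self.selected.iter().map(|&i| self.columns[i].clone()).collect();
        let data: Vec<Vec<String>> = rows
            .iter()
            .map(|row| {
                self.selected
                    .iter()
                    .map(|&i| row.get(i).cloned().unwrap_or_default())
                    .collect::<Vec<String>>()
            })
            .collect();
        (header, data)
    }
}

fn main() {
    let mut panel = ColumnPanel::new(vec!["id".into(), "name".into(), "score".into()]);
    panel.toggle(); // select "id"
    panel.move_down();
    panel.move_down();
    panel.toggle(); // select "score"
    let rows = vec![vec!["1".to_string(), "Ada".to_string(), "90".to_string()]];
    let (header, data) = panel.project(&rows);
    println!("{:?} {:?}", header, data); // ["id", "score"] [["1", "90"]]
}
```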

Platform-Specific Notes

macOS Users

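Note for macOS users: Apple code signing and notarization are not yet complete, so you need to run the following command once before the application will start. After downloading the binary, run this command in the directory that contains it. It clears the file's extended attributes, including the quarantine flag that macOS sets on downloads: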

xattr -c cc2p

Linux Users

On Linux, you can also install CC2P via Snap:

sudo snap install cc2p

Technical Requirements

Contributing

If you wish to contribute, please feel free to fork the repository, make your changes, and submit a pull request. All contributions are welcome!

Development Setup

  1. Clone the repository
  2. Install Rust (1.88.0 or later)
  3. Run cargo build to build the project
  4. Run cargo test to run the tests

License

This project is licensed under the MIT License; see the LICENSE file for details.

Contact