# C Learning Project: Key-Value Store

An educational project to learn C development from first principles, working toward building an RDBMS.

## Project Goals

- Learn C development fundamentals: memory management, pointers, build systems, modularity
- Build a simple key-value store CLI application
- Incrementally work toward understanding relational database concepts

## Current Status

**Phase 1: Design & Foundation**

- ✅ Build system (Makefile with clang)
- ✅ Core data structures designed (kvstore, entries)
- ✅ CLI module foundation started
- ✅ String utility library interface designed
- ⏳ Implementation in progress

## Project Structure

```
kvstore/
├── Makefile              # Build configuration
├── README.md             # This file
├── ARCHITECTURE.md       # Design decisions and module overview
├── include/
│   ├── kv_store.h        # Key-value store interface
│   ├── cli.h             # CLI module interface
│   └── string.h          # String utility functions
├── src/
│   ├── main.c            # Entry point
│   ├── cli.c             # CLI implementation (in progress)
│   ├── kv_store.c        # Key-value store implementation (not started)
│   └── string.c          # String utilities (not started)
├── build/                # Object files (generated)
└── bin/                  # Binary output (generated)
```

## Building

```bash
make          # Compile the project
make run      # Build and run
make clean    # Remove build artifacts
```

## Modules

### KV Store (`include/kv_store.h`)

Core key-value store with:

- **Entry management**: Create, copy, and free individual key-value pairs
- **Store lifecycle**: Initialize and free stores with configurable capacity
- **CRUD operations**: Get, set, delete, and list entries
- **Error handling**: Consistent return codes (1=success, 0=not found, -1=error)

### CLI (`include/cli.h`)

Command-line interface for batch operations:

- Parse command-line arguments
- Execute kvstore commands
- Display help and results

### String Utilities (`include/string.h`)

Helper functions for string operations:

- Copy, compare, trim, and search strings
- Safe memory management for dynamic strings

## Design Principles

- **Ownership is explicit**: Every allocated pointer has exactly one owner, and that owner must free it
- **Separation of concerns**: kvstore provides data/status; CLI formats and displays
- **Batch mode**: Single execution per program run; persistence through files (future)
- **Error handling**: Consistent, simple return codes rather than exceptions
- **Learning-focused**: Prioritize clarity and understanding over optimization

## Learning Focus Areas

1. **Memory management**: malloc, free, ownership, pointers
2. **C idioms**: Out parameters, return codes, struct lifecycle
3. **Modularity**: Clear interfaces, separation of concerns
4. **Build systems**: Makefiles, compilation, linking
5. **String handling**: C strings, pointer semantics

See `ARCHITECTURE.md` for detailed design decisions and implementation notes.

Now for the architecture file:

# Architecture & Design

## Data Structures

### kv_store_entry_t

```c
typedef struct {
    char *key;
    char *value;
} kv_store_entry_t;
```

A single key-value pair. Both key and value are dynamically allocated strings (C's `char *`). This structure is deliberately simple and will evolve as we add persistence and type support.

### kv_store_t

```c
typedef struct {
    kv_store_entry_t *entries;  // Dynamic array of entries
    int length;                 // Current number of entries
    int capacity;               // Allocated capacity
} kv_store_t;
```

The main store container. Uses a dynamic array (vector-like) for storage. Tracks both the number of used entries and the allocated capacity.

**Design note:** Uses simple array storage for learning purposes.
Later evolution might include:

- Hash tables for O(1) lookup
- B-trees for sorted iteration and range queries
- Disk persistence

## Modules

### kv_store (Core Data Structure)

**Status:** Interface designed, implementation pending

**Key functions:**

- `kv_store_entry_init()`: Allocate and initialize an entry
- `kv_store_entry_copy()`: Create a deep copy of an entry
- `kv_store_entry_free()`: Free an entry's memory
- `kv_store_init()`: Create an empty store with initial capacity
- `kv_store_free()`: Free a store and all its entries
- `kv_store_get_entry()`: Retrieve an entry (returns allocated copy)
- `kv_store_set_entry()`: Add or update an entry
- `kv_store_delete_entry()`: Remove an entry

**Design decisions:**

1. **Copying on get**: `get_entry()` returns an allocated copy of the entry. This ensures the caller cannot modify the store's internal state and protects against use-after-free if the store changes.
2. **Copying on set**: When storing an entry, we deep-copy the key/value strings. This prevents external modifications and clarifies ownership.
3. **Error codes**:
   - `1` = success/found
   - `0` = not found/created new
   - `-1` = error
4. **Pointer parameters**: Store operations that modify take `kv_store_t *` (not const). Read-only operations take `const kv_store_t *`.
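
The design decisions above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the project's implementation: the function names come from the key-functions list, but the exact signatures, the `int` return of `kv_store_set_entry()`, the helpers `dup_string()` and `find_index()`, and the capacity-doubling policy are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Types as given in the Data Structures section. */
typedef struct { char *key; char *value; } kv_store_entry_t;
typedef struct { kv_store_entry_t *entries; int length; int capacity; } kv_store_t;

/* Hypothetical helper: heap-allocated copy of s, or NULL on allocation failure. */
static char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Hypothetical helper: linear search. Read-only, so the store is const. */
static int find_index(const kv_store_t *store, const char *key) {
    for (int i = 0; i < store->length; i++) {
        if (strcmp(store->entries[i].key, key) == 0) return i;
    }
    return -1;
}

/* Assumed signature. Returns 1 = updated existing key, 0 = created new, -1 = error. */
int kv_store_set_entry(kv_store_t *store, const kv_store_entry_t *entry) {
    if (store == NULL || entry == NULL || entry->key == NULL || entry->value == NULL) return -1;
    assert(store->length <= store->capacity);   /* store invariant */

    char *value = dup_string(entry->value);     /* copy on set */
    if (value == NULL) return -1;

    int i = find_index(store, entry->key);
    if (i >= 0) {                               /* key exists: replace its value */
        free(store->entries[i].value);
        store->entries[i].value = value;
        return 1;
    }

    char *key = dup_string(entry->key);
    if (key == NULL) { free(value); return -1; }

    if (store->length == store->capacity) {     /* grow the dynamic array */
        int new_cap = store->capacity > 0 ? store->capacity * 2 : 4;
        kv_store_entry_t *grown = realloc(store->entries, (size_t)new_cap * sizeof *grown);
        if (grown == NULL) { free(key); free(value); return -1; }
        store->entries = grown;
        store->capacity = new_cap;
    }
    store->entries[store->length].key = key;
    store->entries[store->length].value = value;
    store->length++;
    return 0;
}

/* Assumed signature. Returns an allocated copy the caller must free, or NULL. */
kv_store_entry_t *kv_store_get_entry(const kv_store_t *store, const char *key) {
    if (store == NULL || key == NULL) return NULL;
    int i = find_index(store, key);
    if (i < 0) return NULL;

    kv_store_entry_t *copy = malloc(sizeof *copy);   /* copy on get */
    if (copy == NULL) return NULL;
    copy->key = dup_string(store->entries[i].key);
    copy->value = dup_string(store->entries[i].value);
    if (copy->key == NULL || copy->value == NULL) {
        free(copy->key);
        free(copy->value);
        free(copy);
        return NULL;
    }
    return copy;
}

/* NULL-safe free for an entry returned by kv_store_get_entry(). */
void kv_store_entry_free(kv_store_entry_t *entry) {
    if (entry == NULL) return;
    free(entry->key);
    free(entry->value);
    free(entry);
}
```

Because both directions copy, a caller can overwrite or free its own buffers right after `kv_store_set_entry()` without affecting the store, and an entry returned by `kv_store_get_entry()` stays valid even if the store is later modified. The cost is an extra allocation per call, which fits the stated preference for clarity over optimization.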
### CLI (Command-line Interface)

**Status:** Interface designed, implementation pending

**Key functions:**

- `cli_print_help()`: Display usage information and commands
- `cli_execute()`: Parse and execute a command
- `cli_print_result()`: Format and display results

**Design:**

- **Batch mode**: Single execution per program invocation
- **Help handling**: Main checks for `--help` or `-h` before passing to `cli_execute()`
- **GNU-style**: Follow standard CLI conventions for help text and error messages

**Supported commands (planned):**

- `set <key> <value>`: Store a value
- `get <key>`: Retrieve a value
- `delete <key>`: Remove an entry
- `list`: Show all entries

### String Utilities

**Status:** Interface designed, implementation pending

Simple helpers for string operations with safe memory management:

- `string_copy()`: Allocate and copy a string
- `string_compare()`: Compare two strings
- `string_trim()`: Copy with whitespace trimming
- `string_search()`: Find substring
- `string_free()`: Safe free (NULL-safe)

## Implementation Notes

### Memory Management Pattern

The project uses this consistent pattern:

1. **Allocation functions** return pointers and document that the caller owns the memory
2. **Free functions** take pointers and handle NULL safely
3. **Read operations** return allocated copies, not internal references
4. **Modification operations** deep-copy input data to maintain ownership

Example:

```c
// Caller allocates and owns
kv_store_entry_t *entry = kv_store_entry_init("key", "value");

// Store makes its own copy when storing
kv_store_set_entry(store, entry);

// Caller must free their copy
kv_store_entry_free(entry);

// When reading, get a new copy to work with
kv_store_entry_t *retrieved = kv_store_get_entry(store, "key");
// ... use retrieved ...
kv_store_entry_free(retrieved);
```

### Error Handling

C doesn't have exceptions.
We use:

- **Return codes** for operational success/failure
- **NULL pointers** to indicate allocation failures
- **Documentation** to clarify what each code means

No exceptions or verbose error messages at the library level; those are CLI concerns.

## Next Implementation Steps

### Phase 1: Core Store (High Priority)

1. Implement `string.c` - string utilities
2. Implement `kv_store.c` - core store operations
3. Write basic tests (using Unity framework)
4. Test with simple program

### Phase 2: CLI (Medium Priority)

1. Implement `cli.c` - help display
2. Implement `cli_execute()` - command parsing and routing
3. Wire commands to store operations
4. Test each command

### Phase 3: Persistence (Future)

1. Add file I/O to load/save stores
2. Consider simple serialization format
3. Handle startup with existing data

### Phase 4: Advanced Features (Future)

1. Internal data structure improvements (hash table, B-tree)
2. Type support (int, float, blob)
3. Transactions or multiple stores
4. Performance optimization

## Testing Strategy

**Future:** Use Unity testing framework

- Unit tests for each module
- Integration tests for CLI commands
- Edge cases: empty store, duplicate keys, NULL inputs

## Lessons Learned & Teaching Points

As you implement, pay attention to:

1. **Pointers and ownership**: Who allocates, who frees?
2. **const correctness**: What can and cannot be modified?
3. **Error propagation**: How do errors bubble up from library to CLI?
4. **Interface design**: How do you make it easy to use correctly and hard to use incorrectly?
5. **Memory safety**: Are there ways this could leak or crash?

This is learning code: clarity and correctness matter more than optimization.
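
One way to picture the error-propagation question is a small CLI-side helper that turns a library return code into a user-facing message and a process exit status. The following is a rough sketch, assuming the return codes described under Error Handling. The function `cli_report_get()`, the `KV_*` constant names, and the message wording are hypothetical and do not appear in the design above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for the documented codes: 1 = found, 0 = not found, -1 = error. */
enum { KV_ERROR = -1, KV_NOT_FOUND = 0, KV_FOUND = 1 };

/*
 * Hypothetical CLI helper for a `get` command. The library only reports a
 * code; the CLI decides what to print and which exit status to return.
 */
int cli_report_get(int code, const char *key, const char *value, FILE *out, FILE *err) {
    assert(out != NULL && err != NULL);

    /* A "found" result without a value is treated as an internal error. */
    if (code == KV_FOUND && value == NULL) code = KV_ERROR;

    switch (code) {
    case KV_FOUND:
        fprintf(out, "%s\n", value);
        return EXIT_SUCCESS;
    case KV_NOT_FOUND:
        fprintf(err, "kvstore: key not found: %s\n", key != NULL ? key : "(null)");
        return EXIT_FAILURE;
    default:
        fprintf(err, "kvstore: internal error\n");
        return EXIT_FAILURE;
    }
}
```

Under this arrangement `main` could simply return the helper's result, so the library stays free of printing and the mapping from codes to messages lives in one place.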