This project implements a string compression algorithm in multiple languages (Python, Java, C++, Prolog, and SML) while connecting the solution to theory in formal languages such as context-free grammars (CFGs), regular expressions, and nondeterministic finite state machines (NDFSMs).
- String traversal & manipulation
- Memory and performance optimization
- Pattern recognition with automation (FSMs, CFGs)
- Cross-paradigm thinking (imperative, functional, logical programming)
- Recursive vs iterative logic construction
This CFG describes the structure of a valid compressed string (e.g., a3b2c1
):
S → C S | C
C → L D
L → a | b | c | ... | z | A | B | ... | Z
D → D D | N
N → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S
: A full compressed stringC
: A single letter-count unitL
: A valid letterD
: One or more digitsN
: Single digits used to build counts
I defined a simple NDFSM to recognize strings of the form [a-zA-Z][0-9]+
repeated one or more times.
States:
q0
: Startq1
: Read a letter[a-zA-Z]
q2
: Read one or more digits[0-9]
Transitions:
q0 --[a-zA-Z]--> q1
q1 --[0-9]--> q2
q2 --[0-9]--> q2
q2 --[a-zA-Z]--> q1 (repeats for next char group)
Accepting State: q2
Paradigm: Imperative, Object-Oriented, Low-Level Memory Access
Why this implementation: C++ is performance-focused and gives you full control over memory. The string compression function is implemented using a manual loop and string building with +=
, which mimics how memory is appended. It uses std::string
, but unlike higher-level languages, no garbage collection or built-in dynamic typing helps with string management—so care is taken with efficiency and data structures.
Best for: Performance-critical applications, embedded systems, systems programming.
Paradigm: Object-Oriented, Imperative, Platform-Independent
Why this implementation: Java emphasizes object-oriented structure, even for small utilities. We use a class-based static method and a mutable StringBuilder, which is idiomatic for string manipulation in Java to avoid performance issues with +
concatenation. Java’s verbosity encourages wrapping utilities in reusable classes, and type safety ensures method robustness.
Best for: Enterprise apps, Android development, cross-platform systems with strong type safety.
Paradigm: Declarative, Logic-Based
Why this implementation: Prolog doesn't loop in the traditional sense—it recursively declares logical rules that describe relationships. The implementation is centered around pattern matching and unification, building results by recursively traversing the input and appending (Char, Count)
pairs. It’s a great showcase of declarative programming: instead of "how to compress", we describe what compression is.
Best for: Knowledge representation, symbolic AI, rule-based logic systems.
Paradigm: Functional, Strongly Typed
Why this implementation:SML (Standard ML) embraces immutability and recursion. The implementation is a textbook example of tail-recursive accumulation, passing along an updated count and result list. SML’s use of pattern matching allows concise decomposition of the list (x::xs
) and conditional branching with no side effects. This aligns with how functional languages encourage expressing computation through value transformation, not mutation.
Best for: Theoretical computation, type theory, compilers, proof assistants.
Paradigm: Multi-paradigm (Imperative, Object-Oriented, Functional)
Why this implementation:Python prioritizes readability and brevity, so the implementation is clean and linear. It uses a for-loop, a simple counter, and a list
to collect intermediate results.
Python’s dynamic typing and flexible string manipulation make this approach intuitive and fast to write. It’s also easy to adapt to a recursive, functional, or even generator-based style, depending on needs.
Best for: Rapid prototyping, scripting, data science, educational examples.
Language | Paradigm(s) | Highlights in This Problem |
---|---|---|
C++ | Imperative, OOP | Manual string building, efficient memory control |
Java | OOP, Imperative | Class-based utility, StringBuilder usage |
Prolog | Declarative, Logic | Pattern-based rule recursion |
SML | Functional | Immutability, tail recursion, type safety |
Python | Multi-paradigm | Concise, readable, flexible string ops |