---
title: Pseudo2Code
emoji: 🚀
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---
# 🚀 Pseudo2Code — Transformer-based Pseudocode to C++ Converter
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3-blue.svg)](https://www.python.org/)
[![Hugging Face Spaces](https://img.shields.io/badge/HF%20Spaces-Pseudo2Code-yellow.svg)](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
[![GitHub](https://img.shields.io/badge/GitHub-Pseudo2Code-black.svg)](https://github.com/asadsandhu/Pseudo2Code)
> A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.
---

## 🖼️ Demo

Try it live on **Hugging Face Spaces**:

👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code

![Demo](assets/demo.png)

---
## 🧠 Model Architecture

- Built from scratch in PyTorch using the **Transformer** architecture
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output

```
Input:  Pseudocode lines (line-by-line)
Model:  Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
```
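
The full implementation lives in `train.py`; the following is only a minimal sketch of the idea, expressed with `torch.nn.Transformer`. The learned positional embeddings and the special-token handling are illustrative assumptions rather than the repo's exact choices; the sizes default to the hyperparameter table below.

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Sketch of a seq2seq Transformer (sizes taken from the table below)."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=4,
                 num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions (an assumption)
        self.core = nn.Transformer(d_model=d_model, nhead=nhead,
                                   num_encoder_layers=num_layers,
                                   num_decoder_layers=num_layers,
                                   dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def _embed(self, emb, x):
        pos = torch.arange(x.size(1), device=x.device)
        return emb(x) + self.pos_emb(pos)

    def forward(self, src, tgt):
        # Causal mask: each target position may only attend to earlier positions.
        t = tgt.size(1)
        mask = torch.triu(torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1)
        h = self.core(self._embed(self.src_emb, src),
                      self._embed(self.tgt_emb, tgt), tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab)

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=128):
    """Greedy decoding: start from <bos> and repeatedly append the argmax token."""
    model.eval()
    ys = torch.tensor([[bos_id]], device=src.device)
    for _ in range(max_len - 1):
        next_id = model(src, ys)[:, -1].argmax(-1, keepdim=True)  # most likely next token
        ys = torch.cat([ys, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return ys.squeeze(0).tolist()
```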
---

## 📊 Dataset

We used the **SPoC dataset** from Stanford:

- ✅ Clean pseudocode–C++ line pairs
- ✅ Token-level annotations for syntax handling
- ✅ Multiple test splits (generalization to unseen problems and unseen workers)
- ✅ Custom preprocessing and vocabulary building (sketched below)

> 📜 Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
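
The preprocessing and vocabulary construction are implemented in `train.py`; as a rough sketch, line pairs can be read from the TSVs and turned into token vocabularies as below. The column names `text` (pseudocode) and `code` (C++) and the special tokens are assumptions about the file format, not guaranteed to match the repo exactly.

```python
import csv
from collections import Counter

SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]

def build_vocab(tsv_path, column, min_freq=1):
    # Count whitespace-separated tokens in one TSV column.
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE because C++ code lines may contain raw quote characters.
        for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            counts.update(row[column].split())
    itos = SPECIALS + [t for t, c in counts.most_common() if c >= min_freq]
    return {tok: i for i, tok in enumerate(itos)}

def encode(line, stoi, max_len=128):
    # Map a raw line to <bos> ... <eos> token ids, truncated to max_len.
    ids = [stoi.get(tok, stoi["<unk>"]) for tok in line.split()]
    return [stoi["<bos>"]] + ids[: max_len - 2] + [stoi["<eos>"]]

# Assumed column names for the SPoC TSVs: "text" = pseudocode, "code" = C++.
src_vocab = build_vocab("spoc/train/spoc-train.tsv", "text")
tgt_vocab = build_vocab("spoc/train/spoc-train.tsv", "code")
```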
---

## 📁 Directory Structure

```
.
├── app.py                            # Gradio web app for inference
├── train.py                          # Transformer training code
├── model.pth                         # Trained model weights
├── spoc/                             # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png                      # App screenshot
└── README.md                         # You're here
```
---

## 🛠️ How to Run Locally

### ⚙️ 1. Clone the Repo & Install Requirements

```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or install the dependencies manually:

```bash
pip install torch gradio tqdm
```

### 🚀 2. Launch the App

Make sure `model.pth` is present (or train one with `train.py`):

```bash
python app.py
```

The app will open in your browser.
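
Under the hood, `app.py` wires the trained model into a Gradio interface. The sketch below shows roughly what that looks like, not the actual `app.py`; `model`, `encode`, `src_vocab`, and `greedy_decode` come from the sketches above, while `BOS_ID`, `EOS_ID`, and `detok` are assumed helpers.

```python
import torch
import gradio as gr

def translate(pseudocode: str) -> str:
    # The model was trained on line pairs, so each line is translated independently.
    cpp_lines = []
    for line in pseudocode.splitlines():
        if line.strip():
            src = torch.tensor([encode(line, src_vocab)])    # (1, src_len)
            ids = greedy_decode(model, src, BOS_ID, EOS_ID)  # token ids for one C++ line
            cpp_lines.append(detok(ids))                     # assumed id->string helper
    return "\n".join(cpp_lines)

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Textbox(lines=10, label="Generated C++"),
    title="Pseudo2Code",
)
demo.launch()
```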
---

## 🧪 Training the Model

You can retrain the model using the `train.py` script:

```bash
python train.py
```

By default, it downloads the data from the public repo, trains for 10 epochs, and writes a `model.pth` file containing the learned weights and vocabularies.
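
For reference, the core of such a run is a standard teacher-forcing loop over (pseudocode, C++) line pairs. This is a sketch of the shape of that loop, not the exact `train.py` code; `train_loader` and `PAD_ID` are assumed to come from the preprocessing step above.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Seq2SeqTransformer(len(src_vocab), len(tgt_vocab)).to(device)  # sketch class from above
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # PAD_ID: assumed pad-token id
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):                                # 10 epochs, per the table below
    for src, tgt in train_loader:                      # assumed DataLoader of padded id tensors
        src, tgt = src.to(device), tgt.to(device)
        # Teacher forcing: feed tgt[:, :-1] as decoder input, predict tgt[:, 1:].
        logits = model(src, tgt[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Save weights together with the vocabularies so app.py can rebuild everything.
torch.save({"model": model.state_dict(),
            "src_vocab": src_vocab, "tgt_vocab": tgt_vocab}, "model.pth")
```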
---

## 🔧 Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
---

## 🧩 Example Input

```text
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
```

### ⏩ Output C++

```cpp
int main() {
    int n , nn , ans = 0 ;
    cin > > n ;
    for ( int i = 2 ; i < = n - 1 ; i + + ) {
        nn = n ;
        while ( nn = = 0 ) ans + = nn % i , nn / = i ;
    }
    o = gcd ( ans , n - 2 ) ;
    cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
    return 0;
}
```

The output is shown verbatim: the spaced-out operators (e.g. `> >` for `>>`) and the occasional logic slip are artifacts of the model's raw token-level generation.
---

## 📦 Deployment

This app is deployed live on:

* **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
* **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code)
---

## 🙏 Acknowledgements

* 📚 **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., et al. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* 🧠 Transformer paper: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)
---

## 🧑‍💻 Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)

---

## 📄 License

This project is licensed under the MIT License.
Feel free to use, modify, and share with credit.