This project contains two Jupyter notebooks focused on natural language processing (NLP) tasks: text classification and text generation. The notebooks explore different techniques and models for handling these tasks, offering insights into common challenges and solutions.
- NLP Classification.ipynb: Demonstrates text classification using a linear classifier.
- NLP QnA.ipynb: Focuses on text generation using a recurrent neural network (RNN).
- Complexity: Handling various text preprocessing steps, feature extraction, and model training.
- Solution: Utilizes
CountVectorizer
andTfidfTransformer
for feature extraction andSGDClassifier
for classification. - Challenge: Achieving high accuracy on a small dataset.
- Outcome: Model accuracy is 25%, indicating room for improvement with more data and tuning.
- Complexity: Preprocessing text data, creating a vocabulary, converting texts to sequences, and training an RNN model.
- Solution: Implements text preprocessing using regular expressions and word indexing, followed by sequence generation using a trained model.
- Challenge: Generating coherent and contextually relevant text.
- Outcome: The model generates text based on given prompts, demonstrating the fundamental process of text generation.
- Clone the repository:
git clone https://github.com/faisal-fida/NLP-Classification-QnA-for-Beginners.git
- Navigate to the project directory and open the Jupyter notebooks:
cd NLP-Classification-QnA-for-Beginners jupyter notebook
- Run the cells in each notebook to execute the code and observe the results.
This project serves as an introductory exploration of NLP techniques, addressing key complexities and challenges in text classification and generation. Future work could involve expanding the dataset, experimenting with different models, and fine-tuning hyperparameters to improve performance.