
As an eager trainee entering the world of remote work with an overseas company, my first week was a mix of excitement, learning, and hands-on experience. With the guidance of our OJT adviser, my teammates and I were tasked with a fascinating project: creating a system to extract text from PDFs using Tesseract OCR, upload the extracted text to a MySQL database, and build a chatbot using Streamlit and an open-source LLM. Here’s a detailed look at my journey through the first week:

Day 1

The first day was all about getting the essentials in place. I researched the tools in the MySQL ecosystem and installed MySQL Server, the MySQL client, and MySQL Workbench. I also installed the mysql-connector-python module and spent time familiarizing myself with its interface. This setup gave me a working environment for the tasks ahead.

Day 2

With the groundwork laid, I moved on to creating a mock database to store the extracted text. This involved writing a notebook for creating and manipulating the MySQL database. Setting up tables, defining schemas, and inserting sample data by hand was both challenging and rewarding. This day solidified my understanding of database management and prepared me for more complex tasks.
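To give an idea of what such a notebook cell can look like, here is a minimal sketch rather than our actual code. The table name `documents`, its columns, and the connection settings are all hypothetical; only `mysql.connector.connect` and the standard cursor methods come from mysql-connector-python.

```python
# Hypothetical schema: one row per page of extracted text.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    page_number INT NOT NULL,
    content TEXT NOT NULL
)
"""

INSERT_SQL = (
    "INSERT INTO documents (filename, page_number, content) "
    "VALUES (%s, %s, %s)"
)


def connect():
    """Open a connection; every setting below is a placeholder."""
    import mysql.connector  # pip install mysql-connector-python

    return mysql.connector.connect(
        host="localhost",
        user="trainee",
        password="change-me",
        database="ocr_demo",
    )


def create_schema(conn):
    """Create the mock table if it does not exist yet."""
    cur = conn.cursor()
    cur.execute(CREATE_TABLE_SQL)
    conn.commit()
    cur.close()


def insert_pages(conn, filename, pages):
    """Insert one row per page and return the number of rows written."""
    rows = [(filename, n, text) for n, text in enumerate(pages, start=1)]
    cur = conn.cursor()
    # Parameterized query: the driver escapes the values for us.
    cur.executemany(INSERT_SQL, rows)
    conn.commit()
    cur.close()
    return len(rows)
```

With a running MySQL server, calling `conn = connect()`, then `create_schema(conn)` and `insert_pages(conn, "sample.pdf", ["page one", "page two"])` would create the table and add two sample rows.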

Day 3

No work was scheduled on Day 3 due to the Independence Day celebration. It was a welcome break, allowing me to reflect on the progress made so far and prepare for the tasks ahead.

Day 4

Returning to work, I focused on revamping the database schema and making sure it interfaced cleanly with the other parts of the system. Storing the extracted text in a new database was a critical step. I also wrote a procedure to convert the text in the database into a dictionary format, which would be used by the chatbot’s LLM. This day involved a lot of problem-solving to get all the pieces working together.
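The conversion step can be sketched roughly as follows. This is an illustration under assumptions, not the exact procedure from our project: it supposes a hypothetical `documents` table with `filename`, `page_number`, and `content` columns, and it builds a dictionary that maps each filename to its full text with pages joined in order.

```python
from collections import defaultdict

# Hypothetical query against the assumed `documents` table.
SELECT_SQL = (
    "SELECT filename, page_number, content "
    "FROM documents ORDER BY filename, page_number"
)


def rows_to_dict(rows):
    """Group (filename, page_number, content) rows into {filename: text}."""
    pages = defaultdict(list)
    for filename, page_number, content in rows:
        pages[filename].append((page_number, content))
    # Sort by page number so the text reads in order even if rows arrive shuffled.
    return {
        name: "\n".join(text for _, text in sorted(items, key=lambda p: p[0]))
        for name, items in pages.items()
    }


def fetch_documents(conn):
    """Read every row through a DB-API connection and convert it to a dict."""
    cur = conn.cursor()
    cur.execute(SELECT_SQL)
    rows = cur.fetchall()
    cur.close()
    return rows_to_dict(rows)
```

Keeping `rows_to_dict` separate from the query makes it easy to try out on a plain list of tuples before pointing it at a real database.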

Day 5

The final day of the week was all about integration and automation. I integrated the MySQL database with the Tesseract OCR extraction so that the extracted text could be uploaded to the database directly. I also made further improvements to the database manipulation notebook, enhancing its functionality and ease of use. Creating a pipeline to automate the uploading of extracted text to the database was a significant milestone, marking the completion of the initial setup phase.
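A pipeline like this could look something like the sketch below. It is only an outline under assumptions: it uses the pytesseract and pdf2image packages (which in turn need the Tesseract and Poppler binaries installed), and the `documents` table, the folder layout, and the function names are hypothetical rather than taken from our project.

```python
from pathlib import Path

# Hypothetical table and columns.
INSERT_SQL = (
    "INSERT INTO documents (filename, page_number, content) "
    "VALUES (%s, %s, %s)"
)


def ocr_pdf(pdf_path):
    """Render each PDF page to an image and run Tesseract on it."""
    from pdf2image import convert_from_path  # pip install pdf2image
    import pytesseract  # pip install pytesseract

    images = convert_from_path(str(pdf_path), dpi=300)
    return [pytesseract.image_to_string(image) for image in images]


def upload_folder(conn, folder, ocr=ocr_pdf):
    """OCR every PDF in `folder` and insert one row per page.

    `conn` is any DB-API connection, for example one returned by
    mysql.connector.connect(...). Returns {filename: pages_uploaded}.
    """
    uploaded = {}
    cur = conn.cursor()
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        pages = ocr(pdf_path)
        rows = [
            (pdf_path.name, n, text.strip())
            for n, text in enumerate(pages, start=1)
        ]
        if rows:  # skip PDFs that produced no pages
            cur.executemany(INSERT_SQL, rows)
        uploaded[pdf_path.name] = len(rows)
    conn.commit()  # commit once, after all files are processed
    cur.close()
    return uploaded
```

Passing the OCR step in as the `ocr` argument is a design choice that lets the upload logic be tested with a stand-in function, without needing Tesseract installed.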