Blog Post

Day 1

The week began with helping to integrate the Postgres database with both the OCR system and the LLM. I built a sentence embedding component on the all-MiniLM-L6-v2 model, with the vector size allocated dynamically rather than hard-coded. I also experimented with chunking text by sentence instead of by paragraph, and generated a dump file of a sample database with columns for the chunk embeddings. Finally, I wrote a demo file that queries the sample database through pgvector.
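To make the Day 1 pieces more concrete, here is a minimal sketch of sentence-level chunking and a pgvector similarity query. It is an illustration under assumptions, not the project's actual code: the table name `chunks`, its columns, and all function names are hypothetical, the sentence splitter is a deliberately naive regex, and the model identifier is the published `sentence-transformers/all-MiniLM-L6-v2`. The embedding helper imports `sentence-transformers` lazily because loading the model triggers a download.

```python
import re

# Published model identifier on the Hugging Face hub (384-dimensional output).
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"


def split_into_sentences(text):
    """Naive sentence chunker: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def to_pgvector_literal(vector):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def create_table_sql(dim, table="chunks"):
    """DDL for a hypothetical chunk table; the vector width follows the model."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "content text NOT NULL, "
        f"embedding vector({int(dim)}))"
    )


def nearest_chunks_sql(table="chunks"):
    """Parameterized query; <=> is pgvector's cosine-distance operator."""
    return (
        f"SELECT id, content FROM {table} "
        "ORDER BY embedding <=> %s::vector LIMIT %s"
    )


def embed_sentences(sentences):
    """Embed sentences and report the model's embedding dimension.

    Assumes the sentence-transformers package is installed; not called at
    import time because it downloads the model on first use.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    dim = model.get_sentence_embedding_dimension()
    return model.encode(sentences), dim
```

With a Postgres driver such as psycopg, the query could then be run along the lines of `cursor.execute(nearest_chunks_sql(), (to_pgvector_literal(query_vector), 5))`. Reading the dimension from the model and passing it to `create_table_sql` is one way to avoid hard-coding the vector size.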

Day 2

On the second day, I attempted to build sentence embedding components that run on the GPU for better performance. I also added clear log messages to improve traceability and extended the unit tests to cover the new components and functionality.
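A common way to approach GPU-backed embeddings is to pick the device at load time and fall back to the CPU when no GPU is available. The sketch below assumes PyTorch and `sentence-transformers` as the stack; the function names and the logger name are hypothetical, and this is not necessarily how the project's component is written.

```python
import logging

logger = logging.getLogger("embedding")  # hypothetical logger name


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        logger.warning("PyTorch is not installed; falling back to CPU")
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_embedder(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Load the embedding model on the selected device and log the choice.

    Assumes the sentence-transformers package is installed; the import is
    lazy so that device selection can be tested without it.
    """
    from sentence_transformers import SentenceTransformer

    device = pick_device()
    logger.info("Loading %s on device %s", model_name, device)
    return SentenceTransformer(model_name, device=device)
```

Keeping device selection in its own small function makes it easy to cover with a unit test on machines that have no GPU.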

Day 3

On the third day, I created a Jupyter notebook that demonstrates the pgvector integration with our database system. It can be used to walk others through what the integration does and how to use it.

Day 4

Documentation was the focus of the fourth day. I wrote comprehensive documentation for the application, covering the system architecture, the integration processes, and usage instructions. This gives future development and maintenance work a clear, accessible reference for every part of the system.

Day 5

The final day of the week was dedicated to preparing the post-OJT requirements: compiling weekly reports, finalizing documentation, and writing blog posts that summarize the week's activities and achievements.