About me

I am an Associate Software Engineer at Red Hat. My work focuses on providing insights for OpenShift customers. I am part of the processing team, where we maintain and manage data pipelines to ensure that collected data is processed correctly and that final recommendations are available.

Regarding my studies, in 2024 I obtained a law master’s degree, which allows me to work in the legal field in Czechia. However, I quickly realized that it wasn’t my true passion, so I also started an undergraduate program in Cybersecurity. Currently, I am proudly pursuing an MSc in Data Science at the University of Essex Online and am enjoying every moment of it. Balancing work and school can be challenging, but where would the fun in life be without a few struggles?

Outside of work and studies, I enjoy trail running (anything from 10 to 15 km is perfect), hiking, weightlifting, and reading scientific papers on nutrition (especially veganism, fiber, or cancer-causing compounds). I also love reading classical literature—F. Kafka and E. A. Poe are my favorite authors—but I sometimes enjoy modern fiction as well, especially horror or supernatural thrillers.

Professional Tech Stack

  • Languages & Scripting

    • Python
    • Go
    • YAML
    • Bash / Shell scripting
  • Cloud & Platforms

    • OpenShift
    • Kubernetes
    • AWS (S3)
    • Grafana
    • Prometheus
    • Kafka
  • Version Control & Collaboration

    • GitHub
    • GitLab
    • CI/CD pipelines
  • Containers & Virtualization

    • Docker
    • Podman
  • Other Tools

    • Data pipelines
    • Monitoring & observability (Grafana, Prometheus)

Resume

Education

  1. University of Essex Online

    2025 - 2027

    A postgraduate (MSc) course in Data Science.

  2. Masaryk University (Faculty of Informatics)

    2022 - 2025

    An undergraduate course in Cybersecurity.

  3. Masaryk University (Faculty of Law)

    2019 - 2024

    A five-year course in Czech Law, completed with a Master's degree.

Experience

  1. Red Hat

    November 2025 — Present

    Associate software engineer working on Insights for OpenShift.

    February 2025 — October 2025

    Software engineering intern working on Insights for OpenShift.

  2. Pierstone

    November 2022 — April 2023

    Paralegal working primarily on M&A, cybersecurity and labour law.

School

Launch into Computing Final Project

This project had two parts:


Part A was focused on comparing programming languages and their usage. I chose to compare C and Python because I am familiar with both and know how extreme their differences are. I demonstrated these differences using a data processing application - something that is very easy to do in Python using Pandas and only took a couple of hours in total. In C, the same exercise took around three times longer because no libraries capable of doing the same job were available. Parsing the CSV file also took a lot of manual work: what was a single line in Python using Pandas turned out to be a whole file of code in C.


Link to a GitHub project: https://github.com/lenasolarova/accident-analysis
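
The Python side of that comparison can be sketched in a few lines. The data below is a made-up stand-in for the accident CSV used in the project, but the point stands: parsing and aggregating is a couple of calls in Pandas.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the accident CSV used in the project.
csv_text = """id,severity,vehicles
1,slight,2
2,serious,1
3,slight,3
"""

# In pandas, parsing the whole file is a single call...
df = pd.read_csv(io.StringIO(csv_text))

# ...and a simple aggregation is one more line.
counts = df["severity"].value_counts()
print(counts["slight"])  # 2
```

In C, each of these lines corresponds to manual file reading, tokenising, type conversion, and hand-rolled counting.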


Part B was delivering a 15-minute presentation on a topic regarding new emerging technologies, their practical usage, and legal frameworks. I chose AI in people management as it is very topical for most of us, yet underrepresented in many scientific papers. I focused on defining AI from both technical and legal perspectives, using plenty of examples, and finally demonstrating the issues on a case study of AI usage in hiring at Amazon.


Skills learned: Python programming, basic algorithm design, and effective problem-solving.


Discussion post

Unit one - discussion

Large-scale data collection for IoT (focused on health monitoring)

The initial post focused on IoT devices in healthcare, specifically in health monitoring, highlighting the opportunities such monitoring provides not only for individual health but also, in the bigger picture, for mass life-saving capabilities.


The following sections dived deeper into the challenges IoT devices inevitably face, such as the physical limitations of collecting vast amounts of data or processing the data effectively - working with noise, missing values, or unexpected data. In healthcare, all of these affect not only the daily work of data scientists but possibly also the health of patients.


This field of study is also highly topical from the personal and sensitive data perspective, as most of us allow our devices to collect a wide range of data even without any special hardware (e.g. our phones are capable of recording our location in real time -> speed or pace -> possibly assessing our health using algorithms).


Feedback on my discussion post

My post concerning IoT in health monitoring was received positively in all purely technical areas, yet people were rightfully critical of the lack of deeper examination of the regulatory framework concerning biometric data and personal data as a whole. This is crucial in any field with the capacity to collect, process, and store data as sensitive as biometric data, or even generic personal data. It is necessary to always keep in mind that GDPR and potentially national laws provide a quality legal framework for data collected, processed, and stored in and out of the EU.


Note: This was also a topic of the discussions in units 8 to 10 and is further dissected there.


My feedback on others' discussion posts

I provided feedback on multiple posts as well. The other students' posts ranged widely; one covered IoT in smart factories, a very intriguing space where IoT devices can be used to limit labour costs and at the same time protect workers by employing smart sensors. It is also a field where perfectly structured and clean data is practically non-existent, so the pressure on data cleaning and processing is high. I pointed out a similar deficit to the one raised on my own post: the lack of a regulatory framework dissection.


Other posts were more generic and did not provide any specific examples or areas where IoT would be used, which I pointed out as a flaw.


Web Scraping Activity

Deciphering Big Data Web Scraping Activity

Goal

The goal of this project was to teach us how to utilise the bs4 and requests libraries in Python to do very simple web scraping using a website's HTML.


General idea

The general idea of the exercise was to choose any website and use the appropriate libraries to scrape it and find the term "data scientist". I decided to scrape a job portal (Czechia's biggest one after Indeed - jobs.cz), specifically the search results for "data scientist".


Challenges and Implementation


I initially chose to scrape indeed.com but came across many issues with the website returning error 403 (Access denied) even when I used a token, since it recognised I was trying to scrape it and was not a normal user. Instead, I decided to focus on jobs.cz, a smaller, more local website, where access was no issue.


The implementation itself was very simple; the only issue was inspecting the job portal's HTML to find out which elements contained the appropriate data to filter out. This also brings out one of the main challenges this solution would face on a larger scale: consistency across websites. While jobs.cz may keep the data in h2 elements, this is not the case for other websites, confirming that data collection and filtering is usually one of the most time-consuming tasks of a data scientist.


The result is saved into a JSON file.
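
The approach can be sketched as follows. The inline HTML here is a stand-in for a search-results page; the real jobs.cz markup differs, so the h2/class convention below is only an assumption for illustration.

```python
import json

from bs4 import BeautifulSoup

# Inline HTML standing in for a job-portal results page. The class name
# "job-title" is hypothetical; the real site's markup must be inspected first.
html = """
<html><body>
  <h2 class="job-title">Data Scientist - Brno</h2>
  <h2 class="job-title">Junior Data Scientist</h2>
  <h2 class="other">Unrelated heading</h2>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every h2 element carrying the job-title class.
titles = [h2.get_text(strip=True)
          for h2 in soup.find_all("h2", class_="job-title")]

# Persist the result as JSON, as in the original exercise.
result = json.dumps({"jobs": titles}, indent=2)
print(result)
```

On the real website, the `html` string would come from `requests.get(url).text` instead of an inline literal.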




Link to a GitHub project: https://github.com/lenasolarova/web_scraping_jobs



Team Project - Executive Summary of a Database Design

Unit six - team project

The general idea

The general point of this team project was to choose a sector of our interest and prepare a database design for a company in it, considering all requirements and limitations and providing a quality design we would be able to actually build and test individually at a later stage.


The team

The team project started around the second week of the course by assembling our team of four people. Everyone turned out to be smart and kind, so the teamwork was fairly straightforward, with the minor hiccups any teamwork has.



The process

We decided to work on a database design for a large B2B retailer, as two people from the team, me included, had experience in the field. We scheduled weekly meetings. At our first meeting we simply talked about what we wanted to work on and a general idea of timelines, and decided to produce a draft before the second meeting. I drafted a document covering every area of the assignment, and the other team members primarily helped review and add ideas at this stage. At the second meeting we divided the work properly into four sections, and I got to work on an area I was not extremely familiar with but excited to learn: the technical database design, the diagrams, and the data types and attributes of each table. By the following week we all had a decent draft of our section, and all that remained was formatting, corrections, and making sure we fit into our word count, which was rather stressful, but we managed it really well in the end, finishing two days before the deadline.



Normalisation and Data build

Unit seven - normalisation

Database Normalisation



Original Table (Unnormalised)

| Student Number | Student Name | Exam Score | Support | Date of Birth | Course Name | Exam Boards | Teacher Name |
|---|---|---|---|---|---|---|---|
| 1001 | Bob Baker | 78 | No | 25/08/2001 | Computer Science, Maths, Physics | BCS, EdExcel, OCR | Mr Jones, Ms Parker, Mr Peters |
| 1002 | Sally Davies | 55 | Yes | 02/10/1999 | Maths, Biology, Music | AQA, WJEC, AQA | Ms Parker, Mrs Patel, Ms Daniels |
| 1003 | Mark Hanmill | 90 | No | 05/06/1995 | Computer Science, Maths, Physics | BCS, EdExcel, OCR | Mr Jones, Ms Parker, Mr Peters |
| 1004 | Anas Ali | 70 | No | 03/08/1980 | Maths, Physics, Biology | AQA, OCR, WJEC | Ms Parker, Mr Peters, Mrs Patel |
| 1005 | Cheuk Yin | 45 | Yes | 01/05/2002 | Computer Science, Maths, Music | BCS, EdExcel, AQA | Mr Jones, Ms Parker, Ms Daniels |

First Normal Form (1NF)

Actions taken:

  • Identified the primary key (Student Number)
  • Removed repeating groups
  • Ensured all values are atomic
  • Moved course-related data into a separate table
  • Created a junction table for the many-to-many relationship


STUDENT

| Student Number (PK) | Student Name | Exam Score | Support | Date of Birth |
|---|---|---|---|---|
| 1001 | Bob Baker | 78 | No | 25/08/2001 |
| 1002 | Sally Davies | 55 | Yes | 02/10/1999 |
| 1003 | Mark Hanmill | 90 | No | 05/06/1995 |
| 1004 | Anas Ali | 70 | No | 03/08/1980 |
| 1005 | Cheuk Yin | 45 | Yes | 01/05/2002 |

COURSE

| Course ID (PK) | Course Name | Exam Board | Teacher Name |
|---|---|---|---|
| 1 | Computer Science | BCS | Mr Jones |
| 2 | Maths | EdExcel | Ms Parker |
| 3 | Physics | OCR | Mr Peters |
| 4 | Maths | AQA | Ms Parker |
| 5 | Biology | WJEC | Mrs Patel |
| 6 | Music | AQA | Ms Daniels |

STUDENT_COURSE

| Student Number | Course ID |
|---|---|
| 1001 | 1 |
| 1001 | 2 |
| 1001 | 3 |
| 1002 | 4 |
| 1002 | 5 |
| 1002 | 6 |
| 1003 | 1 |
| 1003 | 2 |
| 1003 | 3 |
| 1004 | 4 |
| 1004 | 3 |
| 1004 | 5 |
| 1005 | 1 |
| 1005 | 2 |
| 1005 | 4 |
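
The removal of the repeating course group can be sketched in pandas. The snippet below uses a two-column slice of the unnormalised table purely for illustration:

```python
import pandas as pd

# Two columns from the unnormalised table; courses are a repeating group.
students = pd.DataFrame({
    "student_number": [1001, 1002],
    "course_name": ["Computer Science, Maths, Physics", "Maths, Biology, Music"],
})

# Split the comma-separated list and explode one course per row,
# producing the atomic values that 1NF requires.
atomic = (
    students.assign(course_name=students["course_name"].str.split(", "))
    .explode("course_name")
    .reset_index(drop=True)
)
print(atomic)
```

Each (student_number, course_name) row of `atomic` corresponds to one row of the STUDENT_COURSE junction table above.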

Second normal form (2NF)

No composite primary keys are used in the STUDENT or COURSE tables; therefore, there are no partial dependencies and the database already satisfies Second Normal Form.


Third normal form (3NF)

Actions taken:

  • Removed transitive dependencies
  • Replaced string identifiers with surrogate keys
  • Separated repeating teacher and exam board data into their own tables


TEACHER

| TeacherID (PK) | TeacherName |
|---|---|
| T1 | Mr Jones |
| T2 | Ms Parker |
| T3 | Mr Peters |
| T4 | Mrs Patel |
| T5 | Ms Daniels |

BOARD

| ExamBoardID (PK) | ExamBoardName |
|---|---|
| B1 | BCS |
| B2 | EdExcel |
| B3 | OCR |
| B4 | AQA |
| B5 | WJEC |

COURSE 3NF

| CourseID (PK) | CourseName | ExamBoardID (FK) | TeacherID (FK) |
|---|---|---|---|
| 1 | Computer Science | B1 | T1 |
| 2 | Maths | B2 | T2 |
| 3 | Physics | B3 | T3 |
| 4 | Maths | B4 | T2 |
| 5 | Biology | B5 | T4 |
| 6 | Music | B4 | T5 |

STUDENT 3NF

| Student Number (PK) | Student Name | Exam Score | Support | Date of Birth |
|---|---|---|---|---|
| 1001 | Bob Baker | 78 | No | 25/08/2001 |
| 1002 | Sally Davies | 55 | Yes | 02/10/1999 |
| 1003 | Mark Hanmill | 90 | No | 05/06/1995 |
| 1004 | Anas Ali | 70 | No | 03/08/1980 |
| 1005 | Cheuk Yin | 45 | Yes | 01/05/2002 |



Database Build

After the initial exercise of normalising the table into 3NF, it was time to start building and testing it in real life. I chose PostgreSQL as the database of choice, as I had never worked with it directly.


This phase consisted first of learning how to even start the PostgreSQL server, which turned out to be a few simple commands:

sudo systemctl start postgresql
sudo systemctl status postgresql

And then I needed to initialize the database for the first time:

psql -U postgres -d postgres
CREATE DATABASE university;

Then it was time to fill the database with data and write the actual tests for it.

The test suite is based on pytest.

pip install -r requirements.txt
pytest -v

Additional Remarks

The assignment requires testing of referential integrity, which means that every foreign key must refer to an existing primary key value in the linked table. Simply put, if a course references a teacher with teacher_id = T100, then a teacher with teacher_id = T100 must exist in the TEACHER table.

PostgreSQL automatically enforces referential integrity through foreign key constraints. For example, the following constraint ensures that each course references a valid teacher:

FOREIGN KEY (teacher_id) REFERENCES teacher(teacher_id)

Referential integrity is also explicitly tested in the automated test suite using pytest, specifically in the test_referential_integrity test case.
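
The idea behind such a test can be sketched in a self-contained way. The snippet below is an illustration, not the project's actual test code: sqlite3 stands in for PostgreSQL (which needs a running server), and the tables mirror a slice of the TEACHER/COURSE design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute(
    "CREATE TABLE teacher (teacher_id TEXT PRIMARY KEY, teacher_name TEXT)"
)
conn.execute("""
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        course_name TEXT,
        teacher_id TEXT,
        FOREIGN KEY (teacher_id) REFERENCES teacher(teacher_id)
    )
""")

conn.execute("INSERT INTO teacher VALUES ('T1', 'Mr Jones')")
conn.execute("INSERT INTO course VALUES (1, 'Computer Science', 'T1')")  # valid FK

# A course pointing at a non-existent teacher violates referential integrity,
# so the database must reject the insert.
try:
    conn.execute("INSERT INTO course VALUES (2, 'Maths', 'T100')")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True
```

In the real build, PostgreSQL rejects the same insert with a foreign-key constraint error, which is what the pytest case checks for.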


Link to a GitHub project: https://github.com/lenasolarova/accident-analysis


Discussion - GDPR

Unit eight - discussion

GDPR vs. Personal Data Processing Act (Czechia)

The original post focused on the principle of personal data security and compared how this principle is addressed in the General Data Protection Regulation (GDPR) and in the Czech Personal Data Processing Act (Zákon č. 110/2019 Sb). The introduction briefly outlined the challenges lawmakers face in protecting personal data in an increasingly digital and interconnected world where data frequently crosses national borders.


The following sections explored how GDPR defines the security of personal data through the requirement of appropriate technical and organisational measures. These include practices such as pseudonymisation, encryption, system redundancy, and regular testing, all aimed at preventing unauthorised access, data loss, or damage while taking into account proportionality and risk.


In contrast, the Czech Personal Data Processing Act was discussed mainly as a supplementary framework. Rather than redefining security obligations, it builds upon GDPR by addressing specific processing contexts, such as law enforcement or national security, while relying on GDPR as the primary and directly applicable source of security requirements.


Overall, my post concluded that both legal frameworks emphasise the importance of protecting personal data, with GDPR setting the core standards and Czech legislation refining their application in local and sector-specific scenarios. The discussion highlighted that effective data protection is not about absolute security, but about demonstrating that appropriate and proportionate measures are in place.


Link to the post: https://www.my-course.co.uk/mod/forum/discuss.php?d=339998


API security requirements

Unit ten - API

API Security Requirements Specification: GitHub API vs. GitLab API

Few APIs are used for purposes as varied and extensive as those provided by the two largest Git hosting services – GitHub and GitLab (Ghodke and Chavan, 2024) – which is exactly why they should follow best practices when it comes to security. This report examines both APIs using a real-life example of a Python-based application that scrapes merged pull and merge requests and stores the results in JSON files for further visualisation in Grafana (GitHub, no date).


System Overview

The application is executed via a CI/CD pipeline triggered on a schedule, manually, or on repository commits. The pipeline runs a Python script that queries the GitHub and GitLab APIs over HTTPS, retrieves metadata related to merged pull and merge requests in JSON format, and stores the processed data in version-controlled JSON files within a GitHub repository. Grafana then reads these JSON files to visualise the collected metrics.
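
The query step can be sketched as follows. The owner and repository names are placeholders, not the project's actual configuration, and the token is read from the environment as it would be in a CI/CD secret.

```python
import os


def build_pr_request(owner, repo, token=None):
    """Build the URL, headers, and query params for listing closed pull
    requests via the GitHub REST API; merged PRs are then identified
    client-side via the merged_at field of each item."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # the token normally comes from a CI/CD secret, never source code
        headers["Authorization"] = f"Bearer {token}"
    params = {"state": "closed", "per_page": 100}
    return url, headers, params


# Placeholder repository; in the pipeline the token comes from the environment.
url, headers, params = build_pr_request(
    "example-org", "example-repo", os.environ.get("GITHUB_TOKEN")
)
print(url)
```

The tuple would then be passed to `requests.get(url, headers=headers, params=params)` over HTTPS, keeping the token out of the URL and out of version control.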


Threat Risk Identification

The most common threats related to API usage originate from multiple sources. APIs typically require authentication mechanisms such as access tokens or credentials and enforce rate limits to prevent excessive or abusive usage. Client-side risks include token leakage, insecure token storage, or the use of overly broad access scopes. Additional risks may arise on the provider side, including the unintended exposure of data beyond the expected scope or the inclusion of sensitive information within API responses (Basak and Tiwari, 2025, pp. 19–20).


Security Requirements

| Area | Security Requirement |
|---|---|
| Authentication | API access must use scoped personal access tokens or OAuth tokens. |
| Authorization | Tokens must be limited to read-only permissions. |
| Secret Management | Tokens must be stored using CI/CD secret management mechanisms. |
| Transport Security | All API communication must use HTTPS with TLS. |
| Rate Limiting | Clients must respect platform-imposed rate limits. |
| Input Validation | JSON responses must be validated before processing. |
| Logging | Access and error events must be logged without exposing secrets. |

API Comparison

Both the GitHub and GitLab APIs require personal access tokens to authenticate requests. When the scraping tool is executed within GitHub Actions, a token is provided automatically by the execution environment. In contrast, access to the GitLab API requires an explicitly configured token, even when executed within GitLab CI/CD. Both platforms impose rate limits that vary depending on request type, thereby reducing the risk of automated or bot-driven scraping (GitHub Docs, no date; GitLab Docs, no date).

Logging is an important security mechanism for detecting abnormal behaviour and potential breaches. GitLab provides user-accessible logging with multiple log levels ranging from DEBUG to UNKNOWN (GitLab Docs, no date). GitHub, however, offers limited direct visibility into request-level logging for API consumers.
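
Respecting the rate limits mentioned above can be sketched as a small helper that inspects GitHub's standard X-RateLimit-* response headers; the remaining-request threshold used here is an arbitrary choice for illustration.

```python
import time


def seconds_to_wait(headers, threshold=1, now=None):
    """Return how long a client should sleep before its next request,
    based on the X-RateLimit-Remaining / X-RateLimit-Reset headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", threshold + 1))
    if remaining > threshold:
        return 0.0  # plenty of quota left
    # Quota (almost) exhausted: wait until the advertised reset timestamp.
    reset = float(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset - now)


# Plenty of quota left: no waiting needed.
print(seconds_to_wait({"X-RateLimit-Remaining": "4999"}))  # 0.0

# Quota exhausted: wait until the reset time (fixed clock for demonstration).
print(seconds_to_wait(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1100"}, now=1000.0
))  # 100.0
```

A scraping client would call this after every response and sleep for the returned duration before issuing the next request.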


Conclusion

Both the GitHub and GitLab APIs fulfil essential security requirements when used correctly. By enforcing authenticated access, scoped permissions, encrypted transport, rate limiting, and secure secret management through CI/CD pipelines, the identified risks can be effectively mitigated. This analysis demonstrates that a Python-based integration using JSON data formats can be securely implemented when appropriate security requirements are clearly defined and consistently applied.


References

Basak, A. and Tiwari, D. (2025) API Security Risks and Resilience in Financial Institutions. Master's thesis. Laurea University of Applied Sciences. Available at: https://www.theseus.fi/bitstream/handle/10024/883344/Basak_Tiwari.pdf (Accessed: 30 December 2025).

Ghodke, G. M. and Chavan, T. (2024) ‘An Overview of Git’, International Journal of Scientific Research in Modern Science and Technology, 3(6). Available at: https://doi.org/10.59828/ijsrmst.v3i6.216

GitHub Docs (no date) Rate limits for the REST API. Available at: https://docs.github.com/en/


Evaluation of the database design proposal versus the database build

Unit twelve - fp

Database proposal


This database design proposal addressed the data management needs of ElectroSpares, a B2B wholesale supplier of office equipment to corporate clients. The proposal focused not only on fulfilling the technical needs of such a client in terms of a proper logical design, but also on data security and efficiency.


Evaluation against the database build

The database build structurally follows and builds on top of the unit 6 proposal, focusing more on the business needs of such a company. The final project therefore includes not only a properly built database, tested with artificial data to ensure proper data handling, but also focuses on the analytical side, which was only lightly touched on in the proposal.


Personally, the biggest improvement between the initial proposal and the final build was gaining a vast amount of knowledge and context in the module via reading and working on smaller activities. These activities made it possible for me to consider more than was possible in the initial proposal, such as business intelligence needs or proper database building tools I was not aware of before. Based on that knowledge, I was able to implement various parts of the data pipeline instead of only focusing on the database itself: I managed to create the tables, connect them to business intelligence tools, and create visualisations, all while experimenting with different tools and searching for the ideal ones for the task. This research was one of the most memorable moments of the assignment, as it made me connect everything I had learned so far with existing technologies, such as considering the usage of NoSQL versus SQL databases for business intelligence purposes, researching how S3 fits into a properly designed data pipeline, or figuring out how to correctly connect a database and visualisation tools without causing strain on the main database system.


The initial proposal was a vital piece of work for the build, since our team spent a lot of effort on it. This made me realise it is important not to underestimate the value of every single assignment. Every assignment well done is a small step towards being a better data science student and professional. This is not to say that every assignment has gone to plan, and that was certainly true for my database build, as I at times missed the team who would come up with ideas or correct my assumptions, which is something I value immensely.


Impact and learning outcomes

Learning outcomes

✅ 1. Identify and manage challenges, security issues and risks, limitations, and opportunities in data wrangling.


✅ 2. Critically analyse data wrangling problems and determine appropriate methodologies, tools, and techniques (involving preparing, cleaning, exploring, creating, optimising and evaluating big data) to solve them.


✅ 3. Design, develop and evaluate solutions for processing datasets and solving complex problems in various environments using relevant programming paradigms.


✅ 4. Systematically develop and implement the skills required to be an effective member of a development team in a virtual professional environment, adopting real life perspectives on team roles and organisation.



How were the aims fulfilled

In this module, we shall:

✨ 1. Introduce and review various concepts of big data, technologies, and data management to enable you to identify and manage challenges associated with security risks and limitations.

These topics, which up to this point had been very abstract to me, were reinforced by working on assignments such as the API security review, where I had to consider not only the ease of use of an API but also dig deeper into the implications of using an API in a real-world project. I reviewed two APIs I have used in my project at work, and evaluating them from the security perspective was a welcome change of angle, as we rarely get to do that.

✨ 2. Critically analyse data wrangling problems and determine appropriate methodologies and tools in problem solving.

This has been an ongoing topic throughout the whole module, as data wrangling has been part of most exercises and assignments. That has made me aware of many different angles and approaches I was unfamiliar with, which I can now apply at work, or at least better understand when others decide to use them, reinforcing the idea of applying theory in practice.

✨ 3. Explore different data types and formats. Evaluate various data storage formats ranging from structured, quasi structured, semi structured, and unstructured formats. We explore the various memory and storage requirements.

It has been very interesting to dive deeper into the theory of data types, formats, and memory requirements. This was best highlighted in the team project followed by the individual database build: much of the groundwork laid in the team project, when it comes to evaluating proper data types or the right database system for the client, was reused in the final project and thus tested inside a real build.

✨ 4. Critically examine various data collection methods and sources. Review fact finding methods to determine the integrity, reliability and readiness of data extracted and presented for pre-processing, cleaning, and usage.

The data science web scraping activity stands out the most to me, as I had previously used regex to scrape data off websites, and using a library specifically designed for the task made the assignment much easier. Yet it was not without its challenges: instead of matching patterns as I would with regex, I now had to dive deeper into the HTML code and find out how each particular website used its HTML tags.

✨ 5. Examine data exploration methods and analyse data for presentation in an organisation. Critically evaluate data readability, readiness, and longevity within the data Pipeline. Examine cloud services, API (Application Programming Interfaces) and how this enables data interoperability and connectivity.

This has been woven into the whole course, as in many assignments we had to combine skills regarding data collection and presentation, which made it abundantly clear just how easily insights or data can be lost unless proper care is exercised. It was best visible in the final project, where all the skills we had learned thus far were tested, and failure or improper handling of data in one stage had catastrophic results.

✨ 6. Examine and analyse the ideas and theoretical concepts underlying DBMS (Database Management Systems) Database Design and Modelling.

The theoretical concepts behind DBMS, such as normalisation, which were explored in a variety of exercises including the project and a specific exercise on normalisation, were vital for understanding how even small changes in how we store data may affect querying. They have also been a good base for understanding the trade-offs between normalisation and performance or design complexity.

✨ 7. Explore the future of use of data and deciphering by examining some fundamental ideas and concepts of machine learning and how these concepts are applied in various methods in handling big data.

Data science is at the core of the machine learning and artificial intelligence field as a whole, so while it was not the main focus of the course, it is crucial to understand how a proper pipeline build can make or break the system consuming the data, such as an AI model.