TUTORIAL2C – Testing geospatial data in Python

November 9, 2023 from 13:00 to 16:00

Speakers: Michal Pilarski, Mateusz Adamczak, Jeppesen (PL)

Throughout his career, Michal has always been connected with geospatial data and GIS geoprocessing. He likes to find and overcome challenges in testing big data with geometry attributes. He has experience in preparing testing strategies for ETL systems that extract, transform and load massive geospatial datasets. His technology stack includes Python, Pytest, ArcGIS, QGIS, FME, Robot Framework, HP ALM, and GeoPandas.

With around 7 years of experience in aviation software, Mateusz has covered most of the available roles: tester, developer, DevOps engineer, and, for a little while, Scrum Master. This gives him an excellent overview of the software production process, which he likes to share. Currently, most of his attention goes to introducing good software practices and software design to less experienced colleagues. He works at the Jeppesen/Boeing office in Gdańsk.

Functional and API tests are the most common kinds of software tests. Testing data transformation systems, or the data itself, is different. Across the data lifecycle, the collection, storage and sharing phases are all open to bugs that can corrupt the data, and the most critical ones tend to appear during transformation, conversion and calculation steps. Geospatial data is especially vulnerable. First, the volume of information is huge, so bugs can hide easily when the data changes content, structure or even format. Second, the data carries not only standard attributes but also location and shape attributes, represented by geometry objects (points, lines, polygons) at a given latitude and longitude, which adds another level of complexity to testing.

The first question is how to prevent bugs that could corrupt geospatial data, at a cost counted not only in money but also in our safety. The proposed workshop is the answer: it gives participants the opportunity to go through the process of testing geospatial data in practice. The second question is whether we can test data manually. Yes, but the main objective of the workshop is to show the most efficient and easiest way of testing data, which is automation in the very popular Python programming language. Python and its geospatial modules are very easy to learn, which matters when designing automated tests.

The workshop is therefore based mostly on the following tech stack: Pytest, GeoPandas, Matplotlib and Folium, configured in the PyCharm IDE. In addition, free GIS software such as QGIS is useful for displaying data on a map. To close the testing workflow, Jenkins CI makes it possible to fully automate the reporting of test executions for project requirements.
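To give a flavour of what an automated geospatial data check can look like, here is a minimal, dependency-free sketch written in the Pytest style (plain `test_` functions with `assert`). It is not taken from the workshop materials: the sample points, the `AREA_BBOX` bounding box and the helper names `out_of_range` and `outside_bbox` are all assumptions made for illustration. In the workshop's stack, similar checks would more likely run against GeoPandas data rather than plain tuples.

```python
# Illustrative records only: (name, longitude, latitude) in decimal degrees.
# The values are approximate and exist just to make the sketch runnable.
POINTS = [
    ("GDN", 18.47, 54.38),
    ("WAW", 20.97, 52.17),
]

# Assumed area of interest as (min_lon, min_lat, max_lon, max_lat):
# a rough box around Poland, not an authoritative boundary.
AREA_BBOX = (14.0, 49.0, 24.5, 55.0)


def out_of_range(records):
    """Return names of records whose coordinates are not valid degrees."""
    return [
        name
        for name, lon, lat in records
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0)
    ]


def outside_bbox(records, bbox):
    """Return names of records that fall outside the given bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        name
        for name, lon, lat in records
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat)
    ]


def test_coordinates_are_valid_degrees():
    assert out_of_range(POINTS) == []


def test_points_fall_inside_area_of_interest():
    # A swapped latitude/longitude pair can still pass the range check,
    # but it lands outside the expected area, so this test catches it.
    assert outside_bbox(POINTS, AREA_BBOX) == []
```

Saved as, for example, `test_points.py`, the file can be run with the `pytest` command, which discovers the `test_` functions automatically. The bounding-box test is included because a simple range check alone cannot tell a swapped coordinate pair from a correct one when both values are below 90.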

  • Takeaways:
    • Geospatial data is no longer a mystery: if you need to test a geographic coordinate system or the latitude/longitude of geometry objects, you will know how to tackle it
    • Knowledge of the full geospatial data testing process (from reading requirements and gathering data, through designing and running test cases, to reporting test execution in a Continuous Integration manner)
    • If you are a manual-only tester, you will mainly take away how to automate your tests in Python (basic concepts)
    • If you are an experienced Python tester, you will mainly gather best practices for testing GIS (Geographic Information Systems) data and learn to recognize the most common challenges and issues
    • The feeling of having spent time in a good atmosphere and gained real, practical knowledge