Insert output file into the next step of pipeline

How can I use a file produced in an earlier step of the pipeline further down, without manually adding it myself? Ideally, I would be able to run the pipeline and the output files would flow down to the next step, be altered, and then move down to the step after that.
My steps are roughly as follows:

  1. [spatial-join] of building data to area data
  2. data is edited, i.e. (floor x area)
  3. (floor x area) is grouped via polygon
    3.1. A CSV file is joined back into the pipeline - this allows the total polygon area to be added to each property within the polygon. (I also wonder if there is a way to do this without manually making a bookmark and setting the attribute as a float so that it doesn’t come through as text.)
  4. (area x floor) / total polygon (area x floor) = division factor
  5. the variables of interest * division factor are grouped by polygon to give a new value
  6. Input the hazard and the multiple scenarios

Any advice on stepping away from manual handling is appreciated. At the moment all of these steps are separate chunks of pipeline, and I am muting the unnecessary bits as I run it bit by bit. Ideally, these would all be steps within the same pipeline, if that is possible.

Hi Abby,

In general, you should be able to keep chaining processing steps onto the end of your pipeline using the -> operator. If you want to save the output at intermediate steps as well, you can do so by naming the step and using the save() pipeline step, e.g.

select({*}) as first_step
# output goes into the next pipeline step
-> select({*}) as next_step
# in a separate pipeline branch, we can also save the same output from the first step to file
first_step -> save('intermediary-results')

However, what you’re trying to do here is slightly trickier. Here’s an example pipeline that I think does roughly what you’re after. It should run against the RiskScape getting-started data.

input('Buildings_SE_Upolu.shp', name: 'exposure')
 ->
# join buildings to region
select({ *, sample_one(exposure, to_coverage(bookmark('Samoa_constituencies.shp'))) as region })
 ->
# aggregate total building area by region
group(by: region,
      select: {
          region.Region,
          sum(exposure.area) as total_area
      })
# join the results back to the exposure-layer
-> join_total_area.rhs

# next 2 pipeline steps are duplicated to join the buildings to region again
input('Buildings_SE_Upolu.shp', name: 'exposure')
 ->
select({ *, sample_one(exposure, to_coverage(bookmark('Samoa_constituencies.shp'))) as region })
 ->
# join the buildings to the total_area by region and calculate a division factor 
join(on: region.Region = Region) as join_total_area
 ->
select({ *, exposure.area / total_area as division_factor })
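
From there, steps 5 and 6 should just be more links in the same chain. As a rough sketch (value_of_interest is a placeholder here for whichever exposure attribute you want to apportion), the pipeline above could carry on with something like:

 ->
# apportion the variable of interest using the division factor
select({ *, exposure.value_of_interest * division_factor as scaled_value })
 ->
# re-aggregate the scaled values by polygon (your step 5)
group(by: region,
      select: {
          region.Region,
          sum(scaled_value) as new_value
      })
 ->
save('scaled-by-region')

Sampling your hazard layer for step 6 could then follow the same pattern as the region join, i.e. sample_one() against a to_coverage(bookmark(...)) of the hazard data.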

I’ve duplicated the steps that match the building data to the regions here, because it gets tricky joining data back to itself (at least when the group step uses a by parameter). Without the duplicated steps, the pipeline mechanics can unfortunately result in deadlock.

The other thing to note is that you can sometimes move some of the pipeline processing into the bookmark, which might help simplify things. E.g. step 2 (data is edited) could potentially be done using set-attribute in the bookmark:

[bookmark buildings]
location = buildings.shp
set-attribute.tot_area = area * floor_levels
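
On your side question about the CSV attribute coming through as text: the bookmark is also the place to sort that out. As a sketch (the file and bookmark names here are placeholders, and assuming your RiskScape version has a float() casting function - worth double-checking against the function reference):

[bookmark polygon_totals]
location = polygon-totals.csv
set-attribute.total_area = float(total_area)

That said, with the example pipeline above you shouldn’t need the CSV round-trip at all, since the per-polygon total gets computed within the pipeline itself.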

Hope that helps.

Cheers,
Tim