Join producing no output

I am trying to join 2 files but Riskscape is returning an empty file as output

project.ini

[model addProbs]
description = takes an SSP pathway and year and adds to the output from addSSPIds
location = addProbabilties.txt
framework = pipeline

addProbabilities.txt

input('SSPFull.csv') as allSSPs
-> filter("year" = '2040')
-> filter("measurementName" = 'SSP5-8.5 (low confidence)')
-> filter("siteId" = '689')

-> SSP_join.rhs


input('AAL_High_sspid.csv', name: 'AALs')
->
join(on: 'AALs.hazard_coverage_siteId' = "siteId") as SSP_join
->
save(name: 'output')

SSPFull.csv

"siteId","year","p17","p50","p83","lon","lat","measurementName"
"689",2040,0.14,0.18,0.29,174.6772,-36.6197,"SSP5-8.5 (low confidence)"

AAL_High_sspid.csv

hazard_coverage_siteId,hazard_coverage.lat,hazard_coverage.lon,hazard_coverage.geom
689,0,0,0
C:\Users\powellj\>riskscape model run addProbs
[WARNING] A Java upgrade is recommended. Your computer is currently using Java version 11.0.19, which will no longer be supported by future versions of RiskScape (RiskScape v1.5.0 onwards). Please upgrade to Java 17 (or higher) to ensure RiskScape will continue to run on your computer.
[WARNING] The 'beta' plugin is enabled. This contains experimental features that may significantly change or be deprecated in future releases.
[WARNING] No output has been produced from pipeline step(s) 'SSP_join:[join]'. Possible causes may be: the pipeline uses a filter condition that does not match any data; the model uses a join condition based on geometry, but the datasets do not overlap geographically; or for CSV-based data, the incorrect CRS (or `crs-longitude-first`) bookmark setting is being used.
file:/C:/Users/powellj/output/addProbs/2023-07-27T15_36_40/output.csv

The warning says that the filter step might be removing all data but if I run the pipeline with everything after → filter(“siteId” = ‘689’) commented out it works fine and returns the expected rhs.

I think the problem is the left-hand side of the join condition uses singe-quotes, so it’s being treated as a literal text-string:

join(on: 'AALs.hazard_coverage_siteId' = "siteId") as SSP_join

Just remove the single-quotes and the join should start matching things correctly, i.e.

join(on: AALs.hazard_coverage_siteId = "siteId") as SSP_join

Thanks Tim that has worked. In the actual pipeline the column name is

hazard_coverage.siteId

I cant find a combination of quotes that allows me to have that dot in there. Is it possible?

Failed to validate model for execution
  - Failed to validate 'join(on: AALs.hazard_coverage.siteId = "siteId") as SSP_join' step for execution
    - Could not find 'AALs.hazard_coverage.siteId' among [AALs.hazard_coverage.siteId, AALs.hazard_coverage.lat, AALs.hazard_coverage.lon, AALs.hazard_coverage.geom, siteId, year, p17, p50, p83, lon, lat, measurementName]

Yes, it’s possible, but the syntax starts getting a little tricky. Try this:

join(on: AALs."hazard_coverage.siteId" = siteId) as SSP_join
1 Like

Here’s my attempt at explaining what’s going on here a little more.

Unlike Python, double-quotes behave quite differently to single-quotes in a RiskScape pipeline. Double-quotes are only ever used for identifiers (i.e. attribute names). In particular, they’re used to escape other special characters. For example, the expression 5--8.5 would evaluate to 13.5, whereas the expression "5--8.5" would be treated as an attribute name (i.e. a CSV column called 5--8.5).

In a RiskScape expression, . is an operator (similar to how + or - are operators). It references an attribute within a struct. Double-quotes will escape this behaviour, the same way they do for + and -.

So in your example pipeline:

  • 'AALs.hazard_coverage_siteId' is just a text-string literal (same as in Python code)
  • AALs.hazard_coverage_siteId is referencing the hazard_coverage_siteId attribute from the AALs struct. Here all the attributes from the AAL_High_sspid.csv file are wrapped up in a AALs struct.
  • "AALs.hazard_coverage_siteId" is a single attribute with a . in the name. You can use special characters like this in pipeline attribute names, but they need to be enclosed in double-quotes so that RiskScape doesn’t try to interpret them as some other kind of expression.
  • AALs."hazard_coverage.siteId" is referencing the "hazard_coverage.siteId" attribute from the AALs struct. Because this attribute has a . in the name, it needs to be escaped with double-quotes (but the first . in the ‘AALs.’ bit needs to sit outside the double-quotes, so that the . operator still takes effect).

Basically, it’s simplest to avoid . in attribute names and then you don’t have to worry about double-quotes at all. However, we need to fix up writing CSV files (and probably GeoJSON too), so they don’t include the '.'s in the attribute names.

In the meantime, the .* ‘splat’ operator might be handy. It pulls all of the attributes out of a struct, so it effectively gets rid of the . in the attribute names. E.g. in your pipeline where you generate the AAL_High_sspid.csv file, you could add the following step before you save the results, which would remove the ‘hazard_coverage.’ bit from the CSV column names.

 ->
select({ hazard_coverage.* })
 ->
save('AAL_High_sspid', format: 'csv')