Is it possible in Riskscape to convert long data into wide eg
address ARI depth
1 10 0
1 20 1
1 30 2
2 10 10
2 20 15
2 30 20
to
address Depth_ARI10 Depth_ARI20 Depth_ARI30
1 0 1 2
2 10 15 20
Sure, you can use the bucket()
aggregation function for that, E.g.
input(value: { depth: 15, ARI: 20, address: 'foo' }) ->
group(by: address,
select: {
*,
bucket(by: ARI,
pick: b -> b = ARI,
select: {
max(depth) as Depth
},
buckets: {
"10": 10,
"20": 20,
"30": 30,
}) as ARI
})
evaluating that pipeline produces:
address,ARI.10.Depth,ARI.20.Depth,ARI.30.Depth
foo,,15,
1 Like
I have added
-> select({*, sample_one(geometry: geom, coverage: event.coverage) as hazard_sampled})
-> group(by: address_id,
select: {
*,
bucket(by: event.ari,
pick: b -> b = event.ari,
select: {
max(hazard_sampled) as Depth
},
buckets: {
"2": 2,
"5": 5,
"10": 10,
"20": 20,
"50": 50,
"100": 100,
"200": 200,
"1000": 1000,
}) as ARI
})
to my pipeline. I now get a csv in the shape I want but with no data in it
address_id ARI.2.Depth ARI.5.Depth ARI.10.Depth ARI.20.Depth ARI.50.Depth ARI.100.Depth ARI.200.Depth ARI.1000.Depth
1709559
1709558
1641441
845209
845208
1641442
1641440
845210
1709560
845216
1641445
845215
1641446
845218
1641443
845217
1641444
845212
845211
845214
845213
If I remove the group by step then my output file does have values in the hazard_sampled column
Might be relevant that for each ARI each address point will have 2 values, one with a float and one blank eg
address_id cell ari hazard_sampled
1 10 2
1 30 2 1.5
I presume the event data is being loaded from CSV, in which case event.ari
is probably a text string (i.e. '2'
) rather than an integer value (i.e. 2
), so the pick
expression won’t find any matches. Try changing the pick
expression so that it converts the event.ari
value from a text string to an integer, e.g.
pick: b -> b = int(event.ari),
Thanks Tim that has worked. I thought it might have been something to do with casting but I was focused on the
select: {
max(hazard_sampled) as Depth
},
part.
I ended up grouping by 2 clauses so this was the final code
-> group(by: [address_id,event.cell],
select: {
mode(address_id) as Address_ID,
mode(event.cell) as Cell,
bucket(by: event.ari,
pick: b -> b = int(event.ari),
select: {
max(hazard_sampled) as Depth
},
buckets: {
"2": 2,
"5": 5,
"10": 10,
"20": 20,
"50": 50,
"100": 100,
"200": 200,
"1000": 1000,
}) as "event.ari"
})
Just a note that mode()
can be quite memory-intensive, as it collects all the values into a list. It’s probably fine in your case, but could become a memory bottleneck with larger datasets. A better approach is to use something like:
-> group(by: { address_id as Address_ID, event.cell as Cell },
select: {
*,
bucket(by: event.ari,
...
Here the *
includes all the attributes in the by:
clause.