Explain the problem as you see it
On Slack I suggested using contextual nodes on inline references to emulate "footnotes" or "comments". The entirety of my suggestion can be seen in this screenshot:
One drawback to this suggestion is that there is no visual indication that an inline reference has contextual content. The user needs to expand the reference to discover whether there is a "footnote/comment".
Why is this a problem for you?
The approach above is quite elegant, but with no way to tell which inline references have contextual content, it is much less useful.
Suggest a solution
There should be some visual indication that an inline reference has contextual content, perhaps an icon.