The post-doc and I got about halfway through our "Defining the USS" paper today. We fixed up the Methods section and almost all of the Introduction, and walked ourselves through the Results. Maybe because we hadn't looked at the results for a couple of weeks, we had some new ideas.
We had been commenting on how USSs in coding sequences show significantly different motifs depending on which reading frame they're in. But now that we look again, we realize that the real story isn't how different they are, but how similar. Despite coding for entirely different amino acids, they all have strong matches to the canonical USS core and flanking segments. The differences are so small that they might even be attributable to random effects due to small sample sizes.
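One way to check whether differences between the reading-frame motifs could be sampling noise is a permutation test: pool the USSs from two frames, shuffle the frame labels, and see how often the shuffled groups differ as much as the real ones. The sketch below is only an illustration, not the analysis in our paper. The function names, the distance measure (summed absolute differences in base frequencies), and the assumption that all sequences are aligned, of equal length, and drawn from A, C, G and T are all choices made for this example.

```python
import random

BASES = "ACGT"

def base_frequencies(seqs, pos):
    """Fraction of each base at one aligned position."""
    n = len(seqs)
    return {b: sum(s[pos] == b for s in seqs) / n for b in BASES}

def motif_distance(group_a, group_b):
    """Sum over positions of the absolute differences in base frequencies.

    This distance is an assumption made for illustration; other measures
    (e.g. based on information content) would work in the same framework.
    """
    total = 0.0
    for pos in range(len(group_a[0])):
        fa = base_frequencies(group_a, pos)
        fb = base_frequencies(group_b, pos)
        total += sum(abs(fa[b] - fb[b]) for b in BASES)
    return total

def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Estimate how often label-shuffled groups differ at least as much
    as the observed groups do."""
    rng = random.Random(seed)
    observed = motif_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(group_a)]
        perm_b = pooled[len(group_a):]
        if motif_distance(perm_a, perm_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A large p-value from `permutation_p_value(frame1_seqs, frame2_seqs)` (both arguments being hypothetical lists of aligned USS strings) would be consistent with the frame-specific differences being random effects of small samples; a small one would suggest a real difference.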
This might mean that needing to code for proteins doesn't significantly constrain USS sequences. Rather, USSs accumulate only in places where the existing protein-coding constraints are compatible with the sequence needed for efficient uptake.
We also clarified our analysis of the correlations between the bases at different positions in the USS, especially for USSs that are likely to be acting as terminators. Because terminators act by folding into hairpins, we expected that the parts of these USSs that come together when folded would show correlations. But they don't. Instead, the strong correlations are all between bases that are adjacent in the sequence or separated by only a single base, just as they are in the non-terminator data set. We think this means that the correlations arise from the need for adjacent base pairs to physically cooperate when the USS is kinked during uptake, and that selection for terminator function is much weaker.
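For readers who want to see what a position-by-position correlation analysis can look like, here is a minimal sketch that scores every pair of aligned positions by mutual information and groups the scores by how far apart the positions are. Mutual information is an assumption for this example and not necessarily the statistic used in our paper, and the function names are made up for illustration. The input is assumed to be a list of aligned, equal-length sequences.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between aligned positions i and j."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    i_counts = Counter(s[i] for s in seqs)
    j_counts = Counter(s[j] for s in seqs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi

def correlations_by_separation(seqs):
    """Map each separation (j - i) to the list of MI values at that distance."""
    by_sep = {}
    for i, j in combinations(range(len(seqs[0])), 2):
        by_sep.setdefault(j - i, []).append(mutual_information(seqs, i, j))
    return by_sep
```

If correlations came mainly from hairpin folding, high scores would be expected at the larger separations that bring the two arms of the stem together. A pattern where the high scores sit at separations 1 and 2 is what we describe above. Note that with small data sets mutual information is biased upward, so real use would need a correction or a shuffled control.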