This function generates sequence features for a set of intervals. It calculates dinucleotide frequencies and GC content, optionally normalizes these features (see normalize), and returns them as a matrix.

Usage

create_sequence_features(
  intervals,
  size = NULL,
  normalize = TRUE,
  norm_quant = 0.05
)

Arguments

intervals

A data frame containing interval information with columns start and end.

size

The size of the sequences to extract. If NULL, the size is calculated from the first interval.

normalize

A logical value indicating whether to normalize the features to the range 0-10.

norm_quant

The quantile to use for normalization. Values below the norm_quant quantile or above the 1 - norm_quant quantile are truncated to those quantiles, and the result is linearly scaled to 0-10. Only used when normalize is TRUE.
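
The quantile-based normalization described for norm_quant can be sketched in base R as follows. This is an illustration only, not the package's code: the helper name norm_to_range and its handling of constant input are assumptions.

```r
# Hypothetical helper illustrating quantile clipping followed by linear scaling.
# Not the package's implementation; the name and edge-case handling are assumed.
norm_to_range <- function(x, norm_quant = 0.05, upper = 10) {
    # Lower and upper clipping bounds taken from the data itself
    q <- quantile(x, probs = c(norm_quant, 1 - norm_quant),
                  na.rm = TRUE, names = FALSE)
    # Truncate values outside [q[1], q[2]] to the nearest bound
    clipped <- pmin(pmax(x, q[1]), q[2])
    # A constant feature has no spread to scale; return zeros (an assumption)
    if (q[2] == q[1]) {
        return(rep(0, length(x)))
    }
    # Linearly map [q[1], q[2]] onto [0, upper]
    (clipped - q[1]) / (q[2] - q[1]) * upper
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
scaled <- norm_to_range(x, norm_quant = 0.05)
range(scaled)
```

Clipping before scaling keeps a single extreme value (here 100) from compressing the remaining values toward zero.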

Value

A matrix of sequence features, including GC content and dinucleotide frequencies. The features are normalized when normalize is TRUE.
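
To make the feature definitions concrete, here is a minimal base R sketch that computes GC content and overlapping dinucleotide frequencies from plain DNA strings. It is an assumption-laden illustration: the helper sequence_features is hypothetical, it takes sequences directly rather than extracting them from intervals, and the column layout of the real function's output may differ.

```r
# Hypothetical helper: GC content plus the 16 dinucleotide frequencies per sequence.
# Input is a character vector of DNA strings (an assumption; the documented
# function works from intervals instead).
sequence_features <- function(seqs) {
    stopifnot(is.character(seqs), all(nchar(seqs) >= 2))
    bases <- c("A", "C", "G", "T")
    dinucs <- as.vector(outer(bases, bases, paste0))  # "AA", "CA", ..., "TT"
    feats <- t(vapply(toupper(seqs), function(s) {
        n <- nchar(s)
        chars <- strsplit(s, "")[[1]]
        # Fraction of G or C bases
        gc <- mean(chars %in% c("G", "C"))
        # Overlapping dinucleotides: positions 1-2, 2-3, ..., (n-1)-n
        pairs <- substring(s, 1:(n - 1), 2:n)
        # Pairs containing non-ACGT characters (e.g. N) are not counted
        di <- as.vector(table(factor(pairs, levels = dinucs))) / (n - 1)
        c(gc, di)
    }, numeric(17), USE.NAMES = FALSE))
    colnames(feats) <- c("GC", dinucs)
    feats
}

feats <- sequence_features(c("ACGTACGT", "GGGGCCCC"))
dim(feats)
```

Each row corresponds to one input sequence. Applying a quantile-based rescaling column by column would then give values on the 0-10 scale described above.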