Developer Interface

Functions

pypub.clean(input_string, tag_dictionary={'body': [], 'em': ['id', 'title'], 'head': [], 'blockquote': ['id'], 'img /': ['align', 'border', 'height', 'id', 'src', 'width'], 'h2': [], 'big': [], 'dd': ['id', 'title'], 'h1': [], 'h6': [], 'h4': [], 'h5': [], 'hr /': ['color', 'id', 'width'], 'ol': ['id'], 'br': ['id'], 'font': ['color', 'face', 'id', 'size'], 'strike': ['class', 'id'], 'a': ['href', 'id', 'name'], 'sub': ['id'], 'b': ['id'], 'span': ['bgcolor', 'title'], 'center': [], 'img': ['align', 'border', 'height', 'id', 'src', 'width'], 'ul': ['class', 'id'], 'i': ['class', 'id'], 'var': [], 'html': [], 'li': ['class', 'id', 'title'], 'p': ['align', 'id', 'title'], 's': ['id', 'style', 'title'], 'dfn': [], 'del': [], 'strong': ['class', 'id'], 'h3': [], 'small': ['id'], 'u': ['id'], 'div': ['align', 'id', 'bgcolor'], 'cite': [], 'sup': ['class', 'id']})

Sanitizes HTML. Tags that are not keys in tag_dictionary are removed, and the child nodes of each removed tag are recursively moved up to that tag's parent. Attributes that are not listed in the value for their tag in tag_dictionary are removed. The doctype is set to <!DOCTYPE html>.

Parameters:
  • input_string (basestring) – A (possibly unicode) string representing HTML.
  • tag_dictionary (Option[dict]) – A dictionary with tags as keys and lists of allowed attributes as values. This operates as a whitelist: a tag that is not a key in the dictionary is removed. By default, this is set to the supported tags and attributes for the Amazon Kindle, as found at https://kdp.amazon.com/help?topicId=A1JPUWCSD6F59O
Returns:

A (possibly unicode) string representing HTML.

Return type:

str

Raises:

TypeError – Raised if input_string isn’t a unicode string or string.
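
The whitelist behaviour described above can be illustrated with a small standard-library sketch. This is not pypub's implementation: WhitelistSanitizer and sketch_clean are hypothetical names, the sketch targets Python 3 (so it checks for str rather than basestring), and it does not set a doctype. It only shows how dropping a tag keeps its children and how unlisted attributes disappear.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSanitizer(HTMLParser):
    """Hypothetical stand-in that mimics the whitelist rules described above."""

    def __init__(self, tag_dictionary):
        super().__init__(convert_charrefs=True)
        self.tag_dictionary = tag_dictionary
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tag_dictionary:
            return  # drop the tag itself; its children are still emitted
        allowed = self.tag_dictionary[tag]
        kept = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
            if name in allowed
        )
        self.parts.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in self.tag_dictionary:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def sketch_clean(input_string, tag_dictionary):
    """Return input_string with non-whitelisted tags and attributes removed."""
    if not isinstance(input_string, str):
        raise TypeError("input_string must be a string")
    parser = WhitelistSanitizer(tag_dictionary)
    parser.feed(input_string)
    parser.close()
    return "".join(parser.parts)


html = '<div id="a" onclick="x()"><section><p class="c">text</p></section></div>'
result = sketch_clean(html, {"div": ["id"], "p": []})
print(result)
```

Here the section tag is not a key in the dictionary, so it is removed while its p child is kept, and the onclick and class attributes are stripped because they are not listed for their tags.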

Classes

class pypub.Epub(title, creator='pypub', language='en', rights='', publisher='pypub', epub_dir=None)

Class representing an epub. Add chapters to this and then output your ebook as an epub file.

Parameters:
  • title (str) – The title of the epub.
  • creator (Option[str]) – The creator of your epub. By default this is pypub.
  • language (Option[str]) – The language of your epub.
  • rights (Option[str]) – The rights of your epub.
  • publisher (Option[str]) – The publisher of your epub. By default this is pypub.
add_chapter(c)

Add a Chapter to your epub.

Parameters: c (Chapter) – A Chapter object representing your chapter.
Raises: TypeError – Raised if a Chapter object isn’t supplied to this method.
create_epub(output_directory, epub_name=None)

Create an epub file from this object.

Parameters:
  • output_directory (str) – Directory to output the epub file to
  • epub_name (Option[str]) – The file name of your epub. This should not contain .epub at the end. If this argument is not provided, defaults to the title of the epub.
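
A minimal sketch of the workflow described above, using only the calls documented in this section. The names strip_epub_suffix and build_epub are hypothetical helpers, not part of pypub, and pypub is imported lazily inside build_epub so that the suffix helper can be tried without the library installed.

```python
def strip_epub_suffix(epub_name):
    """Hypothetical helper: drop a trailing '.epub', since create_epub expects none."""
    if epub_name is None:
        return None  # let create_epub fall back to the epub's title
    if epub_name.lower().endswith(".epub"):
        return epub_name[: -len(".epub")]
    return epub_name


def build_epub(title, chapters, output_directory, epub_name=None):
    """Add each Chapter to a new Epub and write it to output_directory."""
    import pypub  # assumption: pypub is installed in the running environment

    epub = pypub.Epub(title)
    for chapter in chapters:
        epub.add_chapter(chapter)  # documented to raise TypeError for non-Chapter objects
    epub.create_epub(output_directory, epub_name=strip_epub_suffix(epub_name))
```

Normalizing the name before the call keeps a stray extension from reaching create_epub, which the parameter description above says should not receive one.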
class pypub.Chapter(content, title, url=None)

Class representing an ebook chapter. In most cases this class shouldn’t be instantiated directly; use the ChapterFactory class to create chapters instead.

Parameters:
  • content (str) – The content of the chapter. Should be formatted as xhtml.
  • title (str) – The title of the chapter.
  • url (Option[str]) – The url of the webpage where the chapter is from if applicable. By default this is None.
content

str

The content of the ebook chapter.

title

str

The title of the chapter.

url

str

The url of the webpage where the chapter is from if applicable.

html_title

str

Title string with special characters replaced with html-safe sequences
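
As a rough illustration of what an html-safe title looks like, the sketch below uses the standard library's html.escape. Which characters pypub actually replaces is not stated here, so sketch_html_title is a hypothetical approximation rather than the library's own logic.

```python
from html import escape


def sketch_html_title(title):
    """Hypothetical approximation: replace &, <, > and quotes with HTML entities."""
    return escape(title, quote=True)


safe_title = sketch_html_title("Tom & Jerry <Part 1>")
print(safe_title)
```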

write(file_name)

Writes the chapter object to an xhtml file.

Parameters: file_name (str) – The full name of the xhtml file to save to.
class pypub.ChapterFactory(clean_function=pypub.clean)

Used to create Chapter objects. Chapter objects can be created from urls, files, and strings.

Parameters: clean_function (Option[function]) – A function used to sanitize raw html to be used in an epub. By default, this is the pypub.clean function.
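
One way to supply a custom clean_function is to bind a narrower whitelist to an existing sanitizer. The sketch below assumes the factory calls clean_function with the HTML string as its only argument, which this section does not state explicitly. make_clean_function is a hypothetical helper, and fake_clean is a stand-in for pypub.clean so the example runs without pypub installed.

```python
from functools import partial


def make_clean_function(base_clean, tag_dictionary):
    """Bind a custom whitelist so the result can be called with just the HTML string."""
    return partial(base_clean, tag_dictionary=tag_dictionary)


def fake_clean(input_string, tag_dictionary=None):
    # Stand-in for pypub.clean; it only reports what it was called with.
    return "cleaned {} with tags {}".format(input_string, sorted(tag_dictionary))


text_only = make_clean_function(fake_clean, {"h1": [], "p": ["id"]})
preview = text_only("<p>hi</p>")

# With pypub installed, the same idea would presumably look like:
# factory = pypub.ChapterFactory(
#     clean_function=make_clean_function(pypub.clean, {"h1": [], "p": ["id"]})
# )
```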
create_chapter_from_file(file_name, url=None, title=None)

Creates a Chapter object from an html or xhtml file. Sanitizes the file’s content using the clean_function method, and saves it as the content of the created chapter.

Parameters:
  • file_name (string) – The name of the file containing the html or xhtml content of the created Chapter
  • url (Option[string]) – A url to infer the title of the chapter from
  • title (Option[string]) – The title of the created Chapter. By default, this is None, in which case pypub tries to infer the title from the webpage at the url.
Returns:

A chapter object whose content is the given file and whose title is that provided or inferred from the url

Return type:

Chapter

create_chapter_from_string(html_string, url=None, title=None)

Creates a Chapter object from a string. Sanitizes the string using the clean_function method, and saves it as the content of the created chapter.

Parameters:
  • html_string (string) – The html or xhtml content of the created Chapter
  • url (Option[string]) – A url to infer the title of the chapter from
  • title (Option[string]) – The title of the created Chapter. By default, this is None, in which case pypub tries to infer the title from the webpage at the url.
Returns:

A chapter object whose content is the given string and whose title is that provided or inferred from the url

Return type:

Chapter

create_chapter_from_url(url, title=None)

Creates a Chapter object from a url. Pulls the webpage from the given url, sanitizes it using the clean_function method, and saves it as the content of the created chapter. Only the basic webpage is loaded; no javascript is executed, so content generated by scripts is not included.

Parameters:
  • url (string) – The url to pull the content of the created Chapter from
  • title (Option[string]) – The title of the created Chapter. By default, this is None, in which case pypub tries to infer the title from the webpage at the url.
Returns:

A chapter object whose content is the webpage at the given url and whose title is that provided or inferred from the url

Return type:

Chapter

Raises:

ValueError – Raised if unable to connect to url supplied
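
Because create_chapter_from_url raises ValueError when it cannot connect, a caller building a book from several pages may want to skip unreachable urls rather than abort. The sketch below shows one way to do that. collect_chapters is a hypothetical helper, and factory is assumed to be any object with the create_chapter_from_url method documented above, such as a pypub.ChapterFactory instance.

```python
def collect_chapters(factory, urls):
    """Create one chapter per url, recording urls that could not be fetched."""
    chapters = []
    failed = []
    for url in urls:
        try:
            chapters.append(factory.create_chapter_from_url(url))
        except ValueError:
            # Documented failure mode: unable to connect to the supplied url.
            failed.append(url)
    return chapters, failed
```

The returned chapters can then be passed to Epub.add_chapter one by one, and the failed list can be reported or retried.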