Developer Interface
Functions
- pypub.clean(input_string, tag_dictionary={'body': [], 'em': ['id', 'title'], 'head': [], 'blockquote': ['id'], 'img /': ['align', 'border', 'height', 'id', 'src', 'width'], 'h2': [], 'big': [], 'dd': ['id', 'title'], 'h1': [], 'h6': [], 'h4': [], 'h5': [], 'hr /': ['color', 'id', 'width'], 'ol': ['id'], 'br': ['id'], 'font': ['color', 'face', 'id', 'size'], 'strike': ['class', 'id'], 'a': ['href', 'id', 'name'], 'sub': ['id'], 'b': ['id'], 'span': ['bgcolor', 'title'], 'center': [], 'img': ['align', 'border', 'height', 'id', 'src', 'width'], 'ul': ['class', 'id'], 'i': ['class', 'id'], 'var': [], 'html': [], 'li': ['class', 'id', 'title'], 'p': ['align', 'id', 'title'], 's': ['id', 'style', 'title'], 'dfn': [], 'del': [], 'strong': ['class', 'id'], 'h3': [], 'small': ['id'], 'u': ['id'], 'div': ['align', 'id', 'bgcolor'], 'cite': [], 'sup': ['class', 'id']})
Sanitizes HTML. Tags that are not keys in tag_dictionary are removed, and the child nodes of each removed tag are recursively moved up to its parent. Attributes that are not listed for a tag in tag_dictionary are removed. The doctype is set to <!DOCTYPE html>.
Parameters: - input_string (basestring) – A (possibly unicode) string representing HTML.
- tag_dictionary (Option[dict]) – A dictionary with tags as keys and lists of permitted attributes as values. This operates as a whitelist, i.e. a tag that isn’t a key is removed. By default, this is set to the supported tags and attributes for the Amazon Kindle, as found at https://kdp.amazon.com/help?topicId=A1JPUWCSD6F59O
Returns: A (possibly unicode) string representing HTML.
Return type: str
Raises: TypeError – Raised if input_string isn’t a unicode string or string.
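The whitelist behaviour described for pypub.clean can be illustrated with a small standalone sketch. This is not pypub's implementation: sketch_clean, WhitelistSanitizer, and the ALLOWED dictionary are hypothetical names, the whitelist is far smaller than the default above, and the sketch does not set a doctype. It only shows the idea of dropping unlisted tags while keeping their children, and dropping unlisted attributes.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, much smaller than pypub's default tag_dictionary.
ALLOWED = {"p": ["id"], "b": [], "a": ["href"]}


class WhitelistSanitizer(HTMLParser):
    """Drops tags that are not whitelisted but keeps their text and children."""

    def __init__(self, tag_dictionary):
        super().__init__(convert_charrefs=True)
        self.tag_dictionary = tag_dictionary
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tag_dictionary:
            return  # tag is dropped; its children are still visited
        allowed = self.tag_dictionary[tag]
        kept = "".join(
            ' {}="{}"'.format(name, escape(value or ""))
            for name, value in attrs
            if name in allowed
        )
        self.parts.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in self.tag_dictionary:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        self.parts.append(escape(data, quote=False))


def sketch_clean(input_string, tag_dictionary=None):
    """Hypothetical stand-in that mimics the documented whitelist rules."""
    if not isinstance(input_string, str):
        raise TypeError("input_string must be a string")
    parser = WhitelistSanitizer(ALLOWED if tag_dictionary is None else tag_dictionary)
    parser.feed(input_string)
    parser.close()
    return "".join(parser.parts)
```

With this sketch, the span wrapper and the onclick attribute in `<p id="x" onclick="y"><span>hi <b>ok</b></span></p>` are dropped, while the text and the b element are kept under the p element.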
Classes
- class pypub.Epub(title, creator='pypub', language='en', rights='', publisher='pypub', epub_dir=None)
Class representing an epub. Add chapters to this and then output your ebook as an epub file.
Parameters: - title (str) – The title of the epub.
- creator (Option[str]) – The creator of your epub. By default this is pypub.
- language (Option[str]) – The language of your epub.
- rights (Option[str]) – The rights of your epub.
- publisher (Option[str]) – The publisher of your epub. By default this is pypub.
- add_chapter(c)
Add a Chapter to your epub.
Parameters: c (Chapter) – A Chapter object representing your chapter.
Raises: TypeError – Raised if a Chapter object isn’t supplied to this method.
- create_epub(output_directory, epub_name=None)
Create an epub file from this object.
Parameters: - output_directory (str) – Directory to output the epub file to
- epub_name (Option[str]) – The file name of your epub. This should not contain .epub at the end. If this argument is not provided, defaults to the title of the epub.
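As a rough usage sketch, the Epub methods above might be combined as follows. It uses only calls documented on this page (Epub, add_chapter, create_epub, and ChapterFactory.create_chapter_from_string). The function name build_book, the book title, the creator, and the file name are made-up values for illustration, and pypub is assumed to be installed in the environment where the function is called.

```python
def build_book(html_string, output_directory):
    """Build a one-chapter epub using the calls documented on this page."""
    # Imported lazily so the sketch can be defined without pypub installed.
    import pypub

    # Title, creator, and language are illustrative values.
    epub = pypub.Epub("Example Book", creator="Jane Doe", language="en")

    factory = pypub.ChapterFactory()
    chapter = factory.create_chapter_from_string(html_string, title="Chapter 1")
    epub.add_chapter(chapter)

    # epub_name is passed without the ".epub" suffix, as the parameter notes ask.
    epub.create_epub(output_directory, epub_name="example_book")
    return epub
```

Omitting epub_name would, according to the parameter description, name the file after the epub's title instead.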
- class pypub.Chapter(content, title, url=None)
Class representing an ebook chapter. By and large this shouldn’t be instantiated directly; use the class ChapterFactory to create a chapter instead.
Parameters: - content (str) – The content of the chapter. Should be formatted as xhtml.
- title (str) – The title of the chapter.
- url (Option[str]) – The url of the webpage where the chapter is from if applicable. By default this is None.
- content
str
The content of the ebook chapter.
- title
str
The title of the chapter.
- url
str
The url of the webpage where the chapter is from if applicable.
- html_title
str
Title string with special characters replaced with html-safe sequences.
- write(file_name)
Writes the chapter object to an xhtml file.
Parameters: file_name (str) – The full name of the xhtml file to save to.
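The html_title attribute is described as the title with special characters replaced by html-safe sequences. One way to picture that transformation is the standard library's html.escape. This is only an illustration: html_safe_title is a hypothetical helper, and pypub may use a different set of replacements.

```python
from html import escape


def html_safe_title(title):
    """Hypothetical helper: replace &, <, > and quotes with HTML entities."""
    return escape(title, quote=True)


# "&" becomes "&amp;", "<" becomes "&lt;", and ">" becomes "&gt;".
safe = html_safe_title("Tom & Jerry <uncut>")
```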
- class pypub.ChapterFactory(clean_function=clean)
Used to create Chapter objects. Chapter objects can be created from urls, files, and strings.
Parameters: clean_function (Option[function]) – A function used to sanitize raw html to be used in an epub. By default, this is the pypub.clean function.
- create_chapter_from_file(file_name, url=None, title=None)
Creates a Chapter object from an html or xhtml file. Sanitizes the file’s content using the clean_function function, and saves it as the content of the created chapter.
Parameters: - file_name (string) – The name of the file containing the html or xhtml content of the created Chapter
- url (Option[string]) – A url to infer the title of the chapter from
- title (Option[string]) – The title of the created Chapter. By default, this is None, in which case the title is inferred, where possible, from the webpage at the url.
Returns: A chapter object whose content is the given file and whose title is that provided or inferred from the url
Return type: Chapter
- create_chapter_from_string(html_string, url=None, title=None)
Creates a Chapter object from a string. Sanitizes the string using the clean_function function, and saves it as the content of the created chapter.
Parameters: - html_string (string) – The html or xhtml content of the created Chapter
- url (Option[string]) – A url to infer the title of the chapter from
- title (Option[string]) – The title of the created Chapter. By default, this is None, in which case the title is inferred, where possible, from the webpage at the url.
Returns: A chapter object whose content is the given string and whose title is that provided or inferred from the url
Return type: Chapter
- create_chapter_from_url(url, title=None)
Creates a Chapter object from a url. Pulls the webpage from the given url, sanitizes it using the clean_function function, and saves it as the content of the created chapter. Only the basic webpage is loaded; no javascript is executed before the content is read.
Parameters: - url (string) – The url to pull the content of the created Chapter from
- title (Option[string]) – The title of the created Chapter. By default, this is None, in which case the title is inferred, where possible, from the webpage at the url.
Returns: A chapter object whose content is the webpage at the given url and whose title is that provided or inferred from the url
Return type: Chapter
Raises: ValueError – Raised if unable to connect to url supplied
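The following sketch shows how a custom clean_function and the documented ValueError might be handled together. It assumes that ChapterFactory calls clean_function with the raw html as its single argument, which this page does not state explicitly. MINIMAL_TAGS and fetch_chapter are hypothetical names, and pypub is assumed to be installed where the function is called.

```python
from functools import partial

# Hypothetical, deliberately small whitelist: tag names map to allowed attributes.
MINIMAL_TAGS = {"html": [], "head": [], "body": [], "h1": [], "p": ["id"], "a": ["href"]}


def fetch_chapter(url, title=None):
    """Return a Chapter for url, or None if the page cannot be reached."""
    # Imported lazily so the sketch can be defined without pypub installed.
    import pypub

    # Reuse pypub.clean, but with the stricter whitelist bound in advance.
    strict_clean = partial(pypub.clean, tag_dictionary=MINIMAL_TAGS)
    factory = pypub.ChapterFactory(clean_function=strict_clean)
    try:
        return factory.create_chapter_from_url(url, title=title)
    except ValueError:
        # Documented failure mode: unable to connect to the supplied url.
        return None
```

Returning None keeps the caller's loop over many urls simple; re-raising or logging the error would be equally reasonable choices.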