PyPy Development: PyPy 7.3.2 triple release: python 2.7, 3.6, and 3.7 (9 hours, 18 minutes ago)


The PyPy team is proud to release version 7.3.2 of PyPy, which includes three different interpreters:
  • PyPy2.7, which is an interpreter supporting the syntax and features of Python 2.7, including the stdlib for CPython 2.7.13
  • PyPy3.6, which is an interpreter supporting the syntax and features of Python 3.6, including the stdlib for CPython 3.6.9
  • PyPy3.7 alpha, which is our first release of an interpreter supporting the syntax and features of Python 3.7, including the stdlib for CPython 3.7.9. We call this an alpha release since it is our first. It is based on PyPy3.6, so issues are likely to be about compatibility rather than stability. Please try it out and let us know what is broken or missing. We have not implemented some of the documented changes in the re module, and other pieces are also missing. For more information, see the PyPy 3.7 wiki page.

The interpreters are based on much the same codebase, hence the multiple release. This is a micro release: all APIs are compatible with the 7.3.0 (Dec 2019) and 7.3.1 (April 2020) releases, but read on to find out what is new.

Conda Forge now supports PyPy as a python interpreter. The support is quite complete for linux and macOS. This is the result of a lot of hard work and good will on the part of the Conda Forge team. A big shout out to them for taking this on.

Development of PyPy has transitioned to https://foss.heptapod.net/pypy/pypy. This move was covered more extensively in this blog post. We have seen an increase in the number of drive-by contributors who are able to use gitlab + mercurial to create merge requests.

The CFFI backend has been updated to version 1.14.2. We recommend using CFFI rather than c-extensions to interact with C, and using cppyy for performant wrapping of C++ code for Python.

NumPy has begun shipping wheels on PyPI for PyPy, currently for linux 64-bit only. Wheels for PyPy on Windows will be available from the next NumPy release. Thanks to NumPy for their support.

A new contributor took us up on the challenge to add Windows 64-bit support. The work is proceeding on the win64 branch; more help in coding or sponsorship is welcome.

As always, this release fixed several issues and bugs. We strongly recommend updating. Many of the fixes are the direct result of end-user bug reports, so please continue reporting issues as they crop up.

You can find links to download the v7.3.2 releases here:

We would like to thank our donors for the continued support of the PyPy project. Please help support us at Open Collective. If PyPy is not yet good enough for your needs, we are available for direct consulting work.

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython’s JIT even better. Since the previous release, we have accepted contributions from 8 new contributors, thanks for pitching in.

If you are a python library maintainer and use c-extensions, please consider making a cffi / cppyy version of your library that would be performant on PyPy. In any case, both cibuildwheel and the multibuild system support building wheels for PyPy.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7, 3.6, and 3.7. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This PyPy release supports:

  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • big- and little-endian variants of PPC64 running Linux
  • s390x running Linux
  • 64-bit ARM machines running Linux

PyPy does support ARM 32 bit processors, but does not release binaries.

What else is new?

For more information about the 7.3.2 release, see the full changelog.

Please update, and continue to help us make PyPy better.

Cheers,
The PyPy team


Codementor: Find all the prime numbers less than 'n' in O(n) Time complexity (10 hours, 5 minutes ago)

Given a number n, find all prime numbers in the segment [2, n] in linear time complexity.
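The classic way to achieve this is a linear (Euler) sieve: every composite number is crossed out exactly once, by its smallest prime factor, so the total work is O(n). A minimal Python sketch (the function name is mine, for illustration):

```python
def primes_below(n):
    """Linear (Euler) sieve: each composite is crossed out exactly
    once, by its smallest prime factor, giving O(n) total work."""
    smallest_factor = [0] * n          # smallest prime factor per index
    primes = []
    for i in range(2, n):
        if smallest_factor[i] == 0:    # never crossed out -> i is prime
            smallest_factor[i] = i
            primes.append(i)
        for p in primes:
            if p > smallest_factor[i] or i * p >= n:
                break
            smallest_factor[i * p] = p  # i*p is crossed out only here
    return primes

print(primes_below(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The inner loop stops as soon as p exceeds the smallest prime factor of i, which is the invariant that guarantees each composite is visited only once.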

Mike Driscoll: CodingNomads Tech Talk Series! (1 day ago)

Recently CodingNomads invited me on their Tech Talk series. CodingNomads does online code camps for Python and Java.

The Tech Talks are a series of videos that teach or talk about tech. In my case, I got to talk about my favorite programming language, Python!

The first talk I did was on wxPython. In this video, I show how to create a simple image viewer:

Amazingly, I was invited to do a second talk. This time, I decided it would be fun to do an intro to Jupyter Notebook.

CodingNomads is not a sponsor of Mouse vs Python. They are a neat group that kindly asked me to be a part of their series after I volunteered some of my time to mentor people for them over the summer.

The post CodingNomads Tech Talk Series! appeared first on The Mouse Vs. The Python.

PyBites: 10 Things We Picked Up From Code Reviewing (1 day ago)

We originally sent the following 10 tips to our Friends List; we got requests to post it here for reference, so here you go ...

Ever wondered what you could learn from a code review?

Here are some things we picked up from code reviews that when addressed can make your code a lot cleaner:

  1. Break long functions (methods) into multiple smaller ones - this will make your code more reusable and easier to test.

    Remember each function should do only one thing. Example: a function that parses a csv file, builds up a result list and prints the results does 3 things and should be split accordingly.

  2. Move magic numbers sprinkled in your code to constants (at the top of your module) - again easier to reuse, more readable, fewer surprises later on.

  3. Watch out for anything that you put in the global scope; localize variables (data) as much as possible - fewer unexpected side effects.

  4. Use flake8 (or black) - more consistent (PEP8 compliant code) is easier to read and earns you more respect from fellow developers (also remember: "how you do the small things determines how you do the big things" - very true with software development).

    This also goes back to developers writing code not only for machines, but also (and more importantly) for other developers. Really long lines might annoy your colleagues that use vsplit to look at multiple code files at once.

  5. Keep try/except blocks narrow (ask yourself: "Are all those lines in between really going to throw this exception?!"), avoid bare except clauses, and don't just pass or re-raise an exception without additional error handling code (e.g. at least log the error).

  6. Leverage the Python language (Pythonic code) - for example replace a try/finally with a with statement, don't overly check conditions (leaping), just try/except (ask for forgiveness). Here is a great article on this topic: Idiomatic Python: EAFP versus LBYL.

    Another example is relying on Python's concept of truthiness (e.g. just do if my_list instead of if len(my_list) > 0).

  7. Use the right data structure - if you check for membership in a big collection it's often better to use a set over a list, which would be scanned sequentially and is therefore slower.

  8. Leverage the Standard Library - you don't have to reinvent the wheel.

    For example if you have a collections.Counter object you don't need to use max on it, you can use its most_common method. Counting values manually? You can use sum that receives an iterable. The all/any builtins are wonderful. Or for more complex operations, itertools is an excellent module.

  9. Long if-elif-elif-elif-elif-else's are quite ugly and hard to maintain. You can beautifully refactor those using dictionaries (mappings) - fewer lines of code, easier to maintain.

  10. Flat is better than nested (Zen of Python, btw pipe import this to your printer now ...) - closely related to number 1., but worth emphasizing: if you have a for in a for, and the inner for has a bunch of nested ifs, it's time to rethink what you are trying to do, because this code will be very hard to test and maintain in the future.
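To make a few of these concrete, here is a small sketch of tips 6, 8 and 9 in one place (the function and mapping names are made up for illustration):

```python
import json
from collections import Counter

# Tip 6 (EAFP): try the operation and handle the failure, instead of
# checking for the file first. The `with` statement also replaces a
# manual try/finally for closing the file.
def load_config(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # a sensible default instead of a crash

# Tip 8: Counter.most_common() instead of manual max() bookkeeping.
votes = Counter(["py", "js", "py", "go", "py"])
print(votes.most_common(1))  # [('py', 3)]

# Tip 9: a dict mapping replaces a long if-elif-else chain.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def apply_op(name, a, b):
    try:
        return OPERATIONS[name](a, b)  # KeyError doubles as the "else" branch
    except KeyError:
        raise ValueError(f"unknown operation: {name}") from None

print(apply_op("mul", 6, 7))  # 42
```

The dict dispatch in particular scales better than a growing if-elif chain: adding an operation is one new entry, not another branch.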

Hope that helps! What cool tips have you learned from going through code reviews? Comment below ...


Keep calm and code in Python!

-- Bob

PyCharm: Webinar: “virtualenv – a deep dive” with Bernat Gabor (1 day, 2 hours ago)

virtualenv is a tool that builds virtual environments for Python. It was first created in September 2007 and just went through a rewrite from scratch. Did you ever want to know what parts virtual environments can be broken down into? Or how they work? And how does virtualenv differ from the Python builtin venv? This is the webinar you want.

  • Wednesday, October 7
  • 17h (CET) / 11 AM (EDT)
  • Register here
  • Intermediate audience – for those who want to get a better understanding of the virtual environments in Python

Speaking To You

Bernat Gabor has been using Python since 2011 and has been a busy participant in the open-source Python community. He is the maintainer of the virtualenv package, which allows the creation of Python virtual environments for all Python versions and interpreter types, including CPython, Jython, and PyPy. He also maintains tox and has contributed to various other Python packages.

Bernat works at Bloomberg, a technology company with more than 6,000 software engineers around the world – 2,000 of whom use Python in their daily roles. Finally, he is part of the company’s Python Guild, a group of engineers dedicated to improving the adoption, usage, and best practices of Python within the company.

Stack Abuse: Facial Detection in Python with OpenCV (1 day, 4 hours ago)

Introduction

Facial detection is a powerful and common use-case of Machine Learning. It can be used to automate manual tasks such as taking school attendance, and it has applications in law enforcement. It can also be used for biometric authorization.

In this article, we'll perform facial detection in Python, using OpenCV.

OpenCV

OpenCV is one of the most popular computer vision libraries. It is written in C and C++ and provides bindings for Python, as well as Java and MATLAB. While it's not the fastest library out there, it's easy to work with and provides a high-level interface, allowing developers to write stable code.

Let's install OpenCV so that we can use it in our Python code:

$ pip install opencv-contrib-python

Alternatively, you can install opencv-python for just the main modules of OpenCV. The opencv-contrib-python contains the main modules as well as the contrib modules which provide extended functionality.

Detecting Faces in an Image Using OpenCV

With OpenCV installed, we can import it as cv2 in our code.

To read an image in, we will use the imread() function, along with the path to the image we want to process. The imread() function loads the image from the specified file into an ndarray. If the image cannot be read, for example because of a missing file or an unsupported format, the function returns None.

We will be using an image from a Kaggle dataset:

import cv2

path_to_image = 'Parade_12.jpg'
original_image = cv2.imread(path_to_image)

The full RGB information isn't necessary for facial detection. The color holds a lot of irrelevant information, so it's more efficient to remove it and work with a grayscale image. Additionally, the Viola-Jones algorithm, which OpenCV uses under the hood, checks the difference in intensity between areas of an image. Grayscale images make this difference stand out more dramatically.

Note: In the case of color images, the decoded images will have the channels stored in BGR order, so when changing them to grayscale, we need to use the cv2.COLOR_BGR2GRAY flag:

image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)

This could have been done directly when using imread(), by setting the cv2.IMREAD_GRAYSCALE flag:

original_image = cv2.imread(path_to_image, cv2.IMREAD_GRAYSCALE)

The OpenCV library comes with several pre-trained classifiers that are trained to find different things, like faces, eyes, smiles, upper bodies, etc.

The Haar features for detecting these objects are stored as XML, and depending on how you installed OpenCV, can most often be found in Lib\site-packages\cv2\data. They can also be found in the OpenCV GitHub repository.

In order to access them from code, you can use cv2.data.haarcascades and append the name of the XML file you'd like to use.

We can choose which Haar features we want to use for our object detection, by adding the file path to the CascadeClassifier() constructor, which uses pre-trained models for object detection:

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

Now, we can use this face_cascade object to detect faces in the Image:

detected_faces = face_cascade.detectMultiScale(image=image, scaleFactor=1.3, minNeighbors=4)

When object detection models are trained, they are trained to detect faces of a certain size and might miss faces that are bigger or smaller than they expect. With this in mind, the image is resized several times in the hopes that a face will end up being a "detectable" size. The scaleFactor lets OpenCV know how much to scale the images. In our case, 1.3 means that it can scale 30% down to try and match the faces better.

As for the minNeighbors parameter, it's used to control the number of false positives and false negatives. It defines the minimum number of positive rectangles (detected facial features) that need to be adjacent to a positive rectangle for it to be considered actually positive. If minNeighbors is set to 0, the slightest hint of a face will be counted as a definitive face, even if no other facial features are detected near it.

Both the scaleFactor and minNeighbors parameters are somewhat arbitrary and set experimentally. We have chosen values that worked well for us, and gave no false positives, with the trade-off of more false negatives (undetected faces).

The detectMultiScale() method returns a list of rectangles of all the detected objects (faces in our first case). Each element in the list represents a unique face. This list contains tuples, (x, y, w, h), where the x, y values represent the top-left coordinates of the rectangle, while the w, h values represent the width and height of the rectangle, respectively.

We can use the returned list of rectangles with the cv2.rectangle() function to easily draw rectangles where a face was detected. Keep in mind that OpenCV stores image channels in BGR order, so the color provided needs to be a tuple in BGR order:

for (x, y, width, height) in detected_faces:
    cv2.rectangle(
        image,
        (x, y),
        (x + width, y + height),
        color,
        thickness=2
    )

Now, let's put that all together:

import cv2

def draw_found_faces(detected, image, color: tuple):
    for (x, y, width, height) in detected:
        cv2.rectangle(
            image,
            (x, y),
            (x + width, y + height),
            color,
            thickness=2
        )

path_to_image = 'Parade_12.jpg'
original_image = cv2.imread(path_to_image)

if original_image is not None:
    # Convert image to grayscale
    image = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY)

    # Create Cascade Classifiers
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    profile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")
    
    # Detect faces using the classifiers
    detected_faces = face_cascade.detectMultiScale(image=image, scaleFactor=1.3, minNeighbors=4)
    detected_profiles = profile_cascade.detectMultiScale(image=image, scaleFactor=1.3, minNeighbors=4)

    # Filter out profile detections that overlap detections of frontal faces
    profiles_not_faces = [x for x in detected_profiles if x not in detected_faces]

    # Draw rectangles around faces on the original, colored image
    draw_found_faces(detected_faces, original_image, (0, 255, 0)) # BGR - green
    draw_found_faces(profiles_not_faces, original_image, (0, 0, 255)) # BGR - red

    # Open a window to display the results
    cv2.imshow(f'Detected Faces in {path_to_image}', original_image)
    # The window will close as soon as any key is pressed (not a mouse click)
    cv2.waitKey(0) 
    cv2.destroyAllWindows()
else:
    print(f'An error occurred while trying to load {path_to_image}')

We used two different models on this picture. The default model for detecting front-facing faces, and a model built to better detect faces looking to the side.

Faces detected with the frontalface model are outlined in green, and faces detected with the profileface model are outlined with red. Most of the faces the first model found would have also been found by the second, so we only drew red rectangles where the profileface model detected a face but frontalface didn't:

profiles_not_faces = [x for x in detected_profiles if x not in detected_faces]

The imshow() method simply shows the passed image in a window with the provided title. With the picture we selected, this would provide the following output:

frontal and profile face detection

Using different values for scaleFactor and minNeighbors will give us different results. For example, using scaleFactor = 1.1 and minNeighbors = 4 gives us more false positives and true positives with both models:

face detection lower scale factor

We can see that the algorithm isn't perfect, but it is very efficient. This is most notable when working with real-time data, such as a video feed from a webcam.

Real-Time Face Detection Using a Webcam

Video streams are simply streams of images. With the efficiency of the Viola-Jones algorithm, we can do face detection in real-time.

The steps we need to take are very similar to the previous example with only one image - we'll be performing this on each image in the stream.

To get the video stream, we'll use the cv2.VideoCapture class. The constructor for this class takes an integer parameter representing the video stream. On most machines, the webcam can be accessed by passing 0, but on machines with several video streams, you might need to try out different values.

Next, we need to read individual images from the input stream. This is done with the read() function, which returns retval and image. The image is simply the retrieved frame. The retval return value is used to detect whether a frame has been retrieved or not, and will be False if it hasn't.

However, it tends to be inconsistent with video input streams (doesn't detect that the webcam has been disconnected, for example), so we will be ignoring this value.

Let's go ahead and modify the previous code to handle a video stream:

import cv2

def draw_found_faces(detected, image, color: tuple):
    for (x, y, width, height) in detected:
        cv2.rectangle(
            image,
            (x, y),
            (x + width, y + height),
            color,
            thickness=2
        )

# Capturing the Video Stream
video_capture = cv2.VideoCapture(0)

# Creating the cascade objects
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

while True:
    # Get individual frame
    _, frame = video_capture.read()
    # Convert the frame to grayscale
    grayscale_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect all the faces in that frame
    detected_faces = face_cascade.detectMultiScale(image=grayscale_image, scaleFactor=1.3, minNeighbors=4)
    detected_eyes = eye_cascade.detectMultiScale(image=grayscale_image, scaleFactor=1.3, minNeighbors=4)
    draw_found_faces(detected_faces, frame, (0, 0, 255))
    draw_found_faces(detected_eyes, frame, (0, 255, 0))

    # Display the updated frame as a video stream
    cv2.imshow('Webcam Face Detection', frame)

    # Press the ESC key to exit the loop
    # 27 is the code for the ESC key
    if cv2.waitKey(1) == 27:
        break

# Releasing the webcam resource
video_capture.release()

# Destroy the window that was showing the video stream
cv2.destroyAllWindows()

Conclusion

In this article, we've created a facial detection application using Python and OpenCV.

Using the OpenCV library is very straightforward for basic object detection programs. Experimentally adjusting the scaleFactor and minNeighbors parameters for the types of images you'd like to process can give pretty accurate results very efficiently.

Andrew Dalke: chemfp's chemistry toolkit I/O API (1 day, 5 hours ago)

This is part of a series of essays about working with SD files at the record and simple text level. In the last two essays I showed examples of using chemfp to process SDF records and to read two record data items. In this essay I'll introduce chemfp's chemistry toolkit I/O API, which I developed to have a consistent way to handle structure input and output when working with the OEChem, RDKit, and Open Babel toolkits.

You can follow along yourself by installing chemfp (under the Base License Agreement) using:

python -m pip install chemfp -i https://chemfp.com/packages/

chemfp is a package for high-performance cheminformatics fingerprint similarity search. You'll also need at least one of the chemistry toolkits I mentioned.

Add an SDF data item using the native APIs

Every cheminformatics toolkit deserving of that description can add properties to an SDF record. Here's how to do it in several different toolkits, using the input file chebi16594.sdf (a modified version of CHEBI:16594), which contains the following:

CHEBI: 16594

Shortened for demonstration purposes.
  9  8  0  0  0  0  0  0  0  0  2 V2000
   19.3348  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   20.4867  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   22.7903  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   23.9421  -19.3671    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   22.7903  -17.3721    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   18.1830  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.3348  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  2  1  0  0  0  0
  4  3  1  0  0  0  0
  5  4  1  0  0  0  0
  6  4  2  0  0  0  0
  7  3  1  0  0  0  0
  8  1  1  0  0  0  0
  9  1  1  0  0  0  0
M  CHG  1   5  -1
M  END
> <ChEBI ID>
CHEBI:16594

> <ChEBI Name>
2,4-diaminopentanoate

$$$$

For each toolkit I'll add an "MW" data item where the value is the molecular weight, as determined by the toolkit.

OEChem

For OEChem I create an oemolistream ("OpenEye molecule input stream") with the given filename. By default it auto-detects the format from the filename extension. The oemolistream's GetOEGraphMols() returns a molecule iterator. I'll use next() to get the first molecule, then iterate over the data items to report the existing data items:

>>> from openeye.oechem import *
>>> mol = next(oemolistream("chebi16594.sdf").GetOEGraphMols())
>>> [(data_item.GetTag(), data_item.GetValue()) for data_item in OEGetSDDataPairs(mol)]
[('ChEBI ID', 'CHEBI:16594'), ('ChEBI Name', '2,4-diaminopentanoate')]

The OECalculateMolecularWeight() function computes the molecule weight, so I'll use that to add an "MW" data item (with the weight rounded to 2 decimal digits), check that the item was added, then write the result to stdout in SD format:

>>> OECalculateMolecularWeight(mol)
131.15303999999998
>>> OEAddSDData(mol, "MW", f"{OECalculateMolecularWeight(mol):.2f}")
True
>>> [(data_item.GetTag(), data_item.GetValue()) for data_item in OEGetSDDataPairs(mol)]
[('ChEBI ID', 'CHEBI:16594'), ('ChEBI Name', '2,4-diaminopentanoate'), ('MW', '131.15')]
>>> ofs = oemolostream()
>>> ofs.SetFormat(OEFormat_SDF)
True
>>> OEWriteMolecule(ofs, mol)
CHEBI:16594
  -OEChem-09242013332D
Shortened for demonstration purposes.
  9  8  0     0  0  0  0  0  0999 V2000
   19.3348  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   20.4867  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   22.7903  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   23.9421  -19.3671    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   22.7903  -17.3721    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   18.1830  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.3348  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  4  6  2  0  0  0  0
  3  7  1  0  0  0  0
  1  8  1  0  0  0  0
  1  9  1  0  0  0  0
M  CHG  1   5  -1
M  END
> <ChEBI ID>
CHEBI:16594

> <ChEBI Name>
2,4-diaminopentanoate

> <MW>
131.15

$$$$
0

That final 0 is the interactive Python shell printing the return value of OEWriteMolecule. It is not part of what OEChem wrote to stdout.

RDKit

In RDKit you need to know which file reader to use for a given file; in this case, ForwardSDMolSupplier(). (An upcoming release will offer a generic reader function which dispatches to the appropriate file reader.) The reader is a molecule iterator, so again I'll use next() to get the first molecule, then see which data items are present:

>>> from rdkit import Chem
>>> mol = next(Chem.ForwardSDMolSupplier("chebi16594.sdf"))
>>> mol.GetPropsAsDict()
{'ChEBI ID': 'CHEBI:16594', 'ChEBI Name': '2,4-diaminopentanoate'}

I'll use Descriptors.MolWt() to compute the molecular weight and set the "MW" data item. You can see that even though I set the MW as a string, GetPropsAsDict() returns it as a float. This is because GetPropsAsDict() will try to coerce strings which look like floats or integers into native Python floats or integers (including "nan" and "-inf"). To prevent coercion, use the GetProp() method:

>>> from rdkit.Chem import Descriptors
>>> Descriptors.MolWt(mol)
131.155
>>> mol.SetProp("MW", f"{Descriptors.MolWt(mol):.2f}")
>>> mol.GetPropsAsDict()
{'ChEBI ID': 'CHEBI:16594', 'ChEBI Name': '2,4-diaminopentanoate', 'MW': 131.16}
>>> [(name, mol.GetProp(name)) for name in mol.GetPropNames()]
[('ChEBI ID', 'CHEBI:16594'), ('ChEBI Name', '2,4-diaminopentanoate'), ('MW', '131.16')]

Finally, I'll write the molecule to stdout.

>>> import sys
>>> writer = Chem.SDWriter(sys.stdout)
>>> writer.write(mol)
>>> writer.close()
CHEBI:16594
     RDKit          2D

  9  8  0  0  0  0  0  0  0  0999 V2000
   19.3348  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   20.4867  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -19.3671    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   22.7903  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   23.9421  -19.3671    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   22.7903  -17.3721    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   18.1830  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.3348  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0
  3  2  1  0
  4  3  1  0
  5  4  1  0
  6  4  2  0
  7  3  1  0
  8  1  1  0
  9  1  1  0
M  CHG  1   5  -1
M  END
>  <ChEBI ID>  (1)
CHEBI:16594

>  <ChEBI Name>  (1)
2,4-diaminopentanoate

>  <MW>  (1)
131.16

$$$$

Open Babel

Open Babel, because of the pybel interface, is the easiest of the bunch. The following uses Open Babel 3.0, which moved pybel to a submodule of openbabel. I ask readfile() to open the given file as "sdf" format. That returns an iterator, and I get the first molecule. It has a special "data" attribute with the SD data items combined with some internal Open Babel data items (RDKit does the same thing, but by default they are hidden):

>>> from openbabel import pybel
>>> mol = next(pybel.readfile("sdf", "chebi16594.sdf"))
>>> mol.data
{'MOL Chiral Flag': '0', 'ChEBI ID': 'CHEBI:16594', 'ChEBI Name': '2,4-diaminopentanoate',
'OpenBabel Symmetry Classes': '8 5 7 9 1 6 3 2 4'}

Pybel molecules have a molwt attribute containing the molecular weight, or I can compute it via the underlying Open Babel OBMol object. I save it to the data attribute, export the contents as a string in "sdf" format, and write the output to stdout, asking print() not to add its own trailing newline:

>>> mol.molwt
131.15304 
>>> mol.OBMol.GetMolWt()
131.15304 
>>> mol.data["MW"] = f"{mol.molwt:.2f}"
>>> mol.data
{'MOL Chiral Flag': '0', 'ChEBI ID': 'CHEBI:16594', 'ChEBI Name': '2,4-diaminopentanoate',
'OpenBabel Symmetry Classes': '8 5 7 9 1 6 3 2 4', 'MW': '131.15'}
>>> print(mol.write("sdf"), end="")
CHEBI:16594
 OpenBabel09242014012D
Shortened for demonstration purposes.
  9  8  0  0  0  0  0  0  0  0999 V2000
   19.3348  -19.3671    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   20.4867  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -19.3671    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   22.7903  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   23.9421  -19.3671    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   22.7903  -17.3721    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6385  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   18.1830  -18.7021    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.3348  -20.6971    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  2  1  0  0  0  0
  4  3  1  0  0  0  0
  5  4  1  0  0  0  0
  6  4  2  0  0  0  0
  7  3  1  0  0  0  0
  8  1  1  0  0  0  0
  9  1  1  0  0  0  0
M  CHG  1   5  -1
M  END
>  <ChEBI ID>
CHEBI:16594

>  <ChEBI Name>
2,4-diaminopentanoate

>  <MW>
131.15

$$$$

chemfp's chemistry toolkit API

Chemfp supports Open Babel, OEChem+OEGraphSim, and RDKit. Each toolkit has its own way of handling chemical structure I/O. Following the fundamental theorem of software engineering, I "solved" the problem by introducing an extra level of indirection - I created a chemistry toolkit I/O API and developed wrapper implementations for each of the underlying chemistry toolkits.

Here's a side-by-side comparison of the toolkit-native and chemfp wrapper APIs:

OEChem (native):

from openeye.oechem import *

mol = next(oemolistream("chebi16594.sdf").GetOEGraphMols())
mw = OECalculateMolecularWeight(mol)
OEAddSDData(mol, "MW", f"{mw:.2f}")
ofs = oemolostream()
ofs.SetFormat(OEFormat_SDF)
OEWriteMolecule(ofs, mol)

OEChem (chemfp):

from chemfp import openeye_toolkit as OETK
from openeye.oechem import OECalculateMolecularWeight

mol = next(OETK.read_molecules("chebi16594.sdf"))
mw = OECalculateMolecularWeight(mol)
OETK.add_tag(mol, "MW", f"{mw:.2f}")
print(OETK.create_string(mol, "sdf"), end="")

RDKit (native):

import sys
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = next(Chem.ForwardSDMolSupplier("chebi16594.sdf"))
mw = Descriptors.MolWt(mol)
mol.SetProp("MW", f"{mw:.2f}")

writer = Chem.SDWriter(sys.stdout)
writer.write(mol)
writer.close()

RDKit (chemfp):

from chemfp import rdkit_toolkit as RDTK
from rdkit.Chem import Descriptors

mol = next(RDTK.read_molecules("chebi16594.sdf"))
mw = Descriptors.MolWt(mol)
RDTK.add_tag(mol, "MW", f"{mw:.2f}")
print(RDTK.create_string(mol, "sdf"), end="")

Open Babel (native, pybel):

from openbabel import pybel

mol = next(pybel.readfile("sdf", "chebi16594.sdf"))
mol.data["MW"] = f"{mol.molwt:.2f}"
print(mol.write("sdf"), end="")

Open Babel (chemfp):

from chemfp import openbabel_toolkit as OBTK

mol = next(OBTK.read_molecules("chebi16594.sdf"))
OBTK.add_tag(mol, "MW", f"{mol.GetMolWt():.2f}")
print(OBTK.create_string(mol, "sdf"), end="")

The point is not that chemfp's toolkit API is all that much shorter than the underlying toolkit API, but rather that it's consistent across the three toolkits. This becomes more useful when you start working with more than one toolkit and have to remember the nuances of each one.

Format and format option discovery

One of the important features I wanted in chemfp was full support for all of the formats supported by the underlying toolkits, along with all of the options each toolkit provides for those formats. And I wanted that information to be discoverable. For example, the following shows the formats available through chemfp for each toolkit:

>>> from chemfp import rdkit_toolkit
>>> print(", ".join(fmt.name for fmt in rdkit_toolkit.get_formats()))
smi, can, usm, sdf, smistring, canstring, usmstring, molfile,
rdbinmol, fasta, sequence, helm, mol2, pdb, xyz, mae, inchi, inchikey,
inchistring, inchikeystring
>>> from chemfp import openeye_toolkit
>>> print(", ".join(fmt.name for fmt in openeye_toolkit.get_formats()))
smi, usm, can, sdf, molfile, skc, mol2, mol2h, sln, mmod, pdb, xyz,
cdx, mopac, mf, oeb, inchi, inchikey, oez, cif, mmcif, fasta,
sequence, csv, json, smistring, canstring, usmstring, slnstring,
inchistring, inchikeystring
>>> from chemfp import openbabel_toolkit
>>> print(", ".join(fmt.name for fmt in openbabel_toolkit.get_formats()))
smi, can, usm, smistring, canstring, usmstring, sdf, inchi, inchikey,
inchistring, inchikeystring, fa, abinit, dalmol, pdbqt, mmcif, xsf,
    ... many lines removed ...
acesout, POSCAR, pcjson, gzmat, mae, pointcloud, gamess, mopcrt,
confabreport

For each format type, there are properties that say if it is an input format or an output format (InChIKey, for example, is only an output format), and whether the format can handle file I/O or only string-based I/O. (The "smistring" format can only parse a SMILES string, while the "smi" format can parse a SMILES file, specified by filename or by the contents in a string.)
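Those capability flags can be sketched in a few lines. This is a hypothetical miniature; the attribute names illustrate the idea and may not match chemfp's exact spelling:

```python
# Hypothetical miniature of format-capability discovery. The attribute
# names mirror the idea, not necessarily chemfp's exact API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Format:
    name: str
    is_input_format: bool   # can records in this format be parsed?
    is_output_format: bool  # can records in this format be written?
    supports_io: bool       # file I/O, or string-only ("smistring")?

FORMATS = [
    Format("smi", True, True, True),
    Format("smistring", True, True, False),   # string-only
    Format("inchikey", False, True, True),    # output-only
]

# Find the output-only formats.
output_only = [f.name for f in FORMATS
               if f.is_output_format and not f.is_input_format]
print(output_only)  # ['inchikey']
```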

There are also ways to figure out the default values for the readers and writers:

>>> from chemfp import rdkit_toolkit
>>> fmt = rdkit_toolkit.get_format("sdf")
>>> fmt.get_default_reader_args()
{'sanitize': True, 'removeHs': True, 'strictParsing': True, 'includeTags': True}
>>> fmt.get_default_writer_args()
{'includeStereo': False, 'kekulize': True, 'v3k': False}
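Internally, this kind of API typically merges the user-supplied arguments over the format's defaults and rejects unknown names. A sketch of that pattern (illustrative only, not chemfp's actual code):

```python
# Sketch: combine user-supplied reader_args with a format's defaults,
# rejecting unknown argument names. Not chemfp's implementation.
DEFAULT_READER_ARGS = {
    "sanitize": True,
    "removeHs": True,
    "strictParsing": True,
    "includeTags": True,
}

def resolve_reader_args(reader_args=None):
    args = dict(DEFAULT_READER_ARGS)  # start from the defaults
    for name, value in (reader_args or {}).items():
        if name not in args:
            raise ValueError(f"unknown reader argument {name!r}")
        args[name] = value  # user value overrides the default
    return args

print(resolve_reader_args({"sanitize": False})["sanitize"])  # False
```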

Reader and writer args

Those reader_args and writer_args can be passed to the input and output methods. For example, the RDKit writer's v3k writer_arg, if True, asks RDKit to always generate a V3000 record, even if the molecule can be expressed as a V2000 record:

>>> from chemfp import rdkit_toolkit
>>> mol = rdkit_toolkit.parse_molecule("C#N", "smistring")
>>> print(rdkit_toolkit.create_string(mol, "sdf"))

     RDKit

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  3  0
M  END
$$$$

>>> print(rdkit_toolkit.create_string(mol, "sdf", writer_args={"v3k": True}))

     RDKit

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 2 1 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0 0 0 0
M  V30 2 N 0 0 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 3 1 2
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$

Here's an example where I enable OEChem's "strict" SMILES parser so that multiple sequential bond symbols are not accepted:

>>> from chemfp import openeye_toolkit
>>> mol = openeye_toolkit.parse_molecule("C=#-C", "smistring")
>>> openeye_toolkit.create_string(mol, "smistring")
'CC'
>>> mol = openeye_toolkit.parse_molecule("C=#-C", "smistring", reader_args={"flavor": "Default|Strict"})
Warning: Problem parsing SMILES:
Warning: Bond without end atom.
Warning: C=#-C
Warning:   ^

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
       .... many lines omitted ... 
  File "<string>", line 1, in raise_tb  
chemfp.ParseError: OEChem cannot parse the smistring record: 'C=#-C'

The OpenEye flavor reader and writer args accept the raw OEChem integer flags, as well as a string-based syntax that expresses them symbolically. In this case, Default|Strict says to start with the default flags for this format and then add the Strict option.
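That symbolic syntax amounts to splitting on "|", mapping each name to its bit flag, and OR-ing the flags together. A sketch of the idea, with made-up names and values (OEChem defines its own constants):

```python
# Illustrative only: map symbolic flavor names to bit flags and combine
# them with bitwise OR, the way a string like "Default|Strict" is read.
FLAVOR_FLAGS = {
    "Default": 0x01,   # made-up values, not OEChem's real constants
    "Strict": 0x02,
    "Canon": 0x04,
}

def parse_flavor(text: str) -> int:
    value = 0
    for name in text.split("|"):
        name = name.strip()
        if name not in FLAVOR_FLAGS:
            raise ValueError(
                f"unsupported flavor {name!r}; available: "
                + ", ".join(sorted(FLAVOR_FLAGS)))
        value |= FLAVOR_FLAGS[name]
    return value

print(parse_flavor("Default|Strict"))  # 3 with these made-up values
```

Passing an unknown name raises an error listing the available flavors, which is also where the interactive "pass an invalid flavor" trick below comes from.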

OEChem flavor help

The format API doesn't have a way to get detailed help about each option. In most cases it's not hard to guess the meaning from the name and Python data type. That doesn't work for OEChem's flavor options, though. The quickest way to get interactive help is to pass an invalid flavor and read the error message:

>>> from chemfp import openeye_toolkit
>>> openeye_toolkit.parse_molecule(mol, "sdf", reader_args={"flavor": "x"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
       .... many lines omitted ... 
    raise err
ValueError: OEChem sdf format does not support the 'x' flavor option.
Available flavors are: FixBondMarks, SuppressEmptyMolSkip, SuppressImp2ExpENHSTE

Why did I develop chemfp's toolkit API?

I developed the API starting with chemfp 2.0 because there was a clear need to allow users to configure input processing in a way that respected what the underlying toolkits could do.

As a somewhat extreme example, the most recent version of RDKit supports the FASTA format, with a flavor option that selects whether the input is interpreted as protein (0 or 1), RNA (2-5), or DNA (6-9), with different values for L- or L+D-amino acids and different options for 3' and 5' caps on the nucleotides. Thus, AA could be dialanine or a nucleotide sequence with two adenines. The following example computes the MACCS fingerprint for both cases, using the -R parameter to specify a reader argument:

% printf ">dialanine\nAA\n" | rdkit2fps --maccs -R flavor=0 --in fasta | tail -1
00000000000020000040084800201004842452fa09	dialanine
% printf ">diadenine capped RNA\nAA\n" | rdkit2fps --maccs -R flavor=5 --in fasta | tail -1
000000102084002191d41ccf33b3907bde6feb7d1f	diadenine capped RNA

There's less need to handle writer options since chemfp doesn't really need to write structure files. The closest is if the fingerprints or the results of a similarity search are added to an SDF output, which will be the topic of tomorrow's essay.

But really, that part of chemfp is probably more of a vanity project than anything else. I have some strong opinions on what a good API should be, and had the chance to implement it, show it handles the needs of multiple chemistry toolkits, and document it. Just like I drew some inspiration from pybel, perhaps others will draw some inspiration from the chemfp API.

I personally find it really satisfying to be able to develop, say, a HELM to SLN conversion tool which uses RDKit to convert the HELM string into an SDF record, then OEChem to convert the SDF record to SLN.

>>> from chemfp import rdkit_toolkit, openeye_toolkit
>>>
>>> def helm_to_sln(helm_str):
...   rdmol = rdkit_toolkit.parse_molecule(helm_str, "helm") 
...   sdf_record = rdkit_toolkit.create_string(rdmol, "sdf")
...   oemol = openeye_toolkit.parse_molecule(sdf_record, "sdf")
...   return openeye_toolkit.create_string(oemol, "slnstring")
...
>>> helm_to_sln("PEPTIDE1{[dA].[dN].[dD].[dR].[dE].[dW]}$$$$")
'NH2CH(C(=O)NHCH(C(=O)NHCH(C(=O)NHCH(C(=O)NHCH(C(=O)NHCH(C(=O)
OH)CH2C[1]=CHNHC[2]:C(@1):CH:CH:CH:CH:@2)CH2CH2C(=O)OH)CH2CH2C
H2NHC(=NH)NH2)CH2C(=O)OH)CH2C(=O)NH2)CH3'

This specific function may not be useful, but the ability to specify this sort of work in only a few lines makes it easier to try out new ideas.

Abhijeet Pal: Sending Emails With CSV Attachment Using Python(1 day, 9 hours ago)

In this tutorial, we will learn how to send emails with CSV attachments using Python. Pre-requirements: I am assuming you already have an SMTP server set up; if not, you can use the Gmail SMTP server, Mailgun, or anything similar to ... Read more

The post Sending Emails With CSV Attachment Using Python appeared first on Django Central.
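For reference, the core of the approach the post describes can be sketched with the standard library alone; the addresses, filename, and server details below are illustrative placeholders:

```python
# Build an email with a CSV attachment using only the standard library.
# Actually sending it would use smtplib with your own SMTP credentials.
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Monthly report"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("The report is attached as report.csv.")

csv_data = "name,score\nalice,10\nbob,7\n"
msg.add_attachment(csv_data.encode("utf-8"),
                   maintype="text", subtype="csv",
                   filename="report.csv")

# To actually send it (sketch, with placeholder server details):
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as server:
#     server.starttls()
#     server.login("user", "password")
#     server.send_message(msg)
```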

Python Insider: Python 3.8.6 is now available(1 day, 10 hours ago)

Python 3.8.6 is the sixth maintenance release of Python 3.8. Go get it here:

https://www.python.org/downloads/release/python-386/

 

Maintenance releases for the 3.8 series will continue at regular bi-monthly intervals, with 3.8.7 planned for mid-November 2020.

What’s new?

The Python 3.8 series is the newest feature release of the Python language, and it contains many new features and optimizations. See the “What’s New in Python 3.8” document for more information about features included in the 3.8 series.

Python 3.8 is becoming more stable. Our bugfix releases are becoming smaller as we progress. This one contains 122 changes, fewer than two thirds of the previous average for a new release. Detailed information about all changes made in version 3.8.6 specifically can be found in its change log. Note that compared to 3.8.5 this release also contains all changes present in 3.8.6rc1.

We hope you enjoy Python 3.8!

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.

Your friendly release team,
Ned Deily @nad
Steve Dower @steve.dower
Łukasz Langa @ambv