Re: [Xen-devel] Outreachy project - Xen Code Review Dashboard
Hi Jesus,

While using the Elasticsearch Python library (https://elasticsearch-py.readthedocs.io/en/master/) to add mbox messages to an index, I get a UnicodeEncodeError: "'utf-8' codec can't encode character '\udca0' in position 767: surrogates not allowed".

Looking at GrimoireELK (https://github.com/grimoirelab/GrimoireELK/blob/96b00bc682485976104a6825ca63ae08639deacc/grimoire_elk/elk/mbox.py#L200), it seems that tool perhaps uses Latin-1 encoding instead, but when I tried that I got a serialization error (their custom error message: "Unable to serialize %r (type: %s)"). I suppose this is because the value is now bytes rather than a string; of course, converting back to a string after encoding just cycles back to the first error. As somewhat of a Python newbie I don't really know how to tackle this! My thought at the moment is to splice the offending character out of the message.

And to clarify, my understanding is that the final result of this task is an index of Xen data with two types: commits and messages. Each commit document should contain its original information from git, plus the name of the branch it was developed in. Should only the mbox messages which appear to be associated with a specific commit exist in the final index? Is there some key information in the messages that is supposed to indicate the association of a given commit with a git branch? I would be grateful if you could specify the end goal a little more. :D

Thanks so much!
Heather

On Sat, Apr 8, 2017 at 10:02 AM, Jesus M. Gonzalez-Barahona <jgb@xxxxxxxxxxxx> wrote:
On Fri, 2017-04-07 at 15:49 -0700, Heather Booker wrote:

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
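
A note on the UnicodeEncodeError described above: a character such as '\udca0' is a lone surrogate, which is what Python's 'surrogateescape' error handler produces when it meets a byte (here 0xA0) that it could not decode. One possible way around it, sketched below, is to clean every string before handing the document to the Elasticsearch client. The function names are hypothetical, and decoding the escaped bytes as Latin-1 is an assumption about the original charset of the message, not something taken from GrimoireELK.

```python
def scrub_surrogates(text):
    """Return a copy of text that can be encoded as UTF-8."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Escaped raw byte 0x80..0xFF: recover the byte and decode it
            # as Latin-1 (assumption about the original charset).
            out.append(bytes([code - 0xDC00]).decode("latin-1"))
        elif 0xD800 <= code <= 0xDFFF:
            # Any other lone surrogate: use the Unicode replacement character.
            out.append("\ufffd")
        else:
            out.append(ch)
    return "".join(out)


def scrub(value):
    """Apply scrub_surrogates to every string in a nested dict/list."""
    if isinstance(value, str):
        return scrub_surrogates(value)
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub(v) for v in value]
    return value


# Hypothetical message document containing an escaped 0xA0 byte.
doc = {"Subject": "Re: patch", "body": "price:\udca0100"}
clean = scrub(doc)
clean["body"].encode("utf-8")  # encodes without UnicodeEncodeError
```

Because the result is still a str (not bytes), it should avoid the serialization error seen with the Latin-1 attempt while keeping the character instead of splicing it out.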