Extracting text from PDFs that use CID fonts

2023-12-20

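I'm writing a web application that extracts a line from the top of each page in a PDF. The PDFs come from different versions of a product and may have gone through several PDF printers, which in turn come in different versions with different settings.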

So far I have it working for all versions of the PDFs using PDFSharp and iTextSharp. My problem is documents that use CID fonts (Identity-H).
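With the Identity-H encoding every character code in a hex string is two bytes long, and the CID is numerically equal to the code, so `<00410042>` holds the codes 0x0041 and 0x0042. Those values select glyphs; they are not Unicode, which is why the font's /ToUnicode CMap is still needed to get readable text. A minimal sketch of the splitting step, assuming the hex string has already been cut out of the content stream (`IdentityHCodes` and `Parse` are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class IdentityHCodes
{
    // Splits a PDF hex string such as "<00410042>" into 2-byte character codes.
    // Whitespace inside the string is ignored; as in the PDF spec, an odd final
    // digit is treated as if it were followed by 0.
    public static List<int> Parse(string hexString)
    {
        string body = hexString.Trim();
        if (body.StartsWith("<")) body = body.Substring(1);
        if (body.EndsWith(">")) body = body.Substring(0, body.Length - 1);

        var digits = new StringBuilder();
        foreach (char ch in body)
        {
            if (!char.IsWhiteSpace(ch)) digits.Append(ch);
        }
        if (digits.Length % 2 != 0) digits.Append('0');

        // Identity-H: each code is exactly 2 bytes (4 hex digits);
        // a trailing single byte cannot form a code and is ignored
        var codes = new List<int>();
        for (int i = 0; i + 4 <= digits.Length; i += 4)
        {
            codes.Add(Convert.ToInt32(digits.ToString(i, 4), 16));
        }
        return codes;
    }
}
```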

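I've written a partial parser that finds the font table references and the text blocks, but converting them into readable text is giving me trouble.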

Does anyone have:
- a parser (like this one: https://stackoverflow.com/a/1732265/5169050) that handles CID fonts; or
- some sample code showing how to parse the page resource dictionary to find the page's fonts and get their ToUnicode streams, to help complete this example (https://stackoverflow.com/a/4048328/5169050)?

We have to use iTextSharp 4.1 to keep the free-use license.

Here is my partial parser.

public string ExtractTextFromCIDPDFBytes(byte[] input)
{
    if (input == null || input.Length == 0) return "";

    try
    {
        // Holds the final result to be returned
        string resultString = "";
        // Are we in a block of text or not
        bool blnInText = false;
        // Holds each line of text before written to resultString
        string phrase = "";
        // Holds the 4-character hex codes as they are built
        string hexCode = "";
        // Are we in a font reference or not (much like a code block)
        bool blnInFontRef = false;
        // Holds the last font reference and therefore the CMAP table
        // to be used for any text found after it
        string currentFontRef = "";

        for (int i = 0; i < input.Length; i++)
        {
            char c = (char)input[i];

            switch (c)
            {
                case '<':
                    {
                        blnInText = true;
                        break;
                    }
                case '>':
                    {
                        resultString = resultString + Environment.NewLine + phrase;
                        phrase = "";
                        blnInText = false;
                        break;
                    }
                case 'T':
                    {
                        // Guard against reading past the end of the buffer
                        if (i + 1 >= input.Length) break;
                        switch (((char)input[i + 1]).ToString().ToLower())
                        {
                            case "f":
                                {
                                    // Tf represents the start of a font table reference
                                    blnInFontRef = true;
                                    currentFontRef = "";
                                    break;
                                }
                            case "d":
                                {
                                    // Td represents the end of a font table reference or
                                    // the start of a text block
                                    blnInFontRef = false;
                                    break;
                                }
                        }
                        break;
                    }
                default:
                    {
                        if (blnInText)
                        {
                            // We are looking for 4-character blocks of hex characters
                            // These will build up a number which refers to the index
                            // of the glyph in the CMAP table, which will give us the
                            // character
                            hexCode = hexCode + c;
                            if (hexCode.Length == 4)
                            {
                                // TODO - translate code to character
                                char translatedHexCode = c;
                                phrase = phrase + translatedHexCode;
                                // Blank it out ready for the next 4
                                hexCode = "";
                            }
                        }
                        else
                        {
                            if (blnInFontRef)
                            {
                                currentFontRef = currentFontRef + c;
                            }
                        }
                        break;
                    }
            }
        }

        return resultString;
    }
    catch
    {
        return "";
    }
}
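In a content stream the operands come before the operator, so a font selection is written `/F1 12 Tf`: the resource name appears before `Tf`, not after it, while the partial parser above starts collecting the font name only once it sees `Tf`. A minimal sketch of a scanner that remembers the most recent operands instead, assuming an uncompressed content stream and handling only `Tf` plus hex strings shown with `Tj` or inside `TJ` arrays (literal `(...)` strings are skipped, inline images are not handled, and `TextRun`/`ContentScanner` are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// A hex string shown with Tj/TJ, plus the font resource name active at the time
sealed class TextRun
{
    public readonly string FontName;
    public readonly string Hex;
    public TextRun(string fontName, string hex) { FontName = fontName; Hex = hex; }
}

static class ContentScanner
{
    public static List<TextRun> Scan(string content)
    {
        var runs = new List<TextRun>();
        var operands = new List<string>();   // names and numbers seen since the last operator
        var pendingHex = new List<string>(); // hex strings waiting for their Tj/TJ operator
        string currentFont = null;
        int i = 0;

        while (i < content.Length)
        {
            char c = content[i];
            if (char.IsWhiteSpace(c) || c == '[' || c == ']' || c == '>') { i++; continue; }
            if (c == '<' && i + 1 < content.Length && content[i + 1] == '<') { i += 2; continue; } // "<<" dictionary start
            if (c == '<')
            {
                int close = content.IndexOf('>', i);
                if (close < 0) break;
                pendingHex.Add(content.Substring(i + 1, close - i - 1));
                i = close + 1;
                continue;
            }
            if (c == '(')
            {
                // Literal strings are skipped in this sketch; honour nesting and backslash escapes
                int depth = 0;
                do
                {
                    if (content[i] == '\\') i++;
                    else if (content[i] == '(') depth++;
                    else if (content[i] == ')') depth--;
                    i++;
                } while (i < content.Length && depth > 0);
                continue;
            }

            // Regular token: a /Name, a number or an operator
            int start = i++;
            while (i < content.Length && !char.IsWhiteSpace(content[i]) && "[]<>/()".IndexOf(content[i]) < 0) i++;
            string token = content.Substring(start, i - start);

            double number;
            if (token[0] == '/' || double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out number))
            {
                operands.Add(token); // operands are collected until their operator arrives
                continue;
            }

            if (token == "Tf" && operands.Count >= 2)
                currentFont = operands[operands.Count - 2]; // "/F1 12 Tf": the name is two operands back
            else if (token == "Tj" || token == "TJ")
                foreach (string hex in pendingHex) runs.Add(new TextRun(currentFont, hex));

            operands.Clear();
            pendingHex.Clear();
        }
        return runs;
    }
}
```

Each `TextRun.FontName` is the key to look up under the page's /Resources /Font dictionary, and each `Hex` can then be split into 2-byte codes.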

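It took me a while, but I finally have some code that reads plain text from Identity-H encoded PDFs. I'm posting it here to help others, and I know there will be ways to improve it. For example, I haven't touched character mappings (beginbfchar) yet, and my ranges aren't really ranges. I've already spent more than a week on this, and I can't justify spending more time unless we find files that work differently. Sorry.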
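Both of those gaps live in the ToUnicode CMap. A `beginbfchar` section maps single codes (`<0003> <0020>` maps code 3 to a space), while a `beginbfrange` section maps a run of consecutive codes, either by incrementing the destination (`<0024> <0026> <0041>` gives A, B, C) or by listing one destination per code in an array. A minimal sketch that fills a code-to-text dictionary from both sections, assuming the CMap stream has already been decoded to a string; `ToUnicodeCMap` is a hypothetical name, codespace ranges are not checked, and the increment form only bumps the last UTF-16 unit, which covers the common single-character case:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class ToUnicodeCMap
{
    // A <hex> token, or a [ ... ] array of <hex> tokens (bfrange array form)
    static readonly Regex RangeToken = new Regex(@"<([0-9A-Fa-f\s]*)>|\[([^\]]*)\]");
    static readonly Regex HexToken = new Regex(@"<([0-9A-Fa-f\s]*)>");

    public static Dictionary<int, string> Parse(string cmap)
    {
        var map = new Dictionary<int, string>();

        // bfchar: pairs of <source> <destination>
        foreach (string section in Sections(cmap, "beginbfchar", "endbfchar"))
        {
            List<string> tokens = HexTokens(section);
            for (int i = 0; i + 1 < tokens.Count; i += 2)
                map[Convert.ToInt32(tokens[i], 16)] = Utf16(tokens[i + 1]);
        }

        // bfrange: <low> <high> <destination>  or  <low> <high> [<d1> <d2> ...]
        foreach (string section in Sections(cmap, "beginbfrange", "endbfrange"))
        {
            MatchCollection m = RangeToken.Matches(section);
            for (int i = 0; i + 2 < m.Count; i += 3)
            {
                int low = Convert.ToInt32(Clean(m[i].Groups[1].Value), 16);
                int high = Convert.ToInt32(Clean(m[i + 1].Groups[1].Value), 16);
                Match dst = m[i + 2];
                if (dst.Groups[2].Success)
                {
                    // Array form: one destination per code
                    List<string> items = HexTokens(dst.Groups[2].Value);
                    for (int k = 0; k < items.Count && low + k <= high; k++)
                        map[low + k] = Utf16(items[k]);
                }
                else
                {
                    // Increment form: add (code - low) to the last UTF-16 unit
                    string first = Utf16(Clean(dst.Groups[1].Value));
                    if (first.Length == 0) continue;
                    string head = first.Substring(0, first.Length - 1);
                    for (int code = low; code <= high; code++)
                        map[code] = head + (char)(first[first.Length - 1] + (code - low));
                }
            }
        }
        return map;
    }

    // Yields the text between each begin/end keyword pair
    static IEnumerable<string> Sections(string text, string begin, string end)
    {
        int pos = 0;
        while ((pos = text.IndexOf(begin, pos, StringComparison.Ordinal)) >= 0)
        {
            int start = pos + begin.Length;
            int stop = text.IndexOf(end, start, StringComparison.Ordinal);
            if (stop < 0) yield break;
            yield return text.Substring(start, stop - start);
            pos = stop + end.Length;
        }
    }

    static List<string> HexTokens(string text)
    {
        var list = new List<string>();
        foreach (Match match in HexToken.Matches(text))
            list.Add(Clean(match.Groups[1].Value));
        return list;
    }

    static string Clean(string hex) { return Regex.Replace(hex, @"\s", ""); }

    // Destinations are UTF-16BE: "0041" -> "A", "00660066" -> "ff"
    static string Utf16(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

The resulting dictionary can be queried directly with each 2-byte Identity-H code, which avoids the string replacement over `<xxxx>` groups.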

Usage:

PdfDocument inputDocument = PDFHelpers.Open(physcialFilePath, PdfDocumentOpenMode.Import);
foreach (PdfPage page in inputDocument.Pages)
{
    for (Int32 index = 0; index < page.Contents.Elements.Count; index++)
    {
        PdfDictionary.PdfStream stream = page.Contents.Elements.GetDictionary(index).Stream;
        String outputText = new PDFParser().ExtractTextFromPDFBytes(stream.Value).Replace(" ", String.Empty);

        // When the text is hex-encoded, ExtractTextFromPDFBytes finds no (...) strings and
        // returns only line breaks ("\n\r" or "\n"), so treat whitespace-only output as "no text"
        if (String.IsNullOrWhiteSpace(outputText))
        {
            // Identity-H encoded file
            string[] hierarchy = new string[] { "/Resources", "/Font", "/F*" };
            List<PdfItem> fonts = PDFHelpers.FindObjects(hierarchy, page, true);
            outputText = PDFHelpers.FromUnicode(stream, fonts);
        }

        // outputText now holds the text of this content stream
    }
}

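And here is the actual helper class. I'm posting it in full because everything in it is used by the example, and because I found very few complete examples myself while trying to solve this. The helper uses PDFSharp and iTextSharp to open PDFs from both before and after version 1.5, uses ExtractTextFromPDFBytes to read standard PDFs, and uses my FindObjects (which searches the document tree and returns objects) and FromUnicode (which takes the encoded text and a collection of fonts to translate it with).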
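FindObjects accepts wildcard names such as /F*. One way to express that match on its own is to split the pattern at the `*` into a prefix and a suffix and require the key to start with one and end with the other; the prefix has to be everything before the `*` (for /F* that is /F), otherwise the pattern degenerates into "any name starting with /". Font resource names are also not guaranteed to follow /F1, /F2, ...; they can be arbitrary, such as /TT0. A minimal sketch with a hypothetical helper name, supporting a single `*` only:

```csharp
using System;

static class NamePattern
{
    // True when name equals pattern, or when pattern contains one '*' and name
    // starts with the text before it and ends with the text after it.
    public static bool Matches(string name, string pattern)
    {
        int star = pattern.IndexOf('*');
        if (star < 0) return name == pattern;

        string prefix = pattern.Substring(0, star);
        string suffix = pattern.Substring(star + 1);
        // The length check stops prefix and suffix from overlapping
        return name.Length >= prefix.Length + suffix.Length
            && name.StartsWith(prefix, StringComparison.Ordinal)
            && name.EndsWith(suffix, StringComparison.Ordinal);
    }
}
```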

using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using System;
using System.Collections.Generic;
using System.IO;

namespace PdfSharp.Pdf.IO
{
    /// <summary>
    /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
    /// </summary>
    static public class PDFHelpers
    {
        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(string PdfPath, PdfDocumentOpenMode openmode)
        {
            return Open(PdfPath, null, openmode);
        }
        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(string PdfPath, string password, PdfDocumentOpenMode openmode)
        {
            using (FileStream fileStream = new FileStream(PdfPath, FileMode.Open, FileAccess.Read))
            {
                int len = (int)fileStream.Length;
                // TODO: Setting this byteArray causes the out of memory exception which is why we
                // have the 70mb limit. Solve this and we can increase the file size limit
                System.Diagnostics.Process proc = System.Diagnostics.Process.GetCurrentProcess();
                long availableMemory = proc.PrivateMemorySize64 / 1024 / 1024; //Mb of RAM allocated to this process that cannot be shared with other processes
                if (availableMemory < (fileStream.Length / 1024 / 1024))
                {
                    throw new Exception("The available memory " + availableMemory + "Mb is not enough to open, split and save a file of " + fileStream.Length / 1024 / 1024);
                }

                try
                {
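                    Byte[] fileArray = new Byte[len];
                    // Stream.Read can return fewer bytes than requested, so read until the buffer is full
                    int offset = 0;
                    while (offset < len)
                    {
                        int bytesRead = fileStream.Read(fileArray, offset, len - offset);
                        if (bytesRead == 0) break; // unexpected end of file
                        offset += bytesRead;
                    }
                    fileStream.Close();
                    fileStream.Dispose();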


                    PdfDocument result = Open(fileArray, openmode);
                    if (result.FullPath == "")
                    {
                        // The file was converted to a v1.4 document and only exists as a document in memory
                        // Save over the original file so other references to the file get the compatible version
                        // TODO: It would be good if we could do this conversion without opening every document another 2 times
                        PdfDocument tempResult = Open(fileArray, PdfDocumentOpenMode.Modify);

                        iTextSharp.text.pdf.BaseFont bfR = iTextSharp.text.pdf.BaseFont.CreateFont(Environment.GetEnvironmentVariable("SystemRoot") + "\\fonts\\arial.ttf", iTextSharp.text.pdf.BaseFont.IDENTITY_H, iTextSharp.text.pdf.BaseFont.EMBEDDED);
                        bfR.Subset = false;

                        tempResult.Save(PdfPath);
                        tempResult.Close();
                        tempResult.Dispose();
                        result = Open(fileArray, openmode);
                    }

                    return result;
                }
                catch (OutOfMemoryException)
                {
                    fileStream.Close();
                    fileStream.Dispose();

                    throw;
                }
            }
        }

        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(byte[] fileArray, PdfDocumentOpenMode openmode)
        {
            return Open(new MemoryStream(fileArray), null, openmode);
        }
        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(byte[] fileArray, string password, PdfDocumentOpenMode openmode)
        {
            return Open(new MemoryStream(fileArray), password, openmode);
        }

        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(MemoryStream sourceStream, PdfDocumentOpenMode openmode)
        {
            return Open(sourceStream, null, openmode);
        }
        /// <summary>
        /// uses itextsharp 4.1.6 to convert any pdf to 1.4 compatible pdf, called instead of PdfReader.open
        /// </summary>
        static public PdfDocument Open(MemoryStream sourceStream, string password, PdfDocumentOpenMode openmode)
        {
            PdfDocument outDoc = null;
            sourceStream.Position = 0;

            try
            {
                outDoc = (password == null) ?
                      PdfReader.Open(sourceStream, openmode) :
                      PdfReader.Open(sourceStream, password, openmode);

                sourceStream.Position = 0;
                MemoryStream outputStream = new MemoryStream();
                iTextSharp.text.pdf.PdfReader reader = (password == null) ?
                      new iTextSharp.text.pdf.PdfReader(sourceStream) :
                      new iTextSharp.text.pdf.PdfReader(sourceStream, System.Text.ASCIIEncoding.ASCII.GetBytes(password));
                System.Collections.ArrayList fontList = iTextSharp.text.pdf.BaseFont.GetDocumentFonts(reader, 1);
            }
            catch (PdfSharp.Pdf.IO.PdfReaderException)
            {
                //workaround if pdfsharp doesn't support this pdf
                sourceStream.Position = 0;
                MemoryStream outputStream = new MemoryStream();
                iTextSharp.text.pdf.PdfReader reader = (password == null) ?
                      new iTextSharp.text.pdf.PdfReader(sourceStream) :
                      new iTextSharp.text.pdf.PdfReader(sourceStream, System.Text.ASCIIEncoding.ASCII.GetBytes(password)); 
                iTextSharp.text.pdf.PdfStamper pdfStamper = new iTextSharp.text.pdf.PdfStamper(reader, outputStream);
                pdfStamper.FormFlattening = true;
                pdfStamper.Writer.SetPdfVersion(iTextSharp.text.pdf.PdfWriter.PDF_VERSION_1_4);
                pdfStamper.Writer.CloseStream = false;
                pdfStamper.Close();

                outDoc = PdfReader.Open(outputStream, openmode);
            }

            return outDoc;
        }

        /// <summary>
        /// Uses a recursive function to step through the PDF document tree to find the specified objects.
        /// </summary>
        /// <param name="objectHierarchy">An array of the names of objects to look for in the tree. Wildcards can be used in element names, e.g., /F*. The order represents 
        /// a top-down hierarchy if followHierarchy is true. 
        /// If a single object is passed in array it should be in the level below startingObject, or followHierarchy set to false to find it anywhere in the tree</param>
        /// <param name="startingObject">A PDF object to parse. This will likely be a document or a page, but could be any lower-level item</param>
        /// <param name="followHierarchy">If true the order of names in the objectHierarchy will be used to search only that branch. If false the whole tree will be parsed for 
        /// any items matching those in objectHierarchy regardless of position</param>
        static public List<PdfItem> FindObjects(string[] objectHierarchy, PdfItem startingObject, bool followHierarchy)
        {
            List<PdfItem> results = new List<PdfItem>();
            FindObjects(objectHierarchy, startingObject, followHierarchy, ref results, 0);
            return results;
        }

        static private void FindObjects(string[] objectHierarchy, PdfItem startingObject, bool followHierarchy, ref List<PdfItem> results, int Level)
        {
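            // Only dictionaries have keys to search; arrays, names, numbers etc. end the recursion
            PdfDictionary dictionary = startingObject as PdfDictionary;
            if (dictionary == null) return;
            PdfName[] keyNames = dictionary.Elements.KeyNames;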
            foreach (PdfName keyName in keyNames)
            {
                bool matchFound = false;
                if (!followHierarchy)
                {
                    // We need to check all items for a match, not just the top one
                    for (int i = 0; i < objectHierarchy.Length; i++)
                    {
                        // For a wildcard, the prefix is everything before '*' (e.g. "/F" for "/F*")
                        if (keyName.Value == objectHierarchy[i] ||
                            (objectHierarchy[i].Contains("*") &&
                                (keyName.Value.StartsWith(objectHierarchy[i].Substring(0, objectHierarchy[i].IndexOf("*"))) &&
                                keyName.Value.EndsWith(objectHierarchy[i].Substring(objectHierarchy[i].IndexOf("*") + 1)))))
                        {
                            matchFound = true;
                        }
                    }
                }
                else
                {
                    // Check the item in the hierarchy at this level for a match
                    // For a wildcard, the prefix is everything before '*' (e.g. "/F" for "/F*")
                    if (Level < objectHierarchy.Length && (keyName.Value == objectHierarchy[Level] || 
                        (objectHierarchy[Level].Contains("*") &&
                                (keyName.Value.StartsWith(objectHierarchy[Level].Substring(0, objectHierarchy[Level].IndexOf("*"))) &&
                                keyName.Value.EndsWith(objectHierarchy[Level].Substring(objectHierarchy[Level].IndexOf("*") + 1))))))
                    {
                        matchFound = true;
                    }
                }

                if (matchFound)
                {
                    PdfItem item = ((PdfDictionary)startingObject).Elements[keyName];
                    if (item != null && item is PdfSharp.Pdf.Advanced.PdfReference)
                    {
                        item = ((PdfSharp.Pdf.Advanced.PdfReference)item).Value;
                    }

                    System.Diagnostics.Debug.WriteLine("Level " + Level.ToString() + " - " + keyName.ToString() + " matched");

                    if (Level == objectHierarchy.Length - 1)
                    {
                        // We are at the end of the hierarchy, so this is the target
                        results.Add(item);
                    }
                    else if (!followHierarchy)
                    {
                        // We are returning every matching object so add it
                        results.Add(item);
                    }

                    // Call back to this function to search lower levels
                    Level++;
                    FindObjects(objectHierarchy, item, followHierarchy, ref results, Level);
                    Level--;
                }
                else
                {
                    System.Diagnostics.Debug.WriteLine("Level " + Level.ToString() + " - " + keyName.ToString() + " unmatched");
                }
            }
            Level--;
            System.Diagnostics.Debug.WriteLine("Level " + Level.ToString());
        }

        /// <summary>
        /// Uses the Font object to translate CID encoded text to readable text
        /// </summary>
        /// <param name="unreadableText">The text stream that needs to be decoded</param>
        /// <param name="PDFFonts">A List of PDFItems containing the /Font objects, each with a /ToUnicode CMap</param>
        static public string FromUnicode(PdfDictionary.PdfStream unreadableText, List<PdfItem> PDFFonts)
        {
            Dictionary<string, string[]> fonts = new Dictionary<string, string[]>();

            // Get the CMap from each font in the passed array and store them by font name
            for (int font = 0; font < PDFFonts.Count; font++)
            {
                PdfName[] keyNames = ((PdfDictionary)PDFFonts[font]).Elements.KeyNames;
                foreach (PdfName keyName in keyNames)
                {
                    if (keyName.Value == "/ToUnicode") {
                        PdfItem item = ((PdfDictionary)PDFFonts[font]).Elements[keyName];
                        if (item != null && item is PdfSharp.Pdf.Advanced.PdfReference)
                        {
                            item = ((PdfSharp.Pdf.Advanced.PdfReference)item).Value;
                        }
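                        // NOTE: the font's position in the list stands in for its resource name (/F0, /F1, ...).
                        // FindObjects does not return the real key names, so this only works when the
                        // resource names happen to follow that numbering.
                        string FontName = "/F" + font.ToString();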
                        string CMap = ((PdfDictionary)item).Stream.ToString();

                        if (CMap.IndexOf("beginbfrange") > 0)
                        {
                            CMap = CMap.Substring(CMap.IndexOf("beginbfrange") + "beginbfrange".Length);

                            if (CMap.IndexOf("endbfrange") > 0)
                            {
                                CMap = CMap.Substring(0, CMap.IndexOf("endbfrange") - 1);

                                // CMaps may use \r\n, \n or \r as line endings
                                string[] CMapArray = CMap.Split(new string[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
                                fonts.Add(FontName, CMapArray);
                            }
                        }
                        break;
                    }
                }
            }

            // Holds the final result to be returned
            string resultString = "";

            // Break the input text into lines
            string[] lines = unreadableText.ToString().Split(new string[] {"\n"} , StringSplitOptions.RemoveEmptyEntries);

            // Holds the last font reference and therefore the CMAP table
            // to be used for any text found after it.
            // Fall back to an empty table if no /F0 CMap was found, instead of throwing KeyNotFoundException
            string[] defaultFontRef;
            if (!fonts.TryGetValue("/F0", out defaultFontRef))
            {
                defaultFontRef = new string[0];
            }
            string[] currentFontRef = defaultFontRef;

            // Are we in a block of text or not? They can break across lines so we need an identifier
            bool blnInText = false;

            for (int line = 0; line < lines.Length; line++)
            {
                string thisLine = lines[line].Trim();

                if (thisLine == "q")
                {
                    // "q" saves the graphics state; text objects actually start with BT.
                    // Resetting to the default font here is a heuristic.
                    currentFontRef = defaultFontRef;
                }
                else if (thisLine.IndexOf(" Td <") != -1)
                {
                    thisLine = thisLine.Substring(thisLine.IndexOf(" Td <") + 5);
                    blnInText = true;
                }

                if (thisLine.EndsWith("Tf"))
                {
                    // This is a font assignment. Take note of this and use this fonts ToUnicode map when we find text
                    if (fonts.ContainsKey(thisLine.Substring(0, thisLine.IndexOf(" "))))
                    {
                        currentFontRef = fonts[thisLine.Substring(0, thisLine.IndexOf(" "))];
                    }
                } 
                else if (thisLine.EndsWith("> Tj"))
                {
                    thisLine = thisLine.Substring(0, thisLine.IndexOf("> Tj"));
                }

                if(blnInText)
                {
                    // This is a text block
                    try
                    {
                        // Get the section of codes that exist between angled brackets
                        string unicodeStr = thisLine;
                        // Wrap every group of 4 characters in angle brackets
                        // This will directly match the items in the CMap but also allows the next for to avoid double-translating items
                        unicodeStr = "<" + String.Join("><", unicodeStr.SplitInParts(4)) + ">";

                        for (int transform = 0; transform < currentFontRef.Length; transform++)
                        {
                            // Get the last item in the line, which is the unicode value of the glyph
                            string glyph = currentFontRef[transform].Substring(currentFontRef[transform].IndexOf("<"));
                            glyph = glyph.Substring(0, glyph.IndexOf(">") + 1);

                            string counterpart = currentFontRef[transform].Substring(currentFontRef[transform].LastIndexOf("<") + 1);
                            counterpart = counterpart.Substring(0, counterpart.LastIndexOf(">"));

                            // Replace each item that matches with the translated counterpart
                            // Insert a \\u before every 4th character so it's a C# unicode compatible string
                            unicodeStr = unicodeStr.Replace(glyph, "\\u" + counterpart);
                            if (unicodeStr.IndexOf("<") == -1)
                            {
                                // No <xxxx> groups are left, so all items have been replaced
                                break;
                            }
                        }
                        resultString = resultString + System.Text.RegularExpressions.Regex.Unescape(unicodeStr);
                    }
                    catch
                    {
                        return "";
                    }
                }

                if (lines[line].Trim().EndsWith("> Tj"))
                {
                    blnInText = false;
                    if (lines[line].Trim().IndexOf(" 0 Td <") == -1)
                    {
                        // The vertical coords have changed, so add a new line
                        resultString = resultString + Environment.NewLine;
                    }
                    else
                    {
                        resultString = resultString + " ";
                    }
                }
            }
            return resultString;
        }

        // Credit to http://stackoverflow.com/questions/4133377/
        private static IEnumerable<String> SplitInParts(this String s, Int32 partLength)
        {
            if (s == null)
                throw new ArgumentNullException("s");
            if (partLength <= 0)
                throw new ArgumentException("Part length has to be positive.", "partLength");

            for (var i = 0; i < s.Length; i += partLength)
                yield return s.Substring(i, Math.Min(partLength, s.Length - i));
        }
    }
}
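FromUnicode turns each destination into `\u` plus its hex digits and then calls Regex.Unescape. That works for a single BMP character, but a ToUnicode destination can be longer than four hex digits: ligatures map to several characters (`<00660069>` for "fi"), and characters outside the BMP are written as UTF-16 surrogate pairs. Decoding each destination as UTF-16BE bytes and looking codes up in a dictionary avoids the string replacement altogether. A minimal sketch, assuming the CMap has already been parsed into a code-to-hex-destination dictionary (`CidText`, `Decode` and `FromUtf16BeHex` are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CidText
{
    // Converts a UTF-16BE hex destination such as "00660069" into a .NET string ("fi")
    public static string FromUtf16BeHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return Encoding.BigEndianUnicode.GetString(bytes);
    }

    // Looks up each 2-byte code; codes missing from the map become U+FFFD
    public static string Decode(IEnumerable<int> codes, IDictionary<int, string> hexDestinations)
    {
        var sb = new StringBuilder();
        foreach (int code in codes)
        {
            string hex;
            sb.Append(hexDestinations.TryGetValue(code, out hex) ? FromUtf16BeHex(hex) : "\uFFFD");
        }
        return sb.ToString();
    }
}
```

Emitting U+FFFD for unmapped codes keeps character positions visible in the output, rather than silently leaving `<xxxx>` groups in the text.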

public class PDFParser
{
    /// BT = Beginning of a text object operator
    /// ET = End of a text object operator
    /// Td move to the start of next line
    ///  5 Ts = superscript
    /// -5 Ts = subscript

    #region Fields

    #region _numberOfCharsToKeep
    /// <summary>
    /// The number of characters to keep, when extracting text.
    /// </summary>
    private static int _numberOfCharsToKeep = 15;
    #endregion

    #endregion



    #region ExtractTextFromPDFBytes
    /// <summary>
    /// This method processes an uncompressed Adobe (text) object
    /// and extracts text.
    /// </summary>
    /// <param name="input">uncompressed</param>
    /// <returns></returns>
    public string ExtractTextFromPDFBytes(byte[] input)
    {
        if (input == null || input.Length == 0) return "";

        try
        {
            string resultString = "";

            // Flag showing if we are we currently inside a text object
            bool inTextObject = false;

            // Flag showing if the next character is literal
            // e.g. '\\' to get a '\' character or '\(' to get '('
            bool nextLiteral = false;

            // () Bracket nesting level. Text appears inside ()
            int bracketDepth = 0;

            // Keep previous chars to get extract numbers etc.:
            char[] previousCharacters = new char[_numberOfCharsToKeep];
            for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';


            for (int i = 0; i < input.Length; i++)
            {
                char c = (char)input[i];

                if (inTextObject)
                {
                    // Position the text
                    if (bracketDepth == 0)
                    {
                        if (CheckToken(new string[] { "TD", "Td" }, previousCharacters))
                        {
                            resultString += "\n\r";
                        }
                        else
                        {
                            if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters))
                            {
                                resultString += "\n";
                            }
                            else
                            {
                                if (CheckToken(new string[] { "Tj" }, previousCharacters))
                                {
                                    resultString += " ";
                                }
                            }
                        }
                    }

                    // End of a text object, also go to a new line.
                    if (bracketDepth == 0 &&
                        CheckToken(new string[] { "ET" }, previousCharacters))
                    {

                        inTextObject = false;
                        resultString += " ";
                    }
                    else
                    {
                        // Start outputting text
                        if ((c == '(') && (bracketDepth == 0) && (!nextLiteral))
                        {
                            bracketDepth = 1;
                        }
                        else
                        {
                            // Stop outputting text
                            if ((c == ')') && (bracketDepth == 1) && (!nextLiteral))
                            {
                                bracketDepth = 0;
                            }
                            else
                            {
                                // Just a normal text character:
                                if (bracketDepth == 1)
                                {
                                    // Only print out next character no matter what.
                                    // Do not interpret.
                                    if (c == '\\' && !nextLiteral)
                                    {
                                        nextLiteral = true;
                                    }
                                    else
                                    {
                                        if (((c >= ' ') && (c <= '~')) ||
                                            ((c >= 128) && (c < 255)))
                                        {
                                            resultString += c.ToString();
                                        }

                                        nextLiteral = false;
                                    }
                                }
                            }
                        }
                    }
                }

                // Store the recent characters for
                // when we have to go back for a checking
                for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
                {
                    previousCharacters[j] = previousCharacters[j + 1];
                }
                previousCharacters[_numberOfCharsToKeep - 1] = c;

                // Start of a text object
                if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
                {
                    inTextObject = true;
                }
            }
            return resultString;
        }
        catch
        {
            return "";
        }
    }
    #endregion

    #region CheckToken
    /// <summary>
    /// Check if a certain operator token just came along (e.g. BT, T*, ')
    /// </summary>
    /// <param name="tokens">the searched tokens</param>
    /// <param name="recent">the recent character array</param>
    /// <returns>true if any of the tokens was found</returns>
    private bool CheckToken(string[] tokens, char[] recent)
    {
        foreach (string token in tokens)
        {
            if (token.Length > 1)
            {
                // Two-character operator with whitespace before and after it
                if ((recent[_numberOfCharsToKeep - 3] == token[0]) &&
                    (recent[_numberOfCharsToKeep - 2] == token[1]) &&
                    IsPdfWhiteSpace(recent[_numberOfCharsToKeep - 1]) &&
                    IsPdfWhiteSpace(recent[_numberOfCharsToKeep - 4]))
                {
                    return true;
                }
            }
            else if (token.Length == 1)
            {
                // Single-character operator (' or "), usually written straight after its
                // string operand, e.g. "(text)'", and followed by whitespace.
                // The loop continues afterwards so later tokens such as T* are still checked.
                if ((recent[_numberOfCharsToKeep - 2] == token[0]) &&
                    IsPdfWhiteSpace(recent[_numberOfCharsToKeep - 1]))
                {
                    return true;
                }
            }
        }
        return false;
    }

    private static bool IsPdfWhiteSpace(char c)
    {
        return c == ' ' || c == '\r' || c == '\n';
    }
    #endregion
}
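ExtractTextFromPDFBytes treats a backslash as "take the next character literally" and ends the string at the first unescaped `)` it meets. In PDF literal strings some escapes mean something else (`\n`, `\t`, octal codes such as `\101`, and a backslash before a line break continues the line), and balanced unescaped parentheses such as `(a (b) c)` are allowed inside a string. A minimal sketch of a decoder for one literal string, assuming the input starts at the opening `(` and that bytes map one-to-one to characters with no font encoding applied (`PdfLiteralString` is a hypothetical name):

```csharp
using System;
using System.Text;

static class PdfLiteralString
{
    // Decodes the literal string starting at text[start] == '('.
    // Returns the decoded value and sets end to the index just past the closing ')'.
    public static string Decode(string text, int start, out int end)
    {
        var sb = new StringBuilder();
        int depth = 0;
        int i = start;
        while (i < text.Length)
        {
            char c = text[i++];
            if (c == '(')
            {
                if (depth++ > 0) sb.Append(c); // nested, balanced parentheses are kept
            }
            else if (c == ')')
            {
                if (--depth == 0) break;       // matching close of the outer string
                sb.Append(c);
            }
            else if (c == '\\' && i < text.Length)
            {
                char e = text[i++];
                switch (e)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    case 'b': sb.Append('\b'); break;
                    case 'f': sb.Append('\f'); break;
                    case '\r': if (i < text.Length && text[i] == '\n') i++; break; // line continuation
                    case '\n': break;                                          // line continuation
                    default:
                        if (e >= '0' && e <= '7')
                        {
                            // Up to three octal digits, e.g. \101 is 'A'; overflow is ignored
                            int value = e - '0';
                            for (int k = 0; k < 2 && i < text.Length && text[i] >= '0' && text[i] <= '7'; k++)
                                value = value * 8 + (text[i++] - '0');
                            sb.Append((char)(value & 0xFF));
                        }
                        else
                        {
                            sb.Append(e); // \( \) \\ and unknown escapes yield the character itself
                        }
                        break;
                }
            }
            else
            {
                sb.Append(c);
            }
        }
        end = i;
        return sb.ToString();
    }
}
```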

Thanks to everyone who provided help and snippets that let me finally arrive at a working solution.

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)
